Contract Type: Full-time
Workplace Type: On-site
Location: Riyadh

About the Role

Qualcomm Middle East Information Technology Company LLC is seeking an experienced ML Operations & Customer Support Engineer to join its Customer Engineering team in Riyadh, KSA. This customer-facing role focuses on supporting strategic customers in deploying AI inference workloads on advanced Qualcomm AI inference accelerators. These accelerators utilize Qualcomm's expertise in hardware-accelerated AI to provide high-performance, energy-efficient generative AI and computer vision inference solutions for modern data centers.

The position requires a strong background in ML model deployment, systems engineering, rack-scale management software, DevOps/MLOps automation, and cross-functional collaboration to ensure system uptime, reliability, and performance, while resolving customer support cases within defined SLAs/KPIs. This role is essential for ensuring customer success with Qualcomm's AI technology, involving deep dives into ML inference pipelines, systems troubleshooting, and data center operations, in collaboration with customers and internal teams.

Key Responsibilities

  • Serve as the primary technical escalation point for customer issues related to AI inference workloads.
  • Manage end-to-end case resolution, ensuring adherence to Service Level Agreements (SLAs) and Key Performance Indicators (KPIs).
  • Lead incident response, triage, and root cause analysis (RCA) for critical issues.
  • Provide timely and transparent communication to customers regarding issue status and resolution progress.
  • Maintain high levels of customer satisfaction and service reliability.
  • Ensure high availability and uptime of customer AI deployments, particularly rack-scale systems.
  • Monitor system health, performance metrics, and workload behavior to proactively identify potential issues.
  • Implement and manage failover, redundancy, and resiliency mechanisms for continuous operation.
  • Proactively identify operational risks and implement preventative actions.
  • Support the deployment, optimization, and troubleshooting of ML inference pipelines.
  • Debug issues across model, runtime, system, and hardware layers.
  • Analyze model performance, including latency, throughput, and accuracy tradeoffs, in production environments.
  • Support various ML frameworks such as PyTorch, TensorFlow, and ONNX, and model conversion flows.
  • Assist in applying model optimization techniques including quantization, batching, compilation, and runtime tuning.
  • Support AI workloads in bare-metal and virtualized environments.
  • Troubleshoot issues across Linux operating systems, drivers, firmware, and the networking stack.
  • Support deployment and maintenance using Infrastructure as Code (IaC) and automation tools.
  • Work with Data Center Infrastructure Management (DCIM) tools and monitoring systems.
  • Coordinate with hardware vendors for accelerator, server, and networking-related issues.
  • Implement and manage monitoring systems, including logs, metrics, and traces.
  • Build dashboards to track uptime, SLA adherence, performance, and utilization metrics.
  • Automate repetitive operational tasks using scripts and workflows.
  • Establish and enforce runbooks and standard operating procedures (SOPs).
  • Collaborate closely with Customer Engineering, Product, Engineering, and Support teams.
  • Provide structured feedback to engineering teams for product improvements and defect resolution.
  • Support customer onboarding, deployment readiness, and operational handover processes.
  • Participate in customer reviews, escalations, and technical deep dives.

Qualifications and Experience

  • Bachelor's degree in Computer Science, Computer Engineering, Electrical Engineering, or a related field.
  • 10-15+ years of experience in ML operations, systems engineering, or customer support engineering.
  • Proven experience in customer-facing technical roles with SLA-driven support models.
  • Strong experience with AI/ML inference workloads in production environments.
  • Deep understanding of end-to-end ML inference pipelines.
  • Hands-on experience with Linux systems, system bring-up, drivers, and debugging tools.
  • Strong understanding of AI accelerator architecture and system bottlenecks.
  • Experience with model deployment, optimization, and performance tuning.
  • Experience with data center operations and rack-scale deployments.
  • Familiarity with bare-metal, virtualization, and containerization technologies such as Docker and Kubernetes.
  • Knowledge of networking concepts including TCP/IP, RDMA, and storage systems.
  • Experience with cloud and hybrid environments.
  • Experience with monitoring and observability tools like Prometheus, Grafana, and ELK stack.
  • Strong skills in incident management, RCA, and production operations.
  • Experience defining and tracking SLAs, KPIs, and operational metrics.
  • Proficiency in Python, Bash, or similar scripting languages.
  • Experience in automation, DevOps, and MLOps tooling.
  • Strong problem-solving and diagnostic skills.
  • Excellent communication and customer engagement skills.
  • Ability to operate effectively in high-pressure, mission-critical environments.
  • High attention to detail with a focus on quality, reliability, and accountability.
  • Experience with Qualcomm Cloud AI or similar AI accelerator platforms.
  • Experience supporting large-scale AI deployments (LLMs, CV pipelines, generative AI).
  • Familiarity with inference runtimes (TensorRT, ONNX Runtime, custom runtimes).
  • Experience with CI/CD pipelines for ML deployment.

Required Skills and Competencies

  • ML inference pipelines
  • Systems troubleshooting
  • Data center operations
  • ML model deployment
  • Systems engineering
  • Rack-scale management software
  • DevOps/MLOps automation
  • Cross-functional collaboration
  • Customer Support
  • SLA Ownership
  • Incident response
  • Triage
  • Root cause analysis (RCA)
  • Customer satisfaction
  • Service reliability
  • High availability
  • System health monitoring
  • Performance metrics
  • Failover, redundancy, and resiliency mechanisms
  • Risk identification and preventative actions
  • AI inference workload support
  • ML inference pipeline optimization
  • Model performance analysis
  • PyTorch, TensorFlow, ONNX
  • Model conversion flows
  • Model optimization techniques (quantization, batching, compilation, runtime tuning)
  • Bare-metal and virtualized environments
  • Linux OS, drivers, firmware, and networking stack
  • Infrastructure as Code (IaC) and automation tools
  • DCIM tools and monitoring systems
  • Logs, metrics, and traces
  • Dashboards for uptime, SLA adherence, performance, and utilization
  • Automating repetitive operational tasks
  • Scripts and workflows
  • Runbooks and Standard Operating Procedures (SOPs)
  • Collaboration with Customer Engineering, Product, and Support teams
  • Customer onboarding, deployment readiness, and operational handover
  • Customer reviews and technical deep dives
  • AI/ML inference workloads
  • Linux systems, system bring-up, and debugging tools
  • AI accelerator architecture and system bottlenecks
  • Model performance tuning
  • Rack-scale deployments
  • Virtualization and containerization technologies (Docker, Kubernetes)
  • Networking concepts (TCP/IP, RDMA, storage systems)
  • Cloud and hybrid environments
  • Monitoring/observability tools (Prometheus, Grafana, ELK)
  • Incident management and production operations
  • Operational metrics definition and tracking
  • Python, Bash, and scripting languages
  • DevOps and MLOps tooling
  • Problem-solving and diagnostic skills
  • Communication and customer engagement
  • High-pressure and mission-critical environments
  • Attention to detail, quality, reliability, and accountability

Work Location and Type

This is a full-time position based in Riyadh, Saudi Arabia.


Requirements

  • Requires 10+ years of experience

Similar Jobs

IT Infrastructure and Systems Design Expert

Zakat, Tax and Customs Authority

Full-time

About the Role

The Zakat, Tax and Customs Authority (ZATCA) is seeking an IT Infrastructure and Systems Design Expert to join its team in Riyadh, Saudi Arabia. This role is responsible for leading work activities and providing expert guidance in the development and enhancement of ZATCA's IT infrastructure and systems. The position requires autonomous work with minimal direction, functioning as an internal consultant to ensure operational excellence and the successful delivery of IT projects.

Jobholders at this level contribute to long-term objectives by offering expert support in the design and development of IT infrastructure and systems capabilities. This includes leading the creation of Request for Proposals (RFPs), supervising the assessment of change requests, and reviewing the performance of existing IT components to recommend improvements.

Key Responsibilities

  • Lead the development of new, detailed IT infrastructure and systems designs by staying informed on the latest trends and technologies, collecting business and technical requirements, and assessing current ZATCA IT infrastructure and systems to identify design gaps and challenges.
  • Conduct requirement gathering sessions with stakeholders, interpret these requirements to finalize the development of RFPs related to IT infrastructure and systems, and contribute to vendor evaluation and selection in cooperation with relevant functions.
  • Receive and analyze change requests related to IT infrastructure and systems, prioritize them, and provide recommendations with an implementation schedule in cooperation with relevant functions.
  • Collect inputs for IT risk assessments and analyses for new technology introductions, and collaborate with IT Infrastructure & Networks Planning & Monitoring to develop risk mitigation plans.
  • Analyze periodic performance reports of ZATCA IT infrastructure and systems against set KPIs and provide recommendations for performance improvement.
  • Adhere to all relevant policies, processes, and standard operating procedures to ensure work is carried out in a controlled and consistent manner.
  • Assist in solving escalated problems and provide support to junior team members for efficient work execution.
  • Escalate complex problems to the relevant personnel for proper closure of cases and issues.
  • Perform other duties as requested by management.
  • Train junior staff on various job activities to facilitate knowledge transfer.
  • Provide clear direction, prioritize tasks, assign and delegate responsibilities, and monitor the workflow of subordinates or junior staff.
  • Support junior staff or direct reports in executing their duties in accordance with established policies and processes.

Qualifications and Requirements

  • A Bachelor's degree in Computer Science, Information Technology, or an equivalent field is required.
  • A Master's degree in Information Technology, Computer Science, Business Administration, or an equivalent field is preferred.
  • A minimum of 5 years of relevant experience in IT infrastructure and systems design is required.

Required Skills

  • Expertise in IT infrastructure and systems design.
  • Advanced skills in IT Systems Management and IT Change Management.
  • Proficiency in Collaboration and Communication, Professionalism, Project Management, IT Compliance, Results Orientation, Customer Focus, and Vendor Management.
  • Developing skills in Enablement of Change and Innovation.

Work Environment

This is a full-time position based in Riyadh, Saudi Arabia, with the Zakat, Tax and Customs Authority.

Experience: 5-10 years

Location: Riyadh

IT Specialist

eSense

Full-time

About the Role

eSense is seeking a motivated and detail-oriented IT Specialist to join our team in Riyadh, Saudi Arabia. This full-time position is designed for individuals with 0-1 years of experience looking to begin their career in IT support. The IT Specialist will serve as the primary point of contact for users, providing essential first-line technical assistance across various IT domains, including cloud services, on-premises infrastructure, and end-user devices. This role is critical for ensuring the smooth operation of our IT systems through prompt incident response, service request fulfillment, and onsite support.

The successful candidate will be adaptable, possess strong communication abilities, and demonstrate a capacity to manage a diverse range of technologies in both remote and in-person settings. This role offers an opportunity to gain hands-on experience and develop a strong foundation in IT support within a dynamic environment.

Key Responsibilities

  • Serve as the initial point of contact for users requiring IT support via phone, email, or the ticketing system.
  • Diagnose and resolve Level 1 technical issues related to hardware, software, networks, and cloud services.
  • Escalate complex incidents and service requests to appropriate higher-level support teams when necessary.
  • Provide direct onsite support for end-user devices, printers, peripherals, and network connectivity issues.
  • Assist users with cloud environments, such as Microsoft 365 and Azure, including account access, configuration, and basic troubleshooting.
  • Maintain accurate and comprehensive documentation of reported issues, troubleshooting steps, and resolutions within the IT service management system.
  • Participate in routine IT maintenance tasks, including patching, software updates, backup procedures, and system checks.
  • Collaborate effectively with cross-functional IT teams to ensure the timely and efficient delivery of IT services and support.
  • Engage in continuous learning and knowledge sharing to stay updated with emerging IT technologies and best practices in support.
  • Promote and enforce IT security and compliance best practices during all support activities.

Qualifications and Requirements

  • Diploma or Bachelor's degree in Information Technology, Computer Science, or a related field, or equivalent work experience.
  • 1-2 years of experience in IT support or helpdesk roles is preferred.
  • Basic knowledge of operating systems including Windows, macOS, and Linux.
  • Familiarity with cloud platforms such as Microsoft 365, Azure, or equivalent.
  • Understanding of fundamental networking concepts, including TCP/IP, DNS, DHCP, and VPN.
  • Hands-on experience with common IT hardware, including laptops, desktops, printers, and various peripherals.
  • Strong problem-solving and troubleshooting abilities.
  • Excellent verbal and written communication skills.
  • A strong customer service orientation with a commitment to user satisfaction.
  • Ability to effectively prioritize tasks and manage time efficiently.

Technical Skills and Competencies

  • Cloud Services (Microsoft 365, Azure)
  • On-premises Infrastructure Support
  • End-user Devices Support
  • Hardware and Software Troubleshooting
  • Network Troubleshooting (TCP/IP, DNS, DHCP, VPN)
  • Operating Systems (Windows, macOS, Linux)
  • IT Service Management
  • Patching and System Updates
  • Backup Procedures and System Checks
  • IT Security Principles and Compliance Awareness

Additional Information

Certifications such as CompTIA A+, Microsoft Certified: Azure Fundamentals, or ITIL Foundation are preferred but not required.

Work Environment Details

This is a full-time position based in Riyadh, Saudi Arabia. The role requires onsite presence and interaction with users.

Experience: 0-1 years

Location: Riyadh

Remote Job

IT/OT Integration Engineer

Arabian Digital Solutions

Full-time

About the Role

Arabian Digital Solutions (ADS) is an industrial automation and engineering company serving critical sectors across Saudi Arabia and the Middle East. ADS specializes in end-to-end services for Distributed Control Systems (DCS), Programmable Logic Controllers (PLC), Supervisory Control and Data Acquisition (SCADA), oil and gas systems, and building management systems. We are seeking a dedicated IT/OT Integration Engineer to join our team in Riyadh, Saudi Arabia, to support our Industrial Digital Solutions initiatives. This role focuses on bridging the gap between IT systems and Operational Technology (OT) environments.

Key Responsibilities

  • Configure and maintain data connectivity from OT systems to the historian platform using protocols such as OPC UA, OPC DA, MQTT, Modbus, and similar technologies.
  • Reproduce customer-reported issues, collect necessary system logs, and collaborate with platform engineering teams for issue resolution.
  • Troubleshoot software and infrastructure issues, including services, ports, firewall rules, SSL certificates, networking configurations, and remote access solutions.
  • Produce technical documentation, deployment guides, and detailed issue reports for internal and client use.
  • Coordinate with internal ADS teams and client organizations, serving as the primary technical point of contact for the platform.

Qualifications and Experience

  • 2 to 5 years of hands-on experience in IT/OT systems, industrial software platforms, historian systems, automation data infrastructure, or software deployment.
  • Strong practical experience with Windows Server administration, including installation, service management, event log analysis, and network configuration.
  • Solid working knowledge of Linux, including command-line usage, service management, log review, and file system operations.
  • Demonstrated experience in software development and scripting using languages such as Python or C# for integration and automation.
  • Hands-on experience integrating industrial protocols like OPC and Modbus, and establishing OT to IT system connectivity.
  • Hands-on experience with Docker and Docker Compose for deploying, configuring, and troubleshooting containerized applications.
  • Working knowledge of SQL and fundamental database concepts.
  • Proven ability to troubleshoot complex software systems by analyzing logs, checking service status, and diagnosing network and security configurations.
  • Familiarity with OT environments, including SCADA, HMI, DCS, or industrial historian platforms such as OSIsoft PI, Wonderware, or Ignition.

Required Skills

  • Industrial Communication Protocols: OPC UA, OPC DA, MQTT, Modbus TCP/RTU.
  • Server Administration: Windows Server administration, Linux command line.
  • Scripting and Development: Python, C#, Software Development.
  • Containerization: Docker, Docker Compose.
  • Databases: SQL, basic database concepts.
  • Industrial Systems: SCADA, HMI, DCS, industrial historian platforms (such as OSIsoft PI, Wonderware, Ignition).
  • Networking: OT networking, VLANs, Firewalls, DMZ architecture.
  • General Skills: Technical documentation, Communication.

Work Environment and Location

This is a full-time, on-site position based in Riyadh, Saudi Arabia. The role requires strong written and verbal communication skills in English. Arabic language proficiency is considered an advantage.

Experience: 2-5 years

Location: Riyadh

Remote Job