
ML Operations & Customer Support Engineer, Staff/Senior Staff level - Riyadh, KSA
| Contract Type | Full-time |
| Workplace type | On-site |
| Location | Riyadh |
About the Role
Qualcomm Middle East Information Technology Company LLC is seeking an experienced ML Operations & Customer Support Engineer to join its Customer Engineering team in Riyadh, KSA. This customer-facing role supports strategic customers in deploying AI inference workloads on Qualcomm AI inference accelerators, which draw on Qualcomm's expertise in hardware-accelerated AI to deliver high-performance, energy-efficient generative AI and computer vision inference for modern data centers.
The position requires a strong background in ML model deployment, systems engineering, rack-scale management software, DevOps/MLOps automation, and cross-functional collaboration. The engineer is responsible for system uptime, reliability, and performance, and for resolving customer support cases within defined SLAs/KPIs. The role is central to customer success with Qualcomm's AI technology and involves deep dives into ML inference pipelines, systems troubleshooting, and data center operations, carried out together with customers and internal teams.
Key Responsibilities
- Serve as the primary technical escalation point for customer issues related to AI inference workloads.
- Manage end-to-end case resolution, ensuring adherence to Service Level Agreements (SLAs) and Key Performance Indicators (KPIs).
- Lead incident response, triage, and root cause analysis (RCA) for critical issues.
- Provide timely and transparent communication to customers regarding issue status and resolution progress.
- Maintain high levels of customer satisfaction and service reliability.
- Ensure high availability and uptime of customer AI deployments, particularly rack-scale systems.
- Monitor system health, performance metrics, and workload behavior to proactively identify potential issues.
- Implement and manage failover, redundancy, and resiliency mechanisms for continuous operation.
- Proactively identify operational risks and implement preventative actions.
- Support the deployment, optimization, and troubleshooting of ML inference pipelines.
- Debug issues across model, runtime, system, and hardware layers.
- Analyze model performance, including latency, throughput, and accuracy tradeoffs, in production environments.
- Support various ML frameworks such as PyTorch, TensorFlow, and ONNX, and model conversion flows.
- Assist in applying model optimization techniques including quantization, batching, compilation, and runtime tuning.
- Support AI workloads in bare-metal and virtualized environments.
- Troubleshoot issues across Linux operating systems, drivers, firmware, and the networking stack.
- Support deployment and maintenance using Infrastructure as Code (IaC) and automation tools.
- Work with Data Center Infrastructure Management (DCIM) tools and monitoring systems.
- Coordinate with hardware vendors for accelerator, server, and networking-related issues.
- Implement and manage monitoring systems, including logs, metrics, and traces.
- Build dashboards to track uptime, SLA adherence, performance, and utilization metrics.
- Automate repetitive operational tasks using scripts and workflows.
- Establish and enforce runbooks and standard operating procedures (SOPs).
- Collaborate closely with Customer Engineering, Product, Engineering, and Support teams.
- Provide structured feedback to engineering teams for product improvements and defect resolution.
- Support customer onboarding, deployment readiness, and operational handover processes.
- Participate in customer reviews, escalations, and technical deep dives.
Qualifications and Experience
- Bachelor's degree in Computer Science, Computer Engineering, Electrical Engineering, or a related field.
- 10-15+ years of experience in ML operations, systems engineering, or customer support engineering.
- Proven experience in customer-facing technical roles with SLA-driven support models.
- Strong experience with AI/ML inference workloads in production environments.
- Deep understanding of end-to-end ML inference pipelines.
- Hands-on experience with Linux systems, system bring-up, drivers, and debugging tools.
- Strong understanding of AI accelerator architecture and system bottlenecks.
- Experience with model deployment, optimization, and performance tuning.
- Experience with data center operations and rack-scale deployments.
- Familiarity with bare-metal, virtualization, and containerization technologies such as Docker and Kubernetes.
- Knowledge of networking concepts, including TCP/IP and RDMA, and of storage systems.
- Experience with cloud and hybrid environments.
- Experience with monitoring and observability tools like Prometheus, Grafana, and ELK stack.
- Strong skills in incident management, RCA, and production operations.
- Experience defining and tracking SLAs, KPIs, and operational metrics.
- Proficiency in Python, Bash, or similar scripting languages.
- Experience in automation, DevOps, and MLOps tooling.
- Strong problem-solving and diagnostic skills.
- Excellent communication and customer engagement skills.
- Ability to operate effectively in high-pressure, mission-critical environments.
- High attention to detail with a focus on quality, reliability, and accountability.
- Experience with Qualcomm Cloud AI or similar AI accelerator platforms.
- Experience supporting large-scale AI deployments (LLMs, CV pipelines, generative AI).
- Familiarity with inference runtimes (TensorRT, ONNX Runtime, custom runtimes).
- Experience with CI/CD pipelines for ML deployment.
Required Skills and Competencies
- ML inference pipelines
- Systems troubleshooting
- Data center operations
- ML model deployment
- Systems engineering
- Rack-scale management software
- DevOps/MLOps automation
- Cross-functional collaboration
- Customer Support
- SLA Ownership
- Incident response
- Triage
- Root cause analysis (RCA)
- Customer satisfaction
- Service reliability
- High availability
- System health monitoring
- Performance metrics
- Failover, redundancy, and resiliency mechanisms
- Risk identification and preventative actions
- AI inference workload support
- ML inference pipeline optimization
- Model performance analysis
- PyTorch, TensorFlow, ONNX
- Model conversion flows
- Model optimization techniques (quantization, batching, compilation, runtime tuning)
- Bare-metal and virtualized environments
- Linux OS, drivers, firmware, and networking stack
- Infrastructure as Code (IaC) and automation tools
- DCIM tools and monitoring systems
- Logs, metrics, and traces
- Dashboards for uptime, SLA adherence, performance, and utilization
- Automating repetitive operational tasks
- Scripts and workflows
- Runbooks and Standard Operating Procedures (SOPs)
- Collaboration with Customer Engineering, Product, and Support teams
- Customer onboarding, deployment readiness, and operational handover
- Customer reviews and technical deep dives
- AI/ML inference workloads
- Linux systems, system bring-up, and debugging tools
- AI accelerator architecture and system bottlenecks
- Model performance tuning
- Rack-scale deployments
- Virtualization and containerization technologies (Docker, Kubernetes)
- Networking concepts (TCP/IP, RDMA) and storage systems
- Cloud and hybrid environments
- Monitoring/observability tools (Prometheus, Grafana, ELK)
- Incident management and production operations
- Operational metrics definition and tracking
- Python, Bash, or similar scripting languages
- DevOps and MLOps tooling
- Problem-solving and diagnostic skills
- Communication and customer engagement
- High-pressure and mission-critical environments
- Attention to detail, quality, reliability, and accountability
Work Location and Type
This is a full-time position based in Riyadh, Saudi Arabia.
Requirements
- Requires 10+ years of experience