
Infrastructure & Site Reliability Engineer Datacentre AI Engineering SA
| Contract Type | Full-time |
| Workplace Type | On-site |
| Location | Riyadh |
Job Description
About the Role
Qualcomm is expanding its presence in Riyadh and is hiring an Infrastructure & Site Reliability Engineer for its Datacentre AI Engineering team. This full-time position is based in Riyadh, Saudi Arabia, and requires 2-5 years of experience. The role supports Qualcomm's growing infrastructure and critical AI use cases as Saudi Arabia advances its digital transformation.
Role Overview
This position involves the design, operation, and continuous improvement of large-scale AI inference systems within a datacenter environment. The engineer will ensure Qualcomm's AI infrastructure is reliable, scalable, and production-ready for advanced machine-learning workloads. The role demands strong systems and software engineering fundamentals, hands-on execution, and the ability to work independently while collaborating with cross-functional teams.
Key Responsibilities
- Design, deploy, and operate large-scale AI inference systems for critical AI workloads.
- Ensure the reliability, availability, and scalability of Qualcomm datacenter AI clusters.
- Develop and maintain software tools and support infrastructure for AI software stacks.
- Analyze software requirements and collaborate with architecture and hardware engineers.
- Build, deploy, and operate components supporting LLM inference, agentic AI workflows, and AI services.
- Improve model performance on AI100 deployments by working with models, systems, and software teams.
- Identify and implement optimizations for workloads on multi-SoC and multi-card systems.
- Apply Site Reliability Engineering (SRE) fundamentals including monitoring, alerting, incident response, and performance optimization.
- Support production ML systems using MLOps tools and operational best practices.
- Contribute to incident reviews, operational documentation, and continuous reliability improvements.
- Build and maintain observability tools, dashboards, and alerts.
- Monitor infrastructure and services using tools like Prometheus, Grafana, CloudWatch, and custom telemetry.
- Create and maintain technical documentation, runbooks, and knowledge-base articles.
- Develop automation to reduce manual operational tasks and improve system reliability.
- Support CI/CD pipelines for AI service and agent deployment.
- Apply Infrastructure-as-Code practices using tools such as Terraform and Ansible.
Required Qualifications and Skills
- Bachelor's or Master's degree in Engineering, Computer Science, AI/ML, or a related field.
- 2–8 years of software, systems, or infrastructure engineering experience, preferably in production or datacenter environments.
- Experience with AI/ML workloads such as LLMs, NLP, Vision, Audio, or Recommendation systems.
- Understanding of ML inference concepts including batching, token streaming, and performance considerations.
- Hands-on experience with PyTorch and familiarity with modern ML frameworks.
- Familiarity with distributed inference, checkpointing, and accelerator-based compute environments.
- Experience supporting AI or ML applications in production environments.
- Familiarity with LLM inference pipelines and AI service operations.
- Strong programming skills in Python with experience building and supporting production systems.
- Experience with scripting and automation using Python and Bash.
- Familiarity with configuration management and orchestration tools.
- Strong Linux fundamentals including shell, containers, system services, and networking basics (DNS, TLS, HTTP/gRPC).
- Experience working with cluster schedulers such as Slurm or equivalent systems.
- Experience operating distributed systems with high availability and fault tolerance.
- Hands-on experience with monitoring and logging tools such as Prometheus, Grafana, ELK, or Loki.
- Understanding of incident management, service health metrics, and system reliability monitoring.
- Solid understanding of SDLC, release processes, and operational reliability practices.
- Familiarity with CI/CD pipelines and Infrastructure-as-Code tools.
Preferred Skills
- Experience with GenAI, Agentic AI systems, or LLM orchestration frameworks.
- Exposure to LangChain, AutoGen, or RAG-based systems.
- Experience with additional ML frameworks such as TensorFlow, JAX, or Ray.
- Knowledge of GPU/accelerator-based systems and high-performance networking (RDMA, InfiniBand, RoCE).
- Experience with advanced MLOps workflows or large-scale AI platform operations.
Work Environment and Benefits
This is a full-time role based in Riyadh, Saudi Arabia. Qualcomm offers a competitive compensation package that includes salary, housing and transport allowances, stock (RSUs), and a performance-related bonus. Additional benefits include paid maternity and paternity leave, an employee stock purchase scheme, a child education allowance, relocation and immigration support, and life and medical insurance. A Live+Well reimbursement is also provided for health and recreational membership fees.
Requirements
- Requires 2-5 years of experience