
Infrastructure & Site Reliability Engineer, Datacentre AI Engineering SA
| Contract type | Full-time |
| Work arrangement | On-site |
| Location | Riyadh |
Job Description
About the Role
Qualcomm is expanding its presence in Riyadh and is seeking to hire an Infrastructure & Site Reliability Engineer for its Datacentre AI Engineering team. This full-time position is based in Riyadh, Saudi Arabia, and requires 2-5 years of experience. The role focuses on supporting Qualcomm's growing infrastructure and critical AI use cases as Saudi Arabia advances its digital transformation.
Role Overview
This position involves the design, operation, and continuous improvement of large-scale AI inference systems within a datacenter environment. The engineer will ensure Qualcomm's AI infrastructure is reliable, scalable, and production-ready for advanced machine-learning workloads. The role demands strong systems and software engineering fundamentals, hands-on execution, and the ability to work independently while collaborating with cross-functional teams.
Key Responsibilities
- Design, deploy, and operate large-scale AI inference systems for critical AI workloads.
- Ensure the reliability, availability, and scalability of Qualcomm datacenter AI clusters.
- Develop and maintain software tools and support infrastructure for AI software stacks.
- Analyze software requirements and collaborate with architecture and hardware engineers.
- Build, deploy, and operate components supporting LLM inference, agentic AI workflows, and AI services.
- Improve model performance on AI100 deployments by working with models, systems, and software teams.
- Identify and implement optimizations for workloads on multi-SoC and multi-card systems.
- Apply Site Reliability Engineering (SRE) fundamentals including monitoring, alerting, incident response, and performance optimization.
- Support production ML systems using MLOps tools and operational best practices.
- Contribute to incident reviews, operational documentation, and continuous reliability improvements.
- Build and maintain observability tools, dashboards, and alerts.
- Monitor infrastructure and services using tools like Prometheus, Grafana, CloudWatch, and custom telemetry.
- Create and maintain technical documentation, runbooks, and knowledge-base articles.
- Develop automation to reduce manual operational tasks and improve system reliability.
- Support CI/CD pipelines for AI service and agent deployment.
- Apply Infrastructure-as-Code practices using tools such as Terraform and Ansible.
Required Qualifications and Skills
- Bachelor's or Master's degree in Engineering, Computer Science, AI/ML, or a related field.
- 2–8 years of software, systems, or infrastructure engineering experience, preferably in production or datacenter environments.
- Experience with AI/ML workloads such as LLMs, NLP, Vision, Audio, or Recommendation systems.
- Understanding of ML inference concepts including batching, token streaming, and performance considerations.
- Hands-on experience with PyTorch and familiarity with modern ML frameworks.
- Familiarity with distributed inference, checkpointing, and accelerator-based compute environments.
- Experience supporting AI or ML applications in production environments.
- Familiarity with LLM inference pipelines and AI service operations.
- Strong programming skills in Python with experience building and supporting production systems.
- Experience with scripting and automation using Python and Bash.
- Familiarity with configuration management and orchestration tools.
- Strong Linux fundamentals including shell, containers, system services, and networking basics (DNS, TLS, HTTP/gRPC).
- Experience working with cluster schedulers such as Slurm or equivalent systems.
- Experience operating distributed systems with high availability and fault tolerance.
- Hands-on experience with monitoring and logging tools such as Prometheus, Grafana, ELK, or Loki.
- Understanding of incident management, service health metrics, and system reliability monitoring.
- Solid understanding of SDLC, release processes, and operational reliability practices.
- Familiarity with CI/CD pipelines and Infrastructure-as-Code tools.
Preferred Skills
- Experience with GenAI, Agentic AI systems, or LLM orchestration frameworks.
- Exposure to LangChain, AutoGen, or RAG-based systems.
- Experience with additional ML frameworks such as TensorFlow, JAX, or Ray.
- Knowledge of GPU/accelerator-based systems and high-performance networking (RDMA, InfiniBand, RoCE).
- Experience with advanced MLOps workflows or large-scale AI platform operations.
Work Environment and Benefits
This is a full-time role based in Riyadh, Saudi Arabia. Qualcomm offers a competitive compensation package that includes salary, housing and transport allowance, stock (RSUs), and a performance-related bonus. Additional benefits include paid maternity and paternity leave, an employee stock purchase scheme, child education allowance, relocation and immigration support, and life and medical insurance. A Live+Well reimbursement is also provided for health and recreational membership fees.
Job Requirements
- Requires 2-5 years of experience