About the Role
Qualcomm Middle East Information Technology Company LLC is seeking an AI Platform & Inference Suite Engineer at the Staff/Senior Staff level to join our team in Riyadh, KSA. At Qualcomm, we are enabling a world where everyone and everything can be intelligently connected. This role is a customer-facing, highly technical position focused on supporting the enablement of rack-scale deep learning workloads on advanced Qualcomm AI inference accelerators. These accelerators leverage Qualcomm's expertise in hardware-accelerated AI to deliver high-performance, energy-efficient generative AI and computer vision inference solutions for modern data centers.
Core Responsibilities
The engineer will be instrumental in porting, optimizing, and validating deep learning AI models on production systems. This includes enabling Qualcomm's partners to develop and deploy advanced machine learning applications, such as computer vision, speech, generative AI, and multimodal reasoning models, using popular frameworks like PyTorch, TensorFlow, and ONNX on Qualcomm Cloud AI accelerators. Key responsibilities involve deploying, optimizing, and scaling deep learning AI models onto accelerator-based data center platforms, including model conversion workflows, quantization techniques (INT8 / mixed precision), and runtime integration and optimization. The role also requires integrating ML models onto Qualcomm's Cloud AI ML stack and driving improvements in model throughput, latency, and accuracy with clear trade-off analysis.
- Building, testing, and deploying scalable inference pipelines using serving frameworks such as vLLM, TGI, and Triton.
- Optimizing workloads for LLM and GenAI models across multi-SoC and multi-card architectures.
- Collaborating with engineering teams to analyze and refine training and inference for advanced deep learning applications, identifying bottlenecks across compute, memory, and runtime, and guiding optimization strategies.
- Contributing to Qualcomm's Cloud AI GitHub repository and developer documentation, sharing technical best practices and solutions.
- Developing and integrating end-to-end ML application pipelines with customer frameworks and libraries.
- Acting as a trusted technical advisor for customers deploying AI workloads, engaging in hardware sizing and architecture discussions, and providing technical guidance on AI model selection, deployment feasibility, system architecture, and performance expectations.
- Leading discussions on model capabilities and limitations based on real customer use cases, assessing AI model requirements, and recommending alternative model approaches when necessary.
- Aligning model characteristics with accelerator and system capabilities, and supporting customers in defining model selection strategies based on deployment realities.
- Evaluating performance characteristics of AI models in production scenarios and guiding architecture decisions around scaling strategies and hardware deployment sizing.
- Contributing to discussions on workload scalability limits and providing insights into capacity planning and infrastructure optimization.
- Driving discussions around end-to-end AI pipelines, including multi-model workflows and data preprocessing and post-processing stages, and guiding decisions on video and data processing stacks.
- Highlighting and explaining trade-offs such as accuracy vs compatibility, model quality vs deployment feasibility, model simplification vs performance gains, and precision vs efficiency.
- Leading or supporting model capability validation in deployment environments and collaborating with customers to define inference assumptions and model sizing strategies for large-scale workloads.
Required Qualifications
Candidates should possess a Bachelor's degree in Computer Science, Computer Engineering, Electrical Engineering, or a related field, or equivalent experience. The role requires 10–15+ years of experience in deep learning model development or deployment on CPUs/GPUs/ASICs, inference systems and optimization, and data center or edge AI platforms. Strong experience with model quantization and optimization techniques, AI model frameworks (e.g., PyTorch, TensorFlow), and model deployment pipelines is essential. Excellent C/C++/Python programming and software design skills, including debugging and performance analysis, are required.
- Hands-on expertise with Linux-based systems, low-level software, drivers, and system bring-up.
- Proven ability to analyze and optimize model performance in production environments.
- Solid understanding of AI inference hardware constraints and system-level performance bottlenecks.
- Strong communication skills and experience in customer-facing technical roles.
- Willingness to travel for customer engagements and strategic reviews.
- Skilled in deploying models on platforms that use hardware accelerators for inference.
- Experienced with managing multi-model workflows and building real-time AI systems, including computer vision, video, and analytics projects.
- Knowledgeable about distributed inference methods and handling large-scale model deployments.
- Proficient in developing and maintaining video processing workflows and using relevant software frameworks.
- Deep understanding of how system-level decisions affect performance in actual deployment environments.
- Capable of simplifying complex technical ideas into straightforward, useful advice for clients.
- Hands-on experience running deep learning models on popular ML frameworks such as PyTorch, TensorFlow, and ONNX.
- Experience developing software solutions that run in Linux environments with containers and orchestration.
- Experience with source code and configuration management tools, with Git knowledge required.
- Customer-facing experience translating customer requirements into technical solutions (discovery, scoping, success criteria, and execution plans).
- Proven ability to build and deliver technical demos, proofs-of-concept, and reference applications for ML/GenAI workloads.
- Strong technical writing skills to produce customer-ready documentation and deliver partner training sessions.
- Experience driving issue triage and technical escalations with customers, coordinating across product, hardware, and software engineering teams to resolution.
- Excellent stakeholder management and communication skills, with the ability to present complex technical concepts clearly to both engineering and non-engineering audiences.
- A Bachelor's degree in Engineering, Information Systems, Computer Science, or related field and 6+ years of Software Engineering or related work experience, OR a Master's degree in Engineering, Information Systems, Computer Science, or related field and 5+ years of Software Engineering or related work experience, OR a PhD in Engineering, Information Systems, Computer Science, or related field and 4+ years of Software Engineering or related work experience.
- 3+ years of work experience with programming languages such as C, C++, Java, or Python.
Technical Skills and Expertise
Proficiency in AI model porting and optimization, model conversion workflows, and quantization techniques (INT8 / mixed precision) is expected. Expertise in runtime integration and optimization, machine learning models, and popular frameworks such as PyTorch, TensorFlow, and ONNX is required. Experience with inference pipelines, including vLLM, TGI, and Triton, is necessary for building and deploying scalable solutions. A strong understanding of LLM and GenAI models, deep learning applications, and AI inference hardware constraints, along with system-level performance bottlenecks, is crucial.
- Customer-facing technical engagement, AI model selection, deployment feasibility, system architecture, and performance expectations.
- Model-infrastructure alignment, understanding memory constraints, accelerator architecture, and scaling limitations.
- Performance and scalability engineering, including scaling strategies (horizontal vs vertical) and hardware deployment sizing.
- Workload scalability limits and end-to-end AI pipeline design, including multi-model workflows, data preprocessing, and post-processing stages.
- Video pipeline choices (e.g., FFmpeg vs GStreamer) and integration into inference pipelines, ensuring alignment with performance requirements and real-time constraints.
- Model trade-off analysis and validation, including model simplification and precision vs efficiency trade-offs.
- Deep learning model development, inference systems, and data center or edge AI platforms.
- Model quantization and optimization techniques, and model deployment pipelines.
- Proficiency in C++, Python, software design, debugging, and performance analysis.
- Hands-on experience with Linux-based systems, low-level software, drivers, and system bring-up.
- Understanding of AI inference hardware and system-level performance bottlenecks.
- Experience with hardware accelerators for inference, real-time AI systems, computer vision, and video analytics projects.
- Knowledge of distributed inference methods and large-scale model deployments.
- Proficiency in video processing workflows and relevant software frameworks.
- Experience with Linux environments, containers, orchestration, and source code management tools like Git.
- Customer requirement translation into technical solutions, including discovery, scoping, success criteria, and execution plans.
- Ability to build and deliver technical demos, proofs-of-concept, and reference applications.
- Technical writing skills for documentation and partner training.
- Experience with issue triage, technical escalations, and stakeholder management.
Work Environment and Location
This is a full-time position based in Riyadh, KSA. The role involves customer-facing interactions and may require travel for customer engagements and strategic reviews. Qualcomm is an equal opportunity employer and is committed to providing an accessible process for individuals with disabilities. Employees are expected to abide by all applicable policies and procedures, including security and confidential information requirements.