Remote Quality Specifications And Metrics Specialist Jobs in Saudi Arabia

Turing

About the Role

Turing, a leading research accelerator for cutting-edge AI labs and a trusted partner for global enterprises, is hiring an AI Quality Analyst (Arabic) for remote work. This contract role requires 0-1 years of experience and focuses on evaluating a new personalization feature for Gemini. The analyst will assess how effectively the AI model uses information from past Gemini conversations, Gmail activity, Google Search, and YouTube activity to provide relevant and helpful responses. The position requires a blend of creativity in designing prompts based on personal experiences and analytical rigor in evaluating the outputs of personalized AI.

Key Tasks and Responsibilities

Design and implement multi-turn conversational prompts (typically 1-5 turns) that require the AI to use personal information and experiences.
Evaluate model responses based on the intent of the initial prompt, verifying that personalization is applied appropriately.
Analyze responses for "Grounding" issues, ensuring that claims related to the user are supported by evidence and are not false inferences or hallucinations.
Assess the quality of integration to ensure personal data is seamlessly incorporated into the response without robotic "over-narration".
Rigorously evaluate and compare two model responses side-by-side (SxS) to determine which is more helpful, usable, and enjoyable.
Write clear and defensible justifications for comparisons, explicitly referencing where issues or positive aspects occurred in the conversation.
Extract and verify "Debug Info" from the model to ensure chat summaries and data sources are used correctly.
Maintain strict data hygiene by deleting evaluation conversations to prevent contamination of future chat logs.

Qualifications and Requirements

Ability to read and write Arabic at a high level of proficiency, as Arabic is the focus language for this project.
Willingness to use a primary personal Google account (not a test account) and enable personal data sources for genuine evaluation.
Full-time availability within the local time zone.
Proven ability to evaluate nuanced and ambiguous AI responses, particularly assessing the quality of personalization.
Experience in designing creative, multi-turn starting prompts based on personal context to thoroughly test model capabilities.
Understanding of personalization concepts, including the ability to identify incorrect personalization, weak inferences, and forced connections.
Meticulous attention to detail, with the ability to review model responses side-by-side (SxS) and identify nuances in naturalness and over-narration.
Superior ability to write clear, concise, and structured justifications for model ratings, explicitly referencing specific turn numbers.
Ability to provide constructive feedback and detailed commentary.
Excellent communication and collaboration skills.
Self-motivated and able to work independently in a remote work environment.
Requires a desktop/laptop setup with a good internet connection.
Bachelor's degree or equivalent experience in a relevant field such as Politics, Law, Ethics, Linguistics, Journalism, Computer Science, or a related analytical field.
Experience in data annotation, AI quality evaluation, content moderation, or a related role is strongly preferred.

Core Skills

Native proficiency in Arabic
Personal account usage
Schedule flexibility
Exceptional analytical thinking
Creative prompt engineering
Strong evaluation capability
Meticulous attention to detail
Excellent writing skills
Feedback provision
Communication and collaboration
Independence
Technical setup (desktop/laptop with good internet connection)

Additional Role Details

This role is a remote contract with Turing, based in Saudi Arabia. It requires a full-time commitment of at least 4 hours per day and 30 hours per week, including 4 hours of overlap with Pacific Standard Time (PST). Commitment options of 30 or 40 hours per week are available. The contract duration is 3 months.

0-1 years

Saudi Arabia

Remote AI Quality Analyst (Arabic)

Turing

About the Role

Turing, a leading research accelerator for cutting-edge AI labs and a trusted partner for global enterprises, is seeking a Remote AI Quality Analyst (Arabic) for a contract position in Saudi Arabia. This role focuses on evaluating a new personalization feature for Gemini, with an emphasis on how effectively the AI uses user data from past conversations, Gmail, Google Search, and YouTube activity to provide relevant and helpful responses. The position requires a unique blend of creativity in prompt engineering and analytical rigor in evaluating AI outputs.

Key Tasks and Responsibilities

Design and implement multi-turn conversational prompts (typically 1-5 turns) that require the AI to use your personal information and experiences.
Evaluate model responses based on your intent from the initial prompt, verifying if personalization has been applied appropriately.
Analyze responses for "Grounding" issues, ensuring that claims about you are supported by evidence and are not false inferences or hallucinations.
Assess the quality of "Integration" to ensure personal data is naturally incorporated into the response without robotic "over-narration".
Accurately rate and rank two model responses side-by-side (SxS) to determine which is more helpful, user-friendly, and generally enjoyable.
Write clear and defensible justifications for your comparisons, explicitly referencing where issues or positive aspects occurred in the conversation.
Extract and verify "Debug Info" from the model to confirm that chat summaries and data sources are being used correctly.
Maintain strict data hygiene by deleting evaluation conversations to prevent them from polluting your future chat history.

Qualifications and Requirements

Ability to read and write Arabic at a high level of proficiency, as Arabic is the focus language for this project.
Willingness to use your primary personal Google account (not a test account) and enable personal data sources for genuine evaluation.
Full-time availability in your local time zone is required, as the team operates globally 24/7.
Proven ability to evaluate nuanced and ambiguous AI responses, particularly with respect to personalization quality.
Experience in designing creative, multi-turn starting prompts based on personal context to thoroughly test model capabilities.
Understanding of personalization concepts, including the ability to identify incorrect personalization, weak inferences, and forced connections.
Meticulous attention to detail, with the ability to review model responses side-by-side (SxS) and detect nuances in naturalness and over-narration.
Superior ability to write clear, concise, and structured justifications for model ratings, explicitly referencing specific turn numbers.
Ability to provide constructive feedback and detailed commentary.
Excellent communication and collaboration skills.
Self-motivated and able to work independently in a remote work environment.
Desktop/laptop setup with a good internet connection is required.
Bachelor's degree or equivalent experience in a relevant field such as Politics, Law, Ethics, Linguistics, Journalism, Computer Science, or a related analytical field.
Experience in data annotation, AI quality evaluation, content moderation, or a related role is highly preferred.

Required Skills

Native proficiency in Arabic
Personal account usage
Schedule flexibility
Exceptional analytical thinking
Creative prompt engineering
Strong evaluation capability
Meticulous attention to detail
Excellent written communication
Feedback provision
Communication and collaboration
Independence
Technical setup

Contract and Commitment Details

This is a full-time contract position requiring a commitment of at least 30 hours per week. Options for 30 or 40 hours of work per week are available. The contract duration is 3 months. The role requires 4 hours of overlap with Pacific Standard Time (PST).

0-1 years

Saudi Arabia

Remote AI Quality Analyst (Arabic)

Turing

About the Role

Turing, a leading research accelerator for advanced AI labs and a trusted partner for global enterprises, is seeking a **Remote AI Quality Analyst** proficient in Arabic. This contract role, requiring a commitment of at least 30 hours per week, focuses on evaluating a new personalization feature for Gemini to ensure the AI model makes effective use of user data from past conversations, Gmail, Google Search, and YouTube activity to provide more relevant and helpful responses. The role demands a unique blend of creativity in prompt engineering and analytical rigor in evaluating AI outputs.

Key Tasks and Responsibilities

Design and execute multi-turn conversational prompts (typically 1-5 turns) that require the AI to use your personal information and experiences.
Evaluate the model's responses based on your intent from the initial prompt, verifying if personalization has been applied appropriately.
Analyze responses for "Grounding" issues, ensuring claims about you are supported by evidence and not false inferences or hallucinations.
Assess the quality of "Integration" to ensure personal data is seamlessly incorporated into the response without robotic "over-narration."
Accurately rate and compare two model responses side-by-side (SxS) to determine which is more helpful, usable, and enjoyable.
Write clear and defensible justifications for your comparisons, explicitly referencing where issues or positive aspects occurred in the conversation.
Extract and verify "Debug Info" from the model to confirm correct usage of chat summaries and data sources.
Maintain strict data hygiene by deleting evaluation conversations to prevent contamination of your future chat history.

Qualifications and Requirements

High proficiency in reading and writing Arabic, as Arabic is the focus language for this project.
Willingness to use your primary personal Google account (not a test account) and enable personal data sources for genuine evaluation.
Full-time availability within your local time zone is required, as we are building a 24-hour global operations team.
Demonstrated exceptional analytical thinking and the ability to evaluate nuanced and rich AI responses, with a particular focus on personalization quality.
Experience in designing creative multi-turn starter prompts based on personal context to thoroughly test model capabilities.
Strong evaluation skills, including the ability to identify incorrect personalization, weak inferences, and forced connections.
Meticulous attention to detail, with the ability to review model responses side-by-side (SxS) and detect subtle differences in naturalness and over-narration.
Superior ability to write clear, concise, and structured justifications for model rankings, explicitly referencing specific turn numbers.
Ability to provide constructive feedback and detailed commentary.
Excellent communication and collaboration skills.
Self-motivated and able to work independently in a remote environment.
A desktop/laptop computer with a good internet connection.
Bachelor's degree or equivalent experience in a relevant field such as Politics, Law, Ethics, Linguistics, Journalism, Computer Science, or a related analytical field.
Experience in data annotation, AI quality evaluation, content moderation, or a related role is highly preferred.

Core Skills

Arabic Language Proficiency
Personal Account Usage
Schedule Flexibility
Analytical Thinking
Creative Prompt Engineering
Evaluation Capability
Attention to Detail
Written Communication
Feedback Provision
Communication and Collaboration
Self-Reliance and Motivation
Technical Setup (Desktop/Laptop with good internet)

Work Details

This is a **full-time contract** role, requiring a commitment of no less than 30 hours per week, with the option to commit to either 30 or 40 hours per week. The work must include 4 hours of overlap with Pacific Standard Time (PST). The engagement duration is 3 months. Work is remote, with a focus on the Saudi Arabia region.

0-1 years

Saudi Arabia

Remote AI Quality Analyst (Arabic)

Turing

About the Role

Turing, a leading research accelerator for cutting-edge AI labs and a trusted partner for global enterprises deploying advanced AI systems, is seeking a Remote AI Quality Analyst fluent in Arabic. This contract role, focused on Saudi Arabia, is vital for evaluating a new personalization feature for Gemini. The incumbent will assess how effectively the AI model leverages user data from past conversations, Gmail, Google Search, and YouTube activity to provide more relevant and helpful responses. The role demands a unique blend of creativity in prompt engineering and analytical rigor in evaluating AI outputs.

Role Responsibilities

Design and execute multi-turn conversational prompts, typically ranging from 1 to 5 turns, that require the AI to use personal information and experiences.
Evaluate model responses against the initial prompt's intent, verifying appropriate application of personalization.
Analyze responses for grounding issues, ensuring claims about the user are supported by evidence and not based on false inferences or hallucinations.
Assess the quality of integration to ensure personal data is seamlessly incorporated into the response without sounding robotic or overly narrative.
Rigorously evaluate and rank two model responses side-by-side (SxS) to determine which is more helpful, user-friendly, and enjoyable overall.
Write clear and defensible justifications for comparisons, explicitly referencing specific turns where conversational issues or positive aspects occurred.
Extract and verify "Debug Info" from the model to confirm correct usage of chat summaries and data sources.
Maintain strict data hygiene by deleting evaluation conversations to prevent contamination of future chat logs.

Qualifications and Requirements

Ability to read and write Arabic at a high level of proficiency, as Arabic is the language of focus for this project.
Willingness to use a primary personal Google account (not a test account) and enable personal data sources for genuine evaluation.
Full-time availability in the local time zone is required, to contribute to a 24-hour global operations team.
Proven ability to evaluate nuanced and subjective AI responses, with a particular focus on personalization quality.
Experience in designing creative, multi-turn starter prompts based on personal context to thoroughly test model capabilities.
Understanding of personalization concepts, including the ability to identify incorrect personalization, weak inferences, and forced connections.
Meticulous attention to detail, with the ability to review model responses side-by-side (SxS) and identify nuances in naturalness and over-narration.
Superior ability to write clear, concise, and structured justifications for model rankings, explicitly referencing specific turn numbers.
Ability to provide constructive feedback and detailed commentary.
Excellent communication and collaboration skills.
Self-motivated and able to work independently in a remote work environment.
A desktop/laptop setup with a good internet connection must be available.
Bachelor's degree or equivalent experience in a relevant field such as Politics, Law, Ethics, Linguistics, Journalism, Computer Science, or a related analytical field.
Experience in data annotation, AI quality evaluation, content moderation, or a related role is highly preferred.

Core Skills

Arabic Fluency
Personal Account Usage
Schedule Flexibility
Exceptional Analytical Thinking
Creative Prompt Engineering
Strong Evaluation Capability
Meticulous Attention to Detail
Excellent Writing Skills
Feedback Provision
Communication and Collaboration
Independence and Self-Motivation
Technical Setup Readiness

Additional Details

This is a contract role with Turing. It requires a minimum commitment of 4 hours per day and 30 hours per week, with 4 hours of overlap with Pacific Standard Time (PST). Two commitment options are available: 30 hours/week or 40 hours/week. The engagement duration is 3 months. Upon application, candidates will receive an email with a login link to access the portal and complete their profile.

0-1 years

Saudi Arabia

Bilingual Community Manager

DataAnnotation

94 / Hour

Full-time

About the Role

DataAnnotation is seeking a skilled Bilingual Community Manager to join its remote team. This position is integral to the advancement of AI models, with a specific focus on training and enhancing AI chatbots. The role involves evaluating model progress, logic, and overall quality through expert linguistic analysis and problem-solving.

Role Context and Objectives

As a Bilingual Community Manager, you will engage in conversations with AI chatbots in both Arabic and English. The primary objective is to measure their development, assess performance, and identify areas for improvement. This includes evaluating existing interactions and creating new conversational scenarios to guide the AI's understanding and responses, requiring an expert-level grasp of linguistics.

Key Responsibilities

Engage AI chatbots in diverse and complex conversations in both Arabic and English to assess their capabilities.
Evaluate AI model outputs for correctness, logical consistency, and overall performance.
Develop and write novel conversational content to guide and improve the AI's understanding and responses.
Identify and solve problems that hinder the quality and effectiveness of AI models.
Measure and report on the progress of AI chatbot development.

Qualifications and Requirements

Fluency in both Arabic and English is essential.
Demonstrated expertise in writing and grammar skills in both languages.
A detail-oriented approach to tasks and analysis.
Expert-level knowledge of linguistics is required.
A Bachelor's degree (in progress or completed) is preferred but not mandatory.

Work Arrangement and Compensation

This is a full-time, remote independent contract position. The role offers flexibility in choosing projects and setting your own schedule. Compensation is competitive, starting at $25 USD per hour, with potential for bonuses based on high-quality and high-volume work. Payments will be processed via PayPal. Applicants must reside in Saudi Arabia to be considered for this role.

5-10 years

Saudi Arabia

5 days ago

AI-native QA Engineer

Jobgether

Full-time

About the Role

This position is listed on behalf of a partner company. The partner company is seeking an AI-native QA Engineer to be based in Saudi Arabia. This role is designed for a QA professional who will leverage AI to redefine how quality assurance is performed and scaled. The position operates within a fast-moving startup environment focused on experimentation, automation, and speed. The core of this role involves building intelligent QA systems powered by Large Language Models (LLMs) to accelerate test creation, execution, and defect analysis. The engineer will help improve product reliability across complex AI-driven workflows and agent-based systems, collaborating with engineering, product, and R&D teams to ensure releases are robust, validated, and production-ready. This is a high-impact role in which QA is integrated as a core component of the AI development lifecycle.

In this role, you will design and evolve AI-powered QA systems to improve automation coverage, enhance defect detection, and strengthen release confidence for AI-driven products.

Key Responsibilities

Build and maintain AI-powered QA frameworks that automate test design, test execution, and validation workflows using LLM-assisted tools and modern automation stacks.
Develop automated test suites for functional, regression, integration, and performance testing across complex agent-based systems.
Utilize LLMs (such as GPT, Claude, and similar models) to accelerate test generation, bug analysis, and workflow validation.
Analyze defects and production incidents, identify root causes, and implement preventive testing strategies to reduce regressions.
Integrate automated testing pipelines into CI/CD workflows to ensure continuous quality validation.
Collaborate with engineering and product teams to define test strategies and ensure robust release validation.
Improve QA processes, expand test coverage, and enhance automation efficiency across evolving systems.

Required Qualifications

3+ years of QA experience, including at least 2 years in test automation roles.
Proven hands-on experience using LLMs (GPT, Claude, Llama, or similar) in QA workflows and automation pipelines.
Strong programming skills in Python, with expertise in PyTest and test automation frameworks.
Solid understanding of REST APIs, client-server architecture, and web service testing.
Experience working with SQL databases such as PostgreSQL or ClickHouse.
Familiarity with CI/CD pipelines and automated testing integration.
Strong analytical thinking and the ability to work independently on complex systems and logic.
Ability to align work with EST time zone requirements.

Key Skills

AI-native QA Engineering
Artificial Intelligence (AI)
Large Language Models (LLMs)
Test Automation
Automation Coverage
Defect Detection
Release Confidence
AI-Powered QA Frameworks
LLM-Assisted Tools
Modern Automation Stacks
Automated Test Suites
Functional Testing
Regression Testing
Integration Testing
Performance Testing
Agent-Based Systems
Test Generation
Bug Analysis
Workflow Validation
Defect Analysis
Root Cause Analysis
Preventive Testing Strategies
CI/CD Pipelines
Continuous Quality Validation
Test Strategies
Release Validation
QA Processes
Test Coverage
Automation Efficiency
Python Programming
PyTest
Test Automation Frameworks
REST APIs
Client-Server Architecture
Web Service Testing
SQL Databases
PostgreSQL
ClickHouse
Automated Testing Integration
Analytical Thinking
Complex Systems and Logic
AI Tools

Work Environment and Details

This is a full-time position based in Saudi Arabia. The role is part of a fast-paced startup culture that emphasizes strong engineering collaboration. The company offers a competitive compensation package aligned with experience, a remote-first work environment, and 20 paid time-off days plus holidays. Employees get access to a modern AI-driven technology stack, along with a high degree of autonomy and the freedom to experiment with new tools and approaches. Support for professional development, including learning reimbursement, is also provided.

Jobgether utilizes an AI-powered matching process for efficient and objective candidate review. Shortlisted candidates are shared with the hiring company for final decisions. By applying, you acknowledge that Jobgether will process your personal data for evaluation and sharing with the hiring employer, based on legitimate interest and pre-contractual measures. You may exercise your data rights at any time. AI tools may support parts of the hiring process, assisting the recruitment team.

2-5 years

Saudi Arabia