DevSecOps / Site Reliability Engineer

in Fnrco

20 days ago

Contract Type: Full-time
Workplace Type: On-site
Location: Dammam

Job Description

About the Role

FNRCO is seeking a skilled DevSecOps / Site Reliability Engineer to join our team in Dammam, Eastern Province, Saudi Arabia. This full-time position is focused on the design, operation, and security of regulated cloud environments, primarily on AWS and GCP. The role is central to our Sovereign Cloud strategy, requiring a combination of cloud operations, security engineering, and SRE best practices to ensure platform reliability, security, and compliance.

The ideal candidate will be instrumental in embedding security-by-design principles and automation-first practices across our sovereign cloud platforms. This includes managing Kubernetes, implementing Infrastructure as Code, automating CI/CD pipelines, and ensuring robust secrets management. A strong understanding of cloud security, networking fundamentals, and compliance-driven environments is essential, as is the ability to lead cross-functional incident response efforts.

Key Responsibilities

Implement and maintain the organization's Sovereign Cloud strategy on AWS and GCP, ensuring compliance with regulatory and data residency requirements.
Manage and operate Kubernetes clusters, including upgrades, scaling, and workload optimization.
Investigate, analyze, and remediate cloud security incidents, proactively identifying and mitigating vulnerabilities within AWS environments.
Support and enhance the organization's AWS cloud strategy by embedding security best practices and continuously improving the cloud security posture.
Implement and tune security tools, automate policy-driven responses, and advocate DevSecOps practices to ensure secure-by-design cloud operations.
Implement and manage Infrastructure as Code (Terraform) to provision, modify, and secure cloud resources.
Maintain and optimize CI/CD pipelines using Git/GitLab, ensuring secure and automated deployments.
Manage secrets and secure access using HashiCorp Vault, including token lifecycle, access policies, and secrets rotation.
Troubleshoot complex infrastructure, networking, container, and performance issues across distributed systems.
Monitor system health and performance using Datadog, define and manage SLIs/SLOs, and drive continuous reliability improvements aligned with SRE principles.
Manage 24x7 alerting and incident response through PagerDuty, perform root cause analysis (RCA), and actively contribute to incident, problem, and change management processes.
Conduct proactive system hardening, vulnerability remediation, performance tuning, and capacity planning across cloud environments.
Develop automation using Python/Bash/Terraform/Ansible to reduce manual effort, improve operational efficiency, and strengthen platform resilience.

Qualifications and Requirements

4-5 years of hands-on experience in DevSecOps and Site Reliability Engineering (SRE) roles.
Proven hands-on expertise managing Kubernetes clusters.
Experience with Terraform for Infrastructure as Code (IaC) deployment.
Experience with Git/GitLab for CI/CD and version control best practices.
Experience working with tools such as HashiCorp Vault, Datadog, PagerDuty, and Confluence.
Strong understanding of Cloud Security principles, including IAM, encryption, network security, container security, and vulnerability management.
Experience in incident management, change management, and root cause analysis processes.
Strong understanding of SRE principles (SLIs, SLOs, error budgets, reliability metrics).
Solid grasp of networking fundamentals, including TCP/IP, DNS, load balancing, firewalls, VPNs, and private endpoints.
Experience or exposure to regulated and compliance-driven environments.
Ability to lead incident bridges during P1/P2 outages, coordinating cross-functional teams and driving timely resolution with clear RCA.
A strong customer- and business-focused mindset when prioritizing tasks.

Required Skills

DevSecOps
Sovereign Cloud
Cloud Operations
Security Engineering
SRE Practices
Kubernetes Management
Infrastructure as Code (IaC)
Terraform
CI/CD Automation
Git
GitLab
Secrets Management
HashiCorp Vault
Security Incident Response
Platform Reliability
Observability
Datadog
24x7 Incident Management
PagerDuty
Cloud Security
Networking Fundamentals
Compliance-driven Environments
Cross-functional Incident Leadership
AWS
GCP
Python
Bash
Ansible
IAM
Encryption
Network Security
Container Security
Vulnerability Management
Incident Management
Change Management
Root Cause Analysis (RCA)
SLIs
SLOs
Error Budgets
Reliability Metrics
TCP/IP
DNS
Load Balancing
Firewalls
VPNs
Private Endpoints
Communication
Problem-Solving

Work Environment and Additional Information

This is a full-time position based in Dammam, Eastern Province, Saudi Arabia. Cloud certifications are a plus. Interested candidates are encouraged to submit their CV via the following link to receive information on upcoming job vacancies: https://*********.