Senior Site Reliability Engineer (SRE) at Moventi

Remote | Senior | Full time | SysAdmin / DevOps / QA

Applications must be submitted in English.

Moventi is a forward-thinking technology company dedicated to helping organizations embrace innovation through cutting-edge technologies. We provide an environment where multidisciplinary talents collaborate to drive digital transformation and shape the future with technology. Our teams engage with state-of-the-art cloud infrastructure projects and large-scale SaaS platforms designed to scale and maximize reliability and efficiency.

Our current project focuses on designing, building, and operating highly available, scalable infrastructure on managed Kubernetes services such as AWS EKS and Azure AKS. This includes advanced data ingestion pipelines handling hundreds of terabytes to petabytes of data, leveraging technologies like ClickHouse and Kafka. We prioritize observability, security, and automation using modern tools and practices to ensure robust SaaS operations.

Official source: getonbrd.com.

Primary Responsibilities and Role Overview

As a Senior Site Reliability Engineer (SRE), you will be instrumental in architecting, deploying, and managing complex cloud-native infrastructure in large-scale SaaS environments. Your role includes designing and maintaining Kubernetes clusters (AWS EKS, Azure AKS) using Infrastructure as Code tools such as Terraform, ensuring seamless scalability and reliability.

You will be responsible for implementing observability solutions across monitoring, logging, tracing, and metrics collection using Prometheus, Grafana, Datadog, and the ELK ecosystem to maintain robust system health and performance. Managing high-volume data ingestion pipelines built on technologies like ClickHouse and Kafka will be part of your core tasks.

Key duties include collaborating with cross-functional teams to drive GitOps-based CI/CD processes; writing automation scripts in Python, Go (Golang), and Bash to streamline deployments and operations; and enforcing security best practices for encryption, key management, and policy enforcement. In addition, you will design and maintain disaster recovery strategies for critical components such as MySQL, Kafka, and ZooKeeper, ensuring business continuity and data integrity.

This role requires active participation in a large, dynamic team environment where your expertise will elevate operational standards and foster innovation around high-availability SaaS platforms.

Required Experience and Skills

We are looking for a professional with at least 7 years of experience designing, building, and maintaining SaaS environments with a strong focus on large-scale cloud infrastructure. Candidates must have at least 5 years of hands-on experience working with AWS EKS and Azure AKS managed Kubernetes clusters, leveraging Terraform for infrastructure automation.

Key technical expertise includes managing data ingestion in large environments through ClickHouse and Kafka, with data volumes scaling up to petabytes. A solid background in observability tools such as Prometheus, Grafana, Datadog, and ELK for monitoring, logging, and tracing is mandatory.

Expertise in GitOps practices and CI/CD pipelines is essential. We expect proven scripting skills in Python, Go (Golang), and Bash, along with experience using the AWS CLI, to automate operational workflows. Candidates must demonstrate at least 3 years of experience managing security operations, including enforcing infrastructure security policies, managing encryption at rest and in transit, and handling key management.

In-depth knowledge of disaster recovery planning and execution for distributed systems including MySQL, Kafka, and ZooKeeper is required. The ideal candidate should be a proactive and collaborative team player, able to communicate complex technical concepts effectively while taking initiative in a fast-paced, innovative environment. Strong problem-solving skills, risk management, and a passion for continuous improvement are essential to succeed in this role.

Additional Preferred Skills and Experience

Experience with service mesh technologies such as Istio, and with Kubernetes operators, is highly valued for optimizing application connectivity and lifecycle management. Familiarity with large-scale data architectures and with tuning Kafka clusters would be advantageous. Prior exposure to cloud security frameworks and compliance standards, as well as experience leading or mentoring SRE teams within enterprise environments, will be considered a strong plus.

An innovative mindset with eagerness to explore emerging tools and automation techniques is welcome, along with excellent interpersonal skills suitable for a multidisciplinary collaborative culture.

Our Offer and Work Environment

At Moventi, we provide a stimulating and innovative work atmosphere located in the heart of San Isidro, Lima, featuring convenient parking and recreational zones to promote wellbeing during breaks. Our hybrid work model gives you the flexibility to balance time between our fully equipped office and remote work tailored to team and personal preferences.

We prioritize professional growth by exposing you to challenging, diverse projects that foster continuous learning and skills enhancement. Our organizational culture is based on transparency, collaboration, commitment, risk-taking, and innovation. We offer formal employment contracts with full legal benefits from day one, ensuring job security and comprehensive protections.

GETONBRD Job ID: 53520

Fully remote: You can work from anywhere in the world.

Informal dress code: No dress code is enforced.

About Moventi

Technology is dramatically changing the world we live in. Our focus is to help organizations lead the way using technology with an innovation mindset.