The next challenge in your professional career

Posted 29 days ago

Senior

Staff Database Reliability Engineer

$150,000 MXN/month (gross)

*The specific salary depends on the selection process

4-6 years of experience

Hybrid

Mexico

Description

About you 

You’re an analytical problem-solver ready to put your skills toward purposeful work that has a global impact. You want to lead the way in innovation, exploring the latest technologies and finding new solutions. You thrive in a collaborative environment and are eager to work with and learn alongside the best in Product, Design, and Engineering.

About this role:

As part of Udemy's Platform team, the Datastore Infrastructure (DSI) team is responsible for overseeing all aspects of databases (MySQL, Aurora, DynamoDB), message queues (RabbitMQ), streaming (Kafka), and caching (Redis, Memcached) in our infrastructure. This includes ensuring uptime, security and compliance, observability, and performance, improving developer productivity, and developing strategies for future growth. The team is split between the EU and US regions. You will play a vital role in overseeing the day-to-day activities and engineering strategy of DSI, ensuring that millions of students worldwide achieve better learning and career outcomes on Udemy. We value teamwork, a good sense of humor, strong ownership, technological curiosity, and a desire to learn.

To be successful in this role, you will collaborate closely with engineering, product, and a diverse set of stakeholders around the world. You are interested not just in maintaining systems but also in writing the software that maintains them. You strongly believe in a blameless culture and advocate for humane on-call practices. You constantly seek opportunities for improvement and thrive in an environment where you can drive positive change.


What you'll be doing: 

  • Lead improvement projects for our datastores and platform teams that align with the company’s long-term objectives.

  • Maintain infrastructure uptime, monitor performance, and ensure the infrastructure continues to scale as we grow.

  • Ensure adherence to PCI, ISO 27001, and SOC 2 security and compliance requirements, modifying CI/CD processes when necessary and upholding policies and standards.

  • Advocate for and implement positive changes in tools and processes through healthy discussions.

  • Participate in the on-call rotation, demonstrating a systematic approach to incident management.

  • Participate in day-to-day activities, support requests, and project-related tasks for the team.

  • Contribute to documentation, maintain ticketing queues, provide project support, troubleshoot issues, and offer after-hours assistance as required.

  • Provide coaching and mentorship to new hires, fostering their technical growth and integration into the team. Maintain close communication with team members throughout their tenure.




Requirements

What you’ll have:

  • 3-5 years of professional experience on an SRE/DBRE team with infrastructure responsibilities for managing large production workloads.

  • Proficiency in managing MySQL at scale (horizontal scaling, sharding, InnoDB optimization, query optimization, HA/DR, monitoring, backup strategy, security, automation).

  • Strong understanding of running production workloads on Kubernetes.

  • Proficiency with tools such as Terraform, Ansible, and Git, and with Infrastructure as Code and automated provisioning.

  • Strong experience in Kafka cluster management, topic configuration, performance tuning, and ensuring high availability and fault tolerance. Experience with Amazon MSK is a plus.

  • Experience with message queues (MQ/SQS) and caching (Redis, Memcached) or similar products.

  • Experience in Python.

  • Knowledge of configuration management tools, monitoring systems for database infrastructure (Datadog or similar), and scaling strategies for handling increased data volumes.

  • Strong troubleshooting skills to diagnose complex database issues.

  • Hands-on experience with AWS cloud infrastructure and a grasp of security best practices.

  • Adaptability and comfort working in a fast-paced, hands-on environment.


Nice to have:

  • Experience with additional programming languages (Golang, Kotlin, Java).

  • Experience implementing CDC (change data capture) pipelines for reliable data replication and synchronization.

  • Experience running MySQL on Kubernetes with the Vitess Operator.

  • Experience writing Helm charts for Kubernetes.

  • Experience with tools such as Argo CD and Argo Workflows, or similar alternatives.

  • Knowledge of security standards, vulnerability patching, TLS/SSL, and related topics.

  • Any additional experience or familiarity with related technologies would be advantageous.



Skills

MySQL

Python

Kubernetes
