Site Reliability Engineer

Atlanta, GA
Full Time
Requisition ID: 187060BR

The Job

WarnerMedia seeks a Site Reliability Engineer to lead SRE efforts within our WarnerMedia Technology & Operations (WMTO) organization. The SRE team owns and manages the infrastructure stack for our unified video delivery platform, a core set of products and workflows that power video acquisition, encoding, delivery, and playback across our WarnerMedia brands.

As a hands-on Site Reliability Engineer, you will play a key role in maintaining and improving our highly available, highly scalable video systems infrastructure, using containers, cluster management, cloud services, and performance tools to keep our systems available in 24x7 environments. You will help build platform automation, configuration management, and service administration for cloud and on-prem environments. You will work closely with our product developers to review cloud architecture footprints, set up cloud stack resources, contribute code for infrastructure needs, set up CI/CD pipelines, and develop new tooling. Our tech stack includes AWS, Kubernetes, Docker, Terraform, Postgres, Mongo, Jenkins, and Elasticsearch.

The Site Reliability Engineer will need to be strong in DevOps and SRE practices, combining software and systems engineering to build large-scale distributed fault-tolerant systems. You will need to lead full end-to-end SRE projects that include cloud solution design, environment creation and configuration, deploying and supporting cloud services, and writing strong technical documentation. You will assist with critical incident management and on-call rotations for after-hours support.

The Daily

  • Code, scale, and support the cloud stack behind WarnerMedia live streaming and VOD workflows, with performance and cost efficiency as primary goals
  • Design and implement cloud infrastructure with optimal decisions for availability, reliability, scalability, maintainability, and security
  • Build tooling and services to track application/system health and performance
  • Implement Infrastructure as Code using best practices to standardize stack resource setup across multiple environments
  • Build, improve, and support CI/CD pipelines with developer buy-in and full automation
  • Evaluate emerging cloud technologies for adoption via discovery and proofs of concept
  • Develop operational automation and self-service frameworks for developers and support teams
  • Monitor and troubleshoot applications and cloud infrastructure across all environments
  • Provide tiered on-call support for critical application incidents during off-hours and weekends
  • Write robust technical documentation for systems and processes

The Essentials

  • Bachelor’s degree in Computer Science or equivalent experience
  • Deep understanding of AWS, containers, and Kubernetes (EKS, RKE)
  • Experience in designing, implementing, and supporting container and serverless cloud stacks
  • 2+ years as a site reliability / DevOps engineer for enterprise-scale systems
  • 2+ years of experience in cloud and container designs, architectures, and migrations
  • 2+ years of experience in AWS cloud technologies, with broad exposure to the AWS suite of services, such as S3, EFS, RDS, ECS, EKS, ALB, and Route 53
  • 2+ years of experience in software development lifecycle and application modernization
  • Strong experience with Unix/Linux system administration at scale
  • Experience using source control (git) and CI/CD pipelines
  • Ability to code tooling using Golang, Node.js, Python, shell scripting, or other languages
  • Strong problem-solving and troubleshooting skills for incident remediation
  • Ability to work in a dynamic, fast-paced environment
  • Clear and effective communicator with both technical and non-technical audiences
  • Experience with the full digital video stack is a plus: video encoding (CMAF/DASH/HLS), adaptive bitrate packaging, CDN delivery, DRM solutions, AWS video cloud services (MediaLive, MediaConvert, MediaPackage), and video playback

Warner Media, LLC and its subsidiaries are equal opportunity employers. Qualified candidates will receive consideration for employment without regard to race, color, religion, national origin, gender, sexual orientation, gender identity or expression, age, mental or physical disability, genetic information, marital status, citizenship status, military status, protected veteran status, or any other category protected by law.