Job Overview:
MTVG is seeking a DevOps Engineer to help build, scale, and strengthen our broadcast and live-event technology platform. This role sits at the intersection of engineering, infrastructure, and live operations, focusing on automation, observability, reliability, and secure deployment practices across on-prem and cloud systems that support 24/7 live events.
You'll partner closely with Broadcast Engineering, Technical Operations Center (TOC)/Master Control Room (MCR) Operations, and external vendors to improve system stability, reduce manual work, accelerate change delivery, and ensure our live production workflows are resilient.
Job Summary:
Are you passionate about keeping live events running flawlessly for millions of viewers? MTVG's broadcast platform powers 24/7 live sports, entertainment, and breaking news across a global footprint. We're looking for a DevOps Engineer who thrives at the intersection of cutting-edge cloud infrastructure and real-world broadcast technology—where a single automation can prevent an outage during a championship game, and your monitoring dashboards become the nerve center for live operations teams.
You'll work directly with broadcast engineers and operations teams to transform manual, high-stakes workflows into resilient, automated systems. This isn't just infrastructure—it's the backbone that ensures our customers never miss a moment of the content they love. If you're energized by the challenge of building systems that must work perfectly, every time, under the pressure of live television, this role is for you.
A Day in the Life:
Your morning might start by reviewing overnight alerts from your observability stack—investigating why a backup encoder cluster showed elevated latency during an early morning soccer match in Europe. You'll trace through logs in OpenSearch, correlate metrics in Grafana, and work with the London MCR team to implement a fix before the next event window.
Mid-morning, you're in a design review with the Broadcast Engineering team, discussing a new CI/CD pipeline for deploying configuration changes to our EVS Cerebrum control system. You're advocating for infrastructure-as-code patterns that will let us version-control broadcast workflows the same way we manage cloud resources—a novel approach in the broadcast world that could revolutionize how we operate.
After lunch, you're deep in Python, building a custom integration script for a new watermarking appliance that has minimal API documentation. You fire up Wireshark to reverse-engineer the communication protocol, document your findings, and create a reusable module that the team can leverage across multiple sites.
The afternoon brings an incident postmortem where you facilitate the discussion, capture action items, and immediately start automating the manual checks that could have caught the issue earlier. You're not just responding to problems—you're systematically eliminating entire classes of failures.
As the day winds down, you're mentoring a junior engineer on Terraform best practices while keeping one eye on the monitoring dashboard—because in live broadcasting, the show never stops, and neither does the opportunity to make our systems better.
Key Responsibilities:
Design and implement CI/CD pipelines and deployment patterns for broadcast and media services (on-prem and cloud).
Build and maintain infrastructure-as-code (Terraform/Ansible/CloudFormation or similar).
Improve monitoring/observability (metrics, logs, traces) and alerting to support TOC/MCR incident response.
Drive reliability engineering initiatives: redundancy, failover patterns, capacity planning, and runbooks.
Partner with ops teams to operationalize changes through change management, validation, and rollback plans.
Automate repetitive tasks and workflows (Python/Bash/PowerShell), including health checks, validation, and reporting.
Strengthen systems through security best practices (least privilege, secrets management, patching, vulnerability remediation).
Support incident response and root cause analysis; build postmortem action plans and track completion.
Work with vendors/partners to integrate and maintain reliable handoffs (encoders, distribution, monitoring, ad triggers, watermarking, etc.).
Contribute to documentation: SOPs, architecture diagrams, operational readiness checklists, and escalation paths.
Develop custom drivers and integration scripts for broadcast hardware with limited or missing documentation, utilizing protocol analysis (Wireshark/tcpdump) to reverse-engineer communication requirements.
Participate in an on-call rotation for high-priority events/incidents.
Work some planned evenings/weekends during major live events.
Basic Qualifications:
2+ years in DevOps/SRE/Platform Engineering (or equivalent).
Strong Linux fundamentals.
Networking knowledge (TCP/IP, routing, VLANs, DNS, NAT, firewall concepts).
Experience integrating systems via REST APIs and webhooks.
Scripting proficiency (C#, JavaScript, Python, etc.).
Experience with virtualization platforms (Proxmox, LXC, etc.).
Monitoring/logging experience (Zabbix, Prometheus/Grafana, ELK/OpenSearch, Datadog, Splunk, etc.).
CI/CD experience (GitHub Actions, GitLab CI, Jenkins, etc.).
Infrastructure automation experience (Terraform/Ansible/Puppet/Chef).
Proven ability to support production systems with on-call/incident response expectations.
Preferred Qualifications:
Experience supporting live broadcast / media workflows (encoding, contribution/distribution, SCTE, captioning, watermarking).
Experience with broadcast control systems (EVS Cerebrum preferred).
Cloud experience (AWS preferred): IAM, VPC, EC2, EKS, CloudWatch, S3, Route 53, etc.
Experience building reliability practices: SLOs/SLIs, error budgets, chaos testing, resilience drills.
Familiarity with ITIL or similar change control frameworks and high-availability environments.
Kubernetes/Docker experience.
Location & Schedule:
Location: Denver, CO, USA (Onsite)
Schedule: Standard business hours with participation in on-call rotation and some evenings/weekends during major live events
Salary Range:
$100,000 - $150,000 per year