Observability Engineer
Inclusion without Exception
Tata Consultancy Services (TCS) is an equal opportunity employer, and embraces diversity in race, nationality, ethnicity, gender, age, physical ability, neurodiversity, and sexual orientation, to create a workforce that reflects the societies we operate in. Our continued commitment to Culture and Diversity is reflected in our people stories across our workforce and implemented through equitable workplace policies and processes.
Tata Consultancy Services (BSE: 532540, NSE: TCS) is the technology partner of choice for industry-leading organizations worldwide. Since its inception in 1968, TCS has upheld the highest standards of innovation, engineering excellence and customer service. It has set an aspiration to become the world's largest AI-led technology services company and is enabling its clients to transform themselves across the full AI stack, from infrastructure to intelligence.
Rooted in the heritage of the Tata Group, TCS is focused on creating long-term value for its clients, its investors, its employees, and the community at large. With a highly skilled workforce spread across 56 countries and 194 service delivery centers across the world, the company has been recognized as a top employer on six continents. With the ability to rapidly apply and scale new technologies, the company has built long-term partnerships with its clients. Many of these relationships have endured for decades and navigated every technology cycle, from mainframes in the 1970s to artificial intelligence today.
ABOUT THE ROLE
We are seeking an experienced Observability Engineer to join our Enterprise Kubernetes Platform team at a leading financial services organization. You’ll own the complete observability stack across 50+ production Kubernetes clusters, providing metrics, logging, tracing, and alerting capabilities that ensure exceptional reliability and performance for mission-critical applications. This role combines deep technical expertise in modern observability tools with emerging AI/ML capabilities to build intelligent monitoring solutions, predictive alerting, and self-healing infrastructure.
WHAT YOU'LL DO
• Design, deploy, and maintain enterprise-scale observability infrastructure, including Prometheus, Grafana, Thanos, Loki, and modern collection agents
• Manage observability deployments using GitOps principles and infrastructure as code
• Implement long-term metrics storage solutions backed by cloud object storage
• Maintain and upgrade observability components across development, QA, UAT, production, and DR environments
• Configure a distributed observability architecture spanning multiple datacenters and cloud providers
METRICS & MONITORING
• Design and implement Prometheus monitoring strategies for Kubernetes infrastructure and containerized applications
• Create ServiceMonitors and PodMonitors for automated metrics collection
• Develop rules for intelligent alerting with minimal false positives
• Configure multi-cluster metrics federation and aggregation
• Optimize metrics cardinality, storage efficiency, and query performance
Salary Range: CA$100,000 to CA$120,000 per year
TCS does not use artificial intelligence tools for candidate screening or evaluation. This post is for a current vacancy. The hiring process includes an initial screening, followed by a technical evaluation and managerial discussion.
Tata Consultancy Services Canada Inc. is committed to meeting the accessibility needs of all individuals in accordance with the Accessibility for Ontarians with Disabilities Act (AODA) and the Ontario Human Rights Code (OHRC). Should you require accommodation during the recruitment and selection process, please inform Human Resources.
Thank you for your interest in TCS. Candidates who meet the qualifications for this position will be contacted within two weeks. We invite you to continue to apply for other opportunities that match your profile.