Senior MLOps Engineer

Deep Genomics
about 2 months ago
Toronto, Ontario
$175,000 - $200,000 per year
Senior Level
Full-Time

About the role

  • You will own and evolve the infrastructure that powers our ML pipelines – from cloud environments and CI/CD systems to workflow orchestration and model deployment
  • You will work closely with ML scientists, bioinformaticians, and software engineers to keep our platform reliable, reproducible, and scalable
  • You’ll maintain and improve cloud infrastructure (GCP) using Infrastructure-as-Code tools (Terraform)
  • Manage IAM, RBAC, and permission policies across cloud environments
  • Own and evolve CI/CD pipelines (CircleCI, GitHub Actions) and ensure best practices are followed across the engineering and ML teams
  • Administer and support workflow orchestration platforms (e.g., Seqera/Nextflow, Argo, Kubeflow)
  • Operate and configure ML experiment tracking and registry tooling (e.g., W&B, MLflow)
  • Build and maintain containerized environments (Docker) and manage Kubernetes clusters
  • Manage GPU resources – provisioning, scheduling, and debugging hardware and driver issues
  • Write and maintain Python tooling, scripts, and integrations that support ML infrastructure
  • Help deploy ML models to production environments and monitor their performance
  • If this sounds like you, we would love to hear from you
  • You have 4+ years of experience in production infrastructure or MLOps, you write solid Python, and you are curious about the ML and scientific workflows your work supports
  • You are someone who enjoys keeping the infrastructure running smoothly so that scientists can focus on their research
  • Above all, you are a collaborative, kind team member who communicates clearly, adapts to evolving needs, and is happy to help colleagues grow their own infrastructure skills along the way
  • You are comfortable working across cloud platforms, CI/CD systems, containers, and GPUs – and you take pride in making these systems reliable and easy for others to use
  • Extensive hands-on experience with Kubernetes and containerization (Docker)
  • Familiarity with Python package and environment management (e.g., pip, conda, pixi)
  • Strong Python programming skills
  • Experience managing GPU compute (provisioning, debugging, driver management)
  • 4+ years of experience operating production infrastructure
  • Proficiency with cloud platforms (GCP preferred; AWS/Azure acceptable) and Infrastructure-as-Code (Terraform)
  • Self-motivated problem solver with excellent communication skills
  • Solid background in CI/CD systems (CircleCI, GitHub Actions, or similar)
  • Understanding of ML frameworks (e.g., PyTorch, PyTorch Lightning), ML workflows (training, inference, evaluation), and the model lifecycle
  • Familiarity with Kubernetes CRDs and batch/gang schedulers (e.g., Volcano, Kueue)
  • Experience working with large-scale datasets (storage, versioning, efficient access patterns)
  • Experience working directly with scientists and researchers in an interdisciplinary setting
  • Knowledge of biology and/or machine learning science
  • Familiarity with data compliance and governance frameworks (e.g., HIPAA, SOC 2)
  • Previous startup experience
  • Familiarity with MLOps tooling (e.g., W&B, Ray, Vertex AI) and distributed compute patterns (e.g., DDP, real-time/batch inference, multi-node training)

About Deep Genomics

Biotechnology Research