MLOps Engineer - LLMs

Netherlands - Amsterdam
PDT – Data Science & AI / Permanent / Hybrid
Join our AI team at Prosus, the largest consumer internet company in Europe and one of the biggest tech investors in the world. You'll be part of the team that drives growth and innovation across the company, and your work will directly impact how millions of people shop online.

Who we’re looking for

We're seeking an experienced MLOps Engineer to build and operate the infrastructure that powers our LLM systems at scale. You'll own the deployment pipelines, serving infrastructure, and production APIs that enable our ML teams to ship models with confidence. You have deep expertise in model serving optimization with vLLM, understand how to balance latency, throughput, and cost at scale, and excel at building reliable systems that enable rapid experimentation. You're motivated by seeing ML research make it to production efficiently and thrive in environments where infrastructure quality directly impacts business outcomes.

What you’ll do

ML Pipelines:
    • Build ML pipelines for data ingestion, processing, model deployment, and evaluation
    • Own CI/CD for ML systems, including automated testing, model versioning, and deployment workflows
    • Implement monitoring for model performance, latency, throughput, and costs with budget alerting
    • Set up experiment tracking and model registry systems (MLflow, Weights & Biases, or similar)
    • Define and monitor SLIs/SLOs for production model serving

Infrastructure & Orchestration:
    • Manage Kubernetes and Slurm clusters for GPU workloads with multi-tenant resource allocation
    • Optimize GPU utilization and implement cost controls across training and inference workloads

Model Serving & APIs:
    • Deploy and optimize LLM serving infrastructure using vLLM
    • Apply inference optimizations such as quantization, continuous batching, PagedAttention, and KV cache management to maximize throughput and minimize latency
    • Design and build production-grade async API services (FastAPI, etc.) with pre/post-processing, business logic, and strict latency SLAs
    • Continuously optimize serving costs through model compression, batching strategies, and infrastructure tuning
    • Implement A/B testing infrastructure and canary deployments for safe model rollouts

Enablement & Best Practices:
    • Create templates and documentation to accelerate team productivity
    • Establish MLOps best practices and guide teams in their adoption
    • Support model training experiments when needed

Minimum qualifications

    • 5+ years in MLOps, DevOps, or platform engineering with focus on ML workloads
    • Expert-level experience deploying and optimizing LLM serving infrastructure
    • Strong Python skills with experience building production APIs (FastAPI or similar)
    • Proven experience with cost optimization for GPU-intensive workloads: tracking, budgeting, alerting, and resource efficiency
    • Hands-on experience with Kubernetes and Docker for GPU workloads
    • Experience with job orchestration systems (Slurm, Ray, Argo, Kubeflow, or similar)
    • Solid understanding of monitoring and observability for production ML systems
    • Naturally curious with a track record of proactively identifying and implementing improvements

Preferred qualifications

    • Deep knowledge of GPU architectures and their performance implications for inference optimization
    • Expertise in model compression techniques for production deployment: quantization (INT8, INT4, FP8), pruning, and distillation
    • Understanding of security best practices for ML serving: authentication, authorization, rate limiting, model access controls
    • Experience managing multi-tenant GPU clusters with fair scheduling and resource isolation
    • Proficiency with infrastructure-as-code tools 
    • Experience supporting distributed training infrastructure: multi-node job orchestration, checkpoint management, debugging training failures
    • Contributions to open-source MLOps tools or serving frameworks

What we offer

    • Ownership of critical infrastructure for AI projects that are strategically vital to the company, with direct visibility to senior leadership including the CEO
    • State-of-the-art GPU infrastructure: H200 fleet, vLLM serving stack, cutting-edge optimization tools
    • An expert ML team that has released top Hugging Face models, published at NeurIPS, and built production systems that will run on your infrastructure
    • Significant autonomy in designing MLOps solutions, choosing tools, and shaping infrastructure strategy for LLM serving
    • Modern tooling: Latest MLOps frameworks, coding assistants, best-in-class development environment
    • Hybrid work model based in our Amsterdam office, home to the AI House, which brings together 200+ AI professionals through events and collaborations
    • Competitive compensation, top-spec MacBook Pro, and an environment genuinely built for professional growth and learning

If you're passionate about building scalable, high-performance infrastructure that enables cutting-edge AI deployment and want to see your work impact millions of users globally, let's talk.

Our Diversity & Inclusion Commitment

We respect the dignity and human rights of individuals and communities wherever we operate in the world. Building an inclusive workplace where everyone feels welcome and can thrive is critical for us. We provide access to education, which helps everyone understand the important role they play and the positive impact they can have.

For a deeper look at our journey and future plans, explore our latest Annual Report. Stay up to date with our latest news to see what makes Prosus stand out. Learn more at www.prosus.com.