MLOps Engineer - LLMs

Netherlands - Amsterdam
PDT – Data Science & AI / Permanent / Hybrid
Join our AI team at Prosus, the largest consumer internet company in Europe and one of the biggest tech investors in the world. You'll be part of the team that drives growth and innovation across the company, and your work will directly impact how millions of people shop online.

Who we’re looking for

We're seeking an experienced MLOps Engineer to build and operate the infrastructure that powers our LLM systems at scale. You'll own the deployment pipelines, serving infrastructure, and production APIs that enable our ML teams to ship models with confidence. You have deep expertise in model serving optimization with vLLM, understand how to balance latency, throughput, and cost at scale, and excel at building reliable systems that enable rapid experimentation. You're motivated by seeing ML research make it to production efficiently and thrive in environments where infrastructure quality directly impacts business outcomes.

What you’ll do

ML Pipelines:
    • Build ML pipelines for data ingestion, processing, model deployment, and evaluation
    • Own CI/CD for ML systems, including automated testing, model versioning, and deployment workflows
    • Implement monitoring for model performance, latency, throughput, and costs with budget alerting
    • Set up experiment tracking and model registry systems (MLflow, Weights & Biases, or similar)
    • Define and monitor SLIs/SLOs for production model serving

Infrastructure & Orchestration:
    • Manage Kubernetes and Slurm clusters for GPU workloads with multi-tenant resource allocation
    • Optimize GPU utilization and implement cost controls across training and inference workloads

Model Serving & APIs:
    • Deploy and optimize LLM serving infrastructure using vLLM
    • Apply inference optimizations such as quantization, continuous batching, PagedAttention, and KV cache management to maximize throughput and minimize latency
    • Design and build production-grade async API services (FastAPI, etc.) with pre/post-processing, business logic, and strict latency SLAs
    • Continuously optimize serving costs through model compression, batching strategies, and infrastructure tuning
    • Implement A/B testing infrastructure and canary deployments for safe model rollouts

Enablement & Best Practices:
    • Create templates and documentation to accelerate team productivity
    • Establish MLOps best practices and guide teams in their adoption
    • Support model training experiments when needed

Minimum qualifications

    • 5+ years in MLOps, DevOps, or platform engineering with focus on ML workloads
    • Expert-level experience deploying and optimizing LLM serving infrastructure
    • Strong Python skills with experience building production APIs (FastAPI or similar)
    • Proven experience with cost optimization for GPU-intensive workloads: tracking, budgeting, alerting, and resource efficiency
    • Hands-on experience with Kubernetes and Docker for GPU workloads
    • Experience with job orchestration systems (Slurm, Ray, Argo, Kubeflow, or similar)
    • Solid understanding of monitoring and observability for production ML systems
    • Naturally curious with a track record of proactively identifying and implementing improvements

Preferred qualifications

    • Deep knowledge of GPU architectures and their performance implications for inference optimization
    • Expertise in model compression techniques for production deployment: quantization (INT8, INT4, FP8), pruning, and distillation
    • Understanding of security best practices for ML serving: authentication, authorization, rate limiting, model access controls
    • Experience managing multi-tenant GPU clusters with fair scheduling and resource isolation
    • Proficiency with infrastructure-as-code tools 
    • Experience supporting distributed training infrastructure: multi-node job orchestration, checkpoint management, debugging training failures
    • Contributions to open-source MLOps tools or serving frameworks

What we offer

    • Ownership of critical infrastructure for AI projects that are strategically vital to the company, with direct visibility to senior leadership including the CEO
    • State-of-the-art GPU infrastructure: H200 fleet, vLLM serving stack, cutting-edge optimization tools
    • An expert ML team that has released top Hugging Face models, published at NeurIPS, and built production systems that will run on your infrastructure
    • Significant autonomy in designing MLOps solutions, choosing tools, and shaping infrastructure strategy for LLM serving
    • Modern tooling: Latest MLOps frameworks, coding assistants, best-in-class development environment
    • Hybrid work model based in our Amsterdam office, home to the AI House, which brings together 200+ AI professionals through events and collaborations
    • Competitive compensation, top-spec MacBook Pro, and an environment genuinely built for professional growth and learning

If you're passionate about building scalable, high-performance infrastructure that enables cutting-edge AI deployment and want to see your work impact millions of users globally, let's talk.

Our Diversity & Inclusion Commitment

We respect the dignity and human rights of individuals and communities wherever we operate in the world. Building an inclusive workplace where everyone feels welcome and can thrive is critical for us. We provide access to education, which helps everyone understand the important role they play and the positive impact they can have.

For a deeper look at our journey and future plans, explore our latest Annual Report. Stay up to date with our latest news to see what makes Prosus stand out. Learn more at www.prosus.com.