Senior DevOps Engineer – ML Platform

Jerusalem, Israel
R&D – Software / Full time / Hybrid

The goal of AI Engineering's ML Platform team is to deliver modern infrastructure and solutions that enhance Mobileye's algorithm development life cycle and shorten our delivery times. We are an independent group of excellent, experienced engineers with diverse skills in algorithms, software, and infrastructure. We strive to implement a DevOps culture that allows our engineers to collaborate easily on large-scale products. We develop cross-company products that enable the research and deployment of state-of-the-art algorithms.

What will your job look like?

    • Build and maintain infrastructure for large‑scale AI and HPC workloads across on‑prem and cloud environments
    • Operate and enhance our multi‑cloud, multi‑cluster scheduling platform
    • Develop automation, tooling, and platform services, including in Bash
    • Troubleshoot complex issues across the stack: compute, networking, storage, orchestration, and distributed systems
    • Improve reliability of critical systems
    • Collaborate with ML, data, and backend teams to support evolving platform needs
    • Drive best practices in CI/CD, infrastructure-as-code, and system design
    • Participate in on‑call rotations for critical infrastructure components

All you need is:

    • 10+ years of hands‑on experience in DevOps, SRE, systems engineering, or similar roles
    • Linux knowledge, including debugging, performance tuning, and system internals
    • Proven experience working with HPC environments, large clusters, or high‑performance compute systems
    • Solid experience with Kubernetes (EKS or similar managed K8s services)
    • Knowledge of infrastructure‑as‑code tools (Terraform, Helm, etc.)
    • Hands‑on experience with:
        • PostgreSQL or similar relational databases
        • Elasticsearch or similar search/indexing systems
        • Prometheus/Thanos/Grafana or similar observability stacks
        • RabbitMQ or similar messaging systems
    • Strong proficiency in Bash, networking fundamentals, and debugging distributed systems
    • Experience investigating complex issues across compute, storage, networking, and orchestration layers
    • Advantages:
        • Experience with multi‑cloud architectures
        • Experience with workflow orchestration tools such as Argo Workflows (or similar systems like Airflow, Prefect, Flyte)
        • Familiarity with GPU scheduling, AI/ML pipelines, or data‑intensive workloads
        • Background in large‑scale distributed systems or platform engineering
        • Ability to write production‑quality Go (Golang) code

What We Offer:

    • Impactful engineering that advances Mobileye’s AI capabilities and strengthens the safety of transportation systems globally
    • The opportunity to work on cutting‑edge AI infrastructure at massive scale
    • A highly technical environment with deep engineering challenges
    • Collaboration with great ML, software, and systems engineers

Mobileye changes the way we drive, from preventing accidents to semi- and fully autonomous vehicles. If you are an excellent, bright, hands-on person with a passion for making a difference, come lead the revolution!
We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.