Senior ML Data Engineer

Jerusalem, Israel

R&D – Software / Full time / Hybrid

The AI Engineering group builds modern infrastructure and solutions that improve how algorithms are developed at Mobileye.
We are a small, independent team of experienced engineers with a mix of skills in algorithms, software, and infrastructure. We work in a DevOps style and build cross-team solutions that support research and development of advanced perception algorithms.

Our flagship project is a unified AV dataset used to train and evaluate next-generation models. We take large volumes of multi-camera video, object labels, HD maps, and sensor data from across the organization and turn them into a curated, high-quality training set - at scale.

We are looking for someone who brings ML and computer-vision depth to the team - someone who can help shape the intelligence layer that decides what data is worth training on.

What will your job look like:

Work collaboratively with shared ownership. Your focus area will be the curation and ML side of our data pipeline, but you will contribute across the full stack alongside the rest of the team.
Build and improve the curation pipeline - from vision-model embeddings and scene detection, through VLM-based scene analysis, to scoring, deduplication, and sampling that produces a balanced and diverse dataset.
Run and optimize GPU inference at scale (embedding extraction, VLM inference) across thousands of driving sessions using workflow orchestration.
Develop scoring and sampling strategies that ensure rare but important scenarios (night driving, adverse weather, hazardous situations) are well-represented in the final dataset.
Work with algorithm teams to understand what data gaps hurt model performance and translate those into curation criteria.
Build validation and diagnostics that measure dataset quality - not just pipeline health, but whether the data is actually good for training.
Contribute to the core dataset SDK, converter, and 3D-geometry tooling (camera projection, calibration, coordinate transforms).

All you need is:

4+ years in data engineering or backend/software engineering with serious data work — pipelines that run in production, not just notebooks.
Strong Python skills and fluency with the PyData stack (NumPy, PyArrow, Pandas, DuckDB).
Some background in research, algorithms, or ML — enough that you can read a paper, understand a model's outputs, and have informed conversations with algorithm engineers.
Comfort working with vision-model outputs as data: embeddings, detection results, VLM responses.
Ability to work across team boundaries — this role lives between algorithm teams, infra teams, and our own.

Advantages:

Experience with autonomous-driving datasets or perception pipelines.
3D geometry and camera model intuition (or the mathematical background to ramp up).
Workflow orchestration (Argo, Airflow, Kubeflow).
Vector databases or columnar analytics (LanceDB, DuckDB, Parquet at scale).
Familiarity with curation concepts (active learning, hard-example mining, distribution balancing) — useful context, not a requirement.
Exposure to LLM agents or agentic workflows for data tasks.

We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses and identifying potential inconsistencies or verification signals in application materials based on available information. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.
