Reinforcement Learning Engineer - Whole-Body Control

Shanghai, China

R&D – Algorithms / Full time / On-site

At Mentee Robotics, we are redefining humanoid automation with an AI-first approach - combining perception, reasoning, and dexterous manipulation into fully autonomous systems that continuously learn and adapt.

We are now expanding with a new robotics Engineering Center in China, working hand-in-hand with our engineering teams at headquarters. Its mission: to rapidly develop our next-generation full-size humanoid and bring it to life - a walking, working platform that becomes the foundation of our future products. This is a small, senior, hands-on team where speed of iteration is the core value.

We are looking for an RL Engineer to train the whole-body behaviors of our humanoid in simulation, on Isaac Sim with the Newton physics engine. In our architecture there is no classical motion controller above the joint level - the learned policy is the robot's entire behavior layer, coordinating all degrees of freedom and commanding joints directly through the actuator controllers. You are on the critical path to the robot's first steps.

Who you are

A deep RL practitioner who has actually transferred policies to physical legged robots, not only trained them on benchmarks
Strong engineer first: your training code is infrastructure, not a notebook
Comfortable being the owner of the robot's most visible capability

Responsibilities

Design, train, and iterate whole-body RL policies in Isaac Sim/Newton: walking, balance recovery, manipulation, and coordinated loco-manipulation behaviors
Own reward design, curriculum learning, and training methodology; incorporate motion priors (mocap/imitation, AMP-style) for natural movement
Build and maintain the training infrastructure: massively parallel simulation, experiment tracking, evaluation suites
Work with the Sim2Real engineer to bake measured actuator constraints and domain randomization into training
Work with the compute platform engineer to deploy policies on the robot and run on-hardware evaluation
Progressively expand the policy's capability envelope - from first steps to dynamic, contact-rich whole-body tasks
Define what the policy observes and commands together with motion control and platform teams - you co-own the robot's core software contract

Requirements

M.Sc. or Ph.D. (or equivalent industry experience) in Computer Science, Robotics, or a related field
5+ years of hands-on deep RL for robotics with strong PyTorch engineering skills
Direct experience with Isaac Sim/Newton (or equivalent GPU-parallel simulators) for whole-body RL on legged robots
Proven sim-to-real transfer of at least one policy to a physical legged robot
Deep understanding of PPO-family training at scale, reward shaping, and curriculum design

Advantages

Familiarity with IsaacLab
Humanoid (vs. quadruped) whole-body RL experience
Experience with motion-imitation methods (AMP, DeepMimic-style) and mocap data pipelines
Publications in top robotics/ML venues (RSS, CoRL, ICRA, NeurIPS) or experience at leading humanoid teams
Experience with teleoperation or demonstration data pipelines for whole-body skills
Comfortable communicating technical topics in English with international teams

We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses and identifying potential inconsistencies or verification signals in application materials based on available information. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.
