at Zoox
Location
Foster City, CA
Compensation
$189k–$290k USD
Type
full time
Posted
2 months ago
Remote
Yes
The Perception team at Zoox creates the "eyes and ears" of our self-driving robots. Navigating safely and efficiently in complex environments requires detecting, classifying, tracking, and understanding various attributes of surrounding objects, all in real time and with exceptional accuracy.
Design and train Vision-Language-Action (VLA) solutions for robotaxis
Lead end-to-end data strategy, including mining, auto-labeling, and dataset construction to power our ML flywheel
Lead the full post-training stack for VLMs and VLAs, including Continual Pre-training (CPT) on domain-specific driving data and Supervised Fine-Tuning (SFT) for instruction following
Utilize our large-scale data pipelines and ML infrastructure to research, prototype, and deploy solutions that improve driving behavior
Partner with cross-functional teams to integrate perception signals
MS or PhD in Computer Science or related field
Background in deep learning solutions for VLM and VLA models
Track record in post-training large-scale models, CPT, SFT, RL
Hands-on experience with production ML pipelines, including dataset creation, training frameworks, and metrics
Expertise in Python libraries (PyTorch, NumPy, Pandas, vLLM)
Deep knowledge of cutting-edge computer vision techniques
Publications in top-tier conferences (CVPR, ICCV, RSS, ICRA)
Experience integrating large language models into various tasks