We are seeking a Systems Development Engineer to own the research compute platform for Fauna Robotics. You will build and operate the physical and virtual infrastructure that our ML scientists use to train reinforcement learning policies for real robots, from fleet provisioning and job scheduling to cloud burst capacity and environment reproducibility.
This role requires both strong systems engineering fundamentals and genuine comfort working alongside researchers. The ideal candidate is as happy diagnosing a GPU thermal fault as they are designing a job scheduler, and treats “the scientist’s training run just works” as the north star for everything they build.
Key job responsibilities
- Own on-prem GPU compute end-to-end: provisioning, imaging, driver and CUDA management, monitoring, failure diagnosis, hardware RMA, and capacity planning
- Build and operate a job scheduling layer (Slurm, Ray, SkyPilot, or equivalent) so scientists submit training runs without managing individual machines
- Design and implement the bridge between on-prem and cloud compute
- Partner directly with ML scientists to triage training issues, profile workloads, identify bottlenecks, and advise on how to structure training for the hardware at hand
About the team
Fauna Robotics, an Amazon company, is building capable, safe, and genuinely delightful robots for everyday life. Our goal is simple: make robots people actually want to live and interact with in everyday human spaces.
We believe that future won’t arrive until building for robotics becomes far more accessible. Today, too much effort is spent reinventing the fundamentals. We’re changing that by developing tightly integrated hardware and software systems that make it faster, safer, and more intuitive to create real-world robotic products.
Our work spans the full stack: mechanical design, control systems, dynamic modeling, and intelligent software. The focus is not just functionality, but experience. We’re building robots that feel responsive, expressive, and genuinely useful.
At Fauna, you’ll work at the frontier of this space, helping define how robots move, manipulate, and interact with people in natural environments. It’s an opportunity to solve hard problems across hardware and software with a team focused on making robotics accessible and joyful to build.
If you care about making robotics real for everyone and building systems that are as delightful as they are capable, we’re interested in hearing from you.
- 3+ years of Linux systems administration experience
- 3+ years of non-internship professional systems engineering or systems development experience
- Experience with configuration management and fleet automation (Ansible, Chef, or equivalent)
- Experience with containerization in production (Docker required; Kubernetes or containerd exposure preferred)
- Proficiency in Python, Go, or Bash for systems tooling and automation
- Experience with NVIDIA GPU infrastructure: driver management, CUDA versioning, basic GPU diagnostics
- Experience with job schedulers or orchestrators (Slurm, Ray, SkyPilot, Kubernetes with GPU operator, or equivalent)
- Hardware comfort: diagnosing and replacing GPUs, PSUs, memory, storage
- NVIDIA deep fluency: DCGM, NVLink / PCIe topology, IOMMU, compute mode configuration
- Experience with GPU cloud providers (AWS p5/g6e, RunPod, Lambda, CoreWeave) for hybrid on-prem/cloud workflows
- Track record of building internal platforms that accelerate other engineers or scientists
Amazon is an equal opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other legally protected status.
Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit https://amazon.jobs/content/en/how-we-hire/accommodations for more information. If the country/region you’re applying in isn’t listed, please contact your Recruiting Partner.
The base salary range for this position is listed below. Your Amazon package will include sign-on payments and restricted stock units (RSUs). Final compensation will be determined based on factors including experience, qualifications, and location. Amazon also offers comprehensive benefits including health insurance (medical, dental, vision, prescription, Basic Life & AD&D insurance and option for Supplemental life plans, EAP, Mental Health Support, Medical Advice Line, Flexible Spending Accounts, Adoption and Surrogacy Reimbursement coverage), 401(k) matching, paid time off, and parental leave. Learn more about our benefits at https://amazon.jobs/en/benefits.
USA, NY, New York - 142,300.00 - 192,400.00 USD annually