at Apple
Location
Cupertino, United States of America
Compensation
$147k–$221k USD
Type
full time
Posted
2 months ago
The Apple Silicon GPU Driver Scheduler team is directly responsible for GPU workload management, including scheduling commands on the GPU, managing resources and dependencies, and ensuring responsiveness and quality of service for applications using the GPU. The GPU Scheduler team directly impacts the performance and power efficiency of all Apple products that use Apple Silicon GPUs. We are looking for someone with a strong engineering background who is excited to work with engineers and other leaders at Apple to deliver Apple GPUs across all Apple devices, build and ship exciting new GPU-focused features, and work with other teams to prototype future HW and SW GPU features.
In this role, you’ll architect the GPU driver scheduling layer underneath Apple’s largest server-side ML and LLM workloads. You’ll design parallelism strategies that scale from a single GPU to clusters of nodes, build the synchronization and communication primitives that hold them together, and shape the HW/SW interfaces for next-generation GPU designs. You will work at the intersection of cutting-edge ML systems, systems programming, and hardware acceleration, partnering with world-class teams across Apple’s software and hardware organizations. Together you will co-design scheduling primitives for next-generation GPUs, collaborate with framework and infrastructure teams to expose scheduling control where it matters, and contribute to the performance and reliability characteristics that ultimately determine inference latency and cost.
We are seeking an individual with curiosity and passion to learn and innovate.
The people here at Apple don’t just create products — they create the kind of wonder that’s revolutionized entire industries. It’s the diversity of those people and their ideas that inspires the innovation that runs through everything we do, from amazing technology to industry-leading environmental efforts. Join Apple, and help us leave the world better than we found it.
Design and implement low-level GPU driver and scheduler features optimized for ML/LLM workloads
Design, implement, and optimize scheduling strategies for efficient parallelism across one or more GPUs — data, model, and pipeline parallelism
Co-design scheduling primitives with hardware, performance-architecture, and software teams to achieve peak compute utilization and optimal memory throughput on next-generation GPU designs
Design and implement multi-GPU communication and synchronization using RDMA technologies, integrating with SoC, networking, and GPU front-end primitives, and influencing API/framework usage
Design and implement scalable ML serving infrastructure with first-class support for security, load balancing, and fault tolerance
Contribute to the design of APIs and abstractions that expose scheduling control to higher layers of the ML stack
Drive debug, performance analysis, and optimization for ML workloads — identifying bottlenecks in compute, memory, and distributed/network subsystems
Technical BS/MS degree or equivalent experience
Excellent systems programming knowledge with C or C++
Strong knowledge of operating systems and/or scheduling policies
Experience with, or deep understanding of, distributed systems and parallel computing architectures
Understanding of systems architecture/compilers/algorithms
Excellent written and oral communication skills
Experience with GPU programming (CUDA/ROCm/Metal) and high-performance computing, including a track record of optimizing large-scale parallel workloads
Experience with inter-node communication technologies (InfiniBand, RDMA, NCCL) in the context of ML training/inference
Apple’s GGML team gives developers access to the power of the GPU across all of Apple’s innovative products, from iPhone, iPad, Apple TV, and Apple Watch to the Mac product line. The Apple Silicon GPU Driver Scheduler team within the Graphics, Games and ML (GGML) group is seeking a senior/principal engineer to lead the design of GPU scheduling mechanisms that drive peak utilization and orchestrate distributed inference across multi-node clusters for server-side ML acceleration. This is the compute infrastructure foundation that will deliver Apple Intelligence on Private Cloud Compute at unprecedented scale.
Apple is an equal opportunity employer that is committed to inclusion and diversity. We seek to promote equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics. Learn more about your EEO rights as an applicant
At Apple, we believe accessibility is a fundamental human right. You’ll find that idea reflected in everything here — in our culture, our benefits and our digital tools. By welcoming as many perspectives as possible, we help you build a career where you feel like you belong.
Learn about accessibility in Apple’s workplace
Learn about reasonable accommodations for job applicants
Apple accepts applications to this posting on an ongoing basis.