at Apple
Location
Sunnyvale, United States of America
Compensation
$147k–$272k USD
Type
full time
Posted
2 weeks ago
Market range (company + function + seniority, n=626): posted $272k, within the market band.
We are seeking highly motivated and skilled engineers to join our Human Intelligence team. The ideal candidates will have strong backgrounds in developing and exploring capabilities of foundation models and agentic AI systems that enable natural, proactive and personalized human interactions. You will be responsible for multimodal LLM development including training, fine-tuning, agentic AI, and reasoning systems.
In this role, you will work on cutting-edge research and engineering problems, collaborating across teams and helping shape the technical direction of multimodal and agentic AI systems from research to production. You will lead and contribute to the research roadmap for multimodal foundation models, identifying key opportunities for innovation in agentic AI and reasoning capabilities. You will design and implement agentic systems and large-scale simulation and evaluation frameworks that can transition from research prototypes to production-grade technologies.
Develop, train, and fine-tune multimodal LLMs across image, video, text, and audio modalities, from data curation through deployment.
Design and build video/audio encoders, tokenizers, and generative models for multimodal understanding and generation.
Design and implement agentic AI systems that enable reliable reasoning for natural, proactive, and personalized human interactions.
Architect end-to-end ML systems that transition from research prototypes to production-grade technologies at scale.
Collaborate across HW, SW, and ML teams to influence sensor and silicon roadmaps and deliver pioneering on-device experiences.
Critically evaluate and improve ML codebases, ensuring correctness, efficiency, and maintainable engineering quality.
Contribute to the team's research direction, identify opportunities for innovation, and help shape product features.
Master's degree, or equivalent practical experience, in Computer Science, Computer Vision, Machine Learning, or a related technical field.
3+ years of relevant academic or industry experience in Machine Learning, Computer Vision, or Artificial Intelligence.
Experience in deep learning with demonstrated work in multimodal systems (e.g., vision, language, video).
Proficiency in Python and in a modern deep learning framework such as PyTorch or JAX.
Experience with foundation models (language or multimodal), including training, fine-tuning, and deployment.
Experience developing, training, and fine-tuning multimodal LLMs.
Strong foundations in optimization, probability, and linear algebra as applied to machine learning and computer vision.
PhD, or equivalent practical experience, in Computer Science, Machine Learning, Computer Vision, or a related technical field with a focus on AI, machine learning, or computer vision.
Demonstrated expertise in developing, training, and fine-tuning multimodal LLMs at scale and in developing industry-scale agentic products.
Proven track record of technical leadership, including architecting complex ML systems and leading projects from conception to product deployment.
Experience applying foundation models to build autonomous or semi-autonomous agents, including planning, task decomposition, and multi-step reasoning.
Strong publication record in top-tier venues such as NeurIPS, ICML, ICLR, CVPR, ICCV, ECCV, and COLM.
Experience with large-scale distributed training and model parallelism.
Strong communication skills and ability to present research findings to both technical and non-technical audiences.
Are you excited about the amazing potential of foundation models, LLMs, and multimodal LLMs? We are looking for individuals who thrive on collaboration and have a desire to push the boundaries of what is possible today! The VCV org is a centralized applied research and engineering organization responsible for developing real-time on-device Computer Vision and Machine Perception technologies across Apple products. In the Human Intelligence team, we balance research and product to deliver Apple-quality, pioneering experiences, innovating through the full stack and partnering with HW, SW, and ML teams to influence the sensor and silicon roadmap that brings our vision to life.
Join us in this truly exciting era of Artificial Intelligence to help deliver the next groundbreaking Apple products and experiences! We are continuously advancing the state of the art in Computer Vision and Machine Learning, touching all aspects of multimodal LLMs, from data collection and curation to modeling, evaluation, and deployment. As a member of our dynamic group, you will have the unique and rewarding opportunity to craft upcoming research directions in the field of multimodal LLMs that will inspire future Apple products.
Apple is an equal opportunity employer that is committed to inclusion and diversity. We seek to promote equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics. Learn more about your EEO rights as an applicant
At Apple, we believe accessibility is a fundamental human right. You’ll find that idea reflected in everything here — in our culture, our benefits and our digital tools. By welcoming as many perspectives as possible, we help you build a career where you feel like you belong.
Learn about accessibility in Apple’s workplace
Learn about reasonable accommodations for job applicants
Apple accepts applications to this posting on an ongoing basis.