Google Cloud’s mission is to make every business successful through AI by combining technology, infrastructure, and talent. AI/ML software engineers in Cloud bridge the gap between pioneering models and a massive product vehicle reaching billions. Our talent density and AI-powered tools drive rapid development, rooted in a culture of empowerment and a bias to action. In this role, you aren’t just building technology; you’re shaping the frontier of enterprise and driving the evolution of advanced models.
In this role, you will serve as the Uber Technical Lead (UTL) for Observability Intelligence, driving strategic initiatives to pivot SRE incident response toward an AI-driven paradigm at a pivotal moment, as Google's monitoring systems undergo a generational evolution. You will be part of a transformative shift away from a disjointed collection of isolated tools toward a cohesive, "Northstar" observability ecosystem.
We are seeking a leader with a proven history of managing business-critical domains and the expertise to navigate architectural trade-offs between urgent product requirements and long-term technical durability.
The US base salary range for this full-time position is $262,000-$365,000 + bonus + equity + benefits. Our salary ranges are determined by role, level, and location. Within the range, individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training. Your recruiter can share more about the specific salary range for your preferred location during the hiring process.
Please note that the compensation details listed in US role postings reflect the base salary only, and do not include bonus, equity, or benefits. Learn more about benefits at Google.
Responsibilities
- Drive technical project strategy, lead large-scale ML infrastructure optimization, and oversee the design and implementation of solutions across multiple specialized ML areas.
- Define and socialize a cohesive "Observability Intelligence" strategy that aligns with the broader Monitoring Northstar, ensuring that shared technical concerns are solved once for the entire organization.
- Represent the Observability Intelligence organization in high-stakes technical reviews and collaborate across organizational boundaries (AlertManager, AI Operations, Incident Response Management, and Site Reliability Engineering teams across all Product Areas) to drive consensus on critical observability standards.
- Act as the primary technical partner to Product Management, translating broad product "Whats" into scalable architectural "Hows."
- Lead high-level design reviews that ensure technical consistency across the stack, prioritizing interoperability, reusability, and semantic cohesion.
Minimum qualifications:
- Bachelor’s degree or equivalent practical experience.
- 8 years of experience in software development.
- 7 years of experience managing technical projects, ML design, and working with industry ML infrastructure (e.g., model deployment, model evaluation, data processing, debugging, fine tuning).
- 5 years of experience with one or more of the following: Speech/audio (e.g., technology duplicating and responding to the human voice), reinforcement learning (e.g., sequential decision making), ML infrastructure, or specialization in another ML field.
- 5 years of experience with design and architecture, and with testing and launching software products.
Preferred qualifications:
- Master’s degree or PhD in Engineering, Computer Science, or a related technical field.
- 8 years of experience with data structures and algorithms.
- 5 years of experience in a technical leadership role leading project teams and setting technical direction.
- 3 years of experience working in a complex, matrixed organization involving cross-functional or cross-business projects.
- Familiarity with and interest in the current AI landscape (Large Language Models (LLMs), generative agents, etc.).