Meta is seeking a Software Systems Engineer to join our Production Systems Engineering organization, where you will work at the intersection of software and large-scale hardware infrastructure. In this role, you will design, build, and optimize the systems software that powers Meta's global production fleet — spanning servers, storage, networking, and custom silicon. You will drive reliability, efficiency, and performance improvements across the infrastructure stack, partnering closely with hardware engineering, data center operations, and platform teams to ensure Meta's production systems operate at scale.
Responsibilities
- Design and develop systems software for managing, provisioning, and monitoring large-scale production hardware infrastructure including compute, storage, and networking components
- Build and maintain tooling for hardware lifecycle management, fleet health monitoring, and automated remediation of production system failures
- Collaborate with hardware engineering teams to define software interfaces and firmware integration requirements for new server and accelerator platforms
- Develop and optimize low-level systems software including kernel modules, device drivers, and platform management agents to improve hardware utilization and reliability
- Architect scalable infrastructure automation frameworks that reduce manual operational toil and accelerate hardware deployment across Meta's data center fleet
- Identify and resolve systemic reliability and performance issues across production hardware by analyzing telemetry, failure patterns, and system-level diagnostics
- Define technical direction for production systems software components, driving alignment across infrastructure engineering and data center operations stakeholders
- Mentor other engineers on systems software design patterns, debugging methodologies, and production infrastructure best practices
- Lead cross-functional efforts to evaluate and integrate new hardware technologies into the production environment, including bring-up, validation, and qualification workflows
Minimum Qualifications
- Bachelor's degree in Computer Science, Computer Engineering, or a relevant technical field, or equivalent practical experience
- 6+ years of experience in systems software engineering, including development in C, C++, or Python for Linux-based production environments
- 6+ years of experience with large-scale infrastructure systems, including hardware lifecycle management, fleet automation, or data center operations software
- Experience developing or integrating with low-level systems components such as kernel interfaces, BMC/IPMI/Redfish management stacks, or hardware telemetry frameworks
- Experience designing and operating distributed systems software at scale, including monitoring, alerting, and automated remediation pipelines
- Experience communicating technical decisions and system designs through written documentation and cross-functional stakeholder alignment
- Experience working on hardware/software projects in the manufacturing and hardware validation space
- Experience with large-scale distributed systems
- Familiarity with test automation frameworks and CI/CD pipelines
- Strong debugging and troubleshooting skills across hardware and software boundaries