Director of AI Infrastructure, Seattle, WA

🗓 Posted 2026-06-04

Role Overview
Ai2 is a non-profit research institute at the forefront of open-source AI development. Unlike industry peers, our goal is to share our findings, data, code, and models with the global scientific community. We are seeking a Director of AI Infrastructure to oversee the systems that power our research. This leader will be responsible for the full lifecycle of our high-performance computing (HPC) environment, which includes on-prem GPU clusters and the software orchestration layer that schedules workloads across a hybrid cloud environment.

Responsibilities
The essential functions include, but are not limited to, the following:
- Cluster Management: Oversee the availability and performance of dense on-prem GPU clusters. You will partner with hardware vendors and internal teams to ensure our physical infrastructure meets the demands of frontier model training.
- Orchestration & Scheduling: Direct the strategy for Beaker, our internal orchestration platform. Your goal is to optimize job scheduling, ensuring high utilization of both on-prem assets and elastic cloud resources (AWS/GCP).
- Storage Architecture: Develop and execute a long-term roadmap for storage that balances high-throughput performance for active training with cost-effective durability for petascale research data.
- Resource Economics: Act as the primary steward of our GPU compute budget. You will make data-driven decisions on when to burst to the cloud versus when to invest in on-prem capacity.
- User Support & Velocity: Serve as the technical bridge to our research teams. You will ensure that infrastructure is an accelerator, not a bottleneck, for a diverse set of research objectives.

Requirements and Qualifications
Who You Are:
- Systems Expert: You have a deep understanding of the Linux kernel, container runtimes, and distributed systems. You understand the performance implications of InfiniBand topologies and NCCL optimizations.
- Strategic Thinker: You look beyond the immediate "fire" to design systems that will scale for the next 3–5 years of AI research.
- Pragmatic Leader: You are comfortable making trade-offs between technical elegance and operational necessity. You prioritize reliability and researcher velocity above all else.

What You’ll Need:
- Experience: 12+ years in infrastructure, systems engineering, or HPC, with at least 5 years in a leadership role managing multi-disciplinary engineering teams.
- Education: Bachelor’s degree in a related field; a relevant advanced degree may substitute for equivalent years of technical work experience.
- GPU/HPC Stack: Direct experience managing large-scale NVIDIA GPU clusters and high-performance networking (InfiniBand/RoCE).
- Cloud Native: Strong background in Kubernetes, Slurm, or similar orchestration frameworks, particularly in hybrid-cloud configurations.
- Storage Mastery: Experience with distributed filesystems (e.g., WEKA, Ceph, Lustre) and cloud storage integration at scale.
- Software Development: Proficient in Go or Python, with the ability to review architecture and code for our internal tooling.

Physical Demands and Work Environment:
- Must be able to remain in a stationary position for long periods of time.
- The ability to communicate information and ideas so others will understand, and to exchange accurate information with others.
- The ability to observe details at close range.
- Can work under deadlines.

What They Offer
Compensation:
- Base salary range: $176,400 - $264,600.
- Generous bonus plans to provide a competitive compensation package.
- Annual bonuses and participation in the long-term incentive plan.

Benefits:
- Medical, dental, vision, and an employee assistance program for team members and their families.
- Enrollment in health savings account plan, healthcare reimbursement arrangement plan, and health care and dependent care flexible spending account plans.
- Company’s 401k plan.
- $125 per month to assist with commuting or internet expenses.
- $200 per month for fitness and wellbeing expenses.
- Up to ten sick days per year, up to seven personal days per year, up to 20 vacation days per year, and twelve paid holidays throughout the calendar year.

Culture and Perks:
- Learning organization with weekly Ai2 Academy lectures and world-class AI experts as guest speakers.
- Commitment to diversity, inclusion, and a healthy work/life balance.
- Collaborative and transparent team environment.
- Seattle office located on the water with access to mountains, lakes, and outdoor activities.

