A well-funded, rapidly growing technology organization is building and scaling high-performance computing (HPC) and cloud infrastructure. The team operates at the forefront of distributed systems, enabling large-scale research, simulation, and data-driven decision-making. Their platform supports mission-critical workloads that drive innovation across advanced technical domains.

The Role

We're looking for a Software Engineer to join a team responsible for building and operating a large-scale HPC platform that powers complex batch workloads on Kubernetes.

This role sits at the intersection of distributed systems, cloud infrastructure, and high-performance computing. You'll work on a modern scheduling platform designed to orchestrate workloads across multiple Kubernetes clusters at scale.

You'll be joining a highly experienced engineering team working on cutting-edge infrastructure supporting ML and compute-intensive workloads.

What You'll Do

Design and build backend systems using Go (Golang)
Develop and operate highly scalable, distributed systems for large-scale workloads
Build and manage containerized applications in Kubernetes environments
Optimize and manage data in PostgreSQL (the primary database) and other data stores
Troubleshoot and tune Linux-based systems within a compute-heavy environment
Debug networking and system-level issues to improve performance and reliability
Diagnose complex production issues across infrastructure and application layers
Apply strong software design principles and computer science fundamentals to your work
Contribute to CI/CD pipelines and engineering best practices
Stay current with new technologies and apply them where relevant

What We're Looking For

Experience building Kubernetes components (e.g. controllers, operators)
Experience with event-driven architectures (Kafka, Pulsar, or similar)
Background in high-performance computing, Kubernetes, or workflow orchestration systems
Experience running distributed systems in cloud environments (AWS preferred)
Familiarity with monitoring/logging tools (e.g. Prometheus, Grafana)
Experience with job scheduling systems (e.g. SLURM or similar)

Why Join

Work on cutting-edge infrastructure at scale
Tackle complex engineering challenges in distributed systems and HPC
Collaborate with a high-caliber, deeply technical team
Make a direct impact on systems that power advanced research and innovation

Benefits

Lunch stipend (via delivery service)
100% employer-covered medical, dental, and vision (for employees + families)
16 weeks paid parental leave
401(k) with company match
Additional optional health and wellness benefits
Generous PTO + company holidays

HPC Software Engineer
