HPC Software Engineer
The company is a well-funded, rapidly growing technology organization focused on building and scaling high-performance computing (HPC) and cloud infrastructure. The team operates at the forefront of distributed systems, enabling large-scale research, simulation, and data-driven decision-making. Their platform supports mission-critical workloads that drive innovation across advanced technical domains.
We're looking for a Software Engineer to join a team responsible for building and operating a large-scale HPC platform that powers complex batch workloads on Kubernetes.
This role sits at the intersection of distributed systems, cloud infrastructure, and high-performance computing. You'll work on a modern scheduling platform designed to orchestrate workloads across multiple Kubernetes clusters at scale.
You'll be joining a highly experienced engineering team working on cutting-edge infrastructure supporting ML and compute-intensive workloads.
- Design and build backend systems using Go (Golang)
- Develop and operate highly scalable, distributed systems for large-scale workloads
- Build and manage containerized applications in Kubernetes environments
- Optimize and manage data across PostgreSQL (the primary database) and other data stores
- Troubleshoot and tune Linux-based systems within a compute-heavy environment
- Debug networking and system-level issues to improve performance and reliability
- Diagnose complex production issues across infrastructure and application layers
- Apply strong software design principles and computer science fundamentals to your work
- Contribute to CI/CD pipelines and engineering best practices
- Stay current with new technologies and apply them where relevant
- Experience building Kubernetes components (e.g. controllers, operators)
- Experience with event-driven architectures (Kafka, Pulsar, or similar)
- Background in high-performance computing, Kubernetes, or workflow orchestration systems
- Experience running distributed systems in cloud environments (AWS preferred)
- Familiarity with monitoring/logging tools (e.g. Prometheus, Grafana)
- Experience with job scheduling systems (e.g. SLURM or similar)
- Work on cutting-edge infrastructure at scale
- Tackle complex engineering challenges in distributed systems and HPC
- Collaborate with a high-caliber, deeply technical team
- Make a direct impact on systems that power advanced research and innovation
- Lunch stipend (via delivery service)
- 100% employer-covered medical, dental, and vision (for employees + families)
- 16 weeks paid parental leave
- 401(k) with company match
- Additional optional health and wellness benefits
- Generous PTO + company holidays

