Austin, TX, USA
7 days ago
Principal Software Developer Engg, AI Workload Orchestration

Here at OCI, we’re building the world’s largest AI clusters, and we’re the fastest at bringing them to market. The AI Infrastructure organization at OCI is leading this effort by creating a GPU-focused cloud for AI workloads with the latest hardware. The AI workload organization is developing solutions that enable large AI customers to schedule their Kubernetes AI workloads on OCI’s GPU cloud with the best performance, efficiency, reliability, and scalability. This is your chance to be part of the AI revolution, creating systems that allow customers to scale from tens to thousands of GPUs without compromising performance. You will have the opportunity to work with cutting-edge technologies and make a significant impact on our organization's success.

We are looking for a highly skilled distributed systems engineer to optimize Kubernetes schedulers for AI workloads, increasing GPU utilization and workload throughput. In this role, you will ensure top performance for AI workloads scheduled on our platform. You will provide technical leadership to the team, bring clarity to ambiguous problems, and develop innovative solutions that make it easy for our customers to deploy AI workloads on our GPU infrastructure. You will collaborate with cross-functional teams to enhance the GPU control plane and data plane and deliver an exceptional customer experience.

Career Level - IC4
