Seattle, WA, USA
13 days ago
Senior Network Developer AI2NE

The AI2NE Org strives to be a global leader in the RDMA cluster networking domain and to enable seamless, accelerated advancements in High-Performance Compute (HPC), Artificial Intelligence (AI), and Machine Learning (ML). We envision a future where artificial intelligence and machine learning revolutionize industries, reshape societies, and unlock limitless possibilities. Our vision is to be a pioneering force, driving the development and design of state-of-the-art RDMA clusters tailored specifically for AI, ML, and HPC workloads.

We strive to be the go-to experts in RDMA cluster architecture, leveraging our deep understanding of the unique demands of AI/ML and HPC applications. By staying at the forefront of technological advancements, we aim to redefine the boundaries of what is possible, pushing the envelope of computational capabilities and unlocking unprecedented performance.

This position supports the design, deployment, and operations of a large-scale global Oracle cloud computing environment (Oracle Cloud Infrastructure - OCI). The role focuses primarily on developing and supporting the network fabric and systems, combining a deep, protocol-level understanding of networking with the programming skills needed to support the intensive automation required to operate a production environment. Because OCI is a cloud-based network with a global footprint, this support spans hundreds of thousands of network devices supporting millions of servers, connected over a mix of dedicated backbone infrastructure and the Internet.

Ultimately, our vision is to enable a future where AI and ML technologies are seamlessly integrated into everyday life, solving complex problems, enhancing decision-making processes, and creating a positive impact on a global scale. Through our commitment to innovation, excellence, and collaboration, we aim to support the driving force behind this transformative era, revolutionizing the way we perceive and interact with technology.

Research and Development: Conduct cutting-edge research to understand the evolving landscape of AI and ML, and apply the findings to the development of RDMA clusters. Explore new algorithms, hardware architectures, and optimization techniques to maximize performance.

Design and Engineering: Design and engineer RDMA clusters that align with the unique demands of AI/ML, HPC, and Database workloads. Collaborate with hardware and software engineers to optimize cluster architecture, cooling systems, power efficiency, and integration with AI and HPC frameworks.

Capacity Scaling: Deploy RDMA clusters globally to meet the needs of the business and our customers.

Testing and Quality Assurance: Develop rigorous testing procedures to ensure the stability, reliability, and compatibility of RDMA clusters. Conduct comprehensive benchmarking, stress testing, and validation to guarantee optimal performance and adherence to industry standards.

Customer Engagement and Support: Establish strong relationships with customers, understand their requirements, and provide expert guidance on RDMA cluster configuration, deployment, and optimization. Offer ongoing technical support, training, and troubleshooting to ensure customer success.

Documentation and Knowledge Sharing: Create comprehensive documentation, user guides, and best practices to empower users in utilizing RDMA clusters effectively. Facilitate knowledge sharing through internal and external forums, presentations, and workshops to promote learning and collaboration.

Career Level - IC3
