OCI’s Security Assurance Gateways Team offers an exciting opportunity in our fast-paced, rapidly evolving Cloud engineering team. The team is seeking a motivated Site Reliability Engineer to work on large-scale, mission-critical distributed systems and cloud services. We are looking for people who have experience with, and are deeply interested in, databases, distributed systems and cloud services. There is always something new to do on our team, whether it’s enhancing our existing products or building a completely new one from scratch.
This individual will be a member of the SRE team focused on Cloud Services, build deployments, operations, mitigation of security vulnerabilities, and automation. This position will be instrumental in fostering a culture of SRE for Security Assurance Gateway activities and for products and services across our global cloud service team. The team you work in will have diverse expertise in Cloud environments, systems, DNS, networking and storage to provide stability, performance and reliability for our adjacent service teams and customers.
We work with multiple service development teams, identifying cross-team issues that create operational risk across the organization and resolving those issues with a mixture of engineering, troubleshooting expertise and general operational guidance. You will deliver solutions that directly contribute to our internal customers’ success.
We are looking for a self-motivated engineer who is ready for change in large-scale, mission-critical distributed systems and cloud services.
Job Description:
5+ years of SRE/DevOps/Automation experience in large-scale infrastructure and cloud services.
Experience in cloud technologies, infrastructure, DNS operations and observability.
Experience with Grafana, LumberJack, Shepherd, Bitbucket, code reviews and scripting.
Deploy, operate and maintain large-scale Cloud Service products.
Familiarity with Docker containers, multi-tenant and virtualised infrastructure, and orchestration.
Experience operating CI/CD-related systems, Linux systems, Terraform, Java and Python or Go.
Experience working with fault-tolerant, highly available, high-throughput, distributed and scalable systems.
A mind focused on systems reliability, automation and improvements, collaborating with local and global teams.
Keen troubleshooting skills for improving performance, availability, reliability and scalability.
Improve our offerings through deep analysis, diagnosing and resolving issues, and participating in on-call rotations.
Aptitude to be a good team player and the desire to learn and implement new Cloud technologies as needed.
Excellent organizational, verbal and written communication skills.
Career Level - IC4