HYDERABAD, TELANGANA, India
20 days ago
Site Reliability Developer 4

Within the Oracle Health (OHAI) organization, the new EHR and Clinical AI Agent cloud services are at the forefront of new generative AI services for healthcare organizations. Building on the success of the established Digital Assistant (ODA) product, EHR and AI Agent enable healthcare providers to leverage advanced AI technologies, together with voice commands, to reduce manual work and enable providers to focus on patient care.

Oracle Health EHR is expanding its OCI Operations team and is looking to bring in new Site Reliability Engineers. As an SRE, you will solve technical challenges on an advanced OCI cloud service platform, focusing on areas such as reliability, scalability, resilience, security, and performance.

You will define how to use the latest technologies to optimize the operational efficiency of the service. You will gain a deep understanding of chatbots, cognitive services, machine learning, and analytics. You will work with a team pushing the boundaries of a scalable, self-healing, autonomous platform built on Kubernetes, Docker, Prometheus, and Grafana. You will be exposed to a wide range of OCI cloud services and learn how we interact with many dependent services across the organization.

Areas of responsibility

- Service Ownership
  - As part of the EHR/Clinical Agent team, you will be responsible for all operational aspects of the OCI services included in our portfolio.
  - Understand the end-to-end configuration, technical dependencies, and overall behavioral characteristics of the Digital Assistant suite of products.
  - Own the end-to-end availability, reliability, and performance of a cloud service.

- Service Design
  - Design and implement solutions for rolling out software and security updates with zero downtime.
  - Partner with development and product management to build and maintain platform and automation frameworks that ensure maximum uptime and predictability, preventing outages, service interruptions, and degradation.
  - Analyze system failures and develop rapid-response processes.

- Operations Engineering
  - Evaluate the operation of cloud service deployments across commercial and government datacenters.
  - Monitor the service and its dependencies for degradation under load, and implement solutions that ensure high availability for our customers.
  - Analyze resource utilization and scaling requirements in a high-end production system.
  - Resolve security vulnerabilities to conform to corporate and government security standards.

- Automation
  - Building on your understanding of automation and orchestration principles, you will identify opportunities to automate SRE procedures in production environments.
  - The solutions you implement will be designed to minimize the possibility of introducing errors into the system.

- Technical Expertise
  - Handle complex, critical issues in production environments, drawing on your accumulated technical knowledge to rapidly identify problems and apply mitigations.
  - Develop an understanding of the underlying AI technologies used to implement the Clinical Digital Assistant service.
  - As an SME, you will be called in to handle major incidents. Your understanding of the architecture and dependent services will let you apply mitigations that resolve the issue quickly, and you will then work with development to implement preventive actions.


Career Level - IC4
