Malaysia
28 days ago
Site Reliability Engineer

The Hospitality Cloud SRE team is focused on maximizing service reliability for our hotel product offerings across Oracle's global data centres. Our team runs with a start-up-like approach, leaving room for creative freedom. We have worked to assemble the smartest people in the industry to build and grow this revolutionary and disruptive team.

We are looking to add new members to this dynamic team and are seeking subject matter experts to design and continuously improve reliability for all components within our solution portfolio.

About The Job

As part of the SRE team, you will be challenged every day and will contribute directly to the success of our Oracle Hospitality Cloud service offerings, working closely with product and Infrastructure partners.

As an SRE, you will solve interesting technical challenges by defining, designing, deploying, and troubleshooting key Oracle Cloud services, platforms, and infrastructure, always with reliability, scalability, resilience, security, and performance in mind.

In this role, which combines software engineering, architecture, and operational readiness, you will be responsible for the following:

Automation – You will have a clear understanding of automation and orchestration principles and will be eager to automate wherever and whenever the opportunity arises, eliminating technical debt along the way. Automation must be part of your DNA.

Service Ownership – You will be part of the SRE team, whose mission is shared, full-stack ownership, together with our Development partners, of a collection of services and/or technology areas.

Ownership Scope – As an SRE, you will understand the end-to-end configuration, technical dependencies, and overall behavioural characteristics of the production services you own. Together with your Development partners, you will be responsible for ensuring that services are designed, delivered, and deployed to mission-critical standards, with a focus on security, resiliency, scale, and performance. SREs are accountable for the end-to-end performance and operability of the services they own.

Service Design – As Oracle Hospitality Cloud Services continually evolve, you will partner with development teams to define and implement improvements in service architecture, both current and future. As an SRE, you will be an expert at articulating the technical characteristics of your services and the dependencies between them, and you will guide Development teams to engineer and add premier capabilities to the Oracle Cloud service portfolio.

Operations Engineering – You will understand and be able to communicate the scale, capacity, security, and performance attributes and requirements of the services you own. This means understanding and communicating every characteristic of your service stack, such as:

- degradation and behaviour under load of the services and their dependencies
- end-to-end tuning needs, optimizing resource utilization as load patterns fluctuate
- instrumentation and metrics that clearly describe service behaviours
- scaling requirements and patterns
- resiliency and recoverability, ensuring that backup/restore and disaster recovery capabilities are implemented, tested, and maintained

Ideal Qualifications/Experience

- BS or MS in Computer Science, or equivalent work experience
- Must have hands-on experience developing and deploying large-scale, highly available (HA) enterprise solutions according to Oracle Maximum Availability Architecture (MAA) best practices
- Good understanding and appreciation of the Cloud Native Computing Foundation (CNCF) charter and cloud native technologies
- Knowledge of networking and security, e.g. DNS records, load balancers (F5 / LBaaS / NGINX), subnets, TLS, SSL, SAML, etc.
- Knowledge of container technology (Docker, etc.), developing software to run in containers, and container orchestration technologies (Kubernetes, Docker Swarm)
- Familiarity with continuous integration and continuous deployment (CI/CD) tools and practices, such as Jenkins, GitLab CI, or CircleCI
- Knowledge of and experience with monitoring and observability pipelines and observability tooling (Prometheus, Grafana, Thanos, ELK, Datadog, etc.)
- Demonstrated experience in designing, implementing, and managing automation solutions in a production environment
- Proficiency in multiple scripting languages, such as Python, Bash, Perl, or Ruby
- Ability to write clean, maintainable, and efficient code
- Experience with automation frameworks and tools such as Ansible, Puppet, Chef, or Terraform
- Familiarity with Agile methodologies and DevOps practices
- A methodical approach to troubleshooting complex problems
- Experience defining and documenting the technical architecture of complex, highly scalable products
- Most importantly, the aptitude to be a good team player and the willingness to learn and implement new cloud technologies

Career Level - IC3
