Position Overview
We are seeking a dynamic, experienced Manager to lead our Site Reliability Engineering (SRE) team. This individual will play a critical role in ensuring the stability, performance, and scalability of our infrastructure. The ideal candidate will have excellent leadership skills, deep technical expertise, and the ability to thrive in a fast-paced, collaborative environment.
Key Responsibilities
Leadership and Team Management
Lead, mentor, and develop a team of highly skilled Site Reliability Engineers.
Promote a culture of continuous improvement and high performance.
Foster collaboration and communication within the team and with other departments.
Monitor team performance and provide constructive feedback.
Technical Expertise
Oversee the design, implementation, and maintenance of reliable and scalable infrastructure.
Develop and enforce best practices for system reliability, monitoring, and incident management.
Ensure the availability, performance, and security of our services.
Collaborate with software engineering teams to design and implement solutions that improve system reliability and performance.
Utilize automation and DevOps practices to streamline operations and enhance productivity.
Apply hands-on experience with Terraform to manage infrastructure as code.
Extensive knowledge of multi-cloud environments is an added advantage.
Collaboration and Communication
Work closely with cross-functional teams, including engineering, product management, and operations, to ensure alignment and successful project execution.
Communicate effectively with stakeholders at all levels, providing regular updates on SRE initiatives and performance metrics.
Facilitate incident response and post-mortem meetings, ensuring thorough analysis and follow-up on action items.
Qualifications
Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
Proven experience in a leadership role within a Site Reliability Engineering or DevOps team.
Strong technical background with extensive knowledge of cloud infrastructure, containerization, automation, and monitoring tools.
Proficiency in scripting languages such as Python or Bash.
Excellent problem-solving skills and a proactive approach to identifying and mitigating risks.
Exceptional communication and interpersonal skills.
Why Join Us?
Be part of a forward-thinking company that values innovation and excellence.
Work in a supportive and collaborative environment where your contributions are recognized and rewarded.
Opportunities for professional growth and development through ongoing training and mentorship.
Competitive compensation and benefits package.
If you are a motivated and visionary leader with a passion for site reliability engineering, we would love to hear from you. Apply today and join our team in ensuring the robustness and efficiency of our cutting-edge infrastructure.