Site Reliability Engineer (Mid) – CPT
Engineering/Technical
Cape Town – Western Cape – South Africa
ENVIRONMENT:
A globally recognized brand with a strong strategic vision, dedicated to enhancing lives through cutting-edge technology, is looking for a Mid-Level Site Reliability Engineer. They seek a highly skilled and adaptable professional with a solid background in system automation, configuration management, and infrastructure-as-code tools, including but not limited to Ansible, Puppet, and Terraform. This role offers the opportunity to contribute to a dynamic environment, ensuring reliability, scalability, and efficiency through automation and infrastructure management. You will need Matric/Grade 12, relevant certifications (such as Oracle, cloud, or DevOps), and 5-10 years of software development experience, preferably including 3-5 years in SRE, DevOps, or systems engineering.
DUTIES:
- Use monitoring and logging tools to enhance system observability and streamline troubleshooting.
- Develop and maintain tools to automate operational workflows.
- Actively participate in on-call rotations, promptly respond to incidents, and drive thorough root cause analysis to ensure effective resolution.
- Work closely with Development teams to enhance system reliability through in-depth code reviews, performance analysis, and infrastructure improvements.
- Drive the adoption of reliability best practices by contributing to the development, implementation, and continuous improvement of standards that enhance system stability and performance.
- Promote a culture of knowledge-sharing within the team, encouraging collaboration and enabling continuous learning through open discussions, documentation, and technical insights.
REQUIREMENTS:
Minimum Requirements:
- Matric/Grade 12.
- 5-10 years of software development experience, preferably including 3-5 years in SRE, DevOps, or systems engineering.
- Proficiency in scripting languages.
- Relevant certifications, such as Oracle, cloud, or DevOps.
Technical Skills:
- Continuous delivery
- Cloud skills & best practices
- Observability (System and Application Performance Monitoring)
- Infrastructure as code
- Configuration Management (e.g. Ansible, Puppet)
- Containers
- Automation
- Collaboration and Communication
- Coding and Scripting
- Azure DevOps
- General system uptime
- Service-Level Objectives (SLOs)
- Latency
- Incident and Outage Management
- Change Management
- Capacity Planning
ATTRIBUTES:
- A proactive approach to spotting problems, areas for improvement, and performance bottlenecks.
- Strong troubleshooting skills.
- Self-disciplined and self-motivated.
- Ability to learn quickly and share knowledge with others.
- Able to work well both in a team and independently.
- Accountable and responsible.
- Attention to detail; accurate and analytical.
- Good reporting and documentation skills.
- Excellent communication skills.