Site Reliability Engineer – Remote
Engineering/Technical
South Africa, United Kingdom, Remote
ENVIRONMENT:
For over a century, our client has been helping generous individuals and the causes they care about make a lasting impact. Today, they support over 30,000 donors in giving more than £110 million annually to over 12,000 charities, community initiatives, and individuals in need. Their expert services equip these partners to grow, thrive, and create meaningful change in the world.
They are looking for a Site Reliability Engineer with drive, intellectual curiosity and technical capability to join their small but dynamic Technology team. The role spans both an Azure-native environment and a co-location data centre. A significant proportion of their line-of-business applications remain hosted on a Hyper-V-based private cloud and will need to be “fed and watered” until they are moved to new platforms or technologies, a task that this role will assist with. These migrations offer the role holder learning and development opportunities.
DUTIES:
- Maintain physical, virtual and cloud systems to ensure their availability, reliability and integrity.
- Implement infrastructure, configuration, and network as code for reliable, consistent, repeatable deployments.
- Improve the operations of the Azure environment and broader platform services, operating systems and servers, networks and security technologies.
- Collaborate with other teams to ensure the smooth operation of their Digital Product.
- Troubleshoot and resolve complex IT issues by phone, on the web, and in person.
- Collaborate with technical experts, stakeholders, and other team members to resolve assigned incidents within the SLA and prevent recurrence.
- Promote a culture of learning by proactively learning about new product and service technologies and training other staff members on them.
- Other duties as directed by your line manager or another senior member of the Technology Team.
REQUIREMENTS:
- Ideally, you have 3+ years of experience working in a similar role.
- You are proficient in reliability, scalability, performance, security, toil reduction, and other site reliability best practices.
- You have previous or current experience working with Azure (or another cloud platform, such as AWS).
- You have a deep understanding of security principles and how to keep large operational services secure.
- You have a strong understanding of DevOps/DevSecOps, including experience with Infrastructure as Code and CI/CD tooling such as Azure DevOps.
- You have experience working with Windows Servers, SQL databases and Networking infrastructure (Routing/Switching, VLANs, Firewalls, and Load Balancers).
- You are familiar with Agile project management practices.
- You have experience in proactively monitoring live services, troubleshooting issues and implementing service improvements.
- You understand the importance of adherence to Payment Card Industry Data Security Standard (PCI DSS) requirements.
- You are comfortable setting up component logging, configuring monitoring tools, providing on-call support, handling escalation/paging, and managing incidents.
ATTRIBUTES:
- You are comfortable working self-sufficiently and taking the initiative, as well as being a team player who works effectively as part of a hybrid team.
- You demonstrate a passion to continue to learn and develop your engineering skills.
- You have strong verbal and written communication skills and are able to communicate clearly, effectively and appropriately, depending on your audience.