One of our clients provides geofencing technology solutions that detect and prevent fraud, false data, spoofing and device tampering, helping operators meet geo-compliance regulations in the sports betting and gaming industry.
They are looking for a Site Reliability Engineer to provide support for all production applications and systems in a US timezone.
As they build solutions to meet the needs of their customers, ensuring 99.999% reliability is key to providing an excellent customer experience.
Introducing and implementing systems that will increase uptime and overall reliability will play a central part in this role.
They are looking for the ideal candidate to own this area and drive successful change.
What will you do?
● Manage a team of high-performing SREs
● Follow best practices in Incident Management and evolve the SRE mindset across Engineering
● Develop monitoring dashboards across all production systems
● Establish a framework for incident management
● Improve release processes whilst also ensuring the Definition of Done is met
● Participate in architecture and design discussions to ensure that SRE best practices are met
● Respond to system-generated alerts and escalations relating to any failures in the production applications
● Participate in Site Reliability Engineering functions, such as 24/7 on-call coverage, as needed
What are we looking for?
● 6+ years of experience in Site Reliability
● Experience in 24/7 monitoring of Distributed Systems
● Experience working with Java-based technologies in a Linux environment
● Experience working with NoSQL databases such as DynamoDB
● Knowledge of microservices architecture in a cloud-based environment, ideally AWS
● Knowledge of mobile technologies
● Strong ability to troubleshoot issues and distinguish critical from non-critical ones
● Good understanding of CI/CD pipelines
● Good understanding of InfoSec controls
● Great communication skills
● Ability to work in a mission-critical environment