As a Site Reliability Engineer (SRE) at Grab, you will be responsible for the stable operation of the core Grab systems. You will review and integrate new services and prepare them for large-scale usage. You will monitor availability, latency, and overall system health. You will scale systems sustainably through automation and push changes that improve reliability and velocity. Limiting the time spent on day-to-day operational work, running blameless postmortems, and identifying potential outages early are key to product quality and an interesting work life.
As part of the SRE team, you will:
Engage in and improve the whole lifecycle of services, from design through deployment, operation and refinement.
Work with engineering teams to design and write code to create systems which are highly available and able to scale seamlessly.
Help engineering teams solve reliability, stability and scalability challenges.
Get involved in deep diagnosis of incidents, and engage with multiple highly skilled engineering teams on resolutions.
Maintain services once they are live by measuring and monitoring availability, latency and overall system health.
Contribute to a culture of learning and responsibility by writing detailed postmortem reports.
Identify and resolve problems relating to critical service operations, and prevent their recurrence using automation.
Be part of a cool team responsible for one of the largest cloud-based services in Southeast Asia.
Mentor other engineers, define our technical culture, and help build a fast-growing team.
Requirements
BS degree in computer science, software engineering, information technology or a related technical field involving coding, or equivalent practical experience.
Experience with algorithms, data structures, complexity analysis and software design.
Experience in one or more of the following: Go, C, C++, Java, Python, Perl or Ruby.
Analytical skills, mental resilience and the ability to think systematically under stressful conditions.
Highly accountable, with a strong sense of ownership. Outstanding work ethic, high integrity, a team player and a lifelong learner.
Really Nice to Haves
Experience in Go.
Experience with large-scale cloud-based infrastructure from vendors such as Amazon Web Services, Azure or Google Cloud Platform.
Contributions to open source projects.
Experience with performance analysis and debugging tools.
Ability to debug and optimize code and automate routine tasks.
Grab is a Singapore-based technology company offering ride-hailing transport services, food delivery and payment solutions.