PhonePe CL Infrastructure

Go AWS Kubernetes Prometheus

PhonePe is a leading digital payments and financial services platform in South Asia, enabling users to make seamless transactions, manage investments, and access various financial services through its mobile app.
Contractor
Remote
12 months

About project

Launching the cloud infrastructure for PhonePe’s Consumer Lending (CL) business from scratch.

Tech stack

  • Kubernetes (AWS EKS)
  • AWS
  • Golang
  • GitLab, GitLab CI
  • Prometheus, Grafana, Thanos

Project scope

As the SRE Manager, I led a team of four Senior DevOps / Platform Engineers to deliver the infrastructure for hosting PhonePe’s CL services. My time was divided as follows:

  • 40% dedicated to people management and fostering team cohesion
  • 50% allocated to technical leadership, which involved hands-on activities such as coding, code reviews, and system design
  • 10% focused on establishing and maintaining effective communication with stakeholders and other teams within PhonePe

From the project’s outset, we set a high standard: all infrastructure and AWS-related changes would be defined and managed as code.

My responsibilities as SRE Manager included:

  • Providing technical leadership for building and maintaining infrastructure in AWS
  • Collaborating closely with the InfoSec team to ensure the infrastructure meets high standards of security and compliance
  • Working in partnership with a dedicated AWS Solutions Architect (Premium Enterprise Support) to ensure the infrastructure is well-architected
  • Acting as Project Manager, planning the work and ensuring that deliverables set by stakeholders are met within strict deadlines
  • Implementing and participating in on-call support rotation together with my team
  • Leading incident response and post-mortem analysis to drive continuous improvement in system reliability and performance
  • Mentoring and guiding team members, fostering a culture of innovation and excellence within the SRE team
  • Monitoring and optimizing system performance, availability, and scalability to meet the needs of a growing user base