About Setel:
The Future of Mobility
Introduced in July 2018, Setel is a mobile platform that aims to delight customers by innovating for better, inclusive mobility. Setel serves customers across Malaysia through one app that acts as a constant companion for motorists, easing their journey through fuelling, parking, EV charging, motor insurance, road tax, auto assistance, general purchases, and more within an ecosystem of PETRONAS petrol stations, retail partners, and online merchants.
Role Purpose:
The Senior Engineer, Site Reliability is a hands-on role in which you are the deep technical owner of our service clusters. You design resilient Zero-Downtime (ZDT) architectures (EKS blue-green) and cut toil by 70% through GitOps, policy-as-code, and Prometheus auto-remediation. You proactively automate repetitive operations, drive MTTR below 2 minutes for P0 incidents with chaos-tested runbooks (Gremlin/Litmus), and enforce 99.99% SLOs with the rigour of non-negotiable SLAs. You own operational cleanliness, bring an SRE hygiene mindset, mentor individual contributors (ICs), and ship production-grade stability that scales. Above all, we are looking for engineering excellence.
In this role you will:
- Architect and lead the cross-org design, deployment, and ongoing evolution of Kubernetes clusters, database clusters (PostgreSQL/MongoDB/DynamoDB), observability stacks (Prometheus/Grafana/Loki/Tempo), CI/CD platforms (ArgoCD/GitHub Actions), VPCs, and firewalls using Terraform, GitOps, and policy-as-code to deliver 99.99%+ reliability, proven multi-region resiliency, and cost reduction at scale.
- Write, update, and simplify all technical documentation (runbooks, RCAs, KB articles, architecture decisions) in Confluence immediately after every change, following strict KISS principles, to eliminate tribal knowledge and leave no ambiguity for the team.
- Orchestrate safe, fast deployments and database migrations or upgrades across our core services using ArgoCD/GitHub Actions progressive delivery, automated canaries, and scripted rollbacks, while keeping the CI/CD pipelines simple, reliable, and self-service so that developers can ship multiple times per day with zero customer impact.
- Lead every Sev1/Sev2 incident: run the bridge, write the RCA within 48 hours, hold a blameless post-mortem the same week, and ship permanent automated fixes so the same outage never happens twice.
- Review team members' code and scripts for adherence to code quality standards to ensure high-quality software delivery.
- Evolve product observability, covering metrics (Prometheus/Tempo), logs (Loki/CloudWatch), and traces (Tempo/OpenTelemetry), and proactively update its design and implementation.
- Build, develop, and maintain database reliability through backup/restore testing and failover drills on PostgreSQL, MongoDB, and DynamoDB.
- Participate in and continuously improve chaos experiments aimed at improving MTTR, RPO, and RTO.
- Enforce production-readiness reviews and security hardening (least-privilege IAM, secret rotation, SCA), and block deployments that violate policy.
- General Responsibilities: All employees are required to comply with company policies, industry regulations, and legal requirements. All employees are expected to assist with tasks, projects, and other duties related to the role, as and when deemed necessary.
You're a great fit if you have:
- 4+ years of experience in site reliability engineering or software engineering.
- Proficiency in version control systems (VCS) like Git.
- Advanced proficiency in cloud technology, especially AWS services.
- Advanced proficiency in containerisation technology such as Docker and Kubernetes.
- Advanced proficiency in relational and NoSQL databases, event streaming, message queues, and caches such as PostgreSQL, MongoDB, Kafka, RabbitMQ, and Redis.
- Intermediate or advanced knowledge of infrastructure as code (IaC) and CI/CD technology such as Terraform, GitOps (e.g. ArgoCD / GitHub Actions), and GitOps workflows (e.g. Argo Workflows / GitHub workflows).
- Intermediate or advanced knowledge of monitoring technology such as Vector, Loki, Tempo, OpenTelemetry, VictoriaMetrics (Prometheus), and Grafana.
- Intermediate or advanced knowledge of networking and security, especially load balancing, firewalls, and encryption.
- Intermediate knowledge of chaos engineering with Litmus, Gremlin, or Chaos Monkey.
- Bachelor’s degree in Computer Science, Information Technology, or a related field; OR
- Diploma in Computer Science, Information Technology, or a related field, with a minimum of 2 years of relevant work experience.
What Makes Working With Us Awesome
- Our people and culture: You will work with awesome and friendly colleagues with whom you can expect to collaborate well to deliver your work. You will be empowered and have plenty of opportunities for peer learning.
- Availability of tools and applications: You will be provided with a range of tools to facilitate your work. Automate your work whenever possible so that you can focus on delivering impact in your role.
- Development focused: Your learning and growth matter most to us. We are people-centric and always ready to help our people define the impact they want to make and craft their learning plans accordingly.
Cool Perks/Benefits
- Hybrid working arrangement; Flexible working hours.
- Relax and unwind at the leisure area with video games, board games, books, and more.
- Wear your favourite jeans, or any cool OOTD so that you can work comfortably (in style).
- Coffee, tea, and snacks are available at the pantry, because you’ll be happier with a full tummy.
- A healthy body leads to a brilliant mind. Let’s get moving with the inter-company sports team.
- Workshops, talent shows, sports activities, and other events for sharing and bonding.
Personal Data Protection
Setel Ventures Sdn Bhd (“Setel”, “we”, “our”, “us”) is committed to protecting and respecting your privacy. This Setel privacy statement (“Privacy Statement”) explains what personal data we collect about you, when and why we collect it, how we use it, the conditions under which we may disclose it to others, your rights to your personal data, and how we keep it secure. This Privacy Statement covers both our online and offline collection activities, including personal data that we collect through online platforms such as websites, applications, third-party social networks or our online and physical events, or through other third parties that we work with. Please read this Privacy Statement carefully to understand our views and practices regarding your personal data.