A) SUMMARY OF RESPONSIBILITIES
We are looking for a skilled and passionate Platform Engineer with expertise in chaos engineering and resiliency testing. The ideal candidate will have a strong background in distributed systems, cloud infrastructure, and container orchestration. You will be responsible for designing, implementing, and managing chaos experiments to test the resilience of our platform. Your work will directly contribute to our platform's ability to withstand and recover from unexpected failures, ensuring continuous and reliable service for our clients.
B) KEY AREAS OF RESPONSIBILITY
- Develop and implement chaos engineering strategies to test the resilience of our platform infrastructure.
- Design, execute, and automate chaos experiments using tools such as Gremlin, Chaos Mesh, Litmus, or similar.
- Collaborate with platform engineering and DevOps teams to identify critical systems and components for testing.
- Build and maintain a robust monitoring and observability framework to analyze the impact of chaos experiments.
- Identify weaknesses in the current infrastructure and provide recommendations for improvement.
- Integrate chaos engineering practices into CI/CD pipelines using GitOps tools like ArgoCD and Atlantis.
- Contribute to the development and maintenance of Kubernetes clusters, AWS EMR, AWS MSK Kafka, and vSphere environments.
- Utilize Terraform for infrastructure as code (IaC) to manage cloud resources.
- Participate in the on-call rotation and assist with incident management and root cause analysis.
- Stay up to date with the latest trends and best practices in chaos engineering, resiliency testing, and cloud infrastructure.
C) FUNCTIONAL COMPETENCIES
- Strong understanding of Kubernetes, Docker, and container orchestration.
- Proficiency in AWS services, including EMR and MSK Kafka, and experience with vSphere.
- Experience with infrastructure as code (IaC) tools, particularly Terraform.
- Familiarity with GitOps practices and tools such as ArgoCD and Atlantis.
- Hands-on experience with chaos engineering tools (e.g., Gremlin, Chaos Mesh, Litmus).
- Solid understanding of distributed systems, microservices architecture, and cloud-native technologies.
- Excellent problem-solving skills and a proactive approach to identifying and addressing potential issues.
- Strong communication skills and the ability to work effectively in a collaborative team environment.
D) QUALIFICATIONS & EXPERIENCE
Minimum Qualifications
- Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
- 3+ years of experience in platform engineering, site reliability engineering (SRE), or DevOps roles, with a focus on chaos engineering.
About the Company
Payments Network Malaysia Sdn Bhd
Embark on an exciting career journey with Payments Network Malaysia Sdn Bhd (PayNet), the heartbeat of Malaysia's financial markets!
As the national payments network and a pivotal infrastructure for Malaysia’s dynamic financial markets, PayNet is a linchpin in advancing the nation’s digital economy.
Our comprehensive suite of retail payment solutions - encompassing DuitNow (QR and P2P), JomPAY (Bill Payments), FPX (Online), MyDebit (Domestic Debit), MEPS (ATM), and IBG (Interbank GIRO) - not only offers wide accessibility but is also seamlessly integrated into the fabric of daily life in Malaysia. These services have revolutionised the way Malaysians handle financial transactions, marking a significant leap in consumer convenience and efficiency.
At PayNet, our focus is on providing a safe, efficient, and innovative payments system. We are dedicated to improving and managing payment services that meet the evolving needs of consumers and businesses. Our work ensures the stability and reliability of Malaysia’s financial system, supporting the growth of the economy.
Learn more about our work and how we are contributing to Malaysia's financial future at www.paynet.my.
Join us in embracing digital payments and advancing Malaysia's financial landscape.