Mid-Level Data Engineer

OTHERS
Malaysia


Job Summary

As a Mid-Level Data Engineer at Revolab, you will architect and maintain the high-throughput pipelines that power our Voice AI and NLP engines. You won't just move text; you will manage multimodal data (audio, streaming logs, and metadata), ensuring it is processed reliably at sub-millisecond latency. You will bridge the gap between our backend services and our ML team, turning raw interaction data into high-quality training sets and actionable insights.

Key Responsibilities

  • Build and optimize automated pipelines for ingesting and transforming massive datasets from diverse sources (voice streams, gRPC logs, web scrapers).
  • Develop robust workflows to preprocess and version unstructured data, specifically audio and text, ensuring it is ready for LLM and speech model training.
  • Work closely with backend developers to integrate data pipelines with core application services, ensuring seamless data extraction from production databases and real-time event streams.
  • Contribute to the design of our data lake and warehouse strategy, focusing on efficient retrieval patterns for machine learning.
  • Monitor and optimize pipeline latency and resource usage within our Kubernetes environment.
  • Implement validation checks and monitoring to ensure data integrity, security, and compliance.
  • Partner with ML Engineers to deliver "AI-ready" datasets, moving beyond simple ETL to feature engineering and data versioning.

Qualifications

Education:

  • Bachelor’s degree in Data Science, Computer Science, Software Engineering, or a related discipline. A Master’s degree is a plus.

Technical Skills:

  • Proficiency in Python and SQL for data engineering tasks.
  • Familiarity with workflow orchestration tools such as Apache Airflow and data processing frameworks such as Apache Spark, or similar ETL tooling.
  • Experience with data lakes, databases, and cloud platforms (AWS preferred).
  • Understanding of data pipeline architecture and design principles.
  • Exposure to unstructured and multimodal data processing.
  • Basic understanding of data security, privacy, and compliance requirements.

Soft Skills:

  • Ability to deconstruct complex data bottlenecks into scalable engineering solutions.
  • Understanding of how data flows through a distributed system, not just a single script.
  • Eagerness to work in a fast-paced AI startup where data formats and model requirements evolve quickly.
  • Ability to explain data constraints to stakeholders.

Preferred Experience:

  • Prior work on data projects supporting machine learning or AI development.
  • Experience with data from regulated environments (e.g., finance, healthcare).
  • Web scraping and Robotic Process Automation (RPA) knowledge.

About the Company

Revolab Sdn Bhd