Job Summary
As a Mid-Level Data Engineer, you will build and maintain scalable data pipelines and infrastructure to support our AI and data-driven systems. You will work with large, diverse datasets (text, image, audio, and video) and ensure that the data is properly collected, processed, and stored for analytics and machine learning applications.
This role requires strong problem-solving skills, hands-on experience with data workflows, and the ability to work collaboratively with data scientists and engineers.
Key Responsibilities
- Build and maintain automated data pipelines to support data ingestion, transformation, and storage from various sources (e.g., websites, social media, podcasts, reports).
- Develop robust workflows for cleaning and preprocessing multimodal data (text, audio, image, video).
- Optimize pipeline performance to ensure reliability and scalability.
- Contribute to the design and implementation of cloud-based data infrastructure (e.g., data lakes, warehouses).
- Conduct exploratory data analysis (EDA) to support model development.
- Write clear documentation for data workflows and pipelines.
- Ensure data quality, security, and compliance with relevant regulations.
- Collaborate closely with the ML and Data Science teams to deliver data suitable for AI model training and analysis.
Qualifications
Education:
- Bachelor’s degree in Data Science, Computer Science, Software Engineering, or a related discipline. A Master’s degree is a plus.
Technical Skills:
- Proficiency in Python and SQL for data engineering tasks.
- Experience with data engineering libraries and tools: Pandas, NumPy, Jupyter Notebook, Git.
- Familiarity with workflow orchestration and distributed data processing tools such as Apache Airflow, Apache Spark, or similar.
- Experience with data lakes, databases, and cloud platforms (AWS preferred).
- Understanding of data pipeline architecture and design principles.
- Exposure to unstructured and multimodal data processing.
- Basic understanding of data security, privacy, and compliance requirements.
Soft Skills:
- Strong analytical and problem-solving skills.
- Effective communication and teamwork abilities.
- Ability to work independently and manage priorities.
- Eagerness to learn and adapt in a fast-paced, AI-focused environment.
Preferred Experience:
- Prior work on data projects supporting machine learning or AI development.
- Experience with data from regulated environments (e.g., finance, healthcare).
- Knowledge of web scraping and Robotic Process Automation (RPA).