As a Data Engineer, you will develop and optimize the pipeline that streams the images, sends them to the model, and processes the model's output into timestamps for our consumers. This pipeline runs in a continuous delivery environment with a focus on automation and maintainability.
Our pipeline is built on Microsoft Azure using technologies such as Event Hub and Kubernetes. We use both Python and Scala, along with other tools such as Docker and Terraform.
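At a high level, the pipeline has three stages: consume image events, run them through the model, and emit timestamps for consumers. The sketch below illustrates that shape only; the event source, model call, and all names here are hypothetical stand-ins (in production the source would be an Event Hub consumer and the model a real service), not our actual implementation:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, Iterator

# Hypothetical types standing in for the real image events and outputs.
@dataclass
class ImageEvent:
    camera_id: str
    captured_at: datetime
    payload: bytes

@dataclass
class TurnaroundTimestamp:
    camera_id: str
    event_type: str
    occurred_at: datetime

def fake_model(image: bytes) -> str:
    """Placeholder for the model call; returns a detected event type."""
    return "aircraft_arrived" if image else "no_detection"

def process(events: Iterable[ImageEvent]) -> Iterator[TurnaroundTimestamp]:
    """Streaming stage: send each image to the model and
    turn positive detections into consumer-facing timestamps."""
    for event in events:
        detection = fake_model(event.payload)
        if detection != "no_detection":
            yield TurnaroundTimestamp(event.camera_id, detection, event.captured_at)

# Usage with a simulated one-event stream:
stream = [ImageEvent("gate-12", datetime.now(timezone.utc), b"\x89PNG...")]
timestamps = list(process(stream))
```

In the real system each stage is deployed independently (Event Hub in, Kubernetes-hosted processing, Postgres out), which is what makes the CI/CD and automation focus below matter.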
As a Data Engineer you will:
- Develop and enhance the deep turnaround streaming pipeline hosted on Microsoft Azure.
- Improve our CI/CD setup. We apply CI/CD principles rigorously: deployments should never be done manually, and should be quick and pain-free.
- Automate and optimize for maintainability: never do the same thing twice.
- Bring new ideas that can help us improve and make our pipeline more robust.
- Communicate the progress of our project in internal and external talks.
- Preach and practice software development and engineering best practices.
- Support our data scientists in bringing the deep turnaround model into production.
This is the stack we are currently using:
- Deep Turnaround Pipeline: Event Hub, Kubernetes, Postgres, Docker
- Languages: Python, Scala
- Experiment tracking: MLFlow
- CI/CD pipeline: GitLab, TakeOff
- Cloud platform: Microsoft Azure
Requirements:
- Experience deploying models to production
- Knowledge of or experience in site reliability engineering; our goal is to automate as much as possible and keep the pipeline robust and maintainable
- Strong programming skills in Python and an affinity with Unix-like operating systems
- Relevant studies in IT
- Curiosity about new technology
- A mindset of using the right tool for the job
- Team player with good communication skills; brainstorming and discussing ideas together is a daily part of your job
It would be an advantage if you have:
- Some knowledge of Scala
- Experience with streaming data