Data Pipeline
- neovijayk
- Jul 6, 2020
- 2 min read
Why do we need to create a data pipeline?
To execute machine learning projects we need powerful computing platforms, which are available from cloud providers.
Several such cloud computing platforms are available today.
But to run a model on these platforms, the data must be accessible from them, both during prototype development and afterwards for testing and further improvement.
That data may reside on another machine or on different storage platforms (for example, a client’s databases). Hence a data pipeline can be built to bring the data for our model into one place.
On this page I will discuss a few useful data pipeline architectures, tools, and techniques that I have learned from experience and found useful.

Fig: Collecting captured data in one place using a data pipeline
Google Cloud:
Data pipeline using Python code: example
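The full Python example is in the linked post; as a rough illustration of the idea, here is a minimal sketch of pulling a data file from Google Cloud Storage onto the machine where the model runs, using the google-cloud-storage client. The bucket name, object name, and local path are placeholders, not the actual values from my projects.

```python
# Minimal sketch: copy a data file from Google Cloud Storage to the local
# machine so the model can read it. Bucket/object names are placeholders.
from google.cloud import storage  # pip install google-cloud-storage

def fetch_from_gcs(bucket_name: str, blob_name: str, local_path: str) -> None:
    client = storage.Client()              # uses the default credentials on the machine
    bucket = client.bucket(bucket_name)
    blob = bucket.blob(blob_name)
    blob.download_to_filename(local_path)  # write the object to disk

if __name__ == "__main__":
    fetch_from_gcs("my-training-data", "exports/latest.csv", "/tmp/latest.csv")
```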
Data pipeline tools:
Stitch
About Stitch: features and limitations of Stitch (coming soon)
End-to-end automation of data extraction using Stitch, from data source to destination (coming soon)
Talend Open Studio (Open source)
Other important tools:
Apache Airflow (Open source)
Apache Airflow for data pipelines and machine learning pipelines (a minimal DAG sketch follows this list)
Scheduling and running a Talend data pipeline job with Apache Airflow to fetch data from Google Cloud Storage to a destination machine (coming soon)
Scheduling and running a Talend data pipeline job with Apache Airflow to write or store data files from a local machine directory to Google Cloud Storage (coming soon)
YouTube Video – Industrial Machine Learning Pipelines with Python & Airflow by Alejandro Saucedo
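The Airflow posts above are still marked coming soon; as a placeholder, here is a minimal sketch of a daily data-fetch DAG, using the Airflow 2.x import path. The DAG id, schedule, and the fetch_data callable are assumptions for illustration, not the actual Talend job described in those posts.

```python
# Minimal sketch of an Airflow DAG that runs a daily data-fetch task.
# The fetch_data callable, DAG id, and schedule are placeholders.
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def fetch_data():
    # Placeholder: download source files to the machine that runs the model.
    print("fetching data ...")

with DAG(
    dag_id="daily_data_pipeline",
    start_date=datetime(2020, 7, 1),
    schedule_interval="@daily",  # run once per day
    catchup=False,
) as dag:
    fetch_task = PythonOperator(
        task_id="fetch_data",
        python_callable=fetch_data,
    )
```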
Apache Kafka (Open source)
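Kafka is useful when the data arrives as a stream rather than as files. A minimal sketch with the kafka-python package is below; the topic name, broker address, and the record payload are placeholders.

```python
# Minimal sketch of publishing and reading records with kafka-python.
# Topic name and broker address are placeholders.
from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("sensor-data", b'{"sensor_id": 1, "value": 42.0}')  # publish one record
producer.flush()

consumer = KafkaConsumer(
    "sensor-data",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",   # read from the beginning of the topic
    consumer_timeout_ms=5000,       # stop iterating if no new messages arrive
)
for message in consumer:
    print(message.value)
```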
MySQL Connector API in Python:
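When the source data sits in a client's MySQL database, the mysql-connector-python package can pull it directly. A minimal sketch is below; the connection details, table, and column names are placeholders for illustration.

```python
# Minimal sketch of reading rows with mysql-connector-python.
# Connection details and the table/column names are placeholders.
import mysql.connector  # pip install mysql-connector-python

conn = mysql.connector.connect(
    host="localhost",
    user="ml_user",
    password="secret",
    database="training_data",
)
cursor = conn.cursor()
cursor.execute("SELECT id, feature, label FROM samples LIMIT 10")
for row in cursor.fetchall():
    print(row)  # each row is a tuple of column values
cursor.close()
conn.close()
```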
Some useful blogs from other websites:
Real Time Data Engineering Pipeline for Machine Learning by Engineering@ZenOfAI. Link