Apache Spark 3.0 (available in Databricks Runtime 7.0)
Here are the biggest new features in Spark 3.0: 2x performance improvement on TPC-DS over Spark 2.4, enabled by adaptive query execution,...
Jul 7, 2020 · 4 min read
Load CSV file with Spark using Python-Jupyter notebook
In this article I am going to use a Jupyter notebook to read data from a CSV file with Spark using Python code. In this...
Jul 6, 2020 · 2 min read

Data Pipeline
Why do we need to create a data pipeline? To execute machine learning projects we need powerful computing platforms, which are available on...
Jul 6, 2020 · 2 min read

Data Pre-processing
Raw data can be structured or unstructured. Pre-processing prepares the input data required for the analysis. This is one of the...
Jul 6, 2020 · 2 min read

Data Analytics
Diagram: Major steps in Data Science. In this article I am going to explain some of the important steps we generally follow in most of the...
Jul 6, 2020 · 5 min read

Google BQ, Steps to Create a dataset and a table
In this article we will take a look at the basics of Google BQ and the simple steps to create a dataset in a project and a table in that dataset...
Jul 6, 2020 · 3 min read

Normalisation, L1, L2 Norms
In this article I am going to gather answers to some important questions about normalization. I have tried to find the answers...
Jul 6, 2020 · 3 min read

Setting up new GCP VM Instance for Machine Learning
In this article, I am going to show the steps to set up a Google Cloud Platform instance to experiment with machine learning algorithms on it. We...
Jul 6, 2020 · 3 min read


Big Data Frameworks
Big Data: Difference between structured, unstructured and semi-structured data (coming soon). What is Big Data? What is the Internet of Things?...
Jul 6, 2020 · 1 min read
Normalization Vs Standardization
Before going through the following article, please read the previous articles: Why we normalise or scale data for most of the Machine Learning or...
Jul 6, 2020 · 3 min read
Power BI
Today we will take a look at one tool that is famous and useful for data visualisation: Microsoft Power BI. I have just started to...
Jul 6, 2020 · 2 min read

Apache Kafka introduction
In this article we will take a look at the famous open-source Apache Kafka for data integration tasks. We will discuss the following...
Jun 13, 2020 · 7 min read

Airflow introduction
Most data science or machine learning startups face a common problem: getting data from different data sources, perform...
Jun 13, 2020 · 3 min read

Airflow Installation
In this article we will take a look at two different approaches to installing Apache Airflow on an Ubuntu 18.0 VM instance of the...
Jun 9, 2020 · 7 min read

Apache Airflow Architecture
In this article we will take a look at the architecture of Apache Airflow. We can generally divide the Apache Airflow architecture...
Jun 9, 2020 · 5 min read


Let's explore Data World (Big data)
Hi, I feel it will be a really good idea to write this blog. Those who follow closely might help or discuss the opportunities, ideas, etc. in...
Nov 10, 2014 · 1 min read