Data Pre-processing
- neovijayk
- Jul 6, 2020
- 2 min read
Raw data can be structured or unstructured. Pre-processing prepares the input data required for analysis. It is one of the most important tasks in data science and should be done carefully.

Dig: Major steps in Data Science
Before Pre-processing:
Handling zip files: explanation of steps
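As a rough illustration of handling zip files, the sketch below uses Python's standard `zipfile` module to check an archive for corrupt members and then extract it. The helper name `extract_zip`, the file name `raw_data.zip`, and the sample CSV content are assumptions made for this example, not taken from the linked article.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_zip(zip_path, target_dir):
    """Extract every member of zip_path into target_dir and return the member names."""
    with zipfile.ZipFile(zip_path) as archive:
        # testzip() returns the name of the first corrupt member, or None if all are fine.
        bad_member = archive.testzip()
        if bad_member is not None:
            raise ValueError(f"Corrupt member in archive: {bad_member}")
        archive.extractall(target_dir)
        return archive.namelist()


# Build a tiny archive in a temporary directory so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    zip_path = Path(tmp) / "raw_data.zip"  # hypothetical file name
    with zipfile.ZipFile(zip_path, "w") as archive:
        archive.writestr("data/sample.csv", "id,value\n1,10\n2,20\n")

    extracted = extract_zip(zip_path, Path(tmp) / "unzipped")
    sample_text = (Path(tmp) / "unzipped" / "data" / "sample.csv").read_text()
```

Checking the archive before extracting it is one way to catch a damaged download early, before any later step depends on the files.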
Create a Checklist
Drawing on previous project experience, you can create one or more checklists of the checks you found necessary to perform on raw data before using it in a project.
Items in the list can cover data quantity, quality, gaps, etc.
For a new or existing project, you can then use some or all of the checklist items as a litmus test that new raw data must pass before it is used in the project.
This can help speed up the pre-processing and processing steps.
It can also expose hidden problems in the raw data at a very early stage, saving time and resources later in development.
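One possible way to turn such a checklist into code is sketched below. The function name `run_checklist`, the three checks, and the sample rows are hypothetical; a real checklist would contain whatever items your earlier projects showed to be necessary.

```python
def run_checklist(rows, required_columns, min_rows):
    """Return a dict mapping each checklist item to True (passed) or False (failed)."""
    results = {}
    # Quantity: is there enough data to work with?
    results["enough_rows"] = len(rows) >= min_rows
    # Quality: does every row contain the columns the project needs?
    results["has_required_columns"] = all(
        all(col in row for col in required_columns) for row in rows
    )
    # Gaps: are any required values missing (None or empty string)?
    results["no_missing_values"] = all(
        row.get(col) not in (None, "") for row in rows for col in required_columns
    )
    return results


# Hypothetical raw data: the second row has a gap in "value".
sample_rows = [
    {"id": 1, "value": 10.0},
    {"id": 2, "value": None},
    {"id": 3, "value": 7.5},
]

report = run_checklist(sample_rows, required_columns=["id", "value"], min_rows=3)
failed_items = [name for name, passed in report.items() if not passed]
```

Here `failed_items` lists the checks the raw data did not pass, so the data can be fixed or rejected before it enters the project.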
Data and Statistics: (coming soon)
mean, median, mode, 1st-2nd-3rd quartile, percentiles, standard deviation, range of the data
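Until the detailed post is available, here is a minimal sketch of these statistics using Python's standard `statistics` module. The sample values are made up, and the `inclusive` quantile method is just one of the conventions the module supports.

```python
import statistics

data = [2, 4, 4, 5, 7, 9, 10, 12]  # made-up sample values

mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

# n=4 gives the three quartile cut points; the 2nd quartile is the median.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

# n=10 gives nine decile cut points; index 8 is the 90th percentile.
p90 = statistics.quantiles(data, n=10, method="inclusive")[8]

# stdev() is the sample standard deviation; pstdev() is the population version.
std_dev = statistics.stdev(data)

data_range = max(data) - min(data)
```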
Data distributions and Probability distributions: (coming soon)
frequency distribution and probability distribution
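As a small illustration of the difference, a frequency distribution counts how often each value occurs, while an empirical probability distribution divides those counts by the total so they sum to 1. The observations below are invented for the example.

```python
from collections import Counter

observations = ["red", "blue", "red", "green", "blue", "red"]  # invented data

# Frequency distribution: value -> number of occurrences.
frequency = Counter(observations)

# Empirical probability distribution: value -> share of all observations.
total = sum(frequency.values())
probability = {value: count / total for value, count in frequency.items()}
```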
Some useful Pre-processing techniques:

Dig: Making Raw data ready for the analysis
Data Cleaning (Coming soon)
Useful String operations implementation examples. (Coming soon)
Image Processing:
Some of the useful techniques in image processing:
Scale, blur, and convert an image to grayscale using OpenCV and PIL (coming soon)
Image rotation/tilting by an angle x using SciPy, imutils, and OpenCV functions.
Automatic white border or padding detection and cropping in an image
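To give an idea of how white border detection and cropping can work, here is a library-free sketch that treats a grayscale image as a list of rows of pixel values (0 = black, 255 = white). This is not the OpenCV-based approach of the linked article; the function names and the `white_threshold` value of 250 are assumptions chosen for illustration.

```python
def find_content_box(image, white_threshold=250):
    """Return (top, bottom, left, right) around non-white pixels.

    bottom and right are exclusive. Returns None if the image is all white.
    """
    # Rows that contain at least one pixel darker than the threshold.
    rows = [r for r, row in enumerate(image) if any(p < white_threshold for p in row)]
    if not rows:
        return None
    # Columns that contain at least one pixel darker than the threshold.
    width = len(image[0])
    cols = [c for c in range(width) if any(row[c] < white_threshold for row in image)]
    return rows[0], rows[-1] + 1, cols[0], cols[-1] + 1


def crop_white_border(image, white_threshold=250):
    """Return a copy of the image with the surrounding white border removed."""
    box = find_content_box(image, white_threshold)
    if box is None:
        return []
    top, bottom, left, right = box
    return [row[left:right] for row in image[top:bottom]]


# A 4x5 image with a white border around a small dark region.
image = [
    [255, 255, 255, 255, 255],
    [255,  10,  20, 255, 255],
    [255,  30, 255, 255, 255],
    [255, 255, 255, 255, 255],
]

content_box = find_content_box(image)
cropped = crop_white_border(image)
```

White pixels inside the content box are kept; only the rows and columns that are entirely white on the outside are removed.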
Divide features into training and testing:
random.seed(): What does it do? (coming soon)
Handle Categorical Features. (coming soon)
Feature selection (coming soon)
Divide features into training and testing. Get index information of train and test features.
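The sketch below shows one way to split features into training and testing sets while keeping the index information of each part. It uses only the standard `random` module; the function name `train_test_split_indices`, the 25% test fraction, and the seed value 42 are assumptions for this example. Seeding the generator is what makes the shuffle, and therefore the split, repeatable.

```python
import random


def train_test_split_indices(n_samples, test_fraction=0.25, seed=42):
    """Return (train_indices, test_indices) for a reproducible random split."""
    # A local generator with a fixed seed always produces the same shuffle.
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_test = int(round(n_samples * test_fraction))
    test_idx = sorted(indices[:n_test])
    train_idx = sorted(indices[n_test:])
    return train_idx, test_idx


# Hypothetical feature rows.
features = [[i, i * 2] for i in range(8)]

train_idx, test_idx = train_test_split_indices(len(features))
train_features = [features[i] for i in train_idx]
test_features = [features[i] for i in test_idx]

# Calling again with the same seed gives the same train and test sets.
train_idx_again, test_idx_again = train_test_split_indices(len(features))
```

Keeping `train_idx` and `test_idx` lets you trace every training or testing row back to its position in the original data.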
Normalization & Standardization:
Generating same set of data (train, test)
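For the normalization and standardization topic, a minimal sketch follows. Min-max normalization rescales values to the range [0, 1] of the training data, and standardization rescales them to zero mean and unit standard deviation. The helper names and sample values are hypothetical. The statistics are computed on the training data only and then reused for the test data, a common practice so that both sets are transformed the same way.

```python
import statistics


def min_max_normalize(values, lo, hi):
    """Rescale values so that lo maps to 0.0 and hi maps to 1.0."""
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


def standardize(values, mean, std):
    """Rescale values to z-scores using the given mean and standard deviation."""
    return [(v - mean) / std if std else 0.0 for v in values]


train = [10.0, 20.0, 30.0, 40.0]  # made-up training values
test = [25.0, 50.0]               # made-up testing values

# Fit the statistics on the training data only.
lo, hi = min(train), max(train)
mean, std = statistics.mean(train), statistics.pstdev(train)

train_normalized = min_max_normalize(train, lo, hi)
test_normalized = min_max_normalize(test, lo, hi)

train_standardized = standardize(train, mean, std)
test_standardized = standardize(test, mean, std)
```

Test values outside the training range, such as 50.0 here, fall outside [0, 1] after normalization; whether to clip them depends on the project.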
Some useful methods, packages:
Suppress a Python warning using the following lines of code:
import warnings
warnings.filterwarnings("ignore", message="TYPE THE WARNING MESSAGE THAT YOU ARE GETTING")
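The `message` argument is a regular expression matched against the start of the warning text, so only warnings that begin with it are ignored. The sketch below illustrates this inside `warnings.catch_warnings`, which restores the previous filters when the block ends. The function `noisy_function` and both warning messages are invented for the example.

```python
import warnings


def noisy_function():
    """Hypothetical function that emits a warning and returns a value."""
    warnings.warn("this feature is deprecated", DeprecationWarning)
    return 42


# record=True collects the warnings that are not suppressed into a list.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Ignore only warnings whose message starts with "this feature".
    warnings.filterwarnings("ignore", message="this feature")

    result = noisy_function()                      # suppressed
    warnings.warn("another message", UserWarning)  # not suppressed

recorded_messages = [str(w.message) for w in caught]
```

Limiting the filter to one message, and to one block of code, keeps other and possibly important warnings visible.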