
Data Pre-processing

  • neovijayk
  • Jul 6, 2020

Raw data can be structured or unstructured. Pre-processing prepares the input data required for the analysis. It is one of the most important tasks in data science and should be done carefully.


Dig: Major steps in Data Science


Before Pre-processing:

  1. Handling zip files: explanation of steps

     • Creating a zip file using shutil

     • Unzipping the input .zip file using pyunpack in Python

  2. Create a checklist

     • From previous project experience, you can build one or more checklists of the checks you found necessary to run on raw data before using it in a project

     • Items on the list can cover data quantity, quality, gaps, etc.

     • For a new or existing project, some or all of the checklist items can then serve as a litmus test that new raw data must pass before it is used

     • This can help speed up the pre-processing and processing steps

     • It can surface hidden problems in the raw data at a very early stage, saving time and resources later in development

  3. Data and Statistics: (coming soon)

     • Mean, median, mode, first, second, and third quartiles, percentiles, standard deviation, and range of the data

  4. Data distributions and Probability distributions: (coming soon)

     • Frequency distribution and probability distribution
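
The zip-handling, checklist, and statistics items above can be sketched together as follows. This is a minimal illustration under stated assumptions, not the implementation from the linked posts. It uses only the Python standard library: shutil.make_archive creates the zip, and shutil.unpack_archive is swapped in for pyunpack so the example has no third-party dependency. The file name data.csv, the sample values, the two checklist items, and the helper names zip_round_trip, run_checklist, and describe are all hypothetical.

```python
import csv
import shutil
import statistics
import tempfile
from pathlib import Path


def zip_round_trip(src_dir, work_dir):
    """Zip src_dir with shutil, then unpack the archive into a new folder."""
    # make_archive appends ".zip" itself and returns the archive's path.
    archive = shutil.make_archive(str(work_dir / "raw_data"), "zip", root_dir=src_dir)
    out_dir = work_dir / "unzipped"
    # Stand-in for pyunpack: the standard library can unpack zip files too.
    shutil.unpack_archive(archive, out_dir)
    return out_dir


def run_checklist(rows, min_rows=3):
    """Hypothetical checklist: enough data (quantity) and no empty cells (gaps)."""
    return {
        "enough_rows": len(rows) >= min_rows,
        "no_gaps": all(row["value"] != "" for row in rows),
    }


def describe(values):
    """Summary statistics named in the list above."""
    q1, q2, q3 = statistics.quantiles(values, n=4)  # Python 3.8+
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "quartiles": (q1, q2, q3),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "range": max(values) - min(values),
    }


with tempfile.TemporaryDirectory() as tmp_name:
    work = Path(tmp_name)
    src = work / "src"
    src.mkdir()
    # Write a tiny made-up data set so the example is self-contained.
    with open(src / "data.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["value"])
        writer.writerows([[v] for v in [2, 4, 4, 5, 7, 9, 11]])

    extracted = zip_round_trip(src, work)
    with open(extracted / "data.csv", newline="") as f:
        rows = list(csv.DictReader(f))

checklist = run_checklist(rows)
stats = describe([float(row["value"]) for row in rows])
print(checklist)
print(stats)
```

In a real project the checklist would likely grow with each data set you handle. The point of the sketch is only that every item is a small, named check that new raw data has to pass before any further processing.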

Some useful Pre-processing techniques:

Dig: Making Raw data ready for the analysis


  1. Data Cleaning (Coming soon)

  2. Data imputing

  3. Useful String operations implementation examples. (Coming soon)

Image Processing:

Some of the useful techniques in image processing:

Divide features into training and testing:

  1. random.seed(): What does it do? (coming soon)

  2. Handle Categorical Features. (coming soon)

  3. Feature selection (coming soon)

  4. Divide features into training and testing. Get index information of train and test features.
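
As a rough illustration of the last item, and of what seeding the random generator buys you, the sketch below splits row indices into training and testing sets. It is a minimal standard-library example, not the method from the linked posts: the helper split_indices, the 25% test fraction, the seed value, and the sample features are assumptions made for illustration. It uses a local random.Random(seed) generator, which behaves like calling random.seed(seed) on the module-level generator but does not affect other code.

```python
import random


def split_indices(n_samples, test_fraction=0.25, seed=42):
    """Return (train_idx, test_idx) for a reproducible random split."""
    rng = random.Random(seed)  # same seed -> same shuffle every run
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_test = round(n_samples * test_fraction)
    # Sort only so the index lists are easier to read.
    return sorted(indices[n_test:]), sorted(indices[:n_test])


# Made-up feature rows; each inner list is one sample.
features = [[0.1, 1.0], [0.4, 0.0], [0.35, 1.0], [0.8, 0.0],
            [0.6, 1.0], [0.9, 0.0], [0.2, 1.0], [0.7, 0.0]]

train_idx, test_idx = split_indices(len(features))
X_train = [features[i] for i in train_idx]
X_test = [features[i] for i in test_idx]

print("train rows:", train_idx)
print("test rows:", test_idx)
```

Keeping the index lists, rather than only the split data, makes it possible to trace any training or testing row back to its position in the original data. Calling split_indices again with the same seed returns the same two lists.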

Normalization & Standardization:
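
A minimal sketch of the two ideas, assuming min-max scaling for normalization and z-score scaling for standardization. The helper names and the sample values are hypothetical, and the code uses only the standard library (statistics.fmean requires Python 3.8 or later).

```python
import statistics


def min_max_normalize(values):
    """Rescale values linearly to the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:  # constant column: every value equals the mean
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]  # made-up sample
normalized = min_max_normalize(heights_cm)
standardized = standardize(heights_cm)
print(normalized)
print(standardized)
```

When the data is split into training and testing sets, a common practice is to compute the minimum, maximum, mean, and standard deviation on the training set only and reuse those values to scale the testing set.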

Generating same set of data (train, test)

Some useful methods, packages:

Suppress a Python warning using the following lines of code:



import warnings

# "message" is a regular expression matched (case-insensitively) against the start of the warning text.
warnings.filterwarnings("ignore", message="TYPE THE WARNING MESSAGE THAT YOU ARE GETTING")
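
To see the filter in action, here is a small self-contained sketch. The function noisy and both warning messages are made up for the demonstration. The filter is set inside warnings.catch_warnings so that it applies only within the with block and the previous filters are restored afterwards.

```python
import warnings


def noisy():
    """Hypothetical function that emits a warning we want to silence."""
    warnings.warn("this feature is deprecated and will be removed", UserWarning)
    return "done"


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # start from a known state: record everything
    # Ignore only warnings whose text starts with this pattern.
    warnings.filterwarnings("ignore", message="this feature is deprecated")
    result = noisy()                          # matches the filter: suppressed
    warnings.warn("something else happened")  # no match: still recorded

recorded = [str(w.message) for w in caught]
print(result, recorded)
```

Matching on the message text keeps the filter narrow: unrelated warnings still come through, which is usually safer than ignoring every warning.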
