Data Imputing in time series
- neovijayk
- Jun 13, 2020
- 2 min read
In this article I explain some important and useful techniques for filling missing values in a table of features. These techniques are also useful for time series data. Often we want smarter imputation than simply replacing NaN with the mean or with a fixed value (such as 0 or -1), so that the imputed values make sense and stay consistent with the rest of the data.
In this article we will take a look at the following topics:
Example of NaN values in a dataframe
Impute functions in pandas: interpolate and fillna
Example 1: NaN values
Let's take a look at the following table. Some of its columns, for example Humidity and Pressure, contain NaN values.

(Image taken from the internet for explanation purposes only.)
For prediction we usually want to use all of the available data. When filling NaN with the mean or another fixed value would be inconsistent with the rest of the series, we can apply other imputing techniques such as interpolation or fillna.
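To make the difference concrete, here is a minimal sketch that compares mean filling with linear interpolation. The `weather` table and all of its values are made up for illustration; only the column names Humidity and Pressure come from the example above.

```python
import numpy as np
import pandas as pd

# Hypothetical weather readings; the values are invented for illustration.
weather = pd.DataFrame(
    {
        "Humidity": [30.0, np.nan, 50.0, np.nan, 70.0],
        "Pressure": [1010.0, 1012.0, np.nan, 1016.0, 1018.0],
    }
)

# Count the missing values in each column.
nan_counts = weather.isna().sum()

# Option 1: replace every NaN with the column mean (ignores the trend).
mean_filled = weather.fillna(weather.mean())

# Option 2: linear interpolation between the neighbouring rows.
interpolated = weather.interpolate(method="linear")

print(nan_counts)
print(mean_filled)
print(interpolated)
```

In this toy table Humidity rises steadily, so the mean puts the same value in both gaps while interpolation follows the trend.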
Impute functions in Pandas
Suppose we have data in a pandas data frame that I am going to use for time series forecasting later. One important field that the forecasting requires has missing values, as the following graph shows (x axis = number of days, y axis = Quantity):
# Plot QUANTITY against Dates; breaks in the line are NaN values.
pdDataFrame.set_index('Dates')['QUANTITY'].plot(figsize=(16, 6))

We can see that there is some NaN data in the time series: 19.4% of the total data is NaN. Now we want to impute these null/NaN values.
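A percentage like this can be computed directly from the column. The sketch below assumes a small stand-in data frame with the same column names (Dates, QUANTITY); its values are invented, so the printed percentage differs from the figure quoted above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the article's data frame; values are made up.
pdDataFrame = pd.DataFrame(
    {
        "Dates": pd.date_range("2020-01-01", periods=10, freq="D"),
        "QUANTITY": [5.0, np.nan, 7.0, 8.0, np.nan, np.nan, 11.0, 12.0, 13.0, 14.0],
    }
)

# isna() marks missing entries as True; the mean of booleans is the NaN fraction.
nan_percent = pdDataFrame["QUANTITY"].isna().mean() * 100
print(f"% of NaN = {nan_percent:.3f}% of total data")
```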
I will show you the output of the interpolate and fillna methods when they fill the NaN values in the data.
interpolate():
First we will use interpolate:
# Fill each gap with points on a straight line between its neighbours, then plot.
pdDataFrame.set_index('Dates')['QUANTITY'].interpolate(method='linear').plot(figsize=(16, 6))

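NOTE: The 'time' method of interpolate is not used here. The call above uses method='linear', which treats the rows as equally spaced; method='time' is also available in pandas but requires a DatetimeIndex.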
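For contrast with the linear method used above, the sketch below shows how method='linear' and method='time' differ when the dates are unevenly spaced. The dates and values are invented; this is an illustration, not the data from the article.

```python
import numpy as np
import pandas as pd

# Hypothetical series with unevenly spaced dates; values are made up.
dates = pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-05"])
quantity = pd.Series([10.0, np.nan, 50.0], index=dates)

# 'linear' ignores the index and treats the rows as equally spaced.
linear_filled = quantity.interpolate(method="linear")

# 'time' weights by the actual gap between timestamps (needs a DatetimeIndex).
time_filled = quantity.interpolate(method="time")

print(linear_filled)
print(time_filled)
```

With linear the missing point lands halfway between its neighbours, while time places it one quarter of the way because 2 January is one day into a four-day gap.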
fillna() with backfill method
# Backward fill: each NaN takes the next valid value that follows it.
pdDataFrame.set_index('Dates')['QUANTITY'].fillna(method='backfill').plot(figsize=(16, 6))
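To see what backward filling does to individual values, here is a small sketch on an invented series. It uses `Series.bfill()`, which performs the same backward fill; recent pandas versions warn that the `method` argument of fillna is deprecated, so `bfill()` is the safer spelling there.

```python
import numpy as np
import pandas as pd

# Hypothetical values; the last entry has no later value to copy from.
quantity = pd.Series([1.0, np.nan, np.nan, 4.0, np.nan])

# Each NaN is replaced by the next valid observation after it.
back_filled = quantity.bfill()

print(back_filled)
```

Note that a NaN at the very end of the series stays NaN, because there is nothing after it to fill from.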

fillna() with backfill method & limit = 7
limit: this is the maximum number of consecutive NaN values to forward/backward fill. In other words, if there is a gap with more than this number of consecutive NaNs, it will only be partially filled.
# Backward fill, but fill at most 7 consecutive NaN values per gap.
pdDataFrame.set_index('Dates')['QUANTITY'].fillna(method='backfill', limit=7).plot(figsize=(16, 6))
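The partial filling described above is easier to see on a tiny series. This sketch uses invented values and a smaller limit (2 instead of 7) so that the effect shows up in a gap of three NaNs; `Series.bfill(limit=...)` is used as the equivalent of the backfill call.

```python
import numpy as np
import pandas as pd

# Hypothetical series with a gap of three consecutive NaNs.
quantity = pd.Series([1.0, np.nan, np.nan, np.nan, 5.0])

# Only the 2 NaNs closest to the next valid value are filled.
limited = quantity.bfill(limit=2)

print(limited)
```

The NaN furthest from the next valid value is left unfilled, which is a useful guard when long gaps should not be filled with a single repeated value.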

I find the fillna function more useful, but you can use either method to fill the NaN values in both columns.
For more details about these functions, refer to the pandas documentation for interpolate() and fillna().
There is one more library, impyute, that you can check out. For more details regarding this library, refer to this link: https://pypi.org/project/impyute/
That’s it for this article. If you have any questions please feel free to ask. Also if you like this article please like and subscribe to my blog 🙂