January 1, 2021

#103 - Time-series classification of power demand comparison between Sktime and Sklearn

Using a public dataset of power consumption in Italy in 1997, the task is to classify whether each series corresponds to winter (October to March) or summer (April to September). This is done with different techniques from both sklearn and sktime.
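
To make the target definition concrete, here is a minimal sketch of the winter/summer rule stated above. The dataset already ships with its own class labels, so this helper is purely illustrative: the function name and the "winter"/"summer" strings are assumptions, not part of the dataset or of either library.

```python
from datetime import date

# October to March counts as winter, April to September as summer,
# following the definition given in the post.
WINTER_MONTHS = {10, 11, 12, 1, 2, 3}


def season_label(day: date) -> str:
    """Hypothetical helper: map a calendar day to its season class."""
    return "winter" if day.month in WINTER_MONTHS else "summer"


for day in (date(1997, 1, 15), date(1997, 7, 1), date(1997, 10, 1)):
    print(day.isoformat(), season_label(day))
```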

The code is on GitHub; some comments follow:

  • In order to use classifiers from sklearn such as RandomForestClassifier, the data has to be in a tabular format.
  • Sktime provides different ways to tabularize the data:
    One option is to import tabularize and apply it to each split:
    X_train_tab = tabularize(X_train)
    X_test_tab = tabularize(X_test)
    Another option is to use sklearn's make_pipeline to build a classifier pipeline in which the first step applies sktime's Tabularizer() and the second step is the sklearn RandomForestClassifier.
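
The two options above can be sketched without sktime to show what tabularizing does to a nested dataframe (one pd.Series per cell). This is a hand-rolled illustration on toy data, not the sktime implementation: tabularize_nested is a hypothetical stand-in for tabularize / Tabularizer(), and the data and class labels are made up.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer


def tabularize_nested(X_nested: pd.DataFrame) -> np.ndarray:
    """Hypothetical stand-in: flatten a nested DataFrame into a 2-D array.

    Each row becomes the concatenation of the series stored in its cells.
    """
    rows = [
        np.concatenate([np.asarray(cell, dtype=float) for cell in row])
        for row in X_nested.itertuples(index=False)
    ]
    return np.vstack(rows)


# Toy nested data (assumption): 6 instances, one column, 24 readings each.
rng = np.random.default_rng(0)
levels = [0.0, 0.0, 0.0, 5.0, 5.0, 5.0]
cells = np.empty(len(levels), dtype=object)
for i, level in enumerate(levels):
    cells[i] = pd.Series(rng.normal(loc=level, size=24))
X_nested = pd.DataFrame({"dim_0": cells})
y = np.array(["1", "1", "1", "2", "2", "2"])

# Option 1: tabularize first, then fit any sklearn classifier.
X_tab = tabularize_nested(X_nested)
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_tab, y)

# Option 2: make tabularization the first step of a pipeline.
pipe = make_pipeline(
    FunctionTransformer(tabularize_nested),
    RandomForestClassifier(n_estimators=50, random_state=0),
)
pipe.fit(X_nested, y)
print(X_nested.shape, "->", X_tab.shape)
```

The pipeline form has the advantage that the nested dataframe can be passed straight to fit and predict, so the train and test splits cannot end up tabularized differently.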
Exploring the dataset: shape, head(), info(), np.unique():

Tabularizing the dataset for X_train and X_test to use sklearn classifiers:

Applying time series feature extraction before using the sklearn classifier:
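
As a rough sketch of this idea, the pipeline below extracts simple summary features (mean, standard deviation and slope over fixed intervals) and passes them to a RandomForestClassifier. The extractor is a hypothetical stand-in, not the sktime feature extractor used in the post, and the data is synthetic, so the score it prints says nothing about the Italy power demand results.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer


def interval_features(X: np.ndarray, n_intervals: int = 4) -> np.ndarray:
    """Summarise each series by mean, std and slope per equal-width interval."""
    features = []
    for chunk in np.array_split(X, n_intervals, axis=1):
        t = np.arange(chunk.shape[1])
        # polyfit fits every series (column of chunk.T) at once; row 0 is the slope.
        slope = np.polyfit(t, chunk.T, deg=1)[0]
        features += [chunk.mean(axis=1), chunk.std(axis=1), slope]
    return np.column_stack(features)


# Synthetic 24-point "daily profiles" for two classes (assumption, not real data).
rng = np.random.default_rng(0)
hours = np.arange(24)
n_per_class = 100
profile_a = np.sin(2 * np.pi * hours / 24)
profile_b = 0.5 * np.sin(2 * np.pi * hours / 24 + 1.0)
X = np.vstack([
    profile_a + rng.normal(scale=0.3, size=(n_per_class, 24)),
    profile_b + rng.normal(scale=0.3, size=(n_per_class, 24)),
])
y = np.repeat(["1", "2"], n_per_class)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Feature extraction is the first pipeline step, the classifier the second.
pipe = make_pipeline(
    FunctionTransformer(interval_features),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
pipe.fit(X_train, y_train)
accuracy = pipe.score(X_test, y_test)
print(f"Held-out accuracy on synthetic data: {accuracy:.2f}")
```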
Using the sktime TimeSeriesForestClassifier directly (without preprocessing or reformatting the data):


Conclusion:

In this post the goal was to use different methods to carry out a classification task. When working with time series data, there are several ways to do this, and some of them are presented here. One approach is to tabularize the nested dataframe and use common sklearn classifiers. Another is to create pipelines, which chain steps such as feature extraction and tabularization ahead of the classifier.

Finally, the TimeSeriesForestClassifier from sktime is shown, which can be used directly on the time series. The results across all these examples are very close to each other, regardless of the method used, with an accuracy of around 96%.


