Many production applications, from finance to industrial systems, record data that is indexed by timestamps. This post presents sktime, a great open source library for Python that was recently launched and has been developed and optimized for machine learning on time series: classification, regression and forecasting.
Among the advantages of this new API are that it interfaces easily with other key Python frameworks such as sklearn, and that it combines in a single API multiple algorithms that can be applied in different contexts.
In fact, for anyone with previous knowledge of sklearn, some of the methods will look familiar. For instance, after creating a model (called a forecaster in sktime), fitting and predicting work much as they do in sklearn. One difference in sktime, though, is the concept of the forecast horizon (fh), an input parameter required by some methods that simply represents the number of time intervals to predict.
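To make the fit/predict pattern and the forecast horizon more concrete, here is a minimal sketch in plain Python. It does not use sktime: the class `LastValueForecaster` and the data are hypothetical, and the sketch assumes fh is given as a list of steps ahead (1 = next interval).

```python
class LastValueForecaster:
    """Toy forecaster (hypothetical, not an sktime class) with a fit/predict interface."""

    def fit(self, y):
        # "Training" here only stores the last observed value;
        # a real model would estimate its parameters at this point.
        if len(y) == 0:
            raise ValueError("y must contain at least one observation")
        self.last_value_ = y[-1]
        return self

    def predict(self, fh):
        # fh lists the steps ahead to forecast, e.g. [1, 2, 3]
        # for the next three time intervals.
        return {step: self.last_value_ for step in fh}


y_train = [10, 12, 13, 15, 14]  # illustrative values, not real data
fh = [1, 2, 3]                  # forecast horizon: the next three intervals

forecaster = LastValueForecaster()
forecaster.fit(y_train)
y_pred = forecaster.predict(fh)
print(y_pred)  # {1: 14, 2: 14, 3: 14}
```

The point is only the shape of the workflow: create the forecaster, fit it on the training series, then ask for predictions over a horizon.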
Several datasets are included in the library, which is a great way to get started. This post looks at one of them, called airline, which contains 144 monthly totals of international airline passengers from 1949 to 1960. The source code is on GitHub. The first part of the example shows how to load the dataset, split it into train and test data for a time-series analysis, and apply 4 forecast models (Naive, AutoARIMA, ExponentialSmoothing and Theta). These forecasts are then compared with each other and with the test dataset, both visually in a graph and using a performance metric available in sktime called smape_loss. In addition to the 4 individual forecast models, which can be further tuned, an Ensemble Forecast was carried out. In this case it was an ensemble of the previous 4 models, but it could combine whichever models the programmer desires, even the same model multiple times with different settings.
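The three ideas in this workflow (a split that respects time order, a symmetric percentage error, and an ensemble of forecasts) can be sketched in plain Python. This is not the sktime code from the example: the helper names are hypothetical, the data is made up, the sMAPE formula is one common definition that may differ in scaling from smape_loss, and the ensemble is assumed to be a simple point-by-point average.

```python
def temporal_split(y, test_size):
    """Split in time order: the last `test_size` points become the test set."""
    if not 0 < test_size < len(y):
        raise ValueError("test_size must be between 1 and len(y) - 1")
    return y[:-test_size], y[-test_size:]


def smape(y_true, y_pred):
    """Symmetric MAPE: mean of 2*|y - yhat| / (|y| + |yhat|), between 0 and 2."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    terms = []
    for actual, predicted in zip(y_true, y_pred):
        denom = abs(actual) + abs(predicted)
        # Treat the 0/0 case (both values zero) as zero error.
        terms.append(0.0 if denom == 0 else 2.0 * abs(actual - predicted) / denom)
    return sum(terms) / len(terms)


def ensemble_mean(forecasts):
    """Average several forecasts of equal length point by point."""
    return [sum(values) / len(values) for values in zip(*forecasts)]


y = [100, 110, 120, 130, 140, 150]  # illustrative series, not the airline data
y_train, y_test = temporal_split(y, test_size=2)

# Two simple forecasts for the test period.
naive = [y_train[-1]] * len(y_test)                      # repeat the last value
slope = (y_train[-1] - y_train[0]) / (len(y_train) - 1)  # average step in train
drift = [y_train[-1] + slope * h for h in range(1, len(y_test) + 1)]

combined = ensemble_mean([naive, drift])
print(combined)  # [135.0, 140.0]

for name, forecast in [("naive", naive), ("drift", drift), ("ensemble", combined)]:
    print(name, round(smape(y_test, forecast), 4))
```

Lower sMAPE is better. Unlike a random split, the temporal split never lets the model see observations that come after the ones it is asked to predict.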
The forecasting plots are shown below, with some observations:
In this example some algorithms performed much better than others. For instance, the Naive model scored over 3x worse than AutoARIMA, which was the best-scoring forecasting model. In addition, AutoARIMA required little tuning and configuration, with the seasonal period as its one main input parameter.
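To illustrate what a seasonal period means, here is a minimal sketch of a seasonal naive forecast in plain Python. It is not AutoARIMA and not sktime code: the function `seasonal_naive` and the parameter name `sp` are assumptions chosen for illustration, and the data is made up. For monthly data with a yearly pattern, such as the airline series, the seasonal period would be 12.

```python
def seasonal_naive(y_train, fh, sp):
    """Forecast each step by repeating the value from the last observed season.

    sp is the seasonal period (number of observations per cycle);
    fh is a list of steps ahead, where 1 means the next interval.
    """
    if sp < 1 or len(y_train) < sp:
        raise ValueError("need at least one full season of training data")
    last_season = y_train[-sp:]
    return [last_season[(step - 1) % sp] for step in fh]


# Two "years" of a series with 4 observations per cycle (illustrative values).
y_train = [10, 20, 30, 40, 12, 22, 32, 42]
fh = [1, 2, 3, 4, 5]

print(seasonal_naive(y_train, fh, sp=4))  # [12, 22, 32, 42, 12]
print(seasonal_naive(y_train, fh, sp=1))  # [42, 42, 42, 42, 42]
```

With sp=1 the forecast collapses to repeating the last value, which ignores the seasonal pattern entirely. This is one way to see why giving a model the right seasonal period matters on strongly seasonal data.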
In conclusion, the goal of this post was to introduce this new library, which has great capabilities for time-series machine learning in Python. Its syntax is similar to that of other widely used libraries such as sklearn and matplotlib.
Installing the library from the command prompt:
>pip install sktime
- I had an error during installation:
"No module named 'numpy.distutils._msvccompiler' on numpy.distutils". To solve this, I installed an older version of Python (v3.7.0); see the previous post for more details about how to switch between Python versions.