The website http://timeseriesclassification.com/ hosts a large collection of public time-series datasets covering applications such as sensor readings, motion, images and audio, among others. This post shows how to use these datasets in Python.
- Search for and download the desired dataset, for example http://timeseriesclassification.com/description.php?Dataset=ItalyPowerDemand
- Unzip the downloaded file and place it in the same folder as the Python program.
- The dataset is split into train and test sets, and each one comes as both a .txt file and a .arff file.
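As a rough sketch of how one of the .arff files can be read, the snippet below uses `scipy.io.arff.loadarff` and pandas. The helper name `load_arff`, the class column name `target` and the file names in the final comments are assumptions made for illustration, so check them against the files you actually downloaded. A tiny in-memory ARFF file is used so that the sketch runs without the download.

```python
import io

import pandas as pd
from scipy.io import arff


def load_arff(source, target_col="target"):
    """Read an ARFF file (path or file-like object) and return (X, y).

    `target_col` is an assumed name for the class attribute.
    """
    records, _meta = arff.loadarff(source)
    df = pd.DataFrame(records)
    # loadarff returns nominal attributes as byte strings; decode the labels.
    y = df[target_col].apply(
        lambda v: v.decode("utf-8") if isinstance(v, bytes) else v
    )
    # Every remaining column is one time point of the series.
    X = df.drop(columns=[target_col]).astype(float)
    return X, y


# Toy ARFF content with the same kind of layout: numeric attributes
# followed by a nominal class attribute.
toy_arff = io.StringIO(
    "@relation toy\n"
    "@attribute att1 numeric\n"
    "@attribute att2 numeric\n"
    "@attribute target {1,2}\n"
    "@data\n"
    "0.1,0.2,1\n"
    "0.3,0.4,2\n"
)
X_toy, y_toy = load_arff(toy_arff)
print(X_toy.shape, y_toy.tolist())

# With the real files in the working directory (file names assumed):
# X_train, y_train = load_arff("ItalyPowerDemand_TRAIN.arff")
# X_test, y_test = load_arff("ItalyPowerDemand_TEST.arff")
```

Here each row of `X` is one time series and each column is one time point, which is the flat tabular layout that pandas handles directly.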
Using the Python code shown on GitHub, it is possible to load these .arff files and work with the datasets in Python using libraries such as pandas, scipy and sktime. Besides data formatting and manipulation, the example also runs a KNN classifier.
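The GitHub code is not reproduced here. To give an idea of the KNN step, the following is a minimal sketch that uses scikit-learn's `KNeighborsClassifier` with Euclidean distance as a stand-in, which may differ from the classifier used in the actual example. The data is synthetic (two classes of noisy 24-point series at different levels), and the helper `make_toy_split` is hypothetical. In practice `X_train`, `y_train`, `X_test` and `y_test` would come from the downloaded files.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)


def make_toy_split(n_per_class, length=24):
    """Return (X, y): noisy sine series from two classes with different levels."""
    t = np.linspace(0, 2 * np.pi, length)
    class_1 = np.sin(t) + rng.normal(0, 0.1, size=(n_per_class, length))
    class_2 = np.sin(t) + 2.0 + rng.normal(0, 0.1, size=(n_per_class, length))
    X = np.vstack([class_1, class_2])
    y = np.array(["1"] * n_per_class + ["2"] * n_per_class)
    return X, y


# Synthetic stand-ins for the train and test splits of a real dataset.
X_train, y_train = make_toy_split(20)
X_test, y_test = make_toy_split(10)

# 1-nearest-neighbour: each test series gets the label of the closest
# training series (Euclidean distance over the time points).
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.3f}")
```

Treating every time point as an ordinary feature is the simplest option. Time-series-specific classifiers, such as the ones in sktime, can use other distances and may expect a different input format.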
Notes:
- ARFF is a data format created by the University of Waikato (New Zealand) for use with their machine learning software, WEKA.
- Some of the algorithms in sktime require the data (e.g. X_train, X_test) to be a nested DataFrame. The GitHub code details how to format the data this way.
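To illustrate what "nested" means here, the sketch below converts a flat table (one row per series, one column per time point) into a DataFrame with a single column whose cells each hold a whole `pd.Series`. The helper name `to_nested` and the column name `dim_0` are assumptions for illustration, and this is not necessarily how the GitHub code does it. sktime also provides its own conversion utilities, whose names and locations depend on the installed version.

```python
import numpy as np
import pandas as pd


def to_nested(X_flat, column_name="dim_0"):
    """Wrap each row of a 2D table into a pd.Series stored in a single cell.

    `column_name` is an assumed name for the single nested column.
    """
    X_flat = pd.DataFrame(X_flat)
    # One pd.Series per instance, holding all of its time points.
    cells = [pd.Series(row) for row in X_flat.to_numpy()]
    return pd.DataFrame({column_name: pd.Series(cells, index=X_flat.index)})


# Three toy series with four time points each.
X_flat = pd.DataFrame(np.arange(12, dtype=float).reshape(3, 4))
X_nested = to_nested(X_flat)

print(X_nested.shape)                # one row per series, a single column
print(type(X_nested.iloc[0, 0]))     # each cell is itself a pandas Series
print(X_nested.iloc[1, 0].tolist())  # the time points of the second series
```

The nested frame keeps one row per instance, so label vectors such as `y_train` stay aligned with it without any change.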