Guide To Sktime – Python Library For Time Series Data (Compatible With Scikit-learn)

Various industries use time series data to analyse trends and seasonality in their products over time. Sktime is a unified Python library for machine learning with time series. It provides scikit-learn compatible tools to build, visualize, tune and validate models for several time series learning tasks, such as forecasting, time series regression and time series classification. Sktime was presented in the research paper ‘sktime: A Unified Interface for Machine Learning with Time Series’, submitted to NeurIPS by a group of researchers: Markus Loning and Franz J Kiraly from the Alan Turing Institute, Anthony Bagnall and Jason Lines from the University of East Anglia, and Sajaysurya Ganesh and Viktor Kazakov from University College London.

Sktime combines popular time series algorithms with the scikit-learn interface. Through reduction, a time series task is recast as a tabular learning problem so that scikit-learn estimators can be applied to it. Other features include time series regression, classification (univariate and multivariate), time series clustering, time series annotation, forecasting, estimation, transformations, datasets, feature tools and utility functions (preprocessing and plotting).
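To make the idea of reduction concrete, the sketch below turns a univariate series into a tabular regression dataset with a sliding window. Any tabular regressor could then be fitted on the resulting rows. This is a plain-Python illustration, not sktime's implementation; the function name `sliding_window_table` and the window length are assumptions chosen for the example.

```python
def sliding_window_table(series, window_length):
    """Turn a series into (X, y) rows: each row of X holds `window_length`
    consecutive values and y holds the value that follows them."""
    X, y = [], []
    for start in range(len(series) - window_length):
        X.append(series[start:start + window_length])
        y.append(series[start + window_length])
    return X, y


series = [10, 12, 14, 16, 18, 20]
X, y = sliding_window_table(series, window_length=3)
for features, target in zip(X, y):
    print(features, "->", target)
```

Each printed row pairs three past values with the next value, which is the shape of data a scikit-learn regressor expects.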

Time series transformations are split into Panel transformers and Series transformers. The Panel transformers cover Shapelet, Segment, Reduce, Rocket, PCA, Matrix profile, Compose, Summarize and tsfresh. The Series transformers cover Detrend, Adapt, Box-Cox, AutoCorrelation and Cosine. Popular forecasting tools available through sktime include ARIMA, AutoARIMA and fbprophet. Sktime expects input data in pandas containers such as a DataFrame. For more information, check the documentation.

The main aims of the library are to provide:

  • A standard interface for different types of time series learning tasks, built on scikit-learn conventions
  • Reduction algorithms that turn one learning task into another
  • Tools for model composition, model evaluation and comparative benchmarking
  • An interface for handling varied time series data

Installation:

pip install sktime

Forecasting

 from sktime.forecasting.all import *
 y = load_airline()
 y_train, y_test = temporal_train_test_split(y)
 fh = ForecastingHorizon(y_test.index, is_relative=False)
 forecaster = ThetaForecaster(sp=12)  # monthly seasonal periodicity
 forecaster.fit(y_train)
 y_pred = forecaster.predict(fh)
 smape_loss(y_test, y_pred) 

0.08661468139978168
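The score above is a symmetric mean absolute percentage error (sMAPE). As a rough sketch of what the split and the metric do, the plain-Python code below keeps the time order when splitting and computes one common sMAPE definition, the mean of 2|y - yhat| / (|y| + |yhat|). The helper names, the toy data and the naive last-value forecast are assumptions for illustration; this is not sktime's code, and its exact metric definition may differ.

```python
def temporal_split(series, test_size):
    """Split without shuffling: the last `test_size` points form the test set."""
    return series[:-test_size], series[-test_size:]


def smape(y_true, y_pred):
    """Symmetric MAPE: mean of 2*|y - yhat| / (|y| + |yhat|)."""
    terms = [
        2.0 * abs(t - p) / (abs(t) + abs(p))
        for t, p in zip(y_true, y_pred)
    ]
    return sum(terms) / len(terms)


y = [100.0, 110.0, 120.0, 130.0, 140.0, 150.0]
y_train, y_test = temporal_split(y, test_size=2)

# Naive baseline: repeat the last observed training value.
y_pred = [y_train[-1]] * len(y_test)
print(y_train, y_test, y_pred)
print(round(smape(y_test, y_pred), 4))
```

Unlike a random split, the test set here always lies after the training set in time, so the model is never evaluated on points that precede its training data.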

Time Series Classification

 from sktime.classification.all import *
 from sklearn.model_selection import train_test_split
 from sklearn.metrics import accuracy_score
 X, y = load_arrow_head(return_X_y=True)
 X_train, X_test, y_train, y_test = train_test_split(X, y)
 classifier = TimeSeriesForest()
 classifier.fit(X_train, y_train)
 y_pred = classifier.predict(X_test)
 accuracy_score(y_test, y_pred) 

0.8679245283018868

Univariate Time Series Classification with sktime

In univariate time series classification, each instance consists of a single time series variable and a corresponding label. The aim is to find a classifier that learns the relationship between the series and their labels, so that it can predict the label of a new series.

 import matplotlib.pyplot as plt
 import numpy as np
 from sklearn.metrics import accuracy_score
 from sklearn.model_selection import train_test_split
 from sklearn.pipeline import Pipeline
 from sklearn.tree import DecisionTreeClassifier
 from sktime.classification.compose import TimeSeriesForestClassifier
 from sktime.datasets import load_arrow_head
 from sktime.utils.slope_and_trend import _slope 

# Loading data

In this example, we use the arrowhead problem.

The arrowhead dataset is a time series dataset containing outlines of images of arrowheads. The classification of projectile points is an important topic in anthropology. The classes are based on shape distinctions, for example the presence and location of a notch in the arrow.

Figure: arrowheads

The shapes of the projectile points are converted into sequences using the angle-based method. For more details, check this blog post about converting images into time series data for data mining.
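As a rough illustration of turning an outline into a sequence, the sketch below orders outline points by their angle around the centroid and records each point's distance from the centroid. This is one simple angle-based variant written for this guide, assuming the outline is given as (x, y) points; the method used to build the arrowhead dataset may differ in its details, and the function name `outline_to_series` is hypothetical.

```python
import math


def outline_to_series(points):
    """Convert outline points to a series of centroid distances,
    ordered by the angle of each point around the centroid."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    by_angle = sorted(points, key=lambda p: math.atan2(p[1] - cy, p[0] - cx))
    return [math.hypot(x - cx, y - cy) for x, y in by_angle]


# Toy outline: the four corners and four edge midpoints of a square.
square = [(1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0)]
series = outline_to_series(square)
print([round(d, 3) for d in series])
```

In the resulting series the corners of the square show up as peaks and the edge midpoints as troughs, which is the kind of shape signature a time series classifier can learn from.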

Figure: from shapes to time series

# Data representation

 X, y = load_arrow_head(return_X_y=True)
 X_train, X_test, y_train, y_test = train_test_split(X, y)
 print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)

(158, 1) (158,) (53, 1) (53,)

 # univariate time series input data
 X_train.head() 

      dim_0
 25   0 -1.6320  1 -1.6301  2 -1.6075  3 …
 105  0 -1.6758  1 -1.6742  2 -1.6674  3 …
 18   0 -2.1138  1 -2.0918  2 -2.0488  3 …
 167  0 -1.7471  1 -1.7295  2 -1.7300  3 …
 174  0 -1.6307  1 -1.6299  2 -1.6206  3 …

# Target variable with three classes

 labels, counts = np.unique(y_train, return_counts=True)
 print(labels, counts) 

['0' '1' '2'] [60 54 44]

 fig, ax = plt.subplots(1, figsize=plt.figaspect(0.25))
 for label in labels:
     X_train.loc[y_train == label, "dim_0"].iloc[0].plot(ax=ax, label=f"class {label}")
 plt.legend()
 ax.set(title="Example time series", xlabel="Time"); 

[Text(0.5, 1.0, 'Example time series'), Text(0.5, 0, 'Time')]

Time series forest

Time series forest is a modification of the random forest algorithm to the time series setting:

  1. Splitting the series into multiple random intervals,
  2. Extracting features (mean, standard deviation and slope) from each interval,
  3. Training a decision tree on the extracted features,
  4. Ensembling steps 1 – 3.

 from sktime.transformations.panel.summarize import RandomIntervalFeatureExtractor
 steps = [
     (
         "extract",
         RandomIntervalFeatureExtractor(
             n_intervals="sqrt", features=[np.mean, np.std, _slope]
         ),
     ),
     ("clf", DecisionTreeClassifier()),
 ]
 time_series_tree = Pipeline(steps) 

We can directly fit and evaluate the single time series tree (which is simply a pipeline).

 time_series_tree.fit(X_train, y_train)
 time_series_tree.score(X_test, y_test) 

0.8113207547169812
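To make steps 1 and 2 of the list above concrete, here is a plain-Python sketch that draws random intervals from one series and computes the mean, standard deviation and slope of each. It is an illustration only, not the code behind `RandomIntervalFeatureExtractor`; the helper names, the minimum interval length of 3 and the least-squares slope are assumptions.

```python
import random
import statistics


def slope(values):
    """Least-squares slope of values against their positions 0, 1, 2, ..."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def random_interval_features(series, n_intervals, rng, min_length=3):
    """Return [mean, std, slope] for each of `n_intervals` random intervals."""
    features = []
    for _ in range(n_intervals):
        start = rng.randint(0, len(series) - min_length)
        end = rng.randint(start + min_length, len(series))
        window = series[start:end]
        features.extend(
            [statistics.mean(window), statistics.pstdev(window), slope(window)]
        )
    return features


rng = random.Random(0)
series = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
# Three intervals for a series of length 9, mirroring the "sqrt" setting.
row = random_interval_features(series, n_intervals=3, rng=rng)
print(len(row))  # 3 intervals x 3 features
print(row)
```

Each series becomes one flat row of features, so a panel of series becomes an ordinary table on which a decision tree can be trained (step 3).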

# For the time series forest classifier, we can simply use the single tree as the base estimator in the forest ensemble.

 tsf = TimeSeriesForestClassifier(
     estimator=time_series_tree, 
     n_estimators=100,
     criterion="entropy",
     bootstrap=True,
     oob_score=True,
     random_state=1,
     n_jobs=-1,
 ) 

# Fitting and obtaining the out-of-bag score:

 tsf.fit(X_train, y_train)
 if tsf.oob_score:
     print(tsf.oob_score_)

0.8417721518987342

 tsf = TimeSeriesForestClassifier()
 tsf.fit(X_train, y_train)
 tsf.score(X_test, y_test) 

0.8867924528301887
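The out-of-bag score reported above relies on bootstrap sampling: each tree is trained on a sample drawn with replacement, and the instances it never saw serve as its validation set. The sketch below shows only that bookkeeping, using the standard library. The helper name is an assumption, and the ensemble's actual scoring logic is more involved.

```python
import random


def bootstrap_indices(n_samples, rng):
    """Draw n_samples indices with replacement and report the ones left out."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    out_of_bag = sorted(set(range(n_samples)) - set(in_bag))
    return in_bag, out_of_bag


rng = random.Random(1)
n_samples = 158  # size of the training split used above
in_bag, out_of_bag = bootstrap_indices(n_samples, rng)

print(len(in_bag), len(out_of_bag))
# On average roughly 1/e (about 37%) of the instances are left out.
print(round(len(out_of_bag) / n_samples, 2))
```

Because every tree leaves out a different subset, each training instance can be scored by the trees that did not see it, which gives a validation estimate without a separate hold-out set.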

# Plotting the feature importances over time for the different features and intervals

 fi = tsf.feature_importances_
 fig, ax = plt.subplots(1, figsize=plt.figaspect(0.25))
 fi.plot(ax=ax)
 ax.set(xlabel="Time", ylabel="Feature importance"); 

/usr/local/lib/python3.6/dist-packages/pandas/plotting/_matplotlib/core.py:584: UserWarning: The handle <matplotlib.lines.Line2D object at 0x7f829afd6b70> has a label of '_slope' which cannot be automatically added to the legend.
  ax.legend(handles, labels, loc="best", title=title)
[Text(0.5, 0, 'Time'), Text(0, 0.5, 'Feature importance')]

For more examples, the interactive Jupyter notebooks can be run directly on Binder without installing any dependencies.

Sktime-dl is an extension library for sktime that applies deep learning algorithms to time series data. The repository aims to provide Keras networks that can be used with sktime's machine learning pipeline and strategy tools, and with scikit-learn, in applications and research. The interface provides implementations of neural networks for time series analysis.

Neural Networks for time-series Classification

The current toolkit provides an interface to dl-4-tsc and implements the following network architectures: Multilayer perceptron (MLP), Fully convolutional neural network (FCNN), Time convolutional neural network (CNN), Time Le-Net (TLeNet), Encoder (Encoder), Residual network (ResNet), Multi-scale convolutional neural network (MCNN), Multi-channel deep convolutional neural network (MCDCNN) and Time warping invariant echo state network (TWIESN). There is also an interface to InceptionTime.

Regression

Most of the classifier architectures can also serve as regressors. These are: Time convolutional neural network (CNN), Fully convolutional neural network (FCNN), Multilayer perceptron (MLP), Encoder (Encoder), Time Le-Net (TLeNet), Residual network (ResNet) and InceptionTime (Inception).

Forecasting

The regression networks can be adapted for time series forecasting through sktime’s reduction strategies. Support for RNN/LSTM networks within sktime is expected in the future.

Hyper-parameter tuning is done through scikit-learn’s RandomizedSearchCV and GridSearchCV tools. Ensembling over different random initialisations is available for stability. Both act as wrapper classes around the neural networks, which can in turn be used in high-level data pipelines within sktime.
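The difference between the two search strategies can be sketched in a few lines of plain Python: grid search evaluates every parameter combination, while randomized search evaluates only a fixed number of sampled ones. Everything in this sketch is hypothetical, including the parameter names `lr` and `n_layers` and the `toy_score` function. In practice the score would come from cross-validating a network, and scikit-learn's search classes would handle the loop.

```python
import itertools
import random


def toy_score(params):
    """Hypothetical validation score; peaks at lr=0.01, n_layers=2."""
    return -abs(params["lr"] - 0.01) * 10 - abs(params["n_layers"] - 2)


grid = {"lr": [0.001, 0.01, 0.1], "n_layers": [1, 2, 3]}
names = sorted(grid)

# Grid search: evaluate every combination.
candidates = [
    dict(zip(names, values))
    for values in itertools.product(*(grid[name] for name in names))
]
best_grid = max(candidates, key=toy_score)

# Randomized search: evaluate only a fixed number of sampled combinations.
rng = random.Random(0)
sampled = rng.sample(candidates, k=4)
best_random = max(sampled, key=toy_score)

print(len(candidates), best_grid)
print(len(sampled), best_random)
```

Randomized search trades the guarantee of finding the best grid point for a much smaller number of model fits, which matters when each fit is a neural network.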

NOTE: sktime-dl is currently not maintained and replies to issues and PRs may be slow. We’re looking for a new maintainer to help us maintain sktime-dl.

EndNotes

There is a separate repository for beginners to learn time series analysis with sktime, along with notebooks and video lectures. Sktime-m4 was created to replicate and extend the M4 study using sktime. The project is under constant development and aims to support real-world, real-time applications as well as advanced research.
