sklearn.model_selection.TimeSeriesSplit
Time Series cross-validator
Provides train/test indices to split time series data samples, observed
at fixed time intervals, into train/test sets.
In each split, the test indices must be higher than those of the previous split,
so shuffling in the cross-validator is inappropriate.
This cross-validation object is a variation of KFold.
In the kth split, it returns the first k folds as the train set and the
(k+1)th fold as the test set.
Note that unlike standard cross-validation methods, successive
training sets are supersets of those that come before them.
Read more in the User Guide.
New in version 0.18.
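The expanding-window behaviour described above can be checked directly. The following is a minimal sketch on toy data (the array `X` and the choice of `n_splits=4` are illustrative assumptions, not part of the original page); it confirms that every test index follows every training index and that each training set contains the previous one.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(20).reshape(10, 2)  # 10 time-ordered samples (toy data)
tscv = TimeSeriesSplit(n_splits=4)

splits = list(tscv.split(X))
previous_train = set()
for train_index, test_index in splits:
    # Every test index comes after every training index.
    assert train_index.max() < test_index.min()
    # Each training set is a superset of the previous one.
    assert previous_train <= set(train_index)
    previous_train = set(train_index)

train_sizes = [len(train_index) for train_index, _ in splits]
print(train_sizes)
```

With 10 samples and 4 splits, the training sets grow by one fold (2 samples) per split.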
Parameters:
- n_splits : int, default=5
  Number of splits. Must be at least 2.
  Changed in version 0.22: n_splits default value changed from 3 to 5.
- max_train_size : int, default=None
  Maximum size for a single training set.
- test_size : int, default=None
  Used to limit the size of the test set. Defaults to n_samples // (n_splits + 1), which is the maximum allowed value with gap=0.
  New in version 0.24.
- gap : int, default=0
  Number of samples to exclude from the end of each train set before the test set.
  New in version 0.24.
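As an illustration of max_train_size, the sketch below caps the training set at 4 samples, which turns the expanding window into a rolling one. The toy array `X` and the specific parameter values are assumptions chosen for the example.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.zeros((12, 1))  # 12 time-ordered samples (toy data)

# Keep only the 4 most recent samples before each test set.
tscv = TimeSeriesSplit(n_splits=3, test_size=2, max_train_size=4)
windows = [(train.tolist(), test.tolist()) for train, test in tscv.split(X)]
for train, test in windows:
    print(train, test)
# [2, 3, 4, 5] [6, 7]
# [4, 5, 6, 7] [8, 9]
# [6, 7, 8, 9] [10, 11]
```

Without max_train_size, the same splitter would use all samples before each test set for training.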
Notes
The training set has size i * (n_samples // (n_splits + 1)) + n_samples % (n_splits + 1)
in the i-th split (counting from i = 1), with a test set of size n_samples // (n_splits + 1)
by default, where n_samples is the number of samples.
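The size formula in the note can be compared against the splitter's actual output. This is a small sketch, assuming default test_size and gap, with i counted from 1 and the integer division applied to n_samples before multiplying by i; the values 11 and 3 and the helper names `fold_size` and `remainder` are illustrative.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

n_samples, n_splits = 11, 3
X = np.zeros((n_samples, 1))  # toy data; only the number of rows matters

fold_size = n_samples // (n_splits + 1)  # default test set size
remainder = n_samples % (n_splits + 1)   # leftover samples join the training set

# Observed (train size, test size) for each split.
sizes = [
    (len(train), len(test))
    for train, test in TimeSeriesSplit(n_splits=n_splits).split(X)
]
# Sizes predicted by the formula, with i running from 1 to n_splits.
expected = [(i * fold_size + remainder, fold_size) for i in range(1, n_splits + 1)]
print(sizes, expected)
```

Choosing n_samples that is not a multiple of n_splits + 1 makes the remainder term visible in the first training set.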
Examples
>>> import numpy as np
>>> from sklearn.model_selection import TimeSeriesSplit
>>> X = np.array([[1, 2], [3, 4], [1, 2], [3, 4], [1, 2], [3, 4]])
>>> y = np.array([1, 2, 3, 4, 5, 6])
>>> tscv = TimeSeriesSplit()
>>> print(tscv)
TimeSeriesSplit(gap=0, max_train_size=None, n_splits=5, test_size=None)
>>> for i, (train_index, test_index) in enumerate(tscv.split(X)):
...     print(f"Fold {i}:")
...     print(f"  Train: index={train_index}")
...     print(f"  Test:  index={test_index}")
Fold 0:
  Train: index=[0]
  Test:  index=[1]
Fold 1:
  Train: index=[0 1]
  Test:  index=[2]
Fold 2:
  Train: index=[0 1 2]
  Test:  index=[3]
Fold 3:
  Train: index=[0 1 2 3]
  Test:  index=[4]
Fold 4:
  Train: index=[0 1 2 3 4]
  Test:  index=[5]
>>> # Fix test_size to 2 with 12 samples
>>> X = np.random.randn(12, 2)
>>> y = np.random.randint(0, 2, 12)
>>> tscv = TimeSeriesSplit(n_splits=3, test_size=2)
>>> for i, (train_index, test_index) in enumerate(tscv.split(X)):
...     print(f"Fold {i}:")
...     print(f"  Train: index={train_index}")
...     print(f"  Test:  index={test_index}")
Fold 0:
  Train: index=[0 1 2 3 4 5]
  Test:  index=[6 7]
Fold 1:
  Train: index=[0 1 2 3 4 5 6 7]
  Test:  index=[8 9]
Fold 2:
  Train: index=[0 1 2 3 4 5 6 7 8 9]
  Test:  index=[10 11]
>>> # Add in a 2 period gap
>>> tscv = TimeSeriesSplit(n_splits=3, test_size=2, gap=2)
>>> for i, (train_index, test_index) in enumerate(tscv.split(X)):
...     print(f"Fold {i}:")
...     print(f"  Train: index={train_index}")
...     print(f"  Test:  index={test_index}")
Fold 0:
  Train: index=[0 1 2 3]
  Test:  index=[6 7]
Fold 1:
  Train: index=[0 1 2 3 4 5]
  Test:  index=[8 9]
Fold 2:
  Train: index=[0 1 2 3 4 5 6 7]
  Test:  index=[10 11]
Methods
- get_n_splits([X, y, groups]): Returns the number of splitting iterations in the cross-validator.
- split(X[, y, groups]): Generate indices to split data into training and test set.
get_n_splits(X=None, y=None, groups=None)
Returns the number of splitting iterations in the cross-validator.
Parameters:
- X : object
  Always ignored, exists for compatibility.
- y : object
  Always ignored, exists for compatibility.
- groups : object
  Always ignored, exists for compatibility.
Returns:
- n_splits : int
  Returns the number of splitting iterations in the cross-validator.
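Because the arguments of get_n_splits are ignored, it can be called without data. The sketch below (toy data and `n_splits=4` are assumptions for illustration) compares the reported count with the number of splits that split actually yields.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

tscv = TimeSeriesSplit(n_splits=4)
X = np.zeros((10, 1))  # toy data

n_reported = tscv.get_n_splits()             # no data needed
n_reported_with_X = tscv.get_n_splits(X)     # X is accepted but ignored
n_yielded = sum(1 for _ in tscv.split(X))    # count the splits actually produced
print(n_reported, n_reported_with_X, n_yielded)
```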
split(X, y=None, groups=None)
Generate indices to split data into training and test set.
Parameters:
- X : array-like of shape (n_samples, n_features)
  Training data, where n_samples is the number of samples and n_features is the number of features.
- y : array-like of shape (n_samples,)
  Always ignored, exists for compatibility.
- groups : array-like of shape (n_samples,)
  Always ignored, exists for compatibility.
Yields:
- train : ndarray
  The training set indices for that split.
- test : ndarray
  The testing set indices for that split.
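The index arrays yielded by split are typically used to slice the data. A minimal sketch follows, assuming NumPy arrays `X` and `y` made up for the example; the `folds` list is only there to record what each split produced.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(16, dtype=float).reshape(8, 2)  # 8 time-ordered samples (toy data)
y = np.arange(8)

tscv = TimeSeriesSplit(n_splits=3)
folds = []
for train_index, test_index in tscv.split(X):
    # Integer index arrays select rows of X and entries of y.
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    folds.append((X_train.shape, len(y_train), y_test.tolist()))

for fold in folds:
    print(fold)
```

Only X is needed to generate the indices; y is sliced afterwards with the same arrays.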