Fast way to cluster time series data in R
I'm trying to cluster time-series data: I have 16000 time-series vectors, each about 1500 samples long.
I tried using the dtw package:
d <- dist(x = time_series, method = "dtw")  # pairwise DTW distances; needs library(dtw)
hclust(d)                                   # hierarchical clustering on those distances
However, the distance matrix calculation didn't finish running over the whole weekend.
I'm looking for a faster way, since the data set will be larger.
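Your data are of length 1500. I suppose they are oversampled.
DTW's running time grows with the square of the series length, so if you downsample 1 in 2, DTW is 4 times faster. If you downsample 1 in 4, DTW is 16 times faster. If you downsample 1 in 10, DTW is 100 times faster.
This might be a starting point.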
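A minimal sketch of that downsampling in base R. It assumes the series are stored as the rows of a numeric matrix, as dist() expects; the helper name downsample and the toy data are made up for illustration.

```r
# Keep every k-th sample of each series (rows = series, columns = time points).
# Assumes the series are oversampled, so plain decimation loses little.
downsample <- function(x, k) {
  x[, seq(1, ncol(x), by = k), drop = FALSE]
}

set.seed(1)
time_series <- matrix(rnorm(4 * 1500), nrow = 4)  # toy stand-in for the real data
small <- downsample(time_series, 10)
dim(small)                                         # 4 series, 150 samples each

# Full DTW fills an n-by-n cost matrix, so the work shrinks by about k^2.
(ncol(time_series) / ncol(small))^2                # 100
```

If the series contain meaningful high-frequency detail, averaging each block of k samples before keeping one value is probably safer than plain decimation.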
Are you using cDTW or DTW? The former is significantly faster, and it can be more accurate.
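To make the difference concrete, here is a base-R sketch of cDTW, that is, DTW constrained to a Sakoe-Chiba band around the diagonal. The function name cdtw_distance, the band half-width w, and the absolute-difference local cost are choices made for this illustration. It is written for readability, not speed.

```r
# Constrained DTW (Sakoe-Chiba band) between two numeric vectors.
# w is the band half-width in samples; a very large w gives unconstrained DTW.
cdtw_distance <- function(x, y, w) {
  n <- length(x)
  m <- length(y)
  w <- max(w, abs(n - m))              # band must be wide enough to reach the corner
  D <- matrix(Inf, n + 1, m + 1)       # accumulated cost, padded with one border row/column
  D[1, 1] <- 0
  for (i in seq_len(n)) {
    lo <- max(1, i - w)
    hi <- min(m, i + w)
    for (j in lo:hi) {                 # only cells inside the band are filled
      cost <- abs(x[i] - y[j])
      D[i + 1, j + 1] <- cost + min(D[i, j], D[i, j + 1], D[i + 1, j])
    }
  }
  D[n + 1, m + 1]
}

x <- c(0, 0, 1, 0, 0)
y <- c(0, 1, 0, 0, 0)
cdtw_distance(x, y, 0)   # no warping allowed: 2
cdtw_distance(x, y, 1)   # one step of warping absorbs the shift: 0
```

Only about n * (2w + 1) cells are filled instead of n^2, which is where the speedup comes from. As far as I know, the dtw package offers the same kind of band through its window.type and window.size arguments, but check its documentation before relying on that.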
A paper at SIGKDD this week has a faster way to cluster under DTW using upper and lower bounds [a].
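Bounds help because a cheap lower bound that already exceeds a threshold lets you skip the full DTW computation for that pair. As an illustration only, the sketch below implements LB_Keogh, a well-known lower bound for band-constrained DTW. It is not necessarily the bound used in [a]. The name lb_keogh and its arguments are chosen for this example, and it assumes equal-length series and an absolute-difference local cost.

```r
# LB_Keogh: lower bound on band-constrained DTW between q and cand.
# w is the band half-width in samples; both series must have equal length.
lb_keogh <- function(q, cand, w) {
  n <- length(q)
  stopifnot(length(cand) == n)
  idx <- seq_len(n)
  # Upper and lower envelope of q within the band around each position.
  upper <- vapply(idx, function(i) max(q[max(1, i - w):min(n, i + w)]), numeric(1))
  lower <- vapply(idx, function(i) min(q[max(1, i - w):min(n, i + w)]), numeric(1))
  # Only the parts of cand that fall outside the envelope contribute.
  sum(pmax(cand - upper, 0) + pmax(lower - cand, 0))
}

q <- c(0, 0, 1, 0, 0)
lb_keogh(q, c(0, 1, 0, 0, 0), 1)   # candidate stays inside the envelope: 0
lb_keogh(q, c(3, 0, 0, 0, 0), 1)   # first sample is 3 above the envelope: 3
```

The bound costs time linear in the series length, so it can be evaluated for every pair and the expensive DTW run only where the bound does not settle the question.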
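However, you have a distance matrix with (16000 * 15999)/2 entries, which is about 128 million pairwise comparisons.
So if you have 2 days: 2 days / ((16000 * 15999)/2) ≈ 1350 microseconds.
So you need to do each comparison in about 1350 microseconds, which is not a lot of time. It is difficult, but doable with some effort. If you get stuck, email me (I am the last author of [a]).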
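The per-comparison budget can be checked with a few lines of R. This is a sketch of the arithmetic only, and the variable names are made up for illustration. The result depends on how the trailing /2 is read: dividing two days by the number of pairs, 16000 * 15999 / 2, gives roughly 1350 microseconds, while applying the /2 as a second division gives about 337 microseconds.

```r
n_series <- 16000
n_pairs <- n_series * (n_series - 1) / 2     # unordered pairs in the distance matrix
budget_seconds <- 2 * 24 * 60 * 60           # two days, in seconds

# Budget per comparison, in microseconds.
per_pair_us <- budget_seconds / n_pairs * 1e6
round(per_pair_us)                           # about 1350

# Reading the expression left to right divides by 2 a second time.
left_to_right_us <- budget_seconds / (n_series * (n_series - 1)) / 2 * 1e6
floor(left_to_right_us)                      # 337
```

Either way the budget is on the order of a millisecond or less per pair, so the conclusion that each comparison has to be very fast still holds.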
[a] Nurjahan Begum, Liudmila Ulanova, Jun Wang, Eamonn Keogh (2015). Accelerating Dynamic Time Warping Clustering with a Novel Admissible Pruning Strategy. SIGKDD 2015.