Fast way to cluster time series data in R


I'm trying to cluster time-series data: I have 16,000 time-series vectors, each about 1,500 samples long.

I tried using the dtw package:

library(dtw)  # attaches proxy, whose dist() accepts method = "dtw"
d <- dist(x = time_series, method = "dtw"); hc <- hclust(d)

However, the distance matrix calculation didn't finish even after running the whole weekend.

I'm looking for a faster way, since my data set is getting larger.

Your data is of length 1,500. I suppose it is oversampled.

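If you downsample 1 in 2, DTW will be 4 times faster. If you downsample 1 in 4, DTW will be 16 times faster. If you downsample 1 in 10, DTW will be 100 times faster. (Full DTW fills an n × n cost matrix, so its cost grows with the square of the series length.)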
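A minimal sketch of the downsampling idea, assuming the series are stored one per row of a numeric matrix. `dtw_full` and `downsample` are hypothetical helpers written in plain R loops for illustration only (far slower than the compiled dtw package), and `dtw_full` uses the simplest textbook recurrence; the dtw package's default step pattern (symmetric2) weights diagonal steps differently, so its values will not match exactly. Whether 1,500 samples really is oversampled depends on your data, so check that the downsampled series still show the shapes you care about.

```r
# Plain (unconstrained) DTW with absolute-difference cost.
# It fills an (n+1) x (m+1) matrix, so run time grows as n * m.
dtw_full <- function(a, b) {
  n <- length(a); m <- length(b)
  D <- matrix(Inf, n + 1, m + 1)
  D[1, 1] <- 0
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      cost <- abs(a[i] - b[j])
      # Best of: step from above, from the left, or diagonally.
      D[i + 1, j + 1] <- cost + min(D[i, j + 1], D[i + 1, j], D[i, j])
    }
  }
  D[n + 1, m + 1]
}

# Keep every k-th sample of each series (one series per row).
downsample <- function(x, k) x[, seq(1, ncol(x), by = k), drop = FALSE]

# Hypothetical stand-in data: 2 random-walk series of length 1500.
set.seed(42)
time_series <- t(apply(matrix(rnorm(2 * 1500), nrow = 2), 1, cumsum))

small <- downsample(time_series, 4)
ncol(small)               # 375 samples per series
(1500 / ncol(small))^2    # 16: factor fewer DTW cells to fill
d_small <- dtw_full(small[1, ], small[2, ])
```

Downsampling 1 in k leaves about n/k samples per series, so the n × n matrix shrinks by about k², which is where the 4x, 16x and 100x figures come from.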

This might be a starting point.

Are you using cDTW or DTW? The former (DTW with the warping path constrained to a window around the diagonal) is significantly faster, and can be more accurate.
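A minimal sketch of constrained DTW with a Sakoe-Chiba band, assuming equal-length series as in the question. `dtw_band` is a hypothetical plain-R helper: it only fills cells with |i - j| <= w, so the work is roughly n × (2w + 1) instead of n × n. The window width used below (5% of the length) is an arbitrary illustrative choice, not a recommendation.

```r
# Constrained DTW (cDTW) with a Sakoe-Chiba band of half-width w and
# absolute-difference cost. Cells outside the band stay at Inf.
dtw_band <- function(a, b, w) {
  n <- length(a)
  stopifnot(length(b) == n, w >= 0)
  D <- matrix(Inf, n + 1, n + 1)
  D[1, 1] <- 0
  for (i in seq_len(n)) {
    # Only visit columns within w of the diagonal.
    for (j in max(1, i - w):min(n, i + w)) {
      cost <- abs(a[i] - b[j])
      D[i + 1, j + 1] <- cost + min(D[i, j + 1], D[i + 1, j], D[i, j])
    }
  }
  D[n + 1, n + 1]
}

# Hypothetical usage on two random-walk series of length 1500.
set.seed(7)
a <- cumsum(rnorm(1500)); b <- cumsum(rnorm(1500))
w <- round(0.05 * length(a))   # hypothetical window: 5% of the length
d_band <- dtw_band(a, b, w)
```

With w = 0 this reduces to a point-by-point distance; as w grows to n it becomes full DTW. If you stay with the dtw package, its dtw() function offers the same constraint through window.type = "sakoechiba" together with window.size, and distance.only = TRUE skips computing the alignment path.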

A paper at SIGKDD this week presents a faster way to cluster with DTW, using upper and lower bounds [a].
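The method in [a] is not reproduced here. As a hedged illustration of the general idea of pruning with a cheap bound, the sketch below shows LB_Keogh, a widely used lower bound for constrained DTW, written in an absolute-difference form so that it lower-bounds a banded DTW that uses |a_i - b_j| as its cell cost. It works because every warping path inside the band matches each query point to some candidate point within ±w samples, and that match costs at least the query point's distance to the candidate's envelope. `lb_keogh`, `needs_full_dtw` and the `cutoff` parameter are hypothetical names.

```r
# LB_Keogh-style lower bound for banded DTW with absolute-difference cost.
# Build the upper/lower envelope of `cand` over +/- w samples, then sum
# how far `query` falls outside that envelope.
lb_keogh <- function(query, cand, w) {
  n <- length(query)
  stopifnot(length(cand) == n, w >= 0)
  lb <- 0
  for (i in seq_len(n)) {
    idx <- max(1, i - w):min(n, i + w)
    upper <- max(cand[idx]); lower <- min(cand[idx])
    if (query[i] > upper) {
      lb <- lb + (query[i] - upper)
    } else if (query[i] < lower) {
      lb <- lb + (lower - query[i])
    }
  }
  lb
}

# Hypothetical pruning rule: only compute the expensive cDTW when the
# cheap bound cannot already rule the pair out.
needs_full_dtw <- function(query, cand, w, cutoff) {
  lb_keogh(query, cand, w) <= cutoff
}
```

The bound costs O(n·w) here (O(n) with a streaming envelope), so pairs it rules out avoid the much larger banded DTW computation entirely.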


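However, the distance matrix has (16000 × 15999)/2 = 127,992,000 entries.

So if you have 2 days: 2 days / 127,992,000 = 172,800 s / 127,992,000 ≈ 1,350 microseconds

So you need to do each comparison in about 1,350 microseconds (1.35 ms), which is not a lot of time. It is difficult, but doable with some effort. If you get stuck, email me (I am the last author of [a]).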
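The time budget above can be checked in a few lines of R. The second half is a sketch for measuring your own per-pair cost: `pair_distance` is a hypothetical stand-in (plain Euclidean distance here), to be replaced with whatever DTW or cDTW call you actually use; the timing result depends on your machine, so no output is claimed for it.

```r
n_series    <- 16000
n_pairs     <- n_series * (n_series - 1) / 2   # 127,992,000 unique pairs
budget_s    <- 2 * 24 * 60 * 60                # 2 days = 172,800 seconds
per_pair_us <- budget_s / n_pairs * 1e6        # microseconds per comparison
round(per_pair_us)                             # 1350

# Hypothetical stand-in distance; replace with your (c)DTW call.
pair_distance <- function(a, b) sqrt(sum((a - b)^2))

set.seed(1)
a <- rnorm(1500); b <- rnorm(1500)
reps <- 1000
elapsed <- system.time(for (r in seq_len(reps)) pair_distance(a, b))[["elapsed"]]
fits_budget <- elapsed / reps * 1e6 <= per_pair_us  # TRUE: fits in 2 days
```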

[a] Nurjahan Begum, Liudmila Ulanova, Jun Wang, Eamonn Keogh (2015). Accelerating Dynamic Time Warping Clustering with a Novel Admissible Pruning Strategy. SIGKDD 2015.

