Fast way to cluster time series data in R

- April 15, 2010

I'm trying to cluster time-series data: I have 16,000 time-series vectors, each about 1,500 samples long.

I tried using the dtw package:

library(dtw)
d <- dist(x = time_series, method = "dtw")
hclust(d)

However, the distance matrix calculation didn't finish even after running for the whole weekend.

I'm looking for a faster way, since my full data set is larger.

Your data is of length 1,500. I suppose it is oversampled.

DTW's cost grows with the square of the series length, so if you downsample 1 in 2, DTW is 4 times faster. If you downsample 1 in 4, DTW is 16 times faster. If you downsample 1 in 10, DTW is 100 times faster.
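The speed-up figures can be checked with a few lines of base R. This is a minimal sketch, assuming plain decimation (keeping every k-th sample) is acceptable for the data; `downsample` and `dtw_cells` are illustrative names, not functions from the dtw package, and the sine wave is a synthetic stand-in for one real series.

```r
# Keep every k-th sample of a numeric vector (plain decimation, no smoothing).
downsample <- function(x, k) {
  x[seq(1, length(x), by = k)]
}

# Number of cost-matrix cells unconstrained DTW fills for two series of length n.
dtw_cells <- function(n) n * n

x <- sin(seq(0, 20, length.out = 1500))  # synthetic stand-in for one series

for (k in c(2, 4, 10)) {
  n_small <- length(downsample(x, k))
  cat(sprintf("1 in %d: length %d, about %.0fx fewer cells\n",
              k, n_small, dtw_cells(length(x)) / dtw_cells(n_small)))
}
```

If `time_series` is a matrix with one series per row (an assumption, consistent with how `dist` compares rows), something like `t(apply(time_series, 1, downsample, k = 4))` would downsample every series before the distance computation. Plain decimation can alias high-frequency content, so smoothing first may be worth considering.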

This might be a starting point.

Are you using cDTW or DTW? The former is significantly faster, and can be more accurate.
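For readers unfamiliar with the distinction: cDTW restricts the warping path to a band around the diagonal, so only about n * (2w + 1) cells are filled instead of n * n. Below is a minimal base-R sketch of that idea, assuming equal-length series, a squared pointwise cost, and a Sakoe-Chiba style band; `cdtw_distance` and `window` are illustrative names rather than anything from the dtw package.

```r
# Constrained DTW between two equal-length numeric vectors.
# `window` is the maximum allowed |i - j| along the warping path.
cdtw_distance <- function(x, y, window) {
  n <- length(x)
  stopifnot(length(y) == n, window >= 0)
  # cost[i + 1, j + 1] is the best cumulative cost of aligning x[1..i] with y[1..j]
  cost <- matrix(Inf, n + 1, n + 1)
  cost[1, 1] <- 0
  for (i in seq_len(n)) {
    j_lo <- max(1, i - window)
    j_hi <- min(n, i + window)
    for (j in j_lo:j_hi) {
      d <- (x[i] - y[j])^2
      cost[i + 1, j + 1] <- d + min(cost[i, j],      # advance in both series
                                    cost[i, j + 1],  # advance in x only
                                    cost[i + 1, j])  # advance in y only
    }
  }
  sqrt(cost[n + 1, n + 1])
}

x <- c(0, 1, 2, 1, 0, 0)
y <- c(0, 0, 1, 2, 1, 0)   # x shifted right by one sample
cdtw_distance(x, y, window = 0)  # zero-width band: same as Euclidean distance
cdtw_distance(x, y, window = 1)  # one step of warping absorbs the shift
```

The dtw package exposes a band through the `window.type` and `window.size` arguments of `dtw()`. Its default step pattern differs from this sketch, so the numeric distances need not match.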

A paper at SIGKDD this week has a faster way to cluster under DTW, using upper and lower bounds [a].
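To illustrate what bounding means here (this is not the algorithm from [a], only a sketch of the two kinds of bound): for equal-length series, the Euclidean distance is an upper bound on the DTW distance, because the diagonal is itself a valid warping path, and LB_Keogh is a commonly used lower bound on cDTW with a band of half-width `window`. The sketch assumes a squared pointwise cost with a final square root; `euclid_ub` and `lb_keogh` are illustrative names, and the random walks are synthetic data.

```r
# Upper bound: Euclidean distance (cost of the diagonal warping path).
euclid_ub <- function(q, cand) sqrt(sum((q - cand)^2))

# Lower bound: LB_Keogh of query q against the envelope of candidate cand,
# where the envelope spans `window` samples on either side of each point.
lb_keogh <- function(q, cand, window) {
  n <- length(cand)
  stopifnot(length(q) == n)
  idx <- seq_len(n)
  upper <- vapply(idx, function(i) max(cand[max(1, i - window):min(n, i + window)]),
                  numeric(1))
  lower <- vapply(idx, function(i) min(cand[max(1, i - window):min(n, i + window)]),
                  numeric(1))
  above <- pmax(q - upper, 0)  # how far q rises above the envelope
  below <- pmax(lower - q, 0)  # how far q falls below the envelope
  sqrt(sum(above^2 + below^2))
}

set.seed(1)
q    <- cumsum(rnorm(200))  # synthetic random walks
cand <- cumsum(rnorm(200))
c(lower = lb_keogh(q, cand, window = 10), upper = euclid_ub(q, cand))
```

Bounds like these let a clustering routine skip many full DTW computations: if the lower bound for a pair already exceeds the best distance found so far, the exact distance cannot matter.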

However, you still need a distance matrix of size (16000 * 15999)/2.

So if you have 2 days: 2 days / ((16000 * 15999)/2) ≈ 1,350 microseconds

So you need to do each comparison in about 1,350 microseconds, which is not a lot of time. It is difficult, but doable with some effort. If you get stuck, email me (I am the last author of [a]).
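The per-comparison budget can be reproduced in a few lines of R. With the pair count (16000 * 15999)/2 as the divisor, two days works out to roughly 1,350 microseconds per comparison. The variable names are illustrative, and the 8-bytes-per-distance figure assumes the distances are stored as doubles.

```r
n_series <- 16000
n_pairs  <- n_series * (n_series - 1) / 2   # 127,992,000 distinct pairs
budget_s <- 2 * 24 * 60 * 60                # two days, in seconds

per_pair_us <- budget_s / n_pairs * 1e6     # microseconds available per comparison
per_pair_us                                 # about 1350

# Unconstrained DTW on two length-1500 series fills 1500 * 1500 cells,
# so the time available per cell is under a nanosecond.
cells_per_pair <- 1500^2
per_pair_us * 1000 / cells_per_pair         # nanoseconds per cell, about 0.6

# Memory needed just to hold the distances as doubles.
n_pairs * 8 / 2^30                          # GiB, about 0.95
```

The per-cell figure shows why the full computation does not fit the budget without downsampling, a warping band, or pruning with bounds.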

[a] Nurjahan Begum, Liudmila Ulanova, Jun Wang, Eamonn Keogh (2015). Accelerating Dynamic Time Warping Clustering with a Novel Admissible Pruning Strategy. SIGKDD 2015.
