amazon redshift - postgresql: merge rows keeping some information, without loops -

- February 15, 2012

i have list of calls per every user separated minutes. users can buy in these calls or not. when user makes call within 45 minutes after last call, need consider same call first one.

i need final number of calls ( aggregating calls separated less 45 minutes) , number of calls in bought something, per user.

so example, have list this:

buyer       timestamp              bougth_flag         tom         20150201 9:15                 1 anna        20150201 9:25                 0 tom         20150201 10:15                0 tom         20150201 10:45                1 tom         20150201 10:48                1 anna        20150201 11:50                0 tom         20150201 11:52                0 anna        20150201 11:54                0

the final table be:

buyer       time_started        calls              articles_bought         tom         20150201 9:15          1                        1 anna        20150201 9:25          1                        0 tom         20150201 10:15         3                        2 anna        20150201 10:50         2                        0 tom         20150201 11:52         1                        0

so, need merge rows separated less 45 minutes, , separate still per user. easy loop don't have loops or functions/procedures in postgresql using. ideas how it?

thank you

since not know beforehand how long "call" going (you have call buyer every 30 minutes full day - see comment question), can solve recursive cte. (note changed column 'timestamp' 'ts'. never use keyword table or column name.)

with conversations (   recursive calls (     select buyer, ts, bought_flag, row_number() on (order ts) conversation, 1::int calls     (       select buyer, ts, lag(ts) on (partition buyer order ts) lag, bought_flag       list) sub     lag null or ts - lag > interval '45 minutes'     union     select l.buyer, l.ts, l.bought_flag, c.conversation, c.calls + 1     list l     join calls c on c.buyer = l.buyer , l.ts > c.ts     l.ts - c.ts < interval '45 minutes'   )   select buyer, ts, bought_flag, conversation, max(calls) calls   calls   group buyer, ts, bought_flag, conversation   order conversation, ts ) select buyer, min(ts) time_started, max(calls) calls, sum(bought_flag) articles_bought conversations group buyer, conversation order time_started

a few words of explanation:

the starting term of inner recursive cte has sub-query gets basic data table every call, time of previous call. main query in starting term of inner cte keeps rows there no previous call (lag null) or previous call more 45 minutes away. these therefore initial calls in term here "conversation". conversation gets column , id row number query, , column track number of calls in conversation "calls".
in recursive term successive calls in same conversation added, "calls" counter incremented.
when calls close (such 10:45 , 10:48 after 10:15) later calls may included multiple times, duplicates (10:48) dropped in outer cte selecting earliest call in sequence each conversation.
in main query, finally, 'bought_flag' column summed every conversation of every buyer.

Search This Blog

Click Hand

amazon redshift - postgresql: merge rows keeping some information, without loops -

Comments

Post a Comment

Popular posts from this blog

python - pip install -U PySide error -

cytoscape.js - How to add nodes to Dagre layout with Cytoscape -

apache - setting document root in antoher partition on ubuntu -