amazon redshift - postgresql: merge rows keeping some information, without loops -
i have list of calls per every user separated minutes. users can buy in these calls or not. when user makes call within 45 minutes after last call, need consider same call first one.
i need final number of calls ( aggregating calls separated less 45 minutes) , number of calls in bought something, per user.
so example, have list this:
buyer timestamp bougth_flag tom 20150201 9:15 1 anna 20150201 9:25 0 tom 20150201 10:15 0 tom 20150201 10:45 1 tom 20150201 10:48 1 anna 20150201 11:50 0 tom 20150201 11:52 0 anna 20150201 11:54 0
the final table be:
buyer time_started calls articles_bought tom 20150201 9:15 1 1 anna 20150201 9:25 1 0 tom 20150201 10:15 3 2 anna 20150201 10:50 2 0 tom 20150201 11:52 1 0
so, need merge rows separated less 45 minutes, , separate still per user. easy loop don't have loops or functions/procedures in postgresql using. ideas how it?
thank you
since not know beforehand how long "call" going (you have call buyer every 30 minutes full day - see comment question), can solve recursive cte. (note changed column 'timestamp' 'ts'. never use keyword table or column name.)
with conversations ( recursive calls ( select buyer, ts, bought_flag, row_number() on (order ts) conversation, 1::int calls ( select buyer, ts, lag(ts) on (partition buyer order ts) lag, bought_flag list) sub lag null or ts - lag > interval '45 minutes' union select l.buyer, l.ts, l.bought_flag, c.conversation, c.calls + 1 list l join calls c on c.buyer = l.buyer , l.ts > c.ts l.ts - c.ts < interval '45 minutes' ) select buyer, ts, bought_flag, conversation, max(calls) calls calls group buyer, ts, bought_flag, conversation order conversation, ts ) select buyer, min(ts) time_started, max(calls) calls, sum(bought_flag) articles_bought conversations group buyer, conversation order time_started
a few words of explanation:
- the starting term of inner recursive cte has sub-query gets basic data table every call, time of previous call. main query in starting term of inner cte keeps rows there no previous call (
lag null
) or previous call more 45 minutes away. these therefore initial calls in term here "conversation". conversation gets column , id row number query, , column track number of calls in conversation "calls". - in recursive term successive calls in same conversation added, "calls" counter incremented.
- when calls close (such 10:45 , 10:48 after 10:15) later calls may included multiple times, duplicates (10:48) dropped in outer cte selecting earliest call in sequence each conversation.
- in main query, finally, 'bought_flag' column summed every conversation of every buyer.
Comments
Post a Comment