R: Plot Survival curve with types of death as stacked area chart -
i have dataset right censored containing information of life times , different types of deaths given sample , want produce plot of survival curve (with actual values calculated sample , not model estimation) different types of death stacked area chart, this:
how can accomplish in r?
the dataset this:
death type time event 1 type3 81 1 2 na 868 0 3 type3 1022 1 4 na 868 0 5 na 868 0 6 na 868 0 7 na 868 0 8 na 887 0 9 type3 156 1 10 na 868 0 11 na 868 0 12 na 868 0 13 type3 354 1 14 type3 700 1 15 type3 632 1 16 na 868 0 17 type1 308 1 18 na 1001 0 19 na 1054 0 20 na 1059 0 21 type3 120 1 22 na 732 0 23 type3 543 1 24 type1 379 1 25 na 613 0 26 na 1082 0 27 type3 226 1 28 type2 1 0 29 na 976 0 30 na 1000 0 31 na 706 0 32 na 1015 0 33 type3 882 1 34 na 1088 0 35 na 642 0 36 type3 953 1 37 na 1068 0 38 na 819 0 39 na 1029 0 40 type3 34 1 41 na 1082 0 42 type3 498 1 43 na 923 0 44 na 1041 0 45 type3 321 1 46 na 557 0 47 na 628 0 48 type3 197 1 49 type3 155 1 50 na 955 0
where death type na indicates censored data, time time of death or time of censoring, , event 1 dead , 0 censored. (this format required 'survfit' have actual start , end times dates)
(now, 50 points wouldn't possible construct such curve, data has lot more rows wouldn't fit here).
it's ugly bit of code, gets idea in. didn't take time figure out how add legend. please note kind of figure, while interesting in concept, isn't going mirror km curve. honest, if you're going present data way, makes more sense stacked bars @ fixed time points.
please note, i'm pretty sure there flaws lurking in code. comes no warranty, might started.
survdata <- structure(list(row.names = c("", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", ""), death = 1:50, type = c("type3", na, "type3", na, na, na, na, na, "type3", na, na, na, "type3", "type3", "type3", na, "type1", na, na, na, "type3", na, "type3", "type1", na, na, "type3", "type2", na, na, na, na, "type3", na, na, "type3", na, na, na, "type3", na, "type3", na, na, "type3", na, na, "type3", "type3", na), time = c(81l, 868l, 1022l, 868l, 868l, 868l, 868l, 887l, 156l, 868l, 868l, 868l, 354l, 700l, 632l, 868l, 308l, 1001l, 1054l, 1059l, 120l, 732l, 543l, 379l, 613l, 1082l, 226l, 1l, 976l, 1000l, 706l, 1015l, 882l, 1088l, 642l, 953l, 1068l, 819l, 1029l, 34l, 1082l, 498l, 923l, 1041l, 321l, 557l, 628l, 197l, 155l, 955l), event = c(1l, 0l, 1l, 0l, 0l, 0l, 0l, 0l, 1l, 0l, 0l, 0l, 1l, 1l, 1l, 0l, 1l, 0l, 0l, 0l, 1l, 0l, 1l, 1l, 0l, 0l, 1l, 0l, 0l, 0l, 0l, 0l, 1l, 0l, 0l, 1l, 0l, 0l, 0l, 1l, 0l, 1l, 0l, 0l, 1l, 0l, 0l, 1l, 1l, 0l)), .names = c("row.names", "death", "type", "time", "event"), class = "data.frame", row.names = c(na, -50l)) library(dplyr) library(zoo) library(rcolorbrewer) survdatasummary <- arrange(survdata, time, type) %>% mutate(type = ifelse(is.na(type), "alive", type)) %>% group_by(time) %>% #* count number of each type @ each time point summarise(n_at_time = n(), alive_at_time = sum(type == "alive"), type1_at_time = sum(type == "type1"), type2_at_time = sum(type == "type2"), type3_at_time = sum(type == "type3")) %>% ungroup() %>% mutate(n_alive = sum(n_at_time) - cumsum(lag(n_at_time, default = 0)), #* proportion of each type p_type1_at_time = type1_at_time / n_alive, p_type2_at_time = type2_at_time / n_alive, p_type3_at_time = type3_at_time / n_alive, #* convert 0 na p_type1_at_time = ifelse(p_type1_at_time == 0, na, p_type1_at_time), p_type2_at_time = ifelse(p_type2_at_time == 0, na, p_type2_at_time), p_type3_at_time = ifelse(p_type3_at_time == 0, na, p_type3_at_time), #* fill nas last known value p_type1_at_time = na.locf(p_type1_at_time, false), p_type2_at_time = na.locf(p_type2_at_time, false), p_type3_at_time = na.locf(p_type3_at_time, false), #* make leading nas 0 p_type1_at_time = ifelse(is.na(p_type1_at_time), 0, p_type1_at_time), p_type2_at_time = ifelse(is.na(p_type2_at_time), 0, p_type2_at_time), p_type3_at_time = ifelse(is.na(p_type3_at_time), 0, p_type3_at_time), #* calculate cumulative proportions p_alive_at_time = 1 - p_type1_at_time - p_type2_at_time - p_type3_at_time, cump_type1_at_time = p_alive_at_time + p_type1_at_time, cump_type2_at_time = cump_type1_at_time + p_type2_at_time, cump_type3_at_time = cump_type2_at_time + p_type3_at_time, #* following time using geom_rect next_time = lead(time)) %>% pal <- brewer.pal(4, "prgn") ggplot(survdatasummary, aes(xmin = time, xmax = next_time)) + geom_rect(aes(ymin = 0, ymax = p_alive_at_time), fill = pal[1]) + geom_rect(aes(ymin = p_alive_at_time, ymax = cump_type1_at_time), fill = pal[2]) + geom_rect(aes(ymin = cump_type1_at_time, ymax = cump_type2_at_time), fill = pal[3]) + geom_rect(aes(ymin = cump_type2_at_time, ymax = cump_type3_at_time), fill = pal[4])
Comments
Post a Comment