predict - Simple Way to Combine Predictions from Multiple Models for Subset Data in R
I build separate models for different segments of my data. I have built the models like so:
log1 <- glm(y ~ ., family = "binomial", data = train, subset = x1==0)
log2 <- glm(y ~ ., family = "binomial", data = train, subset = x1==1 & x2<10)
log3 <- glm(y ~ ., family = "binomial", data = train, subset = x1==1 & x2>=10)
If I run the predictions on the training data, R remembers the subsets, and the prediction vectors have the length of their respective subset.
However, if I run the predictions on the testing data, the prediction vectors have the length of the whole dataset, not that of the subsets.
My question is whether there is a simpler way to achieve this than first subsetting the testing data, running the predictions on each subset, concatenating the predictions, rbinding the subset data, and appending the concatenated predictions, like this:
t1 <- subset(test, x1==0)
t2 <- subset(test, x1==1 & x2<10)
t3 <- subset(test, x1==1 & x2>=10)
log1pred <- predict(log1, newdata = t1, type = "response")
log2pred <- predict(log2, newdata = t2, type = "response")
log3pred <- predict(log3, newdata = t3, type = "response")
allpred <- c(log1pred, log2pred, log3pred)
tall <- rbind(t1, t2, t3)
tall$allpred <- allpred
I'd like to think I'm being stupid and there is an easier way to accomplish this: many models on small subsets of data. How do I combine them to get predictions on the full testing data?
First, here's some sample data:
set.seed(15)
train <- data.frame(x1=sample(0:1, 100, replace=TRUE), x2=rpois(100,10), y=sample(0:1, 100, replace=TRUE))
test <- data.frame(x1=sample(0:1, 10, replace=TRUE), x2=rpois(10,10))
Now we can fit the models. Here I place them in a list to make it easier to keep them together, and I also remove x1 from the model, since it is fixed within each subset.
fits <- list(
  glm(y ~ .-x1, family = "binomial", data = train, subset = x1==0),
  glm(y ~ .-x1, family = "binomial", data = train, subset = x1==1 & x2<10),
  glm(y ~ .-x1, family = "binomial", data = train, subset = x1==1 & x2>=10)
)
Now, for the test data, I create an indicator that specifies which group each observation falls into. I do this by looking at the subset= parameter of each of the calls and evaluating those conditions in the test data.
whichsubset <- as.vector(sapply(fits, function(x) {
  subsetparam <- x$call$subset
  eval(subsetparam, test)
}) %*% matrix(1:length(fits), ncol=1))
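To see why the matrix product yields a group index, here is a small worked example on a hand-made membership matrix. The matrix and the name `membership` are made up for illustration and are not part of the original code. Each row is an observation and each column is a model's subset condition, so multiplying by the column numbers picks out the number of the one column that is TRUE.

```r
# Hypothetical membership matrix: rows are observations, columns are models.
membership <- cbind(c(TRUE,  FALSE, FALSE, TRUE),
                    c(FALSE, TRUE,  FALSE, FALSE),
                    c(FALSE, FALSE, TRUE,  FALSE))

# Logical values are coerced to 0/1, so each row's product is the
# index of its single TRUE column.
group <- as.vector(membership %*% matrix(1:ncol(membership), ncol = 1))
group
# [1] 1 2 3 1
```

If a row were TRUE in two columns, the column numbers would be added together (columns 1 and 2 would give 3), which silently points at the wrong model.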
You'll want to make sure the groups are mutually exclusive, because this code does not check. Then you can use the factor with a split/unsplit strategy for making your predictions:
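One possible way to do that check, as a minimal sketch: evaluate every model's subset condition on the test data and confirm that each row matches exactly one of them. The names `membership` and `hits` are illustrative, and the data and models are repeated here only so the snippet runs on its own.

```r
set.seed(15)
train <- data.frame(x1 = sample(0:1, 100, replace = TRUE),
                    x2 = rpois(100, 10),
                    y  = sample(0:1, 100, replace = TRUE))
test  <- data.frame(x1 = sample(0:1, 10, replace = TRUE),
                    x2 = rpois(10, 10))

fits <- list(
  glm(y ~ . - x1, family = "binomial", data = train, subset = x1 == 0),
  glm(y ~ . - x1, family = "binomial", data = train, subset = x1 == 1 & x2 < 10),
  glm(y ~ . - x1, family = "binomial", data = train, subset = x1 == 1 & x2 >= 10)
)

# One logical column per model: does the test row satisfy that model's subset?
membership <- do.call(cbind, lapply(fits, function(x) eval(x$call$subset, test)))

# Each row should match exactly one model: 0 means no model applies,
# 2 or more means the subsets overlap.
hits <- rowSums(membership)
stopifnot(all(hits == 1))
```

`stopifnot()` raises an error as soon as any row is uncovered or covered twice, so the prediction step never runs on an ambiguous grouping.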
unsplit(
  Map(function(a,b) predict(a,b),
    fits, split(test, whichsubset)
  ),
  whichsubset
)
An easier strategy would have been to create the segregating factor in the first place. That would make the model fitting easier as well.
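A minimal sketch of what that could look like, assuming the same three segments as in the question. The helper `segment()`, the column `seg`, and the level labels are hypothetical names introduced here for illustration. The sketch also uses type = "response", as in the question, so the predictions are probabilities.

```r
set.seed(15)
train <- data.frame(x1 = sample(0:1, 100, replace = TRUE),
                    x2 = rpois(100, 10),
                    y  = sample(0:1, 100, replace = TRUE))
test  <- data.frame(x1 = sample(0:1, 10, replace = TRUE),
                    x2 = rpois(10, 10))

# Hypothetical helper: label each row with the segment it belongs to.
segment <- function(d) {
  factor(ifelse(d$x1 == 0, "x1_0",
                ifelse(d$x2 < 10, "x1_1_low", "x1_1_high")),
         levels = c("x1_0", "x1_1_low", "x1_1_high"))
}
train$seg <- segment(train)
test$seg  <- segment(test)

# One model per segment; x1 is constant within a segment, so only x2 is used.
fits <- lapply(split(train, train$seg), function(d) {
  glm(y ~ x2, family = "binomial", data = d)
})

# Fill in predictions segment by segment, keeping the original row order.
test$pred <- NA_real_
for (s in names(fits)) {
  rows <- test$seg == s
  if (any(rows)) {
    test$pred[rows] <- predict(fits[[s]], newdata = test[rows, ],
                               type = "response")
  }
}
```

Because the same function labels both data sets, the groups are mutually exclusive by construction, and adding another segment only means changing `segment()`.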