predict - Simple Way to Combine Predictions from Multiple Models for Subset Data in R


I want to build separate models for different segments of my data. I have built the models like so:

    log1 <- glm(y ~ ., family = "binomial", data = train, subset = x1==0)
    log2 <- glm(y ~ ., family = "binomial", data = train, subset = x1==1 & x2<10)
    log3 <- glm(y ~ ., family = "binomial", data = train, subset = x1==1 & x2>=10)

If I run the predictions on the training data, R remembers the subsets, and each prediction vector has the length of its respective subset.

However, if I run the predictions on the testing data, each prediction vector has the length of the whole dataset, not of the subset.

My question is whether there is a simpler way to achieve what I currently get by first subsetting the testing data, running the predictions on each subset, concatenating the predictions, rbinding the subset data, and appending the concatenated predictions, like this:

    t1 <- subset(test, x1==0)
    t2 <- subset(test, x1==1 & x2<10)
    t3 <- subset(test, x1==1 & x2>=10)
    log1pred <- predict(log1, newdata = t1, type = "response")
    log2pred <- predict(log2, newdata = t2, type = "response")
    log3pred <- predict(log3, newdata = t3, type = "response")
    allpred <- c(log1pred, log2pred, log3pred)
    tall <- rbind(t1, t2, t3)
    # assign the plain vector; as.data.frame() would nest a data frame inside the column
    tall$allpred <- allpred

I'd like to think I'm being stupid and there is an easier way to accomplish this: many models fitted on small subsets of data. How can I combine them to get predictions on the full testing data?

First, here's some sample data:

    set.seed(15)
    train <- data.frame(x1=sample(0:1, 100, replace=TRUE),
      x2=rpois(100,10),
      y=sample(0:1, 100, replace=TRUE))
    test <- data.frame(x1=sample(0:1, 10, replace=TRUE),
      x2=rpois(10,10))

Now we can fit the models. Here I place them in a list to make it easier to keep them together, and I also remove x1 from the model since it is fixed within each subset.

    fits<-list(
      glm(y ~ .-x1, family = "binomial", data = train, subset = x1==0),
      glm(y ~ .-x1, family = "binomial", data = train, subset = x1==1 & x2<10),
      glm(y ~ .-x1, family = "binomial", data = train, subset = x1==1 & x2>=10)
    )

Now, for the test data, I create an indicator that specifies which group each observation falls into. I do this by looking at the subset= parameter of each of the calls and evaluating those conditions in the test data.

    whichsubset <- as.vector(sapply(fits, function(x) {
        subsetparam<-x$call$subset
        eval(subsetparam, test)
    })%*% matrix(1:length(fits), ncol=1))

You'll want to make sure your groups are mutually exclusive, because this code does not check. Then you can use that factor with a split/unsplit strategy for making your predictions:
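Since the indicator code does not verify the groups, a quick check can be run first. The following is a minimal sketch, assuming the same three conditions as the subset= arguments; the `conditions` list and the `membership` matrix are names introduced here for illustration and are not part of the answer's code.

```r
# Rebuild the sample test data so this snippet runs on its own
set.seed(15)
test <- data.frame(x1 = sample(0:1, 10, replace = TRUE),
                   x2 = rpois(10, 10))

# Hypothetical list holding the same conditions used in the subset= arguments
conditions <- list(
  quote(x1 == 0),
  quote(x1 == 1 & x2 < 10),
  quote(x1 == 1 & x2 >= 10)
)

# One logical column per condition, one row per test observation
membership <- sapply(conditions, function(cond) eval(cond, test))

# Every row must satisfy exactly one condition:
# 0 hits means a row is not covered, 2 or more means the groups overlap
hits <- rowSums(membership)
stopifnot(all(hits == 1))

# With the check passed, the group index is simply the position of the TRUE
whichsubset <- apply(membership, 1, which)
```

If the `stopifnot()` call fails, inspecting `test[hits != 1, ]` shows which rows are uncovered or claimed by more than one condition.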

    unsplit(
        Map(function(a,b) predict(a,b),
            fits, split(test, whichsubset)
        ),
        whichsubset
    )
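To get probabilities attached to the test data, as the question asks, the same split/unsplit idea can be extended slightly. This is a sketch rather than the answer's exact code: it adds type = "response", and it selects the models by the names of the split groups, on the assumption that some group might have no rows in a small test set. The column name `allpred` is borrowed from the question.

```r
set.seed(15)
train <- data.frame(x1 = sample(0:1, 100, replace = TRUE),
                    x2 = rpois(100, 10),
                    y  = sample(0:1, 100, replace = TRUE))
test <- data.frame(x1 = sample(0:1, 10, replace = TRUE),
                   x2 = rpois(10, 10))

fits <- list(
  glm(y ~ . - x1, family = "binomial", data = train, subset = x1 == 0),
  glm(y ~ . - x1, family = "binomial", data = train, subset = x1 == 1 & x2 < 10),
  glm(y ~ . - x1, family = "binomial", data = train, subset = x1 == 1 & x2 >= 10)
)

# Group index per test row, derived from each model's stored subset= condition
whichsubset <- as.vector(sapply(fits, function(x) {
  eval(x$call$subset, test)
}) %*% matrix(seq_along(fits), ncol = 1))

# split() only returns groups that actually occur in the test data,
# so pick the matching models by group name instead of relying on position
groups <- split(test, whichsubset)
preds <- Map(function(fit, d) predict(fit, newdata = d, type = "response"),
             fits[as.integer(names(groups))], groups)

# unsplit() puts the predictions back in the original row order of test
test$allpred <- unname(unsplit(preds, whichsubset))
```

Unlike the rbind approach in the question, the rows of `test` keep their original order, so the new column lines up with any other columns already in the data frame.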

An even easier strategy would have been to create the segregating factor in the first place. This would make the model fitting easier as well.
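One way the segregating-factor idea might look is sketched below. The helper `segment_of()`, the labels "a", "b", "c", and the column names `seg` and `pred` are all hypothetical, chosen only for illustration. The formula is written as y ~ x2 rather than y ~ . so that the single-level `seg` column does not enter the model.

```r
set.seed(15)
train <- data.frame(x1 = sample(0:1, 100, replace = TRUE),
                    x2 = rpois(100, 10),
                    y  = sample(0:1, 100, replace = TRUE))
test <- data.frame(x1 = sample(0:1, 10, replace = TRUE),
                   x2 = rpois(10, 10))

# Hypothetical helper: assign every row to exactly one segment
segment_of <- function(d) {
  factor(ifelse(d$x1 == 0, "a", ifelse(d$x2 < 10, "b", "c")),
         levels = c("a", "b", "c"))
}
train$seg <- segment_of(train)
test$seg  <- segment_of(test)

# One model per segment, stored in a list named by segment label
fits <- lapply(split(train, train$seg), function(d) {
  glm(y ~ x2, family = "binomial", data = d)
})

# Predict segment by segment, writing results back into the matching rows
test$pred <- NA_real_
for (s in levels(test$seg)) {
  rows <- test$seg == s
  if (any(rows)) {
    test$pred[rows] <- predict(fits[[s]], newdata = test[rows, ],
                               type = "response")
  }
}
```

Because the same function labels both data frames, the groups are mutually exclusive by construction, and looking models up by name avoids any dependence on list order.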

