How to handle null entries in SparkR
I have a Spark SQL DataFrame.
Some entries in the data are empty, but they don't behave like NULL or NA. How can I remove them? Any ideas?
In R I can easily remove them, but in SparkR there is a problem with the S4 system/methods.
Thanks.
The SparkR Column class provides a long list of useful methods, including isNull and isNotNull:

    > library(magrittr)  # provides the %>% pipe used below
    > people_local <- data.frame(id=1:4, age=c(21, 18, 30, NA))
    > people <- createDataFrame(sqlContext, people_local)
    > head(people)
      id age
    1  1  21
    2  2  18
    3  3  30
    4  4  NA
    > filter(people, isNotNull(people$age)) %>% head()
      id age
    1  1  21
    2  2  18
    3  3  30
    > filter(people, isNull(people$age)) %>% head()
      id age
    1  4  NA

Please keep in mind that there is no distinction between NA and NaN in SparkR.
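The question also mentions entries that are empty without being NULL or NA, which isNull will not catch. If those entries are empty strings, one option is to compare the column against "" in a filter. The following is only a sketch under that assumption: the `users` data and its `name` column are hypothetical, and the setup uses the Spark 1.x-style `sqlContext` that appears in this answer.

```r
library(SparkR)

# Assumption: Spark 1.x-style initialisation, matching the sqlContext used in this answer.
sc <- sparkR.init(master = "local")
sqlContext <- sparkRSQL.init(sc)

# Hypothetical data: the second name is an empty string, not NA.
users_local <- data.frame(
  id = 1:3,
  name = c("alice", "", "bob"),
  stringsAsFactors = FALSE)
users <- createDataFrame(sqlContext, users_local)

# Keep only the rows whose name is not the empty string.
non_empty <- filter(users, users$name != "")
head(non_empty)
```

Under SQL comparison semantics, a NULL name compared with "" does not evaluate to true, so this filter should also drop rows where name is NULL. If those rows need to be kept, the condition would have to be combined with isNull.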
If you prefer operations on a whole data frame, there is a set of NA functions, including fillna and dropna:

    > fillna(people, 99) %>% head()
      id age
    1  1  21
    2  2  18
    3  3  30
    4  4  99
    > dropna(people) %>% head()
      id age
    1  1  21
    2  2  18
    3  3  30

Both can be adjusted to consider only a subset of columns (cols), and dropna has some additional useful parameters. For example, you can specify the minimal number of non-null columns:

    > people_with_names_local <- data.frame(
        id=1:4, age=c(21, 18, 30, NA), name=c("alice", NA, "bob", NA))
    > people_with_names <- createDataFrame(sqlContext, people_with_names_local)
    > people_with_names %>% head()
      id age  name
    1  1  21 alice
    2  2  18  <NA>
    3  3  30   bob
    4  4  NA  <NA>
    > dropna(people_with_names, minNonNulls=2) %>% head()
      id age  name
    1  1  21 alice
    2  2  18  <NA>
    3  3  30   bob
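
The cols argument mentioned above can be sketched in the same way. This example restricts dropna to a single column and passes fillna a named list so that each column gets its own replacement. The replacement values (0 and "unknown") are arbitrary placeholders, and the setup again assumes the Spark 1.x-style `sqlContext` from this answer.

```r
library(SparkR)

# Assumption: Spark 1.x-style initialisation, matching the sqlContext used in this answer.
sc <- sparkR.init(master = "local")
sqlContext <- sparkRSQL.init(sc)

people_with_names_local <- data.frame(
  id = 1:4,
  age = c(21, 18, 30, NA),
  name = c("alice", NA, "bob", NA),
  stringsAsFactors = FALSE)
people_with_names <- createDataFrame(sqlContext, people_with_names_local)

# Drop a row only when age is null; nulls in name are ignored.
by_age <- dropna(people_with_names, cols = "age")
head(by_age)

# Fill nulls per column using a named list (placeholder values chosen for illustration).
filled <- fillna(people_with_names, list("age" = 0, "name" = "unknown"))
head(filled)
```

Restricting dropna to age keeps the row with id 2, whose only missing value is in name, while the named list lets fillna use a numeric replacement for age and a string replacement for name.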