r - How to handle null entries in SparkR
I have a SparkSQL DataFrame.
Some entries in this data are empty, but they don't behave like NULL or NA. How can I remove them? Any ideas?
In R I can easily remove them, but in SparkR it says there is a problem with the S4 system/methods.
Thanks.
SparkR Column provides a long list of useful methods, including isNull and isNotNull:
    > people_local <- data.frame(id=1:4, age=c(21, 18, 30, NA))
    > people <- createDataFrame(sqlContext, people_local)
    > head(people)
      id age
    1  1  21
    2  2  18
    3  3  30
    4  4  NA
    > filter(people, isNotNull(people$age)) %>% head()
      id age
    1  1  21
    2  2  18
    3  3  30
    > filter(people, isNull(people$age)) %>% head()
      id age
    1  4  NA
Please keep in mind that there is no distinction between NA and NaN in SparkR.
If you prefer operations on a whole data frame, there is a set of NA functions, including fillna and dropna:
    > fillna(people, 99) %>% head()
      id age
    1  1  21
    2  2  18
    3  3  30
    4  4  99
    > dropna(people) %>% head()
      id age
    1  1  21
    2  2  18
    3  3  30
Both can be adjusted to consider only a subset of columns (cols), and dropna has some additional useful parameters. For example, you can specify the minimal number of non-null columns:
    > people_with_names_local <- data.frame(
        id=1:4, age=c(21, 18, 30, NA), name=c("alice", NA, "bob", NA))
    > people_with_names <- createDataFrame(sqlContext, people_with_names_local)
    > people_with_names %>% head()
      id age  name
    1  1  21 alice
    2  2  18  <NA>
    3  3  30   bob
    4  4  NA  <NA>
    > dropna(people_with_names, minNonNulls=2) %>% head()
      id age  name
    1  1  21 alice
    2  2  18  <NA>
    3  3  30   bob
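The cols argument mentioned above is not shown in the examples, so here is a rough sketch of how it might be used. It assumes the people_with_names DataFrame and the sqlContext from the previous example, a running SparkR session, and magrittr for %>%. The fill values 0 and "unknown" are arbitrary placeholders, not anything the data calls for.

```r
library(SparkR)
library(magrittr)  # provides %>%

# Assumption: people_with_names was created as in the example above.

# Drop a row only when age is missing; a missing name is ignored.
dropna(people_with_names, cols = c("age")) %>% head()

# Fill missing values in the age column only; name is left untouched.
fillna(people_with_names, 0, cols = c("age")) %>% head()

# A named list gives each column its own fill value
# (cols is ignored when value is a named list).
fillna(people_with_names, list(age = 0, name = "unknown")) %>% head()
```

Restricting the operation to specific columns is usually safer than the whole-frame defaults, since it avoids dropping rows because of missing values in columns you do not care about.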