How to handle null entries in SparkR


I have a SparkSQL DataFrame.

Some entries in the data are empty, but they don't behave like NULL or NA. How can I remove them? Any ideas?

In plain R I can remove them easily, but in SparkR there is a problem with the S4 system/methods.

Thanks.

A SparkR Column provides a long list of useful methods, including isNull and isNotNull:

```r
> library(magrittr)  # provides the %>% pipe used below
> people_local <- data.frame(id=1:4, age=c(21, 18, 30, NA))
> people <- createDataFrame(sqlContext, people_local)
> head(people)
  id age
1  1  21
2  2  18
3  3  30
4  4  NA

> filter(people, isNotNull(people$age)) %>% head()
  id age
1  1  21
2  2  18
3  3  30

> filter(people, isNull(people$age)) %>% head()
  id age
1  4  NA
```

Please keep in mind that there is no distinction between NA and NaN in SparkR.

If you prefer operations on the whole data frame, there is a set of NA functions, including fillna and dropna:

```r
> fillna(people, 99) %>% head()
  id age
1  1  21
2  2  18
3  3  30
4  4  99

> dropna(people) %>% head()
  id age
1  1  21
2  2  18
3  3  30
```

Both can be adjusted to consider only a subset of columns (cols), and dropna has some additional useful parameters. For example, you can specify the minimal number of non-null columns a row must have to be kept:

```r
> people_with_names_local <- data.frame(
+     id=1:4, age=c(21, 18, 30, NA), name=c("alice", NA, "bob", NA))
> people_with_names <- createDataFrame(sqlContext, people_with_names_local)
> people_with_names %>% head()
  id age  name
1  1  21 alice
2  2  18  <NA>
3  3  30   bob
4  4  NA  <NA>

> dropna(people_with_names, minNonNulls=2) %>% head()
  id age  name
1  1  21 alice
2  2  18  <NA>
3  3  30   bob
```
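The cols argument mentioned above isn't shown in the examples, so here is a minimal sketch of it. It assumes the Spark 1.x API used throughout this answer (sparkR.init / sparkRSQL.init and a sqlContext passed to createDataFrame); on Spark 2.0+ you would start with sparkR.session() and call createDataFrame without sqlContext. The variable names by_age and named are just for illustration.

```r
library(SparkR)
library(magrittr)

# Spark 1.x entry points (assumption: matches the sqlContext used above)
sc <- sparkR.init()
sqlContext <- sparkRSQL.init(sc)

people_with_names <- createDataFrame(
  sqlContext,
  data.frame(id = 1:4, age = c(21, 18, 30, NA),
             name = c("alice", NA, "bob", NA), stringsAsFactors = FALSE))

# Drop a row only when "age" is null; nulls in "name" are not considered
by_age <- dropna(people_with_names, cols = "age") %>% collect()

# Replace nulls in "name" only; the null in "age" is left untouched
named <- fillna(people_with_names, "unknown", cols = "name") %>% collect()
```

Restricting cols this way lets you treat columns differently, e.g. require a value in "age" while tolerating a missing "name".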
