r - Operator == is inconsistent in logical columns in data.table


Please see the following reproducible example:

    library(data.table)
    set.seed(123)
    dt <- data.table(a = rep(0.3, 10000))
    dt[, b := runif(.N) < a]
    dt[b == T, .N]
    # [1] 3005
    dt[, summary(b)]
    #    Mode   FALSE    TRUE    NA's
    # logical    6995    3005       0

Everything looks fine so far: the count of TRUE values is the same for the two methods. Now replace column b with a new one.

    dt[, b := runif(.N) < a]
    dt[b == T, .N]
    # [1] 3331
    dt[, summary(b)]
    #    Mode   FALSE    TRUE    NA's
    # logical    6981    3019       0

The count of TRUE values in column b is different! For the same column, one method gives 3331 TRUE values and the other gives 3019.

When == is bypassed:

    dt[b != F, .N]
    # [1] 3019
    dt[, summary(b)]
    #    Mode   FALSE    TRUE    NA's
    # logical    6981    3019       0

which is correct again.

I can reproduce this with data.table v1.9.4 and 1.9.5 on Windows 8.1 x64.


Here's an easier reproducible example without runif().

    require(data.table) ## 1.9.4+
    dt = data.table(x = 1:5)
    dt[, y := x <= 2L]
    #    x     y
    # 1: 1  TRUE
    # 2: 2  TRUE
    # 3: 3 FALSE
    # 4: 4 FALSE
    # 5: 5 FALSE

    dt[y == TRUE, .N]
    # [1] 2             <~~~~~~ correct result.

    dt[, y := x <= 3L]
    #    x     y
    # 1: 1  TRUE
    # 2: 2  TRUE
    # 3: 3  TRUE
    # 4: 4 FALSE
    # 5: 5 FALSE

    dt[y == TRUE, .N]
    # [1] 2             <~~~~~~ incorrect result, should be 3!

This is now fixed in v1.9.5 on GitHub.

:= and set* now drop secondary keys (new in v1.9.4), so that dt[x==y] works again after a := or set* without needing options(datatable.auto.index=FALSE). Only setkey() was dropping secondary keys correctly. 23 tests added. Thanks to user36312 for reporting, #885.
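For anyone stuck on an affected version, here is a minimal sketch of the workaround the fix note mentions, applied to the small example above. It switches off automatic secondary indexing, so each == subset does a plain vector scan, and cross-checks the count with sum(), which does not go through a == subset at all. The variable names (old, n_before, n_after, n_check, idx) are made up for illustration. The call to indices() assumes a newer data.table release where that function exists; it is an assumption and not part of the original report.

```r
library(data.table)

# Workaround named in the fix note: disable automatic secondary indices,
# remembering the previous option values so they can be restored.
old <- options(datatable.auto.index = FALSE)

dt <- data.table(x = 1:5)
dt[, y := x <= 2L]
n_before <- dt[y == TRUE, .N]   # count of TRUE values in the first y

dt[, y := x <= 3L]              # overwrite column y by reference
n_after <- dt[y == TRUE, .N]    # count of TRUE values in the new y

# Independent cross-check that does not use a `==` subset.
n_check <- dt[, sum(y)]

# Assumption: indices() is available (newer releases); it lists any
# secondary indices currently attached to dt.
idx <- indices(dt)

# Restore the previous option values.
options(old)
```

If n_after and n_check disagree, the == subset is most likely reading a stale index. Upgrading to a release that includes the fix is the cleaner option.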

