Skip to content

Commit c069ffc

Browse files
saurfangshivaram
authored andcommitted
[SPARK-12526][SPARKR] ifelse, when, otherwise` unable to take Column as value
`ifelse`, `when`, `otherwise` is unable to take `Column` typed S4 object as values. For example: ```r ifelse(lit(1) == lit(1), lit(2), lit(3)) ifelse(df$mpg > 0, df$mpg, 0) ``` will both fail with ```r attempt to replicate an object of type 'environment' ``` The PR replaces `ifelse` calls with `if ... else ...` inside the function implementations to avoid attempt to vectorize(i.e. `rep()`). It remains to be discussed whether we should instead support vectorization in these functions for consistency because `ifelse` in base R is vectorized but I cannot foresee any scenarios these functions will want to be vectorized in SparkR. For reference, added test cases which trigger failures: ```r . Error: when(), otherwise() and ifelse() with column on a DataFrame ---------- error in evaluating the argument 'x' in selecting a method for function 'collect': error in evaluating the argument 'col' in selecting a method for function 'select': attempt to replicate an object of type 'environment' Calls: when -> when -> ifelse -> ifelse 1: withCallingHandlers(eval(code, new_test_environment), error = capture_calls, message = function(c) invokeRestart("muffleMessage")) 2: eval(code, new_test_environment) 3: eval(expr, envir, enclos) 4: expect_equal(collect(select(df, when(df$a > 1 & df$b > 2, lit(1))))[, 1], c(NA, 1)) at test_sparkSQL.R:1126 5: expect_that(object, equals(expected, label = expected.label, ...), info = info, label = label) 6: condition(object) 7: compare(actual, expected, ...) 8: collect(select(df, when(df$a > 1 & df$b > 2, lit(1)))) Error: Test failures Execution halted ``` Author: Forest Fang <forest.fang@outlook.com> Closes apache#10481 from saurfang/spark-12526. (cherry picked from commit d80cc90) Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
1 parent 85a8718 commit c069ffc

File tree

3 files changed

+18
-7
lines changed

3 files changed

+18
-7
lines changed

R/pkg/R/column.R

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -215,7 +215,7 @@ setMethod("%in%",
215215

216216
#' otherwise
217217
#'
218-
#' If values in the specified column are null, returns the value.
218+
#' If values in the specified column are null, returns the value.
219219
#' Can be used in conjunction with `when` to specify a default value for expressions.
220220
#'
221221
#' @rdname otherwise
@@ -225,7 +225,7 @@ setMethod("%in%",
225225
setMethod("otherwise",
226226
signature(x = "Column", value = "ANY"),
227227
function(x, value) {
228-
value <- ifelse(class(value) == "Column", value@jc, value)
228+
value <- if (class(value) == "Column") { value@jc } else { value }
229229
jc <- callJMethod(x@jc, "otherwise", value)
230230
column(jc)
231231
})

R/pkg/R/functions.R

Lines changed: 8 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -37,7 +37,7 @@ setMethod("lit", signature("ANY"),
3737
function(x) {
3838
jc <- callJStatic("org.apache.spark.sql.functions",
3939
"lit",
40-
ifelse(class(x) == "Column", x@jc, x))
40+
if (class(x) == "Column") { x@jc } else { x })
4141
column(jc)
4242
})
4343

@@ -2262,7 +2262,7 @@ setMethod("unix_timestamp", signature(x = "Column", format = "character"),
22622262
setMethod("when", signature(condition = "Column", value = "ANY"),
22632263
function(condition, value) {
22642264
condition <- condition@jc
2265-
value <- ifelse(class(value) == "Column", value@jc, value)
2265+
value <- if (class(value) == "Column") { value@jc } else { value }
22662266
jc <- callJStatic("org.apache.spark.sql.functions", "when", condition, value)
22672267
column(jc)
22682268
})
@@ -2277,13 +2277,16 @@ setMethod("when", signature(condition = "Column", value = "ANY"),
22772277
#' @name ifelse
22782278
#' @seealso \link{when}
22792279
#' @export
2280-
#' @examples \dontrun{ifelse(df$a > 1 & df$b > 2, 0, 1)}
2280+
#' @examples \dontrun{
2281+
#' ifelse(df$a > 1 & df$b > 2, 0, 1)
2282+
#' ifelse(df$a > 1, df$a, 1)
2283+
#' }
22812284
setMethod("ifelse",
22822285
signature(test = "Column", yes = "ANY", no = "ANY"),
22832286
function(test, yes, no) {
22842287
test <- test@jc
2285-
yes <- ifelse(class(yes) == "Column", yes@jc, yes)
2286-
no <- ifelse(class(no) == "Column", no@jc, no)
2288+
yes <- if (class(yes) == "Column") { yes@jc } else { yes }
2289+
no <- if (class(no) == "Column") { no@jc } else { no }
22872290
jc <- callJMethod(callJStatic("org.apache.spark.sql.functions",
22882291
"when",
22892292
test, yes),

R/pkg/inst/tests/testthat/test_sparkSQL.R

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1120,6 +1120,14 @@ test_that("when(), otherwise() and ifelse() on a DataFrame", {
11201120
expect_equal(collect(select(df, ifelse(df$a > 1 & df$b > 2, 0, 1)))[, 1], c(1, 0))
11211121
})
11221122

1123+
test_that("when(), otherwise() and ifelse() with column on a DataFrame", {
1124+
l <- list(list(a = 1, b = 2), list(a = 3, b = 4))
1125+
df <- createDataFrame(sqlContext, l)
1126+
expect_equal(collect(select(df, when(df$a > 1 & df$b > 2, lit(1))))[, 1], c(NA, 1))
1127+
expect_equal(collect(select(df, otherwise(when(df$a > 1, lit(1)), lit(0))))[, 1], c(0, 1))
1128+
expect_equal(collect(select(df, ifelse(df$a > 1 & df$b > 2, lit(0), lit(1))))[, 1], c(1, 0))
1129+
})
1130+
11231131
test_that("group by, agg functions", {
11241132
df <- read.json(sqlContext, jsonPath)
11251133
df1 <- agg(df, name = "max", age = "sum")

0 commit comments

Comments
 (0)