Wednesday, November 3, 2010

Convert R "xtabs" object to a dataframe

Recently I used R's "xtabs" command to generate a 2-dimensional contingency table, then tried to merge the result with a dataframe I had created earlier. However, the "merge" function does not like mismatched input types, so I had to transform the xtabs result into a dataframe first. There were two challenges: (1) the regular as.data.frame did not give me what I wanted, because on a table it returns the long layout (one row per cell, with the values in a Freq column), so I had to use as.data.frame.matrix to keep the wide layout; (2) the resulting column names were not the ones I wanted. So I wrote a little R function to convert 2-dimensional xtabs objects to dataframes.
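To make the first point concrete, here is a minimal sketch of the two conversions side by side. It uses the same toy data as the example that follows; the names long and wide are only illustrative.

```r
# Same toy data as the example in this post
x <- data.frame(id = c(123, 123), model = c(2, 3), value = c(2.73, 0.36))
aa <- xtabs(value ~ id + model, data = x)

# as.data.frame() on a table gives the long layout: one row per cell,
# with the cell values in a column called Freq
long <- as.data.frame(aa)
names(long)

# as.data.frame.matrix() keeps the wide layout of the contingency table:
# levels of id become row names, levels of model become column names
wide <- as.data.frame.matrix(aa)
names(wide)
rownames(wide)
```

The wide version is closer to what I needed, but its column names are just the levels of model and the id values are hidden in the row names, which is what the function in this post cleans up.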

# define x
x <- data.frame(id=c(123,123), model=c(2,3), value=c(2.73,0.36))

# > x
#    id model value
# 1 123     2  2.73
# 2 123     3  0.36

# create a 2-dimensional xtabs object
aa <- xtabs(value~id+model, data=x)

xtabs_2_dataframe <- function(aa) {
  # recover the variable names that were used in the xtabs call
  nm_tmp <- names(dimnames(aa))
  if (length(nm_tmp) != 2) {
    # this function only handles 2-dimensional xtabs objects
    stop("the input xtabs object must have exactly 2 dimensions")
  }

  # wide data frame: one row per level of the first variable
  bb <- as.data.frame.matrix(aa)

  # prefix the column names with the second variable's name, e.g. model_2
  colnames(bb) <- paste(nm_tmp[2], colnames(aa), sep = "_")

  # move the row names (levels of the first variable) into a leading column
  cc <- data.frame(rownames(bb), bb, stringsAsFactors = FALSE)
  colnames(cc)[1] <- nm_tmp[1]
  rownames(cc) <- NULL
  cc
}

# > xtabs_2_dataframe(aa)
#    id model_2 model_3
# 1 123    2.73    0.36
# > str(xtabs_2_dataframe(aa))
# 'data.frame': 1 obs. of 3 variables:
#  $ id     : chr "123"
#  $ model_2: num 2.73
#  $ model_3: num 0.36
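Since the whole point was to merge the result with another dataframe, here is a rough sketch of that last step. The conversion is repeated inline so the snippet runs on its own, and the info dataframe with its region column is made up purely for illustration. Note that id comes back as character (see the str output above), so I convert it before merging to keep the key types aligned.

```r
x <- data.frame(id = c(123, 123), model = c(2, 3), value = c(2.73, 0.36))
aa <- xtabs(value ~ id + model, data = x)

# compact inline version of the conversion steps
wide <- as.data.frame.matrix(aa)
colnames(wide) <- paste("model", colnames(wide), sep = "_")
wide <- data.frame(id = rownames(wide), wide, stringsAsFactors = FALSE)
rownames(wide) <- NULL

# hypothetical dataframe to merge with; 'region' is invented for this example
info <- data.frame(id = c(123, 456), region = c("east", "west"),
                   stringsAsFactors = FALSE)

# id is character in 'wide' but numeric in 'info': align the types first
wide$id <- as.numeric(wide$id)
merged <- merge(info, wide, by = "id")
merged
```

By default merge keeps only the ids present in both inputs, so the made-up id 456 drops out here; passing all.x = TRUE would keep it with NA in the model columns.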


Of course, you may ask: why not use the cast function in the reshape package? Well, I tried that, and str() reports the result as a list carrying extra attributes rather than a plain data frame. So if I want a data frame, I have to process that too, for example by using do.call(cbind, ...) and assigning the proper column names.

# > cast(x, id~model+value)
# id 2_2.73 3_0.36
# 1 123 2.73 0.36
# > str(y<-cast(x, id~model+value))
# List of 3
# $ id : num 123
# $ 2_2.73: num 2.73
# $ 3_0.36: num 0.36
# - attr(*, "row.names")= int 1
# - attr(*, "idvars")= chr "id"
# - attr(*, "rdimnames")=List of 2
# ..$ :'data.frame': 1 obs. of 1 variable:
# .. ..$ id: num 123
# ..$ :'data.frame': 2 obs. of 2 variables:
# .. ..$ model: num [1:2] 2 3
# .. ..$ value: num [1:2] 2.73 0.36
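The do.call(cbind, ...) idea can be sketched roughly as follows. To keep the snippet self-contained it does not load the reshape package; instead y is a plain list standing in for the cast result, with the same three elements shown in the str output above. Whether a real cast object behaves identically is an assumption on my part.

```r
# stand-in for the cast() result: a plain list with the same elements
# as in the str() output (an assumption, not an actual cast object)
y <- list(id = 123, "2_2.73" = 2.73, "3_0.36" = 0.36)

# bind the list elements column-wise, then turn the matrix into a data frame
df <- as.data.frame(do.call(cbind, y))

# assign the column names by hand
colnames(df) <- c("id", "model_2", "model_3")
df
```

This works, but the column names still have to be supplied manually, which is why I preferred wrapping the xtabs route in a function.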
