Modify columns in a data frame in R more cleanly - maybe using with() or apply()?

1

I understand that the answer to repetitive tasks in R is usually "apply()" rather than a loop. Is there a better R design pattern for a nasty bit of code I write frequently?

When pulling tabular data from HTML, I usually need to change the data types, and I end up running something like the following, which converts the first column from decimal to date format and columns 2-4 from character strings with comma thousands separators, like "2,400,000", to numeric values like 2400000.

X[,1] <- decYY2YY(as.numeric(X[,1]))
X[,2] <- as.numeric(gsub(",", "", X[,2]))
X[,3] <- as.numeric(gsub(",", "", X[,3]))
X[,4] <- as.numeric(gsub(",", "", X[,4]))

I don't like that I have X[,number] repeated on both the left and right sides here, or that I have basically the same statement repeated for columns 2-4.

Is there a very R-style way of making X[,2] less repetitive but still loop-free? Something that sort of says "apply this to columns 2,3,4---a function that reassigns the current column to a modified version in place?"

I don't want to create a whole, repeatable cleaning function, really, just a quick anonymous function that does this with less repetition.

2012-04-03 19:38
by Mittenchops


4

Assuming X is a data frame, I would do:

X[2:4] <- lapply(X[2:4], function (x) as.numeric(gsub(",", "", x)))
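
As a minimal sketch of how this plays out on real input, the example below builds a small data frame and applies the same lapply() call. The sample data and the column names (year, sales, costs, units) are hypothetical and not from the question. Storing the column indices in a variable (here called cols, also an assumption) means the column set is only written once, even though X still appears on both sides.

```r
# Hypothetical sample data standing in for a scraped HTML table
X <- data.frame(
  year  = c("2010.5", "2011.5"),
  sales = c("2,400,000", "3,100,500"),
  costs = c("1,200,000", "1,750,250"),
  units = c("12,000", "15,500"),
  stringsAsFactors = FALSE
)

# Name the set of columns once so it is not spelled out twice
cols <- 2:4

# lapply() returns a list with one converted vector per column;
# assigning into X[cols] puts them back as data frame columns
X[cols] <- lapply(X[cols], function(x) as.numeric(gsub(",", "", x)))

str(X)
```

Column 1 is left untouched here, since its conversion in the question relies on the asker's own decYY2YY function.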
2012-04-03 20:15
by Eduardo Leoni
This is exactly what I had in mind---I don't suppose, though, that you know a way to also avoid having X[2:4] on both sides of the assignment, do you? - Mittenchops 2012-04-04 18:28
If you want to apply the transformation to every column, you could do X[] <- lapply(X, function (x) as.numeric(gsub(",", "", x))) - Eduardo Leoni 2012-04-04 20:41
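
The empty brackets in X[] matter in that last comment. A rough sketch of the difference, using made-up data and a hypothetical helper name strip_commas: assigning into X[] replaces the columns but keeps the data frame, while assigning the lapply() result directly to a name gives a plain list.

```r
# Hypothetical data in which every column needs the same cleanup
X <- data.frame(
  a = c("1,000", "2,500"),
  b = c("10,000", "3"),
  stringsAsFactors = FALSE
)

# Illustrative helper name, not taken from the thread
strip_commas <- function(x) as.numeric(gsub(",", "", x))

# Assigning into Y[] replaces the columns and keeps the data frame
Y <- X
Y[] <- lapply(Y, strip_commas)

# Assigning the result directly yields a plain list instead
Z <- lapply(X, strip_commas)

class(Y)
class(Z)
```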


2

Something like

comma2numeric <- function(x) { as.numeric(gsub(",","",x)) }
X[,2:4] <- apply(X[,2:4],2,comma2numeric)

is a start. transform is a good modify-in-place idiom, but it operates with names rather than with column numbers.
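
For comparison, here is a small sketch of the transform() idiom mentioned above, which addresses columns by name. The data and the column names (sales, costs) are hypothetical. Note that each column still has to be listed, so this does not reduce the repetition for columns 2-4; it mainly avoids the X[, n] indexing.

```r
# Hypothetical sample data; column names are assumptions for illustration
X <- data.frame(
  year  = c(2010.5, 2011.5),
  sales = c("2,400,000", "3,100,500"),
  costs = c("1,200,000", "1,750,250"),
  stringsAsFactors = FALSE
)

comma2numeric <- function(x) { as.numeric(gsub(",", "", x)) }

# transform() evaluates each expression with the columns of X in scope
# and returns a modified copy, which is assigned back to X
X <- transform(X,
               sales = comma2numeric(sales),
               costs = comma2numeric(costs))

str(X)
```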

edited: missing close-parenthesis in line 1

2012-04-03 19:45
by Ben Bolker
Is there some trick to defining functions in a single line? When I run comma2numeric <- function(x) { as.numeric(gsub(",","",x) } I get the error message Error: unexpected '}' in "comma2numeric <- function(x) { as.numeric(gsub(",","",x) }" which is fixed when I change the function definition to 3 lines, with the '}' alone on the last line - Mittenchops 2012-04-03 20:01
Mittenchops, in a one-liner you can dispense with the {}, e.g. comma2numeric <- function(x) as.numeric(gsub(",","",x)) - Eduardo Leoni 2012-04-03 20:09
I included the {} (although unnecessary) because I think it adds a bit of precision (coding styles & tastes differ). The missing close-parenthesis was the real problem - Ben Bolker 2012-04-03 20:10
I stand corrected - Eduardo Leoni 2012-04-03 23:24