Column Names Across Multiple Rows?

Go To StackoverFlow.com

2

I'm importing a csv file that consists of a crosstab with column names in a two-row hierarchy. When I get the table in R, the result looks like this:

   alpha  X.1  X.2 beta  X.1  X.2  X.3 gamma  X.1
    var1 var2 var3 var1 var2 var3 var4  var1 var4
1     21   50    5   22   48    6    8    25    8 
2     27   50    5   24   48    6    8    33    8 
3     26   50    5   28   48    6    8    33    8 
4     25   50    5   28   48    6    8    20    8

Here, alpha, beta, and gamma are all one level of the hierarchy, while var1, var2, var3, and var4 are the second level.

What I would like it do is get output like the following, where the row names are concatenated but also keeping in mind the structure of the data.

   alpha_var1  alpha_var2  alpha_var3 beta_var1  beta_var2  beta_var3  beta_var4 gamma_var1  gamma_var4
1          21          50           5        22         48          6          8         25           8 
2          27          50           5        24         48          6          8         33           8 
3          26          50           5        28         48          6          8         33           8 
4          25          50           5        28         48          6          8         20           8

Any ideas here? Haven't been able to find anything to deal with this issue. Thanks in advance.

2012-04-03 22:31
by user1202761
...what does the file look like - Tommy 2012-04-03 22:42
the answer will likely involve two calls to read.table() taking advantage of the skip and nrows parameters - Chase 2012-04-03 23:28
the file is a csv, where the X.1, X.2, etc represents blank cells. Not sure how to upload a demo file - user1202761 2012-04-04 01:12


1

This seems to work, though the use of xts seems a bit heavy handed for the na.locf() function, but I know it works and use it frequently, so that's what I used.

library(xts)
#Read in data without headers
x <- read.delim("Book1.txt", skip = 2, header = FALSE)
#Read in header files transposing them into columns
headers <- data.frame(t(read.delim("Book1.txt", nrows = 2, header = FALSE)), stringsAsFactors = FALSE)

#Create a now column with the value of alpha, beta, gama or NA
headers$vals <- with(headers, ifelse(grepl("[abg]", X1), X1, NA))
#Fill down the values above
headers$vals <- na.locf(headers$vals)
#Paste column names together
colnames(x) <- with(headers, paste(vals, X2, sep = "_"))
#Resulting object
x



 alpha_var1 alpha_var2 alpha_var3 beta_var1 beta_var2 beta_var3 beta_var4 gamma_ var1 gamma_var4
1         21         50          5        22        48         6         8          25          8
2         27         50          5        24        48         6         8          33          8
3         26         50          5        28        48         6         8          33          8
4         25         50          5        28        48         6         8          20          8
2012-04-03 23:35
by Chase
Thanks for this, but unfortunately I'm looking for a solution that is flexible to any set of column names. I've made it work for any set of characters by specifying grepl("[a-z]"...) but I'd prefer a solution that is more flexible if possible - user1202761 2012-04-04 01:24
@user1202761 - how do you plan to differentiate between the columns you want to keep and those you don't? It shouldn't be hard to find the corresponding regex to identify the columns you want to keep. Can you instead search for the pattern "X.1-9?" Something like this would work: !(grepl("X\\.\\d", x)). It's finding the pattern "X." followed by a digit...then taking the negative of that - Chase 2012-04-04 01:32
The X.1, etc are auto-generated when you import empty header cells from a csv. If you say header=FALSE, they appear as <NA>. Another issue is that if any of the columns are <NA> in both the first and second level, this fails - user1202761 2012-04-04 01:42
@user1202761 - well that simplifies things quite a bit then (and is not at all clear from your post above. Try using dput() to represent your R objects in the futre). Transpose the two rows of data, use the na.locf() function I showed you, paste them together, and stick it on top of the other data that you read in with the skip=... parameter. All of the pieces you need to solve this are here, just need to connect the dots now - Chase 2012-04-04 01:45
@user1202761 - maybe take a step back, walk around the office, then rethink what the actual problem you are encountering looks like...I find thinking through all of the possible avenues where something can go wrong helps to generate a solution to accommodate those problems as opposed to blindly writing something and updating the code every time something new goes wrong. If necessary, update your question with a representative example of your problem and what you've tried to solve it - Chase 2012-04-04 01:48
Thanks for your help with this. My example did not give enough background, as you made clear, so I appreciate you taking the time to work through it with me. I will use dput() in the future. As I should have done earlier, the help for the na.locf() function solved my problem of NA observations (set na.rm=FALSE of course). Dots connected, problems solved, thank you again - user1202761 2012-04-05 01:37