Split dataframe into multiple output files

Go To StackoverFlow.com

11

I have big dataset (but the following is small one for example). I can split the dataframe and then I want to output to multiple text file corresponding to lavel used to split.

mydata <- data.frame (var1 = rep(c("k", "l", "c"), each = 5), var2 = rnorm(5), 
      var3 = rnorm(5))
mydata    
   var1       var2       var3
1     k  0.5406022  0.3654706
2     k -0.6356879 -0.9160001
3     k  0.2946240 -0.1072241
4     k -0.2609121  0.1036626
5     k  0.6206579  0.6111655
6     l  0.5406022  0.3654706
7     l -0.6356879 -0.9160001
8     l  0.2946240 -0.1072241
9     l -0.2609121  0.1036626
10    l  0.6206579  0.6111655
11    c  0.5406022  0.3654706
12    c -0.6356879 -0.9160001
13    c  0.2946240 -0.1072241
14    c -0.2609121  0.1036626
15    c  0.6206579  0.6111655

Now split

> spt1 <- split(mydata, mydata$var1) 

> spt1

$c
   var1       var2       var3
11    c  0.5406022  0.3654706
12    c -0.6356879 -0.9160001
13    c  0.2946240 -0.1072241
14    c -0.2609121  0.1036626
15    c  0.6206579  0.6111655

$k
  var1       var2       var3
1    k  0.5406022  0.3654706
2    k -0.6356879 -0.9160001
3    k  0.2946240 -0.1072241
4    k -0.2609121  0.1036626
5    k  0.6206579  0.6111655

$l
   var1       var2       var3
6     l  0.5406022  0.3654706
7     l -0.6356879 -0.9160001
8     l  0.2946240 -0.1072241
9     l -0.2609121  0.1036626
10    l  0.6206579  0.6111655

I want to write.table in name of outputc, outputk, and outputl. Thus output is common prefix followed by name of label for grouping variable.

write.table (spt1) 
2012-04-03 21:57
by jon


12

Using lapply over the names of spt1 will allow us to access the dataframes in spt1 and the name that we can use in paste to create our files.

lapply(names(spt1), function(x){write.table(spt1[[x]], file = paste("output", x, sep = ""))})

You could add a common extension in the paste if you want.

2012-04-03 22:11
by Dason


8

You can also use a really fast data.table solution. In this case, there is no need to split the dataframe into a list.

library(data.table) # v1.9.7 (devel version)

setDT(mydata) # convert your dataframe into a data.table

# save files
  mydata[, fwrite(.SD, paste0("output", var1,".csv")), by = var1]

In case you want to keep var1 in the output, you can do this:

mydata[, fwrite(copy(.SD)[, var1 := var1] paste0("output", var1,".csv")), by = var1]

ps. note that this answer uses fwrite, which is still in the development version of data.table. Go here for install instructions. You could simply use write.csv or write.table, however you probably want a fast solution in case you're dealing with a big dataset and fwrite is certainly one of the fastest alternatives.

2016-05-19 16:03
by rafa.pereira
Maybe also worth noting: The OP wants to retain var1 in the output but .SD does not contain it. You could try c(.BY, .SD) (not sure if that works) or use the new split.data.table method (currently in the devel version https://github.com/Rdatatable/data.table/issues/1389 - Frank 2016-05-19 16:07