Transpose columns and rows using gawk


4

I am trying to transpose a really long file and I am concerned that it will not be transposed entirely.

My data looks something like this:

Thisisalongstring12345678   1   AB  abc 937 4.320194
Thisisalongstring12345678   1   AB  efg 549 0.767828
Thisisalongstring12345678   1   AB  hi  346 -4.903441
Thisisalongstring12345678   1   AB  jk  193 7.317946

I want my data to look like this:

Thisisalongstring12345678 Thisisalongstring12345678 Thisisalongstring12345678 Thisisalongstring12345678
1                         1                         1                         1
AB                        AB                        AB                        AB
abc                       efg                       hi                        jk
937                       549                       346                       193
4.320194                  0.767828                  -4.903441                 7.317946

Would the length of the first string be an issue? My actual file is much longer than this, approximately 2000 lines. Also, is it possible to rename the first string to Thisis234 and then transpose?

2012-04-04 00:08
by user1269741
If you're willing to put up with lines of 20,000 * 25 characters (or so) per column (so 100 KiB or so per line), and the applications you work with are too, then the chances are that gawk will be fine with it too. Yes, you can trim the long names; devise the algorithm and apply on output or during input - Jonathan Leffler 2012-04-04 00:50
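As a rough illustration of the "apply during input" option from the comment above, here is a minimal sketch that overwrites the first field with Thisis234 as each line is read and then transposes. The function name rename_and_transpose, the awk variable newname, and the array name cell are made up for this example, and it assumes the first field is always the string to replace.

```shell
# Sketch only: rename field 1 on input, then transpose.
# "newname" and "cell" are illustrative names, not from the original posts.
rename_and_transpose() {
  awk -v newname="Thisis234" '
  {
    $1 = newname                      # replace the long name while reading
    for (c = 1; c <= NF; c++)
      cell[c, NR] = $c                # stored as cell[field, line]
    if (NF > max_nf)
      max_nf = NF
  }
  END {
    for (r = 1; r <= max_nf; r++)     # one output row per input field
      for (c = 1; c <= NR; c++)       # one output column per input line
        printf("%s%s", cell[r, c], (c < NR ? " " : "\n"))
  }' "$@"
}

# The four sample rows from the question, fed on standard input.
rename_and_transpose <<'EOF'
Thisisalongstring12345678   1   AB  abc 937 4.320194
Thisisalongstring12345678   1   AB  efg 549 0.767828
Thisisalongstring12345678   1   AB  hi  346 -4.903441
Thisisalongstring12345678   1   AB  jk  193 7.317946
EOF
```

The whole file is held in memory before anything is printed, so memory use grows with the file size; for roughly 2000 short lines that should not be a concern.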


7

I don't see why it would not be transposed entirely, unless you run out of memory. Try the program below and see whether you run into problems.

Input:

$ cat inf.txt 
a b c d
1 2 3 4
. , + -
A B C D

Awk program:

$ cat mkt.sh
awk '
{
  for(c = 1; c <= NF; c++) {
    a[c, NR] = $c
  }
  if(max_nf < NF) {
    max_nf = NF
  }
}
END {
  for(r = 1; r <= NR; r++) {
    for(c = 1; c <= max_nf; c++) {
      printf("%s ", a[r, c])
    }
    print ""
  }
}
' inf.txt

Run:

$ ./mkt.sh 
a 1 . A 
b 2 , B 
c 3 + C 
d 4 - D 

Hope this helps.

2012-04-04 00:32
by icyrock.com
Similar to command line pivot - ghoti 2012-04-04 00:38
@ghoti Agree, it's a similar topic, different approach - good for OP to have options - icyrock.com 2012-04-04 00:42


4

This can be done with the BSD rs command:

http://www.unix.com/man-page/freebsd/1/rs/

Check out the -T option.
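
A minimal sketch of what that might look like, assuming rs is installed (it is not present on every system, so the snippet checks first). The sample data is the inf.txt content from the earlier answer; the exact column spacing rs prints is not shown here because it depends on rs's own formatting.

```shell
# Sketch: transpose whitespace-separated input with rs -T, if rs is available.
if command -v rs >/dev/null 2>&1; then
  printf 'a b c d\n1 2 3 4\n. , + -\nA B C D\n' | rs -T
else
  echo "rs not found; the awk approach works without it" >&2
fi
```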

2012-04-04 02:30
by Kaz
This is brilliant: also, available (stock) in OS X. rs has many features. I suggest reading the man page - Vincent 2015-03-21 17:43


3

I tried icyrock.com's answer, but found that I had to change:

for(r = 1; r <= NR; r++) {
  for(c = 1; c <= max_nf; c++) {

to

for(r = 1; r <= max_nf; r++) {
  for(c = 1; c <= NR; c++) {

to get the NR columns and max_nf rows. So icyrock's code becomes:

$ cat mkt.sh
awk '
{
  for(c = 1; c <= NF; c++) {
    a[c, NR] = $c
  }
  if(max_nf < NF) {
    max_nf = NF
  }
}
END {
  for(r = 1; r <= max_nf; r++) {
    for(c = 1; c <= NR; c++) {
      printf("%s ", a[r, c])
    }
    print ""
  }
}
' inf.txt

If you don't do that and use an asymmetrical input, like:

a b c d
1 2 3 4
. , + -

You get:

a 1 .
b 2 ,
c 3 +

i.e. still 3 rows and 4 columns (the last of which is blank), rather than the expected 4 rows and 3 columns.

2015-01-23 00:51
by ScubaFish


0

For @ScubaFish's and @icyrock.com's code:

"if (max_nf < NF)" seems unnecessary. I deleted it, and the code works just fine.

2017-02-25 03:15
by JeffZheng
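
Whether the test can be dropped probably depends on the input. The sketch below assumes "deleted it" means keeping the assignment max_nf = NF but removing the surrounding if; the function names with_guard and without_guard and the two-line ragged input are made up for illustration. When every line has the same number of fields the two variants behave the same, but when the last line is shorter, the unguarded version ends up with a smaller max_nf and prints fewer rows.

```shell
# With the guard: max_nf ends up as the largest field count seen on any line.
with_guard() {
  awk '{ for (c = 1; c <= NF; c++) a[c, NR] = $c
         if (max_nf < NF) max_nf = NF }
       END { for (r = 1; r <= max_nf; r++) {
               for (c = 1; c <= NR; c++) printf("%s ", a[r, c])
               print "" } }'
}

# Without the guard: max_nf is simply the field count of the last line read.
without_guard() {
  awk '{ for (c = 1; c <= NF; c++) a[c, NR] = $c
         max_nf = NF }
       END { for (r = 1; r <= max_nf; r++) {
               for (c = 1; c <= NR; c++) printf("%s ", a[r, c])
               print "" } }'
}

# Hypothetical ragged input: the last line has fewer fields than the first.
printf 'a b c d\n1 2\n' | with_guard      # 4 output rows
printf 'a b c d\n1 2\n' | without_guard   # 2 output rows; c and d are dropped
```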