Linux Utilities for Exploring Text Files

Jeffrey Horner

September 26, 2007

http://biostat.mc.vanderbilt.edu/JeffreyHorner

This isn't a shell, but you can find one close by.

Computer shell, or terminal... where we run the utilities.

$

Command prompt or command line.

$ cat

concatenate files and print on the standard output

$ cat family.txt
Jeff 35.58
Anna 35.17
Juliana 15
Nicolas 9
Joseph 7
Isabela 6
Eleni 4
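To follow along, a file with the same seven lines can be recreated roughly as below. This is only a sketch: the scratch directory from `mktemp -d` is an assumption made for illustration, and any working directory would do.

```shell
# Work in a scratch directory so no existing file is overwritten
# (an assumption for this sketch, not part of the original slides).
cd "$(mktemp -d)"

# Recreate family.txt with one "name age" pair per line.
printf '%s\n' \
  'Jeff 35.58' \
  'Anna 35.17' \
  'Juliana 15' \
  'Nicolas 9' \
  'Joseph 7' \
  'Isabela 6' \
  'Eleni 4' > family.txt

cat family.txt
```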

$ head -n 2 family.txt
Jeff 35.58
Anna 35.17

output the first part of files.

$ head -n 2 family.txt > parents.txt

output the first two lines of a file and place them in a new file.

$ cat parents.txt
Jeff 35.58
Anna 35.17

$ tail -n 5 family.txt > kids.txt

output the last five lines of a file and place them in a new file.

$ cat kids.txt parents.txt > family.txt
$ cat family.txt
Juliana 15
Nicolas 9
Joseph 7
Isabela 6
Eleni 4
Jeff 35.58
Anna 35.17
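Rebuilding family.txt this way is safe because family.txt is only the output and never one of the inputs. When a file has to be both, one cautious pattern is to write to a temporary file and rename it afterwards. The sketch below uses shortened sample data and a hypothetical file name, family.tmp.

```shell
# Scratch directory and shortened sample data (assumptions for this sketch).
cd "$(mktemp -d)"
printf '%s\n' 'Jeff 35.58' 'Anna 35.17' > parents.txt
printf '%s\n' 'Juliana 15' 'Nicolas 9' > family.txt

# Unsafe: cat family.txt parents.txt > family.txt
# The shell truncates family.txt before cat gets a chance to read it.

# Safer: write the result elsewhere, then replace the original.
cat family.txt parents.txt > family.tmp && mv family.tmp family.txt

cat family.txt
```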

$ wc family.txt
 7 14 70 family.txt

print the number of newlines, words, and bytes in a file.
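Each of the three counts can also be requested on its own with the `-l`, `-w`, and `-c` options. A minimal sketch, assuming the reordered seven-line file is recreated in a scratch directory; reading from standard input keeps the file name out of the output.

```shell
# Scratch directory and file contents are assumptions for this sketch.
cd "$(mktemp -d)"
printf '%s\n' 'Juliana 15' 'Nicolas 9' 'Joseph 7' 'Isabela 6' \
  'Eleni 4' 'Jeff 35.58' 'Anna 35.17' > family.txt

lines=$(wc -l < family.txt)   # newline count
words=$(wc -w < family.txt)   # whitespace-separated words
bytes=$(wc -c < family.txt)   # bytes

echo "lines=$lines words=$words bytes=$bytes"
```

The values should be 7, 14, and 70; some wc implementations pad the numbers with leading spaces.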

$ od

write an unambiguous representation, octal bytes by default, of a file to standard output.

$ od family.txt
0000000 072512 064554 067141 020141 032461 047012 061551 066157
0000020 071541 034440 045012 071557 070145 020150 005067 071511
0000040 061141 066145 020141 005066 066105 067145 020151 005064
0000060 062512 063146 031440 027065 034065 040412 067156 020141
0000100 032463 030456 005067
0000106

Looks ambiguous to me.
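Part of the confusion is that od, by default, groups the input into two-byte words. As a sketch, the `-t o1` option prints one octal number per byte and `-A n` drops the offset column, which makes the values easier to match against an ASCII table.

```shell
# One octal value per byte: J is 112 and u is 165 in ASCII.
printf 'Ju' | od -A n -t o1

# Default format: the same two bytes packed into one 16-bit word.
# On a little-endian machine this prints 072512, the first word of the dump above.
printf 'Ju' | od -A n
```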

$ od -c family.txt
0000000   J   u   l   i   a   n   a       1   5  \n   N   i   c   o   l
0000020   a   s       9  \n   J   o   s   e   p   h       7  \n   I   s
0000040   a   b   e   l   a       6  \n   E   l   e   n   i       4  \n
0000060   J   e   f   f       3   5   .   5   8  \n   A   n   n   a
0000100   3   5   .   1   7  \n
0000106

$ od -c family.txt | cut -b 1-34
0000000   J   u   l   i   a   n
0000020   a   s       9  \n   J
0000040   a   b   e   l   a
0000060   J   e   f   f       3
0000100   3   5   .   1   7  \n
0000106

remove sections of each line of standard input.
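Besides byte ranges, cut can select delimiter-separated fields. A minimal sketch, assuming the file uses exactly one space between name and age:

```shell
# Scratch directory and file contents are assumptions for this sketch.
cd "$(mktemp -d)"
printf '%s\n' 'Juliana 15' 'Nicolas 9' 'Joseph 7' 'Isabela 6' \
  'Eleni 4' 'Jeff 35.58' 'Anna 35.17' > family.txt

# -d sets the field delimiter to a single space;
# -f 1 keeps only the first field (the name).
cut -d ' ' -f 1 family.txt
```

cut treats every delimiter as a field boundary, so input with runs of spaces would need awk or a different approach.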

$ man ascii

ASCII is the American Standard Code for Information Interchange. It is a 7-bit code. Many 8-bit codes (such as ISO 8859-1, the Linux default character set) contain ASCII as their lower half. The international counterpart of ASCII is known as ISO 646.

$ sort family.txt

Anna 35.17
Eleni 4
Isabela 6
Jeff 35.58
Joseph 7
Juliana 15
Nicolas 9

$ sort -k 2 family.txt

Juliana 15
Anna 35.17
Jeff 35.58
Eleni 4
Isabela 6
Joseph 7
Nicolas 9

$ sort -n -k 2 family.txt

Eleni 4
Isabela 6
Joseph 7
Nicolas 9
Juliana 15
Anna 35.17
Jeff 35.58
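Without `-n`, sort compares the second field as text, which is why 15 lands before 35.17 and 4. With `-n` it compares numeric values. As a sketch, adding `-r` reverses the order to list the oldest first, and `-k 2,2` limits the key to the second field only.

```shell
# Scratch directory and file contents are assumptions for this sketch.
cd "$(mktemp -d)"
printf '%s\n' 'Juliana 15' 'Nicolas 9' 'Joseph 7' 'Isabela 6' \
  'Eleni 4' 'Jeff 35.58' 'Anna 35.17' > family.txt

# LC_ALL=C pins the locale so "." is treated as the decimal point
# (a precaution assumed here; the slides do not set it).
LC_ALL=C sort -k 2,2 -n -r family.txt
```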

$ cat family.txt | awk '{print $2}'

15
9
7
6
4
35.58
35.17

$ cat family.txt | awk '{print NF}'

2
2
2
2
2
2
2

$ cat family.txt

Juliana 15 Girl
Nicolas 9 Boy
Joseph 7 Boy
Isabela 6 Girl
Eleni 4 Girl
Jeff 35.58
Anna 35.17

$ cat family.txt | awk '{print NF}'

3
3
3
3
3
2
2

$ cat family.txt | awk '{print NF}' | uniq -c

      5 3
      2 2
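uniq only collapses adjacent duplicate lines, so the tidy counts above depend on the three-field lines sitting together in the file. A minimal sketch, using a hypothetical interleaved ordering of a few rows, shows the difference that sorting first makes:

```shell
# Scratch directory and the interleaved sample rows are assumptions for this sketch.
cd "$(mktemp -d)"
printf '%s\n' 'Juliana 15 Girl' 'Jeff 35.58' 'Nicolas 9 Boy' 'Anna 35.17' > family.txt

# Field counts arrive as 3, 2, 3, 2: no adjacent repeats, so uniq reports four groups.
awk '{print NF}' family.txt | uniq -c

# Sorting first brings equal values together, giving one group per distinct count.
awk '{print NF}' family.txt | sort -n | uniq -c
```

awk can read the file directly, so the leading cat is optional.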
