Linux Utilities for Exploring Text Files
Jeffrey Horner
September 26, 2007
http://biostat.mc.vanderbilt.edu/JeffreyHorner
This isn't a shell, but you can find one close by.
A computer shell, or terminal, is where we run the utilities.
$
Command prompt or command line.
$ cat
concatenate files and print on the standard output
$ cat family.txt
Jeff 35.58
Anna 35.17
Juliana 15
Nicolas 9
Joseph 7
Isabela 6
Eleni 4
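To follow along, the original family.txt can be rebuilt with printf. This is a sketch: the talk does not say how the file was created, and the scratch directory made with mktemp -d (common on Linux and BSD, though not part of POSIX) is an added assumption so that an existing family.txt is not overwritten.

```shell
# Assumption: work in a throwaway directory so no real family.txt is clobbered.
cd "$(mktemp -d)" || exit 1

# One name and age per line, separated by a single space.
printf '%s\n' 'Jeff 35.58' 'Anna 35.17' 'Juliana 15' 'Nicolas 9' \
    'Joseph 7' 'Isabela 6' 'Eleni 4' > family.txt

cat family.txt
```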
$ head -n 2 family.txt
Jeff 35.58
Anna 35.17
output the first part of files.
$ head -n 2 family.txt > parents.txt
output the first two lines of a file and place them in a new file.
$ cat parents.txt
Jeff 35.58
Anna 35.17
$ tail -n 5 family.txt > kids.txt
output the last five lines of a file and place them in a new file.
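head and tail can also be combined. As a sketch (rebuilding the original seven-line family.txt in a scratch directory, an assumption made so the block runs on its own), tail -n +K starts at line K rather than counting from the end, and piping head into tail picks out a range of lines:

```shell
# Assumption: scratch directory via mktemp -d, original line order.
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'Jeff 35.58' 'Anna 35.17' 'Juliana 15' 'Nicolas 9' \
    'Joseph 7' 'Isabela 6' 'Eleni 4' > family.txt

# Everything from line 3 onward: here the same five lines as tail -n 5.
tail -n +3 family.txt

# Lines 3 and 4 only: keep the first four lines, then the last two of those.
head -n 4 family.txt | tail -n 2
```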
$ cat kids.txt parents.txt > family.txt
$ cat family.txt
Juliana 15
Nicolas 9
Joseph 7
Isabela 6
Eleni 4
Jeff 35.58
Anna 35.17
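Overwriting family.txt this way works because family.txt is only the output, not one of cat's inputs: the shell truncates the target of > before cat starts, so naming family.txt on both sides would lose its contents. When a file has to be rebuilt from itself, one common pattern is to write a temporary file and rename it. In this sketch the name family.tmp and the scratch directory are assumptions; it moves the parents back to the top.

```shell
# Assumption: scratch directory, and family.txt in its kids-first order.
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'Juliana 15' 'Nicolas 9' 'Joseph 7' 'Isabela 6' 'Eleni 4' \
    'Jeff 35.58' 'Anna 35.17' > family.txt

# Last two lines (parents) first, then the first five (kids).
# Writing to the hypothetical family.tmp means family.txt is never
# truncated while it is still being read; mv replaces it afterwards.
{ tail -n 2 family.txt; head -n 5 family.txt; } > family.tmp &&
    mv family.tmp family.txt

cat family.txt
```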
$ wc family.txt
 7 14 70 family.txt
print the number of newlines, words, and bytes in a file.
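wc can also report one count at a time with -l, -w or -c. Reading the file on standard input (< family.txt) leaves the filename out of the output; some implementations still pad the number with spaces, so this sketch strips them with tr. The 70 bytes are simply each line's characters plus one newline per line, which awk can confirm. The scratch directory is again an assumption for self-containment.

```shell
# Assumption: scratch directory, family.txt in its kids-first order.
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'Juliana 15' 'Nicolas 9' 'Joseph 7' 'Isabela 6' 'Eleni 4' \
    'Jeff 35.58' 'Anna 35.17' > family.txt

lines=$(wc -l < family.txt | tr -d ' ')   # newline count: 7
words=$(wc -w < family.txt | tr -d ' ')   # whitespace-separated words: 14
bytes=$(wc -c < family.txt | tr -d ' ')   # byte count: 70
echo "$lines $words $bytes"

# Sum of line lengths plus one newline each; should equal the byte count.
awk '{ n += length($0) + 1 } END { print n }' family.txt
```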
$ od
write an unambiguous representation, octal bytes by default, of a file to standard output.
$ od family.txt
0000000 072512 064554 067141 020141 032461 047012 061551 066157
0000020 071541 034440 045012 071557 070145 020150 005067 071511
0000040 061141 066145 020141 005066 066105 067145 020151 005064
0000060 062512 063146 031440 027065 034065 040412 067156 020141
0000100 032463 030456 005067
0000106
Looks ambiguous to me.
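The default output looks ambiguous because od reads the file two bytes at a time and prints each pair as a single octal number. On a little-endian machine such as x86 (an assumption about the machine used for the slides, though it matches the numbers), the second byte of each pair is the high-order byte, so the first word, 072512, is 'u' (code 117) times 256 plus 'J' (code 74). A quick check with shell arithmetic:

```shell
# A leading quote makes printf print the character's numeric code.
j=$(printf '%d' "'J")   # 74 (hex 4a)
u=$(printf '%d' "'u")   # 117 (hex 75)

# Assumption: little-endian word, so the second byte is the high byte.
printf '%06o\n' $(( u * 256 + j ))   # 072512, od's first word

# One byte per column in hex sidesteps the pairing altogether.
printf 'Ju' | od -t x1
```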
$ od -c family.txt
0000000   J   u   l   i   a   n   a       1   5  \n   N   i   c   o   l
0000020   a   s       9  \n   J   o   s   e   p   h       7  \n   I   s
0000040   a   b   e   l   a       6  \n   E   l   e   n   i       4  \n
0000060   J   e   f   f       3   5   .   5   8  \n   A   n   n   a
0000100   3   5   .   1   7  \n
0000106
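The offset column is octal too: 0000020 is byte 16, and the final 0000106 is 70, the same total wc reported. printf reads an argument with a leading 0 as octal, and od's -A d option prints decimal offsets instead. A sketch, with the scratch directory as an assumption:

```shell
# Assumption: scratch directory, family.txt in its kids-first order.
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'Juliana 15' 'Nicolas 9' 'Joseph 7' 'Isabela 6' 'Eleni 4' \
    'Jeff 35.58' 'Anna 35.17' > family.txt

# Leading 0 means octal: 0106 is 70 in decimal.
printf '%d\n' 0106

# Same dump with decimal offsets; the last line is the total byte count.
od -A d -c family.txt
```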
$ od -c family.txt | cut -b 1-34
0000000   J   u   l   i   a   n
0000020   a   s       9  \n   J
0000040   a   b   e   l   a
0000060   J   e   f   f       3
0000100   3   5   .   1   7  \n
0000106
remove sections from each line of files or standard input; -b 1-34 keeps only bytes 1 through 34 of each line.
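cut -b selects by byte position, which is why each row above stops after six characters. For space-separated data such as family.txt, cutting by field is often handier: -d names the delimiter and -f the fields. A sketch on the kids-first family.txt, assuming a scratch directory and exactly one space between columns:

```shell
# Assumption: scratch directory, family.txt in its kids-first order.
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'Juliana 15' 'Nicolas 9' 'Joseph 7' 'Isabela 6' 'Eleni 4' \
    'Jeff 35.58' 'Anna 35.17' > family.txt

# Field 1 (names). cut treats every single space as a separator,
# so this relies on exactly one space between the columns.
cut -d ' ' -f 1 family.txt

# Field 2 (ages).
cut -d ' ' -f 2 family.txt
```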
$ man ascii
ASCII is the American Standard Code for Information Interchange. It is a 7-bit code. Many 8-bit codes (such as ISO 8859-1, the Linux default character set) contain ASCII as their lower half. The international counterpart of ASCII is known as ISO 646.
$ sort family.txt
Anna 35.17
Eleni 4
Isabela 6
Jeff 35.58
Joseph 7
Juliana 15
Nicolas 9
$ sort -k 2 family.txt
Juliana 15
Anna 35.17
Jeff 35.58
Eleni 4
Isabela 6
Joseph 7
Nicolas 9
$ sort -n -k 2 family.txt
Eleni 4
Isabela 6
Joseph 7
Nicolas 9
Juliana 15
Anna 35.17
Jeff 35.58
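Without -n, sort compares the key as text, character by character, so 15 lands before 4 because the character 1 sorts before 4. Two further details: -k 2 means "from field 2 to the end of the line" (-k 2,2 stops at field 2), and -n reads the decimal point according to the locale, so this sketch pins LC_ALL=C for predictable results (an added assumption, as is the scratch directory). Appending r to the key reverses the order.

```shell
# Assumption: scratch directory, family.txt in its kids-first order.
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'Juliana 15' 'Nicolas 9' 'Joseph 7' 'Isabela 6' 'Eleni 4' \
    'Jeff 35.58' 'Anna 35.17' > family.txt

# C locale: '.' is the decimal point and text order is byte order.
export LC_ALL=C

# Text comparison from field 2 onward: "15" < "35.17" < "35.58" < "4" ...
sort -k 2 family.txt

# Numeric comparison of field 2 only, largest first.
sort -k 2,2nr family.txt
```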
$ cat family.txt | awk '{print $2}'
15
9
7
6
4
35.58
35.17
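awk splits each line into fields $1, $2, ... on whitespace, and a pattern in front of the braces chooses which lines the action runs on; awk can also read the file directly instead of through cat. Two one-liners in that spirit, sketched on the kids-first family.txt in a scratch directory (both assumptions for self-containment):

```shell
# Assumption: scratch directory, family.txt in its kids-first order.
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'Juliana 15' 'Nicolas 9' 'Joseph 7' 'Isabela 6' 'Eleni 4' \
    'Jeff 35.58' 'Anna 35.17' > family.txt

# Names of everyone under 18; numeric-looking fields compare as numbers.
awk '$2 < 18 { print $1 }' family.txt

# Total of the ages; the END block runs once after the last line.
awk '{ sum += $2 } END { print sum }' family.txt   # 111.75
```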
$ cat family.txt | awk '{print NF}'
2
2
2
2
2
2
2
$ cat family.txt
Juliana 15 Girl
Nicolas 9 Boy
Joseph 7 Boy
Isabela 6 Girl
Eleni 4 Girl
Jeff 35.58
Anna 35.17
family.txt after adding a third field (Girl or Boy) to each child's line.
$ cat family.txt | awk '{print NF}'
3
3
3
3
3
2
2
$ cat family.txt | awk '{print NF}' | uniq -c
      5 3
      2 2
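uniq -c merges only identical lines that are adjacent, which is why the counts above come out cleanly: all the 3s come before the 2s. When equal values may be scattered, sort first. A sketch on the third field, which only the children's lines have (scratch directory assumed, as before):

```shell
# Assumption: scratch directory, family.txt with the added third field.
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'Juliana 15 Girl' 'Nicolas 9 Boy' 'Joseph 7 Boy' \
    'Isabela 6 Girl' 'Eleni 4 Girl' 'Jeff 35.58' 'Anna 35.17' > family.txt

# Unsorted: Girl, Boy Boy, Girl Girl form three separate runs.
awk 'NF == 3 { print $3 }' family.txt | uniq -c

# Sorted first, equal values are adjacent: 2 Boy, 3 Girl.
awk 'NF == 3 { print $3 }' family.txt | sort | uniq -c
```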