33
1 Al Aho The Evolution of AWK: Computational Thinking in Programming Language Design Alfred V. Aho [email protected] COMS W4995 Design - Bjarne Stroustrup April 24, 2015

1 Al Aho The Evolution of AWK: Computational Thinking in Programming Language Design Alfred V. Aho [email protected] COMS W4995 Design - Bjarne Stroustrup

Embed Size (px)

Citation preview

Page 1: 1 Al Aho The Evolution of AWK: Computational Thinking in Programming Language Design Alfred V. Aho aho@cs.columbia.edu COMS W4995 Design - Bjarne Stroustrup

1 Al Aho

The Evolution of AWK:Computational Thinking in

Programming Language Design

Alfred V. [email protected]

COMS W4995 Design - Bjarne Stroustrup April 24, 2015

Page 2: 1 Al Aho The Evolution of AWK: Computational Thinking in Programming Language Design Alfred V. Aho aho@cs.columbia.edu COMS W4995 Design - Bjarne Stroustrup

2 Al Aho

Computational Thinking

Computational thinking is a fundamental skill for everyone, not just for computer scientists. To reading, writing, and arithmetic, we should add computational thinking to every child’s analytical ability. Just as the printing press facilitated the spread of the three Rs, what is appropriately incestuous about this vision is that computing and computers facilitate the spread of computational thinking.

Jeannette M. WingComputational Thinking

CACM, vol. 49, no. 3, pp. 33-35, 2006

Page 3: 1 Al Aho The Evolution of AWK: Computational Thinking in Programming Language Design Alfred V. Aho aho@cs.columbia.edu COMS W4995 Design - Bjarne Stroustrup

3 Al Aho

What is Computational Thinking?

The thought processes involved in formulating problems so their solutions can be represented as computation steps and algorithms.

Alfred V. AhoComputation and Computational Thinking

The Computer Journal, vol. 55, no. 7, pp. 832- 835, 2012

Page 4: 1 Al Aho The Evolution of AWK: Computational Thinking in Programming Language Design Alfred V. Aho aho@cs.columbia.edu COMS W4995 Design - Bjarne Stroustrup

4 Al Aho

Software in Our World Today

How much software does the world use today?

Guesstimate: over one trillion lines of source code

What is the sunk cost of the legacy base?

$100 per line of finished, tested source code

How many bugs are there in the legacy base?

10 to 10,000 defects per million lines of source code

A. V. AhoSoftware and the Future of Programming Languages

Science, February 27, 2004, pp. 1131-1133

Page 5: 1 Al Aho The Evolution of AWK: Computational Thinking in Programming Language Design Alfred V. Aho aho@cs.columbia.edu COMS W4995 Design - Bjarne Stroustrup

5 Al Aho

Programming Languages Today

Today there are thousands of programming languages.

The website http://www.99-bottles-of-beer.net has programs in over 1,500 different programming languages and variations to generate the lyrics to the song “99 Bottles of Beer.”

Page 6: 1 Al Aho The Evolution of AWK: Computational Thinking in Programming Language Design Alfred V. Aho aho@cs.columbia.edu COMS W4995 Design - Bjarne Stroustrup

6 Al Aho

“99 Bottles of Beer”99 bottles of beer on the wall, 99 bottles of beer.Take one down and pass it around, 98 bottles of beer on the wall.

98 bottles of beer on the wall, 98 bottles of beer.Take one down and pass it around, 97 bottles of beer on the wall.

.

.

.2 bottles of beer on the wall, 2 bottles of beer.Take one down and pass it around, 1 bottle of beer on the wall.

1 bottle of beer on the wall, 1 bottle of beer.Take one down and pass it around, no more bottles of beer on

the wall.

No more bottles of beer on the wall, no more bottles of beer.Go to the store and buy some more, 99 bottles of beer on the

wall.[Traditional]

Page 7: 1 Al Aho The Evolution of AWK: Computational Thinking in Programming Language Design Alfred V. Aho aho@cs.columbia.edu COMS W4995 Design - Bjarne Stroustrup

7 Al Aho

“99 Bottles of Beer” in C++#include <iostream>using namespace std;

int main() { int bottles = 99; while ( bottles > 0 ) { cout << bottles << " bottle(s) of beer on the wall," << endl; cout << bottles << " bottle(s) of beer." << endl; cout << "Take one down, pass it around," << endl; cout << --bottles << " bottle(s) of beer on the wall." << endl; } return 0; }

[Tim Robinson, http://www.99-bottles-of-beer.net/language-c++-109.html]

Page 8: 1 Al Aho The Evolution of AWK: Computational Thinking in Programming Language Design Alfred V. Aho aho@cs.columbia.edu COMS W4995 Design - Bjarne Stroustrup

8 Al Aho

Stroustrup’s Version #include <iostream> using namespace std;

int main() {     for (int bottles = 99;  bottles>0; --bottles)         cout << bottles << " bottle(s) of beer on the wall,\n"             << bottles << " bottle(s) of beer.\n"             << "Take one down, pass it around,\n"             << " bottle(s) of beer on the wall.\n\n"; }

[Bjarne Stroustrup, personal communication, April 17, 2015]

Page 9: 1 Al Aho The Evolution of AWK: Computational Thinking in Programming Language Design Alfred V. Aho aho@cs.columbia.edu COMS W4995 Design - Bjarne Stroustrup

9 Al Aho

“99 Bottles of Beer” in the Whitespace language

[Andrew Kemp, http://compsoc.dur.ac.uk/whitespace/]

Page 10: 1 Al Aho The Evolution of AWK: Computational Thinking in Programming Language Design Alfred V. Aho aho@cs.columbia.edu COMS W4995 Design - Bjarne Stroustrup

10 Al Aho

Why Are There So Many Languages?

• One language cannot serve all application areas well– e.g., programming web pages (JavaScript)– e.g., electronic design automation (VHDL)– e.g., parser generation (YACC)

• Programmers often have strongly held opinions about– what makes a good language– how programming should be done

• There is no universally accepted metric for a good language!

Page 11: 1 Al Aho The Evolution of AWK: Computational Thinking in Programming Language Design Alfred V. Aho aho@cs.columbia.edu COMS W4995 Design - Bjarne Stroustrup

11 Al Aho

Evolution of Programming Languages

1970Fortran

Lisp

Cobol

Algol 60

APL

Snobol 4

Simula 67

Basic

PL/1

Pascal

2015Java

C

C++

Objective-C

C#

JavaScript

PHP

Python

Visual Basic

Visual Basic .NET

TIOBE Index April 2015

2015JavaPHP

PythonC#

C++C

JavaScriptObjective-C

MATLABR

PYPL Index April 2015

Page 12: 1 Al Aho The Evolution of AWK: Computational Thinking in Programming Language Design Alfred V. Aho aho@cs.columbia.edu COMS W4995 Design - Bjarne Stroustrup

12 Al Aho

Evolutionary Forces on Languages

Increasing diversity of applications

Stress on increasing programmer productivity and shortening time to market

Need to improve software security, reliability and maintainability

Emphasis on mobility and distribution

Support for parallelism and concurrency

New mechanisms for modularity

Trend toward multi-paradigm programming

Page 13: 1 Al Aho The Evolution of AWK: Computational Thinking in Programming Language Design Alfred V. Aho aho@cs.columbia.edu COMS W4995 Design - Bjarne Stroustrup

13 Al Aho

Models of Computation in Languages

Early programming languages usually had only one model of computation:

Fortran (1957): Procedural

Lisp (1958): Functional

Simula (1967): Object oriented

Prolog (1972): Logic

SQL (1974): Relational algebra

Page 14: 1 Al Aho The Evolution of AWK: Computational Thinking in Programming Language Design Alfred V. Aho aho@cs.columbia.edu COMS W4995 Design - Bjarne Stroustrup

14 Al Aho

Models of Computation in Languages

New programming languages are often designed around several models of computation

And legacy languages are incorporating additional models of computation to support multiple programming paradigms

Page 15: 1 Al Aho The Evolution of AWK: Computational Thinking in Programming Language Design Alfred V. Aho aho@cs.columbia.edu COMS W4995 Design - Bjarne Stroustrup

15 Al Aho

Example: Elm

• Elm is a functional programming language for declaratively creating web browser based graphical user interfaces.

• It uses functional reactive programming and purely functional graphical layout to build user interfaces without any destructive updates.

• Elm was designed in 2012 by Evan Czaplicki.• The key features in Elm are signals, immutability, static types, and interoperability with HTML, CSS, and JavaScript.

elm-lang.org

Page 16: 1 Al Aho The Evolution of AWK: Computational Thinking in Programming Language Design Alfred V. Aho aho@cs.columbia.edu COMS W4995 Design - Bjarne Stroustrup

16 Al Aho

Example: Rust

• Rust is a general-purpose, multi-paradigm, compiled programming language.

• It is designed to be a safe, concurrent, practical language.

• First pre-alpha release of the Rust compiler was in 2012.

• It supports pure-functional, concurrent-actor, imperative-procedural, and object-oriented programming styles.

• Rust was originally designed by Graydon Hoare and is supported by Mozilla Research.

• It advertises itself as “a systems programming language that runs blazingly fast, prevents almost all crashes, and eliminates data races.”

www.rust-lang.org

Page 17: 1 Al Aho The Evolution of AWK: Computational Thinking in Programming Language Design Alfred V. Aho aho@cs.columbia.edu COMS W4995 Design - Bjarne Stroustrup

17 Al Aho

Example: Swift

• Swift is Apple’s new programming language for iOS and OS X whose code is designed to work with Objective-C

• It was designed with code safety and performance in mind.

• Some of the features of Swift include– named parameters– inferred types– modules– automatic memory management– closures with unified function pointers– functional programming patterns like map and filter

https://developer.apple.com/library/ios/documentation/Swift/Conceptual/Swift_Programming_Language/

Page 18: 1 Al Aho The Evolution of AWK: Computational Thinking in Programming Language Design Alfred V. Aho aho@cs.columbia.edu COMS W4995 Design - Bjarne Stroustrup

18 Al Aho

• AWK is a scripting language for routine data-processing tasks designed by Al Aho, Brian Kernighan, Peter Weinberger at Bell Labs around 1977

• Each of the co-designers had slightly different motivations–Aho wanted a generalized grep–Kernighan wanted a programmable editor–Weinberger wanted a database query tool

• All co-designers wanted a simple, easy-to-use language

The Birth of AWK

Page 19: 1 Al Aho The Evolution of AWK: Computational Thinking in Programming Language Design Alfred V. Aho aho@cs.columbia.edu COMS W4995 Design - Bjarne Stroustrup

19 Al Aho

Kleene Regular Expressions

Regular Expression Matches

c the character c itself except when it is (, ), or *

r1 | r2 r1 or r2

r1 r2 r1 followed by r2

r * zero or more instances of r

( r ) r

• ‘|’ has lowest precedence, then concatenation, then *• ‘|’ and concatentation are left associative, * is right associative

• For example, a | b*c = (a) | ((b*) c)

Page 20: 1 Al Aho The Evolution of AWK: Computational Thinking in Programming Language Design Alfred V. Aho aho@cs.columbia.edu COMS W4995 Design - Bjarne Stroustrup

20 Al Aho

Kleene Regular Expressions andFinite Automata are Equivalent

• The set of strings denoted by any Kleene regular expression can be recognized by a deterministic finite automaton.

• The set of strings recognized by any finite automaton can be denoted by a regular expression.

Page 21: 1 Al Aho The Evolution of AWK: Computational Thinking in Programming Language Design Alfred V. Aho aho@cs.columbia.edu COMS W4995 Design - Bjarne Stroustrup

21 Al Aho

Grep Regular Expressions

Regular Expression Matches

c the character c itself except when c is . [ ] ^ $ * \

r1r2 r1 followed by r2

r * zero or more instances of r

. any character

^ beginning of line when ^ is first character in regexp

$ end of line when $ is last character in regexp

[abc] an a, b, or c

[a-z] any lower-case letter

[^abc] any character except an a, b, or c

[^0-9] any character that is not a digit

\c c unless c is ( ) or a digit

\( r \) tagged regular expression that matches r the matched strings are available as \1, \2, etc.

Page 22: 1 Al Aho The Evolution of AWK: Computational Thinking in Programming Language Design Alfred V. Aho aho@cs.columbia.edu COMS W4995 Design - Bjarne Stroustrup

22 Al Aho

Back Referencing in Grep Regular Expressions

Grep regular expressions can match non-regular languages:

^\([ab]*\)\1$ matches strings of the form xx where x is any string of a’s and b’s

Back referencing makes the string pattern matching problem NP-complete:

Theorem: Let r be a grep regular expression with back referencing and s an input string. Determining whether s contains a substring matched by r is NP-complete.

Proof. Reduction from vertex-cover. See A. V. Aho, “Algorithms for finding patterns in strings”, Handbook of Theoretical Computer Science, MIT Press 1990, pp. 255-300.

Page 23: 1 Al Aho The Evolution of AWK: Computational Thinking in Programming Language Design Alfred V. Aho aho@cs.columbia.edu COMS W4995 Design - Bjarne Stroustrup

23 Al Aho

Egrep Regular Expressions

• Started with grep regular expressions except for back referencing

• Added ‘|’ for union as in Kleene regular expressions

• Added parentheses for grouping as in Kleene regular expressions

• Current egrep uses POSIX regular expressions.

Page 24: 1 Al Aho The Evolution of AWK: Computational Thinking in Programming Language Design Alfred V. Aho aho@cs.columbia.edu COMS W4995 Design - Bjarne Stroustrup

24 Al Aho

Egrep Regular Expression Pattern-Matching Algorithm

• Constructs the transitions for a deterministic finite automaton on demand from the regular expression using an LR(0)-like algorithm

• Uses a fixed size cache to store the transitions of the DFA

• Adds a transition in a given state on a given input character only when it is needed

• When the cache becomes full, it flushes it and adds transitions to the empty cache as needed

• Observed time complexity given a regular expression r and an input string s is O(|r| + |s|). It is an open question if this can be achieved in the worst case.

Page 25: 1 Al Aho The Evolution of AWK: Computational Thinking in Programming Language Design Alfred V. Aho aho@cs.columbia.edu COMS W4995 Design - Bjarne Stroustrup

25 Al Aho

• Prototypical use cases–selection: “print all lines containing the word AWK in the first field”

$1 ~ /AWK/

–transformation: “print the second and first field of every line”

{ print $2, $1 }

–report generation: “sum the values in the first field of every line and then print the sum and average”

{ sum += $1 }

END { print "sum = " sum, "avg = " sum/NR }

The Evolution of AWK

Page 26: 1 Al Aho The Evolution of AWK: Computational Thinking in Programming Language Design Alfred V. Aho aho@cs.columbia.edu COMS W4995 Design - Bjarne Stroustrup

26 Al Aho

• An AWK program is a sequence of pattern-action statements

pattern { action }pattern { action }. . .

• Each pattern is a boolean combination of regular, numeric, and string expressions

• An action is a C-like program–If there is no { action }, the default is to print the line

• Invocationawk ‘program’ [file1 file2 . . . ] awk –f progfile [file1 file2 . . . ]

Structure and Invocation of an AWK Program

Page 27: 1 Al Aho The Evolution of AWK: Computational Thinking in Programming Language Design Alfred V. Aho aho@cs.columbia.edu COMS W4995 Design - Bjarne Stroustrup

27 Al Aho

for each file

for each line of the current file

for each pattern in the AWK program

if the pattern matches the input line then

execute the associated action

AWK’s Model of Computation:Pattern-Action Programming

Page 28: 1 Al Aho The Evolution of AWK: Computational Thinking in Programming Language Design Alfred V. Aho aho@cs.columbia.edu COMS W4995 Design - Bjarne Stroustrup

28 Al Aho

• Input is read automatically across multiple files–lines are split into fields $1, $2, . . ., $NF–whole line is $0

• Variables are dynamic and can contain string or numeric values or both–no declarations: types determined by context and use–initialized to 0 and empty string–built-in variables for frequently used values

• Operators work on strings or numbers–coerce type/value according to context–what does $1 == $2 mean?

• Associative arrays take arbitrary subscripts• Regular expressions as in egrep

AWK in a Nutshell - I

Page 29: 1 Al Aho The Evolution of AWK: Computational Thinking in Programming Language Design Alfred V. Aho aho@cs.columbia.edu COMS W4995 Design - Bjarne Stroustrup

29 Al Aho

• Control-flow operators similar to C–if-else, while, for, do

• Built-in functions for arithmetic, string processing, regular expressions, editing text, . . .

• Supports user-defined functions

• printf for formatted output

• getline for input from files or processors

AWK in a Nutshell - II

Page 30: 1 Al Aho The Evolution of AWK: Computational Thinking in Programming Language Design Alfred V. Aho aho@cs.columbia.edu COMS W4995 Design - Bjarne Stroustrup

30 Al Aho

1. Print the total number of input linesEND { print NR }

2. Print the last field of every line{ print $NF }

3. Print each line preceded by its line number{ print NR, $0 }

4. Print all non-empty linesNF > 0

5. What does this AWK program do?!x[$0]++

Some Useful AWK “One-liners”

Page 31: 1 Al Aho The Evolution of AWK: Computational Thinking in Programming Language Design Alfred V. Aho aho@cs.columbia.edu COMS W4995 Design - Bjarne Stroustrup

31 Al Aho

AWK is a scripting language designed for routine data-processing tasks on strings and numbers

E.g.: given a list of name-value pairs, print the total value associated with each name.

AWK Summary

eve 20 bob 15 alice 40

alice 10 eve 20 bob 15 alice 30

{ total[$1] += $2 } END { for (x in total) print x, total[x] }

An AWK programis a sequence of

pattern-action statements

Page 32: 1 Al Aho The Evolution of AWK: Computational Thinking in Programming Language Design Alfred V. Aho aho@cs.columbia.edu COMS W4995 Design - Bjarne Stroustrup

32 Al Aho

Comparison: Regular Expression Pattern Matchingin Perl, Python, Ruby vs. AWKTime to check whether a?nan matches an

regular expression and text size n

Russ Cox, Regular expression matching can be simple and fast (but is slow in Java, Perl, PHP, Python, Ruby, ...) [http://swtch.com/~rsc/regexp/regexp1.html, 2007]

Page 33: 1 Al Aho The Evolution of AWK: Computational Thinking in Programming Language Design Alfred V. Aho aho@cs.columbia.edu COMS W4995 Design - Bjarne Stroustrup

33 Al Aho

“99 Bottles of Beer” in AWK (bottled version)

BEGIN{ split( \ "no mo"\ "rexxN"\ "o mor"\ "exsxx"\ "Take "\ "one dow"\ "n and pas"\ "s it around"\ ", xGo to the "\ "store and buy s"\ "ome more, x bot"\ "tlex of beerx o"\ "n the wall" , s,\ "x"); for( i=99 ;\ i>=0; i--){ s[0]=\ s[2] = i ; print \ s[2 + !(i) ] s[8]\ s[4+ !(i-1)] s[9]\ s[10]", " s[!(i)]\ s[8] s[4+ !(i-1)]\ s[9]".";i?s[0]--:\ s[0] = 99; print \ s[6+!i]s[!(s[0])]\ s[8] s[4 +!(i-2)]\ s[9]s[10] ".\n";}}

[Wilhem Weske, http://www.99-bottles-of-beer.net/language-awk-1910.html