9
literateprogramming tools et youarrangehe parts f a program n anyorder ndextract docurqentationdcode from hesame ource file.Theauthor rgues that anguagedepen- dence nd eature om- plexity avehampered acceptance f these tools, henaffers simpler lternative. NORMANAMSEY Bellcore LITERATE PROGRAMMING S~MPLUFIED~ I 1983, Donald Knuth introduced literate programming’in the form of Web, his tool for writing literate Pascal programs.’ Web lets authors interleave source code and descriptive text in a single document. It also frees authors to arrange the parts of a program in an order that helps explain how the pro- gram functions, not necessarily the order required by the compiler. In the mid-‘80s , word spread about this new programming method as sev- eral literate programs were published. In 1987, Com77zmications of the ACM created a special forum to discuss liter- ate programming.2 Web was adapted to programming languages other than Pascal, including C, Modula-2, Fortran, Ada, and others.3-6 With expe- rience, however, many Web users became dissatisfied.’ Continued inter- est in literate programming led to a frenzy of tool building. In the resulting confusion, the literate-programming forum was dropped, on the grounds that literate programming had become the province of those who could build the* own tools.* The proliferati on of literate-pro- gramming tools made it hard for liter- ate programming to enter the main- stream, but it led to a better under- standing of what such tools should do. Today the field is more mature, and there is an emerging demand for tools that are simple, easy to learn, and not tied to a particular programming lan- guage- My own literate-programming tool, noweb, fills this niche. Freely available IEEE SOFTWARE 07407459/94/m 00 0 1994 IEEE 97

Better Literate Programming

Embed Size (px)

DESCRIPTION

Better ways to do literate programming according to the original author, Norman Ramsey...

Citation preview

  • literate programming tools let you arrange the

    parts of a program in any order and extract

    docurqentation and code from the same source file. The author argues

    that languagedepen- dence and feature com- plexity have hampered

    acceptance of these tools, then affers 0 simpler alternative.

    NORMAN RAMSEY Bellcore

    LITERATE PROGRAMMING S~MPLUFIED~

    I n 1983, Donald Knuth introduced literate programmingin the form of Web, his tool for writing literate Pascal programs. Web lets authors interleave source code and descriptive text in a single document. It also frees authors to arrange the parts of a program in an order that helps explain how the pro- gram functions, not necessarily the order required by the compiler.

    In the mid-80s, word spread about this new programming method as sev- eral literate programs were published. In 1987, Com77zmications of the ACM created a special forum to discuss liter- ate programming.2 Web was adapted to programming languages other than Pascal, including C, Modula-2, Fortran, Ada, and others.3-6 With expe- rience, however, many Web users

    became dissatisfied. Continued inter- est in literate programming led to a frenzy of tool building. In the resulting confusion, the literate-programming forum was dropped, on the grounds that literate programming had become the province of those who could build the* own tools.*

    The proliferation of literate-pro- gramming tools made it hard for liter- ate programming to enter the main- stream, but it led to a better under- standing of what such tools should do. Today the field is more mature, and there is an emerging demand for tools that are simple, easy to learn, and not tied to a particular programming lan- guage-

    My own literate-programming tool, noweb, fills this niche. Freely available

    IEEE SOFTWARE 07407459/94/m 00 0 1994 IEEE 97

  • AN EXAMPLE OF NOWEB: COUNTING WORDS I This example, based on d program by Klaus Gunter-

    mann and Joachim Schrod and a program hy Silvio Levv and D. E. Knuth, presents the word count program from Lnix, rewritten in noweh to demonstrate literate programming using noweh. The level of detail in this c document is intentionally high, for didactic purposes; many of the things spelled out here dont need to he explained in other programs. The purpose ofwc is to count lines, characters, and/or words in a list of files. The number of lines in a file is the number of new-line characters it contains. The number of characters is the file length in bytes. A word is a maximal sequence of consecutive characters other than newline. space, or tah, containing at least one visible ASCII code. (Vie assume that the standard .ASCIl code is in use.)

    Most literate C programs share a common structure. Its probably a good idea tn state the overall structure explicitly at the outset, even though the various parts could all be introduced in chunks named if we want- ed to add them piecemeal

    Here, then. is an overview of the file WC. c that is defined by the noweh program WC. nw:

    98a

    Root chunk (not used in this docummt).

    Lve must include the standard l/O definitions because we want to send formatted output to stdout and stderr. dieader fiks to inrhde 98b>= 986

    #include This code is used in chunk 9%~.

    The status variable will tell the operating system if the run was successful or not, and prog-nume is used in case theres an error message to be printed.

    cDe$niriom 98~s 98C #define OK 0

    /* status code for successful run */

    #define usage-error 1 /* status code for improper rryntsx l /

    #define cannot-open-file 2 /* statm code for file acc68is error ft

    Definer cannot-open-file, usedin chunktO&. OK, used in chunk 98d. usage~ertot, usedinchunk IO2d.

    Uscs8tatwi 9M This d&niiw is continualin chunks loOa, IO&, md 102r. Thiic&iPtsedG&uak9Ra. ,: '

    .P .oSk&d tamk&s 98dxa /* 962

    on the Internet since 1989, noweb strips literate programming ( tc CC

    nc tk PI

    i its essentials. Programs are composed of named chunks of )de, written in any order, with documentation interleaved.

    To facilitate comparison of Web and noweb, a sample Iweb program appears in the shaded box that runs throughout iis article. I took the text, code, and presentation for this sam- e from Knuths Literate Programming.

    Noweb was developed on Unix and can be ported to non- nix platforms provided they can simulate pipelines and sup- )rt both AVSI C and either awk or Icon. For example, Kean alleges Lee Wittenberg ported noweb to MS-DOS. Noweb unique among literate-programming tools in its pipelined,

    rtensible implementation, which makes it easy for experi- [enters to create new features without writing their own tools.

    Ii1 m m W

    cc h: 0 St sl St

    P rc

    EBS COMPLEXITIES

    Webs complexities make it difficult to explore the idea of terate programming because too much effort is required to laster the tool. To compound the difficulty, different program ling languages are served by different versions of Web, each ith its own idiosyncrasies.

    The classic Web expands three kinds of macros, prettyprints ,de for typeset output, evaluates some constant expressions, a&s string support into Pascal, and implements a simple form f version control. The manual documents 27 control :quences. I Versions for languages other than Pascal offer ightly different functions and different sets of control :quences.

    Web uses its Tangle tool to produce source code and its Jeave tool to produce documentation. Webs original Tangle :moved white space and folded lines to fill each line with

    multiple f i Les a:~ exit (stdt:b) ;

    If the.first argument begins with a (\tt-), the user is choosing the desired counts and specify-

    Fig-we 1. A noweb sozwc(, fiagmentfiom the example progmnz.

    98 SEPTEMBER 1994

  • - - - - - - - . -7

    This code is used in chunk 9 8 ~ ~ .

    N o w w e c o m e to the genera l layout of the m a i n function.

    / < T h e m a i n p rog ram 9917s 9 9 a

    main(argc , a rgv) int argc;

    ! /* # a rgumnte o n Un ix c o n m a n d l ine*/ I char l *argv;

    tokens, m a k i n g its ou tpu t u n r e a d a b l e . La te r adap ta t i ons p r e - se r ved l ine b reaks bu t r e m o v e d o the r wh i te space . W e b s W e a v e d iv ides a p r o g r a m in to n u m b e r e d sect ions, a n d its i ndex a n d c ross - re fe rence in fo rmat ion re fe r to sect ion n u m b e r s , no t p a g e n u m b e r s . W e b works poo r l y wi th LaTexz L a T e x c o n - structs canno t b e u s e d in W e b source , a n d get t ing W e a v e out- pu t to wo rk in L a T e x d o c u m e n t s requ i res ted ious ad jus tments by h a n d . W e a v e s sou rce (wr i t ten in W e b ) is severa l t h o u s a n d l ines long , a n d the format t ing c o d e is no t iso la ted.

    N O W E B S F E A T U R E S

    N o w e b s simplici ty de r i ves f rom a s imp le m o d e l of fi les, wh ich a r e m a r k e d u p us ing a s imp le syntax. F igu re 1 s h o w s a f ragmen t of the n o w e b sou rce u s e d to g e n e r a t e the b o x e d s a m - p le p r o g r a m . It s h o w s e x a m p l e s of c h u n k def in i t ions a n d uses, q u o t e d code , a n d lists of de f i ned ident i f iers - al l o f n o w e b s syntax excep t e s c a p e d a n g l e brackets .

    F R O structere. A n o w e b fi le is a s e q u e n c e of chunks . A c h u n k m a y con ta in code , in wh ich case it is n a m e d , o r documen ta t i on , in wh ich case it is u n n a m e d . C h u n k s m a y a p p e a r in a n y o rde r . E a c h c o d e c h u n k b e g i n s wi th + & m k nmtzes= o n a l ine by itself. T h e doub le - le f t a n g l e b racke t must b e i n the first co lumn. E a c h documen ta t i on c h u n k b e g i n s wi th a l ine that starts wi th a n @ symbo l fo l l owed by a s p a c e o r new l ine . C h u n k s a r e te rm ina ted implicit ly by the b e g i n n i n g of a n o t h e r c h u n k o r by the e n d of the file. If t he first l ine in the fi le d o e s no t m a r k the b e g i n n i n g of a chunk , n o w e b a s s u m e s it is the first l ine of a documen ta t i on chunk .

    A s F igu re 2 shows, n o w e b uses its n o t a n g l e a n d n o w e a v e tools to extract c o d e a n d documen ta t i on , respect ive ly . W h e n n o t a n g l e is g i ven a n o w e b file, it wr i tes the p r o g r a m o n s t a n d a r d output . W h e n n o w e a v e is g i ven a n o w e b file, it r e a d s it a n d p r o - duces , o n s t a n d a r d output , Tex sou rce for typeset d o c u m e n t a - t ion.

    c o d r US. C o d e chunks con ta in p r o g r a m sou rce c o d e a n d

    /* the a rgumante , a n a r ray of a t r iage l / f

    4nr iab leshca l to m a i n Y Y b > w a g - n a m - a rgv [Ol r < S e t u p opt ion select ion 9 9 0 cPr in t the g rand tota.ls if there w e r e mrr l t ip lef i l r r 1 0 2 b exit (statue);

    )

    Defines: argc, used in chunks 99c a n d 99d. a r m used in chunks 99c, l O O r , a n d IOlc. ma in . never used.

    Uses p rog_nuu 9 8 d a n d #tatw 986. This code is used in chunk 98s.

    4i r iab lcs lour l tomain99br t int f l le~count~

    /* h o w m a n y f i le8 thora a re l / char *whiaht

    9 9 b

    /* wh ich ooun t~ to pr int */ D & W

    f i l r_aouat ,usedincbdra99i r , JOOc , 10/r , m d 102b. r ih ia~, w e d in chunks J?%, lo le, R X ? & , a n d mu.

    Tb io def in i t ion ia umc io@ i s~dwnb I& b a a d lw T h i s + b d h o ~ .

    < .G t u p opt ion select ion Y 9 ~ x 3 9 91.

    wh ich = lwc; /* if n o opt ion is g iven pr int 3 va lues */

    if (a rgc > 1 h & *argv[l ] = = t-c) f wh ich = argv[ l ] + 1; argc-- ; a r g v + + j

    1

    r e fe rences to o the r c o d e chunks . S e v e r a l c o d e chunks m a y h a v e the s a m e n a m e ; n o t a n g l e conca tena tes the i r def in i t ions to p r o - d u c e a s ing le c h u n k

    f i le-count - argc - 1 ~ Us- = g c 9 % p % r r 9 9 ~ 4 f i le-count 996, ~L IJ wh ich 99) . This code is used in chunk 99a.

    C o d e - c h u n k def in i t ions a r e l ike m a c r o def in i t ions: N o t a n g l e : N o w w e S c a n the r e m a i n i n g arguments a n d try to o p e n a fi le

    extracts a p r o g r a m by e x p a n d i n g o n e c h u n k (by defau l t t he if possib le. The ti le is p rocessed a n d its statistics a re aven. i V e u s e a d o 9 l . whi le h o p b e c a u s e w e shou ld read f rom the

    c h u n k n a m e d cc*>>) . T h e def in i t ion of that c h u n k c o & & s ref - s tandard input if n o ti le n a m e is g iven. e r e n c e s to o the r chunks , wh ich a r e themse lves e x p a n d e d , a n d so on . F igu re 3 s h o w s par t of the b o x e d s a m p l e p r o g r a m as

    < P r o a u r & t b e @ 9 9 $ > r r p h 9 9 d

    ext rac ted by no tang le . N o t a n g l e s ou tpu t is r eadab le ; it p r e - l&3- - j & c I

    se rves wh i te s p a c e a n d ma in ta ins the inden ta t ion of e x p a n d e d : chunks wi th respec t to the chunks in wh ich they a p p e a r . Th is b e h a v i o r a l lows n o w e b to b e u s e d wi th l a n g u a g e s l ike M i r a n d a

    & .& ,& i m d e o - IO lrz a n d Haskel l , in wh ich inden ta t ion is s igni f icant. , , .gkaB r r p % jJ,

    W h e n doub le - le f t a n d - r ight a n g l e b rackets a r e no t pa i red , r #i i CstsWcs*@ X U X e s

    they a r e t rea ted as l i terals. Use rs c a n fo rce a n y such brackets , *W # e f t?o+ 1,

    e v e n p a i r e d brackets , to b e t rea ted as l i teral by us ing a p r e c e d - _ ~?q r *mrJ@?rc r ; ,I II /e _.. . .~ .-..._ - -..- .-.._ .__. _ _ _ _ _

    If tbe first a rgumen t beg ins with a -, the user is choos in r the des i red counts a n d speci fy ing the o rde r in wh ich they - shou ld b e d isp layed. E a c h select ion is g iven by the init ial char - acter ( l ines, words, o r characters) . For example , -cl wou ld cause just the n u m b e r of characters a n d the n u m b e r of l ines to b e pr inted, in that o rde r . W e d o no t p rocess this s t r ing now ; w s imply r e m e m b e r whe re it is. It wi l l b e used to contro l the for- mat t ing at ou tpu t t ime.

    I E E E S O F T W A R E 8 R

  • /* even if there is only one file*/ ) while (--argc > 0); Lsce argc 9%. This code is ued in chunk 79n. Heres the code to open the tile. A special trick allows us

    to handle input From &din when no name is given. Recall that the file descriptor to etdin is 0; thats what we use as the default initial value.

    int fd = 0; /*file descriptor, initialized to &din*/

    DdiIP3: fd, used tn chunh lOOr, 1OOd. and IOld.

    0 &8 (fd=open (*(++argv),REAI_ONLY))< 0) {

    fprint(stderr, "%a: cannot open file %e\n", 9rogsame. *argv);

    statue I= cannot~open~file; file-count--; continue;

    Lkesargv99a, camot~open~file98c. fd 10Oa,ffle~count 99b, prog_nans 98d, IO&%, and status 98d.

    This code is wed in chunk 99d.

    4kejZe 1 OOd>=

    close (fd)f

    Uaesfll looa.

    IOVd

    This code is wed in chunk 99d.

    W e will do some homemade buffering in order to speed things up: Characters will be read into the buffer array before we process them. To do this we set up appropriate pointers and counters.

    eDq%it iQm98n+e 1OOe #define buf-size BWFSI!6

    /* atdi0.h BuFsIe cbnn far effici~ */ D&es:

    ktf-#i8& used in chmka lwfand IOlA

    --. - --i Figure 2. Using noweb to build code and documentat ion.

    ing @ sign. Any line beginning with 0 and a space terminates a code

    chunk. If such a line has the form @ cj %de f identijk-s it also means that the preceding chunk defines the identifiers listed in identijkn. This notation provides a way of marking definitions manual ly when no automatic marking is available.

    Documentat ion chunks. Documentat ion chunks contain text that is ignored by notangle and copied verbatim to standard output by noweave (except for quoted code). Code may be quoted within documentat ion chunks by placing double square brack- ets around it. These brackets are ignored by notangle but are used by noweave to give the quoted code special typographic treatment. For example, in the sample program, quoted code is set in the Courier font.

    Noweave can work with LaTex, or it can use a plain Tex macro package, suppl ied with noweb, that defines commands like \chapter and \section. Noweave can also work with HTML, the hypertext markup language for Mosaic and the World-Wide Web. The example simulates the results after processing by noweave and LaTex.

    Noweave adds no newline characters to its output, making it easy to find the sources of Tex or LaTex errors. For example, an error on line 634 of a generated Tex file is caused by a prob- lem on line 634 of the corresponding noweb file.

    Index and cross-reference features. Cross-referencing of chunks and identifiers makes large programs easier to understand. The sample program accompanying this article shows full cross-ref- erence information.

    Unlike Web, noweb does not introduce numbered set- t ions for cross-referencing. Noweb uses page numbers. If two or more chunks appear on a page, say page 24, they are distin- guished by appending a letter to the page number: 24a or 24b, for example. Readers of large literate programs will appreciate the use of a single number ing system.

    Like Web, noweb writes chunk-cross-reference information in a footnote font below each code chunk. Noweb also includes cross-reference information for identifiers, for example, Defines file-count, used in chunks 7,11,19, and 21. Noweb generates this by using the @ U %de f markings in its source code, or by recognizing definitions automatically. Although noweb can automatically recognize definitions in C programs, I used @J%def to mark the definitions in the sample program. This choice not only illustrates the use of @ 0 %de f

    100 SEPTEMBER 1994

  • but it also ensures results compatible with the CWeb version of this program. Atitomatically generated indices would differ because CWeb and noweb use different recognition heuristics. Because noweb uses a language-independent heuristic to find identifier uses, it can be fooled into finding false uses in com- ments or string literals, like the use of status in chunk 3.

    Complier and debugger support. On a large project, it is essential that compilers and other tools refer to locations in the noweb source, even though they work with notangles output. Giving notangle the -L option makes it emit pragmas that inform compilers of the placement of lines in the noweb source. It also preserves the columns in which tokens appear, so that line-and- column error messages are accurate. If you do not give notan- gle the -L option, it respects the indentation of its input, mak- ing its output easy to read.

    Formatting features. Noweave depends on text formatters in two ways: in the source of noweave itself and in the supporting macros. Noweaves dependence on its formatter is small and isolated, instead of being distributed throughout a large imple- mentation. Noweb uses 250 lines of source for Tex and LaTex combined, and another 250 for HTML. It uses about 200 lines of supporting macros for plain Tex and another 300 lines to support LaTex, primarily because the page-based cross-refer- ence mechanism is complex. LaTex support without cross-ref-

    / 1

    maintargc, argv! t i int argc;

    /* the number of arguments on theUNIX command line */

    char **arp. !* the &uments themselves, an array

    of strings */ i

    int fiLe_count; i* how many f-:es tkele are *:

    char **hick;- i* which cxnts to c:~nt *i

    int fd = 3; /* f:le descriptor, ir~itiallzed to stdin Y

    char buffer[kxf-size;; 1 i* we read the ir.~.;: :T.LO this array */

    register char *ptr; ;* the first -nprocessed cnaracterin buffer */ I

    register char *buf-end; /* the first unused position in buffer 4

    register int c; I* current character, or number of characters

    iust read */ int &word;

    /* are we within a word? */ low word-count, line-count, char-count;

    /*number of words, lines, and characters -&und in file so far */

    which = *l~~*rol'

    :q P=xLn~ =

    -.-- Figure 3. Part of the example program after extraction by notangle.

    Ptr = buf-end = buffer; line-count = word-count = char-count = 0; in-word = 0; C W buf-end /Wj, buffer !f//& char-count [0/y-,

    in-word lM)/~, line-count I00t; Ptr IO/~; and word-count /U/!/I

    .Ihls co& is uvzl in chunk YYd. The grand totals must he initialized to zero at the beginning

    of the program. If u e made there variables local to main, we would have to do this initialization cylicitlp; however, Cs globals are automatically zeroed. (Or rather, statically zeroed.) (Get It?)

    cGlobaal z?ariabb 98dd>+= 1Olb

    long tot-word-count, tot-line-count. tot-char-count;

    /* total number of words, lines, chars */ The present chunk, which does the counting that is W C S mi-

    smz A%-e, was actually one of the simplescto write. LSre look at each character and change state if it begins ot ends a word.

    c.%anjk 101~~ IOIC

    while (1) ( &ill buf fer ifit is empty; break at end offile [Old> C ii l ptKtti if (c > " && c < 0177) 1:

    /* vieibile ASCII codes l / if (!in-word) {

    word-count++; in-word = 1;

    1 continue

    if (C == \Zl) lZi.Ile-CoUnt++i

    else if (c != "'1 &Ii c Ir '\t')coutiIlue;

    in_word = 0: /*c ie newline, space, or tab */

    Usesingr~rd 1OOj line-countlwptr 10af;wcmLcountlO@ Thiscode isusedinchunk 996 Buffered I/O allows us to count the number of characters

    almost for free.

    IEEE SOFTWARE 101

  • printf(" '%a\n", l argv); /* not etdin l / else

    printf (\n) ; /* stdin l /

    Cres argv YYz~.char-count IO/J5 file-count 996, line- count lOOj3L;wcgrint 1026. which 99b, word-count lOll&

    Yhls code 1s used ,n chunk 99d.

    tot-line-count t= line-count; tot-word-count += word-count; tot-char-count += char-count;

    1 erencing requires only 34 lines of source and no supporting i macros. HTML requires no supporting macros.

    Ccrs char count lOl$ line-count 1Otlf; word-count IO@ ! - Xhls code is wed in chunk 996. I

    i1.c might as well improve a hit on Vnirs WC by displaying ~ the number of tiles too.

    1) {

    wcgrint(which, tot-char-count, totJord~count, tot~line.Jzount);

    prfntf (total in %d filas\n, file-count);

    Uses file-count 9911, wcgrint IO2d. which 996. This code is used in chunk 99~. The function below prints the values according to the speci-

    fied options. The calling routine should supply a newline. If an invalid option character is found we inform the user about proper uie of the command. Counts are printed in eight-digit fields so thev will line UD in columns.

    1 cl)efhkiuns 98c>+a

    #define print-count(n) printf("%81d': n) l)ifi!liY

    I OZC

    dmkm~~ I OX>= WC grintcwhich, char-count, word_count,

    line_count )

    1 l,21

    char *which; /* which counts to print/ long &ax-count, word-count, line-count :

    /* given totals l /

    while (*which) switch (%hich+t) (

    case '1': print-count(line-count); break:

    case w: print-count (word_caunt) ; break;

    case c: print-count(char-count); break; I

    default: if ((status 6r usage-error) == 0) I

    fgrintf (stderr, \nlWage:%a[-lwcl filename.. .]\I?,

    Prog-=) i status I= usage~ermr;

    D&C?S: wc.print,usedin~hunb IOlrand 102b.

    Csrs char-count loof; line-count lOof; print-count 102~. prog~~ame Wd. status Pad, usage-error 98c, which 99b, andword-count lOOf.

    This code is used in chunk 988. A test of this program against the nstem WC command on a

    SparcStadon showed the official WC rvas slightly slower. Although that WC gave an appropriate error message for the options -abc, it made no complaints about the options -labc! Dare we suggest the s)istem routine might have been better had its programmer used a more literate approach?

    Uncoupling files and programs. The mapping between noweb files and programs is many-to-many; the mapping between files and documents is many-to-one. You combine source files by listing their names on notangles or noweaves command line. Notangle can extract more than one program from a single source file by using the -R command-line option to identify the root chunks of the different programs.

    The simplest example of one-to-many program mapping is that of putting a C header and program in a single noweb file. The header comes from the root chunk , and the pro- gram from the default root chunk, wc.c notangle -Rheader wc.nw I cpif -ne wc.h

    The > in the first command directs notangles output to the file wc.c. The I in the second command directs notangles output to the cpif program, which is distributed with noweb. cpi f - ne WC. h compares its input to the contents of file wc.h; if they differ, the input replaces wc.h. This trick avoids touching the file wc.h when its contents have not changed, which avoids tiggering unnecessary recompilations.

    Because it is language-independent, noweb can combine dif- ferent programming languages in a single literate program. This ability makes it possible to explain all of a projects source in a single document, including not just ordinary code but also things like make files, test scripts, and test inputs. Using literate programming to describe tests as well as source code provides a lasting, written explanation of the thinking needed to create the tests, and it does so with little overhead. If not documented at the time, the rationale behind complex tests can easily be lost.

    IMPLEMENTING NOWEB

    Until now we have discussed noweb from a users point of view, showing that it is simple and easy to use. Nowebs imple- mentation is also worth discussing, because nowebs extensible implementation makes it unique among literate-programming tools. Noweb tools are implemented as pipelines. Each pipeline begins with the noweb source file. Successive stages of the pipeline implement simple transformations of the source, until the desired result emerges from the end of the pipeline.

    Users change or extend noweb not by recompiling but by inserting or removing pipeline stages; for example, noweave switches from LaTex to HTML by changing just the last pipeline stage. Nowebs extensibility enables its users to create new literate-programming features without having to write their own tools.

    Nowebs syntax is easy to read, write, and edit, but it is not easily manipulated by programs. Markup, which is the first stage in every pipeline, converts noweb source to a representa-

    102 SEPTEMBER 1994

  • don easily manipulated by common Unix tools like sed and mands, respectively. awk, greatly simplifying the construction of later pipeline Noweb turns a World-Wide-Web browser like Mosaic stages. Middle stages add information to the representation. into a hypertext browser for literate programs. For example, Notangles final stage converts to code; noweaves final you can click on an identifier or chunk name to jump to the stages convert to Tex, LaTex, or HTML. definition of that identifier or chunk. You can find a hyper-

    In the pipeline representation, every line begins with 8 text version of the boxed sample program at ftp://bellcore. and a keyword. The most important possibilities appear in comfpub/norman/noweb/wc.html. Table 1. Markup brackets chunks by @begin . . . @end, and it uses the noweb source to identify text and newlines, defini- EVALUATING NOWEB tions and uses of chunks, and quoted code, which can all appear inside chunks. It 1 a so preserves information about file Reviewers have had many expectations of literate-pro- names and defined identifiers. Other index and cross-refer- gramming tools. lo We expect to be able to write code ence information is inserted automatically by later pipeline chunks in any order. We expect to develop code and docu- stages. The details of nowebs pipeline representation are mentation in one place. Finally, we expect automatically described in the Noweb Hackers Guide, which is distributed generated cross-reference and index information. Like the with noweb. original Web, noweb provides all these features, but in sim-

    pler form. EXTENDING NOWEB Web does provide features that noweb lacks, but existing

    Unix tools can substitute for most of these. Although noweb Noweb lets users insert stages into the notangle and contains no internal support for macros, Unix supplies two

    noweave pipelines, so that they can change a tools existing macro processors that can work with noweb: the C pre- behavior or add new features without recompiling. Even lan- processor and the m4 macro processor. The xstr program guage-dependent features like formatted output and auto- extracts string literals, and the patch program provides a matic index generation have been added to noweb without form of version control similar to Webs change files. recompiling. Indexing and cross-referencing make noweb less simple

    Stages inserted in the middle of a pipeline both read and than it could be. I need complex LaTex code to compute write nowebs pipeline representation; they are called Jilters, page numbers for use in cross-reference lists and in the by analogy with Unix filters, which are used in the Unix index. The ability to use page numbers justifies this com- implementation.

    Filters can be used to change the way noweb works; for example, a one-line sed script makes noweb treat two chunk names as identical if they differ only in their representation of white space, as in Web. A 55-line Icon program makes it possible to abbreviate chunk names using a trailing ellipsis. To share programs with colleagues who dont enjoy literate start a cllLlnk programming, I use a filter that places each line of docu- End a chunk mentation in a comment and moves it to the succeeding code chunk. With this filter, notangle transforms a literate Qtext mittg sWitt,q appeared in 3 chunk program into a traditional commented program, without @nl A newline appeared in a chunk loss of information and with only a modest penalty in read- ~ @tlefn *[z7/te The code chunk named t/N?/ze in being ability. defined

    Filters can be used to add significant features. Noweaves ~ @use nume A reference to code chunk named 7ull)le cross-reference and indexing features use two filters, one ~ @quote Start of quoted code in a document that finds uses of defined identifiers and one that inserts I chunk cross-reference information. In most cases, programmers @endquote F;$kf quoted code in a document must mark identifier definitions by hand, using @Cl %def, .

    but in some cases a third, language-dependent filter can be Q f le pt1a7ltc Name of the tile from which the used to mark identifier definitions, making index generation cl1w1ks Gllllt! completely automatic. @index defn ident The current chunk contains a

    Kostas Oikonomou of AT&T Bell Labs, Kaelin definition of ihnt Colclasure of Bridge Information Systems, and Conrad0 @index . . . Automatically generated index Martinez-Parra of the Universidad Politecnica de Catalunya inforination in Barcelona have written noweb filters that add prettyprint- @xref . . . Automaticall generated cross- ing for Icon, C++, and Dijkstras language of guarded corn- ~. .._- ~~-.-~~-.- reference in ormation fy --_ .- ~~ _~~ -~~ ~~,

    IEEE SOFTWARE 103

  • plexity, especially since it can be hid- den from most users. You do need to understand the LaTex code if you want to customize the appearance of your noweb documents while retain- ing nowebs use of page numbers for cross-reference. Most literate-pro- gramming tools forbid customization, but not all users will accept such a restriction. I have compromised between simplicity and customizability by add- ing LaTex options for a dozen of the most com- monly requested cus- tomizations. Users can choose from among these ontions without unders&ding nowebs LaTex code.

    Experimenting with noweb is easy because the tools are simple. If the experiment is unsat- isfying, it is easy to a- bandon, because notan- gles output is readable,

    records. Programs created with noweb may be delivered in the form of ordi- nary source code, leaving no clue that noweb was used. The only way for me to find out about uses of noweb is to appeal for information on the Internet. In this way I have learned about significant noweb projects in C++, Modula-2, Occam, parallel C, Perl, Prolog, and Scheme.

    David Hanson and

    LANGUAGE- INDEPENDENT TOOLS iIKE NOWEB ARE

    Chris Fraser are using noweb to write a book describing the design and implementation of a retar- getable C compiler. Tip- ton Cole & Company use Noweb in their consulting

    SIMILAR AND business, which focuses on EASIER TO

    writing database applica- tions on DOS platforms.

    USE THAN They find that noweb TliADlTlONAL

    helps compensate for some of the deficiencies in

    COMPLEX DOS database tools, and TOOL%

    that literate programming helps when a customer

    and documentation can - be preserved as embed- ded comments. Noweb is simpler than Web and easier to use and under- stand, but it does less. I argue, howev- er, that the benefit of Webs extra features is outweighed by the cost of the extra complexity, making noweb better for writing literate programs. Few of Webs remaining features will be missed; for example, many compil- ers evaluate constant expressions at compile time. Noweb users are most likely to miss pretty-printing, but it may be more trouble than it is worth.

    In my own work, I have used noweb for code written in various lan- guages, including assembly language, awk, Bourne shell, C, Icon, Modula-3, Promela, Standard ML, and Tex. These projects have ranged in size from a few hundred to twenty thou- sand lines of code. Information about other programs written using noweb is hard to find. Noweb is provided free of charge, generating no sales

    requests a change in a program that hasnt been

    touched in a year. A customer-sup- port group at Sun Micro-systems is using noweb to help teach their cus- tomers how to work with aspects of the. Solaris operating system like threads and device drivers. The liter- ate-programming paradigm makes it possible to extract working code from the same source used to create techni- cal reports and newsletters.

    OTHER TOOLS

    A survey of literate-programming tools is beyond the scope of this arti- cle, but we can still sketch nowebs place in the context of other tools. Most literate-programming tools are language-dependent and complex. You must change tools when chang- ing programming languages, repeat- ing effort spent mastering a tool.

    Newer tools, like noweb, are lan- guage-independent. The three most prominent are noweb, nuweb, and

    Funnelweb. To users, Noweb and nuweb look

    very similar. There are minor syntac- tic differences, and nuweb uses markup within the source file instead of command-line options to show things like the names of output files, but both are simple and easy to mas- ter. Funnelweb is a complex tool that includes its own rudimentary typeset- ting language and command shell.

    Many of the similarities between noweb and nuweb arise by design. Nuwebs initial design borrowed from noweb, and later versions of each tool have incorporated ideas from the other.

    Noweb and nuweb differ substan- tively in implementation. Nuweb is not pipelined; it is a single, mono- lithic C program. This structure makes nuweb easy to port, since only a C compiler is needed, and it makes it faster, since no parts are interpret- ed and the overhead of creating a pipeline is eliminated, but it also makes nuweb hard to extend. Nowebs pipeline makes it easy to extend, and different stages of the pipeline can be implemented in different pro- gramming languages, depending on which language is best for which job. Extensibility is particularly valu- able to those interested in pushing the frontiers of literate programming, who would otherwise have to write their own tools from scratch.

    I advocate language-independent tools for two reasons. First, after mas- tering one such tool, you can write almost anything as a literate program, including things like shell and per1 scripts, which often benefit dispropor- tionately from a literate treatment. Second, two of these tools - noweb and nuweb - are much simpler, and therefore much easier to master, than any of the language-dependent tools. Those who use one language exclu- sively may, however, prefer a lan- guage-dependent tool, since it pro- vides pretty-printing, which when done well can make the printed liter- ate program easier to read.

  • . N oweb probably culminates one kind of evolution in literate pro- gramming: the trend toward greatest simplicity. No significantly simpler tool could do much. Noweb also begins another kind of evolution, toward greater extensibility and flexi- bility. Further evolution might involve replacing Unix shell scripts and pipelines with an embedded language having special data types to represent pipelines, chunks, and literate pro- grams. This step would make it easier to port noweb to nonUnix platforms, and it could make noweb run much faster. Other developments might include constructing new pipeline stages to support language-dependent operations like macro processing, pretty-printing, and automatic iden- tifier cross-reference.

    These changes would extend no- webs capabilities, but noweb is already

    quite capable of supporting complex programs and documents. It and relat- ed tools are less capable of supporting a modem word-processing style. The word processors noweb currently supports, Tex, LaTex, and HTML, all use the old batch model of word processing. Today, ma.ny authors prefer WYSIWYG word processors like Framemaker, WordPerfect, or Microsoft Word. Kean Colleges Wittenberg has developed a noweb- like system called WinWordWeb based on Word. Because of Words limitations, including its secret propri- etary data format, he could not reuse any of nowebs implementation, but the design is the same.

    The challenge for literate pro- gramming today is getting it into use. Noweb helps by eliminating clutter and complexity. Supporting modern word processors would eliminate

    ACKNOWLEDGEMENTS Mark Weisers invaluable encouragement provided the impetus for me to write this

    oaner, which I did while visitine the Comnuter Science Laboratorv of the Xerox Palo Alto kesearch Center. David Hanso: sugEesteh and provided the cpif brogram. Preston Briggs developed many of the ideas used in-nowebs indexing, and he con&ib;ted code used in &e of the nineline stages. Bill Trost wrote the first HTMLnineline stage. Dave Love nrovided

    I I I 1 1 much-needed LaTex expertise. Comments from Hanson and from the anonymous referees stimulated me to improve the paper. The development of noweb was supported by a Fannie and John Hertz Foundation Fellowship.

    REFERENCES 1. D.E. Knuth, Literate Programming, Stanford University, Stanford, Calif., 1992.

    2. P.J. Denning, Announcing Literate Programming, &mm. ACM, July 1987, p. 593.

    3. K. Guntermann and J. Schrod, Web Adapted to C, TUGBoat, Oct. 1986, pp. 134-137.

    4. S. Levy, Web Adapted to C, Another Approach, TUGBoat, April 1987, pp. 12-13.

    5. N. Ramsey, Literate Programming: Weaving a Language-Independent Web, Connn. ACM, Sept. 1989, pp. 1051-1055.

    6. H. Thimbleby, Experiences of Literate Programming Using CWeb (a Variant of Knuths Web), Cmnputer3ouma1, 1986, pp. 201.2 11,

    7. N. Ramsey and C. Marceau, Literate Programming on a Team Project, Sofnuare -Pm&e 6 Eqrit=nce, July 1991, pp. 677-683.

    8. C. J. Van Wyk, Literate Programming: An Assessment, Comm. ACM, Mar. 1990, pp. 361.365.

    9. D.E. Km&, The Web System of Structured Documentation, Tech. Report 980, Computer Science Dept., Stanford Univ., Stanford, Calif., 1983.

    another barrier, making it possible to write literate programs without first learning a new word-processing lan- guage like LaTex or HTML.

    More must be learned about suit- able ways of structuring literate pro- grams, about whether hypertext is a useful alternative, and about what other kinds of documents literate pro- grams should resemble. What place does literate programming have for the majority of programmers, who are not writing for publication? In the near term, I suspect the best use for literate programming will be to sup- port rapid prototyping, providing a simple and reliable way of document- ing the design decisions made in, and the lessons learned from, the proto- type. In the long term, I hope that simple, extensible tools like noweb will lead everyone to appreciate the benefits of literate programming. +

    Norman Ramsey is a research scientist at Bellcore. His research interests are the construc- tion of software that is easy to understand and to retar- get to different machines. His recent work includes a retargetable debugger and a toolkit that helps build

    debuggers and other programs that manipulate machine code.

    Ramsey received a PhD in computer science from Princeton University. He is a member of ACM.

    Address questions about this article to Ramsey at Bellcore, 445 South Street, Morristown, NJ 07960; [email protected]. Noweb can be obtained by anonymous ftp from CLAN, the Comprehensive Tex Archive Network, in directory web/noweb. CTAN replicas appear on hosts ftp.shsu.edu, ftp.tex.ac.uk, and ftpani-stuttgade. Nowebs World-Wide-Web page is located at ftp://bellcore.com/pub/norman/noweb.

    IEEE SOFTWARE 105