INDEX
A
abs() function, 21, 55
.__abs__() method, 21, 431
Absolute file positioning, 156
Absolute path, 66
Absolute value, 21
Abstract state machine class, 273–274
.__add__() method, 13, 431
Adder factory, 4–5
Adler-32 checksum, See Checksum
AE module, 100
aepack module, 100
aetypes module, 100
AIFC audio files, 104
aifc module, 104
AIFF audio files, 104
AL module, 104
al module, 104
Alphanumeric character class (\w), 239
Alternation operator, SimpleParse (/), 324–325
Alternation operator, regex (|), 208, 240
Ambiguous arithmetic expression, 340
amkCrypto module, 165
and operator, 434
anydbm module, 90–92
.append() function, 32
AppleEvents, 100–101
AppleSingle format files, 100
applesingle module, 100
Applications, 102
apply() function, 447, 450–451
approx() function, 12
Archived files, 178–180
array module, 105, 130
Arrays, 105
Art of Computer Programming, The, Third Edition (Knuth), 20
ASCII (American Standard Code for Information Interchange), 465–466
    characters, 312
    converting data to binary, 158–163
    regexes for markup, 318
    transmitting binary data as, 121–123
ascii encoding, 186
Asymmetrical encryption, 164, 481
Asynchronous event handlers, 108
Asynchronous I/O on sockets, 397
Asynchronous socket service clients and servers, 395
asyncore module, 395
atexit module, 105
Atoms, 207, 209
Attributes
    mx.TextTools module, 308–309
    regular expressions, 249–255
    tokens, 335
Audio data, 104, 347
Audio hardware under Windows interface, 104
audiodata argument, 347
audioop module, 104
Authentication, 391
awk, 204
Aycock, John, 328
B
Backreferences, 210
    complex pattern, 218–219
    naming, 211
    operator, 238
    replacement, 214
    in replacement patterns, 218
Back-tick operator, 15
Base classes for datatypes, 13–34
base64 encoding, 158, 187
base64 module, 122, 158–159
mertz_final_index.fm Page 485 Monday, May 5, 2003 9:26 AM
    decode() function, 158
    decodestring() function, 158
    encode() function, 158
    encoded strings, 159
    encodestring() function, 158
    line blocks, 158–159
BaseHTTPServer module, 105
Basic Macintosh dialogs, 101
Basic string transformations, 128–147
Bastion module, 105
Beazley, David, 328
Beginning of line operator (^), 239
Beginning of string (\A), 239
Bemer, Bob, 465
Benchmarks, 87, 287–296
Berkeley DB library interface, 92
BigGraph module, 281
Big-O notation, 481–482
binary minus (-) operator, 22
Binary bit operations, 18
Binary data
    converting to ASCII, 158–163
    transmitting as ASCII, 121–123
Binary files, See Binary string
Binary string
    base64 encoding, 159
    binhex4 encoding, 160
    hexadecimal encoding, 159
    packed, 84–86
    quoted printable encoding, 160
    UUencoding, 160
binascii module, 122, 159–161
    a2b_base64() function, 158, 159
    a2b_hex() function, 159
    a2b_hqx() function, 159
    a2b_qp() function, 159, 162
    a2b_uu() function, 159, 163
    b2a_base64() function, 158, 159
    b2a_hex() function, 159
    b2a_hqx() function, 160
    b2a_qp() function, 160, 162
    b2a_uu() function, 160, 163
    crc32() function, 160
    crc_hqx() function, 160
    Error exception, 161
    hexlify() function, 160
    Incomplete exception, 161
    rlecode_hqx() function, 161
    rledecode_hqx() function, 161
    unhexlify() function, 161
Binding names to objects, 42
Bindings, 418–421
Binhex, 122
binhex4 checksum, See Checksum
binhex module, 122, 161–162
    binhex.binhex() function, 161
    Binhex-encoded string, 159
    binhex.hexbin() function, 161
    binhex4 RLE (run-length encoding), 161
Birthday paradox, 41, 482
bisect module, 105
Bit-position encoded character set, 310
Bit-shifting, 19
Bitwise inversion, 19
Bitwise-and, 19
Bitwise-or, 19
Bitwise-xor, 19
Block special device files, 70
Block-level state machine, 292
book2html.py file, 474–479
Boolean comparisons, 14
Boolean datatype, 421
Boolean shortcutting, 434
Boolean value, 26
Bounded numeric quantifier, regex, 242
Browsers, remote-control interfaces, 398
BSD DB library interface, 92
BSD sockets, low-level interface, 397
bsddb module, 92
BSD-style mailbox, 373
buffer() function, 55
buildtools module, 100
__builtin__() function, 89
Built-in functions, 55
Buyer states chart, 279
buyer_invoices.py file, 276–278
Buyer/order report, parsing, 287–289
buyer_report.py file, 288–289, 291
buyers tag table states, 289–290
Byte-codes, generating, 106
bzip (BZ2), 172
C
C extensions, 55
Caches
    checking for file modification, 65
    clearing, 64
    directory listings, 57–58
    lines from files, 64–65
    reading lines from, 64
cal utility, 100
Calendars, 100–101
Canonicalization, 413
Capabilities, testing, 10–11
Capability-based polymorphism, 10
"".capitalize() string method, 132–133
Carbon API interface, 101
Carbon.* modules, 101
cargo variable, 274
cd module, 101
"".center() string method, 133
cfmfile module, 101
CGI applications, debugging, 382–383
CGI (Common Gateway Interface), 376–383
    GET request, 376–377
    POST request, 376
cgi module, 376–383
CGI scripts, 376
    calling from another Web page, 377
    traceback, 382–383
    Web bugs, 378
cgi.FieldStorage class, 379–380
    .file attribute, 381
    .filename attribute, 381
    .getfirst() method, 380
    .getlist() method, 380
    .getvalue() method, 380–381
    .list attribute, 381
    .value attribute, 381
cgi.MiniFieldStorage, 381
CGIHTTPServer module, 105
cgitb module, 382–383
    .enable() method, 382–383
Character classes, 208, 238, 323
Character references, 387
Character set
    bit-position encoded, 310
    encodings, 395
Character special device files, 70
Characters, 465–466
    counting, 120–121
    lists split around, 311
    printable, 132
check_imap_subjects.py file, 366–367
Checking for server errors, 224–226
Checksum
    Adler-32, 182
    binhex4 checksum, 160
    CRC32 checksum, 160, 182
    CRC32 (cyclic redundancy check), 482
    CRC32 hash, 196
    SHA cryptographic hash, 170
Child classes comparisons, 14
chmod utility, 76
chroot utility, 75
chunk module, 104
CJK (Chinese-Japanese-Korean) alphabets, 185
class instances, 430–432
Classes
    definitions, 419–421
    inheritance, 11
    new-style, 11–14
    representing email messages, 355–362
    specializing, 11
Cleanup actions, 52
Cloning message objects, 350
.close() method of files, 16
closed attribute, 16
ClosureDict class, 36–37
cmath module, 105
cmd module, 105
cmp() function, 25, 113
.__cmp__() method, 29
Code Fragment Resource module, 101
code module, 105, 445
Code objects types, 55
codecs module, 186, 189–190
    EncodedFile() function, 190
    open() function, 189
codeop module, 106, 445
Codepages, 466
col utility, 223
Collections
    number of items in, 97
    types, 14
colorize utility module, 472–474
ColorPicker module, 101
COLORSCHEME state, 269
colorsys module, 104
Column statistics for delimited or flat-record files, 117–120
Combinatorial higher-order functions, 5–7
Command line
    parsing options, 44–47
    piped and redirected streams, 51
    summarizing documentation, 221–223
Commands
    mx.TextTools module, 299–300
    quick access to external, 73
commands module, 73, 74
    getoutput() function, 73
    getstatus() function, 73
    getstatusoutput() function, 73
Comments
    archived file, 180
    HTML, 387
    in verbose regular expressions, 220
    regular expression pseudo-group, 242
    XML documents, 401
commify() function, 230–231
Communications Tool Box interface, 101
Comparing
    custom classes, 14
    directories, 58–61
    files, 58–61
    floating-point numbers, 21
Comparison operators, 14
comp.compression FAQ, 464
compile() function, 55
compile module, 106
compileall module, 106
compile.ast module, 106
Compiled applications, 103
Compilers: Principles, Techniques, and Tools (Aho, et al.), 257–258
compile.visitor module, 106
comp.lang.python newsgroup, xiv, 194
complex datatype, 422
complex() function, 10
Complex numbers, 20, 22–24, 105
    .conjugate() method, 23
    .imag() method, 23
    .__ge__() method, 22
    .__gt__() method, 22
    .__le__() method, 22
    .__lt__() method, 22
    .real() method, 24
Complex pattern backreferences, 218–219
complex_file_operation() function, 443
Compound address, parsing, 365
Compound data, 90
Compound types, 430–432
compress (.Z), 172–173
Compressing. See Data compression
Compression object, 183–184
Concatenating strings to sha object, 172
Concrete state machine, 274–280
ConfigParser module, 282–283
Configuration files, 221–223
Constants
    Font Manager library (IRIX), 101
    FORMS library (IRIX), 101
    interpreting results of os.statvfs() and os.fstatvfs(), 108
    mx.TextTools module, 298–299
    regular expressions, 244–245
    Silicon Graphics' Graphics Library, 104
    trigonometric and algebraic, 107
Container types, 427–430
ContentHandler class, 406
ContentHandler handler, 405
CONTENT_LENGTH environment variable, 378
Content-Type header, 361
Continuation characters, 226–227
continuation_ws argument, 351
Cookie module, 395
Cookies, managing, 395
copy module, 43–44
    copy() function, 43
    deepcopy() function, 43–44
Copying, 42
    deep copy, 43–44
    dictionaries, 26
    directory trees, 68–69
    file-like objects, 68
    files, 68–69
    permission bits, 68
    permissions data, 69
    sha object, 171
    shallow copy, 43
    substring within memory-mapped file object, 150
    symbolic links, 69
    timestamp data, 69
    URLs (Uniform Resource Locators), 392
copy_reg module, 106
Core Python Programming (Chun), xv
"".count() string method, 134
Counting characters, words, lines and paragraphs, 120–121
cPickle module, 93–94, 106
    dump() function, 93
    dumps() function, 93–94
    load() function, 94
    loads() function, 94
crc32.py utility, 196, See also Checksum
Criteria, quickly sorting lines on custom, 112–115
crypt module, 166
    crypt.crypt() function, 166
Cryptography
    asymmetrical encryption, 164, 481
    cryptographic hash, 163–164, 166, 482
    digital signatures, 164
    MD5 (Message Digest 5), 167
    SHA (Secure Hash Algorithm), 170–172
    strong hash, 196–198
    symmetrical encryption, 163
    third-party modules, 165
    threat model, 196–198
cStringIO file-like object, 122
cStringIO module, 153–158
    InputType constant, 154–155
    OutputType constant, 155
    StringIO class, 155
    StringIO() function, 459
    StringIO.close() method, 155
    StringIO.flush() method, 155
    StringIO.getvalue() method, 155
    StringIO.isatty() method, 155
    StringIO.read() method, 156
    StringIO.readline() method, 156
    StringIO.readlines() method, 156
    StringIO.reset() method, 156
    StringIO.seek() method, 156
    StringIO.tell() method, 157
    StringIO.truncate() method, 157
    StringIO.write() method, 157
    StringIO.writelines() method, 157–158
ctb module, 101
current_section variable, 270
curses package, 106
    curses.ascii module, 106
    curses.panel module, 106
    curses.textpad module, 106
    curses.wrapper module, 106
Custom
    comparison function, 113
    datatypes, 11–13
    datatypes and magic methods, 13–34
    dictionary-like objects, 36–37
    file-like objects, 15–17
    FTP (file transfer protocol) clients, 395
    functions, 3–4
    IMAP clients, 366–368
    POP3 clients, 368–370
    processing, x
    SAX handlers, 406
    SMTP clients, 370–371
    sorting algorithm, 113–115
    Telnet clients, 397
    text compressor, 459–464
    text processing tasks, x
    Web clients, 396
Customizable startup module, 108
Customizing string representation of objects, 96–98
Cyclic garbage collection, 106
Cyphertext string, 170
D
Data, ix–x
    accidental damage to, 196–198
    as code, 445–446
    compound, 90
    deep, 258–260
    hash of correct, 196
    special values and formats, 82–89
    structured, 90
Data compression
    bzip (BZ2), 172
    choosing correct data representation, 458–459
    compress (.Z), 172–173
    custom text compressor, 459–464
    data set example, 454
    definition of, 453
    GZ (gzip), 172–173
    Huffman encoding, 456–457
    Lempel-Ziv compression, 457–458
    Lossless versus lossy, 454
    references, 464
    RLE (run-length encoding), 455–456
    SIT, 173
    whitespace compression, 455
    word-based Huffman compressed text, 460–461
    ZIP format, 172–173
    zlib library, 181–185
Data fork, 161
Data set example, 454
Data structures, 111
Datatypes, 8–9, 54–55, 421
    !=, <>, and == operators, 14
    base classes, 13–34
    Boolean comparisons, 14
    buffer() function, 55
    custom, 11–34
    emulating, 11
    equality, 95
    file-like objects, 11
    format codes, 424
    list-like, 28–32
    more readable, 94–96
    pretty-printing, 94–96
    printing, 425–427
    recursive containers, 95
    response to == operator, 14
    simple, 421–423
    string interpolation, 423–425
Date tuples, 86
Dates, manipulating values, 86–89
db file, 93
dbhash module, 92
dbm-style databases, 90–93
dbm module, 92
*DBM modules, 90–93
    DBM.close() method, 91
    DBM.first() method, 91
    DBM.has_key() method, 92
    DBM.keys() method, 92
    DBM.last() method, 92
    DBM.next() method, 91–92
    DBM.open() function, 91
    DBM.previous() method, 92
    DBM.sync() method, 92
Debugging
    stack traces, 109
    mx.TextTools tag table, 297–298
Decimal numerals, 130, 298
declaration patterns, SimpleParse, 321
Decoding base64 encoding, 158
Decompressing zlib library, 181–185
Decompression object, 183, 185
Decryption, Enigma-like, 168–169
Deep data, 258–260
Default Unicode string encoding, 52
del statement, 30
Delimited files, 117–120
Deprecation Warning, 235
DES (Data Encryption Standard), 166
Detecting duplicate words, 223–224
DEVICE module, 104
Devices, 70
Dict class, 24–27
dict() function, 10
dict type, 36
    dict.clear() method, 26
    dict.__cmp__() method, 24–25
    dict.__contains__() method, 25
    dict.copy() method, 26
    dict.__delitem__() method, 25
    dict.get() method, 26
    dict.__getitem__() method, 25
    dict.has_key() method, 26
    dict.items() method, 26–27
    dict.iteritems() method, 26–27
    dict.iterkeys() method, 27
    dict.itervalues() method, 27
    dict.keys() method, 27
    dict.__len__() method, 26
    dict.popitem() method, 27
    dict.setdefault() method, 27
    dict.__setitem__() method, 26
    dict.update() method, 27
    dict.values() method, 27
Dictionaries, 24–27
    dicts, 428–429
    hash collisions, 30
    mapping group names to group numbers, 249
    mapping symbolic names to character entities, 383–384
    named groups, 253
Dictionary objects, 24–27
Dictionary-based string interpolation, 35–36
Dictionary-like objects
    containing current environment, 81
    custom, 36–37
diff utility, 283
difflib module, 283–284
    get_close_matches() function, 283
    ndiff() function, 283
    restore() function, 283
Digit character class (\d), 208, 238
Digital signatures, 164, 195–196, 482
dir() function, 55
dircache module, 57–58, 106
    annotate() function, 58
    listdir() function, 58
    opendir() function, 58
Directories, 70
    caching listings, 57–58
    comparing, 58–61
    comparison report, 59–60
    filenames, 60
    identifying, 58
    information about, 76, 79–80
    listing, 76
    numeric mode, 75–77
    owner and group, 75
    path permissions, 74–75
    pathnames, 60, 66
    reading listings, 57–58
    removing, 79
    renaming, 79
    subdirectories, 60
Directory trees, 68–69
dis module, 106
dissertation.dtd file, 411–412
dissertation.py file, 412
Distributing Python Modules (Ward), 106
distutils module, 106
Division, 21
divmod() function, 21
dl module, 101
__doc__ strings, 106
doctest module, 106
Document collections, 199–202
Documentation
    script and module for examining, 108
    summarizing command-line option, 221–223
Documents, finding relevant in collection, 199–200
DocUtils package, 471
DOM (Document Object Model), 403–404
    4DOM, 413
    implementation, 413
    implementation that conserves memory, 405
    lightweight implementation, 405
    OOP model for working with XML, 410
DTDHandler class, 406
DTDHandler handler, 405
DTDs (Document Type Definitions), 261, 401–402
dumbdbm module, 92
Duplicate words, detecting, 223–224
dupwords.py file, 223
Dynamic Web pages, 34
E
EasyDialogs module, 101
EBCDIC, 465
EBNF (Extended Backus-Naur Form) grammars, 258, 261, 262–264, 317
EBNF parser library, 286
EBNF parsing, high-level, 316–328
EBNF-style description of Python floating point, 261
eGenix.com Web site, xvi
Elements, DOM, 400
Ellipsis object, 55
Email
    adding string or Unicode string to end, 352–353
    BSD mailbox, 361
    communicating with network servers, 344
    constructing header object, 354
    core text processing task, 345
    describing RFC-2231 string components, 353–354
    examining contents of message folders, 344–345
    formatted address, 364
    frauds, 345
    list of compound addresses, 364
    managing headers with non-ASCII values, 351–354
    manipulating and creating message texts, 345–348
    multinational strings in header, 351–354
    RFC-2822-formatted date, 364
    spam, 345
    timestamp, 365
    viruses, 345
Email addresses, 228–229
Email clients, storing messages, 344–345
Email messages
    adding field to header, 357
    adding payload, 357–358
    base64 encoding, 349
    body, 345–346
    BSD mailbox envelope header, 362
    classes representing, 355–362
    content type, 362
    content typing rules, 356
    Content-Type header, 361
    current default type, 362
    default type, 359
    encoding body, 349
    encoding parameter to RFC-2231, 362
    header, 345–346
    header fields, 358–359
    helper functions, 364–365
    indexing by key, 356
    iterating through components, 354
    MIME content delimiters, 358, 361
    MIME message boundary delimiter, 359
    mimification and unmimification, 396
    multipart, 360–361
    payload, 360–362
    pretty-printed representation of structure, 355
    quoted printable encoding, 349
    recursively traversing, 362
    removing parameter from header, 358
    serializing to RFC-2822-compliant text string, 357
    string description, 359
    uniform interface, 372–374
email package, 282, 345–349
    email.message_from_file() function, 348
    email.message_from_string() function, 348
email.Charset module, 351, 395
email.Encoder module, 349
    encode_base64() function, 349
    encode_7or8bit() function, 349
    encode_quopri() function, 349
email.Errors module, 349
email.Generator module, 350
    DecodedGenerator class, 350
    DecodedGenerator.clone() method, 350
    DecodedGenerator.flatten() method, 351
    DecodedGenerator.write() method, 351
    Generator class, 350
    Generator.clone() method, 350
    Generator.flatten() method, 351
    Generator.write() method, 351
email.Header module, 351–354
    decode_header() function, 353–354
    Header class, 351–352
    Header.append() method, 352–353
    Header.encode() method, 353
    Header.__str__() method, 353
    make_header() function, 354
email.Iterators module, 354–355
    body_line_iterator() function, 354–355
    _structure() function, 355
    typed_subpart_iterator() function, 355
email.Message module, 355–362
    Message class, 346, 355–357
    Message object, 389
    Message.add_header() method, 357
    Message.as_string() method, 357
    Message.attach() method, 357–358
    Message.del_param() method, 358
    Message.epilogue attribute, 358
    Message.get_all() method, 358–359
    Message.get_boundary() method, 359
    Message.get_charsets() method, 359
    Message.get_content_charset() method, 359
    Message.get_content_maintype() method, 359
    Message.get_content_subtype() method, 359
    Message.get_content_type() method, 359
    Message.get_default_type() method, 359
    Message.get_filename() method, 359
    Message.get_param() method, 360
    Message.get_params() method, 360
    Message.get_payload() method, 360–361
    Message.get_unixfrom() method, 361
    Message.is_multipart() method, 361
    Message.preamble attribute, 361
    Message.replace_header() method, 361
    Message.set_boundary() method, 361
    Message.set_default_type() method, 362
    Message.set_param() method, 362
    Message.set_payload() method, 362
    Message.set_type() method, 362
    Message.set_unixfrom() method, 362
    Message.walk() method, 362
email.MIMEAudio.MIMEAudio, 347–348
email.MIMEBase.MIMEBase class, 346
email.MIMEImage.MIMEImage class, 348
email.MIMEMultipart.MIMEMultipart class, 347
email.MIMENonMultipart.MIMENonMultipart class, 347
email.MIMEText.MIMEText class, 348
email.Parser module, 363
    HeaderParser class, 363
    HeaderParser.parse() method, 363
    HeaderParser.parsestr() method, 363
    Parser class, 363
    Parser.parse() method, 363
    Parser.parsestr() method, 363
email.Utils module, 364–365
    decode_rfc2231() function, 364
    encode_rfc2231() function, 364
    formataddr() function, 364
    getaddresses() function, 364
    make_msgid() function, 365
    mktime_tz() function, 365
    parseaddr() function, 365
    parsedate() function, 365
    parsedate_tz() function, 365
    quote() function, 365
    unquote() function, 365
Empty characters, 131–132
Empty productions, 338
"".encode() string method, 186–188
encode_binary.py file, 122–123
Encryption, See also Cryptography
    algorithms, 169
    asymmetrical, 164
    Enigma-like, 168–169
    symmetrical, 163
End of line ($), 239
End of string (\Z), 239
EndLoop exception, 442
end_state flag, 281
"".endswith() function, 134
English letters, 298
    lowercase letters, 298
    numbers and letters, 298
    uppercase letters, 298
Enhanced objects, 11–13
Enigma-like encryption and decryption, 168–169
EntityResolver class, 406
EntityResolver handler, 405
enumerate() function, 447, 449
Environment variables, 75, 78
EOF command, 296
errno module, 106
errno system symbols, 106
Error messages interpreter, 50–51
Error on failure (!), 324
Error recovery, 340
ErrorHandler class, 406
ErrorHandler handler, 405
error_page.py file, 225
Escape (\) operator, regular expressions, 236
Escape-style shortcuts, 208
/etc/inetd.conf file, 221–223
eval() function, 445
Evans, Carey, 165
Event scheduler, 108
Event-based API, 404
Exact numeric quantifier (), 241
except statements, 44, 421, 443
Exception classes, 44
Exceptions, 44, 49
    built-in, 89
    catching, 441–444
    dynamic scope, 441–442
    email package, 349
    exiting gracefully from deeply nested loops, 442
    flagging circumstances as unusual, 441
    invalid or disallowed actions, 441
    raising, 441–444
exceptions module, 44, 443
Excessive call nesting, 4
exec statement, 445–446
Execution, restricted facilities, 108
Existential quantifier (+), regular expressions, 240
Existential quantifier (+), SimpleParse, 323
Exit handlers, 105
Exiting Python, 52
"".expandtabs() string method, 134–135
expat nonvalidating XML parser interface, 405
exponentfloat alternative, 262
Exponentiation, 22
Expressions, 447–448
Extended call syntax, 450–451
Extended regular expressions, 209–210
External commands
    opening pipe to or from, 77
    opening STDIN, STDOUT, and STDERR pipes, 78
    opening STDIN and STDOUT pipes, 77
    quick access to, 73
extract_email() function, 229
Extracting content from fillers, 194–195
extract_urls() function, 229
F
Fallback conditions, 295
"Fast and Flexible Word Searching on Compressed Text," 464
Fast text manipulation tools, 286–316
FastCGI, 376
fcntl module, 101
fcntl() system function, 101
fcrypt module, 165
Fermat triples, 442
Field names, email headers, 346
fields dictionary, 35
fields_stats.py file, 117–120
FieldStats class, 120
FIFO (named pipe), 70
file class, 15–17
File descriptors, 74
File extensions, 374–376
File Find, x
File objects, 37, 74
    closing, 16
    invisible, 80
    memory-mapped, 147–153
    STDERR (standard error stream), 50–51
    STDIN (standard input stream), 51
    STDOUT (standard output stream), 51
    temporary, 71
    update mode, 80
File system services, 102
filecmp module, 58–61
    cmp() function, 58
    cmpfiles() function, 59
    dircmp class, 59
    dircmp.common attribute, 60
    dircmp.common_dirs attribute, 60
    dircmp.common_files attribute, 60
    dircmp.common_funny attribute, 60
    dircmp.diff_files attribute, 60
    dircmp.funny_files attribute, 60
    dircmp.left_list attribute, 60
    dircmp.left_only attribute, 60
    dircmp.report() method, 59–60
    dircmp.report_partial_closure() method, 60
    dircmp.right_list attribute, 60
    dircmp.right_only attribute, 60
    dircmp.same_files attribute, 60
    dircmp.subdirs attribute, 61
fileinput module, 61–63
    close() function, 62
    FileInput class, 63
    filelineno() function, 63
    filename() function, 63
    input() function, 62
    isfirstline() function, 63
    isstdin() function, 63
    lineno() function, 63
    nextfile() function, 63
File-like interface, 71
File-like objects, 9–11, 11, 68
    connecting to URL (Uniform Resource Locator), 389
    copying, 68
    custom, 15–17
    generator instance writing to, 350
    message text contained in, 348
    reading and writing strings, 15
    reading and writing to string buffer, 153–158
    serialized objects, 94
    writing serialized form of object, 93
    writing string to, 351
Filenames
    matching patterns against, 232–234
    temporary, 71
Files
    appending, 16
    cache of os.stat() information, 108
    caching lines from, 64–65
    closing, 16, 63
    comparing, 58–61
    copying, 68–69
    file descriptor number, 16
    file handle, 381
    .fileno() method, 16, 148, 389
    finding random lines, 39–41
    group, 75
    identifying, 58
    information about, 74, 76, 79–80
    last access time, 70
    last status change, 71
    line-oriented, 39–41
    lines from large, 37–38
    listing, 76
    mode, 16
    modification time, 71
    name of, 16, 63, 381
    number of lines read, 63
    numeric mode, 75
    opening next, 63
    owner, 70, 75
    path permissions, 74–75
    persistence, 147
    position, 17
    positioning, 156
    reading, 15–16
        backwards, 126–128
        lines from, 2, 38–39
        multiple, 61–63
        over lines, 72
    .readline(), 120
    .readline() method, 126
    .readlines() method, 120, 126
    removing, 78, 81
    renaming, 79
    run-length encoding, 161
    setting access and modification timestamps, 81
    shallow comparison, 58
    simulating random access, 64–65
    size of, 70
    as strings, 147–158
    strings delimiting lines, 82
    temporary, 71
    testing, 69–71
    truncating, 17
    TTY-like device, 16
    UUdecode, 163
    UUencode, 163
    writing to, 16
    .xreadlines() method, 126
Filesystems, 65–68
filter() function, 4–6, 435–438, 447, 450
Filters, 3–7
finally clauses, 52
"".find() string method, 135
findertools module, 101
findfile1.py file, 199–200
findfile2.py file, 200–201
Finding
    first match, 312
    random lines in files, 39–41
find_urls.py file, 228–229
First-order functions, 4
first_things() function, 125
FL module, 101
fl module, 101
Flags, 149
Flat-record files column statistics, 117–120
float class, 21–22
float() function, 10
float datatype, 19, 422
    float.__abs__() method, 21
    float.__add__() method, 21
    float.__cmp__() method, 21
    float.__div__() method, 21
    float.__divmod__() method, 21
    float.__floordiv__() method, 21
    float.__mod__() method, 21–22
    float.__mul__() method, 22
    float.__neg__() method, 22
    float.__pow__() method, 22
    float.__radd__() method, 21
    float.__rdiv__() method, 21
    float.__rdivmod__() method, 21
    float.__rfloordiv__() method, 21
    float.__rmod__() method, 21–22
    float.__rmul__() method, 22
    float.__rpow__() method, 22
    float.__rsub__() method, 22
    float.__rtruediv__() method, 22
    float.__sub__() method, 22
    float.__truediv__() method, 22
Floating point numbers, 19
    with beta distribution, 82
    circular uniform distribution, 83
    comparing, 21
    converting string to, 132
    defining, 262–263
    division, 21–22
    exception control (Unix), 102
    exponential distribution, 83
    exponentiation, 22
    floor division operator //, 21
    formatting functions, 106
    gamma distribution, 83
    Gaussian distribution, 83
    log normal distribution, 83
    math, 20
    modulo division, 22
    multiplication, 22
    negative, 22
    Pareto distribution, 83
    random, 84
    ratio, 21
    summing, 21
    von Mises distribution, 84
    Weibull distribution, 84
Floor division operator (//), 21
Flow control, 432
    Boolean shortcutting, 434
    filter() function, 435–438
    for/continue/break statements, 434–435
    functions, 439–441
    if/then/else statements, 433–434
    list comprehensions, 435–438
    List-application functions as, 450
    map() function, 435–438
    reduce() function, 435–438
    simple generators, 439–441
    while/else/continue/break statements, 438–439
    yield statement, 439–441
flp module, 101
Flush left text block, 220–221
    flush_left() function, 221
flush() method, 16
fnmatch module, 64, 232–234
    filter() function, 233–234
    fnmatch() function, 233
    fnmatchcase() function, 233
Font Manager library (IRIX), 101
for statements, 26, 420
for/continue/break statements, 434–435
Format codes datatypes, 424
Format string, 84
formatter module, 284–285
    AbstractWriter class, 115
    DumbWriter class, 115
Formatting events, 284
Formfeed character, 299
form_letter template, 35
Forms
    automating processing, 379–380
    filling out, 34–37
FORMS library (IRIX), 101
Fourthought company Web site, 408
Functional programming (FP), 271, 446–447
    concepts, 1–2
    expressions, 447–448
    extended call syntax, 450–451
    functions, 447
    lambda operator, 447–448
    list-application functions as flow control, 450
    obfuscated Python code, 271
    rebinding names, 447
    side effects, 447
    solutions expressed in terms of what, 447
    special list functions, 448–449
fpectl module, 102
fpformat module, 106
frame objects types, 56
FrameWork module, 102
Freshmeat Web site, xv
FTP (File Transfer Protocol) clients, 395
ftplib module, 343, 395
funcs tag table, 296
Function factories, 4–5
Function objects, 4
Function-defined patterns, 336
Functions, 439–441
    ad hoc overloading, 53
    built-in, 89
    built-in Unicode, 186–188
    custom, 3–4
    definitions, 419–421
    as first-class objects, 447
    first-order, 4
    higher-order, 1–7
    lambda operator, 419
    mx.TextTools modules, 310–311
    naive argument overloading, 53–54
    "quacks like a duck" overloading of argument, 54
    referentially transparent, 447
    regular expressions, 245–248
    signature-based, 8
    special list, 448–449
    standard operations as, 47–48
    string module, 129
    trigonometric and algebraic, 107
    with two parameters as argument, 437
    type checks on arguments, 8
Fuzzy concepts, 12–13, 15
Fuzzy tagstack, 386
G
Garbage collection, 109
gc module, 106
gdbm module, 93
generator function, 439
generator iterator, 439
generator_iter object, 440
Generator-iterator objects, 56
German letters, 298
.__getinitargs__() method and the pickle module, 93
.__getitem__() object method, 63, 431
getopt module, 44–47
    getopt.getopt() function, 46–47
    getopt.GetoptError exception, 46
getpass module, 106
.__getstate__() method and the pickle module, 93
gettext module, 102
GL module, 104
gl module, 104
glob module, 64
    glob.glob() function, 64
Glob-style pattern, 64
Glob-style subpatterns file, 233
Gnosis Utilities, xvi, 408
    gnosis.indexer, xvi
    gnosis.xml.indexer module, xvi, 409
    gnosis.xml.objectify module, 409–410
    gnosis.xml.pickle module, xvi, 94, 410–411
    gnosis.xml.validity module, 411–412
Gnosis Web site, xiv, 447
GNU readline interface, 108
Google, xvi, 391
Googol, 263
Googolplex, 263
Gopher protocol client interface, 395
gopherlib module, 343, 395
Grammar rules, 337–339
Grammars, 260–265
Greenwich Mean Time, 88
grep, x, 204, 207, 213
Grouping operators
    regular expressions, 237–238
    SimpleParse, 326
Grouping regular expressions, 207
Group-like patterns, 242–244
grp module, 102
Guttman-Rosler Transform, 113
GZ (gzip), 172, 173
gzip file-like object, 174
gzip module, 173–175
    gzip.close() method, 174
    gzip.flush() method, 174
    gzip.GzipFile class, 174, 459
    gzip.isatty() method, 174
    gzip.myfileobj attribute, 174
    gzip.open class, 174
    gzip.read() method, 174
    gzip.readline() method, 175
    gzip.readlines() method, 175
    gzip.write() method, 175
    gzip.writelines() method, 175
gzipped files, 173–175
gzip object, 174–175
H
Handlers
    asynchronous events, 108
    states, 274, 279
hash() function, 30, 99
Hashes, See Checksum
hash_rotor.py file, 197–198
HeaderParserError exception, 361
Headers, email, 351–354
Helper functions, 364–365
hex() function, 19
Hex string, 19
Hexadecimal numerals, 130
Hexadecimal-encoded string, 159
Higher-order functions, combinatorial, 5–7
High-level EBNF parsing, 316–328
High-level programmatic parsing, 328–341
histogram.py file, 124–125
Histograms, 123–126
HLS color space, 104
HOFs (higher-order functions), 1–7
HSV color space, 104
HTML, 344
    character entity references, 383–384
    comments, 387
    content data, 387
    declarations, 387
    endtag, 387
    entity reference, 387
    last tag encountered, 388
    messages, 343
    parsers, 384–388
    PI (processing instruction), 388
    restoring instance to initial state, 388
    sending additional data to parser, 387
    templating system for delivery, 398
    text, 387
HTML documents, 383–388
    event-based framework for processing, 384–388
    parsing, 285
    processing, 285
    rating error probability, 225
    Unicode, 469
    whitespace compression, 455
htmlentitydefs module, 383–384
htmllib module, 284, 285, 384
HTMLParser module, 282, 384–388
HTMLParser.HTMLParser class, 385–386
    .close() method, 386
    .feed() method, 387
    .getpos() method, 387
    .handle_charref() method, 387
    .handle_comment() method, 387
    .handle_data() method, 387
    .handle_decl() method, 387
    .handle_endtag() method, 387
    .handle_entityref() method, 387–388
    .handle_pi() method, 388
    .handle_startendtag() method, 388
    .handle_starttag() method, 388
    .lasttag attribute, 388
    .reset() method, 388
HTMLParser_stack.py file, 385
HTTP, 105, 121, 343–344
httplib module, 343, 396
HTTP_REFERER environment variable, 378
HTTP_USER_AGENT environment variable, 378
Huffman encoding, 456–457
hypotenuse() function, 448
hypothetical.ini file, 269
I
ic module, 396
icopen module, 396
Idempotent functions, 483
IFF audio data, 104
if/then/else statements, 433–434
IF/THEN/END grammar, 265–267
IF/THEN/END structures, 263–264
ignore Unicode encoding, 188
ignore token, 336
Illegal characters, 336
imageop module, 104
IMAP4, 366–368
IMAP clients, custom, 366–368
IMAP instance object, 367
IMAP server, 367
imaplib module, 343–345, 366–368, 370
imaplib.IMAP4 class, 367
    .close() method, 367
    .expunge() method, 367
    .fetch() method, 367
    .list() method, 367
    .login() method, 367
    .logout() method, 367
    .search() method, 368
    .select() method, 368
imgfile module, 104
imghdr module, 396
imglib files, 104
Immutable, 427, 483
imp module, 107, 445
__import__() function, 446
import statements, 107, 420
Importing packages and modules, 420
in operator, 25, 30, 33–34
"".index() string method, 135–136
Indexed assignment, 31
indexer.py utility, 201
    *Indexer.fileids mapping, 201
    *Indexer.files mapping, 201
    *Indexer.words mapping, 201
IndexError exception, 30
Info-Zip, 176
Inheritance, 11
.ini file, 269–271
.__init__() method, 431
I-node number, 70
I-node protection mode, 70
Input, redirected, 61–63
input() function, 446
Input sequence, 62
Input string, parsing, 339
inspect module, 107
Installing Python Modules, (Ward), 106
INSTALL.LOG file, 2–3
int datatype, 12–13, 421–422
    int.__and__() method, 18–19
    int.__hex__() method, 19
    int.__invert__() method, 19
    int.__lshift__() method, 19
    int.__oct__() method, 19
    int.__or__() method, 19
    int.__rand__() method, 18–19
    int.__rlshift__() method, 19
    int.__rxor__() method, 19
    int.__ror__() method, 19
    int.__rrshift__() method, 19
    int.__rshift__() method, 19
    int.__xor__() method, 19
int() function, 10, 11
int objects, 18–19
Integers, 18–19
    bitwise operations, 18
    as fuzzy concepts, 12–13
    types, 50
    values, 132
Interactive shell prompts, 49
Interfaces
    audio hardware under Windows, 104
    Berkeley DB library, 92
    BSD DB library, 92
    Carbon API, 101
    Communications Tool Box, 101
    dbm-style databases, 90–93
    expat nonvalidating XML parser, 405
    GDBM (GNU DBM) library, 93
    GNU readline, 108
    MH mailboxes, 396
    Navigation Services, 103
    parser, 285
    Python DBM, 92
    Speech Manager, 102
    standard color selection dialog, 101
    Sun audio hardware, 105
    tokenizer, 285
    Unix (n)dbm library, 92
    Unix syslog library, 103
    urllib objects, 388
    WorldScript-Aware Styled Text Engine, 104
Intermediate Cryptology: Specialized Protocols Web site, 164
Internet
    access configuration, 396
    accessing resources, 388–394
    Config replacement for open(), 396
    modules, 394–399
    text formats, 344
Internet protocols, 343
Interpreted and/or scripting language, 418
Interpreter
    cleanup, 49
    copyright information, 49
    emulating, 105
    error messages, 50–51
    five components of version number, 52
    information about, 49–53
    string identifying operating system, 82
    version information, 51–52
    version number, 50
    warnings, 50–51
Introduction to Cryptology Concepts I Web site, 164
Introduction to Cryptology Concepts II Web site, 164
Introspection, 328
IntType datatype, 18–19
Invalid regular expressions, 255
I/O
    completion, 397
    low-level, 74
ioctl() system function, 101
is not operator, 14
"".isalnum() string method, 136
"".isalpha() string method, 136
isatty() file method, 16
isCond() function, 2
"".isdigit() string method, 136
"".islower() string method, 136
.is_multipart() method of email.Message, 356
ISO-8859-1 character set, 383
iso-8859-1 encoding, 187
ISO-8859* encodings, 466
isRegDBVal() function, 4
isShortRegVal() function, 4
"".isspace() string method, 136
.issubset() method of sets, 429
.issuperset() method of sets, 429
"".istitle() string method, 136
"".isupper() string method, 136
item tag table, 290
items() function, 91
.items() dictionary method, 91, 355, 356
Iterator wrapper, 336–337
J
"".join() string method, 120–121, 130, 137
JPEG files, 104
jpeg module, 104
jump_count callback function, 297
jump_no_match condition, 299–300
K
Kasner, Edward, 263
Key/value pairs, storing, 90
KeyError exception, 26, 27, 178, 361
.keys() dictionary method, 355, 356, 380
keyword module, 107
keyword yield, 128
Keywords, 263
Knuth, Donald, 20, 232
Kuchling, Andrew, 165
L
lambda operator, 113, 419, 447–448
latin-1 encoding, 187
Leadout eater, 296
Learning Python, (Lutz & Ascher), xv
Lemburg, Marc-Andre, 165, 286
Lempel-Ziv compression, 457–458
len() function, 14, 26, 31, 55, 355
LEX, 335–337
Lex & Yacc, 258
lex module, 336
Lexer, 261, 336–337
Lexical analyzer class, 286
LexToken, 329, 336
lex.token() function, 336
Line break characters, 299
Line matching, 283
line variable, 2
linecache module, 38–39, 64–65
    checkcache() function, 65
    clearcache() function, 64
    getline() function, 38
Line-ending combinations, 315
.lineno attribute, PLY, 335
Line-oriented command interpreters, 105
Line-oriented files, 39–41
Lines
    counting, 120–121
    listing, 315
    number in platform-portable way, 311
    reading files backwards by, 126–128
list datatype, 28–32
    list.__add__() method, 29–30
    list.append() method, 32
    list.__contains__() method, 30
    list.count() method, 32
    list.__delitem__() method, 30
    list.__delslice__() method, 30
    list.extend() method, 32
    list.__getitem__() method, 30
    list.__getslice__() method, 30
    list.__hash__() method, 30–31
    list.__iadd__() method, 29–30
    list.__imul__() method, 31
    list.index() method, 32
    list.__len__() method, 31
    list.__mul__() method, 31
    list.pop() method, 32
    list.remove() method, 32
    list.__rmul__() method, 31
    list.__setitem__() method, 31
    list.__setslice__() method, 31
    list.sort() method, 32
List comprehensions, 435–438
list() function, 10, 11, 428
List-application functions as flow control, 450
list_capwords.py file, 137
List-like datatypes, 28–32
Lists, 28, 427–428
    adding elements, 29–30, 32
    appending, 32
    assignment to slice, 31
    built-in, 130
    collection of substrings, 130
    containing value, 30
    counting number of occurrences in, 32
    decreasing size, 32
    extending, 32
    hash values, 30–31
    indexed assignment, 31
    indexing, 30
    length of sequence, 31
    new sequence object, 31
    offset index, 32
    removing item, 30
    removing last item, 32
    reversing, 32
    satisfying condition given by function argument, 436–437
    slice parameter, 30
    sorting, 32, 112–113
    split around character, 311
    transformed items, 435–436
    writing to string buffer, 157–158
Literal strings, 323, 423
"".ljust() string method, 138
Local time, 88
locale module, 102
LOCALHOST addresses, 229
Locating patterns, x
location_parse.py file, 393
Log files, 2–3
logical_lines.py file, 227
long datatype, 422
long() function, 10
Long integer digits to stringify, 98
long objects, 18–19
LongType datatype, 18–19
Lookahead assertions, regular expressions, 219
Lookahead quantifier, SimpleParse, 324
Lookbehind assertions, 219–220
Lossless data compression, 454
Lossy data compression, 454
"".lower() string method, 138
Lower-bound quantifier, 241
Lowercase letters, 131
Low-level I/O, 74
Low-level state machine parsing, 286–316
"".lstrip() string method, 139
M
mac module, 74, 102
macerrors module, 102
macfs module, 102
macfsn module, 102
MacOS
    data fork, 161
    implementation of functionality, 102
    resource fork, 161
    structured development of applications, 102
MacOS module, 102
MacOS Python interpreter, 102
macostools module, 102
macpath module, 102
macresource module, 102
macspeech module, 102
mactty module, 102
Magic methods, 11–34, 431
Mail servers, communicating with, 366–372
mailbox class, 372
mailbox module, 282, 345, 372–374
    mailbox.BabylMailbox class, 373
    mailbox.Maildir class, 374
    mailbox.MHMailbox class, 373
    mailbox.MmdfMailbox class, 373
    mailbox.PortableUnixMailbox class, 373
    mailbox.UnixMailbox class, 373
Mailboxes, 372–374
mailcap file, 396
man utility, 223
map() function, 4–6, 447, 450
Mapping object, 381
MAP_PRIVATE flags, 149
MAP_SHARED flags, 149
Marked-up text, 317
Marking up smart ASCII, 292–296
markupbuilder.py file, 333–334
marshal module, 94, 147
Martelli, Alex, 20
Matches, finding, 312
MatchObject, 248–250, 253
math module, 107
maxlinelen argument, 351
M2Crypto module, 165
MD5 cryptographic hash, 167–169
MD5 message digests, 167–169
md5 module, 167–169
md5 object, 167–169
    md5.copy() method, 167–168
    md5.digest() method, 168
    md5.hexdigest() method, 168
    md5.md5 class, 167
    md5.MD5Type constant, 167
    md5.new class, 167
    md5.update() method, 168–169
Memory-mapped file objects
    changing current file position, 152
    closing, 149
    copying substring within, 150
    creation of, 148–149
    current file position, 152
    index position of first substring, 149–150
    resizing, 151
    returning string from, 151
    underlying file size, 152
    writing into, 153
Mersenne Twister generator, 82
Message objects
    audio data, 347
    based on message text, 348
    cloning, 350
    dictionary-like behavior, 355–356
    generator object iterating through, 354–355
    holding string or Unicode string, 351–352
    image data, 348
    MIME content type, 359
    multipart, 347
    parsing text message into, 363
    prebuilt header, 346
    serializing, 350–351
    single part, 348, 349
    text data, 348
Message payload, 360
Message-ID header, 365
Methods
    built-in Unicode, 186–188
    documenting base class, 13
    mx.TextTools module, 308–309
    regular expressions, 249–255
    shelve databases, 98
    string object, 129
    user-defined classes types, 56–57
MH mailboxes interface, 396
mhlib module, 396
MIME datatypes, 374–376
MIME writer, 396
MIME-reading or MIME-writing programs, 396
mimetools module, 345, 389, 396
mimetools.Message object, 389
mimetypes module, 374–376
    .common_types attribute, 375
    .encodings_map attribute, 375
    guess_extension() function, 374–375
    guess_type() function, 374
    init() function, 375
    .inited attribute, 375
    .knownfiles attribute, 376
    read_mime_types() function, 375
    .suffix_map attribute, 376
    .types_map attribute, 376
MimeWriter module, 345, 396
mimify module, 345, 396
MiniAEFrame module, 102
mkcwproject module, 102
mk_unicode_chart.py file, 469–470
mmap module, 147–153
mmap objects, 148, 150
mmap.mmap class, 148–149
    .close() method, 149
    .find() method, 149–150
    .flush() method, 150
    .move() method, 150
    .read() method, 150–151
    .read_byte() method, 151
    .readline() method, 151
    .resize() method, 151
    .seek() method, 152
    .size() method, 152
    .tell() method, 152
    .write() method, 153
    .write_byte() method, 153
mode attribute, 16
mod_python, 376
Modules
    basic string transformations, 128–147
    building, 106
    controlling loading, 50
    email package, 345–348
    importance of, 41
    importing, 420
    installing, 106
    Internet, 394–399
    miscellaneous, 105–109
    multimedia formats, 104–105
    pathnames searched for, 50
    platform-specific operations, 100–104
    regular expressions, 231–255
    simple pattern matching, 232–234
    standard, 41–89
    standard Internet-related tools, 395–398
    standard library XML, 403–407
    third-party, 90
    third-party Internet-related tools, 398–399
    types, 57
Modulo division, 22
most_common() function, 126
Mount point, 66
Mozilla, 392
msvcrt module, 102
MTA (Mail Transport Agent), 345
MUA (Mail User Agent), 345
.__mul__() method, 431
multifile module, 285, 345
multifile.MultiFile class, 285
Multilingual applications, 102
Multimedia formats, 104–105
MultipartConversionError exception, 347
Multiple criteria, 3
Multiplication and floating-point numbers, 22
Multiproducer, multiconsumer queue, 108
Multithreaded applications, creation of, 108
Mutability, 28–29
Mutable, 483–484
Mutable objects, 42
Mutable strings, 130
mutex module, 107
Mutual exclusion locks, 107
mxCrypto module, 165
mx.Date module, 86
mx.DateTime, xvi
mx.TextTools module, xvi, 267
    attributes, 308–309
    benchmarks, 287–296
    classes, 307–308
    charsplit() function, 311
    cmp() function, 310
    collapse() function, 311
    commands, 299–300
    compound matches, 304–305
    concrete parse tree of components of report, 288
    constants, 298–299
    countlines() function, 311
    find() function, 312
    findall() function, 312
    functions, 310–311
    hex2str() function, 312
    invset() function, 310
    isascii() function, 312
    is_whitespace() function, 312
    join() function, 313
    lower() function, 313
    matching particular characters, 301–302
    matching sequences, 302–303
    methods, 308–309
    modifiers, 305–307
    multireplace() function, 313–314
    named jumped targets, 300
    parser, 319
    parser generator, 316–328
    prefix() function, 313
    replace() function, 314
    set() function, 310
    setfind() function, 314
    setsplit() function, 314
    setsplitx() function, 315
    splitat() function, 315
    splitlines() function, 315
    splitwords() function, 315
    str2hex() function, 315
    suffix() function, 316
    tag() function, 289, 290, 310–311, 322
    tag table, 288
    taglist, 292
    unconditional commands, 300–301
    upper() function, 316
    utility functions, 311–316
    version of typography() function, 292–295
mx.TextTools commands
    AllIn, 301
    AllInCharSet, 301
    AllInSet, 301
    AllNotIn, 301
    Call, 305
    CallArg, 305
    EOF, 303
    Fail, 300–301
    Is, 302
    IsIn, 302
    IsInCharSet, 302
    IsInSet, 302
    IsNot, 302
    IsNotIn, 302
    Jump, 300–301
    Move, 301
    sFindWord, 303
    Skip, 301
    SubTable, 304
    SubTableInList, 305
    sWordEnd, 302–303
    sWordStart, 302–303
    Table, 304
    TableInList, 305
    Word, 302
    WordEnd, 302–303
    WordStart, 302–303
mx.TextTools constants
    alpha, 298
    alphanumeric, 298
    alphanumeric_set, 298
    alpha_set, 298
    A2Z, 298
    a2z, 298
    A2Z_set, 298
    a2z_set, 298
    any, 299
    any_set, 299
    formfeed, 299
    formfeed_set, 299
    german_alpha, 298
    german_alpha_set, 298
    newline, 299
    newline_set, 299
    Umlaute, 298
    Umlaute_set, 298
    white, 299
    white_set, 299
    whitespace, 299
    whitespace_set, 299
mx.TextTools modifiers
    AppendMatch, 306
    AppendTagobj, 307
    AppendToTagobj, 306
    CallTag, 305–306
    LookAhead, 307
mx.TextTools.BMS class, 307–308
    .find() method, 308
    .findall() method, 308
    .match attribute, 309
    .search() method, 308
    .translate attribute, 309
mx.TextTools.CharSet class
    .contains() method, 309
    .match() method, 309
    .search() method, 309
    .split() method, 309
    .splitx() method, 309
    .strip() method, 309
mx.TextTools.FS class, 307–308
    .find() method, 308
    .findall() method, 308
    .match attribute, 309
    .search() method, 308
    .translate attribute, 309
mx.TextTools.TextSearch class, 307–308
    .match attribute, 309
    .search() method, 308
mxTypography.py utility module, 292, 295–297, 319
N
Nac module, 103
Named group backreference (?P=name), 244
Named group identifier (?P<name>), 244
Named terms as parts of patterns, 262
Names, assignment, 418–419
Namespaces, 418–421
    adding or modifying bindings, 420
    defining, 430–432
Navigation Services interface, 103
ndiff utility, 283
Negation operator, SimpleParse, 325
Negative lookahead assertion (?!...), 243
Negative lookbehind assertion (?<!...), 243
Nested loops, exiting gracefully from, 442–443
Nested subpatterns, 258
Nesting
    filter() function, 4–6
    filters, 3–7
    map() function, 4–6
netrc file, 396
netrc module, 396
Netscape OSA modules, 397
new module, 107, 445
new_email_subjects.py file, 368–369
News clients, storing messages, 344–345
Newsgroups, 344
New-style classes, 11–13
.next() method of iterators, 372, 439, 440
nis module, 103
NIS Yellow Pages, 103
NIST (National Institute of Standards and Technology), 170
NNTP (Network News Transport Protocol), 121
    Client applications, 397
nntplib module, 344, 397
Node class, 332, 333
Nodes, 260
Non-alphanumeric character class (\W), 239
Non-backreferenced atom (?:...), 242–243
Non-digit character class (\D), 238
Nonempty sequence, 83
Non-greedy bounded quantifier, 242
Non-greedy existential quantifier (+?), 241
Non-greedy potentiality quantifier (??), 241
Non-greedy quantifiers, 215
Non-greedy universal quantifier (*?), 240
Nonlooping tag table, 297
Non-Windows systems, detaching applications, 80
Non-word boundary (\B), 240
nsremote module, 397
Numbers, pretty printing, 229–231
Numeric comparison operators, 21, 25
Numeric error code, 80
Numeric types, capabilities, 10
Numeric values, encoding compactly, 84–86
O
object type, 14
    object.__eq__ method, 14
    object.__ne__ method, 14
    object.__len__ method, 14
    object.__nonzero__ method, 14
    object.__repr__ method, 15
    object.__str__ method, 15
Objects
    binding names to, 42
    binding trap, 42
    built-in, 89
    converting to strings, 90
    copying, 42–44
    creation in customizable ways, 107
    customizing string representation, 96–98
    datatypes, 54–55
    deep copy, 43–44
    enhanced, 11–13
    equality, 14
    file-like, 9–11, 15–17, 68
    immutable, 427
    inequality, 14
    inspecting, 107
    length of, 14
    list-like, 28
    magic methods, 11–13
    mutable, 28–29, 42, 427–428
    naming, 418–421
    number of references to, 53
    persistent, 41, 90
    pickling behavior, 93–94
    recursive containers, 95
    references not limiting garbage collection, 109
    restricted access, 105
    serializing, 90–100
    shallow copy, 43–44
    snap shots, 42–43
    standard type names, 98
    storing, 90–100
    tuple-like, 28
    types, 53–57
    writing serialized form of, 93
oct() function, 19
Octal numerals, 130
Octal string, 19
Open file objects, 55
open function, 15–16
Operating systems
    accessing features, 74–82
    identifying, 82
    string referring to current directory, 81
operator module, 47–48
optik module, 45
optparse module, 45
or operator, 434
os module, 74–82, 102
    access() function, 74–75
    altsep attribute, 81
    chdir() function, 75
    chmod() function, 75
    chown() function, 75
    chroot() function, 75
    curdir, 81
    defpath, 81
    environ variable, 78, 81
    OSError error, 78, 79, 81
    fstat() function, 69–71
    getcwd() function, 75
    getenv() function, 75–76
    getpid() function, 76
    kill() function, 76
    linesep attribute, 82
    link() function, 76
    listdir() function, 57, 76
    lstat() function, 69–71, 76
    mkdir() function, 76–77
    mkdirs() function, 77
    mkfifo() function, 77
    name attribute, 82
    nice() function, 77
    pardir attribute, 82
    os.path module, 65–68, 74
        abspath() function, 65
        basename() function, 65
        commonprefix() function, 65
        dirname() function, 65
        exists() function, 65
        expanduser() function, 65
        expandvars() function, 66
        getatime() function, 66
        getmtime() function, 66
        getsize() function, 66
        isabs() function, 66
        isdir() function, 66
        isfile() function, 66
        islink() function, 66
        ismount() function, 66
        join() function, 66
        normcase() function, 67
        normpath() function, 67
        realpath() function, 67
        samefile() function, 67
        sameopenfile() function, 67
        split() function, 67
        splitdrive() function, 67
        walk() function, 67–68
    pathsep attribute, 82
    popen() function, 77
    popen2() function, 77–78
    popen3() function, 78
    popen4() function, 78
    putenv() function, 78
    readlink() function, 78
    remove() function, 78
    removedirs() function, 79
    rename() function, 79
    renames() function, 79
    rmdir() function, 79
    sep attribute, 82
    startfile() function, 79
    stat() function, 69–71, 79–80
    strerror() function, 80
    symlink() function, 80
    system() function, 80
    tempnam() function, 80
    tmpfile() function, 80
    uname() function, 81
    unlink() function, 81
    utime() function, 81
os2 module, 74
Output file, decoding contents of argument, 161
P
p_*() function, 338
Packages, 106, 420
Packed binary strings, 84–86
p_add() rule, 340
Paragraphs
    counting, 120–121
    reading files backwards by, 126–128
    reformatting, 115–117
Parser libraries, 282–341
parser module, 285
Parser state machine, 340
.parserbyname() method, SimpleParse, 322
parser.out file, 340
Parsers, 257, 261, 267
    data becoming deep, 258–260
    grammar, 260–263
    HTML, 384–388
    interfaces, 285
    PLY module, 329
    mx.TextTools, 286–316
    SimpleParse, 316–328
    specialized, 282–286
    text becoming stateful, 258–260
    tokens, 261
    XHTML, 384–388
    yacc module, 339
parsetab.py file, 340
Parsing
    buyer/order report, 287–289
    command-line options, 44–47
    compound address, 365
    data with regular expressions, 223
    HTML files, 285
    input string, 339
    low-level state machine, 286–316
    pencil-and-paper, 264–265
    text message into message object, 363
    token list, 332–334
    URLs (Uniform Resource Locators), 392–394
    Windows-style configuration files, 282–283
    XML (Extensible Markup Language), 407
Passwords
    ASCII 13-byte encrypted, 166
    collecting without echoing to screen, 106
    POP3 server, 369
patch utility, 283
Path delimiters, 82
PATH environment variable, 81
Path symbolic link, 78
Pathnames, 60, 64–68
Paths, 65–68
    controlling module loading, 50
    directory listings, 58
    permissions, 74–75
Pattern modifiers (?Limsux), 242
PatternObject, 248–250
Patterns
    case-insensitive match, 233
    case-sensitive match, 233
    converting, 235
    function-defined, 336
    glob-style matching, 232–234
    listing elements matching, 233–234
    matching filenames against, 232–234
    matching string, 233
    regular expressions, 207–208
    simple matching, 232–234
pcre module, 231
pdb module, 107
Pencil-and-paper parsing, 264–265
Permission bits, copying, 68
Permissions
    copying data, 69
    paths, 74–75
p_error() function, 340
Persistent
    files and strings, 147
    objects, 41
Peters, Tim, x, 20
pickle module, 93–94, 106, 147
    dump() function, 93
    dumps() function, 93–94
    load() function, 94
    loads() function, 94
pickle.Pickler class, 93
Pipe character, 208
Piped streams, 51
Pipes, managing, 103
pipes module, 103
PixMap objects, 103
PixMapWrapper module, 103
PKZip, 176
plain declaration, 334
plain node, 334
plain object, 334
Plaintext string, 170
plain_words tag table, 296
Platforms
    attributes, 80
    identifying, 50
    managing pipes, 103
    native byte order (endianness), 49
Platform-specific operations modules, 100–104
PLY applications, token stream creation, 329–335
PLY grammar, 334, 340
PLY package
    action code, 329
    allowable error conditions, 329
    error correction facility, 328
    error reporting, 328
    grammar rules, 337–338
    LEX, 335–337
    lexer/tokenizer, 329
    lexing module, 335–337
    parser, 329
    parsing token list, 332–334
    productions, 337–338
    self-referential rules, 334
    speed, 328
    yacc module, 337–339
PLY parsers, xvi, 340–341
Polymorphism, 8
    capability-based, 10
    enhanced objects, 11–13
    identifying file-like objects, 9–11
    Pythonic, 9–11
POP3 clients, custom, 368–370
POP3 protocol, 366, 369
POP3 server, 369–370
popen2 module, 107
poplib module, 343, 344, 368–370
    POP3 class, 369
    .apop() method, 369
    .dele() method, 369
    .pass_() method, 369
    .quit() method, 369
    .retr() method, 369
    .rset() method, 369
    .stat() method, 370
    .top() method, 370
    .user() method, 370
Positive lookahead assertion (?=...), 243
Positive lookbehind assertion (?<=...), 243
posix module, 74, 103
POSIX tty control, 103
posixfile module, 103
Potentiality quantifier (?), 241, 324
pprint module, 94–96
    isrecursive() function, 95
    pformat() function, 96
    pprint() function, 96
    PrettyPrinter class, 96
pre module, 231, 234
precedence variable, 340
Predicative functions, 6–7
Preferences manager for Python, 103
pretty_nums.py file, 230–231
Pretty-printing numbers, 229–231
pretty-printing object, 96
pricing.py support data, 280
Print calendars, 100
print command, 352, 355
print statement, 15, 51, 425–426
Printable characters, 132
Printing
    calendars, 100–101
    datatypes, 425–427
    directory comparison report, 59–60
    formatted representation of object, 96
    reports on code, 107
Processes
    creation and management, 74
    ids, 76
    processor time, 87
profile module, 107
Programmatic parsing, high-level, 328–341
Programming languages, nesting, 259
Programs
    location of output, 49
    names in, 42
p_rulename() form, 332
Pseudo terminal utilities, 103
pseudo-random value generator, 82–84
pstats module, 107
pty module, 103
Public-key encryption, 164, 484
Punctuation, 131
pwd module, 103
.py files, 106, 108
pyclbr module, 107
py_compile module, 108
pydoc module, 108
py_resource module, 103
Python
    as byte-code compiled programming language, 418
    container classes, 411–412
    dynamically and strongly typed, 418
    exiting, 52
    grammar, 261–262
    parser libraries, 282–341
    virtual machine, 418
Python & XML, (Jones & Drake), xv, 399
Python class browser, 107
Python Codec Registry, 189
Python Cookbook Web site, 112, 203
Python Cryptography modules, 165
Python DBM interface, 92
Python debugger, 107
Python Essential Reference, Second Edition, (Beazley), xv
Python Imaging Library, 104
Python introspection, 328
Python Lex-Yacc, 328–341
Python Library Reference, xiii, 49, 74, 98, 366
Python newsgroup, xiv
Python objects, 410–411, 413
Python scripts, 49
Python Standard Library, (Lundh), xv
Python Tutorial, (Rossum), 417
Python Tutorial Web site, 1
Python Web site, xiv, 417
Python XML-SIG, 408
Pythonic polymorphism, 9–11
pythonprefs module, 103
PYX module, 414
    document format, 414
    home page, 408
PyXML package, 408, 413
Q
Quantifiers, 240–242
    non-greedy, 215
    PLY grammar, 334
    SimpleParse module, 323–324
QUERY_STRING environment variable, 377, 378
Queue module, 108
Quick Python Book, The, (Harms & McDonald), xv
Quickly sorting lines on custom criteria, 112–115
QuickTime movies, 105
quietconsole module, 103
Quixote module, 398
Quixote Web site, 398
quopri encoding, 187
quopri module, 122, 162
    decode() function, 162
    decodestring() function, 162
    encode() function, 162
    encodestring() function, 162
Quoted printable encoding, 162
Quoted Printable string, 159
R
rand() function, 82
randline module, 40
Random element, 83
Random floating point value, 84
Random generator, 84
random module, 82–84
    betavariate() function, 82
    choice() function, 83
    cunifvariate() function, 83
    expovariate() function, 83
    gamma() function, 83
    gauss() function, 83
    lognormvariate() function, 83
    normalvariate() function, 83
    paretovariate() function, 83
    Random class, 82
    random() function, 83
    randrange() function, 83
    seed() function, 84
    shuffle() function, 84
    uniform() function, 84
    vonmisesvariate() function, 84
    weibullvariate() function, 84
Ranges, 83
Rapid searching, 201
Ratio, 21
raw_input() function, 446
raw-unicode-escape encoding, 187
re module, 147, 203, 215–216, 218, 231, 236–255
    re.engine constant, 245
    re.error exception, 255
    re.escape() function, 245
    re.findall() function, 245
    re.I constant, 244
    re.IGNORECASE constant, 244
    re.L constant, 244
    re.LOCALE constant, 244
    re.M constant, 244
    re.compile() class factory, 248
        .findall() method, 249
        .flags attribute, 249
        .groupindex attribute, 249
        .match() method, 250
        .pattern attribute, 250
        .search() method, 250–251
        .split() method, 251
        .sub() method, 251
        .subn() method, 252
    re.match() class factory, 248–249
        .end() method, 252
        .endpos attribute, 252
        .expand() method, 252–253
        .groupdict() method, 253
        .grouping, 253
        .groups() method, 253–254
        .lastgroup attribute, 254
        .lastindex attribute, 254
        .pos attribute, 254
        .re attribute, 254
        .span() method, 254
        .start() method, 255
        .string attribute, 255
    re.MULTILINE constant, 244
    re.purge() function, 246
    re.S constant, 244
    re.search() class factory, 249
        .end() method, 252
        .endpos attribute, 252
        .expand() method, 252–253
        .groupdict() method, 253
        .grouping, 253
        .groups() method, 253–254
        .lastgroup attribute, 254
        .lastindex attribute, 254
        .pos attribute, 254
        .re attribute, 254
        .span() method, 254
        .start() method, 255
        .string attribute, 255
    re.split() function, 223, 246
    re.sub() function, 213, 246–248
    re.subn() function, 248
    re.U constant, 245
    re.UNICODE constant, 245
    re.VERBOSE constant, 245
    re.X constant, 245
.read() method, 17, 37, 147, 285, 389
read_backwards.py utility, 127–128
Reading
    AIFC audio files, 104
    AIFF audio files, 104
    directory listings, 57–58
    file backwards by record, line, or paragraph, 126–128
    file in line-oriented style, 37–38
    gzip object, 174
    IFF audio data, 104
    line from file, 38–39
    lines with continuation characters, 226–227
    multiple files, 61–63
    URLs (Uniform Resource Locators), 391
    ZIP files, 176–181
.readline() method, 17, 37, 63, 285, 381, 389
readline module, 108
.readlines() method, 17, 37, 285, 381, 389
rebinding names, 447
reconvert module, 235
reconvert.convert() function, 235
Records, reading files backwards by, 126–128
Recursive containers, 95
Recursive objects, 97
Redirected input, 61–63
Redirected streams, 51
re.DOTALL constant, 244
reduce() function, 435–438, 447, 450
Referential transparency, 484
reformat_para.py file, 116–117
Reformatting paragraphs, 115–117
regex module, 231, 235
Regular expressions, 194–195, 203–204
    advanced extensions, 215–220
    alphanumeric character class (\w), 239
    alternation operator (|), 208, 240
    any character (.) wildcard, 207
    atomic operators, 236–240
    atoms, 207
    attributes, 249–255
    backreferences, 210–211, 214, 218, 238
    backslash character (\), 206
    basic probability, 212
    beginning of line (^), 206, 239
    beginning of string (\A), 239
    bounded numeric quantifier, 242
    character class, 208
    character classes, 208, 238
    character codepages, 217
    checking for server errors, 224–226
    class factories, 248–249
    clearing cache, 246
    comments, 220
    comments (?#...), 242
    concept of state, 259
    constants, 244–245
    continuing over multiple lines, 220
    curly-brace quantification ({}), 210
    definition of, 204–205
    deprecated modules, 235
    detecting duplicate words, 223–224
    digit character class (\d), 208, 238
    end of line ($), 206, 239
    end of string (\Z), 239
    Escape (\) atomic operator, 236
    escape-style shortcuts, 208
    exact numeric quantifier, 241
    existential quantifier (+), 240
    extensions, 209–210, 215–220
    functions, 245–248
    grouping, 207
    grouping operators, 237–238
    group-like patterns, 242–244
    how many times atom occurs, 209
    identifying floating point, 262
    identifying URLs and email addresses in text, 228–229
    invalid, 255
    limitations, 203
    listing substrings, 246
    literal characters, 205, 207
    locating matched pattern, 213
    lookahead assertions, 219
    lookbehind assertions, 219–220
    looking for begin-line and end-line characters, 216
    lower-bound quantifier, 241
    matching patterns in text, 205–214
    matching too much, 211–212
    matching zero-length pattern for line beginnings (^), 208
    methods, 249–255
    modifiers, 215–217
    modifying target text, 214
    modules, 234–255
    named group backreference (?P=name), 244
    named group identifier (?P<name>), 244
    negative lookahead assertion (?!...), 243
    negative lookbehind assertion (?<!...), 243
    newline character, 207
    non-alphanumeric character class (\W), 239
    non-backreferenced atom (?:...), 242–243
    non-digit character class (\D), 238
    non-greedy bounded quantifier, 242
    non-greedy existential quantifier (+?), 241
    non-greedy potentiality quantifier (??), 241
    non-greedy quantifiers, 215
    non-greedy universal quantifier (*?), 240
    non-whitespace character class, 239
    non-word boundary, 240
    one or more times (+), 209
    operations, 236–255
    parsing data, 223
    pattern modifiers (?Limsux), 242
    pattern summary, 236, 237
    patterns, 207–208
    patterns matching tokens, 335
    positive lookahead assertion (?=...), 243
    positive lookbehind assertion (?<=...), 243
    potentiality quantifier (?), 241
    pretty-printing numbers, 229–231
    quantifiers, 209, 211–212, 240–242
    quoting as raw strings, 206
    reading lines with continuation characters, 226–227
    replacement patterns to accompany matches, 213
    reverse character class (^), 208
    spaces, 205
    standard modules, 231–255
    substituting literal text for literal text, 213
    summarizing command-line option documentation, 221–223
    symbols with special meaning, 206
    text block flush left, 220–221
    Unicode alphabetic characters, 218
    universal quantifier (*), 240
    versions and optimizations, 231–232
    whitespace, 214
    whitespace character (\s) shortcut, 208, 239
    wildcard character, 239
    word boundary, 239
    zero or one times (?), 209
    zero-width match, 206
    zero-width positional patterns, 207
Regular files, 66, 70
Relative file positioning, 156
RELAX NG, 401
Remote procedure calls, 407
REMOTE_ADDR environment variable, 378
re_new() function, 213
repl function, 247
"".replace() string method, 139–140
Replacement backreferences, 214
report2data() function, 289
reporthook() function, 390
Reports
    concrete parse tree of components, 288
    other ways of processing, 281–282
    processing with concrete state machine, 274–280
repr() function, 15, 229
.__repr__() method, 54, 431
repr module, 96–98
    repr.maxlevel attribute, 97
    repr.maxlist attribute, 97
    repr.maxlong attribute, 98
    repr.maxother attribute, 98
    repr.maxstring attribute, 98
    repr.maxtuple attribute, 97
    repr.Repr() class, 97
    repr.repr() function, 98
    repr.repr_TYPE() function, 98
Representations, 97–98
re_show() function, 205, 213, 217
Resource fork, 161
resource module, 103
Resources, 390, 392
return statement, 439
rexec module, 108
RFC-822, 344
RFC-2822 date string, 365
RFC-2231 encoded string, 364
RFC-822 message manipulation class, 397
RFC-2822 messages, 345, 350
rfc822 module, 345, 397
RFC-2822-formatted date, 364
rfc822.Mailbox class, 372
"".rfind() string method, 140–141
RGB color model, 104
rgbimg module, 105
"".rindex() string method, 141
riscos module, 74
"".rjust() string method, 141–142
rlcompleter module, 108
RLE (run-length encoding), 455–456
robotparser module, 285
robots.txt access control file, 285
rot13 encoding, 187
rotor objects, creation of, 169–170
rotor.newrotor class, 169–170
    .decrypt() method, 170
    .decryptmore() method, 170
    .encrypt() method, 170
    .encryptmore() method, 170
    .setkey() method, 170
"".rstrip() string method, 142
Runtime environment, 49–53
RuntimeError exception, 281
S
salutation() function, 450–451
Sample buyer/order report, 275
SAX events, 406, 413
SAX extension, 413
SAX handlers, 406
SAX parsers, 407
SAX (Simple API for XML), 404–406
sched module, 108
Schneier, Bruce, 169
Schwartz, Randal, 113
Schwartzian Transforms, 113–115
Scripting languages templating system, 34–35
Scripts
    locating resources, 102
    supporting old, 58
ScrolledText module, 108
Search paths, 82
Searching
    dictionaries, 25–26
    rapid, 201
Secret Labs Regular Expression Engine, 236
sed, 204
.seek() method, 17, 147, 285
select module, 397
self argument, 13
self object, 15
send_email.py file, 370–371
Sequence operator (,), 325
Sequences, 450
    combining, 449
    difference and similarity of pairs, 283–284
    indexes, 449
Serial to line connections, 102
Serialized objects, 94
Serializing objects, 90–100
Servers, checking for errors, 224–226
services() function, 223
Set datatype, 429–430
.__setitem__() method, 431
sets.Set module, 429–430
.__setstate__() method, 93
SGI, 104–105
SGI systems (IRIX), 101
SGML (Standard Generalized Markup Language), 285
sgmllib module, 285
SHA message digests, 170–172
sha module, 167, 170–172
    new class, 170–171
    sha class, 171
        .copy() method, 171
        .digest() method, 171
        .hexdigest() method, 172
        .update() method, 172
sha object, 170–172
SHA (Secure Hash Algorithm), 167, 170–172
Shallow copy, 43–44
sha.py file, 197
Shared objects, 101
Shared-key encryption, 484
shelve databases, 98–99
shelve module, 98–100, 147
shlex module, 286
shortline() function, 4
show_opts() function, 223
show_services.py file, 222
shutil module, 68–69, 74
    copy() function, 68
    copy2() function, 68
    copyfile() function, 68
    copyfileobj() function, 68
    copymode() function, 68
    copystat() function, 69
    copytree() function, 69
    rmtree() function, 69
Side effects, 447
signal module, 108
Signature-based functions, 8
Silicon Graphics' Graphics Library, 104
Simple generators, 439–441
simple.cgi script, 377
SimpleHTTPServer module, 105
SimpleParse EBNF-style grammar
    declaration patterns, 321–323
    quantified potentiality, 327–328
SimpleParse module, xvi, 286
    backtracking, 328
    grammar defining structure of processed text, 316
    literals, 323
    production, 322
    quantifiers, 323–324
    structures, 324–326
    taglist creation, 319
    traversal and use of generated mx.TextTools taglist, 317
    useful productions, 326–327
simpleparse.common.calendar_names production, 326
simpleparse.common.chartypes production, 326
simpleparse.common.comments production, 326
simpleparse.common.iso_date production, 327
simpleparse.common.iso_date_loose production, 327
simpleparse.common.numbers production, 327
simpleparse.common.phonetics production, 327
simpleparse.common.string production, 327
simpleparse.common.timezone_names production, 327
simpleTypography.py file, 317–320
SimpleXMLRPCServer module, 407
Siong, Ng Pheng, 165
SIT, 173
site module, 108
.skip() method, PLY, 336
slice() function, 57
Slices, 43, 312–314
Smart ASCII, marking up, 292–296, 317–318, 329
Smart ASCII format, 272
SMTP clients, 370–371
SMTP server, 371
SMTP (Simple Mail Transfer Protocol), 121
smtplib module, 343, 344, 370–371
    smtplib.SMTP class, 371
        .login() method, 371
        .quit() method, 371
        .sendmail() method, 371
Snap shots, 42–43
sndhdr module, 347, 397
socket module, 343, 397
Sockets, 70
SocketServer module, 397
Solutions, expressed in terms of what, 447
.sort() method, 112–113
Sorting
    custom algorithm, 113–115
    custom comparison function, 113
    lines quickly on custom criteria, 112–115
    lists, 32, 112–113
    maintaining order, 105
    unnatural, 113
Sound file formats, 397
Source code
    analyzing, 106
    compiling possible incomplete, 106
    data as, 445–446
    mixed use of tabs and spaces, 286
    printing reports on, 107
    profiling performance characteristics, 107
SourceForge Web site, xv, xvi
Spaces and tabs, 299
Spam, 345
"Spam Filtering Techniques," 345
SpamBayes, 345
Spark module, 328
Special data values and formats, 82–89
Special list functions, 448–449
Specialized parsers, 282–286
Specializing classes, 11
Speech Manager interface, 102
"".split() string method, 121, 142–144
"".splitlines() string method, 144
Splitting strings, 142–144, 216
sprintf() function, 35
sre module, 231, 236
Stack frames, 49
Stack traces, 109
Standard color selection dialog interface, 101
Standard Internet-related tools, 395–398
Standard library
    specialized parsers, 282–286
    text processing tools, 287
    XML modules, 403–407
Standard modules, 41–89
Standard operations as functions, 47–48
"".startswith() string method, 144
.startswith() method, 3
startText() function, 273
Startup module, 108
stat module, 69–71
    S_ISBLK() function, 70
    S_ISCHR() function, 70
    S_ISDIR() function, 69–70
    S_ISFIFO() function, 70
    S_ISLNK() function, 70
    S_ISREG() function, 70
    S_ISSOCK() function, 70
    ST_ATIME constant, 70
    ST_CTIME constant, 71
    ST_DEV constant, 70
    ST_GID constant, 70
    ST_INO constant, 70
    ST_MODE constant, 70
    ST_MTIME constant, 71
    ST_NLINK constant, 70
    ST_SIZE constant, 70
    ST_UID constant, 70
statcache module, 58, 108
State function body, 279
State machines, 257
    abstracting form, 273–274
    block-level, 292
    defining, 267–268
    describing, 340
    input loop file, 272–273
    low-level parsing, 286–316
    parsers, 267
    state reuse, 280–281
    subgraphs, 280–281
    tag table, 288
    text processing, 268–269
    when not to use, 269–272
    when to use, 272–273
state variable, 273
Stateful text, 258–260
Stateful text file, 269
StateMachine class, 274
statemachine.py file, 273–274
States
    buyers tag table, 289–290
    concept of, 259
    diagram, 281
    handlers, 274, 279
    reuse, 280–281
    special behavior, 288
    tag table, 288
    tag tables, 289
    transitions, 279
stat_result object, 79
statvfs module, 108
STDERR pipes, 78
STDERR (standard error stream), 425–427
    file object, 50–51
    functions to spawn commands with pipes, 107
STDIN pipes, 77–78
STDIN (standard input stream), 61, 147
    file object, 51
    functions to spawn commands with pipes, 107
STDOUT pipes, 77–78
STDOUT (standard output stream), 61–62, 147, 425–427
    buffered, nonvisible output, 103
    file object, 51
    functions to spawn commands with pipes, 107
StopIteration exception, 440, 441
Storage object, 381
Storing objects, 90–100
str() function, 10, 15, 229, 355, 389, 450
.__str__() method, 54, 352, 431
str type, 33–34, 422–423
    str.__add__() method, 33
    str.__contains__() method, 33–34
    str.__getitem__() method, 33
    str.__getslice__() method, 33
    str.__hash__() method, 33
    str.__len__() method, 33
    str.__mul__() method, 33
    str.__rmul__() method, 33
stray_punct tag table, 295–296
Strict encoding, 187
String buffer, 153–158
String buffer objects, 153–154
string datatype, 422–423
String delimiting search paths, 82
string functions, 111
String interpolation datatypes, 423–425
String methods, 111
string module, 33, 111, 128–147
    atof() function, 132
    atoi() function, 132
    atol() function, 132
    capitalize() function, 132–133
    capwords() function, 133
    center() function, 133
    count() function, 134
    digits constant, 130
    expandtabs() function, 134–135
    find() function, 135, 200, 221
    hexdigits constant, 130
    index() function, 135–136
    join() function, 129, 130, 137, 143
    joinfields() function, 138
    letters constant, 131
    ljust() function, 138
    lower() function, 138
    lowercase constant, 131
    lstrip() function, 139
    maketrans() function, 139, 145
    octdigits constant, 130
    printable constant, 132
    punctuation constant, 131
    replace() function, 129, 139–140, 213, 221
    rfind() function, 140–141
    rindex() function, 141
    rjust() function, 141–142
    rstrip() function, 142
    split() function, 37, 129, 137, 142–144, 223
    splitfields() function, 144
    strip() function, 144
    swapcase() function, 145
    translate() function, 139, 145–146
    upper() function, 146
    uppercase constant, 131
    whitespace constant, 131–132
    zfill() function, 146–147
string object, 129
Stringifying calendars, 100–101
StringIO module, 153–158
    StringIO.StringIO class, 153–155
        .close() method, 155
        .flush() method, 155
        .getvalue() method, 155
        .isatty() method, 155
        .read() method, 156
        .readline() method, 156
        .readlines() method, 156
        .seek() method, 156
        .tell() method, 157
        .truncate() method, 157
        .write() method, 157
        .writelines() method, 157–158
Strings
    all non-alphanumeric characters escaped, 245
    applying tag table, 310–311
    backslashes and double quotes escaped, 365
    base64 encoded, 158
    based on hex-encoded string hexstr, 312
    basic transformations, 128–147
    beginning of, 144
    Boolean values indicating property, 136
    8-byte, 186–188
    capitalized words, 133
    composed of slices from other strings, 313
    concatenating elements of list, 137
    concatenating to sha object, 172
    converting, 132
    converting letter case, 145
    converting objects to, 90
    cryptographic hash, 167–169
    customizing representation, 96–98
    default length, 98
    delimiting lines in file, 82
    dictionary-based interpolation, 35–36
    double quotes or angle brackets removed, 365
    ending, 134
    extracting content from fillers, 194–195
    file-based interface, 161–162
    as files, 147–158
    finding first occurrence of any character, 314
    fuzzy matching against patterns, 283
    hexadecimal representation, 315
    identifying operating system, 82
    immutability, 129
    index position of substring, 135–136
    initial character converted to uppercase, 132–133
    interpolated values, 423–425
    interpolation of special characters, 35
    interpreter version information, 51–52
    leading and trailing whitespace characters removed, 144
    leading whitespace characters removed, 139
    length of, 85
    listing
        keys, 380
        lines, 144
        nonoverlapping substrings, 142–144
        in string buffer, 156
    lowercase letters converted to uppercase, 146, 316
    magic methods, 33
    manipulating image data stored as, 104
    message text contained in, 348
    modifying, 145–146
    multiple operations on, 4
    mutable, 130
    nonmagic methods, 33
    nonoverlapping occurrences of pattern, 245
    with normalized whitespace, 311
    occurrences of old replaced by new, 139–140
    one-byte from current position, 151
    packed values, 85
    padded with
        leading spaces, 141–142
        leading zeros, 146–147
        symmetrical leading and trailing spaces, 133
        trailing spaces, 138
    partial interpolation, 36–37
    path delimiters, 82
    path symbolic link, 78
    patterns matching, 233
    persistence, 147
    prefix in tuple, 313
    presence of single character in, 33
    as Python keyword, 107
    reading and writing, 15–17
    referring to directory, 81–82
    replacing, 314
    returning
        from gzip object, 175
        from memory-mapped file object, 151
        from string buffer, 156
    RFC-2231-encoded string, 364
    serialized form of object, 94
    with special characters escaped, 390
    splitting, 142–144, 216, 314–315
    starting at current file position, 150–151
    substrings, 134, 140
    tabs replaced by spaces, 134–135
    tagging, 290
    trailing whitespace characters removed, 142
    translation table, 139, 145–146
    uppercase letters converted to lowercase, 138, 313
    zlib compressed version, 182
"".strip() function, 144
Strongly emphasized text, 317
strongs tag table, 296
struct module, 84–86
    calcsize() function, 85
    pack() function, 85
    unpack() function, 86
Structured data, 90
Structured text database, 484
Structures
    recursively containing themselves, 263
    SimpleParse module, 324–326
Subdirectories, 60
Subelements, 400
Subgraphs, 280–281
Subshell, executing cmd command, 80
Substrings, 134
    copying within memory-mapped file object, 150
    index position, 135–136, 140–141
    listing nonoverlapping, 142–144
    strings splitting into, 315
4Suite, 408, 414–415
Summarizing command-line option documentation, 221–223
Sun AU audio files, 105
Sun audio hardware interface, 105
sunau module, 105
SUNAUDIODEV module, 105
sunaudiodev module, 105
"".swapcase() function, 145
switch statement, 320
symbol module, 285
Symbolic constants, 69
Symbolic links, 66–67, 69–70
Symmetrical encryption, 163, 484
sys function, 55
sys module, 49–53, 74
    sys.argv attribute, 49
    sys.byteorder attribute, 49
    sys.copyright attribute, 49
    sys.displayhook() function, 49
    sys.excepthook() function, 49
    sys.exc_traceback attribute, 49
    sys.exc_type attribute, 49
    sys.exc_value attribute, 49
    sys.exec_prefix attribute, 49
    sys.executable attribute, 49
    sys.exit() function, 52
    sys.getdefaultencoding() function, 52
    sys.getrefcount() function, 53
    sys.hexversion attribute, 50
    sys.last_traceback attribute, 49
    sys.last_type attribute, 49
    sys.last_value attribute, 49
    sys.maxint attribute, 50
    sys.maxunicode attribute, 50
    sys.path attribute, 50
    sys.platform attribute, 50
    sys.stderr attribute, 50–51
    sys.__stderr__ attribute, 50–51
    sys.stdin attribute, 51
    sys.__stdin__ attribute, 51
    sys.stdout attribute, 51
    sys.__stdout__ attribute, 51
    sys.stdout.write() method, 425
    sys.tracebacklimit attribute, 49
    sys.version attribute, 51–52
    sys.version_info attribute, 52
syslog module, 103
System configuration, 74
SystemExit exception, 52
T
tabnanny module, 286
tag() function, 296
Tag tables, 288
    applying to string, 310–311
    based on EBNF grammars, 316
    changing read-head position, 301
    correctly configuring, 299–300
    debugging, 297–298
    defining, 289
    documentary purposes, 300–301
    jump conditions, 299–300
    modifying, 320
    nonlooping, 297
    states, 288–289
    success state, 288
    tuple, 299
    type of pattern to match, 299
Tagging strings, 290
Taglist, 292
    comparing tuples on slice positions, 310
    generating, 318–320
    markup production, 319
    output, 321
    unreported production, 322
    usage, 318–320
TagStack class, 386
tag_words tag table, 296
Tasks
    column statistics for delimited or flat-record files, 117–120
    counting characters, words, lines and paragraphs, 120–121
    quickly sorting lines on custom criteria, 112–115
    reading file backwards by record, line, or paragraph, 126–128
    reformatting paragraphs of text, 115–117
    text processing applications, 41
    transmitting binary data as ASCII, 121–123
    word or letter histograms, 123–126
TCL/Tk Python interface, 108
.tell() method, 17, 285
Telnet clients, 397
telnetlib module, 343, 397
tempfile module, 71
    mktemp() function, 71
    TemporaryFile() function, 71
Temporary
    file object, 71
    filenames, 71
    files, 71, 80
TERMIOS module, 103
termios module, 103
t_error() function, 336
Testing
    capabilities, 10–11
    files, 69–71
Text, ix–x
    all letters, 131
    custom processing, x
    describing complex patterns, 204
    fast manipulation tools, 286–316
    identifying URLs and email addresses, 228–229
    locating patterns, x
    lowercase letters, 131
    matching patterns, 205–214
    processing, 2, 111
    reformatting paragraphs, 115–117
    stateful, 258–260
    uppercase letters, 131
Text blocks, 220–221
Text editors, x, 115
Text files
    composed of multiple delimited parts as several files, 285
    processing, 268–269
    stateful, 269
    stateful chunk, 268
    when not to use state machine, 269–272
    when to use state machine, 272–273
Text processing
    definition of, ix–x
    filters, 3–4
    frequency, xi
    HOFs (higher-order functions), 1–7
    Internet protocols, 343
    large chunks of text, 2
    log files, 2–3
    philosophy, x–xi
    state machines, 268–269
    stateful, 267
    tasks, xi, 41
Textual sources and log files, 2–3
textwrap module, 115
Third-party
    Internet-related tools, 398–399
    modules, 90
    XML-related tools, 408–416
thread module, 108
Threaded applications, 107
Threaded programming, 108
threading module, 108
Threat model, 196–198
tidy, 384
t_ignore variable, 331, 336
Time, 86–89
time module, 86–89
    accept2dyear attribute, 86
    altzone constant, 86–87
    asctime() function, 87
    clock() function, 87
    ctime() function, 87
    daylight constant, 86–87
    gmtime() function, 88
    localtime() function, 88
    mktime() function, 88
    sleep() function, 88
    strftime() function, 88
    strptime() function, 89
    time() function, 89
    timezone constant, 86–87
    tzname constant, 86–87
Time tuple, 86–88
Timestamps
    copying data, 69
    email, 365
Timezone, 86
"".title() function, 133
Tix module, 108
Tkinter module, 108
t_MDASH(), 332
t_newline() function, 331
Token list, 329–335
.token() method, 336
token module, 285
Token patterns, 335–336
tokenize module, 285
Tokenizers, 261, 285
Tokens, 261
    attributes, 335
    getting for grammar rules, 339
    identifying, 329
    listing types, 330–331
    regular expressions matching, 335
    types, 329
tokens variable, 330, 335
traceback module, 109
traceback objects, 57
"".translate() function, 145–146
Transmitting binary data as ASCII, 121–123
True division operator (/), 22
t_RULENAME form, 329
truncate method, 17
try statements, 52, 443
try/except/else statement, 443–444
try/finally statement, 443–444
tty module, 103
tuple() function, 10
tuple type, 28–32
    tuple.__add__() method, 29–30
    tuple.__contains__() method, 30
    tuple.__getitem__() method, 30
    tuple.__getslice__() method, 30
    tuple.__hash__() method, 30–31
    tuple.__len__() method, 31
    tuple.__mul__() method, 31
    tuple.__rmul__() method, 31
Tuple-like objects, 28
tuples, 316, 427
Turing, Alan, 169
turtle module, 108
Twisted Matrix Laboratories Web site, 398
Twisted Matrix library, 45
Twisted module, 398
twisted.python.usage module, 45
Txt2Html utility, 272–273, 287, 292, 317
.type attribute, PLY, 335
type() function, 53–55, 421
TYPE type, 98
Typed arrays of numeric values, 105
TypeError exception, 12, 30, 112, 157
Types, 53–57
types module, 53–57
    BufferType constant, 55
    BuiltinFunctionType constant, 55
    BuiltinMethodType constant, 55
    ClassType constant, 55
    CodeType constant, 55
    ComplexType constant, 55
    DictionaryType constant, 55
    DictType constant, 55
    EllipsisType constant, 55
    FileType constant, 55–56
    FloatType constant, 56
    FrameType constant, 56
    FunctionType constant, 56
    GeneratorType constant, 56
    InstanceType constant, 56
    IntType constant, 56
    LambdaType constant, 56
    ListType constant, 56
    LongType constant, 56
    MethodType constant, 56
    ModuleType constant, 57
    NoneType constant, 57
    SliceType constant, 57
    StringType constant, 57
    StringTypes constant, 57
    TracebackType constant, 57
    TupleType constant, 57
    TypeType constant, 57
    UnboundMethodType constant, 56
    UnicodeType constant, 57
    XRangeType constant, 57
typography() function, 292–295
typo_html.py file, 320
U
u"".encode() method, 188
urlparse module, 392–394
urlparse.urlparse() function, 393–394
unary minus (-) operator, 22
Unconditional commands, 300–301
unichr() method, 188
Unicode, 465
    built-in functions and methods, 186–188
    CJK (Chinese-Japanese-Korean) alphabets, 185
    codepoint information, 191
    declarations, 468–469
    default string encoding, 52
    definition of, 466
    encodings, 185, 467
    finding codepoints, 469–470
    native support, 185
    resources, 470
    UTF-8, 185
    UTF-16, 185
    UTF-32, 185
Unicode characters, 50, 191–193
Unicode Consortium, 466
unicode datatype, 423
Unicode file, 189–190
unicode() function, 10–11, 186–188, 353
Unicode object, 186–188
Unicode string object, 188
Unicode strings, 112, 186–188, 423
unicodedata module, 191–193
    bidirectional() function, 192
    category() function, 192
    combining() function, 192
    decimal() function, 192
    decomposition() function, 192–193
    digit() function, 193
    lookup() function, 193
    mirrored() function, 193
    name() function, 193
    numeric() function, 193
UnicodeError exception, 187, 189
unicode-escape encoding, 187
Unit testing framework, 109
unittest module, 109
Universal quantifier (*), 240, 323
Unix, 102–103
Unix (n)dbm library interface, 92
Unix password database, 103
Unix shell-like syntaxes, 286
Unix syslog library interface, 103
Unix-like directories, 77
Unix-like systems
    detailed information about current operating system, 81
    hard link from path, 76
    killing external process, 76
    mailcap file, 396
    netrc file, 396
    path symbolic link, 78
    processing permissions, 74
    root directory, 75
    soft link between paths, 80
    wc utility, 120
Unix-style passwords, 166
untabify.py utility, 221
Updating
    dictionaries, 27
    os.environ variable, 78
"".upper() function, 146
Uppercase letters, 131
urldump.py file, 284
urlencoded query for POST or GET request, 390–391
url_examine.py file, 224
urllib module, 388–392, 398
    FancyURLopener class, 391
    quote() function, 390
    quote_plus() function, 390
    unquote() function, 390
    unquote_plus() function, 390
    FancyURLopener.version attribute, 392
    urlencode() function, 389–391
    FancyURLopener.get_user_passwd() method, 391
    FancyURLopener.open() method, 392
    FancyURLopener.open_unknown() method, 392
    FancyURLopener.prompt_user_passwd() method, 392
    FancyURLopener.retrieve() method, 392
    urlopen() function, 389
    URLopener.open() method, 392
    URLopener.open_unknown() method, 392
    URLopener.retrieve() method, 392
    URLopener.version attribute, 392
    urlretrieve() function, 390
urllib.URLopener class, 391
urllib objects interface, 388
    .close() method, 389
urllib2 module, 398
urlparse module, 282
    urljoin() function, 394
    urlunparse() function, 394
URLs (Uniform Resource Locators), 389
    components, 392–394
    constructing, 394
    copying, 392
    identifying, 228–229
    opening, 392
    parsing, 392–394
    reading, 391
US-ASCII encoding, 186
user module, 108
User-defined classes, 55–57
UserDict module, 11, 36
    UserDict.UserDict class, 24–27
        .clear() method, 26
        .__cmp__() method, 24–25
        .__contains__() method, 25
        .copy() method, 26
        .__delitem__() method, 25
        .get() method, 26
        .__getitem__() method, 25
        .has_key() method, 26
        .items() method, 26–27
        .iteritems() method, 26–27
        .iterkeys() method, 27
        .itervalues() method, 27
        .keys() method, 27
        .__len__() method, 26
        .popitem() method, 27
        .setdefault() method, 27
        .__setitem__() method, 26
        .update() method, 27
        .values() method, 27
UserInt module, 12
USERLEVEL state, 269
UserList module, 11
    UserList class, 28–32
        .__add__() method, 29–30
        .append() method, 32
        .__contains__() method, 30
        .count() method, 32
        .__delitem__() method, 30
        .__delslice__() method, 30
        .extend() method, 32
        .__getitem__() method, 30
        .__getslice__() method, 30
        .__hash__() method, 30–31
        .__iadd__() method, 29–30
        .__imul__() method, 31
        .index() method, 32
        .__len__() method, 31
        .__mul__() method, 31
        .pop() method, 32
        .remove() method, 32
        .__rmul__() method, 31
        .__setitem__() method, 31
        .__setslice__() method, 31
        .sort() method, 32
UserString module, 11
    UserString class, 33–34
        .__contains__() method, 33–34
        .__iadd__() method, 33
        .__imul__() method, 33
        .__radd__() method, 33
UTF-8 encoded files, 468
UTF-16 encoded files, 468
utf-7 encoding, 187
utf-8 encoding, 187
utf-16 encoding, 187
utf-16-be encoding, 187
utf-16-le encoding, 187
Utilities, command-line switches, 44–47
Utility functions, 271–272, 311–316, 406
uu module, 122, 163
uu.decode() function, 163
uu.encode() function, 163
UUencoding, 121–122, 159
V
Valid XML documents, 401
Validating parser, 403
Validating XML parser, 413
.value attribute, PLY, 335, 336
ValueError exception, 135, 141, 380
.values() dictionary method, 355, 380
van Rossum, Guido, 1, 417
Vaults of Parnassus Encryption/Encoding index Web site, 164
Vaults of Parnassus Web site, xv, 112, 203, 398
videoreader module, 105
Virtual machine, 418
Viruses, 345
visitor.cgi script, 379
Visual C++ Runtime libraries, 102
W
W module, 103
Wall, Larry, xi
Warning messages, modifying behavior, 109
Warnings interpreter, 50–51
warnings module, 109
waste module, 104
WAV audio files, 105
wave module, 105
W3C Document Object Model, Level 2, 404
wc utility, 120
W3C XML Schema, 401–402
wc.py file, 121
weakref module, 109
Web application server, 399
Web bugs, 378
Web clients, 396
Web pages, dynamic, 34
Web servers
    checking for errors, 224–226
    robots.txt access control file, 285
webbrowser module, 398
Well-formed XML documents, 401
wget, 392
whichdb module, 93
whichdb.whichdb() function, 93
while/else/continue/break statements, 438–439
Whitespace, 263
    regular expressions, 214
    as single divider, 143
Whitespace character () shortcut, 208, 239
Whitespace characters, 312
Whitespace compression, 455
Whitespace-separated words, 315
whrandom module, 109
Wichmann-Hill random number generator, 82, 84, 109
Widgets for Mac, 103
Wildcard character (.), 239
Windows
    access to registry, 100
    alternative path delimiter, 81
    launching application, 79
Windows-specific functions, 102
Windows-style configuration files, 282–283
_winreg module, 100
winsound module, 104
WinZip, 176
Word boundary (\b), 239
Word or letter histograms, 123–126
Word similarity, 283
Word-based Huffman compressed text, 460–461
word_huffman module, 460–464
wordplusscanner.py file, 331–332
Words, counting, 120–121
wordscanner.py file, 330
word_set characters, 290
Working directory, 75
World Wide Web applications
    accessing Internet resources, 388–394
    CGI (Common Gateway Interface), 376–383
    HTML documents, 383–388
WorldScript-Aware Styled Text Engine interface, 104
write() method, 17
writelines() method, 17
write_payload_list.py file, 361
Writer objects, 284
Writing
    AIFC audio files, 104
    AIFF audio files, 104
    gzipped files, 173–175
    networked applications, 398
    ZIP files, 176–181
X
XDR (eXternal Data Representation), 104
xdrlib module, 104
XHTML parsers, 384–388
XHTML-style empty tag, 388
XML documents
    canonicalized, 413
    CDATA, 401
    comments, 401
    document type declarations, 401
    DOM (Document Object Model), 384, 403–404
    DTD (Document Type Definition), 401–402
    event-based API, 404
    indices of, 409
    knowledge management, 414–415
    nodes, 401
    processing instructions, 401
    PYX format, 414
    SAX (Simple API for XML), 404
    schemas, 402–403
    transforming into Python objects, 409–410
    tree of nodes, 403–404
    valid, 401
    validating parser, 403
    well-formed, 401
    XML Schema, 401
    XSLT stylesheets, 403
XML (Extensible Markup Language), 344, 399
    attributes in XML tags mapping names to values, 399–400
    data model, 399–401
    dialect and DTDs, 261
    dialects, 399
    elements, 400
    nesting tags inside tags, 400–401
    nodes, 400–401
    parsing, 407
    subelements, 400
    third-party tools, 408–416
xml package, 282
XML Schema, 401
xmlcat.py file, 406
xml.dom module, 384, 404
xml.dom.minidom module, 405
xml.dom.pulldom module, 405, 413
xmllib module, 407
xml.parsers.expat module, 405
xml_pickle module, 147
xmlproc, 413
XML-RPC format, 407
xmlrpclib module, 407
xml.sax module, 384
xml.sax package, 405–406
xml.sax.handler module, 406
xml.sax.saxutils module, 406
xml.sax.writers, 413
xml.sax.xmlreader module, 407
4xpath, 409
XPath support, 413
xrange() function, 434
.xreadlines() method, 17, 37, 434
xreadlines module, 72
xreadlines.xreadlines() function, 72, 120, 126
XSLT
    stylesheets, 403
    support, 413
    transformations, 415
Y
yacc empty productions, 338
yacc module, 337–339
yacc.errok(), 340
yacc.py, 329
yacc.restart(), 340
YaccSlice object, 332, 338
yacc.token(), 340
YAML format, 415–416
YAML home page, 409
yaml module, 415–416
YAML tools, 94
yield statement, 439–441
YIQ color space, 104
Z
Zawinski, Jamie, 204
Z_BEST_COMPRESSION constant, 182
Z_BEST_SPEED constant, 182
Zero-case lexer, 261
ZIP archives, 177–178
ZIP files, 176–181
ZIP format, 172–173, 177
zip() function, 447, 449
zipfile module, 173, 176–181
    BadZipFile exception, 181
    error exception, 181
    is_zipfile() function, 177
    PyZipFile class, 177
    stringCentralDir constant, 177
    stringEndArchive constant, 177
    stringFileHeader constant, 177
    structCentralDir constant, 177
    structEndArchive constant, 177
    structFileHeader constant, 177
    ZIP_DEFLATED symbolic name, 177
    ZipFile class, 177–179
        .close() method, 178
        .compression attribute, 179
        .debug attribute, 179
        .filelist attribute, 179
        .filename attribute, 179
        .fp attribute, 179
        .getinfo() method, 178
        .infolist() method, 178
        .mode attribute, 179
        .namelist() method, 178
        .NameToInfo attribute, 179
        .printdir() method, 178
        .read() method, 178
        .start_dir attribute, 179
        .testzip() method, 178
        .write() method, 179
        .writestr() method, 179
    ZipInfo class, 178–179
        .comment attribute, 180
        .compress_size attribute, 180
        .compress_type attribute, 180
        .CRC attribute, 179
        .create_system attribute, 180
        .create_version attribute, 180
        .date_time attribute, 180
        .external_attr attribute, 180
        .extract_version attribute, 180
        .filename attribute, 180
        .file_offset attribute, 180
        .file_size attribute, 180
        .header_offset attribute, 180
        .volume attribute, 180
    ZIP_STORED symbolic name, 177
zlib library, 181–185
zlib module, 173, 181–185
    adler32() function, 182
    compress() function, 182
    compressobj object, 183
    compressobj.flush() method, 184
    crc32() function, 182
    decompress() function, 182
    decompressobj object, 183
    decompressobj.decompress() method, 185
    decompressobj.flush() method, 185
    decompressobj.unused_data attribute, 184
    error exception, 185
    Z_BEST_COMPRESSION constant, 181
    Z_BEST_SPEED constant, 181
    ZLIB_VERSION attribute, 181
    compressobj.compress() method, 183–184
    Z_HUFFMAN_ONLY constant, 181
ZODB (Zope Object Database) library, 100, 147
Zope home page, 399
Zope module, 399