Content preview·Buy reprint rights for this chapter

Python is a general-purpose programming language. It has been around for quite a while: Guido van Rossum, Python's creator, started developing Python back in 1990. This stable and mature language is very high level, dynamic, object-oriented, and cross-platform—all characteristics that are very attractive to developers. Python runs on all major hardware platforms and operating systems, so it doesn't constrain your platform choices.

Python offers high productivity for all phases of the software life cycle: analysis, design, prototyping, coding, testing, debugging, tuning, documentation, deployment, and, of course, maintenance. Python's popularity has seen steady, unflagging growth over the years. Today, familiarity with Python is an advantage for every programmer, as Python is likely to have some useful role to play as a part of any software solution.

Python provides a unique mix of elegance, simplicity, and power. You'll quickly become productive with Python, thanks to its consistency and regularity, its rich standard library, and the many other modules that are readily available for it. Python is easy to learn, so it is quite suitable if you are new to programming, yet at the same time it is powerful enough for the most sophisticated expert.

The Python language, while not minimalist, is rather spare, for good pragmatic reasons. When a language offers one good way to express a design idea, supplying other ways has only modest benefits, while the cost in terms of language complexity grows with the number of features. A complicated language is harder to learn and to master (and to implement efficiently and without bugs) than a simpler one. Any complications and quirks in a language hamper productivity in software maintenance, particularly in large projects, where many developers cooperate and often maintain code originally written by others.

Python is simple, but not simplistic. It adheres to the idea that if a language behaves a certain way in some contexts, it should ideally work similarly in all contexts. Python also follows the principle that a language should not have convenient shortcuts, special cases, ad hoc exceptions, overly subtle distinctions, or mysterious and tricky under-the-covers optimizations. A good language, like any other designed artifact, must balance such general principles with taste, common sense, and a high degree of practicality.

Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!


Python is a general-purpose programming language, so Python's traits are useful in any area of software development. There is no area where Python cannot be part of an optimal solution. "Part" is an important word here—while many developers find that Python fills all of their needs, Python does not have to stand alone. Python programs can cooperate with a variety of other software components, making it an ideal language for gluing together components written in other languages.

Python is a very-high-level language. This means that Python uses a higher level of abstraction, conceptually farther from the underlying machine, than do classic compiled languages, such as C, C++, and Fortran, which are traditionally called high-level languages. Python is also simpler, faster to process, and more regular than classic high-level languages. This affords high programmer productivity and makes Python an attractive development tool. Good compilers for classic compiled languages can often generate binary machine code that runs much faster than Python code. However, in most cases, the performance of Python-coded applications proves sufficient. When it doesn't, you can apply the optimization techniques covered in Chapter 17 to enhance your program's performance while keeping the benefits of high programming productivity.


There is more to Python programming than just the Python language: the standard Python library and other extension modules are almost as important for effective Python use as the language itself. The Python standard library supplies many well-designed, solid, 100% pure Python modules for convenient reuse. It includes modules for such tasks as data representation, string and text processing, interacting with the operating system and filesystem, and web programming. Because these modules are written in Python, they work on all platforms supported by Python.

Extension modules, from the standard library or from elsewhere, let Python applications access functionality supplied by the underlying operating system or other software components, such as graphical user interfaces (GUIs), databases, and networks. Extensions afford maximal speed in computationally intensive tasks, such as XML parsing and numeric array computations. Extension modules that are not coded in Python, however, do not necessarily enjoy the same cross-platform portability as pure Python code.

You can write special-purpose extension modules in lower-level languages to achieve maximum performance for small, computationally intensive parts that you originally prototyped in Python. You can also use tools such as SWIG to make existing C/C++ libraries into Python extension modules, as we'll see in Chapter 24. Finally, you can embed Python in applications coded in other languages, exposing existing application functionality to Python scripts via dedicated Python extension modules.

This book documents many modules, both from the standard library and from other sources, in areas such as client- and server-side network programming, GUIs, numerical array processing, databases, manipulation of text and binary files, and interaction with the operating system.


Python currently has two production-quality implementations, CPython and Jython, and one experimental implementation, Python .NET. This book primarily addresses CPython, which I refer to as just Python for simplicity. However, the distinction between a language and its implementations is an important one.

Classic Python (a.k.a., CPython, often just called Python) is the fastest, most up-to-date, most solid and complete implementation of Python. CPython is a compiler, interpreter, and set of built-in and optional extension modules, coded in standard C. CPython can be used on any platform where the C compiler complies with the ISO/IEC 9899:1990 standard (i.e., all modern, popular platforms). In Chapter 2, I'll explain how to download and install CPython. All of this book, except Chapter 24 and a few sections explicitly marked otherwise, applies to CPython.

Jython is a Python implementation for any Java Virtual Machine (JVM) compliant with Java 1.2 or better. Such JVMs are available for all popular, modern platforms. To use Jython well, you need some familiarity with fundamental Java classes. You do not have to code in Java, but documentation and examples for existing Java classes are couched in Java terms, so you need a nodding acquaintance with Java to read and understand them. You also need to use Java supporting tools for tasks such as manipulating .jar files and signing applets. This book deals with Python, not with Java. For Jython usage, you should complement this book with Jython Essentials, by Noel Rappin and Samuele Pedroni (O'Reilly), possibly Java in a Nutshell, by David Flanagan (O'Reilly), and, if needed, some of the many other Java resources available.


Python is developed by the Python Labs of Zope Corporation, which consists of half a dozen core developers headed by Guido van Rossum, Python's inventor, architect, and Benevolent Dictator For Life (BDFL). This title means that Guido has the final say on what becomes part of the Python language and standard libraries.

Python intellectual property is vested in the Python Software Foundation (PSF), a non-profit corporation devoted to promoting Python, with dozens of individual members (nominated for their contributions to Python, and including all of the Python core team) and corporate sponsors. Most PSF members have commit privileges to Python's CVS tree on SourceForge (https://sf.net/cvs/?group_id=5470), and most Python CVS committers are members of the PSF.

Proposed changes to Python are detailed in public documents called Python Enhancement Proposals (PEPs), debated (and sometimes advisorily voted upon) by Python developers and the wider Python community, and finally approved or rejected by Guido, who takes debate and votes into account but is not bound by them. Hundreds of people contribute to Python development, through PEPs, discussion, bug reports, and proposed patches to Python sources, libraries, and documentation.

Python Labs releases minor versions of Python (2.x, for growing values of x) about once or twice a year. 2.0 was released in October 2000, 2.1 in April 2001, and 2.2 in December 2001. Python 2.3 is scheduled to be released in early 2003. Each minor release adds features that make Python more powerful and simpler to use, but also takes care to maintain backward compatibility. One day there will be a Python 3.0 release, which will be allowed to break backward compatibility to some extent. However, that release is still several years in the future, and no specific plans for it currently exist.

Each minor release 2.x starts with alpha releases, tagged as 2.


The richest of all Python resources is the Internet. The starting point is Python's site, https://www.python.org, which is full of interesting links that you will want to explore. And https://www.jython.org is a must if you have any interest in Jython.

Python and Jython come with good documentation. The manuals are available in many formats, suitable for viewing, searching, and printing. You can browse the manuals on the Web at https://www.python.org/doc/current/. You can find links to the various formats you can download at https://www.python.org/doc/current/download.html, and https://www.python.org/doc/ has links to a large variety of documents. For Jython, https://www.jython.org/docs/ has links to Jython-specific documents as well as general Python ones. The Python FAQ (Frequently Asked Questions) is at https://www.python.org/doc/FAQ.html, and the Jython-specific FAQ is at https://www.jython.org/cgi-bin/faqw.py?req=index.

Most Python documentation (including this book) assumes some software development knowledge. However, Python is quite suitable for first-time programmers, so there are exceptions to this rule. A few good introductory online texts are:

Josh Cogliati's "Non-Programmers Tutorial For Python," available at https://www.honors.montana.edu/~jjc/easytut/easytut/
Alan Gauld's "Learning to Program," available at https://www.crosswinds.net/~agauld/
Allen Downey and Jeffrey Elkner's "How to Think Like a Computer Scientist (Python Version)," available at https://www.ibiblio.org/obp/thinkCSpy/

The URL https://www.python.org/psa/MailingLists.html


You can install Python, in both classic (CPython) and JVM (Jython) versions, on most platforms. With a suitable development system (C for CPython, Java for Jython), you can install Python from its source code distribution. On popular platforms, you also have the alternative of installing from a prebuilt binary distribution.

Installing CPython from a binary distribution is faster, saves you substantial work on some platforms, and is the only possibility if you have no suitable C development system. Installing from a source code distribution gives you more control and flexibility, and is the only possibility if you can't find a suitable prebuilt binary distribution for your platform. Even if you install from binaries, I recommend you also download the source distribution, which includes examples and demos that may be missing from prebuilt binary packages.

To install Python from source code, you need a platform with an ISO-compliant C compiler and ancillary tools such as make. On Windows, the normal way to build Python is with the Microsoft product Visual C++.

To download Python source code, visit https://www.python.org and follow the link labeled Download. The latest version at the time of this writing is:

https://www.python.org/ftp/python/2.2.2/Python-2.2.2.tgz

The .tgz file extension is equivalent to .tar.gz (i.e., a tar archive of files, compressed by the powerful and popular gzip compressor).
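To make the equivalence concrete, here is a minimal sketch that builds a small .tgz archive and reads it back. It assumes a modern Python interpreter with the standard tarfile module, which is not part of the Python 2.2 release discussed here; the file and directory names are hypothetical, and the archive is created on the fly so the example needs no download.

```python
import os
import tarfile
import tempfile

def make_and_list_tgz():
    """Build a tiny .tgz archive, then list its members."""
    with tempfile.TemporaryDirectory() as workdir:
        # A stand-in for a source file (hypothetical name and content).
        source_path = os.path.join(workdir, "README")
        with open(source_path, "w") as f:
            f.write("demo\n")

        # Mode "w:gz" writes a tar archive compressed with gzip (a .tar.gz).
        archive_path = os.path.join(workdir, "demo-1.0.tgz")
        with tarfile.open(archive_path, "w:gz") as archive:
            archive.add(source_path, arcname="demo-1.0/README")

        # Mode "r:gz" reads it back; the .tgz extension makes no difference.
        with tarfile.open(archive_path, "r:gz") as archive:
            return archive.getnames()

print(make_and_list_tgz())  # prints ['demo-1.0/README']
```

The same "r:gz" mode would apply to a downloaded source archive, since only the compression format matters and not the spelling of the extension.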

On Windows, installing Python from source code can be a chore unless you are already familiar with Microsoft Visual C++ and used to working at the Windows command line (i.e., in the text-oriented windows known as MS-DOS Prompt or Command Prompt, depending on your version of Windows).


If the following instructions give you trouble, I suggest you skip ahead to the material on installing Python from binaries later in this chapter. It may be a good idea, on Windows, to do an installation from binaries anyway, even if you also install from source code. This way, if you notice anything strange while using the version you installed from source code, you can double-check with the installation from binaries. If the strangeness goes away, it must have been due to some quirk in your installation from source code, and then you know you must double-check the latter.

In the following sections, for clarity, I assume you have made a new directory named C:\Py and downloaded Python-2.2.2.tgz there. Of course, you can choose to name and place the directory as it best suits you.

Section 2.1.1.1: Uncompressing and unpacking the Python source code


If your platform is popular and current, you may find a prebuilt and packaged binary version of Python ready for installation. Binary packages are typically self-installing, either directly as executable programs, or via appropriate system tools, such as the RedHat Package Manager (RPM) on Linux and the Microsoft Installer (MSI) on Windows. Once you have downloaded a package, install it by running the program and interactively choosing installation parameters, such as the directory where Python is to be installed.

To download Python binaries, visit https://www.python.org and follow the link labeled Download. At the time of this writing, the only binary installer directly available from the main Python site is a Windows installer executable:

https://www.python.org/ftp/python/2.2.2/Python-2.2.2.exe

Many third parties supply free binary Python installers for other platforms. For Linux distributions, see https://rpmfind.net if your distribution is RPM-based (RedHat, Mandrake, SUSE, and so on) or https://www.debian.org for Debian. The site https://www.python.org/download/ provides links to binary distributions for Macintosh, OS/2, Amiga, RISC OS, QNX, VxWorks, IBM AS/400, Sony PlayStation 2, and Sharp Zaurus. Older Python versions, mainly 1.5.2, are also usable and functional, though not as powerful and polished as the current Python 2.2.2. The download page provides links to 1.5.2 installers for older or less popular platforms (MS-DOS, Windows 3.1, Psion, BeOS, etc.).

ActivePython (https://www.activestate.com/Products/ActivePython) is a binary package of Python 2.2 for 32-bit versions of Windows and x86 Linux.


To install Jython, you need a Java Virtual Machine (JVM) that complies with Java 1.1 or higher. See https://www.jython.org/platform.html for advice on JVMs for your platform.

To download Jython, visit https://www.jython.org and follow the link labeled Download. The latest version at the time of this writing is:

https://prdownloads.sf.net/jython/jython-21.class

In the following section, for clarity, I assume you have created a new directory named C:\Jy and downloaded jython-21.class there. Of course, you can choose to name and place the directory as it best suits you. On Unix-like platforms, in particular, the directory name will more likely be something like ~/Jy.

The Jython installer .class file is a self-installing program. Open an MS-DOS Prompt window (or a shell prompt on a Unix-like platform), change directory to C:\Jy, and run your Java interpreter on the Jython installer. Make sure to include directory C:\Jy in the Java CLASSPATH. With most releases of Sun's Java Development Kit (JDK), for example, you can run:

C:\Jy> java -cp . jython-21

This runs a GUI installer that lets you choose destination directory and options. If you want to avoid the GUI, you can use the -o switch on the command line. The switch lets you specify the installation directory and options directly on the command line. For example:

C:\Jy> java -cp . jython-21 -o C:\Jython-2.1 demo lib source

installs Jython, with all optional components (demos, libraries, and source code), in directory C:\Jython-2.1. The Jython installation builds two small, useful command files. One, run as jython (named jython.bat on Windows), runs the interpreter. The other, run as


To develop software systems in Python, you produce text files that contain Python source code and documentation. You can use any text editor, including those in Integrated Development Environments (IDEs). You then process the source files with the Python compiler and interpreter. You can do this directly, or implicitly inside an IDE, or via another program that embeds Python. The Python interpreter also lets you execute Python code interactively, as do IDEs.

The Python interpreter program is run as python (it's named python.exe on Windows). python includes both the interpreter itself and the Python compiler, which is implicitly invoked, as needed, on imported modules. Depending on your system, the program may have to be in a directory listed in your PATH environment variable. Alternatively, as with any other program, you can give a complete pathname to it at the command (shell) prompt, or in the shell script (or .BAT file, shortcut target, etc.) that runs it. On Windows, you can also use Start → Programs → Python 2.2 → Python (command line).

Besides PATH, other environment variables affect the python program. Some environment variables have the same effects as options passed to python on the command line; these are documented in the next section. A few provide settings not available via command-line options:

PYTHONHOME: The Python installation directory. A lib subdirectory, containing the standard Python library modules, should exist under this directory. On Unix-like systems, the standard library modules should be in subdirectory lib/python2.2 for Python 2.2, lib/python2.3 for Python 2.3, and so on.

PYTHONPATH: A list of directories, separated by colons on Unix-like systems and by semicolons on Windows. Modules are imported from these directories. This extends the initial value for Python's sys.path variable. Modules, importing, and the sys.path variable are covered in Chapter 7.
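The effect of PYTHONPATH on sys.path can be observed with a short sketch like the following. It assumes a modern Python 3 interpreter (the subprocess options used here do not exist in Python 2.2), and the helper name sys_path_with_pythonpath is hypothetical. The helper starts a child interpreter with PYTHONPATH set and returns the sys.path that the child reports.

```python
import os
import subprocess
import sys
import tempfile

def sys_path_with_pythonpath(directories):
    """Return sys.path as seen by a child interpreter run with PYTHONPATH set."""
    env = dict(os.environ)
    # os.pathsep is ':' on Unix-like systems and ';' on Windows.
    env["PYTHONPATH"] = os.pathsep.join(directories)
    output = subprocess.check_output(
        [sys.executable, "-c", "import sys; print('\\n'.join(sys.path))"],
        env=env,
        text=True,
    )
    return output.splitlines()

with tempfile.TemporaryDirectory() as extra_dir:
    child_path = sys_path_with_pythonpath([extra_dir])
    # Report whether the extra directory made it onto the child's sys.path.
    print(extra_dir in child_path)
```

Joining with os.pathsep rather than a literal colon or semicolon keeps the sketch portable across the two separator conventions described above.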


The Python interpreter's built-in interactive mode is the simplest development environment for Python. It is a bit primitive, but it is lightweight, has a small footprint, and starts fast. Together with an appropriate text editor (as discussed later in this chapter) and line-editing and history facilities, it is a usable and popular development environment. However, there are a number of other development environments that you can also use.

Python's Integrated DeveLopment Environment (IDLE) comes with the standard Python distribution. IDLE is a cross-platform, 100% pure Python application based on Tkinter (see Chapter 16). IDLE offers a Python shell, similar to interactive Python interpreter sessions but richer in functionality. It also includes a text editor optimized to edit Python source code, an integrated interactive debugger, and several specialized browsers/viewers.

IDLE is mature, stable, easy to use, and rich in functionality. Promising new Python IDEs that share IDLE's free and cross-platform nature are emerging. Red Hat's Source Navigator (https://sources.redhat.com/sourcenav/) supports many languages. It runs on Linux, Solaris, HPUX, and Windows. Boa Constructor (https://boa-constructor.sf.net/) is Python-only and still beta-level, but well worth trying out. Boa Constructor includes a GUI builder for the wxWindows cross-platform GUI toolkit.

Python is cross-platform, and this book focuses on cross-platform tools and components. However, Python also provides good platform-specific facilities, including IDEs, on many platforms it supports. For the Macintosh, MacPython includes an IDE (see https://www.python.org/doc/current/mac/mac.html


Whatever tools you use to produce your Python application, you can see your application as a set of Python source files. A script is a file that you can run directly. A module is a file that you can import (as covered in Chapter 7) to provide functionality to other files or to interactive sessions. A Python file can be both a module and a script, exposing functionality when imported, but also suitable for being run directly. A useful and widespread convention is that Python files that are primarily meant to be imported as modules, when run directly, should execute self-test operations. Testing is covered in Chapter 17.

The Python interpreter automatically compiles Python source files as needed. Python source files normally have extension .py. Python saves the compiled bytecode file for each module in the same directory as the module's source, with the same basename and extension .pyc (or .pyo if Python is run with option -O). Python does not save the compiled bytecode form of a script when you run the script directly; rather, Python recompiles the script each time you run it. Python saves bytecode files only for modules you import. It automatically rebuilds each module's bytecode file whenever necessary, for example when you edit the module's source. Eventually, for deployment, you may package Python modules using tools covered in Chapter 26.
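Compilation to bytecode normally happens implicitly on import, but it can also be requested explicitly, as in this rough sketch. It assumes a modern Python 3 interpreter and its standard py_compile module; by default such interpreters write bytecode to a __pycache__ subdirectory rather than next to the source, so the sketch passes an explicit target path to mirror the layout described above. The module name is hypothetical.

```python
import os
import py_compile
import tempfile

def compile_to_bytecode(source_text):
    """Write source_text to a temporary module, compile it, report the result."""
    with tempfile.TemporaryDirectory() as workdir:
        source_path = os.path.join(workdir, "example_module.py")  # hypothetical name
        with open(source_path, "w") as f:
            f.write(source_text)

        # Ask for a bytecode file next to the source, with the same basename
        # and extension .pyc; doraise=True turns compile errors into exceptions.
        bytecode_path = os.path.join(workdir, "example_module.pyc")
        py_compile.compile(source_path, cfile=bytecode_path, doraise=True)

        # Return the bytecode file's name and whether it is non-empty.
        return os.path.basename(bytecode_path), os.path.getsize(bytecode_path) > 0

print(compile_to_bytecode("x = 6 * 7\n"))  # prints ('example_module.pyc', True)
```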

You can run Python code interactively, with the Python interpreter or an IDE. Normally, however, you initiate execution by running a top-level script. To run a script, you give its path as an argument to python, as covered earlier in this chapter. Depending on your operating system, you can invoke python directly, from a shell script, or in a command file. On Unix-like systems, you can make a Python script directly executable by setting the file's permission bits x and r and beginning the script with a so-called shebang line, which is a first line of the form:


The jython interpreter built during installation (see Chapter 2) is run similarly to the python program:

[path]jython {options} [ -j jar | -c command | file | - ] {arguments}

-j jar tells jython that the main script to run is __run__.py in the .jar file. Options -i, -S, and -v are the same as for python. --help is like python's -h, and --version is like python's -V. Instead of environment variables, jython uses a text file named registry in the installation directory to record properties with structured names. Property python.path, for example, is the Jython equivalent of Python's environment variable PYTHONPATH. You can also set properties with jython command-line options, in the form -Dname=value.


This chapter is a quick guide to the Python language. To learn Python from scratch, I suggest you start with Learning Python, by Mark Lutz and David Ascher (O'Reilly). If you already know other programming languages and just want to learn the specifics of Python, this chapter is for you. I'm not trying to teach Python here, so we're going to cover a lot of ground at a pretty fast pace.

The lexical structure of a programming language is the set of basic rules that govern how you write programs in that language. It is the lowest-level syntax of the language and specifies such things as what variable names look like and what characters are used for comments. Each Python source file, like any other text file, is a sequence of characters. You can also usefully see it as a sequence of lines, tokens, or statements. These different syntactic views complement and reinforce each other. Python is very particular about program layout, especially with regard to lines and indentation, so you'll want to pay attention to this information if you are coming to Python from another language.

A Python program is composed of a sequence of logical lines, each made up of one or more physical lines. Each physical line may end with a comment. A pound sign (#) that is not inside a string literal begins a comment. All characters after the # and up to the physical line end are part of the comment, and the Python interpreter ignores them. A line containing only whitespace, possibly with a comment, is called a blank line, and is ignored by the interpreter. In an interactive interpreter session, you must enter an empty physical line (without any whitespace or comment) to terminate a multiline statement.
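The comment rules just described can be seen in a few lines. This is only an illustrative sketch; the variable names are hypothetical.

```python
# A line holding only a comment counts as a blank line and is ignored.

greeting = "hello"  # a comment may follow code on the same physical line

# The '#' below sits inside a string literal, so it does not start a comment.
tag = "issue #42"

print(greeting, tag)  # prints: hello issue #42
```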


In Python, the end of a physical line marks the end of most statements. Unlike in other languages, Python statements are not normally terminated with a delimiter, such as a semicolon (;). When a statement is too long to fit on a single physical line, you can join two adjacent physical lines into a logical line by ensuring that the first physical line has no comment and ends with a backslash (\). Python also joins adjacent physical lines into one logical line if an open parenthesis ((), bracket ([), or brace ({) has not yet been closed. Triple-quoted string literals can also span physical lines. Physical lines after the first one in a logical line are known as

The operation of a Python program hinges on the data it handles. All data values in Python are represented by objects, and each object, or value, has a type. An object's type determines what operations the object supports, or, in other words, what operations you can perform on the data value. The type also determines the object's attributes and items (if any) and whether the object can be altered. An object that can be altered is known as a mutable object, while one that cannot be altered is an immutable object. I cover object attributes and items in detail later in this chapter.

The built-in type(obj) accepts any object as its argument and returns the type object that represents the type of obj. Another built-in function, isinstance(obj, type), returns True if object obj is an instance of the type object type; otherwise, it returns False (built-in names True and False were introduced in Python 2.2.1; in older versions, 1 and 0 are used instead).
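
As a quick illustration of these two built-ins, here is a small sketch. The variable names are made up for the example, and it assumes Python 3, where str is the text string type:

```python
n = 42
kind = type(n)                 # the type object that represents n's type
is_int = isinstance(n, int)    # n is an instance of int
is_str = isinstance(n, str)    # n is not an instance of str
```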

Python has built-in objects for fundamental data types such as numbers, strings, tuples, lists, and dictionaries, as covered in the following sections. You can also create user-defined types, known as classes, as discussed in detail in Chapter 5.

The built-in number objects in Python support integers (plain and long), floating-point numbers, and complex numbers. All numbers in Python are immutable objects, meaning that when you perform an operation on a number object, you always produce a new number object. Operations on numbers, called arithmetic operations, are covered later in this chapter.

Integer literals can be decimal, octal, or hexadecimal. A decimal literal is represented by a sequence of digits where the first digit is non-zero. An octal literal is specified with a

A Python program accesses data values through references. A reference is a name that refers to the specific location in memory of a value (object). References take the form of variables, attributes, and items. In Python, a variable or other reference has no intrinsic type. The object to which a reference is bound at a given time does have a type, however. Any given reference may be bound to objects of different types during the execution of a program.

In Python, there are no declarations. The existence of a variable depends on a statement that binds the variable, or, in other words, that sets a name to hold a reference to some object. You can also unbind a variable by resetting the name so it no longer holds a reference. Assignment statements are the most common way to bind variables and other references. The del statement unbinds references.

Binding a reference that was already bound is also known as rebinding it. Whenever binding is mentioned in this book, rebinding is implicitly included except where it is explicitly excluded. Rebinding or unbinding a reference has no effect on the object to which the reference was bound, except that an object disappears when nothing refers to it. The automatic cleanup of objects to which there are no references is known as garbage collection.
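
The following sketch shows binding, rebinding, and unbinding in action. The names x, items, and alias are hypothetical, and the code assumes Python 3, though the same statements behave alike in Python 2:

```python
x = 42                # binds the name x to an int object
x = "forty-two"       # rebinds x; the reference itself has no intrinsic type

items = [1, 2, 3]
alias = items         # a second reference to the same list object
del items             # unbinds the name items; the list survives through alias

try:
    items             # the name no longer exists
except NameError:
    unbound = True
else:
    unbound = False
```

Unbinding items leaves the list untouched because alias still refers to it; only when no references remain does the object become eligible for garbage collection.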

You can name a variable with any identifier except the 29 that are reserved as Python's keywords (see Section 4.1.2.2 earlier in this chapter). A variable can be global or local. A global variable is an attribute of a module object (Chapter 7 covers modules). A local variable lives in a function's local namespace (see Section 4.10 later in this chapter).

An expression is a phrase of code that the Python interpreter can evaluate to produce a value. The simplest expressions are literals and identifiers. You build other expressions by joining subexpressions with the operators and/or delimiters in Table 4-2. This table lists the operators in decreasing order of precedence, so operators with higher precedence are listed before those with lower precedence. Operators listed together have the same precedence. The A column lists the associativity of the operator, which can be L (left-to-right), R (right-to-left), or NA (non-associative).

In Table 4-2, expr, key, f, index, x, and y indicate any expression, while attr and arg indicate identifiers. The notation ,... indicates that commas join zero or more repetitions, except for string conversion, where one or more repetitions are allowed. A trailing comma is also allowed and innocuous in all such cases, except with string conversion, where it's forbidden.

Table 4-2: Operator precedence in expressions
Operator	Description	A
`expr,...`	String conversion	NA
{key:expr,...}	Dictionary creation	NA

Python supplies the usual numeric operations, as you've just seen in Table 4-2. All numbers are immutable objects, so when you perform a numeric operation on a number object, you always produce a new number object. You can access the parts of a complex object z as read-only attributes z.real and z.imag. Trying to rebind these attributes on a complex object raises an exception.

Note that a number's optional + or - sign, and the + that joins a floating-point literal to an imaginary one to make a complex number, are not part of the literals' syntax. They are ordinary operators, subject to normal operator precedence rules (see Table 4-2). This is why, for example, -2**2 evaluates to -4: exponentiation has higher precedence than unary minus, so the whole expression parses as -(2**2), not as (-2)**2.
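
A short sketch of the points in the last two paragraphs, assuming Python 3 (the variable names are illustrative):

```python
a = -2 ** 2          # parsed as -(2 ** 2)
b = (-2) ** 2        # parentheses force the other grouping
z = 3.0 + 4.0j       # the + operator joins a float literal and an imaginary literal

try:
    z.real = 1.0     # the parts of a complex object are read-only
except (AttributeError, TypeError):
    read_only = True
else:
    read_only = False
```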

You can perform arithmetic operations and comparisons between any two numbers. If the operands' types differ, coercion applies: Python converts the operand with the smaller type to the larger type. The types, in order from smallest to largest, are integers, long integers, floating-point numbers, and complex numbers.
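
The widening described above can be observed directly. This sketch assumes Python 3, which has a single int type and no separate long integer type, so only three rungs of the ladder appear:

```python
i = 2 + 3            # int with int stays int
f = 2 + 3.0          # int is converted to float
c = 2.5 + 1j         # float is converted to complex
same = (1 == 1.0)    # comparisons work across numeric types too
```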

You can also perform an explicit conversion by passing a noncomplex numeric argument to any of the built-ins: int, long, float, and complex. int and long drop their argument's fractional part, if any (e.g., int(9.8) is 9). Python does not convert a complex number to any other numeric type this way: int, long, and float raise TypeError when passed a complex argument, so use z.real or abs(z) explicitly instead. You can also call complex with two arguments, giving real and imaginary parts.
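
A few explicit conversions, as a sketch under the assumption of Python 3 (where long no longer exists as a separate built-in):

```python
whole = int(9.8)        # the fractional part is dropped
negative = int(-9.8)    # truncation is toward zero, not toward minus infinity
as_float = float(7)
z = complex(1, 2)       # real part and imaginary part given separately
```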

Each built-in type can also take a string argument with the syntax of an appropriate numeric literal with two small extensions: the argument string may start with a sign and, for complex numbers, may sum or subtract real and imaginary parts. int and long can also be called with two arguments: the first one a string to convert, and the second one the radix, an integer between 2 and 36 to use as the base for the conversion (e.g.,

Python supports a variety of operations that can be applied to sequence types, including strings, lists, and tuples.

Sequences are containers with items accessible by indexing or slicing, as we'll discuss shortly. The built-in len function takes a container as an argument and returns the number of items in the container. The built-in min and max functions take one argument, a non-empty sequence (or other iterable) whose items are comparable, and they return the smallest and largest items in the sequence, respectively. You can also call min and max with multiple arguments, in which case they return the smallest and largest arguments, respectively.
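
For instance, with a hypothetical list of words (a sketch assuming Python 3; strings compare lexicographically):

```python
words = ["pear", "apple", "fig"]
count = len(words)
first = min(words)          # smallest item of the sequence
last = max(words)           # largest item of the sequence
smallest = min(3, 1, 2)     # multiple arguments instead of one sequence
largest = max(3, 1, 2)
```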

Section 4.6.1.1: Coercion and conversions

There is no implicit coercion between different sequence types except that normal strings are coerced to Unicode strings if needed. Conversion to strings is covered in detail in Chapter 9. You can call the built-in tuple and list functions with a single argument (a sequence or other iterable) to get an instance of the type you're calling, with the same items in the same order as in the argument.
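
A brief sketch of explicit conversion with list and tuple, assuming Python 3 (the variable names are illustrative):

```python
letters = list("abc")           # a string is a sequence of characters
pair = tuple([1, 2])            # same items, same order, different type
from_range = list(range(3))     # any iterable works, not only sequences
```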

Section 4.6.1.2: Concatenation

You can concatenate sequences of the same type with the + operator. You can also multiply any sequence S by an integer n with the * operator. The result of S * n or n * S is the concatenation of n copies of S. If n is zero or less than zero, the result is an empty sequence of the same type as S.
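
The rules for + and * can be checked with a few one-liners (a sketch, assuming Python 3):

```python
joined = [1, 2] + [3]         # both operands are lists
repeated = "ab" * 3           # S * n
also_repeated = 3 * (0,)      # n * S works the same way
empty = [1, 2] * -1           # n below zero gives an empty list
```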

Section 4.6.1.3: Sequence membership

Python provides a variety of operations that can be applied to dictionaries. Since dictionaries are containers, the built-in len function can take a dictionary as its single argument and return the number of items (key/value pairs) in the dictionary object.

In Python 2.2 and later, the k in D operator tests whether object k is one of the keys of the dictionary D. It returns True if it is and False if it isn't. Similarly, the k not in D operator is just like not (k in D).
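
A small sketch of key membership tests, assuming Python 3 (the dictionary contents are made up):

```python
d = {'x': 42, 'y': 3.14}
has_x = 'x' in d
has_a = 'a' in d
lacks_a = 'a' not in d
value_found = 42 in d       # in looks at keys only, never at values
```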

The value in a dictionary D that is currently associated with key k is denoted by an indexing: D[k]. Indexing with a key that is not present in the dictionary raises an exception (KeyError). For example:

d = { 'x':42, 'y':3.14, 'z':7 } 
d['x']                           # 42
d['z']                           # 7
d['a']                           # raises exception

Plain assignment to a dictionary indexed with a key that is not yet in the dictionary (e.g., D[newkey] = value) is a valid operation that adds the key and value as a new item in the dictionary. For instance:

d = { 'x':42, 'y':3.14, 'z':7 } 
d['a'] = 16                      # d is now {'x':42,'y':3.14,'z':7,'a':16}

The del statement, in the form del D[k], removes from the dictionary the item whose key is k. If k is not a key in dictionary D, del D[k] raises an exception (KeyError).
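
Continuing with the same sample dictionary, a sketch of both outcomes of del (assuming Python 3; the exception raised for a missing key is KeyError):

```python
d = {'x': 42, 'y': 3.14, 'z': 7}
del d['y']                  # removes the item whose key is 'y'

try:
    del d['a']              # 'a' is not a key in d
except KeyError:
    missing = True
else:
    missing = False
```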

A print statement is denoted by the keyword print followed by zero or more expressions separated by commas. print is a handy, simple way to output values in text form. print outputs each expression x as a string that's just like the result of calling str(x) (covered in Chapter 8). print implicitly outputs a space between expressions, and it also implicitly outputs \n after the last expression, unless the last expression is followed by a trailing comma (,). Here are some examples of print statements:

letter = 'c'
print "give me a", letter, "..."           # prints: give me a c ...
answer = 42
print "the answer is:", answer             # prints: the answer is: 42

The destination of print's output is the file or file-like object that is the value of the stdout attribute of the sys module (covered in Chapter 8). You can control output format more precisely by performing string formatting yourself, with the % operator or other string manipulation techniques, as covered in Chapter 9. You can also use the write or writelines methods of file objects, as covered in Chapter 10. However, print is very simple to use, and simplicity is an important advantage in the common case where all you need are the simple output strategies that print supplies.
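
To see that print really writes to whatever sys.stdout is bound to, you can temporarily rebind it. This is only a sketch: it assumes Python 3, where print is a function rather than a statement and end="" plays roughly the role of the trailing comma; buffer, saved, and captured are illustrative names.

```python
import io
import sys

buffer = io.StringIO()          # a file-like object that collects text in memory
saved = sys.stdout
sys.stdout = buffer             # print now writes here
try:
    print("the answer is:", 42)      # a space between items, \n at the end
    print("no newline", end="")      # suppress the final \n
finally:
    sys.stdout = saved          # always restore the real standard output
captured = buffer.getvalue()
```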

A program's control flow is the order in which the program's code executes. The control flow of a Python program is regulated by conditional statements, loops, and function calls. This section covers the if statement and for and while loops; functions are covered later in this chapter. Raising and handling exceptions also affects control flow; exceptions are covered in Chapter 6.

Often, you need to execute some statements only if some condition holds, or choose statements to execute depending on several mutually exclusive conditions. The Python compound statement if, which uses if, elif, and else clauses, lets you conditionally execute blocks of statements. Here's the syntax for the if statement:

if expression:
    statement(s)
elif expression:
    statement(s)
elif expression:
    statement(s)
...
else:
    statement(s)

The elif and else clauses are optional. Note that unlike some languages, Python does not have a switch statement, so you must use if, elif, and else for all conditional processing.

Here's a typical if statement:

if x < 0: print "x is negative"
elif x % 2: print "x is positive and odd"
else: print "x is even and non-negative"

When there are multiple statements in a clause (i.e., the clause controls a block of statements), the statements are placed on separate logical lines after the line containing the clause's keyword (known as the header line of the clause) and indented rightward from the header line. The block terminates when the indentation returns to that of the clause header (or further left from there). When there is just a single simple statement, as here, it can follow the : on the same logical line as the header, but it can also be placed on a separate logical line, immediately after the header line and indented rightward from it. Many Python practitioners consider the separate-line style more readable:
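
A sketch of the same if statement in the separate-line style. It is adapted for Python 3 and wrapped in a hypothetical function, describe, that returns the message instead of printing it so the result is easy to check:

```python
def describe(x):
    """Return a short description of the integer x."""
    if x < 0:
        return "x is negative"
    elif x % 2:
        return "x is positive and odd"
    else:
        return "x is even and non-negative"
```

Each clause body sits on its own logical line, indented from its header, and the block ends where the indentation returns to the level of the header.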

Most statements in a typical Python program are organized into functions. A function is a group of statements that executes upon request. Python provides many built-in functions and allows programmers to define their own functions. A request to execute a function is known as a function call. When a function is called, it may be passed arguments that specify data upon which the function performs its computation. In Python, a function always returns a result value, either None or a value that represents the results of its computation. Functions defined within class statements are also called methods. Issues specific to methods are covered in Chapter 5; the general coverage of functions in this section, however, also applies to methods.

In Python, functions are objects (values) and are handled like other objects. Thus, you can pass a function as an argument in a call to another function. Similarly, a function can return another function as the result of a call. A function, just like any other object, can be bound to a variable, an item in a container, or an attribute of an object. Functions can also be keys into a dictionary. For example, if you need to quickly find a function's inverse given the function, you could define a dictionary whose keys and values are functions and then make the dictionary bidirectional (using some functions from module math, covered in Chapter 15):

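from math import sin, asin, cos, acos, tan, atan, log, exp
inverse = {sin:asin, cos:acos, tan:atan, log:exp}
# iterate over a snapshot of the keys, since the loop adds new items
for f in list(inverse.keys()): inverse[inverse[f]] = f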

The fact that functions are objects in Python is often expressed by saying that functions are first-class objects.
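
A minimal sketch of what first-class status allows, assuming Python 3. The helper compose and the dictionary table are hypothetical names introduced for this example:

```python
import math

def compose(f, g):
    """Return a new function that computes f(g(x))."""
    def composed(x):
        return f(g(x))
    return composed              # a function returned as a result

round_trip = compose(math.exp, math.log)      # functions passed as arguments
table = {"sqrt": math.sqrt, "abs": abs}       # functions stored as items
result = table["sqrt"](16.0)                  # fetched from the container, then called
```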

The def statement is the most common way to define a function. def is a single-clause compound statement with the following syntax:

def function-name(parameters): 
    statement(s)

Python is an object-oriented programming language. Unlike some other object-oriented languages, Python doesn't force you to use the object-oriented paradigm exclusively. Python also supports procedural programming with modules and functions, so you can select the most suitable programming paradigm for each part of your program. Generally, the object-oriented paradigm is suitable when you want to group state (data) and behavior (code) together in handy packets of functionality. It's also useful when you want to use some of Python's object-oriented mechanisms covered in this chapter, such as inheritance or special methods. The procedural paradigm, based on modules and functions, tends to be simpler and is more suitable when you don't need any of the benefits of object-oriented programming. With Python, you often mix and match the two paradigms.

Python 2.2 and 2.3 are in transition between two slightly different object models. This chapter starts by describing the classic object model, which was the only one available in Python 2.1 and earlier and is still the default model in Python 2.2 and 2.3. The chapter then covers the small differences that define the powerful new-style object model and discusses how to use the new-style object model with Python 2.2 and 2.3. Because the new-style object model builds on the classic one, you'll need to understand the classic model before you can learn about the new model. Finally, the chapter covers special methods for both the classic and new-style object models, as well as metaclasses for Python 2.2 and later.

The new-style object model will become the default in a future version of Python. Even though the classic object model is still the default, I suggest you use the new-style object model when programming with Python 2.2 and later. Its advantages over the classic object model, while small, are measurable, and there are practically no compensating disadvantages. Therefore, it's simpler just to stick to the new-style object model, rather than try to decide which model to use each time you code a new class.

A classic class is a Python object with several characteristics:

You can call a class object as if it were a function. The call creates another object, known as an instance of the class, that knows what class it belongs to.
A class has arbitrarily named attributes that you can bind and reference.
The values of class attributes can be data objects or function objects.
Class attributes bound to functions are known as methods of the class.
A method can have a special Python-defined name with two leading and two trailing underscores. Python invokes such special methods, if they are present, when various kinds of operations take place on class instances.
A class can inherit from other classes, meaning it can delegate to other class objects the lookup of attributes that are not found in the class itself.

An instance of a class is a Python object with arbitrarily named attributes that you can bind and reference. An instance object implicitly delegates to its class the lookup of attributes not found in the instance itself. The class, in turn, may delegate the lookup to the classes from which it inherits, if any.

In Python, classes are objects (values), and are handled like other objects. Thus, you can pass a class as an argument in a call to a function. Similarly, a function can return a class as the result of a call. A class, just like any other object, can be bound to a variable (local or global), an item in a container, or an attribute of an object. Classes can also be keys into a dictionary. The fact that classes are objects in Python is often expressed by saying that classes are first-class objects.
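
The sketch below illustrates attribute lookup delegation and classes used as ordinary values. It assumes Python 3, where every class follows the new-style model, but the behavior shown here is the same for the classic classes the text describes. Base, Derived, make, and registry are hypothetical names.

```python
class Base:
    greeting = "hello"           # a class attribute

class Derived(Base):             # inherits from Base, adds nothing
    pass

def make(cls):
    return cls()                 # calling a class object creates an instance

obj = make(Derived)              # a class passed as an argument
found = obj.greeting             # not in obj, not in Derived, found in Base
obj.greeting = "hi"              # binds an attribute on the instance only
registry = {Base: "base", Derived: "derived"}   # classes as dictionary keys
```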

Most of what I have covered so far in this chapter also holds for the new-style object model introduced in Python 2.2. New-style classes and instances are first-class objects just like classic ones, both can have arbitrary attributes, you call a class to create an instance of the class, and so on. In this section, I'm going to cover the few differences between the new-style and classic object models.

In Python 2.2 and 2.3, a class is new-style if it inherits from built-in type object directly or indirectly (i.e., if it subclasses any built-in type, such as list, dict, file, object, and so on). In Python 2.1 and earlier, a class cannot inherit from a built-in type, and built-in type object does not exist. In Section 5.4 later in this chapter, I cover other ways to make a class new-style, ways that you can use in Python 2.2 or later whether a class has superclasses or not.

As I said at the beginning of this chapter, I suggest you get into the habit of using new-style classes when you program in Python 2.2 or later. The new-style object model has small but measurable advantages, and there are practically no compensating disadvantages. It's simpler just to stick to the new-style object model, rather than try to decide which model to use each time you code a new class.

As of Python 2.2, the built-in object type is the ancestor of all built-in types and new-style classes. The object type defines some special methods (as documented in Section 5.3 later in this chapter) that implement the default semantics of objects:

__new__, __init__: You can create a direct instance of object, and such creation implicitly uses the static method __new__ of type object to create the new instance, and then uses the new instance's

A class may define or inherit special methods (i.e., methods whose names begin and end with double underscores). Each special method relates to a specific operation. Python implicitly invokes a special method whenever you perform the related operation on an instance object. In most cases, the method's return value is the operation's result, and attempting an operation when its related method is not present raises an exception. Throughout this section, I will point out the cases in which these general rules do not apply. In the following, x is the instance of class C on which you perform the operation, and y is the other operand, if any. The formal argument self of each method also refers to instance object x.

Some special methods relate to general-purpose operations. A class that defines or inherits these methods allows its instances to control such operations. These operations can be divided into the following categories:

Initialization and finalization: An instance can control its initialization (a frequent need) via special method __init__, and/or its finalization (a rare need) via __del__.
Representation as string: An instance can control how Python represents it as a string via special methods __repr__, __str__, and __unicode__.
Comparison, hashing, and use in a Boolean context: An instance can control how it compares with other objects (methods __lt__ and __cmp__), how dictionaries use it as a key (__hash__), and whether it evaluates to true or false in Boolean contexts (__nonzero__).
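
A hypothetical class, Money, can exercise one method from each category above. This is a sketch for Python 3, which is an assumption with two visible consequences: __nonzero__ is spelled __bool__ there, and __cmp__ and __unicode__ are not used, so the example relies on __eq__ and __lt__ instead.

```python
class Money:
    def __init__(self, cents):            # initialization
        self.cents = cents

    def __repr__(self):                   # unambiguous representation
        return "Money(%d)" % self.cents

    def __str__(self):                    # readable representation
        return "$%d.%02d" % divmod(self.cents, 100)

    def __eq__(self, other):              # comparison
        return self.cents == other.cents

    def __lt__(self, other):
        return self.cents < other.cents

    def __hash__(self):                   # use as a dictionary key
        return hash(self.cents)

    def __bool__(self):                   # Python 3 name for __nonzero__
        return self.cents != 0
```

With these definitions, str(Money(250)) gives '$2.50', Money(1) < Money(2) is true, and Money(0) counts as false in a Boolean context.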

Any object, even a class object, has a type. In Python, types and classes are also first-class objects. The type of a class object is also known as the class's metaclass. An object's behavior is determined largely by the type of the object. This also holds for classes: a class's behavior is determined largely by the class's metaclass. Metaclasses are an advanced subject, and you may want to skip the rest of this chapter on first reading. However, fully grasping metaclasses can help you obtain a deeper understanding of Python, and sometimes it can even be useful to define your own custom metaclasses.

The distinction between classic and new-style classes relies on the fact that each class's behavior is determined by its metaclass. In other words, the reason classic classes behave differently from new-style classes is that classic and new-style classes are objects of different types (metaclasses):

class Classic: pass
class Newstyle(object): pass
print type(Classic)                  # prints: <type 'class'>
print type(Newstyle)                 # prints: <type 'type'>

The type of Classic is object types.ClassType from standard module types, while the type of Newstyle is built-in object type. type is also the metaclass of all Python built-in types, including itself (i.e., print type(type) also prints <type 'type'>).
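
The new-style half of this can be checked with identity tests rather than printed output. The sketch assumes Python 3, which has no classic classes, so a class written without bases would also report type as its metaclass there:

```python
class Newstyle(object):
    pass

meta = type(Newstyle)               # the metaclass of Newstyle
meta_of_type = type(type)           # type is its own metaclass
meta_of_int = type(int)             # built-in types share the same metaclass
instance_type = type(Newstyle())    # an instance's type is its class
```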

To execute a class statement, Python first collects the base classes into a tuple t (an empty one, if there are no base classes) and executes the class body in a temporary dictionary d. Then, Python determines the metaclass M to use for the new class object C created by the class statement.

When '__metaclass__' is a key in d, M is d['__metaclass__']. Thus, you can explicitly control class C's metaclass by binding the attribute __metaclass__ in C's class body. Otherwise, when t is non-empty (i.e., when C has one or more base classes),

Python uses exceptions to communicate errors and anomalies. An exception is an object that indicates an error or anomalous condition. When Python detects an error, it raises an exception; that is, it signals the occurrence of an anomalous condition by passing an exception object to the exception-propagation mechanism. Your code can also explicitly raise an exception by executing a raise statement.

Handling an exception means receiving the exception object from the propagation mechanism and performing whatever actions are needed to deal with the anomalous situation. If a program does not handle an exception, it terminates with an error traceback message. However, a program can handle exceptions and keep running despite errors or other abnormal conditions.

Python also uses exceptions to indicate some special situations that are not errors, and are not even abnormal occurrences. For example, as covered in Chapter 4, an iterator's next method raises the exception StopIteration when the iterator has no more items. This is not an error, and it is not even an anomalous condition, since most iterators run out of items eventually.
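
A short sketch of StopIteration signaling normal exhaustion. It assumes Python 3, where the built-in next(it) is the usual way to invoke the iterator's next-item method:

```python
it = iter([1, 2])
first = next(it)
second = next(it)

try:
    next(it)                 # no items are left
except StopIteration:
    exhausted = True         # not an error, just the end of the items
else:
    exhausted = False
```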

The try statement provides Python's exception-handling mechanism. It is a compound statement that can take one of two different forms:

A try clause followed by one or more except clauses
A try clause followed by exactly one finally clause

Here's the syntax for the try/except form of the try statement:

try:
    statement(s)
except [expression [, target]]:
    statement(s)
[else:
    statement(s)]

This form of the try statement has one or more except clauses, as well as an optional else clause.

The body of each except clause is known as an exception handler. The code executes if the expression in the except clause matches an exception object that propagates from the try clause. expression is an optional class or tuple of classes that matches any exception object of one of the listed classes or any of their subclasses. The optional target is an identifier that names a variable that Python binds to the exception object just before the exception handler executes. A handler can also obtain the current exception object by calling the exc_info function of module sys (covered in Chapter 8).

Here is an example of the try/except form of the try statement:

try: 1/0
except ZeroDivisionError: print "caught divide-by-0 attempt"
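
A slightly fuller sketch shows the optional target and a tuple of classes. It assumes Python 3, which spells the target clause as "except expression as target" instead of the comma form given in the syntax above; lookup is a hypothetical helper.

```python
import sys

try:
    1 / 0
except ZeroDivisionError as err:             # err is bound to the exception object
    caught = err
    same_object = sys.exc_info()[1] is err   # exc_info reports the same object

def lookup(container, key):
    """Return container[key], or None if the key or index is absent."""
    try:
        return container[key]
    except (KeyError, IndexError):           # a tuple matches any listed class
        return None
```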

If a try statement has several except clauses, the exception propagation mechanism tests the except clauses in order: the first except clause whose expression matches the exception object is used as the handler. Thus, you must always list handlers for specific cases before you list handlers for more general cases. If you list a general case first, the more specific except clauses that follow will never enter the picture.
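
The ordering rule can be sketched with KeyError, which is a subclass of LookupError. The functions classify, bad_key, and bad_index are hypothetical, and the code assumes Python 3:

```python
def classify(action):
    """Call action() and report which handler, if any, dealt with a failure."""
    try:
        action()
    except KeyError:                 # specific case first
        return "missing key"
    except LookupError:              # more general case afterward
        return "other lookup failure"
    return "no failure"

def bad_key():
    {}["a"]                          # raises KeyError

def bad_index():
    [][0]                            # raises IndexError, also a LookupError
```

If the two except clauses were swapped, the LookupError handler would intercept every KeyError and the KeyError clause would never run.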

The last except clause may lack an expression. This clause handles any exception that reaches it during propagation. Such unconditional handling is a rare need, but it does occur, generally in wrapper functions that must perform some extra task before reraising an exception, as we'll discuss later in the chapter.

When an exception is raised, the exception-propagation mechanism takes control. The normal control flow of the program stops, and Python looks for a suitable exception handler. Python's try statement establishes exception handlers via its except clauses. The handlers deal with exceptions raised in the body of the try clause, as well as exceptions that propagate from any of the functions called by that code, directly or indirectly. If an exception is raised within a try clause that has an applicable except handler, the try clause terminates and the handler executes. When the handler finishes, execution continues with the statement after the try statement.

If the statement raising the exception is not within a try clause that has an applicable handler, the function containing the statement terminates, and the exception propagates upward to the statement that called the function. If the call to the terminated function is within a try clause that has an applicable handler, that try clause terminates, and the handler executes. Otherwise, the function containing the call terminates, and the propagation process repeats, unwinding the stack of function calls until an applicable handler is found.

If Python cannot find such a handler, by default the program prints an error message to the standard error stream (the file sys.stderr). The error message includes a traceback that gives details about functions terminated during propagation. You can change Python's default error-reporting behavior by setting sys.excepthook (covered in Chapter 8). After error reporting, Python goes back to the interactive session, if any, or terminates if no interactive session is active. When the exception class is SystemExit, termination is silent and includes the interactive session, if any.

Here are some functions that we can use to see exception propagation at work.

def f( ):
    print "in f, before 1/0"
    1/0                           # raises a ZeroDivisionError exception
    print "in f, after 1/0"
def g( ):
    print "in g, before f( )"
    f( )
    print "in g, after f( )"
def h( ):
    print "in h, before g( )"
    try:
        g( )
        print "in h, after g( )"
    except ZeroDivisionError:
        print "ZD exception caught"
    print "function h ends"
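To trace the propagation described above, here is a minimal sketch of the same three functions in Python 3 syntax (an assumption, since the listing above uses the older print statement). The messages are collected in a list named trace, a name introduced here only so that the order of events can be inspected afterward.

```python
# Python 3 rendering of f, g, and h. Messages go to the list `trace`
# (a hypothetical helper, not part of the original listing).
trace = []

def f():
    trace.append("in f, before 1/0")
    1 / 0                                  # raises ZeroDivisionError
    trace.append("in f, after 1/0")        # never reached

def g():
    trace.append("in g, before f()")
    f()
    trace.append("in g, after f()")        # never reached: g is terminated

def h():
    trace.append("in h, before g()")
    try:
        g()
        trace.append("in h, after g()")    # never reached: try clause terminates
    except ZeroDivisionError:
        trace.append("ZD exception caught")
    trace.append("function h ends")

h()
```

After h() returns, trace holds only the "before" messages, the handler's message, and the final message from h: both f and g were terminated while the exception propagated up to the handler in h.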

You can use the raise statement to raise an exception explicitly. raise is a simple statement with the following syntax:

raise [expression1[, expression2]]

Only an exception handler (or a function that a handler calls, directly or indirectly) can use raise without any expressions. A plain raise statement reraises the same exception object that the handler received. The handler terminates, and the exception propagation mechanism keeps searching for other applicable handlers. Using a raise without expressions is useful when a handler discovers that it is unable to handle an exception it receives, so the exception should keep propagating.
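As a small sketch of a plain raise inside a handler, written in Python 3 syntax (an assumption; the function name parse_int_logged and the list handled_log are hypothetical):

```python
handled_log = []   # hypothetical record of what the inner handler saw

def parse_int_logged(text):
    """Convert text to an int; note any failure, then let it keep propagating."""
    try:
        return int(text)
    except ValueError:
        handled_log.append("could not parse %r" % text)
        raise      # plain raise: reraises the exception this handler received

try:
    parse_int_logged("abc")
except ValueError as err:
    caught = err   # an outer handler still receives the exception
```

The inner handler does its bookkeeping but cannot really recover, so it lets the exception continue to the caller's handler.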

When only expression1 is present, it can be an instance object or a class object. In this case, if expression1 is an instance object, Python raises that instance. When expression1 is a class object, raise instantiates the class without arguments and raises the resulting instance. When both expressions are present, expression1 must be a class object. raise instantiates the class, with expression2 as the argument (or multiple arguments if expression2 is a tuple), and raises the resulting instance.

Here's an example of a typical use of the raise statement:

def crossProduct(seq1, seq2):
    if not seq1 or not seq2:
        raise ValueError, "Sequence arguments must be non-empty"
    return [ (x1, x2) for x1 in seq1 for x2 in seq2 ]

The crossProduct function returns a list of all pairs with one item from each of its sequence arguments, but first it tests both arguments. If either argument is empty, the function raises ValueError, rather than just returning an empty list as the list comprehension would normally do. Note that there is no need for crossProduct to test if seq1 and seq2 are iterable: if either isn't, the list comprehension itself will raise the appropriate exception, presumably a TypeError. Once an exception is raised, be it by Python itself or with an explicit raise statement in your code, it's up to the caller to either handle it (with a suitable

Exceptions are instances of subclasses of the built-in Exception class. For backward compatibility, Python also lets you use strings, or instances of any class, as exception objects, but such usage risks future incompatibility and gives no benefits. An instance of any subclass of Exception has an attribute args, the tuple of arguments used to create the instance. args holds error-specific information, usable for diagnostic or recovery purposes.
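The args attribute can be inspected directly. The following sketch uses Python 3 syntax (an assumption), and the variable names info and empty are illustrative only:

```python
# Raising an instance built with two arguments: args keeps both, in order.
try:
    raise ValueError("bad value", 42)
except ValueError as err:
    info = err.args

# Raising the class itself: Python instantiates it without arguments.
try:
    raise ValueError
except ValueError as err:
    empty = err.args
```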

All exceptions that Python itself raises are instances of subclasses of Exception. The inheritance structure of exception classes is important, as it determines which except clauses handle which exceptions.
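A brief sketch of how the inheritance structure selects a handler, in Python 3 syntax (an assumption). The function classify is hypothetical; LookupError and ArithmeticError are the standard base classes for lookup and arithmetic errors.

```python
def classify(action):
    """Call action() and report which family of exception, if any, it raised."""
    try:
        action()
    except LookupError:        # also handles IndexError and KeyError
        return "lookup problem"
    except ArithmeticError:    # also handles ZeroDivisionError and OverflowError
        return "arithmetic problem"
    return "ok"

results = [
    classify(lambda: [][0]),       # IndexError
    classify(lambda: {}["key"]),   # KeyError
    classify(lambda: 1 / 0),       # ZeroDivisionError
    classify(lambda: None),        # no exception
]
```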

The SystemExit class inherits directly from Exception. Instances of SystemExit are normally raised by the exit function in module sys (covered in Chapter 8).

Other standard exceptions derive from StandardError, a direct subclass of Exception. Three subclasses of StandardError, like StandardError itself and Exception, are never instantiated directly. Their purpose is to make it easier for you to specify except clauses that handle a broad range of related errors. These subclasses are:

ArithmeticError: The base class for exceptions due to arithmetic errors (i.e., OverflowError, ZeroDivisionError, FloatingPointError)
LookupError: The base class for exceptions that a container raises when it receives an invalid key or index (i.e., IndexError, KeyError

You can subclass any of the standard exception classes in order to define your own exception class. Typically, such a subclass adds nothing more than a docstring:

class InvalidAttribute(AttributeError):
    "Used to indicate attributes that could never be valid"

Given the semantics of try/except, raising a custom exception class such as InvalidAttribute is almost the same as raising its standard exception superclass, AttributeError. Any except clause able to handle AttributeError can handle InvalidAttribute just as well. In addition, client code that knows specifically about your InvalidAttribute custom exception class can handle it specifically, without having to handle all other cases of AttributeError if it is not prepared for those. For example:

class SomeFunkyClass(object):
    "much hypothetical functionality snipped"
    def __getattr__(self, name):
        "this __getattr__ only clarifies the kind of attribute error"
        if name.startswith('_'):
            raise InvalidAttribute, "Unknown private attribute "+name
        else:
            raise AttributeError, "Unknown attribute "+name

Now client code can be more selective in its handlers. For example:

import warnings
s = SomeFunkyClass( )
try:
    value = getattr(s, thename)    # thename: a string naming the wanted attribute
except InvalidAttribute, err:
    warnings.warn(str(err))
    value = None
# other cases of AttributeError just propagate, as they're unexpected
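For readers who want to run the selective-handler idea, here is a self-contained sketch in Python 3 syntax (an assumption; the listings above use the older raise and except forms). The helper name lookup is hypothetical.

```python
class InvalidAttribute(AttributeError):
    "Used to indicate attributes that could never be valid"

class SomeFunkyClass(object):
    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        if name.startswith('_'):
            raise InvalidAttribute("Unknown private attribute " + name)
        raise AttributeError("Unknown attribute " + name)

def lookup(obj, name):
    """Return the attribute, or None for the InvalidAttribute case only."""
    try:
        return getattr(obj, name)
    except InvalidAttribute:
        return None

s = SomeFunkyClass()
private_result = lookup(s, "_secret")   # handled selectively

try:
    lookup(s, "public")
    propagated = False
except AttributeError:                   # the plain AttributeError got through
    propagated = True
```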

A special case of custom exception class that you may sometimes find useful is one that wraps another exception and adds further information. To gather information about a pending exception, you can use the exc_info function from module sys (covered in Chapter 8). Given this, your custom exception class could be defined as follows:

import sys
class CustomException(Exception):
    "Wrap arbitrary pending exception, if any, in addition to other info"
    def __init__(self, *args):
        Exception.__init__(self, *args)
        self.wrapped_exc = sys.exc_info( )
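One possible way to use such a wrapper class is sketched below in Python 3 syntax (an assumption). The function call_wrapped is a hypothetical helper; the class is repeated so that the sketch is self-contained.

```python
import sys

class CustomException(Exception):
    "Wrap arbitrary pending exception, if any, in addition to other info"
    def __init__(self, *args):
        Exception.__init__(self, *args)
        self.wrapped_exc = sys.exc_info()   # (class, instance, traceback)

def call_wrapped(callable_, *args):
    """Call callable_(*args); turn any failure into a CustomException."""
    try:
        return callable_(*args)
    except Exception:
        # Instantiated inside the handler, so exc_info() sees the pending error.
        raise CustomException("wrapped failure in", callable_.__name__)

try:
    call_wrapped(int, "not a number")
except CustomException as err:
    outer_args = err.args
    inner_type = err.wrapped_exc[0]
```

Here the caller sees a single exception class, while the original ValueError stays available through wrapped_exc for diagnosis.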

Most programming languages that support exceptions are geared to raise exceptions only in very rare cases. Python's emphasis is different. In Python, exceptions are considered appropriate whenever they make a program simpler and more robust. A common idiom in other languages, sometimes known as "look before you leap" (LBYL), is to check in advance, before attempting an operation, for all circumstances that might make the operation invalid. This is not ideal, for several reasons:

The checks may diminish the readability and clarity of the common, mainstream cases where everything is okay.
The work needed for checking may duplicate a substantial part of the work done in the operation itself.
The programmer might easily err by omitting some needed check.
The situation might change between the moment the checks are performed and the moment the operation is attempted.

The preferred idiom in Python is generally to attempt the operation in a try clause and handle the exceptions that may result in except clauses. This idiom is known as "it's easier to ask forgiveness than permission" (EAFP), a motto widely credited to Admiral Grace Murray Hopper, co-inventor of COBOL, and shares none of the defects of "look before you leap." Here is a function written using the LBYL idiom:

def safe_divide_1(x, y):
    if y == 0:
        print "Divide-by-0 attempt detected"
        return None
    else:
        return x/y

With LBYL, the checks come first, and the mainstream case is somewhat hidden at the end of the function.

Here is the equivalent function written using the EAFP idiom:

def safe_divide_2(x, y):
    try:
        return x/y
    except ZeroDivisionError:  
        print "Divide-by-0 attempt detected"
        return None
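The same contrast applies outside arithmetic. As a sketch in Python 3 syntax (an assumption), here are two hypothetical functions that return the first word of a string, or None when there is none:

```python
def first_word_lbyl(text):
    """Look before you leap: check first, then operate."""
    words = text.split()
    if len(words) == 0:        # the check comes first
        return None
    return words[0]

def first_word_eafp(text):
    """Ask forgiveness: attempt the operation, handle the failure."""
    try:
        return text.split()[0]     # the mainstream case comes first
    except IndexError:
        return None
```

Both functions give the same results; the EAFP version simply puts the common case up front.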

A typical Python program is made up of several source files. Each source file corresponds to a module, which packages program code and data for reuse. Modules are normally independent of each other so that other programs can reuse the specific modules they need. A module explicitly establishes dependencies upon another module by using import or from statements. In some other programming languages, global variables can provide a hidden conduit for coupling between modules. In Python, however, global variables are not global to all modules, but instead such variables are attributes of a single module object. Thus, Python modules communicate in explicit and maintainable ways.

Python also supports extensions, which are components written in other languages, such as C, C++, or Java, for use with Python. Extensions are seen as modules by the Python code that uses them (called client code). From the client code viewpoint, it does not matter whether a module is 100% pure Python or an extension. You can always start by coding a module in Python. Later, if you need better performance, you can recode some modules in a lower-level language without changing the client code that uses the modules. Chapter 24 and Chapter 25 discuss writing extensions in C and Java.

This chapter discusses module creation and loading. It also covers grouping modules into packages, which are modules that contain other modules, forming a hierarchical, tree-like structure. Finally, the chapter discusses using Python's distribution utilities (distutils) to prepare packages and modules for distribution and to install distributed packages and modules.


A module is a Python object with arbitrarily named attributes that you can bind and reference. The Python code for a module named aname normally resides in a file named aname.py, as covered in Section 7.2 later in this chapter.

In Python, modules are objects (values) and are handled like other objects. Thus, you can pass a module as an argument in a call to a function. Similarly, a function can return a module as the result of a call. A module, just like any other object, can be bound to a variable, an item in a container, or an attribute of an object. For example, the sys.modules dictionary, covered later in this chapter, holds module objects as its values.
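A short sketch of treating a module as an ordinary object, in Python 3 syntax (an assumption). The function describe and the dictionary registry are hypothetical names.

```python
import math
import sys

def describe(module):
    """Accept a module as an argument, like any other object."""
    return module.__name__, hasattr(module, "sqrt")

registry = {"m": math}             # a module stored as an item in a container
summary = describe(registry["m"])

# sys.modules holds the very same module object as its value.
same_object = sys.modules["math"] is math
```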

You can use any Python source file as a module by executing an import statement in some other code. import has the following syntax:

import modname [as varname][,...]

The import keyword is followed by one or more module specifiers, separated by commas. In the simplest and most common case, modname is an identifier, the name of a variable that Python binds to the module object when the import statement finishes. In this case, Python looks for the module of the same name to satisfy the import request. For example:

import MyModule

looks for the module named MyModule and binds the variable named MyModule in the current scope to the module object. modname can also be a sequence of identifiers separated by dots (.) that names a module in a package, as covered later in this chapter.

When as varname is part of an import statement, Python binds the variable named varname to the module object, but the module name that Python looks for is modname. For example:

import MyModule as Alias

looks for the module named MyModule and binds the variable named Alias in the current scope to the module object. varname is always a simple identifier.
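Since MyModule is only a placeholder, the following sketch uses the standard json module instead (an assumption made so that the code can run), in Python 3 syntax:

```python
import sys
import json as Alias      # looks for module json, binds the variable Alias

module_name = Alias.__name__                 # the module keeps its own name
same_object = Alias is sys.modules["json"]   # Alias refers to that module object
```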

Section 7.1.1.1: Module body

Module-loading operations rely on attributes of the built-in sys module (covered in Chapter 8). The module-loading process described here is carried out by built-in function __import__. Your code can call __import__ directly, with the module name string as an argument. __import__ returns the module object or raises ImportError if the import fails.

To import a module named M, __import__ first checks dictionary sys.modules, using string M as the key. When key M is in the dictionary, __import__ returns the corresponding value as the requested module object. Otherwise, __import__ binds sys.modules[M] to a new empty module object with a __name__ of M, then looks for the right way to initialize (load) the module, as covered in Section 7.2.2 later in this section.

Thanks to this mechanism, the loading operation takes place only the first time a module is imported in a given run of the program. When a module is imported again, the module is not reloaded, since __import__ finds and returns the module's entry in sys.modules. Thus, all imports of a module after the first one are extremely fast because they're just dictionary lookups.
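The caching behavior can be observed with a few lines, sketched here in Python 3 syntax (an assumption). The module textwrap is an arbitrary standard-library choice, and the nonexistent module name is made up for the sketch.

```python
import sys

first = __import__("textwrap")     # loads the module if not yet loaded
second = __import__("textwrap")    # found in sys.modules, not reloaded

cached = (first is second) and (first is sys.modules["textwrap"])

try:
    __import__("no_such_module_for_this_sketch")
    import_failed = False
except ImportError:                # raised when the import fails
    import_failed = True
```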

When a module is loaded, __import__ first checks whether the module is built-in. Built-in modules are listed in tuple sys.builtin_module_names, but rebinding that tuple does not affect module loading. A built-in module, like any other Python extension, is initialized by calling the module's initialization function. The search for built-in modules also finds frozen modules and modules in platform-specific locations (e.g., resources on the Mac, the Registry in Windows).

If module M is not built-in or frozen, __import__ looks for M's code as a file on the filesystem.

A package is a module that contains other modules. Modules in a package may be subpackages, resulting in a hierarchical tree-like structure. A package named P resides in a subdirectory, also called P, of some directory in sys.path. The module body of P is in the file P/__init__.py. You must have a file named P/__init__.py, even if it's empty (representing an empty module body), in order to indicate to Python that directory P is indeed a package. Other .py files in directory P are the modules of package P. Subdirectories of P containing __init__.py files are subpackages of P. Nesting can continue to any depth.

You can import a module named M in package P as P.M. More dots let you navigate a hierarchical package structure. A package is always loaded before a module in the package is loaded. If you use the syntax import P.M, variable P is bound to the module object of package P, and attribute M of object P is bound to module P.M. If you use the syntax import P.M as V, variable V is bound directly to module P.M.
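To watch these bindings happen, the sketch below builds a throwaway package in a temporary directory and imports from it. It uses Python 3 syntax (an assumption), and the names pkg_sketch, m, and answer are all hypothetical.

```python
import importlib
import os
import sys
import tempfile

# Create <tmp>/pkg_sketch/__init__.py and <tmp>/pkg_sketch/m.py on disk.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "pkg_sketch")
os.mkdir(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w") as fh:
    fh.write("")                    # an empty module body marks the package
with open(os.path.join(pkg_dir, "m.py"), "w") as fh:
    fh.write("answer = 42\n")

sys.path.insert(0, root)            # make the parent directory importable
importlib.invalidate_caches()

import pkg_sketch.m                 # binds the variable pkg_sketch
import pkg_sketch.m as V            # binds V directly to the submodule

package_loaded = "pkg_sketch" in sys.modules
same_module = pkg_sketch.m is V     # attribute m of the package is the module
value = V.answer
```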

Using from P import M to import a specific module M from package P is fully acceptable programming practice. In other words, the from statement is specifically okay in this case.

A module M in a package P can import any other module X of P with the statement import X. Python searches the module's own package directory before searching the directories in sys.path. However, this applies only to sibling modules, not to ancestors or other more-complicated relationships. The simplest, cleanest way to share objects (such as functions or constants) among modules in a package P is to group the shared objects in a file named P/Common.py. Then you can import Common from every module in the package that needs to access the objects, and then refer to the objects as Common .f, Common

Python modules, extensions, and applications can be packaged and distributed in several forms:

Compressed archive files: Generally .zip for Windows and .tar.gz or .tgz for Unix-based systems, but both forms are portable
Self-unpacking or self-installing executables: Normally .exe for Windows
Platform-specific installers: For example, .msi on Windows, .rpm and .srpm on Linux, and .deb on Debian GNU/Linux

When you distribute a package as a self-installing executable or platform-specific installer, a user can then install the package simply by running the installer. How to run such an installer program depends on the platform, but it no longer matters what language the program was written in.

When you distribute a package as an archive file or as an executable that unpacks but does not install itself, it does matter that the package was coded in Python. In this case, the user must first unpack the archive file into some appropriate directory, say C:\Temp\MyPack on a Windows machine or ~/MyPack on a Unix-like machine. Among the extracted files there should be a script, conventionally named setup.py, that uses the Python facility known as the distribution utilities (package distutils). The distributed package is then almost as easy to install as a self-installing executable would be. The user opens a command-prompt window and changes to the directory into which the archive is unpacked. Then the user runs, for example:

C:\Temp\MyPack> python setup.py install

The setup.py script, run with this

The term built-in has more than one meaning in Python. In most contexts, a built-in is any object directly accessible to a Python program without an import statement. Chapter 7 showed the mechanism that Python uses to allow this direct access. Built-in types in Python include numbers, sequences, dictionaries, functions (covered in Chapter 4), classes (covered in Chapter 5), the standard exception classes (covered in Chapter 6), and modules (covered in Chapter 7). The built-in file object is covered in Chapter 10, and other built-in types covered in Chapter 13 are intrinsic to Python's internal operation. This chapter provides additional coverage of the core built-in types, and it also covers the built-in functions available in module __builtin__.

As I mentioned in Chapter 7, some modules are called built-in because they are an integral part of the Python standard library, even though it takes an import statement to access them. Built-in modules are distinct from separate, optional add-on modules, also called Python extensions. This chapter documents the following core built-in modules: sys, getopt, copy, bisect, UserList, UserDict, and UserString. Chapter 9 covers some string-related core built-in modules, while Parts III and IV of the book cover many other useful built-in modules.

This section documents Python's core built-in types, like int, float, and dict. Note that prior to Python 2.2, these names referred to factory functions for creating objects of these types. As of Python 2.2, however, they refer to actual type objects. Since you can call type objects just as if they were functions, this change does not break existing programs.

classmethod

Python 2.2 and later

classmethod(function)

Creates and returns a class method object. In practice, you call this built-in type only within a class body. See Section 5.2.2.2.
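A minimal sketch of calling classmethod within a class body, using the explicit rebinding style rather than any newer syntax. The classes Shape and Square are hypothetical, and the code is written so that it also runs under Python 3 (an assumption).

```python
class Shape(object):
    def describe(cls):
        # cls is the class the method was called on, not an instance.
        return "I am " + cls.__name__
    describe = classmethod(describe)   # rebind the name to a class method object

class Square(Shape):
    pass

from_base = Shape.describe()
from_subclass = Square.describe()
from_instance = Square().describe()    # calling on an instance also passes the class
```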

complex

complex(real,imag=0)

Converts any number, or a suitable string, to a complex number. imag may be present only when real is a number, and is the imaginary part of the resulting complex number.
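A few calls illustrate these rules. The sketch is in Python 3 syntax (an assumption), and the variable names are illustrative.

```python
from_real = complex(3)             # imag defaults to 0
from_pair = complex(1.5, 2)        # real and imaginary parts given as numbers
from_text = complex("1+2j")        # a suitable string, with no imag argument

try:
    complex("1+2j", 3)             # imag is not allowed when real is a string
    string_with_imag_ok = True
except TypeError:
    string_with_imag_ok = False
```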

dict

Python 2.2 and later

dict(x={ })

Returns a new dictionary object with the same items as argument x. When x is a dictionary, dict( x ) returns a copy of x, like x .copy( ) does. Alternatively, x can be a sequence of pairs, that is, a sequence whose items are sequences with two items each. In this case, dict( x ) returns a dictionary whose keys are the first items of each pair in x, while the corresponding values are the corresponding second items. In other words, when

This section documents the Python functions available in module __builtin__ in alphabetical order. Note that the names of these built-ins are not reserved words. Thus, your program can bind for its own purposes, in local or global scope, an identifier that has the same name as a built-in function. Names bound in local or global scope have priority over names bound in built-in scope, so local and global names hide built-in ones. You can also rebind names in built-in scope, as covered in Chapter 7. You should avoid hiding built-ins that your code might need.
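The following sketch, in Python 3 syntax (an assumption), shows a local name hiding a built-in. Both function names are hypothetical.

```python
def shadowing_demo(values):
    len = lambda seq: "shadowed"   # local binding hides the built-in len
    return len(values)

def normal_demo(values):
    return len(values)             # no local or global len: built-in is found

hidden = shadowing_demo([1, 2, 3])
visible = normal_demo([1, 2, 3])
```

The hiding is confined to the scope that binds the name, which is why the second function still reaches the built-in.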

__import__

__import__(module_name[,globals[,locals[,fromlist]]])

Loads the module named by string module_name and returns the resulting module object. globals, which defaults to the result of globals( ), and locals, which defaults to the result of locals( ) (both covered in this section), are dictionaries that __import__ treats as read-only and uses only to get context for package-relative imports, covered in Section 7.3. fromlist defaults to an empty list, but can be a list of strings that name the module attributes to be imported in a from statement. See Section 7.2 for more details on module loading.

In practice, when you call __import__, you generally pass only the first argument, except in the rare and dubious case in which you use __import__ for a package-relative import. When you replace the built-in __import__ function with your own in order to provide special import functionality, you may have to take globals, locals, and fromlist into account.

abs

abs(x)

Returns the absolute value of number

The attributes of the sys module are bound to data and functions that provide information on the state of the Python interpreter or that affect the interpreter directly. This section documents the most frequently used attributes of sys, in alphabetical order.

argv

The list of command-line arguments passed to the main script. argv[0] is the name or full path of the main script, or '-c' if the -c option was used. See Section 8.4 later in this chapter for a good way to use sys.argv.

displayhook

displayhook(value)

In interactive sessions, the Python interpreter calls displayhook, passing it the result of each expression-statement entered. The default displayhook does nothing if value is None, otherwise it preserves and displays value:

if value is not None:
    __builtin__._ = value
    print repr(value)

You can rebind sys.displayhook in order to change interactive behavior. The original value is available as sys.__displayhook__.
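As a sketch of rebinding the hook, the code below installs a replacement that records values instead of printing them, then calls it directly to imitate what an interactive session would do. It is written in Python 3 syntax (an assumption); quiet_hook and shown are hypothetical names.

```python
import sys

shown = []                          # hypothetical record of displayed values

def quiet_hook(value):
    """Replacement displayhook: remember the repr instead of printing it."""
    if value is not None:
        shown.append(repr(value))

original = sys.displayhook
sys.displayhook = quiet_hook
try:
    sys.displayhook(6 * 7)          # what the interactive loop does per result
    sys.displayhook(None)           # None is ignored, as in the default hook
finally:
    sys.displayhook = original      # always restore the previous hook
```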

excepthook

excepthook(type,value,traceback)

When an exception is not caught by any handler, Python calls excepthook, passing it the exception class, exception object, and traceback object, as covered in Chapter 6. The default excepthook displays the error and traceback. You can rebind sys.excepthook to change what is displayed for uncaught exceptions (just before Python returns to the interactive loop or terminates). The original value is also available as

The getopt module helps parse the command-line options and arguments passed to a Python program, available in sys.argv. The getopt module distinguishes arguments proper from options: options start with '-' (or '--' for long-form options). The first non-option argument terminates option parsing (similar to most Unix commands, and differently from GNU and Windows commands). Module getopt supplies a single function, also called getopt.

getopt

getopt(args,options,long_options=[ ])

Parses command-line options. args is usually sys.argv[1:]. options is a string: each character is an option letter, followed by ':' if the option takes a parameter. long_options is a list of strings, each a long-option name, without the leading '--', followed by '=' if the option takes a parameter.

When getopt encounters an error, it raises GetoptError, an exception class supplied by the getopt module. Otherwise, getopt returns a pair ( opts,args_proper ), where opts is a list of pairs of the form ( option,parameter ) in the same order in which options are found in args. Each option is a string that starts with a single hyphen for a short-form option or two hyphens for a long-form one; each parameter is also a string (an empty string for options that don't take parameters). args_proper is the list of program argument strings that are left after removing the options.
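A short sketch of a getopt call, in Python 3 syntax (an assumption). The argument list argv_tail stands in for sys.argv[1:], and the option names are made up for illustration.

```python
import getopt

# -v takes no parameter, -o takes one, --mode takes one.
argv_tail = ["-v", "-o", "out.txt", "--mode=fast", "input1", "-x", "input2"]
opts, args_proper = getopt.getopt(argv_tail, "vo:", ["mode="])

try:
    getopt.getopt(["-z"], "vo:")    # -z is not a declared option
    unknown_option_ok = True
except getopt.GetoptError:
    unknown_option_ok = False
```

Note that "-x" ends up in args_proper: option parsing stops at the first non-option argument, "input1".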

As discussed in Chapter 4, assignment in Python does not copy the right-hand side object being assigned. Rather, assignment adds a reference to the right-hand side object. When you want a copy of object x, you can ask x for a copy of itself. If x is a list, x [:] is a copy of x. If x is a dictionary, x .copy( ) returns a copy of x.

The copy module supplies a copy function that creates and returns a copy of most types of objects. Normal copies, such as x [:] for a list x and copy.copy( x ), are also known as shallow copies. When x has references to other objects (e.g., items or attributes), a normal copy of x has distinct references to the same objects. Sometimes, however, you need a deep copy, where referenced objects are copied recursively. Module copy supplies a deepcopy( x ) function that performs a deep copy and returns it as the function's result.
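The difference between the two kinds of copy shows up as soon as a referenced object is modified. A minimal sketch in Python 3 syntax (an assumption), with illustrative variable names:

```python
import copy

x = [[1, 2], [3, 4]]
shallow = copy.copy(x)        # new outer list, same inner lists
deep = copy.deepcopy(x)       # new outer list and new inner lists

x[0].append(99)               # mutate an object that x refers to

shallow_sees_change = shallow[0] == [1, 2, 99]
deep_sees_change = deep[0] == [1, 2, 99]
```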

copy

copy(x)

Creates and returns a copy of x for x of most types (copies of modules, classes, frames, arrays, and internal types are not supported). If x is immutable, copy.copy( x ) may return x itself as an optimization. A class can customize the way copy.copy copies its instances by having a special method __copy__(self) that returns a new object, a copy of self.
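A sketch of such customization, in Python 3 syntax (an assumption). The class Tagged and its attributes are hypothetical.

```python
import copy

class Tagged(object):
    def __init__(self, items):
        self.items = items
        self.copies_made = 0

    def __copy__(self):
        # Called by copy.copy: count the request and copy the item list too.
        self.copies_made += 1
        return Tagged(list(self.items))

t = Tagged([1, 2])
c = copy.copy(t)
```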

deepcopy

deepcopy(x,[memo])

Makes a deep copy of x and returns it. Deep copying implies a recursive walk over a directed graph of references. A precaution is needed to preserve the graph's shape: when references to the same object are met more than once during the walk, distinct copies must not be made. Rather, references to the same copied object must be used. Consider the following simple example:
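One way to see the shape-preserving behavior is the following sketch in Python 3 syntax (an assumption); the names shared, x, y, and loop are illustrative.

```python
import copy

shared = [1, 2]
x = [shared, shared]          # two references to the same list
y = copy.deepcopy(x)

shape_preserved = y[0] is y[1]        # still one object referenced twice
really_copied = y[0] is not shared    # but it is a copy, not the original

loop = []
loop.append(loop)             # a list that refers to itself
loop_copy = copy.deepcopy(loop)
cycle_preserved = loop_copy[0] is loop_copy
```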

The bisect module uses a bisection algorithm to keep a list in sorted order as items are inserted. bisect's operation is faster than calling a list's sort method after each insertion. This section documents the main functions supplied by bisect.

bisect

bisect(seq,item,lo=0,hi=sys.maxint)

Returns the index i into seq where item should be inserted to keep seq sorted. In other words, i is such that each item in seq[:i] is less than or equal to item, and each item in seq[i:] is greater than or equal to item. seq must be a sorted sequence. For any nonempty sorted sequence seq, seq[bisect(seq,y)-1]==y is equivalent to y in seq, but faster if len(seq) is large. You may pass optional arguments lo and hi to operate on the slice seq[lo:hi].

insort

insort(seq,item,lo=0,hi=sys.maxint)

Like seq .insert(bisect( seq,item ),item ). In other words, seq must be a sorted mutable sequence, and insort modifies seq by inserting item at the right spot, so that seq remains sorted. You may pass optional arguments lo and hi to operate on the slice seq [ lo:hi ].
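A brief sketch of both functions in Python 3 syntax (an assumption), using a hypothetical list named scores:

```python
import bisect

scores = [10, 20, 20, 30]

i = bisect.bisect(scores, 20)     # insertion point after the existing 20s
bisect.insort(scores, 25)         # insert 25 so that scores stays sorted

# The membership idiom from the description of bisect:
found = scores[bisect.bisect(scores, 25) - 1] == 25
missing = scores[bisect.bisect(scores, 15) - 1] == 15
```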

Module bisect also supplies functions bisect_left

The UserList, UserDict, and UserString modules each supply one class, with the same name as the respective module, that implements all the methods needed for the class's instances to be mutable sequences, mappings, and strings, respectively. When you need such polymorphism, you can subclass one of these classes and override some methods rather than have to implement everything yourself. In Python 2.2 and later, you can subclass built-in types list, dict, and str directly, to similar effect (see Section 5.2). However, these modules can still be handy if you need to create a classic class in order to keep your code compatible with Python 2.1 or earlier.

Each instance of one of these classes has an attribute called data that is a Python object of the corresponding built-in type (list, dict, and str, respectively). You can instantiate each class with an argument of the appropriate type (the argument is copied, so you can later modify it without side effects). UserList and UserDict can also be instantiated without arguments to create initially empty containers.
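The sketch below shows the data attribute and the copying of the initializer argument. It assumes Python 3, where the UserList class is imported from the collections module rather than from a module named UserList; the subclass Doubling is hypothetical.

```python
from collections import UserList   # Python 3 location (assumption noted above)

source = [1, 2, 3]
u = UserList(source)     # the argument is copied into u.data
source.append(4)         # so later changes to source do not affect u

class Doubling(UserList):
    def append(self, item):
        # Override one method; the rest of the list behavior is inherited.
        self.data.append(item * 2)

d = Doubling()           # instantiated without arguments: initially empty
d.append(5)
```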

Module UserString also supplies class MutableString, which is very similar to class UserString except that instances of MutableString are mutable. Instances of MutableString and its subclasses cannot be keys into a dictionary. Instances of both UserString and MutableString can be Unicode strings rather than plain strings: just use a Unicode string as the initializer argument at instantiation time.

If you subclass UserList, UserDict, UserString, or MutableString and then override __init__, make sure the __init__ method you write can also be called with one argument of the appropriate type (as well as without arguments for UserList and UserDict). Also be sure that your __init__ method explicitly and appropriately calls the


Python supports plain and Unicode strings extensively, with statements, operators, built-in functions, methods, and dedicated modules. This chapter covers the methods of string objects, talks about string formatting, documents the string, pprint, and repr modules, and discusses issues related to Unicode strings.

Regular expressions let you specify pattern strings and allow searches and substitutions. Regular expressions are not easy to master, but they are a powerful tool for processing text. Python offers rich regular expression functionality through the built-in re module, as documented in this chapter.


Plain and Unicode strings are immutable sequences, as covered in Chapter 4. All immutable-sequence operations (repetition, concatenation, indexing, slicing) apply to strings. A string object s also supplies several non-mutating methods, as documented in this section. Unless otherwise noted, each method returns a plain string when s is a plain string, or a Unicode string when s is a Unicode string. Terms such as letters, whitespace, and so on refer to the corresponding attributes of the string module, covered later in this chapter. See also Section 9.2.1 later in this chapter.

capitalize

s.capitalize( )

Returns a copy of s where the first character, if a letter, is uppercase, and all other letters, if any, are lowercase.

center

s.center(n)

Returns a string of length max(len(s), n), with a copy of s in the central part, surrounded by equal numbers of spaces on both sides (e.g., 'ciao'.center(2) is 'ciao', 'ciao'.center(7) is '  ciao ').
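The two methods documented so far can be tried out with a short sketch such as the following. The sample strings are arbitrary, and the point to note is that each method returns a new string and leaves s untouched.

```python
s = 'hELLO wORLD'

capitalized = s.capitalize()     # first letter uppercase, all others lowercase
unchanged = 'ciao'.center(2)     # n is smaller than len(s): the result equals s
padded = 'ciao'.center(8)        # two spaces of padding on each side

# s itself is never modified, since strings are immutable.
still_the_same = s
```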

count

s.count(sub,start=0,end=sys.maxint)

Returns the number of occurrences of substring sub in s [ start:end


The string module supplies functions that duplicate each method of string objects, as covered in the previous section. Each function takes the string object as its first argument. Module string also has several useful string-valued attributes:

ascii_letters: The string ascii_lowercase+ascii_uppercase
ascii_lowercase: The string 'abcdefghijklmnopqrstuvwxyz'
ascii_uppercase: The string 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
digits: The string '0123456789'
hexdigits: The string '0123456789abcdefABCDEF'
letters: The string lowercase+uppercase
lowercase: A string containing all characters that are deemed lowercase letters: at least 'abcdefghijklmnopqrstuvwxyz', but more letters (e.g., accented ones) may be present, depending on the active locale
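A small sketch of how these attributes are typically used for membership tests follows. The helper names is_hex_number and count_ascii_letters are hypothetical. The sketch deliberately relies only on the ascii_* attributes, digits, and hexdigits, since the locale-dependent letters and lowercase attributes are not available in Python 3.

```python
import string


def is_hex_number(text):
    """Return True when text is non-empty and holds only hexadecimal digits."""
    return len(text) > 0 and all(ch in string.hexdigits for ch in text)


def count_ascii_letters(text):
    """Count the characters of text that are ASCII letters."""
    return sum(1 for ch in text if ch in string.ascii_letters)


looks_hex = is_hex_number('ff0A')
letters_found = count_ascii_letters('a1b2-C')
```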


In Python, a string-formatting expression has the syntax:

               format % values

where format is a plain or Unicode string containing format specifiers and values is any single object or a collection of objects in a tuple or dictionary. Python's string-formatting operator has roughly the same set of features as the C language's printf and operates in a similar way. Each format specifier is a substring of format that starts with a percent sign (%) and ends with one of the conversion characters shown in Table 9-1.

Table 9-1: String-formatting conversion characters
Character	Output format	Notes
d, i	Signed decimal integer	Value must be number
u	Unsigned decimal integer	Value must be number
o	Unsigned octal integer	Value must be number
x	Unsigned hexadecimal integer (lowercase letters)	Value must be number
X	Unsigned hexadecimal integer (uppercase letters)
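The three forms that values can take (a single object, a tuple, or a dictionary) can be illustrated with a brief sketch. It uses only conversion characters from the rows of Table 9-1 shown above, and the sample numbers and dictionary keys are arbitrary.

```python
# A single object as the right-hand operand.
single = 'answer: %d' % 42

# A tuple supplies one value per format specifier, in order.
from_tuple = '%d is %o in octal, %x or %X in hex' % (255, 255, 255, 255)

# A dictionary supplies values by key, named in parentheses after the percent sign.
from_dict = '%(width)d by %(height)d' % {'width': 640, 'height': 480}
```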


The pprint module pretty-prints complicated data structures, with formatting that may be more readable than that supplied by built-in function repr (see Chapter 8). To fine-tune the formatting, you can instantiate the PrettyPrinter class supplied by module pprint and apply detailed control, helped by auxiliary functions also supplied by module pprint. Most of the time, however, one of the two main functions exposed by module pprint suffices.

pformat

pformat(obj)

Returns a string representing the pretty-printing of obj.

pprint

pprint(obj,stream=sys.stdout)

Outputs the pretty-printing of obj to file object stream, with a terminating newline.

The following statements are the same:

print pprint.pformat(x)
pprint.pprint(x)

Either of these constructs will be roughly the same as print x in many cases, such as when the string representation of x fits within one line. However, with something like x = range(30), print x displays x in two lines, breaking at an arbitrary point, while using module pprint displays x over 30 lines, one line per item. You can use module pprint when you prefer the module's specific display effects to the ones of normal string representation.
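The behavior just described can be checked with a sketch like the one below. It is written for a current Python 3 interpreter, which is an assumption: range is wrapped in list to get an actual list, and io.StringIO stands in for the output stream so that the result of pprint.pprint can be compared with that of pprint.pformat.

```python
import io
import pprint

x = list(range(30))

# pformat returns the pretty-printed text as a string.
text = pprint.pformat(x)
line_count = len(text.splitlines())

# pprint writes the same text, plus a terminating newline, to a stream.
buf = io.StringIO()
pprint.pprint(x, stream=buf)

# A value whose representation fits on one line is left on one line.
short = pprint.pformat([1, 2, 3])
```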


The repr module supplies an alternative to the built-in function repr (see Chapter 8), with limits on length for the representation string. To fine-tune the length limits, you can instantiate or subclass the Repr class supplied by module repr and apply detailed control. Most of the time, however, the main function exposed by module repr suffices.

repr

repr(obj)

Returns a string representing obj, with sensible limits on length.
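A quick comparison with the built-in repr shows the effect of the length limits. The import fallback is an assumption about the interpreter: the module is named repr in the Python 2 covered here, and reprlib in current Python 3.

```python
try:
    import reprlib            # assumption: Python 3 name of the module
except ImportError:
    import repr as reprlib    # Python 2 name, as used in the text

x = list(range(1000))

full = repr(x)             # built-in repr: every item is shown
short = reprlib.repr(x)    # module function: items beyond the limit are elided
```

The elided items are marked by an ellipsis in the resulting string.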


Plain strings are converted into Unicode strings either explicitly, with the unicode built-in, or implicitly, when you pass a plain string to a function that expects Unicode. In either case, the conversion is done by an auxiliary object known as a codec (for coder-decoder). A codec can also convert Unicode strings to plain strings either explicitly, with the encode method of Unicode strings, or implicitly.

You identify a codec by passing the codec name to unicode or encode. When you pass no codec name and for implicit conversion, Python uses a default encoding, normally 'ascii'. (You can change the default encoding in the startup phase of a Python program, as covered in Chapter 13; see also setdefaultencoding in Chapter 8.) Every conversion has an explicit or implicit argument errors, a string specifying how conversion errors are to be handled. The default is 'strict', meaning any error raises an exception. When errors is 'replace', the conversion replaces each character causing an error with '?' in a plain-string result or with u'\ufffd' in a Unicode result. When errors is 'ignore', the conversion silently skips characters that cause errors.
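The three values of errors can be compared in a short sketch. It uses the encode and decode methods rather than the unicode built-in, so that it also runs on a current Python 3 interpreter (an assumption, where plain strings of bytes are written with a b prefix). The sample text is arbitrary.

```python
text = u'caf\xe9'    # Unicode string with one non-ASCII character

# 'replace' substitutes '?' in a plain-string result.
replaced = text.encode('ascii', 'replace')

# 'ignore' silently drops the offending character.
ignored = text.encode('ascii', 'ignore')

# Decoding with 'replace' substitutes u'\ufffd' in a Unicode result.
decoded = b'caf\xe9'.decode('ascii', 'replace')

# The default, 'strict', raises an exception on any error.
try:
    text.encode('ascii')
    strict_failed = False
except UnicodeError:
    strict_failed = True

# A codec that can represent the character round-trips without loss.
round_trip = text.encode('utf-8').decode('utf-8')
```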

The mapping of codec names to codec objects is handled by the codecs module. This module lets you develop your own codec objects and register them so that they can be looked up by name, just like built-in codecs. Module codecs also lets you look up any codec explicitly, obtaining the functions the codec uses for encoding and decoding, as well as factory functions to wrap file-like objects. Such advanced facilities of module codecs are rarely used, and are not covered further in this book.

The codecs module, together with the encodings package, supplies built-in codecs useful to Python developers dealing with internationalization issues. Any supplied codec can be installed as the default by module


A regular expression is a string that represents a pattern. With regular expression functionality, you can compare that pattern to another string and see if any part of the string matches the pattern.

The re module supplies all of Python's regular expression functionality. The compile function builds a regular expression object from a pattern string and optional flags. The methods of a regular expression object look for matches of the regular expression in a string and/or perform substitutions. Module re also exposes functions equivalent to a regular expression's methods, but with the regular expression's pattern string as their first argument.
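The two equivalent styles, methods of a compiled regular expression object and module-level functions that take the pattern string first, can be sketched as follows. The pattern and the sample strings are arbitrary.

```python
import re

# Build a regular expression object once, then call its methods.
digits = re.compile(r'\d+')
match = digits.search('order 66 shipped')
found = match.group()

# The module-level function takes the pattern string as its first argument.
found_again = re.search(r'\d+', 'order 66 shipped').group()

# Substitution: replace every run of digits with '#'.
masked = digits.sub('#', 'a1b22c')

# search returns None when no part of the string matches.
nothing = digits.search('no digits here')
```

Compiling once is convenient when the same pattern is applied to many strings.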

Regular expressions can be difficult to master, and this book does not purport to teach them—I cover only the ways in which you can use them in Python. For general coverage of regular expressions, I recommend the book Mastering Regular Expressions, by Jeffrey Friedl (O'Reilly). Friedl's book offers thorough coverage of regular expressions at both the tutorial and advanced levels.

The pattern string representing a regular expression follows a specific syntax:

Alphabetic and numeric characters stand for themselves. A regular expression whose pattern is a string of letters and digits matches the same string.
Many alphanumeric characters acquire special meaning in a pattern when they are preceded by a backslash (\).
Punctuation works the other way around. A punctuation character is self-matching when escaped, and has a special meaning when unescaped.
The backslash character itself is matched by a repeated backslash (i.e., the pattern


This chapter covers dealing with files and the filesystem in Python. A file is a stream of bytes that a program can read and/or write, while a filesystem is a hierarchical repository of files on a particular computer system. Because files are such a core programming concept, several other chapters also contain material about handling files of specific kinds.

In Python, the os module supplies many of the functions that operate on the filesystem, so this chapter starts by introducing the os module. The chapter then proceeds to cover operations on the filesystem, including comparing, copying, and deleting directories and files, working with file paths, and accessing low-level file descriptors.

Next, this chapter discusses the typical ways Python programs read and write data, via built-in file objects and the polymorphic concept of file-like objects (i.e., objects that are not files, but still behave to some extent like files). Python file objects directly support the concept of text files, which are streams of characters encoded as bytes. The chapter also covers Python's support for data in compressed form, such as archives in the popular ZIP format.

While many modern programs rely on a graphical user interface (GUI), text-based, non-graphical user interfaces are often still useful, as they are simple, fast to program, and lightweight. This chapter concludes with material about text input and output in Python, including information about presenting text that is understandable to different users, no matter where they are or what language they speak. This is known as internationalization (often abbreviated i18n).


The os module is an umbrella module that presents a reasonably uniform cross-platform view of the different capabilities of various operating systems. The module provides functionality for creating files, manipulating files and directories, and creating, managing, and destroying processes. This chapter covers the filesystem-related capabilities of the os module, while Chapter 14 covers the process-related capabilities.

The os module supplies a name attribute, which is a string that identifies the kind of platform on which Python is being run. Possible values for name are 'posix' (all kinds of Unix-like platforms), 'nt' (all kinds of 32-bit Windows platforms), 'mac', 'os2', and 'java'. You can often exploit unique capabilities of a platform, at least in part, through functions supplied by os. This book deals with cross-platform programming, however, not with platform-specific functionality, so I do not cover parts of os that exist only on one kind of platform, nor do I cover platform-specific modules. All functionality covered in this book is available at least on both 'posix' and 'nt' platforms. However, I do cover any differences among the ways in which each given piece of functionality is provided on different platforms.

When a request to the operating system fails, os raises an exception, an instance of OSError. os also exposes class OSError with the name os.error. Instances of OSError expose three useful attributes:

errno: The numeric error code of the operating system error
strerror
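A failing request can be provoked on purpose to look at the exception that os raises. The sketch below tries to remove a file that does not exist. The file name is hypothetical, and a fresh temporary directory is used so that the path is guaranteed to be missing.

```python
import errno
import os
import tempfile

# A path inside a brand-new directory, so the file cannot exist.
missing = os.path.join(tempfile.mkdtemp(), 'no-such-file.txt')

try:
    os.remove(missing)
    caught = None
except OSError as err:     # os.error is another name for the same class
    caught = err

code = caught.errno        # numeric error code from the operating system
message = caught.strerror  # corresponding message string
platform_kind = os.name    # e.g., 'posix' or 'nt'
```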


Using the os module, you can manipulate the filesystem in a variety of ways: creating, copying, and deleting files and directories, comparing files, and examining filesystem information about files and directories. This section documents the attributes and methods of the os module that you use for these purposes, and also covers some related modules that operate on the filesystem.

A file or directory is identified by a string, known as its path, whose syntax depends on the platform. On both Unix-like and Windows platforms, Python accepts Unix syntax for paths, with slash (/) as the directory separator. On non-Unix-like platforms, Python also accepts platform-specific path syntax. On Windows, for example, you can use backslash (\) as the separator. However, you do need to double up each backslash to \\ in normal string literals or use raw-string syntax as covered in Chapter 4. In the rest of this chapter, for brevity, Unix syntax is assumed in both explanations and examples.
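The point about backslashes in string literals can be verified directly. The Windows-style path below is a made-up example and is only compared as a string, never opened.

```python
# Each backslash doubled in a normal string literal.
doubled = 'C:\\temp\\notes.txt'

# The same path written with raw-string syntax.
raw = r'C:\temp\notes.txt'

# The Unix-syntax spelling of the same path, with slash as the separator.
portable = 'C:/temp/notes.txt'

same_string = (doubled == raw)
```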

Module os supplies attributes that provide details about path strings on the current platform. You should typically use the higher-level path manipulation operations covered in Section 10.2.4 later in this chapter, rather than lower-level string operations based on these attributes. However, the attributes may still be useful at times:

curdir: The string that denotes the current directory ('.' on Unix and Windows)
defpath


As discussed earlier in this chapter, file is a built-in type in Python. With a file object, you can read and/or write data to a file as seen by the underlying operating system. Python reacts to any I/O error related to a file object by raising an instance of built-in exception class IOError. Errors that cause this exception include open failing to open or create a file, calling a method on a file object to which that method doesn't apply (e.g., calling write on a read-only file object or calling seek on a non-seekable file), and I/O errors diagnosed by a file object's methods. This section documents file objects, as well as some auxiliary modules that help you access and deal with their contents.

You normally create a Python file object with the built-in open, which has the following syntax:

open(filename,mode='r',bufsize=-1)

open opens the file named by filename, which must be a string that denotes any path to a file. open returns a Python file object, which is an instance of the built-in type file. Calling file is just like calling open, but file was first introduced in Python 2.2. If you explicitly pass a mode string, open can also create filename if the file does not already exist (depending on the value of mode, as we'll discuss in a moment). In other words, despite its name, open is not limited to opening existing files, but is also able to create new ones if needed.
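As a minimal sketch of open both creating and reading a file, the code below writes a new file in a temporary directory and reads it back. The file names are hypothetical, and the last part shows the IOError raised when opening a nonexistent file for reading.

```python
import os
import tempfile

folder = tempfile.mkdtemp()
path = os.path.join(folder, 'example.txt')
existed_before = os.path.exists(path)

# Mode 'w' creates the file, since it does not exist yet.
f = open(path, 'w')
f.write('first line\n')
f.close()

# With no explicit mode, the file is opened for reading.
f = open(path)
content = f.read()
f.close()

# Opening a missing file for reading fails with IOError.
try:
    open(os.path.join(folder, 'missing.txt'))
    failed = False
except IOError:
    failed = True
```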

Section 10.3.1.1: File mode

mode is a string that denotes how the file is to be opened (or created). mode can have the following values:

'r'


File objects supply all functionality that is strictly needed for file I/O. There are some auxiliary Python library modules, however, that offer convenient supplementary functionality, making I/O even easier and handier in several important special cases.

The fileinput module lets you loop over all the lines in a list of text files. Performance is quite good, comparable to the performance of direct iteration on each file, since fileinput uses internal buffering to minimize I/O. Therefore, you can use module fileinput for line-oriented file input whenever you find the module's rich functionality convenient, without worrying about performance. The input function is the main function of module fileinput, and the module also provides a FileInput class that supports the same functionality as the module's functions.

close

close( )

Closes the whole sequence, so that iteration stops and no file remains open.

FileInput

class FileInput(files=None,inplace=0,backup='',bufsize=0)

Creates and returns an instance f of class FileInput. Arguments are the same as for fileinput.input, and methods of f have the same names, arguments, and semantics as functions of module fileinput. f also supplies a method readline, which reads and returns the next line. You can use class FileInput explicitly, rather than the single implicit instance used by the functions of module fileinput, when you want to nest or otherwise mix loops that read lines from more than one sequence of files.
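The following sketch loops over the lines of two small files with fileinput.input, then reads from an explicit FileInput instance. The file names and contents are made up for the example, and the files are created in a temporary directory first.

```python
import fileinput
import os
import tempfile

# Create two small text files to read back.
folder = tempfile.mkdtemp()
paths = []
for name, text in [('a.txt', 'one\ntwo\n'), ('b.txt', 'three\n')]:
    path = os.path.join(folder, name)
    f = open(path, 'w')
    f.write(text)
    f.close()
    paths.append(path)

# The module-level functions share a single implicit instance.
lines = []
for line in fileinput.input(paths):
    lines.append(line.rstrip('\n'))
fileinput.close()

# An explicit instance is independent of the implicit one.
reader = fileinput.FileInput(paths[1:])
first = reader.readline()
reader.close()
```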


You can implement file-like objects by writing Python classes that supply the methods you need. If all you want is for data to reside in memory rather than on a file as seen by the operating system, you can use the StringIO or cStringIO module. The two modules are almost identical: each supplies a factory function to create in-memory file-like objects. The difference between them is that objects created by module StringIO are instances of class StringIO.StringIO. You may inherit from this class to create your own customized file-like objects, overriding the methods that you need to specialize. Objects created by module cStringIO, on the other hand, are instances of a special-purpose type, not of a class. Performance is much better when you can use cStringIO, but inheritance is not feasible. Furthermore, cStringIO does not support Unicode.

Each module supplies a factory function named StringIO that creates a file-like object fl.

StringIO

StringIO(str='')

Creates and returns an in-memory file-like object fl, with all methods and attributes of a built-in file object. The data contents of fl are initialized to be a copy of argument str, which must be a plain string for the StringIO factory function in cStringIO, while it can be a plain or Unicode string for the function in StringIO.

Besides all methods and attributes of built-in file objects, as covered in Section 10.3.2 earlier in this chapter, fl supplies one supplementary method, getvalue.
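A brief sketch of both uses, writing into an initially empty object and reading from one initialized with a string, follows. The import fallback is an assumption about the interpreter: module StringIO is the Python 2 location covered here, while current Python 3 supplies an equivalent class in module io.

```python
try:
    from StringIO import StringIO   # Python 2 module, as described in the text
except ImportError:
    from io import StringIO         # assumption: Python 3 equivalent

# Write to an in-memory object as if it were a file, then fetch the contents.
out = StringIO()
out.write('first line\n')
out.write('second line\n')
contents = out.getvalue()

# Initialize the data contents from a string and read them back line by line.
source = StringIO('alpha\nbeta\n')
lines = source.readlines()
```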

getvalue

fl.getvalue( )

Returns the current data contents of


Although storage space and transmission bandwidth are increasingly cheap and abundant, in many cases you can save such resources, at the expense of some computational effort, by using compression. Since computational power grows cheaper and more abundant even faster than other resources, such as bandwidth, compression's popularity keeps growing. Python makes it easy for your programs to support compression by supplying dedicated modules for compression as part of every Python distribution.

The gzip module lets you read and write files compatible with those handled by the powerful GNU compression programs gzip and gunzip. The GNU programs support several compression formats, but module gzip supports only the highly effective native gzip format, normally denoted by appending the extension .gz to a filename. Module gzip supplies the GzipFile class and an open factory function.
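As a minimal sketch of both entry points, the code below compresses some data with the GzipFile class and reads it back through the open factory function. The file name and the payload are arbitrary, and the file lives in a temporary directory.

```python
import gzip
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'example.txt.gz')
payload = b'spam and eggs\n' * 100

# Write compressed data through the GzipFile class.
f = gzip.GzipFile(path, 'wb')
f.write(payload)
f.close()

# Read it back, decompressed, through the open factory function.
f = gzip.open(path, 'rb')
restored = f.read()
f.close()

compressed_size = os.path.getsize(path)
```

Highly repetitive data such as this payload compresses to a small fraction of its original size.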

GzipFile

class GzipFile(filename=None,mode=None,compresslevel=9, fileobj=None)

Creates and returns a file-like object f that wraps the file or file-like object fileobj. f supplies all methods of built-in file objects except seek and tell. Thus, f is not seekable: you can only access f sequentially, whether for reading or writing. When fileobj is None, filename must be a string that names a file: GzipFile opens that file with the given mode (by default, 'rb'), and f wraps the resulting file object. mode should be one of 'ab', 'rb', 'wb', or None. If mode is None, f uses the mode of fileobj if it is able to find out the mode; otherwise it uses 'rb'. If filename is None, f uses the filename of fileobj if able to find out the name; otherwise it uses ''. compresslevel


Python presents non-GUI text input and output channels to your programs as file objects, so you can use the methods of file objects (covered in Section 10.3 earlier in this chapter) to manipulate these channels.

The sys module, covered in Chapter 8, has attributes stdout and stderr, file objects to which you can write. Unless you are using some sort of shell redirection, these streams connect to the terminal in which your script is running. Nowadays, actual terminals are rare: the terminal is generally a screen window that supports text input/output (e.g., an MS-DOS Prompt console on Windows or an xterm window on Unix).

The distinction between sys.stdout and sys.stderr is a matter of convention. sys.stdout, known as your script's standard output, is where your program emits results. sys.stderr, known as your script's standard error, is where error messages go. Separating program results from error messages helps you use shell redirection effectively. Python respects this convention, using sys.stderr for error and warning messages.
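The convention can be followed in your own scripts with a small helper along these lines. The function name report and its out and err parameters are hypothetical; the parameters exist only so that the two streams can be replaced, for example in tests.

```python
import sys


def report(results, problems, out=None, err=None):
    """Write results to standard output and problems to standard error."""
    if out is None:
        out = sys.stdout
    if err is None:
        err = sys.stderr
    for item in results:
        out.write('%s\n' % item)
    for item in problems:
        err.write('warning: %s\n' % item)


report(['42'], ['input was truncated'])
```

With shell redirection such as `script.py > results.txt`, only the results end up in the file, while the warning still reaches the terminal.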

Programs that output results to standard output often need to write to sys.stdout. Python's print statement can be a convenient alternative to sys.stdout.write. The print statement has the following syntax:

print [>>fileobject,] expressions [,]

The normal destination of print's output is the file or file-like object that is the value of the stdout attribute of the sys module. However, when >> fileobject, is present right after keyword print, the statement uses the given fileobject instead of sys.stdout. expressions is a list of zero or more expressions separated by commas (,). print outputs each expression, in order, as a string (using the built-in str, covered in Chapter 8), with a space to separate strings. After all expressions,


The tools we have covered so far support the minimal subset of text I/O functionality that all platforms supply. Most platforms also offer richer-text I/O capabilities, such as responding to single keypresses (not just to entire lines of text) and showing text in any spot of the terminal (not just sequentially).

Python extensions and core Python modules let you access platform-specific functionality. Unfortunately, various platforms expose this functionality in different ways. To develop cross-platform Python programs with rich-text I/O functionality, you may need to wrap different modules uniformly, importing platform-specific modules conditionally (usually with the try/except idiom covered in Chapter 6).
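The conditional-import idiom mentioned above might look like the following sketch. The choice of msvcrt (present only on Windows) and termios (present only on Unix-like platforms) is an illustrative assumption, and the variable backend is a hypothetical name for whatever uniform wrapper a real program would build.

```python
# Try the Windows-only module first.
try:
    import msvcrt
except ImportError:
    msvcrt = None

# Then try the module found on Unix-like platforms.
try:
    import termios
except ImportError:
    termios = None

# Record which platform-specific module is available, if any.
if msvcrt is not None:
    backend = 'msvcrt'
elif termios is not None:
    backend = 'termios'
else:
    backend = None
```

The rest of the program can then branch on backend once, instead of repeating the import checks.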

The readline module wraps the GNU Readline Library. Readline lets the user edit text lines during interactive input, and also recall previous lines for further editing and re-entry. GNU Readline is widely installed on Unix-like platforms, and is available at https://cnswww.cns.cwru.edu/~chet/readline/rltop.html. A Windows port (https://starship.python.net/crew/kernr/) is available, but not widely deployed. Chris Gonnerman's module, Alternative Readline for Windows, implements a subset of Python's standard readline module (using a small dedicated .pyd file instead of GNU Readline) and can be freely downloaded from https://newcenturycomputers.net/projects/readline.html.

When either readline module is loaded, Python uses Readline for all line-oriented input, such as raw_input. The interactive Python interpreter always tries loading readline to enable line editing and recall for interactive sessions. You can call functions supplied by module readline to control advanced functionality, particularly the history functionality for recalling lines entered in previous sessions, and the completion functionality for context-sensitive completion of the word being entered. See


The cmd module offers a simple way to handle interactive sessions of commands. Each command is a line of text. The first word of each command is a verb defining the requested action. The rest of the line is passed as an argument to the method that implements the action that the verb requests.

Module cmd supplies class Cmd to use as a base class, and you define your own subclass of cmd.Cmd. The subclass supplies methods with names starting with do_ and help_, and may also optionally override some of Cmd's methods. When the user enters a command line such as verb and the rest, as long as the subclass defines a method named do_verb, Cmd.onecmd calls:

self.do_verb('and the rest')

Similarly, as long as the subclass defines a method named help_verb, Cmd.do_help calls it when the command line starts with either 'help verb' or '? verb'. Cmd, by default, also shows suitable error messages if the user tries to use, or asks for help about, a verb for which the subclass does not define a needed method.
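The dispatching just described can be sketched with a tiny subclass. The class Greeter and its verbs greet and quit are hypothetical. The stdout keyword argument and the self.stdout attribute are features of the cmd module in current Python versions, assumed here so that the output can be captured in a string instead of going to the terminal.

```python
import cmd
import io


class Greeter(cmd.Cmd):
    """Hypothetical command interpreter with two verbs."""

    def do_greet(self, arg):
        # arg holds the rest of the command line after the verb.
        self.stdout.write('hello, %s\n' % arg)

    def help_greet(self):
        self.stdout.write('greet NAME: print a greeting\n')

    def do_quit(self, arg):
        # A true result asks the command loop to stop.
        return True


out = io.StringIO()
shell = Greeter(stdout=out)
shell.onecmd('greet and the rest')   # calls shell.do_greet('and the rest')
shell.onecmd('help greet')           # calls shell.help_greet()
stop = shell.onecmd('quit')
```

Calling onecmd directly, as here, handles one command line at a time without entering an interactive loop.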

An instance c of a subclass of class Cmd supplies the following methods (many of these methods are meant to be overridden by the subclass).

cmdloop

c.cmdloop(intro=None)

Performs an entire interactive session of line-oriented commands. cmdloop starts by calling c.preloop( ), then outputs string intro (c.intro, if intro is None). Then c.cmdloop enters a loop. In each iteration of the loop, cmdloop reads line s with s=raw_input(


Most programs present some information to users as text. Such text should be understandable and acceptable to the user. For example, in some countries and cultures, the date "March 7" can be concisely expressed as "3/7". Elsewhere, "3/7" indicates "July 3", and the string that means "March 7" is "7/3". In Python, such cultural conventions are handled with the help of standard module locale.

Similarly, a greeting can be expressed in one natural language by the string "Benvenuti", while in another language the string to use is "Welcome". In Python, such translations are handled with the help of standard module gettext.

Both kinds of issues are commonly called internationalization (often abbreviated i18n, as there are 18 letters between i and n in the full spelling). This is actually a misnomer, as the issues also apply to programs used within one nation by users of different languages or cultures.

Python's support for cultural conventions is patterned on that of C, slightly simplified. In this architecture, a program operates in an environment of cultural conventions known as a locale. The locale setting permeates the program and is typically set early on in the program's operation. The locale is not thread-specific, and module locale is not thread-safe. In a multithreaded program, set the program's locale before starting secondary threads.

If a program does not call locale.setlocale, the program operates in a neutral locale known as the C locale. The C locale is named from this architecture's origins in the C language, and is similar, but not identical, to the U.S. English locale. Alternatively, a program can find out and accept the user's default locale. In this case, module locale interacts with the operating system (via the environment, or in other system-dependent ways) to establish the user's preferred locale. Finally, a program can set a specific locale, presumably determining which locale to set on the basis of user interaction, or via persistent configuration settings such as a program initialization file.
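The choices described above can be sketched as follows. Accepting the user's default locale is wrapped in try/except, since that request can fail on a system whose environment names a locale that is not installed; the final call returns to the C locale so that the sketch ends in a known state.

```python
import locale

# Explicitly select the neutral C locale and inspect its conventions.
locale.setlocale(locale.LC_ALL, 'C')
decimal_point = locale.localeconv()['decimal_point']

# Accept the user's default locale: an empty string means "ask the system".
try:
    user_locale = locale.setlocale(locale.LC_ALL, '')
except locale.Error:
    user_locale = None   # the requested default is not available here

# Return to the C locale.
locale.setlocale(locale.LC_ALL, 'C')
```

In a multithreaded program, calls like these belong before any secondary thread is started.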


Python supports a variety of ways of making data persistent. One such way, known as serialization, involves viewing the data as a collection of Python objects. These objects can be saved, or serialized, to a byte stream, and later loaded and recreated, or deserialized, back from the byte stream. Object persistence layers on top of serialization and adds such features as object naming. This chapter covers the built-in Python modules that support serialization and object persistence.

Another way to make data persistent is to store it in a database. One simple type of database is actually just a file format that uses keyed access to enable selective reading and updating of relevant parts of the data. Python supplies modules that support several variations of this file format, known as DBM, and these modules are covered in this chapter.

A relational database management system (RDBMS), such as MySQL or Oracle, provides a more powerful approach to storing, searching, and retrieving persistent data. Relational databases rely on dialects of Structured Query Language (SQL) to create and alter a database's schema, insert and update data in the database, and query the database according to search criteria. This chapter does not provide any reference material on SQL. For that purpose, I recommend SQL in a Nutshell, by Kevin Kline (O'Reilly). Unfortunately, despite the existence of SQL standards, no two RDBMSes implement exactly the same SQL dialect.

The Python standard library does not come with an RDBMS interface. However, many free third-party modules let your Python programs access a specific RDBMS. Such modules mostly follow the Python Database API 2.0 standard, also known as the DBAPI. This chapter covers the DBAPI standard and mentions some of the third-party modules that implement it.

Python supplies a number of modules that deal with I/O operations that serialize (save) entire Python objects to various kinds of byte streams, and deserialize (load and recreate) Python objects back from such streams. Serialization is also called marshaling.

The marshal module supports the specific serialization tasks needed to save and reload compiled Python files (.pyc and .pyo). marshal only handles instances of fundamental built-in data types: None, numbers (plain and long integers, float, complex), strings (plain and Unicode), code objects, and built-in containers (tuples, lists, dictionaries) whose items are instances of elementary types. marshal does not handle instances of user-defined types, nor classes and instances of classes. marshal is faster than other serialization modules. Code objects are supported only by marshal, not by other serialization modules. Module marshal supplies the following functions.

dump, dumps

dump(value,fileobj)
dumps(value)

dumps returns a string representing object value. dump writes the same string to file object fileobj, which must be opened for writing in binary mode. dump(v,f) is just like f.write(dumps(v)). fileobj cannot be a file-like object: it must be an instance of type file.

load, loads

load(fileobj)
loads(str)
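A round trip through dumps and loads might look like the sketch below. The sample data is made up for illustration. Note one assumption about the interpreter: under Python 3 the serialized form is a bytes object rather than a plain string, but the calls are the same.

```python
import marshal

# Only fundamental built-in types: a dictionary of strings, numbers,
# a list, and None.
value = {'name': 'spam', 'sizes': [1, 2, 3], 'ratio': 0.5, 'nothing': None}

data = marshal.dumps(value)       # the serialized form of value
restored = marshal.loads(data)    # an equal, but distinct, object

# Code objects can be marshaled too.
code = compile('2 ** 10', '<demo>', 'eval')
answer = eval(marshal.loads(marshal.dumps(code)))
```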

A DBM-like file is a file that contains a set of pairs of strings (key,data), with support for fetching or storing the data given a key, known as keyed access. DBM-like files were originally supported on early Unix systems, with functionality roughly equivalent to that of access methods popular on other mainframes and minicomputers of the time, such as ISAM, the Indexed-Sequential Access Method. Today, several different libraries, available for many platforms, let programs written in many different languages create, update, and read DBM-like files.

Keyed access, while not as powerful as the data access functionality of relational databases, may often suffice for a program's needs. And if DBM-like files are sufficient, you may end up with a program that is smaller, faster, and more portable than one that uses an RDBMS.

The classic dbm library, whose first version introduced DBM-like files many years ago, has limited functionality, but tends to be available on most Unix platforms. The GNU version, gdbm, is richer and also widespread. The BSD version, dbhash, offers superior functionality. Python supplies modules that interface with each of these libraries if the relevant underlying library is installed on your system. Python also offers a minimal DBM module, dumbdbm (usable anywhere, as it does not rely on other installed libraries), and generic DBM modules, which are able to automatically identify, select, and wrap the appropriate DBM library to deal with an existing or new DBM file. Depending on your platform, your Python distribution, and what dbm-like libraries you have installed on your computer, the default Python build may install some subset of these modules. In general, at a minimum, you can rely on having module dbm on Unix-like platforms, module dbhash on Windows, and dumbdbm on any platform.
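As a rough sketch of keyed access, the following stores and fetches two pairs with the minimal, always-available module. The file name demo and the sample keys are hypothetical. The import is written defensively because later Python versions moved dumbdbm to dbm.dumb, which is an assumption about the interpreter you run this on.

```python
import os
import shutil
import tempfile

try:
    import dbm.dumb as dumbdbm    # location of the module in Python 3
except ImportError:
    import dumbdbm                # the name used in this text

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, 'demo')    # hypothetical file name

db = dumbdbm.open(path, 'c')      # 'c': create the file if it does not exist
db[b'spam'] = b'eggs'             # keys and data are both strings
db[b'ham'] = b'toast'
db.close()

db = dumbdbm.open(path, 'r')      # reopen to show that the data persisted
fetched = db[b'spam']
known_keys = sorted(db.keys())
db.close()

shutil.rmtree(workdir)            # remove the demonstration files
```

The open object behaves much like a dictionary, which is why indexing, assignment, and keys all work as you would expect.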

Python comes with the bsddb module, which wraps the Berkeley Database library (also known as BSD DB) if that library is installed on your system and your Python installation is built to support it. With the BSD DB library, you can create hash, binary tree, or record-based files that generally behave like dictionaries. On Windows, Python includes a port of the BSD DB library, thus ensuring that module bsddb is always usable. To download BSD DB sources, binaries for other platforms, and detailed documentation on BSD DB, see https://www.sleepycat.com. Module bsddb supplies three factory functions, btopen, hashopen, and rnopen.

btopen, hashopen, rnopen

btopen(filename,flag='r',*many_other_optional_arguments)
hashopen(filename,flag='r',*many_other_optional_arguments)
rnopen(filename,flag='r',*many_other_optional_arguments)

btopen opens or creates the binary tree format file named by filename (a string that denotes any path to a file, not just a name), and returns a suitable BTree object to access and manipulate the file. Argument flag has exactly the same values and meaning as for anydbm.open. Other arguments indicate low-level options that allow fine-grained control, but are rarely used.

hashopen and rnopen work the same way, but open or create hash format and record format files, returning objects of type Hash and Record. hashopen is generally the fastest format and makes sense when you are using keys to look up records. However, if you also need to access records in sorted order, use btopen, or if you need to access records in the same order in which you originally wrote them, use

As I mentioned earlier, the Python standard library does not come with an RDBMS interface, but there are many free third-party modules that let your Python programs access specific databases. Such modules mostly follow the Python Database API 2.0 standard, also known as the DBAPI.

At the time of this writing, Python's DBAPI Special Interest Group (SIG) was busy preparing a new version of the DBAPI (possibly to be known as 3.0 when it is ready). Programs written against DBAPI 2.0 should work with minimal or no changes with the future DBAPI 3.0, although 3.0 will no doubt offer further enhancements that future programs will be able to take advantage of.

If your Python program runs only on Windows, you may prefer to access databases by using Microsoft's ADO package through COM. For more information on using Python on Windows, see the book Python Programming on Win32, by Mark Hammond and Andy Robinson (O'Reilly). Since ADO and COM are platform-specific, and this book focuses on cross-platform use of Python, I do not cover ADO nor COM further in this book.

After importing a DBAPI-compliant module, you call the module's connect function with suitable parameters. connect returns an instance of class Connection, which represents a connection to the database. This instance supplies commit and rollback methods to let you deal with transactions, a close method to call as soon as you're done with the database, and a cursor method that returns an instance of class Cursor. The Cursor instance supplies the methods and attributes that you'll use for all database operations. A DBAPI-compliant module also supplies exception classes, descriptive attributes, factory functions, and type-description attributes.
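The connect, cursor, commit, rollback, and close sequence can be sketched as below. As a stand-in for a third-party module, the sketch assumes sqlite3, a DBAPI 2.0 module that joined the standard library in later Python versions than the ones this text describes. The table and its rows are made up, and the ? placeholders are the parameter style sqlite3 happens to use; other DBAPI modules may use a different one.

```python
import sqlite3   # stand-in DBAPI-compliant module (an assumption, see above)

conn = sqlite3.connect(':memory:')    # connect returns a Connection
cur = conn.cursor()                   # cursor returns a Cursor
try:
    cur.execute('CREATE TABLE lang (name TEXT, year INTEGER)')
    cur.execute('INSERT INTO lang VALUES (?, ?)', ('Python', 1990))
    conn.commit()                     # make the insertion permanent

    cur.execute('INSERT INTO lang VALUES (?, ?)', ('Typo', 0))
    conn.rollback()                   # discard the uncommitted insertion

    cur.execute('SELECT name, year FROM lang ORDER BY name')
    rows = cur.fetchall()
finally:
    conn.close()                      # close as soon as you're done
```

With a different DBAPI-compliant module, the arguments to connect change, but the rest of the sequence should look much the same.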

A DBAPI-compliant module supplies exception classes Warning, Error, and several subclasses of Error. Warning indicates such anomalies as data truncation during insertion. Error's subclasses indicate various kinds of errors that your program can encounter when dealing with the database and the DBAPI-compliant module that interfaces to it. Generally, your code uses a statement of the form:

A Python program can handle time in several ways. Time intervals are represented by floating-point numbers, in units of seconds (a fraction of a second is the fractional part of the interval). Particular instants in time are expressed in seconds since a reference instant, known as the epoch. (Midnight, UTC, of January 1, 1970, is a popular epoch used on both Unix and Windows platforms.) Time instants often also need to be expressed as a mixture of units of measurement (e.g., years, months, days, hours, minutes, and seconds), particularly for I/O purposes.

This chapter covers the time module, which supplies Python's core time-handling functionality. The time module strongly depends on the system C library. The chapter also presents the sched and calendar modules and the essentials of the popular extension module mx.DateTime. mx.DateTime has more uniform behavior across platforms than time, which helps account for its popularity.

Python 2.3 will introduce a new datetime module to manipulate dates and times in other ways. At https://starship.python.net/crew/jbauer/normaldate/, you can download Jeff Bauer's normalDate.py, which gains simplicity by dealing only with dates, not with times. Neither of these modules is further covered in this book.

The underlying C library determines the range of dates that the time module can handle. On Unix systems, years 1970 and 2038 are the typical cut-off points, a limitation that mx.DateTime lets you avoid. Time instants are normally specified in UTC (Coordinated Universal Time, once known as GMT, or Greenwich Mean Time). Module time also supports local time zones and Daylight Saving Time (DST), but only to the extent that support is supplied by the underlying C system library.

As an alternative to seconds since the epoch, a time instant can be represented by a tuple of nine integers known as a time-tuple. Items in time-tuples are covered in Table 12-1. All items are integers, and therefore time-tuples cannot keep track of fractions of a second. In Python 2.2 and later, the result of any function in module time that used to return a time-tuple is now of type struct_time. You can still use the result as a tuple, but you can also access the items as read-only attributes x.tm_year, x.tm_mon, and so on, using the attribute names listed in Table 12-1. Wherever a function used to require a time-tuple argument, you can now pass an instance of struct_time or any other sequence whose items are nine integers in the applicable ranges.
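The two ways of using a struct_time, and the use of a plain tuple where a time-tuple is expected, can be sketched as follows. The sketch assumes the usual epoch of January 1, 1970; the sample date is made up.

```python
import time

epoch = time.gmtime(0)            # the epoch, as a struct_time in UTC
as_tuple = tuple(epoch)           # still usable as a nine-item tuple
year_by_index = epoch[0]          # tuple-style access
year_by_name = epoch.tm_year      # read-only attribute access

# Any sequence of nine integers in the applicable ranges is accepted
# where a time-tuple is expected.
sample = (2003, 3, 1, 12, 30, 0, 5, 60, 0)
stamp = time.strftime('%Y-%m-%d %H:%M:%S', sample)
```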

Table 12-1: Tuple form of time representation
Item	Meaning	Field name	Range	Notes

The sched module supplies a class, scheduler, that implements an event scheduler.

scheduler

class scheduler(timefunc,delayfunc)

An instance s of scheduler is initialized with two functions, which s then uses for all time-related operations. timefunc must be callable without arguments to get the current time instant (in any unit of measure), meaning that you can pass time.time. delayfunc must be callable with one argument (a time duration, in the same units timefunc returns), and it should delay for about that amount of time, meaning you can pass time.sleep. scheduler also calls delayfunc with argument 0 after each event, to give other threads a chance; again, this is compatible with the behavior of time.sleep.

A scheduler instance s supplies the following methods.

cancel

s.cancel(event_token)

Removes an event from s's queue of scheduled events. event_token must be the result of a previous call to s.enter or s.enterabs, and the event must not yet have happened; otherwise cancel raises RuntimeError.

empty

s.empty( )

Returns True if s's queue of scheduled events is empty, otherwise False.
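A small sketch of a scheduler built on time.time and time.sleep, using cancel and empty as just described. It also relies on the enter and run methods of scheduler, and the delays and labels are made up for illustration.

```python
import sched
import time

s = sched.scheduler(time.time, time.sleep)
fired = []

# enter(delay, priority, action, argument) returns a token for the event.
keep = s.enter(0.01, 1, fired.append, ('kept',))
drop = s.enter(0.02, 1, fired.append, ('dropped',))

s.cancel(drop)              # the event has not happened yet, so this succeeds
pending = not s.empty()     # one event is still queued
s.run()                     # returns once the queue of events is empty
```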

enterabs

The calendar module supplies calendar-related functions, including functions to print a text calendar for any given month or year. By default, calendar considers Monday the first day of the week and Sunday the last one. You can change this setting by calling function calendar.setfirstweekday. calendar handles years in the range supported by module time, typically 1970 to 2038. Module calendar supplies the following functions.

calendar

calendar(year,w=2,l=1,c=6)

Returns a multiline string with a calendar for year year formatted into three columns separated by c spaces. w is the width in characters of each date; each line has length 21*w+18+2*c. l is the number of lines used for each week.

firstweekday

firstweekday( )

Returns the current setting for the weekday that starts each week. By default, when calendar is first imported, this is 0, meaning Monday.

isleap

isleap(year)

Returns True if year is a leap year, otherwise False.

leapdays

leapdays(y1,y2)

Returns the total number of leap days in the years in range(y1,y2).
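The functions covered so far can be exercised as in this short sketch. The years are arbitrary examples, and the sketch assumes nothing has called calendar.setfirstweekday beforehand.

```python
import calendar

first = calendar.firstweekday()         # the default setting means Monday
leap_2000 = calendar.isleap(2000)       # divisible by 400: a leap year
leap_1900 = calendar.isleap(1900)       # divisible by 100 but not 400: not one
count = calendar.leapdays(2000, 2005)   # leap days in 2000, 2001, ..., 2004

text = calendar.calendar(2003)          # defaults: w=2, l=1, c=6
widest = max([len(line) for line in text.splitlines()])
limit = 21 * 2 + 18 + 2 * 6             # 21*w+18+2*c for the default arguments
```

Printing text shows the whole year, three months across; widest should not exceed limit.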

DateTime is one of the modules in the mx package made available by eGenix GmbH. mx is open source, and at the time of this writing, mx.DateTime has liberal license conditions similar to those of Python itself. mx.DateTime's popularity stems from its functional richness and cross-platform portability. I present only an essential subset of mx.DateTime's rich functionality here; the module comes with detailed documentation about its advanced time and date handling features.

Module DateTime supplies several date and time types whose instances are immutable (and therefore suitable as dictionary keys). Type DateTime represents a time instant and includes an absolute date, which is the number of days since an epoch of January 1, year 1 CE, according to the Gregorian calendar (0001-01-01 is day 1), and an absolute time, which is a floating-point number of seconds since midnight. Type DateTimeDelta represents an interval of elapsed time, which is a floating-point number of seconds. Class RelativeDateTime lets you specify dates in relative terms, such as "next Monday" or "first day of next month." DateTime and DateTimeDelta are covered in detail later in this section, but RelativeDateTime is not.

Date and time types supply customized string conversion, invoked via the built-in str or automatically during implicit conversion (e.g., in a print statement). The resulting strings are in standard ISO 8601 formats, such as:

YYYY-MM-DD HH:MM:SS.ss

For finer-grained control of string formatting, use method strftime. Function DateTimeFrom constructs DateTime instances from strings. Submodules of module mx.DateTime supply other formatting and parsing functions, using different standards and conventions.

Module DateTime supplies factory functions to build instances of type

Python directly exposes many of the mechanisms it uses internally. This helps you understand Python at an advanced level, and means you can hook your own code into such documented Python mechanisms and control those mechanisms to some extent. For example, Chapter 7 covered the import statement and the way Python arranges for built-ins to be made implicitly visible. This chapter covers other advanced techniques that Python offers for controlling execution, while Chapter 17 covers execution-control possibilities that apply specifically to the three crucial phases of development: testing, debugging, and profiling.

With Python's exec statement, it is possible to execute code that you read, generate, or otherwise obtain during the running of a program. The exec statement dynamically executes a statement or a suite of statements. exec is a simple keyword statement with the following syntax:

exec code[ in globals[,locals]]

code can be a string, an open file-like object, or a code object. globals and locals are dictionaries. If both are present, they are the global and local namespaces, respectively, in which code executes. If only globals is present, exec uses globals in the role of both namespaces. If neither globals nor locals is present, code executes in the current scope. Running exec in current scope is not good programming practice, since it can bind, rebind, or unbind any name. To keep things under control, you should use exec only with specific, explicit dictionaries.
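Running code against an explicit dictionary might look like the sketch below. It uses the parenthesized form exec(code, globals), which is an assumption about your interpreter: Python 3 requires that form, and Python 2 accepts it as equivalent to exec code in globals. The source string is made up for illustration.

```python
namespace = {}                        # explicit dictionary for the executed code
source = 'x = 20 + 3\ny = x * 2\n'    # code obtained while the program runs

exec(source, namespace)               # namespace plays both namespace roles

x = namespace['x']                    # names bound by the code land in the
y = namespace['y']                    # dictionary, not in the current scope
```

exec also adds a __builtins__ entry to the dictionary, so expect one more key than the code itself binds.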

More generally, use exec only when it's really indispensable. Most often, it is better avoided in favor of more specific mechanisms. For example, a frequently asked question is, "How do I set a variable whose name I just read or constructed?" Strictly speaking, exec lets you do this. For example, if the name of the variable you want to set is in variable varname, you might use:

exec varname+'=23'

Don't do this. An exec statement like this in current scope causes you to lose control of your namespace, leading to bugs that are extremely hard to track and more generally making your program unfathomably difficult to understand. An improvement is to keep the "variables" you need to set, not as variables, but as entries in a dictionary, say mydict. You can then use the following variation:

exec varname+'=23' in mydict

While this is not as terrible as the previous example, it is still a bad idea. The best approach is to keep such "variables" as dictionary entries and not use
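Keeping such "variables" as dictionary entries needs no dynamic execution at all. A minimal sketch, in which the name answer stands for whatever name your program read or constructed:

```python
mydict = {}
varname = 'answer'           # a name read or constructed at runtime

mydict[varname] = 23         # plain item assignment sets the "variable"
value = mydict[varname]      # plain indexing reads it back
```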

Python code executed dynamically normally suffers no special restrictions. Python's general philosophy is to give the programmer tools and mechanisms that make it easy to write good, safe code, and trust the programmer to use them appropriately. Sometimes, however, trust might not be warranted. When code to execute dynamically comes from an untrusted source, the code itself is untrusted. In such cases it's important to selectively restrict the execution environment so that such code cannot accidentally or maliciously inflict damage. If you never need to execute untrusted code, you can skip this section. However, Python makes it easy to impose appropriate restrictions on untrusted code if you ever do need to execute it.

When the __builtins__ item in the global namespace isn't the standard __builtin__ module (or the latter's dictionary), Python knows the code being run is restricted. Restricted code executes in a sandbox environment, previously prepared by the trusted code that requests the restricted code's execution. Standard modules rexec and Bastion help you prepare an appropriate sandbox. To ensure that restricted code cannot escape the sandbox, a few crucial internals (e.g., the __dict__ attributes of modules, classes, and instances) are not directly available to restricted code.

There is no special protection against restricted code raising exceptions. On the contrary, Python diagnoses any attempt by restricted code to violate the sandbox restrictions by raising an exception. Therefore, you should generally run restricted code in the try clause of a try/except statement, as covered in Chapter 6. Make sure you catch all exceptions and handle them appropriately if your program needs to keep running in such cases.

There is no built-in protection against untrusted code attempting to inflict damage by consuming large amounts of memory or time (so-called denial-of-service attacks). If you need to ward against such attacks, you can run untrusted code in a separate process. The separate process uses the mechanisms described in this section to restrict the untrusted code's execution, while the main process monitors the separate one and terminates it if and when resource consumption becomes excessive. Processes are covered in Chapter 14. Resource monitoring is currently supported by the standard Python library only on Unix-like platforms (by platform-specific module

Some of the internal Python objects that I mention in this section are hard to use. Using such objects correctly requires some study of Python's own C (or Java) sources. Such black magic is rarely needed, except to build general-purpose development frameworks and similar wizardly tasks. Once you do understand things in depth, Python empowers you to exert control, if and when you need to. Since Python exposes internal objects to your Python code, you can exert that control by coding in Python, even when a nodding acquaintance with C (or Java) is needed to understand what is going on.

The built-in type named type acts as a factory object, returning objects that are types themselves (type was a built-in function in Python 2.1 and earlier). Type objects don't need to support any special operations except equality comparison and representation as strings. Most type objects are callable, and return new instances of the type when called. In particular, built-in types such as int, float, list, str, tuple, and dict all work this way. The attributes of the types module are the built-in types, each with one or more names. For example, types.DictType and types.DictionaryType both refer to type({}), also known since Python 2.2 as the built-in type dict. Besides being callable to generate instances, type objects are useful in Python 2.2 and later because you can subclass them, as covered in Chapter 5.
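Calling and subclassing type objects can be sketched as follows. The class name Counter and its bump method are hypothetical, chosen only to show a subclass of a built-in type.

```python
# Type objects are callable and return new instances of the type.
n = int('42')
pairs = dict([('a', 1), ('b', 2)])
letters = tuple('abc')

same_type = type({}) is dict      # type({}) and dict are the same object

class Counter(dict):              # a built-in type used as a base class
    def bump(self, key):
        self[key] = self.get(key, 0) + 1

c = Counter()
c.bump('spam')
c.bump('spam')
```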

As well as by using built-in function compile, you can also get a code object via the func_code attribute of a function or method object. A code object's co_varnames attribute is the tuple of names of local variables, including the formal arguments; the co_argcount attribute is the number of arguments. Code objects are not callable, but you can rebind the
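Inspecting a code object might look like the following sketch. The function area is made up. The attribute lookup is written defensively because later Python versions spell func_code as __code__, which is an assumption about the interpreter in use.

```python
def area(width, height):
    scale = 2
    return width * height * scale

# func_code is the attribute name used in the text; try the later
# spelling first and fall back to it.
code = getattr(area, '__code__', None) or area.func_code

argcount = code.co_argcount       # number of arguments
varnames = code.co_varnames       # formal arguments first, then other locals

compiled = compile('6 * 7', '<demo>', 'eval')   # a code object from compile
result = eval(compiled)           # not callable, but eval can run it
```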

Python's garbage collection normally proceeds transparently and automatically, but you can choose to exert some direct control. The general principle is that Python collects each object x at some time after x becomes unreachable, that is, when no chain of references can reach x by starting from a local variable of a function that is executing, nor from a global variable of a loaded module. Normally, an object x becomes unreachable when there are no references at all to x. However, a group of objects can also be unreachable when they reference each other.

Classic Python keeps in each object x a count, known as a reference count, of how many references to x are outstanding. When x's reference count drops to 0, CPython immediately collects x. Function getrefcount of module sys accepts any object and returns its reference count (at least 1, since getrefcount itself has a reference to the object it's examining). Other versions of Python, such as Jython, rely on different garbage collection mechanisms, supplied by the platform they run on (e.g., the JVM). Modules gc and weakref therefore apply only to CPython.

When Python garbage-collects x and there are no references at all to x, Python then finalizes x (i.e., calls x.__del__( )) and makes the memory that x occupied available for other uses. If x held any references to other objects, Python removes the references, which in turn may make other objects collectable by leaving them unreachable.
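Reference counting and finalization can be observed with a sketch like this one, which applies to CPython only. The class Resource and its finalized list are hypothetical, and exact counts can vary between interpreter versions, so the sketch compares counts rather than relying on absolute values.

```python
import sys

class Resource(object):
    finalized = []                    # names of instances already finalized

    def __init__(self, name):
        self.name = name

    def __del__(self):
        Resource.finalized.append(self.name)

r = Resource('a')
baseline = sys.getrefcount(r)         # includes getrefcount's own reference
alias = r                             # one more reference to the same object
with_alias = sys.getrefcount(r)

del alias
del r                                 # last reference gone: CPython finalizes
done = list(Resource.finalized)       # and collects the object right away
```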

The gc module exposes the functionality of Python's garbage collector. gc deals only with objects that are unreachable in a subtle way, being part of mutual reference loops. In such a loop, each object in the loop refers to others, keeping the reference counts of all objects positive. However, an outside reference no longer exists to the whole set of mutually referencing objects. Therefore, the whole group, also known as cyclic garbage, is unreachable, and therefore garbage collectable. Looking for such cyclic garbage loops takes time, which is why module
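Cyclic garbage can be demonstrated with the sketch below, again assuming CPython. The class Node is hypothetical. Automatic collection is switched off during the experiment so that only the explicit gc.collect call reclaims the loop, and a weak reference serves as a probe because it does not keep its target alive.

```python
import gc
import weakref

class Node(object):
    pass

gc.collect()                      # start from a clean slate
was_enabled = gc.isenabled()
gc.disable()                      # keep automatic collection out of the way

a = Node()
b = Node()
a.partner = b                     # a and b refer to each other, so their
b.partner = a                     # reference counts cannot drop to 0
probe = weakref.ref(a)

del a, b                          # no outside reference remains
alive_before = probe() is not None
found = gc.collect()              # number of unreachable objects found
alive_after = probe() is not None

if was_enabled:
    gc.enable()
```

Reference counting alone never reclaims the pair; only the collector's search for mutual reference loops does.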

The atexit module lets you register termination functions (i.e., functions to be called at program termination, last in, first out). Termination functions are similar to clean-up handlers established by try/finally. However, termination functions are globally registered and called at the end of the whole program, while clean-up handlers are established lexically and called at the end of a specific try clause. Both termination functions and clean-up handlers are called whether the program terminates normally or abnormally, but not when the termination is caused by calling os._exit. Module atexit supplies a single function called register.

register

register(func,*args,**kwds)

Ensures that func(*args,**kwds) is called at program termination time.
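A typical use is removing a scratch file when the program ends. In this sketch, the function remove_file and its missing_ok parameter are hypothetical, and the scratch file is created only for the demonstration.

```python
import atexit
import os
import tempfile

def remove_file(path, missing_ok=True):
    """Hypothetical clean-up function: delete a scratch file."""
    if os.path.exists(path):
        os.remove(path)
    elif not missing_ok:
        raise OSError('scratch file vanished: %s' % path)

fd, scratch = tempfile.mkstemp()    # a file that must not outlive the run
os.close(fd)

# Equivalent to calling remove_file(scratch, missing_ok=True) at termination.
atexit.register(remove_file, scratch, missing_ok=True)

still_there = os.path.exists(scratch)   # nothing runs until termination
```

If several functions are registered, the one registered last runs first.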

Python provides a specific hook to let each site customize some aspects of Python's behavior at the start of each run. Customization by each single user is not enabled by default, but Python specifies how programs that want to run user-provided code at startup can explicitly request such customization.

Python loads standard module site just before the main script. If Python is run with option -S, Python does not load site. -S allows faster startup, but saddles the main script with initialization chores. site's tasks are:

Putting sys.path in standard form (absolute paths, no duplicates).
Interpreting each .pth file found in the Python home directory, adding entries to sys.path, and/or importing modules, as each .pth file indicates.
Adding built-ins used to display information in interactive sessions (quit, exit, copyright, credits, and license).
Setting the default Unicode encoding to 'ascii'. site's source code includes two blocks, each guarded by if 0:, one to set the default encoding to be locale dependent, and the other to disable default encoding and decoding between Unicode and plain strings. You may optionally edit site.py to select either block.
Trying to import sitecustomize (should import sitecustomize raise an ImportError exception, site catches and ignores it). sitecustomize is the module that each site's installation can optionally use for further site-specific customization beyond site's tasks. It is generally best not to edit site.py, as any Python upgrade or reinstallation might overwrite your customizations.
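The effect of a .pth file can be tried out without touching the Python installation. This sketch assumes function site.addsitedir, which interprets the .pth files in a directory you name; the temporary directory, the libs subdirectory, and the file name extra.pth are all made up for the demonstration.

```python
import os
import site
import sys
import tempfile

sitedir = tempfile.mkdtemp()              # stands in for a site directory
libs = os.path.join(sitedir, 'libs')      # hypothetical directory to expose
os.mkdir(libs)

# Each line of a .pth file names a directory to add to sys.path; relative
# names are resolved against the directory that holds the .pth file.
pth = open(os.path.join(sitedir, 'extra.pth'), 'w')
pth.write('libs\n')
pth.close()

site.addsitedir(sitedir)                  # interpret the .pth files in sitedir

added = [p for p in sys.path
         if os.path.normcase(p) == os.path.normcase(libs)]
```

Only directories that actually exist are added, which is why the sketch creates libs before writing the .pth file.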

A thread is a flow of control that shares global state with other threads; all threads appear to execute simultaneously. Threads are not easy to master, but once you do, they may offer a simpler architecture or better performance (faster response, but typically not better throughput) for some problems. This chapter covers the facilities that Python provides for dealing with threads, including the thread, threading, and Queue modules.

A process is an instance of a running program. Sometimes you get better results with multiple processes than with threads. The operating system protects processes from one another. Processes that want to communicate must explicitly arrange to do so, via local inter-process communication (IPC). Processes may communicate via files (covered in Chapter 10) or via databases (covered in Chapter 11). In both cases, the general way in which processes communicate using such data storage mechanisms is that one process can write data, and another process can later read that data back. This chapter covers the process-related parts of module os, including simple IPC by means of pipes, and a cross-platform IPC mechanism known as memory-mapped files, supplied to Python programs by module mmap.

Network mechanisms are well suited for IPC, as they work between processes that run on different nodes of a network as well as those that run on the same node. Chapter 19 covers low-level network mechanisms that provide a flexible basis for IPC. Other, higher-level mechanisms, known as distributed computing, such as CORBA, DCOM/COM+, EJB, SOAP, XML-RPC, and .NET, make IPC easier, whether locally or remotely. However, distributed computing is not covered in this book.


Python offers multithreading on platforms that support threads, such as Win32, Linux, and most variants of Unix. The Python interpreter does not freely switch threads. Python uses a global interpreter lock (GIL) to ensure that switching between threads happens only between bytecode instructions or when C code deliberately releases the GIL (Python's C code releases the GIL around blocking I/O and sleep operations). An action is said to be atomic if it's guaranteed that no thread switching within Python's process occurs between the start and the end of the action. In practice, an operation that looks atomic actually is atomic when executed on an object of a built-in type (augmented assignment on an immutable object, however, is not atomic). However, in general it is not a good idea to rely on atomicity. For example, you never know when you might be dealing with a derived class rather than an object of a built-in type, meaning there might be callbacks to Python code.
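Rather than relying on atomicity, you can guard a shared update explicitly. The following is a minimal sketch, not taken from the original text, written with Python 3 spellings (the with statement and threading.Lock); the names counter, add_many, and run_workers are hypothetical and chosen only for illustration:

```python
import threading

counter = 0
counter_lock = threading.Lock()

def add_many(times):
    """Increment the shared counter `times` times, one locked update at a time."""
    global counter
    for _ in range(times):
        with counter_lock:   # acquire the lock, update, then release
            counter += 1     # a read-modify-write step, not atomic by itself

def run_workers(n_threads=4, times=10000):
    """Reset the counter, run n_threads workers to completion, return the total."""
    global counter
    counter = 0
    workers = [threading.Thread(target=add_many, args=(times,))
               for _ in range(n_threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return counter

total = run_workers()
```

With the lock in place, the final total does not depend on how the interpreter happens to interleave the threads.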

Python offers multithreading in two different flavors. An older and lower-level module, thread, offers a bare minimum of functionality, and is not recommended for direct use by your code. The higher-level module threading, built on top of thread, was loosely inspired by Java's threads, and is the recommended tool. The key design issue in multithreading systems is most often how best to coordinate multiple threads. threading therefore supplies several synchronization objects. Module Queue is very useful for thread synchronization as it supplies a synchronized FIFO queue type, which is extremely handy for communication and coordination between threads.


The only part of the thread module that your code should use directly is the lock objects that module thread supplies. Locks are simple thread-synchronization primitives. Technically, thread's locks are non-reentrant and unowned: they do not keep track of what thread last locked them, so there is no specific owner thread for a lock. A lock is in one of two states, locked or unlocked.
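As an illustration of what non-reentrant and unowned mean in practice, here is a small sketch. It assumes Python 3 and uses threading.Lock as the lock factory; the helper function name is hypothetical:

```python
import threading

lock = threading.Lock()          # a new lock starts out unlocked

first = lock.acquire(False)      # non-blocking attempt on a free lock: succeeds
second = lock.acquire(False)     # same thread tries again: fails rather than re-entering

def release_from_other_thread():
    # Allowed, because the lock does not record an owner thread
    lock.release()

helper = threading.Thread(target=release_from_other_thread)
helper.start()
helper.join()

still_locked = lock.locked()     # the helper thread unlocked it
```

Had the second call been a blocking acquire, the thread would have suspended until some other thread released the lock.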

To get a new lock object (in the unlocked state), call the function named allocate_lock without arguments. This function is supplied by both modules thread and threading. A lock object L supplies three methods.

acquire

L.acquire(wait=True)

When wait is True, acquire locks L. If L is already locked, the calling thread suspends and waits until L is unlocked, then locks L. Even if the calling thread was the one that last locked L, it still suspends and waits until another thread releases L. When wait is False and L is unlocked, acquire locks L and returns True. When wait is False and L is locked, acquire does not affect L, and returns False.

locked

L.locked( )

Returns True if L is locked, otherwise False.

release

L.release( )

Unlocks L, which must be locked. When L is locked, any thread may call L.release, not just the thread that last locked L. When more than one thread is waiting on


The Queue module supplies first-in, first-out (FIFO) queues that support multithread access, with one main class and two exception classes.

Queue

class Queue(maxsize=0)

Queue is the main class for module Queue and is covered in the next section. When maxsize is greater than 0, the new Queue instance q is deemed full when q has maxsize items. A thread inserting an item with the block option, when q is full, suspends until another thread extracts an item. When maxsize is less than or equal to 0, q is never considered full, and is limited in size only by available memory, like normal Python containers.

Empty

Empty is the class of the exception that q.get(False) raises when q is empty.

Full

Full is the class of the exception that q.put(x,False) raises when q is full.
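The interplay of maxsize, Full, and Empty can be sketched as follows. This example assumes Python 3, where module Queue is spelled queue; the item values are arbitrary:

```python
import queue   # Python 3 name of module Queue

q = queue.Queue(maxsize=2)       # q is deemed full once it holds two items
q.put("a", False)
q.put("b", False)

try:
    q.put("c", False)            # non-blocking put on a full queue
    overflowed = False
except queue.Full:
    overflowed = True

drained = [q.get(False), q.get(False)]   # items come back in FIFO order

try:
    q.get(False)                 # non-blocking get on an empty queue
    underflowed = False
except queue.Empty:
    underflowed = True
```

With the default blocking behavior, the third put and the third get would instead suspend the calling thread until another thread made room or supplied an item.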

An instance q of class Queue supplies the following methods.

empty

q.empty( )

Returns True if q is empty, otherwise False.


The threading module is built on top of module thread and supplies multithreading functionality in a more usable form. The general approach of threading is similar to that of Java, but locks and conditions are modeled as separate objects (in Java, such functionality is part of every object), and threads cannot be directly controlled from the outside (meaning there are no priorities, groups, destruction, or stopping). All methods of objects supplied by threading are atomic.

threading provides numerous classes for dealing with threads, including Thread, Condition, Event, RLock, and Semaphore. Besides factory functions for the classes detailed in the following sections of this chapter, threading supplies the currentThread factory function.

currentThread

currentThread( )

Returns a Thread object for the calling thread. If the calling thread was not created by module threading, currentThread creates and returns a semi-dummy Thread object with limited functionality.
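A brief sketch of this function in use follows. It assumes Python 3, where currentThread is spelled current_thread; the thread name and the seen dictionary are hypothetical:

```python
import threading

seen = {}

def report():
    # Inside the new thread, current_thread() returns that thread's Thread object
    seen["worker"] = threading.current_thread()

t = threading.Thread(target=report, name="worker-1")
t.start()
t.join()                              # wait until the thread's main function ends

caller = threading.current_thread()   # the Thread object of the calling thread
```

The object recorded inside the thread is the very Thread instance that started it, while the same call made outside returns a different object.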

A Thread object t models a thread. You can pass t's main function as an argument when you create t, or you can subclass Thread and override the run method (you may also override __init__, but should not override other methods). t is not ready to run when you create it: to make t ready (active), call t.start( ). Once t is active, it terminates when its main function ends, either normally or by propagating an exception. A Thread t can be a daemon, meaning that Python can terminate even if t is still active, while a normal (non-daemon) thread keeps Python alive until the thread terminates. Class


A threaded program should always arrange for a single thread to deal with any given object or subsystem that is external to the program (such as a file, a database, a GUI, or a network connection). Having multiple threads that deal with the same external object can often cause unpredictable problems.

Whenever your threaded program must deal with some external object, devote a thread to such dealings, using a Queue object from which the external-interfacing thread gets work requests that other threads post. The external-interfacing thread can return results by putting them on one or more other Queue objects. The following example shows how to package this architecture into a general, reusable class, assuming that each unit of work on the external subsystem can be represented by a callable object:

import threading, Queue

class ExternalInterfacing(threading.Thread):
    def __init__(self, externalCallable, **kwds):
        threading.Thread.__init__(self, **kwds)
        self.setDaemon(1)    # daemon thread: does not keep Python alive
        self.externalCallable = externalCallable
        self.workRequestQueue = Queue.Queue( )
        self.resultQueue = Queue.Queue( )
        self.start( )
    def request(self, *args, **kwds):
        "called by other threads as externalCallable would be"
        self.workRequestQueue.put((args,kwds))
        return self.resultQueue.get( )
    def run(self):
        # serve work requests one at a time, forever
        while 1:
            args, kwds = self.workRequestQueue.get( )
            self.resultQueue.put(self.externalCallable(*args, **kwds))

Once some ExternalInterfacing object ei is instantiated, all other threads may now call ei.request just like they would call someExternalCallable without such a mechanism (with or without arguments as appropriate). The advantage of the ExternalInterfacing mechanism is that all calls upon someExternalCallable are now serialized. This means they are performed by just one thread (the thread object bound to ei) in some defined sequential order, without overlap, race conditions (hard-to-debug errors that depend on which thread happens to get there first), or other anomalies that might otherwise result.
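To see the serialization at work, here is a usage sketch. It is a Python 3 port of the class (module queue, super( ), snake_case attribute names), and tag_with_thread is a hypothetical stand-in for a call on a real external subsystem:

```python
import queue
import threading

class ExternalInterfacing(threading.Thread):
    """Python 3 port of the class above: one daemon thread makes every call."""
    def __init__(self, external_callable, **kwds):
        super().__init__(daemon=True, **kwds)
        self.external_callable = external_callable
        self.work_request_queue = queue.Queue()
        self.result_queue = queue.Queue()
        self.start()

    def request(self, *args, **kwds):
        self.work_request_queue.put((args, kwds))
        return self.result_queue.get()

    def run(self):
        while True:
            args, kwds = self.work_request_queue.get()
            self.result_queue.put(self.external_callable(*args, **kwds))

def tag_with_thread(x):
    # Hypothetical external call: report which thread actually ran it
    return threading.current_thread().name, x

ei = ExternalInterfacing(tag_with_thread, name="external")
results = []

def client(x):
    results.append(ei.request(x))

clients = [threading.Thread(target=client, args=(i,)) for i in range(5)]
for c in clients:
    c.start()
for c in clients:
    c.join()
```

Every recorded result names the thread "external", whichever client thread posted the request. Because all results travel on one shared queue, this sketch checks only where the calls ran; it does not check that each client receives the result of its own request.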


The operating system supplies each process P with an environment, which is a set of environment variables whose names are identifiers (most often, by convention, uppercase identifiers) and whose contents are strings. For example, in Chapter 3, we covered environment variables that affect Python's operations. Operating system shells offer various ways to examine and modify the environment, by such means as shell commands and others mentioned in Chapter 3.

The environment of any process P is determined when P starts. After startup, only P itself can change P's environment. Nothing that P does affects the environment of P's parent process (the process that started P), nor those of child processes previously started from P and now running, nor of processes unrelated to P. Changes to P's environment affect only P itself: the environment is not a means of IPC. Child processes of P normally get a copy of P's environment as their starting environment: in this sense, changes to P's environment do affect child processes that P starts after such changes.

Module os supplies attribute environ, a mapping that represents the current process's environment. os.environ is initialized from the process environment when Python starts. Changes to os.environ update the current process's environment if the platform supports such updates. Keys and values in os.environ must be strings. On Windows, but not on Unix-like platforms, keys into os.environ are implicitly uppercased. For example, here's how to try to determine what shell or command processor you're running under:

import os
shell = os.environ.get('COMSPEC')
if shell is None: shell = os.environ.get('SHELL')
if shell is None: shell = 'an unknown command processor'
print 'Running under', shell

If a Python program changes its own environment (e.g., via os.environ['X']='Y'), this does not affect the environment of the shell or command processor that started the program. As in other cases, changes to a process's environment affect only the process itself, not others.
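The inheritance rules can be checked with a short experiment. This sketch assumes Python 3 and uses the standard subprocess module, which this chapter does not cover, to start child Python processes; the variable name NUTSHELL_DEMO is hypothetical:

```python
import os
import subprocess
import sys

os.environ["NUTSHELL_DEMO"] = "set-by-parent"

# A child started after the change receives a copy of the parent's environment
child_code = "import os; print(os.environ.get('NUTSHELL_DEMO'))"
seen_by_child = subprocess.check_output(
    [sys.executable, "-c", child_code], universal_newlines=True).strip()

# A change made inside a child stays inside that child
child_code = "import os; os.environ['NUTSHELL_DEMO'] = 'set-by-child'"
subprocess.check_call([sys.executable, "-c", child_code])
seen_by_parent = os.environ["NUTSHELL_DEMO"]
```

The first child sees the parent's value, and the parent's value is unchanged after the second child exits.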


The os module offers several ways for your program to run other programs. The simplest way to run another program is through function os.system, although this offers no way to control the external program. The os module also provides a number of functions whose names start with exec. These functions offer fine-grained control. A program run by one of the exec functions, however, replaces the current program (i.e., the Python interpreter) in the same process. In practice, therefore, you use the exec functions mostly on platforms that let a process duplicate itself by fork (i.e., Unix-like platforms). Finally, os functions whose names start with spawn and popen offer intermediate simplicity and power: they are cross-platform and not quite as simple as system, but simple and usable enough for most purposes.

The exec and spawn functions run a specified executable file given the executable file's path, arguments to pass to it, and optionally an environment mapping. The system and popen functions execute a command, a string passed to a new instance of the platform's default shell (typically /bin/sh on Unix, command.com or cmd.exe on Windows). A command is a more general concept than an executable file, as it can include shell functionality (pipes, redirection, built-in shell commands) using the normal shell syntax specific to the current platform.
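The difference between running a command with system and with popen can be sketched as follows. The example assumes Python 3 and a default shell that supplies an echo command, which is the case for both /bin/sh and cmd.exe:

```python
import os

# os.system passes the command string to the shell and returns an exit status;
# the command's output goes wherever the Python process's own output goes
status = os.system("echo hello from system")

# os.popen also runs the command through the shell, but returns a
# file-like object from which the program can read the command's output
pipe = os.popen("echo hello from popen")
output = pipe.read().strip()
pipe.close()
```

Since both functions hand the string to a shell, shell syntax such as pipes and redirection is available in the command, with the platform-specific spelling.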

execl, execle, execlp, execv, execve, execvp, execvpe

execl(path,*args)
execle(path,*args)
execlp(path,*args)
execv(path,args)
execve(path,args,env)
execvp(path,args)
execvpe(path,args,env)

These functions run the executable file (program) indicated by string


The mmap module supplies memory-mapped file objects. An mmap object behaves similarly to a plain (not Unicode) string, so you can often pass an mmap object where a plain string is expected. However, there are differences:

An mmap object does not supply the methods of a string object
An mmap object is mutable, while string objects are immutable
An mmap object also corresponds to an open file and behaves polymorphically to a Python file object (as covered in Chapter 10)

An mmap object m can be indexed or sliced, yielding plain strings. Since m is mutable, you can also assign to an indexing or slicing of m. However, when you assign to a slice of m, the right-hand side of the assignment statement must be a string of exactly the same length as the slice you're assigning to. Therefore, many of the useful tricks available with list slice assignment (covered in Chapter 4) do not apply to mmap slice assignment.
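The same-length rule for slice assignment can be sketched as follows. This example assumes Python 3, where slicing an mmap object yields bytes objects rather than plain strings, and it maps a temporary file created only for the demonstration:

```python
import mmap
import tempfile

with tempfile.TemporaryFile() as f:
    f.write(b"hello world")
    f.flush()
    m = mmap.mmap(f.fileno(), 0)    # length 0 maps the whole file
    head = m[0:5]                   # slicing yields a bytes object
    m[0:5] = b"HELLO"               # same length as the slice: allowed
    try:
        m[0:5] = b"HI"              # different length: rejected
        resized = True
    except (IndexError, ValueError):
        resized = False
    m.flush()                       # push changes through to the file
    m.close()
    f.seek(0)
    contents = f.read()
```

After the map is flushed and closed, reading the file back through the ordinary file object shows the in-place modification.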

Module mmap supplies a factory function that is different on Unix-like systems and Windows.

mmap

mmap(filedesc,length,tagname='') # Windows
mmap(filedesc,length,flags=MAP_SHARED, prot=PROT_READ|PROT_WRITE) # Unix

Creates and returns an mmap object m that maps into memory the first length bytes of the file indicated by file descriptor filedesc. filedesc must normally be a file descriptor opened for both reading and writing (except, on Unix-like platforms, when argument prot requests only reading or only writing). File descriptors are covered in Section 10.2.8. To get an mmap object m that refers to a Python file object


In Python, you can perform numeric computations with operators (as covered in Chapter 4) and built-in functions (as covered in Chapter 8). Python also provides the math, cmath, operator, and random modules, which support additional numeric computation functionality, as documented in this chapter.

You can represent arrays in Python with lists and tuples (covered in Chapter 4), as well as with the array standard library module, which is covered in this chapter. You can also build advanced array manipulation functions with loops, list comprehensions, iterators, generators, and built-ins such as map, reduce, and filter, but such functions can be complicated and slow. Therefore, when you process large arrays of numbers in these ways, your program's performance can be below your machine's full potential.

The Numeric package addresses these issues, providing high-performance support for multidimensional arrays (matrices) and advanced mathematical operations, such as linear algebra and Fourier transforms. Numeric does not come with standard Python distributions, but you can freely download it at https://sourceforge.net/projects/numpy, either as source code (which is easy to build and install on many platforms) or as a prebuilt self-installing .exe file for Windows. Visit https://www.pfdubois.com/numpy/ for an extensive tutorial and other resources, such as a mailing list about Numeric. Note that the Numeric package is not just for numeric processing. Much of Numeric is about multidimensional arrays and advanced array handling that you can use for any Python sequence.

Numeric is a large, rich package. For full understanding, study the tutorial, work through the examples, and experiment interactively. This chapter presents a reference to an essential subset of Numeric on the assumption that you already have some grasp of array manipulation and numeric computing issues. If you are unfamiliar with this subject, the Numeric tutorial can help.


The math module supplies mathematical functions on floating-point numbers, while the cmath module supplies equivalent functions on complex numbers. For example, math.sqrt(-1) raises an exception, but cmath.sqrt(-1) returns 1j.

Each module also exposes two attributes of type float bound to the values of fundamental mathematical constants, pi and e.
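A quick check of the difference between the two modules, as a minimal sketch assuming Python 3, where the exception that math.sqrt raises for a negative argument is ValueError:

```python
import cmath
import math

try:
    math.sqrt(-1)            # no real square root exists
    real_ok = True
except ValueError:
    real_ok = False

root = cmath.sqrt(-1)        # the complex result, 1j
same_pi = (math.pi == cmath.pi)
```

Both modules bind pi and e to the same float values, so either spelling can be used in floating-point code.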

acos

math and cmath

acos(x)

Returns the arccosine of x in radians.

acosh

cmath only

acosh(x)

Returns the arc hyperbolic cosine of x in radians.

asin

math and cmath

asin(x)

Returns the arcsine of x in radians.

asinh

cmath only

asinh(x)

Returns the arc hyperbolic sine of x in radians.

atan

math and cmath

atan(x)


The operator module supplies functions that are equivalent to Python's operators. These functions are handy for use with map and reduce, and in other cases where callables must be stored, passed as arguments, or returned as function results. The functions in operator have the same names as the corresponding special methods (covered in Chapter 5). Each function is available with two names, with and without the leading and trailing double underscores (e.g., both operator.add(a,b) and operator.__add__(a,b) return a + b). Table 15-1 lists the functions supplied by operator.

Table 15-1: Functions supplied by operator
Method	Signature	Behaves like
abs	abs(a)	abs(a)
add	add(a,b)	a + b
and_	and_(a,b)	a & b
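The functions listed so far can be exercised as in the following sketch. It assumes Python 3, where reduce must be imported from functools and map returns an iterator, hence the call to list:

```python
import operator
from functools import reduce   # reduce is a built-in in the Python this chapter describes

total = reduce(operator.add, [1, 2, 3, 4])                  # ((1 + 2) + 3) + 4
products = list(map(operator.mul, [1, 2, 3], [4, 5, 6]))    # item-by-item products
same = operator.__add__(2, 3) == operator.add(2, 3)         # two names, one function
masked = operator.and_(12, 10)                              # 12 & 10
```

Passing operator.add or operator.mul avoids writing a small wrapper function for each operator.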


The random module generates pseudo-random numbers with various distributions. The underlying uniform pseudo-random generator uses the Wichmann-Hill algorithm, with a period of length 6,953,607,871,644. The resulting pseudo-random numbers, while quite good, are not of cryptographic quality. If you want physically generated random numbers rather than algorithmically generated pseudo-random numbers, you may use /dev/random or /dev/urandom on platforms that support such pseudo-devices (such as recent Linux releases). For an alternative, see https://www.fourmilab.ch/hotbits.

All functions of module random are methods of a hidden instance of class random.Random. You can instantiate Random explicitly to get multiple generators that do not share state. Explicit instantiation is advisable if you require random numbers in multiple threads (threads are covered in Chapter 14). This section documents the most frequently used functions exposed by module random.
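Explicit instantiation can be sketched as follows, assuming Python 3 (whose underlying generator differs from the one described above, although the Random interface used here is the same). The seed value and the color names are arbitrary:

```python
import random

gen_a = random.Random(42)    # two generators with the same seed...
gen_b = random.Random(42)
seq_a = [gen_a.random() for _ in range(3)]
seq_b = [gen_b.random() for _ in range(3)]   # ...yield the same sequence independently

saved = gen_a.getstate()     # snapshot gen_a's state
first = gen_a.random()
gen_a.setstate(saved)        # rewind to the snapshot
replayed = gen_a.random()    # the same number comes out again

pick = gen_a.choice(["red", "green", "blue"])
```

Drawing from gen_a does not disturb gen_b, which is why giving each thread its own Random instance avoids shared state.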

choice

choice(seq)

Returns a random item from non-empty sequence seq.

getstate

getstate( )

Returns an object S that represents the current state of the generator. You can later pass S to function setstate in order to restore the generator's state.

jumpahead

jumpahead(n)

Advances the generator state as if n random numbers had been generated. Computing the new state is faster than generating


The array module supplies a type, also called array, whose instances are mutable sequences, like lists. An array a is a one-dimensional sequence whose items can be only characters, or only numbers of one specific numeric type that is fixed when a is created.

The extension module Numeric, covered later in this chapter, also supplies a type called array that is far more powerful than array.array. For advanced array operations and multidimensional arrays, I recommend Numeric even if your array elements are not numbers.

array.array is a simple type, whose main advantage is that, compared to a list, it can save memory to hold objects all of the same (numeric or character) type. An array object a has a one-character read-only attribute a.typecode, set when a is created, that gives the type of a's items. Table 15-2 shows the possible type codes for array.

Table 15-2: Type codes for the array module
Type code	C type	Python type	Minimum size
'c'	char	`str` (length 1)	1 byte
'b'	char	int	1 byte
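As a brief illustration of a fixed item type, the following sketch assumes Python 3, where type code 'c' is no longer available, and therefore uses 'b':

```python
from array import array

a = array('b', [1, 2, 3])    # signed one-byte integers
a.append(4)

try:
    a.append(1.5)            # a float does not fit the array's item type
    accepted = True
except TypeError:
    accepted = False

code = a.typecode            # set at creation
size = a.itemsize            # bytes per item
items = a.tolist()
```

The array stores its items compactly at one byte each, whereas a list holds a separate Python object per item.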


The main module in the Numeric package is the Numeric module, which provides the array object type, a set of functions that manipulate these objects, and universal functions that operate on arrays and other sequences. The Numeric package also supports a variety of optional modules for things like linear algebra, random numbers, masked arrays, and Fast Fourier Transforms.

Numeric is one of the rare Python packages often used with the idiom from Numeric import *. You can also use import Numeric and qualify each name by preceding it with Numeric. However, if you need many of the package's names, importing all the names at once is handy. Another popular alternative is to import Numeric with a shorter name (e.g., import Numeric as N) and qualify each name by preceding it with N.

Although quite solid and stable, Numeric is under continuous development, with functionality being added and limitations removed. This chapter describes specifically Numeric Version 21.3, the latest released version at the time of this writing. A successor to Numeric, named numarray, is being developed by the Numeric community, and is not quite ready for production use yet. numarray is not totally compatible with Numeric, but shares most of Numeric's functionality and enriches it further. Information on numarray is available at https://stsdas.stsci.edu/numarray/.


Numeric provides an array type that represents a grid of items. An array object a has a specified number of dimensions, known as its rank, up to some arbitrarily high limit (normally 40, when Numeric is built with default options). A scalar (i.e., a single number) has rank 0, a vector has rank 1, a matrix has rank 2, and so forth.

The values that occupy cells in the grid of an array object, known as the elements of the array, are homogeneous, meaning they are all of the same type, and all element values are stored within one memory area. This contrasts with a list or tuple, where the items may be of different types and each is stored as a separate Python object. This means a Numeric array occupies far less memory than a Python list or tuple with the same number of items. The type of a's elements is encoded as a's type code, a one-character string, as shown in Table 15-3. Factory functions that build array instances, covered in Section 15.6.6 later in this chapter, take a typecode argument that is one of the values in Table 15-3.

Table 15-3: Type codes for Numeric arrays
Type code	C type	Python type	Synonym
'c'	char	`str` (length 1)	Character


Numeric supplies named functions with the same semantics as Python's arithmetic, comparison, and bitwise operators. Similar semantics (element-wise operation, broadcasting, coercion) are also available with other mathematical functions, both binary and unary, that Numeric supplies. For example, Numeric supplies typical mathematical functions similar to those supplied by built-in module math, such as sin, cos, log, and exp.

These functions are objects of type ufunc (which stands for universal function) and share several traits in addition to those they have in common with array operators. Every ufunc instance u is callable, is applicable to sequences as well as to arrays, and lets you specify an optional output argument. If u is binary (i.e., if u accepts two operand arguments), u also has four callable attributes, named u.accumulate, u.outer, u.reduce, and u.reduceat. The ufunc objects supplied by Numeric apply only to arrays with numeric type codes (i.e., not to arrays with type code 'O' or 'c').
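For readers who want a feel for what three of these attributes compute, here is a standard-library analogy on plain lists. It does not use Numeric at all and is not Numeric's implementation; it assumes Python 3, and the variable names are hypothetical:

```python
import operator
from functools import reduce
from itertools import accumulate

data = [1, 2, 3, 4]

# Analogous to add.reduce(data): fold the operation over all items
reduced = reduce(operator.add, data)

# Analogous to add.accumulate(data): keep every intermediate result
running = list(accumulate(data, operator.add))

# Analogous to multiply.outer(data, data): apply to every pair of items
outer = [[x * y for y in data] for x in data]
```

A real ufunc performs the equivalent loops in compiled code and returns arrays rather than lists.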

Any ufunc u applies to sequences, not just to arrays. When you start with a list L, it's faster to call u directly on L rather than to convert L to an array. u's return value is an array a; you can perform further computation, if any, on a, and then, if you need a list result, you can convert the resulting array to a list by calling its method tolist. For example, say you must compute the logarithm of each item of a list and return another list. On my system, with N set to 2222 and using python -O, a list comprehension such as:

import math

def logsupto(N):
    return [math.log(x) for x in range(2,N)]

takes about 5.6 milliseconds. Using Python's built-in map:

def logsupto(N):
    return map(math.log, range(2,N))

takes around half the time, 2.8 milliseconds. Using Numeric's ufunc named log:

import Numeric

def logsupto(N):
    return Numeric.log(range(2,N)).tolist( )


Many other modules are built on top of Numeric or cooperate with it. You can download some of them from the same URL as Numeric (https://sourceforge.net/projects/numpy). Some of these extra modules may already be included in the package you have downloaded. Documentation for the modules is also part of the documentation for Numeric. A rich library of scientific tools that work well with Numeric is SciPy, available at https://www.scipy.org. I highly recommend it if you are using Python for scientific or engineering computing.

Here are some key optional Numeric modules:

MLab: MLab supplies many Python functions written on top of Numeric. MLab's functions are similar in name and operation to functions supplied by the product Matlab.
FFT: FFT supplies Python-callable Fast Fourier Transforms (FFTs) of data held in Numeric arrays. FFT can wrap either the well-known FFTPACK Fortran-coded library or the compatible C-coded fftpack library.
LinearAlgebra: LinearAlgebra supplies Python-callable functions, operating on data held in Numeric arrays, that wrap either the well-known LAPACK Fortran-coded library or the compatible C-coded lapack_lite library. LinearAlgebra lets you invert matrices, solve linear systems, compute eigenvalues and eigenvectors, perform singular value decomposition, and least-squares-solve overdetermined linear systems.


Most professional applications interact with users through a graphical user interface (GUI). A GUI is normally programmed through a toolkit, which is a library that implements controls (also known as widgets) that are visible objects such as buttons, labels, text entry fields, and menus. A GUI toolkit lets you compose controls into a coherent whole, display them on-screen, and interact with the user, receiving input via such devices as the keyboard and mouse.

Python gives you a choice among many GUI toolkits. Some are platform-specific, but most are cross-platform to different degrees, supporting at least Windows and Unix-like platforms, and often the Macintosh as well. Check https://phaseit.net/claird/comp.lang.python/python_GUI.html for a list of dozens of GUI toolkits available for Python. One package, anygui (https://anygui.org), lets you program simple GUIs to one common programming interface and deploy them with any of a variety of backends.

The most widespread Python GUI toolkit is Tkinter. Tkinter is an object-oriented Python wrapper around the cross-platform toolkit Tk, which is also used with other scripting languages such as Tcl (for which it was originally developed) and Perl. Tkinter, like the underlying Tcl/Tk, runs on Windows, Macintosh, and Unix-like platforms. Tkinter itself comes with standard Python distributions. On Windows, the standard Python distribution also includes the Tcl/Tk components needed to run Tkinter. On other platforms, you must obtain and install Tcl/Tk separately.

This chapter covers an essential subset of Tkinter, sufficient to build simple graphical frontends for Python applications. A richer introduction is available at https://www.pythonware.com/library/tkinter/introduction/.


The Tkinter module makes it easy to build simple GUI applications. You simply import Tkinter, create, configure, and position the widgets you want, and then enter the Tkinter main loop. Your application becomes event-driven, which means that the user interacts with the widgets, causing events, and your application responds via the functions you installed as handlers for these events.

The following example shows a simple application that exhibits this general structure:

import sys, Tkinter
Tkinter.Label(text="Welcome!").pack( )
Tkinter.Button(text="Exit", command=sys.exit).pack( )
Tkinter.mainloop( )

The calls to Label and Button create the respective widgets and return them as results. Since we specify no parent windows, Tkinter puts the widgets directly in the application's main window. The named arguments specify each widget's configuration. In this simple case, we don't need to bind variables to the widgets. We just call the pack method on each widget, handing control of the widget's geometry to a layout manager object known as the packer. A layout manager is an invisible component whose job is to position widgets within other widgets (known as container or parent widgets), handling geometrical layout issues. The previous example passes no arguments to control the packer's operation, so the packer operates in a default way.

When the user clicks on the button, the command callable of the Button widget executes without arguments. The example passes function sys.exit as the argument named command when it creates the Button. Therefore, when the user clicks on the button, sys.exit( ) executes and terminates the application (as covered in Chapter 8).

After creating and packing the widgets, the example calls Tkinter's mainloop function, and thus enters the Tkinter main loop and becomes event-driven. Since the only event for which the example installs a handler is a click on the button, nothing happens from the application's viewpoint until the user clicks the button. Meanwhile, however, the

The Tkinter module supplies many kinds of widgets, and most of them have several things in common. All widgets are instances of classes that inherit from class Widget. Class Widget itself is abstract; that is, you never instantiate Widget itself. You only instantiate concrete subclasses corresponding to specific kinds of widgets. Class Widget's functionality is common to all the widgets you instantiate.

To instantiate any kind of widget, call the widget's class. The first argument is the parent window of the widget, also known as the widget's master. If you omit this positional argument, the widget's master is the application's main window. All other arguments are in named form, option=value. You can also set or change options on an existing widget w by calling w.config(option=value). You can get an option of w by calling w.cget('option'), which returns the option's value. Each widget w is a mapping, so you can also get an option as w['option'] and set or change it with w['option']=value.
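The option-access calls just described can be combined in a small helper. The following is only a sketch: the names snapshot and restyle are hypothetical and not part of Tkinter, and the code relies on nothing but the cget and config methods, so it should work with any widget w.

```python
def snapshot(w, *names):
    """Return a dict with the current values of the named options of w."""
    return dict((name, w.cget(name)) for name in names)


def restyle(w, **options):
    """Set options on w with config; return the values they had before."""
    previous = snapshot(w, *options)
    w.config(**options)
    return previous
```

For example, old = restyle(label, text='Busy') changes the text of a label, and restyle(label, **old) later puts the previous text back.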

Many widgets accept some common options. Some options affect a widget's colors, others affect lengths (normally in pixels), and there are various other kinds. This section details the most commonly used options.

Section 16.2.1.1: Color options

Tkinter represents colors with strings. The string can be a color name, such as 'red' or 'orange', or it may be of the form '#RRGGBB', where each of R, G, and B is a hexadecimal digit, to represent a color by the values of its red, green, and blue components on a scale of 0 to 255. Don't worry if your screen can't display the millions of different colors that this scheme implies: Tkinter maps any requested color to the closest color that your screen can display. The common color options are:

The Tkinter module provides a number of simple widgets that cover most needs of basic GUI applications. This section documents the Button, Checkbutton, Entry, Label, Listbox, Radiobutton, Scale, and Scrollbar widgets.

Class Button implements a pushbutton, which the user clicks to execute an action. Instantiate Button with option text=somestring to let the button show text, or image=imageobject to let the button show an image. You normally use option command=callable to have callable execute without arguments when the user clicks the button. callable can be a function, a bound method of an object, an instance of a class with a __call__ method, or a lambda.
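The four kinds of callable can be compared side by side. The sketch below is illustrative only: every name in it (log, Recorder, commands, build_buttons) is hypothetical, and the import falls back from the Python 3 module name tkinter to the Tkinter name used in this chapter. build_buttons needs a display, so nothing calls it automatically.

```python
log = []  # records which callables have run


def plain_function():
    log.append("function")


class Recorder(object):
    def method(self):
        log.append("bound method")

    def __call__(self):
        log.append("callable instance")


recorder = Recorder()

# Each callable takes no arguments, so each is usable as command=...
commands = [
    ("Function", plain_function),
    ("Method", recorder.method),
    ("Instance", recorder),
    ("Lambda", lambda: log.append("lambda")),
]


def build_buttons(master=None):
    """Create and pack one Button per entry in commands (needs a display)."""
    try:
        import tkinter              # Python 3 module name
    except ImportError:
        import Tkinter as tkinter   # Python 2 module name, as in this chapter
    buttons = []
    for label, action in commands:
        button = tkinter.Button(master, text=label, command=action)
        button.pack()
        buttons.append(button)
    return buttons
```

Clicking a button has the same effect as calling the corresponding object directly with no arguments.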

Besides methods common to all widgets, an instance b of class Button supplies two button-specific methods.

flash

b.flash( )

Draws the user's attention to button b by redrawing b a few times, alternately in normal and active states.

invoke

b.invoke( )

Calls without arguments the callable object that is b's command option, just like b.cget('command')( ). This can be handy when, within some other action, you want the program to act just as if the button had been clicked.

The Tkinter module supplies widgets whose purpose is to contain other widgets. A Frame instance does nothing more than act as a container. A Toplevel instance (including Tkinter's root window, also known as the application's main window) is a top-level window, so your window manager interacts with it (typically by supplying suitable decoration and handling certain requests). To ensure that a widget parent, which must be a Frame or Toplevel instance, is the parent (also known as master) of another widget child, pass parent as the first parameter when you instantiate child.

Class Frame represents a rectangular area of the screen contained in other frames or top-level windows. Frame's only purpose is to contain other widgets. Option borderwidth defaults to 0, so an instance of Frame normally displays no border. You can configure the option with borderwidth=1 if you want the frame border's outline to be visible.

Class Toplevel represents a rectangular area of the screen that is a top-level window and therefore receives decoration from whatever window manager handles your screen. Each instance of Toplevel can interact with the window manager and can contain other widgets. Every program using Tkinter has at least one top-level window, known as the root window. You can instantiate Tkinter's root window explicitly using root = Tkinter.Tk( ); otherwise Tkinter instantiates its root window implicitly as and when first needed. If you want to have more than one top-level window, first instantiate the main one with root = Tkinter.Tk( ). Later in your program, you can instantiate other top-level windows as needed, with calls such as another_toplevel = Tkinter.Toplevel( ).

Class Menu implements all kinds of menus: menubars of top-level windows, submenus, and pop-up menus. To use a Menu instance m as the menubar for a top-level window w, set w's configuration option menu=m. To use m as a submenu of a Menu instance x, call x.add_cascade with a named argument menu=m. To use m as a pop-up menu, call method m.post.

Besides configuration options covered in Section 16.2.1 earlier in this chapter, a Menu instance m supports option postcommand=callable. Tkinter calls callable without arguments each time it is about to display m (whether because of a call to m.post or because of user actions). You can use this option to update a dynamic menu just in time when necessary.

By default, a Tkinter menu shows a tear-off entry (a dashed line before other entries), which lets the user get a copy of the menu in a separate Toplevel window. Since such tear-offs are not part of user interface standards on popular platforms, you may want to disable tear-off functionality by using configuration option tearoff=0 for the menu.

Besides methods common to all widgets, an instance m of class Menu supplies several menu-specific methods.

add, add_cascade, add_checkbutton, add_command, add_radiobutton, add_separator

m.add(entry_kind, **entry_options)

Adds after m's existing entries a new entry whose kind is the string entry_kind, which is one of the strings 'cascade', 'checkbutton', 'command

Class Text implements a powerful multiline text editor, able to display images and embedded widgets as well as text in one or more fonts and colors. An instance t of Text supports many ways to refer to specific points in t's contents. t supplies methods and configuration options allowing fine-grained control of operations, content, and rendering. This section covers a large, frequently used subset of this vast functionality. In some very simple cases, you can get by with just three Text-specific idioms:

t.delete('1.0', END)             # clear the widget's contents
t.insert(END, astring)           # append astring to the widget's contents
somestring = t.get('1.0', END)   # get the widget's contents as a string

END is an index on any Text instance t, indicating the end of t's text. '1.0' is also an index, indicating the start of t's text (first line, first column). For more about indices, see Section 16.6.5 later in this chapter.
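Because an index such as '1.0' is just a string of the form 'line.column', it can be built from numbers. The following sketch wraps the three idioms above in a helper; text_index and replace_contents are hypothetical names, and the string 'end' is used directly because it is the value of Tkinter's END constant.

```python
END = "end"  # same value as Tkinter's END constant


def text_index(line, column):
    """Build a 'line.column' index; lines count from 1, columns from 0."""
    return "%d.%d" % (line, column)


def replace_contents(t, astring):
    """Clear Text widget t, insert astring, and return t's new contents."""
    start = text_index(1, 0)      # '1.0', the start of the text
    t.delete(start, END)
    t.insert(END, astring)
    return t.get(start, END)
```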

An instance t of class Text supplies many methods. Methods dealing with marks and tags are covered in later sections. Many methods accept one or two indices into t's contents. The most frequently used methods are the following.

delete

t.delete(i[,j])

t.delete(i) removes t's character at index i. t.delete(i,j) removes all characters from index i up to, but not including, index j.

get

Class Canvas is a powerful, flexible widget used for many purposes, including plotting and, in particular, building custom widgets. Building custom widgets is an advanced topic, and I do not cover it further in this book. This section covers only a subset of Canvas functionality used for the simplest kind of plotting.

Coordinates within a Canvas instance c are in pixels, with the origin at the upper left corner of c and positive coordinates growing rightward and downward. There are advanced methods that let you change c's coordinate system, but I do not cover them in this book.

What you draw on a Canvas instance c are canvas items, which can be lines, polygons, Tkinter images, arcs, ovals, texts, and others. Each item has an item handle by which you can refer to the item. You can also assign symbolic names called tags to sets of canvas items (the sets of items with different tags can overlap). ALL is a predefined tag that applies to all items; CURRENT is a predefined tag that applies to the item under the mouse pointer.

Tags on a Canvas instance are different from tags on a Text instance. The canvas tags are nothing more than sets of items with no independent existence. When you perform any operation, passing a Canvas tag as the item identifier, the operation occurs on those items that are in the tag's current set. It makes no difference if items are later removed from or added to that tag's set.

You create a canvas item by calling on c a method with a name of the form create_kindofitem, which returns the new item's handle. Methods itemcget and itemconfig of c let you get and change items' options.

A Canvas instance c supplies methods that you can call on items. The item argument can be an item's handle, as returned for example by c.create_line, or a tag, meaning all items in that tag's set (or no items at all, if the tag's set is currently empty), unless otherwise indicated in the method's description.
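Since canvas coordinates grow downward from the upper left corner, data that assumes a bottom-left origin must have its y values flipped before plotting. The following is a minimal sketch of that conversion; to_canvas and plot are hypothetical helper names, height is assumed to be the canvas height in pixels, and the only Canvas method used is create_line.

```python
def to_canvas(x, y, height):
    """Map a point with a bottom-left origin to canvas pixel coordinates."""
    return x, height - y


def plot(c, points, height):
    """Draw one line item through points on canvas c; return its handle."""
    coords = []
    for x, y in points:
        coords.extend(to_canvas(x, y, height))
    # create_line accepts a flat sequence x0, y0, x1, y1, ...
    return c.create_line(*coords)
```

The handle that plot returns can then be passed as the item argument to other methods of c.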

In all the examples so far, we have made each widget visible by calling method pack on the widget. This is representative of real-life Tkinter usage. However, two other layout managers exist and are sometimes useful. This section covers all three layout managers provided by the Tkinter module.

Never mix geometry managers for the same container widget: all children of each given container widget must be handled by the same geometry manager, or very strange effects (including Tkinter going into infinite loops) may result.

Calling method pack on a widget delegates widget geometry management to a simple and flexible layout manager component called the Packer. The Packer sizes and positions each widget within a container (parent) widget, according to each widget's space needs (including options padx and pady). Each widget w supplies the following Packer-related methods.

pack

w.pack(**pack_options)

Delegates geometry management to the packer. pack_options may include:

expand: When true, w expands to fill any space not otherwise used in w's parent.
fill: Determines whether w fills any extra space allocated to it by the packer, or keeps its own minimal dimensions: NONE (default), X (fill only horizontally), Y (fill only vertically), or BOTH (fill both horizontally and vertically).

So far, we've seen only the most elementary kind of event handling: the callbacks performed on callables installed with the command= option of buttons and menu entries of various kinds. Tkinter also lets you install callables to call back when needed to handle a variety of events. However, Tkinter does not let you create your own custom events; you are limited to working with events predefined by Tkinter itself.

General event callbacks must accept one argument event that is a Tkinter event object. Such an event object has several attributes describing the event:

char: A single-character string that is the key's code (only for keyboard events)
keysym: A string that is the key's symbolic name (only for keyboard events)
num: Button number (only for mouse-button events); 1 and up
x, y: Mouse position, in pixels, relative to the upper left corner of the widget
x_root , y_root
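A handler that uses some of these attributes might look like the following sketch. The names describe_event and install are hypothetical. The widget method bind and the event name '<Key>' belong to Tkinter but are not described in the text shown here, so treat their use as an assumption.

```python
def describe_event(event):
    """Summarize a keyboard event from its char, keysym, x, and y attributes."""
    return "key %r (%s) at x=%d, y=%d" % (
        event.char, event.keysym, event.x, event.y)


def install(widget, log):
    """Bind key presses on widget so that each summary is appended to log."""
    widget.bind("<Key>", lambda event: log.append(describe_event(event)))
```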

You're not finished with a programming task when you're done writing the code: you're finished when your code is running correctly and with acceptable performance. Testing means verifying that your code is running correctly by exercising the code under known conditions and checking that the results are as expected. Debugging means discovering the causes of incorrect behavior and removing them (the removal is often easy once you have figured out the causes).

Optimizing is often used as an umbrella term for activities meant to ensure acceptable performance. Optimizing breaks down into benchmarking (measuring performance for given tasks and checking that it's within acceptable bounds), profiling (instrumenting the program to find out what parts are performance bottlenecks), and optimizing proper (removing bottlenecks to make overall program performance acceptable). Clearly, you can't remove performance bottlenecks until you've found out where they are (using profiling), which in turn requires knowing that there are performance problems (using benchmarking).

All of these tasks are large and important, and each could fill a book by itself. This chapter does not explore every related technique and implication; it focuses on Python-specific techniques, approaches, and tools.

In this chapter, I distinguish between two rather different kinds of testing: unit testing and system testing. Testing is a rich and important field, and even more distinctions could be drawn, but my goal is to focus on the issues of most immediate importance to software developers.

Unit testing means writing and running tests to exercise a single module or an even smaller unit, such as a class or function. System testing (also known as functional testing) involves running an entire program with known inputs. Some classic books on testing draw the distinction between white-box testing, done with knowledge of a program's internals, and black-box testing, done from the outside. This classic viewpoint parallels the modern one of unit versus system testing.

Unit and system testing serve different goals. Unit testing proceeds apace with development; you can and should test each unit as you're developing it. Indeed, one modern approach is known as test-first coding: for each feature that your program must have, you first write unit tests, and only then do you proceed to write code that implements the feature. Test-first coding may seem a strange approach, but it has several advantages. For example, it ensures that you won't omit unit tests for some feature. Further, test-first coding is helpful because it urges you to focus first on what tasks a certain function, class, or method should accomplish, and to deal only afterwards with implementing that function, class, or method. In order to test a unit, which may depend on other units not yet fully developed, you often have to write stubs, which are fake implementations of various units' interfaces that give known and correct responses in cases needed to test other units.
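As an illustration of a unit test that relies on a stub, consider the sketch below, which uses the standard unittest module. Everything in it is hypothetical: total_price is the unit under test, and StubTaxService stands in for a tax-computing unit that is assumed not to exist yet.

```python
import unittest


def total_price(items, tax_service):
    """Unit under test: sum the item prices, then add tax from tax_service."""
    subtotal = sum(items)
    return subtotal + tax_service.tax_for(subtotal)


class StubTaxService(object):
    """Stub for a unit not yet developed: always gives a known response."""
    def tax_for(self, amount):
        return 10


class TotalPriceTest(unittest.TestCase):
    def test_adds_tax_to_subtotal(self):
        self.assertEqual(total_price([40, 50], StubTaxService()), 100)

    def test_empty_order(self):
        self.assertEqual(total_price([], StubTaxService()), 10)


def run_tests():
    """Run the test case and return the unittest result object."""
    suite = unittest.TestLoader().loadTestsFromTestCase(TotalPriceTest)
    return unittest.TextTestRunner().run(suite)
```

Because the stub's response is fixed, a failure in these tests points at total_price itself rather than at the tax computation.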

System testing comes afterwards, since it requires the system to exist with some subset of system functionality believed to be in working condition. System testing provides a sanity check: given that each module in the program works properly (passes unit tests), does the whole program work? If each unit is okay but the system as a whole is not, there is a problem with integration between units. For this reason, system testing is also known as integration testing.

Since Python's development cycle is so fast, the most effective way to debug is often to edit your code to make it output relevant information at key points. Python has many ways to let your code explore its own state in order to extract information that may be relevant for debugging. The inspect and traceback modules specifically support such exploration, which is also known as reflection or introspection.
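A small sketch of such introspection follows. The helper names where_am_i and local_names are hypothetical, and compute exists only to have something to inspect. The code uses traceback.extract_stack and inspect.currentframe; note that currentframe may not be supported on every Python implementation.

```python
import inspect
import traceback


def where_am_i():
    """Return 'function:line' for the caller, using module traceback."""
    # The last stack entry is this function; the one before it is the caller.
    filename, lineno, funcname, text = traceback.extract_stack()[-2]
    return "%s:%d" % (funcname, lineno)


def local_names():
    """Return the sorted names of the caller's local variables."""
    caller = inspect.currentframe().f_back
    return sorted(caller.f_locals)


def compute(a, b):
    total = a + b
    # Report where we are and which local names exist at this point.
    return where_am_i(), local_names()
```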

Once you have obtained debugging-relevant information, statement print is often the simplest way to display it. You can also log debugging information to files. Logging is particularly useful for programs that run unattended for a long time, as is typically the case for server programs. Displaying debugging information works like displaying any other kind of information (covered in Chapter 10 and Chapter 16), and the same holds for logging it (covered in Chapter 10 and Chapter 11). Python 2.3 will also include a module specifically dedicated to logging. As covered in Chapter 8, rebinding attribute excepthook of module sys lets your program log detailed error information just before your program is terminated by a propagating exception.
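Rebinding sys.excepthook might be sketched as follows. The function names are hypothetical, and stream is assumed to be any object with a write method, such as an open log file.

```python
import sys
import traceback


def make_logging_hook(stream):
    """Return an excepthook that writes full error details to stream."""
    def hook(exc_type, exc_value, exc_tb):
        stream.write("Unhandled error, details follow:\n")
        traceback.print_exception(exc_type, exc_value, exc_tb, file=stream)
    return hook


def install_logging_hook(stream):
    """Rebind sys.excepthook; return the previous hook so it can be restored."""
    previous = sys.excepthook
    sys.excepthook = make_logging_hook(stream)
    return previous
```

Keeping the previous hook makes it possible to restore the default behavior later by assigning it back to sys.excepthook.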

Python also offers hooks enabling interactive debugging. Module pdb supplies a simple text-mode interactive debugger. Other interactive debuggers for Python are part of integrated development environments (IDEs), such as IDLE and various commercial offerings. However, I do not cover IDEs in this book.

The inspect module supplies functions to extract information from all kinds of objects, including the Python call stack (which records all function calls currently executing) and source files. At the time of this writing, module inspect is not yet available for Jython. The most frequently used functions of module inspect are as follows.

getargspec, formatargspec

Warnings are messages about errors or anomalies that may not be serious enough to be worth disrupting the program's control flow (as would happen by raising a normal exception). The warnings module offers you fine-grained control over which warnings are output and what happens to them. Your code can conditionally output a warning by calling function warn in module warnings. Other functions in the module let you control how warnings are formatted, set their destinations, and conditionally suppress some warnings (or transform some warnings into exceptions).

Module warnings supplies several exception classes representing warnings. Class Warning subclasses Exception and is the base class for all warnings. You may define your own warning classes; they must subclass Warning, either directly or via one of its other existing subclasses, which are:

DeprecationWarning: Using deprecated features only supplied for backward compatibility
RuntimeWarning: Using features whose semantics are error-prone
SyntaxWarning: Using features whose syntax is error-prone
UserWarning: Other user-defined warnings that don't fit any of the above cases
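The following sketch defines a warning class of its own and issues it with function warn. ConfigWarning, read_timeout, and collect_warnings are hypothetical names. Capturing the warnings uses warnings.catch_warnings and the with statement, both of which were added in Python releases later than the ones this book covers.

```python
import warnings


class ConfigWarning(UserWarning):
    """Hypothetical application-specific warning category."""


def read_timeout(settings):
    """Return settings['timeout'], warning (not failing) when it is missing."""
    if "timeout" not in settings:
        warnings.warn("no timeout configured, using 30", ConfigWarning)
        return 30
    return settings["timeout"]


def collect_warnings(func, *args):
    """Call func(*args); return (result, list of warning messages issued)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")   # report every occurrence
        result = func(*args)
    return result, [str(w.message) for w in caught]
```

Replacing "always" with "error" in the simplefilter call would instead turn each such warning into a raised exception.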

"First make it work. Then make it right. Then make it fast." This quotation, often with slight variations, is widely known as the golden rule of programming. As far as I've been able to ascertain, the quotation is attributed to Kent Beck, who credits his father with it. Being widely known makes the principle no less important, particularly because it's more honored in the breach than in the observance. A negative form, slightly exaggerated for emphasis, is in a quotation by Don Knuth: "Premature optimization is the root of all evil in programming."

Optimization is premature if your code is not working yet. First make it work. Optimization is also premature if your code is working but you are not satisfied with the overall architecture and design. Remedy structural flaws before worrying about optimization: first make it work, then make it right. These first two steps are not optional—working, well-architected code is always a must.

In contrast, you don't always need to make it fast. Benchmarks may show that your code's performance is already acceptable after the first two steps. When performance is not acceptable, profiling often shows that all performance issues are in a small subset, perhaps 10% to 20% of the code, where your program spends 80% or 90% of the time. Such performance-crucial regions of your code are also known as its bottlenecks, or hot spots. It's a waste of effort to optimize large portions of code that account for, say, 10% of your program's running time. Even if you made that part run 10 times as fast (a rare feat), your program's overall runtime would only decrease by 9%, a speedup no user will even notice. If optimization is needed, focus your efforts where they'll matter, on bottlenecks. You can optimize bottlenecks while keeping your code 100% pure Python. In some cases, you can resort to recoding some computational bottlenecks as Python extensions, potentially gaining even better performance.
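The 9% figure follows from simple arithmetic: the untouched 90% of the runtime stays as it is, while the optimized 10% shrinks to 1%, for a total of 91%. The sketch below spells out that calculation; the function names are hypothetical.

```python
def new_runtime(fraction, speedup):
    """Relative runtime after the given fraction of it runs speedup times faster."""
    return (1.0 - fraction) + fraction / speedup


def percent_saved(fraction, speedup):
    """Percentage by which the total runtime decreases."""
    return 100.0 * (1.0 - new_runtime(fraction, speedup))
```

By the same formula, a tenfold speedup of a bottleneck that accounts for 90% of the runtime saves about 81%, which is why bottlenecks are the place to spend effort.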

Start by designing, coding, and testing your application in Python, often using some already available extension modules. This takes much less time than it would take with a classic compiled language. Then benchmark the application to find out if the resulting code is fast enough. Often it is, and you're done—congratulations!
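A first benchmark can be very simple. The following is a sketch that assumes the standard timeit module, which was added to Python after the releases this book covers; benchmark and build_squares are hypothetical names, and build_squares merely stands in for real application code.

```python
import timeit


def benchmark(func, repeat=3, number=1000):
    """Return the best time per call, in seconds, over repeat runs."""
    timer = timeit.Timer(func)
    return min(timer.repeat(repeat=repeat, number=number)) / number


def build_squares():
    """Stand-in for the application code being measured."""
    return [i * i for i in range(100)]
```

Taking the minimum over several runs reduces the influence of other activity on the machine.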

A program can work on the Internet as a client (a program that accesses resources) or as a server (a program that makes services available). Both kinds of program deal with protocol issues, such as how to access and communicate data, and with data formatting issues. For order and clarity, the Python library deals with these issues in several different modules. This book will cover the topics in separate chapters. This chapter deals with the modules in the Python library that support protocol issues of client programs.

Nowadays, data access can often be achieved most simply through Uniform Resource Locators (URLs). Python supports URLs with modules urlparse, urllib, and urllib2. For rarer cases, when you need fine-grained control of data access protocols normally accessed via URLs, Python supplies modules httplib and ftplib. Protocols for which URLs are often insufficient include mail (modules poplib and smtplib), Network News (module nntplib), and Telnet (module telnetlib). Python also supports the XML-RPC protocol for distributed computing with module xmlrpclib.

A URL identifies a resource on the Internet. A URL is a string composed of several optional parts, called components, known as scheme, location, path, query, and fragment. A URL with all its parts looks something like:

scheme://lo.ca.ti.on/pa/th?query#fragment

For example, in http://www.python.org:80/faq.cgi?src=fie, the scheme is http, the location is www.python.org:80, the path is /faq.cgi, the query is src=fie, and there is no fragment. Some of the punctuation characters form a part of one of the components they separate, while others are just separators and are part of no component. Omitting punctuation implies missing components. For example, in mailto:me@you.com, the scheme is mailto, the path is me@you.com, and there is no location, query, or fragment. The missing // means the URL has no location part, the missing ? means it has no query part, and the missing # means it has no fragment part.
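These decompositions can be checked with function urlsplit. The sketch below is written to run on Python 3, where the function lives in urllib.parse, and it falls back to the urlparse module that this chapter describes. The helper name components is hypothetical.

```python
try:
    from urllib.parse import urlsplit   # Python 3 location
except ImportError:
    from urlparse import urlsplit       # Python 2 module, as in the text


def components(url):
    """Return (scheme, location, path, query, fragment) for a URL string."""
    return tuple(urlsplit(url))


# The two URLs discussed above; missing components come back as ''.
full = components("http://www.python.org:80/faq.cgi?src=fie")
mail = components("mailto:me@you.com")
```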

The urlparse module supplies functions to analyze and synthesize URL strings. In Python 2.2, the most frequently used functions of module urlparse are urljoin, urlsplit, and urlunsplit.

urljoin

urljoin(base_url_string,relative_url_string)

Returns a URL string u, obtained by joining relative_url_string, which may be relative, with base_url_string. The joining procedure that urljoin performs to obtain its result u may be summarized as follows:

When either of the argument strings is empty, u is the other argument.
When relative_url_string explicitly specifies a scheme different from that of

Most email today is sent via servers that implement the Simple Mail Transfer Protocol (SMTP) and received via servers that implement the Post Office Protocol Version 3 (POP3). These protocols are supported by the Python standard library modules smtplib and poplib, respectively. Some servers, instead of or in addition to POP3, implement the richer and more advanced Internet Message Access Protocol Version 4 (IMAP4), supported by the Python standard library module imaplib, which I do not cover in this book.

The poplib module supplies a class POP3 to access a POP mailbox.

POP3

class POP3(host,port=110)

Returns an instance p of class POP3 connected to the given host and port.

Instance p supplies many methods, of which the most frequently used are the following.

dele

p.dele(msgnum)

Marks message msgnum for deletion. The server performs deletions when this connection terminates by a call to method quit. Returns the response string.

list

p.list(msgnum=None)

Returns a pair (

Modules urllib and urllib2 are most often the handiest ways to access servers for http, https, and ftp protocols. The Python standard library also supplies specific modules to use for these data access protocols.

Module httplib supplies a class HTTPConnection to connect to an HTTP server.

HTTPConnection

class HTTPConnection(host,port=80)

Returns an instance h of class HTTPConnection, ready for connection (but not yet connected) to the given host and port.
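The sketch below illustrates that instantiation alone does not contact the server. It is written to run on Python 3, where the class lives in http.client, with a fallback to the httplib module described here. The function fetch_status is a hypothetical helper that needs network access, so nothing calls it automatically; it uses methods request, getresponse, and close, and the status and reason attributes of the response.

```python
try:
    from http.client import HTTPConnection   # Python 3 location
except ImportError:
    from httplib import HTTPConnection        # Python 2 module, as in the text

# Creating the instance records host and port but opens no socket yet.
h = HTTPConnection("www.python.org")


def fetch_status(connection, path="/"):
    """Send a GET request and return (status, reason); needs network access."""
    connection.request("GET", path)
    response = connection.getresponse()
    try:
        return response.status, response.reason
    finally:
        connection.close()
```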

Instance h supplies several methods, of which the most frequently used are the following.

close

h.close( )

Closes the connection to the HTTP server.

getresponse

h.getresponse( )

Returns an instance r of class HTTPResponse, which represents the response received from the HTTP server. Call after method request has returned. Instance r supplies the following attributes and methods:

r .getheader( name

Network News, also known as Usenet News, is mostly transmitted with the Network News Transfer Protocol (NNTP). The Python standard library supports this protocol in its module nntplib. The nntplib module supplies a class NNTP to connect to an NNTP server.

NNTP

class NNTP( host,port=119,user=None,passwd=None,readermode=False)

Returns an instance n of class NNTP connected to the given host and port, and optionally authenticated with the given user and passwd if user is not None. When readermode is True, also sends a 'mode reader' command; you may need this, depending on what NNTP server you connect to and on what NNTP commands you send to that server.

An instance n of NNTP supplies many methods. Each of n's methods returns a tuple whose first item is a string (referred to as response in the following section) that is the response from the NNTP server to the NNTP command corresponding to the method (method post just returns the response string, not a tuple). Each method returns the response string just as the NNTP server supplies it. The string starts with an integer in decimal form (the integer is known as the return code), followed by a space, followed by more text.

For some commands, the extra text after the return code is just a comment or explanation supplied by the NNTP server. For other commands, the NNTP standard specifies the format of the text that follows the return code on the response line. In those cases, the relevant method also parses the text in question, yielding other items in the method's resulting tuple, so your code need not perform such parsing itself; rather, you can just access further items in the method's result tuple, as specified in the following sections.
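When you do need the return code on its own, the layout of the response string makes it easy to extract. The following is a minimal sketch; split_response is a hypothetical helper, not part of nntplib, and it assumes the response has the form described above.

```python
def split_response(response):
    """Split an NNTP response string into (return_code, remaining_text)."""
    # The return code is the decimal integer before the first space.
    code, _, text = response.partition(" ")
    return int(code), text
```

For instance, an illustrative response such as '211 104 10011 10125 comp.lang.python' splits into the integer 211 and the text that follows it.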

Telnet is an old protocol, specified by RFC 854 (see https://www.faqs.org/rfcs/rfc854.html), and normally used for interactive user sessions. The Python standard library supports this protocol in its module telnetlib. Module telnetlib supplies a class Telnet to connect to a Telnet server.

Telnet

class Telnet(host=None,port=23)

Returns an instance t of class Telnet. When host (and optionally port) is given, implicitly calls t.open(host,port).

Instance t supplies many methods, of which the most frequently used are as follows.

close

t.close( )

Closes the connection.

expect

t.expect(res,timeout=None)

Reads data from the connection until it matches any of the regular expressions that are the items of list res, or until timeout seconds elapse when timeout is not None. Regular expressions and match objects are covered in Chapter 9. Returns a tuple of three items (i,mo,txt), where i is the index in res of the regular expression that matched, mo is the match object, and txt is all the text read up to and including the match. Raises EOFError when the connection is closed and no data is available; otherwise, when it gets no match, returns (-1,None,txt), where

Additional content appearing in this section has been removed.

There are many standards for distributed computing, from simple Remote Procedure Call (RPC) ones to rich object-oriented ones such as CORBA. You can find several third-party Python modules supporting these standards on the Internet.

The Python standard library comes with support for both server and client use of a simple yet powerful standard known as XML-RPC. For in-depth coverage of XML-RPC, I recommend the book Programming Web Services with XML-RPC, by Simon St. Laurent and Joe Johnson (O'Reilly). XML-RPC uses HTTP as the underlying transport and encodes requests and replies in XML. For server-side support, see Section 19.2.2.4 in Chapter 19. Client-side support is supplied by module xmlrpclib.

The xmlrpclib module supplies a class ServerProxy, which you instantiate to connect to an XML-RPC server. An instance s of ServerProxy is a proxy for the server it connects to. In other words, you call arbitrary methods on s, and s packages up the method name and argument values as an XML-RPC request, sends the request to the XML-RPC server, receives the server's response, and unpackages the response as the method's result. The arguments to such method calls can be of any type supported by XML-RPC:

Boolean: Constant attributes True and False of module xmlrpclib (since module xmlrpclib predates the introduction of bool into Python, it does not use Python's built-in True and False values for this purpose)
Integers, floating-point numbers, strings, arrays: Passed and returned as Python int, float, Unicode, and list values
Structures: Passed and returned as Python dict
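
The packaging step described above can be sketched without contacting any server. The example below uses the module's dumps and loads functions to build a request and parse it back; the method name example.add and the argument values are made up for illustration, and the import fallback is an assumption that lets the sketch run where the module is named xmlrpc.client.

```python
try:
    import xmlrpclib  # module name used in the text (Python 2)
except ImportError:
    import xmlrpc.client as xmlrpclib  # the same module under its Python 3 name

# Arguments of the types listed above: integer, float, string, array, structure.
args = (2, 3.5, 'hi', [1, 2], {'k': 'v'})

# dumps builds the XML text of a request; 'example.add' is a made-up method name.
request = xmlrpclib.dumps(args, methodname='example.add')

# loads parses the XML back into a tuple of parameters and the method name.
params, method = xmlrpclib.loads(request)
```

A ServerProxy instance performs the equivalent of dumps before sending and the equivalent of loads on the reply, so calling code normally never sees the XML.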

Additional content appearing in this section has been removed.

To communicate with the Internet, programs use devices known as sockets. The Python library supports sockets through module socket, as well as wrapping them into higher-level modules covered in Chapter 18. To help you write server programs, the Python library also supplies higher-level modules to use as frameworks for socket servers. Standard and third-party Python modules and extensions also support timed and asynchronous socket operations. This chapter covers socket, the server-side framework modules, and the essentials of other, more advanced modules.

The modules covered in this chapter offer many conveniences compared to C-level socket programming. However, in the end, the modules rely on native socket functionality supplied by the underlying operating system. While it is often possible to write effective network clients by using just the modules covered in Chapter 18, without needing to understand sockets, writing effective network servers most often does require some understanding of sockets. Thus, the lower-level module socket is covered in this chapter and not in Chapter 18, even though both clients and servers use sockets.

However, I only cover the ways in which module socket lets your program access sockets; I do not try to impart the detailed understanding of sockets, and of other aspects of network behavior independent of Python, that you may need to make use of socket's functionality. To understand socket behavior in detail on any kind of platform, I recommend W. Richard Stevens' Unix Network Programming, Volume 1 (Prentice-Hall). Higher-level modules are simpler and more powerful, but a detailed understanding of the underlying technology is always useful, and sometimes it can prove indispensable.


The socket module supplies a factory function, also named socket, that you call to generate a socket object s. You perform network operations by calling methods on s. In a client program, you connect to a server by calling s.connect. In a server program, you wait for clients to connect by calling s.bind and s.listen. When a client requests a connection, you accept the request by calling s.accept, which returns another socket object s1 connected to the client. Once you have a connected socket object, you transmit data by calling its method send, and receive data by calling its method recv.
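
The sequence of calls just described can be sketched in a single process. In the example below, the loopback address, the operating-system-chosen port, the helper name serve_once, and the use of a thread to run the server side are all assumptions made so that the sketch is self-contained; a real client and server would normally be separate programs.

```python
import socket
import threading

def serve_once(server_sock):
    # accept returns a new socket connected to the client, plus its address.
    conn, addr = server_sock.accept()
    try:
        data = conn.recv(1024)
        conn.sendall(data.upper())
    finally:
        conn.close()

# Server side: bind, then listen. Port 0 asks the OS for any free port.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(('127.0.0.1', 0))
server.listen(1)
port = server.getsockname()[1]
worker = threading.Thread(target=serve_once, args=(server,))
worker.start()

# Client side: connect, send, then receive until the server closes.
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(('127.0.0.1', port))
client.sendall(b'hello')
chunks = []
while True:
    chunk = client.recv(1024)
    if not chunk:
        break
    chunks.append(chunk)
reply = b''.join(chunks)

client.close()
worker.join()
server.close()
```

The client loops on recv because a single call may return only part of the data; an empty result signals that the other side has closed the connection.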

Python supports both current Internet Protocol (IP) standards. IPv4 is more widespread, while IPv6 is newer. In IPv4, a network address is a pair (host,port), where host is a Domain Name System (DNS) hostname such as 'www.python.org' or a dotted-quad IP address string such as '194.109.137.226'. port is an integer indicating a socket's port number. In IPv6, a network address is a tuple (host,port,flowinfo,scopeid). Since IPv6 infrastructure is not yet widely deployed, I do not cover IPv6 further in this book. When host is a DNS hostname, Python implicitly looks up the name, using your platform's DNS infrastructure, and uses the dotted-quad IP address corresponding to that name.

Module socket supplies an exception class error. Functions and methods of the module raise error instances to diagnose socket-specific errors. Module socket also supplies many functions. Several of these functions translate data, such as integers, between your host's native format and network standard format. The higher-level protocol that your program and its counterpart are using on a socket determines what kind of conversions you must perform.
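
As a small illustration of such conversions, the sketch below translates a 16-bit integer between host and network byte order. The port number 8080 is an arbitrary example value, and the use of module struct alongside socket is an addition of this sketch, shown because it produces the network-order bytes directly.

```python
import socket
import struct

port = 8080
wire = socket.htons(port)        # host byte order -> network byte order
restored = socket.ntohs(wire)    # network byte order -> host byte order

# struct's '!' prefix packs in network (big-endian) order directly.
packed = struct.pack('!H', port)
unpacked = struct.unpack('!H', packed)[0]
```

The value of wire depends on the byte order of the machine running the code, which is why only the round trip and the packed bytes are meaningful to compare.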

The most frequently used functions of module

Additional content appearing in this section has been removed.

The Python library supplies a framework module, SocketServer, to help you implement Internet servers. SocketServer supplies server classes TCPServer, for connection-oriented servers using TCP, and UDPServer, for datagram-oriented servers using UDP, with the same interface.

An instance s of either TCPServer or UDPServer supplies many attributes and methods, and you can subclass either class and override some methods to architect your own specialized server framework. However, I do not cover such advanced and rarely used possibilities in this book.

Classes TCPServer and UDPServer implement synchronous servers, able to serve one request at a time. Classes ThreadingTCPServer and ThreadingUDPServer implement threaded servers, spawning a new thread per request. You are responsible for synchronizing the resulting threads as needed. Threading is covered in Chapter 14.

For normal use of SocketServer, subclass the BaseRequestHandler class provided by SocketServer and override the handle method. Then, instantiate a server class, passing the address pair on which to serve and your subclass of BaseRequestHandler. Finally, call method serve_forever on the server class instance.
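
The three steps above might look like the following sketch. The handler name EchoHandler, the loopback address, and the background thread used to call serve_forever are assumptions chosen so that the example can also act as its own client; the import fallback covers Python versions where the module is named socketserver.

```python
try:
    import SocketServer as socketserver  # module name used in the text (Python 2)
except ImportError:
    import socketserver                  # the same module under its Python 3 name
import socket
import threading

class EchoHandler(socketserver.BaseRequestHandler):
    def handle(self):
        # For a TCP server, self.request is the socket connected to the client.
        data = self.request.recv(1024)
        self.request.sendall(data)

# Instantiate a server class with an address pair and the handler subclass.
server = socketserver.TCPServer(('127.0.0.1', 0), EchoHandler)
port = server.server_address[1]

# serve_forever blocks, so this sketch runs it in a background thread.
thread = threading.Thread(target=server.serve_forever)
thread.start()

client = socket.create_connection(('127.0.0.1', port))
client.sendall(b'ping')
reply = client.recv(1024)
client.close()

server.shutdown()      # asks serve_forever to return
thread.join()
server.server_close()
```

A standalone server would simply call server.serve_forever() in its main thread and omit the client and shutdown code.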

An instance h of BaseRequestHandler supplies the following methods and attributes.

client_address

The h.client_address attribute is the pair (host,port) of the client, set by the base class at connection.

Additional content appearing in this section has been removed.

Socket programs, particularly servers, must often be ready to perform many tasks at once. Example 19-1 accepts a connection request, then serves a single client until that client has finished—other connection requests must wait. This is not acceptable for servers in production use. Clients cannot wait too long: the server must be able to service multiple clients at once.

One approach that lets your program perform several tasks at once is threading, covered in Chapter 14. Module SocketServer optionally supports threading, as covered earlier in this chapter. An alternative to threading that can offer better performance and scalability is event-driven (also known as asynchronous) programming.

An event-driven program sits in an event loop, where it waits for events. In networking, typical events are "a client requests connection," "data arrived on a socket," and "a socket is available for writing." The program responds to each event by executing a small slice of work to service that event, then goes back to the event loop to wait for the next event. The Python library supports event-driven network programming with the low-level select module and the higher-level asyncore and asynchat modules. Even more complete support for event-driven programming is in the Twisted package (available at https://www.twistedmatrix.com), particularly in subpackage twisted.internet.

The select module exposes a cross-platform low-level function that lets you implement high-performance asynchronous network servers and clients. Module select offers additional platform-dependent functionality on Unix-like platforms, but I cover only cross-platform functionality in this book.
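
A minimal sketch of waiting for the "data arrived on a socket" event follows. It relies on socket.socketpair to get two connected sockets without a network, which is an assumption of this example and may not be available on every platform; the timeout values are arbitrary.

```python
import select
import socket

# socketpair gives two already-connected sockets, handy for a local demo.
left, right = socket.socketpair()

# Nothing has been sent yet, so a zero-timeout poll reports no readable sockets.
readable_before, _, _ = select.select([left], [], [], 0)

right.sendall(b'ping')

# Now data is pending on left; wait up to one second for it to be reported.
readable_after, writable, _ = select.select([left], [right], [], 1.0)
data = left.recv(1024)

left.close()
right.close()
```

An event loop built on select repeats the call in a loop and dispatches each socket in the returned lists to the code that services it.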

select

select(inputs,outputs,excepts

Additional content appearing in this section has been removed.

When a web browser (or other web client) requests a page from a web server, the server may return either static or dynamic content. Serving dynamic content involves server-side web programs that generate and deliver content on the fly, often based on information that is stored in a database. The one longstanding Web-wide standard for server-side programming is known as CGI, which stands for Common Gateway Interface. In server-side programming, a client sends a structured request to a web server. The server runs another program, passing the content of the request. The server captures the output of the other program, and sends that output to the client as the response to the original request. In other words, the server's role is that of a gateway between the client and the other program. The other program is called a CGI program or CGI script.
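
To make the gateway arrangement concrete, here is a minimal sketch of a CGI script. It assumes the server passes the request's query string in the QUERY_STRING environment variable and reads the script's standard output, as the CGI standard specifies; the function name build_response and the plain-text reply are choices made for this illustration only.

```python
import os
import sys

def build_response(environ):
    """Return the full text a CGI script would write to standard output."""
    query = environ.get('QUERY_STRING', '')
    body = 'You asked for: %s\n' % (query or '(nothing)')
    # Headers come first, then an empty line, then the body.
    return 'Content-Type: text/plain\r\n\r\n' + body

if __name__ == '__main__':
    # The web server captures this output and relays it to the client.
    sys.stdout.write(build_response(os.environ))
```

Keeping the logic in a function that takes the environment as an argument makes the script easy to exercise without a web server.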

CGI enjoys the typical advantages of standards. When you program to the CGI standard, your program can be deployed on different web servers, and work despite the differences. This chapter focuses on CGI scripting in Python. It also mentions the downsides of CGI (basically, issues of scalability under high load) and some of the alternative, nonstandard server-side architectures that you can use instead of CGI.

This chapter assumes that you are familiar with both HTML and HTTP. For reference material on both of these standards, see Webmaster in a Nutshell, by Stephen Spainhour and Robert Eckstein (O'Reilly). For detailed coverage of HTML, I recommend HTML & XHTML: The Definitive Guide, by Chuck Musciano and Bill Kennedy (O'Reilly). And for additional coverage of HTTP, see the HTTP Pocket Reference, by Clinton Wong (O'Reilly).


CGI's standardization lets you use any language to code CGI scripts. Python is a very-high-level, high-productivity language, and thus quite suitable for CGI coding. The Python standard library supplies modules to handle typical CGI-related tasks.

CGI scripts are often used to handle HTML form submissions. In this case, the action attribute of the form tag specifies a URL for a CGI script to handle the form, and the method attribute is either GET or POST, indicating how the form data is sent to the script. According to the CGI standard, the GET method should be used for forms without side effects, such as asking the server to query a database and display the results, while the POST method is meant for forms with side effects, such as asking the server to update a database. In practice, however, GET is also often used to create side effects. The distinction between GET and POST in practical use is that GET encodes the form's contents as a query string joined to the action URL to form a longer URL, while POST transmits the form's contents as an encoded stream of data, which a CGI script sees as the script's standard input.
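
The difference in how the two methods carry form data can be sketched with the standard URL-encoding helpers. The field names, values, and the example.com URL below are made up for illustration, and the sketch assumes the default form encoding (not multipart/form-data); the import fallback covers the modules' different locations in Python 2 and Python 3.

```python
try:
    from urllib import urlencode                    # Python 2
    from urlparse import parse_qs
except ImportError:
    from urllib.parse import urlencode, parse_qs    # Python 3

fields = [('q', 'python sockets'), ('page', '2')]
encoded = urlencode(fields)

# GET: the encoded fields are joined to the action URL as a query string.
get_url = 'http://www.example.com/cgi-bin/search.py' + '?' + encoded

# POST: the same encoded text travels as the request body instead,
# which the CGI script reads from its standard input.
post_body = encoded

# Either way, the receiving side decodes the text back into fields.
decoded = parse_qs(encoded)
```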

The GET method is slightly faster. You can use a fixed GET-form URL wherever you can use a hyperlink. However, GET cannot send large amounts of data to the server, since many clients and servers limit URL lengths (you're safe up to about 200 bytes). The POST method has no size limits. You must use POST when the form contains input tags with type=file—the form tag must then have enctype=multipart/form-data.

The CGI standard does not specify whether a single script can access both the query string (used for GET) and the script's standard input (used for POST). Many clients and servers let you get away with it, but relying on this nonstandard practice may negate the portability advantages that you would otherwise get from the fact that CGI is a standard. Python's standard module cgi, covered in the next section, recovers form data from the query string only, when any query string is present; otherwise, when no query string is present,

Additional content appearing in this section has been removed.

HTTP is a stateless protocol, meaning that it retains no session state between transactions. Cookies, as specified by the HTTP 1.1 standard, let web clients and servers cooperate to build a stateful session from a sequence of HTTP transactions.

Each time a server sends a response to a client's request, the server may initiate or continue a session by sending one or more Set-Cookie headers, whose contents are small data items called cookies. When a client sends another request to the server, the client may continue a session by sending Cookie headers with cookies previously received from that server or other servers in the same domain. Each cookie is a pair of strings, the name and value of the cookie, plus optional attributes. Attribute max-age is the maximum number of seconds the cookie should be kept. The client should discard saved cookies after their maximum age. If max-age is missing, then the client should discard the cookie when the user's interactive session ends.
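
As a sketch of the header exchange just described, the example below builds a Set-Cookie header with a max-age attribute and then parses a Cookie header value as a server would on the next request. The cookie name session and its value are made up; the import fallback is an assumption covering the module's names in Python 2 (Cookie) and Python 3 (http.cookies).

```python
try:
    from Cookie import SimpleCookie          # Python 2
except ImportError:
    from http.cookies import SimpleCookie    # Python 3

# Server side: prepare a cookie to send with a response.
outgoing = SimpleCookie()
outgoing['session'] = 'abc123'
outgoing['session']['max-age'] = 3600   # keep for at most one hour
header = outgoing.output()              # text of the Set-Cookie header line

# Server side, later request: parse what the client sent back.
incoming = SimpleCookie()
incoming.load('session=abc123')
value = incoming['session'].value
```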

Cookies have no intrinsic privacy nor authentication. Cookies travel in the clear on the Internet, and therefore are vulnerable to sniffing. A malicious client might return cookies different from cookies previously received. To use cookies for authentication or identification or to hold sensitive information, the server must encrypt and encode cookies sent to clients, and decode, decrypt, and verify cookies received back from clients.
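
The verification step can be sketched with a keyed hash. Note that this example covers only tamper detection, not encryption, and that the key, the separator character, and the function names sign and verify are all assumptions made for illustration rather than an established scheme.

```python
import hashlib
import hmac

SECRET_KEY = b'server-side-secret'  # hypothetical; a real key must stay private

def _digest(value):
    return hmac.new(SECRET_KEY, value.encode('utf-8'), hashlib.sha256).hexdigest()

def sign(value):
    """Return value with a keyed digest appended, for use as a cookie value."""
    return value + '|' + _digest(value)

def verify(cookie_value):
    """Return the original value if the digest matches, else None."""
    value, sep, digest = cookie_value.rpartition('|')
    if not sep:
        return None
    # compare_digest avoids leaking information through comparison timing.
    if hmac.compare_digest(digest.encode('utf-8'), _digest(value).encode('utf-8')):
        return value
    return None

token = sign('user42')
tampered = token.replace('user42', 'user43')
```

A client that alters the value cannot produce a matching digest without the server's key, so verify rejects the altered cookie.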

Encryption, encoding, decoding, decryption, and verification may all be slow when applied to large amounts of data. Decryption and verification require the server to keep some amount of server-side state. Sending substantial amounts of data back and forth on the network is also slow. The server should therefore persist most state data locally, in files or databases. In most cases, a server should use cookies only as small, encrypted, verifiable keys confirming the identity of a user or session, using DBM files or a relational database (covered in Chapter 11) for session state. HTTP sets a limit of 2 KB on cookie size, but I suggest you normally use substantially smaller cookies.

Additional content appearing in this section has been removed.

A CGI script runs as a new process each time a client requests it. Process startup time, interpreter initialization, connection to databases, and script initialization all add up to measurable overhead. On fast, modern server platforms, the overhead is bearable for light to moderate loads. On a busy server, CGI may not scale up well. Web servers support server-specific ways to reduce overhead, running scripts in processes that can serve for several hits rather than starting up a new CGI process per hit.

Microsoft's ASP (Active Server Pages) is a server extension leveraging a lower-level library, ISAPI, and Microsoft's COM technology. Most ASP pages are coded in the VBScript language, but ASP is language-independent. As the reptilian connection suggests, Python and ASP go very well together, as long as Python is installed with the platform-specific win32all extensions, specifically ActiveScripting. Many other server extensions are cross-platform, not tied to specific operating systems.

The popular content server framework Zope (https://www.zope.org) is a Python application. If you need advanced content management features, Zope should definitely be among the solutions you consider. However, Zope is a large, rich, powerful system, needing a full book of its own to do it justice. Therefore, I do not cover Zope further in this book.

FastCGI lets you write scripts similar to CGI scripts, yet use each process to handle multiple hits, either sequentially or simultaneously in separate threads. FastCGI is available for Apache and other free web servers, but at the time of this writing not for Microsoft IIS. See https://www.fastcgi.com for FastCGI overviews and details. Go to https://alldunn.com/python/fcgi.py

Additional content appearing in this section has been removed.

What travels on a network are streams of bytes or text. However, what you want to send over the network often has more structure. The Multipurpose Internet Mail Extensions (MIME) and other encoding standards bridge the gap by specifying how to represent structured data as bytes or text. Python supports such encodings through many library modules, such as base64, quopri, uu, and the modules of the email package. This chapter covers these modules.


Several kinds of media (e.g., email messages) contain only text. When you want to transmit binary data via such media, you need to encode the data as text strings. The Python standard library supplies modules that support the standard encodings known as Base 64, Quoted Printable, and UU.

The base64 module supports the encoding specified in RFC 1521 as Base 64. The Base 64 encoding is a compact way to represent arbitrary binary data as text, without any attempt to produce human-readable results. Module base64 supplies four functions.

decode

decode(infile,outfile)

Reads text-file-like object infile, by calling infile.readline until end of file (i.e., until a call to infile.readline returns an empty string), decodes the Base 64-encoded text thus read, and writes the decoded data to binary-file-like object outfile.

decodestring

decodestring(s)

Decodes text string s, which contains one or more complete lines of Base 64-encoded text, and returns the byte string with the corresponding decoded data.
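
A short sketch of both decoding styles follows. For the string-based case it uses b64encode and b64decode rather than decodestring, since decodestring is not available in recent Python 3 releases; that substitution, the sample data, and the in-memory file objects are choices made for this illustration.

```python
import base64
import io

raw = b'hello'

# String-based: encode to Base 64 text, then decode it back.
text = base64.b64encode(raw)
restored = base64.b64decode(text)

# File-based: decode reads lines from infile and writes bytes to outfile.
infile = io.BytesIO(text + b'\n')
outfile = io.BytesIO()
base64.decode(infile, outfile)
decoded_from_file = outfile.getvalue()
```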

encode

encode(infile,outfile)

Reads binary-file-like object infile

Additional content appearing in this section has been removed.

Python supplies the email package to handle parsing, generation, and manipulation of MIME files such as email messages, network news posts, and so on. The Python standard library also contains other modules that handle some parts of these jobs. However, the new email package offers a more complete and systematic approach to these important tasks. I therefore suggest you use package email, not the older modules that partially overlap with parts of email's functionality. Package email has nothing to do with receiving or sending email; for such tasks, see modules poplib and smtplib, covered in Chapter 18. Instead, package email deals with how you handle messages after you receive them or before you send them.

Package email supplies two factory functions returning an instance m of class email.Message.Message. These functions rely on class email.Parser.Parser, but the factory functions are handier and simpler. Therefore, I do not cover module Parser further in this book.

message_from_string

message_from_string(s)

Builds m by parsing string s.

message_from_file

message_from_file(f)

Builds m by parsing the contents of file-like object f, which must be open for reading.
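
As a brief sketch of the first factory function, the example below parses a small message and reads a header and the body from the resulting instance m. The addresses and message text are made up for illustration.

```python
import email

raw = (
    'From: alice@example.com\n'
    'To: bob@example.com\n'
    'Subject: Hello\n'
    '\n'
    'See you at noon.\n'
)

m = email.message_from_string(raw)

subject = m['Subject']     # headers are accessed like mapping keys
body = m.get_payload()     # for a non-multipart message, the body as a string
```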

The email.Message module supplies class Message. All parts of package email produce, modify, or use instances of class

Additional content appearing in this section has been removed.

Most documents on the Web use HTML, the HyperText Markup Language. Markup is the insertion of special tokens, known as tags, in a text document to give structure to the text. HTML is an application of the large, general standard known as SGML, the Standard Generalized Markup Language. In practice, many of the Web's documents use HTML in sloppy or incorrect ways. Browsers have evolved many practical heuristics over the years to try to compensate for this, but even so, it still often happens that a browser displays an incorrect web page in some weird way.

Moreover, HTML was never suitable for much more than presenting documents on a screen. Complete and precise extraction of the information in the document, working backward from the document's presentation, is often unfeasible. To tighten things up again, HTML has evolved into a more rigorous standard called XHTML. XHTML is very similar to traditional HTML, but it is defined in terms of XML and more precisely than HTML. You can handle XHTML with the tools covered in Chapter 23.

Despite the difficulties, it's often possible to extract at least some useful information from HTML documents. Python supplies the sgmllib, htmllib, and HTMLParser modules for the task of parsing HTML documents, whether this parsing is for the purpose of presenting the documents, or, more typically, as part of an attempt to extract information from them. Generating HTML and embedding Python in HTML are also frequent tasks. No standard Python library module supports HTML generation or embedding directly, but you can use normal Python string manipulation, and third-party modules can also help.


The name of the sgmllib module is misleading: sgmllib parses only a tiny subset of SGML, but it is still a good way to get information from HTML files. sgmllib supplies one class, SGMLParser, which you subclass to override and add methods. The most frequently used methods of an instance s of your subclass X of SGMLParser are as follows.

close

s.close( )

Tells the parser that there is no more input data. When X overrides close, s.close must call SGMLParser.close to ensure that buffered data gets processed.

do_tag

s.do_tag(attributes)

X supplies a method with such a name for each tag, with no corresponding end tag, that X wants to process. tag must be in lowercase in the method name, but can be in any mix of cases in the parsed text. SGMLParser's handle_tag method calls do_tag as appropriate. attributes is a list of pairs (name,value), where name is each attribute's name, lowercased, and value is the value, processed to resolve entity references and character references and to remove surrounding quotes.

end_tag

s.end_tag( )

X supplies a method with such a name for each tag whose end tag X wants to process.

Additional content appearing in this section has been removed.

The htmllib module supplies a class named HTMLParser that subclasses SGMLParser and defines start_tag, do_tag, and end_tag methods for tags defined in HTML 2.0. HTMLParser implements and overrides methods in terms of calls to methods of a formatter object, covered later in this chapter. You can subclass HTMLParser to add or override methods. In addition to the start_tag, do_tag, and end_tag methods, an instance h of HTMLParser supplies the following attributes and methods.

anchor_bgn

h.anchor_bgn(href,name,type)

Called for each <a> tag. href, name, and type are the string values of the tag's attributes with the same names. HTMLParser's implementation of anchor_bgn maintains a list of outgoing hyperlinks (i.e., href arguments of method h.anchor_bgn) in an instance attribute named h.anchorlist.

anchor_end

h.anchor_end( )

Called for each </a> end tag. HTMLParser's implementation of anchor_end emits to the formatter a footnote reference that is an index within h.anchorlist. In other words, by default, HTMLParser asks the formatter to format an <a>/</a> tag pair as the text inside the tag, followed by a footnote reference number that points to the URL in the

Additional content appearing in this section has been removed.

Module HTMLParser supplies one class, HTMLParser, that you subclass to override and add methods. HTMLParser.HTMLParser is similar to sgmllib.SGMLParser, but is simpler and able to parse XHTML as well. The main differences between HTMLParser and SGMLParser are the following:

HTMLParser does not call back to methods named do_tag, start_tag, and end_tag. To process tags and end tags, your subclass X of HTMLParser must override methods handle_starttag and/or handle_endtag and check explicitly for the tags it wants to process.
HTMLParser does not keep track of, nor check, tag nesting in any way.
HTMLParser does nothing, by default, to resolve character and entity references. Your subclass X of HTMLParser must override methods handle_charref and/or handle_entityref if it needs to perform processing of such references.
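
The first difference above can be sketched as follows: a subclass overrides handle_starttag and handle_endtag and checks the tag name itself. The class name LinkCollector, its links and closed_anchors attributes, and the sample HTML are assumptions of this example; the import fallback covers Python 3, where the class lives in module html.parser.

```python
try:
    from HTMLParser import HTMLParser     # module name used in the text (Python 2)
except ImportError:
    from html.parser import HTMLParser    # Python 3 location of the class

class LinkCollector(HTMLParser):
    """Hypothetical subclass that records the href of every <a> tag."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.links = []
        self.closed_anchors = 0

    def handle_starttag(self, tag, attrs):
        # Called for every start tag, so check explicitly for the one we want.
        if tag == 'a':
            for name, value in attrs:
                if name == 'href':
                    self.links.append(value)

    def handle_endtag(self, tag):
        if tag == 'a':
            self.closed_anchors += 1

h = LinkCollector()
h.feed('<p><a href="https://www.python.org/">Python</a> and '
       '<a HREF="/docs">docs</a></p>')
h.close()
```

Attribute names arrive lowercased in attrs, which is why the second link is found even though the source text spells the attribute HREF.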

The most frequently used methods of an instance h of a subclass X of HTMLParser are as follows.

close

h.close( )

Tells the parser that there is no more input data. When X overrides close, h.close must also call HTMLParser.close to ensure that buffered data gets processed.

feed

h.feed(data)

Passes to the parser a part of the text being parsed. The parser processes some prefix of the text and holds the rest in a buffer until the next call to

Additional content appearing in this section has been removed.

Python does not come with tools to generate HTML. If you want an advanced framework for structured HTML generation, I recommend Robin Friedrich's HTMLGen 2.2 (available at https://starship.python.net/crew/friedrich/HTMLgen/html/main.html), but I do not cover the package in this book. To generate XHTML, you can also use the approaches covered in Section 23.4 in Chapter 23.

If your favorite approach is to embed Python code within HTML in the manner made popular by JSP, ASP, and PHP, one possibility is to use Python Server Pages (PSP) as supported by Webware, mentioned in Chapter 20. Another package, focused more specifically on the embedding approach, is Spyce (available at https://spyce.sf.net/). For all but the simplest problems, development and maintenance are eased by separating logic and presentation issues through templating, covered in the next section. Both Webware and Spyce optionally support templating in lieu of embedding.

To generate HTML, the best approach is often templating. With templating, you start with a template, which is a text string (often read from a file, database, etc.) that is valid HTML, but includes markers, also known as placeholders, where dynamically generated text must be inserted. Your program generates the needed text and substitutes it into the template. In the simplest case, you can use markers of the form '%(name)s'. Bind the dynamically generated text as the value for key 'name' in some dictionary d. The Python string formatting operator %, covered in Chapter 9, now does all you need. If t is your template, t%d is a copy of the template with all values properly substituted.
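
The simplest case just described can be sketched in a few lines. The template text and the keys title and user are made up for illustration.

```python
# A template: valid HTML plus %(name)s markers where text must be inserted.
t = ('<html><head><title>%(title)s</title></head>'
     '<body><p>Hello, %(user)s!</p></body></html>')

# The dynamically generated text, bound to the marker names.
d = {'title': 'Welcome', 'user': 'Alex'}

page = t % d
```

Values that come from users should have their HTML special characters escaped before substitution; this sketch omits that step for brevity.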

For advanced templating tasks, I recommend Cheetah (available at

Additional content appearing in this section has been removed.

XML, the eXtensible Markup Language, has taken the programming world by storm over the last few years. Like SGML, XML is a metalanguage, a language to describe markup languages. On top of the XML 1.0 specification, the XML community (in good part inside the World Wide Web Consortium, W3C) has standardized other technologies, such as various schema languages, Namespaces, XPath, XLink, XPointer, and XSLT.

Industry consortia in many fields have defined industry-specific markup languages on top of XML, to facilitate data exchange among applications in the various fields. Such industry standards let applications exchange data even if the applications are coded in different languages and deployed on different platforms by different firms. XML, related technologies, and XML-based markup languages are the basis of interapplication, cross-language, cross-platform data interchange in modern applications.

Python has excellent support for XML. The standard Python library supplies the xml package, which lets you use fundamental XML technology quite simply. The third-party package PyXML (available at https://pyxml.sf.net) extends the standard library's xml with validating parsers, richer DOM implementations, and advanced technologies such as XPath and XSLT. Downloading and installing PyXML upgrades Python's own xml packages, so it can be a good idea to do so even if you don't use PyXML-specific features.

On top of PyXML, you can choose to install yet another freely available third-party package, 4Suite (available at https://4suite.org). 4Suite provides yet more XML parsers for special niches, advanced technologies such as XLink and XPointer, and code supporting standards built on top of XML, such as the Resource Description Framework (RDF).

As an alternative to Python's built-in XML support, PyXML, and 4Suite, you can try ReportLab's new pyRXP, a fast validating XML parser based on Tobin's RXP. pyRXP is DOM-like in that it constructs an in-memory representation of the whole XML document you're parsing. However, pyRXP does not construct a DOM-compliant tree, but rather a lightweight tree of Python tuples to save memory and enhance speed. For more information on pyRXP, see

When your application must parse XML documents, your first, fundamental choice is what kind of parsing to use. You can use event-driven parsing, where the parser reads the document sequentially and calls back to your application each time it parses a significant aspect of the document (such as an element). Or you can use object-based parsing, where the parser reads the whole document and builds in-memory data structures, representing the document, that you can then navigate. SAX is the main, normal way to perform event-driven parsing, and DOM is the main, normal way to perform object-based parsing. In each case there are alternatives, such as direct use of expat for event-driven parsing and pyRXP for object-based parsing, but I do not cover these alternatives in this book. Another interesting possibility is offered by pulldom, which is covered later in this chapter.

Event-driven parsing requires fewer resources, which makes it particularly suitable when you need to parse very large documents. However, event-driven parsing requires you to structure your application accordingly, performing your processing (and typically building auxiliary data structures) in your methods that are called by the parser. Object-based parsing gives you more flexibility about the ways in which you can structure your application. It may be more suitable when you need to perform very complicated processing, as long as you can afford the extra resources needed for object-based parsing (typically, this means that you are not dealing with very large documents). Object-based approaches also support programs that need to modify or create XML documents, as covered later in this chapter.

As a general guideline, when you are still undecided after studying the various trade-offs, I suggest you try event-driven parsing when you can see a reasonably direct way to perform your program's tasks through this approach. Event-driven parsing is more scalable; therefore, if your program can perform its task via event-driven parsing, it will be applicable to larger documents than it would be able to handle otherwise. If event-driven parsing is too confining, try

In most cases, the best way to extract information from an XML document is to parse the document with a parser compliant with SAX, the Simple API for XML. SAX defines a standard API that can be implemented on top of many different underlying parsers. The SAX approach to parsing has similarities to the HTML parsers covered in Chapter 22. As the parser encounters XML elements, text contents, and other significant events in the input stream, the parser calls back to methods of your classes. Such event-driven parsing, based on callbacks to your methods as relevant events occur, also has similarities to the event-driven approach that is almost universal in GUIs and in some networking frameworks. Event-driven approaches in various programming fields may not appear natural to beginners, but enable high performance and particularly high scalability, making them very suitable for high-workload cases.

To use SAX, you define a content handler class, subclassing a library class and overriding some methods. Then, you build a parser object p, install an instance of your class as p's handler, and feed p the input stream to parse. p calls methods on your handler to reflect the document's structure and contents. Your handler's methods perform application-specific processing. The xml.sax package supplies a factory function to build p, as well as convenience functions for simpler operation in typical cases. xml.sax also supplies exception classes, used to diagnose invalid input and other errors.
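The steps above can be sketched as follows, assuming a current Python 3 standard library. The handler class TagCounter and the sample document are hypothetical names chosen for illustration; xml.sax.make_parser, setContentHandler, and parse are the standard xml.sax calls.

```python
import io
import xml.sax


class TagCounter(xml.sax.ContentHandler):
    """Hypothetical handler: counts start tags and collects text content."""

    def __init__(self):
        super().__init__()
        self.counts = {}
        self.text = []

    def startElement(self, name, attrs):
        # Called by the parser each time an element's start tag is parsed.
        self.counts[name] = self.counts.get(name, 0) + 1

    def characters(self, content):
        # Called for text content; may arrive in several chunks.
        self.text.append(content)


doc = b"<root><item>a</item><item>b</item></root>"

handler = TagCounter()
p = xml.sax.make_parser()      # factory function that builds parser p
p.setContentHandler(handler)   # install the handler instance on p
p.parse(io.BytesIO(doc))       # feed p the input stream to parse

print(handler.counts)
```

For a short document already held in memory, the convenience function xml.sax.parseString(doc, handler) performs the same steps in one call.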

Optionally, you can also register with parser p other kinds of handlers besides the content handler. You can supply a custom error handler to use an error diagnosis strategy different from normal exception raising, and try to diagnose several errors during a parse. You can supply a custom DTD handler to receive information about notation and unparsed entities from the XML document's Document Type Definition (DTD). You can supply a custom entity resolver to handle external entity references in advanced, customized ways. These additional possibilities are advanced and rarely used, so I do not cover them in this book.

SAX parsing does not build any structure in memory to represent the XML document. This makes SAX fast and highly scalable, as your application builds exactly as little or as much in-memory structure as needed for its specific tasks. However, for particularly complicated processing tasks involving reasonably small XML documents, you may prefer to let the library build in-memory structures that represent the whole XML document, and then traverse those structures. The XML standards describe the DOM (Document Object Model) for XML. A DOM object represents an XML document as a tree whose root is the document object, while other nodes correspond to elements, text contents, element attributes, and so on.

The Python standard library supplies a minimal implementation of the XML DOM standard, xml.dom.minidom. minidom builds everything up in memory, with the typical pros and cons of the DOM approach to parsing. The Python standard library also supplies a different DOM-like approach in module xml.dom.pulldom. pulldom occupies an interesting middle ground between SAX and DOM, presenting the stream of parsing events as a Python iterator object so that you do not code callbacks, but rather loop over the events and examine each event to see if it's of interest. When you do find an event of interest to your application, you can ask pulldom to build the DOM subtree rooted in that event's node by calling method expandNode, and then work with that subtree as you would in minidom. Paul Prescod, pulldom's author and XML and Python expert, describes the net result as "80% of the performance of SAX, 80% of the convenience of DOM." Other DOM parsers are part of the PyXML and 4Suite extension packages, mentioned at the start of this chapter.
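A minimal sketch of the pulldom style just described, assuming a current Python 3 standard library: the code loops over parsing events, and only when it meets an element of interest does it call expandNode to build that element's DOM subtree. The sample catalog document and the variable names are illustrative.

```python
from xml.dom import pulldom

doc = (
    "<catalog>"
    "<book id='1'><title>A</title></book>"
    "<book id='2'><title>B</title></book>"
    "</catalog>"
)

titles = []
events = pulldom.parseString(doc)
for event, node in events:
    # Examine each event; only the start of a <book> element is of interest.
    if event == pulldom.START_ELEMENT and node.tagName == "book":
        # Build the DOM subtree rooted at this node, then use minidom-style calls.
        events.expandNode(node)
        title = node.getElementsByTagName("title")[0]
        titles.append((node.getAttribute("id"), title.firstChild.data))

print(titles)
```

Only one book subtree is in memory at a time, which is what lets this approach handle documents much larger than a full DOM tree would allow.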

The xml.dom package supplies exception class DOMException and subclasses of it to support fine-grained exception handling. xml.dom also supplies a class

Just like for HTML and other kinds of structured text, the simplest way to output an XML document is often to prepare and write it using Python's normal string and file operations, covered in Chapter 9 and Chapter 10. Templating, covered in Chapter 22, is also often the best approach. Subclassing class XMLGenerator, covered earlier in this chapter, is a good way to generate an XML document that is like an input XML document, except for a few changes.

The xml.dom.minidom module offers yet another possibility, because its classes support methods to generate, insert, remove, and alter nodes in a DOM tree representing the document. You can create a DOM tree by parsing and then alter it, or you can create an empty DOM tree and populate it, and then output the resulting XML document with methods toxml, toprettyxml, or writexml of the Document instance. You can also output a subtree of the DOM tree by calling these methods on the Node that is the subtree's root.

The Document class supplies factory methods to create new instances of subclasses of Node. The most frequently used factory methods of a Document instance d are as follows.

createComment

d.createComment(data)

Builds and returns an instance c of class Comment for a comment with text data.

createElement

d.createElement(tagname)

Builds and returns an instance e of class Element for an element with the given tag.
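As a brief sketch of how these factory methods fit together, the following creates an empty document, populates it, and outputs it with toxml. The element names and attribute values are invented for illustration; createTextNode and setAttribute are further standard minidom methods that are not among the entries listed above.

```python
from xml.dom import minidom

d = minidom.Document()                  # an empty DOM tree

root = d.createElement("inventory")     # factory method: new Element
d.appendChild(root)

root.appendChild(d.createComment("generated example"))  # new Comment

item = d.createElement("item")
item.setAttribute("sku", "42")
item.appendChild(d.createTextNode("widget"))
root.appendChild(item)

# Calling toxml on a Node outputs just the subtree rooted at that node.
xml_text = root.toxml()
print(xml_text)

# Calling toxml on the Document also emits the XML declaration.
full_text = d.toxml()
```

Nodes built by the factory methods are not part of the tree until you attach them with a method such as appendChild.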

Classic Python runs on a portable C-coded virtual machine. Python's built-in objects, such as numbers, sequences, dictionaries, and files, are coded in C, as are several modules in Python's standard library. Modern platforms support dynamic-load libraries, with file extensions such as .dll on Windows and .so on Linux, and building Python produces such binary files. You can code your own extension modules for Python in C, using the Python C API covered in this chapter, to produce and deploy dynamic libraries that Python scripts and interactive sessions can later use with the import statement, covered in Chapter 7.

Extending Python means building modules that Python code can import to access the features the modules supply. Embedding Python means executing Python code from your application. For such execution to be useful, Python code must in turn be able to access some of your application's functionality. In practice, therefore, embedding implies some extending, as well as a few embedding-specific operations.

Embedding and extending are covered extensively in Python's online documentation; you can find an in-depth tutorial at https://www.python.org/doc/ext/ext.html and a reference manual at https://www.python.org/doc/api/api.html. Many details are best studied in Python's extensively documented sources. Download Python's source distribution and study the sources of Python's core, C-coded extension modules and the example extensions supplied for study purposes.

This chapter covers the basics of extending and embedding Python with C. It also mentions, but does not cover, other possibilities for extending Python.

A Python extension module named x resides in a dynamic library with the same filename (x.pyd on Windows, x.so on most Unix-like platforms) in an appropriate directory (normally the site-packages subdirectory of the Python library directory). You generally build the x extension module from a C source file x.c with the overall structure:

#include <Python.h>
/* omitted: the body of the x module */
void
initx(void)
{
    /* omitted: the code that initializes the module named x */
}

When you have built and installed the extension module, a Python statement import x loads the dynamic library, then locates and calls the function named initx, which must do all that is needed to initialize the module object named x.

To build and install a C-coded Python extension module, it's simplest and most productive to use the distribution utilities, distutils, covered in Chapter 26. In the same directory as x.c, place a file named setup.py that contains at least the following statements:

from distutils.core import setup, Extension
setup(name='x', ext_modules=[ Extension('x',sources=['x.c']) ])

From a shell prompt in this directory, you can now run:

C:\> python setup.py install

to build the module and install it so that it becomes usable in your Python installation. The distutils perform all needed compilation and linking steps, with the right compiler and linker commands and flags, and copy the resulting dynamic library in an appropriate directory, dependent on your Python installation. Your Python code can then access the resulting module with the statement import x.

Your C function initx generally has the following overall structure:

You can code Python extensions in other classic compiled languages besides C. For Fortran, the choice is between Paul Dubois's Pyfort (available at https://pyfortran.sf.net) and Pearu Peterson's F2PY (available at https://cens.ioc.ee/projects/f2py2e/). Both packages support and require the Numeric package covered in Chapter 15, since numeric processing is Fortran's typical application area.

For C++, the choice is between Gordon McMillan's simple, lightweight SCXX (available at https://www.mcmillan-inc.com/scxx.html), which uses no templates and is thus suitable for older C++ compilers, Paul Dubois's CXX (available at https://cxx.sf.net), and David Abrahams's Boost Python Library (available at https://www.boost.org/libs/python/doc). Boost is a package of C++ libraries of uniformly high quality for compilers that support templates well, and includes the Boost Python component. Paul Dubois, CXX's author, recommends considering Boost. You may also choose to use Python's C API from your C++ code, using C++ in this respect as if it were C, and forgoing the extra convenience that C++ affords. However, if you're already using C++ rather than C anyway, then using SCXX, CXX, or Boost can substantially improve your programming productivity when compared to using Python's C API.

If your Python extension is basically a wrapper over an existing C or C++ library (as many are), consider SWIG, the Simplified Wrapper and Interface Generator (available at https://www.swig.org). SWIG generates the C source code for your extension based on the library's header files, generally with some help in terms of further annotations in an interface description file.

Greg Ewing is developing a language, Pyrex, specifically for coding Python extensions. Pyrex (found at https://www.cosc.canterbury.ac.nz/~greg/python/Pyrex/

If you have an application already written in C or C++ (or any other classic compiled language), you may want to embed Python as your application's scripting language. To embed Python in languages other than C, the other language must be able to call C functions. In the following, I cover only the C view of things, since other languages vary widely regarding what you have to do in order to call C functions from them.

In order for Python scripts to communicate with your application, your application must supply extension modules with Python-accessible functions and classes that expose your application's functionality. If these modules are linked with your application rather than residing in dynamic libraries that Python can load when necessary, register your modules with Python as additional built-in modules by calling the PyImport_AppendInittab C API function.

PyImport_AppendInittab

int PyImport_AppendInittab(char* name,void (*initfunc)(void))

name is the module name, which Python scripts use in import statements to access the module. initfunc is the module initialization function, taking no argument and returning no result, as covered earlier in this chapter (i.e., initfunc is the module's function that would be named init followed by the module name, such as initx for a module named x, for a normal extension module residing in a dynamic library). PyImport_AppendInittab must be called before calling Py_Initialize.

You may want to set the program name and arguments, which Python scripts can access as sys.argv, by calling either or both of the following C API functions.

Jython implements Python on a Java Virtual Machine (JVM). Jython's built-in objects, such as numbers, sequences, dictionaries, and files, are coded in Java. To extend Classic Python with C, you code C modules using the Python C API (as covered in Chapter 24). To extend Jython with Java, you do not have to code Java modules in special ways: every Java package on the Java CLASSPATH (or on Jython's sys.path) is automatically available to your Jython scripts and Jython interactive sessions for use with the import statement covered in Chapter 7. This applies to Java's standard libraries, third-party Java libraries you have installed, and Java classes you have coded yourself. You can also extend Java with C using the Java Native Interface (JNI), and such extensions will also be available to Jython code, just as if they had been coded in pure Java rather than in JNI-compliant C.

For details on advanced issues related to interoperation between Java and Jython, I recommend Jython Essentials, by Samuele Pedroni and Noel Rappin (O'Reilly). In this chapter, I offer a brief overview of the simplest interoperation scenarios, which suffices for a large number of practical needs. Importing, using, extending, and implementing Java classes and interfaces in Jython just works in most practical cases of interest. In some cases, however, you need to be aware of issues related to accessibility, type conversions, and overloading, as covered in this chapter. Embedding the Jython interpreter in Java-coded applications is similar to embedding the Python interpreter in C-coded applications (as covered in Chapter 24), but the Jython task is easier. Jython offers yet another possibility for interoperation with Java, using the jythonc compiler to turn your Python sources into classic, static JVM bytecode .class and .jar files. You can then use these bytecode files in Java applications and frameworks, exactly as if their source code had been in Java rather than in Python.

Unlike Java, Jython does not implicitly and automatically import java.lang. Your Jython code can explicitly import java.lang, or even just import java, and then use classes such as java.lang.System and java.lang.String as if they were Python classes. Specifically, your Jython code can use imported Java classes as if they were Python classes with a __slots__ class attribute (i.e., you cannot create arbitrary new instance attributes). You can subclass a Java class with your own Python class, and instances of your class let you create new attributes just by binding them, as usual.

You may choose to import a top-level Java package (such as java) rather than specific subpackages (such as java.lang). Your Python code acquires the ability to access all subpackages when you import the top-level package. For example, after import java, your code can use classes java.lang.String, java.util.Vector, and so on.

The Jython runtime wraps every Java class you import in a transparent proxy, which manages communication between Python and Java code behind the scenes. This gives an extra reason to avoid the dubious idiom from somewhere import *, in addition to the reasons mentioned in Chapter 7. When you perform such a bulk import, the Jython runtime must build proxy wrappers for all the Java classes in package somewhere, spending substantial amounts of memory and time wrapping classes your code will probably not use. Avoid from ... import * except for occasional convenience in interactive exploratory sessions, and stick with the import statement. Alternatively, it's okay to use specific, explicit from statements for classes you know your Python code wants to use (e.g., from java.lang import System).

Jython relies on a registry of Java properties as a cross-platform equivalent of the kind of settings that would normally use the Windows registry, or environment variables on Unix-like systems. Jython's registry file is a standard Java properties file named

Your Java-coded application can embed the Jython interpreter in order to use Jython for scripting. jython.jar must be in your Java CLASSPATH. Your Java code must import org.python.core.* and org.python.util.* in order to access Jython's classes. To initialize Jython's state and instantiate an interpreter, use the Java statements:

PySystemState.initialize( );
PythonInterpreter interp = new PythonInterpreter( );

Jython also supplies several advanced overloads of this method and constructor in order to let you determine in detail how PySystemState is set up, and to control the system state and global scope for each interpreter instance. However, in typical, simple cases, the previous Java code is all your application needs.

Once you have an instance interp of class PythonInterpreter, you can call method interp.eval to have the interpreter evaluate a Python expression held in a Java string. You can also call any of several overloads of interp.exec and interp.execfile to have the interpreter execute Python statements held in a Java string, a precompiled Jython code object, a file, or a Java InputStream.

The Python code you execute can import your Java classes in order to access your application's functionality. Your Java code can set attributes in the interpreter namespace by calling overloads of interp.set, and get attributes from the interpreter namespace by calling overloads of interp.get. The methods' overloads give you a choice. You can work with native Java data and let Jython perform type conversions, or you can work directly with PyObject, the base class of all Python objects, covered later in this chapter. The most frequently used methods and overloads of a PythonInterpreter instance interp are the following.

Jython comes with the jythonc compiler. You can feed jythonc your .py source files, and jythonc compiles them into normal JVM bytecode and packages them into .class and .jar files. Since jythonc generates static, classic bytecode, it cannot quite cope with the whole range of dynamic possibilities that Python allows. For example, jythonc cannot successfully compile Python classes that determine their base classes dynamically at runtime, as the normal Python interpreters allow. However, except for such extreme examples of dynamically changeable class structures, jythonc does support compilation of essentially the whole Python language into Java bytecode.

jythonc resides in the Tools/jythonc directory of your Jython installation. You invoke it from a shell (console) command line with the syntax:

jythonc options modules

options are zero or more option flags starting with --. modules are zero or more names of Python source files to compile, either as Python-style names of modules residing on Python's sys.path, or as relative or absolute paths to Python source files. Include the .py extension in each path to a source file, but not in a module name.

More often than not, you will specify the jythonc option --jar jarfile, to build a .jar file of compiled bytecode rather than separate .class files. Most other options deal with what to put in the .jar file. You can choose to make the file self-sufficient (for browsers and other Java runtime environments that do not support using multiple .jar files) at the expense of making the file larger. Option --all ensures all Jython core classes are copied into the .jar file, while --core tries to be more conservative, copying as few core classes as feasible. Option --addpackages packages lets you list (in packages, a comma-separated list) those external Java packages whose classes are copied into the

Python's distutils allow you to package Python programs and extensions in several ways, and to install programs and extensions to work with your Python installation. As I mentioned in Chapter 24, the distutils also afford the most effective way to build C-coded extensions you write yourself, even when you are not interested in distributing such extensions. This chapter covers the distutils, as well as third-party tools that complement the distutils and let you package Python programs for distribution as standalone applications, installable on machines with specific hardware and operating systems without a separate installation of Python.

The distutils are a rich and flexible set of tools to package Python programs and extensions for distribution to third parties. I cover typical, simple use of the distutils for the most common packaging needs. For in-depth, highly detailed discussion of distutils, I recommend two manuals that are part of Python's online documentation: Distributing Python Modules (available at https://www.python.org/doc/current/dist/), and Installing Python Modules (available at https://www.python.org/doc/current/inst/), both by Greg Ward, the principal author of the distutils.

A distribution is the set of files to package into a single file for distribution purposes. A distribution may include zero, one, or more Python packages and other Python modules (as covered in Chapter 7), as well as, optionally, Python scripts, C-coded (and other) extensions, supporting data files, and auxiliary files containing metadata about the distribution itself. A distribution is said to be pure if all code it includes is Python, and non-pure if it also includes non-Python code (most often, C-coded extensions).

You should normally place all the files of a distribution in a directory, known as the distribution root directory, and in subdirectories of the distribution root. Mostly, you can arrange the subtree of files and directories rooted at the distribution root to suit your own organizational needs. However, remember from Chapter 7 that a Python package must reside in its own directory, and a package's directory must contain a file named __init__.py (or subdirectories with __init__.py files, for subpackages) as well as other modules belonging to that package.

The distribution root directory must contain a Python script that by convention is named

The distutils help you package up your Python extensions and applications. However, an end user can install the resulting packaged form only after installing Python. This is particularly a problem on Windows, where end users want to run a single installer to get an application working on their machine. Installing Python first and then running your application's installer may prove too much of a hassle for such end users.

Thomas Heller has developed a simple solution, a distutils add-on named py2exe, freely available for download from https://starship.python.net/crew/theller/py2exe/. This URL also contains detailed documentation of py2exe, and I recommend that you study that documentation if you intend to use py2exe in advanced ways. However, the simplest kinds of use, which I cover in the rest of this section, cover most practical needs.

After downloading and installing py2exe (on a Windows machine where Microsoft Visual C++ 6 is also installed), you just need to add the line:

import py2exe

at the start of your otherwise normal distutils script setup.py. Now, in addition to other distutils commands, you have one more option. Running:

python setup.py py2exe

builds and collects in a subdirectory of your distribution root directory an .exe file and one or more .dll files. If your distribution's name metadata is, for example, myapp, then the directory into which the .exe and .dll files are collected is named dist\myapp\. Any files specified by option data_files in your setup.py script are placed in subdirectories of dist\myapp\. The .exe file corresponds to your application's first or single entry in the scripts keyword argument value, and also contains the bytecode-compiled form of all Python modules and packages that your setup.py specifies or implies. Among the .dll files is, at minimum, the Python dynamic load library, for example python22.dll if you use Python 2.2, plus any other .pyd or .dll files that your application needs, excluding

Gordon McMillan has developed a richer and more general solution to the same problem that py2exe solves—preparing compact ways to package up Python applications for installation on end user machines that may not have Python installed. The Installer tool, freely downloadable from https://www.mcmillan-inc.com/install1.html, is more general than py2exe, which supports only Windows platforms. Installer natively supports Linux as well as Windows. Also, Installer's portable, cross-platform architecture may allow you to extend it to support other Unix-like platforms with a reasonable amount of effort.

Installer does not rely on distutils. To use Installer, you must learn its own specification files' syntax and semantics. Installer can do much more than py2exe, so it's not surprising that there is more for you to learn before making full use of it. However, I recommend studying and trying out Installer if you have the specific need of building standalone Python applications for Linux or other Unix-like architectures, or if you have tried py2exe and found it did not quite meet your needs.

Table of Contents

Section 2.1.1.1: Uncompressing and unpacking the Python source code

Section 4.6.1.1: Coercion and conversions

Section 4.6.1.2: Concatenation

Section 4.6.1.3: Sequence membership

Section 7.1.1.1: Module body

Section 10.3.1.1: File mode

Section 16.2.1.1: Color options