O'Reilly Media | Python in a Nutshell
Python in a Nutshell
Table of Contents
- Chapter 1: Introduction to Python
- Python is a general-purpose programming language. It has been around for quite a while: Guido van Rossum, Python's creator, started developing Python back in 1990. This stable and mature language is very high level, dynamic, object-oriented, and cross-platform, all characteristics that are very attractive to developers. Python runs on all major hardware platforms and operating systems, so it doesn't constrain your platform choices.
  Python offers high productivity for all phases of the software life cycle: analysis, design, prototyping, coding, testing, debugging, tuning, documentation, deployment, and, of course, maintenance. Python's popularity has seen steady, unflagging growth over the years. Today, familiarity with Python is an advantage for every programmer, as Python is likely to have some useful role to play as a part of any software solution.
  Python provides a unique mix of elegance, simplicity, and power. You'll quickly become productive with Python, thanks to its consistency and regularity, its rich standard library, and the many other modules that are readily available for it. Python is easy to learn, so it is quite suitable if you are new to programming, yet at the same time it is powerful enough for the most sophisticated expert.
  Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
- The Python Language
- The Python language, while not minimalist, is rather spare, for good pragmatic reasons. When a language offers one good way to express a design idea, supplying other ways has only modest benefits, while the cost in terms of language complexity grows with the number of features. A complicated language is harder to learn and to master (and to implement efficiently and without bugs) than a simpler one. Any complications and quirks in a language hamper productivity in software maintenance, particularly in large projects, where many developers cooperate and often maintain code originally written by others.
  Python is simple, but not simplistic. It adheres to the idea that if a language behaves a certain way in some contexts, it should ideally work similarly in all contexts. Python also follows the principle that a language should not have convenient shortcuts, special cases, ad hoc exceptions, overly subtle distinctions, or mysterious and tricky under-the-covers optimizations. A good language, like any other designed artifact, must balance such general principles with taste, common sense, and a high degree of practicality.
  Python is a general-purpose programming language, so Python's traits are useful in any area of software development. There is no area where Python cannot be part of an optimal solution. "Part" is an important word here: while many developers find that Python fills all of their needs, Python does not have to stand alone. Python programs can cooperate with a variety of other software components, making it an ideal language for gluing together components written in other languages.
  Python is a very-high-level language. This means that Python uses a higher level of abstraction, conceptually farther from the underlying machine, than do classic compiled languages, such as C, C++, and Fortran, which are traditionally called high-level languages. Python is also simpler, faster to process, and more regular than classic high-level languages. This affords high programmer productivity and makes Python an attractive development tool. Good compilers for classic compiled languages can often generate binary machine code that runs much faster than Python code. However, in most cases, the performance of Python-coded applications proves sufficient. When it doesn't, you can apply the optimization techniques covered in Chapter 17 to enhance your program's performance while keeping the benefits of high programming productivity.
  Additional content appearing in this section has been removed.
- The Python Standard Library and Extension Modules
- There is more to Python programming than just the Python language: the standard Python library and other extension modules are almost as important for effective Python use as the language itself. The Python standard library supplies many well-designed, solid, 100% pure Python modules for convenient reuse. It includes modules for such tasks as data representation, string and text processing, interacting with the operating system and filesystem, and web programming. Because these modules are written in Python, they work on all platforms supported by Python.
  Extension modules, from the standard library or from elsewhere, let Python applications access functionality supplied by the underlying operating system or other software components, such as graphical user interfaces (GUIs), databases, and networks. Extensions afford maximal speed in computationally intensive tasks, such as XML parsing and numeric array computations. Extension modules that are not coded in Python, however, do not necessarily enjoy the same cross-platform portability as pure Python code.
  You can write special-purpose extension modules in lower-level languages to achieve maximum performance for small, computationally intensive parts that you originally prototyped in Python. You can also use tools such as SWIG to make existing C/C++ libraries into Python extension modules, as we'll see in Chapter 24. Finally, you can embed Python in applications coded in other languages, exposing existing application functionality to Python scripts via dedicated Python extension modules.
  This book documents many modules, both from the standard library and from other sources, in areas such as client- and server-side network programming, GUIs, numerical array processing, databases, manipulation of text and binary files, and interaction with the operating system.
  Additional content appearing in this section has been removed.
- Python Implementations
- Python currently has two production-quality implementations, CPython and Jython, and one experimental implementation, Python .NET. This book primarily addresses CPython, which I refer to as just Python for simplicity. However, the distinction between a language and its implementations is an important one.
  Classic Python (a.k.a. CPython, often just called Python) is the fastest, most up-to-date, most solid and complete implementation of Python. CPython is a compiler, interpreter, and set of built-in and optional extension modules, coded in standard C. CPython can be used on any platform where the C compiler complies with the ISO/IEC 9899:1990 standard (i.e., all modern, popular platforms). In Chapter 2, I'll explain how to download and install CPython. All of this book, except Chapter 24 and a few sections explicitly marked otherwise, applies to CPython.
  Jython is a Python implementation for any Java Virtual Machine (JVM) compliant with Java 1.2 or better. Such JVMs are available for all popular, modern platforms. To use Jython well, you need some familiarity with fundamental Java classes. You do not have to code in Java, but documentation and examples for existing Java classes are couched in Java terms, so you need a nodding acquaintance with Java to read and understand them. You also need to use Java supporting tools for tasks such as manipulating .jar files and signing applets. This book deals with Python, not with Java. For Jython usage, you should complement this book with Jython Essentials, by Noel Rappin and Samuele Pedroni (O'Reilly), possibly Java in a Nutshell, by David Flanagan (O'Reilly), and, if needed, some of the many other Java resources available.
  Additional content appearing in this section has been removed.
- Python Development and Versions
- Python is developed by the Python Labs of Zope Corporation, which consists of half a dozen core developers headed by Guido van Rossum, Python's inventor, architect, and Benevolent Dictator For Life (BDFL). This title means that Guido has the final say on what becomes part of the Python language and standard libraries.
  Python intellectual property is vested in the Python Software Foundation (PSF), a non-profit corporation devoted to promoting Python, with dozens of individual members (nominated for their contributions to Python, and including all of the Python core team) and corporate sponsors. Most PSF members have commit privileges to Python's CVS tree on SourceForge (https://sf.net/cvs/?group_id=5470), and most Python CVS committers are members of the PSF.
  Proposed changes to Python are detailed in public documents called Python Enhancement Proposals (PEPs), debated (and sometimes advisorily voted upon) by Python developers and the wider Python community, and finally approved or rejected by Guido, who takes debate and votes into account but is not bound by them. Hundreds of people contribute to Python development, through PEPs, discussion, bug reports, and proposed patches to Python sources, libraries, and documentation.
  Python Labs releases minor versions of Python (2.x, for growing values of x) about once or twice a year. 2.0 was released in October 2000, 2.1 in April 2001, and 2.2 in December 2001. Python 2.3 is scheduled to be released in early 2003. Each minor release adds features that make Python more powerful and simpler to use, but also takes care to maintain backward compatibility. One day there will be a Python 3.0 release, which will be allowed to break backward compatibility to some extent. However, that release is still several years in the future, and no specific plans for it currently exist.
  Each minor release 2.x starts with alpha releases, tagged as 2.
  Additional content appearing in this section has been removed.
- Python Resources
- The richest of all Python resources is the Internet. The starting point is Python's site, https://www.python.org, which is full of interesting links that you will want to explore. And https://www.jython.org is a must if you have any interest in Jython.
  Python and Jython come with good documentation. The manuals are available in many formats, suitable for viewing, searching, and printing. You can browse the manuals on the Web at https://www.python.org/doc/current/. You can find links to the various formats you can download at https://www.python.org/doc/current/download.html, and https://www.python.org/doc/ has links to a large variety of documents. For Jython, https://www.jython.org/docs/ has links to Jython-specific documents as well as general Python ones. The Python FAQ (Frequently Asked Questions) is at https://www.python.org/doc/FAQ.html, and the Jython-specific FAQ is at https://www.jython.org/cgi-bin/faqw.py?req=index.
  Most Python documentation (including this book) assumes some software development knowledge. However, Python is quite suitable for first-time programmers, so there are exceptions to this rule. A few good introductory online texts are:
  - Josh Cogliati's "Non-Programmers Tutorial For Python," available at https://www.honors.montana.edu/~jjc/easytut/easytut/
  - Alan Gauld's "Learning to Program," available at https://www.crosswinds.net/~agauld/
  - Allen Downey and Jeffrey Elkner's "How to Think Like a Computer Scientist (Python Version)," available at https://www.ibiblio.org/obp/thinkCSpy/
  The URL https://www.python.org/psa/MailingLists.html
  Additional content appearing in this section has been removed.
- Chapter 2: Installation
- You can install Python, in both classic (CPython) and JVM (Jython) versions, on most platforms. With a suitable development system (C for CPython, Java for Jython), you can install Python from its source code distribution. On popular platforms, you also have the alternative of installing from a prebuilt binary distribution.
  Installing CPython from a binary distribution is faster, saves you substantial work on some platforms, and is the only possibility if you have no suitable C development system. Installing from a source code distribution gives you more control and flexibility, and is the only possibility if you can't find a suitable prebuilt binary distribution for your platform. Even if you install from binaries, I recommend you also download the source distribution, which includes examples and demos that may be missing from prebuilt binary packages.
  Additional content appearing in this section has been removed.
- Installing Python from Source Code
- To install Python from source code, you need a platform with an ISO-compliant C compiler and ancillary tools such as make. On Windows, the normal way to build Python is with the Microsoft product Visual C++.
  To download Python source code, visit https://www.python.org and follow the link labeled Download. The latest version at the time of this writing is:
      https://www.python.org/ftp/python/2.2.2/Python-2.2.2.tgz
  The .tgz file extension is equivalent to .tar.gz (i.e., a tar archive of files, compressed by the powerful and popular gzip compressor).
  On Windows, installing Python from source code can be a chore unless you are already familiar with Microsoft Visual C++ and used to working at the Windows command line (i.e., in the text-oriented windows known as MS-DOS Prompt or Command Prompt, depending on your version of Windows).
  If the following instructions give you trouble, I suggest you skip ahead to the material on installing Python from binaries later in this chapter. It may be a good idea, on Windows, to do an installation from binaries anyway, even if you also install from source code. This way, if you notice anything strange while using the version you installed from source code, you can double-check with the installation from binaries. If the strangeness goes away, it must have been due to some quirk in your installation from source code, and then you know you must double-check the latter.
  In the following sections, for clarity, I assume you have made a new directory named C:\Py and downloaded Python-2.2.2.tgz there. Of course, you can choose to name and place the directory as it best suits you.
  Section 2.1.1.1: Uncompressing and unpacking the Python source code
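The point that .tgz and .tar.gz name the same format (a tar archive compressed with gzip) can be checked with a short sketch. It is not the unpacking procedure from the book, whose text is cut off here. It assumes a modern Python 3 interpreter rather than the 2.2.2 release discussed above, and the helper names make_tgz and list_members, the archive names, and the member name demo/README are all made up for illustration.

```python
import io
import os
import tarfile
import tempfile


def make_tgz(archive_path, member_name, data):
    """Write a one-file tar archive compressed with gzip."""
    info = tarfile.TarInfo(name=member_name)
    info.size = len(data)
    with tarfile.open(archive_path, "w:gz") as archive:
        archive.addfile(info, io.BytesIO(data))


def list_members(archive_path):
    """Return member names; 'r:gz' means 'tar compressed with gzip'."""
    with tarfile.open(archive_path, "r:gz") as archive:
        return archive.getnames()


# Hypothetical file names: the same content under both extensions.
workdir = tempfile.mkdtemp()
tgz_path = os.path.join(workdir, "demo.tgz")
targz_path = os.path.join(workdir, "demo.tar.gz")
make_tgz(tgz_path, "demo/README", b"hello\n")
make_tgz(targz_path, "demo/README", b"hello\n")

# Both archives open with the same reader and list the same member.
names_tgz = list_members(tgz_path)
names_targz = list_members(targz_path)
print(names_tgz, names_targz)
```

The extension plays no part in how the archive is read: the mode string passed to tarfile.open selects the format.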
Additional content appearing in this section has been removed.
- Installing Python from Binaries
- If your platform is popular and current, you may find a prebuilt and packaged binary version of Python ready for installation. Binary packages are typically self-installing, either directly as executable programs, or via appropriate system tools, such as the RedHat Package Manager (RPM) on Linux and the Microsoft Installer (MSI) on Windows. Once you have downloaded a package, install it by running the program and interactively choosing installation parameters, such as the directory where Python is to be installed.
  To download Python binaries, visit https://www.python.org and follow the link labeled Download. At the time of this writing, the only binary installer directly available from the main Python site is a Windows installer executable:
      https://www.python.org/ftp/python/2.2.2/Python-2.2.2.exe
  Many third parties supply free binary Python installers for other platforms. For Linux distributions, see https://rpmfind.net if your distribution is RPM-based (RedHat, Mandrake, SUSE, and so on) or https://www.debian.org for Debian. The site https://www.python.org/download/ provides links to binary distributions for Macintosh, OS/2, Amiga, RISC OS, QNX, VxWorks, IBM AS/400, Sony PlayStation 2, and Sharp Zaurus. Older Python versions, mainly 1.5.2, are also usable and functional, though not as powerful and polished as the current Python 2.2.2. The download page provides links to 1.5.2 installers for older or less popular platforms (MS-DOS, Windows 3.1, Psion, BeOS, etc.).
  ActivePython (https://www.activestate.com/Products/ActivePython) is a binary package of Python 2.2 for 32-bit versions of Windows and x86 Linux.
  Additional content appearing in this section has been removed.
- Installing Jython
- To install Jython, you need a Java Virtual Machine (JVM) that complies with Java 1.1 or higher. See https://www.jython.org/platform.html for advice on JVMs for your platform.
  To download Jython, visit https://www.jython.org and follow the link labeled Download. The latest version at the time of this writing is:
      https://prdownloads.sf.net/jython/jython-21.class
  In the following section, for clarity, I assume you have created a new directory named C:\Jy and downloaded jython-21.class there. Of course, you can choose to name and place the directory as it best suits you. On Unix-like platforms, in particular, the directory name will more likely be something like ~/Jy.
  The Jython installer .class file is a self-installing program. Open an MS-DOS Prompt window (or a shell prompt on a Unix-like platform), change directory to C:\Jy, and run your Java interpreter on the Jython installer. Make sure to include directory C:\Jy in the Java CLASSPATH. With most releases of Sun's Java Development Kit (JDK), for example, you can run:
      C:\Jy> java -cp . jython-21
  This runs a GUI installer that lets you choose destination directory and options. If you want to avoid the GUI, you can use the -o switch on the command line. The switch lets you specify the installation directory and options directly on the command line. For example:
      C:\Jy> java -cp . jython-21 -o C:\Jython-2.1 demo lib source
  installs Jython, with all optional components (demos, libraries, and source code), in directory C:\Jython-2.1. The Jython installation builds two small, useful command files. One, run as jython (named jython.bat on Windows), runs the interpreter. The other, run as
  Additional content appearing in this section has been removed.
- Chapter 3: The Python Interpreter
- To develop software systems in Python, you produce text files that contain Python source code and documentation. You can use any text editor, including those in Integrated Development Environments (IDEs). You then process the source files with the Python compiler and interpreter. You can do this directly, or implicitly inside an IDE, or via another program that embeds Python. The Python interpreter also lets you execute Python code interactively, as do IDEs.
  Additional content appearing in this section has been removed.
- The python Program
- The Python interpreter program is run as python (it's named python.exe on Windows). python includes both the interpreter itself and the Python compiler, which is implicitly invoked, as needed, on imported modules. Depending on your system, the program may have to be in a directory listed in your PATH environment variable. Alternatively, as with any other program, you can give a complete pathname to it at the command (shell) prompt, or in the shell script (or .BAT file, shortcut target, etc.) that runs it. On Windows, you can also use Start → Programs → Python 2.2 → Python (command line).
  Besides PATH, other environment variables affect the python program. Some environment variables have the same effects as options passed to python on the command line; these are documented in the next section. A few provide settings not available via command-line options:
  - PYTHONHOME: The Python installation directory. A lib subdirectory, containing the standard Python library modules, should exist under this directory. On Unix-like systems, the standard library modules should be in subdirectory lib/python2.2 for Python 2.2, lib/python2.3 for Python 2.3, and so on.
  - PYTHONPATH: A list of directories, separated by colons on Unix-like systems and by semicolons on Windows. Modules are imported from these directories. This extends the initial value for Python's sys.path variable. Modules, importing, and the sys.path variable are covered in Chapter 7.
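The effect of PYTHONPATH on imports can be tried out with a small sketch. It assumes a modern Python 3 interpreter rather than the 2.2 release described above, and the module name hello_mod and its GREETING attribute are made up for illustration. The script writes a throwaway module into a temporary directory, then starts a child interpreter with PYTHONPATH pointing at that directory.

```python
import os
import subprocess
import sys
import tempfile

# Create a directory holding one tiny, hypothetical module.
extra_dir = tempfile.mkdtemp()
with open(os.path.join(extra_dir, "hello_mod.py"), "w") as f:
    f.write("GREETING = 'hi from PYTHONPATH'\n")

# PYTHONPATH is read at interpreter startup, so it is set for a child
# process. Several directories would be joined with os.pathsep, which is
# ':' on Unix-like systems and ';' on Windows.
env = dict(os.environ)
env["PYTHONPATH"] = extra_dir

result = subprocess.run(
    [sys.executable, "-c", "import hello_mod; print(hello_mod.GREETING)"],
    env=env,
    capture_output=True,
    text=True,
    check=True,
)
child_output = result.stdout.strip()
print(child_output)
```

The import succeeds in the child only because the directory from PYTHONPATH was added to its sys.path. The parent process, started without that setting, cannot import hello_mod.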
Additional content appearing in this section has been removed.
- Python Development Environments
- The Python interpreter's built-in interactive mode is the simplest development environment for Python. It is a bit primitive, but it is lightweight, has a small footprint, and starts fast. Together with an appropriate text editor (as discussed later in this chapter) and line-editing and history facilities, it is a usable and popular development environment. However, there are a number of other development environments that you can also use.
  Python's Integrated DeveLopment Environment (IDLE) comes with the standard Python distribution. IDLE is a cross-platform, 100% pure Python application based on Tkinter (see Chapter 16). IDLE offers a Python shell, similar to interactive Python interpreter sessions but richer in functionality. It also includes a text editor optimized to edit Python source code, an integrated interactive debugger, and several specialized browsers/viewers.
  IDLE is mature, stable, easy to use, and rich in functionality. Promising new Python IDEs that share IDLE's free and cross-platform nature are emerging. Red Hat's Source Navigator (https://sources.redhat.com/sourcenav/) supports many languages. It runs on Linux, Solaris, HPUX, and Windows. Boa Constructor (https://boa-constructor.sf.net/) is Python-only and still beta-level, but well worth trying out. Boa Constructor includes a GUI builder for the wxWindows cross-platform GUI toolkit.
  Python is cross-platform, and this book focuses on cross-platform tools and components. However, Python also provides good platform-specific facilities, including IDEs, on many platforms it supports. For the Macintosh, MacPython includes an IDE (see https://www.python.org/doc/current/mac/mac.html
  Additional content appearing in this section has been removed.
- Running Python Programs
- Whatever tools you use to produce your Python application, you can see your application as a set of Python source files. A script is a file that you can run directly. A module is a file that you can import (as covered in Chapter 7) to provide functionality to other files or to interactive sessions. A Python file can be both a module and a script, exposing functionality when imported, but also suitable for being run directly. A useful and widespread convention is that Python files that are primarily meant to be imported as modules, when run directly, should execute self-test operations. Testing is covered in Chapter 17.
  The Python interpreter automatically compiles Python source files as needed. Python source files normally have extension .py. Python saves the compiled bytecode file for each module in the same directory as the module's source, with the same basename and extension .pyc (or .pyo if Python is run with option -O). Python does not save the compiled bytecode form of a script when you run the script directly; rather, Python recompiles the script each time you run it. Python saves bytecode files only for modules you import. It automatically rebuilds each module's bytecode file whenever necessary, for example when you edit the module's source. Eventually, for deployment, you may package Python modules using tools covered in Chapter 26.
  You can run Python code interactively, with the Python interpreter or an IDE. Normally, however, you initiate execution by running a top-level script. To run a script, you give its path as an argument to python, as covered earlier in this chapter. Depending on your operating system, you can invoke python directly, from a shell script, or in a command file. On Unix-like systems, you can make a Python script directly executable by setting the file's permission bits x and r and beginning the script with a so-called shebang line, which is a first line of the form:
  Additional content appearing in this section has been removed.
- The Jython Interpreter
The jython interpreter built during installation (see Chapter 2) is run similarly to the python program:

    [path]jython {options} [ -j jar | -c command | file | - ] {arguments}

-j jar tells jython that the main script to run is __run__.py in the .jar file. Options -i, -S, and -v are the same as for python. --help is like python's -h, and --version is like python's -V. Instead of environment variables, jython uses a text file named registry in the installation directory to record properties with structured names. Property python.path, for example, is the Jython equivalent of Python's environment variable PYTHONPATH. You can also set properties with jython command-line options, in the form -Dname=value.
- Chapter 4: The Python Language
This chapter is a quick guide to the Python language. To learn Python from scratch, I suggest you start with Learning Python, by Mark Lutz and David Ascher (O'Reilly). If you already know other programming languages and just want to learn the specifics of Python, this chapter is for you. I'm not trying to teach Python here, so we're going to cover a lot of ground at a pretty fast pace.
- Lexical Structure
The lexical structure of a programming language is the set of basic rules that govern how you write programs in that language. It is the lowest-level syntax of the language and specifies such things as what variable names look like and what characters are used for comments. Each Python source file, like any other text file, is a sequence of characters. You can also usefully see it as a sequence of lines, tokens, or statements. These different syntactic views complement and reinforce each other. Python is very particular about program layout, especially with regard to lines and indentation, so you'll want to pay attention to this information if you are coming to Python from another language.

A Python program is composed of a sequence of logical lines, each made up of one or more physical lines. Each physical line may end with a comment. A pound sign (#) that is not inside a string literal begins a comment. All characters after the # and up to the physical line end are part of the comment, and the Python interpreter ignores them. A line containing only whitespace, possibly with a comment, is called a blank line, and is ignored by the interpreter. In an interactive interpreter session, you must enter an empty physical line (without any whitespace or comment) to terminate a multiline statement.

In Python, the end of a physical line marks the end of most statements. Unlike in other languages, Python statements are not normally terminated with a delimiter, such as a semicolon (;). When a statement is too long to fit on a single physical line, you can join two adjacent physical lines into a logical line by ensuring that the first physical line has no comment and ends with a backslash (\). Python also joins adjacent physical lines into one logical line if an open parenthesis ((), bracket ([), or brace ({) has not yet been closed. Triple-quoted string literals can also span physical lines. Physical lines after the first one in a logical line are known as ...
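The line-joining rules described above can be seen in a short sketch. The variable names are hypothetical and chosen only for illustration.

```python
# Explicit joining: the first physical line has no comment and ends
# with a backslash, so the next physical line continues the statement.
total = 1 + 2 + \
        3 + 4

# Implicit joining: an open bracket, parenthesis, or brace keeps the
# logical line going, and comments may follow items on continued lines.
squares = [
    1,   # first item
    4,
    9,
]

# A triple-quoted string literal can also span physical lines.
text = """first line
second line"""

lines_in_text = text.splitlines()
```

Most style guides prefer the implicit form, since a stray space after a backslash breaks the continuation.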
- Data Types
The operation of a Python program hinges on the data it handles. All data values in Python are represented by objects, and each object, or value, has a type. An object's type determines what operations the object supports, or, in other words, what operations you can perform on the data value. The type also determines the object's attributes and items (if any) and whether the object can be altered. An object that can be altered is known as a mutable object, while one that cannot be altered is an immutable object. I cover object attributes and items in detail later in this chapter.

The built-in type(obj) accepts any object as its argument and returns the type object that represents the type of obj. Another built-in function, isinstance(obj, type), returns True if object obj is represented by type object type; otherwise, it returns False (built-in names True and False were introduced in Python 2.2.1; in older versions, 1 and 0 are used instead).

Python has built-in objects for fundamental data types such as numbers, strings, tuples, lists, and dictionaries, as covered in the following sections. You can also create user-defined objects, known as classes, as discussed in detail in Chapter 5.

The built-in number objects in Python support integers (plain and long), floating-point numbers, and complex numbers. All numbers in Python are immutable objects, meaning that when you perform an operation on a number object, you always produce a new number object. Operations on numbers, called arithmetic operations, are covered later in this chapter.

Integer literals can be decimal, octal, or hexadecimal. A decimal literal is represented by a sequence of digits where the first digit is non-zero. An octal literal is specified with a ...
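As a brief illustration of type, isinstance, and the mutable/immutable distinction described in this section, consider the sketch below. The sample values and variable names are arbitrary, and the code is written to run on a current Python interpreter, which is an assumption relative to the versions the text discusses.

```python
values = [42, 3.14, 2 + 3j, "spam", (1, 2), [1, 2], {"x": 1}]

# type(obj) returns the type object that represents obj's type.
type_names = [type(obj).__name__ for obj in values]

# isinstance(obj, type) reports whether obj is an instance of that type.
assert isinstance(42, int)
assert not isinstance("spam", int)

# Lists are mutable: append alters the list object in place.
items = [1, 2]
items.append(3)

# Tuples are immutable: they do not support item assignment.
pair = (1, 2)
try:
    pair[0] = 99
except TypeError:
    tuple_is_immutable = True
```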
- Variables and Other References
A Python program accesses data values through references. A reference is a name that refers to the specific location in memory of a value (object). References take the form of variables, attributes, and items. In Python, a variable or other reference has no intrinsic type. The object to which a reference is bound at a given time does have a type, however. Any given reference may be bound to objects of different types during the execution of a program.

In Python, there are no declarations. The existence of a variable depends on a statement that binds the variable, or, in other words, that sets a name to hold a reference to some object. You can also unbind a variable by resetting the name so it no longer holds a reference. Assignment statements are the most common way to bind variables and other references. The del statement unbinds references.

Binding a reference that was already bound is also known as rebinding it. Whenever binding is mentioned in this book, rebinding is implicitly included except where it is explicitly excluded. Rebinding or unbinding a reference has no effect on the object to which the reference was bound, except that an object disappears when nothing refers to it. The automatic cleanup of objects to which there are no references is known as garbage collection.

You can name a variable with any identifier except the 29 that are reserved as Python's keywords (see Section 4.1.2.2 earlier in this chapter). A variable can be global or local. A global variable is an attribute of a module object (Chapter 7 covers modules). A local variable lives in a function's local namespace (see Section 4.10 later in this chapter).
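Binding, rebinding, and unbinding as described above can be sketched in a few lines. The names x and y are arbitrary.

```python
x = [1, 2, 3]      # binds the name x to a list object
y = x              # binds y to the same object; no copy is made
x = "spam"         # rebinds x to an object of a different type

# Rebinding x had no effect on the list, which y still refers to.
assert y == [1, 2, 3]
assert type(x) is str

del x              # unbinds the name x
try:
    x
except NameError:
    # Referencing an unbound name raises NameError.
    x_is_unbound = True
```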
- Expressions and Operators
An expression is a phrase of code that the Python interpreter can evaluate to produce a value. The simplest expressions are literals and identifiers. You build other expressions by joining subexpressions with the operators and/or delimiters in Table 4-2. This table lists the operators in decreasing order of precedence, so operators with higher precedence are listed before those with lower precedence. Operators listed together have the same precedence. The A column lists the associativity of the operator, which can be L (left-to-right), R (right-to-left), or NA (non-associative).

In Table 4-2, expr, key, f, index, x, and y indicate any expression, while attr and arg indicate identifiers. The notation ,... indicates that commas join zero or more repetitions, except for string conversion, where one or more repetitions are allowed. A trailing comma is also allowed and innocuous in all such cases, except with string conversion, where it's forbidden.

Table 4-2: Operator precedence in expressions

    Operator          Description           A
    `expr,...`        String conversion     NA
    {key:expr,...}    Dictionary creation   NA
    ...
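Because Table 4-2 is truncated here, the sketch below draws on general Python behavior rather than on specific table rows: it shows what right-to-left and left-to-right associativity mean in practice, plus a dictionary-creation expression with the optional trailing comma.

```python
# ** associates right-to-left: 2 ** 3 ** 2 groups as 2 ** (3 ** 2).
assert 2 ** 3 ** 2 == 2 ** (3 ** 2)
assert 2 ** 3 ** 2 != (2 ** 3) ** 2

# - associates left-to-right: 10 - 4 - 3 groups as (10 - 4) - 3.
assert 10 - 4 - 3 == (10 - 4) - 3

# * has higher precedence than +.
assert 2 + 3 * 4 == 2 + (3 * 4)

# Dictionary creation: zero or more key:expr pairs; a trailing comma
# is allowed and innocuous.
d = {"a": 1 + 1, "b": 2 * 3,}
empty = {}
```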
- Numeric Operations
Python supplies the usual numeric operations, as you've just seen in Table 4-2. All numbers are immutable objects, so when you perform a numeric operation on a number object, you always produce a new number object. You can access the parts of a complex object z as read-only attributes z.real and z.imag. Trying to rebind these attributes on a complex object raises an exception.

Note that a number's optional + or - sign, and the + that joins a floating-point literal to an imaginary one to make a complex number, are not part of the literals' syntax. They are ordinary operators, subject to normal operator precedence rules (see Table 4-2). This is why, for example, -2**2 evaluates to -4: exponentiation has higher precedence than unary minus, so the whole expression parses as -(2**2), not as (-2)**2.

You can perform arithmetic operations and comparisons between any two numbers. If the operands' types differ, coercion applies: Python converts the operand with the smaller type to the larger type. The types, in order from smallest to largest, are integers, long integers, floating-point numbers, and complex numbers.

You can also perform an explicit conversion by passing a numeric argument to any of the built-ins: int, long, float, and complex. int and long drop their argument's fractional part, if any (e.g., int(9.8) is 9). Converting from a complex number to any other numeric type drops the imaginary part. You can also call complex with two arguments, giving real and imaginary parts.

Each built-in type can also take a string argument with the syntax of an appropriate numeric literal with two small extensions: the argument string may start with a sign and, for complex numbers, may sum or subtract real and imaginary parts. int and long can also be called with two arguments: the first one a string to convert, and the second one the radix, an integer between 2 and 36 to use as the base for the conversion (e.g., ...
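Several of the numeric rules in this section can be checked directly. The following is a sketch with arbitrary sample values, not the book's own example. It avoids long, which does not exist in current Python versions, and it catches both AttributeError and TypeError because the exact exception raised for a read-only attribute is assumed to vary between Python versions.

```python
# Unary minus has lower precedence than exponentiation.
assert -2 ** 2 == -(2 ** 2)
assert (-2) ** 2 == 4

# The parts of a complex number are read-only attributes.
z = 3 + 4j
parts = (z.real, z.imag)
try:
    z.real = 1.0
except (AttributeError, TypeError):
    real_is_read_only = True

# Coercion: the operand of the smaller type is converted to the larger.
assert type(1 + 2.0) is float
assert type(1.5 + 2j) is complex

# Explicit conversions, including string arguments and a radix.
assert int(9.8) == 9
assert int("-42") == -42
assert complex(1, 2) == 1 + 2j
assert complex("1+2j") == 1 + 2j
hex_value = int("ff", 16)
bin_value = int("101", 2)
```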
- Sequence Operations
Python supports a variety of operations that can be applied to sequence types, including strings, lists, and tuples.

Sequences are containers with items accessible by indexing or slicing, as we'll discuss shortly. The built-in len function takes a container as an argument and returns the number of items in the container. The built-in min and max functions take one argument, a non-empty sequence (or other iterable) whose items are comparable, and they return the smallest and largest items in the sequence, respectively. You can also call min and max with multiple arguments, in which case they return the smallest and largest arguments, respectively.

Section 4.6.1.1: Coercion and conversions

There is no implicit coercion between different sequence types except that normal strings are coerced to Unicode strings if needed. Conversion to strings is covered in detail in Chapter 9. You can call the built-in tuple and list functions with a single argument (a sequence or other iterable) to get an instance of the type you're calling, with the same items in the same order as in the argument.

Section 4.6.1.2: Concatenation

You can concatenate sequences of the same type with the + operator. You can also multiply any sequence S by an integer n with the * operator. The result of S*n or n*S is the concatenation of n copies of S. If n is zero or less than zero, the result is an empty sequence of the same type as S.

Section 4.6.1.3: Sequence membership

...
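The len, min, max, conversion, and concatenation rules from the earlier parts of this section can be exercised as follows. The values are arbitrary and the sketch is illustrative only.

```python
s = "spam"
t = (3, 1, 2)

length = len(s)                     # number of items in the container
smallest, largest = min(t), max(t)  # one iterable argument
largest_arg = max(5, 9, 2)          # several arguments

# tuple and list build a new instance with the same items in order.
as_list = list(t)
as_tuple = tuple("ab")

# Concatenation needs sequences of the same type.
joined = [1, 2] + [3]
try:
    [1, 2] + (3,)
except TypeError:
    mixed_concatenation_fails = True

# Repetition: n copies, or an empty sequence when n <= 0.
repeated = "ab" * 3
empty_tuple = (1, 2) * 0
empty_list = [1, 2] * -1
```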
- Dictionary Operations
Python provides a variety of operations that can be applied to dictionaries. Since dictionaries are containers, the built-in len function can take a dictionary as its single argument and return the number of items (key/value pairs) in the dictionary object.

In Python 2.2 and later, the k in D operator tests to see whether object k is one of the keys of the dictionary D. It returns True if it is and False if it isn't. Similarly, the k not in D operator is just like not (k in D).

The value in a dictionary D that is currently associated with key k is denoted by an indexing: D[k]. Indexing with a key that is not present in the dictionary raises an exception. For example:

    d = { 'x':42, 'y':3.14, 'z':7 }
    d['x']    # 42
    d['z']    # 7
    d['a']    # raises exception

Plain assignment to a dictionary indexed with a key that is not yet in the dictionary (e.g., D[newkey]=value) is a valid operation that adds the key and value as a new item in the dictionary. For instance:

    d = { 'x':42, 'y':3.14, 'z':7 }
    d['a'] = 16    # d is now {'x':42,'y':3.14,'z':7,'a':16}

The del statement, in the form del D[k], removes from the dictionary the item whose key is k. If k is not a key in dictionary D, del D[k] raises an exception.
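The membership test and the del statement described above, together with the exceptions they involve, can be sketched as follows. The text only says "raises an exception"; that the exception is KeyError is stated here from general Python knowledge, and the key names are arbitrary.

```python
d = {"x": 42, "y": 3.14, "z": 7}

size_before = len(d)
has_x = "x" in d            # True: 'x' is a key
lacks_a = "a" not in d      # same as not ("a" in d)

del d["x"]                  # removes the item whose key is 'x'

try:
    d["x"]                  # indexing with a missing key
except KeyError:
    index_raises = True

try:
    del d["x"]              # deleting a missing key
except KeyError:
    del_raises = True
```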
- The print Statement
A print statement is denoted by the keyword print followed by zero or more expressions separated by commas. print is a handy, simple way to output values in text form. print outputs each expression x as a string that's just like the result of calling str(x) (covered in Chapter 8). print implicitly outputs a space between expressions, and it also implicitly outputs \n after the last expression, unless the last expression is followed by a trailing comma (,). Here are some examples of print statements:

    letter = 'c'
    print "give me a", letter, "..."    # prints: give me a c ...
    answer = 42
    print "the answer is:", answer      # prints: the answer is: 42

The destination of print's output is the file or file-like object that is the value of the stdout attribute of the sys module (covered in Chapter 8). You can control output format more precisely by performing string formatting yourself, with the % operator or other string manipulation techniques, as covered in Chapter 9. You can also use the write or writelines methods of file objects, as covered in Chapter 10. However, print is very simple to use, and simplicity is an important advantage in the common case where all you need are the simple output strategies that print supplies.
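The alternatives to print mentioned above (formatting with the % operator, then writing through the write or writelines methods of sys.stdout) might look like the sketch below. The format strings and values are arbitrary; unlike print, write adds neither spaces nor a trailing newline, so both appear explicitly.

```python
import sys

letter = "c"
answer = 42

# Build the output text yourself with the % operator.
first = "give me a %s ...\n" % letter
second = "the answer is: %5d|%-6.2f|\n" % (answer, 3.14159)

# write emits exactly the string it is given.
sys.stdout.write(first)
sys.stdout.write(second)

# writelines emits each string in turn, again without adding newlines.
sys.stdout.writelines(["one\n", "two\n"])
```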
- Control Flow Statements
A program's control flow is the order in which the program's code executes. The control flow of a Python program is regulated by conditional statements, loops, and function calls. This section covers the if statement and for and while loops; functions are covered later in this chapter. Raising and handling exceptions also affects control flow; exceptions are covered in Chapter 6.

Often, you need to execute some statements only if some condition holds, or choose statements to execute depending on several mutually exclusive conditions. The Python compound statement if, which uses if, elif, and else clauses, lets you conditionally execute blocks of statements. Here's the syntax for the if statement:

    if expression:
        statement(s)
    elif expression:
        statement(s)
    elif expression:
        statement(s)
    ...
    else:
        statement(s)

The elif and else clauses are optional. Note that unlike some languages, Python does not have a switch statement, so you must use if, elif, and else for all conditional processing.

Here's a typical if statement:

    if x < 0: print "x is negative"
    elif x % 2: print "x is positive and odd"
    else: print "x is even and non-negative"

When there are multiple statements in a clause (i.e., the clause controls a block of statements), the statements are placed on separate logical lines after the line containing the clause's keyword (known as the header line of the clause) and indented rightward from the header line. The block terminates when the indentation returns to that of the clause header (or further left from there). When there is just a single simple statement, as here, it can follow the : on the same logical line as the header, but it can also be placed on a separate logical line, immediately after the header line and indented rightward from it. Many Python practitioners consider the separate-line style more readable: ...
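To make the if/elif/else chain easy to try out, the sketch below wraps the same three tests in a hypothetical function named describe that returns the message instead of printing it. It is an illustration under that assumption, not the example elided from the text.

```python
def describe(x):
    """Classify an integer with the same tests as the if statement above."""
    if x < 0:
        return "x is negative"
    elif x % 2:
        # x % 2 is non-zero (true) for odd numbers.
        return "x is positive and odd"
    else:
        return "x is even and non-negative"


# A single simple statement may follow the colon on the header line.
sign = 0
if -5 < 0: sign = -1
```

The clauses are tested in order and only the first one whose condition holds executes, so the odd/even test is never reached for negative numbers.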
- Functions
Most statements in a typical Python program are organized into functions. A function is a group of statements that executes upon request. Python provides many built-in functions and allows programmers to define their own functions. A request to execute a function is known as a function call. When a function is called, it may be passed arguments that specify data upon which the function performs its computation. In Python, a function always returns a result value, either None or a value that represents the results of its computation. Functions defined within class statements are also called methods. Issues specific to methods are covered in Chapter 5; the general coverage of functions in this section, however, also applies to methods.

In Python, functions are objects (values) and are handled like other objects. Thus, you can pass a function as an argument in a call to another function. Similarly, a function can return another function as the result of a call. A function, just like any other object, can be bound to a variable, an item in a container, or an attribute of an object. Functions can also be keys into a dictionary. For example, if you need to quickly find a function's inverse given the function, you could define a dictionary whose keys and values are functions and then make the dictionary bidirectional (using some functions from module math, covered in Chapter 15):

    from math import sin, asin, cos, acos, tan, atan, log, exp

    inverse = {sin:asin, cos:acos, tan:atan, log:exp}
    for f in inverse.keys( ):
        inverse[inverse[f]] = f

The fact that functions are objects in Python is often expressed by saying that functions are first-class objects.

The def statement is the most common way to define a function. def is a single-clause compound statement with the following syntax:

    def function-name(parameters):
        statement(s)
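The other claims made in this section (functions can be passed as arguments, returned as results, stored in containers, and always return a value) can be sketched with a few small functions. All the names below are hypothetical.

```python
def apply_twice(func, arg):
    """Take a function as an argument and call it twice."""
    return func(func(arg))


def make_adder(n):
    """Return a new function as the result of the call."""
    def add(x):
        return x + n
    return add


calls = []

def log_call(name):
    # No return statement, so a call to this function returns None.
    calls.append(name)


# Functions can be items in a container, like any other object.
handlers = [abs, make_adder(3)]
results = [h(-5) for h in handlers]
nothing = log_call("first")
```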
- Chapter 5: Object-Oriented Python
Python is an object-oriented programming language. Unlike some other object-oriented languages, Python doesn't force you to use the object-oriented paradigm exclusively. Python also supports procedural programming with modules and functions, so you can select the most suitable programming paradigm for each part of your program. Generally, the object-oriented paradigm is suitable when you want to group state (data) and behavior (code) together in handy packets of functionality. It's also useful when you want to use some of Python's object-oriented mechanisms covered in this chapter, such as inheritance or special methods. The procedural paradigm, based on modules and functions, tends to be simpler and is more suitable when you don't need any of the benefits of object-oriented programming. With Python, you often mix and match the two paradigms.

Python 2.2 and 2.3 are in transition between two slightly different object models. This chapter starts by describing the classic object model, which was the only one available in Python 2.1 and earlier and is still the default model in Python 2.2 and 2.3. The chapter then covers the small differences that define the powerful new-style object model and discusses how to use the new-style object model with Python 2.2 and 2.3. Because the new-style object model builds on the classic one, you'll need to understand the classic model before you can learn about the new model. Finally, the chapter covers special methods for both the classic and new-style object models, as well as metaclasses for Python 2.2 and later.

The new-style object model will become the default in a future version of Python. Even though the classic object model is still the default, I suggest you use the new-style object model when programming with Python 2.2 and later. Its advantages over the classic object model, while small, are measurable, and there are practically no compensating disadvantages. Therefore, it's simpler just to stick to the new-style object model, rather than try to decide which model to use each time you code a new class.
- Classic Classes and Instances
A classic class is a Python object with several characteristics:

- You can call a class object as if it were a function. The call creates another object, known as an instance of the class, that knows what class it belongs to.
- A class has arbitrarily named attributes that you can bind and reference.
- The values of class attributes can be data objects or function objects.
- Class attributes bound to functions are known as methods of the class.
- A method can have a special Python-defined name with two leading and two trailing underscores. Python invokes such special methods, if they are present, when various kinds of operations take place on class instances.
- A class can inherit from other classes, meaning it can delegate to other class objects the lookup of attributes that are not found in the class itself.

An instance of a class is a Python object with arbitrarily named attributes that you can bind and reference. An instance object implicitly delegates to its class the lookup of attributes not found in the instance itself. The class, in turn, may delegate the lookup to the classes from which it inherits, if any.

In Python, classes are objects (values), and are handled like other objects. Thus, you can pass a class as an argument in a call to a function. Similarly, a function can return a class as the result of a call. A class, just like any other object, can be bound to a variable (local or global), an item in a container, or an attribute of an object. Classes can also be keys into a dictionary. The fact that classes are objects in Python is often expressed by saying that classes are first-class objects.
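The attribute-lookup delegation described above (instance, then class, then base classes) can be sketched with two hypothetical classes, Base and Derived. Note one assumption: on a current Python interpreter every class is new-style, so this shows the lookup behavior that both object models share rather than anything specific to classic classes.

```python
class Base:
    greeting = "hello"              # a class attribute holding data

    def greet(self):                # a class attribute holding a function
        return "%s from %s" % (self.greeting, self.__class__.__name__)


class Derived(Base):
    pass                            # inherits every attribute from Base


d = Derived()                       # calling the class creates an instance
before = d.greet()                  # greet and greeting are found via Base

d.greeting = "hi"                   # binds an attribute on the instance only
after = d.greet()

# Classes are first-class objects: here they serve as dictionary keys.
registry = {Base: "base", Derived: "derived"}
```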
- New-Style Classes and Instances
Most of what I have covered so far in this chapter also holds for the new-style object model introduced in Python 2.2. New-style classes and instances are first-class objects just like classic ones, both can have arbitrary attributes, you call a class to create an instance of the class, and so on. In this section, I'm going to cover the few differences between the new-style and classic object models.

In Python 2.2 and 2.3, a class is new-style if it inherits from built-in type object directly or indirectly (i.e., if it subclasses any built-in type, such as list, dict, file, object, and so on). In Python 2.1 and earlier, a class cannot inherit from a built-in type, and built-in type object does not exist. In Section 5.4 later in this chapter, I cover other ways to make a class new-style, ways that you can use in Python 2.2 or later whether a class has superclasses or not.

As I said at the beginning of this chapter, I suggest you get into the habit of using new-style classes when you program in Python 2.2 or later. The new-style object model has small but measurable advantages, and there are practically no compensating disadvantages. It's simpler just to stick to the new-style object model, rather than try to decide which model to use each time you code a new class.

As of Python 2.2, the built-in object type is the ancestor of all built-in types and new-style classes. The object type defines some special methods (as documented in Section 5.3 later in this chapter) that implement the default semantics of objects:

__new__, __init__
    You can create a direct instance of object, and such creation implicitly uses the static method __new__ of type object to create the new instance, and then uses the new instance's ...
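A short sketch of what inheriting from a built-in type looks like follows. The class names Stack and Plain are hypothetical, and the code assumes Python 2.2 or later. On a current Python interpreter the explicit object base is optional; that remark comes from general Python knowledge, not from the text above.

```python
class Stack(list):
    """A new-style class: it subclasses the built-in type list."""

    def push(self, item):
        self.append(item)


class Plain(object):
    """A new-style class that inherits directly from object."""


s = Stack()
s.push(1)
s.push(2)
top = s.pop()            # pop is inherited from list

# object is the ancestor of built-in types and new-style classes.
ancestry = [issubclass(cls, object) for cls in (Stack, Plain, list, dict)]

# You can create a direct instance of object.
o = object()
```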
- Special Methods
A class may define or inherit special methods (i.e., methods whose names begin and end with double underscores). Each special method relates to a specific operation. Python implicitly invokes a special method whenever you perform the related operation on an instance object. In most cases, the method's return value is the operation's result, and attempting an operation when its related method is not present raises an exception. Throughout this section, I will point out the cases in which these general rules do not apply. In the following, x is the instance of class C on which you perform the operation, and y is the other operand, if any. The formal argument self of each method also refers to instance object x.

Some special methods relate to general-purpose operations. A class that defines or inherits these methods allows its instances to control such operations. These operations can be divided into the following categories:

Initialization and finalization
    An instance can control its initialization (a frequent need) via special method __init__, and/or its finalization (a rare need) via __del__.

Representation as string
    An instance can control how Python represents it as a string via special methods __repr__, __str__, and __unicode__.

Comparison, hashing, and use in a Boolean context
    An instance can control how it compares with other objects (methods __lt__ and __cmp__), how dictionaries use it as a key (__hash__), and whether it evaluates to true or false in Boolean contexts (__nonzero__).
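The three categories above can be illustrated with one small class. This is a sketch under stated assumptions: the class Money is hypothetical; __eq__ is not mentioned in the text but is defined so that hashing and equality agree; and __bool__ is the name current Python versions use for what the text calls __nonzero__, so the sketch binds both names to the same function.

```python
class Money(object):
    def __init__(self, cents):              # initialization
        self.cents = cents

    def __repr__(self):                     # representation as string
        return "Money(%r)" % self.cents

    def __str__(self):
        return "$%d.%02d" % (self.cents // 100, self.cents % 100)

    def __lt__(self, other):                # comparison
        return self.cents < other.cents

    def __eq__(self, other):                # assumption: not named in the text
        return self.cents == other.cents

    def __hash__(self):                     # use as a dictionary key
        return hash(self.cents)

    def __nonzero__(self):                  # Boolean context (Python 2 name)
        return self.cents != 0

    __bool__ = __nonzero__                  # same hook under its Python 3 name


price = Money(250)
labels = {Money(5): "nickel"}
```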
- Metaclasses
Any object, even a class object, has a type. In Python, types and classes are also first-class objects. The type of a class object is also known as the class's metaclass. An object's behavior is determined largely by the type of the object. This also holds for classes: a class's behavior is determined largely by the class's metaclass. Metaclasses are an advanced subject, and you may want to skip the rest of this chapter on first reading. However, fully grasping metaclasses can help you obtain a deeper understanding of Python, and sometimes it can even be useful to define your own custom metaclasses.

The distinction between classic and new-style classes relies on the fact that each class's behavior is determined by its metaclass. In other words, the reason classic classes behave differently from new-style classes is that classic and new-style classes are objects of different types (metaclasses):

    class Classic: pass
    class Newstyle(object): pass
    print type(Classic)     # prints: <type 'class'>
    print type(Newstyle)    # prints: <type 'type'>

The type of Classic is object types.ClassType from standard module types, while the type of Newstyle is built-in object type. type is also the metaclass of all Python built-in types, including itself (i.e., print type(type) also prints <type 'type'>).

To execute a class statement, Python first collects the base classes into a tuple t (an empty one, if there are no base classes) and executes the class body in a temporary dictionary d. Then, Python determines the metaclass M to use for the new class object C created by the class statement.

When '__metaclass__' is a key in d, M is d['__metaclass__']. Thus, you can explicitly control class C's metaclass by binding the attribute __metaclass__ in C's class body. Otherwise, when t is non-empty (i.e., when C has one or more base classes), ...
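The relationship between a class and its metaclass can be checked directly. The sketch below also builds a class by calling the built-in type with a name, a tuple of bases, and a dictionary, which mirrors the tuple t and dictionary d mentioned above. The text is truncated before it describes that call, so treat the second half as an assumption drawn from general Python knowledge; the names C and describe are hypothetical.

```python
class Newstyle(object):
    pass


# The metaclass of a new-style class is the built-in type,
# and type is its own metaclass.
metaclass_of_newstyle = type(Newstyle)
metaclass_of_type = type(type)


def describe(self):
    return "x is %d" % self.x


# Calling type(name, bases, dict) creates a class object, much as a
# class statement with that name, those bases, and that body would.
C = type("C", (object,), {"x": 1, "describe": describe})
description = C().describe()
```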
- Chapter 6: Exceptions
Python uses exceptions to communicate errors and anomalies. An exception is an object that indicates an error or anomalous condition. When Python detects an error, it raises an exception; that is, it signals the occurrence of an anomalous condition by passing an exception object to the exception-propagation mechanism. Your code can also explicitly raise an exception by executing a raise statement.

Handling an exception means receiving the exception object from the propagation mechanism and performing whatever actions are needed to deal with the anomalous situation. If a program does not handle an exception, it terminates with an error traceback message. However, a program can handle exceptions and keep running despite errors or other abnormal conditions.

Python also uses exceptions to indicate some special situations that are not errors, and are not even abnormal occurrences. For example, as covered in Chapter 4, an iterator's next method raises the exception StopIteration when the iterator has no more items. This is not an error, and it is not even an anomalous condition, since most iterators run out of items eventually.
- The try Statement
The try statement provides Python's exception-handling mechanism. It is a compound statement that can take one of two different forms:

- A try clause followed by one or more except clauses
- A try clause followed by exactly one finally clause

Here's the syntax for the try/except form of the try statement:

    try:
        statement(s)
    except [expression [, target]]:
        statement(s)
    [else:
        statement(s)]

This form of the try statement has one or more except clauses, as well as an optional else clause.

The body of each except clause is known as an exception handler. The code executes if the expression in the except clause matches an exception object that propagates from the try clause. expression is an optional class or tuple of classes that matches any exception object of one of the listed classes or any of their subclasses. The optional target is an identifier that names a variable that Python binds to the exception object just before the exception handler executes. A handler can also obtain the current exception object by calling the exc_info function of module sys (covered in Chapter 8).

Here is an example of the try/except form of the try statement:

    try:
        1/0
    except ZeroDivisionError:
        print "caught divide-by-0 attempt"
If a try statement has several except clauses, the exception propagation mechanism tests the except clauses in order: the first except clause whose expression matches the exception object is used as the handler. Thus, you must always list handlers for specific cases before you list handlers for more general cases. If you list a general case first, the more specific except clauses that follow will never enter the picture.

The last except clause may lack an expression. This clause handles any exception that reaches it during propagation. Such unconditional handling is a rare need, but it does occur, generally in wrapper functions that must perform some extra task before reraising an exception, as we'll discuss later in the chapter.
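The ordering rule for except clauses can be illustrated with a minimal sketch. It is written in Python 3 syntax (`except X as err` rather than the `except X, err` form of the text), and the names `classify` and `fail_with` are hypothetical helpers invented for this illustration.

```python
def classify(func):
    """Call func and report which handler, if any, dealt with its exception."""
    try:
        result = func()
    except ZeroDivisionError as err:   # most specific case listed first
        return "ZeroDivisionError handler: %s" % err
    except ArithmeticError as err:     # more general: any other arithmetic error
        return "ArithmeticError handler: %s" % err
    except Exception as err:           # most general case listed last
        return "Exception handler: %s" % err
    else:                              # runs only when the try clause raised nothing
        return "no exception, result=%r" % (result,)


def fail_with(exc):
    """Return a function that raises the given exception instance when called."""
    def thunk():
        raise exc
    return thunk


print(classify(lambda: 1 / 0))
print(classify(fail_with(OverflowError("too big"))))
print(classify(fail_with(KeyError("k"))))
print(classify(lambda: 7))
```

If the `ArithmeticError` clause were listed before the `ZeroDivisionError` clause, the more specific handler would never run, because `ZeroDivisionError` is a subclass of `ArithmeticError`.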
- Exception Propagation
When an exception is raised, the exception-propagation mechanism takes control. The normal control flow of the program stops, and Python looks for a suitable exception handler. Python's try statement establishes exception handlers via its except clauses. The handlers deal with exceptions raised in the body of the try clause, as well as exceptions that propagate from any of the functions called by that code, directly or indirectly. If an exception is raised within a try clause that has an applicable except handler, the try clause terminates and the handler executes. When the handler finishes, execution continues with the statement after the try statement.

If the statement raising the exception is not within a try clause that has an applicable handler, the function containing the statement terminates, and the exception propagates upward to the statement that called the function. If the call to the terminated function is within a try clause that has an applicable handler, that try clause terminates, and the handler executes. Otherwise, the function containing the call terminates, and the propagation process repeats, unwinding the stack of function calls until an applicable handler is found.

If Python cannot find such a handler, by default the program prints an error message to the standard error stream (the file sys.stderr). The error message includes a traceback that gives details about functions terminated during propagation. You can change Python's default error-reporting behavior by setting sys.excepthook (covered in Chapter 8). After error reporting, Python goes back to the interactive session, if any, or terminates if no interactive session is active. When the exception class is SystemExit, termination is silent and includes the interactive session, if any.

Here are some functions that we can use to see exception propagation at work.

    def f():
        print "in f, before 1/0"
        1/0    # raises a ZeroDivisionError exception
        print "in f, after 1/0"

    def g():
        print "in g, before f()"
        f()
        print "in g, after f()"

    def h():
        print "in h, before g()"
        try:
            g()
            print "in h, after g()"
        except ZeroDivisionError:
            print "ZD exception caught"
        print "function h ends"
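To see which statements run during propagation, the same three functions can be sketched in Python 3 syntax. As an assumption made only for this illustration, each message is appended to a list named `trace` (not part of the original) so that the order of events can be inspected after the call.

```python
trace = []  # hypothetical list used to record the order of events


def f():
    trace.append("in f, before 1/0")
    1 / 0  # raises ZeroDivisionError; f terminates here
    trace.append("in f, after 1/0")


def g():
    trace.append("in g, before f()")
    f()  # the exception propagates through g, which also terminates
    trace.append("in g, after f()")


def h():
    trace.append("in h, before g()")
    try:
        g()
        trace.append("in h, after g()")
    except ZeroDivisionError:
        trace.append("ZD exception caught")
    trace.append("function h ends")


h()
for line in trace:
    print(line)
```

None of the "after" messages is recorded: `f` and `g` are terminated by the propagating exception, and the rest of the try clause in `h` is skipped once the handler takes over.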
- The raise Statement
You can use the raise statement to raise an exception explicitly. raise is a simple statement with the following syntax:

    raise [expression1[, expression2]]

Only an exception handler (or a function that a handler calls, directly or indirectly) can use raise without any expressions. A plain raise statement reraises the same exception object that the handler received. The handler terminates, and the exception propagation mechanism keeps searching for other applicable handlers. Using a raise without expressions is useful when a handler discovers that it is unable to handle an exception it receives, so the exception should keep propagating.

When only expression1 is present, it can be an instance object or a class object. In this case, if expression1 is an instance object, Python raises that instance. When expression1 is a class object, raise instantiates the class without arguments and raises the resulting instance. When both expressions are present, expression1 must be a class object. raise instantiates the class, with expression2 as the argument (or multiple arguments if expression2 is a tuple), and raises the resulting instance.

Here's an example of a typical use of the raise statement:

    def crossProduct(seq1, seq2):
        if not seq1 or not seq2:
            raise ValueError, "Sequence arguments must be non-empty"
        return [ (x1, x2) for x1 in seq1 for x2 in seq2 ]
The crossProduct function returns a list of all pairs with one item from each of its sequence arguments, but first it tests both arguments. If either argument is empty, the function raises ValueError, rather than just returning an empty list as the list comprehension would normally do. Note that there is no need for crossProduct to test if seq1 and seq2 are iterable: if either isn't, the list comprehension itself will raise the appropriate exception, presumably a TypeError. Once an exception is raised, be it by Python itself or with an explicit raise statement in your code, it's up to the caller to either handle it (with a suitable ...
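The plain raise form described earlier in this section, used inside a handler, might look like the following sketch. It uses Python 3 syntax, so the exception is raised as `ValueError("...")` rather than with the two-expression form. The names `cross_product`, `logged_cross_product`, and `log` are assumptions made for this illustration.

```python
log = []  # hypothetical record of failures, for illustration only


def cross_product(seq1, seq2):
    if not seq1 or not seq2:
        raise ValueError("Sequence arguments must be non-empty")
    return [(x1, x2) for x1 in seq1 for x2 in seq2]


def logged_cross_product(seq1, seq2):
    """Note the failure, then let the same exception keep propagating."""
    try:
        return cross_product(seq1, seq2)
    except ValueError as err:
        log.append("cross_product failed: %s" % err)
        raise  # plain raise: re-raises the exception this handler received


try:
    logged_cross_product([], [1, 2])
except ValueError as err:
    caught = err

print(log)
print(logged_cross_product([1, 2], "ab"))
```

The wrapper does some extra work and then steps aside; the caller still receives the original `ValueError`.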
- Exception Objects
Exceptions are instances of subclasses of the built-in Exception class. For backward compatibility, Python also lets you use strings, or instances of any class, as exception objects, but such usage risks future incompatibility and gives no benefits. An instance of any subclass of Exception has an attribute args, the tuple of arguments used to create the instance. args holds error-specific information, usable for diagnostic or recovery purposes.

All exceptions that Python itself raises are instances of subclasses of Exception. The inheritance structure of exception classes is important, as it determines which except clauses handle which exceptions.

The SystemExit class inherits directly from Exception. Instances of SystemExit are normally raised by the exit function in module sys (covered in Chapter 8).

Other standard exceptions derive from StandardError, a direct subclass of Exception. Three subclasses of StandardError, like StandardError itself and Exception, are never instantiated directly. Their purpose is to make it easier for you to specify except clauses that handle a broad range of related errors. These subclasses are:

- ArithmeticError: The base class for exceptions due to arithmetic errors (i.e., OverflowError, ZeroDivisionError, FloatingPointError)
- LookupError: The base class for exceptions that a container raises when it receives an invalid key or index (i.e., IndexError, KeyError ...
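As a small illustration of how the class hierarchy decides which except clause applies, the sketch below catches LookupError and so handles both IndexError and KeyError. It is written for Python 3, where StandardError no longer exists, so only the ArithmeticError and LookupError relationships are shown. The function name `lookup` is a hypothetical helper.

```python
def lookup(container, key):
    """Return container[key], or a (class name, args) pair if the lookup fails."""
    try:
        return container[key]
    except LookupError as err:  # matches IndexError and KeyError alike
        return (type(err).__name__, err.args)


print(lookup([1, 2, 3], 1))
print(lookup([1, 2, 3], 10))
print(lookup({"a": 1}, "b"))

# The subclass relationships that make the single handler sufficient:
print(issubclass(IndexError, LookupError), issubclass(KeyError, LookupError))
print(issubclass(ZeroDivisionError, ArithmeticError))
```

The `args` tuple carries the error-specific information mentioned above: for the failed dictionary lookup it holds the missing key.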
- Custom Exception Classes
You can subclass any of the standard exception classes in order to define your own exception class. Typically, such a subclass adds nothing more than a docstring:

    class InvalidAttribute(AttributeError):
        "Used to indicate attributes that could never be valid"

Given the semantics of try/except, raising a custom exception class such as InvalidAttribute is almost the same as raising its standard exception superclass, AttributeError. Any except clause able to handle AttributeError can handle InvalidAttribute just as well. In addition, client code that knows specifically about your InvalidAttribute custom exception class can handle it specifically, without having to handle all other cases of AttributeError if it is not prepared for those. For example:

    class SomeFunkyClass(object):
        "much hypothetical functionality snipped"
        def __getattr__(self, name):
            "this __getattr__ only clarifies the kind of attribute error"
            if name.startswith('_'):
                raise InvalidAttribute, "Unknown private attribute "+name
            else:
                raise AttributeError, "Unknown attribute "+name

Now client code can be more selective in its handlers. For example:

    import warnings

    s = SomeFunkyClass()
    try:
        value = getattr(s, thename)
    except InvalidAttribute, err:
        warnings.warn(str(err))
        value = None
    # other cases of AttributeError just propagate, as they're unexpected

A special case of custom exception class that you may sometimes find useful is one that wraps another exception and adds further information. To gather information about a pending exception, you can use the exc_info function from module sys (covered in Chapter 8). Given this, your custom exception class could be defined as follows:

    import sys
    class CustomException(Exception):
        "Wrap arbitrary pending exception, if any, in addition to other info"
        def __init__(self, *args):
            Exception.__init__(self, *args)
            self.wrapped_exc = sys.exc_info()
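One way such a wrapping class might be used is sketched below in Python 3 syntax. The helper `call_wrapped` and the function `divide` are hypothetical names introduced for this illustration; the point is only that `sys.exc_info()` is called while the original exception is still being handled, so the wrapper records it.

```python
import sys


class CustomException(Exception):
    """Wrap the pending exception, if any, in addition to other info."""

    def __init__(self, *args):
        super().__init__(*args)
        # (class, instance, traceback) of the exception being handled, if any
        self.wrapped_exc = sys.exc_info()


def call_wrapped(func, *args, **kwds):
    """Call func; re-express any exception it raises as a CustomException."""
    try:
        return func(*args, **kwds)
    except Exception:
        # Instantiated inside the handler, so exc_info() sees the original error.
        raise CustomException("Wrapped function propagated exception")


def divide(x, y):
    return x / y


try:
    call_wrapped(divide, 1, 0)
except CustomException as err:
    wrapped_type = err.wrapped_exc[0]
    message = str(err)

print(message, wrapped_type.__name__)
```

Client code can then handle a single exception class while still being able to inspect what went wrong underneath.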
- Error-Checking Strategies
Most programming languages that support exceptions are geared to raise exceptions only in very rare cases. Python's emphasis is different. In Python, exceptions are considered appropriate whenever they make a program simpler and more robust. A common idiom in other languages, sometimes known as "look before you leap" (LBYL), is to check in advance, before attempting an operation, for all circumstances that might make the operation invalid. This is not ideal, for several reasons:

- The checks may diminish the readability and clarity of the common, mainstream cases where everything is okay.
- The work needed for checking may duplicate a substantial part of the work done in the operation itself.
- The programmer might easily err by omitting some needed check.
- The situation might change between the moment the checks are performed and the moment the operation is attempted.

The preferred idiom in Python is generally to attempt the operation in a try clause and handle the exceptions that may result in except clauses. This idiom is known as "it's easier to ask forgiveness than permission" (EAFP), a motto widely credited to Admiral Grace Murray Hopper, co-inventor of COBOL, and shares none of the defects of "look before you leap." Here is a function written using the LBYL idiom:

    def safe_divide_1(x, y):
        if y == 0:
            print "Divide-by-0 attempt detected"
            return None
        else:
            return x/y

With LBYL, the checks come first, and the mainstream case is somewhat hidden at the end of the function.

Here is the equivalent function written using the EAFP idiom:

    def safe_divide_2(x, y):
        try:
            return x/y
        except ZeroDivisionError:
            print "Divide-by-0 attempt detected"
            return None
- Chapter 7: Modules
A typical Python program is made up of several source files. Each source file corresponds to a module, which packages program code and data for reuse. Modules are normally independent of each other so that other programs can reuse the specific modules they need. A module explicitly establishes dependencies upon another module by using import or from statements. In some other programming languages, global variables can provide a hidden conduit for coupling between modules. In Python, however, global variables are not global to all modules, but instead such variables are attributes of a single module object. Thus, Python modules communicate in explicit and maintainable ways.

Python also supports extensions, which are components written in other languages, such as C, C++, or Java, for use with Python. Extensions are seen as modules by the Python code that uses them (called client code). From the client code viewpoint, it does not matter whether a module is 100% pure Python or an extension. You can always start by coding a module in Python. Later, if you need better performance, you can recode some modules in a lower-level language without changing the client code that uses the modules. Chapter 24 and Chapter 25 discuss writing extensions in C and Java.

This chapter discusses module creation and loading. It also covers grouping modules into packages, which are modules that contain other modules, forming a hierarchical, tree-like structure. Finally, the chapter discusses using Python's distribution utilities (distutils) to prepare packages and modules for distribution and to install distributed packages and modules.
- Module Objects
A module is a Python object with arbitrarily named attributes that you can bind and reference. The Python code for a module named aname normally resides in a file named aname.py, as covered in Section 7.2 later in this chapter.

In Python, modules are objects (values) and are handled like other objects. Thus, you can pass a module as an argument in a call to a function. Similarly, a function can return a module as the result of a call. A module, just like any other object, can be bound to a variable, an item in a container, or an attribute of an object. For example, the sys.modules dictionary, covered later in this chapter, holds module objects as its values.

You can use any Python source file as a module by executing an import statement in some other code. import has the following syntax:

    import modname [as varname][,...]

The import keyword is followed by one or more module specifiers, separated by commas. In the simplest and most common case, modname is an identifier, the name of a variable that Python binds to the module object when the import statement finishes. In this case, Python looks for the module of the same name to satisfy the import request. For example:

    import MyModule

looks for the module named MyModule and binds the variable named MyModule in the current scope to the module object. modname can also be a sequence of identifiers separated by dots (.) that names a module in a package, as covered later in this chapter.

When as varname is part of an import statement, Python binds the variable named varname to the module object, but the module name that Python looks for is modname. For example:

    import MyModule as Alias

looks for the module named MyModule and binds the variable named Alias in the current scope to the module object. varname is always a simple identifier.
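A brief sketch of modules being handled like any other object, using the standard math module as a stand-in for MyModule. The names `describe` and `registry` are hypothetical and exist only for this illustration.

```python
import sys
import math as m  # Python looks for the module named math; the bound name is m


def describe(module):
    """A module can be passed as an argument like any other object."""
    return module.__name__


registry = {"numeric": m}  # a module bound to an item in a container

print(describe(m))
print(sys.modules["math"] is m)
print(registry["numeric"].sqrt(16))
```

The name `math` itself is never bound by `import math as m`; only `m` is, although the module remains registered in `sys.modules` under its real name.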
- Module Loading
Module-loading operations rely on attributes of the built-in sys module (covered in Chapter 8). The module-loading process described here is carried out by built-in function __import__. Your code can call __import__ directly, with the module name string as an argument. __import__ returns the module object or raises ImportError if the import fails.

To import a module named M, __import__ first checks dictionary sys.modules, using string M as the key. When key M is in the dictionary, __import__ returns the corresponding value as the requested module object. Otherwise, __import__ binds sys.modules[M] to a new empty module object with a __name__ of M, then looks for the right way to initialize (load) the module, as covered in Section 7.2.2 later in this section.

Thanks to this mechanism, the loading operation takes place only the first time a module is imported in a given run of the program. When a module is imported again, the module is not reloaded, since __import__ finds and returns the module's entry in sys.modules. Thus, all imports of a module after the first one are extremely fast because they're just dictionary lookups.

When a module is loaded, __import__ first checks whether the module is built-in. Built-in modules are listed in tuple sys.builtin_module_names, but rebinding that tuple does not affect module loading. A built-in module, like any other Python extension, is initialized by calling the module's initialization function. The search for built-in modules also finds frozen modules and modules in platform-specific locations (e.g., resources on the Mac, the Registry in Windows).

If module M is not built-in or frozen, __import__ looks for M's code as a file on the filesystem.
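The caching behavior of `sys.modules` can be observed with a short sketch, run here under Python 3. The module name passed to the failing call is a deliberately nonexistent, made-up name.

```python
import sys

first = __import__("string")   # loads the module, or finds it already cached
second = __import__("string")  # a later import is just a dictionary lookup

print(first is second)
print(sys.modules["string"] is first)
print(isinstance(sys.builtin_module_names, tuple))

try:
    __import__("no_such_module_for_this_demo")  # hypothetical missing module
except ImportError as err:
    import_failed = True
    print("import failed:", err)
else:
    import_failed = False
```

Both calls return the very same module object, which is why repeated imports of a module are cheap.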
- Packages
A package is a module that contains other modules. Modules in a package may be subpackages, resulting in a hierarchical tree-like structure. A package named P resides in a subdirectory, also called P, of some directory in sys.path. The module body of P is in the file P/__init__.py. You must have a file named P/__init__.py, even if it's empty (representing an empty module body), in order to indicate to Python that directory P is indeed a package. Other .py files in directory P are the modules of package P. Subdirectories of P containing __init__.py files are subpackages of P. Nesting can continue to any depth.

You can import a module named M in package P as P.M. More dots let you navigate a hierarchical package structure. A package is always loaded before a module in the package is loaded. If you use the syntax import P.M, variable P is bound to the module object of package P, and attribute M of object P is bound to module P.M. If you use the syntax import P.M as V, variable V is bound directly to module P.M.

Using from P import M to import a specific module M from package P is fully acceptable programming practice. In other words, the from statement is specifically okay in this case.

A module M in a package P can import any other module X of P with the statement import X. Python searches the module's own package directory before searching the directories in sys.path. However, this applies only to sibling modules, not to ancestors or other more-complicated relationships. The simplest, cleanest way to share objects (such as functions or constants) among modules in a package P is to group the shared objects in a file named P/Common.py. Then you can import Common from every module in the package that needs to access the objects, and then refer to the objects as Common.f, Common ...
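The directory layout and the two import forms can be tried out with a throwaway package built in a temporary directory. This is a sketch for Python 3, and the package name `demo_pkg` and module name `greet` are invented for the illustration. It deliberately avoids the sibling `import X` form described above, since Python 3 no longer supports implicit relative imports.

```python
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()                 # stands in for a directory on sys.path
pkg_dir = os.path.join(root, "demo_pkg")  # hypothetical package P
os.mkdir(pkg_dir)

# An empty __init__.py is an empty module body that marks the directory as a package.
with open(os.path.join(pkg_dir, "__init__.py"), "w") as fh:
    fh.write("")

# A module M inside the package.
with open(os.path.join(pkg_dir, "greet.py"), "w") as fh:
    fh.write("def hello():\n    return 'hello from demo_pkg.greet'\n")

sys.path.insert(0, root)
importlib.invalidate_caches()

import demo_pkg.greet        # binds demo_pkg; greet becomes an attribute of it
import demo_pkg.greet as V   # binds V directly to the module demo_pkg.greet

print(demo_pkg.greet is V)
print(V.hello())
```

After the first import, both `demo_pkg` and `demo_pkg.greet` have entries in `sys.modules`, reflecting that the package is loaded before the module it contains.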
- The Distribution Utilities (distutils)
Python modules, extensions, and applications can be packaged and distributed in several forms:

- Compressed archive files: Generally .zip for Windows and .tar.gz or .tgz for Unix-based systems, but both forms are portable
- Self-unpacking or self-installing executables: Normally .exe for Windows
- Platform-specific installers: For example, .msi on Windows, .rpm and .srpm on Linux, and .deb on Debian GNU/Linux

When you distribute a package as a self-installing executable or platform-specific installer, a user can then install the package simply by running the installer. How to run such an installer program depends on the platform, but it no longer matters what language the program was written in.

When you distribute a package as an archive file or as an executable that unpacks but does not install itself, it does matter that the package was coded in Python. In this case, the user must first unpack the archive file into some appropriate directory, say C:\Temp\MyPack on a Windows machine or ~/MyPack on a Unix-like machine. Among the extracted files there should be a script, conventionally named setup.py, that uses the Python facility known as the distribution utilities (package distutils). The distributed package is then almost as easy to install as a self-installing executable would be. The user opens a command-prompt window and changes to the directory into which the archive is unpacked. Then the user runs, for example:

    C:\Temp\MyPack> python setup.py install

The setup.py script, run with this ...
- Chapter 8: Core Built-ins
The term built-in has more than one meaning in Python. In most contexts, a built-in is any object directly accessible to a Python program without an import statement. Chapter 7 showed the mechanism that Python uses to allow this direct access. Built-in types in Python include numbers, sequences, dictionaries, functions (covered in Chapter 4), classes (covered in Chapter 5), the standard exception classes (covered in Chapter 6), and modules (covered in Chapter 7). The built-in file object is covered in Chapter 10, and other built-in types covered in Chapter 13 are intrinsic to Python's internal operation. This chapter provides additional coverage of the core built-in types, and it also covers the built-in functions available in module __builtin__.

As I mentioned in Chapter 7, some modules are called built-in because they are an integral part of the Python standard library, even though it takes an import statement to access them. Built-in modules are distinct from separate, optional add-on modules, also called Python extensions. This chapter documents the following core built-in modules: sys, getopt, copy, bisect, UserList, UserDict, and UserString. Chapter 9 covers some string-related core built-in modules, while Parts III and IV of the book cover many other useful built-in modules.
- Built-in Types
This section documents Python's core built-in types, like int, float, and dict. Note that prior to Python 2.2, these names referred to factory functions for creating objects of these types. As of Python 2.2, however, they refer to actual type objects. Since you can call type objects just as if they were functions, this change does not break existing programs.

classmethod (Python 2.2 and later)

    classmethod(function)

Creates and returns a class method object. In practice, you call this built-in type only within a class body. See Section 5.2.2.2.

complex

    complex(real,imag=0)

Converts any number, or a suitable string, to a complex number. imag may be present only when real is a number, and is the imaginary part of the resulting complex number.

dict (Python 2.2 and later)

    dict(x={})

Returns a new dictionary object with the same items as argument x. When x is a dictionary, dict(x) returns a copy of x, like x.copy() does. Alternatively, x can be a sequence of pairs, that is, a sequence whose items are sequences with two items each. In this case, dict(x) returns a dictionary whose keys are the first items of each pair in x, while the corresponding values are the corresponding second items. In other words, when ...
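The two ways of calling dict described above, along with complex, can be sketched as follows. The variable names are arbitrary, and the behavior shown is that of current Python 3, which matches the description here.

```python
# dict from a sequence of pairs: first items become keys, second items values
pairs = [("one", 1), ("two", 2), ("three", 3)]
from_pairs = dict(pairs)

# dict from a dictionary: a new (shallow) copy, like x.copy()
original = {"a": [1, 2]}
duplicate = dict(original)
duplicate["b"] = 0  # changes the copy only

# complex from two numbers, or from a suitable string
z = complex(1, 2)
z_from_string = complex("1+2j")

print(from_pairs)
print(original, duplicate)
print(z, z_from_string)
```

The copy is shallow: `duplicate["a"]` and `original["a"]` refer to the same list object, even though the two dictionaries are distinct.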
- Built-in Functions
This section documents the Python functions available in module __builtin__ in alphabetical order. Note that the names of these built-ins are not reserved words. Thus, your program can bind for its own purposes, in local or global scope, an identifier that has the same name as a built-in function. Names bound in local or global scope have priority over names bound in built-in scope, so local and global names hide built-in ones. You can also rebind names in built-in scope, as covered in Chapter 7. You should avoid hiding built-ins that your code might need.

__import__

    __import__(module_name[,globals[,locals[,fromlist]]])

Loads the module named by string module_name and returns the resulting module object. globals, which defaults to the result of globals(), and locals, which defaults to the result of locals() (both covered in this section), are dictionaries that __import__ treats as read-only and uses only to get context for package-relative imports, covered in Section 7.3. fromlist defaults to an empty list, but can be a list of strings that name the module attributes to be imported in a from statement. See Section 7.2 for more details on module loading.

In practice, when you call __import__, you generally pass only the first argument, except in the rare and dubious case in which you use __import__ for a package-relative import. When you replace the built-in __import__ function with your own in order to provide special import functionality, you may have to take globals, locals, and fromlist into account.

abs

    abs(x)

Returns the absolute value of number ...
- The sys Module
The attributes of the sys module are bound to data and functions that provide information on the state of the Python interpreter or that affect the interpreter directly. This section documents the most frequently used attributes of sys, in alphabetical order.

argv

The list of command-line arguments passed to the main script. argv[0] is the name or full path of the main script, or '-c' if the -c option was used. See Section 8.4 later in this chapter for a good way to use sys.argv.

displayhook

    displayhook(value)

In interactive sessions, the Python interpreter calls displayhook, passing it the result of each expression-statement entered. The default displayhook does nothing if value is None, otherwise it preserves and displays value:

    if value is not None:
        __builtin__._ = value
        print repr(value)

You can rebind sys.displayhook in order to change interactive behavior. The original value is available as sys.__displayhook__.

excepthook

    excepthook(type,value,traceback)

When an exception is not caught by any handler, Python calls excepthook, passing it the exception class, exception object, and traceback object, as covered in Chapter 6. The default excepthook displays the error and traceback. You can rebind sys.excepthook to change what is displayed for uncaught exceptions (just before Python returns to the interactive loop or terminates). The original value is also available as ...
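Rebinding sys.excepthook might look like the sketch below, in Python 3 syntax. The hook name `brief_excepthook` and the list `reports` are assumptions made for this illustration. Because a truly uncaught exception would end the script, the hook is invoked by hand inside a handler here; Python itself would call it only for an exception that no handler catches.

```python
import sys

reports = []  # hypothetical sink for one-line error reports


def brief_excepthook(exc_type, exc_value, tb):
    """Record a one-line report instead of printing a full traceback."""
    reports.append("%s: %s" % (exc_type.__name__, exc_value))


previous_hook = sys.excepthook
sys.excepthook = brief_excepthook
try:
    try:
        1 / 0
    except ZeroDivisionError:
        # Called explicitly for demonstration purposes only.
        sys.excepthook(*sys.exc_info())
finally:
    sys.excepthook = previous_hook  # restore whatever hook was installed before

print(reports)
```

Restoring the previous hook in a finally clause keeps the change from leaking into the rest of the program.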
- The getopt Module
The getopt module helps parse the command-line options and arguments passed to a Python program, available in sys.argv. The getopt module distinguishes arguments proper from options: options start with '-' (or '--' for long-form options). The first non-option argument terminates option parsing (similar to most Unix commands, and differently from GNU and Windows commands). Module getopt supplies a single function, also called getopt.

getopt

    getopt(args,options,long_options=[])

Parses command-line options. args is usually sys.argv[1:]. options is a string: each character is an option letter, followed by ':' if the option takes a parameter. long_options is a list of strings, each a long-option name, without the leading '--', followed by '=' if the option takes a parameter.

When getopt encounters an error, it raises GetoptError, an exception class supplied by the getopt module. Otherwise, getopt returns a pair (opts,args_proper), where opts is a list of pairs of the form (option,parameter) in the same order in which options are found in args. Each option is a string that starts with a single hyphen for a short-form option or two hyphens for a long-form one; each parameter is also a string (an empty string for options that don't take parameters). args_proper is the list of program argument strings that are left after removing the options.
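A short sketch of a getopt call follows. The argument list stands in for sys.argv[1:], and the option letters and the long option name `mode` are made up for the illustration.

```python
import getopt

# Stand-in for sys.argv[1:]; note the "-x" that follows the first non-option.
argv_tail = ["-v", "-o", "out.txt", "--mode=fast", "input1", "-x"]

# "vo:" means -v takes no parameter and -o takes one; "mode=" takes a parameter.
opts, args_proper = getopt.getopt(argv_tail, "vo:", ["mode="])

print(opts)         # [('-v', ''), ('-o', 'out.txt'), ('--mode', 'fast')]
print(args_proper)  # ['input1', '-x']

try:
    getopt.getopt(["-z"], "vo:")  # -z is not a declared option
except getopt.GetoptError as err:
    unknown_option_error = str(err)
    print("error:", unknown_option_error)
```

Because the first non-option argument ends option parsing, the trailing `-x` is returned as an ordinary argument rather than reported as an unknown option.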
- The copy Module
- Content preview·Buy reprint rights for this chapterAs discussed in Chapter 4, assignment in Python does not copy the right-hand side object being assigned. Rather, assignment adds a reference to the right-hand side object. When you want a copy of object x, you can ask x for a copy of itself. If x is a list, x
[:]
is a copy of x. If x is a dictionary, x.copy( )
returns a copy of x.Thecopy
module supplies acopy
function that creates and returns a copy of most types of objects. Normal copies, such as x[:]
for a list x andcopy.copy(
x)
, are also known as shallow copies. When x has references to other objects (e.g., items or attributes), a normal copy of x has distinct references to the same objects. Sometimes, however, you need a deep copy, where referenced objects are copied recursively. Modulecopy
supplies adeepcopy(
x)
function that performs a deep copy and returns it as the function's result.copycopy(x)Creates and returns a copy of x for x of most types (copies of modules, classes, frames, arrays, and internal types are not supported). If x is immutable,copy.copy(
x)
may return x itself as an optimization. A class can customize the waycopy.copy
copies its instances by having a special method__copy_ _(self)
that returns a new object, a copy ofself
.deepcopydeepcopy(x,[memo])Makes a deep copy of x and returns it. Deep copying implies a recursive walk over a directed graph of references. A precaution is needed to preserve the graph's shape: when references to the same object are met more than once during the walk, distinct copies must not be made. Rather, references to the same copied object must be used. Consider the following simple example:Additional content appearing in this section has been removed.
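The book's own example is not part of this preview. As a stand-in sketch (the variable names are assumptions), the following contrasts a shallow copy with a deep copy and checks that deepcopy preserves the shape of the reference graph when the same object is referenced twice:

```python
import copy

inner = [1, 2]
outer = [inner, inner]          # two references to the same list

shallow = copy.copy(outer)      # new outer list, same inner objects
deep = copy.deepcopy(outer)     # new outer list, inner objects copied

# The shallow copy shares the inner list with the original.
print(shallow[0] is inner)

# The deep copy has its own inner list...
print(deep[0] is inner)

# ...but only one: both items still refer to the same copied object.
print(deep[0] is deep[1])
```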
- The bisect Module

The bisect module uses a bisection algorithm to keep a list in sorted order as items are inserted. bisect's operation is faster than calling a list's sort method after each insertion. This section documents the main functions supplied by bisect.

bisect
bisect(seq, item, lo=0, hi=sys.maxint)

Returns the index i into seq where item should be inserted to keep seq sorted. In other words, i is such that each item in seq[:i] is less than or equal to item, and each item in seq[i:] is greater than item. seq must be a sorted sequence. For any nonempty sorted sequence seq, seq[bisect(seq, y)-1] == y is equivalent to y in seq, but faster if len(seq) is large. You may pass optional arguments lo and hi to operate on the slice seq[lo:hi].

insort
insort(seq, item, lo=0, hi=sys.maxint)

Like seq.insert(bisect(seq, item), item). In other words, seq must be a sorted mutable sequence, and insort modifies seq by inserting item at the right spot, so that seq remains sorted. You may pass optional arguments lo and hi to operate on the slice seq[lo:hi].

Module bisect also supplies functions bisect_left [...]
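A short sketch of bisect and insort as described above, written for Python 3 (where the default for hi is no longer sys.maxint). The sample data and the helper contains are assumptions for illustration; the helper guards against an empty sequence, for which the index-based membership test would otherwise fail:

```python
import bisect

scores = [10, 20, 20, 30]

# bisect returns the insertion point to the right of any equal items.
index = bisect.bisect(scores, 20)
print(index)

# insort inserts in place and keeps the list sorted.
bisect.insort(scores, 25)
print(scores)


def contains(sorted_seq, y):
    """Hypothetical helper: membership test by bisection on a sorted sequence."""
    i = bisect.bisect(sorted_seq, y)
    return i > 0 and sorted_seq[i - 1] == y


print(contains(scores, 25), contains(scores, 15))
```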
- The UserList, UserDict, and UserString Modules

The UserList, UserDict, and UserString modules each supply one class, with the same name as the respective module, that implements all the methods needed for the class's instances to be mutable sequences, mappings, and strings, respectively. When you need such polymorphism, you can subclass one of these classes and override some methods rather than have to implement everything yourself. In Python 2.2 and later, you can subclass built-in types list, dict, and str directly, to similar effect (see Section 5.2). However, these modules can still be handy if you need to create a classic class in order to keep your code compatible with Python 2.1 or earlier.

Each instance of one of these classes has an attribute called data that is a Python object of the corresponding built-in type (list, dict, and str, respectively). You can instantiate each class with an argument of the appropriate type (the argument is copied, so you can later modify it without side effects). UserList and UserDict can also be instantiated without arguments to create initially empty containers.

Module UserString also supplies class MutableString, which is very similar to class UserString except that instances of MutableString are mutable. Instances of MutableString and its subclasses cannot be keys into a dictionary. Instances of both UserString and MutableString can be Unicode strings rather than plain strings: just use a Unicode string as the initializer argument at instantiation time.

If you subclass UserList, UserDict, UserString, or MutableString and then override __init__, make sure the __init__ method you write can also be called with one argument of the appropriate type (as well as without arguments for UserList and UserDict). Also be sure that your __init__ method explicitly and appropriately calls the [...]
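The sketch below illustrates the subclassing advice using Python 3, where the equivalent class lives in collections.UserList rather than in a UserList module (that substitution is an assumption of this example, as is the EvenList class itself). Its __init__ accepts zero or one argument and calls the base-class initializer, which matters because inherited operations such as + build their result by calling the class with a single list argument:

```python
from collections import UserList


class EvenList(UserList):
    """Hypothetical list-like class that accepts only even integers."""

    def __init__(self, initlist=None):
        # Let the base class copy the argument into self.data.
        super().__init__(initlist)
        for item in self.data:
            self._check(item)

    @staticmethod
    def _check(item):
        if item % 2:
            raise ValueError("odd value: %r" % (item,))

    def append(self, item):
        self._check(item)
        super().append(item)


source = [2, 4]
evens = EvenList(source)
source.append(6)            # the argument was copied, so evens is unaffected
bigger = evens + [8]        # inherited + calls EvenList with one argument
empty = EvenList()          # also usable without arguments
print(evens.data, bigger.data, empty.data)
```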
- Chapter 9: Strings and Regular Expressions

Python supports plain and Unicode strings extensively, with statements, operators, built-in functions, methods, and dedicated modules. This chapter covers the methods of string objects, talks about string formatting, documents the string, pprint, and repr modules, and discusses issues related to Unicode strings.

Regular expressions let you specify pattern strings and allow searches and substitutions. Regular expressions are not easy to master, but they are a powerful tool for processing text. Python offers rich regular expression functionality through the built-in re module, as documented in this chapter.
- Methods of String Objects

Plain and Unicode strings are immutable sequences, as covered in Chapter 4. All immutable-sequence operations (repetition, concatenation, indexing, slicing) apply to strings. A string object s also supplies several non-mutating methods, as documented in this section. Unless otherwise noted, each method returns a plain string when s is a plain string, or a Unicode string when s is a Unicode string. Terms such as letters, whitespace, and so on refer to the corresponding attributes of the string module, covered later in this chapter. See also Section 9.2.1 later in this chapter.

capitalize
s.capitalize()

Returns a copy of s where the first character, if a letter, is uppercase, and all other letters, if any, are lowercase.

center
s.center(n)

Returns a string of length max(len(s), n), with a copy of s in the central part, surrounded by spaces divided as evenly as possible between the two sides (e.g., 'ciao'.center(2) is 'ciao', while 'ciao'.center(7) is 'ciao' padded with spaces to a total length of 7).

count
s.count(sub, start=0, end=sys.maxint)

Returns the number of occurrences of substring sub in s[start:end [...]
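The three methods documented above can be tried directly. This is a small sketch in Python 3 with sample strings chosen for illustration; it deliberately checks only the length of the centered result, since how an odd number of padding spaces is split between the two sides is an implementation detail:

```python
greeting = "hello WORLD"
print(greeting.capitalize())        # first letter upper, the rest lower

word = "ciao"
narrow = word.center(2)             # n is smaller than len(word): unchanged
wide = word.center(7)               # padded with spaces to length 7
print(repr(narrow), repr(wide), len(wide))

fruit = "banana"
print(fruit.count("an"))            # occurrences in the whole string
print(fruit.count("an", 2))         # occurrences in fruit[2:]
```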
- The string Module

The string module supplies functions that duplicate each method of string objects, as covered in the previous section. Each function takes the string object as its first argument. Module string also has several useful string-valued attributes:

- ascii_letters: The string ascii_lowercase+ascii_uppercase
- ascii_lowercase: The string 'abcdefghijklmnopqrstuvwxyz'
- ascii_uppercase: The string 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
- digits: The string '0123456789'
- hexdigits: The string '0123456789abcdefABCDEF'
- letters: The string lowercase+uppercase
- lowercase: A string containing all characters that are deemed lowercase letters: at least 'abcdefghijklmnopqrstuvwxyz', but more letters (e.g., accented ones) may be present, depending on the active locale [...]
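As a quick check of the locale-independent attributes listed above, here is a sketch for Python 3. It uses only the ascii_* attributes, digits, and hexdigits; the locale-dependent letters and lowercase attributes are left out on the assumption that the code runs on Python 3, which no longer provides them. The helper is_hex is hypothetical:

```python
import string

print(string.ascii_letters == string.ascii_lowercase + string.ascii_uppercase)
print(string.digits)
print(string.hexdigits)


def is_hex(text):
    """Hypothetical helper: true if text is nonempty and all hex digits."""
    return bool(text) and all(ch in string.hexdigits for ch in text)


print(is_hex("c0ffee"), is_hex("coffee"))
```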
- String Formatting

In Python, a string-formatting expression has the syntax:

    format % values

where format is a plain or Unicode string containing format specifiers and values is any single object or a collection of objects in a tuple or dictionary. Python's string-formatting operator has roughly the same set of features as the C language's printf and operates in a similar way. Each format specifier is a substring of format that starts with a percent sign (%) and ends with one of the conversion characters shown in Table 9-1.

Table 9-1: String-formatting conversion characters

| Character | Output format | Notes |
|---|---|---|
| d, i | Signed decimal integer | Value must be number |
| u | Unsigned decimal integer | Value must be number |
| o | Unsigned octal integer | Value must be number |
| x | Unsigned hexadecimal integer (lowercase letters) | Value must be number |
| X | Unsigned hexadecimal integer (uppercase letters) | [...] |
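A brief sketch of the % operator with the conversion characters that appear in the visible part of Table 9-1, plus a tuple and a dictionary on the right-hand side. The sample values are assumptions for illustration, and the code is written for Python 3:

```python
decimal = "%d items" % 3
also_decimal = "%i" % 42
octal = "%o" % 8
lower_hex = "%x" % 255
upper_hex = "%X" % 255

# A tuple supplies one value per format specifier, in order.
from_tuple = "%d of %d" % (2, 5)

# A dictionary supplies values by name, using %(name) specifiers.
from_dict = "%(count)d files in %(folder)s" % {"count": 3, "folder": "docs"}

for text in (decimal, also_decimal, octal, lower_hex, upper_hex, from_tuple, from_dict):
    print(text)
```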
- The pprint Module

The pprint module pretty-prints complicated data structures, with formatting that may be more readable than that supplied by built-in function repr (see Chapter 8). To fine-tune the formatting, you can instantiate the PrettyPrinter class supplied by module pprint and apply detailed control, helped by auxiliary functions also supplied by module pprint. Most of the time, however, one of the two main functions exposed by module pprint suffices.

pformat
pformat(obj)

Returns a string representing the pretty-printing of obj.

pprint
pprint(obj, stream=sys.stdout)

Outputs the pretty-printing of obj to file object stream, with a terminating newline.

The following statements are the same:

    print pprint.pformat(x)
    pprint.pprint(x)

Either of these constructs will be roughly the same as print x in many cases, such as when the string representation of x fits within one line. However, with something like x=range(30), print x displays x in two lines, breaking at an arbitrary point, while using module pprint displays x over 30 lines, one line per item. You can use module pprint when you prefer the module's specific display effects to the ones of normal string representation. [...]
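The contrast described above can be sketched as follows in Python 3, where range(30) must be wrapped in list to get a list and print is a function. The variable names are assumptions for illustration:

```python
import io
import pprint

short = [1, 2, 3]
long_list = list(range(30))

# A value whose representation fits on one line is formatted like repr.
print(pprint.pformat(short))

# A value too wide for one line is split, one item per line.
formatted = pprint.pformat(long_list)
print(len(formatted.splitlines()))

# pprint writes pformat's result plus a newline to the given stream.
stream = io.StringIO()
pprint.pprint(long_list, stream=stream)
print(stream.getvalue() == formatted + "\n")
```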
- The repr Module

The repr module supplies an alternative to the built-in function repr (see Chapter 8), with limits on length for the representation string. To fine-tune the length limits, you can instantiate or subclass the Repr class supplied by module repr and apply detailed control. Most of the time, however, the main function exposed by module repr suffices.

repr
repr(obj)

Returns a string representing obj, with sensible limits on length. [...]
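In Python 3 the module described above is named reprlib; the sketch below assumes that name. It shows the module-level function and a Repr instance with a tightened list limit (the limit value 3 is an arbitrary choice for illustration):

```python
import reprlib

data = list(range(100))

# The module-level function abbreviates long containers.
short = reprlib.repr(data)
print(short)

# A Repr instance exposes the limits as attributes, such as maxlist.
limited = reprlib.Repr()
limited.maxlist = 3
print(limited.repr(data))
```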
- Unicode

Plain strings are converted into Unicode strings either explicitly, with the unicode built-in, or implicitly, when you pass a plain string to a function that expects Unicode. In either case, the conversion is done by an auxiliary object known as a codec (for coder-decoder). A codec can also convert Unicode strings to plain strings either explicitly, with the encode method of Unicode strings, or implicitly.

You identify a codec by passing the codec name to unicode or encode. When you pass no codec name and for implicit conversion, Python uses a default encoding, normally 'ascii'. (You can change the default encoding in the startup phase of a Python program, as covered in Chapter 13; see also setdefaultencoding in Chapter 8.) Every conversion has an explicit or implicit argument errors, a string specifying how conversion errors are to be handled. The default is 'strict', meaning any error raises an exception. When errors is 'replace', the conversion replaces each character causing an error with '?' in a plain-string result or with u'\ufffd' in a Unicode result. When errors is 'ignore', the conversion silently skips characters that cause errors.

The mapping of codec names to codec objects is handled by the codecs module. This module lets you develop your own codec objects and register them so that they can be looked up by name, just like built-in codecs. Module codecs also lets you look up any codec explicitly, obtaining the functions the codec uses for encoding and decoding, as well as factory functions to wrap file-like objects. Such advanced facilities of module codecs are rarely used, and are not covered further in this book.

The codecs module, together with the encodings package, supplies built-in codecs useful to Python developers dealing with internationalization issues. Any supplied codec can be installed as the default by module [...]
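The three error-handling modes can be sketched as follows. The example assumes Python 3, where the Unicode string type is str, the byte-oriented counterpart of a plain string is bytes, and decoding is done with the decode method rather than the unicode built-in; the sample text is arbitrary:

```python
text = "caff\u00e8"                      # ends with a non-ASCII letter

# 'strict' (the default) raises an exception on the first error.
try:
    text.encode("ascii")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

replaced = text.encode("ascii", "replace")   # offending character becomes ?
ignored = text.encode("ascii", "ignore")     # offending character is dropped

# Decoding with 'replace' substitutes U+FFFD for each undecodable byte.
decoded = b"caff\xe8".decode("ascii", "replace")

print(strict_failed, replaced, ignored, ascii(decoded))
```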
- Regular Expressions and the re Module

A regular expression is a string that represents a pattern. With regular expression functionality, you can compare that pattern to another string and see if any part of the string matches the pattern.

The re module supplies all of Python's regular expression functionality. The compile function builds a regular expression object from a pattern string and optional flags. The methods of a regular expression object look for matches of the regular expression in a string and/or perform substitutions. Module re also exposes functions equivalent to a regular expression's methods, but with the regular expression's pattern string as their first argument.

Regular expressions can be difficult to master, and this book does not purport to teach them; I cover only the ways in which you can use them in Python. For general coverage of regular expressions, I recommend the book Mastering Regular Expressions, by Jeffrey Friedl (O'Reilly). Friedl's book offers thorough coverage of regular expressions at both the tutorial and advanced levels.

The pattern string representing a regular expression follows a specific syntax:

- Alphabetic and numeric characters stand for themselves. A regular expression whose pattern is a string of letters and digits matches the same string.
- Many alphanumeric characters acquire special meaning in a pattern when they are preceded by a backslash (\).
- Punctuation works the other way around. A punctuation character is self-matching when escaped, and has a special meaning when unescaped.
- The backslash character itself is matched by a repeated backslash (i.e., the pattern [...]
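The syntax rules listed above can be illustrated with a short sketch. The patterns and sample strings are assumptions chosen for illustration; raw-string literals keep the backslashes in the patterns readable:

```python
import re

# compile builds a regular expression object; \d gains a special
# meaning (any digit) because it is preceded by a backslash.
number_re = re.compile(r"\d+")
found = number_re.search("order 66 shipped").group()
masked = number_re.sub("N", "1 and 22")

# Module-level functions take the pattern string as first argument.
dots_replaced = re.sub(r"\.", "!", "a.b.c")      # escaped dot: literal dot

# An unescaped dot is special (matches any character); escaped, it is not.
any_char = re.search(r"a.c", "abc") is not None
literal_dot = re.search(r"a\.c", "abc") is not None

# A backslash is matched by a doubled backslash in the pattern.
has_backslash = re.search(r"\\", "a\\b") is not None

print(found, masked, dots_replaced, any_char, literal_dot, has_backslash)
```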
- Chapter 10: File and Text Operations

This chapter covers dealing with files and the filesystem in Python. A file is a stream of bytes that a program can read and/or write, while a filesystem is a hierarchical repository of files on a particular computer system. Because files are such a core programming concept, several other chapters also contain material about handling files of specific kinds.

In Python, the os module supplies many of the functions that operate on the filesystem, so this chapter starts by introducing the os module. The chapter then proceeds to cover operations on the filesystem, including comparing, copying, and deleting directories and files, working with file paths, and accessing low-level file descriptors.

Next, this chapter discusses the typical ways Python programs read and write data, via built-in file objects and the polymorphic concept of file-like objects (i.e., objects that are not files, but still behave to some extent like files). Python file objects directly support the concept of text files, which are streams of characters encoded as bytes. The chapter also covers Python's support for data in compressed form, such as archives in the popular ZIP format.

While many modern programs rely on a graphical user interface (GUI), text-based, non-graphical user interfaces are often still useful, as they are simple, fast to program, and lightweight. This chapter concludes with material about text input and output in Python, including information about presenting text that is understandable to different users, no matter where they are or what language they speak. This is known as internationalization (often abbreviated i18n).
- The os Module

The os module is an umbrella module that presents a reasonably uniform cross-platform view of the different capabilities of various operating systems. The module provides functionality for creating files, manipulating files and directories, and creating, managing, and destroying processes. This chapter covers the filesystem-related capabilities of the os module, while Chapter 14 covers the process-related capabilities.

The os module supplies a name attribute, which is a string that identifies the kind of platform on which Python is being run. Possible values for name are 'posix' (all kinds of Unix-like platforms), 'nt' (all kinds of 32-bit Windows platforms), 'mac', 'os2', and 'java'. You can often exploit unique capabilities of a platform, at least in part, through functions supplied by os. This book deals with cross-platform programming, however, not with platform-specific functionality, so I do not cover parts of os that exist only on one kind of platform, nor do I cover platform-specific modules. All functionality covered in this book is available at least on both 'posix' and 'nt' platforms. However, I do cover any differences among the ways in which each given piece of functionality is provided on different platforms.

When a request to the operating system fails, os raises an exception, an instance of OSError. os also exposes class OSError with the name os.error. Instances of OSError expose three useful attributes:

- errno: The numeric error code of the operating system error
- strerror [...]
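A minimal sketch of catching OSError and inspecting it, written for Python 3. Only two of the three attributes are named in this preview; the use of filename as the third is an assumption based on the standard exception, and the helper describe_failure and the directory names are hypothetical:

```python
import errno
import os
import tempfile


def describe_failure(path):
    """Hypothetical helper: try to remove a directory, report any OSError."""
    try:
        os.rmdir(path)
    except OSError as err:
        return err.errno, err.strerror, err.filename
    return None


base = tempfile.mkdtemp()                 # a fresh, empty directory
missing = os.path.join(base, "missing")   # guaranteed not to exist
result = describe_failure(missing)
os.rmdir(base)                            # clean up the temporary directory

print(os.name)
print(result)
print(os.error is OSError)
```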
- Filesystem Operations

Using the os module, you can manipulate the filesystem in a variety of ways: creating, copying, and deleting files and directories, comparing files, and examining filesystem information about files and directories. This section documents the attributes and methods of the os module that you use for these purposes, and also covers some related modules that operate on the filesystem.

A file or directory is identified by a string, known as its path, whose syntax depends on the platform. On both Unix-like and Windows platforms, Python accepts Unix syntax for paths, with slash (/) as the directory separator. On non-Unix-like platforms, Python also accepts platform-specific path syntax. On Windows, for example, you can use backslash (\) as the separator. However, you do need to double up each backslash to \\ in normal string literals or use raw-string syntax as covered in Chapter 4. In the rest of this chapter, for brevity, Unix syntax is assumed in both explanations and examples.

Module os supplies attributes that provide details about path strings on the current platform. You should typically use the higher-level path manipulation operations covered in Section 10.2.4 later in this chapter, rather than lower-level string operations based on these attributes. However, the attributes may still be useful at times:

- curdir: The string that denotes the current directory ('.' on Unix and Windows)
- defpath [...]
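The points about path syntax can be sketched as follows. The path components are arbitrary examples, and os.path.join and os.sep are used on the assumption that they are among the higher-level operations and attributes the text refers to:

```python
import os

# The current-directory string documented above.
print(os.curdir)

# os.path.join inserts the platform's separator, so the code need not
# hardwire either slash or backslash.
joined = os.path.join("docs", "notes.txt")
print(joined == "docs" + os.sep + "notes.txt")

# A Windows-style path written two ways: doubled backslashes in a normal
# string literal, and single backslashes in a raw-string literal.
doubled = "C:\\temp\\notes.txt"
raw = r"C:\temp\notes.txt"
print(doubled == raw)
```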
- File Objects

As discussed earlier in this chapter, file is a built-in type in Python. With a file object, you can read and/or write data to a file as seen by the underlying operating system. Python reacts to any I/O error related to a file object by raising an instance of built-in exception class IOError. Errors that cause this exception include open failing to open or create a file, calling a method on a file object to which that method doesn't apply (e.g., calling write on a read-only file object or calling seek on a non-seekable file), and I/O errors diagnosed by a file object's methods. This section documents file objects, as well as some auxiliary modules that help you access and deal with their contents.

You normally create a Python file object with the built-in open, which has the following syntax:

    open(filename, mode='r', bufsize=-1)

open opens the file named by filename, which must be a string that denotes any path to a file. open returns a Python file object, which is an instance of the built-in type file. Calling file is just like calling open, but file was first introduced in Python 2.2. If you explicitly pass a mode string, open can also create filename if the file does not already exist (depending on the value of mode, as we'll discuss in a moment). In other words, despite its name, open is not limited to opening existing files, but is also able to create new ones if needed.

Section 10.3.1.1: File mode

mode is a string that denotes how the file is to be opened (or created). mode can have the following values:

- 'r' [...]
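A small sketch of open creating, appending to, and reading a file, and of the exception raised when opening fails. It is written for Python 3, where IOError is an alias of OSError and the file built-in no longer exists. The preview lists only the 'r' mode, so the 'w' and 'a' modes used here are assumptions drawn from standard Python, and the file names are hypothetical:

```python
import os
import shutil
import tempfile

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "demo.txt")

with open(path, "w") as f:          # 'w' creates the file if it is missing
    f.write("first line\n")

with open(path, "a") as f:          # 'a' appends to the existing file
    f.write("second line\n")

with open(path, "r") as f:          # 'r' requires the file to exist
    lines = f.readlines()

try:
    open(os.path.join(workdir, "missing.txt"), "r")
    failure = None
except IOError as err:
    failure = err

shutil.rmtree(workdir)              # remove the temporary directory
print(lines, type(failure).__name__)
```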
- Auxiliary Modules for File I/O

File objects supply all functionality that is strictly needed for file I/O. There are some auxiliary Python library modules, however, that offer convenient supplementary functionality, making I/O even easier and handier in several important special cases.

The fileinput module lets you loop over all the lines in a list of text files. Performance is quite good, comparable to the performance of direct iteration on each file, since fileinput uses internal buffering to minimize I/O. Therefore, you can use module fileinput for line-oriented file input whenever you find the module's rich functionality convenient, without worrying about performance. The input function is the main function of module fileinput, and the module also provides a FileInput class that supports the same functionality as the module's functions.

close
close()

Closes the whole sequence, so that iteration stops and no file remains open.

FileInput
class FileInput(files=None, inplace=0, backup='', bufsize=0)

Creates and returns an instance f of class FileInput. Arguments are the same as for fileinput.input, and methods of f have the same names, arguments, and semantics as functions of module fileinput. f also supplies a method readline, which reads and returns the next line. You can use class FileInput explicitly, rather than the single implicit instance used by the functions of module fileinput, when you want to nest or otherwise mix loops that read lines from more than one sequence of files. [...]
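As a sketch of looping over the lines of several files with an explicit FileInput instance (Python 3; the temporary files and their contents are assumptions created only for this example, and the filename and filelineno methods are standard ones not shown in this preview):

```python
import fileinput
import os
import shutil
import tempfile

workdir = tempfile.mkdtemp()
paths = []
for name, text in [("a.txt", "one\ntwo\n"), ("b.txt", "three\n")]:
    path = os.path.join(workdir, name)
    with open(path, "w") as f:
        f.write(text)
    paths.append(path)

seen = []
stream = fileinput.FileInput(files=paths)
try:
    for line in stream:
        # filename and filelineno report where the current line came from.
        name = os.path.basename(stream.filename())
        seen.append((name, stream.filelineno(), line.rstrip("\n")))
finally:
    stream.close()                  # close the whole sequence of files

shutil.rmtree(workdir)
print(seen)
```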
- The StringIO and cStringIO Modules

You can implement file-like objects by writing Python classes that supply the methods you need. If all you want is for data to reside in memory rather than on a file as seen by the operating system, you can use the StringIO or cStringIO module. The two modules are almost identical: each supplies a factory function to create in-memory file-like objects. The difference between them is that objects created by module StringIO are instances of class StringIO.StringIO. You may inherit from this class to create your own customized file-like objects, overriding the methods that you need to specialize. Objects created by module cStringIO, on the other hand, are instances of a special-purpose type, not of a class. Performance is much better when you can use cStringIO, but inheritance is not feasible. Furthermore, cStringIO does not support Unicode.

Each module supplies a factory function named StringIO that creates a file-like object fl.

StringIO
StringIO(str='')

Creates and returns an in-memory file-like object fl, with all methods and attributes of a built-in file object. The data contents of fl are initialized to be a copy of argument str, which must be a plain string for the StringIO factory function in cStringIO, while it can be a plain or Unicode string for the function in StringIO.

Besides all methods and attributes of built-in file objects, as covered in Section 10.3.2 earlier in this chapter, fl supplies one supplementary method, getvalue.

getvalue
fl.getvalue()

Returns the current data contents of [...]
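In Python 3 the in-memory text file described above is io.StringIO; the sketch below assumes that class in place of the StringIO and cStringIO modules. The sample contents are arbitrary:

```python
import io

fl = io.StringIO("line one\n")      # contents initialized from a string

first = fl.readline()               # reading moves the position to the end
fl.write("line two\n")              # so this write appends

contents = fl.getvalue()            # whole contents, regardless of position
print(repr(first), repr(contents))
```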
- Compressed Files

Although storage space and transmission bandwidth are increasingly cheap and abundant, in many cases you can save such resources, at the expense of some computational effort, by using compression. Since computational power grows cheaper and more abundant even faster than other resources, such as bandwidth, compression's popularity keeps growing. Python makes it easy for your programs to support compression by supplying dedicated modules for compression as part of every Python distribution.

The gzip module lets you read and write files compatible with those handled by the powerful GNU compression programs gzip and gunzip. The GNU programs support several compression formats, but module gzip supports only the highly effective native gzip format, normally denoted by appending the extension .gz to a filename. Module gzip supplies the GzipFile class and an open factory function.

GzipFile
class GzipFile(filename=None, mode=None, compresslevel=9, fileobj=None)

Creates and returns a file-like object f that wraps the file or file-like object fileobj. f supplies all methods of built-in file objects except seek and tell. Thus, f is not seekable: you can only access f sequentially, whether for reading or writing. When fileobj is None, filename must be a string that names a file: GzipFile opens that file with the given mode (by default, 'rb'), and f wraps the resulting file object. mode should be one of 'ab', 'rb', 'wb', or None. If mode is None, f uses the mode of fileobj if it is able to find out the mode; otherwise it uses 'rb'. If filename is None, f uses the filename of fileobj if able to find out the name; otherwise it uses ''. compresslevel [...]
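A minimal round trip through GzipFile wrapping an in-memory file-like object, written for Python 3. Using io.BytesIO as the wrapped fileobj is an assumption of this sketch, chosen so that no file on disk is needed; the sample data is arbitrary:

```python
import gzip
import io

raw = b"spam " * 200

# Compress: wrap a writable in-memory file-like object.
buffer = io.BytesIO()
with gzip.GzipFile(fileobj=buffer, mode="wb", compresslevel=9) as f:
    f.write(raw)
compressed = buffer.getvalue()

# Decompress: wrap a readable file-like object and read sequentially.
with gzip.GzipFile(fileobj=io.BytesIO(compressed), mode="rb") as f:
    restored = f.read()

print(len(raw), len(compressed), restored == raw)
```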
- Text Input and Output

Python presents non-GUI text input and output channels to your programs as file objects, so you can use the methods of file objects (covered in Section 10.3 earlier in this chapter) to manipulate these channels.

The sys module, covered in Chapter 8, has attributes stdout and stderr, file objects to which you can write. Unless you are using some sort of shell redirection, these streams connect to the terminal in which your script is running. Nowadays, actual terminals are rare: the terminal is generally a screen window that supports text input/output (e.g., an MS-DOS Prompt console on Windows or an xterm window on Unix).

The distinction between sys.stdout and sys.stderr is a matter of convention. sys.stdout, known as your script's standard output, is where your program emits results. sys.stderr, known as your script's standard error, is where error messages go. Separating program results from error messages helps you use shell redirection effectively. Python respects this convention, using sys.stderr for error and warning messages.

Programs that output results to standard output often need to write to sys.stdout. Python's print statement can be a convenient alternative to sys.stdout.write. The print statement has the following syntax:

    print [>>fileobject,] expressions [,]

The normal destination of print's output is the file or file-like object that is the value of the stdout attribute of the sys module. However, when >>fileobject, is present right after keyword print, the statement uses the given fileobject instead of sys.stdout. expressions is a list of zero or more expressions separated by commas (,). print outputs each expression, in order, as a string (using the built-in str, covered in Chapter 8), with a space to separate strings. After all expressions, [...]
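The separation of results from error messages can be sketched as follows. The example assumes Python 3, where print is a function and its file argument plays the role of the >>fileobject form shown above; the report function and its messages are hypothetical:

```python
import io
import sys


def report(result, warning, out=None, err=None):
    """Hypothetical helper: results go to out, diagnostics go to err."""
    out = sys.stdout if out is None else out
    err = sys.stderr if err is None else err
    # Each expression is converted with str and separated by a space.
    print("result:", result, file=out)
    print("warning:", warning, file=err)


# Any file-like object can stand in for the standard streams.
captured_out, captured_err = io.StringIO(), io.StringIO()
report(42, "low disk space", out=captured_out, err=captured_err)

print(repr(captured_out.getvalue()))
print(repr(captured_err.getvalue()))
```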
- Richer-Text I/O

The tools we have covered so far support the minimal subset of text I/O functionality that all platforms supply. Most platforms also offer richer-text I/O capabilities, such as responding to single keypresses (not just to entire lines of text) and showing text in any spot of the terminal (not just sequentially).

Python extensions and core Python modules let you access platform-specific functionality. Unfortunately, various platforms expose this functionality in different ways. To develop cross-platform Python programs with rich-text I/O functionality, you may need to wrap different modules uniformly, importing platform-specific modules conditionally (usually with the try/except idiom covered in Chapter 6).

The readline module wraps the GNU Readline Library. Readline lets the user edit text lines during interactive input, and also recall previous lines for further editing and re-entry. GNU Readline is widely installed on Unix-like platforms, and is available at https://cnswww.cns.cwru.edu/~chet/readline/rltop.html. A Windows port (https://starship.python.net/crew/kernr/) is available, but not widely deployed. Chris Gonnerman's module, Alternative Readline for Windows, implements a subset of Python's standard readline module (using a small dedicated .pyd file instead of GNU Readline) and can be freely downloaded from https://newcenturycomputers.net/projects/readline.html.

When either readline module is loaded, Python uses Readline for all line-oriented input, such as raw_input. The interactive Python interpreter always tries loading readline to enable line editing and recall for interactive sessions. You can call functions supplied by module readline to control advanced functionality, particularly the history functionality for recalling lines entered in previous sessions, and the completion functionality for context-sensitive completion of the word being entered. See [...]
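The conditional-import idiom mentioned above might look like the following sketch, which falls back gracefully when readline is not installed. The helper history_length is hypothetical; it relies on readline.get_current_history_length, one of the module's history functions:

```python
# Import the platform-specific module only if it is available.
try:
    import readline
except ImportError:
    readline = None


def history_length():
    """Hypothetical helper: number of history lines, or 0 without readline."""
    if readline is None:
        return 0
    return readline.get_current_history_length()


print("readline available:", readline is not None)
print("history length:", history_length())
```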
Purchase this book now or read it online at Safari to get the whole thing! - Interactive Command Sessions
- The cmd module offers a simple way to handle interactive sessions of commands. Each command is a line of text. The first word of each command is a verb defining the requested action. The rest of the line is passed as an argument to the method that implements the action that the verb requests.
  Module cmd supplies class Cmd to use as a base class, and you define your own subclass of cmd.Cmd. The subclass supplies methods with names starting with do_ and help_, and may also optionally override some of Cmd's methods. When the user enters a command line such as verb and the rest, as long as the subclass defines a method named do_verb, Cmd.onecmd calls:
  self.do_verb('and the rest')
  Similarly, as long as the subclass defines a method named help_verb, Cmd.do_help calls it when the command line starts with either 'help verb' or '? verb'. Cmd, by default, also shows suitable error messages if the user tries to use, or asks for help about, a verb for which the subclass does not define a needed method.
  An instance c of a subclass of class Cmd supplies the following methods (many of these methods are meant to be overridden by the subclass).
  cmdloop
  c.cmdloop(intro=None)
  Performs an entire interactive session of line-oriented commands. cmdloop starts by calling c.preloop( ), then outputs string intro (c.intro, if intro is None). Then c.cmdloop enters a loop. In each iteration of the loop, cmdloop reads line s with s=raw_input( [Additional content appearing in this section has been removed.]
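The do_ and help_ dispatch described above can be tried without an interactive session by calling onecmd directly. The following is a minimal sketch in Python 3 syntax; the class name Greeter and the verb greet are made up for illustration, and output is routed to an in-memory stream through the stdout argument of Cmd.

```python
import cmd
import io


class Greeter(cmd.Cmd):
    """Hypothetical command set with a single verb, 'greet'."""

    def do_greet(self, arg):
        # arg is the rest of the command line after the verb
        self.stdout.write("Hello, %s!\n" % (arg or "world"))

    def help_greet(self):
        self.stdout.write("greet [name]: print a greeting\n")


out = io.StringIO()
shell = Greeter(stdout=out)
shell.onecmd("greet Alex and friends")   # dispatches to do_greet
shell.onecmd("help greet")               # dispatches to help_greet
shell.onecmd("? greet")                  # handled like 'help greet'
print(out.getvalue())
```

Calling shell.cmdloop() instead would run the same dispatch in a loop, reading one command line per iteration.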
- Internationalization
- Most programs present some information to users as text. Such text should be understandable and acceptable to the user. For example, in some countries and cultures, the date "March 7" can be concisely expressed as "3/7". Elsewhere, "3/7" indicates "July 3", and the string that means "March 7" is "7/3". In Python, such cultural conventions are handled with the help of standard module locale.
  Similarly, a greeting can be expressed in one natural language by the string "Benvenuti", while in another language the string to use is "Welcome". In Python, such translations are handled with the help of standard module gettext.
  Both kinds of issues are commonly called internationalization (often abbreviated i18n, as there are 18 letters between i and n in the full spelling). This is actually a misnomer, as the issues also apply to programs used within one nation by users of different languages or cultures.
  Python's support for cultural conventions is patterned on that of C, slightly simplified. In this architecture, a program operates in an environment of cultural conventions known as a locale. The locale setting permeates the program and is typically set early on in the program's operation. The locale is not thread-specific, and module locale is not thread-safe. In a multithreaded program, set the program's locale before starting secondary threads.
  If a program does not call locale.setlocale, the program operates in a neutral locale known as the C locale. The C locale is named from this architecture's origins in the C language, and is similar, but not identical, to the U.S. English locale. Alternatively, a program can find out and accept the user's default locale. In this case, module locale interacts with the operating system (via the environment, or in other system-dependent ways) to establish the user's preferred locale. Finally, a program can set a specific locale, presumably determining which locale to set on the basis of user interaction, or via persistent configuration settings such as a program initialization file. [Additional content appearing in this section has been removed.]
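The choice between the neutral C locale and the user's default locale might look like this in practice. This is a hedged sketch in Python 3 syntax: it assumes the platform accepts the empty string as "the user's default" and falls back to the C locale when it does not. The variable names are illustrative.

```python
import locale

# Explicitly select the neutral C locale (the locale a program operates in
# when it never calls setlocale).
locale.setlocale(locale.LC_ALL, "C")
c_conventions = locale.localeconv()
print(c_conventions["decimal_point"])     # decimal separator under the C locale

# Accept the user's default locale; fall back to C if the platform
# cannot honor the environment's settings.
try:
    locale.setlocale(locale.LC_ALL, "")
except locale.Error:
    locale.setlocale(locale.LC_ALL, "C")

# Calling setlocale with only a category queries the current setting.
current_numeric = locale.setlocale(locale.LC_NUMERIC)
print(current_numeric)
```

In a multithreaded program this setup belongs in the main thread, before any secondary thread starts, for the reasons given above.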
- Chapter 11: Persistence and Databases
- Python supports a variety of ways of making data persistent. One such way, known as serialization, involves viewing the data as a collection of Python objects. These objects can be saved, or serialized, to a byte stream, and later loaded and recreated, or deserialized, back from the byte stream. Object persistence layers on top of serialization and adds such features as object naming. This chapter covers the built-in Python modules that support serialization and object persistence.
  Another way to make data persistent is to store it in a database. One simple type of database is actually just a file format that uses keyed access to enable selective reading and updating of relevant parts of the data. Python supplies modules that support several variations of this file format, known as DBM, and these modules are covered in this chapter.
  A relational database management system (RDBMS), such as MySQL or Oracle, provides a more powerful approach to storing, searching, and retrieving persistent data. Relational databases rely on dialects of Structured Query Language (SQL) to create and alter a database's schema, insert and update data in the database, and query the database according to search criteria. This chapter does not provide any reference material on SQL. For that purpose, I recommend SQL in a Nutshell, by Kevin Kline (O'Reilly). Unfortunately, despite the existence of SQL standards, no two RDBMSes implement exactly the same SQL dialect.
  The Python standard library does not come with an RDBMS interface. However, many free third-party modules let your Python programs access a specific RDBMS. Such modules mostly follow the Python Database API 2.0 standard, also known as the DBAPI. This chapter covers the DBAPI standard and mentions some of the third-party modules that implement it. [Additional content appearing in this section has been removed.]
- Serialization
- Python supplies a number of modules that deal with I/O operations that serialize (save) entire Python objects to various kinds of byte streams, and deserialize (load and recreate) Python objects back from such streams. Serialization is also called marshaling.
  The marshal module supports the specific serialization tasks needed to save and reload compiled Python files (.pyc and .pyo). marshal only handles instances of fundamental built-in data types: None, numbers (plain and long integers, float, complex), strings (plain and Unicode), code objects, and built-in containers (tuples, lists, dictionaries) whose items are instances of elementary types. marshal does not handle instances of user-defined types, nor classes and instances of classes. marshal is faster than other serialization modules. Code objects are supported only by marshal, not by other serialization modules. Module marshal supplies the following functions.
  dump, dumps
  dump(value,fileobj)
  dumps(value)
  dumps returns a string representing object value. dump writes the same string to file object fileobj, which must be opened for writing in binary mode. dump(v,f) is just like f.write(dumps(v)). fileobj cannot be a file-like object: it must be an instance of type file.
  load, loads
  load(fileobj)
  loads(str)
  [Additional content appearing in this section has been removed.]
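A round trip through dumps and loads illustrates the behavior described above, including the support for code objects. The sketch is written for Python 3, where dumps returns a bytes object rather than the string the text mentions. The sample data is made up.

```python
import marshal

# Only elementary built-in types and containers of them are supported.
value = {"answer": 42, "items": [1, 2.5, None], "pair": ("a", "b")}
data = marshal.dumps(value)             # a bytes object in Python 3
restored = marshal.loads(data)

# Code objects can be marshaled too.
code = compile("x * 2", "<example>", "eval")
code_copy = marshal.loads(marshal.dumps(code))
result = eval(code_copy, {"x": 21})
```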
- DBM Modules
- A DBM-like file is a file that contains a set of pairs of strings (key,data), with support for fetching or storing the data given a key, known as keyed access. DBM-like files were originally supported on early Unix systems, with functionality roughly equivalent to that of access methods popular on other mainframe and minicomputers of the time, such as ISAM, the Indexed-Sequential Access Method. Today, several different libraries, available for many platforms, let programs written in many different languages create, update, and read DBM-like files.
  Keyed access, while not as powerful as the data access functionality of relational databases, may often suffice for a program's needs. And if DBM-like files are sufficient, you may end up with a program that is smaller, faster, and more portable than one that uses an RDBMS.
  The classic dbm library, whose first version introduced DBM-like files many years ago, has limited functionality, but tends to be available on most Unix platforms. The GNU version, gdbm, is richer and also widespread. The BSD version, dbhash, offers superior functionality. Python supplies modules that interface with each of these libraries if the relevant underlying library is installed on your system. Python also offers a minimal DBM module, dumbdbm (usable anywhere, as it does not rely on other installed libraries), and generic DBM modules, which are able to automatically identify, select, and wrap the appropriate DBM library to deal with an existing or new DBM file. Depending on your platform, your Python distribution, and what dbm-like libraries you have installed on your computer, the default Python build may install some subset of these modules. In general, at a minimum, you can rely on having module dbm on Unix-like platforms, module dbhash on Windows, and dumbdbm on any platform.
  The [Additional content appearing in this section has been removed.]
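Keyed access with the portable fallback module can be sketched as follows. Note the assumption: the example targets Python 3, where the module the text calls dumbdbm is available as dbm.dumb, and where keys and data are bytes. The file name and the stored pairs are illustrative.

```python
import dbm.dumb   # Python 3 name of the module the text calls dumbdbm
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example")

    db = dbm.dumb.open(path, "c")       # 'c': read/write, creating the file if needed
    db[b"spam"] = b"eggs"               # store data under a key
    db[b"ham"] = b"toast"
    db.close()

    db = dbm.dumb.open(path, "r")       # reopen the same file for reading
    fetched = db[b"spam"]               # fetch data given its key
    keys = sorted(db.keys())
    db.close()
```

The object returned by open behaves much like a dictionary whose contents persist in the file between runs.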
- The Berkeley DB Module
- Python comes with the bsddb module, which wraps the Berkeley Database library (also known as BSD DB) if that library is installed on your system and your Python installation is built to support it. With the BSD DB library, you can create hash, binary tree, or record-based files that generally behave like dictionaries. On Windows, Python includes a port of the BSD DB library, thus ensuring that module bsddb is always usable. To download BSD DB sources, binaries for other platforms, and detailed documentation on BSD DB, see https://www.sleepycat.com. Module bsddb supplies three factory functions, btopen, hashopen, and rnopen.
  btopen, hashopen, rnopen
  btopen(filename,flag='r',*many_other_optional_arguments)
  hashopen(filename,flag='r',*many_other_optional_arguments)
  rnopen(filename,flag='r',*many_other_optional_arguments)
  btopen opens or creates the binary tree format file named by filename (a string that denotes any path to a file, not just a name), and returns a suitable BTree object to access and manipulate the file. Argument flag has exactly the same values and meaning as for anydbm.open. Other arguments indicate low-level options that allow fine-grained control, but are rarely used. hashopen and rnopen work the same way, but open or create hash format and record format files, returning objects of type Hash and Record. hashopen is generally the fastest format and makes sense when you are using keys to look up records. However, if you also need to access records in sorted order, use btopen, or if you need to access records in the same order in which you originally wrote them, use [Additional content appearing in this section has been removed.]
- The Python Database API (DBAPI) 2.0
- As I mentioned earlier, the Python standard library does not come with an RDBMS interface, but there are many free third-party modules that let your Python programs access specific databases. Such modules mostly follow the Python Database API 2.0 standard, also known as the DBAPI.
  At the time of this writing, Python's DBAPI Special Interest Group (SIG) was busy preparing a new version of the DBAPI (possibly to be known as 3.0 when it is ready). Programs written against DBAPI 2.0 should work with minimal or no changes with the future DBAPI 3.0, although 3.0 will no doubt offer further enhancements that future programs will be able to take advantage of.
  If your Python program runs only on Windows, you may prefer to access databases by using Microsoft's ADO package through COM. For more information on using Python on Windows, see the book Python Programming on Win32, by Mark Hammond and Andy Robinson (O'Reilly). Since ADO and COM are platform-specific, and this book focuses on cross-platform use of Python, I do not cover ADO nor COM further in this book.
  After importing a DBAPI-compliant module, you call the module's connect function with suitable parameters. connect returns an instance of class Connection, which represents a connection to the database. This instance supplies commit and rollback methods to let you deal with transactions, a close method to call as soon as you're done with the database, and a cursor method that returns an instance of class Cursor. This instance supplies the methods and attributes that you'll use for all database operations. A DBAPI-compliant module also supplies exception classes, descriptive attributes, factory functions, and type-description attributes.
  A DBAPI-compliant module supplies exception classes Warning, Error, and several subclasses of Error. Warning indicates such anomalies as data truncation during insertion. Error's subclasses indicate various kinds of errors that your program can encounter when dealing with the database and the DBAPI-compliant module that interfaces to it. Generally, your code uses a statement of the form: [Additional content appearing in this section has been removed.]
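The connect, cursor, commit, rollback, and close sequence can be sketched with any DBAPI-compliant module. The example below uses sqlite3 purely as a stand-in: it is an assumption of this sketch, not something the text covers, since sqlite3 joined the Python standard library only in later releases than the ones described here. Table and column names are made up.

```python
import sqlite3   # a DBAPI 2.0 module in later Python releases (assumption of this sketch)

print(sqlite3.apilevel)                 # descriptive attribute: DBAPI version supported

conn = sqlite3.connect(":memory:")      # Connection to a throwaway in-memory database
cur = conn.cursor()                     # Cursor used for all database operations
try:
    cur.execute("CREATE TABLE langs (name TEXT, year INTEGER)")
    cur.execute("INSERT INTO langs VALUES (?, ?)", ("Python", 1991))
    conn.commit()                       # make the first insert permanent
    cur.execute("INSERT INTO langs VALUES (?, ?)", ("Scratchpad", 0))
    conn.rollback()                     # discard the uncommitted insert
    cur.execute("SELECT name, year FROM langs")
    rows = cur.fetchall()
except sqlite3.Error as exc:            # base class of the module's error exceptions
    rows = []
    print("database error:", exc)
finally:
    conn.close()                        # close as soon as you are done
```

The parameter placeholder style (here the question mark) varies between DBAPI modules, so code moved to another module may need its SQL strings adjusted.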
- Chapter 12: Time Operations
- A Python program can handle time in several ways. Time intervals are represented by floating-point numbers, in units of seconds (a fraction of a second is the fractional part of the interval). Particular instants in time are expressed in seconds since a reference instant, known as the epoch. (Midnight, UTC, of January 1, 1970, is a popular epoch used on both Unix and Windows platforms.) Time instants often also need to be expressed as a mixture of units of measurement (e.g., years, months, days, hours, minutes, and seconds), particularly for I/O purposes.
  This chapter covers the time module, which supplies Python's core time-handling functionality. The time module strongly depends on the system C library. The chapter also presents the sched and calendar modules and the essentials of the popular extension module mx.DateTime. mx.DateTime has more uniform behavior across platforms than time, which helps account for its popularity.
  Python 2.3 will introduce a new datetime module to manipulate dates and times in other ways. At https://starship.python.net/crew/jbauer/normaldate/, you can download Jeff Bauer's normalDate.py, which gains simplicity by dealing only with dates, not with times. Neither of these modules is further covered in this book. [Additional content appearing in this section has been removed.]
- The time Module
- The underlying C library determines the range of dates that the time module can handle. On Unix systems, years 1970 and 2038 are the typical cut-off points, a limitation that mx.DateTime lets you avoid. Time instants are normally specified in UTC (Coordinated Universal Time, once known as GMT, or Greenwich Mean Time). Module time also supports local time zones and Daylight Saving Time (DST), but only to the extent that support is supplied by the underlying C system library.
  As an alternative to seconds since the epoch, a time instant can be represented by a tuple of nine integers known as a time-tuple. Items in time-tuples are covered in Table 12-1. All items are integers, and therefore time-tuples cannot keep track of fractions of a second. In Python 2.2 and later, the result of any function in module time that used to return a time-tuple is now of type struct_time. You can still use the result as a tuple, but you can also access the items as read-only attributes x.tm_year, x.tm_mon, and so on, using the attribute names listed in Table 12-1. Wherever a function used to require a time-tuple argument, you can now pass an instance of struct_time or any other sequence whose items are nine integers in the applicable ranges.
  Table 12-1: Tuple form of time representation
  Item | Meaning | Field name | Range | Notes
  [Additional content appearing in this section has been removed.]
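The two views of a struct_time (as a tuple and through named attributes) can be seen in a short sketch. It is written for Python 3 and assumes the usual 1970 epoch. The sample date is arbitrary.

```python
import calendar
import time

t = time.gmtime(0)                      # the epoch as a struct_time, in UTC
year_by_attr = t.tm_year                # read-only attribute access
year_by_index = t[0]                    # the same item, accessed as a tuple
as_tuple = tuple(t)                     # nine integers

# A plain nine-item sequence is accepted where a time-tuple is expected.
stamp = time.strftime("%Y-%m-%d %H:%M:%S", (2003, 3, 7, 12, 30, 0, 4, 66, 0))

# calendar.timegm converts a UTC time-tuple back to seconds since the epoch.
seconds = calendar.timegm(t)
```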
- The sched Module
- The sched module supplies a class that implements an event scheduler. sched supplies a scheduler class.
  scheduler
  class scheduler(timefunc,delayfunc)
  An instance s of scheduler is initialized with two functions, which s then uses for all time-related operations. timefunc must be callable without arguments to get the current time instant (in any unit of measure), meaning that you can pass time.time. delayfunc must be callable with one argument (a time duration, in the same units timefunc returns), and it should delay for about that amount of time, meaning you can pass time.sleep. scheduler also calls delayfunc with argument 0 after each event, to give other threads a chance; again, this is compatible with the behavior of time.sleep.
  A scheduler instance s supplies the following methods.
  cancel
  s.cancel(event_token)
  Removes an event from s's queue of scheduled events. event_token must be the result of a previous call to s.enter or s.enterabs, and the event must not yet have happened; otherwise cancel raises RuntimeError.
  empty
  s.empty( )
  Returns True if s's queue of scheduled events is empty, otherwise False.
  enterabs
  [Additional content appearing in this section has been removed.]
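Because scheduler takes its time and delay functions as arguments, it can be driven by a simulated clock, which makes a small demonstration run instantly. The following sketch is written for Python 3; the functions fake_time and fake_delay are hypothetical stand-ins for time.time and time.sleep, and the event labels are made up.

```python
import sched

clock = [0.0]                           # simulated current time


def fake_time():
    """Stand-in for time.time: report the simulated clock."""
    return clock[0]


def fake_delay(duration):
    """Stand-in for time.sleep: advance the simulated clock."""
    clock[0] += duration


log = []
s = sched.scheduler(fake_time, fake_delay)
first = s.enter(5, 1, log.append, ("first",))     # delay, priority, action, arguments
second = s.enter(10, 1, log.append, ("second",))
third = s.enter(15, 1, log.append, ("third",))

s.cancel(second)                        # remove an event that has not happened yet
was_empty_before = s.empty()
s.run()                                 # run pending events in time order
was_empty_after = s.empty()
```

Passing time.time and time.sleep instead makes the same code wait in real time.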
- The calendar Module
- The calendar module supplies calendar-related functions, including functions to print a text calendar for any given month or year. By default, calendar considers Monday the first day of the week and Sunday the last one. You can change this setting by calling function calendar.setfirstweekday. calendar handles years in the range supported by module time, typically 1970 to 2038. Module calendar supplies the following functions.
  calendar
  calendar(year,w=2,l=1,c=6)
  Returns a multiline string with a calendar for year year formatted into three columns separated by c spaces. w is the width in characters of each date; each line has length 21*w+18+2*c. l is the number of lines used for each week.
  firstweekday
  firstweekday( )
  Returns the current setting for the weekday that starts each week. By default, when calendar is first imported, this is 0, meaning Monday.
  isleap
  isleap(year)
  Returns True if year is a leap year, otherwise False.
  leapdays
  leapdays(y1,y2)
  Returns the total number of leap days in the years in range(y1,y2).
  [Additional content appearing in this section has been removed.]
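The functions listed above can be exercised briefly. The sketch is written for Python 3, and the years chosen are arbitrary examples.

```python
import calendar

leap_2000 = calendar.isleap(2000)       # century year divisible by 400
leap_1900 = calendar.isleap(1900)       # century year not divisible by 400
n_leap = calendar.leapdays(2000, 2005)  # leap days in the years range(2000, 2005)

original = calendar.firstweekday()      # 0 (Monday) unless changed earlier
calendar.setfirstweekday(calendar.SUNDAY)
sunday_first = calendar.firstweekday()
calendar.setfirstweekday(original)      # restore the previous setting

text = calendar.calendar(2003, w=2, l=1, c=6)   # multiline string for the whole year
print(text)
```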
- The mx.DateTime Module
- DateTime is one of the modules in the mx package made available by eGenix GmbH. mx is open source, and at the time of this writing, mx.DateTime has liberal license conditions similar to those of Python itself. mx.DateTime's popularity stems from its functional richness and cross-platform portability. I present only an essential subset of mx.DateTime's rich functionality here; the module comes with detailed documentation about its advanced time and date handling features.
  Module DateTime supplies several date and time types whose instances are immutable (and therefore suitable as dictionary keys). Type DateTime represents a time instant and includes an absolute date, which is the number of days since an epoch of January 1, year 1 CE, according to the Gregorian calendar (0001-01-01 is day 1), and an absolute time, which is a floating-point number of seconds since midnight. Type DateTimeDelta represents an interval of elapsed time, which is a floating-point number of seconds. Class RelativeDateTime lets you specify dates in relative terms, such as "next Monday" or "first day of next month." DateTime and DateTimeDelta are covered in detail later in this section, but RelativeDateTime is not.
  Date and time types supply customized string conversion, invoked via the built-in str or automatically during implicit conversion (e.g., in a print statement). The resulting strings are in standard ISO 8601 formats, such as:
  YYYY-MM-DD HH:MM:SS.ss
  For finer-grained control of string formatting, use method strftime. Function DateTimeFrom constructs DateTime instances from strings. Submodules of module mx.DateTime supply other formatting and parsing functions, using different standards and conventions.
  Module DateTime supplies factory functions to build instances of type [Additional content appearing in this section has been removed.]
- Chapter 13: Controlling Execution
- Python directly exposes many of the mechanisms it uses internally. This helps you understand Python at an advanced level, and means you can hook your own code into such documented Python mechanisms and control those mechanisms to some extent. For example, Chapter 7 covered the import statement and the way Python arranges for built-ins to be made implicitly visible. This chapter covers other advanced techniques that Python offers for controlling execution, while Chapter 17 covers execution-control possibilities that apply specifically to the three crucial phases of development: testing, debugging, and profiling. [Additional content appearing in this section has been removed.]
- Dynamic Execution and the exec Statement
- With Python's exec statement, it is possible to execute code that you read, generate, or otherwise obtain during the running of a program. The exec statement dynamically executes a statement or a suite of statements. exec is a simple keyword statement with the following syntax:
  exec code[ in globals[,locals]]
  code can be a string, an open file-like object, or a code object. globals and locals are dictionaries. If both are present, they are the global and local namespaces, respectively, in which code executes. If only globals is present, exec uses globals in the role of both namespaces. If neither globals nor locals is present, code executes in the current scope. Running exec in current scope is not good programming practice, since it can bind, rebind, or unbind any name. To keep things under control, you should use exec only with specific, explicit dictionaries.
  More generally, use exec only when it's really indispensable. Most often, it is better avoided in favor of more specific mechanisms. For example, a frequently asked question is, "How do I set a variable whose name I just read or constructed?" Strictly speaking, exec lets you do this. For example, if the name of the variable you want to set is in variable varname, you might use:
  exec varname+'=23'
  Don't do this. An exec statement like this in current scope causes you to lose control of your namespace, leading to bugs that are extremely hard to track and more generally making your program unfathomably difficult to understand. An improvement is to keep the "variables" you need to set, not as variables, but as entries in a dictionary, say mydict. You can then use the following variation:
  exec varname+'=23' in mydict
  While this is not as terrible as the previous example, it is still a bad idea. The best approach is to keep such "variables" as dictionary entries and not use [Additional content appearing in this section has been removed.]
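The dictionary-based alternative recommended above can be contrasted with dynamic execution in an explicit namespace. One caveat on notation: the sketch uses Python 3, where exec is a built-in function called as exec(code, globals, locals), instead of the statement form shown in the text. The name stored in varname is made up.

```python
varname = "answer"                      # a name obtained at run time

# Preferred: treat the "variable" as a dictionary entry; no exec is needed.
mydict = {}
mydict[varname] = 23

# If dynamic execution is truly required, pass an explicit dictionary so the
# executed code cannot touch the names of the current scope.
namespace = {}
exec(varname + " = 23", namespace)      # Python 3 function form of exec
value_from_exec = namespace[varname]    # exec also adds a __builtins__ entry here
```

Both approaches end with the value stored under a key, but only the first keeps the program's namespaces entirely out of the picture.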
- Restricted Execution
- Python code executed dynamically normally suffers no special restrictions. Python's general philosophy is to give the programmer tools and mechanisms that make it easy to write good, safe code, and trust the programmer to use them appropriately. Sometimes, however, trust might not be warranted. When code to execute dynamically comes from an untrusted source, the code itself is untrusted. In such cases it's important to selectively restrict the execution environment so that such code cannot accidentally or maliciously inflict damage. If you never need to execute untrusted code, you can skip this section. However, Python makes it easy to impose appropriate restrictions on untrusted code if you ever do need to execute it.
  When the __builtins__ item in the global namespace isn't the standard __builtin__ module (or the latter's dictionary), Python knows the code being run is restricted. Restricted code executes in a sandbox environment, previously prepared by the trusted code that requests the restricted code's execution. Standard modules rexec and Bastion help you prepare an appropriate sandbox. To ensure that restricted code cannot escape the sandbox, a few crucial internals (e.g., the __dict__ attributes of modules, classes, and instances) are not directly available to restricted code.
  There is no special protection against restricted code raising exceptions. On the contrary, Python diagnoses any attempt by restricted code to violate the sandbox restrictions by raising an exception. Therefore, you should generally run restricted code in the try clause of a try/except statement, as covered in Chapter 6. Make sure you catch all exceptions and handle them appropriately if your program needs to keep running in such cases.
  There is no built-in protection against untrusted code attempting to inflict damage by consuming large amounts of memory or time (so-called denial-of-service attacks). If you need to ward against such attacks, you can run untrusted code in a separate process. The separate process uses the mechanisms described in this section to restrict the untrusted code's execution, while the main process monitors the separate one and terminates it if and when resource consumption becomes excessive. Processes are covered in Chapter 14. Resource monitoring is currently supported by the standard Python library only on Unix-like platforms (by platform-specific module [Additional content appearing in this section has been removed.]
- Internal Types
- Some of the internal Python objects that I mention in this section are hard to use. Using such objects correctly requires some study of Python's own C (or Java) sources. Such black magic is rarely needed, except to build general-purpose development frameworks and similar wizardly tasks. Once you do understand things in depth, Python empowers you to exert control, if and when you need to. Since Python exposes internal objects to your Python code, you can exert that control by coding in Python, even when a nodding acquaintance with C (or Java) is needed to understand what is going on.
  The built-in type named type acts as a factory object, returning objects that are types themselves (type was a built-in function in Python 2.1 and earlier). Type objects don't need to support any special operations except equality comparison and representation as strings. Most type objects are callable, and return new instances of the type when called. In particular, built-in types such as int, float, list, str, tuple, and dict all work this way. The attributes of the types module are the built-in types, each with one or more names. For example, types.DictType and types.DictionaryType both refer to type({ }), also known since Python 2.2 as the built-in type dict. Besides being callable to generate instances, type objects are useful in Python 2.2 and later because you can subclass them, as covered in Chapter 5.
  As well as by using built-in function compile, you can also get a code object via the func_code attribute of a function or method object. A code object's co_varnames attribute is the tuple of names of local variables, including the formal arguments; the co_argcount attribute is the number of arguments. Code objects are not callable, but you can rebind the [Additional content appearing in this section has been removed.]
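Type objects and code objects can be inspected directly. The sketch below assumes Python 3, where the function attribute the text calls func_code is spelled __code__; the function scale exists only for illustration.

```python
import types


def scale(value, factor):
    result = value * factor
    return result


code = scale.__code__                   # spelled func_code in the releases the text describes
arg_count = code.co_argcount            # number of arguments
local_names = code.co_varnames          # formal arguments first, then other locals

same_type = type({}) is dict            # type(...) returns the type object itself
new_list = type([])("abc")              # type objects are callable factories

compiled = compile("1 + 2", "<example>", "eval")
is_code = isinstance(compiled, types.CodeType)
```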
- Garbage Collection
- Python's garbage collection normally proceeds transparently and automatically, but you can choose to exert some direct control. The general principle is that Python collects each object x at some time after x becomes unreachable, that is, when no chain of references can reach x by starting from a local variable of a function that is executing, nor from a global variable of a loaded module. Normally, an object x becomes unreachable when there are no references at all to x. However, a group of objects can also be unreachable when they reference each other.
  CPython (Classic Python) keeps in each object x a count, known as a reference count, of how many references to x are outstanding. When x's reference count drops to 0, CPython immediately collects x. Function getrefcount of module sys accepts any object and returns its reference count (at least 1, since getrefcount itself has a reference to the object it's examining). Other versions of Python, such as Jython, rely on different garbage collection mechanisms, supplied by the platform they run on (e.g., the JVM). Modules gc and weakref therefore apply only to CPython.
  When Python garbage-collects x and there are no references at all to x, Python then finalizes x (i.e., calls x.__del__( )) and makes the memory that x occupied available for other uses. If x held any references to other objects, Python removes the references, which in turn may make other objects collectable by leaving them unreachable.
  The gc module exposes the functionality of Python's garbage collector. gc deals only with objects that are unreachable in a subtle way, being part of mutual reference loops. In such a loop, each object in the loop refers to others, keeping the reference counts of all objects positive. However, an outside reference no longer exists to the whole set of mutually referencing objects. Therefore, the whole group, also known as cyclic garbage, is unreachable, and therefore garbage collectable. Looking for such cyclic garbage loops takes time, which is why module
  Additional content appearing in this section has been removed.
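As a rough illustration of the reference counting and cyclic garbage described in this section, the sketch below uses sys.getrefcount and gc.collect. It is written in current Python 3 syntax and assumes CPython; the Node class and both function names are made up for the example, and the exact counts reported can differ between interpreter versions.

```python
import gc
import sys


class Node:
    """Hypothetical helper: an object that can refer to another Node."""

    def __init__(self):
        self.other = None


def refcount_rises_with_new_reference():
    x = Node()
    before = sys.getrefcount(x)
    y = x  # bind a second name to the same object
    after = sys.getrefcount(x)
    return after - before  # how many references the new binding added


def collect_one_cycle():
    gc.collect()  # clear garbage left over from earlier code
    a, b = Node(), Node()
    a.other, b.other = b, a  # mutual references form a loop
    del a, b  # no outside reference to the loop remains
    return gc.collect()  # number of unreachable objects the collector found
```

Reference counting alone never frees the two Node objects, because each keeps the other's count positive; only the explicit gc.collect() call (or an automatic collection) reclaims them.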
- Termination Functions
- The atexit module lets you register termination functions (i.e., functions to be called at program termination, last in, first out). Termination functions are similar to clean-up handlers established by try/finally. However, termination functions are globally registered and called at the end of the whole program, while clean-up handlers are established lexically and called at the end of a specific try clause. Both termination functions and clean-up handlers are called whether the program terminates normally or abnormally, but not when the termination is caused by calling os._exit. Module atexit supplies a single function called register.
  register
  register(func,*args,**kwds)
  Ensures that func(*args,**kwds) is called at program termination time.
  Additional content appearing in this section has been removed.
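The last-in, first-out order of termination functions can be observed with a small experiment. This sketch, in Python 3 syntax, runs a child interpreter through the subprocess module (an assumption of the example, not something this section covers) so that the child's exit can be watched from outside; goodbye and run_child are hypothetical names.

```python
import subprocess
import sys

# Source of a tiny program that registers two termination functions.
CHILD_SCRIPT = """
import atexit

def goodbye(name, punctuation="!"):
    print("goodbye " + name + punctuation)

atexit.register(goodbye, "first")
atexit.register(goodbye, "second", punctuation="?")
print("main script done")
"""


def run_child():
    # Run the script in a fresh interpreter and capture what it prints.
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SCRIPT],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()
```

Calling run_child() should show the main script's line first, then the function registered second, then the one registered first.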
- Site and User Customization
- Python provides a specific hook to let each site customize some aspects of Python's behavior at the start of each run. Customization by each single user is not enabled by default, but Python specifies how programs that want to run user-provided code at startup can explicitly request such customization.
  Python loads standard module site just before the main script. If Python is run with option -S, Python does not load site. -S allows faster startup, but saddles the main script with initialization chores. site's tasks are:
  - Putting sys.path in standard form (absolute paths, no duplicates).
  - Interpreting each .pth file found in the Python home directory, adding entries to sys.path, and/or importing modules, as each .pth file indicates.
  - Adding built-ins used to display information in interactive sessions (quit, exit, copyright, credits, and license).
  - Setting the default Unicode encoding to 'ascii'. site's source code includes two blocks, each guarded by if 0:, one to set the default encoding to be locale dependent, and the other to disable default encoding and decoding between Unicode and plain strings. You may optionally edit site.py to select either block.
  - Trying to import sitecustomize (should import sitecustomize raise an ImportError exception, site catches and ignores it). sitecustomize is the module that each site's installation can optionally use for further site-specific customization beyond site's tasks. It is generally best not to edit site.py, as any Python upgrade or reinstallation might overwrite your customizations.
  Additional content appearing in this section has been removed.
- Chapter 14: Threads and Processes
- A thread is a flow of control that shares global state with other threads; all threads appear to execute simultaneously. Threads are not easy to master, but once you do, they may offer a simpler architecture or better performance (faster response, but typically not better throughput) for some problems. This chapter covers the facilities that Python provides for dealing with threads, including the thread, threading, and Queue modules.
  A process is an instance of a running program. Sometimes you get better results with multiple processes than with threads. The operating system protects processes from one another. Processes that want to communicate must explicitly arrange to do so, via local inter-process communication (IPC). Processes may communicate via files (covered in Chapter 10) or via databases (covered in Chapter 11). In both cases, the general way in which processes communicate using such data storage mechanisms is that one process can write data, and another process can later read that data back. This chapter covers the process-related parts of module os, including simple IPC by means of pipes, and a cross-platform IPC mechanism known as memory-mapped files, supplied to Python programs by module mmap.
  Network mechanisms are well suited for IPC, as they work between processes that run on different nodes of a network as well as those that run on the same node. Chapter 19 covers low-level network mechanisms that provide a flexible basis for IPC. Other, higher-level mechanisms, known as distributed computing, such as CORBA, DCOM/COM+, EJB, SOAP, XML-RPC, and .NET, make IPC easier, whether locally or remotely. However, distributed computing is not covered in this book.
  Additional content appearing in this section has been removed.
- Threads in Python
- Python offers multithreading on platforms that support threads, such as Win32, Linux, and most variants of Unix. The Python interpreter does not freely switch threads. Python uses a global interpreter lock (GIL) to ensure that switching between threads happens only between bytecode instructions or when C code deliberately releases the GIL (Python's C code releases the GIL around blocking I/O and sleep operations). An action is said to be atomic if it's guaranteed that no thread switching within Python's process occurs between the start and the end of the action. In practice, an operation that looks atomic actually is atomic when executed on an object of a built-in type (augmented assignment on an immutable object, however, is not atomic). However, in general it is not a good idea to rely on atomicity. For example, you never know when you might be dealing with a derived class rather than an object of a built-in type, meaning there might be callbacks to Python code.
  Python offers multithreading in two different flavors. An older and lower-level module, thread, offers a bare minimum of functionality, and is not recommended for direct use by your code. The higher-level module threading, built on top of thread, was loosely inspired by Java's threads, and is the recommended tool. The key design issue in multithreading systems is most often how best to coordinate multiple threads. threading therefore supplies several synchronization objects. Module Queue is very useful for thread synchronization as it supplies a synchronized FIFO queue type, which is extremely handy for communication and coordination between threads.
  Additional content appearing in this section has been removed.
- The thread Module
- The only part of the thread module that your code should use directly is the lock objects that module thread supplies. Locks are simple thread-synchronization primitives. Technically, thread's locks are non-reentrant and unowned: they do not keep track of what thread last locked them, so there is no specific owner thread for a lock. A lock is in one of two states, locked or unlocked.
  To get a new lock object (in the unlocked state), call the function named allocate_lock without arguments. This function is supplied by both modules thread and threading. A lock object L supplies three methods.
  acquire
  L.acquire(wait=True)
  When wait is True, acquire locks L. If L is already locked, the calling thread suspends and waits until L is unlocked, then locks L. Even if the calling thread was the one that last locked L, it still suspends and waits until another thread releases L. When wait is False and L is unlocked, acquire locks L and returns True. When wait is False and L is locked, acquire does not affect L, and returns False.
  locked
  L.locked( )
  Returns True if L is locked, otherwise False.
  release
  L.release( )
  Unlocks L, which must be locked. When L is locked, any thread may call L.release, not just the thread that last locked L. When more than one thread is waiting on
  Additional content appearing in this section has been removed.
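The lock behavior described above (two states, non-blocking acquire, and release by any thread) can be sketched as follows. The example uses Python 3 syntax and obtains the lock from threading.Lock, which is an assumption of this sketch rather than the allocate_lock call named in the text; demo_lock is a hypothetical name.

```python
import threading


def demo_lock():
    lock = threading.Lock()  # a new lock starts out unlocked
    observed = []
    observed.append(lock.locked())        # state before anything happens
    observed.append(lock.acquire())       # blocking acquire on a free lock
    observed.append(lock.locked())        # state after acquiring
    observed.append(lock.acquire(False))  # non-blocking attempt on a held lock

    # Locks are unowned: a different thread may release the lock.
    releaser = threading.Thread(target=lock.release)
    releaser.start()
    releaser.join()
    observed.append(lock.locked())        # state after the other thread's release
    return observed
```

Note that a second blocking acquire() from the same thread would suspend forever here, which is what "non-reentrant" means in practice.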
- The Queue Module
- The Queue module supplies first-in, first-out (FIFO) queues that support multithread access, with one main class and two exception classes.
  Queue
  class Queue(maxsize=0)
  Queue is the main class for module Queue and is covered in the next section. When maxsize is greater than 0, the new Queue instance q is deemed full when q has maxsize items. A thread inserting an item with the block option, when q is full, suspends until another thread extracts an item. When maxsize is less than or equal to 0, q is never considered full, and is limited in size only by available memory, like normal Python containers.
  Empty
  Empty is the class of the exception that q.get(False) raises when q is empty.
  Full
  Full is the class of the exception that q.put(x,False) raises when q is full.
  An instance q of class Queue supplies the following methods.
  empty
  q.empty( )
  Returns True if q is empty, otherwise False.
  Additional content appearing in this section has been removed.
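A short sketch of the maxsize limit and the two exception classes follows. It assumes Python 3, where the module is spelled queue in lowercase; demo_bounded_queue is a hypothetical name.

```python
import queue  # named Queue in the Python version this text describes


def demo_bounded_queue():
    q = queue.Queue(maxsize=2)
    q.put("a")
    q.put("b")
    events = [q.full()]  # q now holds maxsize items

    try:
        q.put("c", False)  # non-blocking put on a full queue
    except queue.Full:
        events.append("Full")

    events.append(q.get())  # items come back in FIFO order
    events.append(q.get())

    try:
        q.get(False)  # non-blocking get on an empty queue
    except queue.Empty:
        events.append("Empty")

    events.append(q.empty())
    return events
```

Without the False argument, the third put would block until some other thread called q.get().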
- The threading Module
- The threading module is built on top of module thread and supplies multithreading functionality in a more usable form. The general approach of threading is similar to that of Java, but locks and conditions are modeled as separate objects (in Java, such functionality is part of every object), and threads cannot be directly controlled from the outside (meaning there are no priorities, groups, destruction, or stopping). All methods of objects supplied by threading are atomic.
  threading provides numerous classes for dealing with threads, including Thread, Condition, Event, RLock, and Semaphore. Besides factory functions for the classes detailed in the following sections of this chapter, threading supplies the currentThread factory function.
  currentThread
  currentThread( )
  Returns a Thread object for the calling thread. If the calling thread was not created by module threading, currentThread creates and returns a semi-dummy Thread object with limited functionality.
  A Thread object t models a thread. You can pass t's main function as an argument when you create t, or you can subclass Thread and override the run method (you may also override __init__, but should not override other methods). t is not ready to run when you create it: to make t ready (active), call t.start( ). Once t is active, it terminates when its main function ends, either normally or by propagating an exception. A Thread t can be a daemon, meaning that Python can terminate even if t is still active, while a normal (non-daemon) thread keeps Python alive until the thread terminates. Class
  Additional content appearing in this section has been removed.
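The two ways of giving a Thread its main function can be sketched as below, in Python 3 syntax. The names work, Worker, and demo_threads are hypothetical, and each thread is joined before the next starts only to keep the order of results predictable.

```python
import threading


def demo_threads():
    results = []

    def work(label):
        results.append(label)

    # Option 1: pass the main function (and its arguments) at creation time.
    t1 = threading.Thread(target=work, args=("from target",))

    # Option 2: subclass Thread and override run.
    class Worker(threading.Thread):
        def run(self):
            results.append("from run override")

    t2 = Worker()

    is_daemon = t1.daemon  # threads are non-daemon unless told otherwise

    for t in (t1, t2):
        t.start()  # the thread becomes active only after start()
        t.join()   # wait for its main function to end
    return results, is_daemon
```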
- Threaded Program Architecture
- A threaded program should always arrange for a single thread to deal with any given object or subsystem that is external to the program (such as a file, a database, a GUI, or a network connection). Having multiple threads that deal with the same external object can often cause unpredictable problems.
  Whenever your threaded program must deal with some external object, devote a thread to such dealings, using a Queue object from which the external-interfacing thread gets work requests that other threads post. The external-interfacing thread can return results by putting them on one or more other Queue objects. The following example shows how to package this architecture into a general, reusable class, assuming that each unit of work on the external subsystem can be represented by a callable object:
      import threading, Queue

      class ExternalInterfacing(threading.Thread):
          def __init__(self, externalCallable, **kwds):
              threading.Thread.__init__(self, **kwds)
              self.setDaemon(1)    # do not keep the program alive
              self.externalCallable = externalCallable
              self.workRequestQueue = Queue.Queue( )
              self.resultQueue = Queue.Queue( )
              self.start( )
          def request(self, *args, **kwds):
              "called by other threads as externalCallable would be"
              self.workRequestQueue.put((args,kwds))
              return self.resultQueue.get( )
          def run(self):
              # serve requests one at a time, forever
              while 1:
                  args, kwds = self.workRequestQueue.get( )
                  self.resultQueue.put(self.externalCallable(*args, **kwds))
  Once some ExternalInterfacing object ei is instantiated, all other threads may now call ei.request just like they would call someExternalCallable without such a mechanism (with or without arguments as appropriate). The advantage of the ExternalInterfacing mechanism is that all calls upon someExternalCallable are now serialized. This means they are performed by just one thread (the thread object bound to ei) in some defined sequential order, without overlap, race conditions (hard-to-debug errors that depend on which thread happens to get there first), or other anomalies that might otherwise result.
  Additional content appearing in this section has been removed.
- Process Environment
- The operating system supplies each process P with an environment, which is a set of environment variables whose names are identifiers (most often, by convention, uppercase identifiers) and whose contents are strings. For example, in Chapter 3, we covered environment variables that affect Python's operations. Operating system shells offer various ways to examine and modify the environment, by such means as shell commands and others mentioned in Chapter 3.
  The environment of any process P is determined when P starts. After startup, only P itself can change P's environment. Nothing that P does affects the environment of P's parent process (the process that started P), nor those of child processes previously started from P and now running, nor of processes unrelated to P. Changes to P's environment affect only P itself: the environment is not a means of IPC. Child processes of P normally get a copy of P's environment as their starting environment: in this sense, changes to P's environment do affect child processes that P starts after such changes.
  Module os supplies attribute environ, a mapping that represents the current process's environment. os.environ is initialized from the process environment when Python starts. Changes to os.environ update the current process's environment if the platform supports such updates. Keys and values in os.environ must be strings. On Windows, but not on Unix-like platforms, keys into os.environ are implicitly uppercased. For example, here's how to try to determine what shell or command processor you're running under:
      import os
      shell = os.environ.get('COMSPEC')
      if shell is None: shell = os.environ.get('SHELL')
      if shell is None: shell = 'an unknown command processor'
      print 'Running under', shell
  If a Python program changes its own environment (e.g., via os.environ['X']='Y'), this does not affect the environment of the shell or command processor that started the program. As in other cases, changes to a process's environment affect only the process itself, not others.
  Additional content appearing in this section has been removed.
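The point that child processes receive a copy of the parent's environment at the moment they start can be checked with a small sketch. It uses Python 3 syntax and the subprocess module to start the child (an assumption of the example); the variable name NUTSHELL_DEMO_VAR and both function names are made up.

```python
import os
import subprocess
import sys


def child_sees(name):
    # Start a fresh interpreter and ask it what value it inherited.
    code = "import os; print(os.environ.get(%r, 'unset'))" % name
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def demo_environment_inheritance():
    name = "NUTSHELL_DEMO_VAR"  # hypothetical variable used only here
    os.environ.pop(name, None)  # make sure it starts out absent
    before = child_sees(name)   # child started before the change
    os.environ[name] = "Y"
    after = child_sees(name)    # child started after the change
    del os.environ[name]        # leave the environment as we found it
    return before, after
```

Only the child started after the assignment sees the new value; the shell that launched the parent never does.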
- Running Other Programs
- The os module offers several ways for your program to run other programs. The simplest way to run another program is through function os.system, although this offers no way to control the external program. The os module also provides a number of functions whose names start with exec. These functions offer fine-grained control. A program run by one of the exec functions, however, replaces the current program (i.e., the Python interpreter) in the same process. In practice, therefore, you use the exec functions mostly on platforms that let a process duplicate itself by fork (i.e., Unix-like platforms). Finally, os functions whose names start with spawn and popen offer intermediate simplicity and power: they are cross-platform and not quite as simple as system, but simple and usable enough for most purposes.
  The exec and spawn functions run a specified executable file given the executable file's path, arguments to pass to it, and optionally an environment mapping. The system and popen functions execute a command, a string passed to a new instance of the platform's default shell (typically /bin/sh on Unix, command.com or cmd.exe on Windows). A command is a more general concept than an executable file, as it can include shell functionality (pipes, redirection, built-in shell commands) using the normal shell syntax specific to the current platform.
  execl, execle, execlp, execv, execve, execvp, execvpe
  execl(path,*args)
  execle(path,*args)
  execlp(path,*args)
  execv(path,args)
  execve(path,args,env)
  execvp(path,args)
  execvpe(path,args,env)
  These functions run the executable file (program) indicated by string
  Additional content appearing in this section has been removed.
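To make the difference between os.system and os.popen concrete, here is a minimal sketch in Python 3 syntax. Both pass a command string to the platform's shell; the command used here simply re-invokes the current interpreter via sys.executable, and the function names are hypothetical. It assumes the interpreter path contains no characters that the shell treats specially inside double quotes.

```python
import os
import sys


def run_with_system():
    # os.system gives back only an exit status, not the program's output.
    command = '"%s" -c "pass"' % sys.executable
    return os.system(command)


def run_with_popen():
    # os.popen returns a file-like object connected to the command's output.
    command = '"%s" -c "print(6*7)"' % sys.executable
    with os.popen(command) as pipe:
        return pipe.read().strip()
```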
- The mmap Module
- The mmap module supplies memory-mapped file objects. An mmap object behaves similarly to a plain (not Unicode) string, so you can often pass an mmap object where a plain string is expected. However, there are differences:
  - An mmap object does not supply the methods of a string object
  - An mmap object is mutable, while string objects are immutable
  - An mmap object also corresponds to an open file and behaves polymorphically to a Python file object (as covered in Chapter 10)
  An mmap object m can be indexed or sliced, yielding plain strings. Since m is mutable, you can also assign to an indexing or slicing of m. However, when you assign to a slice of m, the right-hand side of the assignment statement must be a string of exactly the same length as the slice you're assigning to. Therefore, many of the useful tricks available with list slice assignment (covered in Chapter 4) do not apply to mmap slice assignment.
  Module mmap supplies a factory function that is different on Unix-like systems and Windows.
  mmap
  mmap(filedesc,length,tagname='') # Windows
  mmap(filedesc,length,flags=MAP_SHARED, prot=PROT_READ|PROT_WRITE) # Unix
  Creates and returns an mmap object m that maps into memory the first length bytes of the file indicated by file descriptor filedesc. filedesc must normally be a file descriptor opened for both reading and writing (except, on Unix-like platforms, when argument prot requests only reading or only writing). File descriptors are covered in Section 10.2.8. To get an mmap object m that refers to a Python file object
  Additional content appearing in this section has been removed.
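The same-length rule for slice assignment can be tried out with a temporary file. This is a sketch in Python 3 syntax, where slices of an mmap object yield bytes rather than plain strings; demo_mmap_slices is a hypothetical name, and the temporary file is created and removed by the example itself.

```python
import mmap
import os
import tempfile


def demo_mmap_slices():
    fd, path = tempfile.mkstemp()  # descriptor opened for reading and writing
    try:
        os.write(fd, b"hello world")
        m = mmap.mmap(fd, 11)      # map the first 11 bytes of the file
        first = m[0:5]             # slicing yields a bytes object
        m[0:5] = b"HELLO"          # same length as the slice: allowed
        try:
            m[0:5] = b"hi"         # different length: rejected
            resized = True
        except (IndexError, ValueError):
            resized = False
        m.flush()
        m.close()
    finally:
        os.close(fd)
    try:
        with open(path, "rb") as f:
            data = f.read()        # the change went through to the file
    finally:
        os.remove(path)
    return first, data, resized
```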
- Chapter 15: Numeric Processing
- In Python, you can perform numeric computations with operators (as covered in Chapter 4) and built-in functions (as covered in Chapter 8). Python also provides the math, cmath, operator, and random modules, which support additional numeric computation functionality, as documented in this chapter.
  You can represent arrays in Python with lists and tuples (covered in Chapter 4), as well as with the array standard library module, which is covered in this chapter. You can also build advanced array manipulation functions with loops, list comprehensions, iterators, generators, and built-ins such as map, reduce, and filter, but such functions can be complicated and slow. Therefore, when you process large arrays of numbers in these ways, your program's performance can be below your machine's full potential.
  The Numeric package addresses these issues, providing high-performance support for multidimensional arrays (matrices) and advanced mathematical operations, such as linear algebra and Fourier transforms. Numeric does not come with standard Python distributions, but you can freely download it at https://sourceforge.net/projects/numpy, either as source code (which is easy to build and install on many platforms) or as a prebuilt self-installing .exe file for Windows. Visit https://www.pfdubois.com/numpy/ for an extensive tutorial and other resources, such as a mailing list about Numeric. Note that the Numeric package is not just for numeric processing. Much of Numeric is about multidimensional arrays and advanced array handling that you can use for any Python sequence.
  Numeric is a large, rich package. For full understanding, study the tutorial, work through the examples, and experiment interactively. This chapter presents a reference to an essential subset of Numeric on the assumption that you already have some grasp of array manipulation and numeric computing issues. If you are unfamiliar with this subject, the Numeric tutorial can help.
  Additional content appearing in this section has been removed.
- The math and cmath Modules
- The math module supplies mathematical functions on floating-point numbers, while the cmath module supplies equivalent functions on complex numbers. For example, math.sqrt(-1) raises an exception, but cmath.sqrt(-1) returns 1j.
  Each module also exposes two attributes of type float bound to the values of fundamental mathematical constants, pi and e.
  acos (math and cmath)
  acos(x)
  Returns the arccosine of x in radians.
  acosh (cmath only)
  acosh(x)
  Returns the arc hyperbolic cosine of x in radians.
  asin (math and cmath)
  asin(x)
  Returns the arcsine of x in radians.
  asinh (cmath only)
  asinh(x)
  Returns the arc hyperbolic sine of x in radians.
  atan (math and cmath)
  atan(x)
  Additional content appearing in this section has been removed.
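The sqrt contrast mentioned at the start of this section can be reproduced directly. A minimal sketch in Python 3 syntax follows; sqrt_of_minus_one is a hypothetical name.

```python
import cmath
import math


def sqrt_of_minus_one():
    try:
        real_result = math.sqrt(-1)  # outside the domain of the real sqrt
    except ValueError:
        real_result = "ValueError"
    complex_result = cmath.sqrt(-1)  # the complex version accepts it
    return real_result, complex_result
```

Both modules bind the same float values to pi and e, so math.pi == cmath.pi holds.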
- The operator Module
- The operator module supplies functions that are equivalent to Python's operators. These functions are handy for use with map and reduce, and in other cases where callables must be stored, passed as arguments, or returned as function results. The functions in operator have the same names as the corresponding special methods (covered in Chapter 5). Each function is available with two names, with and without the leading and trailing double underscores (e.g., both operator.add(a,b) and operator.__add__(a,b) return a+b). Table 15-1 lists the functions supplied by operator.
  Table 15-1: Functions supplied by operator
  Method    Signature    Behaves like
  abs       abs(a)       abs(a)
  add       add(a,b)     a+b
  and_      and_(a,b)    a&
  Additional content appearing in this section has been removed.
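As a sketch of how these functions combine with map and reduce, consider the following, written in Python 3 syntax. Note the assumption that reduce is imported from functools, where it lives in Python 3, rather than used as a built-in; demo_operator is a hypothetical name.

```python
import operator
from functools import reduce  # a built-in in the Python version described here


def demo_operator():
    total = reduce(operator.add, [1, 2, 3, 4])               # ((1+2)+3)+4
    products = list(map(operator.mul, [1, 2, 3], [4, 5, 6]))  # pairwise a*b
    # Both spellings of the function name agree with the operator itself.
    same = operator.add(2, 3) == operator.__add__(2, 3) == 2 + 3
    return total, products, same
```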
- The random Module
- The random module generates pseudo-random numbers with various distributions. The underlying uniform pseudo-random generator uses the Wichmann-Hill algorithm, with a period of length 6,953,607,871,644. The resulting pseudo-random numbers, while quite good, are not of cryptographic quality. If you want physically generated random numbers rather than algorithmically generated pseudo-random numbers, you may use /dev/random or /dev/urandom on platforms that support such pseudo-devices (such as recent Linux releases). For an alternative, see https://www.fourmilab.ch/hotbits.
  All functions of module random are methods of a hidden instance of class random.Random. You can instantiate Random explicitly to get multiple generators that do not share state. Explicit instantiation is advisable if you require random numbers in multiple threads (threads are covered in Chapter 14). This section documents the most frequently used functions exposed by module random.
  choice
  choice(seq)
  Returns a random item from non-empty sequence seq.
  getstate
  getstate( )
  Returns an object S that represents the current state of the generator. You can later pass S to function setstate in order to restore the generator's state.
  jumpahead
  jumpahead(n)
  Advances the generator state as if n random numbers had been generated. Computing the new state is faster than generating
  Additional content appearing in this section has been removed.
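Explicit Random instances and the getstate/setstate pair can be sketched as follows, in Python 3 syntax. The seed value 42 and the function name are arbitrary choices for the example; jumpahead is left out because it is not available in Python 3.

```python
import random


def demo_independent_generators():
    g1 = random.Random(42)  # two generators with the same seed...
    g2 = random.Random(42)  # ...but separate internal state

    saved = g1.getstate()
    first_run = [g1.random() for _ in range(3)]

    g1.setstate(saved)      # rewind g1 to the saved state
    replay = [g1.random() for _ in range(3)]

    # g2 has not been advanced by any of the draws made from g1.
    g2_first = g2.random()
    return first_run == replay, g2_first == first_run[0]
```

Giving each thread its own Random instance in this way avoids sharing the hidden module-level generator.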
- The array Module
- The array module supplies a type, also called array, whose instances are mutable sequences, like lists. An array a is a one-dimensional sequence whose items can be only characters, or only numbers of one specific numeric type that is fixed when a is created.
  The extension module Numeric, covered later in this chapter, also supplies a type called array that is far more powerful than array.array. For advanced array operations and multidimensional arrays, I recommend Numeric even if your array elements are not numbers.
  array.array is a simple type, whose main advantage is that, compared to a list, it can save memory to hold objects all of the same (numeric or character) type. An array object a has a one-character read-only attribute a.typecode, set when a is created, that gives the type of a's items. Table 15-2 shows the possible type codes for array.
  Table 15-2: Type codes for the array module
  Type code    C type    Python type       Minimum size
  'c'          char      str (length 1)    1 byte
  'b'          char      int               1 byte
  Additional content appearing in this section has been removed.
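A brief sketch of a typed array follows, in Python 3 syntax and using the 'b' type code from Table 15-2 (the 'c' code is not accepted by Python 3). demo_array is a hypothetical name.

```python
from array import array


def demo_array():
    a = array("b", [1, 2, 3])  # items are small signed integers
    a.append(4)                # arrays are mutable sequences, like lists
    try:
        a.append(1.5)          # an item of the wrong type is rejected
        accepted = True
    except TypeError:
        accepted = False
    return a.typecode, a.itemsize, a.tolist(), accepted
```

The itemsize attribute reports how many bytes each item occupies, which is where the memory saving over a list comes from.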
- The Numeric Package
- The main module in the Numeric package is the Numeric module, which provides the array object type, a set of functions that manipulate these objects, and universal functions that operate on arrays and other sequences. The Numeric package also supports a variety of optional modules for things like linear algebra, random numbers, masked arrays, and Fast Fourier Transforms.
  Numeric is one of the rare Python packages often used with the idiom from Numeric import *. You can also use import Numeric and qualify each name by preceding it with Numeric. However, if you need many of the package's names, importing all the names at once is handy. Another popular alternative is to import Numeric with a shorter name (e.g., import Numeric as N) and qualify each name by preceding it with N.
  Although quite solid and stable, Numeric is under continuous development, with functionality being added and limitations removed. This chapter describes specifically Numeric Version 21.3, the latest released version at the time of this writing. A successor to Numeric, named numarray, is being developed by the Numeric community, and is not quite ready for production use yet. numarray is not totally compatible with Numeric, but shares most of Numeric's functionality and enriches it further. Information on numarray is available at https://stsdas.stsci.edu/numarray/.
  Additional content appearing in this section has been removed.
- Array Objects
- Numeric provides an array type that represents a grid of items. An array object a has a specified number of dimensions, known as its rank, up to some arbitrarily high limit (normally 40, when Numeric is built with default options). A scalar (i.e., a single number) has rank 0, a vector has rank 1, a matrix has rank 2, and so forth.
  The values that occupy cells in the grid of an array object, known as the elements of the array, are homogeneous, meaning they are all of the same type, and all element values are stored within one memory area. This contrasts with a list or tuple, where the items may be of different types and each is stored as a separate Python object. This means a Numeric array occupies far less memory than a Python list or tuple with the same number of items. The type of a's elements is encoded as a's type code, a one-character string, as shown in Table 15-3. Factory functions that build array instances, covered in Section 15.6.6 later in this chapter, take a typecode argument that is one of the values in Table 15-3.
  Table 15-3: Type codes for Numeric arrays
  Type code    C type    Python type       Synonym
  'c'          char      str (length 1)    Character
  Additional content appearing in this section has been removed.
- Universal Functions (ufuncs)
- Numeric supplies named functions with the same semantics as Python's arithmetic, comparison, and bitwise operators. Similar semantics (element-wise operation, broadcasting, coercion) are also available with other mathematical functions, both binary and unary, that Numeric supplies. For example, Numeric supplies typical mathematical functions similar to those supplied by built-in module math, such as sin, cos, log, and exp.
  These functions are objects of type ufunc (which stands for universal function) and share several traits in addition to those they have in common with array operators. Every ufunc instance u is callable, is applicable to sequences as well as to arrays, and lets you specify an optional output argument. If u is binary (i.e., if u accepts two operand arguments), u also has four callable attributes, named u.accumulate, u.outer, u.reduce, and u.reduceat. The ufunc objects supplied by Numeric apply only to arrays with numeric type codes (i.e., not to arrays with type code 'O' or 'c').
  Any ufunc u applies to sequences, not just to arrays. When you start with a list L, it's faster to call u directly on L rather than to convert L to an array. u's return value is an array a; you can perform further computation, if any, on a, and then, if you need a list result, you can convert the resulting array to a list by calling its method tolist. For example, say you must compute the logarithm of each item of a list and return another list. On my system, with N set to 2222 and using python -O, a list comprehension such as:
      def logsupto(N):
          return [math.log(x) for x in range(2,N)]
  takes about 5.6 milliseconds. Using Python's built-in map:
      def logsupto(N):
          return map(math.log, range(2,N))
  takes around half the time, 2.8 milliseconds. Using Numeric's ufunc named log:
      def logsupto(N):
          return Numeric.log(range(2,N)).tolist( )
  Additional content appearing in this section has been removed.
- Optional Numeric Modules
- Many other modules are built on top of Numeric or cooperate with it. You can download some of them from the same URL as Numeric (https://sourceforge.net/projects/numpy). Some of these extra modules may already be included in the package you have downloaded. Documentation for the modules is also part of the documentation for Numeric. A rich library of scientific tools that work well with Numeric is SciPy, available at https://www.scipy.org. I highly recommend it if you are using Python for scientific or engineering computing.
  Here are some key optional Numeric modules:
  - MLab
    MLab supplies many Python functions written on top of Numeric. MLab's functions are similar in name and operation to functions supplied by the product Matlab.
  - FFT
    FFT supplies Python-callable Fast Fourier Transforms (FFTs) of data held in Numeric arrays. FFT can wrap either the well-known FFTPACK Fortran-coded library or the compatible C-coded fftpack library.
  - LinearAlgebra
    LinearAlgebra supplies Python-callable functions, operating on data held in Numeric arrays, that wrap either the well-known LAPACK Fortran-coded library or the compatible C-coded lapack_lite library. LinearAlgebra lets you invert matrices, solve linear systems, compute eigenvalues and eigenvectors, perform singular value decomposition, and least-squares-solve overdetermined linear systems.
  Additional content appearing in this section has been removed.
- Chapter 16: Tkinter GUIs
- Most professional applications interact with users through a graphical user interface (GUI). A GUI is normally programmed through a toolkit, which is a library that implements controls (also known as widgets) that are visible objects such as buttons, labels, text entry fields, and menus. A GUI toolkit lets you compose controls into a coherent whole, display them on-screen, and interact with the user, receiving input via such devices as the keyboard and mouse.
Python gives you a choice among many GUI toolkits. Some are platform-specific, but most are cross-platform to different degrees, supporting at least Windows and Unix-like platforms, and often the Macintosh as well. Check https://phaseit.net/claird/comp.lang.python/python_GUI.html for a list of dozens of GUI toolkits available for Python. One package, anygui (https://anygui.org), lets you program simple GUIs to one common programming interface and deploy them with any of a variety of backends.
The most widespread Python GUI toolkit is Tkinter. Tkinter is an object-oriented Python wrapper around the cross-platform toolkit Tk, which is also used with other scripting languages such as Tcl (for which it was originally developed) and Perl. Tkinter, like the underlying Tcl/Tk, runs on Windows, Macintosh, and Unix-like platforms. Tkinter itself comes with standard Python distributions. On Windows, the standard Python distribution also includes the Tcl/Tk components needed to run Tkinter. On other platforms, you must obtain and install Tcl/Tk separately.
This chapter covers an essential subset of Tkinter, sufficient to build simple graphical frontends for Python applications. A richer introduction is available at https://www.pythonware.com/library/tkinter/introduction/.
Additional content appearing in this section has been removed.
- Tkinter Fundamentals
- The Tkinter module makes it easy to build simple GUI applications. You simply import Tkinter, create, configure, and position the widgets you want, and then enter the Tkinter main loop. Your application becomes event-driven, which means that the user interacts with the widgets, causing events, and your application responds via the functions you installed as handlers for these events.
The following example shows a simple application that exhibits this general structure:
    import sys, Tkinter
    Tkinter.Label(text="Welcome!").pack()                   # create and pack a label
    Tkinter.Button(text="Exit", command=sys.exit).pack()    # clicking calls sys.exit
    Tkinter.mainloop()                                      # enter the event loop
The calls to Label and Button create the respective widgets and return them as results. Since we specify no parent windows, Tkinter puts the widgets directly in the application's main window. The named arguments specify each widget's configuration. In this simple case, we don't need to bind variables to the widgets. We just call the pack method on each widget, handing control of the widget's geometry to a layout manager object known as the packer. A layout manager is an invisible component whose job is to position widgets within other widgets (known as container or parent widgets), handling geometrical layout issues. The previous example passes no arguments to control the packer's operation, so the packer operates in a default way.
When the user clicks on the button, the command callable of the Button widget executes without arguments. The example passes function sys.exit as the argument named command when it creates the Button. Therefore, when the user clicks on the button, sys.exit() executes and terminates the application (as covered in Chapter 8).
After creating and packing the widgets, the example calls Tkinter's mainloop function, and thus enters the Tkinter main loop and becomes event-driven. Since the only event for which the example installs a handler is a click on the button, nothing happens from the application's viewpoint until the user clicks the button. Meanwhile, however, the
Additional content appearing in this section has been removed.
- Widget Fundamentals
- The Tkinter module supplies many kinds of widgets, and most of them have several things in common. All widgets are instances of classes that inherit from class Widget. Class Widget itself is abstract; that is, you never instantiate Widget itself. You only instantiate concrete subclasses corresponding to specific kinds of widgets. Class Widget's functionality is common to all the widgets you instantiate.
To instantiate any kind of widget, call the widget's class. The first argument is the parent window of the widget, also known as the widget's master. If you omit this positional argument, the widget's master is the application's main window. All other arguments are in named form, option=value. You can also set or change options on an existing widget w by calling w.config(option=value). You can get an option of w by calling w.cget('option'), which returns the option's value. Each widget w is a mapping, so you can also get an option as w['option'] and set or change it with w['option']=value.
Many widgets accept some common options. Some options affect a widget's colors, others affect lengths (normally in pixels), and there are various other kinds. This section details the most commonly used options.
Section 16.2.1.1: Color options
Tkinter represents colors with strings. The string can be a color name, such as 'red' or 'orange', or it may be of the form '#RRGGBB', where each of R, G, and B is a hexadecimal digit, to represent a color by the values of red, green, and blue components on a scale of 0 to 255. Don't worry if your screen can't display millions of different colors, as implied by this scheme; Tkinter maps any requested color to the closest color that your screen can display. The common color options are:
Additional content appearing in this section has been removed.
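The '#RRGGBB' form described in the color options text is easy to build from three integer components. The helper below is a small sketch, not part of Tkinter: the name rgb_to_tk_color is hypothetical, and the range check is an added assumption about what a caller would want.

```python
def rgb_to_tk_color(red, green, blue):
    """Return a '#RRGGBB' color string for components in the range 0..255.

    rgb_to_tk_color is an illustrative helper, not a Tkinter function.
    """
    for value in (red, green, blue):
        if not 0 <= value <= 255:
            raise ValueError("component out of range 0..255: %r" % (value,))
    # %02x renders each component as two lowercase hexadecimal digits.
    return "#%02x%02x%02x" % (red, green, blue)
```

The resulting string could then be passed wherever a widget color option expects a color, in the same way as a color name such as 'red'.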
- Commonly Used Simple Widgets
- The Tkinter module provides a number of simple widgets that cover most needs of basic GUI applications. This section documents the Button, Checkbutton, Entry, Label, Listbox, Radiobutton, Scale, and Scrollbar widgets.
Class Button implements a pushbutton, which the user clicks to execute an action. Instantiate Button with option text=somestring to let the button show text, or image=imageobject to let the button show an image. You normally use option command=callable to have callable execute without arguments when the user clicks the button. callable can be a function, a bound method of an object, an instance of a class with a __call__ method, or a lambda.
Besides methods common to all widgets, an instance b of class Button supplies two button-specific methods.
flash
    b.flash()
    Draws the user's attention to button b by redrawing b a few times, alternately in normal and active states.
invoke
    b.invoke()
    Calls without arguments the callable object that is b's command option, just like b.cget('command')(). This can be handy when, within some other action, you want the program to act just as if the button had been clicked.
Class
Additional content appearing in this section has been removed.
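The four kinds of callable that the text lists for command= can be tried without any GUI, since all that matters is that each can be called with no arguments. The names say_hello and Counter below are invented for this sketch.

```python
def say_hello():
    # A plain function.
    return "function"

class Counter(object):
    """Hypothetical class showing a bound method and a callable instance."""
    def __init__(self):
        self.clicks = 0

    def click(self):
        # Used as a bound method: counter.click
        self.clicks += 1
        return "bound method"

    def __call__(self):
        # Makes instances themselves callable: counter()
        return "callable instance"

counter = Counter()

# Each of these could be passed as command=...; here we just call them
# with no arguments, which is how the button would call them.
candidates = [say_hello, counter.click, counter, lambda: "lambda"]
results = [candidate() for candidate in candidates]
```

A bound method is often the most convenient choice, because the handler then has access to the state of the object it belongs to.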
- Container Widgets
- The Tkinter module supplies widgets whose purpose is to contain other widgets. A Frame instance does nothing more than act as a container. A Toplevel instance (including Tkinter's root window, also known as the application's main window) is a top-level window, so your window manager interacts with it (typically by supplying suitable decoration and handling certain requests). To ensure that a widget parent, which must be a Frame or Toplevel instance, is the parent (also known as master) of another widget child, pass parent as the first parameter when you instantiate child.
Class Frame represents a rectangular area of the screen contained in other frames or top-level windows. Frame's only purpose is to contain other widgets. Option borderwidth defaults to 0, so an instance of Frame normally displays no border. You can configure the option with borderwidth=1 if you want the frame border's outline to be visible.
Class Toplevel represents a rectangular area of the screen that is a top-level window and therefore receives decoration from whatever window manager handles your screen. Each instance of Toplevel can interact with the window manager and can contain other widgets. Every program using Tkinter has at least one top-level window, known as the root window. You can instantiate Tkinter's root window explicitly using root=Tkinter.Tk(); otherwise Tkinter instantiates its root window implicitly as and when first needed. If you want to have more than one top-level window, first instantiate the main one with root=Tkinter.Tk(). Later in your program, you can instantiate other top-level windows as needed, with calls such as another_toplevel=Tkinter.Toplevel().
Additional content appearing in this section has been removed.
- Menus
- Class Menu implements all kinds of menus: menubars of top-level windows, submenus, and pop-up menus. To use a Menu instance m as the menubar for a top-level window w, set w's configuration option menu=m. To use m as a submenu of a Menu instance x, call x.add_cascade with a named argument menu=m. To use m as a pop-up menu, call method m.post.
Besides configuration options covered in Section 16.2.1 earlier in this chapter, a Menu instance m supports option postcommand=callable. Tkinter calls callable without arguments each time it is about to display m (whether because of a call to m.post or because of user actions). You can use this option to update a dynamic menu just in time when necessary.
By default, a Tkinter menu shows a tear-off entry (a dashed line before other entries), which lets the user get a copy of the menu in a separate Toplevel window. Since such tear-offs are not part of user interface standards on popular platforms, you may want to disable tear-off functionality by using configuration option tearoff=0 for the menu.
Besides methods common to all widgets, an instance m of class Menu supplies several menu-specific methods.
add, add_cascade, add_checkbutton, add_command, add_radiobutton, add_separator
    m.add(entry_kind, **entry_options)
    Adds after m's existing entries a new entry whose kind is the string entry_kind, which is one of the strings 'cascade', 'checkbutton', 'command
Additional content appearing in this section has been removed.
- The Text Widget
- Class Text implements a powerful multiline text editor, able to display images and embedded widgets as well as text in one or more fonts and colors. An instance t of Text supports many ways to refer to specific points in t's contents. t supplies methods and configuration options allowing fine-grained control of operations, content, and rendering. This section covers a large, frequently used subset of this vast functionality. In some very simple cases, you can get by with just three Text-specific idioms:
    t.delete('1.0', END)             # clear the widget's contents
    t.insert(END, astring)           # append astring to the widget's contents
    somestring = t.get('1.0', END)   # get the widget's contents as a string
END is an index on any Text instance t, indicating the end of t's text. '1.0' is also an index, indicating the start of t's text (first line, first column). For more about indices, see Section 16.6.5 later in this chapter.
An instance t of class Text supplies many methods. Methods dealing with marks and tags are covered in later sections. Many methods accept one or two indices into t's contents. The most frequently used methods are the following.
delete
    t.delete(i[,j])
    t.delete(i) removes t's character at index i. t.delete(i,j) removes all characters from index i up to, but not including, index j.
get
Additional content appearing in this section has been removed.
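The '1.0' index mentioned above follows a 'line.column' pattern in which lines count from 1 and columns from 0. The sketch below converts a character offset in an ordinary Python string into that form; offset_to_index is a hypothetical helper written for illustration, not a Tkinter method, and it ignores the richer index forms that the full section on indices covers.

```python
def offset_to_index(text, offset):
    """Return a 'line.column' index string for a 0-based offset into text.

    Lines are numbered from 1 and columns from 0, matching the '1.0'
    index for the very start of the text.
    """
    before = text[:offset]
    line = before.count("\n") + 1
    # Column is the distance from the character after the last newline.
    column = offset - (before.rfind("\n") + 1)
    return "%d.%d" % (line, column)
```

For example, in the two-line string "ab\ncd" the character "d" sits at offset 4, which this helper reports as line 2, column 1.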
- The Canvas Widget
- Class Canvas is a powerful, flexible widget used for many purposes, including plotting and, in particular, building custom widgets. Building custom widgets is an advanced topic, and I do not cover it further in this book. This section covers only a subset of Canvas functionality used for the simplest kind of plotting.
Coordinates within a Canvas instance c are in pixels, with the origin at the upper left corner of c and positive coordinates growing rightward and downward. There are advanced methods that let you change c's coordinate system, but I do not cover them in this book.
What you draw on a Canvas instance c are canvas items, which can be lines, polygons, Tkinter images, arcs, ovals, texts, and others. Each item has an item handle by which you can refer to the item. You can also assign symbolic names called tags to sets of canvas items (the sets of items with different tags can overlap). ALL is a predefined tag that applies to all items; CURRENT is a predefined tag that applies to the item under the mouse pointer.
Tags on a Canvas instance are different from tags on a Text instance. The canvas tags are nothing more than sets of items with no independent existence. When you perform any operation, passing a Canvas tag as the item identifier, the operation occurs on those items that are in the tag's current set. It makes no difference if items are later removed from or added to that tag's set.
You create a canvas item by calling on c a method with a name of the form create_kindofitem, which returns the new item's handle. Methods itemcget and itemconfig of c let you get and change items' options.
A Canvas instance c supplies methods that you can call on items. The item argument can be an item's handle, as returned for example by c.create_line, or a tag, meaning all items in that tag's set (or no items at all, if the tag's set is currently empty), unless otherwise indicated in the method's description.
Additional content appearing in this section has been removed.
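Because canvas coordinates grow downward from the upper left corner, plotting data with a conventional y-up axis needs a small conversion. The function below is one possible sketch of that mapping; data_to_canvas and all its parameter names are assumptions made for the example, not part of the Canvas API.

```python
def data_to_canvas(x, y, xmin, xmax, ymin, ymax, width, height):
    """Map a data point to canvas pixel coordinates.

    The data range [xmin, xmax] x [ymin, ymax] is stretched over a canvas
    of the given width and height. The y axis is flipped, because canvas
    y coordinates grow downward while plots usually grow upward.
    """
    px = (x - xmin) / float(xmax - xmin) * width
    py = height - (y - ymin) / float(ymax - ymin) * height
    return px, py
```

With a 200 by 100 pixel canvas and data ranges 0..10 and 0..5, the data origin lands at the lower left pixel corner (0, 100) and the point (10, 5) lands at the upper right corner (200, 0). The resulting pairs could then be handed to an item-creating method such as c.create_line.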
- Geometry Management
- In all the examples so far, we have made each widget visible by calling method pack on the widget. This is representative of real-life Tkinter usage. However, two other layout managers exist and are sometimes useful. This section covers all three layout managers provided by the Tkinter module.
Never mix geometry managers for the same container widget: all children of each given container widget must be handled by the same geometry manager, or very strange effects (including Tkinter going into infinite loops) may result.
Calling method pack on a widget delegates widget geometry management to a simple and flexible layout manager component called the Packer. The Packer sizes and positions each widget within a container (parent) widget, according to each widget's space needs (including options padx and pady). Each widget w supplies the following Packer-related methods.
pack
    w.pack(**pack_options)
    Delegates geometry management to the packer. pack_options may include:
    - expand
      When true, w expands to fill any space not otherwise used in w's parent.
    - fill
      Determines whether w fills any extra space allocated to it by the packer, or keeps its own minimal dimensions: NONE (default), X (fill only horizontally), Y (fill only vertically), or BOTH (fill both horizontally and vertically).
Additional content appearing in this section has been removed.
- Tkinter Events
- So far, we've seen only the most elementary kind of event handling: the callbacks performed on callables installed with the command= option of buttons and menu entries of various kinds. Tkinter also lets you install callables to call back when needed to handle a variety of events. However, Tkinter does not let you create your own custom events; you are limited to working with events predefined by Tkinter itself.
General event callbacks must accept one argument event that is a Tkinter event object. Such an event object has several attributes describing the event:
- char
  A single-character string that is the key's code (only for keyboard events)
- keysym
  A string that is the key's symbolic name (only for keyboard events)
- num
  Button number (only for mouse-button events); 1 and up
- x, y
  Mouse position, in pixels, relative to the upper left corner of the widget
- x_root, y_root
Additional content appearing in this section has been removed.
- Chapter 17: Testing, Debugging, and Optimizing
- You're not finished with a programming task when you're done writing the code: you're finished when your code is running correctly and with acceptable performance. Testing means verifying that your code is running correctly by exercising the code under known conditions and checking that the results are as expected. Debugging means discovering the causes of incorrect behavior and removing them (the removal is often easy once you have figured out the causes).
Optimizing is often used as an umbrella term for activities meant to ensure acceptable performance. Optimizing breaks down into benchmarking (measuring performance for given tasks and checking that it's within acceptable bounds), profiling (instrumenting the program to find out what parts are performance bottlenecks), and optimizing proper (removing bottlenecks to make overall program performance acceptable). Clearly, you can't remove performance bottlenecks until you've found out where they are (using profiling), which in turn requires knowing that there are performance problems (using benchmarking).
All of these tasks are large and important, and each could fill a book by itself. This chapter does not explore every related technique and implication; it focuses on Python-specific techniques, approaches, and tools.
- Testing
- In this chapter, I distinguish between two rather different kinds of testing: unit testing and system testing. Testing is a rich and important field, and even more distinctions could be drawn, but my goal is to focus on the issues of most immediate importance to software developers.
Unit testing means writing and running tests to exercise a single module or an even smaller unit, such as a class or function. System testing (also known as functional testing) involves running an entire program with known inputs. Some classic books on testing draw the distinction between white-box testing, done with knowledge of a program's internals, and black-box testing, done from the outside. This classic viewpoint parallels the modern one of unit versus system testing.
Unit and system testing serve different goals. Unit testing proceeds apace with development; you can and should test each unit as you're developing it. Indeed, one modern approach is known as test-first coding: for each feature that your program must have, you first write unit tests, and only then do you proceed to write code that implements the feature. Test-first coding seems a strange approach, but it has several advantages. For example, it ensures that you won't omit unit tests for some feature. Further, test-first coding is helpful because it urges you to focus first on what tasks a certain function, class, or method should accomplish, and to deal only afterwards with implementing that function, class, or method. In order to test a unit, which may depend on other units not yet fully developed, you often have to write stubs, which are fake implementations of various units' interfaces that give known and correct responses in cases needed to test other units.
System testing comes afterwards, since it requires the system to exist with some subset of system functionality believed to be in working condition. System testing provides a sanity check: given that each module in the program works properly (passes unit tests), does the whole program work? If each unit is okay but the system as a whole is not, there is a problem with integration between units. For this reason, system testing is also known as integration testing.
Additional content appearing in this section has been removed.
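The idea of testing a unit against a stub can be sketched with the standard unittest module. Everything named here (StubRateSource, convert, ConvertTest, and the exchange-rate scenario) is invented for illustration: convert stands for the unit under test, and the stub stands in for a dependency that is assumed not to be written yet.

```python
import unittest

class StubRateSource(object):
    """Stub: fake implementation of a dependency, giving known answers."""
    def rate(self, currency):
        # Only the cases needed by the tests are supported.
        return {"EUR": 2.0}[currency]

def convert(amount, currency, source):
    """Unit under test: convert amount using a rate obtained from source."""
    return amount * source.rate(currency)

class ConvertTest(unittest.TestCase):
    def test_known_rate(self):
        self.assertEqual(convert(10, "EUR", StubRateSource()), 20.0)

    def test_unknown_currency(self):
        # The stub raises KeyError for currencies it does not know.
        self.assertRaises(KeyError, convert, 10, "XXX", StubRateSource())
```

Saved in a module, tests like these can typically be run with python -m unittest followed by the module name. Because convert receives its rate source as an argument, swapping the stub for a real implementation later requires no change to the unit itself.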
- Debugging
- Since Python's development cycle is so fast, the most effective way to debug is often to edit your code to make it output relevant information at key points. Python has many ways to let your code explore its own state in order to extract information that may be relevant for debugging. The inspect and traceback modules specifically support such exploration, which is also known as reflection or introspection.
Once you have obtained debugging-relevant information, statement print is often the simplest way to display it. You can also log debugging information to files. Logging is particularly useful for programs that run unattended for a long time, as is typically the case for server programs. Displaying debugging information is like displaying other kinds of information, as covered in Chapter 10 and Chapter 16, and similarly for logging it, as covered in Chapter 10 and Chapter 11. Python 2.3 will also include a module specifically dedicated to logging. As covered in Chapter 8, rebinding attribute excepthook of module sys lets your program log detailed error information just before your program is terminated by a propagating exception.
Python also offers hooks enabling interactive debugging. Module pdb supplies a simple text-mode interactive debugger. Other interactive debuggers for Python are part of integrated development environments (IDEs), such as IDLE and various commercial offerings. However, I do not cover IDEs in this book.
The inspect module supplies functions to extract information from all kinds of objects, including the Python call stack (which records all function calls currently executing) and source files. At the time of this writing, module inspect is not yet available for Jython. The most frequently used functions of module inspect are as follows.
getargspec, formatargspec
Additional content appearing in this section has been removed.
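As a small illustration of the introspection that inspect and traceback support, the sketch below reads the call stack to find out who called a function, and captures an exception's traceback as a string suitable for printing or logging. The function names whoami, outer, and describe_failure are hypothetical.

```python
import inspect
import traceback

def whoami():
    # inspect.stack()[0] describes this function's own frame;
    # inspect.stack()[1] describes the caller's frame.
    # Item 3 of each record is the function name.
    return inspect.stack()[1][3]

def outer():
    return whoami()

def describe_failure(func):
    """Call func(); return the formatted traceback if it raises, else ''."""
    try:
        func()
    except Exception:
        # format_exc returns the same text Python would print for an
        # uncaught exception, as a string.
        return traceback.format_exc()
    return ""
```

Returning the traceback as a string, instead of printing it directly, leaves the caller free to print it, write it to a log file, or ignore it.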
- The warnings Module
- Warnings are messages about errors or anomalies that may not be serious enough to be worth disrupting the program's control flow (as would happen by raising a normal exception). The warnings module offers you fine-grained control over which warnings are output and what happens to them. Your code can conditionally output a warning by calling function warn in module warnings. Other functions in the module let you control how warnings are formatted, set their destinations, and conditionally suppress some warnings (or transform some warnings into exceptions).
Module warnings supplies several exception classes representing warnings. Class Warning subclasses Exception and is the base class for all warnings. You may define your own warning classes; they must subclass Warning, either directly or via one of its other existing subclasses, which are:
- DeprecationWarning
  Using deprecated features only supplied for backward compatibility
- RuntimeWarning
  Using features whose semantics are error-prone
- SyntaxWarning
  Using features whose syntax is error-prone
- UserWarning
  Other user-defined warnings that don't fit any of the above cases
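The following sketch shows a user-defined warning class, a call to warn, and a filter that transforms the warning into an exception. ConfigWarning, parse_port, and strict_parse_port are names invented for the example. Note that warnings.catch_warnings is an assumption about the Python version in use: it appeared in releases later than the one this text describes.

```python
import warnings

class ConfigWarning(UserWarning):
    """Hypothetical application-specific warning category."""

def parse_port(text):
    port = int(text)
    if port < 1024:
        # Not serious enough to stop the program: warn and carry on.
        warnings.warn("port %d is privileged" % port, ConfigWarning)
    return port

# Record warnings in a list instead of writing them to sys.stderr.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    parse_port("80")

def strict_parse_port(text):
    # Within this block, ConfigWarning is raised as an exception.
    with warnings.catch_warnings():
        warnings.simplefilter("error", ConfigWarning)
        return parse_port(text)
```

Subclassing UserWarning lets callers filter this one category without affecting deprecation or runtime warnings issued by other code.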
Additional content appearing in this section has been removed.
- Optimization
- "First make it work. Then make it right. Then make it fast." This quotation, often with slight variations, is widely known as the golden rule of programming. As far as I've been able to ascertain, the quotation is attributed to Kent Beck, who credits his father with it. Being widely known makes the principle no less important, particularly because it's more honored in the breach than in the observance. A negative form, slightly exaggerated for emphasis, is in a quotation by Don Knuth: "Premature optimization is the root of all evil in programming."
Optimization is premature if your code is not working yet. First make it work. Optimization is also premature if your code is working but you are not satisfied with the overall architecture and design. Remedy structural flaws before worrying about optimization: first make it work, then make it right. These first two steps are not optional: working, well-architected code is always a must.
In contrast, you don't always need to make it fast. Benchmarks may show that your code's performance is already acceptable after the first two steps. When performance is not acceptable, profiling often shows that all performance issues are in a small subset, perhaps 10% to 20% of the code where your program spends 80% or 90% of the time. Such performance-crucial regions of your code are also known as its bottlenecks, or hot spots. It's a waste of effort to optimize large portions of code that account for, say, 10% of your program's running time. Even if you made that part run 10 times as fast (a rare feat), your program's overall runtime would only decrease by 9%, a speedup no user will even notice. If optimization is needed, focus your efforts where they'll matter, on bottlenecks. You can optimize bottlenecks while keeping your code 100% pure Python. In some cases, you can resort to recoding some computational bottlenecks as Python extensions, potentially gaining even better performance.
Start by designing, coding, and testing your application in Python, often using some already available extension modules. This takes much less time than it would take with a classic compiled language. Then benchmark the application to find out if the resulting code is fast enough. Often it is, and you're done. Congratulations!
Additional content appearing in this section has been removed.
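The 9% figure in the text follows from simple arithmetic: the part that is not optimized still takes its full share of the time, and only the optimized part shrinks. The helper below works this out; remaining_runtime is a name chosen for this sketch.

```python
def remaining_runtime(fraction, speedup):
    """Fraction of the original runtime left after optimizing.

    fraction is the share of total runtime spent in the optimized code
    (between 0 and 1); speedup is how many times faster that code becomes.
    """
    untouched = 1.0 - fraction        # code that was not optimized
    optimized = fraction / speedup    # the optimized part, now shorter
    return untouched + optimized

# Code accounting for 10% of runtime, made 10 times as fast:
# 0.90 + 0.01 = 0.91 of the original time, a decrease of 9%.
small_gain = 1.0 - remaining_runtime(0.10, 10)

# A bottleneck accounting for 90% of runtime, made 10 times as fast:
# 0.10 + 0.09 = 0.19 of the original time, a decrease of 81%.
large_gain = 1.0 - remaining_runtime(0.90, 10)
```

The same speedup factor gives a 9% or an 81% reduction depending only on where it is applied, which is the arithmetic behind the advice to focus on bottlenecks.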
- Chapter 18: Client-Side Network Protocol Modules
- A program can work on the Internet as a client (a program that accesses resources) or as a server (a program that makes services available). Both kinds of program deal with protocol issues, such as how to access and communicate data, and with data formatting issues. For order and clarity, the Python library deals with these issues in several different modules. This book will cover the topics in separate chapters. This chapter deals with the modules in the Python library that support protocol issues of client programs.
Nowadays, data access can often be achieved most simply through Uniform Resource Locators (URLs). Python supports URLs with modules urlparse, urllib, and urllib2. For rarer cases, when you need fine-grained control of data access protocols normally accessed via URLs, Python supplies modules httplib and ftplib. Protocols for which URLs are often insufficient include mail (modules poplib and smtplib), Network News (module nntplib), and Telnet (module telnetlib). Python also supports the XML-RPC protocol for distributed computing with module xmlrpclib.
- URL Access
- A URL identifies a resource on the Internet. A URL is a string composed of several optional parts, called components, known as scheme, location, path, query, and fragment. A URL with all its parts looks something like:
    scheme://lo.ca.ti.on/pa/th?query#fragment
For example, in http://www.python.org:80/faq.cgi?src=fie, the scheme is http, the location is www.python.org:80, the path is /faq.cgi, the query is src=fie, and there is no fragment. Some of the punctuation characters form a part of one of the components they separate, while others are just separators and are part of no component. Omitting punctuation implies missing components. For example, in mailto:me@you.com, the scheme is mailto, the path is me@you.com, and there is no location, query, or fragment. The missing // means the URL has no location part, the missing ? means it has no query part, and the missing # means it has no fragment part.
The urlparse module supplies functions to analyze and synthesize URL strings. In Python 2.2, the most frequently used functions of module urlparse are urljoin, urlsplit, and urlunsplit.
urljoin
    urljoin(base_url_string,relative_url_string)
    Returns a URL string u, obtained by joining relative_url_string, which may be relative, with base_url_string. The joining procedure that urljoin performs to obtain its result u may be summarized as follows:
    - When either of the argument strings is empty, u is the other argument.
    - When relative_url_string explicitly specifies a scheme different from that of
Additional content appearing in this section has been removed.
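The component breakdown and the first joining rule above can be tried out directly. The following is a minimal sketch, assuming Python 3, where the functions of module urlparse live in urllib.parse and the location component is exposed under the attribute name netloc. The base and relative URLs passed to urljoin are illustrative values, not taken from the text.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit

# Split the example URL into its five components.
parts = urlsplit("http://www.python.org:80/faq.cgi?src=fie")
print(parts.scheme, parts.netloc, parts.path, parts.query, repr(parts.fragment))

# A mailto URL has no "//", "?", or "#", so only scheme and path are set.
mail = urlsplit("mailto:me@you.com")

# urlunsplit reassembles a URL string from the five components.
rebuilt = urlunsplit(parts)

# urljoin resolves a possibly relative URL against a base URL.
joined = urljoin("http://www.python.org/doc/index.html", "faq.html")

# When either argument is empty, the result is the other argument.
same = urljoin("http://www.python.org/doc/", "")
```

The round trip through urlsplit and urlunsplit is a convenient way to change one component (say, the query) while leaving the others untouched.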
- Email Protocols
- Content preview:
Most email today is sent via servers that implement the Simple Mail Transfer Protocol (SMTP) and received via servers that implement the Post Office Protocol Version 3 (POP3). These protocols are supported by the Python standard library modules smtplib and poplib, respectively. Some servers, instead of or in addition to POP3, implement the richer and more advanced Internet Message Access Protocol Version 4 (IMAP4), supported by the Python standard library module imaplib, which I do not cover in this book.
The poplib module supplies a class POP3 to access a POP mailbox.
POP3
class POP3(host, port=110)
Returns an instance p of class POP3 connected to the given host and port.
Instance p supplies many methods, of which the most frequently used are the following.
dele
p.dele(msgnum)
Marks message msgnum for deletion. The server performs deletions when this connection terminates by a call to method quit. Returns the response string.
list
p.list(msgnum=None)
Returns a pair (
Additional content appearing in this section has been removed.
- The HTTP and FTP Protocols
- Content preview:
Modules urllib and urllib2 are most often the handiest ways to access servers for http, https, and ftp protocols. The Python standard library also supplies specific modules to use for these data access protocols.
Module httplib supplies a class HTTPConnection to connect to an HTTP server.
HTTPConnection
class HTTPConnection(host, port=80)
Returns an instance h of class HTTPConnection, ready for connection (but not yet connected) to the given host and port.
Instance h supplies several methods, of which the most frequently used are the following.
close
h.close()
Closes the connection to the HTTP server.
getresponse
h.getresponse()
Returns an instance r of class HTTPResponse, which represents the response received from the HTTP server. Call after method request has returned. Instance r supplies the following attributes and methods:
  - r.getheader(name
Additional content appearing in this section has been removed.
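The request, getresponse, getheader, and close sequence described above can be sketched as follows. This assumes Python 3, where module httplib is named http.client. To stay self-contained, the sketch starts a throwaway local server with http.server; the HelloHandler class is hypothetical and exists only so the client has something to talk to. The status attribute and read method of the response are real http.client features that the preview text does not reach.

```python
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: answers every GET with a short text body."""

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demonstration quiet

# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), HelloHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# The instance is ready for connection, but not yet connected.
h = HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
h.request("GET", "/")      # connects implicitly, then sends the request
r = h.getresponse()        # call only after request has returned
status = r.status
content_type = r.getheader("Content-Type")
body = r.read()
h.close()

server.shutdown()
server.server_close()
```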
- Network News
- Content preview:
Network News, also known as Usenet News, is mostly transmitted with the Network News Transfer Protocol (NNTP). The Python standard library supports this protocol in its module nntplib. The nntplib module supplies a class NNTP to connect to an NNTP server.
NNTP
class NNTP(host, port=119, user=None, passwd=None, readermode=False)
Returns an instance n of class NNTP connected to the given host and port, and optionally authenticated with the given user and passwd if user is not None. When readermode is True, also sends a 'mode reader' command; you may need this, depending on what NNTP server you connect to and on what NNTP commands you send to that server.
An instance n of NNTP supplies many methods. Each of n's methods returns a tuple whose first item is a string (referred to as response in the following section) that is the response from the NNTP server to the NNTP command corresponding to the method (method post just returns the response string, not a tuple). Each method returns the response string just as the NNTP server supplies it. The string starts with an integer in decimal form (the integer is known as the return code), followed by a space, followed by more text.
For some commands, the extra text after the return code is just a comment or explanation supplied by the NNTP server. For other commands, the NNTP standard specifies the format of the text that follows the return code on the response line. In those cases, the relevant method also parses the text in question, yielding other items in the method's resulting tuple, so your code need not perform such parsing itself; rather, you can just access further items in the method's result tuple, as specified in the following sections.
Additional content appearing in this section has been removed.
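The layout of a response string (decimal return code, a space, then more text) is simple enough to take apart by hand. The sketch below is plain string handling and does not use nntplib, which has since been removed from the standard library in Python 3.13. The helper name split_response and the sample response line are hypothetical.

```python
def split_response(response):
    """Split an NNTP response line into (return_code, remaining_text)."""
    code, _, text = response.partition(" ")
    return int(code), text

# Hypothetical response line in the shape a server might send.
code, text = split_response("211 1234 3000234 3002322 misc.test")
print(code, text)
```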
- Telnet
- Content preview:
Telnet is an old protocol, specified by RFC 854 (see https://www.faqs.org/rfcs/rfc854.html), and normally used for interactive user sessions. The Python standard library supports this protocol in its module telnetlib. Module telnetlib supplies a class Telnet to connect to a Telnet server.
Telnet
class Telnet(host=None, port=23)
Returns an instance t of class Telnet. When host (and optionally port) is given, implicitly calls t.open(host, port).
Instance t supplies many methods, of which the most frequently used are as follows.
close
t.close()
Closes the connection.
expect
t.expect(res, timeout=None)
Reads data from the connection until it matches any of the regular expressions that are the items of list res, or until timeout seconds elapse when timeout is not None. Regular expressions and match objects are covered in Chapter 9. Returns a tuple of three items (i, mo, txt), where i is the index in res of the regular expression that matched, mo is the match object, and txt is all the text read up to and including the match. Raises EOFError when the connection is closed and no data is available; otherwise, when it gets no match, returns (-1, None, txt), where
Additional content appearing in this section has been removed.
- Distributed Computing
- Content preview:
There are many standards for distributed computing, from simple Remote Procedure Call (RPC) ones to rich object-oriented ones such as CORBA. You can find several third-party Python modules supporting these standards on the Internet.
The Python standard library comes with support for both server and client use of a simple yet powerful standard known as XML-RPC. For in-depth coverage of XML-RPC, I recommend the book Programming Web Services with XML-RPC, by Simon St. Laurent and Joe Johnson (O'Reilly). XML-RPC uses HTTP as the underlying transport and encodes requests and replies in XML. For server-side support, see Section 19.2.2.4 in Chapter 19. Client-side support is supplied by module xmlrpclib.
The xmlrpclib module supports a class ServerProxy, which you instantiate to connect to an XML-RPC server. An instance s of ServerProxy is a proxy for the server it connects to. In other words, you call arbitrary methods on s, and s packages up the method name and argument values as an XML-RPC request, sends the request to the XML-RPC server, receives the server's response, and unpackages the response as the method's result. The arguments to such method calls can be of any type supported by XML-RPC:
Boolean
  Constant attributes True and False of module xmlrpclib (since module xmlrpclib predates the introduction of bool into Python, it does not use Python's built-in True and False values for this purpose)
Integers, floating-point numbers, strings, arrays
  Passed and returned as Python int, float, Unicode, and list values
Structures
  Passed and returned as Python dict
Additional content appearing in this section has been removed.
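The packaging step that a ServerProxy performs can be observed without any server. This is a minimal sketch, assuming Python 3, where the module is named xmlrpc.client and Booleans are ordinary Python bool values rather than module-level constants. The functions dumps and loads are the module's marshalling helpers; the method name example.echo and the argument values are hypothetical.

```python
import xmlrpc.client

# One value for each of the XML-RPC types listed above.
params = (42, 2.5, "text", [1, 2, 3], {"key": "value"}, True)

# Package a method name and argument values as an XML-RPC request body.
request_xml = xmlrpc.client.dumps(params, methodname="example.echo")
print(request_xml)

# Unpackage the XML again, as the receiving side would.
decoded_params, method_name = xmlrpc.client.loads(request_xml)
```

With a real server, the same marshalling happens behind a call such as s.example.echo(...) on a ServerProxy instance s, followed by an HTTP round trip.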
- Chapter 19: Sockets and Server-Side Network Protocol Modules
- Content preview:
To communicate with the Internet, programs use devices known as sockets. The Python library supports sockets through module socket, as well as wrapping them into higher-level modules covered in Chapter 18. To help you write server programs, the Python library also supplies higher-level modules to use as frameworks for socket servers. Standard and third-party Python modules and extensions also support timed and asynchronous socket operations. This chapter covers socket, the server-side framework modules, and the essentials of other, more advanced modules.
The modules covered in this chapter offer many conveniences compared to C-level socket programming. However, in the end, the modules rely on native socket functionality supplied by the underlying operating system. While it is often possible to write effective network clients by using just the modules covered in Chapter 18, without needing to understand sockets, writing effective network servers most often does require some understanding of sockets. Thus, the lower-level module socket is covered in this chapter and not in Chapter 18, even though both clients and servers use sockets.
However, I only cover the ways in which module socket lets your program access sockets; I do not try to impart the detailed understanding of sockets, and of other aspects of network behavior independent of Python, that you may need to make use of socket's functionality. To understand socket behavior in detail on any kind of platform, I recommend W. Richard Stevens' Unix Network Programming, Volume 1 (Prentice-Hall). Higher-level modules are simpler and more powerful, but a detailed understanding of the underlying technology is always useful, and sometimes it can prove indispensable.
Additional content appearing in this section has been removed.
- The socket Module
- Content preview:
The socket module supplies a factory function, also named socket, that you call to generate a socket object s. You perform network operations by calling methods on s. In a client program, you connect to a server by calling s.connect. In a server program, you wait for clients to connect by calling s.bind and s.listen. When a client requests a connection, you accept the request by calling s.accept, which returns another socket object s1 connected to the client. Once you have a connected socket object, you transmit data by calling its method send, and receive data by calling its method recv.
Python supports both current Internet Protocol (IP) standards. IPv4 is more widespread, while IPv6 is newer. In IPv4, a network address is a pair (host, port), where host is a Domain Name System (DNS) hostname such as 'www.python.org' or a dotted-quad IP address string such as '194.109.137.226'. port is an integer indicating a socket's port number. In IPv6, a network address is a tuple (host, port, flowinfo, scopeid). Since IPv6 infrastructure is not yet widely deployed, I do not cover IPv6 further in this book. When host is a DNS hostname, Python implicitly looks up the name, using your platform's DNS infrastructure, and uses the dotted-quad IP address corresponding to that name.
Module socket supplies an exception class error. Functions and methods of the module raise error instances to diagnose socket-specific errors. Module socket also supplies many functions. Several of these functions translate data, such as integers, between your host's native format and network standard format. The higher-level protocol that your program and its counterpart are using on a socket determines what kind of conversions you must perform.
The most frequently used functions of module
Additional content appearing in this section has been removed.
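The bind, listen, accept, connect, send, and recv sequence described above fits in a few lines when both ends run in one process. This is a minimal sketch, assuming Python 3 and a loopback interface at 127.0.0.1. It uses sendall, a variant of send that keeps sending until all data is transmitted, and a helper thread for the server side; both choices are for the demonstration only.

```python
import socket
import threading

# Server side: bind to an address pair, then listen for connections.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))    # port 0 asks the OS for any free port
server.listen(1)
address = server.getsockname()   # the (host, port) pair actually in use

def serve_one():
    # accept returns a new socket s1 connected to the client.
    s1, client_address = server.accept()
    with s1:
        data = s1.recv(1024)
        s1.sendall(data.upper())

worker = threading.Thread(target=serve_one)
worker.start()

# Client side: connect, transmit, receive.
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.settimeout(5)
client.connect(address)
client.sendall(b"hello")
reply = client.recv(1024)
client.close()

worker.join()
server.close()

# Conversion between host and network byte order for a 16-bit integer.
round_trip = socket.ntohs(socket.htons(80))
```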
- The SocketServer Module
- Content preview:
The Python library supplies a framework module, SocketServer, to help you implement Internet servers. SocketServer supplies server classes TCPServer, for connection-oriented servers using TCP, and UDPServer, for datagram-oriented servers using UDP, with the same interface.
An instance s of either TCPServer or UDPServer supplies many attributes and methods, and you can subclass either class and override some methods to architect your own specialized server framework. However, I do not cover such advanced and rarely used possibilities in this book.
Classes TCPServer and UDPServer implement synchronous servers, able to serve one request at a time. Classes ThreadingTCPServer and ThreadingUDPServer implement threaded servers, spawning a new thread per request. You are responsible for synchronizing the resulting threads as needed. Threading is covered in Chapter 14.
For normal use of SocketServer, subclass the BaseRequestHandler class provided by SocketServer and override the handle method. Then, instantiate a server class, passing the address pair on which to serve and your subclass of BaseRequestHandler. Finally, call method serve_forever on the server class instance.
An instance h of BaseRequestHandler supplies the following methods and attributes.
client_address
The h.client_address attribute is the pair (host, port) of the client, set by the base class at connection.
Additional content appearing in this section has been removed.
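The three steps for normal use (subclass BaseRequestHandler, instantiate a server class, call serve_forever) can be sketched as follows. This assumes Python 3, where the module is named socketserver. The handler class UpperHandler and the list seen_clients are hypothetical, and self.request (the connected socket for a TCP server) is a real attribute that the preview text does not reach. The server runs in a background thread only so that the same script can also act as the client.

```python
import socket
import socketserver
import threading

seen_clients = []  # records each client's (host, port) pair

class UpperHandler(socketserver.BaseRequestHandler):
    """Hypothetical handler: replies with the request data in uppercase."""

    def handle(self):
        seen_clients.append(self.client_address)
        data = self.request.recv(1024)
        self.request.sendall(data.upper())

# Pass the address pair to serve on and the handler subclass.
server = socketserver.TCPServer(("127.0.0.1", 0), UpperHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

with socket.create_connection(server.server_address, timeout=5) as client:
    client.sendall(b"ping")
    reply = client.recv(1024)

server.shutdown()
server.server_close()
```

Swapping TCPServer for ThreadingTCPServer leaves the handler unchanged, which is the point of the shared interface.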
- Event-Driven Socket Programs
- Content preview:
Socket programs, particularly servers, must often be ready to perform many tasks at once. Example 19-1 accepts a connection request, then serves a single client until that client has finished; other connection requests must wait. This is not acceptable for servers in production use. Clients cannot wait too long: the server must be able to service multiple clients at once.
One approach that lets your program perform several tasks at once is threading, covered in Chapter 14. Module SocketServer optionally supports threading, as covered earlier in this chapter. An alternative to threading that can offer better performance and scalability is event-driven (also known as asynchronous) programming.
An event-driven program sits in an event loop, where it waits for events. In networking, typical events are "a client requests connection," "data arrived on a socket," and "a socket is available for writing." The program responds to each event by executing a small slice of work to service that event, then goes back to the event loop to wait for the next event. The Python library supports event-driven network programming with the low-level select module and the higher-level asyncore and asynchat modules. Even more complete support for event-driven programming is in the Twisted package (available at https://www.twistedmatrix.com), particularly in subpackage twisted.internet.
The select module exposes a cross-platform low-level function that lets you implement high-performance asynchronous network servers and clients. Module select offers additional platform-dependent functionality on Unix-like platforms, but I cover only cross-platform functionality in this book.
select
select(inputs, outputs, excepts
Additional content appearing in this section has been removed.
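The "data arrived on a socket" event can be observed with a pair of connected sockets. This is a minimal sketch, assuming Python 3; it relies on the optional fourth argument of select.select, a timeout in seconds, which lies beyond the point where the preview text is cut off. The pair from socket.socketpair stands in for a real client and server.

```python
import select
import socket

a, b = socket.socketpair()   # two already-connected sockets

# Nothing has been sent yet, so a zero-timeout poll reports no readable sockets.
readable, writable, exceptional = select.select([a], [], [], 0)
idle = (readable == [])

# After data is sent from b, socket a shows up as ready for reading.
b.sendall(b"event")
readable, writable, exceptional = select.select([a], [], [], 5)
data = readable[0].recv(1024) if readable else b""

a.close()
b.close()
```

A real event loop would call select repeatedly, passing every socket of interest and servicing whichever ones come back as ready.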
- Chapter 20: CGI Scripting and Alternatives
- Content preview:
When a web browser (or other web client) requests a page from a web server, the server may return either static or dynamic content. Serving dynamic content involves server-side web programs that generate and deliver content on the fly, often based on information that is stored in a database. The one longstanding Web-wide standard for server-side programming is known as CGI, which stands for Common Gateway Interface. In server-side programming, a client sends a structured request to a web server. The server runs another program, passing the content of the request. The server captures the output of the other program, and sends that output to the client as the response to the original request. In other words, the server's role is that of a gateway between the client and the other program. The other program is called a CGI program or CGI script.
CGI enjoys the typical advantages of standards. When you program to the CGI standard, your program can be deployed on different web servers, and work despite the differences. This chapter focuses on CGI scripting in Python. It also mentions the downsides of CGI (basically, issues of scalability under high load) and some of the alternative, nonstandard server-side architectures that you can use instead of CGI.
This chapter assumes that you are familiar with both HTML and HTTP. For reference material on both of these standards, see Webmaster in a Nutshell, by Stephen Spainhour and Robert Eckstein (O'Reilly). For detailed coverage of HTML, I recommend HTML & XHTML: The Definitive Guide, by Chuck Musciano and Bill Kennedy (O'Reilly). And for additional coverage of HTTP, see the HTTP Pocket Reference, by Clinton Wong (O'Reilly).
Additional content appearing in this section has been removed.
- CGI in Python
- Content preview:
CGI's standardization lets you use any language to code CGI scripts. Python is a very-high-level, high-productivity language, and thus quite suitable for CGI coding. The Python standard library supplies modules to handle typical CGI-related tasks.
CGI scripts are often used to handle HTML form submissions. In this case, the action attribute of the form tag specifies a URL for a CGI script to handle the form, and the method attribute is either GET or POST, indicating how the form data is sent to the script. According to the CGI standard, the GET method should be used for forms without side effects, such as asking the server to query a database and display the results, while the POST method is meant for forms with side effects, such as asking the server to update a database. In practice, however, GET is also often used to create side effects. The distinction between GET and POST in practical use is that GET encodes the form's contents as a query string joined to the action URL to form a longer URL, while POST transmits the form's contents as an encoded stream of data, which a CGI script sees as the script's standard input.
The GET method is slightly faster. You can use a fixed GET-form URL wherever you can use a hyperlink. However, GET cannot send large amounts of data to the server, since many clients and servers limit URL lengths (you're safe up to about 200 bytes). The POST method has no size limits. You must use POST when the form contains input tags with type=file; the form tag must then have enctype=multipart/form-data.
The CGI standard does not specify whether a single script can access both the query string (used for GET) and the script's standard input (used for POST). Many clients and servers let you get away with it, but relying on this nonstandard practice may negate the portability advantages that you would otherwise get from the fact that CGI is a standard. Python's standard module cgi, covered in the next section, recovers form data from the query string only, when any query string is present; otherwise, when no query string is present,
Additional content appearing in this section has been removed.
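How GET turns a form's contents into a query string joined to the action URL can be shown with the URL helpers alone. This is a minimal sketch, assuming Python 3; it uses urllib.parse rather than module cgi, which was removed from the standard library in Python 3.13. The form fields and the action URL are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical form contents and action URL.
form = {"name": "Ada Lovelace", "lang": "python"}
action = "http://www.example.com/cgi-bin/search.py"

# A browser submitting with GET encodes the fields and appends them to the URL.
query = urlencode(form)
url = action + "?" + query
print(url)

# On the server side, the script recovers the fields from the query string.
# parse_qs maps each field name to a list, since a name may occur several times.
recovered = parse_qs(urlsplit(url).query)
```

With POST, the same encoded string would travel in the request body instead, which is why a CGI script reads it from standard input.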
- Cookies
- Content preview:
HTTP is a stateless protocol, meaning that it retains no session state between transactions. Cookies, as specified by the HTTP 1.1 standard, let web clients and servers cooperate to build a stateful session from a sequence of HTTP transactions.
Each time a server sends a response to a client's request, the server may initiate or continue a session by sending one or more Set-Cookie headers, whose contents are small data items called cookies. When a client sends another request to the server, the client may continue a session by sending Cookie headers with cookies previously received from that server or other servers in the same domain. Each cookie is a pair of strings, the name and value of the cookie, plus optional attributes. Attribute max-age is the maximum number of seconds the cookie should be kept. The client should discard saved cookies after their maximum age. If max-age is missing, then the client should discard the cookie when the user's interactive session ends.
Cookies have no intrinsic privacy nor authentication. Cookies travel in the clear on the Internet, and therefore are vulnerable to sniffing. A malicious client might return cookies different from cookies previously received. To use cookies for authentication or identification or to hold sensitive information, the server must encrypt and encode cookies sent to clients, and decode, decrypt, and verify cookies received back from clients.
Encryption, encoding, decoding, decryption, and verification may all be slow when applied to large amounts of data. Decryption and verification require the server to keep some amount of server-side state. Sending substantial amounts of data back and forth on the network is also slow. The server should therefore persist most state data locally, in files or databases. In most cases, a server should use cookies only as small, encrypted, verifiable keys confirming the identity of a user or session, using DBM files or a relational database (covered in Chapter 11) for session state. HTTP sets a limit of 2 KB on cookie size, but I suggest you normally use substantially smaller cookies.
Additional content appearing in this section has been removed.
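The idea of a small, verifiable session key with a max-age attribute can be sketched as follows. This assumes Python 3 and its http.cookies module. The sketch covers verification only: it signs the session identifier with an HMAC so tampering can be detected, and it does not encrypt anything. The names SECRET_KEY, sign, and verify, and the session identifier abc123, are hypothetical.

```python
import hashlib
import hmac
from http.cookies import SimpleCookie

SECRET_KEY = b"hypothetical-server-secret"   # kept on the server only

def sign(session_id):
    """Return the session id followed by a keyed digest of it."""
    digest = hmac.new(SECRET_KEY, session_id.encode("ascii"), hashlib.sha256).hexdigest()
    return session_id + "." + digest

def verify(value):
    """Return the session id when the digest matches, else None."""
    session_id, _, digest = value.rpartition(".")
    expected = hmac.new(SECRET_KEY, session_id.encode("ascii"), hashlib.sha256).hexdigest()
    return session_id if hmac.compare_digest(digest, expected) else None

# Server response: build a Set-Cookie header that expires after one hour.
cookie = SimpleCookie()
cookie["session"] = sign("abc123")
cookie["session"]["max-age"] = 3600
header = cookie.output()
print(header)

# Next request: parse the Cookie header content sent back by the client.
returned = SimpleCookie()
returned.load("session=" + cookie["session"].value)
session_id = verify(returned["session"].value)

# A client that alters the session id fails verification.
forged = verify("abc124." + cookie["session"].value.rpartition(".")[2])
```

The session state itself would stay on the server, keyed by the verified identifier.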
- Other Server-Side Approaches
- Content preview:
A CGI script runs as a new process each time a client requests it. Process startup time, interpreter initialization, connection to databases, and script initialization all add up to measurable overhead. On fast, modern server platforms, the overhead is bearable for light to moderate loads. On a busy server, CGI may not scale up well. Web servers support server-specific ways to reduce overhead, running scripts in processes that can serve for several hits rather than starting up a new CGI process per hit.
Microsoft's ASP (Active Server Pages) is a server extension leveraging a lower-level library, ISAPI, and Microsoft's COM technology. Most ASP pages are coded in the VBScript language, but ASP is language-independent. As the reptilian connection suggests, Python and ASP go very well together, as long as Python is installed with the platform-specific win32all extensions, specifically ActiveScripting. Many other server extensions are cross-platform, not tied to specific operating systems.
The popular content server framework Zope (https://www.zope.org) is a Python application. If you need advanced content management features, Zope should definitely be among the solutions you consider. However, Zope is a large, rich, powerful system, needing a full book of its own to do it justice. Therefore, I do not cover Zope further in this book.
FastCGI lets you write scripts similar to CGI scripts, yet use each process to handle multiple hits, either sequentially or simultaneously in separate threads. FastCGI is available for Apache and other free web servers, but at the time of this writing not for Microsoft IIS. See https://www.fastcgi.com for FastCGI overviews and details. Go to https://alldunn.com/python/fcgi.py
Additional content appearing in this section has been removed.
- Chapter 21: MIME and Network Encodings
- Content preview:
What travels on a network are streams of bytes or text. However, what you want to send over the network often has more structure. The Multipurpose Internet Mail Extensions (MIME) and other encoding standards bridge the gap by specifying how to represent structured data as bytes or text. Python supports such encodings through many library modules, such as base64, quopri, uu, and the modules of the email package. This chapter covers these modules.
Additional content appearing in this section has been removed.
- Encoding Binary Data as Text
- Content preview:
Several kinds of media (e.g., email messages) contain only text. When you want to transmit binary data via such media, you need to encode the data as text strings. The Python standard library supplies modules that support the standard encodings known as Base 64, Quoted Printable, and UU.
The base64 module supports the encoding specified in RFC 1521 as Base 64. The Base 64 encoding is a compact way to represent arbitrary binary data as text, without any attempt to produce human-readable results. Module base64 supplies four functions.
decode
decode(infile, outfile)
Reads text-file-like object infile, by calling infile.readline until end of file (i.e., until a call to infile.readline returns an empty string), decodes the Base 64-encoded text thus read, and writes the decoded data to binary-file-like object outfile.
decodestring
decodestring(s)
Decodes text string s, which contains one or more complete lines of Base 64-encoded text, and returns the byte string with the corresponding decoded data.
encode
encode(infile, outfile)
Reads binary-file-like object infile
Additional content appearing in this section has been removed.
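The file-based and string-based functions can be exercised with in-memory files. This is a minimal sketch, assuming Python 3, where encode and decode work on binary file-like objects and the string functions are named encodebytes and decodebytes (decodestring was removed in Python 3.9). The sample bytes are arbitrary.

```python
import base64
import io

raw = bytes(range(8))   # arbitrary binary data

# String-style functions: bytes in, bytes out.
encoded = base64.encodebytes(raw)      # Base 64 text, newline-terminated
decoded = base64.decodebytes(encoded)

# File-style functions: read from infile, write to outfile.
encoded_file = io.BytesIO()
base64.encode(io.BytesIO(raw), encoded_file)

decoded_file = io.BytesIO()
base64.decode(io.BytesIO(encoded_file.getvalue()), decoded_file)
```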
- MIME and Email Format Handling
- Content preview:
Python supplies the email package to handle parsing, generation, and manipulation of MIME files such as email messages, network news posts, and so on. The Python standard library also contains other modules that handle some parts of these jobs. However, the new email package offers a more complete and systematic approach to these important tasks. I therefore suggest you use package email, not the older modules that partially overlap with parts of email's functionality. Package email has nothing to do with receiving or sending email; for such tasks, see modules poplib and smtplib, covered in Chapter 18. Instead, package email deals with how you handle messages after you receive them or before you send them.
Package email supplies two factory functions returning an instance m of class email.Message.Message. These functions rely on class email.Parser.Parser, but the factory functions are handier and simpler. Therefore, I do not cover module Parser further in this book.
message_from_string
message_from_string(s)
Builds m by parsing string s.
message_from_file
message_from_file(f)
Builds m by parsing the contents of file-like object f, which must be open for reading.
The email.Message module supplies class Message. All parts of package email produce, modify, or use instances of class
Additional content appearing in this section has been removed.
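Both factory functions can be tried on a small message held in memory. This is a minimal sketch, assuming Python 3, where the resulting class is spelled email.message.Message. Header access by indexing and the get_payload method are real Message features that the preview text does not reach; the addresses and message text are hypothetical.

```python
import email
import io

# Hypothetical message: headers, a blank line, then the body.
raw = (
    "From: me@example.com\n"
    "To: you@example.com\n"
    "Subject: Greetings\n"
    "\n"
    "Hello there.\n"
)

# Build a message object by parsing a string.
m = email.message_from_string(raw)
subject = m["Subject"]
body = m.get_payload()

# Build an equivalent object from a file-like object open for reading.
m2 = email.message_from_file(io.StringIO(raw))
recipient = m2["To"]
```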
- Chapter 22: Structured Text: HTML
- Content preview:
Most documents on the Web use HTML, the HyperText Markup Language. Markup is the insertion of special tokens, known as tags, in a text document to give structure to the text. HTML is an application of the large, general standard known as SGML, the Standard Generalized Markup Language. In practice, many of the Web's documents use HTML in sloppy or incorrect ways. Browsers have evolved many practical heuristics over the years to try and compensate for this, but even so, it still often happens that a browser displays an incorrect web page in some weird way.
Moreover, HTML was never suitable for much more than presenting documents on a screen. Complete and precise extraction of the information in the document, working backward from the document's presentation, is often unfeasible. To tighten things up again, HTML has evolved into a more rigorous standard called XHTML. XHTML is very similar to traditional HTML, but it is defined in terms of XML and more precisely than HTML. You can handle XHTML with the tools covered in Chapter 23.
Despite the difficulties, it's often possible to extract at least some useful information from HTML documents. Python supplies the sgmllib, htmllib, and HTMLParser modules for the task of parsing HTML documents, whether this parsing is for the purpose of presenting the documents, or, more typically, as part of an attempt to extract information from them. Generating HTML and embedding Python in HTML are also frequent tasks. No standard Python library module supports HTML generation or embedding directly, but you can use normal Python string manipulation, and third-party modules can also help.
Additional content appearing in this section has been removed.
- The sgmllib Module
- Content preview:
The name of the sgmllib module is misleading: sgmllib parses only a tiny subset of SGML, but it is still a good way to get information from HTML files. sgmllib supplies one class, SGMLParser, which you subclass to override and add methods. The most frequently used methods of an instance s of your subclass X of SGMLParser are as follows.
close
s.close()
Tells the parser that there is no more input data. When X overrides close, X.close must call SGMLParser.close to ensure that buffered data get processed.
do_tag
s.do_tag(attributes)
X supplies a method with such a name for each tag, with no corresponding end tag, that X wants to process. tag must be in lowercase in the method name, but can be in any mix of cases in the parsed text. SGMLParser's handle_tag method calls do_tag as appropriate. attributes is a list of pairs (name, value), where name is each attribute's name, lowercased, and value is the value, processed to resolve entity references and character references and to remove surrounding quotes.
end_tag
s.end_tag()
X supplies a method with such a name for each tag whose end tag X wants to process.
Additional content appearing in this section has been removed.
- The htmllib Module
- The htmllib module supplies a class named HTMLParser that subclasses SGMLParser and defines start_tag, do_tag, and end_tag methods for tags defined in HTML 2.0. HTMLParser implements and overrides methods in terms of calls to methods of a formatter object, covered later in this chapter. You can subclass HTMLParser to add or override methods. In addition to the start_tag, do_tag, and end_tag methods, an instance h of HTMLParser supplies the following attributes and methods.
anchor_bgn
h.anchor_bgn(href,name,type)
Called for each <a> tag. href, name, and type are the string values of the tag's attributes with the same names. HTMLParser's implementation of anchor_bgn maintains a list of outgoing hyperlinks (i.e., href arguments of method h.anchor_bgn) in an instance attribute named h.anchorlist.
anchor_end
h.anchor_end( )
Called for each </a> end tag. HTMLParser's implementation of anchor_end emits to the formatter a footnote reference that is an index within h.anchorlist. In other words, by default, HTMLParser asks the formatter to format an <a>/</a> tag pair as the text inside the tag, followed by a footnote reference number that points to the URL in the
Additional content appearing in this section has been removed.
- The HTMLParser Module
- Module HTMLParser supplies one class, HTMLParser, that you subclass to override and add methods. HTMLParser.HTMLParser is similar to sgmllib.SGMLParser, but is simpler and able to parse XHTML as well. The main differences between HTMLParser and SGMLParser are the following:
  - HTMLParser does not call back to methods named do_tag, start_tag, and end_tag. To process tags and end tags, your subclass X of HTMLParser must override methods handle_starttag and/or handle_endtag and check explicitly for the tags it wants to process.
  - HTMLParser does not keep track of, nor check, tag nesting in any way.
  - HTMLParser does nothing, by default, to resolve character and entity references. Your subclass X of HTMLParser must override methods handle_charref and/or handle_entityref if it needs to perform processing of such references.
The most frequently used methods of an instance h of a subclass X of HTMLParser are as follows.
close
h.close( )
Tells the parser that there is no more input data. When X overrides close, h.close must also call HTMLParser.close to ensure that buffered data gets processed.
feed
h.feed(data)
Passes to the parser a part of the text being parsed. The parser processes some prefix of the text and holds the rest in a buffer until the next call to
Additional content appearing in this section has been removed.
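The handle_starttag approach described in this section can be sketched as follows. This is a minimal illustration, assuming a Python 3 interpreter, where the HTMLParser module is available under the name html.parser. The class name LinkCollector, the function collect_links, and the sample markup are hypothetical and chosen only for this example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical subclass that records the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # There are no per-tag callbacks, so the subclass checks
        # explicitly for the tags it wants to process.
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)


def collect_links(html_text):
    parser = LinkCollector()
    parser.feed(html_text)   # pass the text to the parser
    parser.close()           # no more input: process any buffered data
    return parser.links


sample = ('<p>See <A HREF="https://www.python.org/">Python</A> '
          'and <a href="/docs">the docs</a>.</p>')
links = collect_links(sample)
```

Tag and attribute names reach handle_starttag in lowercase, so the mixed-case <A HREF=...> in the sample is handled by the same test as the lowercase tag.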
- Generating HTML
- Python does not come with tools to generate HTML. If you want an advanced framework for structured HTML generation, I recommend Robin Friedrich's HTMLGen 2.2 (available at https://starship.python.net/crew/friedrich/HTMLgen/html/main.html), but I do not cover the package in this book. To generate XHTML, you can also use the approaches covered in Section 23.4 in Chapter 23.
If your favorite approach is to embed Python code within HTML in the manner made popular by JSP, ASP, and PHP, one possibility is to use Python Server Pages (PSP) as supported by Webware, mentioned in Chapter 20. Another package, focused more specifically on the embedding approach, is Spyce (available at https://spyce.sf.net/). For all but the simplest problems, development and maintenance are eased by separating logic and presentation issues through templating, covered in the next section. Both Webware and Spyce optionally support templating in lieu of embedding.
To generate HTML, the best approach is often templating. With templating, you start with a template, which is a text string (often read from a file, database, etc.) that is valid HTML, but includes markers, also known as placeholders, where dynamically generated text must be inserted. Your program generates the needed text and substitutes it into the template. In the simplest case, you can use markers of the form '%(name)s'. Bind the dynamically generated text as the value for key 'name' in some dictionary d. The Python string formatting operator %, covered in Chapter 9, now does all you need. If t is your template, t%d is a copy of the template with all values properly substituted.
For advanced templating tasks, I recommend Cheetah (available at
Additional content appearing in this section has been removed.
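The simplest templating case described in this section, with '%(name)s' markers and the % operator, might look like the following sketch. The names t and d follow the text; the template contents and the dictionary values are made up for illustration.

```python
# A template: HTML text with %(name)s markers where dynamic text belongs.
t = "<h1>%(title)s</h1>\n<p>Hello, %(name)s!</p>\n"

# The dictionary binds each marker name to the generated text.
d = {"title": "Welcome", "name": "World"}

# t % d is a copy of the template with all values substituted.
page = t % d
```

The template string itself is left unchanged, so the same t can be reused with a different dictionary for each page.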
- Chapter 23: Structured Text: XML
- XML, the eXtensible Markup Language, has taken the programming world by storm over the last few years. Like SGML, XML is a metalanguage, a language to describe markup languages. On top of the XML 1.0 specification, the XML community (in good part inside the World Wide Web Consortium, W3C) has standardized other technologies, such as various schema languages, Namespaces, XPath, XLink, XPointer, and XSLT.
Industry consortia in many fields have defined industry-specific markup languages on top of XML, to facilitate data exchange among applications in the various fields. Such industry standards let applications exchange data even if the applications are coded in different languages and deployed on different platforms by different firms. XML, related technologies, and XML-based markup languages are the basis of interapplication, cross-language, cross-platform data interchange in modern applications.
Python has excellent support for XML. The standard Python library supplies the xml package, which lets you use fundamental XML technology quite simply. The third-party package PyXML (available at https://pyxml.sf.net) extends the standard library's xml with validating parsers, richer DOM implementations, and advanced technologies such as XPath and XSLT. Downloading and installing PyXML upgrades Python's own xml packages, so it can be a good idea to do so even if you don't use PyXML-specific features.
On top of PyXML, you can choose to install yet another freely available third-party package, 4Suite (available at https://4suite.org). 4Suite provides yet more XML parsers for special niches, advanced technologies such as XLink and XPointer, and code supporting standards built on top of XML, such as the Resource Description Framework (RDF).
As an alternative to Python's built-in XML support, PyXML, and 4Suite, you can try ReportLab's new pyRXP, a fast validating XML parser based on Tobin's RXP. pyRXP is DOM-like in that it constructs an in-memory representation of the whole XML document you're parsing. However, pyRXP does not construct a DOM-compliant tree, but rather a lightweight tree of Python tuples to save memory and enhance speed. For more information on pyRXP, see
Additional content appearing in this section has been removed.
- An Overview of XML Parsing
- When your application must parse XML documents, your first, fundamental choice is what kind of parsing to use. You can use event-driven parsing, where the parser reads the document sequentially and calls back to your application each time it parses a significant aspect of the document (such as an element). Or you can use object-based parsing, where the parser reads the whole document and builds in-memory data structures, representing the document, that you can then navigate. SAX is the main, normal way to perform event-driven parsing, and DOM is the main, normal way to perform object-based parsing. In each case there are alternatives, such as direct use of expat for event-driven parsing and pyRXP for object-based parsing, but I do not cover these alternatives in this book. Another interesting possibility is offered by pulldom, which is covered later in this chapter.
Event-driven parsing requires fewer resources, which makes it particularly suitable when you need to parse very large documents. However, event-driven parsing requires you to structure your application accordingly, performing your processing (and typically building auxiliary data structures) in your methods that are called by the parser. Object-based parsing gives you more flexibility about the ways in which you can structure your application. It may be more suitable when you need to perform very complicated processing, as long as you can afford the extra resources needed for object-based parsing (typically, this means that you are not dealing with very large documents). Object-based approaches also support programs that need to modify or create XML documents, as covered later in this chapter.
As a general guideline, when you are still undecided after studying the various trade-offs, I suggest you try event-driven parsing when you can see a reasonably direct way to perform your program's tasks through this approach. Event-driven parsing is more scalable; therefore, if your program can perform its task via event-driven parsing, it will be applicable to larger documents than it would be able to handle otherwise. If event-driven parsing is too confining, try
Additional content appearing in this section has been removed.
- Parsing XML with SAX
- In most cases, the best way to extract information from an XML document is to parse the document with a parser compliant with SAX, the Simple API for XML. SAX defines a standard API that can be implemented on top of many different underlying parsers. The SAX approach to parsing has similarities to the HTML parsers covered in Chapter 22. As the parser encounters XML elements, text contents, and other significant events in the input stream, the parser calls back to methods of your classes. Such event-driven parsing, based on callbacks to your methods as relevant events occur, also has similarities to the event-driven approach that is almost universal in GUIs and in some networking frameworks. Event-driven approaches in various programming fields may not appear natural to beginners, but enable high performance and particularly high scalability, making them very suitable for high-workload cases.
To use SAX, you define a content handler class, subclassing a library class and overriding some methods. Then, you build a parser object p, install an instance of your class as p's handler, and feed p the input stream to parse. p calls methods on your handler to reflect the document's structure and contents. Your handler's methods perform application-specific processing. The xml.sax package supplies a factory function to build p, as well as convenience functions for simpler operation in typical cases. xml.sax also supplies exception classes, used to diagnose invalid input and other errors.
Optionally, you can also register with parser p other kinds of handlers besides the content handler. You can supply a custom error handler to use an error diagnosis strategy different from normal exception raising, and try to diagnose several errors during a parse. You can supply a custom DTD handler to receive information about notation and unparsed entities from the XML document's Document Type Definition (DTD). You can supply a custom entity resolver to handle external entity references in advanced, customized ways. These additional possibilities are advanced and rarely used, so I do not cover them in this book.
Additional content appearing in this section has been removed.
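The steps described in this section (define a content handler, build a parser p, install the handler, feed p the input) can be sketched as follows, assuming Python 3 and the standard xml.sax package. The handler class TitleHandler, the element name title, and the sample document are hypothetical and serve only as an illustration.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler


class TitleHandler(ContentHandler):
    """Hypothetical handler that collects the text of <title> elements."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._chunks = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._chunks = []

    def characters(self, content):
        # Text may arrive in several pieces, so accumulate it.
        if self._in_title:
            self._chunks.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._chunks))
            self._in_title = False


document = (b"<books>"
            b"<book><title>Python in a Nutshell</title></book>"
            b"<book><title>Jython Essentials</title></book>"
            b"</books>")

handler = TitleHandler()
p = xml.sax.make_parser()          # factory function that builds p
p.setContentHandler(handler)       # install the handler on p
p.parse(io.BytesIO(document))      # feed p the input stream
titles = handler.titles
```

For typical cases the convenience function xml.sax.parseString(document, handler) performs the same parse in a single call.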
- Parsing XML with DOM
- SAX parsing does not build any structure in memory to represent the XML document. This makes SAX fast and highly scalable, as your application builds exactly as little or as much in-memory structure as needed for its specific tasks. However, for particularly complicated processing tasks involving reasonably small XML documents, you may prefer to let the library build in-memory structures that represent the whole XML document, and then traverse those structures. The XML standards describe the DOM (Document Object Model) for XML. A DOM object represents an XML document as a tree whose root is the document object, while other nodes correspond to elements, text contents, element attributes, and so on.
The Python standard library supplies a minimal implementation of the XML DOM standard, xml.dom.minidom. minidom builds everything up in memory, with the typical pros and cons of the DOM approach to parsing. The Python standard library also supplies a different DOM-like approach in module xml.dom.pulldom. pulldom occupies an interesting middle ground between SAX and DOM, presenting the stream of parsing events as a Python iterator object so that you do not code callbacks, but rather loop over the events and examine each event to see if it's of interest. When you do find an event of interest to your application, you can ask pulldom to build the DOM subtree rooted in that event's node by calling method expandNode, and then work with that subtree as you would in minidom. Paul Prescod, pulldom's author and XML and Python expert, describes the net result as "80% of the performance of SAX, 80% of the convenience of DOM." Other DOM parsers are part of the PyXML and 4Suite extension packages, mentioned at the start of this chapter.
The xml.dom package supplies exception class DOMException and subclasses of it to support fine-grained exception handling. xml.dom also supplies a class
Additional content appearing in this section has been removed.
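The pulldom style described in this section, looping over events and calling expandNode only for nodes of interest, might be sketched like this under Python 3. The function large_orders, the element names order and total, and the sample document are hypothetical.

```python
from xml.dom import pulldom

document = (
    "<orders>"
    "<order id='1'><item>book</item><total>30</total></order>"
    "<order id='2'><item>pen</item><total>2</total></order>"
    "<order id='3'><item>laptop</item><total>900</total></order>"
    "</orders>"
)


def large_orders(xml_text, threshold):
    """Return the id of each <order> whose <total> is at least threshold."""
    results = []
    events = pulldom.parseString(xml_text)
    for event, node in events:
        # Examine each event; only <order> start tags are of interest.
        if event == pulldom.START_ELEMENT and node.tagName == "order":
            # Build the DOM subtree rooted at this node, then use it
            # as one would use a minidom element.
            events.expandNode(node)
            total_node = node.getElementsByTagName("total")[0]
            if int(total_node.firstChild.data) >= threshold:
                results.append(node.getAttribute("id"))
    return results


big = large_orders(document, 25)
```

Only one order subtree is expanded at a time, which is what keeps memory use closer to SAX than to a full DOM parse.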
- Changing and Generating XML
- Just like for HTML and other kinds of structured text, the simplest way to output an XML document is often to prepare and write it using Python's normal string and file operations, covered in Chapter 9 and Chapter 10. Templating, covered in Chapter 22, is also often the best approach. Subclassing class XMLGenerator, covered earlier in this chapter, is a good way to generate an XML document that is like an input XML document, except for a few changes.
The xml.dom.minidom module offers yet another possibility, because its classes support methods to generate, insert, remove, and alter nodes in a DOM tree representing the document. You can create a DOM tree by parsing and then alter it, or you can create an empty DOM tree and populate it, and then output the resulting XML document with methods toxml, toprettyxml, or writexml of the Document instance. You can also output a subtree of the DOM tree by calling these methods on the Node that is the subtree's root.
The Document class supplies factory methods to create new instances of subclasses of Node. The most frequently used factory methods of a Document instance d are as follows.
createComment
d.createComment(data)
Builds and returns an instance c of class Comment for a comment with text data.
createElement
d.createElement(tagname)
Builds and returns an instance e of class Element for an element with the given tag.
Additional content appearing in this section has been removed.
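As an illustration of populating an empty DOM tree with the factory methods above and then emitting it with toxml, here is a minimal sketch for Python 3. The element names catalog and item, the attribute id, and the text values are invented for the example. createTextNode and setAttribute are standard minidom methods that do not appear in the preview text above.

```python
from xml.dom.minidom import Document

d = Document()                           # an empty DOM tree

root = d.createElement("catalog")        # factory method: new Element
d.appendChild(root)

root.appendChild(d.createComment("generated example"))  # new Comment

item = d.createElement("item")
item.setAttribute("id", "1")
item.appendChild(d.createTextNode("Python in a Nutshell"))
root.appendChild(item)

subtree_xml = root.toxml()               # output just the subtree at root
document_xml = d.toxml()                 # output the whole document
```

Calling toxml on root emits only that subtree, while calling it on d also emits the XML declaration for the whole document.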
- Chapter 24: Extending and Embedding Classic Python
- Classic Python runs on a portable C-coded virtual machine. Python's built-in objects, such as numbers, sequences, dictionaries, and files, are coded in C, as are several modules in Python's standard library. Modern platforms support dynamic-load libraries, with file extensions such as .dll on Windows and .so on Linux, and building Python produces such binary files. You can code your own extension modules for Python in C, using the Python C API covered in this chapter, to produce and deploy dynamic libraries that Python scripts and interactive sessions can later use with the import statement, covered in Chapter 7.
Extending Python means building modules that Python code can import to access the features the modules supply. Embedding Python means executing Python code from your application. For such execution to be useful, Python code must in turn be able to access some of your application's functionality. In practice, therefore, embedding implies some extending, as well as a few embedding-specific operations.
Embedding and extending are covered extensively in Python's online documentation; you can find an in-depth tutorial at https://www.python.org/doc/ext/ext.html and a reference manual at https://www.python.org/doc/api/api.html. Many details are best studied in Python's extensively documented sources. Download Python's source distribution and study the sources of Python's core, C-coded extension modules and the example extensions supplied for study purposes.
This chapter covers the basics of extending and embedding Python with C. It also mentions, but does not cover, other possibilities for extending Python.
A Python extension module named x resides in a dynamic library with the same filename (x.pyd on Windows, x.so on most Unix-like platforms) in an appropriate directory (normally the site-packages subdirectory of the Python library directory). You generally build the
Additional content appearing in this section has been removed.
- Extending Python with Python's C API
- A Python extension module named x resides in a dynamic library with the same filename (x.pyd on Windows, x.so on most Unix-like platforms) in an appropriate directory (normally the site-packages subdirectory of the Python library directory). You generally build the x extension module from a C source file x.c with the overall structure:
    #include <Python.h>

    /* omitted: the body of the x module */

    void
    initx(void)
    {
        /* omitted: the code that initializes the module named x */
    }
When you have built and installed the extension module, a Python statement import x loads the dynamic library, then locates and calls the function named initx, which must do all that is needed to initialize the module object named x.
To build and install a C-coded Python extension module, it's simplest and most productive to use the distribution utilities, distutils, covered in Chapter 26. In the same directory as x.c, place a file named setup.py that contains at least the following statements:
    from distutils.core import setup, Extension
    setup(name='x', ext_modules=[ Extension('x',sources=['x.c']) ])
From a shell prompt in this directory, you can now run:
    C:\> python setup.py install
to build the module and install it so that it becomes usable in your Python installation. The distutils perform all needed compilation and linking steps, with the right compiler and linker commands and flags, and copy the resulting dynamic library in an appropriate directory, dependent on your Python installation. Your Python code can then access the resulting module with the statement import x.
Your C function initx generally has the following overall structure:
Additional content appearing in this section has been removed.
- Extending Python Without Python's C API
- You can code Python extensions in other classic compiled languages besides C. For Fortran, the choice is between Paul Dubois's Pyfort (available at https://pyfortran.sf.net) and Pearu Peterson's F2PY (available at https://cens.ioc.ee/projects/f2py2e/). Both packages support and require the Numeric package covered in Chapter 15, since numeric processing is Fortran's typical application area.
For C++, the choice is between Gordon McMillan's simple, lightweight SCXX (available at https://www.mcmillan-inc.com/scxx.html), which uses no templates and is thus suitable for older C++ compilers, Paul Dubois's CXX (available at https://cxx.sf.net), and David Abrahams's Boost Python Library (available at https://www.boost.org/libs/python/doc). Boost is a package of C++ libraries of uniformly high quality for compilers that support templates well, and includes the Boost Python component. Paul Dubois, CXX's author, recommends considering Boost. You may also choose to use Python's C API from your C++ code, using C++ in this respect as if it was C, and foregoing the extra convenience that C++ affords. However, if you're already using C++ rather than C anyway, then using SCXX, CXX, or Boost can substantially improve your programming productivity when compared to using Python's C API.
If your Python extension is basically a wrapper over an existing C or C++ library (as many are), consider SWIG, the Simplified Wrapper and Interface Generator (available at https://www.swig.org). SWIG generates the C source code for your extension based on the library's header files, generally with some help in terms of further annotations in an interface description file.
Greg Ewing is developing a language, Pyrex, specifically for coding Python extensions. Pyrex (found at https://www.cosc.canterbury.ac.nz/~greg/python/Pyrex/
Additional content appearing in this section has been removed.
- Embedding Python
- If you have an application already written in C or C++ (or any other classic compiled language), you may want to embed Python as your application's scripting language. To embed Python in languages other than C, the other language must be able to call C functions. In the following, I cover only the C view of things, since other languages vary widely regarding what you have to do in order to call C functions from them.
In order for Python scripts to communicate with your application, your application must supply extension modules with Python-accessible functions and classes that expose your application's functionality. If these modules are linked with your application rather than residing in dynamic libraries that Python can load when necessary, register your modules with Python as additional built-in modules by calling the PyImport_AppendInittab C API function.
PyImport_AppendInittab
int PyImport_AppendInittab(char* name,void (*initfunc)(void))
name is the module name, which Python scripts use in import statements to access the module. initfunc is the module initialization function, taking no argument and returning no result, as covered earlier in this chapter (i.e., initfunc is the module's function that would be named initname for a normal extension module residing in a dynamic library). PyImport_AppendInittab must be called before calling Py_Initialize.
You may want to set the program name and arguments, which Python scripts can access as sys.argv, by calling either or both of the following C API functions.
Additional content appearing in this section has been removed.
- Chapter 25: Extending and Embedding Jython
- Jython implements Python on a Java Virtual Machine (JVM). Jython's built-in objects, such as numbers, sequences, dictionaries, and files, are coded in Java. To extend Classic Python with C, you code C modules using the Python C API (as covered in Chapter 24). To extend Jython with Java, you do not have to code Java modules in special ways: every Java package on the Java CLASSPATH (or on Jython's sys.path) is automatically available to your Jython scripts and Jython interactive sessions for use with the import statement covered in Chapter 7. This applies to Java's standard libraries, third-party Java libraries you have installed, and Java classes you have coded yourself. You can also extend Java with C using the Java Native Interface (JNI), and such extensions will also be available to Jython code, just as if they had been coded in pure Java rather than in JNI-compliant C.
For details on advanced issues related to interoperation between Java and Jython, I recommend Jython Essentials, by Samuele Pedroni and Noel Rappin (O'Reilly). In this chapter, I offer a brief overview of the simplest interoperation scenarios, which suffices for a large number of practical needs. Importing, using, extending, and implementing Java classes and interfaces in Jython just works in most practical cases of interest. In some cases, however, you need to be aware of issues related to accessibility, type conversions, and overloading, as covered in this chapter. Embedding the Jython interpreter in Java-coded applications is similar to embedding the Python interpreter in C-coded applications (as covered in Chapter 24), but the Jython task is easier. Jython offers yet another possibility for interoperation with Java, using the jythonc compiler to turn your Python sources into classic, static JVM bytecode .class and .jar files. You can then use these bytecode files in Java applications and frameworks, exactly as if their source code had been in Java rather than in Python.
Unlike Java, Jython does not implicitly and automatically import
Additional content appearing in this section has been removed.
- Importing Java Packages in Jython
- Unlike Java, Jython does not implicitly and automatically import java.lang. Your Jython code can explicitly import java.lang, or even just import java, and then use classes such as java.lang.System and java.lang.String as if they were Python classes. Specifically, your Jython code can use imported Java classes as if they were Python classes with a __slots__ class attribute (i.e., you cannot create arbitrary new instance attributes). You can subclass a Java class with your own Python class, and instances of your class let you create new attributes just by binding them, as usual.
You may choose to import a top-level Java package (such as java) rather than specific subpackages (such as java.lang). Your Python code acquires the ability to access all subpackages when you import the top-level package. For example, after import java, your code can use classes java.lang.String, java.util.Vector, and so on.
The Jython runtime wraps every Java class you import in a transparent proxy, which manages communication between Python and Java code behind the scenes. This gives an extra reason to avoid the dubious idiom from somewhere import *, in addition to the reasons mentioned in Chapter 7. When you perform such a bulk import, the Jython runtime must build proxy wrappers for all the Java classes in package somewhere, spending substantial amounts of memory and time wrapping classes your code will probably not use. Avoid from ... import * except for occasional convenience in interactive exploratory sessions, and stick with the import statement. Alternatively, it's okay to use specific, explicit from statements for classes you know your Python code wants to use (e.g., from java.lang import System).
Jython relies on a registry of Java properties as a cross-platform equivalent of the kind of settings that would normally use the Windows registry, or environment variables on Unix-like systems. Jython's registry file is a standard Java properties file named
Additional content appearing in this section has been removed.
- Embedding Jython in Java
- Your Java-coded application can embed the Jython interpreter in order to use Jython for scripting. jython.jar must be in your Java CLASSPATH. Your Java code must import org.python.core.* and org.python.util.* in order to access Jython's classes. To initialize Jython's state and instantiate an interpreter, use the Java statements:
    PySystemState.initialize( );
    PythonInterpreter interp = new PythonInterpreter( );
Jython also supplies several advanced overloads of this method and constructor in order to let you determine in detail how PySystemState is set up, and to control the system state and global scope for each interpreter instance. However, in typical, simple cases, the previous Java code is all your application needs.
Once you have an instance interp of class PythonInterpreter, you can call method interp.eval to have the interpreter evaluate a Python expression held in a Java string. You can also call any of several overloads of interp.exec and interp.execfile to have the interpreter execute Python statements held in a Java string, a precompiled Jython code object, a file, or a Java InputStream.
The Python code you execute can import your Java classes in order to access your application's functionality. Your Java code can set attributes in the interpreter namespace by calling overloads of interp.set, and get attributes from the interpreter namespace by calling overloads of interp.get. The methods' overloads give you a choice. You can work with native Java data and let Jython perform type conversions, or you can work directly with PyObject, the base class of all Python objects, covered later in this chapter. The most frequently used methods and overloads of a PythonInterpreter instance interp are the following.
Additional content appearing in this section has been removed.
- Compiling Python into Java
- Jython comes with the jythonc compiler. You can feed jythonc your .py source files, and jythonc compiles them into normal JVM bytecode and packages them into .class and .jar files. Since jythonc generates static, classic bytecode, it cannot quite cope with the whole range of dynamic possibilities that Python allows. For example, jythonc cannot successfully compile Python classes that determine their base classes dynamically at runtime, as the normal Python interpreters allow. However, except for such extreme examples of dynamically changeable class structures, jythonc does support compilation of essentially the whole Python language into Java bytecode.
jythonc resides in the Tools/jythonc directory of your Jython installation. You invoke it from a shell (console) command line with the syntax:
    jythonc options modules
options are zero or more option flags starting with --. modules are zero or more names of Python source files to compile, either as Python-style names of modules residing on Python's sys.path, or as relative or absolute paths to Python source files. Include the .py extension in each path to a source file, but not in a module name.
More often than not, you will specify the jythonc option --jar jarfile, to build a .jar file of compiled bytecode rather than separate .class files. Most other options deal with what to put in the .jar file. You can choose to make the file self-sufficient (for browsers and other Java runtime environments that do not support using multiple .jar files) at the expense of making the file larger. Option --all ensures all Jython core classes are copied into the .jar file, while --core tries to be more conservative, copying as few core classes as feasible. Option --addpackages packages lets you list (in packages, a comma-separated list) those external Java packages whose classes are copied into the
Additional content appearing in this section has been removed.
- Chapter 26: Distributing Extensions and Programs
- Python's distutils allow you to package Python programs and extensions in several ways, and to install programs and extensions to work with your Python installation. As I mentioned in Chapter 24, the distutils also afford the most effective way to build C-coded extensions you write yourself, even when you are not interested in distributing such extensions. This chapter covers the distutils, as well as third-party tools that complement the distutils and let you package Python programs for distribution as standalone applications, installable on machines with specific hardware and operating systems without a separate installation of Python.
The distutils are a rich and flexible set of tools to package Python programs and extensions for distribution to third parties. I cover typical, simple use of the distutils for the most common packaging needs. For in-depth, highly detailed discussion of distutils, I recommend two manuals that are part of Python's online documentation: Distributing Python Modules (available at https://www.python.org/doc/current/dist/), and Installing Python Modules (available at https://www.python.org/doc/current/inst/), both by Greg Ward, the principal author of the distutils.
A distribution is the set of files to package into a single file for distribution purposes. A distribution may include zero, one, or more Python packages and other Python modules (as covered in Chapter 7), as well as, optionally, Python scripts, C-coded (and other) extensions, supporting data files, and auxiliary files containing metadata about the distribution itself. A distribution is said to be pure if all code it includes is Python, and non-pure if it also includes non-Python code (most often, C-coded extensions).
You should normally place all the files of a distribution in a directory, known as the distribution root directory
Additional content appearing in this section has been removed.
- Python's distutils
- Content preview: The distutils are a rich and flexible set of tools to package Python programs and extensions for distribution to third parties. I cover typical, simple use of the distutils for the most common packaging needs. For in-depth, highly detailed discussion of distutils, I recommend two manuals that are part of Python's online documentation: Distributing Python Modules (available at https://www.python.org/doc/current/dist/), and Installing Python Modules (available at https://www.python.org/doc/current/inst/), both by Greg Ward, the principal author of the distutils.
A distribution is the set of files to package into a single file for distribution purposes. A distribution may include zero, one, or more Python packages and other Python modules (as covered in Chapter 7), as well as, optionally, Python scripts, C-coded (and other) extensions, supporting data files, and auxiliary files containing metadata about the distribution itself. A distribution is said to be pure if all code it includes is Python, and non-pure if it also includes non-Python code (most often, C-coded extensions).
You should normally place all the files of a distribution in a directory, known as the distribution root directory, and in subdirectories of the distribution root. Mostly, you can arrange the subtree of files and directories rooted at the distribution root to suit your own organizational needs. However, remember from Chapter 7 that a Python package must reside in its own directory, and a package's directory must contain a file named __init__.py (or subdirectories with __init__.py files, for subpackages) as well as other modules belonging to that package.
The distribution root directory must contain a Python script that by convention is named …
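The layout rules above can be illustrated with a small sketch that creates a pure distribution root in a temporary directory: one package directory containing __init__.py and one module, plus a setup script. The names mypkg, core.py, myapp, and the version string are hypothetical, and the helper make_distribution_root is not part of distutils. The script is named setup.py here because that is the name the py2exe preview further down uses for a distutils script. The setup script is written to disk as text and not executed, since distutils is no longer shipped with current Python releases.

```python
import os
import tempfile

# Text of a minimal distutils setup script; {package!r} is filled in below.
SETUP_TEMPLATE = '''\
from distutils.core import setup

setup(
    name="myapp",            # hypothetical distribution name
    version="0.1",           # hypothetical version
    packages=[{package!r}],  # the package directory under the root
)
'''


def make_distribution_root(root, package="mypkg"):
    """Write a minimal pure distribution under root; return the relative paths."""
    files = {
        "setup.py": SETUP_TEMPLATE.format(package=package),
        # A package lives in its own directory and must contain __init__.py.
        os.path.join(package, "__init__.py"): "",
        os.path.join(package, "core.py"): "def hello():\n    return 'hello'\n",
    }
    for relpath, text in files.items():
        path = os.path.join(root, relpath)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as f:
            f.write(text)
    return sorted(files)


root = tempfile.mkdtemp(prefix="distroot-")
for relpath in make_distribution_root(root):
    print(relpath)
```

Everything sits at or below the distribution root, which is the only constraint the text places on the tree besides the __init__.py requirement for packages.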
- The py2exe Tool
- Content preview: The distutils help you package up your Python extensions and applications. However, an end user can install the resulting packaged form only after installing Python. This is particularly a problem on Windows, where end users want to run a single installer to get an application working on their machine. Installing Python first and then running your application's installer may prove too much of a hassle for such end users.
Thomas Heller has developed a simple solution, a distutils add-on named py2exe, freely available for download from https://starship.python.net/crew/theller/py2exe/. This URL also contains detailed documentation of py2exe, and I recommend that you study that documentation if you intend to use py2exe in advanced ways. However, the simplest kinds of use, which I cover in the rest of this section, cover most practical needs.
After downloading and installing py2exe (on a Windows machine where Microsoft Visual C++ 6 is also installed), you just need to add the line:
import py2exe
at the start of your otherwise normal distutils script setup.py. Now, in addition to other distutils commands, you have one more option. Running:
python setup.py py2exe
builds and collects in a subdirectory of your distribution root directory an .exe file and one or more .dll files. If your distribution's name metadata is, for example, myapp, then the directory into which the .exe and .dll files are collected is named dist\myapp\. Any files specified by option data_files in your setup.py script are placed in subdirectories of dist\myapp\. The .exe file corresponds to your application's first or single entry in the scripts keyword argument value, and also contains the bytecode-compiled form of all Python modules and packages that your setup.py specifies or implies. Among the .dll files is, at minimum, the Python dynamic load library, for example python22.dll if you use Python 2.2, plus any other .pyd or .dll files that your application needs, excluding …
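As a rough illustration of the two steps described above, the following sketch prepends the import py2exe line to the text of an ordinary setup.py and computes the output directory the text names for a given distribution name. It only manipulates strings and does not import py2exe or build anything. The helpers enable_py2exe and output_dir are hypothetical, as is the file name myapp.py. The behavior shown follows the py2exe release this preview describes, and later releases may differ.

```python
import ntpath  # Windows path rules, available on every platform


def enable_py2exe(setup_source):
    """Return setup.py source with 'import py2exe' as its first line."""
    line = "import py2exe\n"
    if setup_source.startswith(line):
        return setup_source  # already enabled; leave unchanged
    return line + setup_source


def output_dir(name):
    """Directory where, per the text, the .exe and .dll files are collected."""
    return ntpath.join("dist", name)


# An otherwise normal distutils script (hypothetical contents).
plain_setup = '''\
from distutils.core import setup

setup(
    name="myapp",
    scripts=["myapp.py"],  # first or single entry becomes the .exe
)
'''

print(enable_py2exe(plain_setup))
print(output_dir("myapp"))  # prints: dist\myapp
```

With the modified script in place, the build itself would be started from the distribution root with the command given in the text, python setup.py py2exe.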
- The Installer Tool
- Content preview: Gordon McMillan has developed a richer and more general solution to the same problem that py2exe solves: preparing compact ways to package up Python applications for installation on end user machines that may not have Python installed. The Installer tool, freely downloadable from https://www.mcmillan-inc.com/install1.html, is more general than py2exe, which supports only Windows platforms. Installer natively supports Linux as well as Windows. Also, Installer's portable, cross-platform architecture may allow you to extend it to support other Unix-like platforms with a reasonable amount of effort.
Installer does not rely on distutils. To use Installer, you must learn its own specification files' syntax and semantics. Installer can do much more than py2exe, so it's not surprising that there is more for you to learn before making full use of it. However, I recommend studying and trying out Installer if you have the specific need of building standalone Python applications for Linux or other Unix-like architectures, or if you have tried py2exe and found it did not quite meet your needs.
© 2008, O'Reilly Media, Inc.
All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.