Planet Python
Last update: March 09, 2011 03:45 AM
March 08, 2011
Will Kahn-Greene
Python Software Foundation Grant for Python Miro Community
A couple of weeks ago at Carl's urging, I applied for a grant from the Python Software Foundation. This would cover Miro Community service costs for the next year as well as work on a series of improvements to the site. Things like:
- Universal Subtitles support
- using transcriptions in the search corpus for videos
- implementing an API in Miro Community allowing for automated data validation
I talked about all this at length in my call for funding.
I'm very pleased to announce that the PSF has awarded me a grant. I know how selective they are in their grant approval and I really appreciate this. It helps me a ton and I will work hard to make it money well spent.
I'll be at PyCon 2011. I hope to spend some time with Carl, Asheesh and others working on Miro Community. I'm also hoping to talk with people who've used the site about what kinds of things we can make better going forward. If you see me, feel free to say, "Hi!"
Roberto Alsina
OK, so THAT is how much browser I can put in 128 lines of code.
I have already posted a couple of times (1, 2) about De Vicenzo, an attempt to implement the rest of the browser, starting with PyQt's WebKit... limiting myself to 128 lines of code.
Of course I could do more, but I have my standards!
- No using ;
- No if whatever: f()
Other than that, I did a lot of dirty tricks, but right now, it's a fairly complete browser, and it has 127 lines of code (according to sloccount) so that's enough playing and it's time to go back to real work.
But first, let's consider how some features were implemented (I'll wrap the lines so the page stays reasonably narrow), and also look at the "normal" versions of the same (the "normal" code is not tested, please tell me if it's broken ;-).
This is not something you should learn how to do. In fact, this is almost a treatise on how not to do things. This is some of the least pythonic, least clear code you will see this week.
It is short, and it is expressive. But it is ugly.
I'll discuss this version.
Proxy Support
A browser is not much of a browser if you can't use it from behind a proxy, but luckily Qt's network stack has good proxy support. The trick was configuring it.
De Vicenzo supports HTTP and SOCKS proxies by parsing a http_proxy environment variable and setting Qt's application-wide proxy:
proxy_url = QtCore.QUrl(os.environ.get('http_proxy', ''))
QtNetwork.QNetworkProxy.setApplicationProxy(QtNetwork.QNetworkProxy(\
    QtNetwork.QNetworkProxy.HttpProxy if unicode(proxy_url.scheme()).startswith('http')\
    else QtNetwork.QNetworkProxy.Socks5Proxy, proxy_url.host(),\
    proxy_url.port(), proxy_url.userName(), proxy_url.password())) if\
    'http_proxy' in os.environ else None
How would that look in normal code?
if 'http_proxy' in os.environ:
    proxy_url = QtCore.QUrl(os.environ['http_proxy'])
    if unicode(proxy_url.scheme()).startswith('http'):
        protocol = QtNetwork.QNetworkProxy.HttpProxy
    else:
        protocol = QtNetwork.QNetworkProxy.Socks5Proxy
    QtNetwork.QNetworkProxy.setApplicationProxy(
        QtNetwork.QNetworkProxy(
            protocol,
            proxy_url.host(),
            proxy_url.port(),
            proxy_url.userName(),
            proxy_url.password()))
As you can see, the main abuses against python here are the use of the ternary operator as a one-line if (and nesting it), and line length.
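For illustration, here is a tiny self-contained sketch (with made-up names, no Qt) of a nested conditional expression next to the plain if/else it compresses:

```python
# One-liner style, as in De Vicenzo: the conditional expression
# picks the proxy protocol inline.
scheme = "socks5"
protocol = "HttpProxy" if scheme.startswith("http") else "Socks5Proxy"

# The conventional equivalent, spelled out over several lines:
if scheme.startswith("http"):
    protocol2 = "HttpProxy"
else:
    protocol2 = "Socks5Proxy"
```

Both assign the same value; the second form just reads better once the condition or branches grow.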
Persistent Cookies
You really need this, since you want to stay logged into your sites between sessions. For this, first I needed to write some persistence mechanism, and then save/restore the cookies there.
Here's how the persistence is done (settings is a global QSettings instance):
def put(self, key, value):
    "Persist an object somewhere under a given key"
    settings.setValue(key, json.dumps(value))
    settings.sync()

def get(self, key, default=None):
    "Get the object stored under 'key' in persistent storage, or the default value"
    v = settings.value(key)
    return json.loads(unicode(v.toString())) if v.isValid() else default
It's not terribly weird code, except for the use of the ternary operator in the last line. The use of json ensures that as long as reasonable things are persisted, you will get them with the same type as you put them without needing to convert them or call special methods.
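That round-trip behavior can be sketched without Qt at all; here a plain dict stands in for the QSettings instance (all names are made up for the example):

```python
import json

# The dict plays the role of persistent storage (QSettings in the browser).
_store = {}

def put(key, value):
    # Serialize to JSON text, the way the browser's put() does
    _store[key] = json.dumps(value)

def get(key, default=None):
    # Deserialize, so values come back with their original Python types
    raw = _store.get(key)
    return json.loads(raw) if raw is not None else default

put("tabs", ["https://example.org", "https://example.net"])
put("zoom", 1.25)
```

Lists stay lists and floats stay floats on the way back out, with no manual conversion.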
So, how do you save/restore the cookies? First, you need to access the cookie jar. I couldn't find whether there is a global one, or a per-webview one, so I created a QNetworkCookieJar in line 24 and assign it to each web page in line 107.
# Save the cookies, in the window's closeEvent
self.put("cookiejar",
    [str(c.toRawForm()) for c in self.cookies.allCookies()])

# Restore the cookies, in the window's __init__
self.cookies.setAllCookies([QtNetwork.QNetworkCookie.parseCookies(c)[0]\
    for c in self.get("cookiejar", [])])
Here I confess I am guilty of using list comprehensions when a for loop would have been the correct thing.
I use the same trick when restoring the open tabs, with the added misfeature of using a list comprehension and throwing away the result:
# get("tabs") is a list of URLs
[self.addTab(QtCore.QUrl(u)) for u in self.get("tabs", [])]
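For comparison, a minimal sketch (stand-in names, no Qt) of the side-effect comprehension next to the plain loop it should have been:

```python
def add_tab(url, tabs):
    # Stand-in for the browser's addTab(): just record the URL
    tabs.append(url)

urls = ["https://example.org", "https://example.net"]

# Golfed version: the comprehension's result list is built and thrown away
golfed = []
[add_tab(u, golfed) for u in urls]

# Idiomatic version: a plain for loop, no wasted list
plain = []
for u in urls:
    add_tab(u, plain)
```

Both do the same work; the loop just doesn't allocate a list of Nones nobody reads.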
Using Properties and Signals in Object Creation
This is a feature of recent PyQt versions: if you pass property names as keyword arguments when you create an object, they are assigned the value. If you pass a signal as a keyword argument, they are connected to the given value.
This is a really great feature that helps you create clear, local code, and it's a great thing to have. But if you are writing evil code... well, you can go to hell in a handbasket using it.
This is all over the place in De Vicenzo, and here's one example (yes, this is one line):
QtWebKit.QWebView.__init__(self, loadProgress=lambda v:\
    (self.pbar.show(), self.pbar.setValue(v)) if self.amCurrent() else\
    None, loadFinished=self.pbar.hide, loadStarted=lambda:\
    self.pbar.show() if self.amCurrent() else None, titleChanged=lambda\
    t: container.tabs.setTabText(container.tabs.indexOf(self), t) or\
    (container.setWindowTitle(t) if self.amCurrent() else None))
Oh, boy, where do I start with this one.
There are lambda expressions used to define the callbacks in-place instead of just connecting to a real function or method.
There are lambdas that contain the ternary operator:
loadStarted=lambda:\
    self.pbar.show() if self.amCurrent() else None
There are lambdas that use or or a tuple to trick python into doing two things in a single lambda!
loadProgress=lambda v:\
    (self.pbar.show(), self.pbar.setValue(v)) if self.amCurrent() else\
    None
I won't even try to untangle this for educational purposes, but let's just say that line contains what should be replaced by 3 methods, and should be spread over 6 lines or more.
Download Manager
Ok, calling it a manager is overreaching, since you can't stop them once they start, but hey, it lets you download things and keep on browsing, and reports the progress!
First, on line 16 I created a bars dictionary for general bookkeeping of the downloads.
Then, I needed to delegate the unsupported content to the right method, and that's done in lines 108 and 109.
What that does is basically this: whenever you click on something WebKit can't handle, the method fetch will be called and passed the network request.
def fetch(self, reply):
    destination = QtGui.QFileDialog.getSaveFileName(self, \
        "Save File", os.path.expanduser(os.path.join('~',\
        unicode(reply.url().path()).split('/')[-1])))
    if destination:
        bar = QtGui.QProgressBar(format='%p% - ' +
            os.path.basename(unicode(destination)))
        self.statusBar().addPermanentWidget(bar)
        reply.downloadProgress.connect(self.progress)
        reply.finished.connect(self.finished)
        self.bars[unicode(reply.url().toString())] = [bar, reply,\
            unicode(destination)]
No real code golfing here, except for long lines, but once you break them reasonably, this is pretty much the obvious way to do it:
- Ask for a filename
- Create a progressbar, put it in the statusbar, and connect it to the download's progress signals.
Then, of course, we need the progress slot, which updates the progress bar:
progress = lambda self, received, total:\
    self.bars[unicode(self.sender().url().toString())][0]\
        .setValue(100. * received / total)
Yes, I defined a method as a lambda to save 1 line. [facepalm]
And the finished slot for when the download is done:
def finished(self):
    reply = self.sender()
    url = unicode(reply.url().toString())
    bar, _, fname = self.bars[url]
    redirURL = unicode(reply.attribute(QtNetwork.QNetworkRequest.\
        RedirectionTargetAttribute).toString())
    del self.bars[url]
    bar.deleteLater()
    if redirURL and redirURL != url:
        return self.fetch(redirURL, fname)
    with open(fname, 'wb') as f:
        f.write(str(reply.readAll()))
Notice that it even handles redirections sanely! Beyond that, it just hides the progress bar, saves the data, end of story. The longest line is not even my fault!
There is a big inefficiency in that the whole file is kept in memory until the end. If you download a DVD image, that's gonna sting.
Also, using with saves a line and doesn't leak a file handle, compared to the alternatives.
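The buffering issue could be avoided by streaming to disk in fixed-size chunks; here is a Qt-free sketch of the idea (the function and names are illustrative, not De Vicenzo's code):

```python
import io

def save_stream(src, dst, chunk_size=64 * 1024):
    # Copy src to dst one chunk at a time, so at most one chunk
    # is held in memory instead of the whole download.
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        total += len(chunk)
    return total

# Simulate a 200 kB download with in-memory file objects
src = io.BytesIO(b"x" * 200000)
dst = io.BytesIO()
written = save_stream(src, dst)
```

With Qt one would hook this to the reply's readyRead signal rather than buffering until finished, but the chunking principle is the same.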
Printing
Again Qt saved me, because doing this manually would have been a pain. However, it turns out that printing is just... there? Qt, especially when used via PyQt, is such an awesomely rich environment.
self.previewer = QtGui.QPrintPreviewDialog(\
    paintRequested=self.print_)
self.do_print = QtGui.QShortcut("Ctrl+p",\
    self, activated=self.previewer.exec_)
There's not even any need to golf here, that's exactly as much code as you need to hook Ctrl+p to make a QWebView print.
Other Tricks
There are no other tricks. All that's left is creating widgets, connecting things to one another, and enjoying the awesome experience of programming PyQt, where you can write a whole web browser (except the engine) in 127 lines of code.
Imaginary Landscape
Security for Mobile Applications
Imaginary Landscape has been putting significant effort into developing usable and secure mobile websites. Understanding the context of a mobile user is the first step in developing security protocols to protect mobile access and information.
A new blog posting on the Imaginary Landscape main website describes how we approach mobile ...
Matt Harrison
PyCon, a new job and throwing out 70% of your servers
PyCon is coming up. I'm looking forward to some warmer climes after shoveling 8 inches of heavy snow earlier today. I'll be teaching 2 tutorials, Beginner and Intermediate Hands-on Python, and will post materials for those soon.
Earlier this year, I began working for a purveyor of fast storage devices, Fusion-IO. We make enterprise storage that makes up for Moore's Law not really working for spinning disk. Enough with the marketing talk, here's a post by Jeremy Zawodny, on how Craigslist used Fusion-IO devices to go from 14 overloaded servers to 4 underutilized servers. Needless to say, these things are selling like hotcakes and we use a lot of Python. BTW, we are hiring and there will be a few other guys from Fusion-IO at PyCon. Feel free to inquire.
In other news, I've joined the 21st century and am slowly ramping up on using my twitter account, "dunder mharrison".
Grig Gheorghiu
Monitoring is for ops what testing is for dev
Devops. It's the new buzzword. Go to any tech conference these days and you're sure to find an expert panel on the 'what' and 'why' of devops. These panels tend to be light on the 'how', because that's where the rubber meets the road. I tried to give a step-by-step description of how you can become a Ninja Rockstar Internet Samurai devops in my blog post on 'How to whip your infrastructure into shape'.
Here I just want to say that I am struck by the parallels that exist between the activities of developer testing and operations monitoring. It's not a new idea by any means, but it's been growing on me recently.
Test-infected vs. monitoring-infected
Good developers are test-infected. It doesn't matter too much whether they write tests before or after writing their code -- what matters is that they do write those tests as soon as possible, and that they don't consider their code 'done' until it has a comprehensive suite of tests. And of course test-infected developers are addicted to watching those dots in the output of their favorite test runner.
Good ops engineers are monitoring-infected. They don't consider their infrastructure build-out 'done' until it has a comprehensive suite of monitoring checks, notifications and alerting rules, and also one or more dashboard-type systems that help them visualize the status of the resources in the infrastructure.
Adding tests vs. adding monitoring checks
Whenever a bug is found, a good developer will add a unit test for it. It serves as a proof that the bug is now fixed, and also as a regression test for that bug.
Whenever something unexpectedly breaks within the systems infrastructure, a good ops engineer will add a monitoring check for it, and if possible a graph showing metrics related to the resource that broke. This ensures that alerts will go out in a timely manner next time things break, and that correlations can be made by looking at the metrics graphs for the various resources involved.
Ignoring broken tests vs. ignoring monitoring alerts
When a test starts failing, you can either fix it so that the bar goes green, or you can ignore it. Similarly, if a monitoring alert goes off, you can either fix the underlying issue, or you can ignore it by telling yourself it's not really critical.
The problem with ignoring broken tests and monitoring alerts is that this attitude leads slowly but surely to the Broken Window Syndrome. You train yourself to ignore issues that sooner or later will become critical (it's a matter of when, not if).
A good developer will make sure there are no broken tests in their Continuous Integration system, and a good ops engineer will make sure all alerts are accounted for and the underlying issues fixed.
Improving test coverage vs. improving monitoring coverage
Although 100% test coverage is not sufficient for your code to be bug-free, still, having something around 80-90% code coverage is a good measure that you as a developer are disciplined in writing those tests. This makes you sleep better at night and gives you pride in producing quality code.
For ops engineers, sleeping better at night is definitely directly proportional to the quantity and quality of the monitors that are in place for their infrastructure. The more monitors, the better the chances that issues are caught early and fixed before they escalate into the dreaded 2 AM pager alert.
Measure and graph everything
The more dashboards you have as a devops, the better insight you have into how your infrastructure behaves, from both a code and an operational point of view. I am inspired in this area by the work that's done at Etsy, where they are graphing every interesting metric they can think of (see their 'Measure Anything, Measure Everything' blog post).
As a developer, you want to see your code coverage graphs showing decent values, close to that mythical 100%. As an ops engineer, you want to see uptime graphs that are close to the mythical 5 9's.
But maybe even more importantly, you want insight into metrics that tie directly into your business. At Evite, processing messages and sending email reliably is our bread and butter, so we track those processes closely and we have dashboards for metrics related to them. Spikes, either up or down, are investigated quickly.
Here are some examples of the dashboards we have. For now these use homegrown data collection tools and the Google Visualization API, but we're looking into using Graphite soon.
Outgoing email messages in the last hour (spiking at close to 100 messages/second):
Percentage of errors across some of our servers:
Associated with these metrics we have Nagios alerts that fire when certain thresholds are being met. This combination allows our devops team to sleep better at night.
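As an illustration of the alerting side, here is a minimal sketch of a Nagios-style threshold check (the plugin convention maps exit codes 0/1/2 to OK/WARNING/CRITICAL; the metric and thresholds are made up):

```python
def check_metric(value, warn, crit):
    # Nagios plugin convention: return code 0 = OK, 1 = WARNING, 2 = CRITICAL,
    # plus a one-line status message for the notification.
    if value >= crit:
        return 2, "CRITICAL - value=%s" % value
    if value >= warn:
        return 1, "WARNING - value=%s" % value
    return 0, "OK - value=%s" % value

# e.g. error-rate percentage with warning at 50 and critical at 90
status, message = check_metric(60, warn=50, crit=90)
```

A real plugin would read the metric from the system and sys.exit() with the code, but the threshold logic is the whole idea.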
Python User Groups
pyCologne Python User Group Cologne - Meeting, March 9, 2011, 6.30pm
The next meeting of pyCologne will take place:
Wednesday, March 9th, starting about 6.30 pm - 6.45 pm
at Room 0.14, Benutzerrechenzentrum (RRZK-B)
University of Cologne, Berrenrather Str. 136, 50937 Köln, Germany
Any presentations, news, book reviews etc. are welcome at each of our meetings!
At about 8.30 pm we will as usual enjoy the rest of the evening in a nearby restaurant.
Further information, including directions on how to get to the location, can be found at:
https://www.pycologne.de
(Sorry, the web-links are in German only.)
Mikko Ohtamaa
Installing and using Scrapy web crawler to search text on multiple sites
Here is a little script to use Scrapy, a web crawling framework for Python, to search sites for references for certain texts including link content and PDFs. This is handy for cases where you need to find links violating the user policy, trademarks which are not allowed or just to see where your template output is being used. Our Scrapy example differs from a normal search engine as it does HTML source code level checking: you can also search for CSS classes, link targets and other elements which may be invisible for normal search engines.
Scrapy comes with a command-line tool and project skeleton generator. You need to generate your own Scrapy project to where you can then add your own spider classes.
Install Scrapy using Distribute (or setuptools):
easy_install Scrapy
Create project code skeleton:
scrapy startproject myscraper
Add your spider class skeleton by creating a file myscraper/spiders/spiders.py:
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor

class MySpider(CrawlSpider):
    """ Crawl through web sites you specify """

    name = "mycrawler"

    # Stay within these domains when crawling
    allowed_domains = ["www.mysite.com"]

    start_urls = [
        "https://www.mysite.com/",
    ]

    # Add our callback which will be called for every found link
    rules = [
        Rule(SgmlLinkExtractor(), follow=True)
    ]
Start Scrapy to test that it's crawling properly. Run the following in the top-level directory:
scrapy crawl mycrawler
You should see output like:
2011-03-08 15:25:52+0200 [scrapy] INFO: Scrapy 0.12.0.2538 started (bot: myscraper)
2011-03-08 15:25:52+0200 [scrapy] DEBUG: Enabled extensions: TelnetConsole, SpiderContext, WebService, CoreStats, MemoryUsage, CloseSpider
2011-03-08 15:25:52+0200 [scrapy] DEBUG: Enabled scheduler middlewares: DuplicatesFilterMiddleware
You can hit CTRL+C to interrupt scrapy.
Then let’s enhance the spider a bit to search for blacklisted tags, with optional whitelisting, in myscraper/spiders/spiders.py. We also use the pyPdf library to crawl inside PDF files:
"""
A sample crawler for seeking a text on sites.
"""

import StringIO
from functools import partial

from scrapy.http import Request
from scrapy.spider import BaseSpider
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.item import Item


def find_all_substrings(string, sub):
    """
    https://code.activestate.com/recipes/499314-find-all-indices-of-a-substring-in-a-given-string/
    """
    import re
    starts = [match.start() for match in re.finditer(re.escape(sub), string)]
    return starts


class MySpider(CrawlSpider):
    """ Crawl through web sites you specify """

    name = "mycrawler"

    # Stay within these domains when crawling
    allowed_domains = ["www.mysite.com", "www.mysite2.com", "intranet.mysite.com"]

    start_urls = [
        "https://www.mysite.com/",
        "https://www.mysite2.com/",
        "https://intranet.mysite.com/"
    ]

    # Add our callback which will be called for every found link
    rules = [
        Rule(SgmlLinkExtractor(), follow=True, callback="check_violations")
    ]

    # How many pages crawled? XXX: Was not sure if CrawlSpider is a singleton class
    crawl_count = 0

    # How many text matches we have found
    violations = 0

    def get_pdf_text(self, response):
        """ Peek inside PDF to check possible violations.

        @return: PDF content as searchable plain-text string
        """
        try:
            from pyPdf import PdfFileReader
        except ImportError:
            print "Needed: easy_install pyPdf"
            raise

        stream = StringIO.StringIO(response.body)
        reader = PdfFileReader(stream)

        text = u""
        if reader.getDocumentInfo().title:
            # Title is optional, may be None
            text += reader.getDocumentInfo().title

        for page in reader.pages:
            # XXX: Does this handle unicode properly?
            text += page.extractText()

        return text

    def check_violations(self, response):
        """ Check a server response page (file) for possible violations """

        # Do some user visible status reporting
        self.__class__.crawl_count += 1
        crawl_count = self.__class__.crawl_count
        if crawl_count % 100 == 0:
            # Print some progress output
            print "Crawled %d pages" % crawl_count

        # Entries which are not allowed to appear in content.
        # These are case-sensitive
        blacklist = ["meat", "ham"]

        # Entries which are allowed to appear. They are usually
        # non-human visible data, like CSS classes, and may not be interesting business wise
        exceptions_after = [
            "meatball",
            "hamming",
            "hamburg"
        ]

        # These are preceding strings where our match is allowed
        exceptions_before = [
            "bushmeat",
            "honeybaked ham"
        ]

        url = response.url

        # Check response content type to identify what kind of payload this link target is
        ct = response.headers.get("content-type", "").lower()
        if "pdf" in ct:
            # Assume a PDF file
            data = self.get_pdf_text(response)
        else:
            # Assume it's HTML
            data = response.body

        # Go through our search goals to identify any "bad" text on the page
        for tag in blacklist:
            substrings = find_all_substrings(data, tag)
            # Check entries against the exception list for "allowed" special cases
            for pos in substrings:
                ok = False
                for exception in exceptions_after:
                    sample = data[pos:pos + len(exception)]
                    if sample == exception:
                        #print "Was whitelisted special case:" + sample
                        ok = True
                        break
                for exception in exceptions_before:
                    sample = data[pos - len(exception) + len(tag):pos + len(tag)]
                    #print "For %s got sample %s" % (exception, sample)
                    if sample == exception:
                        #print "Was whitelisted special case:" + sample
                        ok = True
                        break
                if not ok:
                    self.__class__.violations += 1
                    print "Violation number %d" % self.__class__.violations
                    print "URL %s" % url
                    print "Violating text:" + tag
                    print "Position:" + str(pos)
                    piece = data[pos - 40:pos + 40].encode("utf-8")
                    print "Sample text around position:" + piece.replace("\n", " ")
                    print "------"

        # We are not actually storing any data, return dummy item
        return Item()

    def _requests_to_follow(self, response):
        if getattr(response, "encoding", None) != None:
            # Server does not set encoding for binary files.
            # Do not try to follow links in
            # binary data, as this will break Scrapy
            return CrawlSpider._requests_to_follow(self, response)
        else:
            return []
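The whitelist slicing is the trickiest part of the spider, so here is a standalone sketch of just that logic with made-up sample data (no Scrapy needed):

```python
import re

def find_all_substrings(string, sub):
    # Same recipe the spider uses: all start indices of sub in string
    return [m.start() for m in re.finditer(re.escape(sub), string)]

data = "we sell bushmeat and meatballs but plain meat too"
tag = "meat"
exceptions_after = ["meatball"]
exceptions_before = ["bushmeat"]

hits = []
for pos in find_all_substrings(data, tag):
    ok = False
    for exc in exceptions_after:
        # The exception extends the match to the right
        if data[pos:pos + len(exc)] == exc:
            ok = True
    for exc in exceptions_before:
        # Slice backwards so the match sits at the end of the exception
        if data[pos - len(exc) + len(tag):pos + len(tag)] == exc:
            ok = True
    if not ok:
        hits.append(pos)
```

"bushmeat" and "meatballs" are whitelisted; only the bare "meat" (at position 41 in this sample) is flagged.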
Let’s tune down the logging output level, so we get only relevant data in the output. In myscraper/settings.py add:
LOG_LEVEL="INFO"
Now you can run the crawler and pipe the output to a text file:
scrapy crawl mycrawler > violations.txt
More information
Read our blog
Subscribe to the mFabrik blog in a reader
Follow me on Twitter
Wingware
Wingware at PyCon 2011
Wingware will be at PyCon 2011 Friday through Monday this coming weekend (March 11th-14th). For those attending the conference: Please stop by to see us and pick up some Wingware swag at booth 321 in the Expo Hall on Friday or Saturday. We will also be participating in the Python IDE Panel on Saturday at 11:45AM in Centennial I and are planning two open spaces where we can provide demos, answer questions, or show the new features in Wing IDE 4.0. Hope to see you there!
Eli Bendersky
Non-constant global initialization in C and C++
Consider this code:
int init_func()
{
return 42;
}
int global_var = init_func();
int main()
{
return global_var;
}
Is it valid C? Is it valid C++?
Curiously, the answer to the first question is no, and to the second question is yes. This can be easily checked with a compiler:
$ gcc -Wall -pedantic global_init.c
global_init.c:7: error: initializer element is not constant
$ g++ -Wall -pedantic global_init.c
$ a.out; echo $?
42
The C standard prohibits initialization of global objects with non-constant values. Section 6.7.8 of the C99 standard states:
All the expressions in an initializer for an object that has static storage duration shall be constant expressions or string literals.
What is an object with static storage duration? This is defined in section 6.2.4:
An object whose identifier is declared with external or internal linkage, or with the storage-class specifier static has static storage duration. Its lifetime is the entire execution of the program and its stored value is initialized only once, prior to program startup.
C++ is a different story, however. In C++ much more is being determined at runtime before the user’s main function runs. This is in order to allow proper construction of global and static objects (C++ objects may have user-defined constructors, which isn’t true for C).
Peeking at the disassembled code produced by g++ for our code snippet, we see some interesting symbols, among them __do_global_ctors_aux and _Z41__static_initialization_and_destruction_0ii, both executed before our main.
In particular, _Z41__static_initialization_and_destruction_0ii does the actual initialization of global_var. Here are the relevant lines:
40055d: callq 400528 <_Z9init_funcv>
400562: mov %eax,2098308(%rip) # 6009ec <global_var>
init_func is called (its name is distorted due to C++ name mangling), and then its return value (which is in eax) is assigned to global_var.
Related posts:
- Initialization of structures and arrays in C++ Suppose you have a large array or structure containing important...
- Variable initialization in C++ There are many ways to initialize a variable in C++....
- Array initialization with enum indices in C but not C++ Suppose you have the following scenario: some function you’re writing...
March 07, 2011
PyCon US
PyCon 2011: Program Guide on iOS and Android Devices
To install, follow this link: Conventionist - Get It! or search the App Store or Android Market for 'conventionist' from Proxima Labs. Once the application is installed, run it and select 'Download Guides'. Look for and select the "PyCon US '11" guide.
The entire schedule, including tutorials, with detailed information is available, as well as information on all our sponsors and exhibitors. Maps of the conference area, exhibitors room, and poster session are included. You can create a personal schedule with reminders natively; this is not connected to the personal schedule feature on our website.
Very special thanks to Jeff Lewis, Peter Lada, and the entire Proxima Labs team for providing such a fantastic service!
Mike C. Fletcher
PyPy hits 3x speed (or 1/12th, or 2.5x depending on the sign-post)
Another day playing with PyPy. First up was a pleasant surprise in that the 2x slowdown was reduced with the current nightly build[1], bringing performance up from 40,000cps to ~60,000cps.
After that, applied a micro-optimization that got me about a 6% speedup; I eliminated the use of a state "struct object" (object that just used regular attribute access and no methods) in favour of passing all of the state as explicit arguments and returning the state modification(s) to the caller. Not a huge win, but what it did do is make it possible for cfbolz to point out that, with the modifications, there was an unneeded re-raising of an exception in one of the most heavily used methods.
Eliminating just that trivial operation caused performance to triple (from ~60,000cps to a somewhat respectable 180,000cps). Baseline for the naive rewrite in cPython was 82,000cps, so we're suddenly seeing a real performance improvement.
Second fix cfbolz proposed was a bit more... evil... basically replace a for-loop with a recursive call, which caused performance to jump to 265,000cps on pypy, but caused it to drop noticeably on cpython. A little bit of code to test for pypy before using the pypy hack eliminates that issue.
End of the day, the code-base is markedly faster on pypy, but has also been optimized (slightly) for cpython as well. cpython parses at 110,000cps, pypy at 265,000cps on the test file. That puts us at ~1/12th the speed of the optimized C for pypy and ~1/30th for cpython.
There are still lots of things I want to explore, but I don't have the time to work on them today, so I suppose that will be next week...
[1] at the time I'd thought it had been entirely reduced, but that was because I ran the cPython test in the wrong window (duh!)
Greg Turnquist
Runnable code fragments are important
As I work through the rewrites for chapter 3, I am really thankful that I focused on making every block of code runnable when I first wrote these chapters. Maybe that sounds strange, but it isn't that uncommon to be writing a recipe and want to go back and add a step that was missed. You write some extra code in the draft and then move on.
Sometimes it can be very tempting to just put that extra bit in the draft, especially when working towards a deadline. But I paid extra special attention to capturing changes in code, even starting a new file if it was an alteration to code already run. So now I'm just having to include the steps to creating the files which I forgot to include the first time around. This has given me extra confidence in the quality of the code, and I haven't received any comments about bugs yet. Yeah!
This makes me feel good that my readers just have to copy down the code and they will be able to run things just like me. In a software book, being able to run the code is vital!
Calvin Spealman
How To Attend Pycon 2011
Short announcement:
PyCon US
PyCon 2011: Live on Startup Row
It is worth quoting just a little from the original post introducing Startup Row: "Since the beginning, Python has always been strongly associated with startups and entrepreneurs.... For Startup Row, we wanted to look toward the future - companies that are just starting today, but may become household names in the future." The founders of these companies will be at PyCon for the main conference days, and for one day they will be participating in the Expo Hall. The other days they will be participating at PyCon with everyone else, so look around - the person next to you may have just started a company. So without further ado, here are the fifteen Startup Row Finalists:
Friday
Saturday
Python Software Foundation
Call for submissions for promotional brochure
A new PSF project aims to create professional quality promotional material about Python. The first goal is to create a brochure to showcase the many ways Python is used. It will include use cases to highlight the ways the language allows users to accomplish their tasks both in educational and in professional settings.
Project team members Marc-André Lemburg, Jan Ulrich Hasecke, and Armin Stross-Radschinski created this Plone marketing brochure for the German Zope User Group. It is the inspiration for this new project.
Community feedback and awareness is vitally important for the success of this initiative, mainly to gather information to be used in the brochure. We are especially looking for interesting projects that can be discussed as use-cases.
If you have any suggestions for information to include in the brochure, please contact Marc-André Lemburg or send an email to brochure AT getpython DOT info.
Eli Bendersky
From C to AST and back to C with pycparser
Ever since I first released pycparser, people were asking me if it’s possible to generate C code back from the ASTs it creates. My answer was always – "sure, it was done by other users and doesn’t sound very difficult".
But recently I thought, why not add an example to pycparser’s distribution showing how one could go about it. So this is exactly what I did, and such an example (examples/c-to-c.py) is part of pycparser version 2.03 which was released today.
Dumping C back from pycparser ASTs turned out to be not too difficult, but not as trivial as I initially imagined. Some particular points of interest I ran into:
- I couldn’t use the generic node visitor distributed with pycparser, because I needed to accumulate generated strings from a node’s children.
- C types were, as usual, a problem. This led to an interesting application of non-trivial recursive AST visiting. To properly print out types, I had to accumulate pointer, array and function modifiers (see the _generate_type method for more details) while traversing down the tree, using this information in the innermost nodes.
- C statements are also problematic, because some expressions can be both parts of other expressions and statements in their own right. This makes it a bit tricky to decide when to add semicolons after expressions.
- ASTs encode operator precedence implicitly (i.e. there's no need to store it explicitly). But how do I print it back into C? Just parenthesizing both sides of each operator quickly gets ugly. So the code uses some heuristics to not parenthesize some nodes that surely have precedence higher than all binary operators. a = b + (c * k) definitely looks better than a = (b) + ((c) * (k)), though both would parse back into the same AST. This applies not only to operators but also to things like structure references. *foo->bar and (*foo)->bar mean different things to a C compiler, and c-to-c.py knows to parenthesize the left side only when necessary.
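The accumulate-from-children visiting and the precedence heuristic can be sketched on a toy expression AST. This is only an illustration of the technique, not pycparser's actual code; all the class and function names here are invented:

```python
# toy AST: each node produces its string by accumulating its children's strings
class Const:
    def __init__(self, value):
        self.value = value

class BinOp:
    PRECEDENCE = {'*': 2, '/': 2, '+': 1, '-': 1}
    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right

def generate(node):
    if isinstance(node, Const):
        return str(node.value)
    # node is a BinOp: visit both children, accumulating their strings
    left = _maybe_paren(node.left, node.op)
    right = _maybe_paren(node.right, node.op)
    return '%s %s %s' % (left, node.op, right)

def _maybe_paren(child, parent_op):
    # heuristic: parenthesise a child only when it binds more loosely
    # than its parent; constants never need parentheses
    s = generate(child)
    if isinstance(child, BinOp) and \
            BinOp.PRECEDENCE[child.op] < BinOp.PRECEDENCE[parent_op]:
        return '(%s)' % s
    return s

# b + c * k needs no parentheses; (b + c) * k does
tree = BinOp('*', BinOp('+', Const(2), Const(3)), Const(4))
print(generate(tree))  # (2 + 3) * 4
```

A real code generator would also have to consider associativity and unary operators, but the shape is the same: information flows down the tree (the parent's precedence) while generated strings flow back up.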
Here’s a sample function before being parsed into an AST:
const Entry* HashFind(const Hash* hash, const char* key)
{
unsigned int index = hash_func(key, hash->table_size);
Node* temp = hash->heads[index];
while (temp != NULL)
{
if (!strcmp(key, temp->entry->key))
return temp->entry;
temp = temp->next;
}
return NULL;
}
And here it is when dumped back from a parsed AST by c-to-c.py:
const Entry *HashFind(const Hash *hash, const char *key)
{
int unsigned index = hash_func(key, hash->table_size);
Node *temp = hash->heads[index];
while (temp != NULL)
{
if (!strcmp(key, temp->entry->key))
return temp->entry;
temp = temp->next;
}
return NULL;
}
Indentation and whitespace aside, it looks almost exactly the same. Note the curiosity on the declaration of index. In C you can specify several type names before a variable (such as unsigned int or long long int), but c-to-c.py has no idea in what order to print them back. The order itself doesn’t really matter to a C compiler – unsigned int and int unsigned are exactly the same in its eyes. unsigned int is just a convention used by most programmers.
A final word: since this is just an example, I didn’t invest too much into the validation of c-to-c.py – it’s considered "alpha" quality at best. If you find any bugs, please open an issue and I’ll have it fixed.
Related posts:
- Implementing cdecl with pycparser cdecl is a tool for decoding C type declarations. It...
- pycparser now supports C99 Today I released pycparser version 2.00, with support for C99...
- SICP section 5.3 I liked the way the authors used vectors to simply...
Python 4 Kids
Time for Some Introspection
Did you notice some errors in the previous tutorial? One was fatal. The fact that no one commented on them indicates to me that no one is actually typing in the code – naughty naughty! Type it in. It’s important.
The errors have been corrected now, but they were:
pickle.dump(fileObject, triviaQuestions)
(the order of the arguments is wrong, the object to dump goes first, and the file object to dump it into goes next); and
there was a stray full stop at the end of one line.
If you typed in the previous tutorial you should have received the following error:
>>> pickle.dump(fileObject,triviaQuestions)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.6/pickle.py", line 1362, in dump
    Pickler(file, protocol).dump(obj)
  File "/usr/lib64/python2.6/pickle.py", line 203, in __init__
    self.write = file.write
AttributeError: 'list' object has no attribute 'write'
Or something like it – the exact error may be different depending on what version of python you are running.
If you receive an error like this you can always use the interpreter’s built in help function to assist:
>>> help(pickle.dump)
Help on function dump in module pickle:

dump(obj, file, protocol=None)
This is not entirely enlightening, but it does tell you the order of the arguments – the object first, followed by the file, followed by a third, optional, argument (protocol). We know it is optional because it is assigned a default value.
The object itself is also able to tell you about itself. This is called “introspection”. In English introspection means looking inward. People who are introspective spend time thinking about themselves. In Python, introspection is the ability of the program to examine, or give information about, itself. For example, try this:
>>> print pickle.__doc__
Create portable serialized representations of Python objects.

See module cPickle for a (much) faster implementation.
See module copy_reg for a mechanism for registering custom picklers.
See module pickletools source for extensive comments.

Classes:

    Pickler
    Unpickler

Functions:

    dump(object, file)
    dumps(object) -> string
    load(file) -> object
    loads(string) -> object

Misc variables:

    __version__
    format_version
    compatible_formats
This shows the “docstring” for the pickle module. A docstring is a string which holds documentation about the object. We have learnt from the docstring that pickle has methods for dumping objects to strings as well as files. Any object can have a docstring; for example, our triviaQuestions list had one [if you redo the previous tute to reconstruct it, since we haven't instantiated it this time]:
>>> triviaQuestions.__doc__
"list() -> new empty list\nlist(iterable) -> new list initialized from iterable's items"
In this case, the docstring is the same for all lists (try [].__doc__). However, some objects, particularly classes (which we haven’t met yet) and functions, are able to have their own docstrings which are particular to that object. A docstring can be created for an object by adding a comment in triple single quotes (''') at the start of the object’s definition (other comment forms like single quotes work, but triple single quotes are the convention so that you can include apostrophes etc in the docstring):
>>> def square(x):
...     '''Given a number x return the square of x (ie x times x)'''
...     return x*x
...
>>> square(2)
4
>>> square.__doc__
'Given a number x return the square of x (ie x times x)'
When you write code you should also write docstrings which explain what the code does. While you may think you’ll remember what it does in the future, the reality is that you won’t!
How did I know that pickle had its own docstring? Well, I read it somewhere, like you read it here. However, if you ever find yourself needing to work out what forms part of an object, Python has a function to do it – it’s called dir(). You can use it on any object. Let’s have a look at it on the square() function we just made up:
>>> dir(square)
['__call__', '__class__', '__closure__', '__code__', '__defaults__', '__delattr__', '__dict__', '__doc__', '__format__', '__get__', '__getattribute__', '__globals__', '__hash__', '__init__', '__module__', '__name__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'func_closure', 'func_code', 'func_defaults', 'func_dict', 'func_doc', 'func_globals', 'func_name']
I bet you didn’t realise that the function we just defined now had so many attributes/methods!! You can see that __doc__ is one of them. Where an attribute starts with two underscores ‘__’ it’s got a special meaning in Python. You can pronounce the two underscores in a number of different ways including: “underscore underscore”, “under under”, “double underscore”, “double under” and, my favourite, “dunder”.
To tell whether these are methods (think functions) rather than attributes (think values) you can use the callable function:
>>> callable(square.__repr__)
True
>>> callable(square.__doc__)
False
If it is callable, then you can add parentheses to it and treat it like a function (sometimes you will need to know what arguments the callable takes):
>>> square.__repr__()
'<function square at 0x7f0b977fab90>'
The __repr__ method of an object gives a printable version of the object.
When something goes wrong with your program you can use Python’s introspection capabilities to get more information about what might have gone wrong and why. Also, don’t forget to check the Python docs!
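Putting dir() and callable() together, a short sketch (the square function here is the same one we defined above; the variable name methods is just for illustration):

```python
def square(x):
    '''Given a number x return the square of x (ie x times x)'''
    return x * x

# collect the names of square's callable attributes (its methods)
methods = [name for name in dir(square) if callable(getattr(square, name))]
print(methods)  # includes '__call__', '__repr__' and many more

# attributes that are not callable are plain values
print(callable(square.__doc__))  # False -- __doc__ is just a string
```

This is a handy pattern for the homework below: dir() tells you what names an object has, and callable() tells you which of them you can call.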
Homework:
- go over previous tutes and identify 3 objects
- for each of these objects:
- re-do the relevant tute to instantiate (ie create) each of these objects;
- look at the docstring for the object (print objectName.__doc__); and
- look at the directory listing for the object (print dir(objectName)).
- Extra marks:
- find some callable methods in one listing and call them.
Invent with Python
New Game Source Code: Squirrel Eat Squirrel
Made a new game with Pygame. It’s called “Squirrel Eat Squirrel”, where you move your squirrel around the screen eating the smaller squirrels and avoiding the larger ones. The more squirrels you eat, the larger you grow. This is a Python 3 game, but I think it’s compatible with Python 2. You need Pygame installed as well.
Use the arrow keys to move around. You can be hit three times before you die.
Try modifying the constant variables at the top of the file to change around the game. (Squirrel speeds, number of squirrels, amount of health, etc.) This isn’t part of my Code Comments tutorials, since I haven’t had time to go through and add detailed comments to the code (but it’s still commented.)
Python 4 Kids
A Big Jar of Pickles
In the last tutorial we learned how to pickle our objects. Pickling is a way of storing the object (on the computer’s file system) so that it can be used later. This means that if we want to reuse an object we can simply save it and load it when we need it, rather than re-creating it each time we want to use it. This is very useful when our object is a list of questions for our trivia game. We really only want to type the questions in once and then reload them later.
Now we need to settle on a way to structure our data. We saw in our earlier tutorial that each question was a list, and that the list itself had a certain structure. We also need to think about how a number of questions will be stored. We will use a list to do that as well! In this case we will have a list of questions. Each of the elements in the list will itself be a list. Let’s build one. First we make an empty list to store all the questions:
triviaQuestions=[]
It is empty:
len(triviaQuestions)
Next, let’s make a sample question to add to that list. Feel free to use your own questions/answers if you want to use your own topic:
sampleQuestion = []
Now, we populate the sample question:
sampleQuestion.append("Who expects the Spanish Inquisition?")
# first entry must be the question
sampleQuestion.append("Nobody")
# second entry must be the correct answer
sampleQuestion.append("Eric the Hallibut")
sampleQuestion.append("An unladen swallow")
sampleQuestion.append("Brian")
sampleQuestion.append("Me!")
# any number of incorrect answers can follow
# but they must all be incorrect
There are 6 elements in the sampleQuestion list:
len(sampleQuestion)
Now, we add the sample question (as the first entry) to the list of trivia questions:
triviaQuestions.append(sampleQuestion)
It now has one question in it:
len(triviaQuestions)
To add more questions we “rinse and repeat”:
sampleQuestion = []
# this clears the earlier entries
# if we append without doing this
# we'll have multiple questions in the wrong list
sampleQuestion.append("What is the air-speed velocity of an unladen swallow?")
sampleQuestion.append("What do you mean? African or European swallow?")
sampleQuestion.append("10 m/s")
sampleQuestion.append("14.4 m/s")
sampleQuestion.append("23.6 m/s")
triviaQuestions.append(sampleQuestion)
Now, the sampleQuestion has five entries and there are two questions in total:
len(sampleQuestion)
len(triviaQuestions)
Now we need to save the question list so we can use it again later. We will save it to a file called “p4kTriviaQuestions.txt”. Ideally we would test to see whether this file already exists before first creating it (so that we don’t inadvertently wipe some valuable file). Today however, we’re just crossing our fingers and hoping that you don’t already have a file of this name in your directory:
import pickle
fileName = "p4kTriviaQuestions.txt"
fileObject = open(fileName,'w')
pickle.dump(triviaQuestions,fileObject) # oops! earlier draft had these in the wrong order!
fileObject.close()
So far we have spent a lot of time on how to store the data used by the game. However, in order to hang the various parts of the trivia game together we need to learn about storing a different part of the game – the program itself. We will be looking at that in the coming tutorials.
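As a quick check that the save worked, the pickled file can be read straight back with pickle.load. A minimal sketch (the question list here is abbreviated for illustration; note the binary 'wb'/'rb' modes, which Python 3 requires for pickle files):

```python
import pickle

# an abbreviated question list, just for this sketch
triviaQuestions = [["Who expects the Spanish Inquisition?", "Nobody", "Brian"]]

# save the list (binary mode: required by Python 3)
with open("p4kTriviaQuestions.txt", "wb") as fileObject:
    pickle.dump(triviaQuestions, fileObject)

# load it back into a brand new variable
with open("p4kTriviaQuestions.txt", "rb") as fileObject:
    loadedQuestions = pickle.load(fileObject)

print(loadedQuestions == triviaQuestions)  # True
```

If the round trip prints True, your questions survived the trip to disk and back.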
Richard Jones
PyWeek 12 (April 2011) registration is open!
The 12th Python Game Programming Challenge (PyWeek) is almost upon us. It'll run from the 3rd to the 10th of April. Registration for teams and individuals is now open on the website.
The PyWeek challenge:
- Invites entrants to write a game in one week from scratch either as an individual or in a team,
- Is intended to be challenging and fun,
- Will hopefully increase the public body of game tools, code and expertise,
- Will let a lot of people actually finish a game, and
- May inspire new projects (with ready made teams!)
If you've never written a game before and would like to try things out then perhaps you could try either:
- The tutorial I presented at LCA 2010, Introduction to Game Programming, or
- The book Invent Your Own Computer Games With Python
March 06, 2011
Invent with Python
New Extra Game: Connect Four clone
I have a text version of a Connect Four clone done. The AI for it looks ahead two moves, which makes it nearly impossible to beat unless you concentrate. I was planning to use this game for a chapter on recursion in my next book, but decided to publish the code for the text version now.
Download fourinarow.py (This is for Python 3, not Python 2)
The code has few comments, but looking at its source code might be a good exercise for someone learning to program. It’s available on the book’s website in the Extra section.
Nick Coghlan
What is a Python script?
This is an adaptation of a lightning talk I gave at PyconAU 2010, after realising a lot of the people there had no idea about the way CPython's concept of what could be executed had expanded over the years since version 2.4 was released. As of Python 2.7, there are actually 4 things that the reference interpreter will accept as a main module.
Ordinary scripts: the classic main module identified by filesystem path, available for as long as Python has been around. Can be executed without naming the interpreter through the use of file associations (Windows) or shebang lines (pretty much everywhere else).
Module name: By using the -m switch, a user can tell the interpreter to locate the main module based on its position in the module hierarchy rather than by its location on the filesystem. This has been supported for top level modules since Python 2.4, and for all modules since Python 2.5 (via PEP 338). Correctly handles explicit relative imports since Python 2.6 (via PEP 366 and the __package__ attribute). The classic example of this usage is the practice of invoking "python -m timeit 'snippet'" when discussing the relative performance of various Python expressions and statements.
Valid sys.path entry: If a valid sys.path entry (e.g. the name of a directory or a zipfile) is passed as the script argument, CPython will automatically insert that location at the beginning of sys.path, then use the module name execution mechanism to look for a __main__ module with the updated sys.path. Supported since Python 2.6, this system allows quick and easy bundling of a script with its dependencies for internal distribution within a company or organisation (external distribution should still use proper packaging and installer development practices). When using zipfiles, you can even add a shebang line to the zip header or use a file association for a custom extension like .pyz and the interpreter will still process the file correctly.
Package name: If a package name is passed as the value for the -m switch, the Python interpreter will reinterpret the command as referring to a __main__ submodule within that package. This version of the feature was added in Python 2.7, after some users objected to the removal in Python 2.6 of the original (broken) code that incorrectly allowed a package's __init__.py to be executed as the main module. Starting in Python 3.2, CPython's own test suite supports this feature, allowing it to be executed as "python -m test".
The above functionality is exposed via the runpy module, as runpy.run_module() and runpy.run_path().
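The sys.path entry mechanism can even be exercised from within Python via runpy. A minimal sketch (the directory layout and script contents are invented for illustration):

```python
import os
import runpy
import tempfile

# build a directory containing a __main__.py -- the same layout that
# "python some_dir" or a zipped application would use
with tempfile.TemporaryDirectory() as appdir:
    with open(os.path.join(appdir, "__main__.py"), "w") as f:
        f.write("result = 6 * 7\n")

    # run_path accepts either a script path or a valid sys.path entry
    # (directory/zipfile); it returns the resulting module globals
    namespace = runpy.run_path(appdir)
    print(namespace["result"])  # 42
```

The same call with a path to an ordinary .py file covers the classic script case, and runpy.run_module() covers the -m variants.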
If anyone ever sees me (metaphorically) jumping up and down about making sure things get mentioned in the What's New document for a new Python version, this is why. Python 2.6 was released in October 2008, but we didn't get the note about the zipfile and directory execution trick into the What's New until February 2010. It is described in the documentation, but really, who reads the command line documentation, or is likely to be casually browsing the runpy docs? This post turning up on Planet Python will probably do more to get the word out about the functionality than anything we've done before now :)
March 05, 2011
Roberto Alsina
De Vicenzo: A much cooler mini web browser.
It seems it was only a few days ago that I started this project. Oh, wait, yes, it was just a few days ago!
If you don't want to read that again, the idea is to see just how much code is needed to turn Qt's WebKit engine into a fully-fledged browser.
To do that, I set myself a completely arbitrary limit: 128 lines of code.
So, as of now, I declare it feature-complete.
The new features are:
- Tabbed browsing (you can add/remove tabs)
- Bookmarks (you can add/remove them, and choose them from a drop-down menu)
This is what already worked:
- Zoom in (Ctrl++)
- Zoom out (Ctrl+-)
- Reset Zoom (Ctrl+=)
- Find (Ctrl+F)
- Hide find (Esc)
- Buttons for back/forward and reload
- URL entry that matches the page + autocomplete from history + smart entry (adds https://, that kind of thing)
- Plugins support (including flash)
- The window title shows the page title (without browser advertising ;-)
- Progress bar for page loading
- Statusbar that shows hovered links URL
- Takes a URL on the command line, or opens https://python.org
- Multiplatform (works in any place QtWebKit works)
So... how much code was needed for this? 87 LINES OF CODE
Or if you want the PEP8-compliant version, 115 LINES OF CODE.
Before anyone says it: yes, I know the rendering engine and the toolkit are huge. What I wrote is just the chrome around them, just like Arora, Rekonq, Galeon, Epiphany and a bunch of others do.
It's simple, minimalistic chrome, but it works pretty well, IMVHO.
Here it is in (buggy) action:
It's more or less feature-complete for what I expected to be achievable, but it still needs some fixes.
You can see the code at its own home page: https://devicenzo.googlecode.com
PyCharm
PyCharm 1.2 Release Candidate; Execute selection in console
Django 1.3 is almost ready for release, and so is PyCharm 1.2. The Release Candidate build, in addition to a bunch of improvements for Django support and general bugfixes, includes a new feature: an action in the context menu to execute the selected code fragment in a Python console. It uses a running console if one exists, or starts a new one otherwise.
As usual, the PyCharm 1.2 Release Candidate download and the Release Notes are available on the PyCharm EAP page.
Nick Coghlan
Justifying Python language changes
A few years back, I chipped in on python-dev with a review of syntax change proposals that had made it into the language over the years. With Python 3.3 development starting and the language moratorium being lifted, I thought it would be a good time to tidy that up and republish it as a blog post.
Generally speaking, syntactic sugar (or new builtins) need to take a construct in idiomatic Python that is fairly obvious to an experienced Python user and make it obvious to even new users, or else take an idiom that is easy to get wrong when writing (or miss when reading) and make it trivial to use correctly.
Providing significant performance improvements (usually in the form of reduced memory usage or increased speed) also counts heavily in favour of new constructs.
I strongly suggest browsing through past PEPs (both accepted and rejected ones) before proposing syntax changes, but here are some examples of syntactic sugar proposals that were accepted.
List/set/dict comprehensions
(and the reduction builtins any(), all(), min(), max(), sum())
target = [op(x) for x in source]

instead of:

target = []
for x in source:
    target.append(op(x))

The transformation (`op(x)`) is far more prominent in the comprehension version, as is the fact that all the loop does is produce a new list. I include the various reduction builtins here, since they serve exactly the same purpose of taking an idiomatic looping construct and turning it into a single expression.
Generator expressions
total = sum(x*x for x in source)

instead of:

def _g(source):
    for x in source:
        yield x*x

total = sum(_g(source))

or:

total = sum([x*x for x in source])

Here, the GE version has obvious readability gains over the generator function version (as with comprehensions, it brings the operation being applied to each element front and centre instead of burying it in the middle of the code, as well as allowing reduction operations like sum() to retain their prominence), but doesn't actually improve readability significantly over the second LC-based version. The gain over the latter, of course, is that the GE based version needs a lot less memory than the LC version, and, as it consumes the source data incrementally, can work on source iterators of arbitrary (even infinite) length, and can also cope with source iterators with large time gaps between items (e.g. reading from a socket) as each item will be returned as it becomes available (obviously, the latter two features aren't useful when used in conjunction with reduction operations like sum, but they can be helpful in other contexts).
With statements
with lock:instead of:
# perform synchronised operations
lock.acquire()This change was a gain for both readability and writability - there were plenty of ways to get this kind of code wrong (e.g. leave out the try-finally altogether, acquire the resource inside the try block instead of before it, call the wrong method or spell the variable name wrong when attempting to release the resource in the finally block), and it wasn't easy to audit because the resource acquisition and release could be separated by an arbitrary number of lines of code. By combining all of that into a single line of code at the beginning of the block, the with statement eliminated a lot of those issues, making the code much easier to write correctly in the first place, and also easier to audit for correctness later (just make sure the code is using the correct context manager for the task at hand).
try:
# perform synchronised operations
finally:
lock.release()
Function decorators
@classmethodinstead of:
def f(cls):
# Method body
def f(cls):Easier to write (function name only written once instead of three times), and easier to read (decorator names up top with the function signature instead of buried after the function body). Some folks still dislike the use of the @ symbol, but compared to the drawbacks of the old approach, the dedicated function decorator syntax is a huge improvement.
# Method body
f = classmethod(f)
Conditional expressions
x = A if C else Binstead of:
x = C and A or BThe addition of conditional expressions arguably wasn't a particularly big win for readability, but it was a big win for correctness. The and/or based workaround for the lack of a true conditional expression was not only hard to read if you weren't already familiar with the construct, but using it was also a potential source of bugs if A could ever be False while C was True (in such cases, B would be returned from the expression instead of A).
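The failure mode is easy to demonstrate. A quick sketch with invented values where A happens to be falsy:

```python
A, B, C = 0, "fallback", True

# the old and/or idiom: because A (0) is falsy, the expression
# falls through to B even though the condition C is true
old_result = C and A or B
print(old_result)  # 'fallback' -- wrong, we wanted 0

# the conditional expression gets it right
new_result = A if C else B
print(new_result)  # 0
```

The and/or version only worked reliably when A was guaranteed to be truthy, a constraint that was easy to forget and invisible at the call site.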
Except clause
except Exception as ex:instead of:
except Exception, ex:Another example of changing the syntax to reduce the potential for non-obvious bugs (in this case, except clauses like `except TypeError, AttributeError:`, that would actually never catch AttributeError, and would locally do AttributeError=TypeError if a TypeError was caught).