ActiveState Code
Recipe 259173: Groupby
A Guido-inspired, SQL-like GROUPBY class that also encapsulates the logic of the Unix pipeline "sort | uniq".
```python
class groupby(dict):
    def __init__(self, seq, key=lambda x:x):
        # Bucket each value under key(value); setdefault creates
        # the group's list the first time a key is seen.
        for value in seq:
            k = key(value)
            self.setdefault(k, []).append(value)
    # Iteration yields (key, group) pairs (Python 2's dict.iteritems).
    __iter__ = dict.iteritems

# -------------------------- Examples -----------------------------------

>>> letters = 'abracadabra'
>>> [g for k, g in groupby(letters)]                   # grouped
[['a', 'a', 'a', 'a', 'a'], ['r', 'r'], ['b', 'b'], ['c'], ['d']]
>>> [k for k, g in groupby(letters)]                   # uniq
['a', 'r', 'b', 'c', 'd']
>>> [(k, len(g)) for k, g in groupby(letters)]         # uniq -c
[('a', 5), ('r', 2), ('b', 2), ('c', 1), ('d', 1)]
>>> [k for k, g in groupby(letters) if len(g) > 1]     # uniq -d
['a', 'r', 'b']
>>> data = [('becky', 'girl', 5), ('john', 'boy', 10), ('sue', 'girl', 10)]
>>> for k, g in groupby(data, key=lambda r: r[1]):
...     print k
...     for record in g:
...         print " ", record
...
boy
  ('john', 'boy', 10)
girl
  ('becky', 'girl', 5)
  ('sue', 'girl', 10)
```
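The recipe is written for Python 2: dict.iteritems and the print statement no longer exist in Python 3. The sketch below is our own port, not the author's code. One catch is that __iter__ must return an iterator, so binding it directly to dict.items (which returns a view) would make iteration fail with a TypeError; the port wraps the view in iter() instead.

```python
class groupby(dict):
    """Python 3 sketch of the recipe: maps key(value) -> list of values."""

    def __init__(self, seq, key=lambda x: x):
        super().__init__()
        for value in seq:
            # setdefault creates the group's list the first time a key is seen.
            self.setdefault(key(value), []).append(value)

    def __iter__(self):
        # Yield (key, group) pairs. dict.items() returns a view, not an
        # iterator, so it is wrapped in iter() to satisfy the protocol.
        return iter(self.items())


letters = "abracadabra"
print([g for k, g in groupby(letters)])                # grouped
print([k for k, g in groupby(letters)])                # uniq
print([(k, len(g)) for k, g in groupby(letters)])      # uniq -c
print([k for k, g in groupby(letters) if len(g) > 1])  # uniq -d
```

On Python 3.7 and later, dicts keep insertion order, so the uniq line prints ['a', 'b', 'r', 'c', 'd'] (first-seen order) rather than an arbitrary order.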
Discussion
Used for:
1. Grouping records in reports
2. Listing the unique keys in a database
3. Counting the number of records in each group
4. Finding records with duplicate keys
Since the underlying implementation is a dictionary of lists:
1. The build time is O(n).
2. The input can be in any order.
3. The keys must be hashable.
4. The order of key output is arbitrary.
5. The order of values within each group is stable (it matches the original record order).
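To make properties 2, 3 and 5 concrete, the sketch below uses a Python 3 port of the class (our adaptation, relying on the insertion-ordered dicts of Python 3.7+) and contrasts it with the standard library's itertools.groupby (added in Python 2.4). That is a different tool: it only merges adjacent items, so it expects input that is already sorted by key.

```python
import itertools


class groupby(dict):
    """Python 3 sketch of the recipe's dict-of-lists grouping."""

    def __init__(self, seq, key=lambda x: x):
        super().__init__()
        for value in seq:
            self.setdefault(key(value), []).append(value)

    def __iter__(self):
        return iter(self.items())


records = [("x", 1), ("y", 2), ("x", 3)]  # deliberately unsorted

# Dict of lists: one group per key, values kept in their original order.
print(list(groupby(records, key=lambda r: r[0])))
# [('x', [('x', 1), ('x', 3)]), ('y', [('y', 2)])]

# itertools.groupby only merges *adjacent* items, so the same unsorted
# input yields two separate 'x' groups unless it is sorted first.
print([(k, list(g)) for k, g in itertools.groupby(records, key=lambda r: r[0])])
# [('x', [('x', 1)]), ('y', [('y', 2)]), ('x', [('x', 3)])]

# Keys must be hashable: a list-valued key raises TypeError.
try:
    groupby([[1, 2], [1, 3]], key=lambda r: r[:1])
except TypeError as exc:
    print("unhashable key:", exc)
```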
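To get sorted output, change the code for __iter__ to:

```python
def __iter__(self):
    # Yield (key, group) pairs in sorted key order
    # (in Python 2, keys() returns a list that can be sorted in place).
    keys = self.keys()
    keys.sort()
    for k in keys:
        yield k, self[k]
```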
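In Python 3, keys() returns a view with no sort() method, so the sorted-output snippet no longer works as written. A sketch of the same idea using sorted() (available since Python 2.4), as our own adaptation; the keys must be mutually comparable, or sorting raises TypeError.

```python
class groupby(dict):
    """Python 3 sketch: the recipe's class with sorted iteration."""

    def __init__(self, seq, key=lambda x: x):
        super().__init__()
        for value in seq:
            self.setdefault(key(value), []).append(value)

    def __iter__(self):
        # Sort self.keys() rather than self: sorted(self) would call this
        # same __iter__ again and recurse until RecursionError.
        for k in sorted(self.keys()):
            yield k, self[k]


print(list(groupby("abracadabra")))
# [('a', ['a', 'a', 'a', 'a', 'a']), ('b', ['b', 'b']), ('c', ['c']), ('d', ['d']), ('r', ['r', 'r'])]
```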
languages: Python
posted: Fri, 9 Jan 2004
by: Raymond Hettinger
rev: 5 (5 years, 1 month ago)
rating: 11 points (11 votes)
© 2008 ActiveState Software. All rights reserved.
Comments
Zope-friendly version. I use a Zope installation that runs Python 2.1, which doesn't support iterators; Zope also doesn't allow variable names that start with '_'. So I modified the recipe for creating web reports in Zope. Just create a new script, or a function within a script, with the parameters seq and key (just like __init__ in the recipe), and use this code inside the function or script:
Because my Zope version runs Python 2.1, it can't use iteritems, so the code has to return an actual list. This means a copy of the list of (key, value) pairs is created, which could hurt performance for a large sequence.
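The commenter's code was not preserved in this copy of the page. A minimal sketch of what such a function might look like under the constraints described: no iterators, no names starting with '_', and a real list returned. The name group_report is hypothetical. The sketch also runs on Python 3, but we have not tested it on Python 2.1 or inside Zope.

```python
def group_report(seq, key):
    """Hypothetical sketch: return a plain list of (key, values) pairs.

    Sticks to a dict, setdefault() and a for loop: no generators, no
    iteritems(), and no names starting with '_'.
    """
    groups = {}
    for value in seq:
        k = key(value)
        groups.setdefault(k, []).append(value)
    # Materialize the pairs as a real list instead of returning an iterator.
    return list(groups.items())


rows = [("becky", "girl", 5), ("john", "boy", 10), ("sue", "girl", 10)]
for k, records in group_report(rows, lambda r: r[1]):
    print(k, records)
```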
Redundant list creation. Note that in the line self.setdefault(k, []).append(value), an empty list is instantiated on each iteration, even when the key already exists.
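The point is that Python evaluates setdefault()'s default argument before the call, whether or not the key is already present. A small sketch makes this visible by counting how many default lists get built; new_list is a hypothetical helper used only for the count.

```python
calls = 0


def new_list():
    """Hypothetical helper: count how many default lists get built."""
    global calls
    calls += 1
    return []


groups = {}
for ch in "abracadabra":
    # The default argument is evaluated before setdefault runs,
    # so a fresh list is built on every iteration, even for known keys.
    groups.setdefault(ch, new_list()).append(ch)

print(len(groups), "groups;", calls, "lists built")  # 5 groups; 11 lists built
```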
For similar tasks I prefer a dict subclass that catches KeyError internally, as follows:
Then the groupby could be defined as a function:
This class also provides a simple way to count list entries:
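The idict code itself was not preserved on this page. The sketch below is a hypothetical reconstruction based only on the description (a dict subclass that catches KeyError inside); the factory argument is our assumption. Python 2.5 later added the __missing__ hook for dict subclasses, which collections.defaultdict uses for the same purpose.

```python
class idict(dict):
    """Hypothetical sketch of an 'idict': missing keys are created on lookup.

    The factory argument is an assumption; the comment's own code was not
    preserved.
    """

    def __init__(self, factory):
        super().__init__()
        self.factory = factory

    def __getitem__(self, key):
        try:
            return dict.__getitem__(self, key)
        except KeyError:
            # Build the default, store it (a __setitem__ call), and return it.
            value = self.factory()
            self[key] = value
            return value


def groupby(seq, key=lambda x: x):
    """Group values by key(value) into an idict of lists."""
    groups = idict(list)
    for value in seq:
        groups[key(value)].append(value)
    return groups


def count(seq):
    """Count entries: each new key starts at factory() == 0."""
    counts = idict(int)
    for value in seq:
        counts[value] += 1
    return counts


print(count("abracadabra"))  # {'a': 5, 'b': 2, 'r': 2, 'c': 1, 'd': 1}
```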
Can someone put idict in its own recipe? I have used the idict class several times recently, and it seems useful enough to warrant its own recipe.
Thanks, Jonathan.
idict() is penny-wise and pound-foolish. The cost of setdefault() instantiating an empty list is minuscule compared with the overhead of a __setitem__ call to idict().
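Which costs more, the throwaway list or idict's extra Python-level method calls, is easy to measure yourself. Below is a rough timeit harness with a minimal, hypothetical stand-in for idict (the original was not preserved) and, for comparison, collections.defaultdict. Timings depend on the interpreter version and machine, so no result is claimed here; DATA is an assumed sample input.

```python
import timeit
from collections import defaultdict

DATA = "abracadabra" * 1000  # assumed sample input


class idict(dict):
    """Minimal stand-in for the comment's idict (hypothetical)."""

    def __getitem__(self, key):
        try:
            return dict.__getitem__(self, key)
        except KeyError:
            value = []
            dict.__setitem__(self, key, value)
            return value


def with_setdefault(seq):
    d = {}
    for v in seq:
        d.setdefault(v, []).append(v)  # builds a throwaway [] for known keys
    return d


def with_idict(seq):
    d = idict()
    for v in seq:
        d[v].append(v)  # Python-level __getitem__ call on every lookup
    return d


def with_defaultdict(seq):
    d = defaultdict(list)
    for v in seq:
        d[v].append(v)  # missing keys handled by defaultdict's C-level __missing__
    return d


if __name__ == "__main__":
    for fn in (with_setdefault, with_idict, with_defaultdict):
        secs = timeit.timeit(lambda: fn(DATA), number=20)
        print(f"{fn.__name__:17} {secs:.4f}s")
```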
An application of this nice recipe:
groupbyhead: group a list of items by their starting character(s).
https://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/259173
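The linked groupbyhead code is not reproduced on this page. A hypothetical sketch of the idea applies the recipe's dict-of-lists approach with the first n characters as the key; the function body and the n parameter are our assumptions.

```python
def groupbyhead(items, n=1):
    """Hypothetical sketch: group strings by their first n characters.

    Returns a dict mapping each prefix to the items that start with it,
    preserving input order within each group.
    """
    groups = {}
    for item in items:
        groups.setdefault(item[:n], []).append(item)
    return groups


words = ["apple", "avocado", "banana", "blueberry", "cherry", "apricot"]
print(groupbyhead(words))
# {'a': ['apple', 'avocado', 'apricot'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}
print(groupbyhead(words, n=2))
```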
Getting rid of setdefault; using defaultdict instead.

```python
from collections import defaultdict

def group(data, key=None):
    d = defaultdict(list)  # missing keys start as an empty list
    for v in data:
        k = key(v) if key else v
        d[k].append(v)
    return d.items()
```
defaultdict is new in version 2.5. Alas, I itch for it weekly.
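For the counting use case alone (uniq -c), Python 2.7 and 3.1 added collections.Counter, which keeps only the tallies rather than the group lists. This sketch checks that it agrees with the defaultdict-based group function from this comment, rewritten here in Python 3 syntax.

```python
from collections import Counter, defaultdict


def group(data, key=None):
    """The comment's defaultdict grouping: returns (key, values) pairs."""
    d = defaultdict(list)
    for v in data:
        k = key(v) if key else v
        d[k].append(v)
    return d.items()


letters = "abracadabra"

# uniq -c from the grouped lists...
counts_from_groups = {k: len(g) for k, g in group(letters)}

# ...or directly with Counter, which stores only the tallies.
counts = Counter(letters)

print(counts.most_common(1))  # [('a', 5)]
```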