CARVIEW |
samuel / squawk
- Source
- Commits
- Network (2)
- Issues (0)
- Downloads (0)
- Wiki (2)
- Graphs
-
Branch:
master
click here to add a description
click here to add a homepage
-
Branches (1)
- master ✓
- Tags (0)
Pledgie Donations
Once activated, we'll place the following badge in your repository's detail box:
SQL Query Tool for Static Files — Read more
name | age | message | |
---|---|---|---|
![]() |
.gitignore | Sat Feb 06 19:22:57 -0800 2010 | Adde dist directory to .gitignore [samuel] |
![]() |
LICENSE | Sat Feb 06 19:56:30 -0800 2010 | Copy fail.. fixed the LICENSE [samuel] |
![]() |
MANIFEST.in | Sat Feb 06 19:52:25 -0800 2010 | Added license to setup.py and LICENSE file (BSD) [samuel] |
![]() |
README.md | Mon Feb 08 12:00:09 -0800 2010 | Updated examples in README to match new api [samuel] |
![]() |
TODO | Sat Feb 06 21:38:32 -0800 2010 | Added installable command "squawk" [samuel] |
![]() |
bin/ | Sat Feb 06 21:38:32 -0800 2010 | Added installable command "squawk" [samuel] |
![]() |
setup.py | Mon Feb 08 13:59:27 -0800 2010 | Added "pyparsing" to requirements in setup.py [samuel] |
![]() |
squawk/ | Mon Feb 08 14:18:57 -0800 2010 | AccessLogParser now sets bytes to 0 if it's not... [samuel] |
Description
Squawk is a library and command line tool for running SQL queries against structured/semi-structured static files. (e.g. Apache logs, csv files, tcpdump output).
License
BSD
See LICENSE
Goal
The purpose is Squawk is to make querying for data in log files or other structured files easier. Everything that Squawk does can be done by combining various unix tools, but Squawk makes it ever easier to express more complex relationships. It is in no way a database or meant to be used as such. It's merely a reporting tool.
Squawk can be used from the command line for ad-hoc queries, and it can also be used as a library as a part of a more in-depth reporting tool.
Status
Still in major development. API is guaranteed to change.
Requirements
- Python2.5 or later 2.x
- pyparsing
Supported SQL Features
- Aggregates: count, min, max, avg, sum
- GROUP BY
- ORDER BY (single column)
- LIMIT
- OFFSET
- WHERE
- Column aliases
- Subqueries in FROM
Departures from Standard SQL
- Table list in FROM uses a space rather than a comma as a separator. This makes it easier on the command line to specify files. (e.g. FROM access.log* )
Parsers
- Common access file formats (nginx, apache)
- CSV
Output Formats
- Basic tabular for console (like most database command line tools)
- JSON
- CSV
Examples
SQL query on the command line::
$ squawk "SELECT COUNT(1) AS n, status FROM access.log GROUP BY status ORDER BY n DESC"
n | status
----------------------------------------
381353 | 200
180668 | 302
17976 | 404
12952 | 301
10836 | 304
735 | 403
420 | 206
376 | 416
123 | 400
46 | 500
5 | 502
3 | 408
3 | 405
1 | 504
SQL based query through API::
query = Query(
"SELECT COUNT(1) AS n, remote_addr"
" FROM file"
" WHERE status = 200"
" AND remote_addr != '-'"
" GROUP BY remote_addr"
" ORDER BY n DESC"
" LIMIT 10")
source = AccessLogParser("access.log")
output_console(query(source))
# or
query = Query(
"SELECT COUNT(1) AS n, remote_addr"
" FROM file"
" WHERE status = 200"
" AND remote_addr != '-'"
" GROUP BY remote_addr"
" ORDER BY n DESC"
" LIMIT 10")
source = AccessLogParser("access.log")
for row in query(source):
print row
Code generated query::
source = AccessLogParser("access.log")
filtered = Filter(source, lambda row:row['status'] == 200)
group_by = GroupBy(filtered, group_by="remote_addr", select=[
lambda:Column('remote_addr'),
lambda:CountAggregate(None, 'count(1)')])
order_by = OrderBy(group_by, 'count(1)', True)
limit = Limit(order_by, 10)
for row in limit:
print row