Run the following if you haven't already:
gem sources -a https://gems.github.com
Install the gem(s):
sudo gem install brianmario-yajl-ruby
Description: Ruby C bindings to Yajl, an extremely efficient streaming JSON parsing library
Homepage: https://brianmario.github.com/yajl-ruby
Clone URL: git://github.com/brianmario/yajl-ruby.git
git clone git://github.com/brianmario/yajl-ruby.git
name | age | message
---|---|---
.gitignore | Thu May 07 21:40:23 -0700 2009 | adding my TODO file to gitignore [brianmario]
CHANGELOG.rdoc | |
MIT-LICENSE | Sun Apr 19 17:11:50 -0700 2009 | adding license [brianmario]
README.rdoc | Wed May 20 07:49:02 -0700 2009 | added streaming http example to readme, added s... [brianmario]
Rakefile | |
VERSION.yml | |
benchmark/ | Sun May 24 21:03:56 -0700 2009 | finish converting chunked parsing over to newer... [brianmario]
examples/ | Mon May 25 12:14:28 -0700 2009 | updated changelog for next release, removed req... [brianmario]
ext/ | |
lib/ | |
spec/ | Sun May 24 19:24:02 -0700 2009 | updated examples to have 1.9 file encoding magi... [brianmario]
yajl-ruby.gemspec | |
YAJL C Bindings for Ruby
This gem (although not in gem form just yet) is a C binding to the excellent YAJL JSON parsing and generation library.
You can read more at the project's website, lloydforge.org/projects/yajl, or check out its code at github.com/lloyd/yajl.
How to install
Install it like any other gem hosted on GitHub
(more instructions here: gems.github.com):
sudo gem install brianmario-yajl-ruby
Example of use
First, you’re probably gonna want to require it:
require 'yajl'
Parsing
Then maybe parse some JSON from:
a File IO
json = File.new('test.json', 'r')
hash = Yajl::Stream.parse(json)
or maybe a StringIO
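require 'stringio'
# the JSON text here is only a placeholder; any JSON string will do
json = StringIO.new('{"foo": "bar"}')
hash = Yajl::Stream.parse(json)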
or maybe STDIN
cat someJsonFile.json | ruby -ryajl -e "puts Yajl::Stream.parse(STDIN).inspect"
Or let's say you don't have access to the IO object that contains the JSON data, and only receive chunks of it at a time. No problem!
(Assume we’re in an EventMachine::Connection instance)
def object_parsed(obj)
  puts "Sometimes one pays most for the things one gets for nothing. - Albert Einstein"
  puts obj.inspect
end

def connection_completed
  # once a full JSON object has been parsed from the stream
  # object_parsed will be called, and passed the constructed object
  Yajl::Chunked.on_parse_complete = method(:object_parsed)
end

def receive_data(data)
  # continue passing chunks
  Yajl::Chunked.parse_some(data)
  # Or as an alias, you could have done:
  # Yajl::Chunked << data
end
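To make the chunked flow a bit more concrete, here is a minimal, self-contained sketch of the idea in plain Ruby. It is not how yajl-ruby works internally (Yajl does this inside its C parser, without re-parsing a buffer). The class name ChunkedObjectFeeder is hypothetical, it only handles top-level objects and arrays, and it leans on the standard library's JSON.parse as a stand-in once a complete value has been buffered.

```ruby
require 'json'

# Hypothetical illustration only, not part of yajl-ruby: buffers incoming
# chunks and fires a callback each time a complete top-level JSON object
# or array has arrived.
class ChunkedObjectFeeder
  attr_writer :on_parse_complete

  def initialize
    @on_parse_complete = nil
    @buffer = +''
    @depth = 0          # current nesting depth of {} and []
    @in_string = false  # true while inside a JSON string literal
    @escaped = false    # true right after a backslash inside a string
  end

  # Feed one chunk; returns self so calls can be chained.
  def <<(chunk)
    chunk.each_char do |c|
      next if @depth.zero? && c =~ /\s/ # skip whitespace between values
      @buffer << c
      if @in_string
        if @escaped
          @escaped = false
        elsif c == '\\'
          @escaped = true
        elsif c == '"'
          @in_string = false
        end
      elsif c == '"'
        @in_string = true
      elsif c == '{' || c == '['
        @depth += 1
      elsif c == '}' || c == ']'
        @depth -= 1
        emit if @depth.zero?
      end
    end
    self
  end

  private

  # Parse the buffered text and hand the result to the callback.
  def emit
    obj = JSON.parse(@buffer)
    @buffer = +''
    @on_parse_complete.call(obj) if @on_parse_complete
  end
end

# Usage: two objects arrive split across three arbitrary chunks.
parsed = []
feeder = ChunkedObjectFeeder.new
feeder.on_parse_complete = lambda { |obj| parsed << obj }
feeder << '{"foo": [1, 2' << ', 3], "bar": "a } b"}{"ba' << 'z": true}'
puts parsed.inspect
```

The callback fires as soon as the closing brace of each object is seen, which mirrors the on_parse_complete behaviour described above: the caller never needs the whole stream in hand.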
Or how about a JSON API HTTP request? This actually makes the request using a raw TCPSocket, then parses the JSON body right off the socket while it's still being received over the wire!
require 'uri'
require 'yajl/http_stream'
url = URI.parse("https://search.twitter.com/search.json?q=engineyard")
results = Yajl::HttpStream.get(url)
Or do the same request with Gzip and Deflate output compression support (Bzip2 is also supported, if loaded). This makes the same raw socket request, but transparently parses the compressed response body:
require 'uri'
require 'yajl/gzip'
require 'yajl/deflate'
require 'yajl/http_stream'
url = URI.parse("https://search.twitter.com/search.json?q=engineyard")
results = Yajl::HttpStream.get(url)
Since yajl-ruby parses JSON as a stream, supporting APIs like Twitter's Streaming API is a piece of cake. You can simply supply a block to Yajl::HttpStream.get, which is used as the callback whenever a JSON object has been unserialized off the stream. In the case of this Twitter Streaming API call, the callback fires a few times a second (depending on your connection speed). The code below is all that's needed to make the request and continuously stream unserialized Ruby hashes off the response.
require 'uri'
require 'yajl/http_stream'
uri = URI.parse("https://#{username}:#{password}@stream.twitter.com/spritzer.json")
Yajl::HttpStream.get(uri) do |hash|
  puts hash.inspect
end
Or how about parsing directly from a compressed file?
require 'yajl/bzip2'
file = File.new('some.json.bz2', 'r')
result = Yajl::Bzip2::StreamReader.parse(file)
Encoding
Since yajl-ruby does everything using streams, you simply need to pass the object to encode, and the IO to write the stream to (this happens in chunks).
This allows you to encode JSON as a stream, writing directly to a socket:
require 'socket'
# the host must be passed as a string
socket = TCPSocket.new('192.168.1.101', 9000)
hash = {:foo => 12425125, :bar => "some string", ... }
Yajl::Stream.encode(hash, socket)
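As a rough picture of what "encoding as a stream" means, here is a small pure-Ruby sketch that writes a structure to any IO piece by piece instead of building the whole JSON string first. The helper name stream_encode is hypothetical and this is not yajl-ruby's encoder (which is implemented in C); scalars are delegated to the standard library's to_json, and a StringIO stands in for the socket so the example runs on its own.

```ruby
require 'json'
require 'stringio'

# Hypothetical helper, not part of yajl-ruby: writes obj to io as JSON one
# small piece at a time rather than building the full document in memory.
def stream_encode(obj, io)
  case obj
  when Hash
    io.write('{')
    obj.each_with_index do |(key, value), index|
      io.write(',') if index > 0
      io.write(key.to_s.to_json) # JSON keys are always strings
      io.write(':')
      stream_encode(value, io)
    end
    io.write('}')
  when Array
    io.write('[')
    obj.each_with_index do |value, index|
      io.write(',') if index > 0
      stream_encode(value, io)
    end
    io.write(']')
  else
    # Scalars (strings, numbers, true/false/nil) go through the stdlib.
    io.write(obj.to_json)
  end
  io
end

# Usage: a StringIO stands in for the TCPSocket.
io = StringIO.new
stream_encode({:foo => 12425125, :bar => "some string"}, io)
puts io.string
```

Because every piece goes straight to the IO, the receiving end can start reading before the sender has finished walking the object.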
Or what if you wanted to compress the stream over the wire?
require 'socket'
require 'yajl/gzip'
# the host must be passed as a string
socket = TCPSocket.new('192.168.1.101', 9000)
hash = {:foo => 12425125, :bar => "some string", ... }
Yajl::Gzip::StreamWriter.encode(hash, socket)
You can also use Yajl::Bzip2::StreamWriter and Yajl::Deflate::StreamWriter, so you can pick whichever fits your CPU/bandwidth sweet spot.
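The compressed-writer idea can be sketched with nothing but Ruby's standard library: wrap the destination IO in a compressing writer and send the JSON through it. This is only an illustration under stated assumptions. It uses Zlib::GzipWriter and JSON.generate rather than Yajl::Gzip::StreamWriter, and a StringIO stands in for the socket so it can run without a server.

```ruby
require 'json'
require 'stringio'
require 'zlib'

hash = {:foo => 12425125, :bar => "some string"}

# StringIO stands in for the socket (an assumption for this example).
wire = StringIO.new
wire.binmode

# Anything written to the GzipWriter is compressed and passed on to wire.
gzip = Zlib::GzipWriter.new(wire)
gzip.write(JSON.generate(hash))
gzip.finish # writes the gzip trailer but leaves the underlying IO open

# The receiving side: decompress the bytes and parse them.
wire.rewind
decoded = JSON.parse(Zlib::GzipReader.new(wire).read)
puts decoded.inspect
```

Swapping Zlib::GzipWriter for another compressing wrapper changes the CPU/bandwidth trade-off without touching the code that produces the JSON.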
There are a lot more possibilities, some of which I’m going to write other gems/plugins for.
Some ideas are:
- parsing logs in JSON format
- a Rails plugin (github.com/technoweenie/yajl-rails)
- builtin Rails 3 support?
- Rack middleware (ideally the JSON body could be handed to the parser while it’s still being received)
- use with ohai
- JSON API clients
- Patch Marshal#load and Marshal#dump to use JSON? ;)
- etc…
Benchmarks
Once the implementation was finished, this library performed about the same as the current JSON.parse (C gem) on small and medium files.
On larger files and higher iteration counts, it was around 2x faster than JSON.parse.
The main benefit of this library is its memory usage. Since it's able to parse the stream in chunks, its memory requirements are very, very low.
Here’s what parsing a 2.43MB JSON file off the filesystem 20 times looks like:
Memory Usage
Average
- Yajl::Stream.parse: 32MB
- JSON.parse: 54MB
- ActiveSupport::JSON.decode: 63MB
Peak
- Yajl::Stream.parse: 32MB
- JSON.parse: 57MB
- ActiveSupport::JSON.decode: 67MB
Parse Time
- Yajl::Stream.parse: 4.54s
- JSON.parse: 5.47s
- ActiveSupport::JSON.decode: 64.42s
Encode Time
- Yajl::Stream.encode: 3.59s
- JSON#to_json: 6.2s
- ActiveSupport::JSON.encode: 45.58s
Compared to YAML
NOTE: I converted the 2.4MB JSON file to YAML for this test.
Parse Time (from their respective formats)
- Yajl::Stream.parse: 4.33s
- JSON.parse: 5.37s
- YAML.load_stream: 19.47s
Encode Time (to their respective formats)
- Yajl::Stream.encode: 3.47s
- JSON#to_json: 6.6s
- YAML.dump(obj, io): 1309.93s
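For anyone wanting to run a comparison of this kind on their own data, a timing harness might look roughly like the sketch below. It is not the benchmark script that produced the numbers above. The generated sample document, the temp file, and the use of the standard library's JSON.parse are all assumptions made so the example runs without yajl-ruby or the original 2.43MB file; swap in your own parser call and file to compare.

```ruby
require 'benchmark'
require 'json'
require 'tempfile'

# Assumption: a small generated document stands in for the real test file.
sample = { 'items' => (1..1000).map { |i| { 'id' => i, 'name' => "item #{i}" } } }

file = Tempfile.new(['sample', '.json'])
file.write(JSON.generate(sample))
file.flush

# Parse the file off the filesystem 20 times and measure wall-clock time.
times = 20
elapsed = Benchmark.realtime do
  times.times { JSON.parse(File.read(file.path)) }
end
puts format('JSON.parse x %d: %.4fs', times, elapsed)

file.close
file.unlink
```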
Third Party Sources Bundled
This project includes code from the BSD licensed yajl project, copyright 2007-2009 Lloyd Hilaiel
Special Thanks
I’ve had a lot of inspiration, and a lot of help. Thanks to everyone who’s been a part of this and those to come!
- Lloyd Hilaiel - github.com/lloyd - for writing Yajl!!
- Josh Ferguson - github.com/besquared - for peer-pressuring me into getting back into C; it worked ;) Also tons of support over IM
- Jonathan Novak - github.com/cypriss - pointer-hacking help
- Tom Smith - github.com/rtomsmith - pointer-hacking help
- Rick - github.com/technoweenie - for making an ActiveSupport patch with support for this library and teasing me that it might go into Rails 3. You sure lit a fire under my ass and I got a ton of work done because of it! :)
- The entire Github Crew - github.com/ - my inspiration, the time spent writing this, finding Yajl, and so many-MANY other things wouldn't have been possible without this awesome service. I owe you guys some whiskey at Kilowatt.
- benburkert - github.com/benburkert