A drop-in replacement for Ruby's CSV stdlib that uses the zsv C library for 5-6x performance improvements via SIMD optimizations.
🤖 Built with Claude Code
- Quick Start Guide - Get started in 5 minutes
- API Reference - Complete API documentation
- Verification Report - Test results and metrics
- Blazing Fast: roughly 4-6x faster than Ruby's CSV stdlib in the benchmarks below, thanks to SIMD optimizations
- Memory Efficient: Streaming parser that doesn't load entire files into memory
- API Compatible: Familiar interface matching Ruby's CSV class
- Native Extension: Direct C integration for minimal overhead
- Ruby 3.3+: Modern Ruby support with proper encoding handling
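Because the interface mirrors Ruby's CSV class, one way to adopt the gem gradually is to choose the backend at load time. The following is a minimal sketch, not part of the gem: CSV_BACKEND is a hypothetical name, and it assumes ZSV.parse returns the same array-of-arrays shape as CSV.parse for plain input, as the usage examples in this README suggest.

```ruby
require "csv"

# Prefer zsv when the gem is installed; otherwise fall back to stdlib CSV.
# CSV_BACKEND is a hypothetical name used only for this sketch.
CSV_BACKEND =
  begin
    require "zsv"
    ZSV
  rescue LoadError
    CSV
  end

rows = CSV_BACKEND.parse("a,b,c\n1,2,3\n")
puts "Parsed #{rows.length} rows with #{CSV_BACKEND}"
rows.each { |row| puts row.join(" | ") }
```

One difference to keep in mind when switching: with headers: true, stdlib CSV yields CSV::Row objects, while this README describes ZSV rows as hash access. Code that only uses row["name"] should work with either.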
Add to your Gemfile:
gem 'zsv'
Or install directly:
gem install zsv
The gem will automatically download and compile zsv 1.3.0 during installation.
require 'zsv'
# Parse entire file
rows = ZSV.read("data.csv")
# => [["a", "b", "c"], ["1", "2", "3"]]
# Stream rows (memory efficient)
ZSV.foreach("large_file.csv") do |row|
  puts row.inspect
end
# Parse string
rows = ZSV.parse("a,b,c\n1,2,3\n")
# Use first row as headers
ZSV.foreach("data.csv", headers: true) do |row|
  puts row["name"] # Hash access
end
# Provide custom headers
ZSV.foreach("data.csv", headers: ["id", "name", "email"]) do |row|
  puts row["name"]
end
# Create parser
parser = ZSV.open("data.csv", headers: true)
# Read rows one at a time
row = parser.shift
row = parser.shift
# Iterate all rows
parser.each do |row|
  puts row
end
# Rewind to beginning
parser.rewind
# Clean up
parser.close
# Or use block form (auto-closes)
ZSV.open("data.csv") do |parser|
  parser.each { |row| puts row }
end
The parser includes Enumerable, so you can use map, select, find, etc.:
# Transform rows
names = ZSV.open("users.csv", headers: true) do |parser|
  parser.map { |row| row["name"].upcase }
end
# Filter rows
adults = ZSV.open("users.csv", headers: true) do |parser|
  parser.select { |row| row["age"].to_i >= 18 }
end
# Find first match
admin = ZSV.open("users.csv", headers: true) do |parser|
  parser.find { |row| row["role"] == "admin" }
end
All parsing methods accept these options:
| Option | Type | Default | Description |
|---|---|---|---|
| headers | Boolean/Array | false | Use first row as headers or provide custom headers |
| col_sep | String | "," | Column delimiter (single character) |
| quote_char | String | "\"" | Quote character (single character) |
| skip_lines | Integer | 0 | Number of lines to skip at start |
| encoding | Encoding | UTF-8 | Source encoding |
| liberal_parsing | Boolean | false | Handle malformed CSV gracefully |
| buffer_size | Integer | 262144 | Buffer size in bytes (256KB default) |
# Tab-separated values
ZSV.foreach("data.tsv", col_sep: "\t") { |row| puts row }
# Pipe-separated values
ZSV.parse("a|b|c\n1|2|3", col_sep: "|")
# Skip header comment lines
ZSV.foreach("data.csv", skip_lines: 2) { |row| puts row }
Benchmarks comparing ZSV vs Ruby CSV stdlib (Ruby 3.4.7):
=== Small file (1K rows, 5 cols) ===
CSV (stdlib): 163.4 i/s
ZSV: 1,013.7 i/s - 6.20x faster
=== Medium file (10K rows, 10 cols) ===
CSV (stdlib): 10.3 i/s
ZSV: 54.5 i/s - 5.27x faster
=== Large file (100K rows, 10 cols) ===
CSV (stdlib): 1.1 i/s
ZSV: 5.3 i/s - 5.00x faster
=== With headers (10K rows) ===
CSV (stdlib): 7.8 i/s
ZSV: 33.8 i/s - 4.33x faster
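For a quick comparison on your own machine without the full benchmark suite, the sketch below measures iterations per second with a monotonic clock. It is an illustration only, not the project's benchmark script: the iterations_per_second helper and the generated data are hypothetical, and ZSV is measured only when the gem can be loaded.

```ruby
require "csv"

begin
  require "zsv"
rescue LoadError
  # zsv is not installed; only the stdlib baseline is measured.
end

# Hypothetical helper: calls the block repeatedly for at least min_seconds
# and returns the observed iterations per second.
def iterations_per_second(min_seconds: 0.2)
  count = 0
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  loop do
    yield
    count += 1
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
    return count / elapsed if elapsed >= min_seconds
  end
end

# A header line plus 1,000 data rows with 5 columns each.
data = "id,name,score,city,flag\n" +
       Array.new(1_000) { |i| "#{i},user#{i},#{i * 2},town#{i % 10},#{i.even?}\n" }.join

results = { "CSV (stdlib)" => iterations_per_second { CSV.parse(data) } }
results["ZSV"] = iterations_per_second { ZSV.parse(data) } if defined?(ZSV)

baseline = results["CSV (stdlib)"]
results.each do |name, ips|
  puts format("%-14s %10.1f i/s  %.2fx", name, ips, ips / baseline)
end
```

Short runs like this are noisy, so treat the numbers as a rough check rather than a replacement for bundle exec rake bench.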
ZSV uses significantly less memory than Ruby's CSV stdlib:
=== Memory Usage (100K rows) ===
CSV stdlib: 56.8 MB
ZSV: 9.9 MB - 82.6% less memory
=== String Allocations (10K rows) ===
CSV stdlib: 116,144 strings
ZSV: 50,005 strings - 56.9% fewer allocations
ZSV uses roughly 5.7x less memory in this benchmark (56.8 MB vs 9.9 MB), which the project attributes to frozen strings and efficient C-level memory management.
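The percentages and the ratio can be re-derived from the reported figures. The short calculation below only reuses the numbers printed in the memory benchmark output; the percent_less helper is a hypothetical name introduced for this sketch.

```ruby
# Figures copied from the memory benchmark output in this README.
csv_mb = 56.8
zsv_mb = 9.9
csv_strings = 116_144
zsv_strings = 50_005

# Percentage reduction relative to the stdlib baseline, to one decimal place.
def percent_less(baseline, value)
  ((baseline - value) / baseline.to_f * 100).round(1)
end

memory_ratio = (csv_mb / zsv_mb).round(2)

puts "Memory:  #{percent_less(csv_mb, zsv_mb)}% less (#{memory_ratio}x lower)"
puts "Strings: #{percent_less(csv_strings, zsv_strings)}% fewer allocations"
```

The results agree with the reported 82.6% and 56.9%, and the memory ratio works out to about 5.7x.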
Run benchmarks yourself:
bundle exec rake bench
bundle exec ruby benchmark/memory_bench.rb
Stream rows from a CSV file. Returns an Enumerator if no block is given.
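Returning an Enumerator means streaming can be combined with lazy evaluation, so parsing can stop as soon as enough rows are found. The sketch below is a hedged illustration: first_passing is a hypothetical helper, the sample data is made up, and it falls back to stdlib CSV (whose foreach also returns an Enumerator without a block) when zsv is not installed.

```ruby
require "csv"
require "tempfile"

# Use zsv when installed; fall back to stdlib CSV so the sketch runs anywhere.
backend =
  begin
    require "zsv"
    ZSV
  rescue LoadError
    CSV
  end

# Hypothetical helper: returns the first `limit` rows whose second column
# is at least `threshold`, reading no further than necessary.
def first_passing(backend, path, limit:, threshold: 75)
  backend.foreach(path)   # no block given, so this is an Enumerator
         .lazy
         .select { |row| row[1].to_i >= threshold }
         .first(limit)
end

file = Tempfile.create(["scores", ".csv"])
file.write("alice,91\nbob,47\ncarol,78\ndave,85\n")
file.close

passing = first_passing(backend, file.path, limit: 2)
passing.each { |name, score| puts "#{name}: #{score}" }

File.delete(file.path)
```

Without lazy, select would read the whole file before first(2) is applied; with it, iteration ends once two matching rows have been produced.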
Parse CSV string and return all rows as an array.
Read entire CSV file into an array.
Open a CSV file and return a Parser instance. If a block is given, the parser is automatically closed after the block completes.
Create a Parser from any IO-like object.
Read and return the next row. Returns nil at EOF.
Iterate over all rows. Returns an Enumerator when called without a block.
Reset parser to the beginning (file-based parsers only).
Close parser and release resources.
Return headers if header mode is enabled.
Check if parser is closed.
Read all remaining rows into an array.
- ZSV::Error - Base exception class
- ZSV::MalformedCSVError - Raised on CSV parsing errors
- ZSV::InvalidEncodingError - Raised on encoding issues
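A common way to use these exception classes is to rescue the specific errors first and the base class last. The sketch below is illustrative only: parse_safely is a hypothetical helper, and the stand-in classes defined when the gem is missing assume that both specific errors inherit from ZSV::Error, which the "base exception class" wording suggests but does not state outright. The raise at the end simulates a failure rather than triggering a real parse error.

```ruby
begin
  require "zsv"
rescue LoadError
  # Stand-in hierarchy so the sketch runs without the gem installed.
  # Assumption: both specific errors inherit from ZSV::Error.
  module ZSV
    class Error < StandardError; end
    class MalformedCSVError < Error; end
    class InvalidEncodingError < Error; end
  end
end

# Hypothetical helper: runs the block and maps parser errors
# to a [status, payload] pair instead of letting them propagate.
def parse_safely
  [:ok, yield]
rescue ZSV::MalformedCSVError => e
  [:malformed, e.message]
rescue ZSV::InvalidEncodingError => e
  [:bad_encoding, e.message]
rescue ZSV::Error => e
  [:zsv_error, e.message]
end

# Simulated failure for demonstration purposes.
status, detail = parse_safely { raise ZSV::MalformedCSVError, "unterminated quote" }
puts "#{status}: #{detail}"
```

In real code the block would wrap a call such as ZSV.read("data.csv"), and the caller would decide whether a malformed file should be skipped, logged, or re-raised.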
The gem follows SOLID principles with clear separation of concerns:
ext/zsv/
├── zsv_ext.c # Main extension entry point, Ruby API
├── parser.c/h # Parser state management and zsv wrapper
├── row.c/h # Row building and conversion (arrays/hashes)
├── options.c/h # Option parsing and validation
└── common.h # Shared types and macros
- Single Responsibility: Each C module handles one concern
- Streaming First: Never load entire files into memory
- Zero-Copy Where Possible: Minimize data copying
- Proper Resource Management: RAII-style cleanup with Ruby GC
# Clone and setup
git clone https://github.com/sebyx07/zsv-ruby.git
cd zsv-ruby
bundle install
# Compile extension
bundle exec rake compile
# Run tests
bundle exec rake spec
# Run benchmarks
bundle exec rake bench
# Clean build artifacts
bundle exec rake clean
bundle exec rspec
The test suite includes:
- Basic parsing tests
- Header mode tests
- Custom delimiter tests
- Error handling tests
- Memory leak detection
- API compatibility tests
- Ruby: 3.3+ required
- Platforms: Linux, macOS (ARM and x86)
- ZSV: Compiles against zsv 1.3.0
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Write tests for your changes
- Ensure tests pass (bundle exec rake spec)
- Commit your changes (git commit -am 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
MIT License - see LICENSE file for details.
- Built on zsv by liquidaty
- Inspired by Ruby's CSV stdlib
- SIMD optimizations courtesy of zsv's excellent engineering
- Developed with Claude Code
- Basic parsing (foreach, parse, read)
- Header mode
- Custom delimiters
- File and string input
- Type converters (:numeric, :date, :date_time)
- Header converters (:downcase, :symbol)
- unconverted_fields option
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Upstream zsv: zsv repository