csv-enumerator: A flexible, fast, enumerator-based CSV parser library for Haskell.
CSV files are the de facto standard in many situations involving data transfer, particularly when dealing with enterprise applications or disparate database systems.
While there are a number of CSV libraries in Haskell, at the time of this project's start in 2010, there wasn't one that provided all of the following:
- Full flexibility in quote characters, separators, input/output
- Constant space operation
- Robust parsing, correctness and error resiliency
- Convenient interface that supports a variety of use cases
- Fast operation
This library is an attempt to close these gaps.
For more documentation and examples, check out the README at:
https://github.com/ozataman/csv-enumerator
The API is fairly well documented and I would encourage you to keep your haddocks handy. If you run into problems, just email me or holler over at #haskell.
Downloads
- csv-enumerator-0.10.2.0.tar.gz [browse] (Cabal source package)
- Package description (as included in the package)
| Versions [RSS] | 0.8, 0.8.2, 0.9.0, 0.9.2, 0.9.2.1, 0.9.3, 0.9.5, 0.10.1.0, 0.10.1.1, 0.10.2.0 |
|---|---|
| Dependencies | attoparsec (>=0.10), attoparsec-enumerator (>=0.2), base (>=4 && <5), bytestring, containers (>=0.3), directory, enumerator (>=0.4.5), safe, transformers (>=0.2), unix-compat (>=0.2.1.1) [details] |
| License | BSD-3-Clause |
| Author | Ozgun Ataman |
| Maintainer | Ozgun Ataman <ozataman@gmail.com> |
| Uploaded | by OzgunAtaman at 2013-01-21T13:26:06Z |
| Category | Data |
| Home page | https://github.com/ozataman/csv-enumerator |
| Distributions | |
| Reverse Dependencies | 2 direct, 0 indirect [details] |
| Downloads | 8034 total (32 in the last 30 days) |
| Rating | (no votes yet) [estimated by Bayesian average] |
| Status | Docs uploaded by user; build status unknown (no reports yet) |
Readme for csv-enumerator-0.10.2.0
CSV Files and Haskell
CSV files are the de facto standard in many cases of data transfer, particularly when dealing with enterprise applications or disparate database systems.
While there are a number of CSV libraries in Haskell, at the time of this project's start in 2010, there wasn't one that provided all of the following:
- Full flexibility in quote characters, separators, input/output
- Constant space operation
- Robust parsing and error resiliency
- Fast operation
- Convenient interface that supports a variety of use cases
This library is an attempt to close these gaps.
This package
csv-enumerator is an enumerator-based CSV parsing library that is easy to use, flexible and fast. Furthermore, it provides ways to operate in constant space, which is absolutely critical in many real-world use cases.
Introduction
- ByteStrings are used for everything
- There are 2 basic row types and they implement exactly the same operations, so you can choose the right one for the job at hand:
- type MapRow = Map ByteString ByteString
- type Row = [ByteString]
- Folding over a CSV file can be thought of as the most basic operation.
- Higher level convenience functions are provided to "map" over CSV files, modifying and transforming them along the way.
- Helpers are provided for reading and writing whole CSV files in simple use cases.
- For extreme / advanced use cases, the user can drop down to the Enumerator/Iteratee level and do interleaved IO among other things.
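The two row shapes and the idea of folding over rows can be sketched without the library itself. The block below is a minimal illustration that uses only bytestring and containers. The type aliases are local stand-ins that mirror the definitions listed above, and toMapRow, fromMapRow and countFilled are hypothetical helpers written for this sketch, not part of the csv-enumerator API.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import           Data.ByteString.Char8 (ByteString)
import qualified Data.ByteString.Char8 as B
import           Data.List (foldl')
import qualified Data.Map as M

-- Local stand-ins mirroring the two row types described above.
type Row    = [ByteString]
type MapRow = M.Map ByteString ByteString

-- Hypothetical helper: pair a header with a positional row.
toMapRow :: Row -> Row -> MapRow
toMapRow header fields = M.fromList (zip header fields)

-- Hypothetical helper: read fields back out in header order.
-- Columns missing from the map become empty fields.
fromMapRow :: Row -> MapRow -> Row
fromMapRow header row = map (\k -> M.findWithDefault B.empty k row) header

-- Folding as the basic operation: count the rows that have a
-- non-empty value in the given column.
countFilled :: ByteString -> [MapRow] -> Int
countFilled col = foldl' step 0
  where
    step acc row
      | maybe False (not . B.null) (M.lookup col row) = acc + 1
      | otherwise                                     = acc
```

A MapRow is convenient when columns are addressed by name, while a plain Row avoids the map overhead when position is enough. Note that this sketch folds over an in-memory list, so it does not demonstrate the constant-space behaviour the library aims for.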
API Docs
The API is quite well documented and I would encourage you to keep it handy.
Speed
While speed is a goal, I have so far cared more about correct operation and a flexible API. Please let me know if you notice any performance regressions or optimization opportunities.
Usage Examples
Example 1: Basic Operation
{-# LANGUAGE OverloadedStrings #-}
import Data.CSV.Enumerator
import Data.Char (isSpace)
import qualified Data.ByteString.Char8 as B
import qualified Data.Map as M
import Data.Map ((!))
-- Naive whitespace stripper: trims leading and trailing whitespace.
strip :: B.ByteString -> B.ByteString
strip = B.reverse . B.dropWhile isSpace . B.reverse . B.dropWhile isSpace
-- A function that takes a row and "emits" zero or more rows as output.
-- Note: (!) throws an error if "Column1" is absent from the row.
processRow :: MapRow -> [MapRow]
processRow row = [M.insert "Column1" fixedCol row]
  where fixedCol = strip (row ! "Column1")
main = mapCSVFile "InputFile.csv" defCSVSettings processRow "OutputFile.csv"
and we are done.
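Because the row function returns a list, it can also drop a row (by returning an empty list) or expand one row into several. The following is a minimal sketch of both cases as pure functions. The MapRow alias is a local stand-in for the library's type, and dropBlank, splitTags and the "Tags" column are hypothetical names chosen for illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import           Data.ByteString.Char8 (ByteString)
import qualified Data.ByteString.Char8 as B
import qualified Data.Map as M

-- Local stand-in mirroring the library's MapRow type.
type MapRow = M.Map ByteString ByteString

-- Emit zero rows when "Column1" is missing or empty,
-- otherwise pass the row through unchanged.
dropBlank :: MapRow -> [MapRow]
dropBlank row = case M.lookup "Column1" row of
  Just v | not (B.null v) -> [row]
  _                       -> []

-- Emit one row per comma-separated entry in the (hypothetical)
-- "Tags" column. A row with no tags produces no output rows.
splitTags :: MapRow -> [MapRow]
splitTags row =
  [ M.insert "Tags" t row
  | t <- B.split ',' (M.findWithDefault "" "Tags" row)
  , not (B.null t)
  ]
```

Functions of this shape compose with concatMap, for example `concatMap splitTags . dropBlank`, and the result has the same MapRow -> [MapRow] type that processRow has in Example 1.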
Further examples to be provided at a later time.
TODO - Next Steps
- Refactor all operations to use iterCSV as the basic building block -- in progress.
- The CSVeable typeclass can be refactored to have a more minimal definition.
- Get mapCSVFiles out of the typeclass if possible.
- Need to think about specializing an Exception type for the library and properly notifying the user when parsing-related problems occur.
- Some operations can be further broken down to their atoms, increasing the flexibility of the library.
- Operating on Text in addition to ByteString would be phenomenal.
- A test-suite needs to be added.
- Some benchmarking would be nice.
Any and all kinds of help are much appreciated!