Description: | Enables full-text searching of CouchDB documents using Lucene |
Homepage: | https://rnewson.github.com/couchdb-lucene/ |
Clone URL: | git://github.com/rnewson/couchdb-lucene.git |
name | age | message |
---|---|---|
LICENSE | Mon Feb 09 11:57:38 -0800 2009 | add license (apache 2). [rnewson] |
README.md | Wed Feb 18 14:30:57 -0800 2009 | updated README.md [rnewson] |
TODO | Wed Feb 18 14:41:07 -0800 2009 | update TODO [rnewson] |
pom.xml | Wed Feb 18 14:06:20 -0800 2009 | use Apache Tika to extract content of Word/PDF/... [rnewson] |
src/ | Thu Feb 19 05:22:38 -0800 2009 | handle document ID's with spaces in them (ensur... [rnewson] |
Build couchdb-lucene
- Install Maven 2.
- Check out the repository.
- Type 'mvn'.
- Configure CouchDB (see below).
Configure CouchDB
[external]
fti=/usr/bin/java -jar /path/to/couchdb-lucene*-jar-with-dependencies.jar

[httpd_db_handlers]
_fti = {couch_httpd_external, handle_external_req, <<"fti">>}
Indexing Strategy
Document Indexing
Currently all fields of all documents are indexed. JavaScript control over indexing is coming soon.
Attachment Indexing
couchdb-lucene uses Apache Tika to index attachments of the following types, assuming the correct content_type is set in CouchDB:
Supported Formats
- Excel spreadsheets (application/vnd.ms-excel)
- Word documents (application/msword)
- PowerPoint presentations (application/vnd.ms-powerpoint)
- Visio (application/vnd.visio)
- Outlook (application/vnd.ms-outlook)
- XML (application/xml)
- HTML (text/html)
- Images (image/*)
- Java class files
- Java jar archives
- MP3 (audio/mp3)
- OpenDocument (application/vnd.oasis.opendocument.*)
- Plain text (text/plain)
- PDF (application/pdf)
- RTF (application/rtf)
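Because indexing depends on the attachment's content_type, it can help to check a type against the list above before uploading. The following is a minimal client-side sketch, not part of couchdb-lucene: the function name `is_listed_content_type` and the pattern list are hypothetical, the patterns are copied from the list above, and Java class files and jar archives are omitted because the list gives no MIME type for them. How couchdb-lucene itself matches types (for example, whether it ignores parameters such as a charset) is not stated here, so treat the parameter stripping as an assumption.

```python
from fnmatch import fnmatchcase

# Patterns copied from the "Supported Formats" list; '*' is a wildcard.
# Java class files and jar archives are left out because the list
# does not give a MIME type for them.
SUPPORTED_PATTERNS = [
    "application/vnd.ms-excel",
    "application/msword",
    "application/vnd.ms-powerpoint",
    "application/vnd.visio",
    "application/vnd.ms-outlook",
    "application/xml",
    "text/html",
    "image/*",
    "audio/mp3",
    "application/vnd.oasis.opendocument.*",
    "text/plain",
    "application/pdf",
    "application/rtf",
]


def is_listed_content_type(content_type):
    """Return True if content_type matches one of the listed patterns.

    Assumption: parameters such as '; charset=utf-8' are ignored and the
    comparison is case-insensitive. This is a client-side pre-check only.
    """
    base = content_type.split(";", 1)[0].strip().lower()
    return any(fnmatchcase(base, pattern) for pattern in SUPPORTED_PATTERNS)


for candidate in ("application/pdf", "image/png", "application/zip"):
    print(candidate, is_listed_content_type(candidate))
```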
Searching with couchdb-lucene
You can perform all types of queries using Lucene's default query syntax. The following parameters can be passed for more sophisticated searches;
All parameters except 'q' are optional.
Special Fields
Examples
https://localhost:5984/dbname/_fti?q=field_name:value
https://localhost:5984/dbname/_fti?q=field_name:value&sort=other_field
https://localhost:5984/dbname/_fti?debug=true&sort=billing_size&q=body:document AND customer:[A TO C]
https://localhost:5984/dbname/_fti?debug=true&sort=billing_size&q=body:document AND customer:[100 TO 400]
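The example queries contain spaces, colons and brackets, which most HTTP clients need percent-encoded. The sketch below builds such a URL with Python's standard library. The helper name `build_fti_url` is hypothetical, and only the `q`, `sort` and `debug` parameters that appear in the examples are covered. It assumes the server decodes `+` as a space in the query string, which is the usual convention.

```python
from urllib.parse import urlencode


def build_fti_url(base_url, db_name, q, sort=None, debug=None):
    """Build a couchdb-lucene search URL with percent-encoded parameters."""
    params = {"q": q}  # 'q' is the only required parameter
    if sort is not None:
        params["sort"] = sort
    if debug is not None:
        params["debug"] = "true" if debug else "false"
    # urlencode escapes ':' '[' ']' and encodes spaces as '+'
    return f"{base_url}/{db_name}/_fti?{urlencode(params)}"


url = build_fti_url(
    "https://localhost:5984",
    "dbname",
    "body:document AND customer:[A TO C]",
    sort="billing_size",
    debug=True,
)
print(url)
```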
Search Results Format
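The response is a JSON object with a total_rows count and a rows array. Each row contains the document's _id, _rev and score. When sort= is supplied, the response also includes a top-level sort_order description and each row carries its own sort_order values.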
{
  "total_rows":49999,
  "rows": [
    {"_id":"9","_rev":"2779848574","score":1.712123155593872},
    {"_id":"8","_rev":"670155834","score":1.712123155593872}
  ]
}

{
  "total_rows":49999,
  "sort_order": [
    {"field":"customer","reverse":false,"type":"string"},
    {"reverse":false,"type":"doc"}
  ],
  "rows": [
    {"_id":"75000","_rev":"372496647","score":1.712123155593872,"sort_order":["00000000000000",50802]},
    {"_id":"170036","_rev":"3628205594","score":1.712123155593872,"sort_order":["00000000000000",51716]}
  ]
}
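A client might consume a response like this roughly as follows. This is a sketch only: the function name `summarize_results` is hypothetical, and the sample body is the unsorted response shown above, pasted in as a string rather than fetched from a server.

```python
import json

# Sample body copied from the unsorted response shown above.
response_body = """
{ "total_rows": 49999,
  "rows": [
    {"_id": "9", "_rev": "2779848574", "score": 1.712123155593872},
    {"_id": "8", "_rev": "670155834", "score": 1.712123155593872}
  ] }
"""


def summarize_results(body):
    """Return (total_rows, [(doc_id, score, sort_values), ...]).

    sort_values is None when the response has no per-row sort_order,
    i.e. when no sort= parameter was supplied.
    """
    result = json.loads(body)
    rows = [
        (row["_id"], row["score"], row.get("sort_order"))
        for row in result["rows"]
    ]
    return result["total_rows"], rows


total, rows = summarize_results(response_body)
print(total)
for doc_id, score, sort_values in rows:
    print(doc_id, score, sort_values)
```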
Working With The Source
To develop "live", run "mvn dependency:unpack-dependencies" and change the external line to something like this:
fti=/usr/bin/java -cp /path/to/couchdb-lucene/target/classes:\
/path/to/couchdb-lucene/target/dependency org.apache.couchdb.lucene.Main
You will need to restart CouchDB whenever you change the couchdb-lucene source code, but the restart is very fast.
Configuration
couchdb-lucene respects several system properties;
You can override these properties like this;
fti=/usr/bin/java -Dcouchdb.lucene.dir=/tmp \
-cp /home/rnewson/Source/couchdb-lucene/target/classes:\
/home/rnewson/Source/couchdb-lucene/target/dependency \
org.apache.couchdb.lucene.Main