# Couch Crawler
A search engine built on top of couchdb-lucene.
## Dependencies

- CouchDB
- Python
## Installation

Assuming couchdb-lucene is installed at the `_fti` endpoint, you can push
Couch Crawler to your CouchDB instance with:

```
cd couchapp
couchapp push
```

This will create a new CouchDB database called `crawler` on the CouchDB
instance at `localhost:5984`. To change the database, edit
`couchapp/.couchapprc` and run `couchapp push` again.
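The `.couchapprc` file is a small JSON document read by `couchapp`; a minimal sketch pointing the push at a different server or database might look like the following (the host, port, and database name here are illustrative, not taken from the repository):

```json
{
  "env": {
    "default": {
      "db": "http://localhost:5984/crawler"
    }
  }
}
```

Changing the `db` URL and re-running `couchapp push` deploys the same app to the new target.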
To start indexing pages, run the crawler script:

```
cd python
python crawler.py https://url_to_crawl1 https://url_to_crawl2 ...
```

The crawler will index the given URLs, then follow and index any links it
finds. You can cap how deep it follows links with the `-d` option.
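The crawler script itself isn't reproduced here, but the depth-limited crawl it describes can be sketched as a breadth-first traversal. Everything below is illustrative: `fetch` stands in for an HTTP GET, and in the real crawler each fetched page would be saved as a CouchDB document for couchdb-lucene to index.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collects href targets from anchor tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed_urls, fetch, max_depth=2):
    """Breadth-first crawl: index each page, follow links up to max_depth.

    fetch(url) returns the page's HTML, or None on failure. Pages at
    depth == max_depth are still indexed, but their links are not followed.
    """
    seen = set(seed_urls)
    queue = deque((url, 0) for url in seed_urls)
    indexed = []
    while queue:
        url, depth = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        indexed.append(url)  # the real crawler would save a CouchDB doc here
        if depth >= max_depth:
            continue
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)
            if absolute not in seen:
                seen.add(absolute)
                queue.append((absolute, depth + 1))
    return indexed


# Tiny in-memory "web" standing in for real HTTP fetches.
PAGES = {
    "https://a.example/": '<a href="/next">next</a>',
    "https://a.example/next": '<a href="/deep">deep</a>',
    "https://a.example/deep": "no links here",
}

if __name__ == "__main__":
    # With max_depth=1, /deep is discovered only via /next and is not fetched.
    print(crawl(["https://a.example/"], PAGES.get, max_depth=1))
```

Tracking `seen` before enqueueing keeps the crawler from fetching the same URL twice even when several pages link to it.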
While it's indexing, you can visit the search engine at:

```
https://localhost:5984/crawler/_design/crawler/index.html
```