emi's bixo at master - GitHub
This repository is public.
Description: A creepy crawler
Public Clone URL: git://github.com/emi/bixo.git
git clone git://github.com/emi/bixo.git
Your Clone URL: git@github.com:emi/bixo.git
git clone git@github.com:emi/bixo.git

Ken Krugler (author)
Fri Oct 02 16:19:31 -0700 2009
bixo /
name | age | message |
---|---|---|
.gitignore | Tue Sep 22 06:30:29 -0700 2009 | Fixed problem w/"fresh" builds: .gitingore ign... [Ken Krugler] |
README | Tue Sep 22 09:55:57 -0700 2009 | Updated project description. [Ken Krugler] |
bin/ | Fri Aug 28 13:52:05 -0700 2009 | Added ref link to on-line documentation. [Ken Krugler] |
build.xml | Wed Sep 30 11:52:30 -0700 2009 | Cleaned up vestigial use of "testcase". Added ... [Ken Krugler] |
contrib/ | Tue Sep 22 15:19:29 -0700 2009 | Added contrib with first component - the "helpf... [Ken Krugler] |
doc/ | Fri Oct 02 16:19:31 -0700 2009 | Use clean when doing dist [Ken Krugler] |
lib/ | Mon Sep 21 12:40:47 -0700 2009 | Complete the switch to using Maven for jar depe... [Ken Krugler] |
pom.xml | Wed Sep 30 10:32:44 -0700 2009 | Clean up pom.xml [Ken Krugler] |
src/ | Fri Oct 02 16:17:21 -0700 2009 | Added big gnarly test case for all incoming Url... [Ken Krugler] |
README
===============================
Introduction
===============================

Bixo is an open source Java web mining toolkit that runs as a series of Cascading pipes. It is designed to be used as a tool for creating customized web mining apps. By building a customized Cascading pipe assembly, you can quickly create a workflow using Bixo that fetches web content, then parses, analyzes, and publishes the results.

Bixo borrows heavily from the Apache Nutch project, as well as many other open source projects at Apache and elsewhere.

Bixo is released under the MIT license.

===============================
Building
===============================

See https://bixo.101tec.com/documentation/building-bixo/ for full details.

You need Apache Ant 1.7 or higher.

To get a list of valid targets:

% cd <project directory>
% ant -p

To clean, run the tests and build a jar:

% ant clean test jar

To create Eclipse project files:

% ant eclipse

Then choose "Import existing project" in Eclipse, and select the Bixo project directory.
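
The Introduction describes a workflow that fetches, parses, analyzes, and publishes web content as a chain of stages. The sketch below only illustrates that staged shape with plain Java function composition. It does not use the Bixo or Cascading APIs; every name in it (PipelineSketch, fetch, parse, analyze, run) is hypothetical, and the fetch stage returns canned HTML so the example runs offline.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

class PipelineSketch {
    // Hypothetical fetch stage: stands in for an HTTP request and
    // returns canned HTML so the example needs no network access.
    static String fetch(String url) {
        return "<html><title>" + url + "</title><body>hello web mining</body></html>";
    }

    // Hypothetical parse stage: strips tags with a naive regex
    // (an assumption for brevity, not a real HTML parser).
    static String parse(String html) {
        return html.replaceAll("<[^>]*>", " ").trim().replaceAll("\\s+", " ");
    }

    // Hypothetical analyze stage: counts space-separated tokens.
    static int analyze(String text) {
        return text.isEmpty() ? 0 : text.split(" ").length;
    }

    // Chains the three stages into one workflow and applies it to each URL.
    static List<Integer> run(List<String> urls) {
        Function<String, Integer> workflow =
            ((Function<String, String>) PipelineSketch::fetch)
                .andThen(PipelineSketch::parse)
                .andThen(PipelineSketch::analyze);
        return urls.stream().map(workflow).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // "Publish" step: here it simply prints the per-URL token counts.
        System.out.println(run(List.of("https://example.com/a")));
    }
}
```

In a real Bixo application each stage would presumably be a Cascading pipe operating on tuples rather than a Java function on strings; see the project documentation linked under Building for the actual API.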