ArchiveBox Architecture Diagrams · ArchiveBox/ArchiveBox Wiki · GitHub
ArchiveBox Architecture Diagrams
Nick Sweeting edited this page Nov 12, 2024 · 3 revisions
stateDiagram-v2
archivebox.cli.main(sys.argv)
state Supervisord {
Scheduler
state Orchestrator {
[*] --> TICK
TICK --> SPAWN_ACTORS: queued > 0
SPAWN_ACTORS --> TICK
TICK --> IDLE: queued == 0
IDLE --> TICK: 1s
}
}
note left of archivebox.cli.main(sys.argv)
archivebox entrypoint
end note
state "archivebox.cli.SUBCOMMAND" as MAIN_THREAD
archivebox.cli.main(sys.argv) --> run_subcommand(sys.argv)
run_subcommand(sys.argv) --> setup_django()
setup_django() --> Supervisord: spawns in background
setup_django() --> MAIN_THREAD: runs in foreground
MAIN_THREAD --> archivebox.main.SUBCOMMAND
archivebox.main.SUBCOMMAND --> Storage: add_to_queue()
state Actors {
CrawlActor --> Crawl: tick()
SnapshotActor --> Snapshot: tick()
ArchiveResultActors --> ArchiveResult: tick()
}
state "State Machines" as JOBS {
state Crawl {
state "QUEUED" as CRAWL_QUEUED
state "STARTED" as CRAWL_STARTED
state "SEALED" as CRAWL_SEALED
CRAWL_QUEUED --> CRAWL_STARTED: create_root_snapshot()
CRAWL_STARTED --> CRAWL_SEALED: is_finished
}
state Snapshot {
state "QUEUED" as SNAP_QUEUED
state "STARTED" as SNAP_STARTED
state "SEALED" as SNAP_SEALED
SNAP_QUEUED --> SNAP_STARTED: create_pending_archiveresults()
SNAP_STARTED --> SNAP_SEALED: is_finished
}
state ArchiveResult {
QUEUED --> STARTED: run_extractor()
STARTED --> BACKOFF: is_temp_error
BACKOFF --> STARTED: is_retry_past
STARTED --> FAILED: is_fatal_error
STARTED --> SUCCEEDED: is_succeeded
}
note right of ArchiveResult
exec_chrome()
end note
note right of ArchiveResult
exec_wget()
end note
note right of ArchiveResult
exec_curl()
end note
note right of ArchiveResult
... other extractor subprocesses ...
end note
}
state Storage {
state "DB" as SQLITE_DB
sources/
archive/
state "index.json" as INDEX_JSONS
}
Storage: Storage
Orchestrator --> Actors: spawns subprocesses
Crawl --> Snapshot: create_root_snapshot()
Snapshot --> ArchiveResult: create_pending_archiveresults()
Crawl --> Storage: .save()
Snapshot --> Storage: .save()
ArchiveResult --> Storage: .save()
Storage --> Actors: get_queue()
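The Orchestrator loop in the diagram above (TICK, SPAWN_ACTORS, IDLE) can be sketched roughly as follows. This is a minimal illustration and not ArchiveBox's actual code: `orchestrator_step`, `run_orchestrator`, `get_queue_counts`, and `spawn_actor` are hypothetical names, and the per-actor queue counts are assumed to arrive as a plain dict.

```python
import time
from typing import Callable, Dict, List


def orchestrator_step(queued: Dict[str, int], spawn_actor: Callable[[str], None]) -> str:
    """Run one TICK and return the state it leads to.

    `queued` maps a hypothetical actor name to its number of queued objects.
    """
    pending = [name for name, count in queued.items() if count > 0]
    if not pending:
        return "IDLE"            # TICK --> IDLE: queued == 0
    for name in pending:
        spawn_actor(name)        # TICK --> SPAWN_ACTORS: queued > 0
    return "SPAWN_ACTORS"


def run_orchestrator(
    get_queue_counts: Callable[[], Dict[str, int]],
    spawn_actor: Callable[[str], None],
    max_ticks: int,
    idle_sleep: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Tick a bounded number of times, sleeping 1s whenever the queue is empty."""
    history = []
    for _ in range(max_ticks):
        state = orchestrator_step(get_queue_counts(), spawn_actor)
        history.append(state)
        if state == "IDLE":
            sleep(idle_sleep)    # IDLE --> TICK: 1s
    return history
```

The loop is bounded by `max_ticks` and takes an injectable `sleep` only to keep the sketch easy to exercise; a long-running orchestrator would presumably loop until shut down.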
- crawls/models.py:Crawl
- crawls/statemachines.py:CrawlMachine
stateDiagram-v2
STARTED --> SEALED: tick [is_finished]
STARTED --> STARTED: tick [!is_finished]
QUEUED --> STARTED: tick [can_start]
QUEUED --> QUEUED: tick [!can_start]
note left of QUEUED
Crawl created
end note
note right of STARTED
create_root_snapshot()
crawl.retry_at = now + 5s
end note
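The Crawl transitions above could be modeled along these lines. This is a sketch under stated assumptions, not the code in `crawls/statemachines.py`: the diagram does not define the `can_start` and `is_finished` guards, so here they are assumed to mean "has a seed URL" and "no unfinished snapshots", and the `seed_url`, `snapshots`, and `unfinished_snapshots` fields are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Crawl:
    seed_url: str = ""                      # hypothetical field
    status: str = "QUEUED"
    retry_at: Optional[datetime] = None
    snapshots: List[str] = field(default_factory=list)
    unfinished_snapshots: int = 0           # hypothetical counter


class CrawlMachine:
    def __init__(self, crawl: Crawl) -> None:
        self.crawl = crawl

    def can_start(self) -> bool:
        # Assumption: a crawl can start once it has a seed URL.
        return bool(self.crawl.seed_url)

    def is_finished(self) -> bool:
        # Assumption: a crawl is finished when no snapshots remain unfinished.
        return self.crawl.unfinished_snapshots == 0

    def tick(self, now: datetime) -> str:
        crawl = self.crawl
        if crawl.status == "QUEUED" and self.can_start():
            crawl.status = "STARTED"
            # Stand-in for create_root_snapshot()
            crawl.snapshots.append(crawl.seed_url)
            crawl.unfinished_snapshots += 1
            crawl.retry_at = now + timedelta(seconds=5)
        elif crawl.status == "STARTED" and self.is_finished():
            crawl.status = "SEALED"
        # Otherwise the guard failed and the crawl stays in its current state.
        return crawl.status
```

The sketch runs the STARTED entry actions only on the QUEUED to STARTED transition; whether they also rerun on the STARTED self-loop is not stated in the diagram.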
- core/models.py:Snapshot
- core/statemachines.py:SnapshotMachine
stateDiagram-v2
STARTED --> SEALED: tick [is_finished]
STARTED --> STARTED: tick [!is_finished]
QUEUED --> STARTED: tick [can_start]
QUEUED --> QUEUED: tick [!can_start]
note left of QUEUED
Snapshot created
end note
note right of STARTED
create_pending_archiveresults(extractors)
snapshot.retry_at = now + 60s
end note
- core/models.py:ArchiveResult
- core/statemachines.py:ArchiveResultMachine

stateDiagram-v2
QUEUED --> QUEUED: tick [!can_start]
QUEUED --> STARTED: tick [can_start]
STARTED --> STARTED: tick [!is_finished]
STARTED --> BACKOFF: tick [is_backoff]
STARTED --> FAILED: tick [is_failed]
STARTED --> SUCCEEDED: tick [is_succeeded]
BACKOFF --> BACKOFF: tick [!can_start]
BACKOFF --> STARTED: tick [can_start]
note left of QUEUED
ArchiveResult created
end note
note left of STARTED
start_ts = now
retry_at = now + 60s
create_output_dir()
run_extractor()
end note
note right of BACKOFF
retry_at = now + 60s
end note
note right of SUCCEEDED
end_ts = now
retry_at = None
end note
note right of FAILED
end_ts = now
retry_at = None
end note
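The ArchiveResult transitions and their notes, including the BACKOFF retry path, might look like this in code. It is an illustrative sketch rather than the contents of `core/statemachines.py`: the `outcome` field, the `run_extractor` callback signature, and the reading of `can_start` as "`retry_at` is unset or already in the past" are all assumptions, and `create_output_dir()` is omitted.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Optional

RETRY_DELAY = timedelta(seconds=60)


@dataclass
class ArchiveResult:
    status: str = "QUEUED"
    start_ts: Optional[datetime] = None
    end_ts: Optional[datetime] = None
    retry_at: Optional[datetime] = None
    # Hypothetical field set by the extractor: None (still running),
    # "backoff", "failed", or "succeeded".
    outcome: Optional[str] = None


def can_start(result: ArchiveResult, now: datetime) -> bool:
    # Assumption: startable when no retry time is set or it has passed.
    return result.retry_at is None or result.retry_at <= now


def enter_started(result: ArchiveResult, now: datetime,
                  run_extractor: Callable[[ArchiveResult], None]) -> None:
    result.status = "STARTED"
    result.start_ts = now
    result.retry_at = now + RETRY_DELAY
    result.outcome = None
    run_extractor(result)


def tick(result: ArchiveResult, now: datetime,
         run_extractor: Callable[[ArchiveResult], None]) -> str:
    if result.status in ("QUEUED", "BACKOFF"):
        if can_start(result, now):
            enter_started(result, now, run_extractor)
    elif result.status == "STARTED":
        if result.outcome == "backoff":
            result.status = "BACKOFF"
            result.retry_at = now + RETRY_DELAY
        elif result.outcome in ("failed", "succeeded"):
            result.status = result.outcome.upper()
            result.end_ts = now
            result.retry_at = None
    # SUCCEEDED and FAILED are final: further ticks leave them unchanged.
    return result.status
```

In this sketch a result that hits a temporary error waits out the 60s delay in BACKOFF, then re-enters STARTED and runs the extractor again.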