Releases · iipc/jwarc · GitHub
Releases: iipc/jwarc
v0.32.0
New features
- HeaderValidator with WARC/1.1 standard ruleset
- ExtractTool: can now extract sequential concurrent records (--concurrent option)
- DedupeTool
  - In-memory cache for cross-URL digest-based deduplication (--cache-size option)
  - Now prints deduplication statistics (--dry-run and --quiet options)
  - Multi-threaded deduplication (--threads option)
- ValidateTool
  - Multi-threaded validation (--threads option)
- ParsingException message is now annotated with the source filename and record offset when available
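
One way to picture the in-memory cache for cross-URL digest-based deduplication is a bounded map from payload digest to the first record seen with that digest. The sketch below is only an illustration under that assumption: `DigestCache`, `findOrRemember` and the least-recently-used eviction policy are hypothetical names and choices, not jwarc's actual implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical bounded cache from payload digest to the URL first seen with that digest. */
final class DigestCache {
    private final Map<String, String> entries;

    DigestCache(int maxEntries) {
        // accessOrder=true keeps entries ordered from least to most recently used,
        // so removeEldestEntry evicts the least recently used digest.
        this.entries = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /**
     * Returns the URL of an earlier record with the same digest, or null if the
     * digest is new, in which case it is remembered for later lookups.
     * Synchronized so that several worker threads can share one cache.
     */
    synchronized String findOrRemember(String digest, String url) {
        String earlier = entries.get(digest);
        if (earlier == null) {
            entries.put(digest, url);
        }
        return earlier;
    }

    synchronized int size() {
        return entries.size();
    }
}
```

A caller would treat a non-null result as a duplicate candidate and a null result as a new payload. The size bound plays the role that an option such as --cache-size would control.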
Bugs fixed
- RFC5952 canonical form is now used for IPv6 addresses in WARC-IP-Address
- HttpParser in lenient mode now:
  - accepts responses missing version number
  - ignores header lines missing :
  - ignores folded status lines
- WarcParser: treats alexa/dat ARC records as not HTTP type
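
For readers unfamiliar with the RFC 5952 canonical form mentioned for WARC-IP-Address, the following is a minimal standalone sketch of its main rules: lowercase hex, no leading zeros, and the longest run of two or more zero groups replaced by "::" (the first run wins a tie). `Ipv6Canonical` is a hypothetical helper written for this illustration and is not taken from jwarc. It does not handle the RFC's special cases for IPv4-embedded addresses, and it should only be given address literals, since `InetAddress.getByName` would attempt a DNS lookup for a host name.

```java
import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.UnknownHostException;

final class Ipv6Canonical {
    /** Formats an IP address literal; IPv6 addresses are rendered in RFC 5952 style. */
    static String canonicalize(String literal) {
        InetAddress address;
        try {
            address = InetAddress.getByName(literal);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("not an IP address literal: " + literal, e);
        }
        if (!(address instanceof Inet6Address)) {
            return address.getHostAddress(); // IPv4: dotted quad, nothing to compress
        }
        byte[] bytes = address.getAddress();
        int[] groups = new int[8];
        for (int i = 0; i < 8; i++) {
            groups[i] = ((bytes[2 * i] & 0xff) << 8) | (bytes[2 * i + 1] & 0xff);
        }
        // Locate the longest run of zero groups; '>' keeps the first run on a tie.
        int bestStart = -1;
        int bestLen = 0;
        for (int i = 0; i < 8; ) {
            if (groups[i] != 0) {
                i++;
                continue;
            }
            int j = i;
            while (j < 8 && groups[j] == 0) j++;
            if (j - i > bestLen) {
                bestStart = i;
                bestLen = j - i;
            }
            i = j;
        }
        if (bestLen < 2) bestStart = -1; // a single zero group is never shortened
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 8; i++) {
            if (i == bestStart) {
                sb.append("::");
                i += bestLen - 1; // skip the rest of the compressed run
                continue;
            }
            if (sb.length() > 0 && sb.charAt(sb.length() - 1) != ':') sb.append(':');
            sb.append(Integer.toHexString(groups[i])); // lowercase, no leading zeros
        }
        return sb.toString();
    }
}
```

With this sketch, `Ipv6Canonical.canonicalize("2001:0DB8:0:0:0:0:0:1")` yields `2001:db8::1`.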
v0.31.1: Release 0.31.1
Bugs fixed
- Fixed URIs.parseLeniently() returning a different value to new URI() if the path was empty or the input contained percent encoded characters #90 #91
- Replaced some internal usages of record.targetURI() with record.target() to reduce the chance of runtime exceptions and preserve the exact original value
v0.31.0: Release 0.31.0
New features
- Added optional support for brotli content encoding #88
- Added HttpMessage.bodyDecoded() #88
- WarcTool: Added dedupe subcommand
- DedupeTool: Added --verbose option and silenced default logging
Bug fixes
- GunzipChannel: Fixed incorrect record length calculation when gzip footer aligns with the end of the buffer
- ValidateTool: Fixed digest validation #87
- DedupeTool: Used matchType=exact to properly handle CDX queries for URLs ending with *
- DedupeTool: Fixed record copying when transferTo copies fewer bytes than requested
- DedupeTool: Prevented appending of an empty gzip member when no records were deduplicated
- DedupeTool: Fixed exception when input files are in the current working directory
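
The record-copying fix above concerns a general property of `FileChannel.transferTo`: it may transfer fewer bytes than requested, so callers have to loop. The sketch below shows that pattern in isolation. `ChannelCopy` and `copyRange` are hypothetical names for this illustration and are not DedupeTool's code. It assumes a blocking target channel, so a transfer of zero bytes is treated as an unexpected end of file.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;

final class ChannelCopy {
    /** Copies exactly {@code length} bytes starting at {@code position}, retrying after short transfers. */
    static void copyRange(FileChannel source, long position, long length, WritableByteChannel target)
            throws IOException {
        long remaining = length;
        while (remaining > 0) {
            // transferTo returns the number of bytes actually transferred, possibly less than requested
            long n = source.transferTo(position, remaining, target);
            if (n <= 0) {
                throw new IOException("unexpected end of file with " + remaining + " bytes left to copy");
            }
            position += n;
            remaining -= n;
        }
    }
}
```

Calling `transferTo` once and assuming the whole range was written is the kind of mistake this loop guards against.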
v0.30.0: Release 0.30.0
New features
- WarcReader and WarcParser gained a lenient parsing mode which:
  - permits ASCII control characters in header field names and values
  - allows lines to end with LF instead of CRLF
  - permits multi-digit WARC minor versions like "0.18"
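
To make two of the lenient rules above concrete, here is a small sketch contrasting a strict and a lenient check of a WARC version line. It is an illustration only: `VersionLine` and both patterns are hypothetical, and the assumption that strict mode accepts exactly one digit on each side of the dot is a simplification for this example, not a statement about jwarc's parser.

```java
import java.util.regex.Pattern;

final class VersionLine {
    // Assumed strict form: single-digit major and minor version, terminated by CRLF.
    private static final Pattern STRICT = Pattern.compile("WARC/(\\d)\\.(\\d)\r\n");
    // Lenient form: multi-digit versions such as "0.18" and a bare LF terminator are tolerated.
    private static final Pattern LENIENT = Pattern.compile("WARC/(\\d+)\\.(\\d+)\r?\n");

    /** Returns true if the given version line, including its terminator, is acceptable. */
    static boolean accepts(String line, boolean lenient) {
        return (lenient ? LENIENT : STRICT).matcher(line).matches();
    }
}
```

Under these assumptions, `"WARC/0.18\n"` is rejected by the strict pattern and accepted by the lenient one.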
v0.29.0: Release 0.29.0
New features
- Added MediaType.parseLeniently() and .isValid()
Changes
- Message.contentType() and other methods that internally call it now use the lenient MediaType parser instead of throwing IllegalArgumentException #83
v0.28.6: Release 0.28.6
Bugs fixed
- Improved compatibility with ARC variants (version-block length off by one, v2 version-block, spurious linefeeds) #82
- WarcParser: Context in parse error messages was incorrectly using the parser (file) position instead of buffer position
v0.28.5: Release 0.28.5
Bugs fixed
- Fixed ClosedChannelException when reading a WarcRevisit body after closing a previous one due to reuse of empty MessageBody. #80
v0.28.4: Release 0.28.4
Bugs fixed
- CDX formatting now percent encodes spaces, newlines and null characters in all string fields. This is non-standard but at least prevents us outputting invalid CDX lines.
- CdxRequestEncoder now handles requests with an invalid content-type header
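
The CDX formatting change can be illustrated with a short sketch. CDX lines are space-delimited and newline-terminated, so a raw space or newline inside a field would corrupt the line. `CdxEscape.escapeField` is a hypothetical helper written for this example, not jwarc's code, and it encodes only the three characters named in the release note. It leaves `%` untouched, so the encoding is not reversible in general.

```java
final class CdxEscape {
    /** Percent-encodes spaces, newlines and null characters in a CDX string field. */
    static String escapeField(String value) {
        StringBuilder sb = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case ' ':
                    sb.append("%20"); // field separator
                    break;
                case '\n':
                    sb.append("%0A"); // record terminator
                    break;
                case '\0':
                    sb.append("%00"); // null character
                    break;
                default:
                    sb.append(c);
            }
        }
        return sb.toString();
    }
}
```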
v0.28.3: Release 0.28.3
v0.28.2: Release 0.28.2
Changes:
- HttpRequest and HttpResponse in lenient mode now recover when parsing the Content-Length header throws NumberFormatException
- WarcParser now tries to leniently parse ARC records containing corrupt dates
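
The Content-Length change follows a common lenient-parsing pattern: catch the NumberFormatException and fall back to behaving as if no usable length was given. The sketch below shows that pattern on its own. `LenientContentLength` is a hypothetical helper, and treating a malformed value as absent is an assumption made for this example. The release note does not say what jwarc falls back to.

```java
import java.util.Optional;

final class LenientContentLength {
    /** Returns the parsed Content-Length, or empty if the header is absent or malformed. */
    static Optional<Long> parse(String headerValue) {
        if (headerValue == null) {
            return Optional.empty();
        }
        try {
            long length = Long.parseLong(headerValue.trim());
            // A negative length is meaningless, so it is treated like a malformed value.
            return length >= 0 ? Optional.of(length) : Optional.empty();
        } catch (NumberFormatException e) {
            // Lenient mode: recover instead of propagating the exception.
            return Optional.empty();
        }
    }
}
```

A caller could then read the body until the end of the record when the result is empty.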