apache's pdfbox at 1.1.x - GitHub
apache / pdfbox mirrored from git://git.apache.org/pdfbox.git
Branch:
1.1.x
Jukka Zitting (author)
Thu Mar 25 09:28:23 -0700 2010
pdfbox /
name | age | message
---|---|---
KEYS | Mon Oct 05 09:53:23 -0700 2009 | added signed key for lehmi@apache.org [Andreas Lehmkuhler]
LICENSE.txt | Wed Aug 26 02:16:56 -0700 2009 | PDFBOX-366: License review [Jukka Zitting]
NOTICE.txt | Thu Feb 11 04:57:09 -0800 2010 | pdfbox: Prepare for the 1.0.0 release [Jukka Zitting]
README.txt | Sat Feb 13 04:27:37 -0800 2010 | Fixed the webpage link in the header. [Andreas Lehmkuhler]
RELEASE-NOTES.txt | Thu Mar 25 08:38:24 -0700 2010 | Minor update in release notes. [Jukka Zitting]
assembly.xml | Thu Mar 25 08:41:11 -0700 2010 | PDFBOX-644: Move FontBox and JempBox under the ... [Jukka Zitting]
fontbox/ | Thu Mar 25 09:28:23 -0700 2010 | Update 1.1.x version to 1.1.1-SNAPSHOT [Jukka Zitting]
jempbox/ | Thu Mar 25 09:28:23 -0700 2010 | Update 1.1.x version to 1.1.1-SNAPSHOT [Jukka Zitting]
parent/ | Thu Mar 25 09:28:23 -0700 2010 | Update 1.1.x version to 1.1.1-SNAPSHOT [Jukka Zitting]
pdfbox-checkstyle.xml | Sat Oct 24 05:28:59 -0700 2009 | Author tags are discouraged at the ASF so at le... [Jeremias Maerki]
pdfbox.war/ | Tue Dec 01 11:29:22 -0800 2009 | PDFBOX-555: replacing xml comment-tags with jsp... [Andreas Lehmkuhler]
pdfbox/ | Thu Mar 25 09:28:23 -0700 2010 | Update 1.1.x version to 1.1.1-SNAPSHOT [Jukka Zitting]
pom.xml | Thu Mar 25 09:28:23 -0700 2010 | Update 1.1.x version to 1.1.1-SNAPSHOT [Jukka Zitting]
README.txt
===================================================
Apache PDFBox <https://pdfbox.apache.org/>
===================================================

PDFBox is an open source Java library for working with PDF documents.
This project allows creation of new PDF documents, manipulation of
existing documents and the ability to extract content from documents.
PDFBox also includes several command line utilities.

PDFBox is published under the Apache License, Version 2.0.

You need Java 5 (or higher) and Maven 2 <https://maven.apache.org/> to
build PDFBox. The recommended build command is:

    mvn clean install

The default build will compile the Java sources and package the binary
classes into a jar package. See the Maven documentation for all the
other available build options.

There is also an Ant build that you can use to build the same binaries.
The Ant build can also produce .NET DLLs if you have IKVM.NET
<https://www.ikvm.net/> installed. See the build.xml file for details.

PDFBox is a project of the Apache Software Foundation
<https://www.apache.org/>.

Known Limitations and Problems
==============================

1. You get text like "G38G43G36G51G5" instead of what you expect when
   you are extracting text. This is because the characters use a
   meaningless internal encoding that points to glyphs embedded in the
   PDF document. The only way to access the text is to use OCR. This
   may be a future enhancement.

2. You get an error message like "java.io.IOException: Can't handle
   font width". This MIGHT be because you don't have the Resources
   directory in your classpath. The easiest solution is to simply
   include the apache-pdfbox-x.x.x.jar in your classpath.

3. You get text that has the correct characters, but in the wrong
   order. This might be because you have not enabled sorting. The text
   in PDF files is stored in chunks, and the chunks do not need to be
   stored in the order in which they are displayed on a page. By
   default, PDFBox does not sort the text.

   Also, if you have text in a language that reads right to left (such
   as Arabic or Hebrew), make sure you have the ICU4J jar file in your
   classpath. This library is needed to properly handle right to left
   text.

See the issue tracker at https://issues.apache.org/jira/browse/PDFBOX
for the full list of known issues and requested features.

License (see also LICENSE.txt)
==============================

Collective work: Copyright 2010 The Apache Software Foundation.

Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

    https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

Unmodifiable files
==================

Apache PDFBox contains Adobe CMap and Glyph files that may be
redistributed only in *unmodified* form. See the LICENSE file for the
exact licensing conditions.

Export control
==============

This distribution includes cryptographic software. The country in
which you currently reside may have restrictions on the import,
possession, use, and/or re-export to another country, of encryption
software. BEFORE using any encryption software, please check your
country's laws, regulations and policies concerning the import,
possession, or use, and re-export of encryption software, to see if
this is permitted. See <https://www.wassenaar.org/> for more
information.

The U.S. Government Department of Commerce, Bureau of Industry and
Security (BIS), has classified this software as Export Commodity
Control Number (ECCN) 5D002.C.1, which includes information security
software using or performing cryptographic functions with asymmetric
algorithms. The form and manner of this Apache Software Foundation
distribution makes it eligible for export under the License Exception
ENC Technology Software Unrestricted (TSU) exception (see the BIS
Export Administration Regulations, Section 740.13) for both object
code and source code.

The following provides more details on the included cryptographic
software:

    Apache PDFBox uses the Java Cryptography Architecture (JCA) and the
    Bouncy Castle libraries for handling encryption in PDF documents.
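
Illustration for known limitation 3
===================================

Known limitation 3 above says that PDF text is stored in chunks whose
storage order need not match their order on the page. The sketch below
illustrates that idea only. It is not PDFBox code and does not show
how PDFBox sorts text: the Chunk class, the sample coordinates and the
sortByPosition helper are all hypothetical, and it assumes Java 8 or
later and the default PDF coordinate system, where y grows upward.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ChunkSortSketch {

    /** Hypothetical positioned piece of text; not a PDFBox class. */
    public static final class Chunk {
        final double x;
        final double y;
        final String text;

        public Chunk(double x, double y, String text) {
            this.x = x;
            this.y = y;
            this.text = text;
        }
    }

    /** Chunks in an assumed storage order that differs from reading order. */
    public static List<Chunk> sample() {
        return Arrays.asList(
                new Chunk(60, 700, "world\n"),
                new Chunk(10, 680, "second line"),
                new Chunk(10, 700, "hello "));
    }

    /** Concatenates chunk text in the order given. */
    public static String join(List<Chunk> chunks) {
        StringBuilder sb = new StringBuilder();
        for (Chunk c : chunks) {
            sb.append(c.text);
        }
        return sb.toString();
    }

    /** Returns a copy ordered top to bottom (larger y first), then left to right. */
    public static List<Chunk> sortByPosition(List<Chunk> chunks) {
        List<Chunk> sorted = new ArrayList<>(chunks);
        sorted.sort(Comparator.comparingDouble((Chunk c) -> -c.y)
                .thenComparingDouble(c -> c.x));
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println("Stored order:");
        System.out.println(join(sample()));
        System.out.println("Sorted by position:");
        System.out.println(join(sortByPosition(sample())));
    }
}
```

Reading the chunks in storage order gives scrambled text, while
ordering them by position recovers the reading order for this simple
left-to-right sample. Real pages (columns, rotated text, right-to-left
scripts) need more than a two-key sort, which is why the README points
to the sorting option and to ICU4J rather than a fixed recipe.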