Releases: UglyToad/PdfPig
Forest Mountain (0.1.11)
Welcome to version 0.1.11. The changes in this version have mainly focused on stability. There is a breaking API change.
We have also started to run tests against a larger corpus of documents from Common Crawl allowing us to find bugs and malformed files proactively. This release is screened against 6000 additional files.
- Improvements to content and font parsing detected by fuzzing inputs.
- Improved resiliency when finding the `startxref` location while parsing a file.
- Adds build and tests for Mac OS as well as retrieving system fonts on iPad (Mac Catalyst).
- Support clipping when rendering XObjects.
- Prevent malformed files from causing an out-of-memory error when decompressing streams.
- Make `IGraphicsStateOperationFactory` and `ReflectionGraphicsStateOperationFactory` public.
- Softmask support for images.
- Performance improvements using `Span` and `ReadOnlyMemory` where available.
- Handle corrupt files where the stream contains comment tokens.
- Improvements to copying from existing files when using `PdfDocumentBuilder`, fixes some bugs with copying fonts and dictionary tokens referenced indirectly.
- Handle corrupt files with double `endstream` definitions.
- More tolerant parsing for a number of invalid PDFs, including invalid UCS-2 input, CMap formats, CFF fonts, missing font subtypes, invalid `xref` table positions, missing `/FirstChar` entry for font dictionaries and corrupt ASCII85-encoded data.
- Fix an issue where adding content to an existing PDF using `PdfDocumentBuilder` could result in upside-down or wrongly positioned text due to global transforms in the source PDF.
- New option to completely skip annotations when building a document.
- Prevent infinite loops in certain documents #1096.
- Improved performance when tokenizing numbers, this should provide a minor speed improvement.
- When adding a page from an existing PDF to a `PdfDocumentBuilder` any external link annotations should be preserved.
Breaking changes
The following method on `PdfDocumentBuilder`:
public PdfPageBuilder AddPage(PdfDocument document, int pageNumber, Func<PdfAction, PdfAction?>? copyLink)
has been changed to wrap the `copyLink` parameter in an options object to support the `KeepAnnotations` option:
public PdfPageBuilder AddPage(PdfDocument document, int pageNumber, AddPageOptions options)
You can set the `CopyLinkFunc` property on the options object if you need this functionality.
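A minimal migration sketch for the change described above, not taken from the PdfPig documentation. It assumes the options type has a public parameterless constructor and that `CopyLinkFunc` keeps the `Func<PdfAction, PdfAction?>?` shape of the old parameter. The file names and the identity callback are placeholders. The target-typed `new()` (C# 9 or later) is used only so the sketch does not depend on where `AddPageOptions` is declared.

```csharp
using System.IO;
using UglyToad.PdfPig;
using UglyToad.PdfPig.Writer;

// "source.pdf" and "copy.pdf" are placeholder paths for illustration.
using (PdfDocument source = PdfDocument.Open("source.pdf"))
using (var builder = new PdfDocumentBuilder())
{
    // Before 0.1.11 the callback was passed directly as the third argument:
    // builder.AddPage(source, 1, link => link);

    // From 0.1.11 the callback is set on the options object instead.
    // The identity callback is a placeholder for whatever link handling you had.
    // KeepAnnotations could be set in the same initializer (assumed to be a bool).
    builder.AddPage(source, 1, new() { CopyLinkFunc = link => link });

    // Build() produces the bytes of the new document.
    File.WriteAllBytes("copy.pdf", builder.Build());
}
```

If you prefer to name the type explicitly, replace `new()` with the `AddPageOptions` type as declared in your installed version of the package.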
Auto generated change log
- Bump version to 0.1.11-alpha001 by @BobLd in #1009
- Improve Jpeg2000Helper to support J2K codec and add test by @BobLd in #1010
- Add SetStrokeDetails() and SetFillDetails() to PdfPath and tidy up ContentStreamProcessor by @BobLd in #1014
- Implement clipping in ProcessFormXObject() by @BobLd in #1015
- Fix #1017 by @lofcz in #1018
- Fix PatternColor Equals() method and fix #1016 by @BobLd in #1019
- Feature/image mask by @BobLd in #1012
- Update README.md by @BobLd in #1020
- Fix bug where FormXObject bbox needs to be normalised by @BobLd in #1021
- Add MacOS test pipeline and fix failing tests by @BobLd in #1025
- Update README.md by @BobLd in #1026
- Seal PdfSubpath class and IPathCommand implementations, fix Close.GetHashCode() and fix #1027 by @BobLd in #1029
- Fix issue #1013 by @BobLd in #1031
- Add support for MacCatalyst in SystemFontFinder by @BobLd in #1033
- Make sure the value of the ImageMask / Im token is check in ColorSpaceDetailsParser by @BobLd in #1038
- Add early support for Stencil masking, rename SoftMaskImage property into MaskImage and make sure IsInlineImage is true for InlineImage by @BobLd in #1039
- Bugfix and optimize GetStartXrefPosition by @ricflams in #1036
- Fix bug introduced in #1039 by @BobLd in #1041
- Try to repair xref offset by looking for all startxref and fix #1040 by @BobLd in #1044
- Add test to ensure #822 is fixed by @BobLd in #1045
- Handle TrueType case in CidFontFactory where the font is CFF by @BobLd in #1046
- Issues/1048 by @BobLd in #1049
- Check for infinite recursion in ObjectLocationProvider.TryGetOffset() and fix #1050 by @BobLd in #1051
- Improve IFilter memory allocation by @BobLd in #1052
- Modernise PngPredictor and refactor LzwFilter and FlateFilter to reduce memory allocation by @BobLd in #1053
- Do not throw if the Mask dictionary contains a ColorSpace key by @BobLd in #1055
- Make the Diacritics class public for use in external StreamProcessors by @BobLd in #1056
- Add extension method to get Memory from MemoryStream, attempting to do it without allocation and update CMapParser by @BobLd in #1057
- Miscellaneous minor changes by @BobLd in #1058
- Optimize internal representation of IndirectReference by @BobLd in #1059
- Skip creating IndirectReference in CrossReferenceTablePartBuilder when generationNumber is more than 65,535 by @BobLd in #1060
- Check ColorSpace token as dictionary and fix issue #1061 by @BobLd in #1063
- Make classes related to page content parsing public by @BobLd in #1065
- Prevent RunLengthFilter malicious OOM by @BobLd in #1068
- Use ReadOnlyMemory in ShowText operators and implement MoveToNextLineShowTextWithSpacing parsing by @BobLd in #1066
- Fix bug in PngFromPdfImageFactory where softmask is wrongly referenced. by @orrest in #1069
- Fix issue 926 by @EliotJones in #1072
- writer util did not follow reference links #1032 by @EliotJones in #1073
- fix #670 by ignoring duplicate endstream definitions by @EliotJones in #1075
- skip single letter final blocks by @EliotJones in #1076
- fix copying of sub-dictionary when keys collide by @EliotJones in #1077
- use correct bounding boxes for standard 14 glyphs #850 by @EliotJones in #1080
- back-calculate first char if last char and widths present by @EliotJones in #1081
- fix off-by-one and optimize brute force xref search #1078 by @EliotJones in #1079
- fall back to times-roman as standard 14 font when lenient by @EliotJones in #1085
- allow reading to continue if encountering an invalid surrogate pair by @EliotJones in #1084
- fix colorspace error when form xobject contains a transparency group by @EliotJones in #1088
- support bfrange having incorrect length in a cmap by @EliotJones in #1089
- add new action to run integration against common crawl corpus by @EliotJones in #1090
- Update run_common_crawl_tests.yml by @BobLd in #1091
- Remove decode parameter application from Stencil color space for consistency by @BobLd in #1092
- Update hack for 1bpc + DeviceGray by @BobLd in #1093
- when writing content to an existing page inverse any global transform #614 by @EliotJones in #1094
- add option to strip annotation by @EnraH in #492
- check for cycles during indirect reference resolution by @jan-sutter in #1097
- i merged a pr which broke the build, this updates the build to work by @EliotJones in #1099
- remove debug asserts causing test failures by @EliotJones in #1098
- Track IndirectReference instead of only ObjectNumber when checking for cycles during indirect reference resolution and add test by @BobLd in #1101
- move last uncovered operators to switch statement by @EliotJones in #1100
- rework numeric tokenizer hot path by @EliotJones in #1104
- make link copying more tolerant when adding page by @EliotJones in #1103
- handle additional broken pdf files in the common crawl set by @EliotJones in #1108
- update readme to avoid people using `page.Text` or asking about editing docs by @EliotJones in #110...
v0.1.10
What's Changed
- Fix GetTextOrientation by cleanly checking if rotation is divisible by 90 and fix #913 by @BobLd in #914
- Add early version of BrowserSystemFontLister by @BobLd in #920
- Remove list from FileTrailerParser.GetStartXrefPosition() by @BobLd in #922
- Reorganise Filters and make them public by @BobLd in #925
- Support decrypting V4/R4 files with AESV2 and no Length property by @Greybird in #924
- Use pdfScanner in ReadVerticalDisplacements and fix #693 and return 0… by @BobLd in #928
- Default page number to 0 in ExplicitDestination when the Dest has no page number and fix #736 by @BobLd in #930
- Move Paths, GetAnnotations() and GetOptionalContents() outside of ExperimentalAccess and mark Experimental class and reference as obsolete by @BobLd in #931
- Upgrade tests project NuGet packages by @BobLd in #932
- Optimize cross reference object offset validation by avoiding nested loop by @madelson in #935
- Revive trimming/AOT analysis by @madelson in #939
- Stop treating Warnings as Errors by @BobLd in #941
- Handle alternate Unicode name representation cXXX and fix #943 by @BobLd in #944
- Handle odd ligatures names and fix #945 by @BobLd in #946
- Update additional glyph list to latest from PDFBox by @BobLd in #948
- New GetText() option: NegativeGapAsWhitespace by @Kizaemon in #952
- Fix for IndexOutOfRangeException exception by @GrabzIt in #955
- Fix "Nightly Release" pipeline following csproj changes by @BobLd in #957
- Do not throw exception when lenient parsing in ON in CrossReferenceParser and fix #959 by @BobLd in #961
- Improve UnwrapIndexedColorSpaceBytes by @BobLd in #962
- Fix out of range exception in AnnotationProvider by @BobLd in #963
- Return a copy of the ArrayPoolBufferWriter buffer in Ascii85, AsciiHex and RunLength filters and fix #964 by @BobLd in #965
- Make ColorSpaceDetails.BaseNumberOfColorComponents public to allow for external image factories by @BobLd in #966
- Improve GlyphList by @BobLd in #967
- Properly handle ZapfDingbats font for TrueTypeSimpleFont and add tests by @BobLd in #969
- Execute RemoveStridePadding in place when possible by @BobLd in #968
- Add HexToken case in OptionalContent parsing by @simonedd in #971
- Update UglyToad.PdfPig.ConsoleRunner target framework to net8 by @BobLd in #972
- Do not throw error on Pop when stack size is 1 in lenient mode and fix #973 by @BobLd in #974
- Fix warnings about "type 'K' cannot be used as type parameter 'TKey' in the generic type or method 'Dictionary<TKey, TValue>'" by @BobLd in #976
- Refactor XObjectFactory by @BobLd in #977
- Update UnpackComponents() to account for 1bpc + DeviceGray (hack for Jbig2) by @BobLd in #978
- CcittFaxDecodeFilter: do not check for input length, invert bitmap with ref byte and fix #982 by @BobLd in #983
- Add JPX bits per component decoding by @BobLd in #986
- Issues/987 by @BobLd in #990
- Make DecodeParameterResolver class public by @BobLd in #993
- Update Microsoft and SkiaSharp NuGet packages by @BobLd in #994
- Update Microsoft NuGet packages for UglyToad.PdfPig.Package by @BobLd in #996
- Resolve image data (implementation from @kasperdaff) by @BobLd in #998
- Pass IFilterProvider to IFilter.Decode() and handle null in PdfExtensions.Resolve() by @BobLd in #999
- Improve GetExtendedGraphicsStateDictionary() and StackDictionary.TryGetValue() by @BobLd in #1004
- Better handle integer overflow in DocstrumBoundingBoxes by @BobLd in #1005
- version 0.1.10 by @BobLd in #1006
- Update run_integration_tests.yml by @BobLd in #1007
New Contributors
- @madelson made their first contribution in #935
- @Kizaemon made their first contribution in #952
- @GrabzIt made their first contribution in #955
- @simonedd made their first contribution in #971
Full Changelog: v0.1.9...v0.1.10
Red Wattle Hog
This will be the last release made solely by the current maintainer. Future releases may come from new co-maintainer(s), and you should audit your dependency upgrades on this basis.
This is the first major release in well over a year so it is not feasible to provide exhaustive release notes.
This release contains many performance improvements and bug-fixes. We also drop support for the following full framework versions:
- .NET 4.5.1
- .NET 4.5.2
- .NET 4.6
- .NET 4.6.1
If you are using full framework the newest version has additional dependencies:
- Microsoft.Bcl.HashCode (>= 1.1.1)
- System.Memory (>= 4.5.5)
The other major change is to use `double` instead of `decimal` package-wide. This should provide performance benefits and more closely matches the behavior in the official PDF specification. Where you were using `decimal` before you will need to switch to `double`.
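The sketch below illustrates what the switch can mean for calling code. It uses only the .NET base library, not PdfPig, and the variable names and tolerance value are illustrative assumptions. It shows that `decimal` values need an explicit cast to `double`, and that exact equality checks that held for `decimal` may need a tolerance with `double`.

```csharp
using System;

// A value kept by code written against the older decimal-based API (illustrative).
decimal legacyX = 72.5m;

// decimal -> double is not implicit in C#, so an explicit cast is needed.
double x = (double)legacyX;
Console.WriteLine(x);

// Exact equality that held for decimal does not necessarily hold for double.
Console.WriteLine(0.1m + 0.2m == 0.3m); // True
Console.WriteLine(0.1 + 0.2 == 0.3);    // False

// Compare with a tolerance instead; 1e-6 is an arbitrary illustrative value.
bool NearlyEqual(double a, double b, double tolerance = 1e-6) =>
    Math.Abs(a - b) <= tolerance;

Console.WriteLine(NearlyEqual(0.1 + 0.2, 0.3)); // True
```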
Thanks to all the contributors!
Tamworth
This is a release with various bug-fixes and quality of life improvements but no new major features. It adds many of the supporting classes necessary for PDF rendering.
Breaking Changes
- `IColor` can now be of type `PatternColor`. This implementation will throw an error when calling `ToRGBValues()`. You might have to check for `IColor.ColorSpace != ColorSpace.Pattern` before calling this function
- Remove `Details` suffix from `ColorSpaceDetails` property names
  - `AlternateColorSpaceDetails` renamed to `AlternateColorSpace`
  - `BaseColorSpaceDetails` renamed to `BaseColorSpace`
- Seal `IColor` implementations
- Use `double` instead of `decimal` in color spaces and colors
- Move `IColorSpaceContext` from `IOperationContext` to `CurrentGraphicsState`
- Removed `ColorSpace` property from `IPdfImage`. Use `ColorSpaceDetails.Type` to get the enum value
- `IColorSpaceContext`'s `CurrentStrokingColorSpace` and `CurrentNonStrokingColorSpace` are now of type `ColorSpaceDetails` (not a `ColorSpace` `enum` anymore). Use `CurrentStrokingColorSpace.Type` or `CurrentNonStrokingColorSpace.Type` to get the `enum` value
- Logic change to `DefaultWordExtractor`, a logic bug in the existing implementation was fixed, meaning the output of the default `page.GetWords()` may change in this version
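The `PatternColor` check from the first item might look roughly like the sketch below. It assumes that `Letter.Color` returns an `IColor` and that `IColor` and `ColorSpace` live in the `UglyToad.PdfPig.Graphics.Colors` namespace. The file name is a placeholder and the null check is purely defensive.

```csharp
using System;
using UglyToad.PdfPig;
using UglyToad.PdfPig.Graphics.Colors;

// "input.pdf" is a placeholder path for illustration.
using (PdfDocument document = PdfDocument.Open("input.pdf"))
{
    foreach (var page in document.GetPages())
    {
        foreach (var letter in page.Letters)
        {
            IColor color = letter.Color;

            // ToRGBValues() throws for pattern colors, so skip them up front.
            // The null check is defensive and may be unnecessary.
            if (color == null || color.ColorSpace == ColorSpace.Pattern)
            {
                continue;
            }

            // Deconstruct the returned tuple without depending on its element type.
            var (r, g, b) = color.ToRGBValues();
            Console.WriteLine($"{letter.Value}: {r}, {g}, {b}");
        }
    }
}
```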
.NET 4.5
Note that this version removes support for .NET 4.5. Consumers should upgrade to .NET 4.5.1 or 4.5.2.
Release notes
- Fix support for using the ZapfDingbats Standard 14 font when creating files
- Address issue with extracting CJK text from PDFs
- Fix issue with writing ShowText operations to output files when the text contained parentheses
- Error handling for Type 2 charstring parsing
- New letter properties, `TextRenderingMode`, `StrokeColor` and `FillColor`
- Fix for copying inline images to output files
- Enums for PDF/A-3 compliance
- Fix for library embedding PNGs with invalid information on output
- Resolve `PageSize` enum for landscape orientation documents
- Fix to rotation handling. The coordinates used for letters etc. are different now for rotated and/or cropped pages
- Fix to calculated positions of annotations
- Fix to adding JPG files to output documents
- Add height to Type 3 font bounding boxes and default width/height for zero values
- `CreationDate` and `ModifiedDate` are now available in `DocumentInformationBuilder`
- Images can be added to document builder without specifying placement rectangle, this will place the image at 0,0 with full width and height
- `PdfAction` exposed by `Annotation` class. `InReplyTo` property also added
- `GetFields` extension method for `AcroForm` type
- Fix for internal links when using existing documents with annotations with `PdfDocumentBuilder`
- Handle name conflicts when using `PdfDocumentBuilder` with one or more existing documents
- Swaps internal uses of `Rijndael` and `RijndaelManaged` to `Aes` since these were marked as obsolete
Gloucestershire Old Spots
Changes since 0.1.6:
- Add `page.SetRotation` for `PdfPageBuilder`
- Add `SkipMissingFonts` to parsing options to ignore content where the font is not present or corrupt. Can result in content being missed during extraction but will enable partial extraction of retrievable content on page for corrupted files.
- Multiple bug fixes thanks to @fnatzke
- Fix to page number order bug on extraction thanks to @grinay
- Various shape drawing utilities on `PdfPageBuilder` thanks to @Jonowa
- Fix to issue in `GrahamScan` thanks to @BobLd
- Remove stray `Debugger.Break` from the encryption handler
- Various other bug fixes
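As a rough sketch of the `SkipMissingFonts` option mentioned above, the example below opens a file with it enabled and prints whatever words can still be extracted. The file name is a placeholder. As noted in the list, content that depends on a missing or corrupt font may be absent from the output.

```csharp
using System;
using UglyToad.PdfPig;

// Ask the parser to skip content whose font is missing or corrupt
// instead of failing on it.
var options = new ParsingOptions { SkipMissingFonts = true };

// "damaged.pdf" is a placeholder path for illustration.
using (PdfDocument document = PdfDocument.Open("damaged.pdf", options))
{
    foreach (var page in document.GetPages())
    {
        // Only the content that could be retrieved is returned here.
        foreach (var word in page.GetWords())
        {
            Console.WriteLine(word.Text);
        }
    }
}
```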
Australian Yorkshire
Mainly bug fixes. There are some compatibility changes in the document layout analysis API. See here: https://github.com/UglyToad/PdfPig/wiki/Migration-to-0.1.6
- Fix transparency being applied for PDF/A-1
- Fixes to string handling
- .NET 6.0 support
- Handle null rather than missing encryption data
- Fixes bug with size of JPG files in documents created by PdfPig
- Better handling for unusual Type1 fonts
- Support for invisible/hidden text in document builder
- Fixes stack overflow when parsing page tree for some documents
- Fixes bug in some glyph bounding boxes for Type2 fonts
- Handle non-contiguous xref ranges when building a document
- Better location of version headers for non-compliant documents
Finnish Landrace
Changes since v0.1.4: v0.1.4...v0.1.5
0.1.5 Second Alpha
Some more bug-fixes:
- Fix for object streams in files which require brute force searching.
- Handle `NullToken` presence when creating documents.
- Support for PDFs where the filters are defined as indirect references (against specification).
- Support for CMYK when generating PNG images from `IPdfImage`.
- Support for indexed ColorSpaces where palette is stored in a string.
- Handle UTF16 strings in encrypted document dictionaries.
- Handle documents with an XMP metadata stream instead of an information dictionary.
- CCITTFaxDecode filter support.
- Tweaks to `DefaultWordExtractor` to try and detect word gap size based on preceding text instead of a global gap threshold.
Note that changes to `DefaultWordExtractor` may change the output of calls to `Page.GetWords()` in this version.
0.1.5 First Alpha
First alpha version of 0.1.5
- Fix glyph bounding boxes and paths for Type1 fonts using flexpoints.
- Fix stack overflow when merging some documents.
- Support loading existing documents into PdfDocumentBuilder.
- Performance improvements for multithreaded scenarios.
- Fix checked value for AcroForm checkboxes where the checked state is appearance only.
- New `page.GetOptionalContents()` partial optional content retrieval support.
- Partial support for colorspace details on `IPdfImage`s.
- Multiple bug-fixes for various font related issues.
Breaking changes:
- `PdfDocumentBuilder` now implements `IDisposable`. This disposes the underlying stream by default, but the stream is normally a `MemoryStream`, so there are no serious consequences if it is left undisposed.
- `PdfPageBuilder` had the `AdvancedEditing` property removed. The API is now available in the `ContentStream` methods / properties (this was from #250).
British Lop
- Adds support for filling rectangles when using `PdfDocumentBuilder`. The `DrawRectangle` method now takes an optional boolean parameter, `fill`.
- Fix bug recognising Standard 14 fonts with `Arial MT` naming.
- Handle unusual object streams containing `endobj` tokens.
- Support broken `Differences` arrays for encodings.
- Support very long xref streams by making infinite loop detection more relaxed.
- Fix issue with parsing Type0 fonts that are using indirect references.
- Internal structure changes to support pdf to image work.