Releases: UglyToad/PdfPig
Red Sphere (0.1.13)
What's Changed
- Increment version to 0.1.13 by @BobLd in #1207
- Simply order by offset also when not doing brute force to fix #1208 by @BobLd in #1210
- Ensure no key end up missing in ResolveInternal and fix #1209 by @BobLd in #1211
- update release logic to check out master before commit by @EliotJones in #1212
- Return empty glyph in ReadCompositeGlyph when glyphIndex is out of range and fix #1213 by @BobLd in #1215
- Handling of optional content group names without proper name by @carlokok in #1216
- Minor Type1FontParser optimisations by @BobLd in #1221
- Use file header offset when doing brute force find and fix #1223 by @BobLd in #1224
- Do not return glyph bbox and path in Type1Font if character name is '.notdef' by @BobLd in #1229
New Contributors
Full Changelog: v0.1.12...unreleased
Green Cube (0.1.12)
What's Changed
- add nullability to core project by @EliotJones in #1111
- Fix usage of List.Contains by @theolivenbaum in #1112
- allow missing catalog type definition for catalog dictionary by @EliotJones in #1113
- Performance improvements and .Net 9 support by @chuckbeasley in #1116
- Update run_integration_tests.yml by @BobLd in #1117
- Add global.json in tools by @BobLd in #1118
- Update run_integration_tests.yml by @BobLd in #1119
- Update run_integration_tests.yml by @BobLd in #1120
- Update run_common_crawl_tests.yml by @BobLd in #1121
- Update nightly_release.yml by @BobLd in #1123
- Increase FlateFilter multiplier when preventing malicious OOM and fix #1125 by @BobLd in #1126
- Update build_and_test_macos.yml by @BobLd in #1129
- Update build_and_test_macos.yml by @BobLd in #1130
- Prevent StackOverflow in ParseTrailer and fix #1122 by @BobLd in #1127
- Lower max search depth in preventing StackOverflow in ParseTrailer by @BobLd in #1131
- add container node support for BookmarksProvider.cs by @migeyusu in #1133
- move file parsing to single-pass static methods by @EliotJones in #1102
- Add early version of IOSSystemFontLister by @BobLd in #1143
- File buffering read stream investigation by @EliotJones in #1140
- Draft release on master build by @EliotJones in #1145
- First create the StreamInputBytes in PdfDocument.Open() to check the stream CanRead and CanSeek by @BobLd in #1147
- Fix font matrix issues by @BobLd in #1150
- Properly fix #1148 by always parsing optional tables in TrueTypeFontParser and remove Type 0 font hack by @BobLd in #1151
- copy other parser behavior by treating end of stream as valid end inline image by @EliotJones in #1152
- add test jobs for common crawl 0000 to 0007 by @EliotJones in #1153
- handle case where xobjects use same key as fonts by @EliotJones in #1154
- read last line of ignore file by @EliotJones in #1155
- Use correct font matrix when transforming the width in Type 0 font and fix #1156 by @BobLd in #1157
- Add initial support to process CFF fonts contained inside a TrueType font by @BobLd in #1159
- Handle non seekable stream by copying it into a memory stream and fix #1146 by @BobLd in #1158
- handle case where offsets are out of range by @EliotJones in #1160
- Use record struct in FileHeaderOffset by @BobLd in #1161
- Expose letter's font via GetFont(), make Font property as obsolete and use FontDetails instead by @BobLd in #1166
- Add GetDescent() and GetAscent() to IFont and loose bounding box to letter by @BobLd in #1167
- Use pageFactoryCache.Clear() in Pages dispose and fix #1170 by @BobLd in #1174
- Bugfix: xref-streams were not added by @ricflams in #1173
- Guard against circular references in XRef tables/streams by @ricflams in #1175
- Add more tests to NearestNeighbourWordExtractorTests by @BobLd in #1180
- Feature/improve group indexes by @BobLd in #1181
- Trim excess in long lived font collections by @BobLd in #1184
- Set Type 3 font ascent to Top instead of Height, see #1164 by @BobLd in #1185
- Only apply RemoveStridePadding() when bytes per pixel is one and fix #1183 by @BobLd in #1187
- Use zlib decode to properly use window size and checksum in flate filter by @rhuijben in #1186
- Avoid doing a true file seek for simple peeking in the token parser by @rhuijben in #1188
- Fix regression introduced in 3592fc8 where slicing the stream to the length breaks decoding by @BobLd in #1192
- Update NameToUnicodeConvertAglSpecification to test what was intended by @rhuijben in #1191
- Add CMap caching at document level and add MurmurHash3 hashing function by @BobLd in #1193
- Avoid reading ahead and then seeking back by @rhuijben in #1189
- Do not slice the stream to the length breaks decoding in FlateDecode by @BobLd in #1194
- Update test run command to use Release configuration by @BobLd in #1195
- Make geometric transforms consistent with PDF specification by @PsykerUdot in #1198
- Check for index out of range in GlyphDataTable.ReadFlags() and fix #1199 by @BobLd in #1201
- Check for array size before slice in ColorSpaceDetailsByteConverter.Convert() by @BobLd in #1202
- Bugs/revert e11dc6b by @BobLd in #1203
New Contributors
- @chuckbeasley made their first contribution in #1116
- @migeyusu made their first contribution in #1133
- @rhuijben made their first contribution in #1186
- @PsykerUdot made their first contribution in #1198
Full Changelog: v0.1.11...v0.1.12
Forest Mountain (0.1.11)
Welcome to version 0.1.11. The changes in this version have mainly focused on stability. There is a breaking API change.
We have also started to run tests against a larger corpus of documents from Common Crawl allowing us to find bugs and malformed files proactively. This release is screened against 6000 additional files.
- Improvements to content and font parsing detected by fuzzing inputs.
- Improvements and resiliency for finding the startxref location when parsing a file.
- Adds build and tests for Mac OS as well as retrieving system fonts on iPad (Mac Catalyst).
- Support clipping when rendering XObjects.
- Prevent malformed files leading to an out-of-memory when decompressing streams.
- Make IGraphicsStateOperationFactory and ReflectionGraphicsStateOperationFactory public.
- Softmask support for images.
- Performance improvements using Span and ReadOnlyMemory where available.
- Handle corrupt files where the stream contains comment tokens.
- Improvements to copying from existing files when using PdfDocumentBuilder, fixes some bugs with copying fonts and dictionary tokens referenced indirectly.
- Handle corrupt files with double endstream definitions.
- More tolerant parsing for a number of invalid PDFs, including invalid UCS2 input, CMAP formats, CFF fonts, missing font subtypes, invalid xref table positions, missing /FirstChar entry for font dictionaries and corrupt ASCII 85 encoded data.
- Fix an issue where adding content to an existing PDF using PdfDocumentBuilder could result in upside-down or wrongly positioned text due to global transforms in the source PDF.
- New option to completely skip annotations when building a document.
- Prevent infinite loops in certain documents #1096.
- Improved performance when tokenizing numbers, this should provide a minor speed improvement.
- When adding a page from an existing PDF to a PdfDocumentBuilder any external link annotations should be preserved.
Breaking changes
The method on PdfDocumentBuilder:
public PdfPageBuilder AddPage(PdfDocument document, int pageNumber, Func<PdfAction, PdfAction?>? copyLink)
has been changed to wrap the copyLink parameter in an options object to support the KeepAnnotations option:
public PdfPageBuilder AddPage(PdfDocument document, int pageNumber, AddPageOptions options)
You can just set the CopyLinkFunc property in the options object if you need to access this functionality.
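As a rough sketch of the migration: the example below assumes that AddPageOptions is a class nested inside PdfDocumentBuilder with settable CopyLinkFunc and KeepAnnotations properties, and that KeepAnnotations is a boolean. The nesting, the namespaces and the file names are assumptions for illustration and are not stated in these notes.

```csharp
using System.IO;
using UglyToad.PdfPig;
using UglyToad.PdfPig.Writer;

// Hypothetical input and output paths.
using (var source = PdfDocument.Open("source.pdf"))
{
    var builder = new PdfDocumentBuilder();

    // Before 0.1.11 the delegate was passed directly:
    //   builder.AddPage(source, 1, link => link);

    // From 0.1.11 the delegate travels inside the options object.
    var options = new PdfDocumentBuilder.AddPageOptions // assumed to be nested
    {
        // Same delegate that used to be the copyLink argument.
        CopyLinkFunc = link => link,
        // Assumed boolean; false would skip annotations on the copied page.
        KeepAnnotations = true
    };

    builder.AddPage(source, 1, options);
    File.WriteAllBytes("copy.pdf", builder.Build());
}
```

If the type turns out not to be nested in your version, dropping the PdfDocumentBuilder. qualifier should be the only change needed.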
Auto generated change log
- Bump version to 0.1.11-alpha001 by @BobLd in #1009
- Improve Jpeg2000Helper to support J2K codec and add test by @BobLd in #1010
- Add SetStrokeDetails() and SetFillDetails() to PdfPath and tidy up ContentStreamProcessor by @BobLd in #1014
- Implement clipping in ProcessFormXObject() by @BobLd in #1015
- Fix #1017 by @lofcz in #1018
- Fix PatternColor Equals() method and fix #1016 by @BobLd in #1019
- Feature/image mask by @BobLd in #1012
- Update README.md by @BobLd in #1020
- Fix bug where FormXObject bbox needs to be normalised by @BobLd in #1021
- Add MacOS test pipeline and fix failing tests by @BobLd in #1025
- Update README.md by @BobLd in #1026
- Seal PdfSubpath class and IPathCommand implementations, fix Close.GetHashCode() and fix #1027 by @BobLd in #1029
- Fix issue #1013 by @BobLd in #1031
- Add support for MacCatalyst in SystemFontFinder by @BobLd in #1033
- Make sure the value of the ImageMask / Im token is check in ColorSpaceDetailsParser by @BobLd in #1038
- Add early support for Stencil masking, rename SoftMaskImage property into MaskImage and make sure IsInlineImage is true for InlineImage by @BobLd in #1039
- Bugfix and optimize GetStartXrefPosition by @ricflams in #1036
- Fix bug introduced in #1039 by @BobLd in #1041
- Try to repair xref offset by looking for all startxref and fix #1040 by @BobLd in #1044
- Add test to ensure #822 is fixed by @BobLd in #1045
- Handle TrueType case in CidFontFactory where the font is CFF by @BobLd in #1046
- Issues/1048 by @BobLd in #1049
- Check for infinite recursion in ObjectLocationProvider.TryGetOffset() and fix #1050 by @BobLd in #1051
- Improve IFilter memory allocation by @BobLd in #1052
- Modernise PngPredictor and refactor LzwFilter and FlateFilter to reduce memory allocation by @BobLd in #1053
- Do not throw if the Mask dictionary contains a ColorSpace key by @BobLd in #1055
- Make the Diacritics class public for use in external StreamProcessors by @BobLd in #1056
- Add extension method to get Memory from MemoryStream, attempting to do it without allocation and update CMapParser by @BobLd in #1057
- Miscellaneous minor changes by @BobLd in #1058
- Optimize internal representation of IndirectReference by @BobLd in #1059
- Skip creating IndirectReference in CrossReferenceTablePartBuilder when generationNumber is more than 65,535 by @BobLd in #1060
- Check ColorSpace token as dictionary and fix issue #1061 by @BobLd in #1063
- Make classes related to page content parsing public by @BobLd in #1065
- Prevent RunLengthFilter malicious OOM by @BobLd in #1068
- Use ReadOnlyMemory in ShowText operators and implement MoveToNextLineShowTextWithSpacing parsing by @BobLd in #1066
- Fix bug in PngFromPdfImageFactory where softmask is wrongly referenced. by @orrest in #1069
- Fix issue 926 by @EliotJones in #1072
- writer util did not follow reference links #1032 by @EliotJones in #1073
- fix #670 by ignoring duplicate endstream definitions by @EliotJones in #1075
- skip single letter final blocks by @EliotJones in #1076
- fix copying of sub-dictionary when keys collide by @EliotJones in #1077
- use correct bounding boxes for standard 14 glyphs #850 by @EliotJones in #1080
- back-calculate first char if last char and widths present by @EliotJones in #1081
- fix off-by-one and optimize brute force xref search #1078 by @EliotJones in #1079
- fall back to times-roman as standard 14 font when lenient by @EliotJones in #1085
- allow reading to continue if encountering an invalid surrogate pair by @EliotJones in #1084
- fix colorspace error when form xobject contains a transparency group by @EliotJones in #1088
- support bfrange having incorrect length in a cmap by @EliotJones in #1089
- add new action to run integration against common crawl corpus by @EliotJones in #1090
- Update run_common_crawl_tests.yml by @BobLd in #1091
- Remove decode parameter application from Stencil color space for consistency by @BobLd in #1092
- Update hack for 1bpc + DeviceGray by @BobLd in #1093
- when writing content to an existing page inverse any global transform #614 by @EliotJones in #1094
- add option to strip annotation by @EnraH in #492
- check for cycles during indirect reference resolution by @jan-sutter in #1097
- i merged a pr which broke the build, this updates the build to work by @EliotJones in #1099
- remove debug asserts causing test failures by @EliotJones in #1098
- Track IndirectReference instead of only ObjectNumber when checking for cycles during indirect reference resolution and add test by @BobLd in #1101
- move last uncovered operators to switch statement by @EliotJones in #1100
- rework numeric tokenizer hot path by @EliotJones in #1104
- make link copying more tolerant when adding page by @EliotJones in #1103
- handle additional broken pdf files in the common crawl set by @EliotJones in #1108
- update readme to avoid people using page.Text or asking about editing docs by @EliotJones in #110...
v0.1.10
What's Changed
- Fix GetTextOrientation by cleanly checking if rotation is divisible by 90 and fix #913 by @BobLd in #914
- Add early version of BrowserSystemFontLister by @BobLd in #920
- Remove list from FileTrailerParser.GetStartXrefPosition() by @BobLd in #922
- Reorganise Filters and make them public by @BobLd in #925
- Support decrypting V4/R4 files with AESV2 and no Length property by @Greybird in #924
- Use pdfScanner in ReadVerticalDisplacements and fix #693 and return 0… by @BobLd in #928
- Default page number to 0 in ExplicitDestination when the Dest has no page number and fix #736 by @BobLd in #930
- Move Paths, GetAnnotations() and GetOptionalContents() outside of ExperimentalAccess and mark Experimental class and reference as obsolete by @BobLd in #931
- Upgrade tests project NuGet packages by @BobLd in #932
- Optimize cross reference object offset validation by avoiding nested loop by @madelson in #935
- Revive trimming/AOT analysis by @madelson in #939
- Stop treating Warnings as Errors by @BobLd in #941
- Handle alternate Unicode name representation cXXX and fix #943 by @BobLd in #944
- Handle odd ligatures names and fix #945 by @BobLd in #946
- Update additional glyph list to latest from PDFBox by @BobLd in #948
- New GetText() option: NegativeGapAsWhitespace by @Kizaemon in #952
- Fix for IndexOutOfRangeException exception by @GrabzIt in #955
- Fix "Nightly Release" pipeline following csproj changes by @BobLd in #957
- Do not throw exception when lenient parsing in ON in CrossReferenceParser and fix #959 by @BobLd in #961
- Improve UnwrapIndexedColorSpaceBytes by @BobLd in #962
- Fix out of range exception in AnnotationProvider by @BobLd in #963
- Return a copy of the ArrayPoolBufferWriter buffer in Ascii85, AsciiHex and RunLength filters and fix #964 by @BobLd in #965
- Make ColorSpaceDetails.BaseNumberOfColorComponents public to allow for external image factories by @BobLd in #966
- Improve GlyphList by @BobLd in #967
- Properly handle ZapfDingbats font for TrueTypeSimpleFont and add tests by @BobLd in #969
- Execute RemoveStridePadding in place when possible by @BobLd in #968
- Add HexToken case in OptionalContent parsing by @simonedd in #971
- Update UglyToad.PdfPig.ConsoleRunner target framework to net8 by @BobLd in #972
- Do not throw error on Pop when stack size is 1 in lenient mode and fix #973 by @BobLd in #974
- Fix warnings about "type 'K' cannot be used as type parameter 'TKey' in the generic type or method 'Dictionary<TKey, TValue>'" by @BobLd in #976
- Refactor XObjectFactory by @BobLd in #977
- Update UnpackComponents() to account for 1bpc + DeviceGray (hack for Jbig2) by @BobLd in #978
- CcittFaxDecodeFilter: do not check for input length, invert bitmap with ref byte and fix #982 by @BobLd in #983
- Add JPX bits per component decoding by @BobLd in #986
- Issues/987 by @BobLd in #990
- Make DecodeParameterResolver class public by @BobLd in #993
- Update Microsoft and SkiaSharp NuGet packages by @BobLd in #994
- Update Microsoft NuGet packages for UglyToad.PdfPig.Package by @BobLd in #996
- Resolve image data (implementation from @kasperdaff) by @BobLd in #998
- Pass IFilterProvider to IFilter.Decode() and handle null in PdfExtensions.Resolve() by @BobLd in #999
- Improve GetExtendedGraphicsStateDictionary() and StackDictionary.TryGetValue() by @BobLd in #1004
- Better handle integer overflow in DocstrumBoundingBoxes by @BobLd in #1005
- version 0.1.10 by @BobLd in #1006
- Update run_integration_tests.yml by @BobLd in #1007
New Contributors
- @madelson made their first contribution in #935
- @Kizaemon made their first contribution in #952
- @GrabzIt made their first contribution in #955
- @simonedd made their first contribution in #971
Full Changelog: v0.1.9...v0.1.10
Red Wattle Hog
This will be the last release made solely by the current maintainer. Future releases may come from new co-maintainer(s), and you should audit your dependency upgrades on this basis.
This is the first major release in well over a year so it is not feasible to provide exhaustive release notes.
This release contains many performance improvements and bug-fixes. We also drop support for the following full framework versions:
- .NET 4.5.1
- .NET 4.5.2
- .NET 4.6
- .NET 4.6.1
If you are using full framework the newest version has additional dependencies:
- Microsoft.Bcl.HashCode (>= 1.1.1)
- System.Memory (>= 4.5.5)
The other major change is to use double instead of decimal package-wide. This should provide performance benefits and more closely matches the behavior in the official PDF specification. Where you were using decimal before you will need to switch to double.
Thanks to all the contributors!
Tamworth
This is a release with various bug-fixes and quality of life improvements but no new major features. It adds many of the supporting classes necessary for PDF rendering.
Breaking Changes
- IColor can now be of type PatternColor. This implementation will throw an error when calling ToRGBValues(). You might have to check for IColor.ColorSpace != ColorSpace.Pattern before calling this function
- Remove Details suffix from ColorSpaceDetails property names
- AlternateColorSpaceDetails renamed to AlternateColorSpace
- BaseColorSpaceDetails renamed to BaseColorSpace
- Seal IColor implementations
- Use double instead of decimal in color spaces and colors
- Move IColorSpaceContext from IOperationContext to CurrentGraphicsState
- Removed ColorSpace property from IPdfImage. Use ColorSpaceDetails.Type to get the enum value
- IColorSpaceContext's CurrentStrokingColorSpace and CurrentNonStrokingColorSpace are now of type ColorSpaceDetails (not a ColorSpace enum anymore). Use CurrentStrokingColorSpace.Type or CurrentNonStrokingColorSpace.Type to get the enum value
- Logic change to DefaultWordExtractor, a logic bug in the existing implementation was fixed, meaning the output of the default page.GetWords() may change in this version
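The PatternColor check mentioned in the breaking changes could be wrapped in a small helper along these lines. This is only a sketch: the ColorHelpers class and TryGetRgb method are hypothetical names, and the namespace and the double-based tuple returned by ToRGBValues() are assumptions based on the notes above.

```csharp
using UglyToad.PdfPig.Graphics.Colors; // assumed namespace for IColor and ColorSpace

// Hypothetical helper, not part of the library.
public static class ColorHelpers
{
    // Returns null for pattern colors instead of letting ToRGBValues() throw.
    public static (double r, double g, double b)? TryGetRgb(IColor color)
    {
        if (color == null || color.ColorSpace == ColorSpace.Pattern)
        {
            return null;
        }

        return color.ToRGBValues();
    }
}
```

Callers can then treat a null result as "no plain RGB value available" rather than catching an exception.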
.NET 4.5
Note that this version removes support for .NET 4.5. Consumers should upgrade to .NET 4.5.1 or 4.5.2.
Release notes
- Fix support for using the ZapfDingbats Standard 14 font when creating files
- Address issue with extracting CJK text from PDFs
- Fix issue with writing ShowText operations to output files when the text contained parentheses
- Error handling for Type 2 charstring parsing
- New letter properties, TextRenderingMode, StrokeColor and FillColor
- Fix for copying inline images to output files
- Enums for PDF/A-3 compliance
- Fix for library embedding PNGs with invalid information on output
- Resolve PageSize enum for landscape orientation documents
- Fix to rotation handling. The coordinates used for letters etc. are different now for rotated and/or cropped pages
- Fix to calculated positions of annotations
- Fix to adding JPG files to output documents
- Add height to Type 3 font bounding boxes and default width/height for zero values
- CreationDate and ModifiedDate are now available in DocumentInformationBuilder
- Images can be added to document builder without specifying placement rectangle, this will place the image at 0,0 with full width and height
- PdfAction exposed by Annotation class. InReplyTo property also added
- GetFields extensions method for AcroForm type
- Fix for internal links when using existing documents with annotations with PdfDocumentBuilder
- Handle name conflicts when using PdfDocumentBuilder with one or more existing documents
- Swaps internal uses of Rijndael and RijndaelManaged to Aes since these were marked as obsolete
Gloucestershire Old Spots
Changes since 0.1.6:
- Add page.SetRotation for PdfPageBuilder
- Add SkipMissingFonts to parsing options to ignore content where the font is not present or corrupt. Can result in content being missed during extraction but will enable partial extraction of retrievable content on page for corrupted files.
- Multiple bug fixes thanks to @fnatzke
- Fix to page number order bug on extraction thanks to @grinay
- Various shape drawing utilities on PdfPageBuilder thanks to @Jonowa
- Fix to issue in GrahamScan thanks to @BobLd
- Remove stray Debugger.Break from the encryption handler
- Various other bug fixes
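A minimal sketch of opting in to SkipMissingFonts, assuming it is a settable boolean on the ParsingOptions class passed to PdfDocument.Open. The file name is hypothetical.

```csharp
using System;
using UglyToad.PdfPig;

// Ignore content whose font is missing or corrupt instead of failing.
var options = new ParsingOptions { SkipMissingFonts = true };

// "damaged.pdf" is a placeholder path.
using (var document = PdfDocument.Open("damaged.pdf", options))
{
    foreach (var page in document.GetPages())
    {
        // Words drawn with a skipped font will simply be absent here.
        foreach (var word in page.GetWords())
        {
            Console.WriteLine(word.Text);
        }
    }
}
```

Because skipped content is dropped silently, this option suits best-effort extraction from damaged files rather than cases where completeness matters.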
Australian Yorkshire
Mainly bug fixes. There are some compatibility changes in the document layout analysis API. See here: https://github.com/UglyToad/PdfPig/wiki/Migration-to-0.1.6
- Fix transparency being applied for PDF/A-1
- Fixes to string handling
- .NET 6.0 support
- Handle null rather than missing encryption data
- Fixes bug with size of JPG files in documents created by PdfPig
- Better handling for unusual Type1 fonts
- Support for invisible/hidden text in document builder
- Fixes stack overflow when parsing page tree for some documents
- Fixes bug in some glyph bounding boxes for Type2 fonts
- Handle non-contiguous xref ranges when building a document
- Better location of version headers for non-compliant documents
Finnish Landrace
Changes since v0.1.4: v0.1.4...v0.1.5
0.1.5 Second Alpha
Some more bug-fixes:
- Fix for object streams in files which require brute force searching.
- Handle NullToken presence when creating documents.
- Support for PDFs where the filters are defined as indirect references (against specification).
- Support for CMYK when generating PNG images from IPdfImage.
- Support for indexed ColorSpaces where palette is stored in a string.
- Handle UTF16 strings in encrypted document dictionaries.
- Handle documents with an XMP metadata stream instead of an information dictionary.
- CCITTFaxDecode filter support.
- Tweaks to DefaultWordExtractor to try and detect word gap size based on preceding text instead of a global gap threshold.
Note that changes to DefaultWordExtractor may change the output of calls to Page.GetWords() in this version.