unicode-transforms: Unicode normalization
Flags
Manual Flags
| Name | Description | Default |
|---|---|---|
| dev | Developer build | Disabled |
| bench-show | Use bench-show to compare benchmarks | Disabled |
| has-icu | Use text-icu for benchmark and test comparisons | Disabled |
| has-llvm | Use llvm backend (faster) for compilation | Disabled |
| use-gauge | Use gauge instead of tasty-bench for benchmarking | Disabled |
Use -f <flag> to enable a flag, or -f -<flag> to disable that flag.
Downloads
- unicode-transforms-0.4.0.1.tar.gz [browse] (Cabal source package)
- Package description (revised from the package)
Note: This package has metadata revisions in the cabal description newer than included in the tarball. To unpack the package including the revisions, use 'cabal get'.
| Versions [RSS] | 0.1.0.1, 0.2.0, 0.2.1, 0.3.0, 0.3.1, 0.3.2, 0.3.3, 0.3.4, 0.3.5, 0.3.6, 0.3.7, 0.3.7.1, 0.3.8, 0.4.0, 0.4.0.1 (info) |
|---|---|
| Change log | Changelog.md |
| Dependencies | base (>=4.8 && <4.22), bytestring (>=0.9 && <0.13), ghc-prim (>=0.2 && <0.14), text (>=1.1.1 && <=1.2.5.0 || >=2.0 && <2.2), unicode-data (>=0.2 && <0.9) [details] |
| Tested with | ghc ==8.0.2, ghc ==8.2.2, ghc ==8.4.4, ghc ==8.6.5, ghc ==8.8.4, ghc ==8.10.7, ghc ==9.0.2, ghc ==9.2.8, ghc ==9.4.7, ghc ==9.6.3, ghc ==9.8.1 |
| License | BSD-3-Clause |
| Copyright | 2016-2017 Harendra Kumar, 2014–2015 Antonio Nikishaev |
| Author | Harendra Kumar |
| Maintainer | harendra.kumar@gmail.com |
| Revised | Revision 8 made by adithyaov at 2025-10-20T18:38:28Z |
| Category | Data, Text, Unicode |
| Home page | https://github.com/composewell/unicode-transforms |
| Bug tracker | https://github.com/composewell/unicode-transforms/issues |
| Source repo | head: git clone https://github.com/composewell/unicode-transforms |
| Uploaded | by adithyaov at 2022-03-21T11:41:13Z |
| Distributions | Arch:0.4.0.1, Debian:0.3.6, Fedora:0.4.0.1, LTSHaskell:0.4.0.1, NixOS:0.4.0.1, Stackage:0.4.0.1, openSUSE:0.4.0.1 |
| Reverse Dependencies | 11 direct, 186 indirect [details] |
| Executables | chart |
| Downloads | 40054 total (34 in the last 30 days) |
| Rating | (no votes yet) [estimated by Bayesian average] |
| Status | Docs available [build log] Last success reported on 2022-03-21 [all 1 reports] |
Readme for unicode-transforms-0.4.0.1
Unicode Transforms
Fast Unicode 14.0.0 normalization in Haskell (NFC, NFKC, NFD, NFKD).
What is normalization?
Unicode characters with adornments (e.g. Á) can be represented in two different forms: as a single composed character (U+00C1 = Á) or as multiple decomposed characters (U+0041(A) U+0301( ́ ) = Á). The two forms are encoded as different byte sequences, but to a human reader they look exactly the same.
A plain byte comparison may report that two strings are different even though
they are equivalent. We need to convert both strings to a normalized form
using the Unicode Character Database before we can compare them for
equivalence. For example:
>> :set -XOverloadedStrings
>> import Data.Text.Normalize
>> normalize NFC "\193" == normalize NFC "\65\769"
True
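The GHCi session above can be expanded into a small module. This is only a sketch: `canonEq` and `compatEq` are hypothetical helper names introduced for illustration and are not part of the package. Only `normalize` and the `NormalizationMode` constructors come from Data.Text.Normalize.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import qualified Data.Text as T
import Data.Text.Normalize (NormalizationMode (..), normalize)

-- Hypothetical helper: two strings are canonically equivalent
-- if they are equal after NFC normalization.
canonEq :: Text -> Text -> Bool
canonEq a b = normalize NFC a == normalize NFC b

-- Hypothetical helper: compatibility equivalence, using NFKC.
compatEq :: Text -> Text -> Bool
compatEq a b = normalize NFKC a == normalize NFKC b

composed, decomposed, ligature :: Text
composed   = "\193"     -- U+00C1, precomposed A with acute
decomposed = "\65\769"  -- U+0041 followed by U+0301 (combining acute)
ligature   = "\xFB01"   -- U+FB01, LATIN SMALL LIGATURE FI

-- Number of code points in the NFD form of the precomposed character.
nfdLength :: Int
nfdLength = T.length (normalize NFD composed)
```

Here `composed == decomposed` is False on the raw values, while `canonEq composed decomposed` is True. The ligature shows the difference between the canonical and compatibility forms: `compatEq ligature "fi"` holds, but `canonEq ligature "fi"` does not, because NFC leaves compatibility characters untouched.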
Performance
The table below compares the normalization performance of this package
(v0.3.7) with the text-icu package, which uses the ICU C++ library
(ICU4C 65.1), on macOS. The benchmarks measure the time taken in
milliseconds to normalize files in different languages and normalization
forms with each package. A positive % Diff means unicode-transforms is
faster. unicode-transforms outperforms ICU in 19 of the 28 benchmarks; all
nine cases where ICU is faster use a composing form (NFC or NFKC).
| Benchmark | unicode-transforms (ms) | ICU (ms) | % Diff |
|---|---|---|---|
| NFKD/Korean | 7.78 | 37.10 | +376.87 |
| NFD/Korean | 7.86 | 37.06 | +371.50 |
| NFKD/Vietnamese | 6.85 | 12.48 | +82.20 |
| NFKD/Deutsch | 2.17 | 3.55 | +63.30 |
| NFKD/English | 1.71 | 2.78 | +62.30 |
| NFKC/Korean | 4.77 | 7.65 | +60.28 |
| NFD/Deutsch | 2.24 | 3.53 | +57.41 |
| NFD/English | 1.76 | 2.77 | +57.32 |
| NFC/Vietnamese | 10.66 | 16.63 | +56.00 |
| NFKC/Vietnamese | 10.95 | 16.58 | +51.43 |
| NFD/Devanagari | 6.48 | 8.68 | +34.10 |
| NFC/Devanagari | 6.77 | 8.49 | +25.48 |
| NFD/AllChars | 6.18 | 7.41 | +19.91 |
| NFD/Japanese | 7.80 | 9.20 | +17.99 |
| NFKC/Devanagari | 7.33 | 8.48 | +15.74 |
| NFKD/Japanese | 8.71 | 10.05 | +15.39 |
| NFD/Vietnamese | 5.94 | 6.83 | +14.99 |
| NFKD/Devanagari | 7.59 | 8.68 | +14.27 |
| NFKD/AllChars | 9.80 | 10.66 | +8.82 |
| NFKC/Deutsch | 3.21 | 3.18 | -0.72 |
| NFC/Korean | 4.62 | 4.38 | -5.35 |
| NFKC/English | 2.21 | 2.06 | -6.88 |
| NFC/English | 2.19 | 2.04 | -7.21 |
| NFKC/AllChars | 14.67 | 9.75 | -50.51 |
| NFC/Deutsch | 3.02 | 1.95 | -54.39 |
| NFKC/Japanese | 12.46 | 5.42 | -129.93 |
| NFC/Japanese | 11.90 | 3.04 | -292.04 |
| NFC/AllChars | 9.72 | 3.58 | -171.63 |
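The readme does not say how the % Diff column is computed. The listed values appear consistent with a signed difference measured relative to the faster of the two timings, up to rounding of the displayed times. The sketch below works under that assumption; `pctDiff` is a hypothetical name and is not taken from this package or from any benchmarking tool.

```haskell
-- Hypothetical helper (an assumption inferred from the table, not a
-- documented formula): signed percentage difference between two timings,
-- measured relative to the smaller (faster) one.
--
-- First argument:  time taken by unicode-transforms, in ms.
-- Second argument: time taken by ICU, in ms.
pctDiff :: Double -> Double -> Double
pctDiff ours theirs
  | ours <= theirs = (theirs - ours) / ours * 100            -- we are faster: positive
  | otherwise      = negate ((ours - theirs) / theirs * 100) -- we are slower: negative
```

For example, `pctDiff 7.78 37.10` gives roughly 376.9, close to the +376.87 listed for NFKD/Korean; the small gap is what one would expect if the column was computed from unrounded timings.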
Talks
Contributing
Please use https://github.com/harendra-kumar/unicode-transforms to raise issues, or send pull requests.