utf8proc is a small, clean C
library that provides Unicode normalization, case-folding, and other
operations for data in the UTF-8
encoding. It was initially
developed by Jan
Behrens and the rest of the Public Software
Group, who deserve nearly all
of the credit for this package. With the blessing of the Public
Software Group, the Julia developers have
taken over development of utf8proc, since the original developers have
moved to other projects.
utf8proc is used for basic Unicode
support in the Julia language, and the Julia
developers became involved because they wanted to add Unicode 7 support and other features; it is now regularly updated to keep up with recent Unicode releases.
There are also utf8proc wrappers for Ruby and Rust (github). (The original utf8proc package also included PostgreSQL bindings.)
The utf8proc package is licensed under the
free/open-source MIT "expat"
license (plus certain Unicode
data governed by the similarly permissive Unicode data
license); please see
the included LICENSE.md file for more detailed information.
Quick Start
Typical users should download a utf8proc release rather than cloning directly from GitHub.
To compile the C library, run make. You can also install the library and header file with make install (by default into /usr/local/lib and /usr/local/include, but this can be changed by passing prefix=/some/dir to make). make check runs some tests, and make clean deletes all of the generated files.
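Alternatively, you can compile with cmake, e.g. by configuring an out-of-source build with cmake -S . -B build and then building it with cmake --build build.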
The included Makefile supports GNU/Linux flavors and macOS with gcc-like compilers; Windows users will typically use cmake.
For other Unix-like systems and other compilers, you may need to pass modified settings to make in order to use the correct compilation flags for building shared libraries on your system.
For HP-UX with HP's aCC compiler and GNU Make (installed as gmake), you can compile with
To run gmake install you will need GNU coreutils for the install command, and you may want to pass prefix=/opt libdir=/opt/lib/hpux32 or similar to change the installation location.
Using with CMake
A CMake Config-file package is provided. To use utf8proc in a CMake project:
The C library is found in this directory after successful compilation
and is named libutf8proc.a (for the static library) and
libutf8proc.so (for the dynamic library).
The Unicode version supported is 17.0.0.
For Unicode normalizations, the following option flags are used (each is defined in utf8proc.h with a UTF8PROC_ prefix, e.g. UTF8PROC_STABLE):
Normalization Form C: STABLE, COMPOSE
Normalization Form D: STABLE, DECOMPOSE
Normalization Form KC: STABLE, COMPOSE, COMPAT
Normalization Form KD: STABLE, DECOMPOSE, COMPAT
C Library
The documentation for the C library is found in the utf8proc.h header file.
utf8proc_map is the function you will most likely be using for mapping UTF-8
strings, unless you want to allocate memory yourself.