cutlet

Cutlet is a tool to convert Japanese to romaji. Check out the interactive demo! Also see the docs and the original blog post.

Issues do not need to be written in English.

Features:

support for Modified Hepburn, Kunreisiki, Nihonsiki systems
custom overrides for individual mappings
custom overrides for specific words
built-in exceptions list (Tokyo, Osaka, etc.)
uses foreign spelling when available in UniDic
proper nouns are capitalized
slug mode for URL generation
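
To give a feel for how the three supported systems differ, here is a small illustrative sketch. The table is hand-written for this example and covers only a few kana; `KANA_SAMPLES` and `romanize_kana` are hypothetical names, not part of cutlet, and this is not how cutlet works internally.

```python
# Illustrative only: a few kana whose romanization differs between systems.
# This is NOT cutlet's internal mapping table.
SYSTEMS = ("hepburn", "kunrei", "nihon")

KANA_SAMPLES = {
    # kana: (Hepburn, Kunreisiki, Nihonsiki)
    "し": ("shi", "si", "si"),
    "ち": ("chi", "ti", "ti"),
    "つ": ("tsu", "tu", "tu"),
    "ふ": ("fu", "hu", "hu"),
    "じ": ("ji", "zi", "zi"),
    "ぢ": ("ji", "zi", "di"),
    "づ": ("zu", "zu", "du"),
}


def romanize_kana(text, system="hepburn"):
    """Romanize the sample kana in `text`; other characters pass through."""
    column = SYSTEMS.index(system)  # raises ValueError for an unknown system
    return "".join(
        KANA_SAMPLES[ch][column] if ch in KANA_SAMPLES else ch for ch in text
    )


for system in SYSTEMS:
    print(system, romanize_kana("ふじ", system), romanize_kana("つづ", system))
```

Real text needs tokenization and dictionary lookups on top of a table like this, which is what cutlet handles.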

Things not supported:

traditional Hepburn n-to-m: Shimbashi
macrons or circumflexes: Tōkyō, Tôkyô
passport Hepburn: Satoh (but you can use an exception)
hyphenating words
Traditional Hepburn in general is not supported

Internally, cutlet uses fugashi, so you can use the same dictionary you use for normal tokenization.

Installation

Cutlet can be installed through pip as usual.

pip install cutlet

Note that if you don't have a MeCab dictionary installed, you'll also have to install one. If you're just getting started, unidic-lite is a good choice.

pip install unidic-lite

Usage

A command-line script is included for quick testing. Run cutlet and each line of stdin will be treated as a sentence. You can specify the system to use (hepburn, kunrei, nippon, or nihon) as the first argument.

$ cutlet
ローマ字変換プログラム作ってみた。
Roma ji henkan program tsukutte mita.
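
If you want similar line-by-line behavior inside your own script, something like the following sketch may help. `romanize_lines` and `main` are hypothetical helpers written for this example, not part of cutlet, and skipping blank lines is a choice made here rather than documented behavior of the bundled script.

```python
import sys


def romanize_lines(lines, converter):
    """Yield romaji for each non-empty line, treating a line as one sentence.

    `converter` is any object with a romaji(text) method, such as a
    cutlet.Cutlet instance.
    """
    for line in lines:
        sentence = line.strip()
        if sentence:  # blank lines are skipped in this sketch
            yield converter.romaji(sentence)


def main(argv=None):
    """Read sentences from stdin; the optional first argument is the system."""
    import cutlet  # imported here so romanize_lines works without cutlet

    args = sys.argv[1:] if argv is None else argv
    system = args[0] if args else "hepburn"
    for romaji in romanize_lines(sys.stdin, cutlet.Cutlet(system)):
        print(romaji)
```

Calling `main()` from a script reads stdin until end of input; calling `main(["kunrei"])` selects Kunreisiki instead.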

In code:

import cutlet
katsu = cutlet.Cutlet()
katsu.romaji("カツカレーは美味しい")
# => 'Cutlet curry wa oishii'
# You can also generate a slug suitable for URLs
katsu.slug("カツカレーは美味しい")
# => 'cutlet-curry-wa-oishii'
# You can also disable foreign spelling
katsu.use_foreign_spelling = False
katsu.romaji("カツカレーは美味しい")
# => 'Katsu karee wa oishii'
# Kunreisiki and Nihonsiki work too
katu = cutlet.Cutlet('kunrei')
katu.romaji("富士山")
# => 'Huzi yama'
# Comparison of the three systems on the same sentence
nkatu = cutlet.Cutlet('nihon')
sent = "彼女は王への手紙を読み上げた。"
katsu.romaji(sent)
# => 'Kanojo wa ou e no tegami wo yomiageta.'
katu.romaji(sent)
# => 'Kanozyo wa ou e no tegami o yomiageta.'
nkatu.romaji(sent)
# => 'Kanozyo ha ou he no tegami wo yomiageta.'
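
The list of unsupported features mentions that an exception can stand in for passport Hepburn spellings such as Satoh. Below is a minimal sketch of registering several word-level exceptions at once. It assumes the `Cutlet` object exposes `add_exception(key, value)` and that an exception only applies when the key matches a single token from the tokenizer; `apply_exceptions` is a hypothetical helper, not part of cutlet.

```python
def apply_exceptions(converter, exceptions):
    """Register each surface form -> romaji override on the converter.

    Assumes `converter` has an add_exception(key, value) method, as
    cutlet.Cutlet is believed to. Returns the converter for chaining.
    """
    for surface, romaji in exceptions.items():
        converter.add_exception(surface, romaji)
    return converter


# Usage, assuming cutlet and a MeCab dictionary are installed:
#   import cutlet
#   katsu = apply_exceptions(cutlet.Cutlet(), {"佐藤": "Satoh"})
#   katsu.romaji("佐藤")
```

Keeping the overrides in a plain dict makes it easy to load them from a file and reuse them across converters for different systems.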

Alternatives

kakasi: Historically important, but not updated since 2014.
pykakasi: Self-contained; it does segmentation on its own and uses its own dictionary.
