passport Hepburn: Satoh (but you can use an exception)
hyphenating words
Traditional Hepburn in general is not supported
Internally, cutlet uses fugashi, so you can
use the same dictionary you use for normal tokenization.
Installation
Cutlet can be installed through pip as usual.
pip install cutlet
Note that if you don't have a MeCab dictionary installed, you'll also have to
install one. If you're just getting started,
unidic-lite is a good choice.
pip install unidic-lite
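If it is unclear whether a dictionary is already present, a quick check along these lines may help. This is a minimal sketch and not part of cutlet: the helper name `find_mecab_dictionary` is hypothetical, and it assumes the dictionary was installed from pip as `unidic` or `unidic-lite` (importable as `unidic` and `unidic_lite`). A dictionary installed some other way, for example through a system package manager, will not be detected.

```python
import importlib.util
from typing import Optional, Sequence

# Assumption: these are the import names of the pip packages
# `unidic` and `unidic-lite`. Other dictionaries are not checked.
DEFAULT_CANDIDATES = ("unidic", "unidic_lite")


def find_mecab_dictionary(
    candidates: Sequence[str] = DEFAULT_CANDIDATES,
) -> Optional[str]:
    """Return the first importable dictionary module name, or None."""
    for name in candidates:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is not None:
            return name
    return None
```

If `find_mecab_dictionary()` returns `None`, installing unidic-lite as shown above is probably the simplest fix.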
Usage
A command-line script is included for quick testing. Run cutlet, and each
line of stdin will be treated as a sentence. You can specify the system to use
(hepburn, kunrei, nippon, or nihon) as the first argument.
$ cutlet
ローマ字変換プログラム作ってみた。
Roma ji henkan program tsukutte mita.
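The line-per-sentence behavior described above can be approximated in a few lines of Python. This is a rough sketch, not the actual script shipped with cutlet: the names `romanize_lines` and `run` are hypothetical, and skipping blank lines is a choice made for this example only.

```python
from typing import Callable, Iterable, Iterator


def romanize_lines(
    lines: Iterable[str], romanize: Callable[[str], str]
) -> Iterator[str]:
    """Treat each input line as one sentence and yield its romanization."""
    for line in lines:
        sentence = line.strip()
        if not sentence:
            continue  # choice made for this sketch: skip blank lines
        yield romanize(sentence)


def run(system: str = "hepburn") -> None:
    """Read sentences from stdin and print one romanized line for each."""
    import sys

    import cutlet  # third-party; imported here so romanize_lines works without it

    katsu = cutlet.Cutlet(system)
    for romanized in romanize_lines(sys.stdin, katsu.romaji):
        print(romanized)
```

Calling `run("kunrei")` would then play the role of passing the system as the first argument. Keeping `romanize_lines` independent of cutlet makes it easy to try with any other string function.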
In code:
import cutlet
katsu = cutlet.Cutlet()
katsu.romaji("カツカレーは美味しい")
# => 'Cutlet curry wa oishii'

# you can print a slug suitable for urls
katsu.slug("カツカレーは美味しい")
# => 'cutlet-curry-wa-oishii'

# You can disable using foreign spelling too
katsu.use_foreign_spelling = False
katsu.romaji("カツカレーは美味しい")
# => 'Katsu karee wa oishii'

# kunreisiki, nihonsiki work too
katu = cutlet.Cutlet('kunrei')
katu.romaji("富士山")
# => 'Huzi yama'

# comparison
nkatu = cutlet.Cutlet('nihon')

sent = "彼女は王への手紙を読み上げた。"
katsu.romaji(sent)
# => 'Kanojo wa ou e no tegami wo yomiageta.'
katu.romaji(sent)
# => 'Kanozyo wa ou e no tegami o yomiageta.'
nkatu.romaji(sent)
# => 'Kanozyo ha ou he no tegami wo yomiageta.'
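The list of unsupported features mentions that passport Hepburn spellings such as Satoh can be handled with an exception. One way this might look is sketched below. The helper `register_exceptions` and the `PASSPORT_SPELLINGS` table are hypothetical names for illustration; the sketch relies on the `add_exception(key, val)` method of `Cutlet` objects and assumes that an exception applies when the tokenizer emits the key as a single token.

```python
from typing import Any, Mapping

# Hypothetical example data: surface form -> preferred romanization.
PASSPORT_SPELLINGS = {"佐藤": "Satoh"}


def register_exceptions(katsu: Any, exceptions: Mapping[str, str]) -> Any:
    """Register each surface-to-romaji override on a Cutlet-like object.

    `katsu` only needs an `add_exception(key, val)` method, so a
    cutlet.Cutlet instance is expected to work here.
    """
    for surface, romaji in exceptions.items():
        katsu.add_exception(surface, romaji)
    return katsu
```

With cutlet installed, `katsu = register_exceptions(cutlet.Cutlet(), PASSPORT_SPELLINGS)` should give a romanizer that prefers the listed spellings. Whether a given word is matched depends on how the dictionary segments it, so it is worth checking the output for each entry.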
Alternatives
kakasi: Historically important, but not updated since 2014.
pykakasi: Self-contained; it does its own segmentation and uses its own dictionary.