豪鬼メモ

一瞬千撃

Open-source Japanese-English Dictionary for Kindle

I made a Japanese-English dictionary for Kindle from WordNet, Wiktionary, and EDict2. You can download the data for free. This post describes how I made it. Because the intended users are English speakers, I am writing this article in English.


Please download this dictionary file in MOBI format and install it on your Kindle device. Put the MOBI file in the "documents/dictionaries" directory. Then, go to the menu -> "Language & Dictionaries" -> "Dictionaries" -> "Japanese" and select "Union Japanese-English Dictionary". After that, you can look up any Japanese word by selecting it in a sentence.

If you want to use this dictionary on Kindle for Mac/PC/Android/iOS, you have to replace a default dictionary with it. First, install the Progressive Japanese-English Dictionary in the normal way. Download this MOBI file, whose ASIN has been changed to "B00DQB1G3K", the ASIN of the default dictionary. Then, find the data directory of the Kindle app. On Mac, it is "~/Library/Application Support/Kindle/My Kindle Content". On Android, it is "Android/data/com.amazon.kindle/files". The data directory contains a directory named "B00DQB1G3K_EBOK", which holds "B00DQB1G3K_EBOK.prc" or "B00DQB1G3K_EBOK.azw". Replace that file with the downloaded custom MOBI file. Finally, restart the Kindle app. The content of the default dictionary is now replaced by the custom dictionary. To restore the default content, simply remove the "B00DQB1G3K_EBOK" directory. After you restart the Kindle app, the default dictionary is downloaded again automatically.
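If you prefer to script the replacement, here is a minimal Python sketch of the step above. The function name and arguments are my own inventions for illustration, and I assume the replaced file keeps its original name and extension. Close the Kindle app before running something like this.

```python
import shutil
from pathlib import Path

DEFAULT_ASIN = "B00DQB1G3K"  # ASIN of the default dictionary, as given above


def replace_default_dictionary(data_dir, custom_mobi, asin=DEFAULT_ASIN):
    """Overwrite the default dictionary file with the custom MOBI file.

    Hypothetical helper: returns the path that was overwritten and raises
    FileNotFoundError if the default dictionary is not installed yet.
    """
    book_dir = Path(data_dir) / f"{asin}_EBOK"
    for extension in (".prc", ".azw"):
        target = book_dir / f"{asin}_EBOK{extension}"
        if target.is_file():
            # Assumption: the app keeps looking for the original file name,
            # so only the content is swapped.
            shutil.copyfile(custom_mobi, target)
            return target
    raise FileNotFoundError(f"no default dictionary file found in {book_dir}")
```

For example, on Mac you would pass the expanded "My Kindle Content" path as `data_dir` and the downloaded file as `custom_mobi`.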

I had already made an English-Japanese dictionary from WordNet, Wiktionary, and EDict2 for myself, to learn English by reading English books on Kindle. There's also an online search system for both the English-Japanese and Japanese-English directions. I also made English-Japanese dictionary data for Kindle.

The structure of the English-Japanese dictionary is simple. Each entry has a title word in English and its possible translations or definitions in Japanese. This is a simplified example.

bank : 銀行, バンク, 土手, 塚
mound : 土手, 塚

The word "bank" has two general meanings: a monetary institution and raised terrain. Although WordNet helps us distinguish meanings, we don't care about that here. Anyway, by inverting this structure, we can make a Japanese-English dictionary, where title words are in Japanese and descriptions are English translations.

銀行 : bank
バンク : bank
土手 : bank, mound
塚 : bank, mound
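
The inversion itself is easy to sketch in Python. This is only an illustration using the toy data above; the function name is made up and the real pipeline deals with far more metadata.

```python
from collections import defaultdict


def invert_dictionary(en_to_ja):
    """Invert {english: [japanese, ...]} into {japanese: [english, ...]}."""
    ja_to_en = defaultdict(list)
    for english, japanese_words in en_to_ja.items():
        for japanese in japanese_words:
            # Skip duplicates so each translation is listed once per title word.
            if english not in ja_to_en[japanese]:
                ja_to_en[japanese].append(english)
    return dict(ja_to_en)


en_to_ja = {
    "bank": ["銀行", "バンク", "土手", "塚"],
    "mound": ["土手", "塚"],
}
ja_to_en = invert_dictionary(en_to_ja)
for japanese, english_words in ja_to_en.items():
    print(japanese, ":", ", ".join(english_words))
```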

Kindle bundles a commercial Japanese-English dictionary, in which the English descriptions of each word are detailed and well organized. I made yet another Japanese-English dictionary just for fun. However, the new dictionary has some advantages too. First, the coverage is high thanks to the combination of several data sources. Second, the description of each word is concise (mere translated phrases), so you can grasp approximate meanings quickly. Finally, as the dictionary and the data sources are open source, anyone can customize and enhance them.

To make the dictionary useful in actual usage on Kindle, I had to do more than invert the structure. The most important thing is to have decent coverage of Japanese words, on condition that the file size doesn't exceed the limit of a Kindle book. Whereas the original union dictionary has up to one million Japanese words, I had to select fewer than 25 thousand words. I scored all words with various factors such as cooccurrence probability in parallel corpora and occurrence probability in monolingual corpora. As a result, basic words like "走る" (run, sprint) are certainly covered, and many practical idioms like "水を差す" (hamper, rain on) are also covered.
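
To give a rough idea of this kind of selection, here is a Python sketch. It is not the actual scoring formula: the weighted log-probability mix, the weight, the function names, and all the numbers are assumptions made up for illustration.

```python
import heapq
import math


def score_word(cooc_prob, occur_prob, cooc_weight=0.5):
    """Mix two probabilities in log space (an assumed formula, not the real one)."""
    floor = 1e-12  # avoids log(0) for words missing from a corpus
    return (cooc_weight * math.log(max(cooc_prob, floor))
            + (1.0 - cooc_weight) * math.log(max(occur_prob, floor)))


def select_words(stats, limit):
    """Return the `limit` best-scoring words, best first.

    `stats` maps each word to a pair of
    (cooccurrence probability, occurrence probability).
    """
    return heapq.nlargest(limit, stats, key=lambda word: score_word(*stats[word]))


# Made-up statistics; "稀語" stands in for a rare word that gets dropped.
stats = {
    "走る": (0.02, 0.001),
    "水を差す": (0.005, 0.0001),
    "稀語": (0.00001, 0.0000001),
}
selected = select_words(stats, 2)
print(selected)
```

With a fixed limit, `heapq.nlargest` avoids sorting the full list of a million candidates, although a plain sort would also work at this scale.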

A Japanese-English dictionary should provide word definitions or explanations of Japanese words in English, as well as English translations. English translations can be extracted easily just by inverting the structure of the English-Japanese dictionary. Meanwhile, extracting definitions requires more complex procedures. I used the structure of WordNet and Japanese WordNet. Let's say I extract "water" as a translation of "水". I scan all WordNet synsets to which "water" belongs and check whether "水" is attached as a translation. If both conditions are satisfied, I pick up the gloss of the synset. I take the synonyms in the synset too. Translations that don't meet the conditions are simply listed as-is. As a result, the entry for "水" has the following content.

水  (みず)
water, H2O
- binary compound that occurs at room temperature as a clear colorless odorless
  tasteless liquid; freezes into ice below 0 degrees centigrade and boils above
  100 degrees centigrade; widely used as a solvent
water, water supply, water system
- a facility that provides a source of water
water
- once thought to be one of four elements composing the universe (Empedocles)
eau
waterness
fluid
liquid
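
The matching logic can be sketched in Python as follows. The in-memory synset layout (dicts with "words", "translations", and "gloss") and the function name are assumptions for illustration, not the real WordNet or Japanese WordNet API, and the third synset is invented to show the filter.

```python
def build_entry(japanese, translations, synsets):
    """Build (synonyms, gloss) items for one Japanese title word.

    A synset contributes its gloss only if it contains the English
    translation AND has the Japanese word attached as a translation.
    """
    items = []
    used_synsets = set()
    explained = set()
    for english in translations:
        for index, synset in enumerate(synsets):
            if index in used_synsets:
                continue
            if english in synset["words"] and japanese in synset["translations"]:
                # The translation leads, followed by the other synonyms.
                synonyms = [english] + [w for w in synset["words"] if w != english]
                items.append((synonyms, synset["gloss"]))
                used_synsets.add(index)
                explained.add(english)
    # Translations without a matching synset are listed as-is, with no gloss.
    for english in translations:
        if english not in explained:
            items.append(([english], None))
    return items


synsets = [
    {"words": ["water", "H2O"], "translations": ["水"],
     "gloss": "binary compound that occurs at room temperature as a clear "
              "colorless odorless tasteless liquid"},
    {"words": ["water", "water supply", "water system"], "translations": ["水", "水道"],
     "gloss": "a facility that provides a source of water"},
    # Invented synset: "水" is not attached, so its gloss must not be used.
    {"words": ["fluid"], "translations": ["流体"],
     "gloss": "illustrative gloss, not taken from WordNet"},
]
entry = build_entry("水", ["water", "eau", "fluid"], synsets)
for synonyms, gloss in entry:
    print(", ".join(synonyms))
    if gloss:
        print("- " + gloss)
```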

Moreover, translations are sorted in descending order of cooccurrence rate in bilingual corpora so that the most likely English translation comes first. This feature is very important for people who learn Japanese by reading many Japanese books on Kindle. When you read books written in a foreign language you are learning, the pop-up dictionary should show concise translations at first glance, rather than detailed usage notes and grammatical details.
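
The ordering amounts to a sort by cooccurrence rate, roughly like the Python sketch below. The rates are made-up numbers, and treating a translation with no corpus evidence as rate 0 is my assumption for this illustration.

```python
def sort_translations(translations, cooc_rate):
    """Order translations so the most frequently cooccurring one comes first.

    Translations missing from `cooc_rate` count as 0.0 and keep their
    original relative order, because Python's sort is stable.
    """
    return sorted(translations, key=lambda t: cooc_rate.get(t, 0.0), reverse=True)


# Made-up cooccurrence rates for translations of "水".
cooc_rate = {"water": 0.6, "liquid": 0.1, "fluid": 0.05}
ordered = sort_translations(["fluid", "water", "eau", "liquid"], cooc_rate)
print(ordered)
```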

Most Japanese title words come with their pronunciation in hiragana. I use the MeCab morphological analyzer to convert kanji characters into katakana and then convert the katakana into hiragana with a map. Although the accuracy of MeCab is not perfect, I guess more than 95% of title words have proper pronunciations.
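
The katakana-to-hiragana step can be sketched in Python like this. The sketch builds the map from the Unicode layout instead of writing it by hand, which is my choice for brevity; the MeCab step is left out, so the input is assumed to be a katakana reading already.

```python
# Katakana ァ (U+30A1) through ヶ (U+30F6) sit exactly 0x60 above their
# hiragana counterparts ぁ (U+3041) through ゖ (U+3096).
KATAKANA_TO_HIRAGANA = {code: code - 0x60 for code in range(0x30A1, 0x30F7)}


def katakana_to_hiragana(text):
    """Convert katakana to hiragana; other characters (e.g. ー) are kept."""
    return text.translate(KATAKANA_TO_HIRAGANA)


print(katakana_to_hiragana("タベル"))
print(katakana_to_hiragana("ミズ"))
```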

Kindle dictionaries support lookups by inflected forms. The feature can be used for any kind of variant representation of a word. I use it to increase coverage by registering the following variants as inflections of each word. I use the MeCab morphological analyzer to obtain the part of speech of each component.

  • hiragana pronunciation: 食べる -> たべる
  • stripping a preceding particle: を投げる -> 投げる
  • stripping a succeeding particle: 海外に -> 海外
  • stripping a succeeding auxiliary verb of "sahen" verbs: 運動する -> 運動
  • stripping the suffix of na-adjectives ("adjective verbs"): 華麗な -> 華麗
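
A rough Python sketch of generating these variants follows. Note the big simplification: it matches plain strings against short, hand-picked particle lists, which are assumptions for illustration, whereas the real process relies on MeCab part-of-speech tags. Plain string matching misfires on words that merely end with the same character (for example かに would lose its に), so treat it as a sketch only.

```python
LEADING_PARTICLES = ("を", "に", "が", "で", "と")   # assumed, incomplete list
TRAILING_PARTICLES = ("に", "を", "が", "で", "と")  # assumed, incomplete list


def make_variants(word, reading):
    """Return lookup variants of a title word, without duplicates."""
    variants = []

    def add(variant):
        if variant and variant != word and variant not in variants:
            variants.append(variant)

    add(reading)  # hiragana pronunciation
    for particle in LEADING_PARTICLES:
        if word.startswith(particle) and len(word) > len(particle):
            add(word[len(particle):])
    for particle in TRAILING_PARTICLES:
        if word.endswith(particle) and len(word) > len(particle):
            add(word[:-len(particle)])
    if word.endswith("する") and len(word) > 2:
        add(word[:-2])  # "sahen" verb -> noun
    if word.endswith("な") and len(word) > 1:
        add(word[:-1])  # na-adjective -> stem
    return variants


for word, reading in [("食べる", "たべる"), ("を投げる", "をなげる"),
                      ("海外に", "かいがいに"), ("運動する", "うんどうする"),
                      ("華麗な", "かれいな")]:
    print(word, "->", make_variants(word, reading))
```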

The Kindle dictionary itself supports lookups of conjugated words in Japanese, like "走っ(た)" -> "走る". So, there's no need to register all conjugated forms of verbs, adjectives, etc.


In conclusion, I hope this dictionary is helpful to English speakers who are learning Japanese by reading books on Kindle. I'm learning English in the opposite direction. From my first-hand experience, I believe that the combination of WordNet, Wiktionary, and EDict2 gives better coverage than many commercial dictionaries, while the quality is good enough.

BTW, I guess that many people learn Japanese because they are attracted to Japanese pop culture and subculture. I myself enjoy reading English editions of Japanese "light novel" titles. If you are like me, try their original Japanese editions. The "青春ブタ野郎", "俺の青春ラブコメはまちがっている", and "幼女戦記" series are my favorites. Anyway, reading excellent titles is the best way to learn Japanese. If you are not very good at reading Japanese yet, try the English editions first and then work on the Japanese editions.