豪鬼メモ

一瞬千撃

Verb/Adjective Conjugation in Kindle Japanese-English Dictionary

I recently introduced a custom Japanese-English dictionary for Kindle. This post explains an enhancement that handles the conjugation of Japanese verbs and adjectives. With this feature, you can look up "走る" by selecting "走っ" in "走った". See the previous post for how to install the dictionary data.

Every language learner hates the irregularities of a language. Every English learner, including me, hates irregular verbs in English. Likewise, every Japanese learner hates the kanji (Chinese character) representation of words. Why is "さす" written as "指す", "差す", "挿す", "刺す", "注す", and so on? The Japanese writing system was developed by applying the native colloquial language, called 和語 (wago), to Chinese text, called 漢文 (kanbun). Although "指" (to point), "差" (to shift), "挿" (to insert), "刺" (to stab), and "注" (to pour) are different Chinese words, our ancestors decided to read all of them as "さす" because their meanings fall within the scope of the native colloquial word pronounced that way. This dual Japanese/Chinese nature complicates the Japanese writing system. Not only you but every Japanese pupil suffers from this complexity.

Applying the colloquial language introduced another hardship: conjugation, a.k.a. 活用 (katsuyou), of verbs, adjectives, and auxiliary verbs. For example, "さす" (指す etc. as well) becomes "ささ" when followed by "ない", "ぬ", etc. It becomes "さし" when followed by "た", "て", etc. It becomes "させ" when followed by "ば". Each word has six conjugated forms, and which one is used depends on the following word. However, there are patterns in conjugation. Verbs fall mainly into three conjugation patterns. Adjectives have two conjugation patterns. Each auxiliary verb has its own pattern. Although irregular patterns (as in "する" and "来る") are few, you still have to memorize which class each word belongs to, one by one.
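
To make the six forms and the idea of a conjugation class concrete, here is a simplified, hypothetical helper for 五段 (godan) verbs, one of the main verb classes. In that class each form swaps the final kana for another kana from the same row of the kana table. It is not part of the dictionary builder. It also knows nothing about which class a verb belongs to, so it would wrongly turn the 上一段 verb "見る" into "見ら". That is exactly the per-word knowledge you have to memorize. 音便 forms such as "走っ" and the オ段 form used before "う" ("走ろう") are deliberately left out.

```python
# Hiragana for the あ/い/う/え/お columns of the row that a 五段 verb ends in.
GODAN_ROWS = {
    "う": "わいうえお",  # う-verbs take わ in 未然形: 買う -> 買わない
    "く": "かきくけこ",
    "ぐ": "がぎぐげご",
    "す": "さしすせそ",
    "つ": "たちつてと",
    "ぬ": "なにぬねの",
    "ぶ": "ばびぶべぼ",
    "む": "まみむめも",
    "る": "らりるれろ",
}


def godan_forms(verb):
    """Return the six school-grammar forms of a 五段 (godan) verb.

    Only plain kana-substitution forms are produced; 音便 forms and the
    オ段 volitional stem are not covered. Raises KeyError if the last kana
    is not a possible 五段 ending.
    """
    stem, row = verb[:-1], GODAN_ROWS[verb[-1]]
    return {
        "未然形": stem + row[0],  # before ない, ぬ
        "連用形": stem + row[1],  # before ます, たい (and た/て for す-verbs)
        "終止形": verb,           # the dictionary (title) form
        "連体形": verb,           # before a noun
        "仮定形": stem + row[3],  # before ば
        "命令形": stem + row[3],  # command form
    }


print(godan_forms("さす"))
```

For "さす" this yields "ささ", "さし", and "させ" for the 未然形, 連用形, and 仮定形, which matches the examples above.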

Conjugation causes serious inconvenience when we use Japanese (or Japanese-English) dictionaries. Each title word in a dictionary is written in its base form, called 終止形 (shuushikei). If you want to look up a conjugated word in a sentence, you have to normalize it to the base form. For example, in "素早く走った", you have to recognize the words as "素早い", "走る", and "た" to look them up. With a paper dictionary, you as a human can do the normalization yourself. With a pop-up dictionary, however, the software has to do it for you. Although Kindle hardware such as the plain Kindle, Kindle Paperwhite, and Kindle Oasis supports conjugation normalization in a basic manner, the ability is limited. Moreover, the Kindle applications on Mac/PC/Android/iOS don't support conjugation normalization at all.

Fortunately, the Kindle dictionary format supports inflection. That is, each title word like "go" can have inflected forms like "goes", "going", "went", and "gone". We can use this feature for Japanese conjugation. For example, "走る" can have "走ら", "走り", "走っ", "走れ", and "走ろ" as inflected forms. The catch is that the dictionary editor is responsible for generating such inflected forms. We can use the Mecab morphological analyzer and its dictionary for that purpose.
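
Here is a minimal sketch of what attaching inflected forms to a title word can look like when generating the dictionary source. It assumes the `idx:entry`, `idx:orth`, `idx:infl`, and `idx:iform` elements described in Amazon's Kindle Publishing Guidelines. Only a bare minimum of attributes is used, and the markup the actual dictionary uses may differ. `make_entry` is a hypothetical name.

```python
from xml.sax.saxutils import escape, quoteattr


def make_entry(headword, inflections, definition):
    """Render one entry whose title word carries its inflected forms.

    Assumes Kindle-style idx:* markup; check the Kindle Publishing
    Guidelines for the attributes your build actually needs.
    """
    # One idx:iform per conjugated form; quoteattr adds the quotes and escapes.
    iforms = "".join(
        f"<idx:iform value={quoteattr(form)}/>" for form in sorted(inflections)
    )
    # Omit the idx:infl wrapper entirely when there is nothing to attach.
    infl = f"<idx:infl>{iforms}</idx:infl>" if inflections else ""
    return (
        '<idx:entry scriptable="yes">'
        f"<idx:orth value={quoteattr(headword)}>{escape(headword)}{infl}</idx:orth>"
        f"<div>{escape(definition)}</div>"
        "</idx:entry>"
    )


print(make_entry("走る", {"走ら", "走り", "走っ", "走れ", "走ろ"}, "to run"))
```

The inflected forms live inside `idx:orth`, so the reader software can map any of them back to the same entry without a separate entry per form.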

I applied Mecab to each Japanese title word and attached its conjugated forms if its part of speech is verb or adjective. To obtain the conjugated forms, I read Mecab's dictionary data directly. For example, these are the entries for "走る":

走る,772,772,6731,動詞,自立,*,*,五段・ラ行,基本形,走る,ハシル,ハシル
走ら,780,780,6512,動詞,自立,*,*,五段・ラ行,未然形,走る,ハシラ,ハシラ
走ん,782,782,6732,動詞,自立,*,*,五段・ラ行,未然特殊,走る,ハシン,ハシン
走ろ,778,778,6732,動詞,自立,*,*,五段・ラ行,未然ウ接続,走る,ハシロ,ハシロ
走り,788,788,6532,動詞,自立,*,*,五段・ラ行,連用形,走る,ハシリ,ハシリ
走っ,786,786,6733,動詞,自立,*,*,五段・ラ行,連用タ接続,走る,ハシッ,ハシッ
走れ,768,768,6382,動詞,自立,*,*,五段・ラ行,仮定形,走る,ハシレ,ハシレ
走れ,784,784,6382,動詞,自立,*,*,五段・ラ行,命令e,走る,ハシレ,ハシレ
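
Here is a minimal sketch of collecting conjugated forms from rows like these. It assumes the column layout shown above: surface, left ID, right ID, cost, part of speech, three POS subcategories, conjugation type, conjugation form, base form, reading, and pronunciation. Surfaces are grouped by base form for verbs (動詞) and adjectives (形容詞). The sample is embedded as a string. Reading the real dictionary files would require opening them with whatever encoding the dictionary was built with. `collect_conjugations` is a hypothetical name.

```python
import csv
import io
from collections import defaultdict

# The rows shown above, in the Mecab dictionary CSV layout.
SAMPLE_CSV = """\
走る,772,772,6731,動詞,自立,*,*,五段・ラ行,基本形,走る,ハシル,ハシル
走ら,780,780,6512,動詞,自立,*,*,五段・ラ行,未然形,走る,ハシラ,ハシラ
走ん,782,782,6732,動詞,自立,*,*,五段・ラ行,未然特殊,走る,ハシン,ハシン
走ろ,778,778,6732,動詞,自立,*,*,五段・ラ行,未然ウ接続,走る,ハシロ,ハシロ
走り,788,788,6532,動詞,自立,*,*,五段・ラ行,連用形,走る,ハシリ,ハシリ
走っ,786,786,6733,動詞,自立,*,*,五段・ラ行,連用タ接続,走る,ハシッ,ハシッ
走れ,768,768,6382,動詞,自立,*,*,五段・ラ行,仮定形,走る,ハシレ,ハシレ
走れ,784,784,6382,動詞,自立,*,*,五段・ラ行,命令e,走る,ハシレ,ハシレ
"""

CONJUGATING_POS = {"動詞", "形容詞"}  # verbs and adjectives


def collect_conjugations(csv_text):
    """Map each verb/adjective base form to the set of its conjugated surfaces."""
    forms = defaultdict(set)
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 13:
            continue  # skip blank or malformed lines
        surface, pos, base = row[0], row[4], row[10]
        if pos not in CONJUGATING_POS:
            continue
        if surface != base:  # the base form itself is already the title word
            forms[base].add(surface)
    return forms


print(sorted(collect_conjugations(SAMPLE_CSV)["走る"]))
```

Using a set matters here. "走れ" appears twice, as 仮定形 and as 命令e, but it only needs to be registered once as an inflected form.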

Mecab's dictionary also contains compound words like "血走る", "先走る", and "突っ走る". However, as compound words are virtually unlimited, not all of them are covered. For example, whereas "見落とす" is covered, "聞き落とす" is not. In such cases, I pick up the last component, like "落とす", get its conjugated forms, like "落とさ" and "落とし", and synthesize the conjugated forms of the whole word, like "聞き落とさ" and "聞き落とし".
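
Here is a minimal sketch of that fallback. The post doesn't say how the last component is identified, so this sketch assumes one approach: it tries progressively shorter suffixes of the title word against a table of known base forms. Segmenting the word with Mecab would be another option. `synthesize_compound_forms` and `min_suffix_len` are hypothetical names.

```python
def synthesize_compound_forms(headword, known_forms, min_suffix_len=2):
    """Conjugated forms of `headword`, borrowed from its longest known suffix.

    `known_forms` maps a base form (e.g. 落とす) to the set of its conjugated
    surfaces. Returns an empty set if no suffix of the headword is known.
    """
    if headword in known_forms:  # the compound itself is in the dictionary
        return set(known_forms[headword])
    # Try suffixes from longest to shortest, always keeping a non-empty prefix.
    for start in range(1, len(headword) - min_suffix_len + 1):
        suffix = headword[start:]
        if suffix in known_forms:
            prefix = headword[:start]  # e.g. 聞き
            return {prefix + form for form in known_forms[suffix]}
    return set()


known = {"落とす": {"落とさ", "落とし"}}
print(sorted(synthesize_compound_forms("聞き落とす", known)))
# ['聞き落とさ', '聞き落とし']
```

Trying the longest suffix first keeps the borrowed part as specific as possible. The minimum suffix length is a heuristic guard against matching a stray one-kana tail.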

As a result, I could register conjugated forms for almost all verbs and adjectives. So, you can look up "走る" by selecting "走ら" in "走らない", "走り" in "走りたい", etc. This is a conspicuous advantage over the default dictionary.
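
To see why registering inflected forms is enough, here is a toy model of the lookup side: a reverse index from every registered string to its title words. It assumes a selection is matched exactly against registered forms. It only illustrates the idea and is not how the Kindle software works internally, which this post doesn't describe. `build_lookup_index` and `look_up` are hypothetical names.

```python
from collections import defaultdict


def build_lookup_index(entries):
    """Map every title word and inflected form to the title words it belongs to."""
    index = defaultdict(set)
    for headword, inflections in entries.items():
        index[headword].add(headword)
        for form in inflections:
            index[form].add(headword)
    return index


def look_up(selection, index):
    """Return the title words for a selected string (empty set if unknown)."""
    return set(index.get(selection, set()))


entries = {"走る": {"走ら", "走り", "走っ", "走れ", "走ろ"}}
index = build_lookup_index(entries)
print(look_up("走っ", index))
```

Selecting "走っ" resolves to "走る". Selecting the whole "走った" finds nothing, because "た" is a separate auxiliary verb with its own entry.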