Sunday, December 04, 2005

The Unbearable Lightness of Being...

...made to use a Japanese keyboard at work.

Clicky clicky for the pertinent technical information; or maybe you would prefer to open the fullsize version of the image in another tab (you are using a decent web browser to read this blog, right?) for reference while you read this blentry. For more information than anyone could possibly need on keyboard layouts, it is Wikipedia to the rescue!

The differences between a standard US keyboard--by which I mean the 105-key model with the Windows keys that even we non-Windows users have thanks to the Software Company That Shall Remain Nameless (and by this I mean Microsoft)--and a 106-key Japanese keyboard seem slight at first glance. Both are based on the almost-ubiquitous QWERTY layout, and most of the keys seem to be in the right place. In fact, the most obvious difference is that each alphanumeric or symbol key has an extra set of overlays (meaning the letters, numbers, or symbols printed on the key itself). Alphabetic keys have two characters on them: the normal Latin letter (e.g. 'A') and a hiragana character (e.g. on the aforementioned 'A' key, ち, chi; thanks to Tim Meggs for catching my earlier mistake here!). Numeric and symbol keys (e.g. '1', '2', '[', ';', etc.) can have up to four characters printed on them: the normal numeral or symbol, the symbol produced when the key is pressed while holding Shift, and the same pair for the Japanese kana keymap.

This is where things start to get complicated. You see, the Japanese keyboard can operate in four different modes: Latin, Hiragana, Katakana, and Henkan (変換, literally "conversion"). The first three are straightforward: you hit a key, and the symbol printed on the physical key (which I am calling its "overlay") appears on the screen. Well, almost: in Katakana mode, the katakana corresponding to the hiragana character on the overlay appears on the screen, since there is a one-to-one mapping between the two syllabaries. In Henkan mode, WYTIWYG (What You Type Is What You Get--just as good as WYSIWYG, right?): you type Latin letters and they show up as hiragana. But when you hit the space bar, the Input Method Editor (hereafter: IME) sends the string of hiragana to the henkan dictionary server, and the dictionary server replies with a list of conversions. You typically iterate through these by pressing Space again and again, and finally press Enter to accept one of them.
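That one-to-one mapping between hiragana and katakana is easy to see in Unicode, where the katakana block sits exactly 0x60 code points above the hiragana block. Here is a minimal Python sketch of what Katakana mode amounts to, assuming we only care about the basic kana range U+3041 to U+3096 (the function name is made up for illustration; real IMEs do not expose anything like it):

```python
def hiragana_to_katakana(text: str) -> str:
    """Shift each hiragana code point (U+3041..U+3096) up by 0x60 into the
    katakana block (U+30A1..U+30F6); leave every other character untouched."""
    out = []
    for ch in text:
        cp = ord(ch)
        if 0x3041 <= cp <= 0x3096:
            out.append(chr(cp + 0x60))
        else:
            out.append(ch)  # the long-vowel mark ー, kanji, Latin, etc.
    return "".join(out)


print(hiragana_to_katakana("ぐらばー"))  # グラバー
print(hiragana_to_katakana("じょしゅ"))  # ジョシュ
```

Small kana like ょ and ゅ map to their small katakana counterparts (ョ, ュ) the same way, which is why the simple offset is enough here.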

Perhaps an example is in order. I want to type "私の名前はグラバー・ジョシュです。" ("watashi no namae wa Gurabaa Joshu desu.", "My name is Josh Glover."), so here is how I go about it (assuming I am already in Henkan mode--more on this in a second):

  1. I type "watashi" (i.e. I press the 'w' key, followed by the 'a' key, followed by the 't' key, and so on), and わたし appears on my screen (this is how watashi is spelled in hiragana; see the Wikipedia article on hiragana for an explanation). Now I hit Space, and わたし is replaced by 私 (the same word, now written in kanji, the Chinese characters used in written Japanese). As this is just what I want, I press Enter to accept the conversion.

  2. I type "no", producing の, which is correct, so I just hit Enter (no is the possessive particle; I want to say my name).

  3. Now for another bit of kanji: I type "namae" (name) and see なまえ. I hit Space and am immediately rewarded with the conversion I want, 名前, so I hit Enter to move on.

  4. Now I type "ha" (this is the topic marker in Japanese, written with the hiragana は (ha) but pronounced "wa"), producing は, then immediately hit Enter, as I want the character to remain in hiragana (the whys and wherefores are beyond the scope of this entry, but again, Wikipedia has a pretty good explanation).

  5. Now I type "guraba-", hit Space, and get 愚ラバー (note to Unix geeks: I am using the SCIM IME with the Anthy conversion engine), which is not what I want. Holding down the Shift key, I press the right arrow key three times so that the selection expands to encompass all four characters (thus instructing the IME to consider conversions only for the entire string). This changes the incorrect 愚ラバー to the almost-correct ぐらばー. Now I press Space one final time and get the conversion I wanted: グラバー (the katakana version of "guraba-"; I must use katakana because this is my last name, Glover, as rendered in the Japanese sound system, and writing foreign names is one of the major uses of katakana).

  6. I type "ten" (which is how this little dot, ・, is read; it separates last name from first, much as we Westerners would use a comma: Glover, Josh), producing てん on my screen. Hitting Space gives me what I want: ・. I hit Enter to accept the conversion.

  7. I type "joshu", giving me じょしゅ. I hit Space and get ジョシュ, which is what I want (again, the katakana version of my name--my first name, this time), so I hit Enter.

  8. Finally, I type "desu." (the so-called "state-of-being verb" of Japanese, which has no literal translation, but is required for simple polite form--also known as desu / masu form--which I am using in my example), getting です。, which is spot-on, so I hit Enter.

Now I'm done! Note that while this took a while to explain, it took me about 10 seconds to type, and I am not even a particularly fast henkan typist. So it is a reasonably good system for entering Japanese on a computer. In fact, most Japanese people prefer Henkan mode (i.e. typing Latin letters transliterated from their native writing system) to Hiragana mode!
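For the curious, here is a toy Python model of the two halves of that process: greedy romaji-to-hiragana transliteration as you type, then cycling through conversion candidates with Space. Everything in it is an assumption for illustration: the romaji table covers only the syllables from my example, and the candidate dictionary (including its ordering, chosen to match what I saw above) is made up. A real engine like Anthy segments the input and ranks candidates far more cleverly.

```python
# Toy model of Henkan input (hypothetical tables, for illustration only).

ROMAJI = {
    "shi": "し", "shu": "しゅ",
    "wa": "わ", "ta": "た", "no": "の", "na": "な", "ma": "ま",
    "ha": "は", "gu": "ぐ", "ra": "ら", "ba": "ば", "te": "て",
    "jo": "じょ", "de": "で", "su": "す",
    "e": "え", "n": "ん", "-": "ー", ".": "。",
}

# Made-up candidate lists, in the order Space would cycle through them.
CANDIDATES = {
    "わたし": ["私", "渡し", "わたし"],
    "なまえ": ["名前", "なまえ"],
    "てん": ["・", "点", "天"],
    "じょしゅ": ["ジョシュ", "助手"],
}


def romaji_to_hiragana(text: str) -> str:
    """Greedy longest-match transliteration; unknown characters pass through."""
    out = []
    i = 0
    while i < len(text):
        for size in (3, 2, 1):
            chunk = text[i:i + size]
            if chunk in ROMAJI:
                out.append(ROMAJI[chunk])
                i += len(chunk)
                break
        else:
            out.append(text[i])  # no match: keep the raw character
            i += 1
    return "".join(out)


def press_space(reading: str, presses: int) -> str:
    """Candidate shown after pressing Space `presses` (>= 1) times; wraps around."""
    candidates = CANDIDATES.get(reading, [reading])
    return candidates[(presses - 1) % len(candidates)]


reading = romaji_to_hiragana("watashi")
print(reading)                  # わたし
print(press_space(reading, 1))  # 私
print(press_space(reading, 2))  # 渡し
```

Trying three-letter chunks first is what lets "shi" win over a (nonexistent here) "s"; real IMEs also handle things this toy ignores, such as doubled consonants for っ and "nn" for ん.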

OK, now that you know something about Japanese input, you know why the keyboard layout is so similar to a US layout. But where does it differ?

Here's the low-down, starting at the top left of the keyboard (you may want to open the fullsize version of the Japanese keyboard layout image in another tab for comparison to your--presumably US105--keyboard whilst you peruse this list):

  • The backtick / tilde key is replaced by a 半角 / 全角 (hankaku / zenkaku, half-width / full-width) key, which most IMEs use to return to Latin mode from Henkan mode.

  • Shift-2 changes from at-symbol to double-quote.

  • Shift-6 changes from caret to ampersand.

  • Shift-7 changes from ampersand to single-quote.

  • Shift-8 changes from star to left paren (left bracket).

  • Shift-9 changes from left paren (left bracket) to right paren (right bracket).

  • Shift-0 changes from right paren (right bracket) to tilde.

  • Shift-minus ('-') changes from underscore to equals.

  • Equals / plus changes completely; it becomes caret / tilde (or underscore--does this actually vary from keyboard to keyboard?).

  • The Backspace key is basically chopped in half, with the left half becoming a new key: yen sign / vertical bar ('|')--"pipe" to us Unix folks.

  • The "QWERTYUIOP" keys remain untouched, but the left square bracket / left curly brace key becomes at-sign ('@') / backtick ('`').

  • Right square bracket / right curly brace becomes left square bracket / left curly brace.

  • Backslash / vertical bar (pipe) is assimilated into the Enter key.

  • Next row: the "ASDFGHJKL" keys remain the same, then Shift-semicolon becomes plus.

  • Single quote / double quote becomes colon / asterisk.

  • A chunk of the Enter key is snagged to become a new right square bracket / right curly brace key.

  • Next row: "ZXCVBNM", comma / less than, period (full stop) / greater than, and forward slash ('/') / question mark as on a US layout; then a new key, with a bit of right Shift stolen to accommodate it: backslash / underscore.

  • On the bottom row, the space bar is ravaged, with three keys' worth of space stolen from it: one on the left side and two on the right side. These keys are 無変換 (mu-henkan, "no conversion") on the left, and 前候補 / 変換 / (次候補) (zen-kouho, "previous candidate"; henkan, "conversion"; ji-kouho, "next candidate") and カタカナ / ひらがな (katakana / hiragana) on the right. These keys will be explained in a moment.

Of these differences, the ones that tend to matter most to people used to US (and most Western) keyboard layouts are: the shortened space bar, the reconfigured Enter key area, and the migration of the single and double quote keys.
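If you find yourself hunting for punctuation, the moves in the list above can be written down as a lookup table. This is a small Python sketch built only from that list (the names JIS_KEYS and jis_keystroke are made up for illustration). For the tilde it picks Shift plus the caret key, although the list also mentions Shift-0, and it makes no attempt to model OS-level remapping.

```python
# Characters whose key moved relative to a US layout, mapped to
# (unshifted legend of the JIS key that produces them, Shift needed?).
# Built from the list above; unchanged keys are simply not recorded.
JIS_KEYS = {
    '"': ("2", True),
    "&": ("6", True),
    "'": ("7", True),
    "(": ("8", True),
    ")": ("9", True),
    "=": ("-", True),
    "^": ("^", False),
    "~": ("^", True),   # the list also reports Shift-0 giving a tilde
    "¥": ("¥", False),
    "|": ("¥", True),
    "@": ("@", False),
    "`": ("@", True),
    "[": ("[", False),
    "{": ("[", True),
    "+": (";", True),
    ":": (":", False),
    "*": (":", True),
    "]": ("]", False),
    "}": ("]", True),
    "\\": ("\\", False),
    "_": ("\\", True),
}


def jis_keystroke(ch: str) -> str:
    """Describe how to type ch on a JIS board; unlisted characters sit where a US board has them."""
    if ch not in JIS_KEYS:
        return ch
    key, shift = JIS_KEYS[ch]
    return f"Shift+{key}" if shift else key


print([jis_keystroke(c) for c in 'say "hi"'])
# ['s', 'a', 'y', ' ', 'Shift+2', 'h', 'i', 'Shift+2']
```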

Now, what about these strange keys with Japanese names? In Hiragana and Katakana modes, these keys do just what their names imply:

  • 半角 / 全角 (hankaku / zenkaku) toggles between half-width characters (narrow forms the width of ordinary ASCII letters) and full-width characters (square forms the same width as a kanji), which probably matters only to professional typesetters.

  • 無変換 (mu-henkan) deactivates the conversion engine.

  • 前候補 / 変換 / (次候補) (zen-kouho / henkan / ji-kouho) activates conversion mode, requests a conversion candidate, and iterates through possible conversions.

  • カタカナ / ひらがな (katakana / hiragana) toggles between Katakana and Hiragana modes.
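To make the half-width / full-width distinction concrete: Unicode has a full-width copy of every printable ASCII character at U+FF01 to U+FF5E, exactly 0xFEE0 above the originals, plus a full-width space at U+3000. A minimal Python sketch of converting between the two (function names made up for illustration; it covers ASCII only, not half-width katakana at U+FF61 to U+FF9F):

```python
import unicodedata

# Full-width forms of printable ASCII (U+0021..U+007E) live at U+FF01..U+FF5E.
OFFSET = 0xFEE0


def to_fullwidth(text: str) -> str:
    out = []
    for ch in text:
        cp = ord(ch)
        if ch == " ":
            out.append("\u3000")          # ideographic (full-width) space
        elif 0x21 <= cp <= 0x7E:
            out.append(chr(cp + OFFSET))
        else:
            out.append(ch)                # everything else is left alone
    return "".join(out)


def to_halfwidth(text: str) -> str:
    out = []
    for ch in text:
        cp = ord(ch)
        if ch == "\u3000":
            out.append(" ")
        elif 0xFF01 <= cp <= 0xFF5E:
            out.append(chr(cp - OFFSET))
        else:
            out.append(ch)
    return "".join(out)


wide = to_fullwidth("Josh 123")
print(wide)                                    # Ｊｏｓｈ　１２３
print(unicodedata.east_asian_width("A"))       # Na (narrow)
print(unicodedata.east_asian_width(wide[0]))   # F (full-width)
print(to_halfwidth(wide))                      # Josh 123
```

The standard library's `unicodedata.east_asian_width` reports which category a character falls in, which is handy for checking what your IME actually produced.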

In Latin mode (the jumping-off point for Henkan mode, which, remember, is what the majority of people doing Japanese input on a computer actually use, Japanese and foreigners alike), only the 半角 / 全角 (hankaku / zenkaku) and 前候補 / 変換 / (次候補) (zen-kouho / henkan / ji-kouho) keys matter. In Windows, using Microsoft's (excellent) Japanese IMEs, the 前候補 / 変換 / (次候補) key starts conversion mode (i.e. your input stops showing up as literal Latin characters and becomes hiragana, ready to be converted using the Space key, as in my example above). Most newer Unix IMEs (or the distributions that package them) tend to bind this key to do the same thing, though the popular kinput2 IME uses Shift-Space. The 半角 / 全角 (hankaku / zenkaku) key then, oddly enough, is used to disable conversion mode (無変換, mu-henkan, probably does the same thing). Again, Unix users can probably use this key to accomplish the same thing, unless you are using kinput2, in which case you simply hit Shift-Space again.
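The mode switching described in this paragraph boils down to a tiny state machine. Here is a Python sketch, with key names as plain strings and the IME labels "windows" and "kinput2" made up for illustration. It encodes the paragraph's own guesses (that 無変換 behaves like 半角 / 全角, and, as a further assumption, that keys other than Shift-Space leave kinput2's mode alone) rather than anything read from a real IME's configuration.

```python
from enum import Enum


class Mode(Enum):
    LATIN = "Latin"
    HENKAN = "Henkan"


def next_mode(ime: str, mode: Mode, key: str) -> Mode:
    """Mode after pressing key, per the bindings described above (illustrative only)."""
    if ime == "kinput2":
        # kinput2: Shift-Space toggles conversion on and off.
        if key == "Shift+Space":
            return Mode.LATIN if mode is Mode.HENKAN else Mode.HENKAN
        return mode
    # Windows IMEs and most newer Unix IMEs.
    if key == "変換":
        return Mode.HENKAN
    if key in ("半角/全角", "無変換"):  # 無変換 is only a guess in the text
        return Mode.LATIN
    return mode


mode = Mode.LATIN
for key in ["変換", "半角/全角"]:
    mode = next_mode("windows", mode, key)
    print(key, "->", mode.value)
# 変換 -> Henkan
# 半角/全角 -> Latin
```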

And that, ladies and gents, is an introduction to Japanese input, cleverly disguised as a simple explanation of how US and Japanese keyboards differ. You have Jim Prior to thank, as he was the poor soul who foolishly asked me for an explanation. :)

