BioJapan.de Armin E-Dictionary Links

Front Technical Download FAQ Examples Reviews

Zaurus Otaku Kurabu

Z O K

- Zaurus PDA Dictionary Extension: The World in your Pocket -

This page is no longer updated.
The Zaurus dictionary project moved to

http://zaurus.biojapan.de

Please update your bookmarks and inform the webmaster of the site you came from.


Synopsis

We are putting together the one and only truly useful Japanese-German electronic dictionary. Other languages can be handled in the same way.
A useful electronic dictionary tool for foreigners in Japan should have the following functions:

  1. Chinese character entry through handwriting recognition
  2. A search function to find kanji script, pronunciation and the translation of a given word
  3. True pocket size
  4. Sufficient vocabulary to read any Japanese newspaper
  5. Flexibility to add additional languages and specialized vocabulary
The SHARP Zaurus is the only device that fulfills points 1-3. To deal with points 4 and 5, we recently succeeded in installing additional dictionary files.


Notes


Zock-Zock!

ZOK, short for Zaurus Otaku Kurabu, is a small group of (actually not so otaku) expatriates in Tokyo who did this work: Kurt Fischer is the one who actually cut most of the knots, Uli Plate had plenty of good ideas, while I somehow struggled to keep up with the news, wrote some Perl scripts and this page. ZOK also runs a mailing list.

SHARP Zaurus

Sharp has been building its "Zaurus" PDA since 1995 or so, for the Japanese market only. Initially, the Zaurus became popular as an electronic organizer, featuring scheduler, address book, note pad and a small but useful set of dictionaries: Japanese-Japanese, Kanji-Japanese, Japanese-English and English-Japanese. The key to the Zaurus' popularity was presumably its direct kanji entry, with automatic handwriting recognition that actually works.

It must have been with the Igeti-series around 1997 that Sharp adopted the compact flash memory card (CF) standard as the Zaurus' extension port. With the growing popularity of digital cameras, CF cards have become a cheap mass product and make it possible to store and handle an amazing amount of data on the Zaurus - such as a "bookshelf full" of dictionaries.

While the electronic organizer functions haven't changed much, new development is heading towards multimedia gadgets. Recent models can not only handle email and web browsing but also play and record music (MP3) and video (MPEG4) straight from your TV or hi-fi analog output. Java is the Zaurus' new programming platform, which should make porting software and communicating with PCs and other machines increasingly easy. (Note, however, that the software we use still comes from the pre-Java era and runs only on the Zaurus OS.)

From newspaper reports, I gather that Sharp will start selling the Zaurus in the US and Europe in March 2002. The international model will run under the Linux operating system, while the Japanese version will retain Sharp's proprietary OS. I imagine the international version will neither include the handwriting recognition nor run our software. Think of it as a totally different machine. Unfortunately, there is NOTHING in terms of English information about the Japanese models of the Zaurus: no English handbooks, only two or three minor application programs in English.

More information on Zaurus models

Compact Flash

The larger your flash card, the better: the software will handle up to 12 add-on dictionaries. We have never had a problem with the Zaurus failing to address a memory card properly, either because of its size or for any other reason. We don't know the upper size limit. The MI-110 works with 128MB CF cards; the MI-P10, E1, L1 and E21 manage 256MB for sure. If you have larger memory cards, please stick them in and let us know if they work; my guess is that they will. The word is that IBM's Microdrive devices can NOT be used. Should you try one out, let us know! We have never tried the new and expensive high-speed CF cards.

We first formatted the CF by plugging it into the Zaurus. Then we plugged the CF into the PCMCIA port of a PC running Japanese Windows 98, using the appropriate adapter. There should be a "__zaurus" subdirectory with lots of small files. Copy your files into this directory and then plug the card back into the Zaurus.
Note: The CF card also works with Macintosh, Linux and digital cameras. It's a really neat tool for exchanging files between any of those machines. Because I don't have my own digital camera, I often borrow a friend's camera at parties. I plug in the dictionary card from the Zaurus, take a few pictures and can readily see them on the Zaurus (in b/w on my model; the software takes some time to open large pictures). Back home, I plug the CF into my computer to upload the pictures to the web or to my hard disk. (By the way, no digital camera or other device has ever had a problem working with any no-name 256MB CF card I plugged in.)

Japanese-incompatible PCs

We have not tried it, but technically you should have no problem. Click here for a handy tool by Silas Brown to display Japanese code as graphics in your browser; this will enable you to follow the Japanese links from this page.

Zaurus Tuning

Vector has quite a lot of good free software for the Zaurus to download. ZPDVIEW, the dictionary tool we use, is written by Ogasawara Hiroyuki (小笠原博之), and can be downloaded (scroll down about 3/4 of the page to ZPDVIEW). Copy the downloaded file WOBP144.ZAC into the _Zaurus directory on the CF.

Slide the CF into the Zaurus. Go to "MORE Soft" and click "Card" to find the file PDIC141.ZAC. With the "Tenkai" button (top right) you can unpack and install the dictionary search engine. This makes the program "ZPDVIEW v1.41" appear in MORE Soft. Click it and then "jikko" (top right) to run it. What you can't see in MORE Soft is that the card's _Zaurus directory also contains instructions, which you can read on your PC, for example. You'll find a longer explanation in my FAQ list.

Compared to the Zaurus native dictionary, ZPDVIEW lacks the easy-to-read large display, but it does have a history function. You will probably end up using both dictionaries in combination. With good dictionary files that show you the kanji, hiragana and translation at a glance, though, I have found myself sticking to ZPDVIEW more and more.

Now that you have the tool, what remains to be done is to put the actual vocabulary files onto the CF. The files must be in PDIC format and renamed WOBP0000.DAT to WOBP0011.DAT, for the maximum of 12 dictionaries you can install. In a file WOBPMENU.TXT, list the names of the dictionaries you installed, one per line. As an example, the contents of my WOBPMENU.TXT file are:

EDICT
和独JT
漢字DIC
広辞苑
名前
KanjiABC
giongo
EDICT-2
	[tab character]
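The renaming rule above is mechanical enough to script. The following Python sketch is not part of ZPDVIEW or of my Perl script; it is a hypothetical helper that copies up to 12 PDIC files onto the card as WOBP0000.DAT to WOBP0011.DAT and writes WOBPMENU.TXT. Several things are assumptions: that menu line 1 belongs to WOBP0000.DAT, line 2 to WOBP0001.DAT and so on; that the menu file should use CRLF line ends (a guess based on preparing the card under Windows); and that the tab-only last line of the example file is worth reproducing, which is why it is optional here. The Shift-JIS encoding follows from the Zaurus reading only Shift-JIS.

```python
import shutil
from pathlib import Path

MAX_DICTIONARIES = 12  # ZPDVIEW handles at most 12 add-on dictionaries


def target_name(index):
    """Card file name for the index-th dictionary: WOBP0000.DAT ... WOBP0011.DAT."""
    if not 0 <= index < MAX_DICTIONARIES:
        raise ValueError(f"dictionary index must be 0..{MAX_DICTIONARIES - 1}")
    return f"WOBP{index:04d}.DAT"


def menu_bytes(names, final_tab_line=True):
    """WOBPMENU.TXT contents: one dictionary name per line, Shift-JIS encoded.

    Assumptions: CRLF line ends; final_tab_line mimics the tab-only last
    line of the example menu file.
    """
    lines = list(names)
    if final_tab_line:
        lines.append("\t")
    return "".join(line + "\r\n" for line in lines).encode("shift_jis")


def install_dictionaries(dictionaries, card_dir):
    """Copy (menu name, PDIC file path) pairs into an existing card directory.

    Assumption: menu line N names the file WOBP000N.DAT, so order matters.
    """
    if len(dictionaries) > MAX_DICTIONARIES:
        raise ValueError(f"at most {MAX_DICTIONARIES} dictionaries, got {len(dictionaries)}")
    card = Path(card_dir)
    for index, (_name, source) in enumerate(dictionaries):
        shutil.copyfile(source, card / target_name(index))
    names = [name for name, _source in dictionaries]
    (card / "WOBPMENU.TXT").write_bytes(menu_bytes(names))
```

card_dir is whatever path the CF's Zaurus directory has on your PC, and the source file names are whatever PDIC produced for you.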

Have a look at the Screen shots, too.

Software Overview

Before I get into the details, be sure to understand the basic concept of where we are going.

Dictionary data needs to be converted in a number of steps. The data formats are:
dictionary -->(1)--> plain text data -->(2)--> properly formatted text data -->(3)--> PDIC (Zaurus usable) format data

The conversion software used is:
(1) dictionary-specific, for example Filemaker (WadokuJT) or DDWIN (Epwing CD-ROM).
(2) This step needs hands-on programming or tweaking, so I wrote a Perl script, which you are welcome to download and use. It is also possible to trick popular software packages such as MS Word and Excel into doing many of the conversions, but the result is a compromise.
(3) PDIC

I can't offer ready-to-use dictionary files for download here, for two reasons: firstly, it would conflict with the copyright of some of the data, and even where it doesn't, it is always cleaner to get data directly from its author and owner. Secondly, I do not have enough disk space on this web server. What I can do instead is let you download one dictionary, Jim Breen's KanjiDIC, in all stages of processing for demonstration: here is the original file, here are my compiled files. Unpack them and copy the two files starting with "w" to the _zaurus directory on the CF. Access the kanji dictionary from ZPDVIEW on the Zaurus. The remaining file, kanjidic.txt, is the intermediate one-line text file.

Dictionary Files

The task is now to convert your dictionary files to PDIC format. Unfortunately, PDIC is not a simple text format but rather a special dictionary format that includes a search index. I don't know how to generate PDIC files from scratch, so you have no choice but to use the proper software. Just download and install it on your PC.

Once you have PDIC running, go to Tools - Jisho no Henkan to import your dictionary files and reformat them properly. PDIC will also let you combine several dictionary files into one.

So, we need to convert dictionary files into a format that can be imported into PDIC. The two PDIC input formats that we used are "*.csv", for "comma-separated values", and "one-line text" (*.txt). One-line text allows line breaks and thus a nicer display, while CSV, for some reason, turns out to scroll more smoothly in ZPDVIEW. I first used CSV, as described below. Later on, I programmed the above-mentioned Perl script, which can generate one-line format from a variety of dictionaries. This page uses CSV format as an example.

CSV format contains one entry word per line. Its definition in the PDIC case is:

	"field1","field2","field3",4,5,6,"field7"
	"English","Japanese","Example of use",x,y,z,"Pronunciation"
where x is a number indicating the level of difficulty of the word, while y and z set the "dark" and "practice" flags, respectively, when they are non-zero (zero is the default). Fortunately, we do not have to worry about any but the first three fields, because the Zaurus will not display the others anyway; they are only meaningful for the PC version of PDIC. What matters is that the first field contains the keyword which the ZPDIC search engine will search for. Transcription and translation go into fields 2 and 3. The Zaurus display won't care much whether you put everything into field 2 or use field 3 as well, as long as you are consistent and do it the same way for all entries. However, how you use fields 2 and 3 does make a difference when merging several dictionaries.
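Getting the quoting right by hand is error-prone, so here is a minimal Python sketch using the standard csv module. It writes the seven-field layout defined above: text fields quoted, the three numeric fields (level, "dark", "practice") written bare and left at zero. The helper names pdic_csv_row and write_pdic_csv are hypothetical, and both the CRLF line ends and the doubling of any embedded quotation marks (the csv module's default) are assumptions about what PDIC accepts, not something checked against PDIC itself.

```python
import csv
import io


def pdic_csv_row(keyword, word, translation, pronunciation="", level=0, dark=0, practice=0):
    """Arrange one entry in PDIC field order:
    keyword, word, translation/example, level, dark, practice, pronunciation."""
    return [keyword, word, translation, level, dark, practice, pronunciation]


def write_pdic_csv(rows, stream):
    """Write rows so that strings are quoted and integers are not."""
    # QUOTE_NONNUMERIC quotes every non-numeric field, matching
    # "field1","field2","field3",4,5,6,"field7".
    writer = csv.writer(stream, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\r\n")
    writer.writerows(rows)
```

Writing into an io.StringIO (or a file opened with newline="") and encoding the result to Shift-JIS at the very end keeps the character-set question separate from the field layout.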
Here is a portion of the .csv file my machine exported from Wadoku JT (see below):
	"きちょう [3]","帰朝","Heimkehr nach Japan.",""
	"きちょう [4]","記帳","Eintragung; Registrierung.",""
	"きちょう [5]","貴重","Kostbarkeit; Hochwertigkeit; Unschaetzbarkeit.",""
	"ぎちょう","議長","Vorsitzender; Praesident; Sprecher (einer Versammlung).",""
	"きちょうえんぜつ","基調演説","programmatische Erklaerung; Grundsatzrede; Keynote.",""
	"きちょうする","帰朝する","nach Japan zurueckkehren.",""
	"きちょうする","記帳する","eintragen; buchen.",""
One-line text is the format of the kanjidic.txt file you downloaded above. For illustration, here are a few lines of the combined ("the lot") EDICT file, inverted to the English-Japanese direction with my script:
	cleek (golf) /// クリーク
	clef(musical)  /// 音部記号 - おんぶきごう
	cleft grafting /// 枝接ぎ - えだつぎ
	clemency /// 助命 - じょめい
	clergyman /// 牧師 - ぼくし
	clerical desk /// 事務机 - じむづくえ
	clerk /// 店員 - クラーク, てんいん \ 書記 - しょき \ 事務員 - じむいん \ 局員 - きょくいん \ 官吏 - かんり
	clerk at the information desk /// 案内係 - あんないがかり
Once you convert this file and look at it on the Zaurus, you will see that space-///-space divides the keyword from the explanation, while space-\-space generates line breaks within the explanation. This short example also shows that my script has bugs: the first kanji word for clerk has only one pronunciation, ten'in, while the katakana "kuraaku" should really appear on a separate line as an independent translation. You will find many such mistakes, sorry about that, but I can live with them, and I hope you can too.
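The one-line layout itself is simple to generate. The Python sketch below is not my Perl script, only a hypothetical illustration (the name to_oneline is made up): it puts the keyword in front of " /// " and joins the translations with " \ ". Keeping each translation as its own (word, reading) pair, with the reading set to None for katakana-only words, is one way to avoid the "店員 - クラーク, てんいん" mix-up shown in the clerk entry.

```python
def to_oneline(keyword, translations):
    """Build 'keyword /// word - reading \\ word - reading' for PDIC import.

    translations: (word, reading) pairs; reading is None for kana-only words.
    " /// " separates keyword and explanation; " \\ " becomes a line break.
    """
    parts = [f"{word} - {reading}" if reading else word for word, reading in translations]
    return f"{keyword} /// " + " \\ ".join(parts)
```

For example, to_oneline("clergyman", [("牧師", "ぼくし")]) gives the same line as in the sample above.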

How do you generate a proper .csv file? Here are a few examples. The rest is up to your resourcefulness. If you know where to find good dictionary files in all languages, please let us know! We would like to collect them here.

Wa-Doku
Ulrich Apel in Osaka has done a heroic job of creating an ever-growing Japanese-German dictionary. The system is implemented on a database engine called Filemaker, and can be downloaded complete with database engine, vocabulary and instructions. (alternative download site)
The dictionary is growing because it is updated all the time by its users (i.e. you and me), who key in new words that they could not find in the Wadoku and had to look up elsewhere. Please do cooperate and help improve the dictionary once you have the system running. You can input new terms directly on the web site, but point a second browser window here for examples of the proper entry format. By the way, WadokuJT has a lot of redundancy in its over 1 million entries so far, which is a drawback if your CF is small. Distribution of data derived from WadokuJT is subject to Ulrich Apel's licence.

With the Wa-Doku running on your PC, get the entire dictionary into the search window (leave the search entry blank and press "suche"), then choose File -> Export. You must specify which data fields you want exported, and in which order. For German, choose the last field, which has umlauts converted to ae, oe, ue so that the Zaurus can display them. As file type, choose .csv, of course. I exported the dictionary three times: kanji-hiragana-German, hiragana-kanji-German, and German-kanji-hiragana. (That's necessary because the Zaurus searches only the first field of each line.) The three files were processed as above into PDIC format, renamed WOBP0005.DAT, WOBP0006.DAT and WOBP0007.DAT (each about 7-11MB in size), saved on a CF card, and are now well in use on my Zaurus. Searching is faster than the blink of an eye. It often happens that the Zaurus cannot find a German word, though. That is because the dictionary is designed for use in the Japanese-German direction: many of the German explanations start with words such as "sich ..." or "einen ...", which render the actual keyword unfindable for the Zaurus.
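If you already have one export loaded in a program, the other two directions are just column permutations. This Python sketch (hypothetical helper name three_directions) assumes each entry is a (kanji, kana, german) triple and returns one row list per direction, each with its search key in the first field.

```python
def three_directions(entries):
    """Reorder (kanji, kana, german) triples so that each search key comes first.

    The Zaurus searches only the first field, so each direction needs its own file.
    """
    entries = list(entries)
    return {
        "kanji-hiragana-German": [(kanji, kana, german) for kanji, kana, german in entries],
        "hiragana-kanji-German": [(kana, kanji, german) for kanji, kana, german in entries],
        "German-kanji-hiragana": [(german, kanji, kana) for kanji, kana, german in entries],
    }
```

Each of the three lists would then be written out as its own CSV file and converted separately in PDIC.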

My above-mentioned Perl script reduces this conversion problem using a few tricks. Far from perfect, the result is quite usable in my opinion. The script also optimizes the Wadoku dictionary for use on the Zaurus. If you use the script, you only need to export WadokuJT from Filemaker once, namely in the kanji-hiragana-German direction, in tab-separated list format. You will notice that the generated German-Japanese dictionary has all nouns in lower case. This is meant to ease use in ZPDVIEW, as it eliminates the need to enter capital letters.
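The tricks of my script are not reproduced here, but the general idea can be sketched in Python under stated assumptions: read the tab-separated kanji-hiragana-German export, split each German explanation at semicolons, drop leading filler words such as "sich" or "einen", and lower-case the result so that nouns need no capital letters. The filler-word list and the helper names german_keywords and german_first_rows are hypothetical.

```python
# Hypothetical list of leading words that hide the real keyword; extend as needed.
LEADING_WORDS = {"sich", "ein", "eine", "einen", "einer", "einem", "eines"}


def german_keywords(explanation):
    """Split a German explanation into lower-case search keys."""
    keys = []
    for sense in explanation.split(";"):
        words = sense.strip().rstrip(".").split()
        while words and words[0].lower() in LEADING_WORDS:
            words = words[1:]  # drop "sich ...", "einen ..." etc.
        if words:
            keys.append(" ".join(words).lower())
    return keys


def german_first_rows(tsv_lines):
    """Turn kanji<TAB>kana<TAB>German lines into (keyword, kanji, kana) rows.

    Assumes every line has at least three tab-separated columns.
    """
    rows = []
    for line in tsv_lines:
        kanji, kana, german = line.rstrip("\r\n").split("\t")[:3]
        for key in german_keywords(german):
            rows.append((key, kanji, kana))
    return rows
```

One entry with several senses yields several rows, so "eintragen; buchen." becomes two separate keywords.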

Test WaDokuJT!

Edict
Jim Breen's Japanese-English dictionaries are the best-known Japanese dictionaries on the Web for good reasons: comprehensive, yet concise, the basic EDICT file alone has a far superior vocabulary to the Zaurus native dictionary. In addition, there are excellent specialized dictionaries, for life sciences, computer science, legal terms, for example, in EDICT format.
	サリ、キシィ、ケ [、オ、キ、キ、皃ケ] /to indicate/to show/to point to/
	サリ、ケ [、オ、ケ] /to point/to put up umbrella/to play/
	サリ、ホタ・[、讀モ、ホ、ユ、キ] /knuckle/
	サリーオ [、キ、「、ト] /finger pressure massage or therapy/
You probably notice that the Japanese character display is messed up here: EDICT files are EUC encoded, which is the UNIX way of encoding Japanese characters, while the Zaurus understands only Shift-JIS encoded characters. At some point (preferably at the end, once you have generated your proper .csv file and before feeding it into PDIC), you need to change the character encoding. There ought to be plenty of conversion utilities around; MS Word, for example, can do the job.
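If no conversion utility is at hand, a few lines of Python do the same job. This sketch decodes the bytes as EUC-JP and re-encodes them as Shift-JIS with Python's built-in euc_jp and shift_jis codecs. Characters that exist in EUC-JP but have no Shift-JIS equivalent are replaced with "?" instead of aborting the run; that choice and the helper names are assumptions of this sketch.

```python
def euc_to_sjis(data, errors="replace"):
    """Re-encode EUC-JP bytes (EDICT) as Shift-JIS bytes (Zaurus).

    With errors="replace", characters without a Shift-JIS equivalent become "?".
    """
    return data.decode("euc_jp").encode("shift_jis", errors=errors)


def convert_file(source_path, target_path):
    """Convert a whole EUC-JP text file to Shift-JIS."""
    with open(source_path, "rb") as source:
        data = source.read()
    with open(target_path, "wb") as target:
        target.write(euc_to_sjis(data))
```

Passing errors="strict" instead makes the conversion stop at the first unconvertible character, which is handy for finding out whether a file has any.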

EDICT lends itself well to manipulations such as dictionary conversion because of its concise and logically consistent format. I was surprised how well it works in the English-Japanese direction after conversion with my script. The script has special conversion routines for ENAMDICT and for KanjiDIC, Jim Breen's kanji dictionary file, which is actually in quite a different format from the rest of the EDICT family.
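The EDICT line layout, HEADWORD [READING] /gloss/gloss/, is what makes the inversion easy. The Python sketch below (hypothetical name invert_edict, not my Perl script) builds an English-to-Japanese index from already decoded EDICT lines; the reading is missing when the headword is itself written in kana. Real EDICT glosses also carry markers such as part-of-speech tags in parentheses, which this sketch simply leaves inside the gloss text, and lines that don't fit the pattern are skipped.

```python
import re
from collections import defaultdict

# HEADWORD [READING] /gloss 1/gloss 2/ ; the [READING] part is optional.
EDICT_LINE = re.compile(r"^(\S+)(?: \[([^\]]+)\])? /(.+)/\s*$")


def invert_edict(lines):
    """Build an English gloss -> [(headword, reading)] index from EDICT lines."""
    index = defaultdict(list)
    for line in lines:
        match = EDICT_LINE.match(line)
        if not match:
            continue  # skip lines that do not fit the EDICT pattern
        headword, reading, glosses = match.groups()
        for gloss in glosses.split("/"):
            gloss = gloss.strip()
            if gloss:
                index[gloss].append((headword, reading))
    return dict(index)
```

Sorting the keys of the result gives an English-Japanese word list in the order of the clerk/clergyman sample above.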

Test EDICT!

Eijiro (英辞郎)
A 1-million-entry English-Japanese dictionary. If you have a NIFTY ID, you can download it for free for personal use. Alternatively, you can order it on CD by mail, but they do not accept international money transfers.
Eijiro data can be imported directly into PDIC, so it's straightforward to put on the Zaurus, provided you have about 100MB of free memory. Also have a look at the other dictionaries on the same site: abbreviations, onomatopoeic expressions, and Japanese-English.
Whether you prefer Eijiro or EDICT is a matter of taste: friends have told me "Eijiro is unbeatable - it just has everything, including slang". For my part, I prefer EDICT for being compact, fast, and giving me pretty much everything I need. Am I biased because I wrestled with the EDICT conversion script for so long? Well, just try out both of them!

Webster
The 1913 Webster Unabridged Dictionary can be downloaded for free from Project Gutenberg. The 1913 edition was used because its copyright has expired. For details on copyright, or if you want to find out more or contribute to Project Gutenberg's free electronic library, please follow the above link (http://promo.net/pg).
The exact quote of the copy I downloaded is:
         Webster's Revised Unabridged Dictionary
                 Version published 1913
               by the  C. & G. Merriam Co.
                   Springfield, Mass.
                 Under the direction of
                Noah Porter, D.D., LL.D.
I recently compiled a Webster for the Zaurus by writing a dedicated Perl program, but I haven't tested the dictionary enough to tell whether or not it contains major bugs.

CD-Rom Dictionaries
Please be aware that CD-ROM dictionaries are subject to copyright and reproduction may be prohibited. So I can't just give you a copy of those files; you have to buy the CD in a store. CD-ROM dictionaries use their own data formats, and publishers may encrypt data to prevent illegal copying. For Japanese dictionaries, there seems to be a quasi-standard data format called EPWING. Fortunately, plain text can be extracted easily from EPWING files, at least from the version we had at hand:

DDWIN is EPWING reader software. First put your dictionary CD into your drive, then start DDWIN. It will detect the dictionaries on the CD automatically. Click on the proper dictionary, then click on "zenbun" (全文, full text) to list all entries without entering a search term.
Next, click on the Edit menu (編集), then launch the editor (エディタ起動). In the popup window, click the middle radio button (該当項目すべて, "all matching entries") and execute to export the dictionary in raw text format.
Looking at the file, you will realize that there are a lot of special character codes and commands enclosed in parentheses. You cannot avoid using a program to fix this: first, replace the important codes with appropriate characters (German umlauts with ae, oe, ue, ss, for example). Then throw out the rest of the garbage and add the commas and quotation marks to construct a clean .csv file for PDIC.
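The first two of those steps can be sketched in Python, with one big caveat: everything about the codes themselves is an assumption. The real codes have to be read off your own exported file and entered into code_table, and leftover_code is a regular expression you write to match whatever code syntax remains. The sketch also transliterates umlauts that come through as real characters. clean_epwing_text is a hypothetical name, and the codes in the example below are made up.

```python
import re

UMLAUTS = {"ä": "ae", "ö": "oe", "ü": "ue", "Ä": "Ae", "Ö": "Oe", "Ü": "Ue", "ß": "ss"}


def clean_epwing_text(text, code_table, leftover_code):
    """Replace known markup codes, transliterate umlauts, then delete other codes.

    code_table: {raw code string: replacement}, taken from the exported file.
    leftover_code: regular expression matching any code still left afterwards.
    """
    for code, replacement in code_table.items():
        text = text.replace(code, replacement)
    for char, replacement in UMLAUTS.items():
        text = text.replace(char, replacement)
    return re.sub(leftover_code, "", text)
```

With made-up codes, clean_epwing_text("M(X1)dchen", {"(X1)": "ae"}, r"\(Z\d\)") returns "Maedchen". Adding the commas and quotation marks is then a separate step.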

Kojien (広辞苑)
Iwanami's Kojien is THE authority among Japanese dictionaries. As a plain text file, it takes up 36MB and can be processed in one piece by the P10 but not by the MI-110. We used the 4th edition. The latest edition is the 5th, and I can't guarantee it works equally well.

Shimotsuki san has done the hack before and explains it on his web page. He has also produced a script to do the job, but I made my own version, included in the above-mentioned script of mine. Shimotsuki uses jperl, while I use perl with the Jcode module. Use whichever you like.

Nihonshi (日本史辞典)
Iwanami's dictionary of Japanese history will let you look up, for example, what was notable about the Nakasone government, or just about anything or anyone in ancient or recent Japanese history. Its format is equivalent to Kojien's, and so is the porting to the Zaurus. Our compilation does not contain appendices and time tables. Therefore, you can NOT look up what happened in 1968 by typing in "1968". Sorry, that may be a project for the future. Our nihonshi compilation takes up only 9.3MB.

Improvements

I appreciate your help in improving this page: Please report errors and ambiguities. If you are smart with graphics (I am not!), please capture some important screens and mail them to me as jpeg attachments so I can illustrate the explanations. Good ideas and suggestions: Please!

Questions

Please be considerate of my time: think first, then check the FAQs, then ask your neighbor, and only when you really have a problem, and the problem is not off-topic, ask me.

If you can't manage at all, there is still my ready installation service.

Additional Links

Zaurus Software
Add-on tools I am using on my P10:
(for questions about these programs, please contact the respective authors directly)
Tools I am not using but you may want to try: Note: Search the higher directories of the above pages for more programs, such as games etc.
Also, here may be a good starting point to find software.
SHARP
SZAB - Sharp Zaurus Application Builder is a C/C++ compiler and programming platform for the Zaurus. Look around here for information and for a Zaurus version of the Metrowerks CodeWarrior C compiler.
Sharp's treasure box often requires you to register with a valid credit card number before you can download. (I never bothered to read the small print in Japanese...)

Disclaimer

Neither the author nor any members of the ZOK mailing list shall be held liable for damage causally related to information on this page or ensuing email correspondence, nor for damage due to program code downloaded from this site. In particular, we shall not be responsible for the integrity of any of the software recommended, nor for its suitability for any particular purpose. Use this information at your own risk.

Contact

For comments or questions about this page, please contact the author, xarmin@biojapan.de; for suggestions of more general interest, mail the whole group, zok@egroups.com.

We wish you good luck and a great time in Japan!


Updated Dec. 27, 2001 by Armin Rump