
12. Staying compatible

Being standard is not the only issue. To be really nice, one has to provide backward compatibility. In our case, this means that the configuration should tolerate data created using non-standard character sets, namely Alt (cp866) and cp1251. We should also be able to run Cyrillic programs for MS-DOS.

In most cases (except for HTTP), it is enough to convert the data to KOI8-R before it is used. For raw unstructured data this is quite trivial; see section Conversion Utilities.
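As a rough illustration of such a raw conversion, here is a minimal Python sketch. It is not part of any package mentioned in this document; the function name to_koi8r and the treatment of "alt" as an alias for cp866 are assumptions made for the example. It relies only on the cp1251, cp866 and koi8_r codecs shipped with Python.

```python
# Sketch: re-encode raw text from an obsolete Cyrillic code page to KOI8-R.
# Treating "alt" as an alias for cp866 is an assumption of this example.
CODECS = {"cp1251": "cp1251", "cp866": "cp866", "alt": "cp866"}


def to_koi8r(data: bytes, source: str) -> bytes:
    """Decode bytes from the named legacy code page and encode them as KOI8-R."""
    codec = CODECS[source.lower()]
    # errors="replace" substitutes a placeholder for anything that cannot be
    # decoded, or that has no KOI8-R equivalent, instead of raising an error.
    text = data.decode(codec, errors="replace")
    return text.encode("koi8_r", errors="replace")


if __name__ == "__main__":
    # "\u041f\u0440\u0438\u0432\u0435\u0442" is the Russian word for "hello".
    sample = "\u041f\u0440\u0438\u0432\u0435\u0442".encode("cp1251")
    print(to_koi8r(sample, "cp1251"))
```

Characters that exist in the source code page but not in KOI8-R are replaced with "?", so the conversion is lossy for such text.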

Structured data is another issue, and a trickier one. I'll try to outline a basic roadmap for handling it.

12.1 MIME-based data compatibility

MIME is a standard for architecture-independent data representation. Originally developed for mail messages, it now has many more applications. MIME defines a format which is open to extensions and allows architecture-specific handling of data. For example, if I receive a mail message containing a MIME object of the video/mpeg type (an encoded MPEG file), my mail reader will automatically decode it and start an MPEG player.

Most UNIX programs offering MIME capabilities are based on the metamail package, which contains a set of utilities and data files for working with MIME objects. Several configuration files (/etc/mailcap for global usage and ~/.mailcap for personal setup) define rules for handling MIME objects of various types.

Thus, if you receive a proper MIME data stream containing text in one of the obsolete character sets, you may define a MIME rule to convert such text to KOI8-R.

The MIME rules shown below handle plain text and richtext objects in both of the obsolete codesets discussed above. You may incorporate these rules into one of the MIME configuration files.

Note that these rules use the translit package to perform the actual conversion. For more information on that program and on conversion in general, see section Conversion Utilities.

text/plain; translit -t cp1251-koi8.rus < %s; test=test \
    "`echo %{charset} | tr '[A-Z]' '[a-z]'`"  = cp1251; copiousoutput

text/richtext; translit -t cp1251-koi8.rus < %s; test=test \
    "`echo %{charset} | tr '[A-Z]' '[a-z]'`"  = cp1251; copiousoutput

text/plain; translit -t alt-koi8.rus < %s; test=test \
    "`echo %{charset} | tr '[A-Z]' '[a-z]'`"  = cp866; copiousoutput

text/richtext; translit -t alt-koi8.rus < %s; test=test \
    "`echo %{charset} | tr '[A-Z]' '[a-z]'`"  = cp866; copiousoutput

text/plain; translit -t alt-koi8.rus < %s; test=test \
    "`echo %{charset} | tr '[A-Z]' '[a-z]'`"  = alt; copiousoutput

text/richtext; translit -t alt-koi8.rus < %s; test=test \
    "`echo %{charset} | tr '[A-Z]' '[a-z]'`"  = alt; copiousoutput

Obviously, this will work for plain text data only. Binary files are supposed to handle the codeset issues themselves (or at least their "parent" applications are). Therefore, if you receive a Microsoft Word document in the cp1251 character set, the duty of providing appropriate conversion lies with the application you use to read that document (for example Microsoft Word or Applix Words).
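The logic of the mailcap rules above (look at the charset parameter case-insensitively, pick a conversion table, convert the text) can also be sketched in Python with the standard email module. This is only an illustration and not how metamail or translit work internally; the function name text_parts_as_koi8r and the mapping of "alt" to cp866 are assumptions of the sketch.

```python
from email import message_from_bytes

# Charset labels handled by the rules above, mapped to Python codec names.
# Mapping "alt" to cp866 is an assumption of this sketch.
LEGACY = {"cp1251": "cp1251", "cp866": "cp866", "alt": "cp866"}


def text_parts_as_koi8r(raw_message: bytes):
    """Yield (content_type, koi8r_bytes) for each text part in a legacy charset."""
    msg = message_from_bytes(raw_message)
    for part in msg.walk():
        # Only the two text types covered by the mailcap rules are touched.
        if part.get_content_type() not in ("text/plain", "text/richtext"):
            continue
        # Same idea as the tr '[A-Z]' '[a-z]' test: compare in lower case.
        charset = (part.get_content_charset() or "").lower()
        if charset not in LEGACY:
            continue
        body = part.get_payload(decode=True)  # undoes base64/quoted-printable
        text = body.decode(LEGACY[charset], errors="replace")
        yield part.get_content_type(), text.encode("koi8_r", errors="replace")
```

Parts with any other charset label, including those already marked koi8-r, are skipped, and binary attachments are never examined.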

Unfortunately, the real situation is not that ideal. Many applications have their own idea of how to use MIME. Until recently, Microsoft Mail software had a broken MIME engine. Also, the Netscape Navigator/Communicator mail client is notorious for sending mail messages encoded in cp1251 with charset=koi8-r in the message header, and vice versa.
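When the charset label cannot be trusted, one crude workaround is to guess from the bytes themselves. The sketch below is a heuristic and nothing more: it relies on the fact that KOI8-R keeps lowercase Cyrillic letters in the byte range 0xC0-0xDF and uppercase in 0xE0-0xFF, while cp1251 does the opposite, and on the assumption that ordinary Russian text is mostly lowercase. The function name is made up for the example, and Alt (cp866) is not considered at all.

```python
def guess_koi8r_or_cp1251(data: bytes) -> str:
    """Guess between KOI8-R and cp1251 from where the letter bytes fall.

    KOI8-R: lowercase in 0xC0-0xDF, uppercase in 0xE0-0xFF.
    cp1251: uppercase in 0xC0-0xDF, lowercase in 0xE0-0xFF.
    Mostly-lowercase text therefore fills a different range in each encoding.
    """
    lower_range = sum(1 for b in data if 0xC0 <= b <= 0xDF)
    upper_range = sum(1 for b in data if 0xE0 <= b <= 0xFF)
    if lower_range == upper_range:
        return "unknown"  # no Cyrillic bytes at all, or no clear winner
    return "koi8-r" if lower_range > upper_range else "cp1251"
```

Text written entirely in capitals will be guessed wrongly, so the result is best treated as a hint to be checked against the declared charset rather than as a replacement for it.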

12.2 Explicit character set conversion

There are a lot of conversion routines for Cyrillic on the Internet. Each of them has its own quirks and its own degree of Cyrillic support.

In my opinion, tools must be standard. In this particular case the "standard" conversion tool is GNU recode. Unfortunately, the version found on the official GNU site (3.4) doesn't support Cyrillic yet (only ISO-8859-5). I developed a set of conversion tables for KOI8-R, Alt, and cp1251 for recode and submitted them to the recode maintainer. He promised to provide Cyrillic support in the upcoming release. Once that happens, I'll rewrite this section to recommend GNU recode as the standard conversion engine for Cyrillic.

Meanwhile, I would recommend the translit package. It supports many popular codesets and is even able to produce *TeX files (see section tex) from text in Russian. Also, RedHat users will enjoy an RPM package for translit.

For other conversion routines, look at SovInformBureau or ftp.funet.fi. You can even use the special mode for emacs (see section Emacs).

12.3 Cyrillic in the DOS emulator

The DOS emulator seems to be the only application which may require the Alt Cyrillic character set. The reason is that Alt is native to DOS, and most DOS programs dealing with Cyrillic are Alt-oriented.

For the console version (dos) you just have to load a keyboard and screen driver. Most DOS drivers will work fine. I personally use the rk driver by A. Strakhov, which works for both the console and X versions of dosemu. Another choice is the r driver by V. Kurland (sorry for possible misspelling). It is highly customizable and supports many codesets, Alt and KOI8 among them. However, it won't work under X (at least not the version 1.14 I'm using).

Both drivers can be found on most Russian Internet sites, for example the Kurchatov Institute FTP server.

For the X version of dosemu you have to provide an appropriate X font as well. Alex Bogdanov sent me such a font by e-mail. It is the original vga font from the dosemu distribution, modified for the Alt codeset. Unfortunately, I don't know who created this font or where its official site is.

To set up the font for dosemu you should

Finally, you have to load a keyboard driver. Note that you don't need a screen driver under X, so not all drivers will work. At least two will: rk by A. Strakhov, and cyrkeyb by Pete Kvitek.

