The Transliteration Principles of MLS

by Oliver Corff

Nov. 21, 1994

NB: This paper is a summary of several articles [1] published in English in Mongolia as well as an abstract of some thoughts developed in the "MLS report" [2]. Last but not least, the UNIX-style MLS is introduced here in a more formal way than in the original announcement [3].

The MLS (Mongolian Language Support) codepage was originally developed with the constraints of popular IBM-compatible computers in mind. It was designed as an extension of the operating system and should cooperate peacefully with any software working in text modes. In order to accommodate Mongolian in Cyrillic writing, in Classical writing and in transliterations for both within one 8-bit codespace (256 code positions), the following principles and compromises were adopted:

1) Text (hereafter called 'linguistic') information and display information are two different data types. In order to work with Mongolian language data it is sufficient to stick to the underlying linguistic information expressed through transliteration. A Cyrillic or Classical output can always be generated from the underlying linguistic information, i.e. the transliterated Mongolian data. If information has to be interchanged, it is hence sufficient to transport the transliterated Mongolian.
2) The transliteration systems for both types of Mongolian writing (cum grano salis two different 'languages' if the diachronic aspect is taken into account) should reflect phonological properties if possible. The transliterations of both sets should unify whenever justified by the transformation of a certain phoneme/letter combination.
3) A unified (or "merged", as it used to be called in some of the earlier articles) transliteration results in a common sorting order for both Classical and Cyrillic words, which are ideally transliterated by the same symbol sequence.
4) The transliteration subsets are the so-called New Classical Transliteration (NCT, see Table I below) for Classical Mongolian and the New Romanization (NR, see Table II below) for Cyrillic Mongolian. Their union, the resulting superset, is called Comprehensive Mongolian Transliteration (CMT, see Table III below). Data files containing CMT data as well as Classical and Cyrillic display data are called MLS files.
5) The whole transliteration system is strictly surjective (i.e. expressing a genuine function):
     Classical_Mongolian_Display := f(NCT-encoded text),
     Cyrillic_Mongolian_Display := f(NR-encoded text),
   which implies that a conversion engine can operate without human intervention or context analysis in order to generate Classical or Cyrillic output.
6) In order to express all features of Classical writing properly, the NCT subset makes use of both lower-case and upper-case letters:
     a) for gali symbols:
          'C' vs. 'c': '' vs. ''
          'K' vs. 'k': '' vs. ''
          'Z' vs. 'z': '' vs. ''
     b) to reflect properties of the writing system:
          'D' vs. 'd': "economy" -> 'D' ->
          'N' vs. 'n': "fund" -> 'foND' ->
          'O' vs. 'o': "cinema" -> 'KinO' ->
          'T' vs. 't': "counter, shop" -> 'KoNTor' ->
          '' vs. '': 'shadursrng' ->
          'Y' vs. 'y': 'maYi' -> vs. 'bayixu' -> vs. 'naiman' ->
7) Despite the use of empty positions in the Latin alphabet, strict adherence to rules and only a minimum of two-letter combinations, the transliteration system is flexible and friendly enough to accommodate various styles:
     a +strict+ style optimizing data economy:
          "e" is "e"
          "" is ""
          "x" is "x"
          "" is "q" (etc.)
     a +relaxed+ style optimizing visual and phonological clarity:
          "e" is "y" or "y"
          "" is "yo"
          "x" is "kh"
          "" is "ch" (etc.)
8) The Cyrillic alphabet shares some letter shapes (e.g. 'A', 'B', 'C' etc.) with the Latin alphabet. Since the sorting order can be reset at the operating-system level, this measure should not be considered a problem.
9) Within MLS, Classical writing is displayed via graphemes or glyphs, not via canonical letters. Canonical letters may be composed of several glyphs. Sometimes, but not necessarily, a glyph and a letter are identical.
10) The idea of UNIX-style encoding is to limit CMT to the printable 7-bit ASCII character set for use in cross-platform applications. The umlaut symbols are avoided. UNIX-style encoding uses some of the natural constraints of the Mongolian language (e.g. vowel harmony) in order to mark female (front) vowels. Since all vowels in a word are usually either front or back, it is sufficient to mark the whole word as front: (qölöö) is _qoloo, not *qo:lo:o: or something similar; hence the notation _[]o for such a letter, which reads as: the word starts with an underbar followed by zero or more characters, and the vowel concerned takes the shape of an "o". If the word happens to contain one or more ""s then the markup symbol '_' is omitted because "" () is given as "e". In case single letters violate vowel harmony they are marked by a caret: Shagdars^uren (read: the following vowel "stands out").
11) If an image of Mongolian text has to be transported, it can be expressed with the symbols of the MLS codepage. If such a document must be transported over a 7-bit environment, it is advisable to use the keyboard equivalents of the Mongolian glyphs. Instead of writing ('mongol') the file would contain the sequence "-moaInnoL". The filter MLS_KMAP (contained in the MLS package) generates keyboard equivalents out of Mongolian texts and vice versa.

Table I
The New Classical Transliteration - NCT

The Classical canonical letters available in MLS are transliterated as follows. The alphabetical arrangement follows that of Coima's Mongolian primer [4]. In this and the following tables, an equal sign means that the symbol is the same as in the column to its left.
Classical   Strict    Relaxed   UNIX-style

            a         =         =
                      =         e
            i         =         =
            o/u       =         =
            ö/ü       =         _[]o/_[]u
            n         =         =
            l         =         =
            m         =         =
            x         q         =
            k         =         =
                      =         G
            g         =         =
            b         =         =
            ng        =         =
            t         =         =
()          d         =         =
            r         =         =
            z         =         =
            y         =         =
            s         =         =
            sh        =         =
            c         =         =
            v         =         =
            f         =         =
            p         =         =
            K         =         =
            C         =         =
            Z         =         =
            j         =         =
            h         =         =
            lh        =         =
            e         =         E

Table II
The New Romanization - NR

Cyrillic    Strict    Relaxed   UNIX-style

А           A/a       =         =
Б           B/b       =         =
В           W/w       =         =
Г           G/g       =         =
Д           D/d       =         =
Е           E/e       Y/y       Ye/ye
                      Yö/yö     _[]Yo/_[]yo
Ё           Yo/       Yo/yo     =
Ж           J/j       =         =
З           Z/z       =         =
И           I/i       =         =
Й           I/        I/i       =
К           K/k       =         =
Л           L/l       =         =
М           M/m       =         =
Н           N/n       =         =
О           O/o       =         =
Ө           Ö/ö                 _[]O or ^O, _[]o or ^o
П           P/p       =         =
Р           R/r       =         =
С           S/s       =         =
Т           T/t       =         =
У           U/u       =         =
Ү           Ü/ü       =         _[]U or ^U, _[]u or ^u
Ф           F/f       =         =
Х           X/x       Kh/kh     =
Ц           C/c       =         =
Ч           Q/q       Ch/ch     =
Ш           Sh/sh     =         =
Щ           Qh/qh     =         =
Ъ           `         =         =
Ы           Y/y       =         =
Ь           '         =         =
Э           /                   E/e
Ю           Yu/yu     =         =
            Yü/yü     =         _[]Yu/_[]yu
Я           Ya/ya     =         =

Table III
Comprehensive Mongolian Transliteration - CMT (8-bit form)

The following table lists all available CMT symbols and gives their correspondences in both Mongolian writing systems. The table produced here is a revised version of the original Table No. 3 in the MLS report and replaces that document. The distinction between "direct" and "dependent" is such that the "direct" symbols can be translated immediately, whereas the "dependent" symbols depend on the following symbol for proper interpretation. If a character listed in the "dependent" columns appears, then the combination of both is evaluated instead of the single letter. Letters in brackets are only used in UNIX-style encoded documents.

CMT   Cyrillic   Classical
dependent   direct   dir.   dep.
strict   rel.   strict   rel.

a a b c +h: C --- d D --- e e (E)--- --- f g --- (G)--- h --- i --- j k +h:x K --- l m n N --- o o O --- p q +h: --- r p s c +h: +h: t T --- u y --- v --- w --- x x y +a: +:e +o: +:e +u: +: Y z Z --- ' --- ` ---

References:

[1] Corff, Oliver: A new transcription system for Mongolian. In: Mongol Messenger, No. 25, Dec. 17-23, 1991, Ulaanbaatar.
    -- -- : In Defense of the New Romanization. In: Mongol Messenger, No. 11 (37), March 17, 1992, Ulaanbaatar.
    -- -- : The Comprehensive Mongolian Transliteration and its Application for Mongolian Dictionaries. IAMS Bulletin No. 2 (8), 1991, Ulaanbaatar.
    -- -- : A New Transcription System for Mongolian. Shinjlx Uxaany Akademin Md, Vol. 2, 1992, Ulaanbaatar.
[2] Corff, Oliver: Mongolian Language Support. A Mongolian Language Environment for IBM-compatible PCs. Institute of Language and Literature, Mongolian Academy of Sciences, Ulaanbaatar. Published by United Nations University, International Institute for Software Technology, Macao 1992/1993.
[3] Corff, Oliver: MLS for UNIX. Infosystem Mongolei, November 1993.
[4] Coyim-a, S.: mongol bicig. "kdlmri" azar, 1989
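
The UNIX-style markup of principle 10 can be illustrated by a minimal Python sketch. It assumes that the 8-bit front vowels are written "ö" and "ü", that a leading underbar marks every "o" and "u" of the word as front, and that a caret marks only the vowel immediately following it. The function names are hypothetical and not part of the MLS package; a caret that does not precede "o" or "u" is silently dropped.

```python
# Assumed mapping from UNIX-style back vowel shapes to 8-bit front vowels.
FRONT = {"o": "ö", "u": "ü", "O": "Ö", "U": "Ü"}


def decode_unix_word(word: str) -> str:
    """Convert one UNIX-style word to umlaut notation (illustrative only)."""
    # A leading underbar marks the whole word as front.
    whole_word_front = word.startswith("_")
    if whole_word_front:
        word = word[1:]

    out = []
    mark_next = False  # set by a caret, applies to the next character only
    for ch in word:
        if ch == "^":
            mark_next = True
            continue
        if (whole_word_front or mark_next) and ch in FRONT:
            out.append(FRONT[ch])
        else:
            out.append(ch)
        mark_next = False
    return "".join(out)


def decode_unix_text(text: str) -> str:
    """Apply decode_unix_word to every space-separated word of a text."""
    return " ".join(decode_unix_word(w) for w in text.split(" "))


if __name__ == "__main__":
    print(decode_unix_word("_qoloo"))
    print(decode_unix_word("Shagdars^uren"))
```

Under these assumptions, decode_unix_word("_qoloo") yields "qölöö" and decode_unix_word("Shagdars^uren") yields "Shagdarsüren"; words without markup pass through unchanged.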