*Corresponding author: aleksandr.andreev-at-gmail-dot-com. The present document is a draft and is not intended for citation.
HTML version and revisions by Nikita Simmons.
Draft as of September 19, 2012. (Revisions as of December 12, 2012.)
Abstract: This paper presents a unified approach to encoding Church Slavonic characters of all recensions using the Unicode Standard and discusses proper font design and typography to represent a wide variety of Church Slavonic texts, both historical and modern. Issues of collation and input method are also addressed.
Unicode is a computing standard for the consistent encoding and representation of text as expressed in a wide variety of writing systems. Among the scripts included within the Unicode Standard are several blocks of Cyrillic characters; as of version 6.1, the repertoire of Cyrillic characters (some still in the Pipeline) allows for the correct typesetting of texts in the Church Slavonic language from a wide variety of epochs. Certain further characters or Variation Sequences are proposed in conjunction with the present Roadmap to make the repertoire complete.
While the Unicode Standard documentation is sufficient for correct encoding, certain technical issues or difficulties may arise when working with actual texts. A standard methodology for addressing such issues or difficulties is necessary so that Church Slavonic typography can be portable and flexible; hence, the need for the present document. We stress that the present document is not a supplement in addition to Unicode but rather represents an industry-wide consensus on how Church Slavonic typography can be implemented within Unicode. The next three subsections explain this in detail. Following these sections, the remainder of the document discusses the necessary repertoire of characters and variation sequences, principles of font design and correct typography, as well as the correct handling of collation, numerals, and input method in Church Slavonic texts.
The earliest recension is the Uncial script, also known as Ustav (“rule”, or “official”). The origin of this script is clearly Greek Uncial writing from about the 9th Century A.D., which was used at the time in Greek liturgical books of the Byzantine Empire. The earliest known Ustav manuscript dates to A.D. 933 (Karsky, 1979, 160). Ustav writing is characterized by the straight shape of the script. The letters are formed by straight lines, half-circles and right angles. The size of the characters is usually quite large and the spacing between characters is equal throughout. In general, one can tell that the manuscripts were written with great diligence and care, reflecting their intention for liturgical use (Karsky, 1979, 169), (Schepkin, 1999, 117). Though Ustav manuscripts may be found up to the 17th century, Ustav writing flourishes mainly from the beginning of Church Slavonic writing and up to the 14th Century. Ustav typography is thus of interest for two purposes: building a Slavonic literary corpus and academic work in linguistics, palæography, literature and liturgics.
Figure 1 presents a passage of Ustav writing from the Ostromir Gospel, one of the oldest Slavonic Cyrillic manuscripts, which originated in Novgorod around 1056 AD. The ability to correctly typeset Ustav Slavonic script would be necessary for the proper digital storage of the text of the Ostromir Gospel and similar manuscripts.
Figure 1: Example of Church Slavonic Ustav writing.
Source: Ostromir Gospel, Novgorod, c. 1056.
The Ustav script gave way to the semi-uncial or Poluustav script, which begins to flourish around the middle of the 14th Century. Poluustav writing no longer reveals the kind of diligence and care used by earlier scribes – letters show some degree of sloppiness, straight lines become curved and angles are now oblique. Poluustav writing is also characterized by a greater number of abbreviations and by the emergence of accent marks, breathings and other diacritics, some of which have no linguistic meaning but were simply adopted to imitate Greek texts.
Russian Poluustav script can be broadly classified into an early and a late form, which differ in the repertoire of their characters and the style of the script. Thus, for example, early Poluustav continues to use the letter ѥ (Iotified E); often uses ѹ [оу] or у but rarely ꙋ; and uses the ꙑ form of Yery. Later Poluustav is characterised by the use of ꙋ, ы, ѕ and з, as well as by a preponderance of diacritical marks. Our goal here, however, is not to analyze the reasons for these distinctions but rather to identify the necessary repertoire (Karsky, 1979, 172).
Poluustav handwriting continued to flourish until the late 17th century, at which time it gave way to a handwritten form which was more “blockish” or squared-off, which may be found in Old Ritualist manuscripts.
The advent of the printing press brought both standardization and diversity to the field of Slavonic typography. The fonts designed for the printing press in Muscovite Russia and in the Duchy of Lithuania in the 16th and 17th Centuries were modelled on Poluustav script. This is the script that, for example, appears in the famous Apostol of Ivan Fedorov (A.D. 1563), thought to be the first book printed in Russia, and presented in Figure 2. While the Poluustav typestyle used by Ivan Fedorov in his publications became the standard letterforms for most printed books, other variant typestyles emerged as well, which were appropriate for different types of books, such as altar Gospels, standard service books for parish use, private prayer books, collections of homilies, and the Scriptures (the Bible). Examples of other fundamental documents which are essential in establishing the repertoire of characters include the Ostrog Bible (1581, SAMPLE) and the Trebnik (1646, Figure 3) of Metropolitan Peter. (We should note here, however, that one must be careful in establishing the repertoire used in any given work, for a distinction needs to be made between different character entities and mere visual differences in font type, size or weight. This will be discussed below in greater detail.)
Figure 2: Example of Church Slavonic Poluustav type.
Source: Apostol of Ivan Fedorov, Moscow, 1563.
Nonetheless, the form of Poluustav type used by Ivan Fedorov in the Apostol became more or less the standard for typeset texts in 17th Century Moscow. Thus, we can speak of a print Poluustav tradition used by the Moscow printing press for the production of liturgical books in the 17th Century. The same typeface and the same typographical and orthographical conventions continue to be largely imitated by the Russian Old Ritualist community even today. Thus, a standardized approach to Poluustav typography is necessary for all three of the purposes of Slavonic typography that we have identified above.
While Ivan Fedorov successfully typeset Church Slavonic texts in Moscow and later in Lithuania, the first attempts at printing Church Slavonic had taken place much earlier in West Slavic and South Slavic lands. The first Church Slavonic book to be printed was the Octoechos, typeset by Schweipolt Fiol in Cracow in 1491 (see Figure 5). Around the same time, printing began in the Balkans, with the publication of the Octoechos in Tone 1 in 1493 by the printing press of the Montenegrin prince Đurađ Crnojević. Fiol also printed an Horologion and both a Lenten and a Flowery Triodion while the Crnojević press produced five major works. These printed incunabula* initiated a somewhat short-lived printing tradition in the South and West Slavic lands. Well known examples from this tradition include the books printed by Božidar Vuković, whose 1517 Služabnik opened the work of a Serbian press in Venice, and those by Francysk Skaryna, who between 1517 and 1519 printed the Bible in 22 volumes in Prague (see Figure 4; note particularly the distinct shapes of е, и, в, ы and some other letters). While Skaryna’s work is not without the heavy influence of the vernacular language of his native land (present-day Belarus), it should nonetheless be considered part of the Church Slavonic literary tradition (Nemirovsky, 2113).
While in the West and South Slavic Incunabula there is a small amount of influence on the shapes of the characters from the contemporary Poluustav manuscript tradition (described above), the unique shapes were primarily based on the Ustav script. The letterforms were quite crude and frequently disproportionate, although this is understandable, since many of the typesetting conventions that we take for granted had not yet been introduced or standardized at this infant stage of book printing. A systematic analysis of the repertoire of symbols found in these publications remains to be undertaken. We can, however, present some generalizations in the tables below. Following the introduction of Poluustav letterforms in North and West Slavic publications, the South Slavic printing presses abandoned the crude Incunabula typefaces and adopted the more refined Poluustav style of typography.
*An incunabulum (or incunable) is a book, pamphlet, or broadside printed before the year 1500 in Europe.
Figure 4: Example of West Slavic Church Slavonic type.
Figure 5: Example of West Slavic Church Slavonic type.
Source: Octoechos of Schweipolt Fiol, Cracow, 1491.
The liturgical and ritual reforms of Patriarch Nikon (c. 1655), as well as the influence of Western European principles and methods of typography, affected the tradition of Slavonic typography and led to the emergence of a highly codified and standardized tradition of Church Slavonic – the Synodal recension. This recension is so named after the period in the history of the Russian Church during which it was governed by a body called the “Holy Governing Synod”, which operated also the Synodal printing houses in Moscow and St Petersburg. An example may be seen in Figure 7. In addition, books were also printed in Kiev by the Lavra of the Kiev Caves (see Figure 6); these Kievan editions present a somewhat distinct typeface and repertoire from the Synodal editions. In particular, the Kievan editions can be immediately identified by the use of the variant forms ꙁ (a form of з), д (a form of д), and с (a form of с). In addition, the significant grammatical and orthographical revisions proposed by Meletii Smotrytsky in his Gramatiki slavenskiia pravilnoe syntagma (The Correct Syntax of Slavonic Grammar, Vilna, 1619, and reprinted in Moscow, 1649) were incorporated into the post-Nikonian texts. The Synodal Church Slavonic recension is distinguished by the highly standardized nature of its orthography and repertoire of diacritical marks and combining letters, as well as by the distinctive typefaces.
In terms of grammatical, orthographic and typographic rules, Synodal Church Slavonic, with minor variation, remains the main liturgical language of the Russian Orthodox Church. Indeed, the liturgical books of the Russian Church have remained largely unchanged since the Synodal period; an attempted reform of the books on the eve of the 1917 Revolution remained mostly unnoticed. In terms of typeface, the Synodal type has largely become the standard; the Kievan typeface has fallen out of use and is mostly of historical value, though some originals and photocopied reprints of Kievan editions may still be found in choir lofts, especially in the Russian diaspora. Thus, the correct encoding of Synodal Church Slavonic is primarily of interest for the typesetting of modern liturgical texts, but also for computer-aided analysis of the Church Slavonic corpus.
Figure 6: Example of Synodal Church Slavonic from a Kievan edition.
Figure 7: The same text as in Figure 6, but from a Moscow edition.
Source: Festal Menaion, Moscow, Synodal Printing House, 1901.
Finally, some mention ought to be made of Skoropis (literally, “swift writing”), a form of Slavonic semi-cursive script that emerges around the same time as Poluustav and spreads primarily to secular documents, where it can be found up through the 18th Century. As the name implies, the purpose of Skoropis was to allow scribes to write quickly, and thus Skoropis can be considered a precursor to modern Cyrillic cursive writing. As can be seen from Figure 8, Skoropis is characterized by wide strokes of the quill, the rounded shape of its letters, and the presence of many combining superscripts. Strictly speaking, Skoropis primarily reflects the vernacular (Russian, Ukrainian, etc.) literary tradition, as it was used extensively in secular documents. However, we may also find it used in certain ecclesiastical works, primarily in collections of patristic homilies, didactic material, and scriptural commentaries, though rarely in liturgical texts. The forms of Skoropis are diverse, and depend on time period and location; for the purposes of constructing the chart below, we selected a font that mimics the Skoropis tradition of Moscow circa the reign of Peter I.
Figure 8: Example of Church Slavonic Skoropis writing.
Source: Gramata issued by Ivan IV to the Solovetski Monastery, 1539.
Broadly, then, we have identified five distinct forms of Church Slavonic script that should be discussed in this document: the earliest form of Ustav writing; subsequent Poluustav writing and type; South and West Slavic Incunabula; the Synodal Poluustav type; and Skoropis. Below, we will discuss the repertoire associated with each script. First, however, a few words are warranted about why a common standard is necessary and how such a standard may be achieved.
As long as one is working in a completely closed system – that is, as long as the correct representation of a document is not of interest beyond the original user and the life expectancy of the software and hardware used to produce it – Unicode is not necessary. Indeed, in a closed environment, one can use any ad hoc codepage. However, as soon as documents become available for exchange over multiple systems and software implementations, the existence of a unified standard for the encoding of Church Slavonic text from all epochs becomes essential. The typesetting of any text, either in academic work or in modern liturgical texts, is conditional on one additional factor: the presence of adequate Church Slavonic fonts to represent the typeface or script of the given epoch. However, the design and development of such fonts again necessarily requires a common standard. Fonts that conform to the standard of encoding proposed in this Roadmap we shall call “conformant fonts.”
In the past, attempts to create a common standard for Church Slavonic text have been put forth; broadly, these can be classed into two categories. The first is a set of standards that attempt to use an 8-bit encoding and an ad hoc codepage to represent a subset of Church Slavonic text; the most popular of these is the Universal Church Slavonic (UCS) encoding used for Synodal type. UCS may be extended to represent texts of other Slavonic recensions by using a 16-bit encoding and extending the codepage. The second is a set of standards that attempt to encode Slavonic texts of all recensions using an 8-bit codepage and an ad hoc markup language. The most popular of the second type is the Hyperinvariant Presentation (HIP) format, which uses various markup codes to represent Church Slavonic characters within the Windows 1251 codepage. The term “hyperinvariant” is a misnomer – while the text is certainly invariant for storage purposes, it cannot be used for typesetting without a converter into some other standard, usually UCS. In fact, both HIP and UCS are attempts to use CP 1251 to do something which it was not intended to do, namely, to encode Church Slavonic.
Clearly, any kind of standard for the encoding of Church Slavonic needs to satisfy, at a minimum, three properties. It should be portable, allowing the user to work with Slavonic text on any platform and in any software, without special tools, add-ons, converters, and the like (but not without fonts and standard support for modern font features). It should be sufficient, representing the full repertoire of Church Slavonic text from different epochs and sources. And it needs to be stable, meaning that, for example, those characters used in Ustav manuscripts that are common with Synodal analogs should be encoded identically. The last property also allows for the user to typeset multilingual documents, which include Church Slavonic and other languages, as well as to perform computer based analysis of Slavonic texts, such as querying, parsing, and the like.
Unicode has quickly become the industry-wide standard for the representation and encoding of text because it allows for simple multilingual computer processing. Unicode assigns a unique codepoint – a number – to any given character while leaving the graphical representation of this character to higher level protocols (fonts). This abstraction allows for the proper unification of characters across recensions of the textual tradition, while preserving the ability to make graphical distinctions at the font level. The best example of such a balance between unification and graphical flexibility is Han Unification – the unification of the sinographs used in Chinese (both in Simplified and Traditional orthography), Japanese, Korean, and Vietnamese – into one coherent encoding system while allowing for language-based graphical distinctions to be made at the font level. In a nutshell, this Roadmap presents a similar approach to unifying Church Slavonic encoding across recensions.
Moreover, since Unicode is supported on a wide variety of platforms, the representation will be invariant without the use of any special tools beyond conformant software and fonts. We should, however, note what Unicode does not do – namely, it does not address issues of typography other than encoding. The Unicode standard is concerned only with assigning unique codepoints to characters, not with their correct representation in a body of text. The present document is an attempt to demonstrate how such representation can be achieved with a combination of Unicode and modern font technologies.
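To make the separation between codepoint and glyph concrete, the following minimal Python sketch looks up the codepoints of a few letters mentioned earlier in this document. It uses only the standard `unicodedata` module. The choice of sample letters and the helper name `describe` are ours, and the character names returned depend on the Unicode version bundled with the interpreter.

```python
import unicodedata

# Sample letters mentioned above; the selection is illustrative only.
SAMPLES = [
    "\u0465",  # iotified e
    "\u0479",  # uk (digraph form)
    "\uA64B",  # monograph uk
    "\uA651",  # yeru with back yer
]

def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"

for ch in SAMPLES:
    print(ch, describe(ch))
```

The number printed for each letter is all that the encoding fixes; whether the letter then appears in an Ustav, Poluustav or Synodal shape is decided by the font used to display it.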
We should also point out a past attempt to standardize the encoding of Church Slavonic in Unicode. Kostić et al. (2009) present a proposal to encode an entirely new block in the Unicode standard – an Old Slavonic Cyrillic script. The proposed standard would include all required characters for Church Slavonic of the Ustav and South Slavic Incunabula recensions (including those with analogs in modern Cyrillic languages); two versions of each combining superscript letter, one with a titlo and one without; precomposed Cyrillic numerals; and required diacritical marks. In addition, the authors contemplate the idea of encoding some 200 ligatures. Elsewhere, Kostić (2009) raises the following justification for the creation of such a standard: difficulties associated with the correct placement of diacritical marks over base letters; differences in the meaning of characters used in Church Slavonic and modern Slavonic languages; the proliferation of combining marks and ligatures in Church Slavonic texts; and difficulties associated with the correct implementation of sorting (collation), given the somewhat disorganized nature of the current Cyrillic blocks of Unicode.
The limitations of a system such as the one proposed by Kostić et al. (2009) are obvious. Beyond the limited documentation of the proposed repertoire – some characters, while theoretically possible, have no attestation whatsoever (for example, a combining Cyrillic letter Koppa) – we have the immediate issue that the Unicode Technical Committee has adopted a strong policy against encoding precomposed glyphs and ligatures, rendering the encoding of much of the repertoire proposed by Kostić et al. (2009) impossible, even if desirable. But even the desirability of such a standard is questionable, since the presence of two codepoints for analogous characters in modern and ancient script (comparable to encoding two versions of the letter “a” – one for English and one for Turkish) as well as the proliferation of combining letterforms would lead to confusion and difficulties in designing an appropriate input method. Second, it is unclear if the variants presented by Kostić et al. (2009) are typographical or scribal variants (handled most appropriately at the font level) or actual letter variants that need to be handled as unique codepoints or variation sequences. Third, the advances in modern font technology allow us to address the issues of diacritical mark positioning and ligature composition without the use of precomposed glyphs. Indeed, modern OpenType and SIL Graphite technologies have been successfully used for complex writing systems such as Tai, Arabic and Devanagari; there is no reason they cannot be used for Church Slavonic Cyrillic. Finally, issues of collation may be addressed quite simply by an appropriate tailoring of the Default Unicode Collation Element Table (DUCET).
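As a rough illustration of what a tailoring does, the Python sketch below sorts three single-letter strings with a custom sort key that places ѣ (Yat, U+0463) immediately after ь, whereas a plain codepoint sort leaves it after я. This is not an implementation of the Unicode Collation Algorithm or of a real DUCET tailoring. The function name `tailored_key`, the deliberately tiny alphabet, and the position chosen for Yat are assumptions made for illustration only.

```python
# Hypothetical, deliberately tiny alphabet order: modern lowercase
# letters with Yat (U+0463) inserted right after the soft sign.
ALPHABET = "абвгдежзиклмнопрстуфхцчшщъыь\u0463юя"
RANK = {ch: i for i, ch in enumerate(ALPHABET)}

def tailored_key(word: str) -> list:
    """Map each letter to its rank; unknown characters sort last."""
    return [RANK.get(ch, len(ALPHABET)) for ch in word.lower()]

words = ["я", "\u0463", "ь"]
print(sorted(words))                    # plain codepoint order
print(sorted(words, key=tailored_key))  # tailored order
```

A production system would express the same reordering as rules layered on top of the default collation table rather than as a hand-written key function.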
As we have stated above, the Unicode standard addresses only the repertoire of characters and their representation using a numerical code. It does not specify how these characters shall be used or which set of characters shall be used for which language, beyond dividing characters into script-specific blocks. While the Unicode documentation does provide some additional information and the annotations to characters provide some comments, these comments are usually insufficient for a user to be able to correctly and uniquely implement a language and script using Unicode.¹ In general, then, the purpose of the present Roadmap is to document a unified encoding of Church Slavonic within the Unicode standard. Thus, it presents documentation on how a specific language should be correctly typeset in Unicode. To do this, the document specifies two sets of rules: first, which codepoints of the Cyrillic and other blocks shall be used to encode which characters and, second, which font features shall be used to represent these characters visually. As shall be discussed below, conformant fonts will need to rely on modern features, the use of which is also outlined. In addition, the present document presents certain other issues essential for proper Slavonic typography, including collation, transliteration, and the design of compatible input methods.
¹Take, for example, the character at 0456, Cyrillic small letter Byelorussian-Ukrainian i. From the name of the character, the user knows that this codepoint is used to represent the letter і in Ukrainian and Byelorussian. Under Annotations, the user sees the comment “Old Cyrillic i.” However, it is unclear if “Old Cyrillic” here refers to Church Slavonic or Russian in traditional orthography; in addition, if it refers indeed to Church Slavonic, it is unclear if “Old Cyrillic i” should represent the dotless form, the dotted form, or the double-dotted form of the Church Slavonic letter.
At present, the Cyrillic characters required for Church Slavonic are encoded in Unicode in a haphazard manner because, unlike modern languages that have well documented and identifiable character ranges, Church Slavonic has not been thoroughly and systematically researched and hence was not presented for inclusion in the Unicode Standard as an entire and coherent writing system. Initially, only the modern languages written in Cyrillic were represented in Unicode, and it took a few revisions to get the missing characters for pre-Revolutionary Russian orthography encoded as well. Gradually, a number of other characters commonly used in Church Slavonic were proposed and included in the Unicode Standard. Presently, various Church Slavonic characters are scattered in non-contiguous blocks in the BMP (Basic Multilingual Plane) and SMP (Supplementary Multilingual Plane). However, without guidelines such as those in this Roadmap, the encoding system is difficult to use in a way that meets the needs of professionally reprinting the worship books of the Orthodox Church with faithfulness to the historical typographical tradition or of computer aided analysis.
Part of the difficulty in identifying which characters to use stems from the fact that the “official” names used in the Unicode Standard are based on those of modern Russian or other Slavic languages, or based on the recommendations of the South Slavic (Balkan) academics who presented the characters for inclusion in the Unicode Standard. These names are sometimes at variance with traditional North and East Slavic letter names. Other characters come with apparent doppelgängers – that is, visually indistinguishable forms – used in the Cyrillic-based writing systems of non-Slavic languages such as Kazakh, Sakha (Yakut), or Mongolian. Yet other characters do not have different codepoints for modern and ancient forms. In any case, in the tables below, we list the characters by their established name and function in Church Slavonic and not by Unicode name, codepoint, or block.
The present section addresses issues associated with the design of fonts conformant with the proposed encoding standard (“conformant fonts”). A font shall be defined as a quantity of sorts composing a complete character set of a single size and style of a particular typeface. Because fonts are character set and typeface specific, a conformant font shall seek to address only the typeface of one particular era of Church Slavonic – in other words, fonts shall only strive to implement the features of one of Ustav, Incunabula, Poluustav, modern Synodal type, or Skoropis typefaces. While the inclusion of multiple eras of typeface in a single font may be possible via the use of stylistic sets and other advanced features, this is not addressed in this Roadmap. The inclusion of characters not relevant for a particular period (for example, the encoding of a modern Cyrillic Letter Ya in a Synodal-era font) may be desirable, for example, for use in creating ornamented modern text, but is in no way required. On the other hand, conformant fonts are required to include the entire repertoire of characters associated with the era they are designed to reproduce (for example, a Synodal-era font lacking a Big Yus would not be considered a conformant font).
It is proposed that a font carry a distinctive name that shall consist of three parts – the name of the typeface (which is entirely up to the author), the name of the era it attempts to reproduce, and the word “Unicode”. Thus, examples of conformant font names shall be: Hilandar Ustav Unicode, Ostrog Poluustav Unicode, Hirmos Synodal Unicode, and so forth. In addition, fonts shall be accompanied by adequate documentation that shall, as a minimum, address the following issues: 1. who is the author of the typeface; 2. under which license (or licenses) is the font distributed; 3. what is the repertoire of glyphs available in this font; and 4. which advanced typographic features are used by the font.
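The naming convention lends itself to a mechanical check. The following Python sketch assumes that the era is spelled exactly as one of the five recension names used in this Roadmap and that the typeface name is any non-empty run of words. The function name `parse_font_name` and the exact spelling list are our assumptions, not part of the proposal.

```python
# Era names as spelled in this Roadmap (assumed to be the accepted spellings).
ERAS = {"Ustav", "Incunabula", "Poluustav", "Synodal", "Skoropis"}

def parse_font_name(name: str):
    """Split '<typeface> <era> Unicode' into (typeface, era), or return None."""
    parts = name.split()
    if len(parts) < 3 or parts[-1] != "Unicode" or parts[-2] not in ERAS:
        return None
    return " ".join(parts[:-2]), parts[-2]

for candidate in ("Hilandar Ustav Unicode", "Hirmos Synodal", "Ostrog Poluustav Unicode"):
    print(candidate, "->", parse_font_name(candidate))
```

A name that lacks the era or the trailing word “Unicode” yields `None`, which a font catalogue could use to flag entries that do not follow the convention.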
In addition to the relevant repertoire for the appropriate recension of Church Slavonic, presented in Section 2, conformant fonts shall include the following control characters: Variation Selectors 1 through 4 (U+FE00 through U+FE03); Zero Width Joiner (U+200D); Zero Width Non-Joiner (U+200C); Combining Grapheme Joiner (U+034F); Dotted Circle (U+25CC); Left To Right Mark (U+200E); and Right To Left Mark (U+200F). As well, it is highly desirable for fonts to present the basic repertoire of Latin characters, at least the 95 graphic characters of the ASCII block. Evidence strongly suggests that certain software requires the presence of these characters. Fonts should not use Unicode codepoints to represent anything other than the characters located at those codepoints; any characters available in a font that are not mapped in the Unicode standard should be encoded outside of the Unicode range or in the Private Use Area.
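The list of required characters can be turned into a simple coverage check. The Python sketch below assumes that the set of codepoints mapped by a font has already been obtained by some other means (for example, from the font's character map); reading an actual font file is outside its scope. The names `REQUIRED_CONTROLS`, `ASCII_GRAPHIC`, and `coverage_report`, and the demonstration font, are hypothetical.

```python
# Codepoints that the text above requires in every conformant font.
REQUIRED_CONTROLS = {
    0xFE00, 0xFE01, 0xFE02, 0xFE03,  # Variation Selectors 1 through 4
    0x200D,  # Zero Width Joiner
    0x200C,  # Zero Width Non-Joiner
    0x034F,  # Combining Grapheme Joiner
    0x25CC,  # Dotted Circle
    0x200E,  # Left To Right Mark
    0x200F,  # Right To Left Mark
}
# The 95 graphic ASCII characters (space through tilde): desirable, not required.
ASCII_GRAPHIC = set(range(0x20, 0x7F))

def coverage_report(font_codepoints):
    """Return (missing required, missing desirable) as sorted lists."""
    have = set(font_codepoints)
    return sorted(REQUIRED_CONTROLS - have), sorted(ASCII_GRAPHIC - have)

# Hypothetical font that maps only ASCII and the two joiners.
demo = set(range(0x20, 0x7F)) | {0x200C, 0x200D}
missing_required, missing_desirable = coverage_report(demo)
print([f"U+{cp:04X}" for cp in missing_required])
print(len(missing_desirable))
```

The same function could be extended with the repertoire of a given recension once that repertoire is fixed.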
The use of advanced typographic features is required for proper Church Slavonic fontography. Briefly, fonts with advanced typographic features (so-called “smart fonts”) contain not only glyph outlines but also additional instructions on how glyphs are combined and positioned. Presently, several technologies providing “smart” features exist – these include OpenType, Apple Advanced Typography (AAT), and SIL Graphite. These three technologies do not necessarily compete – a font may be designed to work with more than one of these technologies, thus providing the user with some flexibility. In addition, at the present time, not all software applications support all of these technologies or all features of a given technology. Font designers should keep in mind the target user audience and decide appropriately which technology should be used. Because OpenType is by far the most prevalent “smart font” technology, we describe below the OpenType features that Church Slavonic fonts should use.
OpenType is a technology for advanced typography developed by Microsoft Corporation and Adobe Systems and based on the TrueType font format. OpenType fonts may contain either TrueType or PostScript outlines. Glyphs in an OpenType font are mapped to their Unicode positions, so the format is inherently Unicode compliant. In addition, however, OpenType fonts may include many non-standard glyphs, for example: old-style figures, small capitals, contextual or stylistic alternates, and ligatures. The support of these features is contingent on the availability of software that correctly implements the features. At the present time, a variety of software packages provide varying levels of support for OpenType features – we have attempted to summarize the details in Appendix A; software that cannot access the “smart” features of the font can still access the characters mapped to the Unicode positions. Software that is not compliant with Unicode can only access the first 256 characters of the font.
This subsection outlines which OpenType features shall be used by conformant fonts to implement the elements of Church Slavonic typography. (Note that several of the examples presented in the table below do not yet display properly in HTML because the Slavonic Unicode font is still under development and needs further work.)
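Before turning to the table, it may help to picture glyph composition as an ordered substitution over a sequence of codepoints. The Python sketch below is a toy model, not OpenType code: the rule table `CCMP_RULES`, the function `compose`, and the glyph names `uk` and `iso` are hypothetical, the psili is assumed to be encoded as U+0486, and a real font would express such rules as substitution lookups. It models three behaviours that the table describes for the ccmp feature: о + у composes to the uk digraph, a psili followed by an acute composes to a single glyph while the reverse order does not, and a Zero Width Non-Joiner between о and у suppresses composition.

```python
ZWNJ = "\u200C"
# Hypothetical rules: an input codepoint pair -> one output glyph name.
CCMP_RULES = {
    ("\u043E", "\u0443"): "uk",   # o + u -> uk digraph
    ("\u0486", "\u0301"): "iso",  # psili (assumed U+0486) + acute -> iso
}

def compose(text: str) -> list:
    """Greedy left-to-right pair substitution; ZWNJ blocks a match."""
    out, i = [], 0
    while i < len(text):
        pair = tuple(text[i:i + 2])
        if pair in CCMP_RULES:
            out.append(CCMP_RULES[pair])
            i += 2
        else:
            if text[i] != ZWNJ:  # ZWNJ itself produces no visible glyph
                out.append(text[i])
            i += 1
    return out

print(compose("\u043E\u0443"))              # composed into one glyph
print(compose("\u043E" + ZWNJ + "\u0443"))  # kept as two letters
print(compose("\u0301\u0486"))              # wrong order: no composition
```

The model makes the ordering requirement visible: only the exact sequence listed in a rule triggers the substitution.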
|OpenType feature||
|mark - Mark to Base Positioning|
|This feature should be used to implement correct positioning of diacritical marks in relation to a base glyph or a ligature glyph. This feature may be implemented as a MarkToBase Attachment lookup (GPOS LookupType 4) or a MarkToLigature Attachment lookup (GPOS LookupType 5). For this purpose, marks are grouped in classes and appropriate anchor points are specified on each base character or ligature for each class of marks.||
а + ◌́ → а́
|This feature should also be used to create the numerals by positioning the combining numeral mark over the relevant letter. In certain instances, proper alignment cannot be achieved by using the mark feature, and precomposed glyphs should be used. See the discussion below on the ccmp feature.||
а + ◌҉ → а҉
|mkmk - Mark to Mark Positioning|
|This feature allows for the positioning of diacritical marks in relation to other diacritical marks, for example, when stacking multiple marks or positioning multiple marks next to each other. In fonts of the Ustav era, this feature can be used to produce certain composite marks, as in the example. In fonts of the Poluustav and Synodal eras, this feature should stack multiple diacritical marks on top of each other to indicate to the user the presence of extraneous diacritical marks in a text. Note that this feature cannot be used to create the iso and apostroph digraphs of Synodal texts; instead, the ccmp feature should be used.||
а + ◌́ + ◌́ → а
|This feature can also be used to produce the combining letters under a titlo in Poluustav and Synodal fonts. Note that in some instances, the use of precomposed glyphs may be required, and this is achieved using the ccmp feature.||
а + ◌ⷪ + ◌҇ → аo
|kern - Kerning|
|This feature is used to adjust the amount of space between glyphs, in order to provide optically consistent spacing between glyphs. Because of the shape of certain Church Slavonic letters, some glyph combinations (especially combinations of uppercase and lowercase letters) require adjustment. This feature can also be used to adjust the vertical position of glyphs, where necessary. (It should be noted, however, that kerning was not introduced to Slavonic typography until fairly modern times. Type designers of Synodal era fonts may choose to implement kerning as a design element, but earlier type styles should not contain kerning tables.)||
Т + а → Та
|ccmp - Glyph Composition / Decomposition (Contextual Substitution)|
|This feature is used to compose glyphs made up of constituent parts. For example, the iso and apostroph digraphs are composed of a psili pneumata and a combining accent. The uk digraph is composed of the letters o and u. Note that this feature is implemented by the software prior to any other feature.||
◌̓ + ◌́ → ◌
о + у → ѹ
Note that order matters – the same characters entered in a different order will produce different (usually erroneous) results:
◌́ + ◌̓ → ◌́̓
|Note also that in some instances, it may be necessary to suppress glyph composition, for example in order to write the characters о and у as standalone letters, not as a digraph. In this case, the Zero Width Non-Joiner (U+200C) should be used, as in the example at right.||
о + [ZWNJ] + у → оу
|Fonts should provide pre-composed glyphs for those combinations whose correct appearance cannot be achieved using the mark and mkmk features. Such pre-composed glyphs should also be accessed via the ccmp feature. At right, we present one such example (the Yats are other good examples):||
ꙋ + ◌̑ →
|calt - Contextual Alternates|
|This feature is used to replace certain glyphs with alternative forms, used in a specific context. For example, in Synodal era typography (only), the form of the psili over an uppercase letter differs from the form used over a lowercase letter.||
А + ◌̓ → А
|liga - Standard Ligatures|
|This feature is used to replace a sequence of glyphs with a single glyph, called a ligature. Church Slavonic typography does not include any pre-defined ligatures, but may include discretionary ligatures (this is not the right term!). Such ligatures are implemented by using the Zero Width Joiner (U+200D), as in the example at right. Documentation supplied with fonts should list all ligatures available in a font.||
т + [ZWJ] + в →
|clig - Contextual Ligatures|
|This feature is used to replace a sequence of glyphs with a single glyph. Unlike other ligature features, the Contextual Ligatures feature specifies the context in which the ligature is implemented. This feature is widely used for Skoropis fonts and may also be useful for certain features of Synodal or Poluustav typography.||
д + ◌ⷭ + ◌҇ + ꙋ → дс
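The character sequences used in the examples above can be inspected outside of any font. The following Python sketch is only an illustration, not part of the Roadmap: the helper name `codepoints` is hypothetical, and only the standard `unicodedata` module is used. It shows that the psili (U+0313) and the acute (U+0301) share the same canonical combining class, so normalization does not reorder them and the two orders of entry remain distinct strings. It also shows that inserting U+200C between о and у produces a sequence in which the plain pair о + у no longer occurs, which is presumably why a ccmp substitution keyed on that pair would not apply.

```python
import unicodedata

PSILI = "\u0313"  # COMBINING COMMA ABOVE
ACUTE = "\u0301"  # COMBINING ACUTE ACCENT
ZWNJ = "\u200C"   # ZERO WIDTH NON-JOINER


def codepoints(text):
    """Return the code points of text as 'U+XXXX' strings (hypothetical helper)."""
    return ["U+%04X" % ord(ch) for ch in text]


# а + psili + acute: the order of entry expected for the iso digraph.
iso = "\u0430" + PSILI + ACUTE
# The same marks entered in the opposite order.
reversed_marks = "\u0430" + ACUTE + PSILI

# Both marks have the same canonical combining class, so normalization
# leaves their relative order untouched.
same_class = unicodedata.combining(PSILI) == unicodedata.combining(ACUTE)
still_distinct = (
    unicodedata.normalize("NFC", iso)
    != unicodedata.normalize("NFC", reversed_marks)
)

# о + у, a candidate for composition into the uk digraph.
digraph = "\u043E\u0443"
# о + ZWNJ + у: the pair is no longer adjacent in the character stream.
separate = "\u043E" + ZWNJ + "\u0443"

print(codepoints(iso))
print(codepoints(reversed_marks))
print(same_class, still_distinct)
print(codepoints(separate))
```

A check of this kind can be useful when validating text before typesetting, since an incorrect order of marks cannot be repaired by normalization alone.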
Graphite is a powerful “smart font” technology developed by SIL International. Graphite and OpenType are not competing technologies; rather, font developers may choose to support both technologies simultaneously. Since, unlike OpenType, Graphite does not have predefined features, it provides the developer with an ability to control subtle typographic features that may be difficult or impossible to handle with OpenType. In addition, while support of OpenType features varies from application to application, Graphite relies on a single engine, and thus all Graphite features are supported whenever an application supports Graphite. However, Graphite is not supported widely: in addition to SIL’s own WorldPad editor (a Windows-only application), it is supported in LibreOffice (starting with version 3.4), Mozilla Firefox (starting with version 11), and XƎTEX (starting with version 0.997).
Graphite features are written in a C-like programming language called Graphite Description Language (GDL), a rule-based programming language that is used to describe the behavior of a writing system. Then, the TrueType font file is compiled against the GDL file by using a Graphite compiler.* The resulting TrueType font file contains additional tables that are used by the Graphite engine of a Graphite-aware application. Note that, as of press time, Graphite is only supported in TrueType fonts; it is not possible to add Graphite tables to OpenType-CFF fonts.
*As of version 2.1, the Graphite compiler is available for GNU/Linux in addition to Windows.
While all of the features of Church Slavonic writing can be successfully implemented by using only OpenType features, a developer may wish to also add Graphite features to a font for two reasons. First, because Graphite features are written in a separate GDL file and then compiled into a font, they are not font-specific; thus, a developer may quickly develop many fonts with the same Graphite features using only one GDL file. Second, developing a font with both OpenType and Graphite features will allow the font to be usable in a greater number of applications, especially those in which OpenType support is lacking or unstable (most notably, LibreOffice). Thus, the present Roadmap recommends that font developers implement both OpenType and Graphite features. Font documentation should clearly describe the Graphite features that are available in a given font.
The typesetting of Church Slavonic texts requires the use of certain variant forms of characters encoded in Unicode. These variant forms have been used in historical printed editions; some of them continue to be used today while others remain important because of their provenance in fundamental printed texts, for example, the Ostrog Bible, the Trebnik of Metropolitan Peter (Mohyla) or the printed Oko Tserkovnoye (Typikon). For the most part, the use of these variant forms is not context-specific and thus cannot be implemented using contextual glyph substitution. In addition, these variant forms are actual orthographic variants, not mere differences in font style. That is to say, the variant forms are used alongside the base forms within the same typeface and style. Moreover, they constitute distinct variant forms, usually used for one of three reasons: as a space-saving device; because of some complex rules of orthography; or, lastly, at the whim of the typesetter in order to provide a graphical embellishment, for example, when the same character occurs multiple times in a word or on a line of text. As well, these variant forms have unique representations in the legacy Hyperinvariant Presentation (HIP) standard. For these reasons, these variant forms should be implemented as standalone glyphs, not as, for example, stylistic alternatives at the font level.
YURI: Given Unicode’s stance, I highly doubt that the reason of “whim of the typesetter” will do, as using stylistic sets or something similar would be Unicode’s answer (for example, if we wish to encode the different types of g’s in English we would need to use stylistic sets/alternatives, so the same can be said of the proposed forms here).
NIKITA: Another important reason why variant forms were used (or seemingly chosen by the typographer), particularly in the Poluustav era, was to avoid “character collision” in places where the descender of a character in one line of text would intersect or collide with an ascender, diacritical mark, titlo or capital letter in the following line of text. The Unicode Consortium will certainly not be concerned with such a random variable in the presentation of texts, especially since this is so “occurrence specific”, but typographers need to be aware of why there are so many contextual variants in early printed books.
The correct way to handle these variant forms is via the use of Variation Sequences.* Variation Sequences consist of a single base character followed by a single Variation Selector. The Variation Selectors are a set of 256 characters, encoded at FE00 to FE0F and E0100 to E01EF, designed to be used to define specific variant glyph forms of Unicode characters. When a glyph variant form that cannot be predicted algorithmically (that is, via the context) is required, the user simply appends an appropriate variation selector to the letter to indicate to the rendering system which glyph form is required. Variation Sequences are already used in Unicode to implement a number of features of Mongolian and ‘Phags-Pa typography. In this Roadmap, we propose a standard approach to use Variation Sequences to access character variants in Church Slavonic.
*YURI: We will need to convince Unicode of this statement. At present, I highly doubt that Unicode will buy our argument. We need to emphasise the similarity with Mongolian rather than plain Latin.
NIKITA: I propose implementing a consistent, unified methodology for using the Variation Sequences, which can be applied to all the base characters uniformly. This would allow the typesetter to consistently invoke the same function with one specific Variation Selector (such as the Greek variants). (This would involve revising the system presented in Table 1 below.) For example:
Variation Selector FE00 = alt1 - standard historical variation 1 (could also be used to access the Slavonic asterisk?)
Variation Selector FE01 = alt2 - standard historical variation 2
Variation Selector FE02 = alt3 - standard historical variation 3
Variation Selector FE03 = grk - Greek letterform variant (this would also include the Greek punctuation, such as the high period and high comma?)
Variation Selector FE04 = rev - reverse letterforms (used for several Ustav variants) - This is questionable, but a possible method of text entry
To this end, in Table 1, we propose the addition of the following standardized variants to the StandardizedVariants.txt file. The names of the proposed Variation Sequences are in accordance with the naming conventions given in the Working Group Document L2/10-280 (Pentzlin, 2011). At the font level, Variation Sequences have traditionally been implemented as ligatures in the GSUB table (by use of the liga feature, as described above). Adobe Systems has recently proposed an extension to the OpenType standard whereby Variation Sequences are implemented in the cmap table of the font. As the latter approach will probably become the eventual standard, we recommend that font designers support both approaches to encoding Variation Sequences. Both features may be implemented, for example, in the free font development tool FontForge.
(Note that several of the examples presented in the table below do not yet display properly in HTML because the Slavonic Unicode font is still under development and needs further work.)
|U+0432 U+FE00||в||CYRILLIC SMALL LETTER VE VARIANT-1
|д||U+0434 U+FE00||д||CYRILLIC SMALL LETTER DE VARIANT-1
LONG-LEGGED DOBRO (NIKITA: I disagree with this being a Variation instead of an encoded glyph, but this matter still needs further discussion.)
|о||U+043E U+FE00||o||CYRILLIC SMALL LETTER O VARIANT-1
NARROW ON (NIKITA: I disagree with this being a Variation instead of an encoded glyph, but this matter still needs further discussion.)
|с||U+0441 U+FE00||с||CYRILLIC SMALL LETTER ES VARIANT-1
|U+0442 U+FE00||т||CYRILLIC SMALL LETTER TE VARIANT-1
|т||U+0442 U+FE01||||CYRILLIC SMALL LETTER TE VARIANT-2
|ꙋ||U+A64B U+FE00||||CYRILLIC SMALL LETTER UK VARIANT-1
|ъ||U+044A U+FE00||ъ||CYRILLIC SMALL LETTER YER VARIANT-1
|ѣ||U+0463 U+FE00||ѣ||CYRILLIC SMALL LETTER YAT VARIANT-1
|U+1F545 U+FE00||||SYMBOL FOR MARKS CHAPTER VARIANT-1
FORM WITH KA-ER SUPERSCRIPT
|🕅||U+1F545 U+FE01||||SYMBOL FOR MARKS CHAPTER VARIANT-2
FORM WITH EM-KA LIGATURE AND ER SUPERSCRIPT
|🕅||U+1F545 U+FE02||||SYMBOL FOR MARKS CHAPTER VARIANT-3
FORM WITH KA SUBSCRIPT AND ER SUPERSCRIPT
|🕅||U+1F545 U+FE03||||SYMBOL FOR MARKS CHAPTER VARIANT-4
FORM WITH KA SUBSCRIPT
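At the level of encoded text, the sequences proposed in Table 1 are simply a base character followed by one Variation Selector. The Python sketch below illustrates this under stated assumptions: the helper names `variation_sequence` and `strip_variation_selectors` are hypothetical, the sequences shown are proposals rather than registered entries in StandardizedVariants.txt, and stripping selectors before searching is an assumption of this sketch, not a rule stated in the Roadmap.

```python
# The 256 Variation Selectors: U+FE00..U+FE0F and U+E0100..U+E01EF.
VARIATION_SELECTORS = (
    [chr(cp) for cp in range(0xFE00, 0xFE0F + 1)]
    + [chr(cp) for cp in range(0xE0100, 0xE01EF + 1)]
)


def variation_sequence(base, variant):
    """Return base followed by the selector for VARIANT-<variant> (1-based)."""
    if len(base) != 1:
        raise ValueError("a variation sequence has exactly one base character")
    if not 1 <= variant <= len(VARIATION_SELECTORS):
        raise ValueError("variant must be between 1 and 256")
    return base + VARIATION_SELECTORS[variant - 1]


def strip_variation_selectors(text):
    """Drop all Variation Selectors, leaving only the base characters."""
    return "".join(ch for ch in text if ch not in VARIATION_SELECTORS)


# CYRILLIC SMALL LETTER DE VARIANT-1 as proposed in Table 1: U+0434 U+FE00.
de_variant_1 = variation_sequence("\u0434", 1)
# CYRILLIC SMALL LETTER TE VARIANT-2 as proposed in Table 1: U+0442 U+FE01.
te_variant_2 = variation_sequence("\u0442", 2)

print(["U+%04X" % ord(ch) for ch in de_variant_1])
print(["U+%04X" % ord(ch) for ch in te_variant_2])
print(strip_variation_selectors(de_variant_1) == "\u0434")
```

Because the selector follows the base character, text containing variation sequences degrades to the base forms when a font or application ignores the selectors.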
The Unicode Technical Committee defines “collation” as the process and function of determining the sorting order of strings of characters. Proper collation is required for text processing systems, for example in creating sorted lists of textual data, selecting certain records in a database, or working with items in a dictionary. While Church Slavonic uses the Cyrillic script, it is important to note that collation varies according to language and culture. For example, Germans, French and Swedes sort the same characters differently, although all three use the Latin alphabet. In the same way, the collation rules for Church Slavonic texts will be different from the collation rules for Russian, Ukrainian or Serbian.
We should note that collation is not the same thing as Unicode code point order. In the Latin codepage, capital Z comes before lowercase a, but this has no bearing on the order of sorting the two characters. Thus, the fact that the Cyrillic blocks of Unicode are “not in the correct order” from the standpoint of the Church Slavonic alphabet (or, for that matter, any alphabet using the Cyrillic script) has no bearing on collation. Instead of using binary order, the collation of Unicode strings should rely on multilevel comparison.
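The difference between code point order and collation order is easy to observe. The short Python sketch below is an illustration only; it relies on the fact that a plain sort in Python compares strings by code point. It reproduces the Z and a example from the paragraph above and shows the same effect for the Cyrillic letters ѣ (U+0463) and ю (U+044E), where ѣ has the higher code point even though it precedes ю in the Church Slavonic collation order proposed in this document.

```python
# A plain sort compares code points, not alphabetical position.
latin = sorted(["a", "Z"])
print(latin)  # capital Z (U+005A) comes before lowercase a (U+0061)

yat = "\u0463"  # CYRILLIC SMALL LETTER YAT
yu = "\u044E"   # CYRILLIC SMALL LETTER YU

# ѣ has the higher code point, so binary order places ю first.
print(ord(yat) > ord(yu))
print(sorted([yat, yu]) == [yu, yat])
```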
The Unicode Collation Algorithm (UCA) details how Unicode strings are compared and sorted. The UCA is based on the notion of a collation table, which associates multilevel comparison codes with individual characters based on language and locale. The UCA provides fallback multilevel codes via the Default Unicode Collation Element Table (DUCET). Particular blocks of the DUCET can be modified to fit the collation specifications of a given language and locale, a process called tailoring. Below, we present two standard tailorings for Church Slavonic.
Synodal Church Slavonic, which we shall term cu-RU in keeping with the conventions of the Unicode Locale Data Markup Language (LDML) standard, requires a number of peculiar collation rules. First, the order of letters in Synodal Church Slavonic mimics that of modern (pre-reform) Russian. Thus, the usual order of letters is as follows:
Аа, Бб, Вв, Гг, Дд, Ее, Єє, Жж, Ѕѕ, Зз (Ꙁꙁ), Ии, Їі, Кк, Лл, Мм, Нн, Оо, Ѻѻ, Пп, Рр, Сс, Тт, Ѹѹ, Уу, Фф, Хх, Ѡѡ (Ѽѽ), Ѿѿ, Цц, Чч, Шш, Щщ, Ъъ, Ыы, Ьь, Ѣѣ, Юю, Ꙗꙗ, Ѧѧ, (Ѫѫ), Ѯѯ, Ѱѱ, Ѳѳ, Ѵѵ, Ѷѷ.*
*The letterform Ꙁ is a variant of З that appears in Kievan editions. The letter Ѫ is not used in Synodal Church Slavonic except as a symbol in tables used for computus.
However, while this ordering of letters is traditional, the following sorting rules are usually observed:
• Collation should follow (phonetically), as closely as possible, the collation order of pre-reform Russian. Thus, З and Ꙁ should sort together, as both are pronounced as З (however, Ѕ has traditionally sorted before З). Likewise, Ѻ, О and Ѡ should sort together, as all are pronounced as the Russian О. This, for example, is the sorting order we find in the Church Slavonic dictionary of Dyachenko.
• Collation should make it simpler to distinguish forms based on case and number. In particular, the nominative singular form should be sorted first, if possible (ра́бъ < ра̑бъ). In addition, the singular forms should come before the plural/dual forms, thus (ра́бъ < ра̑бъ; м́жи < мжи; рабо́мъ < рабѡ́мъ).
• Abbreviations should come before unabbreviated words of the same forms (дв҃а < два̀). However, as an exception, the letter Ѿ is treated as a ligature for от.
• Sorting of diacritical marks should be from left to right, that is, ве́щьми < вещьмѝ.
• Capital letters should sort before lowercase letters.
Therefore, to implement the above behavior, the following collation order is proposed (here < denotes a primary difference, << a secondary difference, and <<< a tertiary difference):
А <<< а < Б <<< б < В <<< в < Г <<< г < Д <<< д (д) < Е <<< е << Є <<< є < Ж <<< ж < Ѕ <<< ѕ < З <<< з << <<< < И <<< и [й = и + ̆] < І <<< і < К <<< к < Л <<< л < М <<< м < Н <<< н << О <<< о << Ѡ <<< ѡ << Ѿ <<< ѿ [ѿ = ѡ + т; Ѽ = ѡ + + ̑] < П <<< п < Р <<< р < С <<< с < Т <<< т < Оу <<< оу << У <<< << у < Ф <<< ф < Х <<< х < Ц <<< ц < Ч <<< ч < Ш <<< ш < Щ <<< щ < Ъ <<< ъ < Ы <<< ы < Ь <<< ь < Ѣ <<< ѣ < Ю <<< ю < Ꙗ <<< ꙗ << Ѧ <<< ѧ < Ѫ <<< ѫ < Ѯ <<< ѯ < Ѱ <<< ѱ < Ѳ <<< ѳ < Ѵ <<< ѵ [ѷ = ѵ + ]
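The idea of multilevel comparison behind this order can be sketched in a few lines of Python. This is a simplified illustration and not an implementation of the UCA: the `WEIGHTS` table and the `sort_key` function are hypothetical, only a handful of letters are covered, the numeric weights are arbitrary apart from their relative order, and characters missing from the table (including all diacritical marks) are simply ignored. Each letter carries a primary weight (the letter itself), a secondary weight (a variant that sorts together with another letter, such as Ѡ with О) and a tertiary weight (case, with capitals first).

```python
# (primary, secondary, tertiary) weights for a small subset of letters.
# Tertiary 0 = capital, 1 = lowercase, so capitals sort before lowercase.
WEIGHTS = {
    "\u0410": (1, 0, 0), "\u0430": (1, 0, 1),  # А а
    "\u0411": (2, 0, 0), "\u0431": (2, 0, 1),  # Б б
    "\u041C": (3, 0, 0), "\u043C": (3, 0, 1),  # М м
    "\u041E": (4, 0, 0), "\u043E": (4, 0, 1),  # О о
    "\u0460": (4, 1, 0), "\u0461": (4, 1, 1),  # Ѡ ѡ: same primary as О
    "\u0420": (5, 0, 0), "\u0440": (5, 0, 1),  # Р р
    "\u042A": (6, 0, 0), "\u044A": (6, 0, 1),  # Ъ ъ
}


def sort_key(word):
    """Build a key that compares all primary weights first, then all
    secondary weights, then all tertiary weights."""
    weights = [WEIGHTS[ch] for ch in word if ch in WEIGHTS]
    primary = tuple(w[0] for w in weights)
    secondary = tuple(w[1] for w in weights)
    tertiary = tuple(w[2] for w in weights)
    return (primary, secondary, tertiary)


words = [
    "\u0440\u0430\u0431\u0461\u043C\u044A",  # рабѡмъ
    "\u0440\u0430\u0431\u043E\u043C\u044A",  # рабомъ
    "\u0420\u0430\u0431\u043E\u043C\u044A",  # Рабомъ
    "\u0431\u0430\u0431\u0430",              # баба
]
ordered = sorted(words, key=sort_key)
print(ordered)
```

In this sketch the form with о sorts before the form with ѡ only because every primary weight is equal, which mirrors the way a secondary difference is consulted only after the primary level fails to distinguish two strings.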
Our discussion of Church Slavonic typography cannot be complete without considering the issue of input methods, that is, without considering how Church Slavonic characters should be input into the computer. Of course, there exists a simple solution to this problem: characters could be input directly by selecting the appropriate Unicode code points in text processing software, either using keyboard shortcuts or graphical interfaces that come standard in most operating systems, such as the Character Map. However, selecting every character individually is far too slow for entering running text, so this method of character input is not satisfactory.
A more satisfactory approach allows the user to type Church Slavonic characters on a keyboard (physical or virtual) via the use of a keyboard layout. Keyboard layouts can be customized to fit the needs of the user; however, appropriate keyboard layouts should be supplied with the operating system or be readily available for download and installation. In the present Roadmap, we present one possible keyboard layout, which we propose to include as part of the X Window System. In designing a keyboard layout let us consider three issues:
• Should a keyboard layout support all recensions of Church Slavonic or be targeted to support only one recension? That is, should a keyboard layout provide characters both for Ustav and Synodal scripts?
• Should a keyboard layout support both Church Slavonic and Russian simultaneously (that is, should we leave Russian characters like э and я in their place and simply add the additional characters)?
• Should a keyboard layout be based on the Russian QWERTY (ЙЦУКЕН) layout (known as Winkeys), or should we design our own keyboard layout, based on the “best” positioning of Church Slavonic letters? The latter approach would entail analyzing the frequencies of letters and digraphs in the Church Slavonic corpus and solving a (linear?) programming problem.
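The frequency analysis mentioned in the last question could begin with a simple count of letters and adjacent letter pairs. The Python sketch below is an illustration under stated assumptions: the function name `letter_and_digraph_counts` is hypothetical, the sample string is a placeholder rather than a real corpus, and diacritical marks, titla and punctuation are dropped by keeping only characters in the Unicode letter categories. The optimization step that would turn such counts into a layout is not shown.

```python
import unicodedata
from collections import Counter


def letter_and_digraph_counts(text):
    """Count letters and adjacent letter pairs, ignoring case and marks."""
    letters = Counter()
    digraphs = Counter()
    for word in text.lower().split():
        # Keep only characters whose general category starts with "L";
        # combining marks (category Mn) and punctuation are discarded.
        base = [ch for ch in word if unicodedata.category(ch).startswith("L")]
        letters.update(base)
        digraphs.update(a + b for a, b in zip(base, base[1:]))
    return letters, digraphs


# Placeholder text standing in for a Church Slavonic corpus.
sample = "рабъ раба рабомъ"
letters, digraphs = letter_and_digraph_counts(sample)
print(letters.most_common(3))
print(digraphs.most_common(3))
```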
Please comment on these three issues.
YURI: My answers to these questions:
1) No, we should support the Synodal recension with one keyboard, and develop appropriate ones for the other recensions.
2) No, since one can treat Synodal Church Slavonic as a regular language (that has its own grammar, spell check, collation order, etc…), we keep the two separate. This approach will be useful, if we are not often typing in Russian/Ukrainian in general. On the other hand, in Russia, having a version that supports both would be quite beneficial, but it would require juggling some of the letters around.
3) No, since the Russian QWERTY is more or less the standard for all Cyrillic keyboards. As well, the differences between the different languages are minimal, so the gain obtained from trying to optimise the layout may not be worthwhile in the long run.
NIKITA: I agree with Yuri’s answers. In addition, I would like to suggest that we make use of a number of markers (I'm not sure how we could implement this) for typesetting the various characters, by means of a ‘base character + marker’ sequence. Specifically:
- Variation Selectors (FE00-FE03)
- Superscript Selector/Marker
- Vysoko/Upper Marker
- Mjagko/Soft Marker
- Reverse Marker
- Iota Marker (for ioticised/ligated characters)
- Ligature Marker
- Number Marker
- Greek Marker
YURI: Also, what about transliteration into the Latin Alphabet?
This Appendix will provide some information about which advanced typographic features are supported by which software, probably in the form of a matrix.
This Appendix will provide some typeset examples to convince the user that we know what we’re talking about.
This Appendix will provide some information on text encoding conversion, including a HIP > Unicode compiler.
Andreev, A., Y. Shardt, and N. Simmons (2011). Proposal to encode medieval East-Slavic music notation in Unicode. Ponomar Project.
Andreev, A., Y. Shardt, and N. Simmons (2012). Proposal to encode palæoslavic musical notations in Unicode. (forthcoming).
Cleminson, R. and M. Everson (2009). Proposal to encode two cyrillic characters in the BMP of the UCS. Working Group Document: ISO/IEC JTC1/SC2/WG2 N3563R.
Gamanovitch, A. (1991). Грамматика Церковно-славянского Языка. Moscow: Khudozhestvennaya Literatura Press.
Karsky, E. F. (1979). Славянская Кирилловская Палеография. Moscow: Nauka Press.
Kostić, Z. (2009). Explanation of the proposal for standardization of the old slavonic cyrillic script and its registration in unicode. In G. Jovanović, J. Grković-Major, Z. Kostić, and V. Savić (Eds.), Standardization of the Old Church Slavonic Cyrillic Script and Its Registration in Unicode. Serbian Academy of Sciences and Arts.
Kostić, Z., V. Savić, et al. (2009). Standard of the old slavonic cyrillic script. In G. Jovanović, J. Grković-Major, Z. Kostić, and V. Savić (Eds.), Standardization of the Old Church Slavonic Cyrillic Script and Its Registration in Unicode. Serbian Academy of Sciences and Arts.
Nemirovsky, E. L. (2003). История славянского кирилловского книгопечатания XV- начала XVII века. Moscow, Russia: Nauka.
Pentzlin, K. (2011). Proposal to add variation sequences for latin and cyrillic letters. Document number: L2/10-280. Available online: http://www.pentzlin.com/Variation-Sequences-Latin-Cyrillic3.pdf.
Schepkin, V. N. (1999). Русская палеография. Moscow: Aspekt Press.
Uspensky, B. A. (1987). История русского литературного языка (XI-XVII вв.). München: Verlag Otto Sagner.