##Adobe File Version: 1.000
#=======================================================================
# FTP file name: README.TXT
#
# Contents: Background information on Unicode mapping tables
# for Mac OS text encodings
#
# Copyright: (c) 1995-1999 by Apple Computer, Inc., all rights
# reserved.
#
# Contact: charsets@apple.com
#
# Changes:
#
# b02 1999-Sep-22 Update information on Cyrillic. Update
# contact e-mail address.
# n07 1998-Feb-05 Rewrite to provide additional information
# relevant to using the accompanying mapping
# tables, and to delete some extraneous
# information. Delete Bulgarian (no special
# encoding, uses standard Cyrillic), add
# Farsi, Devanagari, Gurmukhi, Gujarati,
# Celtic, Gaelic, Inuit, Tibetan.
# n04 1995-Nov-15 Update info for Hebrew and Thai
# n03 1995-Apr-15 First version (after fixing some typos).
#
##################
0. Preliminaries
----------------
For maximum interchangeability, this file and the accompanying Mac OS
mapping tables use only ASCII characters. They are intended to be
displayed in a monospaced font.
Apple, the Apple logo, Mac, and Macintosh are trademarks of Apple
Computer, Inc., registered in the United States and other countries.
QuickDraw and TrueType are trademarks of Apple Computer, Inc. Unicode is
a trademark of Unicode Inc. PostScript is a trademark of Adobe Systems
Inc., which may be registered in certain jurisdictions. IBM is a
registered trademark of International Business Machines Corporation. ITC
Zapf Dingbats is a registered trademark of the International Typeface
Corporation. For the sake of brevity, throughout this document and the
accompanying tables, "Macintosh" can be used to refer to Macintosh
computers and "Unicode" can be used to refer to the Unicode standard.
Apple Computer, Inc. ("Apple") makes no warranty or representation,
either express or implied, with respect to this document and the
accompanying tables, their quality, accuracy, or fitness for a
particular purpose. In no event will Apple be liable for direct,
indirect, special, incidental, or consequential damages resulting from
any defect or inaccuracy in this document or the accompanying tables.
1. Introduction
---------------
This document summarizes some Unicode mapping considerations that are
relevant for the accompanying mapping tables. It also provides an
overview of Mac OS encodings.
These mapping tables and character lists are subject to change.
The latest tables should be available from the following:
<ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/APPLE/>
<ftp://dev.apple.com/devworld/Technical_Documentation/Misc._Standards/>
2. Round-trip fidelity and overview of mapping techniques
---------------------------------------------------------
For a particular set of national and international standards, Unicode
provides round-trip fidelity: Text in one of those encodings can be
mapped to Unicode and back again, yielding the original characters.
Characters which are distinct in one of these source standards have
a distinct counterpart in Unicode. Note that this counterpart might not
be a single Unicode character; as is pointed out in "The Unicode
Standard, Version 2.0" (page 2-10), "sometimes a single code value in
another standard corresponds to a sequence of code values in the Unicode
Standard, or vice versa."
However, Unicode does not attempt to provide round-trip fidelity for
most vendor standards. Nevertheless, Apple and other platform vendors
may need to provide such round-trip fidelity for their current encodings
(this can be important in file systems, for example). In order to do
this, Apple makes use of some Unicode characters in the corporate-use
zone (the upper end of the private use area).
Corporate-zone characters must be used with care. Indiscriminate use of
such characters can result in text which is not easily interchanged with
other systems, since these characters have no standard meaning outside a
particular platform. The mappings provided here are intended to minimize
the use of private use characters, or to use them in such a way that
basic text content will not be lost if the corporate zone characters are
dropped when text is transferred to another system.
The tables provided here have three goals, in the following order of
importance:
1. Provide 100% round-trip mapping from a Mac OS encoding to Unicode
and back (even if the mappings here are converted to maximal
decompositions, see below).
2. Map characters in a Mac OS encoding into the Unicode characters
that best represent the interpretation and usage of the Mac OS
characters.
3. Make the Unicode text that results from mapping Mac OS text with
these tables as interchangeable as possible.
To satisfy these goals, the mappings use a variety of techniques. First
we attempt to achieve round-trip mappings using any standard Unicode
feature at our disposal, without resorting to corporate-zone characters.
This can include the following techniques:
- Use of all Unicode characters defined in Unicode 2.1, including
compatibility characters.
- Mapping a single character in a Mac OS encoding to a sequence of
standard Unicode characters, or vice versa. This requires grouping
characters into appropriate chunks for lookup before mapping them
(this mainly applies to sequences of Unicode characters).
- Using Unicode direction overrides to force direction attributes when
mapping to Unicode. This requires resolution of Unicode character
direction, and use of this information, when mapping from Unicode back
to certain Mac OS encodings.
These techniques impose requirements on Unicode handling (grouping
characters into sequences and resolving character direction), but a full
Unicode implementation needs both for other, non-transcoding operations
anyway, so requiring them for transcoding should not impose much of a
burden.
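The grouping step mentioned above can be sketched as a longest-match
lookup when mapping from Unicode back to a Mac OS encoding. This is only
an illustration: the table UNICODE_TO_MAC, its entries, and the function
encode_longest_match are hypothetical and are not taken from the
accompanying tables.

```python
# Hypothetical reverse table: Unicode string (one or more code points)
# mapped to a single Mac OS byte. Entries are illustrative only.
UNICODE_TO_MAC = {
    "A": 0x41,
    "\u0308": 0xAC,        # combining diaeresis on its own (made up)
    "A\u0308": 0x80,       # base letter + combining mark as one chunk
}


def encode_longest_match(text, table):
    """Map text to Mac OS bytes, preferring the longest matching chunk."""
    max_len = max(len(key) for key in table)
    out = bytearray()
    i = 0
    while i < len(text):
        # Try the longest candidate chunk first, then shorter ones.
        for size in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in table:
                out.append(table[chunk])
                i += size
                break
        else:
            # No chunk starting at this position is in the table.
            raise ValueError(f"no mapping for U+{ord(text[i]):04X}")
    return bytes(out)
```

With this table, "A" followed by U+0308 is encoded as the single byte
0x80 rather than as 0x41 followed by 0xAC, because the two-character
chunk is tried before the one-character chunk.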
Next, if round-trip fidelity cannot be achieved using the above
techniques, we attempt to use corporate-zone characters only as
"transcoding hints" (more on this below). These are combined with one or
more standard Unicode characters to mark them as special for
transcoding, but have no other function and can be deleted with no loss
of basic text content (only of round-trip fidelity).
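A receiving system that does not understand the hints could drop them
along these lines. This is a rough sketch under stated assumptions: this
section does not say which code points serve as hints, so the function
below (a hypothetical name) removes every private-use character, and
U+F87F appears in the example purely as a stand-in.

```python
import unicodedata


def drop_private_use(text):
    """Remove every private-use code point (general category 'Co')."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Co")


# Assumption for illustration: U+F87F plays the role of a transcoding
# hint attached to the standard character U+2318.
marked = "\u2318\uF87F"
plain = drop_private_use(marked)
```

After the call, plain holds only the standard character, so the basic
text content survives and only the round-trip information is lost. Note
that this blanket filter would also remove a corporate-zone character
that stands alone for a Mac OS character, in which case content is lost.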
Finally, if a character in a Mac OS encoding is unrelated to any Unicode
character or sequence of Unicode characters, we may map it to a single
corporate-zone Unicode code point.
These techniques are described in more detail in the following sections.
Some clients of these tables may have a different set of goals. For
example, some clients may prefer to avoid compatibility characters,
perhaps sacrificing round-trip fidelity if necessary. In most cases it
is fairly easy to construct other types of mappings from the mappings
given here. In particular, the mappings here have been designed so that
if they are converted to maximal decomposition mappings (by recursive
application of the canonical decompositions in the Unicode database),
the resulting mappings will still provide 100% round-trip fidelity.
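One way to derive such a decomposed table is sketched below, assuming
the mapping is held as a dictionary from Mac OS byte to Unicode string.
The names SAMPLE, decompose_table, and has_distinct_targets are
hypothetical. The sketch uses Python's NFD normalization, which applies
the canonical decompositions recursively and also puts combining marks
into canonical order; Python's Unicode database is newer than the
version these tables were built against, so results may differ in
detail.

```python
import unicodedata

# Small illustrative sample, not a complete table.
SAMPLE = {0x41: "A", 0x80: "\u00C4", 0x81: "\u00C5"}


def decompose_table(mac_to_unicode):
    """Return a copy of the table with every target fully decomposed."""
    return {
        byte: unicodedata.normalize("NFD", target)
        for byte, target in mac_to_unicode.items()
    }


def has_distinct_targets(mac_to_unicode):
    """Check that no two Mac OS codes share the same Unicode target.

    This is a necessary condition for round-trip fidelity, not a
    complete proof of it.
    """
    targets = list(mac_to_unicode.values())
    return len(targets) == len(set(targets))


decomposed = decompose_table(SAMPLE)
```

Here decomposed[0x80] is "A" followed by U+0308, and the three targets
remain distinct after decomposition.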
There is one more round-trip issue that should be mentioned. If a
Unicode character or sequence can be mapped at all into a particular
Mac encoding, then the reverse mapping back to Unicode should yield
the original Unicode character or sequence (except for possible
differences in direction overrides or other Unicode characters in the
"Other, Format" category). The tables here also provide this. For a
related issue, see the next section.
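This reverse round-trip property can be spot-checked as sketched below.
The sketch uses Python's built-in "mac_roman" codec as a stand-in for a
Mac OS mapping table; that codec may not match the accompanying tables
in every detail, it maps single code points only, and the function name
is hypothetical.

```python
def survives_reverse_round_trip(text, encoding="mac_roman"):
    """Map Unicode -> Mac OS -> Unicode and compare with the input.

    Returns None when the text cannot be mapped into the encoding at
    all, since the property only applies to mappable text.
    """
    try:
        encoded = text.encode(encoding)
    except UnicodeEncodeError:
        return None
    return encoded.decode(encoding) == text


mappable = survives_reverse_round_trip("\u00C4")    # A with diaeresis
unmappable = survives_reverse_round_trip("\u4E2D")  # a CJK ideograph
```

A fuller check would also ignore differences in direction overrides and
other "Other, Format" characters, as described above; this sketch does
not attempt that.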
3. Mapping tolerance: Strict and loose
--------------------------------------
In many character sets, a single character may have