Requirements
4. External Standards
4.1 PRC National Standard GB 2312-80
GB 2312-80, the PRC National Standard Code of Chinese Graphic Character Set for
Information Interchange, Primary Set, published in March 1981, specifies a primary set of
graphic characters and their binary-coded representation for Chinese information
interchange. It applies to Chinese information-processing systems, communication
systems and similar systems. It covers 682 non-Chinese characters and 6,763 Chinese
characters, for a total of 7,445 graphic characters.
The non-Chinese characters include general characters, ordinal numbers, numerical
characters, Latin alphabet, Japanese Kana, Greek alphabet, Russian alphabet, Chinese
phonetic symbols and Chinese phonetic-annotated letters.
The Chinese characters are divided into two levels: 3,755 in Level 1 and 3,008 in
Level 2, for a total of 6,763 Chinese characters.
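The level sizes can be checked against the 94 x 94 row and cell layout of the code table. The sketch below is illustrative only and rests on assumptions this section does not state: Level 1 is taken to occupy rows 16 to 55 (with the last five cells of row 55 unused), Level 2 rows 56 to 87, and the byte form is taken to be EUC-CN, where each byte is the row or cell number plus 0xA0. The helper name `gb2312_bytes` is hypothetical, and Python's built-in `gb2312` codec stands in for the standard.

```python
# Assumed layout (not stated in this section): Level 1 in rows 16-55,
# Level 2 in rows 56-87, 94 cells per row, 5 unused cells at the end of row 55.
CELLS_PER_ROW = 94

level1 = (55 - 16 + 1) * CELLS_PER_ROW - 5
level2 = (87 - 56 + 1) * CELLS_PER_ROW
non_chinese = 682  # figure quoted in the text above
total = level1 + level2 + non_chinese


def gb2312_bytes(row: int, cell: int) -> bytes:
    """Hypothetical helper: EUC-CN bytes for a GB 2312 row/cell pair."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must be in 1..94")
    return bytes([0xA0 + row, 0xA0 + cell])


# Row 16, cell 1 is the first Level 1 Chinese character under this layout.
first_level1 = gb2312_bytes(16, 1).decode("gb2312")

print(level1, level2, level1 + level2, total)
print(gb2312_bytes(16, 1).hex(), f"U+{ord(first_level1):04X}")
```

Under these assumptions the computed counts agree with the figures quoted above: 3,755 and 3,008 Chinese characters, 7,445 graphic characters in total.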
4.2 Relationship of GB 18030-2000 with GB 2312-80 and GBK
GB 18030-2000 is a superset of GB 2312-80 and GBK. Characters defined in GB 2312-80
and GBK have exactly the same code assignments in GB 18030-2000.
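The superset relationship can be spot-checked with a few sample characters. This is a minimal sketch, assuming Python's built-in `gb2312`, `gbk` and `gb18030` codecs are acceptable stand-ins for the three standards; the helper names and the sample characters are illustrative choices, not part of any standard.

```python
def encode_or_none(text, codec):
    """Return the encoded bytes, or None if the codec lacks the character."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError:
        return None


def same_assignment(ch, older_codec):
    """True if ch is absent from the older codec or encodes identically in gb18030."""
    older = encode_or_none(ch, older_codec)
    return older is None or older == ch.encode("gb18030")


# Characters expected in GB 2312: Latin, Chinese, Greek and Russian letters.
gb2312_samples = ["A", "中", "啊", "α", "Я"]
# Two extra ideographs chosen as likely GBK-only examples (an assumption).
gbk_samples = gb2312_samples + ["镕", "堃"]

all_match = (
    all(same_assignment(ch, "gb2312") for ch in gb2312_samples)
    and all(same_assignment(ch, "gbk") for ch in gbk_samples)
)

for ch in gbk_samples:
    print(f"U+{ord(ch):04X}", ch.encode("gb18030").hex())
print("identical assignments:", all_match)
```

A spot check like this only samples the code space. It does not prove the relationship for every character.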
4.3 Relationship of GB 18030-2000 with Unicode/ISO 10646.1-1993
Unicode (identical to PRC standard GB 13000.1-1993) is the international standard for the
universal encoding of written characters and text. It defines a consistent way of
encoding multilingual text that enables the international exchange of text data and
creates the foundation for global software. The Unicode standard is a superset of all
characters in widespread use today. It contains the characters from major international
and national standards as well as prominent industry character sets.
GB 18030-2000 contains all characters defined in Unicode, but with entirely different
code assignments. It currently contains more than 27,000 Chinese characters, all of
which are defined in Unicode 3.0, the latest version. More Chinese characters can be
added to GB 18030-2000 in the future.
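Both points, full Unicode coverage and different code values, can be illustrated briefly. The sketch below assumes Python's built-in `gb18030` codec as a stand-in for the standard; the sample string is arbitrary.

```python
# Mixed-script sample, including a code point outside the Basic Multilingual Plane.
text = "GB 18030: 中文, Ελληνικά, русский, \U00010000"

encoded = text.encode("gb18030")
round_trip = encoded.decode("gb18030")

# The same character carries different code values in the two schemes.
zhong_unicode = ord("中")
zhong_gb = "中".encode("gb18030")

print("lossless round trip:", round_trip == text)
print(f"Unicode code point: U+{zhong_unicode:04X}")
print("GB 18030 bytes:", zhong_gb.hex().upper())
```

The round trip succeeds because every character in the sample has a GB 18030 code, while the byte values bear no arithmetic relation to the Unicode code points.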
5. Specification of Character Repertoire
Collected in this standard are coded one-byte, two-byte and four-byte characters.
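The three code lengths can be observed directly. This is a small sketch, again assuming Python's built-in `gb18030` codec; the function name `encoded_length` and the sample characters are illustrative.

```python
def encoded_length(ch: str) -> int:
    """Number of bytes the gb18030 codec uses for a single character."""
    return len(ch.encode("gb18030"))


samples = ["A", "中", "\u00df", "\U00020000"]
lengths = {f"U+{ord(ch):04X}": encoded_length(ch) for ch in samples}

for code_point, size in lengths.items():
    print(code_point, size)
```

With this codec, the ASCII letter takes one byte, the common Chinese character two, and the remaining two samples four bytes each.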
5.1 The One-Byte Portion
In this standard, collected in the one-byte portion are all the 128 characters from 0x00 to
0x7F set in GB 11383, and the one-byte coded Euro symbol.
5.2 The Two-Byte Portion
In this standard, collected in the two-byte portion are as follows:
All the unified Chinese characters in the CJK (Chinese, Japanese and Korean), set
in GB 13000.1,
21 Chinese characters selected from the CJK compatible zone, set in GB 13000.1,
139 ideogram characters used in Taiwan region not collected in GB 2312 but
collected in GB 13000.1,
31 other characters collected in GB 13000.1,
Non-Chinese symbols collected in GB 2312,
19 punctuation marks used in vertical alignment, set in GB 12345,
10 lowercase Roman numerals not collected in GB 2312,
5 Chinese phonetic letters with tone, not collected in GB 2312 and and ,