The PSF file-format for big charset
(C) 2001 Hu Yong <ccpaging@online.sh.cn>
This file documents the PSF file-format for big charset(called as BPSF as
below), as understood by version 0.15 and above of zhcon, and zhcon is orient
language envirment for linux console. This file is based on PSF format used in
Linux console utilites by Yann Dirson <dirson@debian.org>
This file has revision number 0.1, and is dated 2001/08/08.
Any useful additional information on BPSF files would be great.
0. Changes:
1. Summary
BPSF stands for PC Screen Font in orient language environment, and bases on
PSF file format used in lct. We need understand PSF at first, and why we
extend it?
The PSF file basically contains one character-font, whose width is 8
pixels, i.e. each scan line in a character occupies 1 byte.
It may contain characters of any height between 0 and 255, though character
heights lower than 8 or greater than 32 are not attested to exist or even be
useful [more info needed on this].
Fonts can contain either 256 or 512 characters.
The file can optionally contain a unicode mapping-table, telling, for each
character in the font, which UCS2 characters it can be used to display.
The "file mode" byte controls font size (256/512) and whether file contains
a unicode mapping table.
When we developing zhcon, we have two main problems.
The first is font width, we need to get more choice to display beautiful
and more chars on console, the width may be any between 12 to 24, even 48.
The second is chars count in font. Chinese charset is more then 8,000 chars,
PSF can only hold 512 characters.
According Yann Dirson's document, XPSF(the extend of PSF) can hold on big
charset. We may change BPSF to XPSF in the future, but now we can not wait.
You can not expect BPSF solve all problem for the console font, and it works
for big charset only.
BPSF extend "file mode" byte. BPSF is same as PSF when "file mode" between
0 to 3, and follow only one byte, char height. If "file mode" is 4 or 5, there
is six bytes followed, include char width, char height, and number of chars.
2. History
The PSF file format was designed by H. Peter Anvin <hpa@transmeta.com> in
1989 or so for his DOS screen font editor, FONTEDIT.EXE. When he became
involved with Linux, he used it for the Linux font stuff he worked with,
released a binary of FONTEDIT.EXE for free distribution, and added the Unicode
table to the spec.
3. Known programs understanding this file-format.
Only zhcon can read and/or write BPSF files.
The program in the Linux console utilities, like:
setfont (R/W)
psfaddtable (R/W)
psfstriptable (R/W)
psfgettable (R)
can read and/or write PSF files. Those program would show error message
when involved in BPSF font file.
4. Technical data
The file format is described here in sort-of EBNF notation. Upper-case
WORDS represent terminal symbols, i.e. C types; lower-case words represent
non-terminal symbols, i.e. symbols defined in terms of other symbols.
[sym] is an optional symbol
sym1 sym2 is sym1 followed by sym2
sym1 | sym2 is either sym1 or sym2
{sym} is a symbol that can be repeated 0 or more times
{sym}*N is a symbol that must be repeated N times
Comments are introduced with a # sign.
# The data (U_SHORT's) are stored in LITTLE_ENDIAN byte order.
psf_file = psf_header
raw_fontdata
[unicode_data]
psf_header = CHAR = 0x36 CHAR = 0x04 # magic number
filemode
font info
filemode = CHAR # 0 : 256 characters, no unicode_data
# 1 : 512 characters, no unicode_data
# 2 : 256 characters, with unicode_data
# 3 : 512 characters, with unicode_data
# 4 : big charset, no unicode_data
# 5 : big charset, with unicode_data
# file mode 4 or 5 is the extend of PSF
font info :
fontheight # filemode is between 0 to 3
| big_charset_info # filemode is between 4 to 5
fontheight = CHAR # measured in scan lines
big_charset_info :
charwidth
charheight
fontsize
charwidth :
u8 # char width in bits
charheight :
u8 # char height in bits
fontsize :
u32 # number of chars in font
raw_fontdata = {char_data}*<fontsize>
char_data = {row_data}*<charheight>
row_data = scan line bits expand to byte, i.e., 1 byte when charwidth is
less then 8, 2 bytes when charwidth between 9 to 16, and so on. The expand bit
is 0. This may cost more memory, but it really efficent when display.
# unicode_data is just behind raw_fontdata
unicode_data = { unicode_array psf_separator }*<fontsize>
unicode_array = { unicode } # any necessary number of times
unicode = U_SHORT # UCS2 code
psf_separator = unicode = 0xFFFF