<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=iso-8859-1">
<meta name="premium" content="msdn">
<meta name="ms.locale" content="en-us">
<meta name="description" content="">
<meta name="generator" content="Microsoft FrontPage 3.0">
<title>WAVE File Format</title>
</head>
<body bgcolor="#ffffff" link="#003399" vlink="#996699" background="../jpg/di1.JPG">
<h1>WAVE File Format</h1>
<p>The WAVE file format is a file format for storing digital audio (waveform) data. It
supports a variety of bit resolutions, sample rates, and channels of audio. This format is
very popular on IBM PC (clone) platforms and is widely used in professional programs that
process digital audio waveforms. It takes into account some peculiarities of the Intel
CPU, such as little-endian byte order.</p>
<p>This format uses Microsoft's version of the Electronic Arts Interchange File Format
method for storing data in "chunks". </p>
<h3>Data Types</h3>
<p>A C-like language will be used to describe the data structures in the file. A few extra
data types that are not part of standard C, but which will be used in this document, are:</p>
<table border="0">
<tr>
<td><b>pstring</b></td>
<td>Pascal-style string: a one-byte count followed by that many text bytes. The total
number of bytes in this data type should be even. A pad byte can be added to the end of
the text to accomplish this. This pad byte is not reflected in the count.</td>
</tr>
<tr>
<td><b>ID</b></td>
<td>A chunk ID (ie, 4 ASCII bytes).</td>
</tr>
</table>
<p>Also note that when you see an array with no size specification (e.g., char ckData[];),
this indicates a variable-sized array in our C-like language. This differs from standard C
arrays.</p>
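<p>As a concrete illustration of the data types above, here is a minimal sketch in real C. The struct and function names (ChunkHeader, pstring_total_size) are my own illustrative choices, not names from the specification; only the ckID/ckData field names and the even-padding rule come from the text above.</p>

```c
#include <stddef.h>
#include <stdint.h>

/* A generic chunk header: a 4-byte ID followed by a 32-bit size.
   The variable-sized body (char ckData[];) follows it in the file. */
typedef struct {
    char     ckID[4];   /* chunk ID (ie, 4 ASCII bytes), e.g. "fmt " */
    uint32_t ckSize;    /* size of the chunk body in bytes           */
} ChunkHeader;

/* Total bytes a pstring occupies: the count byte, the text bytes,
   and an optional pad byte so the total is even.  The pad byte is
   not reflected in the count. */
size_t pstring_total_size(unsigned char count)
{
    size_t total = 1 + (size_t)count;       /* count byte + text */
    return (total % 2 == 0) ? total : total + 1;
}
```

<p>For example, a pstring holding 4 characters of text occupies 6 bytes on disk: the count byte, 4 text bytes, and a pad byte to make the total even.</p>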
<h3>Constants</h3>
<p>Decimal values are referred to as a string of digits, for example 123, 0, 100 are all
decimal numbers. Hexadecimal values are preceded by a 0x -- e.g., 0x0A, 0x1, 0x64. </p>
<h3>Data Organization</h3>
<p>All data is stored in 8-bit bytes, arranged in Intel 80x86 (ie, little-endian) format.
The bytes of multiple-byte values are stored with the low-order (ie, least significant)
bytes first. Data bits are as follows (ie, shown with bit numbers on top): </p>
<pre>
               7  6  5  4  3  2  1  0
             +-----------------------+
      char:  | msb               lsb |
             +-----------------------+

               7  6  5  4  3  2  1  0 15 14 13 12 11 10  9  8
             +-----------------------+-----------------------+
     short:  | lsb      byte 0       | byte 1            msb |
             +-----------------------+-----------------------+

               7  6  5  4  3  2  1  0 15 14 13 12 11 10  9  8 23 22 21 20 19 18 17 16 31 30 29 28 27 26 25 24
             +-----------------------+-----------------------+-----------------------+-----------------------+
      long:  | lsb      byte 0       |        byte 1         |        byte 2         | byte 3            msb |
             +-----------------------+-----------------------+-----------------------+-----------------------+
</pre>
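<p>The byte ordering in the diagrams above can be expressed in portable C that works regardless of the host CPU's own endianness. These helper names (read_le16, read_le32) are illustrative, not part of any standard API; the logic simply assembles the value from the low-order byte up, as the diagrams show.</p>

```c
#include <stdint.h>

/* Assemble a 16-bit value from two bytes stored low-order first. */
uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Assemble a 32-bit value from four bytes stored low-order first. */
uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0]         | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

<p>For example, the bytes 0x34 0x12 read as the short 0x1234, and the bytes 0x64 0x00 0x00 0x00 read as the long 100.</p>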
<h3>File Structure</h3>
<p>A WAVE file is a collection of a number of different types of chunks. There is a
required Format ("fmt ") chunk which contains important parameters describing
the waveform, such as its sample rate. The Data chunk, which contains the actual waveform
data, is also required. All other chunks are optional. Among the other optional chunks are
ones which define cue points, list instrument parameters, store application-specific
information, etc. All of these chunks are described in detail in the following sections of
this document.</p>
<p>All applications that use WAVE must be able to read the 2 required chunks and can
choose to selectively ignore the optional chunks. A program that copies a WAVE file should
copy all of the chunks in the file, even those it chooses not to interpret.</p>
<p>There are no restrictions upon the order of the chunks within a WAVE file, with the
exception that the Format chunk must precede the Data chunk. Some inflexibly written
programs expect the Format chunk to be the first chunk (after the RIFF header), although
they shouldn't, because the specification doesn't require this.</p>
<p>Here is a graphical overview of an example, minimal WAVE file. It consists of a single
WAVE containing the 2 required chunks, a Format chunk and a Data chunk.</p>
<pre> __________________________
| RIFF WAVE Chunk          |
|   groupID  = 'RIFF'      |
|   riffType = 'WAVE'      |
|    __________________    |
|   | Format Chunk     |   |
|   |   ckID = 'fmt '  |   |
|   |__________________|   |
|    __________________    |
|   | Sound Data Chunk |   |
|   |   ckID = 'data'  |   |
|   |__________________|   |
|__________________________|
</pre>
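<p>Sizing the outer RIFF chunk for the minimal layout above can be sketched as follows. This is an illustration under two assumptions not yet stated in this document: that the PCM Format chunk body is 16 bytes, and that each chunk header is 8 bytes (4-byte ID plus 32-bit size) with odd-sized bodies padded to an even length. The function name is my own.</p>

```c
#include <stdint.h>

/* Size that goes in the outer RIFF chunk's size field for a minimal
   WAVE file: the 4-byte 'WAVE' type, plus each contained chunk's
   8-byte header and pad-adjusted body.  Assumes a 16-byte PCM
   Format chunk body. */
uint32_t riff_size_minimal(uint32_t dataSize)
{
    uint32_t fmtSize = 16;            /* PCM "fmt " chunk body       */
    uint32_t dataPad = dataSize % 2;  /* pad byte keeps chunks even  */
    return 4 + (8 + fmtSize) + (8 + dataSize + dataPad);
}
```

<p>So a minimal file holding 1000 bytes of sample data has a RIFF size of 4 + 24 + 1008 = 1036, and the whole file is 8 bytes larger than that (the RIFF header itself).</p>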
<h4>A Bastardized Standard</h4>
<p>The WAVE format is sort of a bastardized standard that was concocted by too many
"cooks" who didn't properly coordinate the addition of "ingredients"
to the "soup". Unlike the AIFF standard, which was mostly designed by a
small, coordinated group, the WAVE format has had all manner of much-too-independent,
uncoordinated aberrations inflicted upon it. The net result is that there are far too many
chunks that may be found in a WAVE file -- many of them duplicating the same information
found in other chunks (but in an unnecessarily different way) simply because there have
been too many programmers who took too many liberties with unilaterally adding their own
additions to the WAVE format without properly coming to a consensus about what everyone
else needed (and therefore it encouraged an "every man for himself" attitude
toward adding things to this "standard"). One example is the Instrument chunk
versus the Sampler chunk. Another example is the Note versus Label chunks in an Associated
Data List. I don't even want to get into the totally irresponsible proliferation of
compressed formats. (ie, it seems like everyone and his pet dachshund has come up with
some compressed version of storing WAVE data -- like we need 100 different ways to do
that.) Furthermore, there are lots of inconsistencies, for example how 8-bit data is
unsigned, but 16-bit data is signed. </p>
<p>I've attempted to document only those aspects that you're very likely to encounter in a
WAVE file. I suggest that you concentrate upon these and refuse to support the work of
programmers who feel the need to deviate from a standard with inconsistent, proprietary,
self-serving, unnecessary extensions. Please do your part to rein in half-assed programming.</p>
<h3>Sample Points and Sample Frames</h3>
<p>A large part of interpreting WAVE files revolves around the two concepts of sample
points and sample frames. </p>
<p>A sample point is a value representing a sample of a sound at a given moment in time.
For waveforms with greater than 8-bit resolution, each sample point is stored as a linear,
2's-complement value which may be from 9 to 32 bits wide (as determined by the
wBitsPerSample field in the Format chunk, assuming PCM format -- an uncompressed format).
For example, each sample point of a 16-bit waveform would be a 16-bit word (ie, two 8-bit
bytes) where 32767 (0x7FFF) is the highest value and -32768 (0x8000) is the lowest value.
For 8-bit (or less) waveforms, each sample point is a linear, unsigned byte where 255 is
the highest value and 0 is the lowest value. Obviously, this signed/unsigned sample point
discrepancy between 8-bit and larger-resolution waveforms was one of those
"oops" scenarios where some Microsoft employee decided to change the sign
sometime after 8-bit WAVE files were common but 16-bit WAVE files hadn't yet appeared.</p>
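<p>A program that handles both resolutions has to bridge this signed/unsigned discrepancy itself. Here is one minimal sketch of the conversion; the function names are illustrative, and the conversions deliberately use the simplest re-centering-and-scaling approach (8-bit silence is 128, 16-bit silence is 0) rather than anything mandated by the format.</p>

```c
#include <stdint.h>

/* 8-bit sample points are unsigned (0..255, silence = 128); 16-bit
   sample points are signed 2's-complement (-32768..32767, silence = 0).
   Convert by re-centering around zero, then scaling by 256. */
int16_t sample8_to_16(uint8_t s)
{
    return (int16_t)((s - 128) * 256);
}

/* The reverse: drop the low byte, then re-center around 128. */
uint8_t sample16_to_8(int16_t s)
{
    return (uint8_t)((s / 256) + 128);
}
```

<p>Note that an 8-bit value of 128 maps to 0, 0 maps to -32768, and 255 maps to 32512 (not 32767) -- the small asymmetry is inherent in the two ranges, and fancier converters handle it with dithering or rounding.</p>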
<p>Because most CPUs' read and write operations deal with 8-bit bytes, it was decided that
a sample point should