method for virtualization. Although Popek and Goldberg did not
rule out use of other techniques, some confusion has resulted over
the years from informally equating “virtualizability” with the abil-
ity to use trap-and-emulate.
To sidestep this confusion, we shall use the term classically
virtualizable to describe an architecture that can be virtualized
purely with trap-and-emulate. In this sense, x86 is not classically
virtualizable, but it is virtualizable by Popek and Goldberg’s
criteria, using the techniques described in Section 3.
In this section, we review the most important ideas from classi-
cal VMM implementations: de-privileging, shadow structures and
traces. Readers who are already familiar with these concepts may
wish to skip forward to Section 3.
2.1 De-privileging
In a classically virtualizable architecture, all instructions that read
or write privileged state can be made to trap when executed in an
unprivileged context. Sometimes the traps result from the instruc-
tion type itself (e.g., an out instruction), and sometimes the traps
result from the VMM protecting structures that the instructions ac-
cess (e.g., the address range of a memory-mapped I/O device).
A classical VMM executes guest operating systems directly, but
at a reduced privilege level. The VMM intercepts traps from the
de-privileged guest, and emulates the trapping instruction against
the virtual machine state. This technique has been extensively de-
scribed in the literature (e.g., [10, 22, 23]), and it is easily verified
that the resulting VMM meets the Popek and Goldberg criteria.
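The trap-and-emulate loop described above can be sketched as a toy
simulation. This is a minimal illustration, not an actual VMM: the
instruction names (add, load_ptp, out), the VirtualCPU fields and the
PRIVILEGED set are all hypothetical, and a Python exception stands in
for the hardware trap.

```python
from dataclasses import dataclass, field


class Trap(Exception):
    """Stands in for a hardware trap taken by a de-privileged guest."""

    def __init__(self, instr):
        super().__init__(instr)
        self.instr = instr


@dataclass
class VirtualCPU:
    """Virtual machine state maintained by the VMM (hypothetical layout)."""
    regs: dict = field(default_factory=lambda: {"r0": 0})
    page_table_ptr: int = 0                      # image of a privileged register
    io_log: list = field(default_factory=list)   # record of emulated I/O


# Hypothetical opcodes that read or write privileged state.
PRIVILEGED = {"load_ptp", "out"}


def execute_direct(vcpu, instr):
    """Model of direct execution at reduced privilege."""
    op, arg = instr
    if op in PRIVILEGED:
        raise Trap(instr)        # privileged instruction traps instead of running
    if op == "add":
        vcpu.regs["r0"] += arg   # unprivileged instruction runs unmodified


def emulate(vcpu, instr):
    """VMM trap handler: apply the instruction to the virtual state only."""
    op, arg = instr
    if op == "load_ptp":
        vcpu.page_table_ptr = arg
    elif op == "out":
        vcpu.io_log.append(arg)


def run_guest(vcpu, program):
    """Run the guest directly, emulating each trapping instruction."""
    traps = 0
    for instr in program:
        try:
            execute_direct(vcpu, instr)
        except Trap as trap:
            traps += 1
            emulate(vcpu, trap.instr)
    return traps
```

In this sketch only the two privileged instructions of a program such
as add, load_ptp, out, add reach the VMM; the remaining instructions
run without VMM involvement, which is the property that makes the
classical approach efficient when traps are infrequent.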
2.2 Primary and shadow structures
By definition, the privileged state of a virtual system differs from
that of the underlying hardware. The VMM’s basic function is to
provide an execution environment that meets the guest’s expecta-
tions in spite of this difference.
To accomplish this, the VMM derives shadow structures from
guest-level primary structures. On-CPU privileged state, such as
the page table pointer register or processor status register, is han-
dled trivially: the VMM maintains an image of the guest register,
and refers to that image in instruction emulation as guest operations
trap.
However, off-CPU privileged data, such as page tables, may re-
side in memory. In this case, guest accesses to the privileged state
may not naturally coincide with trapping instructions. For exam-
ple, guest page table entries (PTEs) are privileged state due to their
encoding of mappings and permissions. Dependencies on this priv-
ileged state are not accompanied by traps: every guest virtual mem-
ory reference depends on the permissions and mappings encoded in
the corresponding PTE.
Such in-memory privileged state can be modified by any store
in the guest instruction stream, or even implicitly modified as a
side effect of a DMA I/O operation. Memory-mapped I/O devices
present a similar difficulty: reads and writes to this privileged data
can originate from almost any memory operation in the guest in-
struction stream.
2.3 Memory traces
To maintain coherency of shadow structures, VMMs typically
use hardware page protection mechanisms to trap accesses to in-
memory primary structures. For example, guest PTEs for which
shadow PTEs have been constructed may be write-protected.
Memory-mapped devices must generally be protected for both
reading and writing. This page-protection technique is known as
tracing. Classical VMMs handle a trace fault similarly to a privi-
leged instruction fault: by decoding the faulting guest instruction,
emulating its effect in the primary structure, and propagating the
change to the shadow structure.
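A write trace and its fault handler can be modeled in a few lines.
The sketch below is illustrative only: the class and method names, the
4096-byte page size and the HOST_OFFSET derivation rule are
assumptions made for the example, and memory is a dictionary rather
than real page-protected storage.

```python
PAGE_SIZE = 4096       # assumed page size for the example
HOST_OFFSET = 0x100    # hypothetical rule for deriving a shadow value


class TraceFault(Exception):
    """Stands in for the protection fault raised by a traced page."""

    def __init__(self, addr, value):
        super().__init__(addr)
        self.addr = addr
        self.value = value


class TracedMemory:
    def __init__(self):
        self.primary = {}             # guest-level primary structure: addr -> value
        self.shadow = {}              # VMM-derived shadow of traced locations
        self.write_protected = set()  # page numbers that carry a write trace
        self.trace_faults = 0

    @staticmethod
    def derive(primary_value):
        """Placeholder for the real primary-to-shadow derivation."""
        return primary_value + HOST_OFFSET

    def install_write_trace(self, addr):
        """Build the shadow entry and write-protect the page holding addr."""
        self.shadow[addr] = self.derive(self.primary.get(addr, 0))
        self.write_protected.add(addr // PAGE_SIZE)

    def hardware_store(self, addr, value):
        """Model of a guest store as the hardware would perform it."""
        if addr // PAGE_SIZE in self.write_protected:
            raise TraceFault(addr, value)
        self.primary[addr] = value

    def guest_store(self, addr, value):
        """Run a guest store, handling a trace fault if one occurs."""
        try:
            self.hardware_store(addr, value)
        except TraceFault as fault:
            self.trace_faults += 1
            # Emulate the store's effect in the primary structure ...
            self.primary[fault.addr] = fault.value
            # ... then propagate the change to the shadow structure.
            if fault.addr in self.shadow:
                self.shadow[fault.addr] = self.derive(fault.value)
```

Note that the trace is page-granular: in this model a store to any
address on a protected page faults, even if that address has no shadow
entry, which is one way trace faults become a source of overhead.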
2.4 Tracing example: x86 page tables
To protect the host from guest memory accesses, VMMs typically
construct shadow page tables in which to run the guest. x86 speci-
fies hierarchical hardware-walked page tables having 2, 3 or 4 lev-
els. The hardware page table pointer is control register %cr3.
VMware Workstation’s VMM manages its shadow page tables
as a cache of the guest page tables. As the guest accesses previously
untouched regions of its virtual address space, hardware page faults
vector control to the VMM. The VMM distinguishes true page
faults, caused by violations of the protection policy encoded in
the guest PTEs, from hidden page faults, caused by misses in the
shadow page table. True faults are forwarded to the guest; hidden
faults cause the VMM to construct an appropriate shadow PTE,
and resume guest execution. The fault is “hidden” because it has
no guest-visible effect.
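The distinction between true and hidden page faults can be illustrated
with a simplified model. This is a sketch under stated assumptions,
not VMware Workstation's implementation: page tables are flat
dictionaries rather than hierarchical structures, a PTE is reduced to
a (page number, writable) pair, and the names ShadowPager,
GuestPageFault and gpn_to_hpn are hypothetical.

```python
class GuestPageFault(Exception):
    """A true page fault, which the VMM forwards to the guest."""


class ShadowPager:
    def __init__(self, guest_ptes, gpn_to_hpn):
        self.guest_ptes = guest_ptes   # vpn -> (guest page number, writable)
        self.gpn_to_hpn = gpn_to_hpn   # VMM map: guest page -> host page
        self.shadow_ptes = {}          # vpn -> (host page number, writable)
        self.hidden_faults = 0
        self.true_faults = 0

    def access(self, vpn, write=False):
        """Model a guest memory access; return the host page it reaches."""
        while True:
            spte = self.shadow_ptes.get(vpn)
            if spte is not None and (spte[1] or not write):
                return spte[0]             # shadow hit: no VMM involvement
            self.handle_fault(vpn, write)  # hardware fault vectors to the VMM

    def handle_fault(self, vpn, write):
        gpte = self.guest_ptes.get(vpn)
        # True fault: the guest's own PTE forbids the access.
        if gpte is None or (write and not gpte[1]):
            self.true_faults += 1
            raise GuestPageFault(vpn)
        # Hidden fault: the guest PTE allows it; the shadow was just missing.
        self.hidden_faults += 1
        gpn, writable = gpte
        self.shadow_ptes[vpn] = (self.gpn_to_hpn[gpn], writable)
```

After a hidden fault fills the shadow entry, the access is retried and
succeeds without the guest observing anything. Subsequent accesses to
the same page hit the shadow entry directly, which is the sense in
which the shadow page table behaves as a cache.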
The VMM uses traces to prevent its shadow PTEs from becom-
ing incoherent with the guest PTEs. The resulting trace faults can
themselves be a source of overhead, and other coherency mecha-
nisms are possible. At the other extreme, avoiding all use of traces
causes either a large number of hidden faults or an expensive con-
text switch to prevalidate shadow page tables for the new context.
In our experience, striking a favorable balance in this three-way
trade-off among trace costs, hidden page faults and context switch
costs is both surprisingly difficult and critical to VMM
performance. Tools that make this trade-off more forgiving are rare
and precious.
2.5 Refinements to classical virtualization
The type of workload significantly impacts the performance of the
classical virtualization approach [20]. During the first virtual ma-
chine boom, it was common for the VMM, the hardware, and all
guest operating systems to be produced by a single company. These
vertically integrated companies enabled researchers and practi-
tioners to refine classical virtualization using two orthogonal ap-
proaches.
One approach exploited flexibility in the VMM/guest OS in-
terface. Implementors taking this approach modified guest operat-
ing systems to provide higher-level information to the VMM [13].
This approach relaxes Popek and Goldberg’s fidelity requirement
to provide gains in performance, and optionally to provide features
beyond the bare baseline definition of virtualization, such as con-
trolled VM-to-VM communication.
The other approach for refining classical VMMs exploited flexibility
in the hardware/VMM interface. IBM’s System/370 architecture
introduced interpretive execution [17], a hardware execution mode
for running guest operating systems. The VMM encodes much of the
guest privileged state in a hardware-defined format, then executes
the SIE instruction to “start interpretive execution.” Many guest
operations that would trap in a de-privileged environment instead
directly access shadow fields in interpretive execution. While the
VMM must still handle some traps, SIE was successful in reducing
the frequency of traps relative to an unassisted trap-and-emulate
VMM.
Both of these approaches have intellectual heirs in the present
virtualization boom. The attempt to exploit flexibility in the
VMM/guest OS interface has been revived under the name
paravirtualization [25].
Meanwhile, x86 vendors are introducing hardware facilities in-
spired by interpretive execution; see Section 4.
3. Software virtualization
We review basic obstacles to classical virtualization of the x86
architecture, explain how binary translation (BT) overcomes the
obstacles, and show that adaptive BT improves efficiency.