Understanding The
Linux Virtual Memory Manager
Mel Gorman
July 9, 2007
Preface
Linux is developed with a stronger practical emphasis than a theoretical one. When
new algorithms or changes to existing implementations are suggested, it is common
to request code to match the argument. Many of the algorithms used in the Virtual
Memory (VM) system were designed by theorists but the implementations have now
diverged from the theory considerably. In part, Linux does follow the traditional
development cycle of design to implementation but it is more common for changes
to be made in reaction to how the system behaved in the real world and to intuitive
decisions by developers.
This means that the VM performs well in practice but there is very little
VM-specific documentation available apart from a few incomplete overviews on a small
number of websites, except the web site containing an earlier draft of this book
of course! This has led to the situation where the VM is fully understood only
by a small number of core developers. New developers looking for information on
how it functions are generally told to read the source and little or no information is
available on the theoretical basis for the implementation. This requires that even a
casual observer invest a large amount of time to read the code and study the field
of Memory Management.
This book gives a detailed tour of the Linux VM as implemented in 2.4.22
and gives a solid introduction of what to expect in 2.6. As well as discussing the
implementation, the theory it is based on will also be introduced. This is not
intended to be a memory management theory book but it is often much simpler to
understand why the VM is implemented in a particular fashion if the underlying
basis is known in advance.
To complement the description, the appendix includes a detailed code commentary
on a significant percentage of the VM. This should drastically reduce the amount
of time a developer or researcher needs to invest in understanding what is happening
inside the Linux VM. VM implementations tend to follow similar code patterns
even between major versions, which means that with a solid understanding of the 2.4
VM, the later 2.5 development VMs and the final 2.6 release will be decipherable in
a number of weeks.
The Intended Audience
Anyone interested in how the VM, a core kernel subsystem, works will find answers
to many of their questions in this book. The VM, more than any other subsystem,
affects the overall performance of the operating system. It is also one of the most
poorly understood and badly documented subsystems in Linux, partially because
there is, quite literally, so much of it. It is very difficult to isolate and understand
individual parts of the code without first having a strong conceptual model of the
whole VM, so this book intends to give a detailed description of what to expect
before going to the source.
This material should be of prime interest to new developers interested in adapting
the VM to their needs and to readers who simply would like to know how the VM
works. It will also benefit other subsystem developers who want to get the most from
the VM when they interact with it and operating systems researchers looking for
details on how memory management is implemented in a modern operating system.
Others, who are just curious to learn more about a subsystem that is the focus of
so much discussion, will find an easy to read description of the VM functionality
that covers all the details without the need to plough through source code.
However, it is assumed that the reader has read at least one general operating
system book or one general Linux kernel oriented book and has a general knowledge
of C before tackling this book. While every effort is made to make the material
approachable, some prior knowledge of general operating systems is assumed.
Book Overview
In Chapter 1, we go into detail on how the source code may be managed and
deciphered. Three tools will be introduced that are used for the analysis, easy browsing
and management of code. The main tools are the Linux Cross Referencing (LXR)
tool, which allows source code to be browsed as a web page, and CodeViz, for
generating call graphs, which was developed while researching this book. The last tool,
PatchSet, is for managing kernels and the application of patches. Applying patches
manually can be time consuming and the use of version control software such as
CVS (http://www.cvshome.org/) or BitKeeper (http://www.bitmover.com) is not
always an option. With this tool, a simple specification file determines what source
to use, what patches to apply and what kernel configuration to use.
In the subsequent chapters, each part of the Linux VM implementation will be
discussed in detail, such as how memory is described in an architecture independent
manner, how processes manage their memory, how the specific allocators work and
so on. Each will refer to the papers that most closely describe the behaviour of Linux
as well as covering in depth the implementation, the functions used and their call
graphs so the reader will have a clear view of how the code is structured. At the
end of each chapter, there will be a What's New section which introduces what to
expect in the 2.6 VM.
The appendices are a code commentary on a significant percentage of the VM. They
give a line by line description of some of the more complex aspects of the VM. The
style of the VM tends to be reasonably consistent, even between major releases of
the kernel so an in-depth understanding of the 2.4 VM will be an invaluable aid to
understanding the 2.6 kernel when it is released.
What's New in 2.6
At the time of writing, 2.6.0-test4 has just been released so 2.6.0-final is due
any month now which means December 2003 or early 2004. Fortunately the 2.6
VM, in most ways, is still quite recognisable in comparison to 2.4. However, there
are some new material and concepts in 2.6 and it would be a pity to ignore them,
hence the What's New in 2.6 sections. To some extent, these
sections presume you have read the rest of the book so only glance at them during
the first reading. If you decide to start reading 2.5 and 2.6 VM code, the basic
description of what to expect from the What's New sections should greatly aid
your understanding. It is important to note that the sections are based on the
2.6.0-test4 kernel which should not change significantly before 2.6. As
they are still subject to change though, you should still treat the What's New
sections as guidelines rather than definite facts.
Companion CD
A companion CD is included with this book which is intended to be used on systems
with GNU/Linux installed. Mount the CD on /cdrom as follows:
root@joshua:/$ mount /dev/cdrom /cdrom -o exec
A copy of Apache 1.3.27 (http://www.apache.org/) has been built and configured
to run but it requires the CD be mounted on /cdrom/. To start it, run the
script /cdrom/start_server. If there are no errors, the output should look like:
mel@joshua:~$ /cdrom/start_server
Starting CodeViz Server: done
Starting Apache Server: done
The URL to access is http://localhost:10080/
If the server starts successfully, point your browser to http://localhost:10080 to
avail of the CD's web services. Some features included with the CD are:
• A web server is available which is started by /cdrom/start_server.
  After starting it, the URL to access is http://localhost:10080. It has been
  tested with Red Hat 7.3 and Debian Woody;
• The whole book is included in HTML, PDF and plain text formats from
  /cdrom/docs. It includes a searchable index for functions that have a commentary
  available. If a function is searched for that does not have a commentary,
  the browser will be automatically redirected to LXR;
• A web browsable copy of the Linux 2.4.22 source is available courtesy of LXR;
• Call graphs can be generated with an online version of the CodeViz tool;
• The VM Regress, CodeViz and patchset packages which are discussed in
  Chapter 1 are available in /cdrom/software. gcc-3.0.4 is also provided as it
  is required for building CodeViz.
To shut down the server, run the script /cdrom/stop_server and the CD may
then be unmounted.
Typographic Conventions
The conventions used in this document are simple. New concepts that are introduced
as well as URLs are in italicised font. Binaries and package names are in bold.
Structures, field names, compile time defines and variables are in a constant-width
font. At times when talking about a field in a structure, both the structure and field
name will be included like page→list for example. Filenames are in a constant-width
font but include files have angle brackets around them like <linux/mm.h>
and may be found in the include/ directory of the kernel source.
Acknowledgments
The compilation of this book was not a trivial task. This book was researched and
developed in the open and it would be remiss of me not to mention some of the
people who helped me at various intervals. If there is anyone I missed, I apologise
now.
First, I would like to thank John O'Gorman who tragically passed away while
the material for this book was being researched. It was his experience and guidance
that largely inspired the format and quality of this book.
Secondly, I would like to thank Mark L. Taub from Prentice Hall PTR for giving
me the opportunity to publish this book. It has been a rewarding experience and it
made trawling through all the code worthwhile. Massive thanks go to my reviewers
who provided clear and detailed feedback long after I thought I had finished writing.
Finally, on the publisher's front, I would like to thank Bruce Perens for allowing me to
publish under the Bruce Perens' Open Book Series (http://www.perens.com/Books).
With the technical research, a number of people provided invaluable insight.
Abhishek Nayani was a source of encouragement and enthusiasm early in the
research. Ingo Oeser kindly provided invaluable assistance early on with a detailed
explanation of how data is copied from userspace to kernel space including some
valuable historical context. He also kindly offered to help me if I felt I ever got
lost in the twisty maze of kernel code. Scott Kaplan made numerous corrections to
a number of systems from non-contiguous memory allocation to page replacement
policy. Jonathan Corbet provided the most detailed account of the history of the
kernel development with the kernel page he writes for Linux Weekly News. Zack
Brown, the chief behind Kernel Traffic, is the sole reason I did not drown in kernel