Preface
This book provides a working guide to the Open Source Computer Vision Library (OpenCV), along with
enough general background in computer vision to use OpenCV effectively.
Purpose
Computer vision is a rapidly growing field, partly as a result of cheaper and more capable cameras,
partly because of affordable processing power, and partly because vision algorithms are starting to mature.
OpenCV itself has played a role in the growth of computer vision by enabling thousands of people to do
more productive work in vision. With its focus on real-time vision, OpenCV helps students and
professionals efficiently implement projects and jump-start research by providing them with a computer
vision and machine learning infrastructure that was previously available only in a few mature research labs.
The purpose of this text is to:
• Better document OpenCV—detail what function calling conventions really mean and how to use them
correctly.
• Rapidly give the reader an intuitive understanding of how the vision algorithms work.
• Give the reader some sense of what algorithm to use and when to use it.
• Give the reader a boost in implementing computer vision and machine learning algorithms by
providing many working coded examples to start from.
• Provide intuitions about how to fix some of the more advanced routines when something goes wrong.
Simply put, this is the text we wished we had in school and the coding reference book we wished
we had at work.
This book documents a tool kit, OpenCV, that allows the reader to do interesting and fun things rapidly in
computer vision. It gives an intuitive understanding as to how the algorithms work, which serves to guide
the reader in designing and debugging vision applications and also to make the formal descriptions of
computer vision and machine learning algorithms in other texts easier to comprehend and remember.
After all, it is easier to understand complex algorithms and their associated math when you start with an
intuitive grasp of how those algorithms work.
Who This Book Is For
This book contains descriptions, working coded examples, and explanations of the computer vision tools
contained in the OpenCV library. As such, it should be helpful to many different kinds of users.
Professionals
For those practicing professionals who need to rapidly implement computer vision systems, the sample
code provides a quick framework with which to start. Our descriptions of the intuitions behind the
algorithms can quickly teach or remind the reader how they work.
Students
As we said, this is the text we wish we had back in school. The intuitive explanations, detailed
documentation, and sample code will allow you to boot up faster in computer vision, work on more
interesting class projects, and ultimately contribute new research to the field.
Teachers
Computer vision is a fast-moving field. We’ve found it effective to have the students rapidly cover an
accessible text while the instructor fills in formal exposition where needed and supplements with
current papers or guest lectures from experts. The students can meanwhile start class projects earlier
and attempt more ambitious tasks.
Hobbyists
Computer vision is fun; here’s how to hack it.
We have a strong focus on giving readers enough intuition, documentation, and working code to enable
rapid implementation of real-time vision applications.
What This Book Is Not
This book is not a formal text. We do go into mathematical detail at various points,¹ but it is all in the
service of developing deeper intuitions behind the algorithms or clarifying the implications of any
assumptions built into those algorithms. We have not attempted a formal mathematical exposition here and
might even incur some wrath along the way from those who do write formal expositions.
This book is not for theoreticians because it has more of an “applied” nature. The book will certainly be of
general help, but is not aimed at any of the specialized niches in computer vision (e.g., medical imaging or
remote sensing analysis).
That said, it is the belief of the authors that having read the explanations here first, a student will not only
learn the theory better but remember it longer. Therefore, this book would make a good adjunct text to a
theoretical course and would be a great text for an introductory or project-centric course.
About the Programs in This Book
All the program examples in this book are based on OpenCV version 2.5. The code should definitely work
under Linux or Windows and probably under OS X, too. Source code for the examples in the book can be
fetched from this book’s website (http://www.oreilly.com/catalog/9780596516130). OpenCV can be downloaded
from its SourceForge site (http://sourceforge.net/projects/opencvlibrary).
OpenCV is under ongoing development, with official releases occurring once or twice a year. To keep up to
date with the developments of the library, and for pointers to where to get the very latest updates and
versions, you can visit OpenCV.org, the library’s official website.
Prerequisites
For the most part, readers need only know how to program in C and perhaps some C++. Many of the math
sections are optional and are labeled as such. The mathematics involves simple algebra and basic matrix
algebra, and it assumes some familiarity with solution methods to least-squares optimization problems as
well as some basic knowledge of Gaussian distributions, Bayes’ law, and derivatives of simple functions.
The math is in support of developing intuition for the algorithms. The reader may skip the math and the
algorithm descriptions, using only the function definitions and code examples to get vision applications up
and running.
¹ Always with a warning to more casual users that they may skip such sections.
How This Book Is Best Used
This text need not be read in order. It can serve as a kind of user manual: look up the function when you
need it; read the function’s description if you want the gist of how it works “under the hood.” The intent of
this book is more tutorial, however. It gives you a basic understanding of computer vision along with
details of how and when to use selected algorithms.
This book was written to allow its use as an adjunct or as a primary textbook for an undergraduate or
graduate course in computer vision. The basic strategy with this method is for students to read the book for
a rapid overview and then supplement that reading with more formal sections in other textbooks and with
papers in the field. There are exercises at the end of each chapter to help test the student’s knowledge and
to develop further intuitions.
You could approach this text in any of the following ways.
Grab Bag
Go through Chapter 1–Chapter 3 in the first sitting, then just hit the appropriate chapters or sections as
you need them. This book does not have to be read in sequence, except for Chapter 11 and Chapter 12
(Calibration and Stereo).
Good Progress
Read just two chapters a week until you’ve covered Chapter 1–Chapter 12 in six weeks (Chapter 13 is
a special case, as discussed shortly). Start on projects and dive into details on selected areas in the
field, using additional texts and papers as appropriate.
The Sprint
Just cruise through the book as fast as your comprehension allows, covering Chapter 1–Chapter 12.
Then get started on projects and go into details on selected areas in the field using additional texts and
papers. This is probably the choice for professionals, but it might also suit a more advanced computer
vision course.
Chapter 13 is a long chapter that gives a general background to machine learning in addition to details
behind the machine learning algorithms implemented in OpenCV and how to use them. Of course, machine
learning is integral to object recognition and a big part of computer vision, but it’s a field worthy of its own
book. Professionals should find this text a suitable launching point for further explorations of the
literature—or for just getting down to business with the code in that part of the library. This chapter should
probably be considered optional for a typical computer vision class.
This is how the authors like to teach computer vision: Sprint through the course content at a level where the
students get the gist of how things work; then get students started on meaningful class projects while the
instructor supplies depth and formal rigor in selected areas by drawing from other texts or papers in the
field. This same method works for quarter, semester, or two-term classes. Students can get quickly up and
running with a general understanding of their vision task and working code to match. As they begin more
challenging and time-consuming projects, the instructor helps them develop and debug complex systems.
For longer courses, the projects themselves can become instructional in terms of project management.
Build up working systems first; refine them with more knowledge, detail, and research later. The goal in
such courses is for each project to be worthy of a conference publication, with a few project papers
actually being published after further (postcourse) work.
Conventions Used in This Book
The following typographical conventions are used in this book:
Italic
Indicates new terms, URLs, email addresses, filenames, file extensions, path names, directories, and
Unix utilities.
Constant width
Indicates commands, options, switches, variables, attributes, keys, functions, types, classes,
namespaces, methods, modules, properties, parameters, values, objects, events, event handlers,
XML tags, HTML tags, the contents of files, or the output from commands.
Constant width bold
Shows commands or other text that could be typed literally by the user. Also used for emphasis in code
samples.
Constant width italic
Shows text that should be replaced with user-supplied values.
[…]
Indicates a reference to the bibliography. The standard bibliographic form we adopt in this book is the
use of the last name of the first author of a paper, followed by a two-digit representation of the year of
publication. Thus the paper “Self-supervised monocular road detection in desert terrain,” authored by
H. Dahlkamp, A. Kaehler, D. Stavens, S. Thrun, and G. Bradski in 2006, would be cited as
[Dahlkamp06].
This icon signifies a tip, suggestion, or general note.
This icon indicates a warning or caution.
Using Code Examples
OpenCV is free for commercial or research use, and we have the same policy on the code examples in the
book. Use them at will for homework, for research, or for commercial products. We would very much
appreciate your referencing this book when you do, but it is not required. Other than how it helped with your
homework projects (which is best kept a secret), we would like to hear how you are using computer vision
for academic research, teaching courses, and in commercial products when you do use OpenCV to help
you. Again, not required, but you are always invited to drop us a line.
Safari® Books Online
When you see a Safari® Books Online icon on the cover of your favorite technology book, that means the
book is available online through the O’Reilly Network Safari Bookshelf.
Safari offers a solution that’s better than e-books. It’s a virtual library that lets you easily search thousands of
top tech books, cut and paste code samples, download chapters, and find quick answers when you need the
most accurate, current information. Try it for free at http://safari.oreilly.com.
We’d Like to Hear from You
Please address comments and questions concerning this book to the publisher: