870
IEEE
TRANFACTIONS ON SOFTWARF FNGINFFRING.
VOL.
16
NO
X.
AUGLlSr
IYYO
Automated Software Test
Data
Generation
BOGDAN KOREL,
MEMBER, lEEE
Abstract-Test data generation
in
program testing is the process of
identifying a set of test data which satisfies given testing criterion. Most
of the existing test data generators
161, IS], 1101, 1161, [30]
use symbolic
evaluation to derive test data. However, in practical programs this
technique frequently requires complex algebraic manipulations, espe-
cially in the presence of arrays. In this paper we present an alternative
approach of test data generation which is based on actual execution of
the program under test, function minimization methods, and dynamic
data flow analysis. Test data are developed for the program using ac-
tual values of input variables. When the program is executed, the pro-
gram execution
flow
is monitored.
If
during program execution an un-
desirable execution flow is observed (e.g., the “actual” path does not
correspond to the selected control path) then function minimization
search algorithms are used to automatically locate the values of input
variables for which the selected path is traversed. In addition, dynamic
data flow analysis is used
to
determine those input variables responsi-
ble for the undesirable program behavior, leading to significant speed-
up of the search process. The approach of generating test data is then
extended to programs with dynamic data structures, and
a
search
method based on dynamic data flow analysis and backtracking is pre-
sented. In the approach described in this paper, values
of
array in-
dexes and pointers are known at each step of program execution, and
this approach exploits this information to overcome difficulties of array
and pointer handling; as a result, the effectiveness of test data gener-
ation can be significantly improved.
Zndex Terms-Automated test generation, dynamic data flow analy-
sis, function minimization, software testing, symbolic evaluation.
I. INTRODUCTION
OFTWARE testing is very labor-intensive and expen-
S
sive; it accounts for approximately
50%
of the cost of
a software system development
[
11, [28]. If the testing
process could be automated, the cost of developing soft-
ware should be reduced significantly. Of the problems
in-
volved in testing software, one is of particular relevance
here: the problem of developing test data. Test data gen-
eration in software testing is the process of identifying
program input data which satisfy selected testing cri-
terion. A test data generator is a tool which assists a pro-
grammer in the generation of test data for a program.
There are three types of test data generators: pathwise test
data generators [6], [8], [lo],
[
161, [30], data specifica-
tion generators [3], [19],
[24],
[25],
and random test data
generators [7]. This paper focuses on pathwise test data
generators which are tools that accept as input a computer
program and a testing criterion (e.g., total path coverage,
statement coverage, branch coverage, etc.) and then au-
Manuscript received December 19, 1988; revised March
24,
1990. Rec-
ommended by
W.
Howden.
The author
is
with the Department
of
Computer Science, Wayne State
University, Detroit,
MI
48202.
IEEE
Log Number 9036267.
tomatically generate test data that meet the selected cri-
terion. The basic operation of the pathwise generator con-
sists of the following steps: program control flow graph
construction, path selection, and test data generation. The
path selector automatically identifies a set of paths (e.g.,
near-minimal set of paths) to satisfy selected testing cri-
terion. Once a set of test paths is determined, then for
every path in this set the test generator derives input data
that results in the execution of the selected path.
Most of the pathwise test data generators [6], [8],
[
101,
[16],
[30]
use symbolic evaluation to derive input data.
Symbolic evaluation involves executing a program using
symbolic values of variables instead of actual values.
Once a path is selected, symbolic evaluation is used to
generate a path constraint, which consists of a set of
equalities and inequalities on the program’s input vari-
ables; this path constraint must be satisfied for the path to
be traversed.
A
number of algorithms have been used for
the inequality solution. As pointed out in [18], [27], sym-
bolic evaluation is a promising approach; however, there
are still several problems which require additional re-
search, e.g., the problem of array element determination.
This problem occurs when the index
of
an array depends
on input values;
in
this case, the array element that is
being referenced or defined is unknown. This problem oc-
curs frequently during symbolic evaluation. Inefficient
so-
lutions exist, for
in
the worst case all possible index val-
ues can be enumerated. Though there has been some work
on this problem [30] and a related problem for record
structures [27], the results are still unsatisfactory.
In this paper we present an alternative approach of test
data generation, referred to as a dynamic approach of test
data generation, which is based on actual execution of a
program under test, dynamic data flow analysis, and func-
tion minimization methods. Test data are developed using
actual values of input variables. When the program is ex-
ecuted on some input data, the program execution flow is
monitored. If, during program execution, an undesirable
execution flow at some branch
is
observed then a real-
valued function is associated
with
this branch. This func-
tion is positive when a branch predicate is false and neg-
ative when the branch predicate is true. Function min-
imization search algorithms are used to automatically
locate values of input variables for which the function be-
comes negative. In addition, dynamic data flow analysis
is used to determine input variables which are responsible
for the undesirable program behavior, leading to signifi-
cant speed-up
of
the search process. In this approach, ar-
rays and dynamic data structures can be handled precisely
0098-5589/90/08OO-0870$0
1
.OO
O
1990 IEEE