ABSTRACT
Software metric models predict target software metrics, such as development
effort or defect rate, for a future software project from predictor metrics of
that project, such as team size. Constructing such a model requires data samples
from similar past projects. However, these data samples are often incomplete.
The decision on whether a particular predictor metric should be included is
usually based on intuition or experience-based assumptions. Unfortunately, such
assumptions usually cannot be verified after the model is constructed, which
leads to redundant predictor metrics and/or unnecessary complexity in predictor
metric selection. Moreover, the predictor metrics may include both continuous and
discrete variables. This thesis mainly considers how to simplify a software
metrics model built from incomplete data. The thesis is organized as follows:
Chapter 1 introduces the background of this study and outlines the three main
problems encountered and their solutions (discussed in detail in Chapters 2, 3
and 4).
Chapter 2 discusses statistical methods for dealing with missing data. In this
chapter, we describe current research progress on missing data processing and
present some methods related to this study. Finally, we discuss the k-NN method
and the Monte Carlo simulation method.
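As a rough illustration of the idea behind k-NN imputation, the sketch below fills a missing value with the mean of that column over the k most similar complete rows. It is a minimal sketch under stated assumptions, not the procedure used in this thesis: the function name knn_impute, the sample data, the Euclidean distance over observed columns, and the use of fully complete rows as donors are all hypothetical choices made for illustration.

```python
import math

def knn_impute(rows, k=2):
    """Fill None entries using the mean of the k nearest donor rows.

    Assumptions (illustrative only): distance is Euclidean over the columns
    observed in the incomplete row, donors are rows with no missing values,
    and at least one complete row exists.
    """
    complete = [r for r in rows if None not in r]
    result = []
    for row in rows:
        if None not in row:
            result.append(list(row))
            continue
        observed = [j for j, v in enumerate(row) if v is not None]
        # Rank complete rows by distance on the observed columns only.
        donors = sorted(
            complete,
            key=lambda d: math.sqrt(sum((row[j] - d[j]) ** 2 for j in observed)),
        )[:k]
        filled = [
            v if v is not None else sum(d[j] for d in donors) / len(donors)
            for j, v in enumerate(row)
        ]
        result.append(filled)
    return result

# Hypothetical project data: team size, duration (months), effort (person-months)
projects = [
    [5, 10, 50.0],
    [6, 12, 70.0],
    [20, 30, 600.0],
    [5, 11, None],   # effort is missing
]
imputed = knn_impute(projects, k=2)
print(imputed[3])  # [5, 11, 60.0]
```

In practice the predictor columns would normally be rescaled before computing distances, so that a metric with a large numeric range does not dominate the neighbor search.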
Chapter 3 focuses on the processing of discrete variables. A relatively simple
approach using so-called virtual variables is discussed.
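Assuming that "virtual variables" refers to what is commonly called dummy (indicator) variables, the encoding can be sketched as follows. The function name dummy_encode, the choice of baseline level, and the sample data are hypothetical and serve only as an illustration.

```python
def dummy_encode(values, baseline=None):
    """Encode a categorical column as 0/1 indicator columns.

    One level (the baseline) is dropped so that the indicators are not
    collinear with the intercept of a regression model.
    """
    levels = sorted(set(values))
    if baseline is None:
        baseline = levels[0]  # assumption: first level in sorted order
    kept = [lv for lv in levels if lv != baseline]
    encoded = [[1 if v == lv else 0 for lv in kept] for v in values]
    return kept, encoded

# Hypothetical discrete predictor: primary programming language of each project
languages = ["Java", "C", "Java", "Python"]
columns, rows = dummy_encode(languages)
print(columns)  # ['Java', 'Python']  ("C" is the baseline)
print(rows)     # [[1, 0], [0, 0], [1, 0], [0, 1]]
```

A discrete variable with m levels thus becomes m - 1 numeric columns that can enter a regression model alongside the continuous predictors.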
Chapter 4 presents a method for variable selection in detail. In this chapter
we introduce three classical methods often used for variable selection. After
comparing them, we choose the stepwise regression method for this thesis.
Chapter 5 presents a case study. Using R, SPSS, and Java, we apply the methods
from Chapters 2, 3 and 4 to real data and successfully construct a simplified
software metrics model.
Key words: software metrics, variable selection, missing data, stepwise regression,
virtual variable method