Abstract:
To address the dependencies between software system modules, a hybrid
defect prediction model based on network representation learning was
constructed by analyzing the network structure of the software system.
First, the software system was converted into a software network, taking
the module as the basic unit. Then, a network representation technique was
used to learn, in an unsupervised manner, the system structural features of
each module in the software network. Finally, the system structural features
were combined with the semantic features learned by a convolutional neural
network to construct the hybrid defect prediction model. The experimental
results show that the hybrid model predicts defects better on three
open-source Apache projects, poi, lucene and synapse: its F1 score is 3.8%,
1.0% and 4.1% higher, respectively, than that of the best model based on
a Convolutional Neural Network (CNN). Analysis of software network
structural features thus offers an effective approach to constructing
defect prediction models.
Key words:
software network; defect prediction; Convolutional Neural Network (CNN);
semantic feature; network representation learning
0 Introduction
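As software grows in scale, research on software defect prediction
techniques [1-2] is important for ensuring software reliability. Software
defect prediction studies how defects are distributed in a software system
and identifies the modules that are likely to be faulty. It can help testers
test in a targeted way, so that latent defects are thoroughly detected
before the deployment stage and defective modules are repaired promptly,
thereby shortening the software development life cycle and improving
software reliability.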
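Traditional defect prediction techniques mainly study the relationship
between the number of defects and software size. Halstead et al. [3] studied
the relationship between the software volume metric and software defects and
derived a direct proportionality between the number of defects and the
volume metric. Lipow [4] refined this result, proposing that the ratio of
defects to executable lines of code is a quadratic function of the natural
logarithm of the number of lines of code, with coefficients that differ
across programming languages. Takahashi et al. [5] additionally took the
amount of software documentation into account and gave an estimate of defect
density in which the defect rate is a linear function of the frequency of
specification changes, programmer skill, and software design documentation.
Traditional defect prediction techniques can only estimate defect density
through such formulas, and thereby forecast testing cost.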
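
The size-based relations described above can be illustrated with a minimal Python sketch. It is not taken from the cited works. The function names are hypothetical, the proportionality constant 1/3000 is the value commonly quoted for Halstead's delivered-bugs estimate and is assumed here, and the coefficients a0, a1 and a2 of the Lipow-style quadratic are placeholders, since the language-specific values are not given in this text.

```python
import math


def halstead_volume(n1, n2, N1, N2):
    """Halstead volume V = N * log2(n).

    n1, n2: numbers of distinct operators and distinct operands.
    N1, N2: total occurrences of operators and operands.
    """
    vocabulary = n1 + n2
    length = N1 + N2
    return length * math.log2(vocabulary)


def halstead_defects(volume, k=1.0 / 3000.0):
    """Defect count proportional to volume: B = k * V.

    k = 1/3000 is a commonly quoted constant (an assumption here,
    not a value stated in this paper).
    """
    return k * volume


def lipow_defect_density(loc, a0, a1, a2):
    """Defects per executable line as a quadratic in ln(loc).

    a0, a1, a2 are language-dependent coefficients; any values
    passed in are placeholders for illustration only.
    """
    x = math.log(loc)
    return a0 + a1 * x + a2 * x * x


if __name__ == "__main__":
    v = halstead_volume(n1=10, n2=15, N1=120, N2=180)
    print("volume:", v)
    print("estimated defects:", halstead_defects(v))
    # Placeholder coefficients, chosen only to exercise the function.
    print("density:", lipow_defect_density(1000, a0=0.001, a1=0.0005, a2=0.0001))
```

Both estimates depend only on aggregate size measures of a module; neither uses the dependencies between modules or the semantics of the code.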