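【Abstract】As hardware devices have become widespread, information
technology and the mobile Internet have developed rapidly, and people have
moved from an era of information scarcity into an era of information
overload. Finding the information one wants through search alone has become
very difficult, and filtering valuable information out of massive data is a
challenge that both information providers and information seekers face.
This paper describes in detail the basic concepts, principles, and
construction process of the ID3 algorithm for data classification. To
address ID3's tendency to favor attributes with many distinct values, it
introduces two concepts, an attribute threshold and the information gain
ratio, which compensate for the shortcomings of ID3's attribute selection
criterion and yield a new selection criterion that improves the original
algorithm. Experiments comparing the algorithm before and after the
improvement show that the improved algorithm achieves higher classification
accuracy.%With the spread of hardware devices driving the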
rapid development of information technology and the mobile Internet,
people have left the era of information scarcity behind and entered the
era of information overload. Finding the information one wants through
search alone is now very difficult, and filtering valuable information out
of massive data is a challenge that both information providers and
information seekers must face. In this paper, the basic concepts and
principles of the ID3 algorithm in data classification and its
construction process are