Research on a Cloud-Computing-Based Method for Chinese Text Classification
Selected Master's Theses (I). With the development of the Internet and the growth of its user base, the volume of Chinese text online is increasing rapidly, and extracting valuable information from this mass of data has become an important problem. Current text classification methods fall into two kinds: knowledge engineering and statistical learning. Knowledge-engineering methods rely on rules defined by domain experts and decide which class a text belongs to by matching the text against those rules. Statistical learning methods use texts as training material, let the computer derive classification rules from them, and then apply those rules to classify unseen texts automatically. Statistical learning has recently become the main approach to text classification, but it is constrained by processing speed and memory, especially when large volumes of text are processed. To address this problem, this thesis applies cloud computing technology, starting from the computation and memory bottlenecks. Cloud computing makes it easy to scale computing and storage capacity, and the thesis examines how to classify text with the Map/Reduce data processing model. The classifier used is the support vector machine (SVM), which has distinct advantages over other methods on linearly non-separable and small-sample problems. In essence, the SVM algorithm turns the text classification problem into an inequality-constrained quadratic programming problem that seeks the largest margin under geometric constraints. The improvement to the SVM algorithm made in this thesis converts the inequality constraints of the quadratic program into equality constraints, which simplifies the solving process.
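The abstract does not give the exact formulation, so the following is only a sketch under an assumption: that the equality-constrained variant resembles the well-known least-squares SVM. The symbols $C$, $\gamma$, $\xi_i$ and $e_i$ are introduced here for illustration and do not come from the thesis. The first problem is the standard soft-margin SVM with inequality constraints; the second replaces them with equalities and a squared error term.

```latex
% Standard soft-margin SVM: inequality-constrained quadratic program
\min_{w,\,b,\,\xi}\; \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_i
\quad \text{s.t.}\quad y_i\,(w^{\top}x_i + b) \ge 1 - \xi_i,\;\; \xi_i \ge 0,\;\; i = 1,\dots,n

% Assumed equality-constrained variant (least-squares SVM form)
\min_{w,\,b,\,e}\; \frac{1}{2}\lVert w\rVert^{2} + \frac{\gamma}{2}\sum_{i=1}^{n} e_i^{2}
\quad \text{s.t.}\quad y_i\,(w^{\top}x_i + b) = 1 - e_i,\;\; i = 1,\dots,n
```

Under this assumed form, the optimality conditions reduce to a system of linear equations rather than a quadratic program, which is one way the solving process could become simpler.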
This study focuses on how to build a cloud platform with the open-source Hadoop cloud computing system, and on how to implement the improved SVM classification algorithm on that platform using the MapReduce model. The final experimental results show that the new algorithm achieves better preprocessing efficiency than the SVM algorithm.
Key Words: Cloud Computing; Text; Support Vector Machine; MapReduce
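As a rough illustration of how an equality-constrained SVM could fit the MapReduce model, here is a minimal single-process sketch. It is not the thesis's implementation: it assumes a linear least-squares SVM (minimize the squared norm of w plus gamma/2 times the squared errors, with y(w·x + b) = 1 - e), runs with Python's built-in `map` and `functools.reduce` instead of Hadoop, and all function names, the `gamma` parameter, and the toy data are hypothetical. With labels in {+1, -1}, training reduces to solving a small linear system whose coefficients are sums over the samples, so each data split can compute its partial sums independently.

```python
from functools import reduce


def map_split(split):
    """Map step: partial sums A = sum(z z^T) and c = sum(y z) over one split."""
    d = len(split[0][0]) + 1              # feature dimension plus one bias slot
    A = [[0.0] * d for _ in range(d)]
    c = [0.0] * d
    for x, y in split:
        z = list(x) + [1.0]               # augmented vector z = [x; 1]
        for i in range(d):
            c[i] += y * z[i]
            for j in range(d):
                A[i][j] += z[i] * z[j]
    return A, c


def reduce_sums(left, right):
    """Reduce step: add two partial (A, c) pairs element-wise."""
    (A1, c1), (A2, c2) = left, right
    A = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(A1, A2)]
    c = [a + b for a, b in zip(c1, c2)]
    return A, c


def solve(M, rhs):
    """Solve M x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= f * aug[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def train(splits, gamma=10.0):
    """Combine per-split sums, then solve (A + R/gamma) theta = c for [w; b]."""
    A, c = reduce(reduce_sums, map(map_split, splits))
    for i in range(len(c) - 1):           # regularize w only, not the bias b
        A[i][i] += 1.0 / gamma
    theta = solve(A, c)
    return theta[:-1], theta[-1]


def predict(w, b, x):
    """Classify x by the sign of w.x + b."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Toy 2-D feature vectors (hypothetical), already divided into two splits.
splits = [
    [((2.0, 0.0), 1), ((3.0, 1.0), 1)],
    [((0.0, 2.0), -1), ((1.0, 3.0), -1)],
]
w, b = train(splits)
print(predict(w, b, (4.0, 0.0)), predict(w, b, (0.0, 4.0)))
```

Because the reduce step only adds matrices and vectors, the partial results can be merged in any order, which is the property that makes this kind of computation suitable for MapReduce. A kernelized version would not decompose this simply.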