Selected Master's Theses (Part 1).
With the growth of the Internet and of its user base, the volume of Chinese text
online is increasing rapidly, and extracting valuable information from this mass of
data has become an important problem. Current text classification methods fall into
two categories: knowledge engineering and statistical learning. Knowledge-engineering
methods rely on rules defined by domain experts and decide which class a text belongs
to by matching the text against those rules. Statistical learning methods treat texts
as training material, have the computer derive classification rules from them, and
then apply those rules to classify unseen texts automatically. Statistical learning
has recently become the dominant approach to text classification, but it is
constrained by computing speed and memory, especially when processing large volumes
of text.
To address this problem in text classification based on statistical learning, this
thesis applies cloud computing technology, starting from the bottlenecks of
computation and memory. Cloud computing makes it easy to scale computing and storage
capacity, and the thesis examines how to classify text with the Map/Reduce data
processing model. The classifier used is the support vector machine (SVM), which has
distinctive advantages over other methods on linearly non-separable and small-sample
problems. In essence, the SVM algorithm transforms the text classification problem
into an inequality-constrained quadratic programming problem that seeks the largest
margin subject to geometric constraints. The improved SVM algorithm of the title
converts the inequality constraints of the quadratic program into equality
constraints, which simplifies the solution process.
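For reference, the two kinds of quadratic program described above can be written out
as follows. This is a sketch under stated assumptions: the symbols (w, b, the slack
variables, the feature map \varphi, the penalties C and \gamma) are standard textbook
notation and do not come from the thesis, and the equality-constrained form shown is
the well-known least-squares SVM, assumed here to be representative of the improvement
rather than confirmed as the thesis's exact formulation.

```latex
% Standard soft-margin SVM: inequality-constrained quadratic program
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_{i}
\quad\text{s.t.}\quad
y_{i}\bigl(w^{\top}\varphi(x_{i}) + b\bigr) \ge 1 - \xi_{i},\qquad \xi_{i}\ge 0,
\qquad i = 1,\dots,n.

% Least-squares SVM (assumed variant): equality constraints, squared errors
\min_{w,\,b,\,e}\ \frac{1}{2}\lVert w\rVert^{2} + \frac{\gamma}{2}\sum_{i=1}^{n}e_{i}^{2}
\quad\text{s.t.}\quad
y_{i}\bigl(w^{\top}\varphi(x_{i}) + b\bigr) = 1 - e_{i},
\qquad i = 1,\dots,n.

% The optimality conditions of the second problem reduce to a linear system
% in the bias b and the multipliers \alpha, with kernel K:
\begin{bmatrix} 0 & y^{\top} \\ y & \Omega + \gamma^{-1} I \end{bmatrix}
\begin{bmatrix} b \\ \alpha \end{bmatrix}
=
\begin{bmatrix} 0 \\ \mathbf{1} \end{bmatrix},
\qquad
\Omega_{ij} = y_{i}\,y_{j}\,K(x_{i}, x_{j}).
```

Under this assumed variant, training amounts to solving one linear system instead of
a quadratic program with inequality constraints, which is one way the solution process
becomes simpler.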
This study focuses on how to build a cloud platform with the open-source Hadoop cloud
computing system and how to implement the improved SVM classification algorithm on
that platform using the MapReduce model. The final experimental results show that the
new algorithm improves preprocessing efficiency compared with the standard SVM
algorithm.
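To illustrate the MapReduce model in a text preprocessing setting, the following is a
minimal in-process sketch that counts terms per class. It is not the thesis's
implementation and does not use the Hadoop API: the function names (`map_doc`,
`shuffle`, `reduce_counts`, `term_counts`), the toy corpus, and the whitespace
tokenization are all assumptions made for illustration. Chinese text would need a
word segmenter in place of `split()`.

```python
from collections import defaultdict
from itertools import chain


def map_doc(doc_id, label, text):
    """Map step: emit ((label, term), 1) for every whitespace-separated term."""
    # Assumption: whitespace tokenization; real Chinese text needs segmentation.
    for term in text.lower().split():
        yield (label, term), 1


def shuffle(pairs):
    """Group emitted values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: sum the counts collected for one (label, term) key."""
    return key, sum(values)


def term_counts(corpus):
    """Run map, shuffle, and reduce in-process over (doc_id, label, text) records."""
    mapped = chain.from_iterable(map_doc(*record) for record in corpus)
    return dict(reduce_counts(k, v) for k, v in shuffle(mapped).items())


# Hypothetical toy corpus: (document id, class label, text).
corpus = [
    (1, "sports", "team wins match"),
    (2, "sports", "team loses match"),
    (3, "finance", "market wins investors"),
]
counts = term_counts(corpus)
```

Because each map call touches only one document and each reduce call only one key,
both steps can be distributed across machines. That independence is the property a
Hadoop cluster relies on to spread computation and memory load.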
Key Words: Cloud Computing; Text; Support Vector Machine; MapReduce