Abstract
With the development of "Internet + tourism", tourism websites have emerged and
tourists have embraced "smart tourism": they can buy tickets, book hotels and plan
routes online without consulting travel agencies. The comments that tourists leave
online have become an important source of information for evaluating tourist
satisfaction. Mining tourist satisfaction from this text can provide a scientific basis
for scenic spot planning and environmental management, which is of social
significance for promoting the faster and better development of the scenic spot economy.
Based on the relevant theories of Chinese text classification and the LDA topic
model, this paper conducts an empirical study of 4000 tourist comments. First, the
text is preprocessed by text cleaning, word segmentation and stop-word removal, and
the statistical characteristics of the data are described. Second, an LDA topic model
is fitted to the processed text, and four latent topics are mined: reservation service,
tourist attractions, tourism experience and tour guide evaluation. The characteristic
words corresponding to each topic are extracted. Machine learning and deep learning
models for text classification are then built in Python, covering seven classifiers:
naive Bayes, k-nearest neighbor, random forest, decision tree, logistic regression,
support vector machine (SVM) and a convolutional neural network. The performance of
each classifier is evaluated using accuracy, recall, F1 value and other indicators.
The evaluation results show that the convolutional neural network has the best
classification effect, with the highest accuracy, recall and F1 value, all reaching
85%. The k-nearest neighbor algorithm has the worst classification effect, with an
average accuracy of only 56%. Further analysis shows that the deep learning model
outperforms the machine learning models because the machine learning models rely on
frequency statistics to extract features, while the deep learning model extracts
features from more angles and dimensions, which avoids the influence of some
high-frequency but meaningless features on classification accuracy. Directions for
future text research are also proposed.
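As a rough illustration of the evaluation indicators named above, the following minimal Python sketch computes precision, recall and F1 for a single class. The function name, the class labels and the sample predictions are hypothetical and are not taken from the study's data or code.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Return (precision, recall, F1) with `positive` treated as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical labels, made up for illustration only.
y_true = ["pos", "pos", "neg", "neg", "pos", "neg"]
y_pred = ["pos", "neg", "neg", "pos", "pos", "neg"]

p, r, f = precision_recall_f1(y_true, y_pred, "pos")
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Averaging these per-class values over all classes gives a single score per classifier, which is one common way to compare several models on the same test set.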
Drawing on the analysis process and conclusions, this case attempts to develop a
website for evaluating and analyzing tourist satisfaction, so as to give
tourism-related industries and departments a more intuitive and convenient way to
analyze travel data. The complete text classification code and the accompanying
original data are uploaded to an online platform, putting the data analysis into
commercial practice and offering the relevant departments new ideas and methods for
making use of tourism data.
Keywords: passenger reviews; text classification; LDA model; machine learning