Book Recommendation
Abstract
With the development of information technology and the Internet, we have entered
an age of information overload, and the book market is no exception. On the one
hand, readers find it difficult to pick out high-quality books that match their
preferences; on the other hand, writers and booksellers find it equally difficult
to recommend their books to suitable readers. Based on the data provided, we
explore the relations among the different kinds of data in order to build
effective models that rank the books and recommend them to the target readers.
For question one, our main task is to pre-process the given data and uncover the
possible relations within it. We apply association rule mining to the large data
set and then perform dimension reduction. From this process we conclude that the
popularity ("heat") of the book tags follows a heavy-tailed distribution. We
therefore set up a mapping table for the data and use a missing-value handling
method to complete the matrices, from which we identify the most strongly related
influencing factors. In the end, we arrive at two major factors that may
influence readers' ratings of the books: the reading interests of the users and
the popularity of the books.
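The matrix-completion step can be illustrated with a minimal sketch. The abstract does not state which missing-value handling method is used, so the code below assumes simple mean imputation: a missing entry is replaced by the mean of its row (one user), falling back to the global mean for rows with no observations. The function name `fill_missing` and the toy matrix are hypothetical.

```python
def fill_missing(matrix):
    """Fill None entries of a user-by-book rating matrix by mean imputation.

    Assumption (not from the paper): each missing entry takes the mean of the
    known ratings in the same row; rows with no known rating take the global mean.
    """
    observed = [v for row in matrix for v in row if v is not None]
    if not observed:
        raise ValueError("matrix contains no observed ratings")
    global_mean = sum(observed) / len(observed)

    filled = []
    for row in matrix:
        known = [v for v in row if v is not None]
        row_mean = sum(known) / len(known) if known else global_mean
        filled.append([row_mean if v is None else v for v in row])
    return filled


# Hypothetical toy data: 3 users, 3 books, None marks a missing rating.
toy = [
    [4, None, 2],
    [None, None, None],
    [5, 3, None],
]
completed = fill_missing(toy)
```

Mean imputation is only one of several possible choices; it is used here because it keeps the example short and leaves the observed ratings untouched.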
For the second question, we assume that the IDs of the given books are encoded
as Dewey Decimal numbers. Building on the pre-processed data, we randomly select
the book types of 60,000 users as the calibration data for the neural network
and take the corresponding popularity of the book ratings as the input. The
relationship between the ratings and the influencing factors can then be treated
as a black box, so the question becomes a black-box problem. We train a BP
(back-propagation) network on the input and output data, taking full advantage
of the network's ability to model nonlinear systems and thereby making the
prediction more precise. Finally, we use the trained network to make the
predictions; the results are given in the body of the paper.
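As an illustration of the BP training described above, the following is a minimal sketch of a one-hidden-layer back-propagation network in plain Python. It is not the network used in the paper: the layer sizes, learning rate, number of epochs, and the toy samples (two inputs standing for reading interest and book popularity, one output standing for the rating, all scaled to [0, 1]) are assumptions made for the example.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_network(n_in, n_hidden, seed=0):
    """Create small random weights for a n_in -> n_hidden -> 1 network."""
    rng = random.Random(seed)
    return {
        "w1": [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def forward(net, x):
    """Return hidden activations (sigmoid) and the linear output."""
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(net["w1"], net["b1"])
    ]
    output = sum(w * h for w, h in zip(net["w2"], hidden)) + net["b2"]
    return hidden, output


def mse(net, samples):
    return sum((forward(net, x)[1] - y) ** 2 for x, y in samples) / len(samples)


def train(net, samples, lr=0.1, epochs=2000):
    """Per-sample gradient descent on the squared error (back-propagation)."""
    for _ in range(epochs):
        for x, y in samples:
            hidden, out = forward(net, x)
            err = out - y  # derivative of 0.5 * (out - y)^2 w.r.t. out
            for j, h in enumerate(hidden):
                # Error propagated back to hidden unit j (uses w2[j] before its update).
                delta_h = err * net["w2"][j] * h * (1.0 - h)
                net["w2"][j] -= lr * err * h
                for i, xi in enumerate(x):
                    net["w1"][j][i] -= lr * delta_h * xi
                net["b1"][j] -= lr * delta_h
            net["b2"] -= lr * err
    return net


# Hypothetical samples: ([interest, popularity], rating), all scaled to [0, 1].
samples = [
    ([0.9, 0.8], 0.9),
    ([0.1, 0.2], 0.2),
    ([0.8, 0.1], 0.6),
    ([0.2, 0.9], 0.5),
]

net = init_network(n_in=2, n_hidden=3)
loss_before = mse(net, samples)
train(net, samples)
loss_after = mse(net, samples)
print(f"MSE before training: {loss_before:.4f}, after: {loss_after:.4f}")
```

The hidden layer uses a sigmoid activation so that the network can represent a nonlinear mapping, while the output is kept linear, which is a common choice for regression targets.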
For question three, we use a collaborative filtering method based on clustering.
In this method, readers with similar interests are clustered together, and the
neighbors whose tastes are most similar are then selected. We then infer latent
information about the users from their neighbors and select the top 3 books as
the final recommendations. The detailed lists of recommended books are attached
at the end of the paper.
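The neighbor-based recommendation step can be sketched as follows. This is an illustrative example under stated assumptions, not the paper's implementation: the cluster labels in `cluster_of` are taken as given (any clustering method could have produced them), similarity is measured with cosine similarity over the users' ratings, and unseen books are scored by a similarity-weighted average of the neighbors' ratings. All names and the toy data are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse rating dicts (book -> rating)."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[k] * b[k] for k in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(user, ratings, cluster_of, n_neighbors=2, top_n=3):
    """Recommend up to top_n unseen books using neighbors from the user's cluster."""
    # Only users in the same cluster are candidate neighbors.
    peers = [u for u in ratings if u != user and cluster_of[u] == cluster_of[user]]
    neighbors = sorted(
        peers, key=lambda u: cosine(ratings[user], ratings[u]), reverse=True
    )[:n_neighbors]

    scores, weights = {}, {}
    for u in neighbors:
        sim = cosine(ratings[user], ratings[u])
        if sim <= 0:
            continue
        for book, r in ratings[u].items():
            if book in ratings[user]:
                continue  # skip books the user has already rated
            scores[book] = scores.get(book, 0.0) + sim * r
            weights[book] = weights.get(book, 0.0) + sim

    predicted = {b: scores[b] / weights[b] for b in scores}
    # Highest predicted rating first; ties broken by book ID for determinism.
    return sorted(predicted, key=lambda b: (-predicted[b], b))[:top_n]


# Hypothetical toy data: user -> {book ID: rating}, plus assumed cluster labels.
ratings = {
    "u1": {"b1": 5, "b2": 4},
    "u2": {"b1": 5, "b2": 4, "b3": 5, "b4": 2},
    "u3": {"b1": 4, "b2": 5, "b5": 4},
    "u4": {"b6": 5, "b7": 5},
}
cluster_of = {"u1": 0, "u2": 0, "u3": 0, "u4": 1}

top_books = recommend("u1", ratings, cluster_of)
```

Restricting the neighbor search to one cluster keeps the similarity computation small for each user, which matters when the number of users is large.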
Keywords: Massive Data Mining; Heavy-tailed Distribution; BP Network;
Clustering; Collaborative Filtering Analysis