Chapter 11: Opinion Mining
Bing Liu
Department of Computer Science
University of Illinois at Chicago
liub@cs.uic.edu
Bing Liu, UIC Web Data Mining
Introduction – facts and opinions
Two main types of textual information on the
Web.
Facts and Opinions
Current search engines search for facts
(assume they are true)
Facts can be expressed with topic keywords.
Search engines do not search for opinions
Opinions are hard to express with a few keywords
E.g., "What do people think of Motorola cell phones?"
Current search ranking strategies are not appropriate
for opinion retrieval/search.
Introduction – user generated content
Word-of-mouth on the Web
One can express personal experiences and opinions on
almost anything at review sites, forums, discussion groups,
blogs ... (called user-generated content).
They contain valuable information
Web/global scale: no longer limited to one’s circle of friends
Our interest: to mine opinions expressed in user-generated content
An intellectually very challenging problem.
Practically very useful.
Introduction – Applications
Businesses and organizations: product and service benchmarking.
Market intelligence.
Businesses spend a huge amount of money to find consumer
sentiments and opinions.
Consultants, surveys, focus groups, etc.
Individuals: interested in others’ opinions when
Purchasing a product or using a service,
Finding opinions on political topics.
Ad placement: placing ads in user-generated content
Place an ad when one praises a product.
Place an ad from a competitor if one criticizes a product.
Opinion retrieval/search: providing general search for opinions.
Two types of evaluation
Direct Opinions: sentiment expressions on
some objects, e.g., products, events, topics,
persons.
E.g., “the picture quality of this camera is great”
Subjective
Comparisons: relations expressing
similarities or differences of more than one
object. Usually expressing an ordering.
E.g., “car x is cheaper than car y.”
Objective or subjective.
Opinion search (Liu, Web Data Mining book, 2007)
Can you search for opinions as conveniently
as general Web search?
Whenever you need to make a decision, you
may want some opinions from others.
Wouldn’t it be nice if you could find them on a search
system instantly, by issuing queries such as
Opinions: “Motorola cell phones”
Comparisons: “Motorola vs. Nokia”
Cannot be done yet! (but could be soon …)
Typical opinion search queries
Find the opinion of a person or organization (opinion
holder) on a particular object or a feature of the object.
E.g., what is Bill Clinton’s opinion on abortion?
Find positive and/or negative opinions on a particular
object (or some features of the object), e.g.,
customer opinions on a digital camera.
public opinions on a political topic.
Find how opinions on an object change over time.
Find how object A compares with object B, e.g.,
Gmail vs. Hotmail
Find the opinion of a person on X
In some cases, the general search engine
can handle it, i.e., using suitable keywords.
Bill Clinton’s opinion on abortion
Reason:
One person or organization usually has only one
opinion on a particular topic.
The opinion is likely contained in a single
document.
Thus, a good keyword query may be sufficient.
Find opinions on an object
We use product reviews as an example:
Searching for opinions in product reviews is different
from general Web search.
E.g., search for opinions on “Motorola RAZR V3”
General Web search (for a fact): rank pages
according to some authority and relevance scores.
The user views the first page (if the search is perfect).
One fact = Multiple facts (a fact stated on many pages is still the same fact)
Opinion search: rank is desirable, however
reading only the review ranked at the top is not appropriate
because it is only the opinion of one person.
One opinion ≠ Multiple opinions
Search opinions (cont'd)
Ranking:
Produce two rankings
Positive opinions and negative opinions
Some kind of summary of both, e.g., # of each
Or, one ranking but
The top (say 30) reviews should reflect the natural distribution
of all reviews (assume that there is no spam), i.e., with the
right balance of positive and negative reviews.
Questions:
Should the user read all the top reviews? OR
Should the system prepare a summary of the reviews?
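The two ranking options listed under "Ranking" can be sketched in a few lines of Python. This is only an illustrative sketch, not a method given in the slides: the Review class, its polarity and score fields, and the function names are all hypothetical. It also assumes that each review has already been labeled positive or negative by some upstream step, and that there is no spam.

```python
from dataclasses import dataclass


@dataclass
class Review:
    text: str
    polarity: str  # "pos" or "neg"; assumed to come from an upstream classifier
    score: float   # hypothetical relevance/quality score, higher is better


def two_rankings(reviews):
    """Option 1: separate positive and negative rankings, plus a count of each."""
    by_score = sorted(reviews, key=lambda r: r.score, reverse=True)
    pos = [r for r in by_score if r.polarity == "pos"]
    neg = [r for r in by_score if r.polarity == "neg"]
    summary = {"pos": len(pos), "neg": len(neg)}
    return pos, neg, summary


def balanced_top_k(reviews, k=30):
    """Option 2: one ranking whose top k mirrors the overall pos/neg ratio."""
    pos, neg, summary = two_rankings(reviews)
    total = summary["pos"] + summary["neg"]
    if total == 0:
        return []
    k = min(k, total)
    # Give positive reviews a share of the k slots proportional to their frequency.
    n_pos = min(round(k * summary["pos"] / total), len(pos))
    n_neg = min(k - n_pos, len(neg))
    selected = pos[:n_pos] + neg[:n_neg]
    # Present the selected reviews as a single list ordered by score.
    return sorted(selected, key=lambda r: r.score, reverse=True)


reviews = [Review(f"p{i}", "pos", float(i)) for i in range(8)]
reviews += [Review(f"n{i}", "neg", 10.0 + i) for i in range(2)]

pos, neg, summary = two_rankings(reviews)
top = balanced_top_k(reviews, k=5)
print(summary)
print([(r.text, r.polarity) for r in top])
```

With 8 positive and 2 negative reviews and k = 5, the balanced list keeps 4 positive reviews and 1 negative review, matching the 80/20 split of the full set. Whether the user should read such a list or be given a summary instead is the question the slide leaves open.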