OPEN FORUM
Analyzing concerns of people from Weblog articles
Tomohiro Fukuhara · Toshihiro Murayama · Toyoaki Nishida
Received: 31 March 2005 / Accepted: 29 August 2006 / Published online: 19 July 2007
© Springer-Verlag London Limited 2007
Abstract A system for analyzing concerns of people from Weblog articles is
proposed. The system, called KANSHIN, analyzes people's concerns by collecting
Japanese, Chinese, and Korean Weblog articles. Users can find the concerns of
people in each language, and can compare how concerns differ between the
Japanese, Chinese, and Korean language communities. We describe several analysis
results: (1) patterns of social concerns, (2) changes in the focus on a problem
over time, (3) differences in concerns about a problem between Japanese, Chinese,
and Korean Weblog sites, and (4) the relation between words in Weblog articles
and real-world natural phenomena.
T. Fukuhara (corresponding author)
Race (Research into Artifacts, Center for Engineering), The University of Tokyo,
5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8568, Japan
e-mail: fukuhara@race.u-tokyo.ac.jp
T. Murayama
SPSS Japan Inc., 1-1-39 Hiroo, Shibuya-ku, Tokyo 150-0012, Japan
T. Nishida
Department of Intelligence Science and Technology, Graduate School of Informatics,
Kyoto University, Yoshida-Honmachi, Sakyo-ku, Kyoto 606-8501, Japan
AI & Soc (2007) 22:253–263
DOI 10.1007/s00146-007-0124-3
Introduction
Understanding the concerns of people is important for understanding and solving
social problems. Today, there are many social problems in the world. We have
concerns over accidents such as the railroad accident that happened in
Amagasaki, Japan in 2005, disasters such as Hurricane Katrina in 2005, diseases
such as bird flu and BSE (bovine spongiform encephalopathy), and so on. Although
it is not easy to solve these problems, understanding the concerns of people
helps us to find the key matters to be solved.
The aim of this research is to analyze concerns of people from Weblog articles.
Today many people write Weblog articles, in which they express thoughts and
opinions on various themes including social problems. By collecting these
articles in large numbers, we can find trends in social concerns.
In this paper, we propose a system called KANSHIN for analyzing concerns of
people from Weblog articles. The system collects and analyzes Japanese, Chinese, and
Korean Weblog articles. By using the system, users can find topics in the blogosphere.
They can find differences in concerns about a problem by comparing concerns in each
country, and can find changes in the focus on a problem over time.
This paper consists of the following sections. In Sect. 2, we describe an overview
and the functions of a Weblog analysis system called KANSHIN. In Sect. 3, we describe
several analysis results found by the system. We report: (1) patterns of social
concerns, (2) changes in the focus on a problem over time, (3) differences in concerns
across languages, and (4) the relation between words that appeared in Weblog articles
and real-world natural phenomena. In Sect. 4, we discuss differences between our
system and other works. In Sect. 5, we summarize the arguments of this paper and
describe future work.
KANSHIN: a Weblog analysis system
In this section, we describe an overview and the functions of the KANSHIN system.
Architecture of the system
Figure 1 shows an overview of the system. The system collects RSS (RDF Site
Summary) and Atom syndication feeds provided by Weblog sites. To compare
concerns across languages, we collect feeds from Japanese, Chinese, and Korean
Weblog sites. To understand social topics, we also collect RSS and Atom feeds
provided by Japanese news sites and a governmental web site¹.
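As an illustration of the feed collection step, the following is a minimal sketch of extracting article entries from a feed. It is not the KANSHIN implementation. It assumes an RSS 2.0-style feed with un-namespaced `item` elements; RSS 1.0 (RDF) and Atom feeds use XML namespaces and would need namespaced tag names. The sample feed, the URLs, and the function name `parse_rss_items` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed; the titles echo the example rows in Fig. 1.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example Weblog</title>
    <item>
      <title>World Cup</title>
      <link>http://example.com/articles/3194</link>
      <pubDate>Fri, 09 Jun 2006 00:00:00 +0900</pubDate>
      <description>Watching the opening match tonight.</description>
    </item>
    <item>
      <title>Business English</title>
      <link>http://example.com/articles/4054</link>
      <pubDate>Fri, 09 Jun 2006 08:30:00 +0900</pubDate>
      <description>Notes on vocabulary for meetings.</description>
    </item>
  </channel>
</rss>"""


def parse_rss_items(xml_text):
    """Return one dict per <item> element found in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "published": item.findtext("pubDate", default=""),
            "summary": item.findtext("description", default=""),
        })
    return items


items = parse_rss_items(SAMPLE_RSS)
for entry in items:
    print(entry["published"], entry["title"])
```

A production collector would also need to fetch feeds over HTTP, handle character encodings for each language, and skip entries that were already stored.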
Collected articles are parsed with morphological analyzers to extract nouns and
adjectives. Extracted words are used for (1) the article retrieval function and
(2) the function for finding daily and monthly topics; these functions are
described in the following subsection. As morphological analyzers for Japanese,
Chinese, and Korean, we use JUMAN², ICTCLAS (Zhang et al. 2003), and KLT³,
respectively. Extracted words are indexed in relational tables in a database.
The system collects 1,374,136 feeds a day by using seven PCs. We have collected
88,767,854 articles so far⁴.
¹ http://www.gov-online.go.jp/ (in Japanese; accessed 12 June 2006).
² http://www.nlp.kuee.kyoto-u.ac.jp/nl-resource/juman.html (in Japanese; accessed 12 June 2006).
³ http://www.nlp.kookmin.ac.kr/HAM/kor/index.html (in Korean; accessed 12 June 2006).
⁴ On 9 June 2006, 00:00:00 JST.
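The indexing of extracted words in relational tables can be sketched as follows. This is a minimal sketch under stated assumptions, not the actual KANSHIN schema: the table and column names are hypothetical (loosely following the article table and index table shown in Fig. 1), SQLite stands in for whatever database the system uses, and `extract_words` is a crude placeholder for a morphological analyzer such as JUMAN, since it keeps every lowercase alphabetic token rather than only nouns and adjectives.

```python
import re
import sqlite3


def extract_words(text):
    """Placeholder for morphological analysis: lowercase alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def build_index(articles):
    """Store articles and a word-to-article index in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT, body TEXT)"
    )
    conn.execute("CREATE TABLE word_index (word TEXT, article_id INTEGER)")
    conn.execute("CREATE INDEX idx_word ON word_index (word)")
    for article_id, title, body in articles:
        conn.execute(
            "INSERT INTO article VALUES (?, ?, ?)", (article_id, title, body)
        )
        words = extract_words(title + " " + body)
        conn.executemany(
            "INSERT INTO word_index VALUES (?, ?)",
            [(word, article_id) for word in sorted(words)],
        )
    conn.commit()
    return conn


def articles_containing(conn, word):
    """Return ids of articles whose extracted words include the given word."""
    rows = conn.execute(
        "SELECT article_id FROM word_index WHERE word = ? ORDER BY article_id",
        (word,),
    )
    return [row[0] for row in rows]


conn = build_index([
    (3194, "World Cup", "media coverage of the match"),
    (4054, "Business English", "business media vocabulary"),
])
print(articles_containing(conn, "media"))
```

Storing one row per (word, article) pair keeps keyword lookup to a single indexed query, which is one plausible reason for a separate index table.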
Analysis functions
The system has the following analysis functions:
1. Article retrieval.
2. Finding co-occurrence words.
3. Finding daily and monthly topics.
4. Cross-lingual concern analysis.
Article retrieval
The system provides a basic article retrieval function that retrieves articles
containing keywords provided by a user. The system shows the retrieved articles
in chronological order, together with a graph showing the frequency of articles
containing the keywords. Figure 2 shows an example of the result. Users can see
the articles and a graph showing the daily trend of articles. Users can also see
keywords of the day and news articles related to the keywords. Keywords are
extracted by using a keyword extraction system called GENSEN-Web (Nakagawa and
Mori 2003).
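The retrieval and daily trend described above can be sketched as follows. This is an illustrative sketch, not the system's code. It assumes that an article matches only when it contains all of the given keywords and that results are listed oldest first; the paper does not specify either point. The record layout and the names `retrieve` and `daily_frequency` are hypothetical.

```python
from collections import Counter
from datetime import date


def retrieve(articles, keywords):
    """Return articles containing every keyword, in chronological order."""
    wanted = set(keywords)
    hits = [a for a in articles if wanted <= a["words"]]
    return sorted(hits, key=lambda a: a["date"])


def daily_frequency(hits):
    """Count retrieved articles per day (the data behind a daily trend graph)."""
    return Counter(a["date"] for a in hits)


# Hypothetical articles with already-extracted word sets.
ARTICLES = [
    {"id": 3, "date": date(2006, 6, 10), "words": {"world", "cup", "england"}},
    {"id": 1, "date": date(2006, 6, 9), "words": {"world", "cup"}},
    {"id": 2, "date": date(2006, 6, 9), "words": {"business", "english"}},
    {"id": 4, "date": date(2006, 6, 10), "words": {"world", "cup", "media"}},
]

hits = retrieve(ARTICLES, ["world", "cup"])
trend = daily_frequency(hits)
for day in sorted(trend):
    print(day.isoformat(), trend[day])
```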
Finding co-occurrence words
Users can find words that co-occur with a keyword. The system searches for words
that co-occur with the keyword within an article. We use the Dice coefficient to
measure the strength of co-occurrence between a keyword and another word. By
following co-occurring words over time, users can find temporal changes in the
focus on a problem. See Sect. 3 for an example of this function.
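For reference, the Dice coefficient of two words x and y is commonly defined as 2|A ∩ B| / (|A| + |B|), where A and B are the sets of articles containing x and y. The sketch below ranks co-occurring words with this definition. It assumes article-level counting, which matches the statement that co-occurrence is searched within an article, but the exact counting used by KANSHIN is not given here. The function name and the sample data are hypothetical.

```python
from collections import defaultdict


def dice_cooccurrence(articles, keyword):
    """Rank words by Dice coefficient with `keyword` over article sets.

    `articles` maps an article id to the list of words extracted from it.
    """
    postings = defaultdict(set)  # word -> ids of articles containing it
    for article_id, words in articles.items():
        for word in set(words):
            postings[word].add(article_id)

    target = postings.get(keyword, set())
    scores = {}
    for word, ids in postings.items():
        if word == keyword:
            continue
        shared = len(target & ids)
        if shared:
            scores[word] = 2 * shared / (len(target) + len(ids))
    # Highest score first; ties are broken alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical extracted words for four articles.
SAMPLE = {
    1: ["iraq", "hostage"],
    2: ["iraq", "hostage", "release"],
    3: ["iraq", "election"],
    4: ["cherry", "blossom"],
}

ranking = dice_cooccurrence(SAMPLE, "iraq")
for word, score in ranking:
    print(word, round(score, 2))
```

Running the same ranking on articles from successive time windows would give the temporal view of co-occurring words mentioned in the text.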
[Figure 1: system diagram. Recoverable labels: WWW sources (personal Weblog sites, news sites, governmental website) providing RSS feeds; ping servers; HTML to RSS conversion script (h2r); collecting Japanese, Chinese, and Korean RSS/Atom feeds; storing words and articles into tables; database with an article table (article, title) and an index table (word, article); analysis; CGI and Web server; user keywords and retrieval; outputs: retrieved articles, daily trend of articles containing keywords, co-occurrence words, and daily/monthly topics (example panel listing terms by month with percentages, e.g. April: Iraq 1.6, July: Hot 4.7).]
Fig. 1 Overview of KANSHIN