Introduction to Text Visualization

2019-03-07 20:29:32 · 7.27 MB · PDF

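A book on text visualization and analysis by Nan Cao and Weiwei Cui. It systematically introduces a range of text visualization techniques and their applications. Original English edition. Readers who need it will recognize its value.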
Nan Cao · Weiwei Cui
Introduction to Text Visualization
Atlantis Press

Nan Cao
IBM T. J. Watson Research Center
Yorktown Heights, NY
USA

Weiwei Cui
Microsoft Research Asia
Beijing
China

Atlantis Briefs in Artificial Intelligence
ISBN 978-94-6239-185-7
ISBN 978-94-6239-186-4 (eBook)
DOI 10.2991/978-94-6239-186-4
Library of Congress Control Number: 2016950403
© Atlantis Press and the author(s) 2016
This book, or any parts thereof, may not be reproduced for commercial purposes in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system known or to be invented, without prior permission from the publisher.
Printed on acid-free paper

Acknowledgements

We would like to thank Prof. Yu-Ru Lin from the University of Pittsburgh for her initial efforts in discussing the outline and the content of this book. We would also like to thank Prof. Qiang Yang from the Hong Kong University of Science and Technology, who invited us to write the book.

Contents

1 Introduction
    1.1 Information Visualization
    1.2 Text Visualization
    1.3 Book Outline
    References
2 Overview of Text Visualization Techniques
    2.1 Review Scope and Taxonomy
    2.2 Visualizing Document Similarity
        2.2.1 Projection Oriented Techniques
        2.2.2 Semantic Oriented Techniques
    2.3 Revealing Text Content
        2.3.1 Summarizing a Single Document
        2.3.2 Showing Content at the Word Level
        2.3.3 Visualizing Topics
        2.3.4 Showing Events and Storyline
    2.4 Visualizing Sentiments and Emotions
    2.5 Document Exploration Techniques
        2.5.1 Distortion Based Approaches
        2.5.2 Exploration Based on Document Similarity
        2.5.3 Hierarchical Document Exploration
        2.5.4 Search and Query Based Approaches
    2.6 Summary of the Chapter
    References
3 Data Model
    3.1 Data Structures at the Word Level
        3.1.1 Bag of Words and N-gram
        3.1.2 Word Frequency Vector
    3.2 Data Structures at the Syntactical Level
    3.3 Data Models at the Semantic Level
        3.3.1 Network Oriented Data Models
        3.3.2 Multifaceted Entity-Relational Data Model
    3.4 Summary of the Chapter
    References
4 Visualizing Document Similarity
    4.1 Projection Based Approaches
        4.1.1 Linear Projections
        4.1.2 Non-linear Projections
    4.2 Semantic Oriented Techniques
    4.3 Conclusion
    References
5 Visualizing Document Content
    5.1 "What We Say": Word
        5.1.1 Frequency
        5.1.2 Frequency Trend
    5.2 "How We Say": Structure
        5.2.1 Co-occurrence Relationships
        5.2.2 Concordance Relationships
        5.2.3 Grammar Structure
        5.2.4 Repetition Relationships
    5.3 "What Can Be Inferred": Substance
        5.3.1 Fingerprint
        5.3.2 Topics
        5.3.3 Topic Evolutions
        5.3.4 Event
    5.4 Summary of the Chapter
    References
6 Visualizing Sentiments and Emotions
    6.1 Introduction
    6.2 Visual Analysis of Customer Comments
    6.3 Visualizing Sentiment Diffusion
    6.4 Visualizing Sentiment Divergence in Social Media
    6.5 Conclusion
    References

Chapter 1
Introduction

Abstract Text is one of the greatest inventions in our history and is a major approach to recording information and knowledge, enabling easy information sharing across both space and time. For example, the study of ancient documents and books is still a main approach to studying history and gaining knowledge from our predecessors. The invention of the Internet at the end of the last century significantly sped up the production of text data.
Currently, millions of websites are generating an extraordinary amount of online text data every day. For example, Facebook, the world's largest social media platform, with the help of over 1 billion monthly active users, is producing billions of posted messages every day. The explosion of data makes seeking information and understanding it difficult. Text visualization techniques can help address these problems. In particular, various visualizations have been designed for showing the similarity of text documents, revealing and summarizing text content, showing sentiments and emotions derived from the text data, and helping with big text data exploration. This book provides a systematic review of existing text visualization techniques developed for these purposes. Before getting into the review details, in this chapter we introduce the background of information visualization and text visualization.

1.1 Information Visualization

In 1755, the French philosopher Denis Diderot made the following prophecy:

    As long as the centuries continue to unfold, the number of books will grow continually, and one can predict that a time will come when it will be almost as difficult to learn anything from books as from the direct study of the whole universe. It will be almost as convenient to search for some bit of truth concealed in nature as it will be to find it hidden away in an immense multitude of bound volumes.
    - Denis Diderot

About two and a half centuries later, this prophecy has come true. We are facing a situation of Information Overload, which refers to the difficulty a person may have in understanding an issue and making decisions because of the presence of too much information. However, Information Overload is caused not mainly by the growth of books but by the advent of the Internet.

C. Nan and W. Cui, Introduction to Text Visualization, Atlantis Briefs in Artificial Intelligence 1, DOI 10.2991/978-94-6239-186-4_1

Several reasons could be cited for the Internet accelerating the process of information overload. First, with the Internet, the generation, duplication, and transmission of information has never been easier. Blogging, Twitter, and Facebook give ordinary people the ability to efficiently produce information, which can be instantaneously accessed by the whole world. More and more people are considered active writers and viewers because of their participation. With the contribution of users, the volume of Internet data has become enormous. For example, 161 exabytes of information were created or replicated on the Internet in 2006, which was already more than the information generated in the past 5000 years [6]. In addition, the information on the Internet is constantly updated. For example, news websites publish new articles every few minutes, Twitter users post millions of tweets every day, and old information hardly leaves the Internet. For such a huge amount of information, analysis requires digging through historical data, which clearly complicates understanding and decision making. Furthermore, information on the Internet is usually uncontrolled, which likely causes a high noise ratio, contradictions, and inaccuracies in the available information. Bad information quality will also disorient people, thereby contributing to information overload.

Understanding patterns in a large amount of data is a difficult task. Sophisticated technologies have been explored to address this issue. The entire research field of data mining and knowledge discovery is dedicated to extracting useful information from large datasets or databases [5], for which data analysis tasks are usually performed entirely by computers.
The end users, on the other hand, are normally not involved in the analysis process and passively accept the results provided by computers.

These issues could be addressed via information visualization techniques, whose primary goal is to help users see information, explore data, understand insightful data patterns, and finally supervise the analysis procedure. Research in this field is motivated by the study of perception in psychology. Scientists have shown that our brains are capable of effectively processing huge amounts of information and signals in parallel when they are properly visually represented. By turning huge and abstract data, such as demographic data, social networks, and document corpora, into visual representations, information visualization techniques help users discover patterns buried inside the data or verify the analysis result.

Various definitions of information visualization exist [1, 3, 7] in the current literature. One of the most commonly adopted definitions is that of Card et al. [2]: the use of computer-supported, interactive visual representations of abstract data to amplify cognition. This definition highlights how visualization techniques help with data analysis, i.e., the computer roughly processes the data and displays one or more visual representations; we, the end users, perform the actual data analysis by interacting with the representations.

A good visualization design is able to convey a large amount of information with minimal cognitive effort. Considered a major advantage of visualization techniques, this feature is informally described by the old saying "a picture is worth
