DATA SCIENCE FOUNDATIONS
Geometry and Topology of Complex Hierarchic Systems and Big Data Analytics
Fionn Murtagh
Chapman & Hall/CRC
Computer Science and Data Analysis Series
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
© 2018 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business
International Standard Book Number-13: 978-1-4987-6393-6 (Hardback)
Version Date: 20170823
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the CRC Press Web site at
http://www.crcpress.com
Contents
Preface xiii
I Narratives from Film and Literature, from Social Media and Contemporary Life 1
1 The Correspondence Analysis Platform for Mapping Semantics 3
1.1 The Visualization and Verbalization of Data . . . . . . . . . . . . . . . . . 3
1.2 Analysis of Narrative from Film and Drama . . . . . . . . . . . . . . . . . 4
1.2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2.2 The Changing Nature of Movie and Drama . . . . . . . . . . . . . . 4
1.2.3 Correspondence Analysis as a Semantic Analysis Platform . . . . . . 5
1.2.4 Casablanca Narrative: Illustrative Analysis . . . . . . . . . . . . . . 5
1.2.5 Modelling Semantics via the Geometry and Topology of Information 6
1.2.6 Casablanca Narrative: Illustrative Analysis Continued . . . . . . . . 8
1.2.7 Platform for Analysis of Semantics . . . . . . . . . . . . . . . . . . . 8
1.2.8 Deeper Look at Semantics of Casablanca: Text Mining . . . . . . . . 10
1.2.9 Analysis of a Pivotal Scene . . . . . . . . . . . . . . . . . . . . . . . 10
1.3 Application of Narrative Analysis to Science and Engineering Research . . 11
1.3.1 Assessing Coverage and Completeness . . . . . . . . . . . . . . . . . 12
1.3.2 Change over Time . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.3.3 Conclusion on the Policy Case Studies . . . . . . . . . . . . . . . . . 15
1.4 Human Resources Multivariate Performance Grading . . . . . . . . . . . . 19
1.5 Data Analytics as the Narrative of the Analysis Processing . . . . . . . . . 21
1.6 Annex: The Correspondence Analysis and Hierarchical Clustering Platform 21
1.6.1 Analysis Chain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.6.2 Correspondence Analysis: Mapping χ² Distances into Euclidean Distances . . . . . . . . 22
1.6.3 Input: Cloud of Points Endowed with the Chi-Squared Metric . . . . 22
1.6.4 Output: Cloud of Points Endowed with the Euclidean Metric in Factor Space . . . . . . 23
1.6.5 Supplementary Elements: Information Space Fusion . . . . . . . . . 23
1.6.6 Hierarchical Clustering: Sequence-Constrained . . . . . . . . . . . . 24
2 Analysis and Synthesis of Narrative: Semantics of Interactivity 25
2.1 Impact and Effect in Narrative: A Shock Occurrence in Social Media . . . 25
2.1.1 Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.1.2 Two Critical Tweets in Terms of Their Words . . . . . . . . . . . . . 26
2.1.3 Two Critical Tweets in Terms of Twitter Sub-narratives . . . . . . . 26
2.2 Analysis and Synthesis, Episodization and Narrativization . . . . . . . . . 32
2.3 Storytelling as Narrative Synthesis and Generation . . . . . . . . . . . . . 33
2.4 Machine Learning and Data Mining in Film Script Analysis . . . . . . . . . 35
2.5 Style Analytics: Statistical Significance of Style Features . . . . . . . . . . 36
2.6 Typicality and Atypicality for Narrative Summarization and Transcoding . 37
2.7 Integration and Assembling of Narrative . . . . . . . . . . . . . . . . . . . 40
II Foundations of Analytics through the Geometry and Topology of Complex Systems 43
3 Symmetry in Data Mining and Analysis through Hierarchy 45
3.1 Analytics as the Discovery of Hierarchical Symmetries in Data . . . . . . . 45
3.2 Introduction to Hierarchical Clustering, p-Adic and m-Adic Numbers . . . 45
3.2.1 Structure in Observed or Measured Data . . . . . . . . . . . . . . . 46
3.2.2 Brief Look Again at Hierarchical Clustering . . . . . . . . . . . . . . 46
3.2.3 Brief Introduction to p-Adic Numbers . . . . . . . . . . . . . . . . . 47
3.2.4 Brief Discussion of p-Adic and m-Adic Numbers . . . . . . . . . . . 47
3.3 Ultrametric Topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.3.1 Ultrametric Space for Representing Hierarchy . . . . . . . . . . . . . 48
3.3.2 Geometrical Properties of Ultrametric Spaces . . . . . . . . . . . . . 48
3.3.3 Ultrametric Matrices and Their Properties . . . . . . . . . . . . . . 48
3.3.4 Clustering through Matrix Row and Column Permutation . . . . . . 50
3.3.5 Other Data Symmetries . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.4 Generalized Ultrametric and Formal Concept Analysis . . . . . . . . . . . . 52
3.4.1 Link with Formal Concept Analysis . . . . . . . . . . . . . . . . . . 52
3.4.2 Applications of Generalized Ultrametrics . . . . . . . . . . . . . . . . 54
3.5 Hierarchy in a p-Adic Number System . . . . . . . . . . . . . . . . . . . . . 54
3.5.1 p-Adic Encoding of a Dendrogram . . . . . . . . . . . . . . . . . . . 54
3.5.2 p-Adic Distance on a Dendrogram . . . . . . . . . . . . . . . . . . . 57
3.5.3 Scale-Related Symmetry . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.6 Tree Symmetries through the Wreath Product Group . . . . . . . . . . . . 58
3.6.1 Wreath Product Group for Hierarchical Clustering . . . . . . . . . . 58
3.6.2 Wreath Product Invariance . . . . . . . . . . . . . . . . . . . . . . . 59
3.6.3 Wreath Product Invariance: Haar Wavelet Transform of Dendrogram 60
3.7 Tree and Data Stream Symmetries from Permutation Groups . . . . . . . . 62
3.7.1 Permutation Representation of a Data Stream . . . . . . . . . . . . 62
3.7.2 Permutation Representation of a Hierarchy . . . . . . . . . . . . . . 63
3.8 Remarkable Symmetries in Very High-Dimensional Spaces . . . . . . . . . 64
3.9 Short Commentary on This Chapter . . . . . . . . . . . . . . . . . . . . . . 65
4 Geometry and Topology of Data Analysis: in p-Adic Terms 69
4.1 Numbers and Their Representations . . . . . . . . . . . . . . . . . . . . . . 69
4.1.1 Series Representations of Numbers . . . . . . . . . . . . . . . . . . . 69
4.1.2 Field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
4.2 p-Adic Valuation, p-Adic Absolute Value, p-Adic Norm . . . . . . . . . . . 71
4.3 p-Adic Numbers as Series Expansions . . . . . . . . . . . . . . . . . . . . . 72
4.4 Canonical p-Adic Expansion; p-Adic Integer or Unit Ball . . . . . . . . . . 73
4.5 Non-Archimedean Norms as p-Adic Integer Norms in the Unit Ball . . . . 74
4.5.1 Archimedean and Non-Archimedean Absolute Value Properties . . . 74
4.5.2 A Non-Archimedean Absolute Value, or Norm, is Less Than or Equal to One, and an Archimedean Absolute Value, or Norm, is Unbounded . . . 74
4.6 Going Further: Negative p-Adic Numbers, and p-Adic Fractions . . . . . . 75
4.7 Number Systems in the Physical and Natural Sciences . . . . . . . . . . . . 76
4.8 p-Adic Numbers in Computational Biology and Computer Hardware . . . . 77
4.9 Measurement Requires a Norm, Implying Distance and Topology . . . . . . 78
4.10 Ultrametric Topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
4.11 Short Review of p-Adic Cosmology . . . . . . . . . . . . . . . . . . . . . . . 80
4.12 Unbounded Increase in Mass or Other Measured Quantity . . . . . . . . . 81
4.13 Scale-Free Partial Order or Hierarchical Systems . . . . . . . . . . . . . . . 81
4.14 p-Adic Indexing of the Sphere . . . . . . . . . . . . . . . . . . . . . . . . . 83
4.15 Diffusion and Other Dynamic Processes in Ultrametric Spaces . . . . . . . 83
III New Challenges and New Solutions for Information Search and Discovery 85
5 Fast, Linear Time, m-Adic Hierarchical Clustering 87
5.1 Pervasive Ultrametricity: Computational Consequences . . . . . . . . . . . 87
5.1.1 Ultrametrics in Data Analytics . . . . . . . . . . . . . . . . . . . . . 87
5.1.2 Quantifying Ultrametricity . . . . . . . . . . . . . . . . . . . . . . . 88
5.1.3 Pervasive Ultrametricity . . . . . . . . . . . . . . . . . . . . . . . . . 88
5.1.4 Computational Implications . . . . . . . . . . . . . . . . . . . . . . . 89
5.2 Applications in Search and Discovery using the Baire Metric . . . . . . . . 89
5.2.1 Baire Metric . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
5.2.2 Large Numbers of Observables . . . . . . . . . . . . . . . . . . . . . 89
5.2.3 High-Dimensional Data . . . . . . . . . . . . . . . . . . . . . . . . . 90
5.2.4 First Approach Based on Reduced Precision of Measurement . . . . 90
5.2.5 Random Projections in High-Dimensional Spaces, Followed by the Baire Distance . . . . 91
5.2.6 Summary Comments on Search and Discovery . . . . . . . . . . . . 91
5.3 m-Adic Hierarchy and Construction . . . . . . . . . . . . . . . . . . . . . . 91
5.4 The Baire Metric, the Baire Ultrametric . . . . . . . . . . . . . . . . . . . 92
5.4.1 Metric and Ultrametric Spaces . . . . . . . . . . . . . . . . . . . . . 92
5.4.2 Ultrametric Baire Space and Distance . . . . . . . . . . . . . . . . . 93
5.5 Multidimensional Use of the Baire Metric through Random Projections . . 94
5.6 Hierarchical Tree Defined from m-Adic Encoding . . . . . . . . . . . . . . . 95
5.7 Longest Common Prefix and Hashing . . . . . . . . . . . . . . . . . . . . . 96
5.7.1 From Random Projection to Hashing . . . . . . . . . . . . . . . . . . 96
5.8 Enhancing Ultrametricity through Precision of Measurement . . . . . . . . 97
5.8.1 Quantifying Ultrametricity . . . . . . . . . . . . . . . . . . . . . . . 97
5.8.2 Pervasiveness of Ultrametricity . . . . . . . . . . . . . . . . . . . . . 98
5.9 Generalized Ultrametric and Formal Concept Analysis . . . . . . . . . . . . 99
5.9.1 Generalized Ultrametric . . . . . . . . . . . . . . . . . . . . . . . . . 99
5.9.2 Formal Concept Analysis . . . . . . . . . . . . . . . . . . . . . . . . 99
5.10 Linear Time and Direct Reading Hierarchical Clustering . . . . . . . . . . 100
5.10.1 Linear Time, or O(N) Computational Complexity, Hierarchical Clustering . . . . . . 100
5.10.2 Grid-Based Clustering Algorithms . . . . . . . . . . . . . . . . . . . 100
5.11 Summary: Many Viewpoints, Various Implementations . . . . . . . . . . . 101