Team # 30221 Page 3 of 20
2 Introduction
Network science has grown in popularity in recent years because network structures arise in many real-world settings, and analyzing them is of great help in data mining, dynamical systems, and related fields. In academia, we can build network structures for both citation and co-authoring relationships, and mining the data within these networks is of great interest and significance. Two simple questions arise: how can we measure the influence of a researcher, and how can we evaluate the importance of a research paper?
To answer these questions, network-based evaluation tools use co-author and citation data to determine the impact of researchers, publications, and journals; examples include the Science Citation Index (SCI), H-factor, Impact Factor, and Eigenfactor. Our goal is to design effective measures for analyzing influence in research networks and then to extend them to other areas of society. Specifically, we do the following in this paper:
• Establish a co-author network using the given data and plot it on a 2-D plane, then propose several measures to determine who is the most influential researcher.
• Propose models to evaluate the importance of research papers using co-author and citation data.
• Extend the models to other areas of society or to other entities in research, then test and analyze their performance.
• Draw heuristics from these models and discuss how individuals can apply them in practice.
• Perform a sensitivity analysis and discuss the strengths and weaknesses of the models, as well as potential research directions.
3 Task 1: Co-author Network of the Erdos1 Authors
First, we build the co-author network of the Erdos1 authors from the source provided by the problem. Because the pairwise co-author relationship is symmetric, it is natural to characterize it with graph theory: we establish a simple undirected graph whose vertices are researchers and whose edges are co-authoring relationships. For computational simplicity, we store this relationship in an adjacency matrix $A = (a_{ij})_{N \times N}$, where $N = 511$ is the number of researchers who have co-authored with Erdős, and the entry $a_{ij}$ is an indicator of the relationship:
\[
a_{ij} =
\begin{cases}
1, & \text{researcher } i \text{ has co-authored with researcher } j, \\
0, & \text{otherwise.}
\end{cases}
\tag{1}
\]
In particular, since this graph is undirected, $A$ is symmetric.
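As an illustration of definition (1), the following minimal sketch builds a 0/1 adjacency matrix from a list of co-author pairs. The function name `build_adjacency`, the four placeholder researcher names, and the pair list are hypothetical and are not taken from the Erdos1 data; the sketch only assumes that the co-author relationships have already been extracted as pairs of names.

```python
def build_adjacency(names, pairs):
    """Return an N x N 0/1 adjacency matrix (list of lists) for a simple undirected graph."""
    index = {name: i for i, name in enumerate(names)}
    n = len(names)
    a = [[0] * n for _ in range(n)]
    for u, v in pairs:
        i, j = index[u], index[v]
        if i == j:
            continue  # a simple graph has no self-loops
        a[i][j] = 1  # researcher i has co-authored with researcher j
        a[j][i] = 1  # undirected edge, so the matrix stays symmetric
    return a


# Hypothetical toy data: a triangle A-B-C plus one extra edge C-D.
names = ["A", "B", "C", "D"]
pairs = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
A = build_adjacency(names, pairs)
```

Writing both `a[i][j]` and `a[j][i]` for every pair is one simple way to guarantee the symmetry noted above, regardless of the order in which the two names appear in the input.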
Since the dataset is large (511 researchers and over 18,000 raw lines), we extract the data programmatically. Specifically, we wrote a simple string-matching script in Ruby that exports the corresponding adjacency matrix (excluding Erdős) for further use^1. Once we obtain the adjacency matrix $A$, some of its properties, such as the transitivity ratio^2, are listed as follows.
^1 Note that the resulting matrix is not symmetric due to some minor errors in the source data, and we force its symmetry by assigning $a_{ij} \leftarrow \max(a_{ij}, a_{ji})$.
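The symmetrization step in this footnote can be sketched as follows. This is only an illustration in Python, not the authors' Ruby code; the helper name `symmetrize` and the 3 × 3 input matrix are hypothetical.

```python
def symmetrize(a):
    """Return a new matrix with entry (i, j) equal to max(a[i][j], a[j][i])."""
    n = len(a)
    return [[max(a[i][j], a[j][i]) for j in range(n)] for i in range(n)]


# Hypothetical raw matrix in which the edge 0-1 was recorded in one direction only.
raw = [
    [0, 1, 0],
    [0, 0, 1],
    [0, 1, 0],
]
sym = symmetrize(raw)
```

Taking the maximum keeps an edge whenever either direction was recorded, so a one-sided entry in the source data adds an edge rather than dropping one.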
^2 The transitivity ratio describes how vertices tend to cluster and is defined as
\[
C \triangleq \frac{3 \times \text{number of triangles}}{\text{number of connected triples of vertices}} = \frac{\operatorname{trace}(A^{3})}{\operatorname{sum}(A^{2} - A)}
\]