However, this method inversely loses some detailed information of original data, which will lead to an inaccurate result
of query in respect of utilization.
Anonymizing based on bigraph is to partition the vertex set into
and
, and make sure that
. In other words, the original graph should be changed into a bigraphbefore anonymizing
process. However, this method can only be applied in some special graphs. If the modification during the above changing
process is too much, it will greatly reduce the utility of original graph. Thus, this method cannot be applied into all kinds
of social network data sets.
Briefly, existing anonymizations usually have the shortcomings of poor anti-attack capability and low utility, which
mainly contains the unavailability and high extraction cost of the part of original data without privacy information. Thus,
in this paper, we propose a novel anonymization, Splitting Anonymization, which is different of the above four existing
methods. Our new method takes the advantages of the above four methods but makes some improvement. Firstly, to the
information attached in the vertices in original graph, it conducts some process of trivialization (removing identity
information), classification and integration, which ensure a high level protection to passive attack. Then, based on the
splitting process to the vertices in original graph, it makes the information on vertices copy scrambled and the
information on edges splitting stored in original graph through a serious of conversion. This phase ensures the original
graph and its substructure cannot be traced in anonymous graph, which provides a high level protection to active attack.
Noted that our method hid all privacy information, and can return a result with high accuracy for utility query.
To summarize, our contributions are listed as follows.
(1) We present a novel anonymization apporach.
(2) We propose the concept of “Knowledge Information”, and apply it into our anonymzing model, which reduce the
constraint to unnecessary protected information and anonymization. Moreover, we propose the new concepts of
“predictable error” and “approximate constraint error”, which provides a more accurate query result.
(3) The splitting anonymization can provide reliable privacy protection to all the known passive attacks and active
attacks, and enhance the speed and accuracy of utility query.
(4) Strict theoretical analysis proves that our splitting anonymization can satisfy the requirement of privacy
protection and has a high utility. Large amount of experiment also verifies our design.
The rest of this paper is organized as follows. In Section 2, we introduce some basic concepts and give out the
problem definition. In Section 3, we describe our principles and phases of splitting anonymizing in detail. In Section 4,
we discuss the techniques of dynamic anonymzation with the update of social network. The theoretical analysis for the
protection effect of splitting anonymizing when facing different kinds of attacks is elaborated in Section 5, followed by
the utility analysis in Section 6. Section 7 and 8 show the experiment results and the related work, respectively. Finally
we conclude our paper in Section 9.
2 Problem Definition
In this section, we first give out some basic definitions on privacy, and then propose our problem.
Definition 2.1. Social Network. Data can be presented by a graph
.
is the vertex set
presenting all the vertices in the graph corresponding to participators in the social network.
presents all the edges in the graph, which means the relationship
between the participators.
is the set of vertex labels, and
presents the label
attached to vertex
for . We use
to present the value of item of label
.
Definition 2.2. The third party that tries to access the sensitive data without authority is called an Attacker. The method
that an attacker only exploits the information and leak in anonymizing data to get the sensitive data is
called Passive Attack. The method that an attacker needs to use other information to get the sensitive data
except for the anonymizing data is called Active Attack.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
评论0