ICML Contribution Paper 05 from Ant Financial AI Department Researchers (PDF)

As machine learning keeps heating up and its "Chinese contingent" grows stronger, Chinese organizations are ranking ever higher at the major top-tier conferences, with a real shot at the top spot. At this year's ICML, for example, Tsinghua University had 12 papers accepted, and the number of papers with ethnically Chinese authors is also striking: Le Song, tenured associate professor at Georgia Tech and associate director of its Center for Machine Learning, is a co-author of 8 of them. Professor Song's other role is researcher in Ant Financial's AI department. Ant Financial has thus become one of the representatives of this "Chinese force" at ICML, contributing 8 papers to the conference. Six of them are substantial oral papers that took center stage in the conference sessions and were discussed enthusiastically by the attending experts. Nearly every one of these papers lists world-class researchers among its authors. For example, the artificial…
Adversarial Attack on Graph Structured Data

The vanilla GNN model runs the above iteration until convergence. But recently, people have found that a fixed number of propagation steps T, with various different parameterizations (Li et al., 2015; Dai et al., 2016; Gilmer et al., 2017; Lei et al., 2017), works quite well in a variety of applications.

3. Graph adversarial attack

Given a learned classifier f and an instance from the dataset (G, c, y) ∈ D, the graph adversarial attacker g(·,·): G × D → G asks to modify the graph G = (V, E) into G̃ = (Ṽ, Ẽ), such that

    \max_{\tilde{G}}\ I(f(\tilde{G}, c) \neq y)
    \text{s.t.}\ \tilde{G} = g(f, (G, c, y)), \quad I(G, \tilde{G}, c) = 1.        (4)

Here I(·,·,·): G × G × V → {0, 1} is an equivalency indicator that tells whether two graphs G and G̃ are equivalent under the classification semantics.

In this paper we focus on modifications to the discrete structure. The attacker g is allowed to add or delete edges from G to construct the new graph. This type of action is rich enough, since adding or deleting nodes can be performed by a series of modifications to the edges. Modifying edges is also harder than modifying nodes, since naively choosing an edge requires O(|V|^2) complexity, while choosing a node only requires O(|V|) complexity.

Since the attacker aims at fooling the classifier f rather than actually changing the true label of the instance, the equivalency indicator must first be defined to restrict the modifications an attacker can perform. We use two ways to define the equivalency indicator:

1) Explicit semantics. In this case, a gold-standard classifier f* is assumed to be accessible, and the equivalency indicator is defined as

    I(G, \tilde{G}, c) = I\big(f^*(G, c) = f^*(\tilde{G}, c)\big),        (5)

where I(·): {true, false} → {0, 1} is an indicator function.

2) Small modifications. In many cases where explicit semantics are unknown, we ask the attacker to make as few modifications as possible, within a neighborhood graph:

    I(G, \tilde{G}, c) = I\big(|(E \setminus \tilde{E}) \cup (\tilde{E} \setminus E)| < m\big) \cdot I\big(\tilde{E} \subseteq N(G, b)\big).        (6)

In the above equation, m is the number of edges that are allowed to be modified, and N(G, b) = {(u, v): u, v ∈ V, d^(G)(u, v) ≤ b} defines the b-hop neighborhood graph, where d^(G)(u, v) ∈ {1, 2, ...} is the distance between two nodes in graph G.

Take friendship networks as an example: a suspicious behavior would be adding or deleting many friends in a short period, or creating a friendship with someone who doesn't share any common friend. The "small modification" constraint rules out both of these possibilities, so as to regulate the behavior of g. With either of the two realizations of the equivalency indicator, it is easy to enforce it against the attacker: each time an invalid modification is proposed, the classifier can simply ignore the move.
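To make the "small modifications" constraint of Eq (6) concrete, the following is a minimal sketch of how a candidate modification could be checked, assuming undirected networkx graphs; the function name `within_budget` and its arguments are illustrative, not code from the paper (the target node c does not enter this particular indicator).

```python
import networkx as nx

def within_budget(G, G_mod, m, b):
    """Check the 'small modifications' equivalency indicator of Eq. (6).

    G     : original undirected graph (networkx.Graph)
    G_mod : candidate modified graph over the same node set
    m     : edge-modification budget (insertions + deletions must be < m)
    b     : hop budget defining the neighborhood graph N(G, b)
    """
    E = set(map(frozenset, G.edges()))
    E_mod = set(map(frozenset, G_mod.edges()))
    # |(E - E') ∪ (E' - E)| < m : the symmetric difference must stay small.
    if len(E ^ E_mod) >= m:
        return False
    # E' ⊆ N(G, b): every edge of the modified graph must connect nodes that
    # were within b hops of each other in the ORIGINAL graph.
    dist = dict(nx.all_pairs_shortest_path_length(G, cutoff=b))
    for edge in E_mod:
        u, v = tuple(edge)
        if v not in dist.get(u, {}):
            return False
    return True
```

An attack environment can call such a check on every proposed move and simply reject invalid ones, which is exactly the enforcement described above.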
3.1. Attacking as hierarchical reinforcement learning

Given an instance (G, c, y) and a target classifier f, we model the attack procedure as a finite-horizon Markov Decision Process M^(m)(f, G, c, y). The definition of this MDP is as follows:

• Action: as mentioned in Sec 3, the attacker is allowed to add or delete edges in the graph, so a single action at time step t is a_t ∈ A ⊆ V × V. However, simply performing actions in the O(|V|^2) space is too expensive; we will shortly show how to use a hierarchical action to decompose this action space.

• State: the state s_t at time t is represented by the tuple (Ĝ_t, c), where Ĝ_t is a partially modified graph with some of the edges added to / deleted from G.

• Reward: the purpose of the attacker is to fool the target classifier, so a non-zero reward is only received at the end of the MDP:

    r(\tilde{G}, c) = 1 \ \text{if}\ f(\tilde{G}, c) \neq y, \qquad r(\tilde{G}, c) = -1 \ \text{if}\ f(\tilde{G}, c) = y.        (7)

In the intermediate steps of modification no reward is received, i.e., r(s_t, a_t) = 0 for t = 1, 2, ..., m−1. In the PBA-C setting, where the prediction confidence of the target classifier is accessible, we can also use r(G̃, c) = L(f(G̃, c), y) as the reward.

• Terminal: once the agent has modified m edges, the process stops. For simplicity, we focus on an MDP with fixed length; in the case where fewer modifications are enough, we can simply let the agent modify dummy edges.

Given the above settings, a sample trajectory from this MDP is (s_1, a_1, r_1, ..., s_m, a_m, r_m, s_{m+1}), where s_1 = (G, c), s_t = (Ĝ_t, c) for t ∈ {2, ..., m}, and s_{m+1} = (G̃, c). The last step receives reward r_m = r(s_m, a_m) = r((G̃, c)), and all other intermediate rewards are zero: r_t = 0, ∀t ∈ {1, 2, ..., m−1}.

Since this is a discrete optimization problem with a finite horizon, we use Q-learning to learn the MDPs. In our preliminary experiments we also tried policy optimization methods like Advantage Actor Critic, but found Q-learning works more stably, so below we focus on the modeling with Q-learning.

Q-learning is an off-policy optimization that fits the Bellman optimality equation directly:

    Q^*(s_t, a_t) = r(s_t, a_t) + \max_{a'} Q^*(s_{t+1}, a').        (8)

This implicitly suggests a greedy policy:

    \pi(a_t \mid s_t; Q^*) = \mathrm{argmax}_{a_t} Q^*(s_t, a_t).        (9)

In our finite-horizon case, the discount factor γ is fixed to 1. Note that directly operating the actions in the O(|V|^2) space is too expensive for large graphs. Thus we propose to decompose the action a_t ∈ V × V into a_t = (a_t^{(1)}, a_t^{(2)}), where a_t^{(1)}, a_t^{(2)} ∈ V. A single edge action a_t is thus decomposed into the two ends of this edge, and the hierarchical Q-function is modeled as:

    Q_1^*(s_t, a_t^{(1)}) = \max_{a_t^{(2)}} Q_2^*(s_t, a_t^{(1)}, a_t^{(2)}),
    Q_2^*(s_t, a_t^{(1)}, a_t^{(2)}) = r\big(s_t, a_t = (a_t^{(1)}, a_t^{(2)})\big) + \max_{a_{t+1}^{(1)}} Q_1^*(s_{t+1}, a_{t+1}^{(1)}).        (10)

In the above formulation, Q_1^* and Q_2^* are two functions that jointly implement the original Q^*. An action is considered complete only when the pair (a_t^{(1)}, a_t^{(2)}) has been chosen, so the reward only becomes valid after a_t^{(2)} is made. It is easy to see that this decomposition has the same optimality structure as Eq (8), while making an action only requires O(2 × |V|) = O(|V|) complexity. Figure 1 illustrates this process.

Taking a further look at Eq (10): since only the reward at the last time step is non-zero, and the budget of modifications m is given, we can explicitly unroll the Bellman equations as

    Q_{1,1}^*(s_1, a_1^{(1)}) = \max_{a_1^{(2)}} Q_{2,1}^*(s_1, a_1^{(1)}, a_1^{(2)}),
    Q_{2,1}^*(s_1, a_1^{(1)}, a_1^{(2)}) = \max_{a_2^{(1)}} Q_{1,2}^*(s_2, a_2^{(1)}),
    \ldots
    Q_{1,m}^*(s_m, a_m^{(1)}) = \max_{a_m^{(2)}} Q_{2,m}^*(s_m, a_m^{(1)}, a_m^{(2)}),
    Q_{2,m}^*(s_m, a_m^{(1)}, a_m^{(2)}) = r\big(s_m, a_m = (a_m^{(1)}, a_m^{(2)})\big).        (11)

To make the notation compact, we still use Q^* = {Q_{1,t}^*, Q_{2,t}^*}_{t=1}^{m} to denote the Q-function. Since each sample in the dataset defines an MDP, it is possible to learn a separate Q-function for each MDP M^(m)(f, G_i, c_i, y_i), i = 1, ..., N. However, we here focus on a more practical and challenging setting, where only one Q^* is learned. The learned Q-function is thus asked to generalize or transfer over all the MDPs:

    \max_{\theta} \sum_{i=1}^{N} r(\tilde{G}_i, c_i), \qquad a_t = \mathrm{argmax}_{a_t} Q^*(s_t, a_t; \theta),        (12)

where Q^* is parameterized by θ. Below we present a parameterization of Q^* that generalizes over MDPs.

3.1.1. Parameterization of Q^*

From the above, the most flexible parameterization would implement 2 × m distinct Q-functions. However, we found two distinct parameterizations are typically enough, i.e., Q_{1,t}^* = Q_1^*, Q_{2,t}^* = Q_2^*, ∀t.

Since the Q-function scores the nodes in the state graph, it is natural to use GNN-family models for its parameterization, in order to learn a generalizable attacker. Specifically, Q_1^* is parameterized as

    Q_1^*(s_t, a_t^{(1)}) = W_1^{(1)} \sigma\big( W_2^{(1)\top} [\mu_{a_t^{(1)}}, \mu(s_t)] \big),        (13)

where μ_{a_t^{(1)}} is the embedding of node a_t^{(1)} in graph Ĝ_t, obtained by structure2vec (S2V) (Dai et al., 2016):

    \mu_u^{(k)} = \mathrm{relu}\big( W_3^{(1)} x_u + W_4^{(1)} \sum_{u' \in N(u)} \mu_{u'}^{(k-1)} \big), \qquad k \in \{1, 2, \ldots, K\},        (14)

where N(u) is the neighborhood of node u and μ_u^{(0)} = 0. Also, μ(s_t) = μ((Ĝ_t = (V̂_t, Ê_t), c)) is the representation of the entire state tuple:

    \mu(s_t) = \sum_{v \in \hat{V}_t} \mu_v \quad \text{(graph attack)}; \qquad \mu(s_t) = \sum_{v \in N_{\hat{G}_t}(c, b)} \mu_v \quad \text{(node attack)}.        (15)

In the node attack scenario, the state embedding is taken from the b-hop neighborhood of node c, denoted N_{Ĝ}(c, b). The parameter set of Q_1^* is θ_1; Q_2^* is parameterized similarly with parameters θ_2, with the extra consideration of the first-level node a_t^{(1)} that has already been chosen. We denote this method as RL-S2V, since it learns a Q-function parameterized by S2V to perform the attack.
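To illustrate the hierarchical decomposition, here is a minimal sketch of the two-level greedy action selection implied by Eq (10): first score every node as the first endpoint, then, given it, score every node as the second endpoint. The scoring functions `q1_head` and `q2_head` are hypothetical stand-ins for the S2V-parameterized Q_1^* and Q_2^* of Eq (13); this is an illustration of the decomposition, not the authors' implementation.

```python
import numpy as np

def greedy_edge_action(node_emb, state_emb, q1_head, q2_head):
    """Pick one edge action a_t = (a1, a2) with two O(|V|) argmax passes.

    node_emb  : array [num_nodes, d] of per-node embeddings mu_v
                (e.g., produced by an S2V/GNN encoder of the current graph)
    state_emb : array [d], the state embedding mu(s_t)
    q1_head   : callable(vector) -> scalar, stands in for Q_1*([mu_v, mu_s])
    q2_head   : callable(vector) -> scalar, stands in for Q_2*([mu_a1, mu_v, mu_s])
    """
    # First level: score every node as the first endpoint of the edge.
    s1 = np.array([q1_head(np.concatenate([mu, state_emb])) for mu in node_emb])
    a1 = int(np.argmax(s1))
    # Second level: given a1, score every node as the second endpoint.
    s2 = np.array([q2_head(np.concatenate([node_emb[a1], mu, state_emb]))
                   for mu in node_emb])
    a2 = int(np.argmax(s2))
    return a1, a2  # adding/deleting edge (a1, a2) is the attacker's move
```

With a learned encoder and the two heads, running this selection m times, re-encoding the modified graph after each move, produces the attacker's m edge modifications at O(|V|) cost per move instead of O(|V|^2).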
3.2. Other attacking methods

RL-S2V is suitable for black-box attack and transfer. However, for different attack scenarios, other algorithms might be preferred. We first introduce RandSampling, which requires the least information, in Sec 3.2.1; then a white-box attack, GradArgmax, is proposed in Sec 3.2.2; finally GeneticAlg, a kind of evolutionary computing, is proposed in Sec 3.2.3.

3.2.1. Random sampling

This is the simplest attack method: it randomly adds or deletes edges from graph G. When an edge modification action a_t = (u, v) is sampled, we only accept it if it satisfies the semantic constraint I(·,·,·). It requires the least information for the attack. Despite its simplicity, it can sometimes achieve a good attack rate.

3.2.2. Gradient-based white-box attack

Gradients have been successfully used for modifying continuous inputs, e.g., images. However, taking the gradient with respect to a discrete structure is non-trivial. Recalling the general iterative embedding process defined in Eq (3), we associate a coefficient α_{u,v} with each pair (u, v) ∈ V × V, so that each message passed from a neighbor is weighted by its coefficient:

    \mu_u^{(k)} = h^{(k)}\big( \{ \alpha_{u',u} \cdot [\, w(u', u),\ \mu_{u'}^{(k-1)} \,] \}_{u' \in N(u)},\ x_u,\ \mu_u^{(k-1)} \big), \qquad k \in \{1, 2, \ldots, K\}.        (17)

Let α_{u,v} = I(u ∈ N(v)); that is, α itself is the binary adjacency matrix. It is easy to see that the above formulation has the same effect as Eq (3). However, these additional coefficients give us gradient information with respect to each edge (either existing or non-existing):

    \frac{\partial L(f(G, c), y)}{\partial \alpha_{u,v}}.        (18)

In order to attack the model, we could perform gradient ascent, i.e., α_{u,v} ← α_{u,v} + η · ∂L/∂α_{u,v}. However, the attack is on a discrete structure, where only m edges are allowed to be added or deleted, so we need to solve a combinatorial optimization problem:

    \max_{\{(u_t, v_t)\}_{t=1}^{m}}\ L(f(\tilde{G}, c), y) \quad \text{s.t.}\ \tilde{G} = \text{Modify}(G, \{(u_t, v_t)\}_{t=1}^{m}).        (19)

We simply use a greedy algorithm to solve this optimization: the modification of G, given the coefficients, is performed by sequentially modifying the edges (u_t, v_t) with the largest gradient magnitudes,

    G_{t+1} = (V_t,\ E_t \setminus \{(u_t, v_t)\}) \ \ \text{if}\ \partial L / \partial \alpha_{u_t, v_t} < 0, \qquad G_{t+1} = (V_t,\ E_t \cup \{(u_t, v_t)\}) \ \ \text{if}\ \partial L / \partial \alpha_{u_t, v_t} > 0.        (20)

That is, we modify the edges that are most likely to change the objective; depending on the sign of the gradient, we either add or delete the edge. We name this method GradArgmax, since it performs greedy selection based on gradient information. The attack procedure is shown in Figure 2 (illustration of the graph structure gradient attack: this white-box attack adds/deletes the edges with maximum gradient magnitude with respect to α). Since this approach requires gradient information, we consider it a white-box attack method. Also, because the gradient considers all pairs of nodes in a graph, the computation cost is at least O(|V|^2), excluding the back-propagation of gradients in Eq (18). Without further approximation, this approach cannot scale to large graphs.
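The greedy selection of Eq (20) can be sketched as below. It assumes a differentiable surrogate `loss_fn(alpha)` that runs the target GNN with messages weighted by the coefficient matrix α of Eq (17) and returns the loss, so PyTorch autograd can supply ∂L/∂α; function and argument names are hypothetical, and this is an illustration rather than the paper's code.

```python
import torch

def grad_argmax_attack(adj, loss_fn, m):
    """Greedy GradArgmax-style edge flipping (illustrative sketch).

    adj     : dense {0,1} adjacency matrix, shape [n, n] (torch.Tensor)
    loss_fn : callable(alpha) -> scalar loss L(f(G, c), y) when messages are
              weighted by alpha as in Eq. (17)
    m       : number of edge modifications allowed
    """
    alpha = adj.clone().float().requires_grad_(True)
    grad, = torch.autograd.grad(loss_fn(alpha), alpha)   # dL/d(alpha), Eq. (18)
    new_adj = adj.clone()
    order = torch.argsort(grad.abs().flatten(), descending=True)
    budget = m
    # Note: for an undirected graph (u, v) and (v, u) both appear here;
    # a real implementation would deduplicate the pairs.
    for idx in order:
        if budget == 0:
            break
        u, v = divmod(int(idx), adj.shape[1])
        if u == v:
            continue  # skip self-loops
        g = grad[u, v]
        if g > 0 and new_adj[u, v] == 0:     # positive gradient: adding raises the loss
            new_adj[u, v] = new_adj[v, u] = 1
            budget -= 1
        elif g < 0 and new_adj[u, v] == 1:   # negative gradient: deleting raises the loss
            new_adj[u, v] = new_adj[v, u] = 0
            budget -= 1
    return new_adj
```

The sign rule in the loop mirrors Eq (20): edges whose coefficients have the largest-magnitude gradients are flipped first, added when the gradient is positive and deleted when it is negative.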
3.2.3. Genetic algorithm

Evolutionary computing has been successfully applied in many zero-order optimization scenarios, including neural architecture search (Real et al., 2017; Miikkulainen et al., 2017) and adversarial attacks on images (Su et al., 2017). Here we propose a black-box attack method that implements a type of genetic algorithm. Given an instance (G, c, y) and the target classifier f, the algorithm involves five major components, elaborated below.

• Population: the population refers to a set of candidate solutions, denoted P^(r) = {G̃_j^(r)}, where each G̃_j^(r) is a valid modification solution to the original graph G. Here r = 1, 2, ..., R is the index of the generation, and R is the maximum number of evolutions allowed.

• Fitness: each candidate solution in the current population gets a score that measures its quality. We use the loss function of the target model, L(f(G̃, c), y), as the score function; a good attack solution should increase this loss. Since the fitness is a continuous score, it is not applicable in the PBA-D setting, where only the classification label is accessible.

• Selection: given the fitness scores of the current population, we can either do weighted sampling or greedy selection to select the "breeding" population P_b^(r) for the next generation.

• Crossover: after the selection of P_b^(r), we randomly pick two candidates G̃_1, G̃_2 ∈ P_b^(r) and do the crossover by mixing the edges from the two candidates:

    \tilde{G}' = \big( V,\ (E_1 \cap E_2) \cup \mathrm{rp}(E_1 \setminus E_2) \cup \mathrm{rp}(E_2 \setminus E_1) \big),        (21)

where rp(·) means randomly picking a subset.

• Mutation: the mutation process is also biology-inspired. For a candidate solution G̃ ∈ P^(r), suppose the modified edges are δE = {(u_t, v_t)}_{t=1}^{m}. Then each modified edge (u_t, v_t) is, with a certain probability, mutated by re-sampling one of its two endpoints.

The population size |P^(r)|, the probability of crossover used in rp(·), the mutation probability, and the number of evolutions R are all hyper-parameters that can be tuned.

Due to the limitation of the fitness function, this method can only be used in the PBA-C setting. Also, since we need to execute the target model f to get fitness scores, the computation cost of such a genetic algorithm is O(|V| + |E|) per proposed attack, which is mainly made up of the computation cost of the GNN. The overall procedure is illustrated in Figure 3 (illustration of the attack using a genetic algorithm: the population evolves with selection, crossover and mutation operations, and fitness is measured by the loss function). We simply name it GeneticAlg, since it is an instantiation of the general genetic algorithm framework.
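A compact sketch of one generation of such a genetic attack is shown below. `fitness` is assumed to call the target model and return its loss (the PBA-C setting), `crossover` mixes two candidates as in Eq (21), and `mutate` re-samples an endpoint of a modified edge; all names are illustrative, not the authors' implementation.

```python
import random

def evolve_one_generation(population, fitness, crossover, mutate,
                          breed_size, mutation_prob):
    """One generation of a GeneticAlg-style attack (sketch).

    population : list of candidate edge-modification solutions for graph G
    fitness    : callable(candidate) -> target-model loss (higher = better attack)
    crossover  : callable(parent1, parent2) -> child candidate (Eq. (21)-style mix)
    mutate     : callable(candidate) -> candidate with one endpoint re-sampled
    """
    # Selection: keep the top-scoring candidates as the breeding pool.
    scored = sorted(population, key=fitness, reverse=True)
    breeding = scored[:breed_size]
    # Crossover + mutation to refill the population to its original size.
    next_gen = list(breeding)
    while len(next_gen) < len(population):
        p1, p2 = random.sample(breeding, 2)
        child = crossover(p1, p2)
        if random.random() < mutation_prob:
            child = mutate(child)
        next_gen.append(child)
    return next_gen
```

Running this loop for R generations and returning the fittest surviving candidate corresponds to the procedure sketched in Figure 3.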
Table 1 summarizes the application scenarios of the proposed attack methods; cost is measured by the time complexity of proposing a single attack.

Table 1. Application scenarios for the proposed graph attack methods.

    Method         Setting required                   Cost per proposed attack
    RandSampling   RBA (least information needed)     O(1)
    GradArgmax     WBA (gradients required)           O(|V|^2)
    GeneticAlg     PBA-C (prediction confidence)      O(|V| + |E|)
    RL-S2V         PBA-D (prediction labels)          O(|V| + |E|)

4. Experiment

We use the proposed attack methods to attack a graph classification model in Sec 4.1 and a node classification model in Sec 4.2. In each scenario, we first show the attack rate against the target model when queries are allowed, and then show the generalization ability of RL-S2V in the RBA setting. For GeneticAlg, we set the population size P = 100 and the number of rounds R = 10, and tune the crossover rate and mutation rate in {0.1, ..., 0.5}. For RL-S2V, we tune the number of propagation steps of its S2V model K ∈ {1, ..., 5}. There is no parameter tuning for GradArgmax and RandSampling.

4.1. Graph-level attack

In this set of experiments, we use synthetic data where the gold classifier f* is known; thus the explicit semantics is used for the equivalency indicator I. The dataset D we constructed contains 15,000 graphs, generated with the Erdős–Rényi random graph model. It is a three-class graph classification task, where each class contains 5,000 graphs, and the classifier is asked to tell how many connected components there are in the corresponding undirected graph G. The label set is Y = {1, 2, 3}, so there can be up to 3 components in a graph; see Figure 4 for an illustration (example graphs with 1, 2, or 3 components, each with 40-50 nodes). The gold classifier f* is obtained by performing a one-time traversal of the entire graph. The dataset is divided into a training set and two test sets: test set I contains 1,500 graphs, while test set II contains 150 graphs; each set contains the same number of instances from every class.

We choose structure2vec as the target model for attack, and also tune its number of propagation steps K ∈ {2, ..., 5}.
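As a concrete illustration of this setup, the sketch below generates component-count labels the way the gold classifier would, using networkx; the component sizes and the Erdős–Rényi edge probability are placeholders, not the exact values used in the paper.

```python
import random
import networkx as nx

def connected_er_graph(n, p=0.3):
    """Sample an Erdos-Renyi graph, retrying until it is connected."""
    while True:
        g = nx.erdos_renyi_graph(n, p)
        if nx.is_connected(g):
            return g

def make_component_graph(num_components, nodes_per_component=15, p=0.3):
    """Synthetic graph whose label is its number of connected components."""
    G = nx.Graph()
    for i in range(num_components):
        block = connected_er_graph(nodes_per_component, p)
        offset = i * nodes_per_component            # relabel so blocks are disjoint
        G.add_nodes_from(range(offset, offset + nodes_per_component))
        G.add_edges_from((u + offset, v + offset) for u, v in block.edges())
    return G

def gold_label(G):
    """Gold classifier f*: a single traversal counts the connected components."""
    return nx.number_connected_components(G)

# Tiny sanity check: the gold label matches the construction.
k = random.choice([1, 2, 3])
assert gold_label(make_component_graph(k)) == k
```

Because the gold label can always be recomputed on a modified graph, the explicit-semantics indicator of Eq (5) can reject any attack that actually changes the number of components.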
Table 2 shows the results under different settings.

Table 2. Attack results on the graph classification task. We report the 3-class classification accuracy of the target model on the vanilla test sets I and II, as well as on the generated adversarial samples. The upper half reports attack results on test set I with different levels of access to the target classifier; the lower half reports the RBA setting on test set II, where only RandSampling and RL-S2V can be used. K is the number of propagation steps used in the GNN-family models (see Eq (3)).

Attack on test set I:

    15-20 nodes            K=2      K=3      K=4      K=5
    (unattacked)           93.20%   98.20%   98.87%   99.07%
    RBA   RandSampling     78.73%   92.27%   95.13%   97.67%
    WBA   GradArgmax       69.47%   64.60%   95.80%   97.67%
    PBA-C GeneticAlg       39.87%   39.07%   65.33%   85.87%
    PBA-D RL-S2V           42.93%   41.93%   70.20%   91.27%

    40-50 nodes            K=2      K=3      K=4      K=5
    (unattacked)           92.60%   96.20%   97.53%   97.93%
    RBA   RandSampling     73.60%   78.60%   82.80%   85.73%
    WBA   GradArgmax       73.93%   64.80%   70.53%   75.47%
    PBA-C GeneticAlg       59.53%   55.67%   53.70%   42.48%
    PBA-D RL-S2V           61.00%   59.20%   58.73%   49.47%

    90-100 nodes           K=2      K=3      K=4      K=5
    (unattacked)           94.60%   97.47%   98.73%   98.20%
    RBA   RandSampling     74.47%   74.13%   80.93%   82.80%
    WBA   GradArgmax       72.00%   66.20%   67.80%   68.07%
    PBA-C GeneticAlg       65.47%   63.20%   61.60%   61.13%
    PBA-D RL-S2V           66.07%   64.07%   64.47%   64.67%

Restricted black-box attack on test set II:

    15-20 nodes            K=2      K=3      K=4      K=5
    (unattacked)           94.67%   97.33%   98.67%   97.33%
    RBA   RandSampling     78.00%   91.33%   94.00%   98.67%
    RBA   RL-S2V           44.00%   40.00%   67.33%   92.00%

    40-50 nodes            K=2      K=3      K=4      K=5
    (unattacked)           94.67%   97.33%   98.67%   98.67%
    RBA   RandSampling     75.33%   84.00%   86.00%   87.33%
    RBA   RL-S2V           58.67%   60.00%   58.00%   44.67%

    90-100 nodes           K=2      K=3      K=4      K=5
    (unattacked)           96.67%   98.00%   99.33%   98.00%
    RBA   RandSampling     69.33%   73.33%   76.00%   80.00%
    RBA   RL-S2V           62.67%   62.00%   62.67%   61.33%

For test set I, structure2vec achieves very high accuracy in distinguishing the number of connected components, and increasing K seems to improve generalization in most cases. However, under the practical black-box attack scenarios, GeneticAlg and RL-S2V can bring the accuracy down to 40%~60%. In attacking the graph classification algorithm, GradArgmax does not seem to be very effective. One reason could be the final pooling step in S2V when obtaining the graph-level embedding: during back-propagation, the pooling operation dispatches the gradient to every node embedding, which makes ∂L/∂α look similar in most entries.

For the restricted black-box attack on test set II (the lower half of Table 2), the attacker must propose adversarial samples without any access to the target model. Since RL-S2V is learned on test set I, it is able to transfer its learned policy to test set II. This suggests that the target classifier makes some form of consistent mistakes.

This experiment shows that (1) adversarial examples do exist for supervised graph problems; (2) a model with good generalization ability can still suffer from adversarial attacks; and (3) RL-S2V can learn a transferable adversarial policy to attack unseen graphs.

4.2. Node-level attack

In this experiment, we inspect adversarial attacks on node classification problems. Different from Sec 4.1, the setting here is transductive: the test samples (but not their labels) are also seen during training. We use four real-world datasets, namely Citeseer, Cora, Pubmed and Finance. The first three are small-scale citation networks commonly used for node classification, where each node is a paper with corresponding bag-of-words features. The last one is a large-scale dataset containing the transactions of an e-commerce platform within one day, where the node set contains buyers, sellers and credit cards; the classifier is asked to distinguish normal transactions from abnormal ones. The statistics of each dataset are shown in Table 3. The nodes also carry features of different dimensions; for the full details please refer to Kipf & Welling (2016). We use GCN (Kipf & Welling, 2016) as the target model to attack.

Table 3. Statistics of the graphs used for node classification.

    Dataset    # nodes      # edges      # classes   Train / Test I / Test II
    Citeseer   3,327        4,732        6           120 / 1,000 / 500
    Cora       2,708        5,429        7           140 / 1,000 / 500
    Pubmed     19,717       44,338       3           60 / 1,000 / 500
    Finance    2,382,980    8,101,757    2           317,041 / 812 / 800

Here the "small modifications" indicator is used to regulate the attacker. That is, given a graph G and target node c, the adversarial samples are limited to deleting a single edge within 2 hops of node c.

Table 4. Attack results on the node classification task. The upper half reports target-model accuracy before/after the attack on test set I, with various settings and methods. The lower half reports accuracy on test set II under the RBA setting, where only RandSampling and RL-S2V can be used.

    Method                  Citeseer   Cora     Pubmed   Finance
    (unattacked)            71.60%     81.00%   79.90%   88.67%
    RBA,   RandSampling     67.60%     78.50%   79.00%   87.44%
    WBA,   GradArgmax       63.00%     71.30%   72.40%   86.33%
    PBA-C, GeneticAlg       63.70%     71.20%   72.30%   85.96%
    PBA-D, RL-S2V           62.70%     71.20%   72.80%   85.43%
    Exhaust                 62.50%     70.70%   71.80%   85.22%

    Restricted black-box attack on test set II
    (unattacked)            72.60%     80.20%   80.40%   91.88%
    RandSampling            68.00%     78.40%   79.00%   90.75%
    RL-S2V                  66.00%     75.00%   74.00%   89.10%
    Exhaust                 62.60%     70.80%   71.00%   88.88%

Table 4 shows the results. Although deleting a single edge is the minimum modification one can make to a graph, the attack rate is still about 10% on those small graphs, and 4% on the Finance dataset. We also ran an exhaustive attack ("Exhaust") as a sanity check, which is the best any algorithm can do under this attack budget. The classifier accuracy would drop to 60% or lower if two-edge modifications were allowed; however, considering that the average degree in these graphs is not large, deleting two or more edges could violate the "small modification" constraint.
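For intuition, the Exhaust baseline can be sketched as follows: try deleting each single edge within 2 hops of the target node and keep the first deletion that flips the prediction. `predict` stands in for the trained GCN; the names are illustrative, not the authors' code.

```python
import networkx as nx

def exhaustive_single_edge_attack(G, c, predict, y_true, hops=2):
    """Try every single-edge deletion within `hops` of node c (sketch).

    G       : networkx.Graph
    c       : target node to misclassify
    predict : callable(graph, node) -> predicted label from the target model
    y_true  : true label of node c
    Returns a successful edge deletion, or None if no single deletion works.
    """
    # Candidate edges: both endpoints within `hops` of the target node.
    near = set(nx.single_source_shortest_path_length(G, c, cutoff=hops))
    candidates = [(u, v) for u, v in G.edges() if u in near and v in near]
    for u, v in candidates:
        G_mod = G.copy()
        G_mod.remove_edge(u, v)
        if predict(G_mod, c) != y_true:
            return (u, v)            # a single deletion already fools the model
    return None
```

Because it enumerates the full single-edge budget, any other attack method under the same constraint can at best match this baseline, which is why it serves as a sanity check in Table 4.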
We need to be careful to only create adversarial samples, instead of actually changing the true label of the sample. Although we do not have a gold classifier for these real-world datasets, it is highly likely that the proposed adversarial samples are valid: (1) the structure modification is tiny and stays within 2 hops; (2) we did not modify the node features.

In this node-level case, GradArgmax performs quite well, which is different from the graph-level attack. Here the gradient with respect to the adjacency coefficients α is no longer averaged by a global pooling step, which makes it easier to distinguish the useful modifications. For the restricted black-box attack on test set II, RL-S2V again learns an attack policy that generalizes to unseen samples.

4.3. Inspection of adversarial samples

In this section, we visualize the adversarial samples proposed by the different attackers. The solutions proposed by RL-S2V for the graph classification problem are shown in Figure 5 (attack solutions proposed by RL-S2V; the target classifier is structure2vec with K=4; the ground-truth numbers of components are (a) 1, (b) 2, (c) 3). The ground-truth labels are 1, 2 and 3, while the target classifier mistakenly predicts 2, 1 and 2, respectively. In Figure 5(b) and (c), the RL agent connects two nodes that are 4 hops away from each other (before the red edge is added). This shows that, although the target structure2vec classifier was trained with K = 4, it did not capture the 4-hop information efficiently. Figure 5(a) further shows that even when the connected nodes are just 2 hops apart, the classifier still makes a mistake.

Figure 6 shows the solutions proposed by GradArgmax on the node classification problem (the attacked node is colored orange; nodes from the same class as the attacked node are marked black, otherwise white; the target classifier is GCN with K=2). Edges in blue are suggested to be added by GradArgmax, while black ones are suggested to be deleted; the thicker the edge, the larger the magnitude of the gradient. In Figure 6(b), the attack deletes one neighbor with the same label, yet other same-class nodes remain connected, so the GCN is over-sensitive in this case. The mistake made in Figure 6(c) is more reasonable: although the added edge does not connect two nodes with the same label, it connects the target to a large community of nodes from the same class within 2 hops, so the prediction made by GCN is understandable.

4.4. Defense against attacks

Different from images, the possible number of graph structures is finite given the number of nodes. So by adding the adversarial samples back for further training, an improvement in the target model's robustness can be expected. For example, in the experiment of Sec 4.1, adding adversarial samples for training is equivalent to increasing the size of the training set, which will certainly help. Here we instead seek a cheap method for adversarial training: simply performing edge drop during training as a defense.

Dropping edges during training is different from Dropout (Srivastava et al., 2014): Dropout operates on the neurons in the hidden layers, while edge drop modifies the discrete structure. It is also different from simply dropping an entire hidden vector, since deleting a single edge can affect more than one entry: for example, GCN computes the normalized graph Laplacian, so after deleting a single edge, the normalized graph Laplacian needs to be recomputed for some entries. This approach is similar to Hamilton et al. (2017), who sample a fixed number of neighbors during training for efficiency; here we drop edges globally at random during each training step.

Table 5. Node classification results after adversarial training by random edge drop.

    Method                  Citeseer   Cora     Pubmed   Finance
    (unattacked)            71.30%     81.70%   79.50%   88.55%
    RBA,   RandSampling     67.70%     79.20%   78.20%   87.44%
    WBA,   GradArgmax       63.90%     72.50%   72.40%   87.32%
    PBA-C, GeneticAlg       64.60%     72.60%   72.50%   86.45%
    PBA-D, RL-S2V           63.90%     72.80%   72.90%   85.80%

The results after this adversarial training are presented in Table 5. Although the accuracy of the target model remains similar, the attack rates of the various methods decrease by about 1%. The scale of the improvement is not large, but it shows that even such cheap adversarial training has some effectiveness.
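A minimal sketch of the edge-drop defense is given below: at each training step a random fraction of edges is removed globally and the GCN's normalized adjacency is recomputed on the thinned graph. The `drop_rate` value and the renormalization form shown are illustrative defaults, not the paper's exact recipe.

```python
import numpy as np

def drop_edges(adj, drop_rate, rng):
    """Randomly drop undirected edges from a dense {0,1} adjacency matrix."""
    upper = np.triu(adj, k=1)                      # each undirected edge counted once
    edges = np.argwhere(upper > 0)
    keep = rng.random(len(edges)) >= drop_rate     # keep each edge w.p. 1 - drop_rate
    thinned = np.zeros_like(adj)
    for (u, v), k in zip(edges, keep):
        if k:
            thinned[u, v] = thinned[v, u] = 1
    return thinned

def normalized_adjacency(adj):
    """GCN-style renormalization D^{-1/2} (A + I) D^{-1/2}, recomputed each step."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

# Per training step: train the GCN on a randomly thinned graph.
rng = np.random.default_rng(0)
# adj = ...  (dense adjacency of the training graph)
# a_norm = normalized_adjacency(drop_edges(adj, drop_rate=0.1, rng=rng))
```

Because the normalized adjacency is rebuilt after every drop, the effect of removing one edge propagates to the affected rows of the Laplacian, which is exactly why edge drop differs from dropping a single hidden unit.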
5. Related work

Adversarial attack in continuous and discrete spaces. In recent years, adversarial attacks on deep learning models have attracted increasing attention from researchers. Some methods focus on white-box adversarial attacks using gradient information, such as box-constrained L-BFGS (Szegedy et al., 2013), the Fast Gradient Sign Method (Goodfellow et al., 2014) and DeepFool (Moosavi-Dezfooli et al., 2016). When full information about the target model is not accessible, one can train a substitute model (Papernot et al., 2017) or use zeroth-order optimization (Chen et al., 2017). There are also works on attacks involving discretized inputs (Buckman et al., 2018), but not on combinatorial structures. The one-pixel attack (Su et al., 2017) modifies an image by only a few pixels using differential evolution, and Jia & Liang (2017) attack a text reading-comprehension system with the help of rules and human effort. Zügner et al. (2018) studied the problem of adversarial attacks over graphs in parallel to our work, although with very different methods.

Combinatorial optimization. Modifying a discrete structure to fool the target classifier can be treated as a combinatorial optimization problem. Recently, there have been exciting works that use reinforcement learning to learn to solve general sequential decision problems (Bello et al., 2016) or graph combinatorial problems (Dai et al., 2017). These are closely related to RL-S2V. RL-S2V extends the previous approach with a hierarchical decomposition of the quadratic action space, in order to make training feasible.

6. Conclusion

In this paper, we study adversarial attacks on graph structured data. To perform efficient attacks, we proposed three methods, namely RL-S2V, GradArgmax and GeneticAlg, for three different attack settings. We show that a family of GNN models are vulnerable to such attacks. By visualizing the attack samples, we can also inspect the target classifier. We also discussed defense methods through experiments. Our future work includes developing more effective defense algorithms.

Acknowledgements

This project was supported in part by NSF IIS-1218749, NIH BIGDATA 1R01GM108341, NSF CAREER IIS-1350983, NSF IIS-1639792 EAGER, NSF CNS-1704701, ONR N00014-15-1-2340, Intel ISTC, NVIDIA and Amazon AWS. Tian Tian and Jun Zhu were supported by the National NSF of China (No. 61620106010) and the Beijing Natural Science Foundation (No. L172037). We thank Bo Dai for valuable suggestions, and the anonymous reviewers who gave useful comments.
References

Akoglu, Leman, Tong, Hanghang, and Koutra, Danai. Graph-based anomaly detection and description: a survey. Data Mining and Knowledge Discovery, 29(3):626-688, 2015.

Bello, Irwan, Pham, Hieu, Le, Quoc V., Norouzi, Mohammad, and Bengio, Samy. Neural combinatorial optimization with reinforcement learning. arXiv preprint arXiv:1611.09940, 2016.

Buckman, Jacob, Roy, Aurko, Raffel, Colin, and Goodfellow, Ian. Thermometer encoding: One hot way to resist adversarial examples. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=S1BSu--CW

Chen, Pin-Yu, Zhang, Huan, Sharma, Yash, Yi, Jinfeng, and Hsieh, Cho-Jui. ZOO: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pp. 15-26. ACM, 2017.

Dai, Hanjun, Dai, Bo, and Song, Le. Discriminative embeddings of latent variable models for structured data. In ICML, 2016.

Dai, Hanjun, Khalil, Elias B., Zhang, Yuyu, Dilkina, Bistra, and Song, Le. Learning combinatorial optimization algorithms over graphs. arXiv preprint arXiv:1704.01665, 2017.

Duvenaud, David K., Maclaurin, Dougal, Iparraguirre, Jorge, Bombarell, Rafael, Hirzel, Timothy, Aspuru-Guzik, Alan, and Adams, Ryan P. Convolutional networks on graphs for learning molecular fingerprints. In Advances in Neural Information Processing Systems, pp. 2215-2223, 2015.

Gilmer, Justin, Schoenholz, Samuel S., Riley, Patrick F., Vinyals, Oriol, and Dahl, George E. Neural message passing for quantum chemistry. arXiv preprint arXiv:1704.01212, 2017.

Goodfellow, Ian J., Shlens, Jonathon, and Szegedy, Christian. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.

Hamilton, William L., Ying, Rex, and Leskovec, Jure. Inductive representation learning on large graphs. arXiv preprint arXiv:1706.02216, 2017.

Jia, Robin and Liang, Percy. Adversarial examples for evaluating reading comprehension systems. arXiv preprint arXiv:1707.07328, 2017.

Kipf, Thomas N. and Welling, Max. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016.

Lei, Tao, Jin, Wengong, Barzilay, Regina, and Jaakkola, Tommi. Deriving neural architectures from sequence and graph kernels. arXiv preprint arXiv:1705.09037, 2017.

Li, Yujia, Tarlow, Daniel, Brockschmidt, Marc, and Zemel, Richard. Gated graph sequence neural networks. arXiv preprint arXiv:1511.05493, 2015.

Miikkulainen, Risto, Liang, Jason, Meyerson, Elliot, Rawal, Aditya, Fink, Dan, Francon, Olivier, Raju, Bala, Navruzyan, Arshak, Duffy, Nigel, and Hodjat, Babak. Evolving deep neural networks. arXiv preprint arXiv:1703.00548, 2017.

Moosavi-Dezfooli, Seyed-Mohsen, Fawzi, Alhussein, and Frossard, Pascal. DeepFool: a simple and accurate method to fool deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2574-2582, 2016.

Papernot, Nicolas, McDaniel, Patrick, Goodfellow, Ian, Jha, Somesh, Celik, Z. Berkay, and Swami, Ananthram. Practical black-box attacks against machine learning. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, pp. 506-519. ACM, 2017.

Real, Esteban, Moore, Sherry, Selle, Andrew, Saxena, Saurabh, Suematsu, Yutaka Leon, Le, Quoc, and Kurakin, Alex. Large-scale evolution of image classifiers. arXiv preprint arXiv:1703.01041, 2017.

Scarselli, Franco, Gori, Marco, Tsoi, Ah Chung, Hagenbuchner, Markus, and Monfardini, Gabriele. The graph neural network model. IEEE Transactions on Neural Networks, 20(1):61-80, 2009.

Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1):1929-1958, 2014.
Su, Jiawei, Vargas, Danilo Vasconcellos, and Sakurai, Kouichi. One pixel attack for fooling deep neural networks. arXiv preprint arXiv:1710.08864, 2017.

Szegedy, Christian, Zaremba, Wojciech, Sutskever, Ilya, Bruna, Joan, Erhan, Dumitru, Goodfellow, Ian, and Fergus, Rob. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.

Trivedi, Rakshit, Dai, Hanjun, Wang, Yichen, and Song, Le. Know-Evolve: Deep temporal reasoning for dynamic knowledge graphs. In ICML, 2017.

Zügner, Daniel, Akbarnejad, Amir, and Günnemann, Stephan. Adversarial attacks on neural networks for graph data. In KDD, 2018.
