ICML contributed paper 05 from a researcher in Ant Financial's AI department (PDF)

As machine learning heats up and its "Chinese contingent" grows stronger, Chinese organizations are ranking ever higher at the major top-tier conferences, with momentum to contend for the top spot. At this ICML, for example, Tsinghua University had 12 papers accepted, and the number of ethnic-Chinese authors is also striking: Le Song, tenured associate professor at Georgia Tech and associate director of its Center for Machine Learning, is a named author on 8 papers. Professor Le Song's other role is researcher in Ant Financial's AI department. Ant Financial, one of the representatives of this "Chinese contingent" at ICML, contributed 8 papers to the conference, six of them substantial oral papers that headlined conference sessions and drew lively discussion from attendees. Nearly every one of these papers counts world-class academic experts among its authors.
Adversarial Attack on Graph Structured Data

The vanilla GNN model runs the above iteration until convergence. Recently, however, people have found that a fixed number of propagation steps T, with various different parameterizations (Li et al., 2015; Dai et al., 2016; Gilmer et al., 2017; Lei et al., 2017), works quite well in various applications.

3. Graph adversarial attack

Given a learned classifier f and an instance from the dataset (G, c, y) ∈ D, the graph adversarial attacker g(·,·): G × D → G is asked to modify the graph G = (V, E) into G̃ = (Ṽ, Ẽ), such that

    max_G̃  1(f(G̃, c) ≠ y)
    s.t.  G̃ = g(f, (G, c, y)),  I(G, G̃, c) = 1.    (4)

Here I(·,·,·): G × G × V → {0, 1} is an equivalency indicator that tells whether two graphs G and G̃ are equivalent under the classification semantics.

In this paper we focus on modifications to the discrete structure: the attacker g is allowed to add or delete edges from G to construct the new graph. Such actions are rich enough, since adding or deleting nodes can be performed by a series of edge modifications. Modifying edges is also harder than modifying nodes, since naively choosing an edge requires O(|V|^2) complexity, while choosing a node requires only O(|V|).

Since the attacker aims at fooling the classifier f rather than actually changing the true label of the instance, the equivalency indicator must first be defined to restrict the modifications the attacker can perform. We use two ways to define it:

1) Explicit semantics. Here a gold-standard classifier f* is assumed to be accessible, and the equivalency indicator is defined as

    I(G, G̃, c) = 1(f*(G, c) = f*(G̃, c)),    (5)

where 1(·) ∈ {0, 1} is an indicator function.

2) Small modifications. In many cases, when the explicit semantics is unknown, we ask the attacker to make as few modifications as possible, and only within a neighborhood graph:

    I(G, G̃, c) = 1(|(E \ Ẽ) ∪ (Ẽ \ E)| < m) · 1(Ẽ ⊆ N(G, b)).    (6)

In the above equation, m is the number of edges allowed to be modified, and N(G, b) = {(u, v): u, v ∈ V, d^(G)(u, v) ≤ b} defines the b-hop neighborhood graph, where d^(G)(u, v) ∈ {1, 2, ...} is the distance between two nodes in graph G.

Take friendship networks as an example: a suspicious behavior would be adding or deleting many friends in a short period, or creating a friendship with someone who shares no common friend. The "small modification" constraint eliminates both possibilities, so as to regulate the behavior of g. With either realization of the equivalency indicator it is easy to constrain the attacker: each time an invalid modification is proposed, it is simply ignored.

Below we first introduce our main algorithm, RL-S2V, for learning the attacker g, in Section 3.1. Then in Section 3.2 we present other possible attack methods for different scenarios.

3.1. Attacking as hierarchical reinforcement learning

Given an instance (G, c, y) and a target classifier f, we model the attack procedure as a finite-horizon Markov Decision Process M^(m)(f, G, c, y), defined as follows:

- Action: as mentioned in Sec 3, the attacker is allowed to add or delete edges in the graph, so a single action at time step t is a_t ∈ A ⊆ V × V. However, simply performing actions in the O(|V|^2) space is too expensive; we will shortly show how a hierarchical action decomposes this space.

- State: the state s_t at time t is represented by the tuple (G̃_t, c), where G̃_t is a partially modified graph with some of the edges added/deleted from G.

- Reward: the purpose of the attacker is to fool the target classifier, so a non-zero reward is received only at the end of the MDP:

    r(G̃, c) = 1 if f(G̃, c) ≠ y,  and  r(G̃, c) = −1 if f(G̃, c) = y.    (7)

  In the intermediate modification steps no reward is received: r(s_t, a_t) = 0 for t = 1, 2, ..., m − 1. In the PBA-C setting, where the prediction confidence of the target classifier is accessible, we can also use r(G̃, c) = L(f(G̃, c), y) as the reward.

- Terminal: once the agent has modified m edges, the process stops. For simplicity we focus on an MDP of fixed length; when fewer modifications are enough, the agent can simply modify dummy edges.

Given the above settings, a sample trajectory from this MDP is (s_1, a_1, r_1, ..., s_m, a_m, r_m, s_{m+1}), where s_1 = (G, c), s_t = (G̃_t, c) for t ∈ {2, ..., m}, and s_{m+1} = (G̃, c). The last step has reward r_m = r(s_m, a_m) = r(G̃, c), and all intermediate rewards are zero: r_t = 0 for t ∈ {1, 2, ..., m − 1}.

Since this is a discrete optimization problem with a finite horizon, we use Q-learning to solve the MDPs. In preliminary experiments we also tried policy optimization methods such as Advantage Actor Critic, but found Q-learning to be more stable; so below we focus on the modeling with Q-learning.

Q-learning is an off-policy optimization that fits the Bellman optimality equation directly:

    Q*(s_t, a_t) = r(s_t, a_t) + γ max_{a'} Q*(s_{t+1}, a').    (8)

This implicitly suggests a greedy policy:

    π(a_t | s_t; Q*) = argmax_{a_t} Q*(s_t, a_t).    (9)

In our finite-horizon case, γ is fixed to 1. Note that directly operating in the O(|V|^2) action space is too expensive for large graphs, so we propose to decompose the action a_t ∈ V × V into a_t = (a_t^(1), a_t^(2)), where a_t^(1), a_t^(2) ∈ V; that is, a single edge action is decomposed into its two ends. The hierarchical Q-function is then modeled as:

    Q^(1)*(s_t, a_t^(1)) = max_{a_t^(2)} Q^(2)*(s_t, a_t^(1), a_t^(2)),
    Q^(2)*(s_t, a_t^(1), a_t^(2)) = r(s_t, a_t = (a_t^(1), a_t^(2))) + max_{a_{t+1}^(1)} Q^(1)*(s_{t+1}, a_{t+1}^(1)).    (10)

In this formulation, Q^(1)* and Q^(2)* are two functions that jointly implement the original Q*. An action is considered complete only when the pair (a_t^(1), a_t^(2)) has been chosen, so the reward becomes valid only after a_t^(2) is made. It is easy to see that this decomposition has the same optimality structure as Eq (8), but taking an action requires only O(2 × |V|) = O(|V|) complexity. Figure 1 illustrates this process.

Taking a further look at Eq (10): since only the last-step reward is non-zero and the modification budget m is given, we can explicitly unroll the Bellman equations over the m steps, down to Q^(2,m)*(s_m, a_m^(1), a_m^(2)) = r(G̃, c).    (11)

To keep notation compact, we still use Q* = {Q^(1,t)*, Q^(2,t)*}_{t=1}^m to denote the Q-function. Since each sample in the dataset defines an MDP, it is possible to learn a separate Q-function for each MDP M^(m)(f, G_i, c_i, y_i), i = 1, ..., N. However, here we focus on a more practical and challenging setting where only one Q* is learned. The learned Q-function is thus asked to generalize, or transfer, over all the MDPs: the parameters θ of Q*(·, ·; θ) are trained to maximize the terminal attack reward r(G̃_i, c_i) induced by the greedy policy, summed over all instances i.    (12)

Below we present the parameterization of such a Q* that generalizes over MDPs.

3.1.1. Parameterization of Q*

From the above, the most flexible parameterization would implement m × 2 time-dependent Q functions. However, we found that two distinct parameterizations are typically enough, i.e., Q^(1,t)* = Q^(1)* and Q^(2,t)* = Q^(2)* for all t.

Since the Q-function scores the nodes in the state graph, it is natural to use GNN-family models for the parameterization, in order to learn a generalizable attacker. Specifically, Q^(1)* is parameterized as

    Q^(1)*(s_t, a_t^(1)) = W_1^(1) σ(W_2^(1)⊤ [μ_{a_t^(1)}, μ(s_t)]),    (13)

where μ_{a_t^(1)} is the embedding of node a_t^(1) in graph G̃_t, obtained by structure2vec (S2V) (Dai et al., 2016):

    μ_v^(k) = relu(W^(1) x_v + W^(2) Σ_{u ∈ N(v)} μ_u^(k−1)),  k ∈ {1, 2, ..., K},    (14)

with μ_v = μ_v^(K) and μ_v^(0) = 0. Here μ(s_t) = μ((G̃_t, c)) is the representation of the entire state tuple:

    μ(s_t) = Σ_{v ∈ Ṽ} μ_v  (graph attack);   μ(s_t) = [Σ_{v ∈ N(c, b)} μ_v, μ_c]  (node attack).    (15)

In the node attack scenario, the state embedding is taken from the b-hop neighborhood of node c, denoted N(c, b). The parameter set of Q^(1)* is θ_1 = {W_1^(1), W_2^(1), W^(1), W^(2)}. Q^(2)* is parameterized similarly with parameters θ_2, with extra consideration of the chosen node a_t^(1):

    Q^(2)*(s_t, a_t^(1), a_t^(2)) = W_1^(2) σ(W_2^(2)⊤ [μ_{a_t^(1)}, μ_{a_t^(2)}, μ(s_t)]).    (16)

We denote this method RL-S2V, since it learns a Q-function parameterized by S2V to perform the attack.

3.2. Other attacking methods

RL-S2V is suitable for black-box attack and transfer, but for different attack scenarios other algorithms might be preferred. We first introduce RandSampling, which requires the least information, in Sec 3.2.1; then a white-box attack, GradArgmax, is proposed in Sec 3.2.2; finally GeneticAlg, a kind of evolutionary computing, is proposed in Sec 3.2.3.
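The hierarchical action selection of Sec 3.1 can be sketched as two greedy node scans instead of one quadratic edge scan. The scorers below are hypothetical stand-ins (fixed toy scores) for the trained Q^(1)* and Q^(2)* networks, just to show the control flow:

```python
# Minimal sketch of the greedy hierarchical action of Eq (10): pick the first
# endpoint with Q1*, then the second with Q2*. Two O(|V|) scans replace the
# O(|V|^2) scan over all candidate edges. The score tables are illustrative.
def pick_edge_action(nodes, q1, q2):
    """Greedy hierarchical action: a_t = (a1, a2) chosen endpoint by endpoint."""
    a1 = max(nodes, key=q1)                     # first endpoint via Q1*
    rest = [v for v in nodes if v != a1]
    a2 = max(rest, key=lambda v: q2(a1, v))     # second endpoint via Q2*
    return (a1, a2)

q1_scores = {0: 0.1, 1: 0.9, 2: 0.3, 3: 0.2}
q2_scores = {(1, 0): 0.2, (1, 2): 0.8, (1, 3): 0.5}
edge = pick_edge_action(
    [0, 1, 2, 3],
    q1=lambda u: q1_scores[u],
    q2=lambda u, v: q2_scores[(u, v)],
)
# edge == (1, 2): the action flips the edge between nodes 1 and 2.
```

With trained networks in place of the toy tables, the same two-scan structure keeps action selection linear in the number of nodes.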
3.2.1. Random sampling

This is the simplest attack method: it randomly adds or deletes edges from graph G. When an edge modification a_t = (u, v) is sampled, we accept it only when it satisfies the semantic constraint I(·,·,·). It requires the least information for the attack; despite its simplicity, it can sometimes achieve a good attack rate.

3.2.2. Gradient-based white-box attack

Gradients have been successfully used for modifying continuous inputs, e.g., images; however, taking the gradient with respect to a discrete structure is non-trivial. Recalling the general iterative embedding process defined in Eq (3), we associate a coefficient α_{u,v} with each pair (u, v) ∈ V × V:

    μ_v^(k) = h^(k)({α_{u,v} μ_u^(k−1)}_{u ∈ N(v)}, {α_{u',v} w(u', v)}_{u' ∈ N(v)}, x_v),  k ∈ {1, 2, ..., K}.    (17)

Let α_{u,v} = 1(u ∈ N(v)); that is, α itself is the binary adjacency matrix. The above formulation then has the same effect as Eq (3), but the additional coefficients give us gradient information with respect to each edge, either existing or non-existing:

    ∂L / ∂α_{u,v}.    (18)

To attack the model, we could perform gradient ascent, i.e., α_{u,v} ← α_{u,v} + η ∂L/∂α_{u,v}. However, the attack is on a discrete structure where only m edges are allowed to be added or deleted, so we need to solve a combinatorial optimization problem: pick the m coefficients with the largest gradient magnitudes (19) and construct G̃ = Modify(G, {α_{u_t,v_t}}_{t=1}^m). (20)

We simply use a greedy algorithm to solve this optimization. The modification of G, given the chosen coefficients {α_{u_t,v_t}}_{t=1}^m, is performed by sequentially modifying the edges (u_t, v_t) of graph G̃_t:

    G̃_{t+1} = (Ṽ_t, Ẽ_t \ {(u_t, v_t)})  if ∂L/∂α_{u_t,v_t} < 0;
    G̃_{t+1} = (Ṽ_t, Ẽ_t ∪ {(u_t, v_t)})  if ∂L/∂α_{u_t,v_t} > 0.

That is, we modify the edges that are most likely to change the objective; depending on the sign of the gradient, we either add or delete the edge. We name this method GradArgmax, since it performs greedy selection based on gradient information. The attack procedure is shown in Figure 2.

Figure 2. Illustration of the graph structure gradient attack. This white-box attack adds/deletes the edges with maximum gradient magnitudes (with respect to α).

Since this approach requires gradient information, we consider it a white-box attack method. Also, because the gradient considers all pairs of nodes in a graph, the computation cost is at least O(|V|^2), excluding the back-propagation of gradients in Eq (18). Without further approximation, this approach cannot scale to large graphs.
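All of the attackers must reject moves that violate the equivalency constraint I of Eq (6). A minimal sketch of that check under the small-modification semantics, assuming an adjacency dict and undirected edges stored as ordered pairs (names are illustrative, not the paper's code):

```python
from collections import deque

def hop_distance_ok(adj, u, v, b):
    """BFS check that v is within b hops of u in the original graph `adj`
    (a dict mapping node -> set of neighbours)."""
    seen, frontier = {u}, deque([(u, 0)])
    while frontier:
        x, d = frontier.popleft()
        if x == v:
            return True
        if d < b:
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    frontier.append((y, d + 1))
    return False

def equivalent(E, E_new, adj, m, b):
    """Eq (6): edit distance below budget m, and every added edge must join
    nodes at most b hops apart in the original graph."""
    if len((E - E_new) | (E_new - E)) >= m:
        return False
    return all(hop_distance_ok(adj, u, v, b) for (u, v) in E_new - E)

# Path graph 0-1-2-3: adding (0, 2) stays within 2 hops, (0, 3) does not.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
E = {(0, 1), (1, 2), (2, 3)}
ok = equivalent(E, E | {(0, 2)}, adj, m=2, b=2)    # True
bad = equivalent(E, E | {(0, 3)}, adj, m=2, b=2)   # False
```

An invalid sampled move simply fails this test and is ignored, exactly as the text above prescribes.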
3.2.3. Genetic algorithm

Evolutionary computing has been successfully applied in many zero-order optimization scenarios, including neural architecture search (Real et al., 2017; Miikkulainen et al., 2017) and adversarial attack on images (Su et al., 2017). We here propose a black-box attack method that implements a type of genetic algorithm. Given an instance (G, c, y) and the target classifier f, the algorithm involves five major components, elaborated below.

- Population: the population is a set of candidate solutions, denoted P^(r) = {G̃_j^(r)}_j, where each G̃_j^(r) is a valid modification solution to the original graph G. Here r = 1, 2, ..., R indexes the generation, and R is the maximum number of evolutions allowed.

- Fitness: each candidate solution in the current population gets a score measuring its quality. We use the loss of the target model, L(f(G̃_j^(r), c), y), as the score function: a good attack solution should increase this loss. Since the fitness is a continuous score, it is not applicable in the PBA-D setting, where only the classification label is accessible.

- Selection: given the fitness scores of the current population, we can do either weighted sampling or greedy selection to pick the "breeding" population P_b^(r) for the next generation.

- Crossover: after selecting P_b^(r), we randomly pick two candidates G̃_1, G̃_2 ∈ P_b^(r) and cross them over by mixing their edges:

    G̃' = (V, (Ẽ_1 ∩ Ẽ_2) ∪ rp(Ẽ_1 \ Ẽ_2) ∪ rp(Ẽ_2 \ Ẽ_1)),    (21)

  where rp(·) means randomly picking a subset.

- Mutation: the mutation process is also biology-inspired. For a candidate solution G̃ ∈ P^(r), suppose the modified edges are δE = {(u_t, v_t)}_{t=1}^m; then each edge (u_t, v_t) has a certain probability of being changed, e.g., by replacing one of its end nodes.

The population size |P^(r)|, the probability of crossover used in rp(·), the mutation probability and the number of evolutions R are all hyperparameters that can be tuned. Due to the limitation of the fitness function, this method can only be used in the PBA-C setting. Also, since we need to execute the target model f to obtain fitness scores, the computation cost of such a genetic algorithm is O(|V| + |E|) per attack, mostly made up of the computation cost of the GNNs. The overall procedure is illustrated in Figure 3. We simply name it GeneticAlg, since it is an instantiation of the general genetic algorithm framework.

Figure 3. Illustration of the attack using the genetic algorithm. The population evolves with selection, crossover and mutation operations; fitness is measured by the loss function.

Table 1. Application scenarios for the different proposed graph attack methods. Cost is measured by the time complexity of proposing a single attack.

    Method       | Setting required          | Cost per attack
    RandSampling | RBA (least information)   | O(1)
    GradArgmax   | WBA                       | O(|V|^2)
    GeneticAlg   | PBA-C                     | O(|V| + |E|)
    RL-S2V       | PBA-D (transfers to RBA)  | O(|V| + |E|)

4. Experiment

For GeneticAlg, we set the population size |P^(r)| = 100 and the number of rounds R = 10, and tune the crossover rate and mutation rate in {0.1, ..., 0.5}. For RL-S2V, we tune the number of propagation steps of its S2V model, K ∈ {1, ..., 5}. There is no parameter tuning for GradArgmax and RandSampling. We use the proposed attack methods against the graph classification model in Sec 4.1 and the node classification model in Sec 4.2. In each scenario, we first show the attack rate on the target model when queries are allowed; then we show the generalization ability of RL-S2V in the RBA setting.

4.1. Graph-level attack

In this set of experiments we use synthetic data, where the gold classifier f* is known; thus the explicit semantics is used for the equivalency indicator I. The dataset D we constructed contains 15,000 graphs generated with the Erdős–Rényi random graph model. It is a three-class graph classification task, where each class contains 5,000 graphs: the classifier is asked to tell how many connected components there are in the corresponding undirected graph. The label set is Y = {1, 2, 3}, so there can be up to 3 components in a graph; see Figure 4 for an illustration. The gold classifier f* is obtained by performing a one-time traversal of the entire graph. The dataset is divided into a training set and two test sets: test set I contains 1,500 graphs and test set II contains 150 graphs, each with the same number of instances from every class.

Figure 4. Example graphs for classification: three graphs with 1, 2 or 3 components, each with 40-50 nodes.

We choose structure2vec as the target model for attack, and also tune its number of propagation steps K ∈ {1, 2, ..., 5}. Table 2 shows the results under different settings. On test set I, structure2vec achieves very high accuracy at distinguishing the number of connected components, and increasing K seems to improve generalization in most cases. However, under the practical black-box attack scenario, GeneticAlg and RL-S2V can bring the accuracy down to 40%-60%. In attacking the graph classification algorithm, GradArgmax seems not very effective. One reason could be the last pooling step in S2V when obtaining the graph-level embedding: during back-propagation, the pooling operation dispatches the gradient to every node embedding, which makes ∂L/∂α look similar in most entries.

For the restricted black-box attack on test set II (see the lower half of Table 2), the attacker is asked to propose adversarial samples without any access to the target model. Since RL-S2V is learned on test set I, it is able to transfer its learned policy to test set II. This suggests that the target classifier makes some form of consistent mistakes.
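The GeneticAlg procedure of Sec 3.2.3 reduces to a short evolutionary loop. The sketch below is a toy instance: candidates are sets of edge flips, and `fitness` is a hypothetical stand-in for the target model's loss L(f(G̃, c), y); all names and hyperparameter values are illustrative:

```python
import random

random.seed(0)
n, budget, pop_size, rounds = 8, 3, 20, 10
pairs = [(u, v) for u in range(n) for v in range(u + 1, n)]

def fitness(cand):
    # Stand-in loss: pretend flips touching high-index nodes confuse f most.
    return sum(u + v for (u, v) in cand)

def crossover(c1, c2):
    # Eq (21): keep the shared flips, randomly inherit each remaining one.
    child = set(c1 & c2)
    child.update(e for e in c1 ^ c2 if random.random() < 0.5)
    return frozenset(child)

def mutate(cand, prob=0.2):
    # Re-sample each flip with probability `prob` (budget not re-enforced here).
    return frozenset(random.choice(pairs) if random.random() < prob else e
                     for e in cand)

population = [frozenset(random.sample(pairs, budget)) for _ in range(pop_size)]
for _ in range(rounds):
    # Greedy selection of the breeding half, then refill via crossover+mutation.
    breeding = sorted(population, key=fitness, reverse=True)[: pop_size // 2]
    children = [mutate(crossover(*random.sample(breeding, 2)))
                for _ in range(pop_size - len(breeding))]
    population = breeding + children
best = max(population, key=fitness)
```

In the real attack, `fitness` would query the target classifier, which is why this method needs the PBA-C setting.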
This experiment shows that: (1) adversarial examples do exist for supervised graph problems; (2) a model with good generalization ability can still suffer from adversarial attacks; (3) RL-S2V can learn a transferable adversarial policy to attack unseen graphs.

4.2. Node-level attack

In this experiment we inspect adversarial attacks on node classification problems. Different from Sec 4.1, the setting here is transductive: the test samples (but not their labels) are also seen during training. We use four real-world datasets, namely Citeseer, Cora, Pubmed and Finance. The first three are small-scale citation networks commonly used for node classification, where each node is a paper with corresponding bag-of-words features.
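The gold classifier f* of Sec 4.1, which counts connected components in one pass over the graph, can be sketched with a generic union-find implementation (this is the standard technique, not the authors' code):

```python
def num_components(n, edges):
    """Count connected components of an undirected graph on n nodes
    with one union-find pass over the edge list."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    components = n
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:           # merging two components removes one
            parent[ru] = rv
            components -= 1
    return components

# Two triangles -> 2 components; a bridging edge merges them into 1.
edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)]
```

Because this exact label is cheap to recompute, the synthetic task can use the explicit-semantics indicator of Eq (5) to certify that an attack never changed the true label.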
Table 2. Attacking the graph classification algorithm. We report the 3-class classification accuracy of the target model on the vanilla test sets I and II, as well as on the generated adversarial samples. The upper half reports attack results on test set I with different levels of access to the target classifier; the lower half reports the RBA setting on test set II, where only RandSampling and RL-S2V can be used. K is the number of propagation steps used in the GNN-family models (see Eq (3)).

Attack on test set I:

                            15-20 nodes                      40-50 nodes                      90-100 nodes
    Setting  Method         K=2     K=3     K=4     K=5      K=2     K=3     K=4     K=5      K=2     K=3     K=4     K=5
    -        (unattacked)   93.20%  98.20%  98.87%  99.07%   92.60%  96.20%  97.53%  97.93%   94.60%  97.47%  98.73%  98.20%
    RBA      RandSampling   78.73%  92.27%  95.13%  97.67%   73.60%  78.60%  82.80%  85.73%   74.47%  74.13%  80.93%  82.80%
    WBA      GradArgmax     69.47%  64.60%  95.80%  97.67%   73.93%  64.80%  70.53%  75.47%   72.00%  66.20%  67.80%  68.07%
    PBA-C    GeneticAlg     39.87%  39.07%  65.33%  85.87%   59.53%  55.67%  53.70%  42.48%   65.47%  63.20%  61.60%  61.13%
    PBA-D    RL-S2V         42.93%  41.93%  70.20%  91.27%   61.00%  59.20%  58.73%  49.47%   66.07%  64.07%  64.47%  64.67%

Restricted black-box attack on test set II:

    -        (unattacked)   94.67%  97.33%  98.67%  97.33%   94.67%  97.33%  98.67%  98.67%   96.67%  98.00%  99.33%  98.00%
    RBA      RandSampling   78.00%  91.33%  94.00%  98.67%   75.33%  84.00%  86.00%  87.33%   69.33%  73.33%  76.00%  80.00%
    RBA      RL-S2V         44.00%  40.00%  67.33%  92.00%   58.67%  60.00%  58.00%  44.67%   62.67%  62.00%  62.67%  61.33%

The last dataset, Finance, is large-scale and contains one day of transactions from an e-commerce platform, where the node set contains buyers, sellers and credit cards. The classifier is asked to distinguish normal transactions from abnormal ones. The statistics of each dataset are shown in Table 3; the nodes also carry features of different dimensions (for the full table, please refer to Kipf & Welling (2016)). We use GCN (Kipf & Welling, 2016) as the target model to attack. Here "small modifications" is used to regulate the attacker: given a graph G and target node c, the adversarial samples are limited to deleting a single edge within 2 hops of node c.

Table 3. Statistics of the graphs used for node classification.

    Dataset  | # nodes   | # edges   | # classes | Train / Test I / Test II
    Citeseer | 3,327     | 4,732     | 6         | 120 / 1,000 / 500
    Cora     | 2,708     | 5,429     | 7         | 140 / 1,000 / 500
    Pubmed   | 19,717    | 44,338    | 3         | 60 / 1,000 / 500
    Finance  | 2,382,980 | 8,101,757 | 2         | 317,041 / 812 / 800

Table 4. Attacking the node classification algorithm. The upper half reports the target model accuracy before/after the attack on test set I, with various settings and methods; the lower half reports accuracy on test set II in the RBA setting only, where just RandSampling and RL-S2V can be used.

    Method              | Citeseer | Cora   | Pubmed | Finance
    (unattacked)        | 71.60%   | 81.00% | 79.90% | 88.67%
    RBA, RandSampling   | 67.60%   | 78.50% | 79.00% | 87.44%
    WBA, GradArgmax     | 63.00%   | 71.30% | 72.40% | 86.33%
    PBA-C, GeneticAlg   | 63.70%   | 71.20% | 72.30% | 85.96%
    PBA-D, RL-S2V       | 62.70%   | 71.20% | 72.80% | 85.43%
    Exhaust             | 62.50%   | 70.70% | 71.80% | 85.22%

    Restricted black-box attack on test set II:
    (unattacked)        | 72.60%   | 80.20% | 80.40% | 91.88%
    RandSampling        | 68.00%   | 78.40% | 79.00% | 90.75%
    RL-S2V              | 66.00%   | 75.00% | 74.00% | 89.10%
    Exhaust             | 62.60%   | 70.80% | 71.00% | 88.88%

Table 4 shows the results. Although deleting a single edge is the minimum modification one can make to a graph, the attack rate is still about 10% on the small graphs, and 4% on the Finance dataset. We also ran an exhaustive attack as a sanity check; it is the best any algorithm can do under the attack budget. The classifier accuracy would drop to 60% or lower if a two-edge modification were allowed. However, considering that the average degree in these graphs is not large, deleting two or more edges would violate the "small modification" constraint: we need to be careful to only create adversarial samples, rather than actually changing the true label of the sample.

Figure 5. Attack solutions proposed by RL-S2V on the graph classification problem. The target classifier is structure2vec with K = 4. The ground-truth numbers of components are (a) 1, (b) 2, (c) 3; the predictions are (a) 2, (b) 1, (c) 2.

In this case, GradArgmax performs quite well, which is different from the graph-level attack case.
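GradArgmax's greedy flip rule (Sec 3.2.2) can be sketched as below, assuming the gradients ∂L/∂α_{u,v} have already been computed by back-propagation (that part is model-specific and omitted; all names here are illustrative):

```python
def grad_argmax_flips(edges, grad, m):
    """Greedy GradArgmax step: flip the m node pairs with the largest
    |dL/d(alpha_uv)|. A positive gradient adds the edge, a negative one
    deletes it. `grad` maps an (u, v) pair with u < v to the gradient of
    the loss w.r.t. the adjacency coefficient alpha_uv."""
    E = set(edges)
    flipped = 0
    for (u, v) in sorted(grad, key=lambda p: -abs(grad[p])):
        if flipped == m:
            break
        if grad[(u, v)] > 0 and (u, v) not in E:
            E.add((u, v))        # gradient ascent direction: add edge
            flipped += 1
        elif grad[(u, v)] < 0 and (u, v) in E:
            E.discard((u, v))    # gradient descent direction: delete edge
            flipped += 1
    return E

# Path 0-1-2-3 with toy gradients: the largest magnitudes win the budget.
edges = [(0, 1), (1, 2), (2, 3)]
grad = {(0, 3): 5.0, (1, 2): -4.0, (0, 2): 0.5}
new_E = grad_argmax_flips(edges, grad, m=2)
```

Under budget m = 2, the sketch adds (0, 3) and deletes (1, 2), matching the sign rule stated above.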
Here the gradient with respect to the adjacency matrix α is no longer averaged, which makes it easier to distinguish the useful modifications. For the restricted black-box attack on test set II, RL-S2V still learns an attack policy that generalizes to unseen samples. Though we do not have a gold classifier for the real-world datasets, it is highly likely that the proposed adversarial samples are valid: (1) the structure modification is tiny and within 2 hops; (2) we did not modify the node features.

4.3. Inspection of adversarial samples

In this section we visualize the adversarial samples proposed by the different attackers. The solutions proposed by RL-S2V for the graph-level classification problem are shown in Figure 5. The ground-truth labels are 1, 2 and 3, while the target classifier mistakenly predicts 2, 1 and 2, respectively. In Figure 5(b) and (c), the RL agent connects two nodes that are 4 hops away from each other (before the red edge is added). This shows that, although the target classifier structure2vec is trained with K = 4, it did not capture the 4-hop information efficiently. Figure 5(a) shows that even connecting nodes that are only 2 hops apart can make the classifier err.

Figure 6 shows the solutions proposed by GradArgmax. The orange node is the target of the attack. Blue edges are suggested by GradArgmax to be added, black ones to be deleted; black nodes have the same label as the orange node, white nodes do not, and the thicker an edge, the larger the magnitude of its gradient. In Figure 6(b), a neighbor with the same label is deleted, but other same-label nodes remain connected; in this case the GCN is over-sensitive. The mistake in Figure 6(c) is reasonable: although the red edge does not connect two nodes with the same label, it connects the target to a large community of nodes from the same class within 2-hop distance, so the prediction made by GCN is understandable.

Figure 6. Attack solutions proposed by GradArgmax on the node classification problem. The attacked node is colored orange; nodes from the same class as the attacked node are marked black, otherwise white. The target classifier is GCN with K = 2.

4.4. Defense against attacks

Different from images, the number of possible graph structures is finite given the number of nodes, so by adding the adversarial samples back for further training, an improvement of the target model's robustness can be expected. For example, in the experiment of Sec 4.1, adding adversarial samples for training is equivalent to enlarging the training set, which will certainly help. Here we instead seek a cheap method of adversarial training: simply dropping edges during training.

Dropping edges during training is different from Dropout (Srivastava et al., 2014), which operates on the neurons in the hidden layers, while edge drop modifies the discrete structure. It also differs from dropping an entire hidden vector, since deleting a single edge can affect more than one entry: for example, GCN computes the normalized graph Laplacian, so after deleting a single edge the normalized graph Laplacian needs to be recomputed for some entries. The approach is similar to Hamilton et al. (2017), who sample a fixed number of neighbors during training for efficiency; here we drop edges globally at random, at each training step.

The results after this adversarial training are presented in Table 5. Though the accuracy of the target model remains similar, the attack rates of the various methods decrease by about 1%. The improvement is not large in scale, but it shows some effectiveness of such cheap adversarial training.

Table 5. Results after adversarial training by random edge drop.

    Method              | Citeseer | Cora   | Pubmed | Finance
    (unattacked)        | 71.30%   | 80.70% | 79.50% | 88.55%
    RBA, RandSampling   | 67.70%   | 79.20% | 78.20% | 87.44%
    WBA, GradArgmax     | 63.90%   | 72.50% | 72.40% | 87.32%
    PBA-C, GeneticAlg   | 64.60%   | 72.60% | 72.50% | 86.45%
    PBA-D, RL-S2V       | 63.90%   | 72.80% | 72.90% | 85.80%
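The edge-drop defense of Sec 4.4 amounts to one filtering step per training iteration; a minimal sketch (the surrounding GNN training loop is omitted, and the function name is illustrative):

```python
import random

def drop_edges(edges, drop_prob, rng=random):
    """Global random edge drop used as cheap adversarial training: each
    training step, every edge is independently removed with probability
    `drop_prob` before the GNN forward pass. Unlike Dropout, this perturbs
    the discrete graph structure itself, not the hidden activations."""
    return [e for e in edges if rng.random() >= drop_prob]

random.seed(0)
edges = [(u, u + 1) for u in range(100)]
kept = drop_edges(edges, drop_prob=0.1)  # a fresh subsample each step
```

For models like GCN that precompute the normalized graph Laplacian, the Laplacian must be rebuilt from `kept` at every step, which is exactly the extra cost the text above refers to.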
5. Related work

Adversarial attack in continuous and discrete space. In recent years, adversarial attacks on deep learning models have drawn increasing attention from researchers. Some methods focus on white-box adversarial attack using gradient information, such as box-constrained L-BFGS (Szegedy et al., 2013), Fast Gradient Sign (Goodfellow et al., 2014) and DeepFool (Moosavi-Dezfooli et al., 2016). When the full information of the target model is not accessible, one can train a substitute model (Papernot et al., 2017) or use zeroth-order optimization (Chen et al., 2017). There is also work on attacks involving discretization (Buckman et al., 2018), but not combinatorial structures. The one-pixel attack (Su et al., 2017) modifies an image in only a few pixels using differential evolution, and Jia & Liang (2017) attack reading comprehension systems with the help of rules and human effort. Zügner et al. (2018) studied the problem of adversarial attacks over graphs in parallel to our work, although with very different methods.

Combinatorial optimization. Modifying a discrete structure to fool the target classifier can be treated as a combinatorial optimization problem. Recently there has been exciting work using reinforcement learning to solve general sequential decision problems (Bello et al., 2016) or graph combinatorial problems (Dai et al., 2017); these are closely related to RL-S2V, which extends the previous approach with a hierarchical decomposition of the quadratic action space in order to make training feasible.

6. Conclusion

In this paper we study adversarial attacks on graph structured data. To perform efficient attacks, we proposed three methods, namely RL-S2V, GradArgmax and GeneticAlg, for three different attack settings, respectively. We show that a family of GNN models are vulnerable to such attacks. By visualizing the attack samples, we can also inspect the target classifier, and we discussed defense methods through experiments. Our future work includes developing more effective defense algorithms.

Acknowledgements

This project was supported in part by NSF IIS-1218749, NIH BIGDATA 1R01GM108341, NSF CAREER IIS-1350983, NSF IIS-1639792 EAGER, NSF CNS-1704701, ONR N00014-15-1-2340, Intel ISTC, NVIDIA and Amazon AWS. Tian Tian and Jun Zhu were supported by the National NSF of China (No. 61620106010) and Beijing Natural Science Foundation (No. L172037). We thank Bo Dai for valuable suggestions, and the anonymous reviewers for their useful comments.

References

Akoglu, L., Tong, H., and Koutra, D. Graph-based anomaly detection and description: a survey. Data Mining and Knowledge Discovery, 29(3):626-688, 2015.

Bello, I., Pham, H., Le, Q. V., Norouzi, M., and Bengio, S. Neural combinatorial optimization with reinforcement learning. arXiv preprint arXiv:1611.09940, 2016.

Buckman, J., Roy, A., Raffel, C., and Goodfellow, I. Thermometer encoding: One hot way to resist adversarial examples. In International Conference on Learning Representations, 2018.

Chen, P.-Y., Zhang, H., Sharma, Y., Yi, J., and Hsieh, C.-J. ZOO: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pp. 15-26. ACM, 2017.

Dai, H., Dai, B., and Song, L. Discriminative embeddings of latent variable models for structured data. In ICML, 2016.

Dai, H., Khalil, E. B., Zhang, Y., Dilkina, B., and Song, L. Learning combinatorial optimization algorithms over graphs. arXiv preprint arXiv:1704.01665, 2017.

Duvenaud, D. K., Maclaurin, D., Iparraguirre, J., Bombarell, R., Hirzel, T., Aspuru-Guzik, A., and Adams, R. P. Convolutional networks on graphs for learning molecular fingerprints. In Advances in Neural Information Processing Systems, pp. 2215-2223, 2015.

Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O., and Dahl, G. E. Neural message passing for quantum chemistry. arXiv preprint arXiv:1704.01212, 2017.

Goodfellow, I. J., Shlens, J., and Szegedy, C. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.

Hamilton, W. L., Ying, R., and Leskovec, J. Inductive representation learning on large graphs. arXiv preprint arXiv:1706.02216, 2017.

Jia, R. and Liang, P. Adversarial examples for evaluating reading comprehension systems. arXiv preprint arXiv:1707.07328, 2017.

Kipf, T. N. and Welling, M. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016.

Lei, T., Jin, W., Barzilay, R., and Jaakkola, T. Deriving neural architectures from sequence and graph kernels. arXiv preprint arXiv:1705.09037, 2017.

Li, Y., Tarlow, D., Brockschmidt, M., and Zemel, R. Gated graph sequence neural networks. arXiv preprint arXiv:1511.05493, 2015.

Miikkulainen, R., Liang, J., Meyerson, E., Rawal, A., Fink, D., Francon, O., Raju, B., Navruzyan, A., Duffy, N., and Hodjat, B. Evolving deep neural networks. arXiv preprint arXiv:1703.00548, 2017.

Moosavi-Dezfooli, S.-M., Fawzi, A., and Frossard, P. DeepFool: a simple and accurate method to fool deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2574-2582, 2016.

Papernot, N., McDaniel, P., Goodfellow, I., Jha, S., Celik, Z. B., and Swami, A. Practical black-box attacks against machine learning. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, pp. 506-519. ACM, 2017.

Real, E., Moore, S., Selle, A., Saxena, S., Suematsu, Y. L., Le, Q., and Kurakin, A. Large-scale evolution of image classifiers. arXiv preprint arXiv:1703.01041, 2017.

Scarselli, F., Gori, M., Tsoi, A. C., Hagenbuchner, M., and Monfardini, G. The graph neural network model. IEEE Transactions on Neural Networks, 20(1):61-80, 2009.

Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1):1929-1958, 2014.

Su, J., Vargas, D. V., and Sakurai, K. One pixel attack for fooling deep neural networks. arXiv preprint arXiv:1710.08864, 2017.

Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., and Fergus, R. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.

Trivedi, R., Dai, H., Wang, Y., and Song, L. Know-Evolve: Deep temporal reasoning for dynamic knowledge graphs. In ICML, 2017.

Zügner, D., Akbarnejad, A., and Günnemann, S. Adversarial attacks on neural networks for graph data. In KDD, 2018.
