SCI Papers (www.lunwensci.com):
The editor has compiled the third paper in the series of ICML papers contributed by researchers at Ant Financial's AI department. Only selected excerpts from the paper appear below; the full English text can be downloaded at the bottom of this page for study and research.
Stochastic Training of Graph Convolutional Networks with Variance Reduction
Jianfei Chen , Jun Zhu , Le Song
Abstract
Graph convolutional networks (GCNs) are powerful deep neural networks for graph-structured data. However, GCN computes the representation of a node recursively from its neighbors, making the receptive field size grow exponentially with the number of layers. Previous attempts to reduce the receptive field size by subsampling neighbors do not have a convergence guarantee, and their receptive field size per node is still on the order of hundreds. In this paper, we develop control variate-based algorithms which allow sampling an arbitrarily small neighbor size. Furthermore, we prove new theoretical guarantees for our algorithms to converge to a local optimum of GCN.
Empirical results show that our algorithms enjoy a similar convergence to the exact algorithm using only two neighbors per node. The runtime of our algorithms on a large Reddit dataset is only one-seventh that of previous neighbor sampling algorithms.
1. Introduction
Graph convolution networks (GCNs) (Kipf & Welling, 2017) generalize convolutional neural networks (CNNs) (LeCun et al., 1995) to graph-structured data. The "graph convolution" operation applies the same linear transformation to all the neighbors of a node, followed by mean pooling and a nonlinearity. By stacking multiple graph convolution layers, GCNs can learn node representations by utilizing information from distant neighbors. GCNs and their variants (Hamilton et al., 2017a; Veličković et al., 2017) have been applied to semi-supervised node classification (Kipf & Welling, 2017), inductive node embedding (Hamilton et al., 2017a), link prediction (Kipf & Welling, 2016; Berg et al., 2017) and knowledge graphs (Schlichtkrull et al., 2017), outperforming multi-layer perceptron (MLP) models that do not use the graph structure, and graph embedding approaches (Perozzi et al., 2014; Tang et al., 2015; Grover & Leskovec, 2016) that do not use node features.
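To make the "graph convolution" operation above concrete, here is a minimal NumPy sketch of a single layer (not the authors' code); adj_norm is assumed to be a row-normalized adjacency matrix with self-loops, and the ReLU nonlinearity and variable names are illustrative choices.

import numpy as np

def graph_conv_layer(adj_norm, H, W):
    # One graph convolution layer: mean-pool neighbor representations,
    # apply the same linear transformation W to every node, then a ReLU.
    # adj_norm: (N, N) normalized propagation matrix
    # H:        (N, F_in) node representations from the previous layer
    # W:        (F_in, F_out) weights shared by all nodes
    return np.maximum(adj_norm @ H @ W, 0.0)

# Toy usage on a 4-node path graph with 3 input and 2 output features.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
A_hat = A + np.eye(4)                            # add self-loops
adj_norm = A_hat / A_hat.sum(1, keepdims=True)   # row-normalize (mean pooling)
H0 = np.random.randn(4, 3)
W0 = np.random.randn(3, 2)
H1 = graph_conv_layer(adj_norm, H0, W0)          # new (4, 2) representations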
However, the graph convolution operation makes GCNs difficult to train efficiently. The representation of a node at layer L is computed recursively from the representations of all its neighbors at layer L − 1. Therefore, the receptive field of a single node grows exponentially with respect to the number of layers, as illustrated in Fig. 1(a). Due to the large receptive field size, Kipf & Welling (2017) propose to train GCN by a batch algorithm, which computes the representations of all the nodes altogether. However, batch algorithms cannot handle large-scale datasets because of their slow convergence and the requirement to fit the entire dataset in GPU memory.
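As a rough, purely illustrative calculation (the degrees below are hypothetical, not taken from the paper): if every node has on the order of d neighbors, an L-layer GCN touches roughly d^L nodes to compute a single prediction, which is why even a small minibatch can pull in a large fraction of the graph.

# Rough receptive-field size of an L-layer GCN when each node has about d neighbors.
# Illustrative numbers only; real graphs have varying degrees.
def receptive_field(d, L):
    return d ** L

print(receptive_field(50, 2))   # 2500 nodes touched for one two-layer prediction
print(receptive_field(50, 3))   # 125000 nodes for a three-layer prediction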
Hamilton et al. (2017a) make an initial attempt to develop stochastic training algorithms for GCNs via a scheme of neighbor sampling (NS). Instead of considering all the neighbors, they randomly subsample D(l) neighbors at the l-th layer, thereby reducing the receptive field size to the product ∏_l D(l), as shown in Fig. 1(b). They find that for two-layer GCNs, keeping D(1) = 10 and D(2) = 25 neighbors can achieve comparable performance with the original model. However, there is no theoretical guarantee on the convergence of the stochastic training algorithm with NS. Moreover, the time complexity of NS is still D(1)D(2) = 250 times larger than training an MLP, which is unsatisfactory.

In this paper, we develop novel control variate-based stochastic approximation algorithms for GCN. We utilize the historical activations of nodes as a control variate. We show that while the variance of the NS estimator depends on the magnitude of the activation, the variance of our algorithms only depends on the difference between the activation and its historical value. Furthermore, our algorithms bring new theoretical guarantees. At testing time, our algorithms give exact and zero-variance predictions, and at training time, our algorithms converge to a local optimum of GCN regardless of the neighbor sampling size D(l). The theoretical results allow us to significantly reduce the time complexity by sampling only two neighbors per node, yet still retain the quality of the model.
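The toy NumPy sketch below contrasts the NS estimator with a control-variate style estimator that keeps a stale "historical" activation per node, in the spirit of the approach described above; the sampling scheme, function names, and two-neighbor budget are illustrative assumptions rather than the authors' implementation.

import numpy as np

def exact_aggregate(p_row, H):
    # Exact aggregation for one node: sum_v P_uv * h_v over all neighbors.
    return p_row @ H

def ns_estimate(p_row, H, sample_size, rng):
    # Neighbor sampling (NS): rescale a random subset of neighbors.
    # Unbiased, but its variance scales with the magnitude of the activations.
    neighbors = np.flatnonzero(p_row)
    picked = rng.choice(neighbors, size=sample_size, replace=False)
    scale = len(neighbors) / sample_size
    return scale * (p_row[picked] @ H[picked])

def cv_estimate(p_row, H, H_hist, sample_size, rng):
    # Control variate (CV): sample only the difference from the historical
    # activations H_hist and aggregate the historical part over all neighbors.
    # The variance depends on H - H_hist, which shrinks as training converges.
    neighbors = np.flatnonzero(p_row)
    picked = rng.choice(neighbors, size=sample_size, replace=False)
    scale = len(neighbors) / sample_size
    delta = scale * (p_row[picked] @ (H[picked] - H_hist[picked]))
    return delta + p_row @ H_hist

rng = np.random.default_rng(0)
H = rng.normal(size=(100, 16))                      # current activations
H_hist = H + 0.01 * rng.normal(size=H.shape)        # stale copies, close to current
p_row = np.zeros(100)
p_row[rng.choice(100, size=20, replace=False)] = 1 / 20   # 20 neighbors, mean pooling

exact = exact_aggregate(p_row, H)
ns = ns_estimate(p_row, H, sample_size=2, rng=rng)          # noisy with 2 neighbors
cv = cv_estimate(p_row, H, H_hist, sample_size=2, rng=rng)  # close to exact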
Only a small part of the paper is excerpted above; please download the full text below for research and study.
Ant Financial AI department ICML contributed paper series, Paper 3
Download link for "Stochastic Training of Graph Convolutional Networks with Variance Reduction":
http://www.lunwensci.com/uploadfile/2018/0716/20180716100330880.pdf
For SCI paper writing and publishing, and for support services such as SCI paper editing, polishing, and publication assistance, please follow the SCI Paper Network!
This article is from the SCI Paper Network; please credit the source when reposting: https://www.lunwensci.com/jisuanjilunwen/160.html