arXiv:1706.06859v1 [cs.LG] 20 Jun 2017

Analysis of dropout learning regarded as ensemble learning

Kazuyuki Hara (1), Daisuke Saitoh (2), Hayaru Shouno (3)

(1) College of Industrial Technology, Nihon University, 1-2-1 Izumi-cho, Narashino-shi, Chiba, 275-8575 Japan.

(2) Graduate School of Industrial Technology, Nihon University

(3) Graduate School of Informatics and Engineering, The University of Electro-Communications, 1-5-1 Chofugaoka, Chofu-shi, Tokyo, 182-8585 Japan.

Abstract

Deep learning is the state of the art in fields such as visual object recognition and speech recognition. It uses a large number of layers and a huge number of units and connections, so overfitting is a serious problem. To avoid this problem, dropout learning has been proposed. Dropout learning neglects some inputs and hidden units during the learning process with a probability p, and the neglected inputs and hidden units are then combined with the learned network to express the final output. We find that the process of combining the neglected hidden units with the learned network can be regarded as ensemble learning, so we analyze dropout learning from this point of view.

keywords: dropout learning, overfitting, regularization, ensemble learning, soft-committee machine, teacher-student formulation
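The two phases described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the activations, weights, linear output unit, and the helper names `dropout_mask` and `output` are hypothetical, and scaling the full network by 1 - p at test time is the common dropout convention, assumed here rather than taken from this text. For a linear output, averaging many randomly thinned networks approaches the scaled full network, which is one way to see the connection to ensemble learning.

```python
import random


def dropout_mask(n_units, p, rng):
    """Return a 0/1 mask; each unit is neglected (0) with probability p."""
    return [0 if rng.random() < p else 1 for _ in range(n_units)]


def output(hidden, weights, mask=None, scale=1.0):
    """Linear output: scaled weighted sum of the hidden units kept by the mask."""
    if mask is None:
        mask = [1] * len(hidden)  # no mask means every unit is used
    return scale * sum(m * w * h for m, w, h in zip(mask, weights, hidden))


rng = random.Random(0)            # fixed seed so the sketch is reproducible
p = 0.5                           # probability of neglecting a hidden unit
hidden = [0.2, -0.4, 0.9, 0.5]    # hypothetical hidden-unit activations
weights = [1.0, 1.0, 1.0, 1.0]    # hypothetical hidden-to-output weights

# Learning time: one randomly thinned network.
mask = dropout_mask(len(hidden), p, rng)
train_out = output(hidden, weights, mask)

# Test time (assumed convention): all units are used, scaled by 1 - p.
test_out = output(hidden, weights, scale=1.0 - p)

# Ensemble view: average the outputs of many thinned networks.
n_samples = 20000
avg_out = sum(
    output(hidden, weights, dropout_mask(len(hidden), p, rng))
    for _ in range(n_samples)
) / n_samples

print(train_out, test_out, avg_out)
```

With a nonlinear output unit the average over thinned networks and the scaled full network generally differ, so the equality above should be read as a property of this linear toy case only.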

1 Introduction

Deep learning [1, 2] is attracting much attention in the fields of visual object recognition, speech recognition, object detection, and many other domains. It provides automatic feature extraction and has the ability to achieve outstanding performance [3, 4].

Deep learning uses a very deep layered network and a huge amount of data, so overfitting is a serious problem. To avoid overfitting, regularization is used. Hinton et al. proposed a regularization method called "dropout learning" [5] for this purpose. Dropout learning consists of two processes. At learning time, some hidden units are neglected with a probability p, and this process reduces the network size. At test time, learned hidden units and those not learned are
