A survey of multimodal learning methods
CHEN Peng 1,2), LI Qing 1,2)*, ZHANG De-zheng 3,4), YANG Yu-hang 1), CAI Zheng 1), LU Zi-yi 1)
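1) School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083
2) Key Laboratory of Knowledge Automation for Industrial Processes, Ministry of Education, Beijing 100083
3) School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083
4) Beijing Key Laboratory of Knowledge Engineering for Materials Science, Beijing 100083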
*Corresponding author, E-mail: liqing@ies.ustb.edu.cn
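Abstract: Big data is multi-source and heterogeneous. With the rapid development of information technology, multimodal data has become the main form of data resources in recent years. Research on multimodal learning methods, which give computers the ability to understand massive multi-source heterogeneous data, is therefore of great value. This paper summarizes the definition of multimodality and the basic tasks of multimodal learning, and introduces the cognitive mechanisms and development of multimodal learning. On this basis, it focuses on reviewing multimodal statistical learning methods and deep learning methods. In addition, it systematically summarizes novel adversarial-learning-based techniques for cross-modal matching and generation proposed in the past two years. Finally, the paper summarizes the main forms of multimodal learning and discusses possible future research directions.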
Keywords: multimodal learning; statistical learning; deep learning; adversarial learning; feature representation
Classification number: TP18
A survey of multimodal machine learning
CHEN Peng 1,2), LI Qing 1,2)*, ZHANG De-zheng 3,4), YANG Yu-hang 1), CAI Zheng 1), LU Zi-yi 1)
1) School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China
2) Key Laboratory of Knowledge Automation for Industrial Processes, Ministry of Education, Beijing 100083, China
3) School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China
4) Beijing Key Laboratory of Knowledge Engineering for Materials Science, Beijing 100083, China
*Corresponding author, E-mail: liqing@ies.ustb.edu.cn
ABSTRACT “Big data” is typically collected from multiple sources that have different data structures. With the rapid development of information technologies, today's valuable data resources are characteristically multimodal. As a result, building on classical machine learning strategies, multi-modal learning has become a valuable research topic, enabling computers to process and understand “big data”. Human cognition involves perception through different sense organs: signals from the eyes, ears, nose, and hands (tactile sense) together constitute a person's understanding of a specific scene or of the world as a whole. It is therefore reasonable to believe that multi-modal methods, with a greater ability to process complex heterogeneous data, can further promote the progress of information technologies. The concept of multimodality originated in psychology and pedagogy hundreds of years ago and has become popular in computer science during the past decade. In contrast to the concept of “media”, a “mode” is a more fine-grained concept that is associated with a typical data source or data form. The effective utilization of multi-modal data can help a computer understand a specific environment in a more holistic way. In this context, we first introduced the definition and main tasks of multi-modal learning. Based on this, the mechanism and origin of multi-modal machine learning were then briefly introduced. Subsequently, statistical learning methods and deep learning methods for multi-modal tasks were comprehensively summarized. We also introduced the main styles of data fusion in multi-modal perception tasks, including feature representation, shared mapping, and co-training. Additionally, novel adversarial learning strategies for cross-modal matching or generation were reviewed. The main methods for multi-modal learning were outlined in this paper
Received: 2019-03-21
Foundation item: National Key Research and Development Program of China (Cloud Computing and Big Data Special Project) (2017YFB1002304)
Chinese Journal of Engineering, Vol. 42, No. 5: 557−569, May 2020
https://doi.org/10.13374/j.issn2095-9389.2019.03.21.003; http://cje.ustb.edu.cn