Multimodal Deep Learning
arXiv:2301.04856v1 [cs.CL] 12 Jan 2023
Contents
Preface
Foreword
1 Introduction
1.1 Introduction to Multimodal Deep Learning
1.2 Outline of the Booklet
2 Introducing the modalities
2.1 State-of-the-art in NLP
2.2 State-of-the-art in Computer Vision
2.3 Resources and Benchmarks for NLP, CV and multimodal tasks
3 Multimodal architectures
3.1 Image2Text
3.2 Text2Image
3.3 Images supporting Language Models
3.4 Text supporting Vision Models
3.5 Models for both modalities
4 Further Topics
4.1 Including Further Modalities
4.2 Structured + Unstructured Data
4.3 Multipurpose Models
4.4 Generative Art
5 Conclusion
6 Epilogue
6.1 New influential architectures
6.2 Creating videos
7 Acknowledgements
Preface
Author: Matthias Aßenmacher
FIGURE 1: LMU seal (left) style-transferred to Van Gogh's Sunflower
painting (center) and blended with the prompt "Van Gogh, sunflowers"
via CLIP+VGAN (right).
In the last few years, there have been several breakthroughs in the
methodologies used in Natural Language Processing (NLP) as well as Computer
Vision (CV). Beyond these improvements on single-modality models, large-scale
multi-modal approaches have become a very active area of research.
In this seminar, we reviewed these approaches and attempted to create a solid
overview of the field, starting with the current state-of-the-art approaches in
the two subfields of Deep Learning individually. Further, modeling frameworks
are discussed in which one modality is transformed into the other (Chapter 3.1
and Chapter 3.2), as well as models in which one modality is utilized to
enhance representation learning for the other (Chapter 3.3 and Chapter 3.4).
To conclude the second part, architectures with a focus on handling both
modalities simultaneously are introduced (Chapter 3.5). Finally, we also cover
other modalities (Chapter 4.1 and Chapter 4.2) as well as general-purpose
multi-modal models (Chapter 4.3), which are able to handle different tasks on
different modalities within one unified architecture. One interesting application
(Generative Art, Chapter 4.4) caps off this booklet.