Swin-Unet: Unet-Like Pure Transformer
for Medical Image Segmentation
Hu Cao¹, Yueyue Wang², Joy Chen³, Dongsheng Jiang⁴(✉), Xiaopeng Zhang⁴,
Qi Tian⁴(✉), and Manning Wang²(✉)
¹ Technische Universität München, München, Germany
hu.cao@tum.de
² Fudan University, Shanghai, China
{yywang17,mnwang}@fudan.edu.cn
³ Johns Hopkins University, Baltimore, MD, USA
⁴ Huawei Technologies, Shanghai, China
dongsheng_jiang@outlook.com, tian.qi1@huawei.com
Abstract. In the past few years, convolutional neural networks (CNNs)
have achieved milestones in medical image analysis. In particular, deep
neural networks based on U-shaped architectures and skip-connections
have been widely applied to various medical image tasks. However,
although CNNs have achieved excellent performance, they cannot learn
global semantic information interaction well because of the locality of
the convolution operation. In this paper, we propose Swin-Unet, a
Unet-like pure Transformer for medical image segmentation. The tokenized
image patches are fed into the Transformer-based U-shaped Encoder-Decoder
architecture with skip-connections for local-global semantic feature
learning. Specifically, we use a hierarchical Swin Transformer with
shifted windows as the encoder to extract context features, and we design
a symmetric Swin Transformer-based decoder with a patch expanding layer
to perform the up-sampling operation that restores the spatial resolution
of the feature maps. With direct 4× down-sampling of the inputs and 4×
up-sampling of the outputs, experiments on multi-organ and cardiac
segmentation tasks demonstrate that the pure Transformer-based U-shaped
Encoder-Decoder network outperforms methods based on full convolution or
on a combination of Transformer and convolution. The code is publicly
available at https://github.com/HuCaoFighting/Swin-Unet.
Keywords: Transformer · Self-attention · Medical image segmentation
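To make the 4× down-sampling and up-sampling mentioned in the abstract concrete, the following minimal sketch shows how an image can be tokenized into non-overlapping 4 × 4 patches and then rearranged back onto the pixel grid. It is an illustration under stated assumptions, not the authors' implementation: the function names patch_partition and patch_unpartition, the 8 × 8 toy image, and the use of plain Python lists are all hypothetical, and any learned projection that the real layers apply to the tokens is omitted.

```python
def patch_partition(image, patch=4):
    """Split an H x W x C image (nested lists) into patch x patch tokens.

    Returns a (H/patch) x (W/patch) grid in which each token is the
    flattened list of patch * patch * C raw pixel values.
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image sides must be divisible by the patch size")
    tokens = []
    for top in range(0, height, patch):
        row = []
        for left in range(0, width, patch):
            # Flatten one patch in row-major order, channels innermost.
            token = [
                value
                for y in range(top, top + patch)
                for x in range(left, left + patch)
                for value in image[y][x]
            ]
            row.append(token)
        tokens.append(row)
    return tokens


def patch_unpartition(tokens, patch=4, channels=3):
    """Inverse rearrangement: place each token's values back on the pixel grid."""
    rows, cols = len(tokens), len(tokens[0])
    image = [[None] * (cols * patch) for _ in range(rows * patch)]
    for r in range(rows):
        for c in range(cols):
            token = tokens[r][c]
            for i in range(patch * patch):
                dy, dx = divmod(i, patch)
                pixel = token[i * channels:(i + 1) * channels]
                image[r * patch + dy][c * patch + dx] = pixel
    return image


# Toy 8 x 8 RGB-like image (an assumption chosen only to keep the demo small).
image = [[[y, x, 0] for x in range(8)] for y in range(8)]
tokens = patch_partition(image)
# Grid height, grid width, token length: 8/4, 8/4, 4*4*3.
print(len(tokens), len(tokens[0]), len(tokens[0][0]))  # prints: 2 2 48
restored = patch_unpartition(tokens)
```

In this sketch each spatial side shrinks by a factor of 4 at tokenization, and the inverse rearrangement recovers the original resolution exactly, which is the shape bookkeeping that a 4× down-sampling input stage and a 4× up-sampling output stage have to satisfy.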
1 Introduction
Benefiting from the development of deep learning, computer vision technology
has been widely used in medical image analysis. Image segmentation is an impor-
H. Cao and Y. Wang: Work done as interns at Huawei Technologies.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
L. Karlinsky et al. (Eds.): ECCV 2022 Workshops, LNCS 13803, pp. 205–218, 2023.
https://doi.org/10.1007/978-3-031-25066-8_9