Fashion Meets Computer Vision: A Survey 3
• We provide a comprehensive survey of the current state-of-the-art research progress in the
fashion domain and categorize fashion research topics into four main categories: detection,
analysis, synthesis, and recommendation.
• For each category of intelligent fashion research, we provide an in-depth and organized
review of the most significant methods and their contributions. We also summarize the
benchmark datasets and give links to the corresponding online portals.
• We gather evaluation metrics for different problems and also give performance comparisons
for different methods.
• We list possible future directions that could guide upcoming advances and inspire the research
community.
This survey is organized as follows. Sec. 2 reviews the fashion detection tasks,
including landmark detection, fashion parsing, and item retrieval. Sec. 3 covers work on
fashion analysis, comprising attribute recognition, style learning, and popularity prediction. Sec. 4
provides an overview of fashion synthesis tasks, comprising style transfer, human pose transformation,
and physical texture simulation. Sec. 5 discusses work on fashion recommendation, involving
fashion compatibility, outfit matching, and hairstyle suggestion. Sec. 6 presents selected
applications and future work. Finally, concluding remarks are given in Sec. 7.
2 FASHION DETECTION
Fashion detection is a widely discussed technology since most fashion works need detection first.
Take virtual try-on as an example [67]. It must first detect the human body parts in the input
image to know where the clothing region is, and then synthesize the clothing there. Therefore,
detection is the basis for most extended works. In this section, we mainly focus on fashion detection
tasks, which are split into three aspects: landmark detection, fashion parsing, and item retrieval. For
each aspect, the state-of-the-art methods, the benchmark datasets, and the performance comparisons are
summarized.
2.1 Landmark Detection
Fashion landmark detection aims to predict the positions of functional keypoints defined on the
clothes, such as the corners of the neckline, hemline, and cuff. These landmarks not only indicate the
functional regions of clothes, but also implicitly capture their bounding boxes, so that the design,
pattern, and category of the clothes can be better distinguished. Indeed, features extracted from these
landmarks greatly facilitate fashion image analysis.
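As a minimal sketch of how a set of landmarks implicitly captures a bounding box, assume each landmark is given as pixel coordinates with a visibility flag. The names `Landmark` and `landmarks_to_bbox`, the landmark labels, and the sample coordinates are all hypothetical illustrations, not taken from any dataset or method surveyed here.

```python
from typing import List, NamedTuple, Tuple


class Landmark(NamedTuple):
    """One functional keypoint on a garment (hypothetical representation)."""
    name: str
    x: float
    y: float
    visible: bool


def landmarks_to_bbox(landmarks: List[Landmark]) -> Tuple[float, float, float, float]:
    """Return the tightest axis-aligned box (x_min, y_min, x_max, y_max)
    enclosing all visible landmarks."""
    points = [(lm.x, lm.y) for lm in landmarks if lm.visible]
    if not points:
        raise ValueError("no visible landmarks to enclose")
    xs, ys = zip(*points)
    return (min(xs), min(ys), max(xs), max(ys))


# Made-up landmarks for an upper-body garment; the right hem is occluded,
# so it is skipped when the box is computed.
upper_garment = [
    Landmark("left_collar", 80.0, 30.0, True),
    Landmark("right_collar", 120.0, 30.0, True),
    Landmark("left_sleeve", 40.0, 110.0, True),
    Landmark("right_sleeve", 160.0, 110.0, True),
    Landmark("left_hem", 70.0, 210.0, True),
    Landmark("right_hem", 0.0, 0.0, False),
]

bbox = landmarks_to_bbox(upper_garment)
```

The box here is only as tight as the visible landmarks allow; with occluded keypoints it may underestimate the true extent of the garment.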
It is worth mentioning the difference between fashion landmark detection and human pose estimation,
which aims at locating human body joints as Fig. 2(a) shows. Fashion landmark detection is a
more challenging task than human pose estimation as the clothes are intrinsically more complicated
than human body joints. In particular, garments undergo non-rigid deformations or scale variations,
while human body joints usually have more restricted deformations. Moreover, the local regions of
fashion landmarks exhibit more significant spatial and appearance variances than those of human
body joints, as shown in Fig. 2(b).
2.1.1 State-of-the-art methods. The concept of fashion landmarks was first proposed by Liu et
al. [123] in 2016, under the assumption that clothing bounding boxes are given as prior information
in both training and testing. To learn clothing features by simultaneously predicting the
clothing attributes and landmarks, Liu et al. introduced FashionNet [123], a deep model. The
predicted landmarks were used to pool or gate the learned feature maps, which led to robust and
discriminative representations for clothes. In the same year, Liu et al. also proposed a deep fashion
alignment (DFA) framework [124], which consisted of a three-stage deep convolutional network
, Vol. 1, No. 1, Article . Publication date: January 2021.