IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. XX, NO. X, X XXXX 1
Enhanced Standard Compatible Image Compression
Framework based on Auxiliary Codec Networks
Hanbin Son, Taeoh Kim, Hyeongmin Lee, and Sangyoun Lee, Member, IEEE
Abstract—Recent deep neural network-based research to en-
hance image compression performance can be divided into
three categories: learnable codecs, postprocessing networks, and
compact representation networks. The learnable codec has been
designed for end-to-end learning beyond the conventional com-
pression modules. The postprocessing network increases the
quality of decoded images using example-based learning. The
compact representation network is learned to reduce the capacity
of an input image, reducing the bit rate while maintaining
the quality of the decoded image. However, these approaches
are not compatible with existing codecs or are not optimal
for increasing coding efficiency. In particular, previous studies
using a compact representation network struggle to achieve
optimal learning because they do not accurately model the
codec. In this paper, we propose a novel standard compatible
image compression framework based on auxiliary codec net-
works (ACNs). The ACNs are designed to imitate the image
degradation operations of the existing codec, delivering
more accurate gradients to the compact representation network.
Therefore, compact representation and postprocessing networks
can be learned effectively and optimally. We demonstrate that
the proposed framework based on the JPEG and High Efficiency
Video Coding standards substantially outperforms existing image
compression algorithms in a standard compatible manner.
Index Terms—Image compression, deep neural networks, com-
pact representation, JPEG, High Efficiency Video Coding.
I. INTRODUCTION
WITH the development of media technology, the de-
mand for live streaming or communicating using high-
resolution visual data has increased, requiring better perfor-
mance of image and video compression algorithms. Standard
algorithms have been carefully developed and released for
compatibility between the encoder and decoder of compression
algorithms across users.
The JPEG standard [1], a traditional image compression
algorithm, has been the most widely used in still-image com-
pression because of its simplicity and compatibility. Its block
partitioning, transform, quantization, and entropy-coding-
based scheme has broadly influenced many other image and video
This work was partly supported by the Institute of Information & Communi-
cations Technology Planning & Evaluation (IITP) grant funded by the Korea
government, Ministry of Science and ICT (MSIT) (No. 2021-0-00172, The
development of human Re-identification and masked face recognition based
on CCTV camera) and the Institute of Information & Communications Tech-
nology Planning & Evaluation (IITP) grant funded by the Korea government,
Ministry of Science and ICT (MSIT) (No. 2016-0-00197, Development of the
high-precision natural 3D view generation technology using smart-car multi
sensors and deep learning).
H. Son, T. Kim, H. Lee and S. Lee are with the School of Elec-
trical and Electronic Engineering, Yonsei University, Seoul, South Korea.
Corresponding author: Sangyoun Lee
compression standards, such as JPEG2000 [2], H.264/AVC [3],
and High Efficiency Video Coding (HEVC) [4]. Recent video
coding standards [3], [4] have adopted prediction-based coding
methods to reduce the spatial and temporal redundancy of
input video. Prediction-based coding increases the complex-
ity of the compression algorithm but produces much better
compression performance.
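The core lossy step shared by these block-transform codecs can be sketched numerically. The following minimal NumPy example is an illustration only, not the standard's exact procedure: it applies an orthonormal 8×8 DCT-II and a flat uniform quantization step to a single level-shifted block, whereas JPEG uses per-frequency quantization tables followed by entropy coding.

```python
import numpy as np

def dct_matrix(n=8):
    # Orthonormal DCT-II basis, as used in JPEG-style 8x8 block transforms.
    k = np.arange(n)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)  # DC row scaling for orthonormality
    return c

C = dct_matrix()
rng = np.random.default_rng(1)
block = rng.integers(0, 256, size=(8, 8)).astype(float) - 128  # level-shifted pixels

coeffs = C @ block @ C.T              # 2-D separable transform
step = 16.0                           # flat step (JPEG uses per-frequency steps)
quantized = np.round(coeffs / step)   # rounding is the only lossy operation here
recon = C.T @ (quantized * step) @ C  # dequantize and inverse transform

print("per-pixel MSE:", np.mean((recon - block) ** 2))
```

Because the transform is orthonormal, the reconstruction error energy equals the coefficient quantization error energy, so the per-pixel MSE is bounded by (step/2)².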
On the other hand, compression frameworks with end-to-
end trainable deep neural networks [5]–[14] (learnable codecs
in this paper) have been proposed based on the rapid develop-
ment of deep learning. The approaches use trainable networks
to produce bitstreams and reconstruct the original image
(Fig. 1 (a)). Although these approaches jointly consider the
compression ratio and reconstruction quality by design, their
performance remains unsatisfactory, and their bitstreams are
incompatible with standard codecs, which limits their utility.
A straightforward way to improve compression performance
while remaining compatible with standard codecs is to restore
the image after decoding. Following the
developments of convolutional neural networks (CNN), such
as ResNet [15], DenseNet [16], and attention networks [17],
[18], the CNN-based image postprocessing algorithms [19]–
[28] have drastically improved the performance of image
restoration. These kinds of approaches are designated as
a postprocessing network (PPNet) in this paper. Although
PPNets perform well in reconstruction and are compatible
with standard codecs, they only improve the visual quality
of the reconstructed image and do not consider the
compression ratio (Fig. 1 (b)).
The preprocessing and postprocessing-based coding strate-
gies have been applied to consider both image quality and
compression ratio. These approaches feed a spatially down-
sampled image into the codec and upsample the decoded
output [29]–[31], or restore it with a PPNet [32], [33]. Generally, reducing
the spatial size of an image can increase the compression
ratio. However, the approaches only work at a low bit-rate
setting [29]–[31], and the predefined downsampling operations
degrade detailed information and increase the ratio of high-
frequency components, increasing the bitstream size.
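The fixed-resampling baseline above can be sketched as follows. This is a toy illustration under loud assumptions: the real codec is replaced by a uniform-quantization stand-in, and the predefined downsampler/upsampler are simple average pooling and nearest-neighbour repetition. A noise image (all high frequency) makes the loss of detail from predefined downsampling extreme.

```python
import numpy as np

def downsample(img, s=2):
    # Predefined (content-agnostic) downsampler: s x s average pooling.
    h, w = img.shape
    return img[:h - h % s, :w - w % s].reshape(h // s, s, w // s, s).mean(axis=(1, 3))

def upsample(img, s=2):
    # Predefined upsampler: nearest-neighbour repetition back to the input grid.
    return np.repeat(np.repeat(img, s, axis=0), s, axis=1)

def fake_codec(img, step=8.0):
    # Crude stand-in for JPEG/HEVC degradation: uniform quantization only.
    return np.round(img / step) * step

rng = np.random.default_rng(0)
x = rng.uniform(0, 255, size=(8, 8))  # high-frequency (noise-like) content

direct = fake_codec(x)                                # codec at full resolution
resampled = upsample(fake_codec(downsample(x)))       # downsample -> codec -> upsample

mse_direct = np.mean((x - direct) ** 2)
mse_resampled = np.mean((x - resampled) ** 2)
print("full-resolution MSE:", mse_direct)
print("resampled MSE:     ", mse_resampled)
```

Fewer samples enter the codec in the resampled pipeline (a higher compression ratio), but the predefined resamplers cannot recover the discarded high-frequency detail, which is the gap the content-adaptive approaches below try to close.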
The content-adaptive downsampling algorithm [34] and
learning-based downsampling algorithms [35]–[39] have been
proposed to overcome these problems. These algorithms train
the content-adaptive downsampling (called compact represen-
tation in [38], abbreviated to CRNet in this paper) oper-
ation using the reconstruction loss after the PPNet. These
approaches can achieve both a high compression ratio and
better reconstruction quality with two networks. However,
backward gradients from the loss function for the CRNet
arXiv:2009.14754v2 [eess.IV] 15 Dec 2021