IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. XX, NO. X, X XXXX 1
Enhanced Standard Compatible Image Compression
Framework based on Auxiliary Codec Networks
Hanbin Son, Taeoh Kim, Hyeongmin Lee, and Sangyoun Lee, Member, IEEE
Abstract—Recent deep neural network-based research to en-
hance image compression performance can be divided into
three categories: learnable codecs, postprocessing networks, and
compact representation networks. The learnable codec has been
designed for end-to-end learning beyond the conventional com-
pression modules. The postprocessing network increases the
quality of decoded images using example-based learning. The
compact representation network is learned to reduce the capacity
of an input image, reducing the bit rate while maintaining
the quality of the decoded image. However, these approaches
are not compatible with existing codecs or are not optimal
for increasing coding efficiency. In particular, previous studies
using a compact representation network struggle to achieve
optimal learning because they do not accurately model the
codec. In this paper, we propose a novel standard compatible
image compression framework based on auxiliary codec net-
works (ACNs). The ACNs are designed to imitate the image
degradation operations of the existing codec, delivering
more accurate gradients to the compact representation network.
Therefore, compact representation and postprocessing networks
can be learned effectively and optimally. We demonstrate that
the proposed framework based on the JPEG and High Efficiency
Video Coding standards substantially outperforms existing image
compression algorithms in a standard compatible manner.
Index Terms—Image compression, deep neural networks, com-
pact representation, JPEG, High Efficiency Video Coding.
I. INTRODUCTION
WITH the development of media technology, the de-
mand for live streaming or communicating using high-
resolution visual data has increased, requiring better perfor-
mance of image and video compression algorithms. Standard
algorithms have been carefully developed and released for
compatibility between the encoder and decoder of compression
algorithms across users.
The JPEG standard [1], a traditional image compression
algorithm, has been the most widely used in still-image com-
pression because of its simplicity and compatibility. Its block
partitioning, transform, quantization, and entropy-coding-
based scheme has broadly influenced many other image and video
This work was partly supported by the Institute of Information & Communi-
cations Technology Planning & Evaluation (IITP) grant funded by the Korea
government, Ministry of Science and ICT (MSIT) (No. 2021-0-00172, The
development of human Re-identification and masked face recognition based
on CCTV camera) and the Institute of Information & Communications Tech-
nology Planning & Evaluation (IITP) grant funded by the Korea government,
Ministry of Science and ICT (MSIT) (No. 2016-0-00197, Development of the
high-precision natural 3D view generation technology using smart-car multi
sensors and deep learning).
H. Son, T. Kim, H. Lee and S. Lee are with the School of Elec-
trical and Electronic Engineering, Yonsei University, Seoul, South Korea.
Corresponding author: Sangyoun Lee
compression standards, such as JPEG2000 [2], H.264/AVC [3],
and High Efficiency Video Coding (HEVC) [4]. Recent video
coding standards [3], [4] have adopted prediction-based coding
methods to reduce the spatial and temporal redundancy of
input video. Prediction-based coding increases the complex-
ity of the compression algorithm but produces much better
compression performance.
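The core lossy step shared by these block-transform codecs can be sketched numerically. The following minimal NumPy example is an illustration only, not the standard's exact procedure: it applies an orthonormal 8×8 DCT-II and a flat uniform quantization step to a single level-shifted block, whereas JPEG uses per-frequency quantization tables followed by entropy coding.

```python
import numpy as np

def dct_matrix(n=8):
    # Orthonormal DCT-II basis, as used in JPEG-style 8x8 block transforms.
    k = np.arange(n)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)  # DC row scaling for orthonormality
    return c

C = dct_matrix()
rng = np.random.default_rng(1)
block = rng.integers(0, 256, size=(8, 8)).astype(float) - 128  # level-shifted pixels

coeffs = C @ block @ C.T              # 2-D separable transform
step = 16.0                           # flat step (JPEG uses per-frequency steps)
quantized = np.round(coeffs / step)   # rounding is the only lossy operation here
recon = C.T @ (quantized * step) @ C  # dequantize and inverse transform

print("per-pixel MSE:", np.mean((recon - block) ** 2))
```

Because the transform is orthonormal, the reconstruction error energy equals the coefficient quantization error energy, so the per-pixel MSE is bounded by (step/2)².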
On the other hand, compression frameworks with end-to-
end trainable deep neural networks [5]–[14] (learnable codecs
in this paper) have been proposed based on the rapid develop-
ment of deep learning. The approaches use trainable networks
to produce bitstreams and reconstruct the original image
(Fig. 1 (a)). Although these approaches jointly consider the
compression ratio and reconstruction quality by design, their
performance remains unsatisfactory, and their bitstreams are
incompatible with standard codecs, which limits their utility.
A straightforward way to improve compression performance
while remaining compatible with standard codecs is to restore
the image after decoding. Following the
developments of convolutional neural networks (CNN), such
as ResNet [15], DenseNet [16], and attention networks [17],
[18], the CNN-based image postprocessing algorithms [19]–
[28] have drastically improved the performance of image
restoration. These kinds of approaches are designated as
a postprocessing network (PPNet) in this paper. Although
PPNets perform well in reconstruction and are compatible
with standard codecs, they only improve the visual quality
of the reconstructed image and do not consider the
compression ratio (Fig. 1 (b)).
The preprocessing and postprocessing-based coding strate-
gies have been applied to consider both image quality and
compression ratio. These approaches feed a spatially down-
sampled image into the codec and upsample the decoded
output [29]–[31], or restore it with a PPNet [32], [33]. Generally, reducing
the spatial size of an image can increase the compression
ratio. However, the approaches only work at a low bit-rate
setting [29]–[31], and the predefined downsampling operations
degrade detailed information and increase the ratio of high-
frequency components, increasing the bitstream size.
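The fixed-resampling baseline above can be sketched as follows. This is a toy illustration under loud assumptions: the real codec is replaced by a uniform-quantization stand-in, and the predefined downsampler/upsampler are simple average pooling and nearest-neighbour repetition. A noise image (all high frequency) makes the loss of detail from predefined downsampling extreme.

```python
import numpy as np

def downsample(img, s=2):
    # Predefined (content-agnostic) downsampler: s x s average pooling.
    h, w = img.shape
    return img[:h - h % s, :w - w % s].reshape(h // s, s, w // s, s).mean(axis=(1, 3))

def upsample(img, s=2):
    # Predefined upsampler: nearest-neighbour repetition back to the input grid.
    return np.repeat(np.repeat(img, s, axis=0), s, axis=1)

def fake_codec(img, step=8.0):
    # Crude stand-in for JPEG/HEVC degradation: uniform quantization only.
    return np.round(img / step) * step

rng = np.random.default_rng(0)
x = rng.uniform(0, 255, size=(8, 8))  # high-frequency (noise-like) content

direct = fake_codec(x)                                # codec at full resolution
resampled = upsample(fake_codec(downsample(x)))       # downsample -> codec -> upsample

mse_direct = np.mean((x - direct) ** 2)
mse_resampled = np.mean((x - resampled) ** 2)
print("full-resolution MSE:", mse_direct)
print("resampled MSE:     ", mse_resampled)
```

Fewer samples enter the codec in the resampled pipeline (a higher compression ratio), but the predefined resamplers cannot recover the discarded high-frequency detail, which is the gap the content-adaptive approaches below try to close.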
The content-adaptive downsampling algorithm [34] and
learning-based downsampling algorithms [35]–[39] have been
proposed to overcome these problems. These algorithms train
the content-adaptive downsampling (called compact represen-
tation in [38], abbreviated to CRNet in this paper) oper-
ation using the reconstruction loss after the PPNet. These
approaches can achieve both a high compression ratio and
better reconstruction quality with two networks. However,
backward gradients from the loss function for the CRNet
arXiv:2009.14754v2 [eess.IV] 15 Dec 2021