Snapshot HDR Video Construction Using a Coded Mask A PREPRINT
limited dynamic range under 16 f-stops and are extremely expensive. To overcome this limitation, many computational
imaging techniques have been developed via co-designing the sensor architecture and post-processing algorithms for
HDR image acquisition Reinhard et al. [2010]. These methods can be categorized into three distinct approaches. The
most common way is to capture a sequence of low dynamic range (LDR) images with different exposures and fuse
them into an HDR image Debevec and Malik [1997], Mann et al. [1995]. Modern cameras and mobile devices can
easily afford successive image capture, making this method capable of producing decent HDR images for static scenes.
However, when either the scene is dynamic or the camera shakes during capture, the resulting images can suffer from
ghosting artifacts. The second approach is to utilize multiple sensors to simultaneously capture differently exposed LDR
images by, for example, splitting the light to multiple sensors with a beam-splitter McGuire et al. [2007], Tocci et al.
[2011], Kronander et al. [2013]. This sophisticated approach is expensive and needs additional rigorous calibration.
The third approach is to capture a single LDR image with a per-pixel or per-scanline coded exposure. Reconstruction
algorithms are applied later to create HDR images Nayar and Mitsunaga [2000], Nayar and Branzoi [2003], Serrano
et al. [2016]. This type of computational camera can be achieved by using per-pixel coded exposures in the sensor
architecture Kensei et al. [2014] or by mounting an optical mask onto an off-the-shelf camera sensor.
In Alghamdi et al. [2019], for easy implementation of a grayscale mask, we chose to place a random binary optical
mask at a short distance (typically 1-2 mm) in front of the sensor. Note that we did not optimize the distance, but
simply mounted our mask on the cover glass that is usually present in front of the sensor. Light propagation from the
mask to the sensor results in a blurred version of the binary mask. The actual statistics depend on both the mask and the
propagation distance. Figure 1 illustrates the effect of distance on the resulting optical mask. For HDR reconstruction
we introduced an algorithm built upon an inception network that decodes reliable HDR images from the raw noisy
coded Bayer data. We demonstrate both in simulation and using a prototype that the combination of hardware encoding
and software decoding leads to a simple, yet efficient, HDR image acquisition system.
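The optical encoding described above can be emulated in a few lines. The following sketch, with purely illustrative sizes and a simple box filter standing in for the actual optical blur (the real blur depends on the mask-to-sensor distance and diffraction), shows how a binary mask becomes a grayscale modulation pattern and how it codes the exposure per pixel:

```python
import numpy as np

rng = np.random.default_rng(0)

def box_blur(img, k=5):
    """Box filter used here as a crude stand-in for the optical blur
    caused by light propagating from the mask to the sensor."""
    pad = k // 2
    p = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img)
    for dy in range(k):
        for dx in range(k):
            out += p[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

# Hypothetical sizes and values, for illustration only.
H, W = 64, 64
binary_mask = rng.integers(0, 2, size=(H, W)).astype(np.float64)

# A larger kernel plays the role of a larger propagation distance.
optical_mask = box_blur(binary_mask, k=5)

# Per-pixel coded exposure: HDR radiance modulated by the grayscale
# mask, then clipped to the sensor's limited dynamic range.
scene = rng.uniform(0.0, 4.0, size=(H, W))
coded_ldr = np.clip(scene * optical_mask, 0.0, 1.0)
```

Because neighboring pixels receive different attenuation factors, a bright region saturates at some pixels but stays within range at others, which is the information the decoder exploits.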
In Alghamdi et al. [2021] we present a transfer learning framework for solving the HDR reconstruction part. Our
motivation comes from the fact that available HDR image datasets are small compared to the typical requirement for
training deep neural networks. In Alghamdi et al. [2019] we solved this issue by pre-training on a large simulated HDR
dataset. This pre-training is expensive in both memory and time; experimenting with different network structures would
require weeks of pre-training. In Alghamdi et al. [2021] we instead incorporate architectures pre-trained on a different
large-scale task and transfer them to our HDR reconstruction. This new approach reduces our processing time substantially.
Specifically, we propose an encoder-decoder framework that learns an initial estimate of the HDR image, as well
as useful image features. We then refine this estimate through residual learning Ronneberger et al. [2015]. Our final
network can be trained end-to-end. For the encoder, we use a VGG16 Simonyan and Zisserman [2014] network
pre-trained on ImageNet. With a few epochs of training on a small dataset, the network learns to reconstruct high-quality
results.
3D-CNNs have successfully been applied to high-level vision tasks for videos, such as action recognition and event
classification Ji et al. [2012], Tran et al. [2015]. The spatio-temporal feature extraction capability of 3D-CNNs was
demonstrated in Ji et al. [2012], Tran et al. [2015]. In Tran et al. [2015], the authors argued that 3D-CNNs provide an
adequate video descriptor, and a homogeneous architecture with small 3×3×3 convolution kernels in all layers is among
the best-performing architectures for 3D-CNNs. Moreover, the capabilities of 3D-CNNs in video enhancement, inpainting,
and super-resolution have been proven Lv et al. [2018], Kappeler et al. [2016], Wang et al. [2017], Wan [2019]. This
article will use a 3D-CNN to jointly perform demosaicking, denoising, and HDR video reconstruction from the coded
LDR video. To the best of our knowledge, there is no published work on the construction of HDR video from coded LDR
images that utilizes temporal information in the reconstruction process.
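The spatio-temporal mixing provided by a 3×3×3 kernel can be made concrete with a naive (loop-based, valid-mode) 3D convolution; the toy sizes below are arbitrary:

```python
import numpy as np

def conv3d(video, kernel):
    """Valid-mode 3D correlation over a (T, H, W) volume.

    Every output voxel mixes 3 consecutive frames and a 3x3 spatial
    neighborhood, which is what lets a 3D-CNN exploit temporal
    redundancy between adjacent coded frames.
    """
    t, h, w = kernel.shape
    T, H, W = video.shape
    out = np.zeros((T - t + 1, H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(out.shape[2]):
                out[i, j, k] = np.sum(video[i:i + t, j:j + h, k:k + w] * kernel)
    return out

video = np.random.rand(8, 16, 16)          # toy LDR video volume (T, H, W)
kernel = np.full((3, 3, 3), 1.0 / 27.0)    # 3x3x3 averaging kernel
features = conv3d(video, kernel)
print(features.shape)                       # (6, 14, 14)
```

In practice, a deep-learning framework's batched 3D convolution would replace these loops; the sketch only illustrates the receptive field.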
3 Methods
3.1 Imaging Model
In our HDR system, we propose placing an optical mask into the optical path in close proximity to the image sensor.
The propagation of light from the mask to the sensor leads to a grayscale modulation pattern on the captured image. In
a color camera, a Bayer Color Filter Array (CFA) samples the radiance into three color channels. The camera sensor
then converts the photons impinging on the image plane over a specific exposure time into electrons, and quantizes the
voltage values into digital numbers (DNs). Basically, the process of capturing coded LDR video can be mathematically
expressed as follows:
y_k = g(f(BΦx_k ∆t)), k = 1, 2, 3, … (1)
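An illustrative numpy sketch of the forward model in Eq. (1) follows. The concrete choices here are assumptions for demonstration only: an RGGB Bayer pattern for B, a uniform random grayscale mask for Φ, clipping for the sensor response f, and 8-bit rounding for the quantizer g:

```python
import numpy as np

rng = np.random.default_rng(1)
H, W = 8, 8

def bayer_sample(rgb):
    """B: RGGB Bayer CFA -- keep one color channel per pixel."""
    mosaic = np.zeros((H, W))
    mosaic[0::2, 0::2] = rgb[0::2, 0::2, 0]  # R
    mosaic[0::2, 1::2] = rgb[0::2, 1::2, 1]  # G
    mosaic[1::2, 0::2] = rgb[1::2, 0::2, 1]  # G
    mosaic[1::2, 1::2] = rgb[1::2, 1::2, 2]  # B
    return mosaic

phi = rng.uniform(0.1, 1.0, size=(H, W))            # Φ: grayscale optical mask
f = lambda e: np.clip(e, 0.0, 1.0)                  # f: sensor saturation
g = lambda v: np.round(v * 255).astype(np.uint8)    # g: 8-bit quantization (DNs)

x_k = rng.uniform(0.0, 8.0, size=(H, W, 3))         # HDR radiance of frame k
dt = 0.25                                           # exposure time ∆t

# y_k = g(f(B Φ x_k ∆t)): mask modulation, Bayer sampling,
# saturation, then quantization to digital numbers.
y_k = g(f(bayer_sample(phi[..., None] * x_k * dt)))
```

Each frame k of the coded LDR video is produced this way; the reconstruction network must invert all four operators jointly.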