
30 IEEE TRANSACTIONS ON BIOMETRICS, BEHAVIOR, AND IDENTITY SCIENCE, VOL. 6, NO. 1, JANUARY 2024

AFR-Net: Attention-Driven Fingerprint Recognition Network

Steven A. Grosz and Anil K. Jain, Life Fellow, IEEE

Abstract—The use of vision transformers (ViT) in computer vision is increasing due to their limited inductive biases (e.g., locality, weight sharing, etc.) and increased scalability compared to other deep learning models. This has led to some initial studies on the use of ViT for biometric recognition, including fingerprint recognition. In this work, we improve on these initial studies by i.) evaluating additional attention-based architectures, ii.) scaling to larger and more diverse training and evaluation datasets, and iii.) combining the complementary representations of attention-based and CNN-based embeddings for improved state-of-the-art (SOTA) fingerprint recognition (both authentication and identification). Our combined architecture, AFR-Net (Attention-Driven Fingerprint Recognition Network), outperforms several baseline models, including a SOTA commercial fingerprint system by Neurotechnology, Verifinger v12.3, across intra-sensor, cross-sensor, and latent to rolled fingerprint matching datasets. Additionally, we propose a realignment strategy using local embeddings extracted from intermediate feature maps within the networks to refine the global embeddings in low certainty situations, which boosts the overall recognition accuracy significantly. This realignment strategy requires no additional training and can be applied as a wrapper to any existing deep learning network (including attention-based, CNN-based, or both) to boost its performance in a variety of computer vision tasks.

Index Terms—Fingerprint embeddings, fingerprint recognition, attention, vision transformers, fixed-length fingerprint representations, cross-sensor fingerprint recognition, sensor interoperability, universal representation.

I. INTRODUCTION

AUTOMATED fingerprint recognition systems have continued to permeate many facets of everyday life, appearing in many civilian and governmental applications over the last several decades [1]. As an example, India's Aadhaar civil registration system is used to authenticate approximately 70 million transactions per day, primarily with fingerprints. Due to the impressive accuracy of fingerprint recognition algorithms (0.626% False Non-Match Rate at a False Match Rate of 0.01% on the FVC-ongoing 1:1 hard benchmark [2]), researchers have turned their attention to addressing difficult edge-cases where accurate recognition remains challenging, such as partial overlap between two candidate fingerprint images and cross-sensor interoperability (e.g., optical to capacitive, contact to contactless, latent to rolled fingerprints, etc.), as well as other practical problems like template encryption, privacy concerns, and matching latency for large-scale (gallery sizes on the order of tens or hundreds of millions) identification.

Manuscript received 2 May 2023; accepted 15 September 2023. Date of publication 19 September 2023; date of current version 8 March 2024. This article was recommended for publication by Associate Editor S. Schuckers upon evaluation of the reviewers' comments. (Corresponding author: Steven A. Grosz.)

The authors are with the Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824 USA (e-mail: groszste@cse.msu.edu; jain@cse.msu.edu).

Digital Object Identifier 10.1109/TBIOM.2023.3317303

https://uidai.gov.in/aadhaar_dashboard/auth_trend.php
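The operating point quoted in the introduction (False Non-Match Rate at a fixed False Match Rate) can be computed from raw comparison scores. The following is a minimal sketch, assuming similarity scores where a higher value means a more likely match; the function name and the toy scores are hypothetical and are not taken from the paper or from the FVC-ongoing protocol.

```python
def fnmr_at_fmr(genuine, impostor, target_fmr):
    """Return (threshold, fnmr) at the lowest threshold whose FMR <= target_fmr.

    A comparison is accepted when score >= threshold.
    FMR  = fraction of impostor scores that are accepted.
    FNMR = fraction of genuine scores that are rejected.
    """
    # Candidate thresholds: every observed score, plus one value above the
    # maximum so that an FMR of exactly zero is always reachable.
    candidates = sorted(set(genuine) | set(impostor))
    candidates.append(candidates[-1] + 1.0)

    # FMR never increases as the threshold rises, so the first threshold that
    # satisfies the target also gives the lowest achievable FNMR.
    for threshold in candidates:
        fmr = sum(s >= threshold for s in impostor) / len(impostor)
        if fmr <= target_fmr:
            fnmr = sum(s < threshold for s in genuine) / len(genuine)
            return threshold, fnmr
    raise ValueError("no threshold satisfies the requested FMR")


# Toy scores (hypothetical), four genuine and four impostor comparisons.
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.1, 0.2, 0.3, 0.5]
print(fnmr_at_fmr(genuine_scores, impostor_scores, target_fmr=0.25))
```

With real benchmarks the impostor set must be large enough to resolve the target rate: measuring an FMR of 0.01% needs at least 10,000 impostor comparisons.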

For many reasons, some of which are mentioned above (e.g., template encryption and latency), methods for extracting fixed-length fingerprint embeddings using various deep learning approaches have been proposed. Some of these methods were proposed for specific fingerprint-related tasks, such as minutiae extraction [3], [4] and fingerprint indexing [5], [6], whereas others were aimed at extracting a single "global" embedding [7], [8], [9]. Of these methods, the most common architecture employed is the convolutional neural network (CNN), often utilizing domain knowledge (e.g., minutiae [8]) and other tricks (e.g., specific loss functions, such as triplet loss [10]) to improve fingerprint recognition accuracy. More recently, motivated by the success of attention-based Transformers [11] in natural language processing, the computer vision field has seen an influx of the use of the vision transformer (ViT) architecture for various computer vision tasks [12], [13], [14], [15].

In fact, two studies have already explored the use of ViT for learning discriminative fingerprint embeddings [16], [17], albeit with the following limitations: i.) the authors of [16] supervised their ViT model using a pretrained CNN as a teacher model and thus did not give the transformer architecture the freedom to learn its own representation and ii.) the authors of [17] were limited in the data and choice of loss function used to supervise their transformer model, thereby limiting the fingerprint recognition accuracy compared to the baseline ResNet50 model. Nonetheless, the authors in [17] did note the complementary nature between the features learned by the CNN-based ResNet50 model and the attention-based ViT model. This motivated us to evaluate additional attention-based models that bridge the gap between purely CNN and purely attention-based models, in order to leverage the benefits of each. Toward this end, we evaluate two ViT variants (vanilla ViT [12] and Swin [15]) along with two variants of a CNN model [18] (ResNet50 and ResNet101) for fingerprint recognition. In addition, we propose our own architecture, AFR-Net (Attention-Driven Fingerprint Recognition Network), consisting of a shared feature extractor and parallel CNN and attention classification layers.

2637-6407 © 2023 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.


Fig. 1. Example correspondence between local features extracted from the intermediate feature maps of our AFR-Net model for two images of the same finger. Note, these local features are not necessarily the same as minutiae points, which are commonly used in fingerprint recognition.

Even though these models are trained to extract a single, global embedding representing the identity of a given fingerprint image, we make the observation that for both CNN-based and attention-based models, the intermediate feature maps encode local features that are also useful for relating two candidate fingerprint images. Correspondence between these local features can be used to guide the network in placing attention on overlapping regions of the images in order to make a more accurate determination of whether the images are from the same finger. Additionally, these local features are useful in explaining the similarity between two candidate images by directly visualizing the corresponding keypoints, as shown in Figure 1.
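One way to picture a correspondence between local features is to treat every spatial location of an intermediate feature map as a descriptor vector and pair up descriptors across the two images. The sketch below uses mutual nearest-neighbour matching under cosine similarity. This matching rule, the `min_sim` cut-off, and all function names are assumptions made for illustration; this section does not specify how AFR-Net establishes its correspondences.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mutual_matches(desc_a, desc_b, min_sim=0.5):
    """Return (i, j, similarity) for descriptors that pick each other.

    desc_a, desc_b: lists of local descriptors, one per feature-map location.
    A pair (i, j) is kept only when j is the best match for i, i is the best
    match for j, and their similarity reaches min_sim (hypothetical cut-off).
    """
    sims = [[cosine(a, b) for b in desc_b] for a in desc_a]
    best_for_a = [max(range(len(desc_b)), key=lambda j: row[j]) for row in sims]
    best_for_b = [
        max(range(len(desc_a)), key=lambda i: sims[i][j])
        for j in range(len(desc_b))
    ]
    return [
        (i, j, sims[i][j])
        for i, j in enumerate(best_for_a)
        if best_for_b[j] == i and sims[i][j] >= min_sim
    ]


# Toy descriptors (hypothetical): three locations in image A, two in image B.
image_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
image_b = [[0.0, 2.0], [3.0, 0.0]]
for i, j, sim in mutual_matches(image_a, image_b):
    print(f"A[{i}] <-> B[{j}] (cosine similarity {sim:.2f})")
```

The mutual check discards ambiguous descriptors such as the third one in image A, which is equally similar to both candidates. The surviving pairs are the kind of keypoint correspondences that can be drawn as in Figure 1.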

One remaining concern with regard to deep learning-based fingerprint matchers is their generalization across different fingerprint sensing technologies (e.g., optical, capacitive, etc.), fingerprint readers (e.g., CrossMatch, GreenBit, etc.) and fingerprint impression types (e.g., rolled, plain, contactless, etc.). This problem is often referred to as sensor interoperability, which has received some attention in recent years [19], [20], [21], [22]. In this paper, we demonstrate the generalizability of our learned representations via extensive experiments across a wide range of fingerprint sensors and types. As we show in the ablation study in Section IV-E, much of the challenge of sensor interoperability is mitigated by training on a large, diverse training dataset; however, additional performance gains are achieved by incorporating both of the complementary CNN and attention-based features into our network.

More concisely, the contributions of this research are as follows:

• Analysis of various attention-based architectures for fingerprint recognition.
• Novel architecture for fingerprint recognition, AFR-Net, which incorporates attention layers into the ResNet architecture.
• State-of-the-art (SOTA) fingerprint recognition performance (authentication and identification) across several diverse benchmark datasets, including intra-sensor, cross-sensor, contact to contactless, and latent to rolled fingerprint matching.
• Novel use of local embeddings extracted from intermediate feature maps to improve both the recognition accuracy and the explainability of the model.
• Ablation analysis demonstrating the importance of each aspect of our model, including choice of loss function, training dataset size, use of spatial alignment module, use of both classification heads, and use of local embeddings to refine the global embeddings.

II. RELATED WORK

Here we briefly discuss the prior literature in deep learning-based fingerprint recognition and the use of vision transformer models for computer vision. For a more in-depth discussion on these topics, refer to one of the many survey papers available (e.g., [23] for deep learning in biometrics and [24] for the use of transformers in vision).

A. Deep Learning for Fingerprint Recognition

Over the last decade, deep learning has seen a plethora of applications in fingerprint recognition, including minutiae extraction [3], [4], fingerprint indexing [5], [6], presentation attack detection [25], [26], [27], [28], synthetic fingerprint generation [29], [30], [31], [32], and fixed-length fingerprint embeddings for recognition [7], [8], [9]. For purposes of this paper, we limit our discussion to fixed-length (global) embeddings for fingerprint recognition.

One of the first approaches to extracting global fingerprint embeddings using deep learning was proposed by Li et al. [7], who used a fully convolutional neural network to produce a final embedding of 256 dimensions. The authors of [8] then showed improved performance of their fixed-length embedding network by incorporating minutiae domain knowledge as an additional supervision. Similarly, Lin and Kumar incorporated additional fingerprint domain knowledge (minutiae and core point regions) into a multi-Siamese CNN for contact to contactless fingerprint matching [9]. More recently, [16] and [17] proposed the use of the vision transformer architecture for extracting discriminative fixed-length fingerprint embeddings, both showing that incorporating minutiae domain knowledge into ViT improved the performance.

B. Vision Transformers for Biometric Recognition

Transformers have led to numerous applications across the computer vision field in the past couple of years since they were first introduced for computer vision applications by Dosovitskiy et al. in 2021 [12]. The general principle of transformers for computer vision is the use of the attention mechanism for aggregating sets of features across the entire image or within local neighborhoods of the image. The notion of attention was originally introduced in 2015 for sequence modeling by Bahdanau et al. [33] and has been shown to be a useful mechanism in general for operations on a set of features. Today, numerous variants of ViT have been proposed for a wide range of computer vision tasks, including image recognition, generative modeling, multi-modal tasks, video processing, low-level vision, etc. [24].
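As a concrete illustration of attention aggregating a set of features, the sketch below implements the scaled dot-product form used in Transformers [11]: each query scores every key, the scores are normalized with a softmax, and the output is the resulting weighted average of the values. It is a plain-Python sketch for a single head with no learned projections, so it shows the aggregation step only; it is not the implementation used in any of the models discussed here.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.

    queries: list of d-dimensional vectors
    keys:    list of d-dimensional vectors, one per input feature
    values:  list of vectors, one per input feature
    """
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors, one output channel at a time.
        outputs.append([
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ])
    return outputs


# Two identical keys receive equal weight, so the output is the mean value.
print(attention([[1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]], [[0.0, 2.0], [4.0, 6.0]]))
```

Restricting the keys and values to those inside a local window gives the "local neighborhood" variant mentioned above; using all of them gives global attention across the image.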

Some recent works have explored the use of transformers for biometric recognition across several modalities including face [34], finger vein [35], fingerprint [16], [17], ear [36], gait [37], and keystroke recognition [38]. In this work, we improve upon these previous uses of transformers by evaluating additional attention-based architectures for extracting global fingerprint embeddings.

Fig. 2. Overview of the AFR-Net architecture. First, input fingerprint images are passed through a spatial alignment module for better alignment of two fingerprints under comparison, then passed through a shared feature extractor, followed by two classification heads (one CNN-based and the other attention-based). For our implementation, we followed the ResNet50 architecture as our backbone and CNN classification head and used 12 multi-headed attention transformer encoder blocks for the attention-based classification head.

III. AFR-NET: ATTENTION-DRIVEN FINGERPRINT RECOGNITION MODEL

Our approach consists of i.) investigating several baseline CNN and attention-based models for fingerprint recognition, ii.) fusing a CNN-based architecture with attention into a single model to leverage the complementary representations of each, iii.) a strategy to use intermediate local feature maps to refine global embeddings and reduce uncertainty in challenging pairwise fingerprint comparisons, and iv.) use of a spatial alignment module to improve recognition performance. Details of each component of our approach are given in the following sections.

A. Baseline Methods

First, we improve on the initial studies [16], [17] applying ViT to fingerprint recognition to better establish a fair baseline performance of ViT compared to CNN-based models. This is accomplished by removing the limitations of the previous studies in terms of choice of supervision and size of training dataset used to learn the parameters of the models. We then compare ViT with two variants of the ResNet CNN-based architecture, ResNet50 and ResNet101. For our specific choice of ViT, we decided on the small version with a patch size of 16, 6 attention heads, and a layer depth of 12. We selected this architecture as it presents an adequate trade-off in speed and accuracy compared to other ViT variants. In addition, we compare the performance of a popular ViT successor, Swin, which uses a hierarchical structure and shifted windows for computing attention within local regions of the image. Specifically, we used the small Swin architecture with a patch size of 4, a window size of 7, and an embedding dimension of 96.
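The configuration values above determine how many tokens each model attends over. The sketch below works through that arithmetic. The 224 x 224 input size and the ViT embedding width of 384 are assumptions (this section states neither for the baselines, although 384 is the usual width of ViT-small); the dictionary and function names are hypothetical.

```python
# Hyperparameters quoted in the text; "embed_dim" for ViT is an assumption.
VIT_SMALL = {"patch_size": 16, "num_heads": 6, "depth": 12, "embed_dim": 384}
SWIN_SMALL = {"patch_size": 4, "window_size": 7, "embed_dim": 96}

IMAGE_SIZE = 224  # assumed square input resolution, not given in this section


def patch_grid(image_size, patch_size):
    """Number of patches along one side of a square image."""
    if image_size % patch_size:
        raise ValueError("image size must be divisible by the patch size")
    return image_size // patch_size


def vit_token_count(image_size, cfg):
    """Patch tokens seen by every ViT layer (class token not counted)."""
    return patch_grid(image_size, cfg["patch_size"]) ** 2


def swin_stage1_layout(image_size, cfg):
    """(number of windows, tokens per window) in the first Swin stage."""
    side = patch_grid(image_size, cfg["patch_size"])
    if side % cfg["window_size"]:
        raise ValueError("patch grid must be divisible by the window size")
    return (side // cfg["window_size"]) ** 2, cfg["window_size"] ** 2


tokens = vit_token_count(IMAGE_SIZE, VIT_SMALL)
windows, per_window = swin_stage1_layout(IMAGE_SIZE, SWIN_SMALL)
print("ViT patch tokens:", tokens)
print("ViT dims per head:", VIT_SMALL["embed_dim"] // VIT_SMALL["num_heads"])
print("Swin stage-1 windows:", windows, "tokens per window:", per_window)
```

Under these assumptions ViT attends globally over a 14 x 14 grid of patches, while the first Swin stage splits a finer 56 x 56 grid into 7 x 7 windows and computes attention only within each window.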

For additional baseline comparisons with previous methods, we included the latest version of the commercial-off-the-shelf (COTS) fingerprint recognition system from Neurotechnology, Verifinger v12.3, and DeepPrint [8], a fingerprint recognition network based on an Inceptionv4 backbone that incorporates fingerprint domain knowledge into the learning framework. According to the FVC-ongoing competition, Verifinger is the top performing algorithm for the 1:1 fingerprint verification benchmark [2], and DeepPrint has also shown competitive performance with Verifinger on some benchmark datasets [8].

B. Proposed AFR-Net Architecture

Based on previous research suggesting the complementary nature of ViT and ResNet embeddings, we were motivated to merge the two into a single architecture, referred to as AFR-Net. As shown in Figure 2, AFR-Net consists of a spatial alignment module, a shared CNN feature encoder, a CNN classification head, and an attention classification head. The shared alignment module and feature encoder greatly reduce the number of parameters compared to the fusion of the two separate networks and also allow the two classification heads to be trained jointly.

Due to the two classification heads, we have two bottleneck classification layers which map each of the 384-d embeddings, Z_c and Z_a, into a softmax output representing the probability of a sample belonging to one of N classes (identities) in our training dataset. We employ the Additive Angular Margin (ArcFace) loss function to encourage intra-class compactness and inter-class discrepancy of the embeddings of each branch [39]. Through an ablation study, presented in Section IV-E, we find that despite the relatively little use of this loss function in previous fingerprint recognition papers [17], [40], the ArcFace loss function makes an enormous difference in the performance of our model.
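To make the role of the angular margin concrete, the following is a minimal single-sample sketch of the ArcFace loss [39]. The embedding and each class weight vector are L2-normalized, so their dot product is the cosine of the angle between them; the margin is added to the angle of the true class before a scaled softmax cross-entropy is taken. The scale of 64 and margin of 0.5 are defaults commonly used with ArcFace and are assumptions here, since this section does not state the values used for AFR-Net. The sketch also omits the special handling that practical implementations apply when the angle plus the margin exceeds pi.

```python
import math


def _normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def arcface_loss(embedding, class_weights, label, scale=64.0, margin=0.5):
    """ArcFace loss for one sample (scale and margin values are assumed).

    embedding:     feature vector from one classification head
    class_weights: one weight vector per training identity
    label:         index of the true identity
    """
    e = _normalize(embedding)
    logits = []
    for j, w in enumerate(class_weights):
        cos_theta = sum(a * b for a, b in zip(e, _normalize(w)))
        cos_theta = max(-1.0, min(1.0, cos_theta))  # guard acos against rounding
        if j == label:
            # Penalize the true class by widening its angle by the margin.
            cos_theta = math.cos(math.acos(cos_theta) + margin)
        logits.append(scale * cos_theta)
    # Cross-entropy with a numerically stable log-sum-exp.
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[label]


# Toy example (hypothetical): a 2-d embedding and two identities.
sample = [1.0, 0.2]
weights = [[1.0, 0.0], [0.0, 1.0]]
print(arcface_loss(sample, weights, label=0, scale=4.0, margin=0.0))
print(arcface_loss(sample, weights, label=0, scale=4.0, margin=0.5))
```

The second call returns a larger loss than the first for the same correctly classified sample. The margin therefore keeps pushing an embedding closer to its own class center even after it is already on the right side of the decision boundary, which is the intra-class compactness described above. In AFR-Net one such loss would be attached to each of the two heads.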

https://neurotechnology.com/verifinger.html
