Summary: This paper proposes RLB-MI, a novel black-box model inversion attack based on reinforcement learning. The attack formulates the latent space search as a Markov Decision Process (MDP) to address the efficiency and accuracy limitations of existing black-box attacks, and uses the confidence scores that the target model assigns to generated images as rewards to guide a reinforcement learning agent in exploring latent vectors and reconstructing sensitive data. Experiments on various datasets demonstrate its effectiveness and superiority over prior work.
Intended audience: researchers concerned with privacy and security, and practitioners working on deep learning applications.
Use cases and goals: assessing whether machine learning models are at risk of leaking training data, improving the quality of model inversion reconstructions, and helping practitioners understand and harden model training pipelines. The paper reports extensive experiments under black-box conditions and compares several mainstream white-box and black-box attacks in terms of effectiveness, robustness, and practicality, demonstrating the advantages and potential of the proposed approach.
Reinforcement Learning-Based Black-Box Model Inversion Attacks
Gyojin Han Jaehyun Choi Haeil Lee Junmo Kim
School of Electrical Engineering, KAIST
{hangj0820, chlwogus, haeil.lee, junmo.kim}@kaist.ac.kr
Abstract
Model inversion attacks are a type of privacy attack that reconstructs private data used to train a machine learning model, solely by accessing the model. Recently, white-box model inversion attacks leveraging Generative Adversarial Networks (GANs) to distill knowledge from public datasets have been receiving great attention because of their excellent attack performance. On the other hand, current black-box model inversion attacks that utilize GANs suffer from issues such as being unable to guarantee the completion of the attack process within a predetermined number of query accesses, or to achieve the same level of performance as white-box attacks. To overcome these limitations, we propose a reinforcement learning-based black-box model inversion attack. We formulate the latent space search as a Markov Decision Process (MDP) problem and solve it with reinforcement learning. Our method utilizes the confidence scores of the generated images to provide rewards to an agent. Finally, the private data can be reconstructed using the latent vectors found by the agent trained in the MDP. The experimental results on various datasets and models demonstrate that our attack successfully recovers the private information of the target model by achieving state-of-the-art attack performance. We emphasize the importance of studies on privacy-preserving machine learning by proposing a more advanced black-box model inversion attack.
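To make the formulation above concrete, the following is a minimal illustrative sketch (not the authors' code) of how latent space search can be cast as an MDP whose reward is the black-box model's confidence score for the target class; `generator` and `target_model` are hypothetical callables standing in for a pre-trained GAN generator and the attacked classifier.

```python
import numpy as np

class LatentSearchEnv:
    """Latent space search viewed as an MDP: the state is the current
    latent vector, an action perturbs it, and the reward is the
    black-box target model's confidence score for the target class."""

    def __init__(self, generator, target_model, target_class, latent_dim=100):
        self.generator = generator        # z -> image (pre-trained GAN generator)
        self.target_model = target_model  # image -> confidence scores (black-box)
        self.target_class = target_class
        self.latent_dim = latent_dim
        self.z = np.random.randn(latent_dim)

    def reset(self):
        self.z = np.random.randn(self.latent_dim)
        return self.z.copy()

    def step(self, action):
        """One query access: apply the agent's latent perturbation and
        reward it with the target-class confidence of the generated image."""
        self.z = self.z + action
        image = self.generator(self.z)
        scores = self.target_model(image)   # soft labels only, no gradients
        reward = float(scores[self.target_class])
        return self.z.copy(), reward
```

An agent trained against such an environment converges toward latent vectors whose generated images the target model classifies as the target class with high confidence, which is the reconstruction described above.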
1. Introduction
With the rapid development of artificial intelligence, deep learning applications are emerging in various fields such as computer vision, healthcare, autonomous driving, and natural language processing. As the number of cases requiring private data to train deep learning models increases, concern about the leakage of private data, including sensitive personal information, is rising. In particular, studies on privacy attacks [21] show that personal information can be extracted from trained models by malicious users. One of the most representative privacy attacks on machine learning models is the model inversion attack, which reconstructs the training data of a target model with only access to the model. Model inversion attacks are divided into three categories, depending on the amount of information available about the target model: 1) white-box attacks, 2) black-box attacks, and 3) label-only attacks. White-box attacks can access all parameters of the model; black-box attacks can access soft inference results consisting of confidence scores; and label-only attacks can only access inference results in hard-label form.
White-box model inversion attacks [5, 25, 27] have succeeded in restoring high-quality private data, including personal information, by using Generative Adversarial Networks (GANs) [10]. First, they train the GANs on separate public data to learn a general prior for the private data. Then, benefiting from access to the parameters of the trained white-box models, they search for latent vectors that represent data of specific labels using gradient-based optimization methods. However, these methods cannot be applied to machine learning services such as Amazon Rekognition [1] where the parameters of the model are protected.
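As an illustration of what this parameter access enables, here is a minimal PyTorch-style sketch of gradient-based latent search; the `generator` and `target_model` modules are hypothetical stand-ins, and it is exactly this backpropagation through the target model that becomes impossible when its parameters are protected.

```python
import torch

def whitebox_latent_search(generator, target_model, target_class,
                           latent_dim=100, steps=1000, lr=0.02):
    """White-box latent search: gradients of the target-class score flow
    through the target model and the GAN generator back to z."""
    z = torch.randn(1, latent_dim, requires_grad=True)
    optimizer = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        image = generator(z)             # G(z) -> candidate image
        logits = target_model(image)     # needs white-box parameter access
        loss = -logits[0, target_class]  # maximize the target-class logit
        loss.backward()                  # unavailable in the black-box setting
        optimizer.step()
    return z.detach()
```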
To reconstruct private data from such services, studies on black-box and label-only model inversion attacks are required. Unlike the white-box attacks, these attacks need methods that can explore the latent space of the GANs without gradient-based optimization, since gradients of the target model are not available. The recently proposed Model Inversion for Deep Learning Network (MIRROR) [2] uses a genetic algorithm to search the latent space with confidence scores obtained from a black-box target model. In addition, the Boundary-Repelling Model Inversion attack (BREP-MI) [14] has achieved success in the label-only setting by using a decision-based zeroth-order optimization algorithm for latent space search.
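For reference, a genetic-algorithm latent search of the kind MIRROR relies on can be sketched roughly as follows. This is a generic GA using the target-class confidence as fitness, not MIRROR's exact operators, and every name here is illustrative.

```python
import numpy as np

def genetic_latent_search(generator, target_model, target_class,
                          latent_dim=100, pop_size=32, generations=200,
                          elite_frac=0.25, mutation_std=0.1):
    """Generic GA over GAN latent vectors: fitness is the black-box
    target model's confidence score for the target class."""
    population = np.random.randn(pop_size, latent_dim)
    n_elite = max(1, int(elite_frac * pop_size))
    for _ in range(generations):
        # One query access per candidate: score each generated image.
        fitness = np.array([
            target_model(generator(z))[target_class] for z in population
        ])
        elite = population[np.argsort(fitness)[-n_elite:]]
        # Crossover: average two random elite parents, then mutate.
        parents_a = elite[np.random.randint(n_elite, size=pop_size)]
        parents_b = elite[np.random.randint(n_elite, size=pop_size)]
        population = (parents_a + parents_b) / 2
        population += mutation_std * np.random.randn(pop_size, latent_dim)
    # Return the best latent vector from the final population.
    fitness = np.array([
        target_model(generator(z))[target_class] for z in population
    ])
    return population[np.argmax(fitness)]
```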
Despite these attempts, each method has a significant issue. BREP-MI starts the latent space search from the first latent vector that generates an image classified as the target class. There is no guarantee of how many query accesses will be required until this first latent vector is found by random sampling, and in the worst case, it may not be possible to start the search process at all for some target classes. In the case of MIRROR, it performs worse than the label-only attack, BREP-MI.