A Practical Deep Learning-Based Acoustic Side Channel Attack on Keyboards
Joshua Harrison¹, Ehsan Toreini², and Maryam Mehrnezhad³
¹ Durham University, joshua.b.harrison@durham.ac.uk
² University of Surrey, e.toreini@surrey.ac.uk
³ Royal Holloway University of London, maryam.mehrnezhad@rhul.ac.uk
August 3, 2023
Abstract
With recent developments in deep learning, the ubiquity of microphones and
the rise in online services via personal devices, acoustic side channel
attacks present a greater threat to keyboards than ever. This paper presents
a practical implementation of a state-of-the-art deep learning model in order
to classify laptop keystrokes, using a smartphone’s integrated microphone.
When trained on keystrokes recorded by a nearby phone, the classifier
achieved an accuracy of 95%, the highest accuracy seen without the use of a
language model. When trained on keystrokes recorded using the
video-conferencing software Zoom, an accuracy of 93% was achieved, a new best
for the medium. Our results prove the practicality of these side channel
attacks via off-the-shelf equipment and algorithms. We discuss a series of
mitigation methods to protect users against this class of attacks.
Index terms— Acoustic side channel attack, Deep learning, User security and
privacy, Laptop keystroke attacks, Zoom-based acoustic attacks
arXiv:2308.01074v1 [cs.CR] 2 Aug 2023
1 Introduction
Side channel attacks (SCAs) involve the collection and interpretation of
signals emitted by a device [30]. Such attacks have been successfully
implemented utilising a number of emanation types, such as electromagnetic
(EM) waves [34], power consumption [17], mobile sensors [23, 22, 21], as well
as sound [4]. With such a wide range of available mediums, target devices
have been similarly varied, with compromised devices including printers [5],
the Enigma machine [32] and even Intel x86 processors [37]. It was found in
[34] that wireless keyboards produce detectable and readable EM emanations;
however, there exists a far more prevalent emanation that is both ubiquitous
and easier to detect: keystroke sounds [27]. The ubiquity of keyboard
acoustic emanations not only makes them a readily available attack vector,
but also prompts victims to underestimate (and therefore not try to hide)
their output. For example, when typing a password, people will regularly hide
their screen but will do little to obfuscate their keyboard’s sound. The lack
of concern regarding keyboard acoustics could be due to the relatively small
body of modern literature. While multiple papers have created models capable
of inferring the correct key from test data, these models are often trained
and tested on older, thicker, mechanical keyboards with far more pronounced
acoustics than modern ones, especially laptops.
While keyboard acoustics have become less pronounced over time, the
technology with which they can be accessed and processed has improved
dramatically. Examples include advancements in microphone technology, with
Voice over Internet Protocol (VoIP) calls [8] and smartwatches [20] being
used to collect keystroke recordings.
Deep Learning (DL) is a subfield of machine learning (ML), in which the model
consists of multiple layers of connected neurons. Despite being present in
the field of computing since the 1960s, DL saw a boom in research in the
2010s, benefiting from improvements in graphics processing technology and
resulting in huge advances in image recognition [18], the invention of
Generative Adversarial Networks [14] and the invention of transformers [33].
This trend of performance improvement continues, with the recent development
of the state-of-the-art CoAt Network for image recognition [9], which
combines more traditional convolutional models with transformers. This
improvement in DL performance coincides with an increase in access to DL
tools. Python packages such as PyTorch [26] provide free and near-universal
access to the tools required to run these models on most devices. With the
recent developments in the performance of (and access to) both microphones
and DL models, an acoustic attack on keyboards begins to look feasible, as
reiterated in recent research [6]. While recent papers have explored the
viability of acoustic side channel attacks (ASCAs) on laptop keyboards
[6, 8], the area remains under-explored considering that laptops make a prime
attack vector. Laptops are more transportable than desktop computers and
therefore more available in public areas where keyboard acoustics may be
overheard, such as libraries, coffee shops and study spaces. Moreover,
laptops are non-modular, meaning the same model will have the same keyboard
and hence similar keyboard emanations. This uniformity within laptops could
mean that, should a popular laptop prove susceptible to ASCA, a large portion
of the population could be at risk.
In the early 2000s, it was suggested that SCA evaluation be included in the
cryptographic algorithm evaluation carried out by many international
standards bodies, such as in the 3GPP security architecture [1]. However, due
to a lack of testable methods and practical tools, this important suggestion
never turned into practical standards and guidelines. There have been many
academic attempts, but none led to standardisation. For instance, in a NIST
report in 2011 [13], a testing methodology was proposed to assess whether a
cryptographic module utilising side channel analysis countermeasures can
provide resistance to these attacks commensurate with the desired security
level. In a recent report [2], the authors developed and compared
SCA-protected implementations of three finalists in the NIST lightweight
cryptography (LWC) standardisation process. While there is no specific
research dedicated to side channel attack standardisation, there have been
industrial attempts to rectify some of the known attacks. For instance, in
2018 Google proposed a new technique to mitigate the infamous Spectre class
of attacks. Similarly, Intel added hardware and firmware mitigations to
tackle the same range of side channel attacks. Some general guidelines have
also been developed. For instance, the NSA’s TEMPEST specification includes
acoustic emanations as a side channel, but there are limitations in how it
defines acoustic in its terminology. The FIPS 140-3 draft, by contrast, does
not include acoustic emanations as a side channel, despite the fact that they
have been used to extract RSA private keys from CPUs [12]. Despite these
efforts, there is no explicit standardisation work on ASC attacks. The W3C
specifications on sensors¹ (e.g., motion sensors on mobile devices) have a
dedicated section on security and privacy considerations which, among other
risks, lists keystroke monitoring as one of the possible threats enabled by
such sensors. These sensors have been shown to contribute to ASC attacks. The
mitigation strategies suggest a range of methods, though none of them
guarantees full protection.
In this paper, we present a practical, fully automated ASCA which deploys
cutting-edge deep learning models to improve the body of knowledge. We
address these research questions: (RQ1) Can we design and implement a fully
automated ASCA pipeline, including keystroke separation, feature extraction
and prediction? (RQ2) Can we deploy an accurate deep learning approach for
ASCA? (RQ3) Can we perform an accurate remote ASCA on VoIP communications,
considering the compression and information loss in the audio transmissions?
We contribute to the body of knowledge in a number of ways. (1) We propose a
novel technique to deploy deep learning models featuring self-attention
layers for an ASC attack on a keyboard for the first time. (2) We propose and
implement a practical deep learning-based acoustic side channel attack on
keyboards, using self-attention transformer layers. (3) We evaluate our
attack in real-world scenarios: laptop keystrokes recorded by an attacker’s
microphone in the same room (via a mobile device), and laptop keystrokes
recorded via a Zoom call. We perform experiments and run multiple
evaluations, and our results outperform those of previous work.
2 Related Work
While they remain a relatively under-explored topic of research, ASCAs are
not a new concept in the field of cybersecurity. Encryption devices have been
subject to emanation-based attacks since the 1950s, with British spies
utilising the acoustic emanations of Hagelin encryption devices (of very
similar design to Enigma) within the Egyptian embassy [35]. Additionally, the
earliest paper on emanation-based SCAs found by this review was written for
the United States’ National Security Agency (NSA) in 1972 [11]. This
governmental origin of ASCAs creates speculation that such an attack may
already be possible on modern devices, but remains classified. [4] notes that
classified documents produced under the NSA’s side channel specification
(TEMPEST) are known to discuss acoustic emanations. Additionally, the
partially declassified NSA document NACSIM 5000 [24] explicitly listed
acoustic emanations as a source of compromise in 1982.
Within the realm of public knowledge, ASCAs have seen varying success when
applied to modern keyboards, employing a similarly varied array of methods.
Surveying these methods, various observations may be made about the current
research landscape.
¹ w3.org/TR/generic-sensor/#mitigation-strategies
In the last decade, the number of microphones within acoustic range of
keyboards has increased and will likely continue to do so. In an attempt to
explore these attack vectors, recent research has been utilising alternate methods
of keystroke collection. As an example, in [38], the authors implemented an
attack utilising a number of off-the-shelf smartphones. These devices (as is the
case for a majority of modern phones) feature 2 distinct microphones at opposite
ends of the phone. When used together, recordings made by the collective
microphones provided sufficient time delay of arrival (TDoA) information to
triangulate keystroke position, achieving over 72.2% accuracy. [6] built upon this
research by implementing TDoA via a single smartphone in order to establish
distance to a target device, eventually achieving 91.52% keystroke accuracy
when used within a larger attack pipeline.
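As an illustration of the TDoA idea, the delay between two microphone
recordings of the same sound can be estimated from the peak of their
cross-correlation. The following is a minimal sketch and not the method of
[38] or [6]: the function name estimate_delay, the synthetic click, the
44.1 kHz sample rate and the 343 m/s speed of sound are all assumptions made
for the example.

```python
import numpy as np

def estimate_delay(mic_a, mic_b):
    """Return the lag (in samples) by which mic_b trails mic_a."""
    # Full cross-correlation; index 0 corresponds to lag -(len(mic_a) - 1).
    corr = np.correlate(mic_b, mic_a, mode="full")
    return int(np.argmax(corr)) - (len(mic_a) - 1)

# Synthetic example: the same click reaches microphone B 7 samples later.
rng = np.random.default_rng(1)
click = rng.standard_normal(50)
mic_a = np.zeros(400)
mic_b = np.zeros(400)
mic_a[100:150] = click
mic_b[107:157] = click

lag = estimate_delay(mic_a, mic_b)

# Convert the lag into a path-length difference (assumed rate and sound speed).
SAMPLE_RATE_HZ = 44100
SPEED_OF_SOUND_M_S = 343.0
path_difference_m = lag / SAMPLE_RATE_HZ * SPEED_OF_SOUND_M_S
```

With two or more such path differences, a source position can in principle
be constrained geometrically; a real attack would also have to cope with
noise, reverberation and the limited time resolution of the sample rate.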
Alongside smartphones, video conferencing applications have seen promising
results as an attack vector. Keystrokes intercepted from a VoIP call were used
in [3], achieving a keystroke accuracy of 74.3% and this success was echoed by
[8] which achieved a top-5 accuracy of 91.7% via simply calling a victim over
Skype. These successes mark the first ASCAs implemented without the need for
physical access to a victim’s vicinity and carry the implication that if a victim’s
microphone could be accessed covertly, a similar attack could be performed. The
same implication can be found with the use of smartwatches as an attack vector.
While it remains unlikely an attacker could covertly place their smartwatch
in a private location such as an office, compromising a victim’s smartwatch
could allow unbridled collection of acoustic keystroke information. Additionally,
smartwatches can uniquely access wrist motion, a concerning property which is
utilised by [20] to achieve 93.75% word recovery.
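Because the results above mix plain (top-1) accuracy with top-5 accuracy, it
may help to see how the two differ. The sketch below uses made-up classifier
scores for three keystrokes; the function name top_k_accuracy and all values
are hypothetical and are not taken from any of the cited papers.

```python
def top_k_accuracy(scores, labels, k=5):
    """Fraction of samples whose true label is among the k best-scoring classes."""
    hits = 0
    for sample_scores, true_label in zip(scores, labels):
        # Rank candidate keys from most to least likely for this keystroke.
        ranked = sorted(sample_scores, key=sample_scores.get, reverse=True)
        hits += true_label in ranked[:k]
    return hits / len(labels)

# Hypothetical per-key scores for three recorded keystrokes.
scores = [
    {"a": 0.5, "s": 0.3, "q": 0.2},
    {"a": 0.2, "s": 0.5, "q": 0.3},
    {"a": 0.6, "s": 0.3, "q": 0.1},
]
labels = ["a", "q", "q"]

top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
```

A top-k figure is never lower than the top-1 figure on the same data, so the
two kinds of result are not directly comparable.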
One approach that saw prominent usage in the 2000s but has become less
common in modern papers is the use of hidden Markov models (HMMs). An HMM (in
this context) is a model trained on a corpus of text in order to predict the
most likely word or character in the positions of a sequence. For example, if
a classifier output ‘Hwllo’, an HMM could be used to infer that ‘w’ was in
fact a falsely classified ‘e’. [39] presents an ASCA on keyboards in which
two HMMs are utilised: the first generating likely letters from a series of
classes and the second correcting the grammar and spelling of the first.
Similarly to [39], [5] used an HMM to correct the output of a classifier and
saw an increase from 72% to 95% accuracy when implemented. A difference
between the two studies, however, sheds light on a potential drawback to HMM
usage (and the possible reason for lack of recent popularity).
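The ‘Hwllo’ example can be reproduced with a toy first-order HMM decoded by
the Viterbi algorithm. This is a minimal sketch under stated assumptions, not
the two-HMM scheme of [39]: the seven-word corpus, the smoothing constant and
the uniform error model (the classifier is assumed right with probability
p_correct and otherwise equally likely to output any other key) are all
hypothetical.

```python
import math
from collections import defaultdict

def train_bigrams(words, alphabet, smoothing=0.01):
    """Estimate start and letter-transition log-probabilities from a word list."""
    start = defaultdict(float)
    trans = {a: defaultdict(float) for a in alphabet}
    for w in words:
        start[w[0]] += 1
        for prev, cur in zip(w, w[1:]):
            trans[prev][cur] += 1

    def normalise(counts):
        # Additive smoothing keeps unseen transitions possible but unlikely.
        total = sum(counts[c] + smoothing for c in alphabet)
        return {c: math.log((counts[c] + smoothing) / total) for c in alphabet}

    return normalise(start), {a: normalise(trans[a]) for a in alphabet}

def viterbi(observed, alphabet, start, trans, p_correct=0.7):
    """Most likely true key sequence given noisy per-key classifier outputs."""
    p_wrong = (1 - p_correct) / (len(alphabet) - 1)

    def emit(true, seen):
        return math.log(p_correct if true == seen else p_wrong)

    score = {c: start[c] + emit(c, observed[0]) for c in alphabet}
    back = []
    for seen in observed[1:]:
        new_score, pointers = {}, {}
        for cur in alphabet:
            best_prev = max(alphabet, key=lambda p: score[p] + trans[p][cur])
            new_score[cur] = score[best_prev] + trans[best_prev][cur] + emit(cur, seen)
            pointers[cur] = best_prev
        score = new_score
        back.append(pointers)
    # Follow the back-pointers from the best final state to recover the path.
    path = [max(alphabet, key=lambda c: score[c])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return "".join(reversed(path))

alphabet = "abcdefghijklmnopqrstuvwxyz"
corpus = ["hello", "help", "held", "hollow", "yellow", "well", "world"]
start, trans = train_bigrams(corpus, alphabet)
corrected = viterbi("hwllo", alphabet, start, trans)  # -> "hello"
```

The correction works here only because ‘h’ followed by ‘e’ is common in the
toy corpus while ‘h’ followed by ‘w’ never occurs; text that does not follow
the corpus statistics, such as a random password, would gain nothing from
this step.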
In much of the literature, neural networks are not perceived as very successful
models when conducting keystroke recognition. In [39], a neural network was
tested against a linear classifier and was deemed less accurate. Additionally, in
[16] a neural network was found to perform the worst out of all methods tested,
and it is noted that neither [39] nor [16] could reproduce the results achieved in
[4] through use of a neural network. [3] found that multiple methods performed
better than neural networks in testing, while [32] implemented a neural network
that performed third best out of all tested classifiers. A majority of these papers
give very little detail regarding the structure or size of the neural networks
implemented, making comparison between them difficult, but in none of these
cases was a neural network selected as the final model. Given that
transformers were only introduced in 2017 by Vaswani et al. [33], this paper
is the first use of neural networks featuring self-attention layers for an
ASC attack on a keyboard.
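For readers unfamiliar with the term, a self-attention layer lets every
position in a sequence weight every other position when computing its output.
The sketch below shows a single head of scaled dot-product self-attention as
defined in [33]; it is not the model used in this paper. The random input
(which could stand for six time frames of an audio feature map), the
dimensions and the weight matrices are all assumptions for illustration.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a sequence x."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # Similarity of every position with every other, scaled by sqrt(d_head).
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Numerically stable softmax over each row.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted mixture of all value vectors.
    return weights @ v, weights

rng = np.random.default_rng(0)
seq_len, d_model, d_head = 6, 8, 4
x = rng.standard_normal((seq_len, d_model))
w_q, w_k, w_v = (rng.standard_normal((d_model, d_head)) for _ in range(3))

out, weights = self_attention(x, w_q, w_k, w_v)
```

Each row of weights sums to one, so every output frame is a convex
combination of the value vectors of all frames. This global mixing is what
distinguishes attention from the local receptive field of a convolution.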
Alongside models, variety exists between studies with respect to target
devices. [4], the paper most commonly cited as the first ASCA targeting a
keyboard, was written in 2004 and attacked the high-profile plastic keyboards
typical of the time. Despite being such an early paper in the field, it also
found success in attacking an ATM keypad, a corded telephone and 2 keys from
a laptop keyboard. While [39] and [7] perform their experiments on keyboards
similar to those from [4], [16] investigates a more modern keyboard with a
slightly recessed design. The keycaps remain large and plastic, however, and
differ greatly from modern laptop keyboards. [16]’s authors do, however,
acknowledge that the testing of laptop keyboards may produce different
results, due to a lack of ‘release peak’ in the waveform.
Of the surveyed literature, [6] and [8] were the only 2 papers to feature
ASCAs on full laptop keyboards and are the most promising studies with
respect to real-world implementation. Both papers utilise two statistical
models in similar ways: the first to infer some information regarding the
victim’s environment and the second to classify keystrokes into letters. The
two papers differ in most other ways, however, with [8] gathering keystrokes
via Skype and the inbuilt microphone of the laptops, while [6] utilises a
mobile phone placed near the victim’s computer. Additionally, [8] uses k-NN
clustering and a Logistic Regression classifier while [6] utilises support
vector machines (SVMs). Despite their differences, both papers are notable
for their accuracy, with [6] achieving 91.2% in cross validation and 72.25%
when attacking unknown victims and keyboards. Meanwhile, [8] achieves a top-5
accuracy of 91.7% given knowledge of the victim’s typing style. [6]
implements its attack on 2 laptops, made by Alienware and Lenovo
respectively, and is notable for being the only study to feature membrane
keyboards. [8] presents a much more representative study of keyboards,
attacking 6 laptops, two of each model: MacBook Pro 13” 2014, Lenovo Thinkpad
E540 and Toshiba Tecra M2.
3 Attack Design
In this section, we discuss the overall design of our proposed ASC attack. Then,
we explain our proposed approach in data collection, feature extraction and our
model design.
3.1 Fully-automated On-site and Remote ASCA
In both sets of experiments (via phone and Zoom), 36 of the laptop’s keys
were used (0-9, a-z), with each key being pressed 25 times in a row, varying
in pressure and finger, and all 25 presses of a key recorded in a single
file.
Keystroke isolation: Once all presses were recorded, a function was
implemented with which individual keystrokes could be extracted. Keystroke
extraction is executed in a majority of recent literature [39, 7, 16, 6] via
a similar method: performing the fast Fourier transform on the recording and
summing the coefficients across frequencies to get ‘energy’. An energy
threshold is then defined and used to signify the presence of a keystroke.
The complete isolation
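The energy-thresholding method described above can be sketched as follows.
This is a generic illustration of the approach attributed to the prior
literature, not the isolation function implemented in this paper: the frame
length, hop size, threshold value and synthetic recording are all assumptions
chosen for the example.

```python
import numpy as np

def frame_energy(signal, frame_len=256, hop=128):
    """Sum of FFT magnitudes across frequencies for each frame of the signal."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    energy = np.empty(n_frames)
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len]
        energy[i] = np.abs(np.fft.rfft(frame)).sum()
    return energy

def find_keystrokes(signal, threshold, frame_len=256, hop=128):
    """Return the start sample of each run of frames whose energy exceeds threshold."""
    above = frame_energy(signal, frame_len, hop) > threshold
    # An onset is a frame above the threshold whose predecessor is below it.
    previous = np.concatenate(([False], above[:-1]))
    starts = np.flatnonzero(above & ~previous)
    return (starts * hop).tolist()

# Synthetic recording: faint noise plus two short bursts standing in for presses.
rng = np.random.default_rng(0)
audio = 0.001 * rng.standard_normal(8000)
burst = np.sin(2 * np.pi * 0.1 * np.arange(300))
audio[2000:2300] += burst
audio[6000:6300] += burst

onsets = find_keystrokes(audio, threshold=20.0)
```

Onsets are only resolved to the nearest frame, so a fixed-length window
around each onset would normally be cut out afterwards. A single fixed
threshold is also fragile when typing volume or background noise varies
between recordings.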