Deterministic Crash Recovery for NAND Flash Based Storage Systems
Chi Zhang†, Yi Wang†, Tianzheng Wang§, Renhai Chen†, Duo Liu‡, Zili Shao†
†Department of Computing, The Hong Kong Polytechnic University
§Department of Computer Science, University of Toronto
‡College of Computer Science, Chongqing University
ABSTRACT
NAND flash memory has long been the dominant storage medium in mobile devices. However, power failure may occur at any time and result in the loss of important data, so crash recovery is vitally important in NAND flash memory storage systems. Because the flash translation layer (FTL) manages flash memory directly through various metadata, the central problem of FTL crash recovery is how to efficiently and effectively maintain the consistency of FTL metadata and restore it after a system crash.
In this paper, we present DCR, a deterministic approach to crash recovery for NAND flash based storage systems. The basic idea is to exploit the determinism of the FTL and, during crash recovery, reproduce the events that happened between the last checkpoint and the crash point. Unlike existing approaches, which must scan the whole flash memory chip, DCR can recover the system more efficiently by checking only a limited number of blocks, determined by deterministic FTL operations. We have implemented DCR for a block-level FTL and compared it with a popular version-based scheme on an ARM11-based embedded evaluation board. Experimental results show that DCR greatly reduces recovery time and guarantees the consistency of FTL metadata after recovery.
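The contrast between scanning the whole chip and checking only a limited number of blocks can be illustrated with a toy model. The sketch below is not DCR itself. It assumes a hypothetical FTL that always allocates free blocks in a fixed FIFO order, that each block's spare area records a logical block number and a version number, and that no garbage collection or erase happens between the checkpoint and the crash. `ToyFlash`, `ToyFTL`, `Checkpoint`, and both recovery functions are illustrative names. Under these assumptions, a simplified version-based recovery must read every block, while a replay from the checkpoint reads blocks in allocation order and stops at the first one that is still erased.

```python
from collections import deque
from dataclasses import dataclass

ERASED = None  # toy model: the spare area of an erased block reads back as empty


class ToyFlash:
    """Hypothetical flash chip: each block's spare area holds (lbn, version) or ERASED."""

    def __init__(self, num_blocks):
        self.spare = [ERASED] * num_blocks
        self.reads = 0  # spare-area reads performed by a recovery routine

    def read_spare(self, pbn):
        self.reads += 1
        return self.spare[pbn]


@dataclass
class Checkpoint:
    mapping: dict     # logical block number -> physical block number
    free_order: list  # remaining free blocks, in the order they will be allocated


class ToyFTL:
    """Hypothetical FTL that always allocates free blocks in a fixed FIFO order."""

    def __init__(self, flash):
        self.flash = flash
        self.mapping = {}
        self.free = deque(range(len(flash.spare)))
        self.version = 0

    def write_block(self, lbn):
        pbn = self.free.popleft()  # deterministic choice of the next block
        self.flash.spare[pbn] = (lbn, self.version)
        self.version += 1
        self.mapping[lbn] = pbn    # RAM copy; lost on power failure

    def checkpoint(self):
        return Checkpoint(dict(self.mapping), list(self.free))


def recover_by_full_scan(flash):
    """Simplified version-based recovery: read every block, keep the newest copy."""
    newest = {}  # lbn -> (version, pbn)
    for pbn in range(len(flash.spare)):
        record = flash.read_spare(pbn)
        if record is not ERASED:
            lbn, version = record
            if lbn not in newest or version > newest[lbn][0]:
                newest[lbn] = (version, pbn)
    return {lbn: pbn for lbn, (_, pbn) in newest.items()}


def recover_by_replay(flash, ckpt):
    """Replay-style recovery: redo allocations from the checkpoint in the same order."""
    mapping = dict(ckpt.mapping)
    for pbn in ckpt.free_order:
        record = flash.read_spare(pbn)
        if record is ERASED:
            break  # allocation is deterministic, so no later free block was written
        lbn, _ = record
        mapping[lbn] = pbn
    return mapping


flash = ToyFlash(num_blocks=64)
ftl = ToyFTL(flash)
for lbn in [0, 1, 2, 0, 3]:
    ftl.write_block(lbn)
ckpt = ftl.checkpoint()
for lbn in [1, 4, 0]:  # writes after the checkpoint, then a simulated power failure
    ftl.write_block(lbn)

flash.reads = 0
scanned = recover_by_full_scan(flash)
full_reads = flash.reads

flash.reads = 0
replayed = recover_by_replay(flash, ckpt)
replay_reads = flash.reads

assert scanned == replayed == ftl.mapping
print(full_reads, replay_reads)  # prints "64 4"
```

Both routines rebuild the same mapping, but the full scan reads all 64 spare areas while the replay reads only 4 (the three blocks written after the checkpoint plus the first erased one). A real FTL must also account for garbage collection, erases, and partially programmed pages, which this toy model deliberately ignores.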
Categories and Subject Descriptors
D.4.2 [Operating Systems]: Storage Management—Secondary Storage; B.3.4 [Memory Structures]: Reliability, Testing, and Fault-Tolerance—Error-checking
General Terms
Design, Experimentation, Performance, Reliability
Keywords
NAND flash memory, reliability, crash recovery
1. INTRODUCTION
DAC ’14, June 01 - 05 2014, San Francisco, CA, USA.
Copyright 2014 ACM 978-1-4503-2730-5/14/06 ...$15.00.
NAND flash memory is widely used in embedded systems such as smartphones and tablets. As more and more functionality is integrated, mobile devices now often suffer sudden power failures caused by operating system crashes and limited battery capacity. Embedded systems running in harsh environments face the same problem. If a flash memory page stores important information (e.g., file system metadata or address mappings), corruption of that page is serious, as it may cause unintended changes in the functionality of the entire system [14]. Crash recovery is therefore an important component of these systems.
Unlike hard disks, flash memory has an “erase-before-write” limitation: a programmed page cannot be rewritten until the block containing it is erased. This limitation requires a special layer of system software, the flash translation layer (FTL), to emulate a block device interface for backward compatibility [4, 18], so that file systems and applications can use flash memory as if it were a hard disk. To manage flash memory, an FTL uses various metadata (such as address mapping tables), which directly determine whether data can be stored and accessed correctly. A crash recovery module must therefore restore the system to a consistent state by correctly manipulating FTL metadata.
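To make the role of FTL metadata concrete, the following is a minimal sketch of a block-level FTL, assuming a hypothetical 4-page block and a simple free list; `FlashBlock`, `BlockLevelFTL`, and their methods are illustrative names, not an interface from this paper. Because a programmed page cannot be overwritten in place, an update to such a page is redirected to a fresh block and the mapping table is changed, which is the kind of metadata update a crash recovery module has to get right.

```python
PAGES_PER_BLOCK = 4  # hypothetical block geometry, chosen small for illustration


class FlashBlock:
    """A physical block; a page is None while erased and programmable."""

    def __init__(self):
        self.pages = [None] * PAGES_PER_BLOCK

    def program(self, offset, data):
        # Erase-before-write: a programmed page cannot be overwritten in place.
        if self.pages[offset] is not None:
            raise ValueError("page already programmed; erase the block first")
        self.pages[offset] = data

    def erase(self):
        self.pages = [None] * PAGES_PER_BLOCK


class BlockLevelFTL:
    """Exposes read/write by logical page number on top of erase-before-write blocks."""

    def __init__(self, num_blocks):
        self.blocks = [FlashBlock() for _ in range(num_blocks)]
        self.free = list(range(num_blocks))
        self.table = {}  # FTL metadata: logical block number -> physical block number

    def _allocate(self):
        return self.free.pop(0)

    def write(self, lpn, data):
        lbn, offset = divmod(lpn, PAGES_PER_BLOCK)
        if lbn not in self.table:
            self.table[lbn] = self._allocate()
        pbn = self.table[lbn]
        old = self.blocks[pbn]
        if old.pages[offset] is None:
            old.program(offset, data)  # target page is still erased: write it directly
            return
        # Out-of-place update: copy valid pages plus the new data into a fresh block.
        new_pbn = self._allocate()
        for i, page in enumerate(old.pages):
            if i == offset:
                self.blocks[new_pbn].program(i, data)
            elif page is not None:
                self.blocks[new_pbn].program(i, page)
        self.table[lbn] = new_pbn  # the mapping change crash recovery must preserve
        old.erase()
        self.free.append(pbn)

    def read(self, lpn):
        lbn, offset = divmod(lpn, PAGES_PER_BLOCK)
        if lbn not in self.table:
            return None
        return self.blocks[self.table[lbn]].pages[offset]


ftl = BlockLevelFTL(num_blocks=8)
ftl.write(5, "a")  # logical page 5 = logical block 1, offset 1
ftl.write(5, "b")  # overwrite forces a copy into a new physical block
print(ftl.read(5), ftl.table)  # prints "b {1: 1}"
```

In this toy model the table exists only in memory. A real FTL persists its mapping periodically, so a power failure between programming the new block and persisting the updated table can leave the on-flash metadata pointing at a stale or erased block; repairing that inconsistency is the job of FTL crash recovery.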
This paper focuses on recovering FTL metadata rather than upper-level file system metadata. Crash recovery has been extensively studied in file systems: modern file systems such as Ext4 often keep a journal to recover from power failure. However, when used on top of an FTL, they rely on the underlying storage system to function correctly and to recover to a consistent state, so errors in FTL metadata can have disastrous consequences for file system recovery. Moreover, since crash recovery techniques in modern file systems normally target hard disks, these techniques cannot be applied to solve crash recovery problems in flash memory. Designing a correct and robust crash recovery mechanism for the FTL therefore affects not only the integrity of data in flash memory but also the functionality of the entire system.
Most existing approaches to FTL metadata crash recovery focus on improving recovery efficiency by avoiding scanning all pages in a flash memory block. In flash-specific file systems, version-based schemes [2, 15] are used to identify updated blocks and accelerate crash recovery. File-system-aware FTLs use a metadata filter and a summary page in each block to reduce the overhead of crash recovery [17]. There are also schemes targeting specific types of FTLs [8, 12, 13]. For example, some previous studies [16, 19] showed that in segment-based FTLs, a superblock is helpful in reducing the