New Flash memory technology!

block reaches the rated lifetime of its chip. Figure 1 shows the chips' rated lifetimes as well as the bit error rate (BER) measured at that lifetime. The chips' lifetimes decrease slowly with feature size, but fall precipitously across SLC, MLC and TLC devices. While the error rates span a broad range, there is a clear upward trend as feature size shrinks and densities increase. Applications that require more reliable or longer-term storage prefer SLC chips and those at larger feature sizes, because they experience far fewer errors over many more cycles than denser technology.

Figure 3: Architecture of SSD-CDC. The architecture of our baseline SSD (a controller driving multiple channels, each with several flash dies). This structure remains constant while we scale the technology used for each flash die.

Table 1:

  Architecture Parameter     Value
  Example Interface          PCIe 1.1 x4
  FTL Overhead Latency       30 µs
  Channels                   24
  Channel Speed              400 MB/s [1]
  Dies per Channel (DPC)     4

  Baseline Parameter         Value
  SSD Price                  $7,800
  Capacity                   320 GB
  Feature Size               34 nm
  Cell Type                  MLC

Theory and empirical evidence also indicate lower performance for denser chips, primarily for the program, or write, operation. Very early flash memory would apply a steady, high voltage to any cell being programmed for a fixed amount of time. However, Suh et al. [10] quickly determined that Incremental Step Pulse Programming (ISPP) would be far more effective in tolerating variation between cells and in environmental conditions. ISPP performs a series of program pulses, each followed by a read-verify step. Once the cell is programmed correctly, programming for that cell stops.
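The program-and-verify loop that ISPP performs can be sketched as follows. The starting voltage, step size, and cell response model below are illustrative constants, not values from any real chip:

```python
# Illustrative sketch of Incremental Step Pulse Programming (ISPP).
# All voltages and the cell response model are hypothetical.

def ispp_program(target_vth, start_v=1.0, step_v=0.2, max_pulses=50):
    """Apply program pulses of increasing amplitude, each followed by a
    read-verify step; stop as soon as the cell verifies at target_vth."""
    cell_vth = 0.0
    pulse_v = start_v
    for pulse in range(1, max_pulses + 1):
        cell_vth += 0.1 * pulse_v        # toy model: each pulse nudges VTH upward
        if cell_vth >= target_vth:       # read-verify succeeded
            return pulse, cell_vth
        pulse_v += step_v                # next pulse is one step stronger
    raise RuntimeError("cell failed to program within max_pulses")

pulses, vth = ispp_program(target_vth=2.0)
```

Because each pulse is followed by a verify, a cell stops being programmed the moment it reaches its target, which is what lets ISPP tolerate cell-to-cell variation without overshooting.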
This algorithm is necessary because programming is a one-way operation: there is no way to "unprogram" a cell short of erasing the entire block, and overshooting the correct voltage results in storing the wrong value. ISPP remains a key algorithm in modern chips and is instrumental in improving the performance and reliability of higher-density cells.

Table 1: Architecture and Baseline Configuration of SSD-CDC. These parameters define the Enterprise-class Constant Die Count SSD (SSD-CDC) architecture and the starting values for the flash technology it contains.

Not long after Samsung proposed MLC for NAND flash [5, 6], Toshiba split the two bits into separate pages so that the chip could program each page more quickly by moving the cell only halfway through the voltage range with each operation [11]. Much later, Samsung provided further performance improvements to pages stored in the least significant bit of each cell [8], applying fast, imprecise pulses to program the fast pages and using fine-grain, precise pulses to program the slow pages. These latter pulses generate the tight VTH distributions that MLC devices require, but they make programming much slower. All the MLC and TLC devices we tested split and program the bits in a cell this way.

For SSD designers, this performance variability between pages leads to an opportunity to easily trade off capacity and performance [4, 9]. The SSD can, for example, use only the fast pages in MLC parts, sacrificing half their capacity but making latency comparable to SLC. In this work, we label such a configuration "MLC-1": an MLC device using just one bit per cell. Samsung and Micron have formalized this trade-off in multi-level flash by providing single- and multi-level cell modes [7] in the same chip, and we believe FusionIO uses this property in the controller of their SMLC-based drives.

3 A Prototypical SSD

To model the effect of evolving flash characteristics on complete SSDs, we combine empirical measurements of flash chips with an SSD architecture with a constant die count, called SSD-CDC. SSD-CDC's architecture is representative of high-end SSDs from companies such as FusionIO, OCZ and Virident. We model the complexities of FTL design by assuming optimistic constants and overheads that provide upper bounds on the performance characteristics of SSDs built with future-generation flash technology.

Section 3.1 describes the architecture of SSD-CDC, while Section 3.2 describes how we combine this model with our empirical data to estimate the performance of an SSD with fixed die area.

3.1 SSD-CDC

Table 1 describes the parameters of SSD-CDC's architecture and Figure 3 shows a block representation of its architecture. SSD-CDC manages an array of flash chips and presents a block-based interface. Given current trends in PCIe interface performance, we assume that the PCIe link is not a bottleneck for our design.

Figure 4: Flash Chip Latency Trends. Fitting an exponential to the collection of data for each cell technology (SLC-1, MLC-1, MLC-2 and TLC-3) allows us to project the behavior of future feature sizes for (a) read latency and (b) write latency.
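For reference, the baseline parameters from Table 1 can be collected in one place. The field names below are our own shorthand; only the values come from the table:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SSDCDCBaseline:
    """Baseline SSD-CDC configuration (values from Table 1).
    Field names are illustrative, not from the original model."""
    interface: str = "PCIe 1.1 x4"
    ftl_overhead_us: float = 30.0     # FTL + ECC latency overhead
    channels: int = 24
    channel_speed_mb_s: int = 400
    dies_per_channel: int = 4
    price_usd: int = 7800
    capacity_gb: int = 320
    feature_size_nm: float = 34.0
    cell_type: str = "MLC"            # 2 bits per cell

    @property
    def die_count(self) -> int:
        # 24 channels x 4 dies per channel = 96 dies, held constant
        return self.channels * self.dies_per_channel

baseline = SSDCDCBaseline()
```

Holding `die_count` fixed at 96 is what makes SSD-CDC a constant-cost design point as the flash technology underneath it changes.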
Figure 4 (continued): Doing the same with one standard deviation above and below the average for each chip yields a range of probable behavior, as shown by the error bars.

Table 2: Latency Projections. We generated these equations by fitting an exponential (y = a e^(b f)) to our empirical data; they allow us to project the latency of flash as a function of feature size (f) in nm, with max, average, and min fits for read and write latency in each configuration (SLC-1, MLC-1, MLC-2, TLC-3). The percentages represent the increase in latency with 1 nm of shrinkage. The trends for TLC are less certain than for SLC or MLC, because our data for TLC devices is more limited. [The fitted coefficients are not legible in this copy.]

Table 3: Model's Equations. These equations allow us to scale the metrics of our baseline SSD to future process technologies and other cell densities.

  1.  Capacity_proj = Capacity_base x (FeatureSize_base / FeatureSize_proj)^2 x (BPC_proj / BPC_base)
  2.  SSD_BW_proj = ChannelCount x ChannelBW_proj
  3.  ChannelBW_proj = (DiesPerChannel x PageSize) / (DieLatency_proj + TransferTime),  when DieLatency_proj > BWThreshold
  4.  ChannelBW_proj = ChannelSpeed,  when DieLatency_proj <= BWThreshold
  5.  TransferTime = PageSize / ChannelSpeed
  6.  BWThreshold = (DiesPerChannel - 1) x TransferTime
  7.  SSD_IOPs_proj = ChannelCount x ChannelIOPs_proj
  8.  ChannelIOPs_proj = 1 / TransferTime,  when DieLatency_proj <= IOPsThreshold
  9.  ChannelIOPs_proj = DiesPerChannel / (DieLatency_proj + TransferTime),  when DieLatency_proj > IOPsThreshold
  10. TransferTime = AccessSize / ChannelSpeed
  11. IOPsThreshold = (DiesPerChannel - 1) x TransferTime

The SSD's controller implements the FTL. We estimate that this management layer incurs an overhead of 30 µs for ECC and additional FTL operations. The controller coordinates 24 channels, each of which connects four dies to the controller via a 400 MB/s bus. To fix the cost of SSD-CDC, we assume a constant die count equal to 96 dies.

3.2 Projections

We now describe our future projections for seven metrics of SSD-CDC: capacity, read latency, write latency, read bandwidth, write bandwidth, read IOPs and write IOPs. Table 1 provides baseline values for SSD-CDC, and Table 2 summarizes the projections we make for the underlying flash technology. This section describes the formulas we use to compute each metric from the projections (summarized in Table 3). Some of the calculations involve making simplifying assumptions about SSD-CDC's behavior. In those cases, we make the assumption that maximizes the SSD's performance.

Figure 5: Scaling of SSD Capacity. Flash manufacturers increase SSDs' capacity both by reducing feature size and by storing more bits in each cell.

Capacity. Equation 1 calculates the capacity of SSD-CDC by scaling the capacity of the baseline by the square of the ratio of the baseline feature size (34 nm) to the projected feature size. We also scale capacity by the number of bits per cell (BPC) the projected chip stores, relative to the baseline BPC (2, for MLC). In some cases, we configure SSD-CDC to store fewer bits per cell than a projected chip allows, as in the case of MLC-1. In these cases, the projected capacity reflects the effective bits per cell.

Latency. To calculate the projected read and write latencies, we fit an exponential function to the empirical data for a given cell type. Figure 4 depicts both the raw latency data and the curves fitted to SLC-1, MLC-1, MLC-2 and TLC-3. To generate the data for MLC-1, which ignores the "slow" pages, we calculate the average latency for reads and writes for the "fast" pages only. Other configurations supporting reduced capacity and improved latency, such as TLC-1 and TLC-2, would use a similar method. We do not present these latter configurations, because there is very little TLC data available to create reliable predictions. Figure 4 shows each collection of data with the fitted exponentials for average, minimum and maximum, and Table 2 reports the equations for these fitted trends. We calculate the projected latency by adding the values generated by these trends to the SSD's overhead reported in Table 1.

Bandwidth. To find the bandwidth of our SSD, we must first calculate each channel's bandwidth and then multiply that by the number of channels in the SSD (Equation 2). Each channel's bandwidth requires an understanding of whether channel bandwidth or per-chip bandwidth is the bottleneck. Equation 6 determines the threshold between these two cases by multiplying the transfer time (see Equation 5) by one less than the number of dies on the channel. If the latency of the operation on the die is larger than this threshold, the die is the bottleneck and we use Equation 3. Otherwise, the channel's bandwidth is simply the speed of its bus (Equation 4).

IOPs. The calculation for IOPs is very similar to bandwidth, except instead of using the flash's page size in all cases, we also account for the access size, since it affects the transfer time: if the access size is smaller than one page, the system still incurs the read or write latency of one entire page access. Equations 7-11 describe the calculations.

4 Results

This section explores the performance and cost of SSD-CDC in light of the flash feature size scaling trends described above. We explore four different cell technologies (SLC-1, MLC-1, MLC-2, and TLC-3) and feature sizes scaled down from 72 nm to 6.5 nm (the smallest feature size targeted by industry consensus as published in the International Technology Roadmap for Semiconductors (ITRS) [2]), using a fixed silicon budget for flash storage.

4.1 Capacity and cost

Figure 5 shows how SSD-CDC's density will increase as the number of bits per cell rises and feature size continues to scale. Even with the optimistic goal of scaling flash cells to 6.5 nm, SSD-CDC can only achieve capacities greater than 4.6 TB with two or more bits per cell. TLC allows for capacities up to 14 TB; pushing capacity beyond this level will require more dies.

Since capacity is one of the key drivers in SSD design, and because it is the only aspect of SSDs that improves consistently over time, we plot the remainder of the characteristics against SSD-CDC's capacity.

Figure 6: SSD Latency. In order to achieve higher densities, flash manufacturers must sacrifice (a) read and (b) write latency.

Figure 7: SSD Bandwidth. SLC will continue to be the high-performance option.
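The capacity, bandwidth, and IOPs calculations can be sketched in code. The die-bound cases (Equations 3 and 9) are partly garbled in this copy of Table 3, so those two formulas below are our reconstruction from the prose, not a verbatim transcription; the page size is an assumed value:

```python
# Sketch of the SSD-CDC scaling model (Table 3, Equations 1-11).
# Eqs. 3 and 9 are reconstructed from the prose; constants come from
# Table 1, and PAGE_SIZE is an assumption, not a value from the paper.

CHANNEL_COUNT = 24
DIES_PER_CHANNEL = 4
CHANNEL_SPEED = 400e6            # bytes/s (400 MB/s bus)
PAGE_SIZE = 8192                 # bytes (assumed; varies by chip)
BASE_CAPACITY_GB = 320
BASE_FEATURE_NM = 34.0
BASE_BPC = 2                     # baseline cell type is MLC

def capacity_proj(feature_nm, bpc):
    """Eq. 1: capacity scales with the square of the feature-size
    shrink and linearly with bits per cell."""
    return BASE_CAPACITY_GB * (BASE_FEATURE_NM / feature_nm) ** 2 * bpc / BASE_BPC

def channel_bw(die_latency_s):
    """Eqs. 3-6: per-channel bandwidth in bytes/s."""
    transfer_time = PAGE_SIZE / CHANNEL_SPEED              # Eq. 5
    threshold = (DIES_PER_CHANNEL - 1) * transfer_time     # Eq. 6
    if die_latency_s <= threshold:                         # Eq. 4: bus-bound
        return CHANNEL_SPEED
    # Eq. 3 (reconstructed): die-bound, dies pipelined on the shared bus
    return DIES_PER_CHANNEL * PAGE_SIZE / (die_latency_s + transfer_time)

def ssd_bw(die_latency_s):
    return CHANNEL_COUNT * channel_bw(die_latency_s)       # Eq. 2

def ssd_iops(die_latency_s, access_size):
    """Eqs. 7-11: like bandwidth, but transfers use the access size."""
    transfer_time = access_size / CHANNEL_SPEED            # Eq. 10
    threshold = (DIES_PER_CHANNEL - 1) * transfer_time     # Eq. 11
    if die_latency_s <= threshold:                         # Eq. 8: bus-bound
        channel_iops = 1.0 / transfer_time
    else:                                                  # Eq. 9 (reconstructed)
        channel_iops = DIES_PER_CHANNEL / (die_latency_s + transfer_time)
    return CHANNEL_COUNT * channel_iops                    # Eq. 7
```

With these definitions, halving the feature size at fixed bits per cell quadruples projected capacity, while a die latency below the threshold leaves the 400 MB/s bus, not the die, as the per-channel bottleneck.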
Figure 7 (continued): Obtaining higher capacities without additional dies and cost will require a significant performance hit in terms of (a) read and (b) write bandwidth, moving from SLC-1 to MLC-2 or TLC-3.

Figure 8: SSD IOPS. With a fixed die area, higher capacities can only be achieved with low-performing MLC-2 and TLC-3 technologies, for 512 B (a) reads and (c) writes, and for 4 KB (b) reads and (d) writes.

4.2 Latency

Reduced latency is among the frequently touted advantages of flash-based SSDs over disks, but changes in flash technology will erode the gap between disks and SSDs. Figure 6 shows how both read and write latencies increase with SSD-CDC's capacity. Reaching beyond 4.6 TB pushes write latency to 1 ms for MLC-2 and over 2.1 ms for TLC. Read latency rises to at least 70 µs for MLC-2 and 100 µs for TLC.

The data also make clear the choices that SSD designers will face: either SSD-CDC's capacity stops scaling at 4.6 TB, or its read and write latency increases sharply, because increasing drive capacity with a fixed die area necessitates switching cell technology from SLC-1 or MLC-1 to MLC-2 or TLC-3. With current trends, our SSDs could be up to 34x larger, but latency will be 17x worse for reads and 2.6x worse for writes. This will reduce the write latency advantage that SSDs offer relative to disk from 8.3x (vs. a 7 ms disk access) to just 3.2x. Depending on the application, this reduced improvement may not justify the higher cost of SSDs.

Figure 9: Scaling of all parameters. While the cost of an MLC-based SSD remains roughly constant, read and particularly write performance decline.

4.3 Bandwidth and IOPs

SSDs offer moderate gains in bandwidth relative to disks, but very large improvements in random IOP performance. However, increases in operation latency will drive down IOPs and bandwidth.

Figure 7 illustrates the effect on bandwidth. Read bandwidth drops due to the latency of the operation on the flash die. Operation latency also causes write bandwidth to decrease with capacity.

SSDs provide the largest gains relative to disks for small, random IOPs. We present two access sizes: the historically standard disk block size of 512 B, and the most common flash page size and modern disk access size of 4 KB. Figure 8 presents the performance in terms of IOPs. When using the smaller, unaligned 512 B accesses, SLC and MLC chips must access 4 KB of data, and the SSD must discard 88% of the accessed data. For TLC, there is even more wasted bandwidth, because the page size is 8 KB.

When using 4 KB accesses, MLC IOPs drop as density increases, falling by 18% between the 64 and 1024 GB configurations. Despite this drop, the data suggest that SSDs will maintain an enormous (but slowly shrinking) advantage relative to disk in terms of IOPs: even the fastest hard drives can sustain no more than 200 IOPs, while the slowest SSD configuration we consider achieves over 32,000 IOPs.

Figure 9 shows all parameters for an SSD made from MLC-2 flash, normalized to SSD-CDC configured with currently available flash. Our projections show that the cost of the flash in SSD-CDC will remain roughly constant and that density will continue to increase (as long as flash scaling continues as projected by the ITRS). However, they also show that access latencies will increase by 26% and that bandwidth (in both MB/s and IOPs) will drop by 21%.

5 Conclusion

The technology trends we have described put SSDs in an unusual position for a cutting-edge technology: SSDs will continue to improve by some metrics (notably density and cost per bit), but everything else about them is poised to get worse. This makes the future of SSDs cloudy: while the growing capacity of SSDs and high IOP rates will make them attractive in many applications, the reduction in performance that is necessary to increase capacity while keeping costs in check may make it difficult for SSDs to scale as a viable technology for some applications.

References

[1] Open NAND Flash Interface Specification 3.0.

[2] International Technology Roadmap for Semiconductors: Emerging Research Devices, 2010.

[3] DENALI. dmr/2009/07/16/nand-forward-prices-rate-of-decline-will (URL truncated in source).

[4] GRUPP, L. M., CAULFIELD, A. M., COBURN, J., SWANSON, S., YAAKOBI, E., SIEGEL, P. H., AND WOLF, J. K. Characterizing flash memory: Anomalies, observations, and applications. In MICRO 42: Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (New York, NY, USA, 2009), ACM, pp. 24-33.

[5] JUNG, T.-S., CHOI, Y.-J., SUH, K.-D., SUH, B.-H., KIM, J.-K., LIM, Y.-H., KOH, Y.-N., PARK, J.-W., LEE, K.-J., PARK, J.-H., PARK, K.-T., KIM, J.-R., LEE, J.-H., AND LIM, H.-K. A 3.3 V 128 Mb multi-level NAND flash memory for mass storage applications. In Solid-State Circuits Conference, 1996. Digest of Technical Papers, 42nd ISSCC, 1996 IEEE International (Feb. 1996), pp. 32-33, 412.

[6] JUNG, T.-S., CHOI, Y.-J., SUH, K.-D., SUH, B.-H., KIM, J.-K., LIM, Y.-H., KOH, Y.-N., PARK, J.-W., LEE, K.-J., PARK, J.-H., PARK, K.-T., KIM, J.-R., YI, J.-H., AND LIM, H.-K. A 117-mm2 3.3-V only 128-Mb multilevel NAND flash memory for mass storage applications. IEEE Journal of Solid-State Circuits 31, 11 (Nov. 1996), 1575-1583.

[7] MAROTTA, G., ET AL. A 3 bit/cell 32 Gb NAND flash memory at 34 nm with 6 MB/s program throughput and with dynamic 2 bit/cell blocks configuration mode for a program throughput increase up to 13 MB/s. In Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2010 IEEE International (Feb. 2010), pp. 444-445.

[8] PARK, K.-T., KANG, M., KIM, D., HWANG, S.-W., CHOI, B. Y., LEE, Y.-T., KIM, C., AND KIM, K. A zeroing cell-to-cell interference page architecture with temporary LSB storing and parallel MSB program scheme for MLC NAND flash memories. IEEE Journal of Solid-State Circuits 43, 4 (Apr. 2008), 919-928.

[9] RAFFO, D. FusionIO builds SSD bridge between SLC, MLC. July 2009.

[10] SUH, K.-D., SUH, B.-H., LIM, Y.-H., KIM, J.-K., CHOI, Y.-J., KOH, Y.-N., LEE, S.-S., KWON, S.-C., CHOI, B.-S., YUM, J.-S., CHOI, J.-H., KIM, J.-R., AND LIM, H.-K. A 3.3 V 32 Mb NAND flash memory with incremental step pulse programming scheme. IEEE Journal of Solid-State Circuits 30, 11 (Nov. 1995), 1149-1156.

[11] TAKEUCHI, K., TANAKA, T., AND TANZAWA, T. A multipage cell architecture for high-speed programming multilevel NAND flash memories. IEEE Journal of Solid-State Circuits 33, 8 (Aug. 1998), 1228-1238.

[12] TRINH, C., ET AL. A 5.6 MB/s 64 Gb 4 b/cell NAND flash memory in 43 nm CMOS. In Solid-State Circuits Conference - Digest of Technical Papers, 2009. ISSCC 2009, IEEE International (Feb. 2009), pp. 246-247, 247a.
