所需积分/C币:10 2018-05-18 14:40:56 4.99MB PDF
收藏 收藏

ARIES论文英文版 ARIES: A transaction recovery method supporting fine-granularity lockig and partial roolbacks using write-ahead logging
6 C. Mohan et al on stored data (e.g, requiring unique keys for all records, restricting maxi mum size of objects to the page size, etc ) ability to support novel lock modes which allow the concurrent execution, based on commutativity and other properties [2, 26, 38, 45, 88, 891, of operations like increment /decrement on the same data by different transactions, and so on In this paper we introduce a new recovery method, called ARIES (Algorithm for Recovery and Isolation Exploiting Semantics), which fares very well with respect to all these metrics. It also provides a great deal of flexibility to take advantage of some special characteristics of a class of applications for better performance(e.g, the kinds of applications that IMS Fast Path [28, 42] supports efficiently 'o meet transaction and data recovery guarantees, aRies records in a log the progress of a transaction, and its actions which cause changes to recover able data objects. The log becomes the source for ensuring either that the Lransaction's committed actions are reflected in the databasc despite various types of failures, or that its uncommitted actions are undone (i. e, rolled back). When the logged actions reflect data object content, then those log records also become the source for reconstruction of damaged or lost data (i. e,, media recovery) Conceptually, the log can be thought of as an ever growing sequential file. In the actual implementation, multiple physical files may be used in a serial fashion to ease the job of archiving log records [15] Every log record is assigned a unique log sequence number(lsn)when that record is appended to the log. The LSNs are assigned in ascending sequence Typically they are the logical addresses of the corresponding log records. At times, version numbers or timestamps are also used as lSNs [67]. If more than one log is used for storing the log records relating to different pieces of data. then a form of two-phase commit protocol (e. g, the current industry- standard Presumed Abort protocol [63, 64])must be used The nonvolatile version of the log is stored on what is generally called stable storage. Stable storage means nonvolatile storage which remains intact and available across system failures. Disk is an example of nonvolatile storage and its stability is generally improved by maintaining synchronously two identical copies of the log on different devices. We would expect the online log records stored on dircct access storage devices to be archived to a cheaper and slower medium like lape at regular intervals. The archived log records may be discarded once the appropriate image copies(archive dumps) of the database have been produced and those log records are no longer needed for media recovery Whenever log records are written, they are placed first only in the volatile storage (i.e, virtual storage) buffers of the log file. Only at certain times (e. g. at commit time)are the log records up to a certain point (Lsn) written in log page sequence, to stable storage. This is called forcing the log up to that lsn. Besides forces caused by transaction and buffer manager activi The choice of the name ARIES, besides its use as an acronym that describes certain features of our recovery method, is also supposed to convey the relationship of our work to the Starburst project at IBM, since Aries is the name of a constellation CM Transactions on Database Systems, Vol. 17, No 1, March 1992 ARIES: A Transaction Recovery Method 97 ties, a system process may, in the background, periodically force the log buffers as they fill up For ease of exposition, we assume that each log record describes the update performed to only a single page. This is not a requirement of ARIES. In fact, in the Starburst [87]implementation of ARIES, sometimes a single log record might be written to describe updates to two pages The undo(respectively redo) portion of a log record provides information on how to undo (respec tively, redo) changes performed by the transaction. A log record which contains both the undo and the redo information is called an undo-redo log record. Sometimes, a log record may be written to contain only the redo information or only the undo information. Such a record is called a redo-only log record or an undo-only log record, respectively. Depending on the action that is performed, the undo-redo information may be recorded physical (e g-, before the update and after the update images or values of specific fields within the object) or operationally (e. g., add 5 to field 3 of record 15 subtract 3 from field 4 of record 10). Operation logging permits the use of high concurrency lock modes, which exploit the semantics of the operations performed on the data. For example, with certain operations, the same field of a record could have uncommitted updates of many transactions. These permit more concurrency than what is permitted by the strict executions property of the model of [3l, which essentially says that modified objects must be locked exclusively(X mode)for commit duration Q ARIES uses the widely accepted write ahead logging (WAL) protocol. Some the commercial and prototype systems based on WAL are IBMs AS /400TM 9,21, CMU's Camelot23,90], tBM' S DB2M[1,10,11,12,13,14,15,19,35 961, Unisys's DMS/ /1100 [271, Tandems EncompassM [4, 37l, IBMs IMS 142 43, 53, 76, 80, 941, Informix's Informix-TurbotM[16], Honeywell's MRDS [91] Tandems NonStop SQLM [95], MCC's ORION [29], IBMs OS/2 Extended Edition Database Manager [7], IBMs quick silver [40], IBMs Starburst [87], SYNAPSE [78], IBM,'s System /38 [99], and DEC's VAX DBMS M and VAX Rdb/vMS M [81]. In WAL-based systems, an updated page is written back to the same nonvolatile storage location from where it was read. That is, in-place updating is performed on nonvolatile storage. Contrast this with what happens in the shadow page technique which is used in systems such as System R [31] and sQL/DS [5] and which is illustrated in Figure 1. There the updated version of the page is written to a different location on nonvolatile storage and the previous version of the page is used for performing database recovery if the system were to fail before the next checkpoint The WAL protocol asserts that the log records representing changes to some data must already be on stable storage before the changed data is allowed to replace the previous version of that data on nonvolatile storage That is, the system is not allowed to write an updated page to the nonvolatile storage version of the database until at least the undo portions of the log records which describe the updates to the page have been written to stable storage. To enable the enforcement of this protocol, systems using the Wal method of recovery store in every page the lsn of the log record that describes the most recent update performed on that page. The reader is ACM Transactions on Databasc Systems, Vol. 17, No. 1, March 1992 98 LP1P1P F L update 1g Logical page LPl is read from physical page P1 and after version and P1 s the shadow version Durng a checkpoint. the shadow version is discarded and the current vers becomes the shadow version also On a fallure, data base referred to [31, 97] for discussions about why the Wal technique is consid- ered to be better than the shadow page technique [16, 78] discuss methods in which shadowing is performed using a separate log While these avoid some of the problems of the original shadow page approach, they still retain some of the important draw backs and they introduce some new ones. Similar comments apply to the methods suggested in [82, 88]. Later, in Section 10,we show why some of the recovery paradigms of System R, which were based on the shadow page technique, are inappropriate in the WAL context, when we need support for high levels of concurrency and various other features that are described in Section 2 Transaction slatus is also stored in the log and no transaction can be considered complete until its committed status and all its log data are safely recorded on stable storage by forcing the log up to the transaction's commit log records lsn. This allows a restart recovery procedure to recover any transactions that completed successfully but whose updated pages were not hysically written to nonvolatile storage before the failure of the system This means that a transaction is not permitted to complete its commit processing (see [63, 64])until the redo portions of all log records of that. transaction have been written to stable storage We deal with three types of failures: transaction or process, system, and media or device. When a transaction or process failure occurs, typically the transaction would be in such a state that its updates would have to be undone. It is possible that the transaction had corrupted some pages in the buffer pool if it was in the middle of performing some updates when the process disappeared. When a system failure occurs, typically the virtual storage contents would be lost and the transaction system would have to be restarted and recovery performed using the nonvolatile storage versions of the database and the log. When a media or device failure occurs, typically the contents of that media would be lost and the lost data would have to be recovered using an image copy (archive dump) version of the lost data and the lo Forward processing refers to the updates performed when the system is in normal (i, e, not restart recovery) processing and the transaction is updating ACM Transactions on Database Systems, Vol 17, No, 1, March 1992 ARIES: A Transaction Recovery Method 99 the database because of the data manipulation(e. g. SQL) calls issued by the user or the application program. That is, the transaction is not rolling back and using the log to generate the(undo) update calls. Partial rollback refers to the ability to set up savepoints during the execution of a transaction and later in the transaction request the rolling back of the changes performed b the transaction since the establishment of a previous savepoint [1, 31]. This is to be contrasted with total rollback in which all updates of the transaction are undone and the transaction is terminated. Whether or not the savepoint concept is exposed at the application level is immaterial to us since this paper deals only with database recovery. A nested rollback is said to have taken place if a partial rollback were to be later followed by a total rollback or another partial rollback whose point of termination is an earlier point in the transaction than the point of termination of the first rollback. Normal undo refers to total or partial transaction rollback when the system is in norma operation. A normal undo may be caused by a transaction request to rollback or it may be system initiated because of deadlocks or errors( e.g., integrity constraint violations). Restart undo refers to transaction rollback during restart recovery after a system failure. To make partial or total rollback efficient and also to make debugging easier, all the log records written by a transaction are linked via the preul Sn field of the log records in reverse chronological order. That is, the most recently written log record of the transaction would point to the previous most recent log record written by that transaction, if there is such a log record. In many WAL-based systems the updates performed during a rollback are logged using what are called compensation log records(CLRs)[15]. Whether a CLr's update is undone should that Clr be encountered during a rollback depends on the particular system. As we will see later, in aries, a CLr's update is never undone and hence CLRs are viewed as redo-only log records Page-oriented redo is said to occur if the log record whose update is being redone describes which page of the database was originally modified during normal processing and if the same page is modified during the redo process ing. No internal descriptors of tables or indexes need to be accessed to redo the update That is, no other page of the database needs to be examined. This is to be contrasted with logical redo which is required in System R, SQL/DS and AS 400 for indexes [21, 62]. In those systems, since index changes are not logged separately but are redone using the log records for the data pages performing a redo requires accessing several descriptors and pages of the database. The index tree would have to be retraversed to determine the page(s) to be modified and, sometimes, the index page(s)modified bccausc of this redo operation may be different from the index page (s) originally modified during normal processing. Being able to perform page-oriented redo allows the system to provide recovery independence amongst objects. That is the recovery of one pages contents does not require accesses to any other 2 The AS/400, Encompass and Non Stop SQL do not, explicitly link all the log records written by a transaction This makes undo inefficient since a sequential backward scan of the log must be performed to retrieve all the desired log records of a transaction ACM Transactions on Database Systems. Vol. 17. No. l, March 1992 100 C Mohan et al (data or catalog) pages of the database. As we will describe later, this makes media recovery very simple In a similar fashion, we can define page-oriented undo and logical undo Being able to perform logical undos allows the system to provide higher levels of concurrency than what would be possible if the system were to be restricted only to page-oriented undos. This is because the former, with appropriate concurrency control protocols, would permit uncommitted updates of one transaction to be moved to a differcnt page by another transaction. If one were restricted to only page-oriented undos, then the latter transaction would have had to wait for the former to commit. Page-oriented redo and page-oriented undo permit faster recovery since pages of the database other than the pages mentioned in the log records are not accessed. In the interest of efficiency, ARIES supports page-orienled redo and its supports, in the interest of high concurrency, logical undos. In [621, we introduce the ARIES/ IM method for concurrency control and recovery in Bt-tree indexes and show the advantages of being able to perform logical undos by comparing ARIES IM with other index methods 1.2 Latches and locks Normally latches and locks are used to control access to shared information Locking has been discussed to a great extent in the literature. Latches, on the other hand have not been discussed that much. Latches are like semaphores. Usually, latches are used io guarantee physical consistency of data, while locks are used to assure logical consistency of data. We need to worry about physical consistency since we need to support a multiprocessor environment. Latches are usually held for a much shorter period than are locks. Also, the deadlock detector is not informed about latch waits. Latches are requested in such a manner so as to avoid deadlocks involving latches alone, or involving latches and locks A Acquiring and releasing a latch is much cheaper than acquiring and leasing a lock. In the no-conflict case, the overhead amounts to 10s of instructions for the former versus 100s of instructions for the latter. latches are cheaper because the latch control information is always in virtual mem ory in a fixed place, and direct addressability to the latch information is possible given the latch name. As the protocols presented later in this paper and those in [57, 62 show, each transaction holds at most two or three latches simultaneously. As a result, the latch request blocks can be perman ently allocated to each transaction and initialized with transaction ID, etc ht at the start of that transaction On the other hand, typically, storage for individual locks has to be acquired formatted and released dynamically causing more instructions to be executed to acquire and release locks. This is advisable because, in most systems, the number of lockable objects is nany orders of magnitude greater than the number of latchable objects. Typically all information relating to locks currently held or requested by all the transactions is stored in a single, central hash table. Addressability to a particular lock's information is gained by first hashing the lock name to get the address of the hash anchor and then, possibly, following a chain of pointers. Usually, in the process of trying to locate the lock control block ACM Transactions on Database Systems, Vol 17, No 1, March 1992 ARIES: A Transaction Recovery Method 101 because multiple transactions may be simultaneously reading and modifying the contents of the lock table one or more latches will be acquired and released-one latch on the hash anchor and, possibly one on the specific ock's chain of holders and waiters Locks may be obtained in different modes such as S(Shared), X(eXclusive), IX (Intention eXclusive), Is(Intention Shared) and SIX (Shared Intention eXclusive), and at different granularities such as record (tuple), table (rela- tion), and file(tablespace)[32]. The s and X locks are the most common ones S provides the read privilege and X provides the read and write privileges Locks on a given object can be held simultaneously by different transactions only if those locks'modes are compatible. The compatibility relationships amongst the abovc modes of locking are shown in Figure 2. A check mark (v)indicates that the corresponding modes are compatible. With hierarchi cal locking, the intention locks(IX, IS, and sIX) are generally obtained on the higher levels of the hierarchy (e. g, table), and the S and X locks are obtained on the lower levels (e.g, record). The nonintention mode locks(S and X), when obtained on an object at a certain level of the hierarchy, implicitly grant locks of the corresponding mode on the lower level objects of that higher level object. The intention mode locks, on the other hand, only give the privilege of requesting the corresponding intention or nonintention mode locks on the lower level objects. For example, SIX on a table implicitly grants S on all the records of that table, and it allows X to be requested explicitly on the records. Additional, semantically rich lock modes have been defined in the literature [2. 38, 45, 55] and ARIES can accommodate them Lock requests may be made with the conditional or the unconditional option. a conditional request means that the requestor is not willing to wait if, when the request is processed, the lock is not grantable immediately.An unconditional request means that the requestor is willing to wait until the lock becomes grantable. Locks may be held for different durations. An unconditional request for an instant duration lock means that the lock is not to be actually granted, but the lock manager has to delay returning the lock call with the success status until the lock becomes grantable. Manual duration locks are released some time after they are acquired and, typically long before transaction termination. Commit duration locks are released only when the transaction terminates, i. e, after commit or rollback is completed The above discussions concerning conditional requests, different modes, and durations, except for commit duration, apply to latches also 1.3 Fine-Granularity Locking database systems(e.g, IMS [53, 76, 80) for a long time. Surprisingly, onlp7 Fine-granularity(e. g record) locking has been supported by nonrelational few of the commercially available relational systems provide fino-granularity locking, even though IBM's System R [321, S/38 [99] and SQL/DS [51, and Tandems Encompass [37] supported record and or key locking fr the beginning. Although many interesting problems relating to providing Encompass and s 88 had only X locks for records and no locks were acquired automatically by these systems for reads ACM Transactions on Database Systems vol. 17. No 1 March 1992 102 C. Mohan et al S sⅨ Fig. 2. Lack mode compatability matrix SiX fine-granularity locking in the context of WAL remain to be solved, the research community has not been paying enough attention to this area [3, 75 88]. Some of the System R solutions worked only because of the use of the shadow page recovery technique in combination with locking(see Section 10). Supporting fine-granularity locking and variable length records in a flexible fashion requires addressing some interesting storage management issues which have never really been discussed in the database literature Unfortunately, some of the interesting techniques that were developed for System R and which are now part of sQL ds did not get documented in the literature. At the expense of making this paper long, we will be discussing here some of those problems and their solutions As supporting high concurrency gains importance(see [79] for the descrip tion of an application requiring very high concurrency) and as object-oriented systems gain in popularity, it bec essary to invent concurrent control and recovery methods that take advantage of the semantics of the operations on the data [2, 26, 38, 88, 89], and that support fine-granularity ocking efficiently. Object-oriented systems may tend to encourage users lo define a large number of small objects and users may expect object instances the appropriate granularity of locking. In the object-oriented logical view of the database, the concept of a page, with its physical orientation as the container of objects, becomes unnatural to think about as the unit of ocking during object accesses and modifications. Also, object-oriented system users may tend to have many terminal interactions during the course of a transaction, thereby increasing the lock hold times. If the unit of locking were to be a page, lock wait times and deadlock possibilities will be aggra vated. Other discussions concerning transaction management in an object oriented environment can be found in [22, 29] As more and more customers adopt relational systems for production applications, it becomes ever more important to handle hot-spots [28, 34, 68 77, 79, 83] and storage management without requiring too much tuning by C systcm uscrs or administrators. Since relational systems have been welcomed to a great extent because of their ease of use, it is important that we pay greater attention to this area than what has been done in the context of the nonrelational systems. Apart from the need for high concurrency for user data, the ease with which online data definition operations can be performed in relational systems by even ordinary users requires the support for high concurrency of access to, at least, the catalog data. Since a leaf page in an index typically describes data in hundreds of data pages, page-level locking of index data is just not acceptable. A flexible recovery method that ACM Transactions on Database Systems, Vol 17, No. 1, March 1992 ARIES: A Transaction Recovery Method 103 allows the support of high levels of concurrency during index accesses is neede d The above facts argue for supporting semantically rich modes of locking such as increment/decrement which allow multiple transactions to concur rently modify even the same piece of data. In funds-transfer applications, increment and decrement operations are frequently performed on the branch and teller balances by numerous transactions. If those transactions are forced to usc only X locks, then they will be serialized, even though their operations commute 1. 4 Buffer Management The buffer manager (BM)is the component of the transaction system that manages the buffer pool and does I/Os to read write pages from/to the nonvolatile storagc version of the database. The fix primitive of the BM may be used to request the buffer address of a logical page in the database. If the d page is not in the buffer pool, BM allocates a buffer slot and reads the page in. There may be instances (e. g, during a b+-tree page split, when the new page is allocated) where the current contents of a page on nonvolatile storage are not of interest. In such a case, the fix-new primitive may be used to make the BM allocate a free slot and return the address of that slot, if bm does not find the page in the buffer pool. The fix_new invoker will then format the page as desired. Once a page is fixed in the buffer pool the corresponding buffer slot is not available for page replacement until the unfix primitive is issued by the data manipulative component. Actually, for each page, BM keeps a fix count which is incremented by one during every fix operation and which is decremented by one during every unfix operation a page in the buffer pool is said to be dirty if the buffer version of the page has some updates which are not yet reflected in the nonvolatile storage version of the same page. The fix primitive is also used to communicate the intention to modify the page. Dirty pages can be written back to nonvolatile storage when no fix with the modification intention is held, thus allowing read accesses to the page while it is being written out. [96] discusses the role of BM in writing in the background, on a continuous basis, dirty pages to nonvolatile storage to reduce the amount of redo work that would be needed if a system failure were to occur and also to keep a certain percentage of the buffer pool pages in the nondirty state so that they may be replaced with other pages without sync te I Os having to be perfor time of replacement. While performing those writes, BM ensures that the WAL protocol is obeyed. As a consequence, Bi may have to force the log up to the lsn of the dirty page before writing the page to nonvolatile storage Given the large buffer pools that are common today, we would expect a force of this nature to be very rare and most log forces to occur because of transactions committing or entering the prepare state BM also implements the support for latching pages. To provide direct addressability to page latches and to reduce the storage associated with those latches, the latch on a logical page is actually the latch on the corresponding buffer slot. This means that a logical page can be latched only after it is fixed ACM Transactions on Database Systems, Vol. 17, No. 1, March 1992

试读 69P ARIES论文英文版
立即下载 低至0.43元/次 身份认证VIP会员低至7折
    ARIES论文英文版 10积分/C币 立即下载

    试读结束, 可继续读6页

    10积分/C币 立即下载 >