Algorithms Implementing Distributed Shared Memory
fetch/store and in/out. The type of the data being accessed can also vary and can include integers, byte arrays, or more complex user-defined types. Finally, the semantics of the memory access functions can go beyond those offered by traditional memory systems and can include atomic enqueuing or dequeuing operations, or even entire database operations. For example, Linda is a programming language that directly supports a shared data space by providing a shared tuple space and three basic operations: Read reads an existing data item, called a "tuple," from the tuple space; Out adds a new tuple; and In reads and then removes an existing tuple. Linda's tuples are addressed by content rather than by location.

Basic algorithms

This section describes four basic distributed shared memory algorithms. For each of these algorithms, we consider the cost of read and write operations and issues in their implementation. The algorithms described can be categorized by whether they migrate and/or replicate data, as depicted in Figure 1. Two of the algorithms migrate the data to the site where it is accessed, in an attempt to exploit locality in data accesses and decrease the number of remote accesses, thus avoiding communication overhead. The two other algorithms replicate data so that multiple read accesses can take place at the same time using local accesses.

Figure 1. Four distributed shared memory algorithms, categorized by migration and replication: nonreplicated, non-migrating (central server); nonreplicated, migrating (migration); replicated, migrating (read replication); and replicated, non-migrating (full replication).

Implementations of distributed shared memory based on replication should make this replication transparent to the applications. In other words, processes should not be able to observe (by reading and writing shared data) that not all data accesses are directed to the same copy of the data. More formally, the result of an application using shared data should be the same as if the memory operations of all hosts were executed in some sequential order, with the operations of each individual host appearing in this sequence in the order specified by its program. When this holds, the shared memory is said to be consistent. Shared memory in a shared-memory multiprocessor is expected to behave this way. This definition of consistency should not be confused with a stricter definition requiring a read access to return the value of the most recent write to the same location, a definition naturally applicable to concurrent processes running on a uniprocessor but not necessarily to those on shared-memory multiprocessors with CPU caches and write-back buffers. If the stricter definition holds, then so does the weaker definition (but the converse is not true).

Central-server algorithm. The simplest strategy for implementing distributed shared memory uses a central server that is responsible for all shared data and maintains the only copy of it. Both read and write operations involve sending a request message to the data server from the process executing the operation, as depicted in Figure 2. The data server executes the request and responds either with the data item, in the case of a read operation, or with an acknowledgment, in the case of a write operation.

Figure 2. The central-server algorithm. The client sends a data request and later receives the response; the central server receives the request, performs the data access, and sends the response.

A simple request-response protocol can be used for communication in implementations of this algorithm. For reliability, a request is retransmitted after each timeout period with no response. This is sufficient for read requests; for writes, the server also maintains a sequence number for each client so that it can detect duplicate transmissions and acknowledge them appropriately. A failure condition is raised after several timeout periods with no response.

Hence, this algorithm requires two messages for each data access: one from the process requesting the access to the data server, and the other containing the data server's response. Moreover, each data access requires four packet events: two at the requesting process (one to send the request and one to receive the response) and two at the server.

One potential problem with the central server is that it may become a bottleneck, since it has to service the requests of all clients. To distribute the server load, the shared data can be distributed onto several servers. In that case, clients must be able to locate the correct server for each data access. A client could multicast its access requests to all servers, but this would not significantly decrease the load on the servers, since every server would incur the overhead of a packet event for each such request. A better solution is to partition the data by address and use a simple mapping function to decide which server to contact.

Migration algorithm. In the migration algorithm, depicted in Figure 3, the data is always migrated to the site where it is accessed. This is a "single reader/single writer" (SRSW) protocol, since only the threads executing on one host can read or write a given data item at any one time.

COMPUTER

Instead of migrating individual data items, data is typically migrated between servers in a fixed-size unit called a block, to facilitate the management of the data. The advantage of this algorithm is that no communication costs are incurred when a process accesses data currently held locally. If an application exhibits high locality of reference, the cost of data migration is amortized over multiple accesses.

Figure 3. The migration algorithm. If the required block is not local, the client determines its location and sends a request; the remote host receives the request and sends the block; the client receives the data block and then accesses the data locally.
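The client-side fetch step of Figure 3 can be sketched as follows. This is a minimal, single-process simulation under stated assumptions: the Site class, its directory (the statically assigned manager mapping discussed later), and the block size are illustrative, not details from the article.

```python
# Minimal sketch of the migration algorithm's client-side fetch (Figure 3).
# The class, method, and variable names are illustrative; the article
# does not define an API.

BLOCK_SIZE = 8  # data items per block (an assumption for this sketch)

class Site:
    def __init__(self, name, directory):
        self.name = name
        self.blocks = {}            # block_id -> list of data items held locally
        self.directory = directory  # block_id -> Site currently holding the block

    def access(self, addr, value=None):
        block_id = addr // BLOCK_SIZE
        if block_id not in self.blocks:          # block fault: block not local
            holder = self.directory[block_id]    # "determine location"
            self.blocks[block_id] = holder.blocks.pop(block_id)  # migrate block
            self.directory[block_id] = self      # record the new location
        block = self.blocks[block_id]
        offset = addr % BLOCK_SIZE
        if value is not None:                    # write
            block[offset] = value
        return block[offset]                     # read

# Usage: two sites, one block that migrates on first remote access.
directory = {}
a, b = Site("A", directory), Site("B", directory)
a.blocks[0] = [0] * BLOCK_SIZE
directory[0] = a
b.access(3, value=42)     # faults on B, migrates block 0 from A to B
print(b.access(3))        # -> 42, now a purely local access
print(0 in a.blocks)      # -> False: A no longer holds the block
```

Note how the second access incurs no communication at all, which is the amortization argument made above.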
However, with this algorithm it is also possible for blocks or pages to thrash between hosts, resulting in only a few memory accesses between migrations and thereby in poor performance. Often, the application writer will be able to control thrashing by judiciously assigning data to blocks.

A second advantage of the migration algorithm is that it can be integrated with the virtual memory system of the host operating system if the block size is chosen equal to the size of a virtual memory page (or a multiple thereof). If a shared memory page is held locally, it can be mapped into the application's virtual address space and accessed using the normal machine instructions for accessing memory. An access to a data item located in a data block not held locally triggers a page fault, so that the fault handler can communicate with the remote hosts to obtain the data block before mapping it into the application's address space. When a data block is migrated away, it is removed from any local address space it has been mapped into.

The location of a remote data block can be found by multicasting a migration request message to all remote hosts, but more efficient methods are known. For example, one can statically assign each data block to a managing server that always knows the location of that block. To distribute the load, the management of all data blocks is partitioned across all hosts. A client queries the managing server of a data block both to determine the current location of the block and to inform the manager that it will migrate the block.

Read-replication algorithm. One disadvantage of the algorithms described so far is that only the threads on one host can access data contained in the same block at any given time. Replication can reduce the average cost of read operations, since it allows read operations to be executed simultaneously and locally (with no communication overhead) at multiple hosts. However, some of the write operations may become more expensive, since the replicas may have to be invalidated or updated to maintain consistency. Nevertheless, if the ratio of reads over writes is large, the extra expense of the write operations may be more than offset by the lower average cost of the read operations.

Replication can be naturally added to the migration algorithm by allowing either one site a read/write copy of a particular block or multiple sites read-only copies of that block. This type of replication is referred to as "multiple readers/single writer" (MRSW) replication.

For a read operation on a data item in a block that is currently not local, it is necessary to communicate with remote sites to first acquire a read-only copy of that block and, if necessary, to change to read-only the access rights of any writable copy before the read operation can complete. For a write operation to data in a block that is either not local or for which the local host has no write permission, all copies of the same block held at other sites must be invalidated before the write can proceed (see Figure 4). This strategy resembles the write-invalidate algorithm for cache consistency implemented by hardware in some multiprocessors [3]. The read-replication algorithm is consistent because a read access always returns the value of the most recent write to the same location.

Figure 4. The write operation in the read-replication algorithm. If the block is not local, the client determines its location and sends a request, and the remote host receives the request and sends the block; the client then multicasts an invalidate request, which each remote host applies to its copy of the block, and finally accesses the data.

This type of replication has been investigated extensively [4]. In Li's implementation, each block has a server designated as its owner that is responsible for maintaining a list of the servers holding a read-only copy of that block. This list is called the block's copy set.

A read (or write) access to a block for which a server does not have the appropriate access rights causes a read (or write) fault. The fault handler transmits a request to the server that has ownership of the appropriate block. For a read fault, the owning server replies with a copy of the block, adds the requesting server to the copy set of the requested block and, if necessary, changes the access rights of its local copy to read-only.

When a write fault occurs, ownership of the block is transferred from the previous owner to the server where the write fault occurred. After receiving the response, the write-fault handler requests that all servers in the copy set invalidate their local copies, after which the access rights to that block are set to write access at the site of the new owner and the copy set is cleared.

May 1990

Full-replication algorithm. Full replication allows data blocks to be replicated even while they are being written to. The full-replication algorithm therefore adheres to a "multiple readers/multiple writers" (MRMW) protocol. Keeping the data copies consistent is straightforward for nonreplicated algorithms, since accesses to data are sequenced according to the order in which they occur at the site where the data is held. In the case of fully replicated data, accesses to the data must either be properly sequenced or controlled to ensure consistency.

One possible way to keep the replicated data consistent is to globally sequence the write operations, while sequencing the read operations only relative to the writes that occur local to the site where the reads are executed. For example, the write-update algorithm for cache consistency implemented by hardware in some multiprocessors maintains consistency in this fashion; that is, its reads occur locally from the cache, while writes are broadcast over the bus, which sequences them automatically.

A simple strategy based on sequencing uses a single, global, gap-free sequencer, depicted in Figure 5, which is a process executing on a host participating in distributed shared memory. When a process attempts a write to shared memory, the intended modification is sent to the sequencer. The sequencer assigns the next sequence number to the modification and multicasts the modification with this sequence number to all sites. Each site processes broadcast write operations in sequence number order. When a modification arrives at a site, its sequence number is verified as the next expected one. If a gap in the sequence numbers is detected, either a modification was missed or a modification was received out of order, in which case a retransmission of the modification message is requested. (This implies that a log of recent write requests must be maintained somewhere.) In effect, this strategy implements a negative acknowledgment protocol.

Figure 5. The full-replication algorithm. The client sends the write to the sequencer; the sequencer adds a sequence number and multicasts the update; each site receives the update and applies it to its local memory in sequence number order.

In the common case within our assumed environment, packets arrive at all sites in their proper order. Therefore, a write requires two packet events at the writing process, two packet events at the sequencer, and a packet event at each of the other replica sites, for a system total of S + 2 packet events with S participating sites.

Many variants of the above algorithm exist. For example, Bisiani and Forin described an algorithm for full replication that uses the same principle as the sequencing algorithm to ensure that individual data structures remain consistent. However, rather than using a single server to sequence all writes, writes to any particular data structure are sequenced by the server that manages the master copy of that data structure. Although each data structure is maintained in a consistent manner, there is no assurance with this algorithm that updates to multiple data structures are made consistently.

Performance comparisons

All four algorithms described in the previous section ensure consistency in distributed shared memory. However, their performance is sensitive to the data access behavior of the application. In this section, we identify the factors in data access costs and investigate the application behaviors that have significant bearing on the performance of the algorithms. Based on some simple analyses, we compare the relative merits of the algorithms in an attempt to unveil the underlying relationship between the access patterns of applications and the shared memory algorithms that are likely to produce better performance for them.

Model and assumptions. The parameters in Figure 6 characterize the basic costs of accessing shared data and the application behaviors. Among them, the two types of access fault rates, f and f', have the greatest impact on the performance of the corresponding algorithms but, unfortunately, are also the most difficult to assess, since they vary widely from application to application. We should also point out that these parameters are not entirely independent of one another. For instance, the size of a data block, and therefore the block transfer cost P, influences f and f' in conflicting directions: as the block size increases, more accesses to the block are possible before another block is accessed; however, access interferences between sites become more likely. S also has a direct impact on the fault rates. Nevertheless, the analyses below suffice to characterize the shared memory algorithms.

p: The cost of a packet event, that is, the processing cost of sending or receiving a short packet, which includes possible context switching, data copying, and interrupt handling. Typical values for real systems range from one to several milliseconds.

P: The cost of sending or receiving a data block. This is similar to p, except that P is typically significantly higher. For an 8-kilobyte block, where often multiple packets are needed, typical values range from 20 to 40 milliseconds. For our analyses, only the ratio between P and p is important, rather than their absolute values.

S: The number of sites participating in distributed shared memory.

r: Read/write ratio, that is, there is one write operation for every r reads on average. This parameter is also used to refer to the access pattern of entire blocks. Although the two ratios may differ, we assume they are equal in order to simplify our analyses.

f: Probability of an access fault on a nonreplicated data block (used in the migration algorithm). This is equal to the inverse of the average number of consecutive accesses to a block by a single site before another site makes an access to the same block, causing a fault. f characterizes the locality of data accesses for the migration algorithm.

f': Probability of an access fault on replicated data blocks (used in the read-replication algorithm). It is the inverse of the average number of consecutive accesses to data items in blocks kept locally before a data item in a block not kept locally is accessed. f' characterizes the locality of data accesses for the read-replication algorithm.

Figure 6. Parameters that characterize the basic costs of accessing shared data.

To focus on the essential performance characteristics of the algorithms and to simplify our analyses, we make a number of assumptions:

(1) The amount of message traffic will not cause network congestion. Hence, we consider only the message-processing costs, p and P, but not the network bandwidth occupied by messages.

(2) Server congestion is not serious enough to significantly delay remote access. This is reasonable for the algorithms we study, since there are effective ways to reduce the load on the servers (see the "Central-server algorithm" section).

(3) The cost of accessing a locally available data item is negligible compared to the remote access cost and therefore does not show up in our access cost calculations below.

(4) Message passing is assumed to be reliable, so the cost of retransmission is not incurred. Note, however, that the cost of acknowledgment messages, required to determine whether a retransmission is necessary or not, is included in our models.

To compare the performance of the distributed shared memory algorithms, we need to define a performance measure. Distributed shared memory is often used to support parallel applications in which multiple threads of execution may be in progress on a number of sites. We therefore choose the average cost per data access to the entire system as the performance measure. Hence, if a data access involves one or more remote sites, the message-processing costs on both the local and the remote site(s) are included.

Using the basic parameters and the simplifying assumptions described above, the average access costs of the four algorithms can be expressed as follows:

Central server: C_cs = (1 - 1/S) * 4p
Migration: C_mig = f * (2P + 4p)
Read replication: C_rr = f' * [2P + 4p + Sp/(r + 1)]
Full replication: C_fr = [1/(r + 1)] * (S + 2)p

Each of these expressions has two components. The first component, to the left of the *, is the probability of an access to a data item being remote. The second component, to the right of the *, is the average cost of accessing a remote data item. Since the cost of local accesses is assumed to be negligible, the average cost of accessing a data item therefore equals the product of these two components.

In the case of the central-server algorithm, the probability of accessing a remote data item is 1 - 1/S, in which case four packet events are necessary for the access (assuming that data is uniformly distributed over all sites). The overall cost, C_cs, is therefore mainly determined by the cost of a packet event, as long as the number of sites is more than four or five.

For the migration algorithm, f represents the probability of accessing a nonlocal data item. The cost of accessing a data item in that case equals the cost of bringing the data block containing the data item to the local site, which includes a total of one block transfer (2P) and four packet events distributed across the local, manager, and server sites. We assume here that the local, manager, and server sites are all distinct, and that the request is forwarded by the manager to the server. The sequence of packet events is send (on the local site), receive (on the manager site), forward (on the manager site), and receive (on the server site).

For read replication, the remote access cost approximates that of the migration algorithm except that, in the case of a write fault (which occurs with a probability of 1/(r + 1)), a multicast invalidation packet must be handled by all S sites. The block transfer cost is always included in our expression, although it may not be necessary if a write fault occurs and a local (read) copy of the block is available.

Finally, for the full-replication algorithm, the probability of a remote access equals the probability of a write access. The associated cost for this write is always a message from the local site to the sequencer (two packet events), followed by a multicast update message to all other sites (S packet events).

Comparative analyses. The discussion above prepares us to make some pairwise comparisons of the algorithms' performance, to illustrate the conditions under which one algorithm might outperform another. Each comparison is made by equating the average costs of the two algorithms concerned to derive a curve along which they yield similar performance. This curve, which we call the equal-performing curve, divides the parameter space into two halves such that in each half one of the algorithms performs better than the other. For example, in the comparison between migration and read replication (Figure 7), the equation shown is that of the equal-performing curve, derived from C_mig = C_rr (with some rearrangement). Since all of the cost formulas include the packet cost p, only the ratio between P and p matters in the comparative analyses. We assume the value of P/p to be 20. Based on these comparisons, we will be able to make some general comments on performance.

Central server versus read replication. The equal-performing curve for the central-server and read-replication algorithms (Figure 8) is almost flat, that is, insensitive to the number of sites; moreover, the influence of the read/write ratio is also minimal. Hence, the key consideration in choosing between the two algorithms is the locality of access. Typically, a block fault rate of 0.07 (14 accesses between faults) is already considered very high (faults very frequent). Therefore, read replication appears to be more favorable for many applications.
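The four cost expressions above can be evaluated directly. The sketch below computes them in units of the packet-event cost p (so that P/p = 20, as assumed), and ranks the algorithms for one sample workload; the specific parameter values are illustrative, not taken from the article.

```python
# Average access cost per data access, in units of the packet-event cost p,
# following the four expressions above.  With P/p = 20, 2P + 4p = 44p.

def cost_central(S):
    return (1 - 1/S) * 4

def cost_migration(f, P_over_p=20):
    return f * (2*P_over_p + 4)

def cost_read_replication(f_prime, S, r, P_over_p=20):
    return f_prime * (2*P_over_p + 4 + S/(r + 1))

def cost_full_replication(S, r):
    return (S + 2) / (r + 1)

# Sample workload (illustrative): 10 sites, 4 reads per write,
# moderate locality for migration, better locality for read replication.
S, r, f, f_prime = 10, 4, 0.05, 0.02
costs = {
    "central server":   cost_central(S),
    "migration":        cost_migration(f),
    "read replication": cost_read_replication(f_prime, S, r),
    "full replication": cost_full_replication(S, r),
}
for name, c in sorted(costs.items(), key=lambda kv: kv[1]):
    print(f"{name}: {c:.2f} p")   # read replication is cheapest here
```

For these values the ranking (read replication, then migration, then full replication, then central server) illustrates the text's observation that good locality strongly favors read replication.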
Figure 7. Performance comparison: migration versus read replication. The equal-performing curve is f = f'(S + 44(r + 1))/(44(r + 1)); read replication performs better above the curve, migration below it.

Figure 8. Performance comparison: central server versus read replication. The equal-performing curve is f' = 4(1 - 1/S)(r + 1)/(S + 44(r + 1)); the central server performs better above the curve, read replication below it.

Migration versus read replication. The only difference between these two algorithms is that replication is used in the read-replication algorithm to allow interleaved reads on several sites without block movements, at the expense of multicasting invalidation requests upon updates. Interestingly, as shown in Figure 7, the invalidation traffic does not have a strong influence on the algorithms' relative performance. As long as the cost of a block transfer is substantially higher than that of a small message, the curves for different values of S and r cluster closely together and lie very close to the f = f' line. Typically, read replication effectively reduces the block fault rate because, in contrast to the migration algorithm, interleaved read accesses to the same block no longer cause faults, so the value of f' is smaller than f. Therefore, one can expect read replication to outperform migration for a vast majority of applications.

Read replication versus full replication. Both algorithms use read replication. The full-replication algorithm is more aggressive in that multiple copies are maintained even for updates. Figure 9 shows that the relative performance of the two algorithms depends on a number of factors, including the degree of replication, the read/write ratio, and the degree of locality achievable in read replication. The full-replication algorithm is not susceptible to poor locality. On the other hand, its multicast cost increases with S. Therefore, full replication performs poorly for large systems and when the update rate is high (that is, when r is low).

Central server versus full replication. These two algorithms represent two extremes in supporting shared data: one is completely centralized, while the other is completely distributed and replicated. Except for small values of S, the curve shown in Figure 10 indicates that full replication is advantageous as long as the read/write ratio is about five or higher. For very large numbers of sites, however, the update costs of full replication catch up, and the preference turns to the central server.

Remaining comparisons. The two remaining pairs not yet considered can be summarized as follows. The comparison between central server and migration resembles that between central server and read replication, with a rather flat curve beneath f = 0.09; thus, unless the block fault rate is very high, migration performs better. The comparison between migration and full replication reveals no clear winner, as in the case of read replication versus full replication, with the key deciding factors being S and r.

Comments. Based on the comparisons above, we can make a few observations. The central-server algorithm is simple to implement and may be sufficient for infrequent accesses to shared data, especially if a high percentage of the accesses are writes. This is often the case with locks, as will be discussed further below. However, locality of reference and a high block-hit ratio are present in a wide range of applications, making block migration and replication advantageous.

The fault rate of the simple migration algorithm may increase due to interleaved accesses if different data items that happen to be in the same block are accessed by different sites. It thus does not exploit locality to its full extent. The full-replication algorithm is suitable for small-scale replication and infrequent updates. In contrast, the read-replication algorithm is often a good compromise for many applications. The central-server and full-replication algorithms share the property of insensitivity to access locality, so they may outperform the read-replication algorithm if the application exhibits poor access locality.

Figure 9. Performance comparison: read replication versus full replication. The equal-performing curve is f' = (S + 2)/(S + 44(r + 1)); full replication performs better above the curve, read replication below it.

A potentially serious performance problem with algorithms that move large data blocks is block thrashing. For migration, it takes the form of data moving back and forth in quick succession when interleaved data accesses are made by two or more sites. For read replication, it takes the form of blocks with read-only permissions being repeatedly invalidated soon after they are replicated. Such situations indicate poor (site) locality in references. For many applications, shared data can be allocated and the computation can be partitioned to minimize thrashing. Application-controlled locks can also be used to suppress thrashing (see the section entitled "Application-level control with locking"). In either case, the complete transparency of the distributed shared memory is compromised.

Figure 10. Performance comparison: central server versus full replication. The equal-performing curve is r = S(S + 2)/(4(S - 1)) - 1; full replication performs better above the curve, the central server below it.

Applications. From the above comparisons, it is clear that strong interactions exist between the shared data access patterns of applications and their expected performance under the various algorithms. To make our discussion more concrete, we study such interactions further in this section, based on our experience implementing the algorithms and measuring their performance. We do this by examining a few types of applications and the consistency algorithms that might suit them.

Numerical subroutines and other applications. The Blas-III package contains a frequently used set of matrix manipulation subroutines implemented in Fortran. Typically, one or more (large) matrices are used as input to generate a result matrix of the same dimension. We observed excellent speedup using the read-replication algorithm, and Li studied similar applications in his thesis, also reporting very good speedup [3].
The amount of computa- Interestingly, this data-access pattem is To make our discussion more concrete, we tion, usually substantial, can be sped up by widespread. For example, we converted a study such interactions further in this sec assigning processors to compute graphics application to use distributed tion, based on our expcrience implement- subregions of the result matrix. This re- shared memory. This application uses ing the algorithms and measuring their sults in the input data being widely read- three-dimensional description of a set of performance. We do this by examining a shared and the individual regions of the objects to compute the visible scene few ty pes of applications and the consis- result matrix being updated by a single Again, the scene description is input to the tency algorithms that might suit them. master process and read-shared among all Thus, read replication is highly desir- the slave processes, which compute parts Numerical subroutines and other appli- able, whereas update multicast for write of the scene. Once the parts are completed, cations. The Blas.Ill package contains a replication is unnecessary and too costly. the master displays the entire scene. Like May 1990 matrix computation, there is very little cess, or a call-back "mechanism might be An optimistic full-replication algo- interference between the slaves except at used to avoid spinning. In either case, the rithm. All of the basic algorithms de the block boundaries of the scene buffer. cost of accessing remotely stored locks can scribed in the Basic algorithms" section A parallel version of a printed circuit be significant, and migrating this lock are pessimistic in that they ensure a priori board inspection program exhibits a very alone is desirable. Some specialized algo that a process can access data only when similar data-access pattern, as well as rithm seems desirable fordistributed locks. 
the shared data space is and will remain excellent speedup using the read- replica consistent. Here, we describe an algo tion algorithm thm'that in that it dete Improvin mines a posteriori whether a pi has Shortest paths. Similar to the above ap- performance accessed inconsistent data, in which case plications, a large matrix represents the that process is rolled back to a previous paths between pairs of nodes in a graph consistent state. This algorithm evolved However, elements of this matrix are up- Many variations to the basic distributed from attempts to increase performance of dated in place as new, better paths are shared memory algorithms exist. Many of the full-replication algorithm by coalesc discovered In our straightforward parallel them improve performance for specific ing multiple write operations into a single implementation, the processes make inter- applications by optimizing for particular communication packet. leaved and mixed read and write accesses memory access behaviors, typically by Instead of obtaining a sequence number to the matrix, with the update rate being attempting to reduce the amount of com- for each individual write operation, a se nning an ng munication since costs are dominated by quence number is obtained for a series of better paths become harder and harder to communication costs. Here, we describe consecutive writes by a single process find. The central-server algorithm is inap- two particularly interesting variations and Obviously, this may lead to inconsistent propriate because, like the applications also describe how applications can help copies of data, sincc writes are made to the above, accesses are frequent improve performance by controlling the local copy and only later transmitted (in On the other hand, since access locality actions of the distributed shared memory batch) to other sites. If the modified data is is poor, large numbers of block transfers algorithm. 
The largest improvements to not accessed before it reaches a remote due to thrashing are unavoidable if either performance can probably be achieved by site, temporary inconsistencies will not the migration or the read-replication algo- relaxing consistency constraints, some- matter. Otherwise, a conflict has occurred rithm is used. Full replication appears to thing we do not consider here in which case one of the processes will perform better, especially for the later haye to roll back stages of the computation. Instead of trans Full replication with delayed broad- To roll back a process, all accesses to ferring or invalidating a whole block when casts. The full-replication algorithm in- shared data are logged. To maintain the one entry is updated, only that entry is curs S+ 2 packet events on each write manageability of the size of these logs(and distributed operation, where Sis the number of partici- the operations on them efficient), it is A slight modific iient to organize the data accesses algorithm can reduce the numbcr of packet into transactions, where each transaction Hypermedia and disiributed game play- events per write to four, while read opera- consists of any number of read ing. Although these two types of applica- tions continue to be executed locally(with and write operations bracketed by a tions serve very different purposes, y no communication overhead begin_ transaction and a end_transaction often share the characteristics of making Instead of broadcasting the data modifi- When end_transaction is executed,a interactive concurrent accesses to shared cations to each site, the sequencer only unique gap -free sequence number for the data, with updates to moving, focused logs them. a write sent to the sequencer by transaction is requested from a central regions of the data space as the participants process P is acknowledged directly and a sequencer. 
Improving performance

Many variations to the basic distributed shared memory algorithms exist. Many of them improve performance for specific applications by optimizing for particular memory access behaviors, typically by attempting to reduce the amount of communication, since costs are dominated by communication costs. Here, we describe two particularly interesting variations and also describe how applications can help improve performance by controlling the actions of the distributed shared memory algorithm. The largest improvements in performance can probably be achieved by relaxing consistency constraints, something we do not consider here.

Full replication with delayed broadcasts. The full-replication algorithm incurs S+2 packet events on each write operation, where S is the number of participating sites. A slight modification to the algorithm can reduce the number of packet events per write to four, while read operations continue to be executed locally (with no communication overhead). Instead of broadcasting the data modifications to each site, the sequencer only logs them. A write sent to the sequencer by process P is acknowledged directly, and a copy of all modifications that arrived at the sequencer since the previous time P sent a write request is included with the acknowledgment. Thus, the shared memory at a site is updated only when a write occurs at that site. As in the full-replication algorithm, read operations are performed locally without delays.
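The delayed-broadcast variation can be sketched as follows; the sequencer API and class names are hypothetical, with each write request returning the piggybacked log of modifications the writing site has not yet seen:

```python
class DelayedBroadcastSequencer:
    """Sketch of full replication with delayed broadcasts.

    Instead of broadcasting each write, the sequencer appends it to a log.
    When a process writes, the acknowledgment piggybacks every modification
    logged since that process's previous write, so a site's replica is
    brought up to date only when the site itself writes.
    """
    def __init__(self):
        self.log = []            # global ordered list of (key, value) writes
        self.last_seen = {}      # per-process index into the log

    def write(self, pid, key, value):
        start = self.last_seen.get(pid, 0)
        pending = self.log[start:]           # everything pid has not yet seen
        self.log.append((key, value))
        self.last_seen[pid] = len(self.log)  # pid is now fully up to date
        return pending

class Site:
    """A replica that applies piggybacked updates before its own write."""
    def __init__(self, pid, sequencer):
        self.pid, self.seq, self.mem = pid, sequencer, {}

    def read(self, key):
        return self.mem.get(key)             # local, no communication

    def write(self, key, value):
        for k, v in self.seq.write(self.pid, key, value):
            self.mem[k] = v                  # apply deferred remote writes
        self.mem[key] = value
```

The sketch also exhibits the first disadvantage discussed below: a site that only reads never learns of remote writes, so busy-waiting on a shared variable would never terminate.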
This variation on the full-replication algorithm still maintains consistency, but it has at least two disadvantages. First, it refrains from updating shared memory for as long as possible and therefore does not conform to any real-time behavior; programs cannot simply busy-wait, waiting for a variable to be set. Second, it places an additional load on the (central) sequencer, which must maintain a log of data modifications and eventually copy each such modification into a message S-1 times.

An optimistic full-replication algorithm. The algorithms described so far ensure that the shared data space is and will remain consistent. Here, we describe an algorithm that is optimistic in that it determines a posteriori whether a process has accessed inconsistent data, in which case that process is rolled back to a previous consistent state. This algorithm evolved from attempts to increase the performance of the full-replication algorithm by coalescing multiple write operations into a single communication packet.

Instead of obtaining a sequence number for each individual write operation, a sequence number is obtained for a series of consecutive writes by a single process. Obviously, this may lead to inconsistent copies of data, since writes are made to the local copy and only later transmitted (in batch) to other sites. If the modified data is not accessed before it reaches a remote site, temporary inconsistencies will not matter. Otherwise, a conflict has occurred, in which case one of the processes will have to roll back.

To roll back a process, all accesses to shared data are logged. To keep the size of these logs manageable (and the operations on them efficient), it is convenient to organize the data accesses into transactions, where each transaction consists of any number of read and write operations bracketed by a begin_transaction and an end_transaction. When end_transaction is executed, a unique, gap-free sequence number for the transaction is requested from a central sequencer. This sequence number determines the order in which transactions are to be committed. A transaction T with sequence number n is aborted if any concurrently executed transaction with a sequence number smaller than n has modified any of the data that transaction T accessed. Otherwise, the transaction is committed, and its modifications to shared data are transmitted to all other sites. Committed transactions will never have to roll back, so the logs of a transaction can then be discarded.
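The commit rule can be sketched as below. The API is hypothetical, and `start_seq` (the sequencer's counter value when the transaction began) is our simplification of "concurrently executed"; we also let an aborted transaction consume a number, which a real sequencer might avoid:

```python
class OptimisticSequencer:
    """Sketch of the optimistic commit/abort decision.

    Each transaction logs its read and write sets locally. At
    end_transaction the sequencer issues the next sequence number and
    aborts transaction T if any transaction that committed after T began,
    but ordered before T, wrote data that T read or wrote.
    """
    def __init__(self):
        self.next_seq = 0
        self.committed = []   # list of (seq, write_set) of committed txns

    def end_transaction(self, start_seq, read_set, write_set):
        seq = self.next_seq
        self.next_seq += 1
        accessed = read_set | write_set
        for s, ws in self.committed:
            # conflict: a concurrent, earlier-ordered transaction
            # modified data this transaction accessed
            if start_seq <= s < seq and ws & accessed:
                return seq, False                    # abort: roll back
        self.committed.append((seq, frozenset(write_set)))
        return seq, True                             # commit: broadcast writes
```

Note that the conflict check is per data item, which is why the optimistic algorithm avoids the false sharing that causes block-level thrashing.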
Clearly, the performance of this optimistic algorithm will depend on the access behavior of the application program and on the application program's use of transactions. If rollbacks are infrequent, it will easily outperform the basic full-replication algorithm. It will also outperform the read-replication algorithm for those applications that suffer from thrashing, since the optimistic algorithm compares shared memory accesses at the per-data-item level (as opposed to the per-data-block level). The primary drawback of this algorithm, however, is that the shared memory is no longer transparent to the application, because the memory accesses must be organized into transactions and because rollbacks must be properly handled by the application.

Application-level control with locking. Locks can be used by the application not only for its synchronization needs, but also to improve the performance of the shared memory algorithms. For example, in the case of the migration and read-replication algorithms, locking data to prevent other sites from accessing it for a short period of time can reduce thrashing. In the case of the full-replication algorithm, the communication overhead can be reduced if replica sites communicate only after multiple writes instead of after each write. If a write lock is associated with the data, then once a process has acquired the lock, it is guaranteed that no other process will access that data, allowing it to make multiple modifications and to transmit all modifications made during the time the lock was held in a single message, without causing a consistency problem. Communication overhead is then incurred only when the data is unlocked, rather than every time a process writes to shared data.
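A sketch of write batching under a lock, assuming a hypothetical `transmit` broadcast primitive; a real implementation would acquire a distributed mutex where the comment indicates:

```python
from contextlib import contextmanager

class LockedReplica:
    """Sketch of application-level write batching under a write lock.

    While a process holds the lock, its modifications accumulate in a
    local buffer; they are flushed to the other replicas as one message
    when the lock is released.
    """
    def __init__(self, transmit):
        self.mem = {}
        self.pending = {}
        self.transmit = transmit      # callable taking a dict of updates

    @contextmanager
    def write_lock(self):
        # a real implementation would acquire a distributed mutex here
        try:
            yield self
        finally:
            if self.pending:
                self.transmit(dict(self.pending))  # one message for all writes
                self.pending.clear()

    def write(self, key, value):
        self.mem[key] = value
        self.pending[key] = value     # deferred until unlock
```

Because later writes to the same key overwrite earlier buffered ones, the single unlock-time message is also smaller than the sum of the individual updates.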
Having the application use locks to improve the performance of the shared memory has a number of disadvantages, however. First, the use of locks needs to be directed towards a particular shared memory algorithm; the shared memory abstraction can no longer be transparent. Second, the application must be aware of the shared data it is accessing and of its shared data access patterns. Finally, in the case of those algorithms that migrate data blocks, the application must be aware of the block sizes and of the layout of its data in memory.

Despite the simplifying assumptions made in the performance analyses, the essential characteristics of the four basic algorithms are captured in the models used. The concept of distributed shared memory is appealing because, for many distributed applications, the shared memory paradigm leads to simpler (application) programs than when data is passed directly using communication primitives. Moreover, with respect to performance, numerous implementations have shown that distributed shared memory can compete with, and in some cases even outperform, data-passing programs.

On the negative side, the performance of the algorithms that implement distributed shared memory is sensitive to the shared memory access behavior of the applications. Hence, as we have shown, no single algorithm for distributed shared memory will be suitable for most applications. The performance-conscious application writer will need to choose an appropriate algorithm for an application after careful analysis or experimentation. In some cases, he or she will want to use different algorithms (for different data) within a single application. Moreover, because these algorithms are sensitive to the access behavior of the applications, it is possible to improve their performance significantly, either by fine-tuning the application's use of the memory or by fine-tuning the shared memory algorithm for the access behavior of the particular application, thus eliminating the advantages of transparent shared memory access. We should also emphasize that distributed shared memory may be entirely unsuitable for some applications.

Nor is distributed shared memory as versatile as its data-passing counterparts. For example, the distributed shared memory algorithms we have described are not tolerant of faults. Whenever a host containing the only copy of some data items crashes, critical state is lost. Although the central-server and full-replication algorithms can be made tolerant of single-host crashes (for example, by using a backup server in the case of the central-server algorithm), it is not clear how to make the migration and read-replication algorithms equally fault tolerant.
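The backup-server idea mentioned above can be sketched as follows; both class names and the mirroring protocol are hypothetical simplifications (no failure detection or takeover logic is shown):

```python
class BackupServer:
    """Holds a mirror of the central server's data."""
    def __init__(self):
        self.data = {}

class ReplicatedCentralServer:
    """Sketch of a central server made tolerant of a single crash.

    Every write is mirrored to the backup before it is applied and
    acknowledged, so the backup always holds a complete copy and can
    take over if the primary fails.
    """
    def __init__(self, backup):
        self.data = {}
        self.backup = backup

    def write(self, key, value):
        self.backup.data[key] = value   # mirror first
        self.data[key] = value          # then apply and acknowledge

    def read(self, key):
        return self.data.get(key)
```

The cost is one extra message per write; no comparably simple scheme exists for migration or read replication, where the sole up-to-date copy moves among clients.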
Compared to data passing, distributed shared memory does not appear to be as suitable for heterogeneous environments at this time, although several research efforts on this problem are currently under way. Consider, for example, the migration algorithm in an environment consisting of hosts that use different byte orderings and floating-point representations. When a page is migrated between two hosts of different types, the contents of the page must be converted before it can be accessed by the application. It is not possible for the distributed memory system to convert the page without knowing the type of the application-level data contained in the page and the actual page layout. This complicates the interface between the memory system and the application. If noncompatible compilers are used for an application to generate code for the different hosts, such that the size of the application-level data structures differs from host to host, then conversions on a per-page basis become impossible. An additional problem for numerical applications is that, since the application has no control over how often a block is migrated or converted, and since accuracy may be lost on floating-point conversions, the result may become numerically questionable.

For these reasons, we consider distributed shared memory to be a useful paradigm for implementing a large class of distributed applications, but we do not expect it to become widely available in the form of a single standardized package, as has been the case for remote procedure calls, for example. Rather, we expect that distributed shared memory will be made available in a number of forms from which the application writer can choose.

Acknowledgments

Many thanks go to Tim McInerney, who did most of the work implementing the migration and read-replication algorithms for the Sun and DEC Firefly workstations, and to Orran Krieger, who designed and implemented the optimistic full-replication algorithm. The anonymous reviewers and the editors provided numerous valuable suggestions for improvements.

References

1. R. Bisiani and A. Forin, "Multilanguage Parallel Programming of Heterogeneous Machines," IEEE Trans. Computers, Vol. 37, No. 8, Aug. 1988, pp. 930-945.
2. D.R. Cheriton, "Problem-Oriented Shared Memory: A Decentralized Approach to Distributed System Design," Proc. Sixth Int'l Conf. on Distributed Computing Systems, May 1986, pp. 190-197.
3. K. Li, "Shared Virtual Memory on Loosely Coupled Multiprocessors," PhD thesis, Dept. of Computer Science, Yale Univ., 1986.

COMPUTER, May 1990
