论文研究-Improved Taboo Search Algorithm for Designing DNA Words.pdf

所需积分/C币:10 2019-08-23 15:34:30 440KB .PDF
7
收藏 收藏
举报

DNA词设计的改进禁忌搜索算法,张凯,许进,DNA序列设计是DNA计算中现实而重要的研究课题之一。我们通过改进禁忌搜索算法来系统地设计等长DNA序列,该方法设计的DNA序列可以满足
山国武技记文在线 http://www.paper.edu.cn the Dimer structure, such subsequences AAAA,, TTTT, 'CCCC, 'GGGG,, ' CATG', TCGA, AGCT,GTAC are forbidden Hairpin Constraint: Hairpin is an undesirable structure, because it can hybridize itself. Hairpin structure consists of ring part and stem part. Stem is formed with basepairs hy bridization. Typical minimum stem length 3 Taboo Search Algorithm for DNA Encoding The taboo search has been successfully applied to a large number of combinatorial optimization problems. The algorithm is an enhancement of t he well-known hill climbing heuristic, which use a memory function to avoid being trapped at a locaI minimum Generate an initial DNA word. store it as the current seed solution and clear the taboo list. Is stop criterion satisfied? Output DNA words set Generate neighbors of the current seed solution by a ncighborhood structure Y Does the evaluate Store the neighbor solution constraints satisfy? as the new seed Add the neighbor solution to the taboo list. Fig. 1. Taboo search algorithm for DNA encoding The TS procedure is generally simple. The procedure starts with a feasible initial DNA word with required length and stores it as the current seed. Then neighbor DNa words of the current seed are produced by the ncighborhood structurc. Those arc candidate solutions. Thesc DNA words arc evaluated by ccrtain combinatorial constraints and thermodynamic constraints, and a candidate which satisfics thc aspiration critcrion is sclcctcd as new seed solution. This selection is called a move and added to taboo list. Iterations are repeated until a stop criterion is satisfied. Fig. 1 shows the procedure of taboo search algorithm 山国武技记又在线 http://www.paper:edu.cn 1. Initial solution The initial solution is a nucleotide word string which generated by random method. Then we use the GC content, forbidden subsequence and slide lllisIalches constraints to lest the initialize. If the initialize solution fits all these constraints it will be added to taboo list and stores it as the current seed solution 2. The neighborhood structure A neighborhood structure is a mechanism which can obtain a set of new neighbor solutions by applying a diversification strategy to a given solution. Each neighbor solution is reached immediately from a seed solution by a move. Neighborhood structure is directly effective on the efficiency of Ts algorithm In this paper. the neighborhood solutions are generated by exchanging sequences nucleotide alphabet. The wholc ncighborhood representation is about a permutation of all dna words fits hamming distance constraint This significantly facilitates the data structure, since every solution may be stored by means of a permutation of the nucleotide alphabet ( A, C, G, Tf which length is n For a given solution word W=(w1, W2, .. Wj,..., Wn), replace w; wiLh another nuclide alphabet ki. W2 A k,k;∈{A,CG,T}. The new word is w′=(1,2,……,k2,…,mn) and the Hamming distance H(W,W")is1 If required Hamming distance is h, replace(ui1, wi2, .. win) with(ki1, ki2, .. kih) at different positions. the Hamming distance between two word I(W.w')is h If required DNA word length is m and required Hamming distance is h, we random change h different nuclide alphabet of seed word W, such neighborhood structure can construct Ch neighbor candidate solutions 3. Evaluating neighbor solution Each gencratcd dNa words must bc evaluated by combinatorial constraints and thcrmodynamic constraints If the word violates any of thosc constraints. thc word will bc added to taboo list For each solution W S,let fHmeasure(w, w:) be the H-measure constraint function, let fGc(wy be the GC content constraint function, let fs(w, w: be the Similarity constraint function, let frs(w) be the forbidden subsequence constrainT funcLion, and let / P(w) be Lhe Hairpin coustraint function. When loving from a soluTioN w lo another solulion W'EN(w). the new solution is evalualed by below functions (1)fr measuer(w)-2I measure(w Wi) compute all the II measure( w, Wi) values. If the function result don't satisfy target value, new DNA sequence w should be neglected and added to the taboo list table (2)fGa(w) calculate the GC content of the word W. If it is don't satisfy the required GC content, the tested sequence should be added to the tahoo list. (3)J5(w) S W, Wi)compute all the S(w, W.) values. If the function result don't satisfy target value, new DNA scqucncc W should be neglected and added to the taboo list table (4)fEs(w) scarch for all forbidden sequences. If it is don't satisfy thc rcquired GC content, the tested scqucncc should be added to the taboo list (5) fHp()test the Hairpin constraint. If it is don't satisfy the minimum stem length, the tested sequence should be added to the taboo list 山国武技记文在线 http://www.paper.edu.cn Because all these constraints are parallel, the five terms listed above are logical multiplication respectively Solutions are then evaluated using a funclion F(W)=fs(w) fH W)JFS(W)JHP(W) 4. Taboo list and Termination Criterion Because evaluating a neighbor solution will spend much time, if a neighbor solulion violate any combinatorial or thermodynamic constraint, il will be added into taboo list lo avoid repeat testing The algorithm stops when the number of iterations to a maximum value, or the solution DNA words set has got enough DNA words 4 Simulation results In the above section, the taboo search algorithm was improved to design good DNA sequences. The algorithm has been implemented on Pascal language compiler of Borland Delphi 7. In the simulation, DNA sequences of length 20-mer are considered. The neighborhood structure Hamming distance is 9. The subsequences 'AAA,,'CCC GGG, tTT' could guarantee that cach dNA scqucncc must satisfy the continuity constraint We choose a best DNA scqucncos sct generated by our algorithm and comparcd the rosults with Dcaton( 19 et al. and Soo-Yong Shin 20 et al. The comparison results are shown in Table 1. Our sequences show much lower similarity values and H-measure. This implies the sequences made by our algorithm have much higher probability lo hy bridize with the its conpleinentary sequences. Moreover, the secondary structure is more prohibited due Lo the very low continuity and hairpin. GC conlent assure these DNA sequences have sinilar thermodynamic characteristics 5 Conclusion A new algorithm of DNA sequence design for DNA computing is proposed. Because the neighbor structure in t his algorithm can overlap whole the solution space, the dNA sequence set is one of greatest sets which satisfy the constraints. Due to all the constraints in our algorithm are parallel, additional constraints are supported by the algorithm and can be integrated into our model in a straightforward way Acknowledgements The projcct was supported by the National Natural Scicncc Foundation of China(grant Nos. 60373089, 60674106 30570431, 60703047, and 60533010), the Program for New Century Exccllent Talents in University(NCET-05 0612), the Ph.D. Programs Foundation of Ministry of Education of China(20060487014)), and the Chenguang Program of Wuhan(200750731262) 山国武技记又在线 http://www.paper:edu.cn Table 1. Comparison results of the sequences by our algorithm, and the sequences in [19] and [20] equence(5′-3) Continuity Hairpin( Similariy H-measureGC% Our Algorithn CCACCACCAATGATGTTAGG 53 62 50 ACCTTAGAACAGACGCTTGC 0000000 60 CACCTAACTAATCGGCTGGA AACITGATGAGCGAIGGACO CTCCAATGTGAACCTGTGTC CGCCATAGATAGATACCGTC ATAGGCGTTACTGAGCATCC 000000n 60 50 53 50 66 50 53 60 50 Deaton et al ATAGAGTGGATAGTTOTGGG 9 55 CATIGGCGGCGCGIAGGOII 6i9 CTTGTGACCGCTTCTGGGGA 16 60 GAAAAAGGACCAAAAGAGAG 41 45 58 40 GATGGTGCTTAGAGAAGTGG 0 GTATCTCGIITTAACATCO 000043 50 35 TTGTAAGCCTACTGCGTGAC 55 75 50 Soo-Yong Shin et al CTCTTCATCCACCTCTTCTC 43 50 CTCTCATCTCTCCGTTCTTC 00000 00000 50 TATCCTGTGGTGTCCTTCCT 57 45 AlTCIGTCOGI'TGCGTGTC 2 50 TCTCTTACGTTGGTTGGOTG 50 GTATTCCAAGCGTCCGTGTT 0 50 AAACCTCCACCAACACACCA 9 5 50 References 1. L M. Adelman: Molecular computation of solutions to combinatorial problems. Science, 266(5187),(1994), 102-1024 2. A. Brennerman and A. E Condon: Strand Design for Bio-Molecular Computation. Theoretical Computer Science, 287 (201),39-58 3. A. Marathe, A. Condon, and R. M. Corn. Condon: On Combinatorial DNA Word Design. Journal of Computational Biology,8,(2001),201-219 4. R. Dealoll, M. Garzon. R. Murphy, D. Franceschetti, and S Stevens: Genetic Search of Reliable Encodings for DNA Based Computation. in Proceedings of the 1st Annual Conference on Genetic Programming, 21,(1996),9-15 5. D. D. Shoemaker, D. A. Lashkari, D. Morris, M. Mittmann, and R. w. Davis: Quantitative Phenotypic Analysis of Yeast Deletion Mutants Using a Highly Parallel Molecular Bar-coding Strategy. Nature, 16,(1996), 450-456 6. A. G. FrutosL. M. Smith, and R. M. Corn: Enzymatic ligation reactions of DNA "Word"on surfaces for DNA compuling. Journal of the American Chemical Society, 120(40), 1998, 10277-10282 7. A.G. ITutos, Qinghua liu, A 'I'. 'Thiel, A. W. Scanner, A. E Condon, I M. Smith, and R. M. Corn: Demonstration of a word design strategy for DNA computing on surface. Nucleic Acids Research, 25,(1997),4748-4757 8. U. Feldkamp, W. Banzhaf, and H. Rauhe. A DNA sequence compiler. In: Proceedings of 6th DIMACs Workshop on DNA Based Computers, 2000 9. R. DealOll, M. Garzon, R. C. Murphy, J. A. Rose, D. R. FrancescheTti, and S.E. Stevens. Genetic Search of reliable Encodings for DNA-Based Computation. In: 15I Genetic Pro-gramming Conference, 1996 10. R. Deaton, R C. Murphy, M. Garzon, D. R Franceschetti, and S. E. Stevens. Good encod-ings for DNA-based solutions to combinatorial problems. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, 11,(1999) 247258 11. R. Deaton, R. C. Murphy. J. A Rose, M. Garzon, D. R Franceschetti, and S.E. Stevens. A DNA Based Implementation of an Evolutionary Computation. In: Proceedings IEEE Con-ference on Evolutionary Computation, Indians, 1997, 267 271 12. F. Glover: Tabu searchPart I ORSA Journal on Computing 1,(1989), 190-206 13. F. Glover: Tabu searchPart II. ORSA Journal on Computing 2,(1990), 4-32 14. Alain Hertz and Marino Widmer: An improved ta bu search approach for solving the job shop scheduling problem with tooling constraints Discrete Applied Mathematics, 65(3);(1996), 319-345 15. Ivo Blochligera, Nicolas Zuffereyb: A graph coloring heuristic using partial solutions and a reactive tabu scheme Computers and Operations Research, 11(5).(2006), 1-16 6 山国武技记文在线 http:/www.paper.edu.cn 16. M. P. Hansen: Tabu search for multiobjective optimization: MOTS. Paper Presented at the 13th International Con ference on Mulli Criteria DecisiOn Making(MCDMA97),(1997) 17. S. Kulturel-Konak, D). W. Coit and A. E. Smit h: Efficiently solving the redundancy a. I location problem using tabul search. IIE Transactions, 35(6),(2003), 515-520 18. C. Friden, A. Hertz and d. de Werra: An exact algorithm based on tabu search for finding a maximum independent set in a graph Computers and Operations Research, 17(5),(1990), 437-445 19. R. DealOll, R. C. Murphy, M. Garzon, D. R. FranceschetTi, anld S.E. Stevens, Jr. Goud encodings for DNA-base solutions to combinatorial problems. in Proc. 2nd Annu. Meeting I)\A Based Comput, 1996, 247-258 20. Soo-Yong Shin, In-Hee Lee, Dongmin Kim, and Byoung-Tak Zhang Multiobjective Evolutionary Optimization of DNA Sequences for Reliable DNA Computing. IEEE Trans on Evolutionary Computation, 2005, 9(2): 113-158 7

...展开详情
试读 7P 论文研究-Improved Taboo Search Algorithm for Designing DNA Words.pdf
立即下载 低至0.43元/次 身份认证VIP会员低至7折
一个资源只可评论一次,评论内容不能少于5个字
weixin_39841848 如果觉得有用,不妨留言支持一下
2019-08-23
  • 至尊王者

    成功上传501个资源即可获取
关注 私信 TA的资源
上传资源赚积分or赚钱
    最新推荐
    论文研究-Improved Taboo Search Algorithm for Designing DNA Words.pdf 10积分/C币 立即下载
    1/7
    论文研究-Improved Taboo Search Algorithm for Designing DNA Words.pdf第1页
    论文研究-Improved Taboo Search Algorithm for Designing DNA Words.pdf第2页

    试读结束, 可继续读1页

    10积分/C币 立即下载 >