# Problems and solutions in biological sequence analysis pdf

Posted on Sunday, May 2, 2021 5:21:39 AM by Bernabé V.

- Problems and Solutions in Biological Sequence Analysis
- Biological Sequence Analysis
- Biological Sequence Analysis with Hidden Markov Models on an FPGA

Chaos Game Representation (CGR) is an iterated function that bijectively maps discrete sequences into a continuous domain. As a result, discrete sequences can be subjected to statistical and topological analyses otherwise reserved for numerical systems. Characteristically, the CGR coordinates of substrings sharing an $L$-long suffix lie within a distance of $2^{-L}$ of each other.
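The suffix property can be checked with a minimal sketch of CGR for DNA. The corner assignment, the starting point at the center of the unit square, and the function names are assumptions made for illustration, and the distance is measured per coordinate (Chebyshev distance).

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]

# Assumed corner assignment for the four nucleotides on the unit square.
CORNERS: Dict[str, Point] = {
    "A": (0.0, 0.0),
    "C": (0.0, 1.0),
    "G": (1.0, 1.0),
    "T": (1.0, 0.0),
}


def cgr_points(seq: str, start: Point = (0.5, 0.5)) -> List[Point]:
    """Return the CGR coordinate reached after each symbol of seq."""
    x, y = start
    points: List[Point] = []
    for base in seq:
        cx, cy = CORNERS[base]
        # Move halfway from the current point toward the corner of this base.
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


def chebyshev(p: Point, q: Point) -> float:
    """Largest per-coordinate distance between two points."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))


# Both sequences end in the 4-long suffix "TGCA".
a = cgr_points("ACGTTGCA")[-1]
b = cgr_points("GGTATGCA")[-1]
print(chebyshev(a, b) <= 2 ** -4)  # True
```

Each step halves the distance between two trajectories that read the same symbol, which is why a shared suffix of length 4 bounds the per-coordinate distance by $2^{-4}$.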

## Problems and Solutions in Biological Sequence Analysis

SAPS is based on the classical Simulated Annealing (SA) algorithm and is implemented to obtain pairwise and multiple sequence alignments. SA solves an optimization problem by simulating the heating and cooling of a metal. To choose its current solution, SAPS randomly selects one of the solutions previously generated within the Metropolis Cycle.

This simple change improves the quality of the solutions to the problem of aligning genomic sequences relative to the classical Simulated Annealing algorithm. For certain instances, some parameters of SAPS are tuned by an analytical method, and others are tuned experimentally. The instances used are specific genes of the AIDS virus.

Sequence alignment is one of the most important and challenging problems in computational biology and bioinformatics [1, 2].

Finding the optimal alignment of a set of sequences is known to be an NP-complete problem [3]. Sequence alignment is an important tool for measuring the similarity of two or more sequences. It is classified as a combinatorial optimization problem [4] and is solved with computer algorithms. These algorithms represent, process, and compare genetic information to determine evolutionary relationships among living beings [3].

A sequence alignment highlights areas of similarity among sequences, which may indicate functional or evolutionary relationships among genes or proteins [5]. The sequence alignment problem is to obtain the alignment of maximum similarity for a set of genomic sequences, each of which is a string over the nucleotide alphabet.

A solution to this problem is a set of sequences over the same alphabet extended with a gap symbol. Exact algorithms have been applied to the sequence alignment problem; dynamic programming, for example, is one of the most widely used [6, 7]. The disadvantage of exact algorithms is that they produce optimal solutions for small problems but become inefficient for large ones.
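As an illustration of the exact approach, the following is a minimal sketch of a dynamic programming score for global pairwise alignment. The scoring values (`match`, `mismatch`, `gap`) and the function name are assumptions for the example and are not taken from the paper.

```python
def global_alignment_score(a: str, b: str,
                           match: int = 1,
                           mismatch: int = -1,
                           gap: int = -1) -> int:
    """Best global alignment score of a and b under a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against the empty string costs one gap per symbol.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
    return score[-1][-1]


print(global_alignment_score("ACGT", "AGT"))  # 2
```

The table has one cell per pair of prefixes, so its size is the product of the two lengths. Extending the same table to many sequences multiplies in one dimension per sequence, which is one way to see why exact methods become impractical for large multiple alignments.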

For this reason, several metaheuristic methods have been designed to obtain suboptimal alignments [8], for example Ant Colony algorithms [9], Simulated Annealing [10, 11], and Genetic Algorithms [12].

The disadvantage of metaheuristics is that they do not guarantee optimal solutions, although the solutions they generate can be very close to optimal and are obtained in a reasonable processing time. The proposed algorithm is a modified version of classical Simulated Annealing.

This paper is organized as follows: Section 2 describes the classical simulated annealing algorithm, Section 4 describes the analytical tuning method, and Section 6 describes the experimentation and results. Finally, Section 7 discusses the conclusions.

Classical Simulated Annealing is an algorithmic process that simulates the gradual cooling of a metal toward crystallization.

The algorithm usually starts at a high temperature, which is then decreased until a final temperature is reached. The final temperature is typically very close to zero [13, 14].

A cooling function decreases the temperature from its initial value to its final value. Several cooling functions have been used in simulated annealing [15–18]; the most common one is the geometric function $T_{k+1} = \alpha T_k$.

This function decreases the temperature by a factor $\alpha$ in the range $0 < \alpha < 1$. Cooling is gradual when $\alpha$ is very close to 1 and fast when $\alpha$ is very close to 0. Classical Simulated Annealing has two cycles. The first is the Temperature Cycle, in which the temperature is decreased by the cooling function.
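The effect of the cooling factor on the number of temperature steps can be seen in a short sketch. The function name and the example temperatures are assumptions chosen for illustration.

```python
from typing import List


def geometric_schedule(t_initial: float, t_final: float, alpha: float) -> List[float]:
    """List the temperatures visited when T is multiplied by alpha each step."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if t_final <= 0.0 or t_initial < t_final:
        raise ValueError("require 0 < t_final <= t_initial")
    temperatures: List[float] = []
    t = t_initial
    while t > t_final:
        temperatures.append(t)
        t *= alpha  # geometric cooling step
    return temperatures


fast = geometric_schedule(100.0, 1.0, 0.5)
slow = geometric_schedule(100.0, 1.0, 0.99)
print(len(fast), len(slow))
```

A factor close to 1 yields many more temperature steps between the same initial and final temperatures, which is what gradual cooling means in practice.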

The second is the Metropolis Cycle, which generates, accepts, or rejects solutions to the problem being optimized. Algorithm 1 shows the pseudocode of classical Simulated Annealing. The initial and final temperatures are set first (see line 1). These values are obtained either analytically (see Section 4) or experimentally.

It is recommended that the initial temperature be as high as possible and the final temperature as close to zero as possible. The initial solution of the problem to be optimized is then created (see line 2).

The current solution is set to the initial solution, and T is set to the initial temperature (see line 3). The Temperature Cycle runs from the initial temperature to the final temperature (starting at line 4), and the Metropolis Cycle runs inside it (starting at line 5). The Metropolis Cycle repeats the number of times specified by the stop criterion.

A new solution is created within the Metropolis Cycle by applying a small perturbation to the current solution (see line 6), and the difference in cost between the new and the current solution is computed. If the difference is less than or equal to zero (see line 8), the new solution is accepted (see line 9). If the difference is greater than zero, the Boltzmann probability is calculated; if it is higher than a random value between 0 and 1 (see line 12), the new solution is accepted. After the Metropolis Cycle is completed, the temperature is decreased.

Algorithm 2 shows the pseudocode of the SA applied to the problem of aligning two or more genomic sequences.
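The classical loop described for Algorithm 1 can be sketched as follows. This is not the paper's listing: the function signature, the fixed Metropolis Cycle length, the tracking of the best solution seen, and the toy cost function are all assumptions made so that the example runs on its own.

```python
import math
import random
from typing import Callable, TypeVar

S = TypeVar("S")


def simulated_annealing(initial: S,
                        cost: Callable[[S], float],
                        neighbor: Callable[[S, random.Random], S],
                        t_initial: float,
                        t_final: float,
                        alpha: float,
                        metropolis_length: int,
                        rng: random.Random) -> S:
    """Minimize cost with classical SA; returns the best solution seen."""
    current = initial
    best = initial  # assumption: the best solution is tracked separately
    t = t_initial
    while t > t_final:                      # Temperature Cycle
        for _ in range(metropolis_length):  # Metropolis Cycle
            candidate = neighbor(current, rng)
            delta = cost(candidate) - cost(current)
            if delta <= 0:
                current = candidate         # equal or better: always accept
            elif rng.random() < math.exp(-delta / t):
                current = candidate         # worse: accept with Boltzmann probability
            if cost(current) < cost(best):
                best = current
        t *= alpha                          # cooling step
    return best


# Toy problem (hypothetical): find the integer closest to 3.
def toy_cost(x: int) -> float:
    return float((x - 3) ** 2)


def toy_neighbor(x: int, rng: random.Random) -> int:
    return x + rng.choice((-1, 1))


result = simulated_annealing(20, toy_cost, toy_neighbor,
                             t_initial=10.0, t_final=0.01, alpha=0.9,
                             metropolis_length=50, rng=random.Random(0))
print(result, toy_cost(result))
```

Passing the random generator explicitly keeps a run reproducible, which helps when comparing parameter settings.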

The Simulated Annealing algorithm is modified so that it can be applied to the sequence alignment problem. The initial and final temperatures are tuned by an analytical method. The cooling factor is set to a value very close to 1 (see line 3). The current solution is set to the original solution (see line 4), and its similarity is calculated by comparing the sequences base by base (see line 5). The temperature variable is set to the initial temperature (see line 6).

The Metropolis Cycle length is set to an initial value (see line 7). The length of this cycle grows: it is short at high temperatures and increases as the temperature decreases.

The length of the Metropolis Cycle is increased by a factor that must be greater than 1. The Temperature Cycle (see lines 8–29) runs while T is greater than the final temperature. Within this cycle, a counter variable is set to 1 (see line 9) and is then incremented inside the Metropolis Cycle, which starts at line 10. At the end of the Metropolis Cycle, the temperature is decreased (see line 27) and the Metropolis Cycle length is increased. Within the Metropolis Cycle, new solutions are generated by modifying the current solution.

This is done by adding gaps to or removing gaps from the DNA sequences. The similarity of each new solution is calculated (see line 12), and the difference in similarity between the new and the current solution is then calculated.
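The two building blocks named above, a similarity score and a gap perturbation, can be sketched as follows. The paper's exact scoring is not given in this text, so counting columns where all rows hold the same base is an assumption, as are the function names and the equal chance of inserting or removing a gap.

```python
import random
from typing import List

GAP = "-"


def pad(rows: List[str]) -> List[str]:
    """Pad every row with trailing gaps so that all rows share one length."""
    width = max(len(row) for row in rows)
    return [row + GAP * (width - len(row)) for row in rows]


def similarity(alignment: List[str]) -> int:
    """Count the columns in which every row holds the same non-gap base."""
    columns = zip(*pad(alignment))
    return sum(1 for col in columns if col[0] != GAP and len(set(col)) == 1)


def perturb(alignment: List[str], rng: random.Random) -> List[str]:
    """Insert one gap into, or remove one gap from, a randomly chosen row."""
    rows = list(alignment)
    i = rng.randrange(len(rows))
    row = rows[i]
    gap_positions = [k for k, ch in enumerate(row) if ch == GAP]
    if gap_positions and rng.random() < 0.5:
        k = rng.choice(gap_positions)
        rows[i] = row[:k] + row[k + 1:]       # remove an existing gap
    else:
        k = rng.randrange(len(row) + 1)
        rows[i] = row[:k] + GAP + row[k:]     # insert a new gap
    return rows


print(similarity(["ACGT", "A-CGT"]))  # 1
print(similarity(["ACGT", "ACGT"]))   # 4
```

Because `perturb` only touches gap characters, the underlying bases of every sequence are preserved, which any valid alignment move has to guarantee.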

A new solution that is better than the current solution is accepted and replaces it. A new solution that is worse than the current solution is accepted with the Boltzmann probability, which depends directly on the current temperature and on the difference in quality between the two solutions.

The Boltzmann probability is calculated as $P = e^{-\Delta Z / T}$, where $\Delta Z$ is the deterioration in quality from the current solution to the new one and $T$ is the current temperature. As the temperature decreases, the probability of accepting a worse solution decreases; this probability lies in the range $0 < P < 1$.

To generate high-quality solutions to sequence alignment, the classical SA was modified; the SAPS algorithm is this modified version.

The current solution is selected after the Metropolis Cycle has finished. During the Metropolis Cycle, the best solutions are stored in a set, and the best of all solutions created in the cycle is stored separately.

The original sequence is stored as well. After the Metropolis Cycle finishes, the current solution is randomly selected from among these three: the set of stored solutions, the best solution of the cycle, or the original sequence. At high temperature, the Metropolis Cycle length is set to a small value, and as the temperature decreases the length is increased until it reaches its final value. Thus, few solutions are created at high temperatures, and the number of solutions grows by a factor greater than 1 each time the temperature is decreased.
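The selection step and the growing Metropolis Cycle can be sketched as below. The paper's Algorithm 3 is not reproduced in this text, so the details here are assumptions: which solutions enter the stored set (improving ones), the cap `length_max`, the tracking of an overall best, all parameter names, and the toy problem used to make the sketch runnable.

```python
import math
import random
from typing import Callable, List, TypeVar

S = TypeVar("S")


def saps(original: S,
         similarity: Callable[[S], float],
         perturb: Callable[[S, random.Random], S],
         t_initial: float, t_final: float, alpha: float,
         beta: float, length_initial: int, length_max: int,
         rng: random.Random) -> S:
    """Maximize similarity; after each Metropolis Cycle restart from a previous solution."""
    current = original
    overall_best = original
    t = t_initial
    length = float(length_initial)
    while t > t_final:                       # Temperature Cycle
        stored: List[S] = []                 # assumed: improving solutions of this cycle
        cycle_best = current
        for _ in range(int(length)):         # Metropolis Cycle
            candidate = perturb(current, rng)
            delta = similarity(candidate) - similarity(current)
            # Better or equal is accepted; worse is accepted with probability exp(delta / t).
            if delta >= 0 or rng.random() < math.exp(delta / t):
                current = candidate
                if delta > 0:
                    stored.append(candidate)
            if similarity(current) > similarity(cycle_best):
                cycle_best = current
        if similarity(cycle_best) > similarity(overall_best):
            overall_best = cycle_best
        # SAPS step: choose the next current solution among previous solutions.
        pool = [cycle_best, original]
        if stored:
            pool.append(rng.choice(stored))
        current = rng.choice(pool)
        t *= alpha                                # cool down
        length = min(length * beta, length_max)   # lengthen the Metropolis Cycle
    return overall_best


# Toy problem (hypothetical): integers scored by closeness to 3.
def toy_similarity(x: int) -> float:
    return -abs(x - 3)


def toy_perturb(x: int, rng: random.Random) -> int:
    return x + rng.choice((-1, 1))


result = saps(20, toy_similarity, toy_perturb, t_initial=10.0, t_final=0.01,
              alpha=0.9, beta=1.1, length_initial=5, length_max=200,
              rng=random.Random(0))
print(result)
```

Here the difference is negative for a worse solution because the score is maximized, so `math.exp(delta / t)` plays the role of the Boltzmann probability. To align sequences, `similarity` and `perturb` would be replaced by an alignment score and a gap move.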

Algorithm 3 shows the pseudocode of SAPS, which adds some lines to SA: the stored solutions are initialized at line 5, a solution is added to the stored set at line 19, and the current solution is chosen from the stored candidates at line 31. Some parameters of SAPS are tuned by the analytical method [19–22]. For example, the initial temperature is calculated from the maximum deterioration of the instance and the probability of accepting a solution at high temperature.

On the other hand, the final temperature is calculated from the minimum deterioration of the instance and the probability of accepting a solution at low temperature. Analytical tuning based on the Boltzmann distribution can be helpful for setting the initial temperature [21]. At high temperatures the probability of accepting any new solution is very close to 1, so the deterioration of the cost function is maximal.

The initial temperature is associated with the maximum deterioration admitted and the defined acceptance probability. Let $S_i$ be the current solution and $S_j$ a new proposed one, with costs $Z(S_i)$ and $Z(S_j)$, respectively; the maximum and minimum deteriorations are written $\Delta Z_{\max}$ and $\Delta Z_{\min}$, respectively. The probability of accepting a new solution with the maximum deterioration is then $P_A(\Delta Z_{\max}) = e^{-\Delta Z_{\max} / T_i}$. This equation is essentially the Boltzmann distribution, and it is used to calculate the initial temperature $T_i$:

$$T_i = -\frac{\Delta Z_{\max}}{\ln P_A(\Delta Z_{\max})}$$
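A small sketch shows how the Boltzmann relation yields both temperatures. The function name and the numeric deteriorations and probabilities are illustrative assumptions, not values from the paper.

```python
import math


def temperature_for(deterioration: float, acceptance_probability: float) -> float:
    """Solve exp(-deterioration / T) = acceptance_probability for T."""
    if deterioration <= 0.0:
        raise ValueError("deterioration must be positive")
    if not 0.0 < acceptance_probability < 1.0:
        raise ValueError("acceptance probability must lie strictly between 0 and 1")
    return -deterioration / math.log(acceptance_probability)


# Hypothetical instance: worst move loses 10 units, smallest loss is 1 unit.
t_initial = temperature_for(10.0, 0.95)  # accept almost everything when hot
t_final = temperature_for(1.0, 0.01)     # accept almost nothing when cold
print(round(t_initial, 2), round(t_final, 3))

# Sanity check: plugging T back in recovers the chosen probability.
print(math.isclose(math.exp(-10.0 / t_initial), 0.95))  # True
```

The same relation gives the final temperature when the minimum deterioration and a low acceptance probability are used, which matches the description of the final temperature above.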

## Biological Sequence Analysis

## Biological Sequence Analysis with Hidden Markov Models on an FPGA

Topics include advanced alignment methods, Hidden Markov Models, and next-generation sequencing data analysis methods.

In bioinformatics, a sequence alignment is a way of arranging the sequences of DNA, RNA, or protein to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences. Gaps are inserted between the residues so that identical or similar characters are aligned in successive columns. Sequence alignments are also used for non-biological sequences, such as calculating the distance cost between strings in a natural language or in financial data. If two sequences in an alignment share a common ancestor, mismatches can be interpreted as point mutations and gaps as indels (that is, insertion or deletion mutations) introduced in one or both lineages in the time since they diverged from one another. In sequence alignments of proteins, the degree of similarity between amino acids occupying a particular position in the sequence can be interpreted as a rough measure of how conserved a particular region or sequence motif is among lineages.
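The column-by-column reading of a pairwise alignment can be illustrated with a short sketch. The function name, the gap character, and the example rows are assumptions chosen for the illustration.

```python
from typing import Dict


def classify_columns(row_a: str, row_b: str, gap: str = "-") -> Dict[str, int]:
    """Count matches, mismatches, and indel columns in a pairwise alignment."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    counts = {"match": 0, "mismatch": 0, "indel": 0}
    for x, y in zip(row_a, row_b):
        if x == gap and y == gap:
            continue                 # a column of gaps only carries no information
        if x == gap or y == gap:
            counts["indel"] += 1     # insertion or deletion in one lineage
        elif x == y:
            counts["match"] += 1
        else:
            counts["mismatch"] += 1  # candidate point mutation
    return counts


print(classify_columns("ACG-T", "ATGCT"))
# {'match': 3, 'mismatch': 1, 'indel': 1}
```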
