EduAlign
Data
Sequences to align | Optimal alignment |
---|---|
SILENT | |
LISTEN |
Parameters
Match: | 1 |
Substitution: | -1 |
Gap: | -1 |
Actions
The matrices and detailed explanations will appear here once you run the alignment algorithm.
Starting with two empty matrices:
The algorithm begins by initializing two matrices: the score matrix and the traceback matrix. The score matrix is used to store cumulative scores for aligning a prefix of the first sequence with a prefix of the second sequence. It represents all possible alignments as a two-dimensional table, where each cell corresponds to an intermediate step of alignment. The first sequence is associated with the rows, and the second sequence with the columns. The traceback matrix is used to record the directions from which the optimal scores originate in the score matrix. These directions (⭦, ⭡, ⭠) are crucial for reconstructing the optimal alignment during the backtracking phase. Initially, both matrices are empty except for a score of 0 in the cell (0, 0)
.
Initializing the first row:
The first row of the score matrix is filled to represent the alignment of an empty sequence with a prefix of the second sequence. Each cell in this row is calculated by cumulatively adding gap penalties, as only alignments with gaps are possible in this case. For example, the cell (0, j)
contains the score j * gap
, where gap
is the penalty defined by the user. In the traceback matrix, each cell in this row is marked with a leftward direction (⭠), indicating that these scores are obtained by successively adding gaps to the first sequence.
Initializing the first column:
The first column of the score matrix is filled to represent the alignment of a prefix of the first sequence with an empty sequence. As with the first row, each cell is calculated by cumulatively adding gap penalties, with the score i * gap
for the cell (i, 0)
. In the traceback matrix, each cell in this column is marked with an upward direction (⭡), indicating that these scores are obtained by successively adding gaps to the second sequence.
Filling the matrices row by row:
This step is the core of the algorithm. For each cell (i, j)
in the matrix (other than those in the first row and first column), three possible scores are calculated: the score for a position match or mismatch between the two sequences (⭦), the score for a gap in the first sequence (⭠), and the score for a gap in the second sequence (⭡). The maximum score among these three possibilities is recorded in the cell (see the tooltip), reflecting the best possible alignment for the corresponding prefixes of the two sequences. The traceback matrix is simultaneously updated to record the source of the maximum score (diagonal, upward, or leftward). If scores are tied, the algorithm makes an arbitrary choice. All cells in the matrix are progressively filled in this manner until the bottom-right corner is reached.
Backtracking:
Once the score matrix is completely filled, the algorithm uses the traceback matrix to reconstruct the optimal alignment. The process starts at the bottom-right cell (m, n)
of the matrix, where m
and n
represent the lengths of sequences 1 and 2. Following the directions recorded (⭦, ⭡, ⭠), the algorithm moves back through the matrix to reach the top-left cell (0, 0)
. Each direction corresponds to a decision: a diagonal move (⭦) aligns two characters, a horizontal move (⭠) introduces a gap in the first sequence, and a vertical move (⭡) introduces a gap in the second sequence. This step produces the two aligned sequences, ready to be displayed.
Alignment completed:
When the algorithm reaches the cell (0, 0)
, the optimal alignment is fully reconstructed. The global score, located in the cell (m, n)
, represents the best possible score for aligning the two sequences based on the defined parameters. The user can visualize the aligned sequences, the introduced gaps, and the path followed in the matrix to achieve this optimal solution.