PatScanUI provides a graphical interface for the PatScan tool. See the about page for the relevant references.
As a first step, you need to upload a sequence file in FASTA format. This is the sequence that PatScan will search. The file is kept on the server while you work with it and is deleted one hour after you finish using the software.
Once your file is uploaded, you can build a pattern by dragging pattern elements from the menu bar on the right side of the screen to the work area. Once an element is in the work area, you can fill in its details and reorder the elements by dragging them around.
Above the pattern area, a configuration menu contains a few buttons for adjusting the software's settings.
The DNA/Protein selector allows you to tell PatScan what kind of sequence file you uploaded. This affects the selection of available pattern elements.
While working on the pattern, you can switch on the continuous preview, which regularly searches your uploaded sequence for the current pattern. This lets you tweak the pattern until you get the desired result.
To remove all current pattern elements at once, use the clear button. Alternatively, you can drag individual elements onto the trash in the right-hand menu.
To see many of the available pattern elements in use, you can load an example pattern.
Below the work area, you can expand the computed pattern view in case you need to tweak the generated pattern manually.
PatScan supports many different pattern types. The following sections explain how they work.
String patterns allow you to specify a DNA/RNA/protein sequence to search for.
This is the most basic pattern to use. For DNA sequences, in addition to the regular ACGT nucleotide letters, you can also use the IUPAC ambiguity codes.
String patterns can be matched while allowing variations.
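To illustrate how ambiguity codes widen a string pattern, here is a minimal Python sketch (not PatScan's own code) that reports exact matches of a DNA string pattern. The helper names `base_matches` and `find_string_pattern` are hypothetical, variations are ignored, and positions are 0-based.

```python
# Sketch only: exact matching of a DNA string pattern that may contain
# IUPAC ambiguity codes. Not PatScan's implementation.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT", "N": "ACGT",
}


def base_matches(code: str, base: str) -> bool:
    """True if `base` is one of the nucleotides that `code` stands for."""
    return base.upper() in IUPAC[code.upper()]


def find_string_pattern(seq: str, pattern: str) -> list:
    """Return 0-based start positions where `pattern` matches `seq` exactly."""
    hits = []
    for start in range(len(seq) - len(pattern) + 1):
        window = seq[start:start + len(pattern)]
        if all(base_matches(c, b) for c, b in zip(pattern, window)):
            hits.append(start)
    return hits


# W stands for A or T, so TATAWA matches both TATAAA and TATATA.
print(find_string_pattern("GGTATAAAGGTATATAGG", "TATAWA"))  # [2, 10]
```

One ambiguity code in the pattern lets a single string pattern cover several concrete sequences without resorting to an alternative pattern.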
Range patterns are a shorthand for a string pattern consisting of a run of arbitrary nucleotides or amino acids. They are especially useful for allowing insertions of variable size. For example, if you want to allow 20 to 30 arbitrary nucleotides, you can use a range pattern from 20 to 30 instead of a string pattern of 20 Ns that allows up to 10 insertions.
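A range pattern behaves much like the regular-expression quantifier `.{20,30}`. The sketch below uses Python's `re` module purely as an analogy; `range_regex` is a hypothetical helper, the flanking strings are arbitrary example values, and this is not how PatScan itself evaluates patterns.

```python
import re


def range_regex(low: int, high: int) -> str:
    """Regex for a run of `low` to `high` arbitrary residues (hypothetical helper)."""
    return f".{{{low},{high}}}"


# Example: GAATTC, then 3 to 5 arbitrary residues, then GGATCC.
pattern = re.compile("GAATTC" + range_regex(3, 5) + "GGATCC")

match = pattern.search("TTGAATTCAAAAGGATCCTT")
print(match.start(), match.group())  # 2 GAATTCAAAAGGATCC

# Only two residues between the two strings: too short for the 3-5 range.
print(pattern.search("GAATTCAAGGATCC"))  # None
```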
Complement patterns match the reverse complement of a named pattern. Thus, they apply only to DNA and RNA sequences. You can specify alternative complementation rules that define which bases count as complements. Complement patterns can also allow variations.
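The following sketch, assuming exact matching, DNA input, and standard Watson-Crick pairs only, shows what a complement pattern checks: a named stem (here called `p1`), a loop given as a range, and then the reverse complement of whatever `p1` matched. Together these describe a simple hairpin. `reverse_complement` and `find_hairpins` are hypothetical helpers, and the stem and loop limits are arbitrary example values.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(s: str) -> str:
    """Reverse complement of a DNA string using the standard A-T and C-G pairs."""
    return "".join(COMPLEMENT[b] for b in reversed(s.upper()))


def find_hairpins(seq: str, stem=(4, 7), loop=(3, 8)):
    """Yield (start, stem, loop) for every stem p1 of stem[0]..stem[1] bases,
    followed by a loop of loop[0]..loop[1] bases and then by the exact
    reverse complement of p1 (roughly: p1, a range, a complement of p1)."""
    seq = seq.upper()
    for start in range(len(seq)):
        for s_len in range(stem[0], stem[1] + 1):
            p1 = seq[start:start + s_len]
            if len(p1) < s_len:
                break  # ran off the end of the sequence
            if set(p1) - set(COMPLEMENT):
                continue  # skip stems containing N or other non-ACGT letters
            rc = reverse_complement(p1)
            for l_len in range(loop[0], loop[1] + 1):
                end = start + s_len + l_len
                if seq[end:end + s_len] == rc:
                    yield start, p1, seq[start + s_len:end]


print(list(find_hairpins("TTGAGCAAAAGCTCTT")))  # [(2, 'GAGC', 'AAAA')]
```

The complement is computed from the text that `p1` actually matched, not from the pattern definition, which is why the stem has to be named first.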
Repeat patterns repeat a previously specified named pattern. The repeat can have variations compared to the original pattern.
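As a rough illustration of a repeat with variations, this sketch looks for a named pattern `p1` followed, after a short gap, by a copy that differs in at most one position. It assumes `p1` is a plain string, counts only mismatches (no insertions or deletions), and uses hypothetical helper names and gap limits.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_repeats(seq: str, p1: str, gap=(0, 3), max_mismatches=1):
    """Return (first_start, repeat_start, repeat_text) for each exact
    occurrence of p1 that is followed, after gap[0]..gap[1] residues,
    by a copy of p1 with at most `max_mismatches` substitutions."""
    hits = []
    n = len(p1)
    start = seq.find(p1)
    while start != -1:
        for g in range(gap[0], gap[1] + 1):
            j = start + n + g
            copy = seq[j:j + n]
            if len(copy) == n and hamming(p1, copy) <= max_mismatches:
                hits.append((start, j, copy))
        start = seq.find(p1, start + 1)
    return hits


print(find_repeats("ACGTACGAAAA", "ACGT"))  # [(0, 4, 'ACGA')]
```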
An alternative pattern combines any two other patterns and matches if either one of them (or both) matches.
The length limit pattern lets you require that the length of one or more named patterns is less than a specified number of bases or amino acids. Use shift-click to select multiple named patterns from the list.
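The description above does not say whether several selected patterns are checked together or one at a time. Assuming their matched lengths are added up and compared against the limit with a strict "less than", the check amounts to the following sketch. `within_length_limit` is a hypothetical name, and matches are given as 0-based, half-open (start, end) intervals.

```python
def within_length_limit(matches: dict, names: list, limit: int) -> bool:
    """Assumed semantics: add up the lengths of the selected named matches
    and require the total to be strictly less than `limit`.
    `matches` maps a pattern name to a 0-based, half-open (start, end) span."""
    total = sum(end - start for start, end in (matches[name] for name in names))
    return total < limit


spans = {"p1": (10, 16), "p2": (30, 36)}  # two 6-base matches
print(within_length_limit(spans, ["p1", "p2"], 20))  # True  (12 < 20)
print(within_length_limit(spans, ["p1", "p2"], 12))  # False (12 is not < 12)
```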
The weight pattern allows you to specify a position-specific probability/weight matrix for DNA sequences. For every position in the matrix, the probabilities of the individual bases (in %) should add up to 100. The final weight of the pattern is calculated from the percent values of the matching bases.
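The exact formula PatScan uses to combine the percentages is not given here. Assuming the weight of a window is simply the sum of the percent values of the bases it contains, validating a matrix and scanning a sequence look roughly like this. The matrix values, the threshold, and the function names are made up for illustration.

```python
# Each row is one matrix position: percentages for A, C, G and T.
MATRIX = [
    {"A": 80, "C": 10, "G": 5, "T": 5},
    {"A": 0, "C": 0, "G": 100, "T": 0},
    {"A": 25, "C": 25, "G": 25, "T": 25},
]


def validate(matrix) -> None:
    """Raise ValueError if any position's percentages do not add up to 100."""
    for i, row in enumerate(matrix, start=1):
        total = sum(row.values())
        if total != 100:
            raise ValueError(f"position {i} sums to {total}, not 100")


def weight(matrix, window: str) -> int:
    """Assumed scoring: sum the percentage of the observed base at each position.
    Bases not in the matrix (e.g. N) contribute 0."""
    return sum(row.get(base, 0) for row, base in zip(matrix, window.upper()))


def scan(seq: str, matrix, threshold: int) -> list:
    """Return (start, weight) for every window whose weight reaches `threshold`."""
    n = len(matrix)
    hits = []
    for start in range(len(seq) - n + 1):
        w = weight(matrix, seq[start:start + n])
        if w >= threshold:
            hits.append((start, w))
    return hits


validate(MATRIX)
print(scan("TAGCCGT", MATRIX, 100))  # [(1, 205), (4, 135)]
```

A fully uninformative position (25% for every base) adds the same amount to every window, so only the informative positions decide which windows pass the threshold.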
The Any-Of pattern allows matching a list of amino acids for a given position.
The Not-Any-Of pattern is the inverse of the Any-Of pattern and allows specifying a list of amino acids that cannot occur at a given position.
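Any-Of and Not-Any-Of elements behave like a character set and a negated character set at a single position. The sketch below, with hypothetical names and an invented example pattern (N, then any residue except P, then S or T), shows how such per-position elements combine with ordinary residues.

```python
# A pattern is a list of per-position elements:
#   ("exact", "N")       the residue must be N
#   ("any_of", "ST")     the residue must be S or T
#   ("not_any_of", "P")  the residue may be anything except P
def position_matches(element, residue: str) -> bool:
    kind, residues = element
    if kind == "exact":
        return residue == residues
    if kind == "any_of":
        return residue in residues
    if kind == "not_any_of":
        return residue not in residues
    raise ValueError(f"unknown element kind: {kind}")


def find_motif(seq: str, pattern) -> list:
    """Return 0-based start positions where every element matches."""
    n = len(pattern)
    return [
        i for i in range(len(seq) - n + 1)
        if all(position_matches(e, r) for e, r in zip(pattern, seq[i:i + n]))
    ]


PATTERN = [("exact", "N"), ("not_any_of", "P"), ("any_of", "ST")]
print(find_motif("MNKSANPTNGT", PATTERN))  # [1, 8]
```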
Alternative complementation rules allow base pairings that are not covered by the conventional AT, TA, CG, or GC pairings. If you want to allow, e.g., GU and UG pairings in RNA, you can create an alternative complementation rule that lists all of the allowed pairings, including the conventional ones.
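To make the effect of an alternative complementation rule concrete, this sketch treats a rule as a set of allowed (base, partner) pairs and checks whether one stretch of RNA can pair antiparallel with another. The rule sets and the `pairs_with` helper are illustrative assumptions, not PatScan's internal representation.

```python
# A complementation rule is modeled as a set of allowed (base, partner) pairs.
STANDARD_RNA = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}
# Hypothetical alternative rule: the standard pairs plus G-U and U-G wobble pairs.
WOBBLE_RNA = STANDARD_RNA | {("G", "U"), ("U", "G")}


def pairs_with(stem: str, partner: str, rule) -> bool:
    """True if `partner`, read in reverse, pairs base by base with `stem`
    under `rule`, i.e. `partner` is a reverse complement of `stem`."""
    if len(stem) != len(partner):
        return False
    return all((a, b) in rule for a, b in zip(stem, reversed(partner)))


print(pairs_with("GGAC", "GUCC", STANDARD_RNA))  # True
print(pairs_with("GGAC", "GUCU", STANDARD_RNA))  # False: needs a G-U pair
print(pairs_with("GGAC", "GUCU", WOBBLE_RNA))    # True
```

Because the alternative rule replaces the standard one, it has to list the conventional pairs again; otherwise only the wobble pairs would be accepted.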
When a pattern allows variations, you can specify the maximum number of mismatches (mutations), insertions, and deletions you want to allow.
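The three limits are counted separately, so a match that needs one insertion is not covered by a limit of one mismatch. The recursive sketch below checks whether a candidate stretch of sequence matches a pattern within given limits. It assumes that an insertion is an extra residue in the sequence and a deletion is a pattern residue missing from the sequence; the function name is hypothetical, and PatScan's own search may be organized quite differently.

```python
from functools import lru_cache


def matches_with_variations(pattern: str, text: str,
                            mismatches: int, insertions: int, deletions: int) -> bool:
    """True if `text` matches `pattern` using at most the given numbers of
    mismatches, insertions (extra residues in `text`) and deletions
    (pattern residues missing from `text`). Each limit is counted separately."""

    @lru_cache(maxsize=None)
    def go(i: int, j: int, m: int, ins: int, dels: int) -> bool:
        # i indexes the pattern, j indexes the text; m, ins, dels are budgets left.
        if i == len(pattern) and j == len(text):
            return True
        if i < len(pattern) and j < len(text):
            if pattern[i] == text[j]:
                if go(i + 1, j + 1, m, ins, dels):
                    return True
            elif m > 0 and go(i + 1, j + 1, m - 1, ins, dels):
                return True
        # Extra residue in the text: an insertion.
        if j < len(text) and ins > 0 and go(i, j + 1, m, ins - 1, dels):
            return True
        # Pattern residue missing from the text: a deletion.
        if i < len(pattern) and dels > 0 and go(i + 1, j, m, ins, dels - 1):
            return True
        return False

    return go(0, 0, mismatches, insertions, deletions)


print(matches_with_variations("ACGT", "AGGT", 1, 0, 0))   # True: one mismatch
print(matches_with_variations("ACGT", "ACGGT", 1, 0, 0))  # False: needs an insertion
print(matches_with_variations("ACGT", "ACGGT", 0, 1, 0))  # True: one insertion
print(matches_with_variations("ACGT", "AGT", 0, 0, 1))    # True: one deletion
```

To scan a whole sequence with this check, one would try every window whose length lies between the pattern length minus the allowed deletions and the pattern length plus the allowed insertions.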
Patterns can be assigned names by clicking the "unnamed" button. Named patterns can be used in complement patterns and repeat patterns.
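IUPAC ambiguity codes:

| Code | Nucleotides |
|------|-------------|
| M | A, C |
| R | A, G |
| W | A, T |
| S | C, G |
| Y | C, T |
| K | G, T |
| V | A, C, G |
| H | A, C, T |
| D | A, G, T |
| B | C, G, T |
| N | A, C, G, T |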