PatScan

Basic Usage

PatScanUI provides a graphical interface for the PatScan tool. See the about page for the relevant references.

As a first step, upload a sequence file in FASTA format. This is the sequence that PatScan will search. The file is kept on the server while you work on it and deleted one hour after you finish using the software.

Once your file is uploaded, you can build a pattern by dragging pattern elements from the menu bar on the right side of the screen to the work area. Once a pattern element is in the work area, you can fill in its details and reorder the elements by dragging them around.

Above the work area, a configuration menu contains several buttons for adjusting the software.

The DNA/Protein selector tells PatScan which kind of sequence you uploaded. This setting determines which pattern elements are available.

While working on the pattern, you can switch on the continuous preview, which regularly searches your uploaded sequence for the current pattern. This lets you tweak the pattern until you get the desired result.

To remove all current pattern elements, use the clear button. Alternatively, you can drag individual elements onto the trash in the right-hand menu.

To see many of the available pattern elements in use, load the example pattern.

Below the work area, you can expand the computed pattern view if you need to tweak the generated pattern manually.

Available Patterns

PatScan supports many different pattern types. This section explains how each of them works.

String Pattern

String patterns let you specify a DNA, RNA, or protein sequence to search for. This is the most basic pattern type. For DNA sequences, in addition to the regular A, C, G, and T nucleotide letters, you can also use the IUPAC ambiguity codes. String patterns can also match while allowing a limited number of variations (mismatches, insertions, and deletions).
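To illustrate how IUPAC codes and allowed variations combine, here is a minimal sketch of a string-pattern search over DNA. It is not PatScan's implementation: the function name `find_string_pattern` is hypothetical, and for simplicity it counts only mismatches (substitutions), not insertions or deletions.

```python
# IUPAC ambiguity codes: each letter stands for the set of DNA bases it matches.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT", "N": "ACGT",
}


def find_string_pattern(sequence: str, pattern: str, max_mismatches: int = 0) -> list[int]:
    """Return 0-based start positions where `pattern` matches `sequence`
    with at most `max_mismatches` substitutions (insertions/deletions not handled)."""
    sequence, pattern = sequence.upper(), pattern.upper()
    hits = []
    for start in range(len(sequence) - len(pattern) + 1):
        mismatches = 0
        for offset, code in enumerate(pattern):
            if sequence[start + offset] not in IUPAC[code]:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # too many differences; give up on this window
        if mismatches <= max_mismatches:
            hits.append(start)
    return hits


print(find_string_pattern("ACGTTGCA", "TNCA"))     # [4]: N matches the G
print(find_string_pattern("ACGTTGCA", "TGCT", 1))  # [4]: one mismatch (T vs A)
```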

Range Pattern (insertion range)

Range patterns are a shorthand for a string pattern consisting of a run of arbitrary nucleotides or amino acids. They are especially useful for allowing insertions of variable size. For example, to allow 20 to 30 arbitrary nucleotides, you can use a range pattern from 20 to 30 instead of a string pattern of 20 Ns that allows up to 10 insertions.
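A range behaves much like the regular-expression fragment `.{20,30}`, except that every allowed length is considered. The sketch below, which uses a hypothetical helper `find_with_gap` and plain literal flanks, tries each gap length between two fixed strings and reports every resulting span.

```python
def find_with_gap(sequence: str, left: str, right: str,
                  min_gap: int, max_gap: int) -> list[tuple[int, int]]:
    """Return (start, end) spans of `left`, then min_gap..max_gap arbitrary
    residues, then `right`. Every allowed gap length is tried, so
    overlapping matches are all reported."""
    hits = []
    for i in range(len(sequence) - len(left) + 1):
        if not sequence.startswith(left, i):
            continue
        gap_start = i + len(left)
        for gap in range(min_gap, max_gap + 1):
            j = gap_start + gap  # where `right` would have to begin
            if sequence.startswith(right, j):
                hits.append((i, j + len(right)))
    return hits


seq = "AAA" + "C" * 25 + "GGG"
print(find_with_gap(seq, "AAA", "GGG", 20, 30))  # [(0, 31)]: 25 residues between the flanks
```

Enumerating gap lengths directly is what makes the range cheaper to express than a run of Ns plus an insertion allowance.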

Complement Pattern

Complement patterns match the reverse complement of a named pattern, so they apply only to DNA and RNA sequences. You can specify an alternative complementation rule to define which bases count as complements. Complement patterns can also allow some variations.
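As a rough sketch of what a complement pattern looks for, assume a DNA sequence, the standard A-T and C-G pairing rules, and mismatches as the only kind of variation. The hypothetical `find_complement` takes the text that a named pattern matched and searches downstream of it for that text's reverse complement, as in a stem-loop.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")  # standard DNA pairing only


def reverse_complement(dna: str) -> str:
    """Complement each base, then reverse the strand."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def find_complement(sequence: str, matched: str, search_from: int,
                    max_mismatches: int = 0) -> list[int]:
    """Start positions (at or after `search_from`) where the reverse complement
    of `matched` occurs with at most `max_mismatches` substitutions."""
    target = reverse_complement(matched)
    sequence = sequence.upper()
    hits = []
    for i in range(search_from, len(sequence) - len(target) + 1):
        window = sequence[i:i + len(target)]
        mismatches = sum(a != b for a, b in zip(window, target))
        if mismatches <= max_mismatches:
            hits.append(i)
    return hits


# A named pattern matched "AACG" at position 0; look for its partner after the loop.
print(reverse_complement("AACG"))                   # CGTT
print(find_complement("AACGTTTTCGTT", "AACG", 4))   # [8]
```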

Repeat Pattern

Repeat patterns match a repeat of a previously specified named pattern. The repeat can contain variations relative to the original.
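The sketch below assumes, as in the original PatScan tool, that a repeat must reproduce the text the named pattern actually matched (not just any text fitting the pattern definition), and it allows only substitutions as variations. The function `find_repeats` is a hypothetical illustration, not PatScanUI code.

```python
def find_repeats(sequence: str, matched: str, search_from: int,
                 max_mismatches: int = 0) -> list[int]:
    """Start positions (at or after `search_from`) where `matched`, the text
    captured by the named pattern, occurs again with at most
    `max_mismatches` substitutions."""
    n = len(matched)
    hits = []
    for i in range(search_from, len(sequence) - n + 1):
        mismatches = sum(a != b for a, b in zip(sequence[i:i + n], matched))
        if mismatches <= max_mismatches:
            hits.append(i)
    return hits


# The named pattern matched "GATTACA" at position 0; search for a repeat after it.
print(find_repeats("GATTACACCCGATTGCA", "GATTACA", 7, 1))  # [10]: one A->G change
```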

Alternative Pattern

An alternative pattern combines two other patterns and matches if either of them (or both) matches.
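One way to picture an alternative is as a union of two matchers. In this sketch, a matcher is a hypothetical function that takes a sequence and a position and returns the set of positions where a match starting there can end; the alternative simply returns the union, so a position where both sub-patterns match yields both end points.

```python
from typing import Callable

# A matcher takes (sequence, position) and returns the set of reachable end positions.
Matcher = Callable[[str, int], set[int]]


def literal(text: str) -> Matcher:
    """Matcher for an exact string."""
    def match(sequence: str, pos: int) -> set[int]:
        return {pos + len(text)} if sequence.startswith(text, pos) else set()
    return match


def alternative(first: Matcher, second: Matcher) -> Matcher:
    """Matches wherever either sub-pattern (or both) matches."""
    def match(sequence: str, pos: int) -> set[int]:
        return first(sequence, pos) | second(sequence, pos)
    return match


site = alternative(literal("GAATTC"), literal("GGATCC"))
print(site("TTGGATCC", 2))  # {8}
print(site("TTAAAAAA", 2))  # set()
```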

Length Limit Pattern

The length limit pattern lets you require that the length of one or more named patterns is less than a specified number of bases or amino acids. Use shift-click to select multiple named patterns from the list.
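A minimal sketch of such a check, assuming the limit applies to the combined length of the selected named patterns. Both that interpretation and the helper `within_length_limit` are assumptions for illustration; the named-pattern spans are given as (start, end) pairs.

```python
def within_length_limit(spans: dict[str, tuple[int, int]],
                        selected: list[str], limit: int) -> bool:
    """True if the summed length of the selected named matches is below `limit`.
    (Assumption: the limit applies to the combined length.)"""
    total = sum(end - start for start, end in (spans[name] for name in selected))
    return total < limit


spans = {"p1": (0, 6), "p2": (10, 14)}            # p1 is 6 long, p2 is 4 long
print(within_length_limit(spans, ["p1", "p2"], 12))  # True: 10 < 12
```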

Weight Pattern

The weight pattern lets you specify a position-specific probability/weight matrix for DNA sequences. For every position in the matrix, the percentages of the individual bases should add up to 100. The final weight of a match is calculated from the percentages of the matching bases.
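The sketch below shows a weight-matrix scan under a stated assumption: the weight of a window is taken to be the sum of the percentages of its bases, and windows at or above a hypothetical `threshold` are reported. PatScan's exact scoring may differ; the function names are illustrative only.

```python
import math

# One dict per matrix position, mapping base -> percentage.
WeightMatrix = list[dict[str, float]]


def check_matrix(matrix: WeightMatrix) -> None:
    """Each position's percentages must add up to 100."""
    for i, column in enumerate(matrix):
        total = sum(column.values())
        if not math.isclose(total, 100.0):
            raise ValueError(f"position {i}: percentages sum to {total}, not 100")


def weight(window: str, matrix: WeightMatrix) -> float:
    """Assumed scoring: sum of the percentages of the observed bases."""
    return sum(column.get(base, 0.0) for column, base in zip(matrix, window.upper()))


def scan_weight(sequence: str, matrix: WeightMatrix, threshold: float) -> list[tuple[int, float]]:
    """(start, weight) for every window whose weight reaches `threshold`."""
    check_matrix(matrix)
    n = len(matrix)
    hits = []
    for start in range(len(sequence) - n + 1):
        w = weight(sequence[start:start + n], matrix)
        if w >= threshold:
            hits.append((start, w))
    return hits


matrix = [
    {"A": 70, "C": 10, "G": 10, "T": 10},
    {"A": 25, "C": 25, "G": 25, "T": 25},
    {"A": 0, "C": 0, "G": 100, "T": 0},
]
print(scan_weight("TTAAGCCG", matrix, 150))  # [(2, 195)]
```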

Any-Of Pattern

The Any-Of pattern matches any one of a list of amino acids at a given position.

Not-Any-Of Pattern

The Not-Any-Of pattern is the inverse of the Any-Of pattern and lets you specify a list of amino acids that cannot occur at a given position.
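Because Not-Any-Of is the inverse of Any-Of, both can be sketched as a single set-membership test with a flag that flips the result. The helper `residue_positions` is hypothetical and simply lists the protein positions that satisfy the test.

```python
def residue_positions(protein: str, residues: str, exclude: bool = False) -> list[int]:
    """Positions whose amino acid is in `residues` (Any-Of),
    or, with exclude=True, is not in `residues` (Not-Any-Of)."""
    allowed = frozenset(residues.upper())
    return [i for i, aa in enumerate(protein.upper()) if (aa in allowed) != exclude]


print(residue_positions("MSTPK", "ST"))                # [1, 2]
print(residue_positions("MSTPK", "ST", exclude=True))  # [0, 3, 4]
```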

Alternative Complementation Rule

Alternative complementation rules allow base pairings that are not covered by the conventional AT, TA, CG, or GC pairings. For example, to allow GU and UG wobble pairs in RNA, you can create an alternative complementation rule that lists these pairings together with the standard ones.
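A complementation rule can be thought of as a set of allowed base pairs. This sketch, with hypothetical names, checks whether two RNA strands written 5' to 3' can pair base-for-base under a given rule; adding GU and UG to the standard pairs is enough to accept a wobble pair that the standard rule rejects.

```python
STANDARD_RNA = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE_RNA = STANDARD_RNA | {("G", "U"), ("U", "G")}


def can_pair(strand: str, partner: str, rule: set[tuple[str, str]]) -> bool:
    """Both strands are written 5'->3', so the first base of `strand`
    pairs with the last base of `partner`."""
    if len(strand) != len(partner):
        return False
    return all((a, b) in rule for a, b in zip(strand.upper(), reversed(partner.upper())))


print(can_pair("GGAC", "GUCU", STANDARD_RNA))  # False: G-U is not a standard pair
print(can_pair("GGAC", "GUCU", WOBBLE_RNA))    # True
```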

IUPAC ambiguity codes (usable in DNA string patterns)

Code	Matches
M	AC
R	AG
W	AT
S	CG
Y	CT
K	GT
V	ACG
H	ACT
D	AGT
B	CGT
N	ACGT