`PATRONUS` homepage

### Overview

Assessing the statistical relevance of a motif in a stretch of DNA or protein is a task of central importance in biology, since in many cases the over- or under-representation of a motif may be linked to biological activity (although, in general, additional factors should be taken into account to decide whether a given occurrence of a motif is actually biologically active).

`PATRONUS` is a program designed to compute very quickly the exact probability of observing a given number of occurrences of a simple motif (that is, a contiguous word without gaps) in a sequence. Its intended scope is the analysis of very long biological sequences, such as chromosomes or whole genomes of complex organisms. The probability is computed on the basis of the Markovian statistics of order m for the sequence, that is, the recorded number of occurrences of all the submotifs of length m + 1 in the sequence. Contrary to what many people believe, computing such a probability for a generic motif is a computationally demanding task, mainly because motifs can overlap in non-trivial ways.
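To make these two notions concrete, here is a minimal Python sketch. It is not taken from `PATRONUS`, and all function names are hypothetical. It tabulates the order-m statistics of a sequence (the counts of all overlapping submotifs of length m + 1), counts the occurrences of a motif with overlaps allowed, and lists the shifts at which a motif can overlap a copy of itself.

```python
from collections import Counter


def markov_counts(seq, m):
    """Count every overlapping submotif of length m + 1 in seq."""
    k = m + 1
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def count_occurrences(seq, motif):
    """Count the occurrences of motif in seq, overlapping ones included."""
    count = 0
    start = seq.find(motif)
    while start != -1:
        count += 1
        # Restart one position later so that overlapping hits are not missed.
        start = seq.find(motif, start + 1)
    return count


def self_overlaps(motif):
    """Shifts d (0 < d < len(motif)) at which motif overlaps a copy of itself."""
    return [d for d in range(1, len(motif)) if motif[d:] == motif[:-d]]
```

A motif such as `AAA` overlaps itself at every shift, whereas `ACGT` never does. Self-overlapping motifs tend to occur in clumps, so their occurrences are not independent events, and this is why the count distribution cannot be obtained from a simple binomial argument.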

`PATRONUS` (from "`PAT`tern `R`ecognition by `O`ptimized `N`umerical `U`niversal `S`coring") is able to produce the whole bulk of the probability distribution function, and works best in the range of m between 0 and 3, where we believe it to be the fastest in the world in its class (this claim is currently under peer review); for low orders of the Markov model our program is sometimes even faster than approximate methods, and the longer the sequence being analyzed, the better. The principles of our new computational method, which is based on systolic deterministic finite-state automata and the fast Fourier transform, are explained in http://arXiv.org/abs/0801.3675.
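For readers unfamiliar with automaton-based counting, the sketch below shows the general idea on a toy scale. It is emphatically not the `PATRONUS` algorithm: it uses a plain word-matching automaton and a naive dynamic programme instead of systolic automata and the FFT, it handles only an order-0 model (independent letters), and every name in it is hypothetical. Its cost grows roughly quadratically with the sequence length, so it is usable only for very short sequences.

```python
def build_dfa(word, alphabet):
    """Word-matching automaton: the state is the length of the longest
    prefix of word that is a suffix of the text read so far."""
    n = len(word)
    dfa = [dict() for _ in range(n + 1)]
    for state in range(n + 1):
        for ch in alphabet:
            s = word[:state] + ch
            k = min(n, len(s))
            # Shrink k until the last k letters of s match a prefix of word.
            while k > 0 and s[-k:] != word[:k]:
                k -= 1
            dfa[state][ch] = k
    return dfa


def count_distribution(word, probs, length):
    """Exact distribution of the number of overlapping occurrences of a
    non-empty word in a random sequence of independent letters.

    probs maps each letter to its probability; the result is a list whose
    entry c is the probability of observing exactly c occurrences."""
    n = len(word)
    dfa = build_dfa(word, list(probs))
    max_count = max(length - n + 1, 0)
    # table[state][c]: probability of being in state after c occurrences.
    table = [[0.0] * (max_count + 1) for _ in range(n + 1)]
    table[0][0] = 1.0
    for _ in range(length):
        new = [[0.0] * (max_count + 1) for _ in range(n + 1)]
        for state in range(n + 1):
            for c, p in enumerate(table[state]):
                if p == 0.0:
                    continue
                for ch, q in probs.items():
                    nxt = dfa[state][ch]
                    # Reaching the final state means one more occurrence.
                    c2 = c + 1 if nxt == n else c
                    new[nxt][c2] += p * q
        table = new
    return [sum(table[s][c] for s in range(n + 1)) for c in range(max_count + 1)]
```

For a one-letter word the result reduces to a binomial distribution, which gives a quick sanity check. For a self-overlapping word such as `AA` it does not, because the automaton returns to its final state after a single further `A`.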

On the other hand, if you need to use Markov models of order larger than 3, if you have to evaluate a composite (gapped) motif, or if you prefer approximate methods for the analysis of your sequence, then you should still refer to other "cousin" programs, such as spatt. Similarly, you should consider that `PATRONUS` is not by itself a tool for motif finding (although we are currently developing new applications that exploit its algorithm along this direction).

`PATRONUS` is mostly written in Objective Caml, a very-high-level functional programming language, with a couple of `C` inserts for the sake of optimization. Thanks to this architecture, the source code is remarkably compact compared to the functionality offered by the program. To get a feeling for what you can expect from the program, a number of timings for many realistic situations may be found in http://arXiv.org/abs/0801.3675, or in the `PATRONUS` User's manual. In general, the program is very fast but also quite memory-hungry (you should have at least 2 gigabytes of RAM if you intend to explore sequences which are between 10^8 and 10^9 in length).

At the moment the program is distributed in executable form, but at some later point the source code will be made freely available. The program is free for academic and non-commercial use, but is *not* free for any commercial use; please check carefully how the conditions of the license (which is distributed along with the program) apply to your case. Any kind of use of the program implies full acceptance of the license.

Finally, although we have tried hard to produce valuable, high-quality software, we are aware that many bugs and oddities are still lurking in the depths of the code. Thus, any bug report or suggestion for improvement will be sincerely welcome.

### News

(**12/10/2008**) Announcement: `PATRONUS` to go open-source! `:)` *Starting from the next public release (after we have solved some packaging problems),* `PATRONUS`' *sources will be distributed along with the precompiled binaries.* Stay tuned!

(**7/7/2008**) Version 101 is now out. It incorporates various bugfixes w.r.t. the new features introduced in version 94. Empty lines in FASTA input, which were not legal in previous builds, are now accepted.

(**not public**) Version 94 is out. *A new non-FFT algorithm to directly evaluate the truncated distribution and the p-value has been introduced; this algorithm is much faster for moderate values of the number of pattern occurrences (<= 25).* Different evaluation "strategies" are now possible, allowing the user to specify whether the computation should be preferentially directed towards the p-value, the z-value, or both. The default behaviour (which used to be z-value in previous versions) has been changed to p-value.

(**5/5/2008**) Version 90 is out. It is mostly a bug-fixing release. *Since two major bugs have been fixed (one affecting the computation of p-values, one in the computation of Markov models of order 0), we recommend that all users of version 84 upgrade to version 90.*