GeneBee BLAST 2.2.8 Services Help
Search options
Databases available for BLAST search
The BLAST pages offer several different databases for searching.
Peptide Sequence Databases
- nr – All non-redundant GenBank CDS translations+PDB+SwissProt+PIR+PRF
- swissprot – Last major release of the SWISS-PROT protein sequence database (no updates)
Nucleotide Sequence Databases
- nr – All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS, GSS, or phase 0, 1 or 2
HTGS sequences). No longer "non-redundant".
- Human –
- Rodentes –
- Other Mammals –
- Other Vertebrates –
- Invertebrates –
- Plants –
- Fungi –
- Prokaryotes –
- Organelles –
- Viruses –
- Bacteriophage –
- HTGs – Unfinished High Throughput Genomic Sequences:
phases 0, 1 and 2 (finished, phase 3 HTG sequences are in nr)
- GSSs – Genome Survey Sequence,
includes single-pass genomic data, exon-trapped sequences, and Alu PCR sequences.
- Synthetic –
- Unclassified –
- est –
Your E-Mail
A valid internet email address (somebody@somewhere.domain.country). Type your email address in this
text box. You can leave the box empty if you want to run your search interactively.
Limit by Entrez Query
BLAST searches can be limited to the results of an Entrez query against the database chosen. This can
be used to limit searches to subsets of the BLAST databases. Any terms can be entered that would normally
be allowed in an Entrez search session. For example:
protease NOT hiv1[Organism]
This will limit a BLAST search to all proteases, except those in HIV 1. This can also be used to limit
searches to a particular molecule type:
biomol_mrna[PROP] AND brain
To limit to a specific organism you can enter the name of the organism in the Entrez Query field with
the [Organism] qualifier. For example:
Mus musculus[Organism]
For help in constructing Entrez queries, please see the "Writing Advanced Search Statements" section of
the Entrez Help document.
Filter (Low-complexity)
Mask off segments of the query sequence that have low compositional complexity, as determined by the
SEG program of Wootton & Federhen (Computers and Chemistry, 1993) or, for BLASTN, by the DUST program
of Tatusov and Lipman (in preparation). Filtering can eliminate statistically significant but biologically
uninteresting reports from the blast output (e.g., hits against common acidic-, basic- or proline-rich
regions), leaving the more biologically interesting regions of the query sequence available for specific
matching against database sequences.
Filtering is only applied to the query sequence (or its translation products), not to database sequences.
Default filtering is DUST for BLASTN, SEG for other programs.
It is not unusual for nothing at all to be masked by SEG, when applied to sequences in SWISS-PROT, so
filtering should not be expected to always yield an effect. Furthermore, in some cases, sequences are
masked in their entirety, indicating that the statistical significance of any matches reported against
the unfiltered query sequence should be suspect.
Filter (Mask lower case)
This option specifies that any lower-case letters in the input FASTA file should be masked.
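As an illustration of what masking lower-case letters means, here is a minimal Python sketch. It assumes that masked positions are replaced by N in nucleotide queries and by X in protein queries; the function name and the choice of replacement character are assumptions made for this example, not a description of the server's internals.

```python
def mask_lowercase(sequence: str, mask_char: str = "N") -> str:
    """Replace every lower-case letter with mask_char; leave other characters unchanged."""
    return "".join(mask_char if ch.islower() else ch for ch in sequence)


# Nucleotide query: the lower-case stretch is masked with N.
print(mask_lowercase("ACGTacgtACGT"))       # ACGTNNNNACGT

# Protein query: pass X as the mask character (assumed convention).
print(mask_lowercase("MKVLaaaaGH", "X"))    # MKVLXXXXGH
```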
Expect
The statistical significance threshold for reporting matches against database sequences; the default
value is 10, meaning that 10 matches are expected to be found merely by chance, according to the
stochastic model of Karlin and Altschul (1990). If the expectation value (E-value) ascribed to a match is
greater than the EXPECT threshold, the match will not be reported. Lower EXPECT thresholds are more
stringent, leading to fewer chance matches being reported. Increasing the threshold allows less
significant matches to be reported. Fractional values are acceptable.
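To make the threshold concrete: in the Karlin-Altschul framework, the expected number of chance matches with a normalized (bit) score of at least S' in a search space of size m x n is E = m * n * 2^(-S'). The Python sketch below applies that formula and compares the result with the EXPECT threshold. It is a simplification that uses the raw query and database lengths, whereas BLAST itself works with effective lengths; the function names are hypothetical.

```python
def expected_chance_matches(bit_score: float, query_length: int, database_length: int) -> float:
    """E = m * n * 2**(-S') for bit score S' (raw lengths, no edge-effect correction)."""
    return query_length * database_length * 2.0 ** (-bit_score)


def is_reported(e_value: float, expect_threshold: float = 10.0) -> bool:
    """A match is reported only if its E-value does not exceed the EXPECT threshold."""
    return e_value <= expect_threshold


e = expected_chance_matches(bit_score=20, query_length=1024, database_length=1024)
print(e)                                       # 1.0
print(is_reported(e))                          # True  (default threshold 10)
print(is_reported(e, expect_threshold=0.5))    # False (more stringent threshold)
```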
Word size
The BLAST algorithm uses "words" to nucleate regions of similarity. The default Word size for a protein sequence
is 3 residues and for nucleotide sequences it is 11 bp. A blastn search will not work with a Word size of less than 7.
A good rule of thumb is that the query length must be at least twice the Word size. For example, if your query is a
protein sequence of 4 residues, then the Word size should be reduced to 2. Please note that the smaller the Word size,
the slower your search will be.
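The rule of thumb above can be sketched in Python as follows. The function name and the `minimum` parameter are assumptions made for this example; the only facts taken from the text are the defaults (3 for protein, 11 for nucleotide), the blastn lower limit of 7, and the "query at least twice the Word size" rule.

```python
def suggest_word_size(query_length: int, default_word_size: int, minimum: int = 1) -> int:
    """Largest word size not above the default that is at most half the query length."""
    word_size = min(default_word_size, query_length // 2)
    if word_size < minimum:
        raise ValueError(
            f"query of length {query_length} is too short for a word size of at least {minimum}"
        )
    return word_size


print(suggest_word_size(4, default_word_size=3))               # 2  (protein example from the text)
print(suggest_word_size(200, default_word_size=3))             # 3  (default is kept)
print(suggest_word_size(16, default_word_size=11, minimum=7))  # 8  (blastn, short query)
```

With `minimum=7`, a nucleotide query shorter than 14 bp raises an error, since no word size satisfies both the rule of thumb and the blastn limit.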
Nucleotide mismatch penalty
Penalty for nucleotide mismatch. For blastn and megablast programs only. Default value is -3.
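As a small worked example, the Python sketch below scores an ungapped nucleotide alignment using the default mismatch penalty of -3 together with a match reward of 1 (the default of the -r option listed under Other Advanced Options). The function name is hypothetical, and gaps are deliberately left out.

```python
def ungapped_nucleotide_score(query: str, subject: str, reward: int = 1, penalty: int = -3) -> int:
    """Sum the reward for each matching position and the penalty for each mismatch."""
    if len(query) != len(subject):
        raise ValueError("ungapped alignment requires sequences of equal length")
    return sum(
        reward if q == s else penalty
        for q, s in zip(query.upper(), subject.upper())
    )


# 7 matches and 1 mismatch: 7 * 1 + 1 * (-3) = 4
print(ungapped_nucleotide_score("ACGTACGT", "ACGTTCGT"))   # 4
```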
Matrix
A key element in evaluating the quality of a pairwise sequence alignment is the "substitution matrix", which assigns a score for
aligning any possible pair of residues. The theory of amino acid substitution matrices is described in [1], and applied to DNA
sequence comparison in [2]. In general, different substitution matrices are tailored to detecting similarities among sequences that
have diverged by differing degrees [1-3]. A single matrix may nevertheless be reasonably efficient over a relatively broad range of
evolutionary change [1-3]. Experimentation has shown that the BLOSUM-62 matrix [4] is among the best for detecting most weak protein
similarities. For particularly long and weak alignments, the BLOSUM-45 matrix may prove superior. A detailed statistical theory for
gapped alignments has not been developed, and the best gap costs to use with a given substitution matrix are determined empirically.
For proteins, a provisional table of recommended substitution matrices and gap costs for various query lengths is:
Query length | Substitution matrix | Gap costs |
<35 | PAM30 | (9,1) |
35-50 | PAM70 | (10,1) |
50-80 | BLOSUM80 | (10,1) |
>85 | BLOSUM62 | (11,1) |
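The table above can be turned into a simple lookup, sketched here in Python. Two details are assumptions made for this example: a query of exactly 50 residues is assigned to PAM70, and lengths 81 to 85, which the table does not cover, are assigned to BLOSUM80. The function name is hypothetical.

```python
def recommend_matrix(query_length: int):
    """Return (matrix, (open gap cost, extended gap cost)) for a protein query length."""
    if query_length < 35:
        return ("PAM30", (9, 1))
    if query_length <= 50:          # assumption: boundary value 50 goes to PAM70
        return ("PAM70", (10, 1))
    if query_length <= 85:          # assumption: 81-85 is not in the table; treated as BLOSUM80
        return ("BLOSUM80", (10, 1))
    return ("BLOSUM62", (11, 1))


print(recommend_matrix(20))    # ('PAM30', (9, 1))
print(recommend_matrix(60))    # ('BLOSUM80', (10, 1))
print(recommend_matrix(300))   # ('BLOSUM62', (11, 1))
```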
Open And Extended Gaps
The raw score of an alignment is the sum of the scores for aligning pairs of residues and the scores for gaps. Gapped BLAST and
PSI-BLAST use "affine gap costs" which charge the score -a for the existence of a gap, and the score -b for each residue
in the gap. Thus a gap of k residues receives a total score of -(a+bk); specifically, a gap of length 1 receives the score -(a+b).
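The affine gap formula can be checked with a few lines of Python. This is a minimal sketch with hypothetical function names: `a` is the Open Gap cost, `b` is the Extended Gap cost, and the residue pair scores are supplied by the caller rather than looked up in a substitution matrix.

```python
def gap_score(length: int, a: int, b: int) -> int:
    """Score of a single gap of `length` residues under affine costs: -(a + b*k)."""
    if length < 1:
        raise ValueError("a gap must span at least one residue")
    return -(a + b * length)


def raw_alignment_score(pair_scores, gap_lengths, a: int, b: int) -> int:
    """Raw score = sum of residue pair scores + sum of (negative) gap scores."""
    return sum(pair_scores) + sum(gap_score(k, a, b) for k in gap_lengths)


print(gap_score(1, a=11, b=1))    # -12, i.e. -(a+b)
print(gap_score(3, a=11, b=1))    # -14
print(raw_alignment_score([4, 5, 6], gap_lengths=[2], a=11, b=1))   # 15 - 13 = 2
```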
Only the following combinations of Matrix, Open Gap and Extended Gap are available. Recommended combinations are colored.
Matrix | (Open Gap, Extended Gap) |
BLOSUM80 | (11,1), (10,1), (9,1), (8,2), (7,2), (6,2) |
BLOSUM62 | (12,1), (11,1), (10,1), (9,2), (8,2), (7,2) |
BLOSUM50 | (18,1), (17,1), (16,1), (15,1), (15,2), (14,2), (13,2), (12,2), (12,3), (11,3), (10,3), (9,3) |
BLOSUM45 | (19,1), (18,1), (17,1), (16,1), (15,2), (14,2), (13,2), (12,2), (13,3), (12,3), (11,3), (10,3) |
PAM30 | (10,1), (9,1), (8,1), (7,2), (6,2), (5,2) |
PAM70 | (11,1), (10,1), (9,1), (8,2), (7,2), (6,2) |
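The combinations listed above can be checked before a search is submitted. The Python sketch below assumes that each matrix's entries read as (Open Gap, Extended Gap) pairs; the dictionary and function names are hypothetical and exist only for this example.

```python
# Allowed (Open Gap, Extended Gap) pairs per matrix, transcribed from the table above.
ALLOWED_GAP_COSTS = {
    "BLOSUM80": {(11, 1), (10, 1), (9, 1), (8, 2), (7, 2), (6, 2)},
    "BLOSUM62": {(12, 1), (11, 1), (10, 1), (9, 2), (8, 2), (7, 2)},
    "BLOSUM50": {(18, 1), (17, 1), (16, 1), (15, 1), (15, 2), (14, 2),
                 (13, 2), (12, 2), (12, 3), (11, 3), (10, 3), (9, 3)},
    "BLOSUM45": {(19, 1), (18, 1), (17, 1), (16, 1), (15, 2), (14, 2),
                 (13, 2), (12, 2), (13, 3), (12, 3), (11, 3), (10, 3)},
    "PAM30": {(10, 1), (9, 1), (8, 1), (7, 2), (6, 2), (5, 2)},
    "PAM70": {(11, 1), (10, 1), (9, 1), (8, 2), (7, 2), (6, 2)},
}


def is_allowed_combination(matrix: str, open_gap: int, extended_gap: int) -> bool:
    """True if the matrix is listed and the gap cost pair appears under it."""
    return (open_gap, extended_gap) in ALLOWED_GAP_COSTS.get(matrix, set())


print(is_allowed_combination("BLOSUM62", 11, 1))   # True
print(is_allowed_combination("BLOSUM62", 13, 1))   # False (not listed)
print(is_allowed_combination("PAM250", 10, 1))     # False (matrix not listed)
```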
Drop Off
Dropoff value for gapped alignment (in bits). Default value is zero.
Other Advanced Options
Option | Value type | Default | Description |
-r | Integer | 1 | Reward for a nucleotide match |
-f | Integer | 0 | Threshold for extending hits, default if zero |
-Q | Integer | 1 | Query Genetic code to use |
-D | Integer | 1 | DB Genetic code (for tblast[nx] only) |
-J | T/F | F | Believe the query defline |
-W | Integer | 0 | Word size, default if zero |
-z | Integer | 0 | Effective length of the database (use zero for the real size) |
-K | Integer | 100 | Number of best hits from a region to keep |
-L | Integer | 20 | Length of region used to judge hits |
-Y | Real | 0 | Effective length of the search space (use zero for the real size) |
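The options above can be assembled into a single string, as in the Python sketch below. It assumes that the options are written as space-separated "flag value" pairs in the style of the blastall command line; the exact syntax accepted by this form is not stated here, so treat the output format as an assumption. The helper name is hypothetical, and options left at their default are omitted.

```python
# Defaults transcribed from the table above.
DEFAULTS = {
    "-r": 1, "-f": 0, "-Q": 1, "-D": 1, "-J": "F",
    "-W": 0, "-z": 0, "-K": 100, "-L": 20, "-Y": 0,
}


def format_advanced_options(overrides: dict) -> str:
    """Build a 'flag value' string containing only options that differ from their default."""
    parts = []
    for flag, value in overrides.items():
        if flag not in DEFAULTS:
            raise ValueError(f"unknown option: {flag}")
        if value != DEFAULTS[flag]:
            parts.append(f"{flag} {value}")
    return " ".join(parts)


# -K is left at its default of 100, so it is dropped from the result.
print(format_advanced_options({"-r": 2, "-K": 100, "-W": 7}))   # -r 2 -W 7
```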
NCBI GI's
Causes NCBI gi identifiers to be shown in the output, in addition to the accession and/or locus name.
HTML Output
Selects whether the output is shown in PLAIN or HTML format.
Graphical Overview
An overview of the database sequences aligned to the query sequence is shown. The score of each alignment is indicated by one of
five different colors, which divides the range of scores into five groups. Multiple alignments on the same database sequence are
connected by a striped line. Mousing over a hit sequence causes the definition and score to be shown in the window at the top;
clicking on a hit sequence takes the user to the associated alignments.
Descriptions
Restricts the number of short descriptions of matching sequences reported to the number specified; default limit is 100
descriptions. See also EXPECT.
Alignments
Restricts database sequences to the number specified for which high-scoring segment pairs (HSPs) are reported; the default limit
is 100. If more database sequences than this happen to satisfy the statistical significance threshold for reporting
(see EXPECT), only the matches ascribed the greatest statistical significance are reported.
Alignment View
- pairwise
Standard BLAST alignment in pairs of query sequence and database match.
- Query-anchored with identities
The database alignments are anchored to (shown in relation to) the query sequence.
Identities are displayed as dashes, with mismatches displayed as single letter nucleotide abbreviations.
- Query-anchored without identities
Identities are shown as single letter nucleotide abbreviations.
- Flat Query-anchored with identities
The 'flat' display shows inserts as deletions on the query.
Identities are displayed as dashes, with mismatches displayed as single letter nucleotide abbreviations.
- Flat Query-anchored without identities
The 'flat' display shows inserts as deletions on the query. Identities are shown as single letter nucleotide abbreviations.
Level Of Details in Alignment Output
Choose how many details are to be shown in the output.
Genetic Codes
The parameters Q and D can be set to a positive integer to select the genetic code used for translation: Q applies to the
query sequence (blastx and tblastx) and D to the database sequences (tblastn and tblastx). In each case, the default genetic
code is the so-called "Standard" or "Universal" genetic code.
Note: the numerical identifiers used here for genetic codes parallel those defined in the NCBI software Toolbox; hence some
numerical values will be skipped as genetic codes are updated.
The list of genetic codes available and their associated values for the parameters Q and D are:
Value | Description |
1 | Standard or Universal |
2 | Vertebrate Mitochondrial |
3 | Yeast Mitochondrial |
4 | Mold, Protozoan, Coelenterate Mitochondrial and Mycoplasma/Spiroplasma |
5 | Invertebrate Mitochondrial |
6 | Ciliate Macronuclear |
9 | Echinodermate Mitochondrial |
10 | Alternative Ciliate Macronuclear |
11 | Eubacterial |
12 | Alternative Yeast |
13 | Ascidian Mitochondrial |
14 | Flatworm Mitochondrial |
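Because some numerical values are skipped (7 and 8 in the list above), it can help to validate a genetic code value before using it. The Python sketch below holds the list as a dictionary; the names are hypothetical and the descriptions are copied from the list above.

```python
# Genetic code values and descriptions, transcribed from the list above.
GENETIC_CODES = {
    1: "Standard or Universal",
    2: "Vertebrate Mitochondrial",
    3: "Yeast Mitochondrial",
    4: "Mold, Protozoan, Coelenterate Mitochondrial and Mycoplasma/Spiroplasma",
    5: "Invertebrate Mitochondrial",
    6: "Ciliate Macronuclear",
    9: "Echinodermate Mitochondrial",
    10: "Alternative Ciliate Macronuclear",
    11: "Eubacterial",
    12: "Alternative Yeast",
    13: "Ascidian Mitochondrial",
    14: "Flatworm Mitochondrial",
}


def genetic_code_name(value: int = 1) -> str:
    """Return the description for a genetic code value; reject skipped or unknown values."""
    if value not in GENETIC_CODES:
        raise ValueError(f"{value} is not an available genetic code")
    return GENETIC_CODES[value]


print(genetic_code_name())      # Standard or Universal (the default)
print(genetic_code_name(11))    # Eubacterial
print(7 in GENETIC_CODES)       # False (skipped value)
```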