AliComp - GeneBee Alignment Comparison Help

General options

Alignment title

Type in a memorable title for this session. The title is used in the pictures.

1st alignment

Short name of the first alignment.
Type in the name of the alignment algorithm (Genebee, ClustalW, Manual...). This name is placed in the upper left corner of the pictures.

2nd alignment

Short name of the second alignment.
Type in the name of the alignment algorithm (Genebee, ClustalW, Manual...). This name is placed in the upper left corner of the pictures.

Comparison picture options

Picture size

All images have the same size, except that with "AUTO" height the images may differ when the two alignments contain different numbers of sequences. The width may be set from 320 to 3000 pixels and the height from 200 to 10000 pixels. The defaults are 765 and "AUTO". If the height is set to "AUTO", the program selects the value itself, depending on the number of sequences in the alignment. The two alignments may contain different numbers of sequences, in a different order. The same sequence need not have the same name in both alignments, but it must have the same letter string (case-insensitive).
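
The matching rule above (same letters, any name, any case) can be illustrated with a minimal Python sketch. The function names and the dictionary layout (name mapped to gapped row) are hypothetical and chosen only for this illustration; this is not the program's own code.

```python
def ungapped(row: str) -> str:
    """Return the letters of an aligned row, upper-cased, without '-' gaps."""
    return row.replace("-", "").upper()


def match_sequences(first: dict[str, str], second: dict[str, str]) -> list[tuple[str, str]]:
    """Pair names from two alignments whose ungapped letters are identical."""
    # Index the second alignment by its ungapped, case-folded letters.
    by_letters: dict[str, list[str]] = {}
    for name, row in second.items():
        by_letters.setdefault(ungapped(row), []).append(name)

    pairs = []
    for name, row in first.items():
        candidates = by_letters.get(ungapped(row), [])
        if candidates:
            # Each sequence of the second alignment is used at most once.
            pairs.append((name, candidates.pop(0)))
    return pairs


first = {"seqA": "AC-GT", "seqB": "ttg-a"}
second = {"other": "TTGA-", "seqA": "A-CGT"}
pairs = match_sequences(first, second)
```

Here "seqB" is paired with "other" because both rows contain the letters TTGA once gaps and case are ignored.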

Offset color scale

1 - max - the offset color range runs from 1 to the maximum value;
min - max - the offset color range runs from the minimum to the maximum value.
The second option takes effect only when both the smallest left offset and the smallest right offset are greater than 1.
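
As a rough illustration of the two options, the sketch below picks the lower and upper bounds of the color range. It assumes that the left and right offsets are available as non-empty lists of positive integers; the function name, the argument layout, and the fallback to 1 when the condition is not met are assumptions made for this example only.

```python
def offset_color_range(left: list[int], right: list[int], mode: str) -> tuple[int, int]:
    """Return (low, high) bounds of the offset color scale."""
    high = max(left + right)
    # "min-max" is honored only when both smallest offsets exceed 1;
    # otherwise this sketch falls back to the "1-max" range.
    if mode == "min-max" and min(left) > 1 and min(right) > 1:
        return (min(left + right), high)
    return (1, high)
```

For example, with left offsets 3 and 5 and right offsets 2 and 9, the "min-max" mode would give the range 2 to 9, while "1-max" would give 1 to 9.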

Picture types

The following comparison pictures are available:
Offset comparison - two basic comparison pictures. The first one contains
                    the offsets of the first alignment relative to the second
                    alignment. The second picture contains the offsets of the
                    second alignment relative to the first alignment.

Column values comparison - two pictures (the 1st alignment versus the 2nd
                           one, and the 2nd versus the 1st) comparing column
                           values ordered from best to worst. The X axis shows
                           the percentage of alignment columns, so these pictures
                           are also useful for comparing alignments of very
                           different lengths.

Col. values comparison 2 - two pictures (the 1st alignment versus the 2nd
                           one, and the 2nd versus the 1st) comparing column
                           values ordered from best to worst. The X axis shows
                           the number of columns of the 1st or 2nd alignment,
                           respectively.
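
The difference between the two column value pictures lies in the X axis. The sketch below shows one way to order values from best to worst and to build either axis. It assumes that a larger column value is "better"; all function names are hypothetical.

```python
def best_to_worst(values: list[float]) -> list[float]:
    """Order column values from best to worst (assumed: larger is better)."""
    return sorted(values, reverse=True)


def percent_axis(n_columns: int) -> list[float]:
    """X positions as a percentage of columns, independent of alignment length."""
    return [100.0 * (i + 1) / n_columns for i in range(n_columns)]


def count_axis(n_columns: int) -> list[int]:
    """X positions as a plain column count."""
    return list(range(1, n_columns + 1))
```

With the percentage axis, an alignment of 40 columns and one of 400 columns both span 0 to 100, which is why that picture suits alignments of very different lengths.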

Compare mismatched sequences

Sequences with the same name that have at least one different letter at the same position are mismatched sequences. By default (mode "No") such sequences are not compared. Select "Yes" to compare them as though they matched. In either case, the first mismatched positions are marked in the picture (see the picture legends).
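
A minimal sketch of the mismatch test follows. It assumes that positions are counted in the ungapped sequence, starting from 1, and that a difference in length also counts as a mismatch; the function name is hypothetical.

```python
from typing import Optional


def first_mismatch(row_a: str, row_b: str) -> Optional[int]:
    """Return the 1-based position of the first differing letter, or None."""
    a = row_a.replace("-", "").upper()
    b = row_b.replace("-", "").upper()
    for pos, (x, y) in enumerate(zip(a, b), start=1):
        if x != y:
            return pos
    # Assumption: one sequence being a prefix of the other is also a mismatch.
    if len(a) != len(b):
        return min(len(a), len(b)) + 1
    return None
```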

Picture options for separate alignment images

Picture size

All images have the same size. The width may be set from 450 to 3000 pixels and the height from 200 to 10000 pixels. The defaults are 765 and "AUTO". When the height is set to "AUTO", the program selects the value itself (about 16 pixels per displayed sequence).
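
The "AUTO" height can be approximated as shown below. The figure of 16 pixels per sequence comes from the text above; clamping the result to the allowed 200 to 10000 range is an assumption of this sketch, and the real program may add margins that are not modeled here.

```python
def auto_height(n_sequences: int, pixels_per_sequence: int = 16) -> int:
    """Estimate the "AUTO" picture height, clamped to the allowed range."""
    return max(200, min(10000, n_sequences * pixels_per_sequence))
```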

Coloring mode

Even - all column space is colored;
Sparse - every column is colored according to the column weight:
         the lower the weight, the more space is left uncolored;
Very sparse - even more space is left uncolored, so important columns
              (those with greater weight) stand out better.

Splitting block size

Long alignments can be split into several parts. You can select the size of one part in the range 50 to 500 letters.
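
Splitting can be pictured as cutting every row at the same column positions. The following sketch assumes the alignment is held as a dictionary of name to gapped row; the function name is hypothetical.

```python
def split_alignment(rows: dict[str, str], block_size: int) -> list[dict[str, str]]:
    """Cut every row of an alignment into consecutive parts of block_size letters."""
    if not 50 <= block_size <= 500:
        raise ValueError("block size must be between 50 and 500 letters")
    length = max((len(row) for row in rows.values()), default=0)
    return [
        {name: row[start:start + block_size] for name, row in rows.items()}
        for start in range(0, length, block_size)
    ]
```

An alignment of 120 columns with a block size of 50 would yield three parts of 50, 50 and 20 columns.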

Weight matrices

Select a weight matrix to compute the column weights.
The defaults are Blosum62 for protein alignments and DNA/RNA for nucleotide alignments.

Picture types

An alignment can be presented in one or several graphical forms:
All groups - all letter groups are colored;
Max group at column - only the letters of the maximum (dominant) group
                      in each column are colored;
Column values - a few statistical functions (column values):
  - the number of letters in the max group of the column,
  - the number of identical letters in the column,
  - the number of groups in the column,
  - the weight of the column;
Column values (best-to-worst) - the same functions in a different order,
                                ranked from best to worst along the X axis.

You can also get a picture in which only selected groups are colored.
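
Three of the column values listed above can be sketched for a protein column as follows, using the protein letter groups given under "Color group type". The sketch assumes that gaps are ignored and that "the number of same letters" means the count of the most frequent letter. The column weight is left out because it depends on the selected weight matrix. All names are hypothetical.

```python
from collections import Counter

# Protein letter groups 1-9 from the "Color group type" section; 10 is "Others".
PROTEIN_GROUPS = ["AG", "C", "DENQBZ", "ILMV", "FWY", "H", "KR", "P", "ST"]


def group_of(letter: str) -> int:
    """Return the 1-based protein group number; 10 means 'Others'."""
    for number, members in enumerate(PROTEIN_GROUPS, start=1):
        if letter.upper() in members:
            return number
    return 10


def column_values(column: str) -> dict[str, int]:
    """Compute group and letter statistics for one alignment column."""
    letters = [c.upper() for c in column if c != "-"]  # assumption: gaps ignored
    if not letters:
        return {"max_group_size": 0, "same_letters": 0, "groups": 0}
    group_counts = Counter(group_of(c) for c in letters)
    letter_counts = Counter(letters)
    return {
        "max_group_size": max(group_counts.values()),
        "same_letters": max(letter_counts.values()),
        "groups": len(group_counts),
    }
```

For the column "IILVM-A", five letters fall into group 4 and one into group 1, so the max group has 5 letters, the most frequent letter (I) occurs twice, and there are 2 groups.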

Color group type

AUTO - the program determines the molecular type of the source
       alignment and selects the corresponding group type;
Protein - protein groups will be used (see below);
Nucleotide - nucleotide groups will be used (see below).
You can select the color group type, protein or nucleotide, yourself, or leave it to the program (the type "AUTO") to decide which one should be used.

The protein letter groups are:
      1. A G
      2. C
      3. D E N Q B Z
      4. I L M V
      5. F W Y
      6. H
      7. K R
      8. P
      9. S T
     10. Others

The nucleotide letter groups are:
      1. A
      2. C
      3. G
      4. T U
      5. Others

Add picture with selected groups

Select one or several groups to be colored in an additional picture. You can also get a picture showing only the gaps, without any groups colored (option "Gaps only").
The set of groups depends on the selected group type.

Alignment

The query alignment should be written in one-letter code (lower or upper case) and can be divided into several parts. Consecutive parts must be separated by a blank line, and all sequence names must be repeated at the beginning of every part.

Example of two alignments to be compared:

First alignment:

                     .**.. ....+. .....+.. +.......++.. . +.... +....   ....*. ..
HCVPCP2     (     1) EDGVNVHDVTVTTDKSFEQQV-GVIADKDKDLSGAVPSDLNTSELLTK-----AIDV-DW
TGVPCP2     (     1) EDNVNHERVSVSFDKTYGEQLKGTVVIKDKDVTNQLPSAFDVGQKVIK-----AIDI-DW
MHVPCP2     (     1) VDGVNFRSCCVAEGEVFGKTL-GSVFCDGINVTKVRCSAIYKGKVFFQysdLSEADLvAV
MHVPCP1     (     1) REGIAEAKATVCAD--AVDACPDQVEA--FEIEKVEDSILDELQTELN----APADK-TY
HCVPCP1     (     1) EEGGNDLSLPVMISEWPLSVQQAQQEATLPDIAEDVVDQVEEVNSIFD---IETVDV---
TGVPCP1     (     1) SEG--------------AEGTSSQEEVETVEVADITSTD-EDVD-IVE---VSAKDD-PW
IBVPCP      (     1) EDGVKYRSIVLKPGDSLGQ--FGQVYAKNK-IV-FTADDVEDKEILYV----PTTDK-SI


                     ....+..+...+..... ....++...  .+....+..++***+++.++.+* ....+. 
HCVPCP2     (    54) VEFYGFKDAVTFATVDHSAF-AYESAV-VNGIRVLKTSDNNCWVNAVCIALQYSKPHFIS
TGVPCP2     (    55) QAHYGFRDAAAFSASSHDAY-KFEVVT-HSNFIVHKQTDNNCWINAICLALQRLKPQWKF
MHVPCP2     (    60) KDAFGFDEPQLLKYYTMLGMCKWPVVV-CGNYFAFKQSNNNCYINVACLMLQHLSLKFPK
MHVPCP1     (    52) EDVLAFDAVCSEALSAFYAVPSDETHFkVCGFYSPAIERTNCWLRSTLIVMQSLPLEFKD
HCVPCP1     (    55) -----KHDVSPF-------EMPFEELN---GLKILKQLDNNCWVNSVMLQIQLTGI----
TGVPCP1     (    41) AAAVDVQEAEQF----NPSLPPFKTTN-LNGKIILKQGDNNCWINACCYQLQ----AFDF
IBVPCP      (    52) LEYYGL-DAQKYVIYLQTLAQKWNVQY-RDNFLILEWRDGNCWISSAIVLLQAAKIRFKG


                      ....+*..*. *... **..++.......+. .*.. .+ .+. ....*.. . . . .
HCVPCP2     (   112) QGLDAAWNKFVLGDVEIFVAFVYYVARLMKGDKGDAEDTLTKLSKYLANEAQV-QLEHYS
TGVPCP2     (   113) PGVRGLWNEFLERKTQGFVHMLYHISGVKKGEPGDAELMLHKLGDLMDNDCEI-IVTHTT
MHVPCP2     (   119) WQWQEAWNEFRSGKPLRFVSLVLAKGSFKFNEPSDSIDFMR--VVLREADLSGATCNLEF
MHVPCP1     (   112) LEMQKLWLSYKAGYDQCFVDKL--VKSVPKSIILPQGGYVADFAYFFLSQCSF-KAYANW
HCVPCP1     (    96) LDGDYAMQFFKMGRVAKMIERCYTAEQCIRGAMGDVGLCMYRL----LKDLHTGFMVMDY
TGVPCP1     (    92) FN-NEAWEKFKKGDVMDFVNLCYAATTLARGHSGDAE-YLLEL---MLNDYSTAKIVLAA
IBVPCP      (   110) F-LTEAWAKLLGGDPTDFVAWCYASCTAKVGDFSDANWLLANLAEHFDADYTNAFLKKRV


                     .* .*... ..  .++.  ..+..... ........* .+..  ......+......+..
HCVPCP2     (   171) SCVECDAKFK--NSVASINSAIVCASVKRDGVQVGYCVHGIK--YYSRVRSVRGRAIIVS
TGVPCP2     (   172) ACDKC-------AKVEKFVGPVVAAPLAIHGTD-ETCVHGVS-VNV-KVTQIKGTVAITS
MHVPCP2     (   177) VC-KCGVKQEQRKGVDA---VMHFGTLDKGDLVRGYNIACTCgSKLVHCTQFNVPFLICS
MHVPCP1     (   169) RCLECDMELK-LQGLDA---MFFYGDVVSHMCK---CGNSMT------LLSADIPYTLHF
HCVPCP1     (   152) KC-SC-----TSGRLEE-SGAVLFCTPTKKAFPYGTCLNCNA-PRMCTIRQLQGTIIFVQ
TGVPCP1     (   147) KC-GCGEK---EIVLER---AVFKLTPLKESFNYGVCGDCMQ-VNTCRFLSVEGSGVFVH
IBVPCP      (   169) SC-NCGIKSYELRGLEACIQPVRATNLLHFKTQYSNCPTCGA-NNTDEVIEASLPYLLLF


                      .   .   .  ....+....+.*....**+.. . .... ..*+ ..... . . ....
HCVPCP2     (   227) VEQ-LEPCAQSRLLSGVAYTAFSGPVDKGHYTVYDTAKKS-MYDG---DRFV--KHD-LS
TGVPCP2     (   222) LIG----PIIGEVLEATGYICYSGSNRNGHYTYYDNRNGL-VVDAEKAYHFNRDLLQVTT
MHVPCP2     (   233) NTP--EGRKLPDDV--VAANIFTGG-SVGHYTHVKCKPKYqLYDACNVNKVSEAKGNFTD
MHVPCP1     (   216) GVR-DDKFCAFYTPRKVFRAACAVDVNDCHSMAVVEGKQI---DGKVVTKFIGDKFDFMV
HCVPCP1     (   204) QKP-EPVNPVSFVVKPVCSSIFRGAVSCGHYQTNIYSQNL-CVDGFGVNKIQPWTNDALN
TGVPCP1     (   199) DILsKQTPEAMFVVKPVMHAVYTGTTQNGHYMVDDIEHGY-CVDGMGIKPLKK--RCYTS
IBVPCP      (   227) ATD--GPATVDCDEDAVGTVVFVGSTNSGHCYTQAAGQAF---DNLAKDRKFGKKSPYIT


                     .+....... ......  .. ..+... ++ ...
HCVPCP2     (   279) LLSVTSVVM------VGGYVAPVNTVKPKPVINQ
TGVPCP2     (   277) AIASNFVVKK-PQAEERPKNCAFNKVAASPKIVQ
MHVPCP2     (   288) CLYLKNLKQTFSSVLTTFYLDDVKCVEYKPDLSQ
MHVPCP1     (   272) GYGMTFSMSPFELAQLYGSCITPNVCFVK-----
HCVPCP1     (   262) TICIKDADY---NAKVEISVTPIKN---------
TGVPCP1     (   256) TLFINANVM--TRAEKPKQEFKVEKVEQQPIVEE
IBVPCP      (   282) AMYTRFAFK----NETSLPVAKQSKGKSKS-VKE

Second alignment:

HCVPCP2         EDGVNVHDVTVTTDKSF-EQQVGVIADKDKDLSGAVPSDLNTSELLTKAIDVDWVEFYGF
TGVPCP2         EDNVNHERVSVSFDKTYGEQLKGTVVIKDKDVTNQLPSAFDVGQKVIKAIDIDWQAHYGF
MHVPCP2         VDGVNFRSCCVAEGEVF-GKTLGSVFCDGINVTKVRCSAIYKGKVFFQYSDLSEADLVAV
TGVPCP1         --------SEGAEGTSS-QEEVETVEVADITST-----DEDVDIVEVSAKDDPWAAAVDV
HCVPCP1         -----------EEGGND--LSLPVMISEWPLSVQQAQQEATLPDIAEDVVDQVEEVNSIF
IBVPCP          EDGVKYRSIVLKPGDSL--GQFGQVYAKNKIVFTAD-DVEDKEILYVPTTDKSILEYYGL
MHVPCP1         REGIAEAKATVCADAVD--ACPDQVEAFEIEKVEDSILDELQTELNAPA-DKTYEDVLAF

HCVPCP2         KDAVTFATVDHSAF-------AYESAVVNGIRVLKTSDNNCWVNAVCIALQYSKPHFISQ
TGVPCP2         RDAAAFSASSHDAY-------KFEVVTHSNFIVHKQTDNNCWINAICLALQRLKPQWKFP
MHVPCP2         KDAFGFDEPQLLKYYTMLGMCKWPVVVCGNYFAFKQSNNNCYINVACLMLQHLSLKFPKW
TGVPCP1         QEAEQF-NPSLPPF---------KTTNLNGKIILKQGDNNCWINACCYQLQAFD--FFN-
HCVPCP1         DIETVDVKHDVSPF-------EMPFEELNGLKILKQLDNNCWVNSVMLQIQLTG--ILDG
IBVPCP          DAQKYVIYLQTLAQ-------KWNVQYRDNFLILEWRDGNCWISSAIVLLQAAKIRFKG-
MHVPCP1         DAVCSEALSAFYAVPS-----DETHFKVCGFYSPAIERTNCWLRSTLIVMQSLPLEFKDL

HCVPCP2         GLDAAWNKFVLGDVEIFVAFVYYVARLMKGDKGDAEDTLTKLSKYLAN---EAQVQLEHY
TGVPCP2         GVRGLWNEFLERKTQGFVHMLYHISGVKKGEPGDAELMLHKLGDLMDN---DCEIIVTHT
MHVPCP2         QWQEAWNEFRSGKPLRFVSLVLAKGSFKFNEPSDSIDFMRVVLREADLS--GATCNLEFV
TGVPCP1         --NEAWEKFKKGDVMDFVNLCYAATTLARGHSGDAEYLLELMLNDYST----AKIVLAAK
HCVPCP1         --DYAMQFFKMGRVAKMIERCYTAEQCIRGAMGDVGLCMYRLLKDLHT----GFMVMDYK
IBVPCP          FLTEAWAKLLGGDPTDFVAWCYASCTAKVGDFSDANWLLANLAEHFDADYTNAFLKKRVS
MHVPCP1         EMQKLWLSYKAGYDQCFVDKLVKSVPKSIILP-QGGYVADFAYFFLSQ---CSFKAYANW

HCVPCP2         SSCVECDAKFKNSVASINSAIVCASVKRDGVQVGYCVHGIKYYSRVRSVRGRAIIVSVEQ
TGVPCP2         TACDKCAKVEKFVGPVVAAPLAIHGTD-ET-----CVHGVSVNVKVTQIKG---TVAITS
MHVPCP2         CKCGVKQEQRKGVDAVMHFGTLDKGDLVRGYN-IACTCGSKLVHCTQFNVP----FLICS
TGVPCP1         CGCGEKEIVLERAVFKLTPLKESFNYGVCG----DCMQVNTCRFLSVEGSG-VFVHDILS
HCVPCP1         CSCTSGRLEESGAVLFCTPTKKAFPYGTCLN----CNAPRMCTIRQLQGTI--IFVQQKP
IBVPCP          CNCGIKSYELRGLEACIQPVRATNLLHFKTQYS-NCPTCGANNTDEVIEASLPYLLLFAT
MHVPCP1         R-CLECDMELKLQGLDAMFFYGDVVSHMCK-----CGNSMTLLSADIPYTL----HFGVR

HCVPCP2         LEPCAQSRLLSGVAYTAFSGPVDKGHYT---------VYDTAKKSMYDGDRFVKHDLSLL
TGVPCP2         LIGPIIGEVLEATGYICYSGSNRNGHYT---------YYDNRNGLVVDAEKAYHFNRDLL
MHVPCP2         NTPEGRKLPDDVVAANIFTGG-SVGHYTHVKCKPKYQLYDACNVNKVSEAKGNFTDCLYL
TGVPCP1         KQTPEAMFVVKPVMHAVYTGTTQNGHYM---------VDDIEHGYCVDGMGIKPLKKRCY
HCVPCP1         EPVNPVSFVVKPVCSSIFRGAVSCGHYQ---------TNIYSQNLCVDGFGVNKIQPWTN
IBVPCP          DGPATVDCDEDAVGTVVFVGSTNSGHCYT-------QAAGQAFDNLAKDRKFGKKSPYIT
MHVPCP1         DDKFCAFYTPRKVFRAACAVDVNDCHSMA-------VVEGKQIDGKVVTKFIGDKFDFMV

HCVPCP2         S----VTSVVMVGGYVAP-----------VNTVKPKPVINQ--
TGVPCP2         Q----VTTAIASNFVVKKPQAEERPKNCAFNKVAASPKIVQ--
MHVPCP2         KNLKQTFSSVLTTFYLDD-----------VKCVEYKPDLSQ--
TGVPCP1         TSTLFINANVMTRAEKPKQ--E-----FKVEKVEQQPIVEE--
HCVPCP1         D---ALNTICIKDADYNA-----------KVEISVTPIKN---
IBVPCP          AMYTRFAFKNETSLPVAK------------QSKGKSKSVKE--
MHVPCP1         G---YGMTFSMSPFELAQ-----------LYGSCITPNVCFVK
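
Both layouts shown above can be read with a small parser such as the sketch below. It assumes that a line beginning with whitespace is a marker line to be skipped, that the first field of a row is the sequence name, and that the last field is the aligned letters, so an optional position column such as "(     1)" is ignored. Label lines such as "First alignment:" must not be passed in. The function name is hypothetical.

```python
def parse_alignment(text: str) -> dict[str, str]:
    """Join the rows of a multi-part alignment into one gapped string per name."""
    rows: dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # blank line between parts
        if line[0].isspace():
            continue  # assumed: an indented line is a marker line, not a sequence
        fields = line.split()
        if len(fields) < 2:
            continue  # a name without letters is skipped in this sketch
        name, chunk = fields[0], fields[-1]
        rows[name] = rows.get(name, "") + chunk
    return rows
```

Because the rows of each part are appended to the string collected so far for the same name, the names must be repeated in every part, as required above.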

Most important alignment rules and typical errors:

1. Different sequences should have different names.
2. Blanks are not advisable inside a sequence name.
3. Several blanks are advisable between a sequence name and its sequence.
4. Empty lines are mandatory between the parts of an alignment.
5. The gap symbol must be '-'.
6. Leading and trailing gaps must be written explicitly.
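
Some of these rules can be checked mechanically before submitting an alignment. The sketch below works on one part of an alignment given as a list of (name, row) pairs. It interprets the rule about beginning and finishing gaps as "all rows of a part have equal length", which is an assumption; the function name and the message texts are hypothetical.

```python
def check_alignment(rows: list[tuple[str, str]]) -> list[str]:
    """Return a list of rule violations found in one part of an alignment."""
    problems = []
    names = [name for name, _ in rows]
    # Rule 1: different sequences need different names.
    for name in sorted(set(names)):
        if names.count(name) > 1:
            problems.append(f"duplicate name: {name}")
    # Rule 5: only letters and the '-' gap symbol are expected in a row.
    for name, row in rows:
        bad = sorted({c for c in row if not (c.isalpha() or c == "-")})
        if bad:
            problems.append(f"{name}: unexpected symbols {''.join(bad)}")
    # Rule 6 (assumed meaning): rows are padded with '-' to equal length.
    if len({len(row) for _, row in rows}) > 1:
        problems.append("rows differ in length; pad with leading or trailing '-'")
    return problems
```

An empty result means that none of the checked rules is violated; it does not guarantee that the program will accept the input.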


Last updated: August 26, 2001.