http://compbio.mcs.anl.gov/CHISEL
ftp://ftp.mcs.anl.gov/pub/compbio/chisel

RELEASE: rel-1B May 2006.

CHISEL LIBRARIES
================
The Chisel_clusters_from_PIR.tar.gz file contains the chisel libraries 
generated from the homologous enzymatic PIR superfamilies. 


The flow of the analysis is as follows:
1. A superfamily is taken as input and all the sequences for the
   family are retrieved.

2. Retrieve feature information
   function data - getEC
   taxonomy data - getTaxonomy
   xref to NCBI gi - getGIS
   organism data - getOrganism
   domain data - getInterpro
   FASTA data - getFasta
   For each set of sequences, the above subroutines are called to
   retrieve their feature information.

3. Check for the kind of analysis to be performed
   ecAnalysis - check whether it is a single-function or a
   multiple-function family
   orderOfAnalysis - check whether domain analysis needs to be done
   (domain analysis is performed for more than 75 sequences)

4. A hash of all attributes - A hash of all the feature information is
   stored for faster retrieval

5. If it is a multiple-function family with more than 75 sequences,
   the following subroutines are invoked:
   interproAnalysis - performs the domain analysis by grouping
   sequences having the same composition of domains
   clustalwAnalysisMultipleEC - separates the sequences within each
   cluster resulting from the above domain analysis. This subroutine
   is invoked once for every such cluster to perform the multiple EC
   analysis.

6. If it is a single-function analysis, the domain analysis is
   performed by invoking the interproAnalysis function, and the
   clustalwAnalysisSingleEC function is then called on each cluster
   in turn to group sequences based on taxonomy and by running the
   clustalw tool on the sequences.

7. Cluster annotations - Each cluster group is passed to the
   subroutine assignClusterID, which assigns an identifier to the
   group.

8. Profiling the results - The results of the analysis are profiled by
   running HMMER and BLOCKS on each individual cluster of
   sequences. This is done using the createHMM and createBlocks
   functions.

9. Reporting the results - The results are reported using the function
   writeReport, which generates a file of the resulting clusters.

10. Hash of the clusters - The createClusterHash function is invoked
    to create a hash of all the clusters generated from the
    analysis.
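
The checks in step 3 and the domain grouping in step 5 can be sketched
in a few lines. The Python below is illustrative only and is not the
CHISEL code itself: the names ec_analysis, order_of_analysis and
group_by_domain_composition are hypothetical stand-ins for ecAnalysis,
orderOfAnalysis and interproAnalysis, the input dictionaries are
made-up shapes, and "same composition of domains" is assumed here to
mean the same unordered set of domain identifiers.

```python
from collections import defaultdict

# Threshold taken from step 3: domain analysis for more than 75 sequences.
DOMAIN_ANALYSIS_MIN_SEQS = 75


def ec_analysis(ec_by_seq):
    """Return 'single' if the family has at most one distinct EC number,
    otherwise 'multiple'. Sequences without an EC number are ignored
    (an assumption of this sketch)."""
    distinct = {ec for ec in ec_by_seq.values() if ec}
    return "single" if len(distinct) <= 1 else "multiple"


def order_of_analysis(n_seqs):
    """Return True when the family is large enough for domain analysis."""
    return n_seqs > DOMAIN_ANALYSIS_MIN_SEQS


def group_by_domain_composition(domains_by_seq):
    """Group sequence IDs that share the same set of domain identifiers."""
    groups = defaultdict(list)
    for seq_id, domains in domains_by_seq.items():
        # frozenset ignores domain order and repeats (assumption, see above).
        groups[frozenset(domains)].append(seq_id)
    return dict(groups)


# Hypothetical example data: three sequences, two EC numbers.
example_ec = {"s1": "1.1.1.1", "s2": "1.1.1.1", "s3": "2.7.1.1"}
example_domains = {
    "s1": ["IPR000001", "IPR000002"],
    "s2": ["IPR000002", "IPR000001"],
    "s3": ["IPR000003"],
}

print(ec_analysis(example_ec))
print(order_of_analysis(len(example_ec)))
for composition, members in group_by_domain_composition(example_domains).items():
    print(sorted(composition), members)
```

In this sketch each resulting group would then be handed to the
per-cluster alignment step (clustalwAnalysisMultipleEC or
clustalwAnalysisSingleEC in the list above).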


Description of files
====================
Each directory contains several files and directories:

hmm - directory containing hmm models, multiple alignments, dendrograms
	of the chisel clusters generated.

pssm - directory containing pssm models, consensus sequences, blocks of the
	chisel clusters generated.

*.clusters - file contains information on each of the chisel clusters 
	generated and which sequences belong to it.

*.report - file contains data on each of the chisel clusters generated, 
	along with phenotypic data.

*.nonclusters - file contains data on groups of sequences that didn't have 
	sufficient sequences to form a chisel cluster.

*.outliers - file contains data on sequences that were 
	outliers (e.g. sequence length was significantly different)

*.attribhash, *.clushash, *.fastahash - hash files with all the data necessary
	to generate the clusters
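
As a convenience, the layout described above can be checked with a
short script. The following Python sketch is only an illustration: the
function name inventory and the directory path passed to it are
hypothetical, and it assumes nothing about the contents or format of
the files, only the suffixes and directory names listed above.

```python
from pathlib import Path

# Suffixes and subdirectories taken from the file descriptions above.
FILE_SUFFIXES = (
    ".clusters", ".report", ".nonclusters", ".outliers",
    ".attribhash", ".clushash", ".fastahash",
)
SUBDIRS = ("hmm", "pssm")


def inventory(family_dir):
    """Return (files found per suffix, list of missing subdirectories)."""
    root = Path(family_dir)
    found = {
        suffix: sorted(
            p.name for p in root.iterdir()
            if p.is_file() and p.suffix == suffix
        )
        for suffix in FILE_SUFFIXES
    }
    missing_dirs = [d for d in SUBDIRS if not (root / d).is_dir()]
    return found, missing_dirs
```

Calling inventory on one unpacked superfamily directory returns the
file names grouped by suffix, plus any of the hmm and pssm directories
that are absent.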