This file lists the files available on the CATH FTP site.
In the text, $VERSION indicates the version of CATH, e.g. v2.4.
Primary CATH Data Files
-----------------------
CathDomainDescriptionFile.$VERSION [Format: CDDF]
CathDomainList.$VERSION [Format: CLF]
CathChainList.$VERSION [Format: CLF]
CathNames.$VERSION [Format: CNF]
CathDomainSeqs.$VERSION.fa [Format: FASTA]
CathSegmentSeqs.$VERSION.fa [Format: FASTA]
Derived Files
-------------
CathDomainList.$VERSION.Treps [Format: CLF]
CathDomainList.$VERSION.Hreps [Format: CLF]
CathDomainList.$VERSION.S35reps [Format: CLF]
CathDomainList.$VERSION.S95reps [Format: CLF]
CathDomainList.$VERSION.S100reps [Format: CLF]
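As a minimal sketch (Python chosen only for illustration; this script is not part of the CATH distribution), the file names listed above can be expanded for a given release by substituting a version string for $VERSION. The names PRIMARY_FILES, DERIVED_FILES, expand and file_names are hypothetical.

```python
# File-name templates copied from the two lists above.
PRIMARY_FILES = [
    "CathDomainDescriptionFile.$VERSION",
    "CathDomainList.$VERSION",
    "CathChainList.$VERSION",
    "CathNames.$VERSION",
    "CathDomainSeqs.$VERSION.fa",
    "CathSegmentSeqs.$VERSION.fa",
]
DERIVED_FILES = [
    "CathDomainList.$VERSION.Treps",
    "CathDomainList.$VERSION.Hreps",
    "CathDomainList.$VERSION.S35reps",
    "CathDomainList.$VERSION.S95reps",
    "CathDomainList.$VERSION.S100reps",
]


def expand(template: str, version: str) -> str:
    """Substitute a CATH version string (e.g. 'v2.4') for $VERSION."""
    return template.replace("$VERSION", version)


def file_names(version: str) -> list:
    """Return every primary and derived file name for one release."""
    return [expand(t, version) for t in PRIMARY_FILES + DERIVED_FILES]


if __name__ == "__main__":
    for name in file_names("v2.4"):
        print(name)
```

For example, file_names("v2.4") begins with "CathDomainDescriptionFile.v2.4" and includes "CathDomainSeqs.v2.4.fa".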
Primary CATH Data File Descriptions
-----------------------------------
FILE: CathDomainDescriptionFile.$VERSION
Contains a description of each protein domain in CATH (classes 1-4)
relating to a specific release of CATH.
This includes the overall PDB name and source information (from the PDB file),
the CATH superfamily number, and the CATH names of the class, architecture and topology
(the CATH homologous superfamily name is included only if it has been defined).
Also included is domain and segment information such as length, amino acid sequence
and domain boundary PDB residue identifiers.
[Uses Cath Domain Description File (CDDF) format]
FILE: CathDomainList.$VERSION
Contains all classified protein domains in CATH version $VERSION
for class 1 (mainly alpha), class 2 (mainly beta),
class 3 (alpha and beta) and class 4 (few secondary structures).
[Uses Cath List File (CLF) format]
FILE: CathChainList.$VERSION
Contains all the PDB chains that have been split into domains in CATH version $VERSION.
The chains have been clustered into sequence families and are assigned to class 5
to signify multi-domain entries in CATH.
[Uses Cath List File (CLF) format]
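The class numbering used across these files can be summarised as a small lookup table. This is a sketch built only from the descriptions in this file; CATH_CLASSES, describe_class and is_domain_class are hypothetical names.

```python
# Class numbers and descriptions as given in this file.
CATH_CLASSES = {
    1: "mainly alpha",
    2: "mainly beta",
    3: "alpha and beta",
    4: "few secondary structures",
    5: "multi-domain chains (CathChainList only)",
}


def describe_class(cath_class: int) -> str:
    """Return the description for a class number, or 'unknown class'."""
    return CATH_CLASSES.get(cath_class, "unknown class")


def is_domain_class(cath_class: int) -> bool:
    """Classes 1-4 hold individual domains; class 5 marks whole chains."""
    return 1 <= cath_class <= 4
```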
FILE: CathNames.$VERSION
Description of each node in the CATH hierarchy for class, architecture, topology
and homologous superfamily levels.
[Uses Cath Names File (CNF) format]
FILE: CathDomainSeqs.$VERSION.fa
Protein sequences for all domain entries in CATH version $VERSION (classes 1-4).
[Uses FASTA format]
FILE: CathSegmentSeqs.$VERSION.fa
Protein sequences for all segment entries in CATH version $VERSION (classes 1-4).
Segments are continuous regions of sequence within a protein domain.
[Uses FASTA format]
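Both sequence files are plain FASTA, so a generic reader is enough to load them. The sketch below is an assumption-light example: it keeps each header line (the text after '>') verbatim and does not assume anything about how CATH writes domain or segment identifiers in those headers. read_fasta is a hypothetical helper; libraries such as Biopython provide equivalent parsers.

```python
from typing import Dict, Iterable, List, Optional


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {header: sequence}.

    The header is everything after '>' on the header line. Blank lines and
    any sequence text before the first header are ignored. If a header
    repeats, the later record overwrites the earlier one.
    """
    records: Dict[str, str] = {}
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records
```

Usage, assuming a downloaded v2.4 file: `with open("CathDomainSeqs.v2.4.fa") as fh: seqs = read_fasta(fh)`. Sequence lines wrapped across several lines are joined into one string.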
Derived File Descriptions
-------------------------
The following files contain representative protein domains derived from the
CathDomainList.$VERSION file (classes 1-4).
FILE: CathDomainList.$VERSION.Treps
A representative protein domain from each topology.
[Uses Cath List File (CLF) format]
FILE: CathDomainList.$VERSION.Hreps
A representative protein domain from each homologous superfamily.
[Uses Cath List File (CLF) format]
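CATH identifies each node by a dot-separated code whose fields are class, architecture, topology and homologous superfamily (for example 3.40.50.300), so a topology is the first three fields and a superfamily the first four. The sketch below groups domains at either level and keeps one per node. It assumes the domains have already been read into hypothetical (domain_id, cath_code) pairs, since the CLF column layout is not described here, and it keeps the first domain seen, which is an illustrative choice and not CATH's own rule for choosing representatives.

```python
from typing import Dict, Iterable, Tuple


def node_at_level(cath_code: str, depth: int) -> str:
    """Truncate a dotted CATH code (C.A.T.H) to its first `depth` fields."""
    return ".".join(cath_code.split(".")[:depth])


def pick_representatives(
    domains: Iterable[Tuple[str, str]], depth: int
) -> Dict[str, str]:
    """Return {node: domain_id}, keeping the first domain seen per node.

    depth=3 groups by topology (cf. the .Treps file);
    depth=4 groups by homologous superfamily (cf. the .Hreps file).
    """
    reps: Dict[str, str] = {}
    for domain_id, cath_code in domains:
        reps.setdefault(node_at_level(cath_code, depth), domain_id)
    return reps
```

With the hypothetical input `[("dom1", "3.40.50.300"), ("dom2", "3.40.50.720"), ("dom3", "1.10.8.10")]`, depth 3 yields one representative for topology 3.40.50 and one for 1.10.8, while depth 4 keeps all three domains because each is in a different superfamily.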
FILE: CathDomainList.$VERSION.S35reps
A representative protein domain from each S35 sequence family
(previously known as S-level = sequence level).
Sequences are clustered at 35% sequence identity.
[Uses Cath List File (CLF) format]
FILE: CathDomainList.$VERSION.S95reps
A representative protein domain from each S95 sequence family
(previously known as N-level = nearly-identical level).
Sequences are clustered at 95% sequence identity.
[Uses Cath List File (CLF) format]
FILE: CathDomainList.$VERSION.S100reps
A representative protein domain from each S100 sequence family
(previously known as I-level = identical level).
Sequences are clustered at 100% sequence identity.
[Uses Cath List File (CLF) format]
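To make the 35%, 95% and 100% thresholds concrete, here is a toy greedy clustering sketch. It is not the procedure CATH uses to build the S35, S95 or S100 families (that method is not described in this file); it assumes the sequences are already aligned to equal length with '-' for gaps, and it measures identity only over columns where neither sequence has a gap, which is one of several common definitions. percent_identity and greedy_cluster are hypothetical names.

```python
from typing import List


def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has '-'.

    Assumes a and b are pre-aligned to the same length.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    same = sum(1 for x, y in pairs if x == y)
    return 100.0 * same / len(pairs)


def greedy_cluster(seqs: List[str], threshold: float) -> List[List[int]]:
    """Group sequence indices into clusters at >= threshold % identity.

    Each sequence joins the first cluster whose representative (the
    cluster's first member) it matches at or above the threshold;
    otherwise it starts a new cluster and becomes its representative.
    """
    reps: List[int] = []
    clusters: List[List[int]] = []
    for i, s in enumerate(seqs):
        for c, r in enumerate(reps):
            if percent_identity(s, seqs[r]) >= threshold:
                clusters[c].append(i)
                break
        else:
            reps.append(i)
            clusters.append([i])
    return clusters
```

Raising the threshold splits families apart: with the toy sequences "ACDE", "ACDF" and "WWWW", a 35% threshold puts the first two together (75% identical), whereas a 100% threshold leaves all three in separate clusters.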