Combination of direct information methods and alignments improves contact prediction.

Data Sets

PconsC was benchmarked on a test set of 150 small proteins that were used in the development of PSICOV. This choice allows direct comparison between PconsC and other published contact prediction methods.

The development of PconsC was aided by a training set of 44 proteins, derived from recently published studies that use contact prediction for protein 3D structure prediction and augmented with several proteins of known structure and a sufficient number of related sequences. These 44 proteins (19 globular and 25 membrane proteins) are not homologous to each other or to any member of the test set. Some of the proteins in the test set may be homologous to one another, but the set was kept intact for the sake of comparability.

The average length of a prediction target in the training set is greater than in the test set