Introduction
The conventional view of protein structure, built up over thirty years of X-ray structure determination, treats the atoms of a protein as fixed in space relative to one another, in a complex shape containing regions of local structure, such as α-helices and β-sheets. This view is undoubtedly correct for many proteins; however, it is clear that large numbers of proteins do not exhibit a fixed structure [1]. Initially such proteins were identified because regions of the structure were not visible in crystallographic X-ray diffraction experiments, a consequence of either static or dynamic disorder [2,3]. The development of NMR methods for investigating protein structures in solution has made the experimental detection of structural disorder relatively simple, even for large proteins [4]. Factors including the complexity of the amino acid sequence and the content of charged and non-polar residues are major determinants of disorder [1-3,5]. The application of such considerations to entire genome sequences with bioinformatic tools has led to the suggestion that the majority of proteins in some eukaryotes, including humans, have disordered domains of up to fifty amino acids in length [2,3,6]. Intrinsically disordered, or "natively unfolded", proteins may be common because disorder leads to a high conformational entropy, which is suggested to be advantageous for a protein searching for its interaction partner. Indeed, many natively unfolded proteins fold into an ordered structure on binding to a partner [7,8], and yeast proteins with disordered regions of seventy or more contiguous residues have more protein-protein interaction partners than other proteins [9].
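The sequence factors mentioned above (sequence complexity and the content of charged and non-polar residues) can be turned into simple numerical features for a window of residues. The following is a minimal sketch only: the function name window_features, the use of Shannon entropy as the complexity measure, and the residue groupings CHARGED and NONPOLAR are assumptions made for illustration, not the features used in this project.

```python
import math
from collections import Counter

# Illustrative residue classes (an assumption; other groupings are in use).
CHARGED = set("DEKR")      # Asp, Glu, Lys, Arg
NONPOLAR = set("AVLIMFW")  # a common choice of hydrophobic residues


def window_features(window):
    """Return (entropy, charged fraction, non-polar fraction) for a window.

    Entropy is the Shannon entropy (in bits) of the residue composition,
    used here as a crude stand-in for sequence complexity.
    """
    n = len(window)
    if n == 0:
        raise ValueError("window must contain at least one residue")
    counts = Counter(window)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    charged = sum(counts[r] for r in CHARGED) / n
    nonpolar = sum(counts[r] for r in NONPOLAR) / n
    return entropy, charged, nonpolar
```

Sliding such a window along a sequence gives one feature vector per residue position, which could then be passed to any of the classifiers discussed below.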
The aim of this work is to investigate the use of modern machine learning methods, such as the Support Vector Machine (SVM), artificial neural network (ANN) and Relevance Vector Machine (RVM), for identifying regions within proteins that lack a fixed structure, based on amino acid sequence data. Kernel learning methods, such as the SVM, appear especially promising for work of this nature as they are able to operate directly on structured data, such as graphs, trees or, in this case, sequence data.
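To illustrate how a kernel method can operate directly on sequence data, the sketch below implements the k-mer spectrum kernel, a standard string kernel that compares two sequences by the short subsequences they share. It is offered as an example of the general idea only and is not necessarily the kernel used in this project; the function names and the default k=3 are assumptions.

```python
import math
from collections import Counter


def kmer_counts(seq, k=3):
    """Count every contiguous k-mer in an amino acid sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a, b, k=3):
    """Inner product of the k-mer count vectors of sequences a and b."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    # Only k-mers present in both sequences contribute to the sum.
    return sum(n * cb[kmer] for kmer, n in ca.items())


def normalised_kernel(a, b, k=3):
    """Cosine-normalised spectrum kernel, taking values in [0, 1]."""
    denom = math.sqrt(spectrum_kernel(a, a, k) * spectrum_kernel(b, b, k))
    return spectrum_kernel(a, b, k) / denom if denom else 0.0
```

A matrix of such kernel values between all pairs of training sequences (or sequence windows) is all a kernel classifier such as the SVM requires; no explicit fixed-length feature vector needs to be constructed.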
Online Disorder Prediction
Acknowledgements
This work was supported by a discipline-hopping grant made jointly by the U.K. Medical Research Council (MRC), Engineering and Physical Sciences Research Council (EPSRC) and Biotechnology and Biological Sciences Research Council (BBSRC), administered by the MRC (Grant number 67192 - "Predicting protein disorder with advanced machine learning tools").
References
N.B. DOI represents a link to online material via the Digital Object Identifier
(DOI) system, where available.
[1] Uversky, V. N., "Natively unfolded proteins: a point where biology waits
for physics", Protein Science, vol. 11, pp. 739-756, 2002.
(DOI)
[2] Dunker, A. K., Brown, C. J., Lawson, J. D., Iakoucheva, L. M. and
Obradovic, Z., "Intrinsic disorder and protein function",
Biochemistry, vol. 41, no. 21, pp. 6573-6582, 2002.
(DOI)
[3] Dunker, A. K., Lawson, J. D., Brown, C. J., Williams, R. M., Romero, P.,
Oh, J. S., Oldfield, C. J., Campen, A. M., Ratliff, C. M., Hipps, K. W.,
Ausio, J., Nissen, M. S., Reeves, R., Kang, C. H., Kissinger, C. R., Bailey,
R. W., Griswold, M. D., Chiu, W., Garner, E. C., and Obradovic, Z.,
"Intrinsically disordered protein", Journal of Molecular Graphics and
Modelling, vol. 19, no. 1, pp. 26-59, 2001.
(DOI)
[4] Dyson, H. J. and Wright, P. E., "Nuclear magnetic resonance methods for
elucidation of structure and dynamics in disordered states", Methods in
Enzymology, vol. 339, pp. 258-270, 2001.
[5] Romero, P., Obradovic, Z., Li, X., Garner, E. C., Brown, C. J. and Dunker,
A. K., "Sequence complexity of disorder", Proteins: Structure, Function
and Genetics, vol. 42, no. 1, pp. 38-48, 2000.
(DOI)
[6] Dunker, A. K. and Obradovic, Z., "The protein trinity - linking function
and disorder", Nature Biotechnology, vol. 19, pp. 805-806, 2001.
(DOI)
[7] Dyson, H. J. and Wright, P. E., "Coupling of folding and binding for
unstructured proteins", Current Opinion in Structural Biology, vol. 12,
pp. 54-60, 2002.
(DOI)
[8] Tompa, P., "Intrinsically unstructured proteins", Trends in Biochemical
Sciences, vol. 27, no. 10, pp. 527-533, 2002.
(DOI)
[9] Liu, J. F., Tan, H. P. and Rost, B., "Loopy proteins appear conserved
in evolution", Journal of Molecular Biology, vol. 322, no. 1, pp.
53-64, 2002.
(DOI)
Research Team: Gavin Cawley, Stephen Hayward and Prof. Geoff Moore (CAP).