Instructions

Start by registering a team here.
Note that registration will not be complete until you have replied to the confirmation email and acknowledged by the challenge organisers.
Download data files.
We provide pre-processed data for two benchmark datasets that you are encouraged to used in development and testing of your submissions. These datasets will not be used to determine the final ranking of submissions to the challenge, but will be used in the leader board during the development phase of the project.
- Golub et al. leukaemia dataset - AML-v-ALL (paper,golub.mat)
- Alon et al. colon cancer dataset - tumor-v-normal (paper,alon.mat)
Prepare your submission. You must provide two functions, one of which is used to train a classifier and select relevant features, the other is used to generate predictions on the test data. The interface details depend on the programming language adopted:
- MATLAB: You must provide a file named train.m used to train the model.
  function [model,index] = train(x, t)
  Here x represents a matrix of standardised input features, where each row represents a subject and each column rpresents a gene, and t is a column vector giving the desired classification, {-1,+1}, for each subject. The function returns the model, which may be a structure, an object or a vector/matrix of model parameters and index, which is a vector specifying the columns of x which represent relevant features (i.e. biomarker genes).
  The second function predict.m is used to make the predictions on the test data:
  function y = predict(model, x)
  Here y is a column of test scores; scores below zero will be classified as belonging to the negative class and scores of zero or above the positive class. As the AUROC statistic is used for performance estimation, teams are strongly advised to generate probabilistic or at least continuous classifiers as the ranking of subjects is important. Ideally y represents the log-odds ratio. In this case x only contains columns representing the features (in the order) specified in index.
  An example submission, based on sparse logistic regression using Bayesian regularisation (see here for details) is provided to illustrate the requirements.
Submit your solution. Note that evaluation is computationally intensive, and so to ensure resources are allocated fairly, teams are limited to one submission per week 7 days. In addition, a new submission will only be evaluated after the evaluation of that team's previous submission is complete.
Note that in order to take part in the challenge, you must agree for your submitted implementation to be put into the public domain. This is required for two reasons, firstly for the scrutineering phase of the challenge, to ensure fairness, and secondly to provide a resource for rigorous evaluation in future studies.







University of East Anglia