Note that registration is not complete until you have replied to the confirmation email and been acknowledged by the challenge organisers.
We provide pre-processed data for two benchmark datasets that you are encouraged to use in developing and testing your submissions. These datasets will not be used to determine the final ranking of submissions to the challenge, but they will be used for the leaderboard during the development phase of the project.
Here x represents a matrix of standardised input
features, where each row represents a subject and each column
represents a gene, and t is a column vector giving
the desired classification, {-1,+1}, for each subject. The
function returns the model, which may be a structure,
an object or a vector/matrix of model parameters and
index, which is a vector specifying the columns of
x which represent relevant features (i.e. biomarker
genes).
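The official interface is MATLAB, but as a rough illustration of what train is expected to do, here is a Python/NumPy sketch of the same contract: it selects the columns of x most correlated with t and fits a simple classifier on them. The feature count k, the ridge parameter lam, and the ridge-regularised least-squares classifier itself are arbitrary illustrative choices, not the organisers' method.

```python
import numpy as np

def train(x, t, k=100, lam=1.0):
    """Illustrative sketch of the train interface.

    x : (subjects x genes) matrix of standardised features
    t : column vector of desired labels in {-1, +1}
    Returns a model (here, a weight vector with a trailing bias term)
    and index, the columns of x deemed relevant.
    k and lam are arbitrary choices for illustration only.
    """
    # Score each gene by absolute correlation with the labels
    # (valid as a correlation proxy because x is standardised).
    r = np.abs(x.T @ t)
    index = np.argsort(r)[::-1][:min(k, x.shape[1])]
    xs = x[:, index]
    # Ridge-regularised least-squares fit with an appended bias column.
    X = np.hstack([xs, np.ones((xs.shape[0], 1))])
    model = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ t)
    return model, index
```

Any real submission would replace this with a properly validated classifier; the sketch only shows the shape of the inputs and outputs.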
The second function predict.m is used to make the
predictions on the test data:
Here y is a column vector of test scores; scores below
zero will be classified as belonging to the negative class and
scores of zero or above to the positive class. As the AUROC
statistic is used for performance evaluation, the ranking of
subjects is important, so teams are strongly advised to produce
probabilistic, or at least continuous, classifier outputs.
Ideally y represents the log-odds ratio. In this
case x contains only the columns representing the
features specified in index, in that order.
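For illustration only (the official interface is MATLAB), a Python/NumPy sketch of the predict contract might look as follows. It assumes, purely as an illustrative convention, that model is a weight vector whose last element is a bias term and that x already contains only the selected feature columns, in the order given by index.

```python
import numpy as np

def predict(model, x):
    """Illustrative sketch of the predict interface.

    x     : (subjects x selected features) test matrix, columns in
            the order specified by index at training time
    model : weight vector with a trailing bias term (an assumption
            of this sketch, not part of the challenge specification)
    Returns a column of continuous scores; the sign gives the class.
    """
    X = np.hstack([x, np.ones((x.shape[0], 1))])  # append bias column
    return X @ model
```

Because ranking drives the AUROC evaluation, the scores are returned as-is rather than thresholded to {-1, +1}.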
An example submission, based on sparse logistic regression
using Bayesian regularisation (see here
for details), is provided to illustrate the requirements.
function [model,index] = train(x, t)
function y = predict(model, x)
Note that evaluation is computationally intensive, so to ensure resources are allocated fairly, teams are limited to one submission per week (7 days). In addition, a new submission will only be evaluated once the evaluation of that team's previous submission is complete.