BioCreAtIvE II - Protein-Protein Interaction Task

From GM-RKB

The BioCreAtIvE II - Protein-Protein Interaction Task is a BioCreAtIvE Benchmark Task to evaluate Semantic Relation Recognition Systems on their ability to Recognize Protein-Protein Interactions.



References

  • http://biocreative.sourceforge.net/biocreative_2_ppi.html
    • Protein-Protein Interaction Task
    • Reflecting the process of database curator annotation extraction, several sub-tasks are posed. Each participant is free to take part in any (or all) of the proposed sub-tasks. (See also the PPI-relevant Q & A page.)
    • Protein Interaction Article Sub-task 1 (IAS): In practice, before detecting protein interaction descriptions in sentences, it is necessary to select those articles which contain information relevant to protein interactions. Although this step is critical for all subsequent ones, it has often been neglected by previously published protein-interaction extraction systems. This sub-task is therefore concerned with classifying whether a given article contains protein interaction information.
      • Participants will need to return a ranked list of articles (identifiers) based on their relevance for protein interaction annotation. To evaluate the participating systems, the AROC (area under the receiver operating characteristic curve) measure will be used, computed on the ranked predicted collections. (Initially, additional evaluation metrics were also considered, e.g. a utility measure [35].) The training collection will contain:
      • a) TP: (True Positives) collection of PubMed article abstracts which are relevant for protein interaction curation.
      • b) TN: (True Negatives) collection of articles which have been classified by domain expert curators from these two databases as not relevant for protein interaction curation.
      • c) *TP: (likely True Positives) collection of PubMed identifiers of articles which have been used for protein interaction annotation by other interaction databases (namely BIND, HPRD, MPACT and GRID).
    • Protein Interaction Pairs Sub-task 2 (IPS): This sub-task concerns the identification of protein-protein interaction pairs from full text articles. As training data, the participants will get a collection of articles with the associated interaction pairs extracted from these articles, as well as the corresponding gene mention symbols. For the test set predictions, participants have to provide, for each article, a ranked list of protein-protein interaction pairs. The evaluation will be in terms of precision and recall of the predicted protein interaction pairs for each article.
    • Protein Interaction Sentences Sub-task 3 (ISS):
      • In practice, protein-protein interaction information for a given pair of proteins might be mentioned several times throughout a full text article. To produce a protein interaction summary, for instance, it is useful to select the most relevant sentence expressing interaction information for a given pair. Therefore one of the sub-tasks will ask participants to provide, for each protein interaction pair, a ranked list of at most 5 text passages (each containing at most 3 sentences) describing their interaction.
      • For the evaluation, pooling methods will be used, as follows: all the sentences from all the systems for each document are collected. Systems will be evaluated on two aspects: a) the percentage of interaction-relevant sentences with respect to the total number of predicted (submitted) sentences and b) the mean reciprocal rank (MRR) of the ranked list of interaction evidence passages with respect to the manually chosen best interaction sentence. Point b) is the most important evaluation criterion.
    • Protein Interaction Method Sub-task 4 (IMS):
      • For annotation purposes, as well as to judge the quality of protein interactions, it is important to know how protein interactions have been determined experimentally. For protein-protein interaction annotation, considerable effort has been made to develop a controlled vocabulary of interaction methods. This sub-task refers to the identification of the type of experiment which was used to confirm a given protein-protein interaction. The experimental method description has to be mapped into a previously provided controlled hierarchical vocabulary of experimental methods [36]. In this case the evaluation will be measured by the mean reciprocal rank of correctly identified interaction methods (correct MI identifiers) for each protein-protein interaction pair compared to the previously manually annotated interaction detection methods. This hierarchical controlled vocabulary is available at MI.
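
The sub-tasks above name three evaluation measures: AROC for the ranked article list (IAS), precision and recall for interaction pairs (IPS), and mean reciprocal rank for ranked passages and methods (ISS, IMS). The following is a minimal Python sketch of these measures, not the official scoring code. The function names, the input shapes, the treatment of interaction pairs as unordered, and the handling of empty inputs are all assumptions made for illustration; the task's actual scripts may differ, for example in tie handling or in how the hierarchical vocabulary is matched.

```python
from typing import Dict, Hashable, Iterable, Sequence, Set, Tuple


def aroc(ranked_ids: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Area under the ROC curve for a ranked list (most relevant first).

    Equals the fraction of (relevant, irrelevant) item pairs in which the
    relevant item is ranked above the irrelevant one. Assumes no ties.
    """
    n_pos = sum(1 for item in ranked_ids if item in relevant)
    n_neg = len(ranked_ids) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one relevant and one irrelevant item")
    negatives_seen = 0
    misordered = 0  # pairs where an irrelevant item outranks a relevant one
    for item in ranked_ids:
        if item in relevant:
            misordered += negatives_seen
        else:
            negatives_seen += 1
    return 1.0 - misordered / (n_pos * n_neg)


def precision_recall(
    predicted_pairs: Iterable[Tuple[str, str]],
    gold_pairs: Iterable[Tuple[str, str]],
) -> Tuple[float, float]:
    """Precision and recall of predicted interaction pairs for one article.

    Assumption: pairs are unordered, so (A, B) matches (B, A).
    """
    predicted = {frozenset(pair) for pair in predicted_pairs}
    gold = {frozenset(pair) for pair in gold_pairs}
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


def mean_reciprocal_rank(
    rankings: Dict[Hashable, Sequence[Hashable]],
    correct: Dict[Hashable, Set[Hashable]],
) -> float:
    """Mean over queries of 1/rank of the first correct item (0 if none)."""
    if not rankings:
        return 0.0
    total = 0.0
    for key, ranking in rankings.items():
        gold = correct.get(key, set())
        for rank, item in enumerate(ranking, start=1):
            if item in gold:
                total += 1.0 / rank
                break
    return total / len(rankings)
```

For ISS, each key in `rankings` would be an interaction pair and each ranked item a submitted passage; for IMS, the ranked items would be MI identifiers. Averaging the per-article precision and recall values into a single score is left out, since the description above does not say how that aggregation is done.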