PPLRE Automated Evaluation System

From GM-RKB

A PPLRE Automated Evaluation System is the Software System that addresses the PPLRE Automated Evaluation Task.



Overview

The PPLRE Automated Evaluation System is the system used within the PPLRE Project to evaluate the Performance of the PPLRE Relation Extraction Algorithms, particularly with respect to the Correctness Metrics Precision and F-Score. Its results let us decide which predicted OPL relations are most likely to be accurate. Those relations are therefore the best candidates for the Domain Experts to review during the PPLRE Manual Evaluation Task.


Algorithm Performance Evaluation

Current Performance Ranking

Run                            TP   FP   FN  Precision  Recall  F-score
Ensemble Zprsr&Snwbl 070322     9    0   56     100.0%   13.8%    24.3%
Ensemble Any Two 070322        16    1   49      94.1%   24.6%    39.0%
Zparser 0700404 (opt)          17   10   48      63.0%   26.2%    37.0%
Zparser 070404                 17   14   48      54.8%   26.2%    35.4%
Snowball 070309                12    6   53      66.6%   18.2%    28.6%
Snowball 070303 (opt)           6   12   59      33.3%    9.2%    14.4%
NNeighbor 07040510 (opt)       13    8   52      61.9%   20.0%    30.2%
NNeighbor 07040423             38  152   85      20.0%   58.5%    29.8%
Cooccur 070403                 40  243   25      14.1%   61.5%    23.0%

TN (only seven values were recorded for the nine runs, and their column assignment is not given): 138 134 144 145 140 27 80
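Precision, Recall and F-score are normally derived from the TP, FP and FN counts with the standard definitions below. Here F-score is taken as the balanced F1, the harmonic mean of precision and recall. The following is a minimal Python sketch; the helper name `prf` is hypothetical.

```python
from typing import Tuple


def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) from outcome counts; 0.0 where undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0   # share of predictions that are correct
    recall = tp / (tp + fn) if tp + fn else 0.0      # share of gold relations that were found
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Counts taken from the "Ensemble Any Two 070322" run in the table
    p, r, f = prf(16, 1, 49)
    print(f"Precision {p:.1%}  Recall {r:.1%}  F-score {f:.1%}")
```

For the Ensemble Any Two run (TP=16, FP=1, FN=49) this gives 94.1%, 24.6% and 39.0%, which matches the table. F1 can also be written as 2TP / (2TP + FP + FN).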

PPLRE Evaluation - ZParser

PPLRE Evaluation - Nearest Neighbor

PPLRE Evaluation - Snowball

PPLRE Evaluation - Cooccurrence

PPLRE Evaluation - Ensemble


Requirements

One of the most useful measures of Performance during this phase is the amount of time a Domain Expert spends validating a fixed set of OPL relations. One example is the time required to validate 100 predicted relations, divided by the number of True Positive predictions among them. For now, though, we assume that good and bad predictions take equally long to evaluate. Under this assumption the time spent per True Positive is inversely proportional to precision, so we can focus on the precision of the algorithms.
We will also evaluate the F-Score, which combines precision with recall. It tracks each algorithm's ability to extract most of the relations present in the test set.
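Under the equal-time assumption, the expert cost per True Positive follows directly from the counts. Validating N predictions at t minutes each takes N·t minutes and yields precision·N true positives, so each true positive costs t / precision minutes. The sketch below illustrates this. The per-prediction time is a hypothetical parameter, not a measured value, and the function name is likewise illustrative.

```python
def minutes_per_true_positive(tp: int, fp: int, minutes_per_prediction: float) -> float:
    """Expert time spent per confirmed relation, assuming every prediction
    (correct or not) takes the same time to validate."""
    if tp == 0:
        return float("inf")  # no true positives: cost per TP is unbounded
    # (tp + fp) predictions are reviewed in total; only tp of them are confirmed
    return (tp + fp) * minutes_per_prediction / tp


if __name__ == "__main__":
    # Counts from the ranking table; 1 minute per prediction is a hypothetical figure.
    for name, tp, fp in [("Ensemble Any Two", 16, 1), ("Cooccur", 40, 243)]:
        print(f"{name}: {minutes_per_true_positive(tp, fp, 1.0):.2f} min per TP")
```

At a hypothetical one minute per prediction, the high-precision ensemble costs about 1.06 minutes per confirmed relation. The co-occurrence baseline costs about 7.1 minutes, which is why precision is the primary ranking criterion here.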


Requirements: Input Data

Evaluation currently focuses on the PPLRE Curated Data v1.3.


Requirements: Output Data

  • True Positive: this predicted relation exists in the test corpus (optionally restricted to the same document, or to the same passage).
  • False Positive: this predicted relation does NOT exist in the test corpus (optionally restricted to the same document, or to the same passage).
  • False Negative: this relation exists in the test corpus (optionally restricted to the same document, or to the same passage) but was NOT predicted.
  • True Negative: this relation was neither predicted nor present in the test corpus.
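When predicted and gold relations are expressed at the same granularity, the four outcomes reduce to set operations. The following is a minimal sketch. It assumes relations are hashable tuples such as (passage_id, organism, protein); that representation and the `classify` name are hypothetical. True Negatives need an extra, equally hypothetical `candidates` set of all relations that could have been predicted, because otherwise they are unbounded.

```python
from typing import Dict, Hashable, Optional, Set


def classify(predicted: Set[Hashable], gold: Set[Hashable],
             candidates: Optional[Set[Hashable]] = None) -> Dict[str, Set[Hashable]]:
    """Split relations into TP/FP/FN, plus TN when a candidate set is given."""
    outcome = {
        "TP": predicted & gold,   # predicted and present in the test corpus
        "FP": predicted - gold,   # predicted but absent from the test corpus
        "FN": gold - predicted,   # present in the test corpus but not predicted
    }
    if candidates is not None:
        # neither predicted nor present in the test corpus
        outcome["TN"] = candidates - predicted - gold
    return outcome


if __name__ == "__main__":
    # Hypothetical passage-level tuples: (passage_id, organism, protein)
    gold = {("p1", "Legionella pneumophila", "FlaA"), ("p2", "L. pneumophila", "PlaC")}
    predicted = {("p1", "Legionella pneumophila", "FlaA"),
                 ("p3", "Pseudomonas aeruginosa", "flaB")}
    for label, rels in classify(predicted, gold).items():
        print(label, sorted(rels))
```

Changing what goes into each tuple (for example dropping the passage id) changes the granularity at which a prediction counts as correct.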

Algorithm Output File Format

The file format is described by way of example. Below is a sample of the output data for the OP() relation. For the PL() relation the ORGANISM and PROTEIN columns would be replaced by the PROTEIN and LOCATION columns.

P.ORGANISM | P.PROTEIN | P.CONFIDENCE | A.TUPLE_ID | A.PSID/A.SENTENCE_ID | A.ORGANISM | A.PROTEIN | OUTCOME | OUTCOME.Partial
Pseudomonas aeruginosa | OprD2 | -78.65165722 |  | 37610 | Pseudomonas aeruginosa | OprD2 | TP | TP
Escherichia coli | cytochrome c | -84.08272563 |  | 113416 | Escherichia coli | "cytochrome, cytochrome c" | TP | TP
Halobacterium saccharovorum | atpase | -98.48214753 |  | 3110 | Halobacterium saccharovorum | ATPase | TP | TP
Escherichia coli | Tsh | -100.032466 |  | 31310 | Escherichia coli | Tsh | TP | TP
Neisseria gonorrhoeae | Fe-regulated protein | -100.7413244 |  | 103310 | Neisseria gonorrhoeae | "Fe_Regulated protein, FrpB" | TP | TP
Legionella pneumophila | FlaA | -105.8240252 |  | 15510 | Legionella pneumophila | FlaA | TP | TP
L. pneumophila | flaA | -130.5143263 |  | 15511 | L. pneumophila | flaA | TP | TP
Pseudomonas aeruginosa | phospholipase C | -138.1919905 |  | 65810 | Pseudomonas aeruginosa | "PLC, phospholipase C, lipase" | TP | TP
P. tunicata | AlpP | -148.7812059 |  | 33811 | P. tunicata | AlpP | TP | TP
 |  |  |  | 73615 | L. pneumophila | PlaC | FN | TP
 |  |  |  | 13413 | Escherichia coli | crcA | FN | TP
 |  |  |  | 61712 | S. marcescens | HasD | FN | TP
 |  |  |  | 60616 | T. pallidum | 47-kDa lipoprotein | FN | FN
 |  |  |  | 4912 | Escherichia coli | "NADH oxidase, ExbD, ExbB" | FN | TP
Neisseria gonorrhoeae | OprD2 | -100.7413244 |  | 1231 |  |  | FP | FP
Pseudomonas aeruginosa | flaB | -114.2801257 |  | 3211 |  |  | FP | FP
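Each row carries both an OUTCOME label and an OUTCOME.Partial label, the latter presumably the outcome under partial matching. Counts and precision for either mode can therefore be tallied straight from such a file. The sketch below assumes the file is tab-delimited with a header row naming these columns; the delimiter and the `tally` helper are assumptions, not part of the documented format.

```python
import csv
import io
from collections import Counter
from typing import Iterable


def tally(lines: Iterable[str], column: str = "OUTCOME") -> Counter:
    """Count TP/FP/FN labels in one outcome column of a tab-delimited file."""
    reader = csv.DictReader(lines, delimiter="\t")
    # skip rows whose outcome cell is empty or missing
    return Counter(row[column] for row in reader if row[column])


if __name__ == "__main__":
    # Hypothetical file trimmed to a few of the columns shown above
    sample = io.StringIO(
        "P.ORGANISM\tP.PROTEIN\tOUTCOME\tOUTCOME.Partial\n"
        "Escherichia coli\tTsh\tTP\tTP\n"
        "\t\tFN\tTP\n"
        "Pseudomonas aeruginosa\tflaB\tFP\tFP\n"
    )
    rows = list(sample)
    for column in ("OUTCOME", "OUTCOME.Partial"):
        counts = tally(rows, column)
        tp, fp = counts["TP"], counts["FP"]
        precision = tp / (tp + fp) if tp + fp else 0.0
        print(column, dict(counts), f"precision={precision:.1%}")
```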


Design


Binary Evaluation

cd /home/zshi1/pplre/bin/re_parsing/semi/cotrain1/
./binary_eva.pl <test_output> <gold_answer> <threshold> <eva_method#>

  • test_output: binary tuples predicted by the system. The file name should contain ‘PL’ or ‘PO’. It is a tab-delimited file with the following columns:
    • 0: tuple_id
    • 1: protein name
    • 2: PSID of the protein name
    • 3: sentence id of the protein name
    • 4: location/organism name
    • 5: PSID of loc/org name
    • 6: sentence id of loc/org name
    • 7: confidence score

  • gold_answer: ./data/curated_data/v1.3/OPL.test.tab
  • threshold: a tuple is predicted positive if its confidence score > threshold, and negative otherwise
  • eva_method#: the matching method to use (method 1 is the one we agreed on)
    • 0: Partial name, partial relationship matching
    • 1: Partial name, full relationship matching
    • 2: Full name, partial relationship matching
    • 3: Full name, full relationship matching


  • Output: tab-delimited results written to stdout, in the same format as the results reported in the meeting
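The column layout of test_output and the threshold rule can be sketched as follows. The sketch infers the relation type from the file name, parses columns 0 to 7, and keeps the tuples whose confidence score strictly exceeds the threshold. It is an illustrative Python sketch, not the behaviour of binary_eva.pl. The `Prediction` type and both function names are hypothetical. The sample confidence scores shown earlier are negative, so a useful threshold may well be negative too.

```python
import csv
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Prediction:
    tuple_id: str             # column 0
    protein: str              # column 1
    protein_psid: str         # column 2
    protein_sentence_id: str  # column 3
    arg: str                  # column 4: location (PL) or organism (PO) name
    arg_psid: str             # column 5
    arg_sentence_id: str      # column 6
    confidence: float         # column 7


def relation_type(filename: str) -> str:
    """Return 'PL' or 'PO' from the file name, which must contain one of them."""
    for rel in ("PL", "PO"):
        if rel in filename:
            return rel
    raise ValueError(f"{filename!r} contains neither 'PL' nor 'PO'")


def positives(lines: Iterable[str], threshold: float) -> List[Prediction]:
    """Parse tab-delimited rows and keep tuples with confidence > threshold."""
    kept = []
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 8:
            continue  # skip blank or malformed lines
        pred = Prediction(*row[:7], confidence=float(row[7]))
        if pred.confidence > threshold:  # strictly greater: equal scores are negative
            kept.append(pred)
    return kept


if __name__ == "__main__":
    print(relation_type("zparser.PO.out"))  # hypothetical file name
    rows = ["1\tFlaA\t155\t10\tLegionella pneumophila\t155\t10\t-105.8\n"]  # hypothetical row
    print([p.protein for p in positives(rows, threshold=-110.0)])
```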

Ternary Evaluation

cd /home/zshi1/pplre/bin/re_parsing/semi/cotrain1/
./ternary_eva.pl <PL_output> <PO_output> <gold_answer> <threshold> <eva_method#>

  • PL_output & PO_output: binary predictions of PL and PO. Same format as test_output above.
  • The remaining parameters are the same as for binary evaluation.
  • Output:
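The description above does not say how ternary_eva.pl combines the two binary files into OPL triples. One plausible approach, sketched here purely as an assumption, joins PL and PO predictions that share a protein mention: the same protein name, PSID and sentence id. It is not the script's actual logic. The `Binary` tuple layout and the `join_opl` name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical binary prediction: (protein, protein_psid, protein_sentence_id, arg_name),
# where arg_name is the location (PL) or the organism (PO).
Binary = Tuple[str, str, str, str]


def join_opl(pl: List[Binary], po: List[Binary]) -> Set[Tuple[str, str, str]]:
    """Hypothetical join: pair PL and PO predictions on the same protein mention,
    producing (organism, protein, location) triples."""
    organisms: Dict[Tuple[str, str, str], Set[str]] = defaultdict(set)
    for protein, psid, sent, organism in po:
        organisms[(protein, psid, sent)].add(organism)
    triples = set()
    for protein, psid, sent, location in pl:
        # .get avoids inserting empty entries into the defaultdict
        for organism in organisms.get((protein, psid, sent), ()):
            triples.add((organism, protein, location))
    return triples


if __name__ == "__main__":
    pl = [("FlaA", "p1", "s1", "flagellum")]                # hypothetical PL prediction
    po = [("FlaA", "p1", "s1", "Legionella pneumophila")]   # hypothetical PO prediction
    print(join_opl(pl, po))
```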

Wishlist Requirements

  • tbd