PPLRE Evaluation - Cooccurrence
The evaluation is currently under way. Some preliminary results below:
Algorithm version v2.3 on test set v1.3.1
A round of optimization was performed. One proviso: the optimization was performed against the test set rather than the train set, because v1.3.1 of the train set was not ready.
The optimization options included:
- Number of Organism concepts (in a sentence)
- Number of Protein concepts (in a sentence)
- Number of Location concepts (in a sentence)
- The presence of a UMLS [Spatial_Concept] concept (in the sentence).
- The presence of a UMLS [Laboratory_Procedure] concept (in the sentence).
- The type of protein name.
The findings for each option were:
- Organism count per sentence: 1 (e.g. two cases of E. coli in one sentence count as two instances)
- Location count per sentence (logical): 1 (e.g. two cases of extracellular in one sentence count as one instance).
- Restrict to sentences with a [Spatial_Concept]: not beneficial.
- Restrict to sentences with a [Laboratory_Procedure]: not beneficial.
- The protein name: beneficial (at least three characters and one upper case character, OR a composite name with a space between words).
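The sentence-level settings above can be sketched as a simple filter. This is a minimal illustration and not the PPLRE implementation: the function names `is_acceptable_protein_name` and `keep_sentence`, and the representation of a sentence as lists of mention strings, are assumptions. The sketch also assumes that a count of 1 means exactly one, and it reads the name rule as "(at least three characters and one upper case character) or a multi-word name".

```python
from typing import Sequence


def is_acceptable_protein_name(name: str) -> bool:
    """Assumed reading of the name rule: (>= 3 chars and >= 1 upper case char) or multi-word."""
    name = name.strip()
    is_composite = len(name.split()) >= 2          # words separated by a space
    has_upper = any(ch.isupper() for ch in name)
    return is_composite or (len(name) >= 3 and has_upper)


def keep_sentence(organisms: Sequence[str],
                  locations: Sequence[str],
                  proteins: Sequence[str]) -> bool:
    """Keep a sentence with one organism mention, one logical location, and a usable protein name."""
    organism_mentions = len(organisms)                         # every mention counts
    distinct_locations = len({loc.lower() for loc in locations})  # repeats collapse to one
    usable_proteins = [p for p in proteins if is_acceptable_protein_name(p)]
    return organism_mentions == 1 and distinct_locations == 1 and len(usable_proteins) >= 1
```

Under these assumptions, a sentence that mentions E. coli twice is rejected, while a sentence that mentions extracellular twice still counts as having one location.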
The following settings were found to trade off precision against F-score:
- Number of protein concepts per sentence
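For reference, the measures involved in this trade-off can be computed from true positive, false positive, and false negative counts. The sketch below assumes the F-score meant here is the balanced F1 (the variant is not stated in these notes); the function name is hypothetical.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1); each measure is 0.0 when its denominator is zero."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1
```

A stricter setting typically removes false positives (raising precision) but can also remove true positives (lowering recall), so the F-score may move in either direction.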
Algorithm version v2.3t on test set v1.3.1
- Preliminary experiments with two-sentence passages in place of one-sentence passages suggest that performance generally drops.
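One way to picture the passage setting: a passage is a window of consecutive sentences, and co-occurrence is then tested within each window instead of within each sentence. A minimal sketch, assuming overlapping (sliding) windows; whether v2.3t uses overlapping or disjoint passages is not stated here, and `make_passages` is a hypothetical helper.

```python
from typing import List, Sequence


def make_passages(sentences: Sequence[str], size: int = 2) -> List[str]:
    """Join each run of `size` consecutive sentences into one passage (sliding window)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    if not sentences:
        return []
    if len(sentences) <= size:
        # Fewer sentences than the window: the whole text is a single passage.
        return [" ".join(sentences)]
    return [" ".join(sentences[i:i + size])
            for i in range(len(sentences) - size + 1)]
```

With `size=1` this reduces to the one-sentence setting. Larger windows admit more candidate co-occurrences per passage, which is one plausible reason for the observed drop in performance.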