PPLRE Research Topics - Named Entity Relabeling

  • Synopsis: One of the challenges in the PPLRE domain is the low accuracy of Named Entity Recognition (NER) for proteins: many proteins are not labeled and many are mislabeled. One idea for alleviating this problem is to use the discovered Relation Recognition Patterns to find mislabeled entities. A high-precision pattern could be used to correct a mistake made by the NER procedure. The improved data could then be used to train a new model.



  • There are many examples of mislabeled proteins, particularly prior to v2.5, when we had words such as 'for' and 'enzyme' in the dictionary. We could test whether the erroneous annotations of 'for' and 'enzyme' as protein named entities can be detected and repaired.
  • One source of examples is the PSORTe dataset. Many of these documents are likely not 'caught' by our algorithms because the protein was not labeled properly.
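
The relabeling idea can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the regular expression stands in for a single high-precision Relation Recognition Pattern, the `relabel` function and its set-of-tokens input are hypothetical, and the only terms taken from this page are the two known dictionary errors, 'for' and 'enzyme'. It is not the PPLRE implementation.

```python
import re

# Hypothetical high-precision relation pattern: when it matches, both
# argument slots are assumed to be proteins.
PATTERN = re.compile(r"(?P<a>\w+) (?:binds to|interacts with) (?P<b>\w+)")

# Dictionary entries known to be erroneous (the two examples on this page).
SUSPECT_TERMS = {"for", "enzyme"}


def relabel(sentence, protein_labels):
    """Return a corrected copy of protein_labels.

    protein_labels is the set of tokens that NER marked as proteins.
    """
    # Step 1: drop labels that come from known-bad dictionary entries.
    corrected = {t for t in protein_labels if t.lower() not in SUSPECT_TERMS}

    # Step 2: if one slot of a pattern match is a trusted protein,
    # label the other slot as a protein too.
    for m in PATTERN.finditer(sentence):
        a, b = m.group("a"), m.group("b")
        if a in corrected and b.lower() not in SUSPECT_TERMS:
            corrected.add(b)
        elif b in corrected and a.lower() not in SUSPECT_TERMS:
            corrected.add(a)
    return corrected


if __name__ == "__main__":
    print(relabel("SecA interacts with SecY for export", {"SecA", "for"}))
```

In this sketch a new label is added only when the other argument of the match is already a trusted protein, which keeps the correction conservative. The corrected label sets could then serve as training data for a new NER model, as proposed in the synopsis.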