A Reflect System is a named entity recognition system and an entity mention normalization system for biomedical entity mentions (gene, protein, and small-molecule names).
- Reflect Dictionary: The core component of Reflect is a consolidated dictionary that links names and synonyms to source data entries. It was created by merging the STRING (http://string.embl.de) and STITCH (http://stitch.embl.de) databases, and it currently contains over 1.5 million proteins from 373 organisms and 4.3 million small molecules. The entire 18 GB dictionary is kept in RAM to enable fast tagging.
- Benchmarks: Reflect's named entity recognition has been evaluated against the BioCreative I corpus for Saccharomyces cerevisiae and Drosophila melanogaster, achieving F-scores of 91% and 66%, respectively (Pafilis et al., submitted).
- Most of us interested in the life sciences regularly come across names of genes, proteins, or small molecules that we would like to know more about. To make this easier, our team at the European Molecular Biology Laboratory has developed Reflect (http://reflect.ws), a new, free service that can be installed as a plug-in for Firefox or Internet Explorer.
- With a single mouse click, Reflect tags gene, protein, and small-molecule names in any web page, usually within a few seconds and without affecting the page layout. Clicking a tagged item opens a popup showing a concise summary of important features, such as the sequence (for proteins) or the 2D structure (for small molecules). The popup also links to commonly used source databases (e.g., UniProt).
- A publication describing the Reflect infrastructure is in the final stages of review. Reflect is very much a work in progress: we designed it as an extensible platform and plan to add further entity types, eventually including some beyond the life sciences. We welcome collaboration proposals for adding further entity types, as well as proposals from publishers and data providers interested in programmatic access to Reflect.
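The dictionary-based tagging described above can be illustrated with a minimal Python sketch. It assumes a greedy longest-match lookup over whitespace-normalized tokens and a tiny hypothetical dictionary (`DICTIONARY`, with made-up `EXAMPLE:` identifiers). The source does not describe Reflect's actual matching algorithm or markup, and real entries would come from the merged STRING and STITCH data. The hypothetical `annotate` helper wraps each match in an inline `<span>`, which is one way to mark up page text without changing its block layout.

```python
import html
import re

# Hypothetical toy dictionary: lowercased synonym -> (entity type, identifier).
# The EXAMPLE:* identifiers are made up; real entries would link to STRING/STITCH.
DICTIONARY = {
    "cdc28": ("protein", "EXAMPLE:P001"),
    "cyclin-dependent kinase 1": ("protein", "EXAMPLE:P002"),
    "atp": ("small molecule", "EXAMPLE:C001"),
}
# Longest dictionary name, in tokens; bounds the lookup window.
MAX_TOKENS = max(len(name.split()) for name in DICTIONARY)

# Tokens are runs of letters, digits, and hyphens (an assumed tokenization).
TOKEN = re.compile(r"[A-Za-z0-9][A-Za-z0-9-]*")


def tag(text):
    """Return (start, end, type, id) tuples using greedy longest-match lookup."""
    tokens = [(m.start(), m.end()) for m in TOKEN.finditer(text)]
    matches = []
    i = 0
    while i < len(tokens):
        match_len = 0
        # Try the widest token window first, then shrink toward one token.
        for n in range(min(MAX_TOKENS, len(tokens) - i), 0, -1):
            window = tokens[i:i + n]
            # Join tokens with single spaces so extra whitespace still matches.
            key = " ".join(text[s:e] for s, e in window).lower()
            entry = DICTIONARY.get(key)
            if entry is not None:
                matches.append((window[0][0], window[-1][1], *entry))
                match_len = n
                break
        # Skip past the matched tokens, or advance one token if nothing matched.
        i += match_len or 1
    return matches


def annotate(text):
    """Wrap each match in an inline <span> (hypothetical markup), escaping the rest."""
    out, pos = [], 0
    for start, end, etype, ident in tag(text):
        out.append(html.escape(text[pos:start]))
        out.append(
            f'<span class="entity" data-type="{html.escape(etype)}" '
            f'data-id="{html.escape(ident)}">{html.escape(text[start:end])}</span>'
        )
        pos = end
    out.append(html.escape(text[pos:]))
    return "".join(out)
```

For example, `tag("Cdc28 binds ATP.")` finds one protein and one small molecule. Because each lookup is a hash-table access on an in-memory dictionary, tagging time grows with the length of the text rather than with the size of the dictionary, which is in the spirit of keeping the full dictionary in RAM for fast tagging.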
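The benchmark F-scores quoted for BioCreative I are assumed here to be the usual balanced F1 score, the harmonic mean of precision and recall; the source does not spell out the exact counting used in that evaluation. The short Python sketch below only illustrates the standard formula, and the counts in the usage note are invented, not Reflect's results.

```python
def f_score(tp, fp, fn, beta=1.0):
    """F-beta score from counts of true positives, false positives, false negatives.

    beta=1 gives the balanced F1 score, the harmonic mean of precision and recall.
    """
    # Guard against empty denominators (no predictions or no gold mentions).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

For example, `f_score(60, 20, 40)` corresponds to precision 0.75 and recall 0.6, giving an F1 of about 0.667.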