2012 BRAT: A Web-based Tool for NLP-assisted Text Annotation


Subject Headings: BRAT Annotation System

Notes

Cited By

Quotes

Abstract

We introduce the brat rapid annotation tool (BRAT), an intuitive web-based tool for text annotation supported by Natural Language Processing (NLP) technology. BRAT has been developed for rich structured annotation for a variety of NLP tasks and aims to support manual curation efforts and increase annotator productivity using NLP techniques. We discuss several case studies of real-world annotation projects using pre-release versions of BRAT and present an evaluation of annotation assisted by semantic class disambiguation on a multi-category entity mention annotation task, showing a 15% decrease in total annotation time. BRAT is available under an open-source license from: http://brat.nlplab.org

1 Introduction

Manually curated gold standard annotations are a prerequisite for the evaluation and training of state-of-the-art tools for most Natural Language Processing (NLP) tasks. However, annotation is also one of the most time-consuming and financially costly components of many NLP research efforts, and can place heavy demands on human annotators for maintaining annotation quality and consistency. Yet, modern annotation tools are generally technically oriented and many offer little support to users beyond the minimum required functionality. We believe that intuitive and user-friendly interfaces as well as the judicious application of NLP technology to support, not supplant, human judgements can help maintain the quality of annotations, make annotation more accessible to non-technical users such as subject domain experts, and improve annotation productivity, thus reducing both the human and financial cost of annotation. The tool presented in this work, BRAT, represents our attempt to realise these possibilities.

2 Features

2.1 High-quality Annotation Visualisation

BRAT is based on our previously released open-source STAV text annotation visualiser (Stenetorp et al., 2011b), which was designed to help users gain an understanding of complex annotations involving a large number of different semantic types, dense, partially overlapping text annotations, and non-projective sets of connections between annotations. Both tools share a vector graphics-based visualisation component, which provides scalable detail and rendering. BRAT integrates PDF and EPS image format export functionality to support use in e.g. figures in publications (Figure 1).

Figure 1: Visualisation examples. Top: named entity recognition, middle: dependency syntax, bottom: verb frames.

2.2 Intuitive Annotation Interface

We extended the capabilities of STAV by implementing support for annotation editing. This was done by adding functionality for recognising standard user interface gestures familiar from text editors, presentation software, and many other tools. In BRAT, a span of text is marked for annotation simply by selecting it with the mouse by “dragging” or by double-clicking on a word. Similarly, annotations are linked by clicking with the mouse on one annotation and dragging a connection to the other (Figure 2).

Figure 2: Screenshot of the main BRAT user-interface, showing a connection being made between the annotations for “moving” and “Citibank”.

BRAT is browser-based and built entirely using standard web technologies. It thus offers a familiar environment to annotators, and it is possible to start using BRAT simply by pointing a standards-compliant modern browser to an installation. There is thus no need to install or distribute any additional annotation software or to use browser plug-ins. The use of web standards also makes it possible for BRAT to uniquely identify any annotation using Uniform Resource Identifiers (URIs), which enables linking to individual annotations for discussions in e-mail, documents and on web pages, facilitating easy communication regarding annotations.

2.3 Versatile Annotation Support

BRAT is fully configurable and can be set up to support most text annotation tasks. The most basic annotation primitive identifies a text span and assigns it a type (or tag or label), marking e.g. POS-tagged tokens, chunks or entity mentions (Figure 1 top). These base annotations can be connected by binary relations – either directed or undirected – which can be configured for e.g. simple relation extraction or verb frame annotation (Figure 1 middle and bottom). n-ary associations of annotations are also supported, allowing the annotation of event structures such as those targeted in the MUC (Sundheim, 1996), ACE (Doddington et al., 2004), and BioNLP (Kim et al., 2011) Information Extraction (IE) tasks (Figure 2). Additional aspects of annotations can be marked using attributes, binary or multi-valued flags that can be added to other annotations. Finally, annotators can attach free-form text notes to any annotation.
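These primitives correspond to lines in BRAT's plain-text standoff (.ann) files. As an illustration, the following minimal Python sketch reads such lines into simple records; it is our own simplified reader, not BRAT's code, and it ignores details such as discontinuous spans and multi-line notes.

```python
# Minimal sketch of reading brat-style standoff (.ann) lines; illustrative only,
# not BRAT's own code. Discontinuous spans and multi-line notes are ignored.

from dataclasses import dataclass, field


@dataclass
class Annotation:
    id: str                     # e.g. "T1", "R1", "E1", "A1", "#1"
    data: dict = field(default_factory=dict)


def parse_ann_line(line):
    """Turn one .ann line into an Annotation record."""
    ann_id, rest = line.rstrip("\n").split("\t", 1)
    if ann_id.startswith("T"):      # text-bound span: "T1<TAB>Person 0 5<TAB>Smith"
        header, text = rest.split("\t", 1)
        ann_type, start, end = header.split(" ")
        return Annotation(ann_id, {"type": ann_type, "start": int(start),
                                   "end": int(end), "text": text})
    if ann_id.startswith("R"):      # binary relation: "R1<TAB>Origin Arg1:T3 Arg2:T4"
        ann_type, *args = rest.split(" ")
        return Annotation(ann_id, {"type": ann_type,
                                   **dict(a.split(":", 1) for a in args)})
    if ann_id.startswith("E"):      # n-ary event: "E1<TAB>Transfer:T2 Giver:T1"
        trigger, *args = rest.split(" ")
        ann_type, trigger_id = trigger.split(":", 1)
        return Annotation(ann_id, {"type": ann_type, "trigger": trigger_id,
                                   **dict(a.split(":", 1) for a in args)})
    if ann_id.startswith("A"):      # attribute: binary or multi-valued flag
        parts = rest.split(" ")
        return Annotation(ann_id, {"type": parts[0], "target": parts[1],
                                   "value": parts[2] if len(parts) > 2 else True})
    if ann_id.startswith("#"):      # free-form note: "#1<TAB>AnnotatorNotes T1<TAB>text"
        header, note = rest.split("\t", 1)
        return Annotation(ann_id, {"header": header, "note": note})
    return Annotation(ann_id, {"raw": rest})


if __name__ == "__main__":
    example = [
        "T1\tOrganization 0 8\tCitibank",
        "T2\tTransfer 12 18\tmoving",
        "E1\tTransfer:T2 Giver:T1",
        "A1\tNegation E1",
        "#1\tAnnotatorNotes T1\tcheck the span boundaries",
    ]
    for ann_line in example:
        print(parse_ann_line(ann_line))
```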

In addition to information extraction tasks, these annotation primitives allow BRAT to be configured for use in various other tasks, such as chunking (Abney, 1991), Semantic Role Labeling (Gildea and Jurafsky, 2002; Carreras and Màrquez, 2005), and dependency annotation (Nivre, 2003) (see Figure 1 for examples). Further, both the BRAT client and server implement full support for the Unicode standard, which allows the tool to support the annotation of text using e.g. Chinese or Devanāgarī characters. BRAT is distributed with examples from over 20 corpora for a variety of tasks, involving texts in seven different languages and including examples from corpora such as those introduced for the CoNLL shared tasks on language-independent named entity recognition (Tjong Kim Sang and De Meulder, 2003) and multilingual dependency parsing (Buchholz and Marsi, 2006).

BRAT also implements a fully configurable system for checking detailed constraints on annotation semantics, for example specifying that a TRANSFER event must take exactly one of each of GIVER, RECIPIENT and BENEFICIARY arguments, each of which must have one of the types PERSON, ORGANIZATION or GEO-POLITICAL ENTITY, as well as a MONEY argument of type MONEY, and may optionally take a PLACE argument of type LOCATION (LDC, 2005). Constraint checking is fully integrated into the annotation interface and feedback is immediate, with clear visual effects marking incomplete or erroneous annotations (Figure 3).
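As an illustration of the kind of check described for the TRANSFER example, the sketch below validates an event's argument list against a small constraint table in Python. The constraint encoding and helper function are hypothetical stand-ins for BRAT's configuration-driven checker, not its actual implementation.

```python
# Illustrative sketch only: the kind of argument-count/type check described for
# the TRANSFER example (LDC, 2005). Not BRAT's implementation; the constraint
# encoding below is hypothetical.

# role -> (allowed filler types, minimum count, maximum count)
TRANSFER_CONSTRAINTS = {
    "GIVER":       ({"PERSON", "ORGANIZATION", "GEO-POLITICAL ENTITY"}, 1, 1),
    "RECIPIENT":   ({"PERSON", "ORGANIZATION", "GEO-POLITICAL ENTITY"}, 1, 1),
    "BENEFICIARY": ({"PERSON", "ORGANIZATION", "GEO-POLITICAL ENTITY"}, 1, 1),
    "MONEY":       ({"MONEY"}, 1, 1),
    "PLACE":       ({"LOCATION"}, 0, 1),   # optional argument
}


def check_event(args, constraints=TRANSFER_CONSTRAINTS):
    """Return a list of human-readable problems for an event's arguments.

    `args` is a list of (role, filler_type) pairs, e.g. [("GIVER", "PERSON")].
    """
    problems = []
    for role, (allowed, lo, hi) in constraints.items():
        fillers = [t for r, t in args if r == role]
        if not (lo <= len(fillers) <= hi):
            problems.append(f"{role}: expected between {lo} and {hi}, got {len(fillers)}")
        problems.extend(f"{role}: type {t} not allowed" for t in fillers if t not in allowed)
    for role, _ in args:
        if role not in constraints:
            problems.append(f"unknown role {role}")
    return problems


if __name__ == "__main__":
    incomplete = [("GIVER", "PERSON"), ("MONEY", "MONEY")]  # missing RECIPIENT, BENEFICIARY
    for problem in check_event(incomplete):
        print(problem)
```

In BRAT itself such constraints are declared in plain-text configuration files and surfaced to the annotator immediately, as in Figure 3.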

Figure 3: Incomplete TRANSFER event indicated to the annotator.

2.4 NLP Technology Integration

BRAT supports two standard approaches for integrating the results of fully automatic annotation tools into an annotation workflow: bulk annotation imports can be performed by format conversion tools distributed with BRAT for many standard formats (such as in-line and column-formatted BIO), and tools that provide standard web service interfaces can be configured to be invoked from the user interface.
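As an illustration of the first route (bulk import via format conversion), the following Python sketch converts column-formatted BIO tags into brat-style text-bound annotations. It is a simplified, hypothetical stand-in for the conversion tools distributed with BRAT; tokenisation and whitespace handling are assumptions.

```python
# Illustrative sketch: converting column-formatted BIO (one token and tag per row)
# into brat-style text-bound annotations. A simplified stand-in for BRAT's bundled
# conversion tools, not the tools themselves.

def bio_to_standoff(rows):
    """rows: list of (token, tag) pairs; returns (text, list of .ann lines)."""
    text_parts, entities = [], []          # entities: (type, start, end)
    offset, current = 0, None              # current open entity: [type, start, end]
    for token, tag in rows:
        start, end = offset, offset + len(token)
        if tag.startswith("B-") or (tag.startswith("I-") and current is None):
            if current:
                entities.append(tuple(current))
            current = [tag[2:], start, end]
        elif tag.startswith("I-") and current and tag[2:] == current[0]:
            current[2] = end               # extend the open entity
        else:                              # "O" (or mismatched I-) closes any open entity
            if current:
                entities.append(tuple(current))
            current = None
        text_parts.append(token)
        offset = end + 1                   # +1 for the joining space
    if current:
        entities.append(tuple(current))
    text = " ".join(text_parts)
    ann = [f"T{i}\t{etype} {s} {e}\t{text[s:e]}"
           for i, (etype, s, e) in enumerate(entities, start=1)]
    return text, ann


if __name__ == "__main__":
    sample = [("Citibank", "B-ORG"), ("is", "O"), ("moving", "O"),
              ("to", "O"), ("London", "B-LOC")]
    text, ann_lines = bio_to_standoff(sample)
    print(text)
    print("\n".join(ann_lines))
```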

5 Related Work and Conclusions

We have introduced BRAT, an intuitive and user-friendly web-based annotation tool that aims to enhance annotator productivity by closely integrating NLP technology into the annotation process. BRAT has been and is being used for several ongoing annotation efforts at a number of academic institutions and has so far been used for the creation of well over 50,000 annotations. We presented an experiment demonstrating that integrated machine learning technology can reduce the time for type selection by over 30% and overall annotation time by 15% for a multi-type entity mention annotation task.

The design and implementation of BRAT was informed by experience from several annotation tasks and research efforts spanning more than a decade. A variety of previously introduced annotation tools and approaches also served to guide our design decisions, including the fast annotation mode of Knowtator (Ogren, 2006), the search capabilities of the XConc tool (Kim et al., 2008), and the design of web-based systems such as MyMiner (Salgado et al., 2010) and GATE Teamware (Cunningham et al., 2011). Using machine learning to accelerate annotation by supporting human judgements is well documented in the literature for tasks such as entity annotation (Tsuruoka et al., 2008) and translation (Martínez-Gómez et al., 2011), efforts which served as inspiration for our own approach.

BRAT, along with conversion tools and extensive documentation, is freely available under the open-source MIT license from its homepage at http://brat.nlplab.org

References

Pontus Stenetorp, Sampo Pyysalo, Goran Topić, Tomoko Ohta, Sophia Ananiadou, and Jun'ichi Tsujii (2012). "BRAT: A Web-based Tool for NLP-assisted Text Annotation." In: Proceedings of the Demonstrations at the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2012).