Linguistic Resource

A Linguistic Resource is an Artifact that describes some aspect of a Natural Language and can be used by a Natural Language Processing System.

    • The correctness of a parser’s output is generally evaluated by comparing the system output to correct, human-constructed structures. These gold standard parses are obtained from a linguistic resource. Section 6.1 analyzes existing linguistic resources and their suitability for parser evaluation. Linguistic annotation (hereafter referred to as annotation) refers to the notations applied to language data that describe its information content. The annotation in a treebank, for example, includes at least POS tags and syntactic tags. An annotation scheme is the specification of the set of practices used for annotation in a particular linguistic resource. An encoding scheme defines the way in which the annotated data is represented. I will both introduce the annotation and encoding schemes used in existing linguistic resources and analyze their suitability for parser
    • The most commonly used linguistic resources for parser evaluation are treebanks, which are collections of syntactically annotated sentences. These syntactically annotated corpora consist of sentences that have been assigned parse trees with at least syntactic and morphosyntactic annotation ...
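
The comparison of parser output against a gold standard parse described above can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the quoted sources: the encoding scheme is a Penn Treebank-style bracketed string, every token sits under its own POS tag node, and the comparison is scored as labeled bracket precision and recall. The function names (parse_bracketed, pos_tags, constituents, bracket_scores) and the two example sentences are hypothetical.

```python
def parse_bracketed(text):
    """Read one bracketed tree, e.g. "(S (NP (DT the) (NN dog)))".

    Assumed encoding scheme: a node is (label, children); a token is a str.
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read_node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())
            else:
                children.append(tokens[pos])  # a word token
                pos += 1
        pos += 1  # consume ")"
        return (label, children)

    tree = read_node()
    if pos != len(tokens):
        raise ValueError("trailing material after the tree")
    return tree


def is_preterminal(node):
    """True for a POS tag node that dominates exactly one token."""
    _, children = node
    return len(children) == 1 and isinstance(children[0], str)


def pos_tags(node):
    """Collect the (token, POS tag) pairs of a tree, left to right."""
    if is_preterminal(node):
        return [(node[1][0], node[0])]
    pairs = []
    for child in node[1]:
        if isinstance(child, tuple):
            pairs.extend(pos_tags(child))
    return pairs


def constituents(node, start=0):
    """Return (labeled spans of phrasal nodes, index after the last token)."""
    if is_preterminal(node):
        return set(), start + 1
    spans = set()
    end = start
    for child in node[1]:
        if isinstance(child, str):  # stray token without a POS node
            end += 1
            continue
        child_spans, end = constituents(child, end)
        spans |= child_spans
    spans.add((node[0], start, end))
    return spans, end


def bracket_scores(gold_text, system_text):
    """Labeled bracket precision and recall of a system parse against gold."""
    gold, _ = constituents(parse_bracketed(gold_text))
    system, _ = constituents(parse_bracketed(system_text))
    matched = len(gold & system)
    precision = matched / len(system) if system else 0.0
    recall = matched / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical gold standard parse and a system parse with one wrong bracketing.
GOLD = "(S (NP (DT the) (NN dog)) (VP (VBD saw) (NP (DT a) (NN cat))))"
SYSTEM = "(S (NP (DT the) (NN dog)) (VP (VBD saw) (NP (DT a)) (NP (NN cat))))"

if __name__ == "__main__":
    print(pos_tags(parse_bracketed(GOLD)))
    print(bracket_scores(GOLD, SYSTEM))  # (0.6, 0.75)
```

In this sketch the system parse splits the object noun phrase in two, so three of its five phrasal brackets match the gold standard (precision 0.6) and three of the four gold brackets are recovered (recall 0.75). A real evaluation would follow the annotation and encoding schemes of the specific treebank in use.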