An SDOI Project is a research project by Gabor Melli whose goal is to advance our ability to interlink documents and ontologies by supervised means.
- See: KDD-2009 Annotated Abstracts Dataset, KDD-2009 Abstracts Analysis, RKB Research Project, Term Mention Recognition, Term Mention Linking, Ontology, Research Paper, Data Mining Discipline, Concept Mention Identification and Linking Task, Supervised Learning Task.
- (Melli, 2012) ⇒ Gabor Melli. (2012). “Identifying Untyped Relation Mentions in a Corpus Given An Ontology.” In: Workshop Proceedings of TextGraphs-7 on Graph-based Methods for Natural Language Processing.
- QUOTE: In this paper we present the SDOIrmi text graph-based semi-supervised algorithm for the task for relation mention identification when the underlying concept mentions have already been identified and linked to an ontology. To overcome the lack of annotated data, we propose a labelling heuristic based on information extracted from the ontology. We evaluated the algorithm on the kdd09cma1 dataset using a leave-one-document-out framework and demonstrated an increase in F1 in performance over a co-occurrence based AllTrue baseline algorithm. An extrinsic evaluation of the predictions suggests a worthwhile precision on the more confidently predicted additions to the ontology.
- (Melli, 2012) ⇒ SDOI Project
- We explore the automated semantic annotation of concept mentions within a document to their corresponding page in a semantic wiki, if such a page exists, by supervised means. Unlike the related task of identifying concept mentions in a document that can be linked to a Wikipedia page, our task also requires the identification of concept mentions not yet found in the knowledge base. Our approach creates feature vectors for all candidate text spans based on both local and global information. We propose a novel set of expanded features including information available in the other documents in the training corpus. The challenge of identifying previously unseen mentions is handled with a trained CRF sequential model. The addition of iterative classification to the process is explored. Experiments against a corpus based on annotated KDD 2009 abstracts and a data analysis semantic wiki show a lift in F-measure against baseline algorithms. We analyze the feature space to understand which features carry the most predictive power on this task, and which ones are correlated and redundant.
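
The abstract above describes building feature vectors for all candidate text spans from local information and from information in the other training documents. The following is a minimal sketch of that idea only, not the SDOI implementation: the function names (`candidate_spans`, `phrase_counts`, `span_features`), the specific features, the three-token span limit, and the toy data are all assumptions made for illustration.

```python
from collections import Counter


def candidate_spans(tokens, max_len=3):
    """Enumerate every contiguous token span of at most max_len tokens.

    Spans are (start, end) pairs with an exclusive end index.
    """
    spans = []
    for start in range(len(tokens)):
        last_end = min(start + max_len, len(tokens))
        for end in range(start + 1, last_end + 1):
            spans.append((start, end))
    return spans


def phrase_counts(docs, max_len=3):
    """Count how often each lowercased span text occurs across documents."""
    counts = Counter()
    for tokens in docs:
        for start, end in candidate_spans(tokens, max_len):
            counts[" ".join(tokens[start:end]).lower()] += 1
    return counts


def span_features(tokens, span, other_docs_counts, known_concepts):
    """Build a small feature dictionary for one candidate span.

    Local features come from the span itself; the two remaining features
    are hypothetical stand-ins for knowledge-base and corpus-level signals.
    """
    start, end = span
    text = " ".join(tokens[start:end]).lower()
    return {
        "text": text,
        "length": end - start,                            # local
        "capitalized": tokens[start][0].isupper(),        # local
        "in_knowledge_base": text in known_concepts,      # assumed KB lookup
        "count_in_other_docs": other_docs_counts[text],   # other training docs
    }


# Toy data (hypothetical): two other training documents and one target document.
other_docs = [
    ["We", "train", "a", "decision", "tree"],
    ["The", "decision", "tree", "overfits"],
]
counts = phrase_counts(other_docs)
tokens = ["A", "decision", "tree", "model"]
known = {"decision tree"}
features = [
    span_features(tokens, span, counts, known)
    for span in candidate_spans(tokens)
]
```

In a pipeline of the kind the abstract outlines, dictionaries like these would be passed to a trained sequence model or classifier; that step is omitted here because the source does not specify its configuration.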