2016 SemanticallyAnnotatedConceptsin

We introduce a linguistic resource composed of a semantically annotated corpus and a lexicalized ontology that are interlinked on mentions of concepts and entities. The corpus contains the paper abstracts from the proceedings of ACM's SIGKDD conferences for the years 2009 through 2015. Each abstract was internally annotated to identify the concepts mentioned within the text. Then, where possible, each mention was linked to the appropriate concept node in the ontology, which focuses on data science topics. Together, the corpus and ontology form one of the few semantic resources within a subfield of computing science. The joint dataset enables tasks such as temporal modeling of concepts and the development of semantic annotation methods for documents with a large proportion of mid-level concept mentions. Finally, the resource prepares for the transition to semantic navigation of computing science research publications. Both resources are publicly available at http://gabormelli.com/Projects/kdd/.



