2001 CLaRK-anXMLbasedSysforCorpDev


Subject Headings: CLaRK System.


Cited By



In this paper we describe the architecture and the intended applications of the CLaRK system. Development of the CLaRK system started under the Tübingen-Sofia International Graduate Programme in Computational Linguistics and Represented Knowledge (CLaRK). The main aim behind the design of the system is to minimize human work during the creation of corpora. Corpus creation is still an important task for the majority of languages, such as Bulgarian, where the effort invested in such development is very modest compared with more intensively studied languages like English, German and French. We regard corpus creation as the editing, manipulation, searching and transformation of documents. Some of these tasks are performed on a single document or on a set of documents; others are performed on part of a document. Besides the efficiency of the processing at each stage of the work, the most important investment is human work. Thus, in our view, the design of the system has to be directed toward minimizing human work.
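The abstract frames corpus creation as editing, manipulating, searching and transforming XML documents, applied to a set of documents, a single document, or part of a document. The Python sketch below illustrates that division of work using only the standard library's `xml.etree.ElementTree`, which supports a limited XPath subset. It is not CLaRK's implementation or API: the markup (`<text>`, `<s>`, `<w pos="...">`, the `noun` flag) and the functions `search`, `mark_nouns` and `process_corpus` are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical mini-corpus. The element names (<text>, <s>, <w>) and the "pos"
# attribute are illustrative assumptions, not CLaRK's or XCES's actual markup.
DOCS = [
    """<text id="d1"><s><w pos="N">Sofia</w><w pos="V">is</w><w pos="A">big</w></s></text>""",
    """<text id="d2"><s><w pos="N">Tübingen</w><w pos="V">is</w><w pos="A">old</w></s>
       <s><w pos="N">Corpora</w><w pos="V">matter</w></s></text>""",
]


def search(root, path):
    """Searching: return all elements matching an ElementTree XPath-subset expression."""
    return root.findall(path)


def mark_nouns(sentence):
    """Editing part of a document: flag the noun <w> children of one <s> element."""
    for w in sentence.findall("w[@pos='N']"):
        w.set("noun", "yes")
    return sentence


def process_corpus(sources):
    """Apply the same edit and search to every document in a set."""
    results = {}
    for src in sources:
        root = ET.fromstring(src)
        # Work sentence by sentence, i.e. on parts of the document.
        for s in root.iter("s"):
            mark_nouns(s)
        # Then search the whole (edited) document.
        nouns = [w.text for w in search(root, ".//w[@noun='yes']")]
        results[root.get("id")] = nouns
    return results


if __name__ == "__main__":
    print(process_corpus(DOCS))
```

Running the script prints a mapping from each document id to the words it flagged as nouns. A full XPath 1.0 engine, as in the W3C recommendation listed in the references, would allow richer queries than ElementTree's subset.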


  • Abney St 1996 Partial Parsing via Finite-State Cascades. In: Proceedings of the ESSLLI'96 Robust Parsing Workshop. Prague, Czech Republic.
  • Corpus Encoding Standard 2001 XCES: Corpus Encoding Standard for XML. Vassar College, New York, USA. http://www.cs.vassar.edu/XCES/
  • DOM 1998 Document Object Model (DOM) Level 1. Specification Version 1.0. W3C Recommendation. http://www.w3.org/TR/1998/REC-DOM-Level-1-19981001
  • Simov K, Popova G, Osenova P 2001 HPSG-based syntactic treebank of Bulgarian (BulTreeBank). In: Proceedings of Corpus linguistics 2001, Lancaster, UK.
  • Text Encoding Initiative 1997 Guidelines for Electronic Text Encoding and Interchange. Sperberg-McQueen C M, Burnard L (eds).
  • XML 2000 Extensible Markup Language (XML) 1.0 (Second Edition). W3C Recommendation. http://www.w3.org/TR/REC-xml
  • XPath 1999 XML Path Language (XPath) version 1.0. W3C Recommendation. http://www.w3.org/TR/xpath
  • XSLT 1999 XSL Transformations (XSLT) version 1.0. W3C Recommendation. http://www.w3.org/TR/xslt

2001 CLaRK-anXMLbasedSysforCorpDev
  • Author: Kiril Simov, Zdravko Peev, Milen Kouylekov, Alexander Simov, Marin Dimitrov, Atanas Kiryakov
  • Title: CLaRK - an XML-based System for Corpora Development
  • Journal: Proceedings of the Corpus Linguistics 2001 Conference
  • Title URL: http://www.ontotext.com/sites/default/files/publications/clark01.pdf
  • Year: 2001