2010 SemanticEnrichingOfNatLangTexts

Subject Headings: Ontology-based Semantic Annotation.



This paper proposes an approach which utilizes natural language processing (NLP) and ontology knowledge to automatically denote the implicit semantics of textual requirements. Requirements documents include the syntax of natural language but not the semantics. Semantics are usually interpreted by the human user. In earlier work, Gelhausen and Tichy showed that SalE MX automatically creates UML domain models from (semantically) annotated textual specifications [1]. This manual annotation process is very time-consuming and can only be carried out by annotation experts. We automate semantic annotation so that SalE MX can be completely automated. With our approach, the analyst receives the domain model of a requirements specification quickly and easily. Using these concepts is the first step toward further automation of requirements engineering and software development.
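As a rough illustration of what automatic thematic role annotation can look like, the sketch below maps grammatical relations from a dependency parse to thematic roles with a fixed rule table. Everything in it is an assumption made for illustration: the `Dependency` type, the `ROLE_RULES` table, the role names, and the sample sentence are hypothetical and are not taken from the paper, from SalE MX, or from AutoAnnotator. The relation labels (`nsubj`, `dobj`, `iobj`, `det`) follow the Stanford typed dependencies naming.

```python
from typing import List, NamedTuple, Tuple


class Dependency(NamedTuple):
    """One edge of a dependency parse (hypothetical helper type)."""
    relation: str   # grammatical relation, e.g. "nsubj"
    head: str       # governing word, typically the verb
    dependent: str  # dependent word


# Hypothetical rule table: grammatical relation -> thematic role.
# A real annotator would need far richer rules plus ontology knowledge.
ROLE_RULES = {
    "nsubj": "AGENT",
    "dobj": "PATIENT",
    "iobj": "RECIPIENT",
}


def annotate_roles(dependencies: List[Dependency]) -> List[Tuple[str, str, str]]:
    """Return (word, role, action) triples for relations covered by the rules."""
    roles = []
    for dep in dependencies:
        role = ROLE_RULES.get(dep.relation)
        if role is not None:
            roles.append((dep.dependent, role, dep.head))
    return roles


# Hand-written parse of "The client sends a request." (not parser output).
deps = [
    Dependency("nsubj", "sends", "client"),
    Dependency("dobj", "sends", "request"),
    Dependency("det", "request", "a"),
]

for word, role, action in annotate_roles(deps):
    print(f"{word}: {role} of '{action}'")
```

Relations without a rule, such as the determiner, are simply skipped. A fixed table like this cannot distinguish, for example, an acting subject from the subject of a passive sentence, which is one place where additional knowledge sources would be needed.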

5 Summary

  • Using sentence grammar structures to determine the correct semantics of a sentence seems feasible with our approach. We use popular NLP tools for the preprocessing of natural language texts. Even though AutoAnnotator is still work in progress, we have run a small qualitative case study using the technical specification of the WHOIS Protocol (IETF RFC 3912). The results suggest that the proposed approach is indeed capable of deriving the semantic tags of SalE MX. Still, there are some difficulties, which have to be addressed in future development.
  • First of all, subphrases are not yet handled correctly, leading to confusing results. Errors of the pipelined NLP tools are not yet addressed adequately. Assigning a confidence value to each tool could improve results when information conflicts. On top of these future improvements, we plan to extend AutoAnnotator with an interactive dialog tool. This allows the analyst to steer the analysis process. We expect this interactive component to be used to resolve obvious mistakes the algorithms make as part of a feedback loop in the annotation process. Together with an instant UML diagram building process, the analyst could identify and correct the derived semantics on the fly.
  • Eventually, our process improves the annotation process with a speedup which we are currently evaluating. Only if the analyst is faster and receives models of the same quality as in the manual process can automatic model creation help support and improve the software development process.
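The summary suggests assigning a confidence value to each NLP tool so that conflicting information can be resolved. One possible reading of that idea is a confidence-weighted vote, sketched below. The tool names, the confidence values, and the `resolve_conflict` function are hypothetical placeholders chosen for illustration; the paper does not specify how such confidences would be obtained or combined.

```python
from collections import defaultdict
from typing import Dict

# Hypothetical per-tool confidence values (assumed, not from the paper).
TOOL_CONFIDENCE = {
    "pos_tagger": 0.9,
    "dependency_parser": 0.7,
    "ontology_lookup": 0.6,
}


def resolve_conflict(proposals: Dict[str, str],
                     confidence: Dict[str, float] = TOOL_CONFIDENCE) -> str:
    """Pick the label with the highest summed confidence of the tools proposing it.

    `proposals` maps a tool name to the label that tool suggests for one word.
    Tools missing from `confidence` contribute a weight of 0.0.
    Ties are broken alphabetically so the result is deterministic.
    """
    if not proposals:
        raise ValueError("at least one proposal is required")
    scores = defaultdict(float)
    for tool, label in proposals.items():
        scores[label] += confidence.get(tool, 0.0)
    # max() returns the first maximal element, so iterating over the sorted
    # labels yields the alphabetically smallest label among tied scores.
    return max(sorted(scores), key=scores.__getitem__)


# Two weaker tools agreeing outweigh one stronger tool in this scheme.
proposals = {
    "pos_tagger": "AGENT",
    "dependency_parser": "PATIENT",
    "ontology_lookup": "PATIENT",
}
print(resolve_conflict(proposals))
```

Summing weights lets agreement among several tools override a single more reliable one. Other combination rules, such as taking only the most confident tool, would be equally plausible readings of the sentence in the summary.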


  • 1. Gelhausen, T., Tichy, W.F.: Thematic role based generation of UML models from real world requirements. In: First IEEE International Conference on Semantic Computing (ICSC 2007). Volume 0., Irvine, CA, USA, IEEE Computer Society (September 2007) 282–289
  • 2. Miller, J., Mukerji, J.: MDA Guide Version 1.0.1 (June 2003)
  • 3. Mich, L., Franch, M., Inverardi, P.N.: Market research for requirements analysis using linguistic tools. Requirements Engineering 9(1) (February 2004) 40–56
  • 4. Körner, S.J., Derre, B., Gelhausen, T., Landhäußer, M.: RECAA – the Requirements Engineering Complete Automation Approach [Online].
  • 5. Cheng, B.H.C., Atlee, J.M.: Research directions in requirements engineering. In: Proceedings Future of Software Engineering FOSE '07. (May 2007) 285–303
  • 6. Dawson, L., Swatman, P.A.: The use of object-oriented models in requirements engineering: a field study. In: ICIS. (1999) 260–273
  • 7. Ryan, K.: The role of natural language in requirements engineering. In: Proceedings of IEEE International Symposium on Requirements Engineering, IEEE (Jan 1993) 240–242
  • 8. Moreno, A.M., van de Riet, R.: Justification of the equivalence between linguistic and conceptual patterns for the object model (1997)
  • 9. Juzgado, N.J., Moreno, A.M., Lopez, M.: How to use linguistic instruments for object-oriented analysis. IEEE Software 17(3) (2000)
  • 10. Harmain, H.M., Gaizauskas, R.J.: CM-Builder: An automated NL-based CASE tool. In: ASE. (2000) 45–54
  • 11. Gildea, D., Jurafsky, D.: Automatic labeling of semantic roles. Computational Linguistics 28(3) (September 2002) 245–288
  • 12. Montes, A., Pacheco, H., Estrada, H., Pastor, O.: Conceptual model generation from requirements model: A natural language processing approach. In Kapetanios, E., Sugumaran, V., Spiliopoulou, M., eds.: NLDB. Volume 5039 of Lecture Notes in Computer Science., Springer (2008) 325–326
  • 13. Hasegawa, R., Kitamura, M., Kaiya, H., Saeki, M.: Extracting conceptual graphs from Japanese documents for software requirements modeling. In Kirchberg, M., Link, S., eds.: APCCM. Volume 96 of CRPIT., Australian Computer Society (2009) 87–96
  • 14. Kof, L.: Natural language processing for requirements engineering: Applicability to large requirements documents. In Russo, A., Garcez, A., Menzies, T., eds.: Automated Software Engineering, Proceedings of the Workshops, Linz, Austria (September 2004) In conjunction with the 19th IEEE International Conference on Automated Software Engineering.
  • 15. Kof, L.: Natural language processing: Mature enough for requirements documents analysis? In Montoyo, A., Muñoz, R., Métais, E., eds.: NLDB. Volume 3513 of Lecture Notes in Computer Science., Springer (June 2005) 91–102
  • 16. Fillmore, C.J.: Toward a modern theory of case. In Reibel, D.A., Schane, S.A., eds.: Modern Studies in English. Prentice Hall (1969) 361–375
  • 17. Krifka, M.: Thematische Rollen (June 2005)
  • 18. Rauh, G.: Tiefenkasus, thematische Relationen und Thetarollen. Gunter Narr Verlag, Tübingen, Germany (1988)
  • 19. Manning, C., Jurafsky, D.: The Stanford Natural Language Processing Group [Online].
  • 20. Santorini, B.: Part-of-speech tagging guidelines for the Penn Treebank Project (3rd revision). Technical Report MS-CIS-90-47, University of Pennsylvania Department of Computer and Information Science (1990)
  • 21. de Marneffe, M.C., Manning, C.D.: The Stanford typed dependencies representation. In: COLING Workshop on Cross-framework and Cross-domain Parser Evaluation. (2008) 1–8
  • 22. Miller, G.A.: WordNet: A lexical database for English. Communications of the ACM 38(11) (1995) 39–41
  • 23. Cycorp Inc.: ResearchCyc. http://research.cyc.com/ [checked 2010-02-15].
  • 24. Körner, S.J., Gelhausen, T.: Improving automatic model creation using ontologies. In Institute, K.S., ed.: Proceedings of the Twentieth International Conference on Software Engineering & Knowledge Engineering. (July 2008) 691–696

2010 SemanticEnrichingOfNatLangTexts

  • Author: Sven J. Körner, Mathias Landhäußer
  • Title: Semantic Enriching of Natural Language Texts with Automatic Thematic Role Annotation
  • Journal: Proceedings of the 15th International Conference on Applications of Natural Language to Information Systems
  • URL: http://www.ipd.uka.de/Tichy/uploads/publikationen/237/nldb2010 cameraReady.pdf
  • DOI: 10.1007/978-3-642-13881-2_9
  • Year: 2010