Concept Space

A Concept Space is a Finite Weighted Undirected Graph whose Graph Nodes represent Concepts and whose Weighted Graph Edges represent Co-occurrences of those Concepts in some Domain.
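
As a minimal sketch of this definition, the following Python builds such a graph from a small collection of documents, assuming each document is already reduced to a list of terms and that the raw number of documents in which two terms appear together serves as the edge weight. The function name `build_concept_space` and the sample documents are hypothetical; the works cited on this page use their own indexing and weighting schemes.

```python
from collections import Counter
from itertools import combinations


def build_concept_space(documents):
    """Return a Counter mapping frozenset({a, b}) to a co-occurrence count.

    Assumption: the edge weight is the number of documents containing
    both terms. Using frozenset keys makes every edge undirected.
    """
    edges = Counter()
    for terms in documents:
        # Each unordered pair of distinct terms in one document counts once.
        for a, b in combinations(sorted(set(terms)), 2):
            edges[frozenset((a, b))] += 1
    return edges


# Hypothetical toy collection: one list of terms per document.
docs = [
    ["graph", "node", "edge"],
    ["graph", "edge", "weight"],
    ["concept", "graph"],
]
space = build_concept_space(docs)
print(space[frozenset(("graph", "edge"))])  # prints 2
```

Because the keys are unordered sets, looking up ("edge", "graph") gives the same weight as ("graph", "edge"), which matches the undirected reading of the definition.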



    • Concept Space: A graph of the terms that occur within objects, in which terms are linked to each other by the frequency with which they occur together.
  • (Chen et al., 2009) ⇒ Bo Chen, Wai Lam, Ivor Tsang, and Tak-Lam Wong. (2009). “Extracting Discriminative Concepts for Domain Adaptation in Text Mining.” In: Proceedings of ACM SIGKDD Conference (KDD-2009). doi:10.1145/1557019.1557045
    • One common predictive modeling challenge occurs in text mining problems is … However, when the distribution in the source domain and the target domain are not identical but related, there may exist a shared concept space to preserve the relation. Consequently a good feature representation can encode this concept space and minimize the distribution gap. To formalize this intuition, we propose a domain adaptation method that parameterizes this concept space by linear transformation … We propose a domain adaptation method to extract the low-rank concept space shared by the source domain and the target domain, which can ensure both the predictive power and adaptive power are maximized.


  • (Chen et al., 2003) ⇒ Hsinchun Chen, Daniel Zeng, Homa Atabakhsh, Wojciech Wyzga, and Jenny Schroeder. (2003). “COPLINK: managing law enforcement data and knowledge.” In: Communications of the ACM, 46(1). doi:10.1145/602421.602441
    • Much of crime analysis is concerned with creating associations or linkages among various aspects of a crime. COPLINK Detect uses a technique called Concept Space [3] to identify such associations from existing crime data automatically. In general, a concept space is a network of terms and weighted associations that represent the concepts and their associations within an underlying information space. COPLINK Detect uses statistical techniques such as co-occurrence analysis and clustering functions to weight relationships between all possible pairs of concepts. No hand-coded domain knowledge is necessary for COPLINK Detect to perform the Concept Space analysis.


  • (Chen et al., 1996) ⇒ Hsinchun Chen, Bruce Schatz, Tobun Ng, Joanne Martinez, Amy Kirchhoff, and Chienting Lin. (1996). “A Parallel Computing Approach to Creating Engineering Concept Spaces for Semantic Retrieval: The Illinois Digital Library Initiative Project.” In: IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(8). doi:10.1109/34.531798
    • ABSTRACT: This research presents preliminary results generated from the semantic retrieval research component of the Illinois Digital Library Initiative (DLI) project. Using a variation of the automatic thesaurus generation techniques, to which we refer as the concept space approach, we aimed to create graphs of domain-specific concepts (terms) and their weighted co-occurrence relationships for all major engineering domains. Merging these concept spaces and providing traversal paths across different concept spaces could potentially help alleviate the vocabulary (difference) problem evident in large-scale information retrieval. We have experimented previously with such a technique for a smaller molecular biology domain (Worm Community System, with 10+ MBs of document collection) with encouraging results. In order to address the scalability issue related to large-scale information retrieval and analysis for the current Illinois DLI project, we recently conducted experiments using the concept space approach on parallel supercomputers. Our test collection included 2+ GBs of computer science and electrical engineering abstracts extracted from the INSPEC database. The concept space approach called for extensive textual and statistical analysis (a form of knowledge discovery) based on automatic indexing and co-occurrence analysis algorithms, both previously tested in the biology domain. Initial testing results using a 512-node CM-5 and a 16-processor SGI Power Challenge were promising. Power Challenge was later selected to create a comprehensive computer engineering concept space of about 270,000 terms and 4,000,000+ links using 24.5 hours of CPU time. Our system evaluation involving 12 knowledgeable subjects revealed that the automatically-created computer engineering concept space generated significantly higher concept recall than the human-generated INSPEC computer engineering thesaurus. However, the INSPEC was more precise than the automatic concept space. 
      Our current work mainly involves creating concept spaces for other major engineering domains and developing robust graph matching and traversal algorithms for cross-domain, concept-based retrieval. Future work also will include generating individualized concept spaces for assisting user-specific concept-based information retrieval.
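
The excerpts above describe weighting term pairs by co-occurrence analysis and then using the resulting graph to suggest associated concepts. A rough sketch of that idea follows, assuming the Jaccard coefficient as the edge weight. This is an illustrative stand-in, not the weighting function used in the cited papers, and the names `jaccard_concept_space` and `related_terms` and the sample documents are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def jaccard_concept_space(documents):
    """Build an adjacency map {term: {neighbor: weight}}.

    Assumption: weight = |docs with both| / |docs with either|
    (the Jaccard coefficient), chosen here only for illustration.
    """
    doc_freq = Counter()   # number of documents containing each term
    pair_freq = Counter()  # number of documents containing each pair
    for terms in documents:
        unique = sorted(set(terms))
        doc_freq.update(unique)
        pair_freq.update(combinations(unique, 2))

    graph = defaultdict(dict)
    for (a, b), both in pair_freq.items():
        weight = both / (doc_freq[a] + doc_freq[b] - both)
        graph[a][b] = weight  # store both directions so the
        graph[b][a] = weight  # graph stays undirected
    return graph


def related_terms(graph, term, k=3):
    """Return up to k neighbors of `term`, strongest association first."""
    neighbors = graph.get(term, {})
    return sorted(neighbors.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Hypothetical toy records, one list of terms per record.
docs = [
    ["crime", "vehicle", "suspect"],
    ["crime", "suspect"],
    ["vehicle", "location"],
]
graph = jaccard_concept_space(docs)
related = related_terms(graph, "crime")
print(related)
```

Here "suspect" ranks above "vehicle" for the query term "crime", since the first pair appears together in every record that contains either term. Normalizing by document frequency keeps very common terms from dominating the suggestions, which raw counts alone would not do.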