

Textual Data Analytics Task




  • (Wikipedia, 2015) ⇒ Retrieved:2015-4-1.
    • Subtasks (components of a larger text-analytics effort) typically include:
      • Information retrieval or identification of a corpus is a preparatory step: collecting or identifying a set of textual materials, on the Web or held in a file system, database, or content management system, for analysis.
      • Although some text analytics systems apply exclusively advanced statistical methods, many others apply more extensive natural language processing, such as part of speech tagging, syntactic parsing, and other types of linguistic analysis.
      • Named entity recognition is the use of gazetteers or statistical techniques to identify named text features: people, organizations, place names, stock ticker symbols, certain abbreviations, and so on. Disambiguation, the use of contextual clues, may be required to decide whether, for instance, "Ford" refers to a former U.S. president, a vehicle manufacturer, a movie star, a river crossing, or some other entity.
      • Recognition of pattern-identified entities: features such as telephone numbers, e-mail addresses, and quantities (with units) can be discerned via regular expressions or other pattern matching.
      • Coreference: identification of noun phrases and other terms that refer to the same object.
      • Relationship, fact, and event extraction: identification of associations among entities and other information in text.
      • Sentiment analysis involves discerning subjective (as opposed to factual) material and extracting various forms of attitudinal information: sentiment, opinion, mood, and emotion. Text analytics techniques are helpful in analyzing sentiment at the entity, concept, or topic level and in distinguishing opinion holder and opinion object.
      • Quantitative text analysis is a set of techniques stemming from the social sciences, in which either a human judge or a computer extracts semantic or grammatical relationships between words in order to find out the meaning or stylistic patterns of, usually, a casual personal text, for purposes such as psychological profiling.
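The two extraction subtasks listed above, gazetteer-based named entity recognition and regular-expression recognition of pattern-identified entities, can be illustrated with a minimal Python sketch. The gazetteer entries, the regex patterns, and the names `Entity` and `extract_entities` are hypothetical choices for illustration only; the sketch does no contextual disambiguation (it cannot tell which "Ford" is meant), and its patterns cover only a few simple formats.

```python
import re
from dataclasses import dataclass

# Hypothetical toy gazetteer: surface string -> entity type.
# Real systems use far larger lists plus contextual disambiguation,
# which this sketch does not attempt.
GAZETTEER = {
    "Ford": "ORGANIZATION",
    "Wikipedia": "ORGANIZATION",
    "Pittsburgh": "PLACE",
}

# Illustrative patterns for pattern-identified entities; deliberately
# simple, so many real-world formats will be missed.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "QUANTITY": re.compile(r"\b\d+(?:\.\d+)?\s?(?:kg|km|mg|ml)\b"),
}


@dataclass(frozen=True)
class Entity:
    text: str
    label: str
    start: int
    end: int


def gazetteer_entities(text: str) -> list[Entity]:
    """Case-sensitive, whole-word lookup of every gazetteer entry."""
    found = []
    for name, label in GAZETTEER.items():
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            found.append(Entity(m.group(), label, m.start(), m.end()))
    return found


def pattern_entities(text: str) -> list[Entity]:
    """Apply each regular expression and record every match."""
    found = []
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            found.append(Entity(m.group(), label, m.start(), m.end()))
    return found


def extract_entities(text: str) -> list[Entity]:
    """Merge both extractors' results in document order."""
    return sorted(gazetteer_entities(text) + pattern_entities(text),
                  key=lambda e: e.start)


if __name__ == "__main__":
    sample = ("Ford shipped 2.5 kg of parts to Pittsburgh; "
              "call 412-555-0199 or email jane.doe@example.com.")
    for ent in extract_entities(sample):
        print(ent.label, repr(ent.text))
```

Keeping the two extractors separate mirrors the split in the list above: the gazetteer step depends on curated name lists, while the pattern step depends only on surface form.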






  • (Bilisoly, 2008) ⇒ Roger Bilisoly. (2008). “Practical Text Mining with Perl.” Wiley Series on Methods and Applications in Data Mining.




  • (Chen et al., 2005) ⇒ Hsinchun Chen, Sherrilynne S. Fuller, and William Hersh. (2005). “Medical Informatics: Knowledge Management and Data Mining in Biomedicine.” Springer. ISBN:038724381X.
    • QUOTE:Text mining aims to extract useful knowledge from textual data or documents (Hearst, 1999; Chen, 2001). Although text mining is often considered a subfield of data mining, some text mining techniques have originated from other disciplines, such as information retrieval, information visualization, computational linguistics, and information science. Examples of text mining applications include document classification, document clustering, entity extraction, information extraction, and summarization.

      Most knowledge management, data mining, and text mining techniques involve learning patterns from existing data or information, and are therefore built upon the foundations of machine learning and artificial intelligence. In the following, we review several major paradigms in machine learning, important evaluation methodologies, and their applicability in biomedicine.
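Document classification, one of the text mining applications named in the quote, is a direct case of learning patterns from existing data. As a hedged illustration (not a method taken from Chen et al.), the Python sketch below trains a multinomial naive Bayes classifier with add-one (Laplace) smoothing on labeled documents. The whitespace tokenizer, the class name `NaiveBayesTextClassifier`, and the toy biomedical training documents and labels are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(doc: str) -> list[str]:
    # Hypothetical minimal tokenizer: lowercase, split on whitespace.
    return doc.lower().split()


class NaiveBayesTextClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs: list[str], labels: list[str]) -> "NaiveBayesTextClassifier":
        self.class_counts = Counter(labels)          # documents per class
        self.word_counts = defaultdict(Counter)      # word counts per class
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {c: sum(cnt.values())
                            for c, cnt in self.word_counts.items()}
        return self

    def predict(self, doc: str) -> str:
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / n_docs)  # log prior P(class)
            for w in tokenize(doc):
                if w not in self.vocab:
                    continue  # skip words never seen in training
                count = self.word_counts[label][w]
                # smoothed log likelihood log P(word | class)
                score += math.log((count + 1) / (self.total_words[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training data.
docs = [
    "patient fever cough diagnosis",
    "tumor biopsy oncology diagnosis",
    "fever cough influenza patient",
    "oncology tumor chemotherapy",
]
labels = ["infectious", "cancer", "infectious", "cancer"]

if __name__ == "__main__":
    clf = NaiveBayesTextClassifier().fit(docs, labels)
    print(clf.predict("cough and fever"))
    print(clf.predict("tumor biopsy"))
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied, and smoothing keeps a word that is unseen in one class from driving that class's probability to zero.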