Few-Shot Information Extraction (IE) Task


A Few-Shot Information Extraction (IE) Task is an in-context information extraction task that is a few-shot NLP task, i.e. an information extraction task (such as named entity recognition or relation extraction) that must be solved from only a small number of labeled examples, typically by prompting a large language model with those examples.
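
As a minimal illustration of the in-context setup, the sketch below builds a few-shot prompt for a simple entity-extraction task; the demonstrations, label names, and helper function are illustrative assumptions rather than a system from the references below.

    # Illustrative few-shot entity-extraction prompt; the demonstrations and
    # label names are assumptions for demonstration only.
    few_shot_examples = [
        ("Barack Obama visited Paris in 2015.",
         "PERSON: Barack Obama | LOCATION: Paris | DATE: 2015"),
        ("Apple opened a new office in Berlin.",
         "ORGANIZATION: Apple | LOCATION: Berlin"),
    ]

    def build_prompt(text: str) -> str:
        """Concatenate labeled demonstrations and the unlabeled input into one prompt."""
        lines = ["Extract the entities from each sentence."]
        for sentence, entities in few_shot_examples:
            lines.append(f"Sentence: {sentence}\nEntities: {entities}")
        lines.append(f"Sentence: {text}\nEntities:")
        return "\n\n".join(lines)

    print(build_prompt("Angela Merkel met Emmanuel Macron in Brussels."))

The completed prompt is then sent to a large language model, and its completion is parsed back into structured entity mentions.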



References

2023

  • (Ma et al., 2023) ⇒ Yubo Ma, Yixin Cao, YongChing Hong, and Aixin Sun. (2023). “Large Language Model is Not a Good Few-shot Information Extractor, But a Good Reranker for Hard Samples!” In: arXiv preprint arXiv:2303.08559.
    • ABSTRACT: Large Language Models (LLMs) have made remarkable strides in various tasks. However, whether they are competitive few-shot solvers for information extraction (IE) tasks and surpass fine-tuned small Pre-trained Language Models (SLMs) remains an open problem. This paper aims to provide a thorough answer to this problem, and moreover, to explore an approach towards effective and economical IE systems that combine the strengths of LLMs and SLMs. Through extensive experiments on eight datasets across three IE tasks, we show that LLMs are not effective few-shot information extractors in general, given their unsatisfactory performance in most settings and the high latency and budget requirements. However, we demonstrate that LLMs can well complement SLMs and effectively solve hard samples that SLMs struggle with. Building on these findings, we propose an adaptive filter-then-rerank paradigm, in which SLMs act as filters and LLMs act as rerankers. By utilizing LLMs to rerank a small portion of difficult samples identified by SLMs, our preliminary system consistently achieves promising improvements (2.1% F1-gain on average) on various IE tasks, with acceptable cost of time and money.
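    • The filter-then-rerank idea can be pictured with the minimal sketch below (the SLM scoring function, confidence threshold, and LLM reranking call are hypothetical stand-ins, not the authors' implementation): a fine-tuned SLM labels every sample, and only its low-confidence predictions are handed to an LLM, which reranks the SLM's top-k candidate labels, so only the small fraction of hard samples incurs LLM latency and cost.

      # Illustrative adaptive filter-then-rerank pipeline (a sketch, not the paper's code).
      from typing import Callable, List, Tuple

      def filter_then_rerank(
          samples: List[str],
          slm_top_k: Callable[[str], List[Tuple[str, float]]],  # SLM: text -> ranked (label, probability) pairs
          llm_rerank: Callable[[str, List[str]], str],          # LLM: text + candidate labels -> chosen label
          confidence_threshold: float = 0.9,                    # assumed cutoff separating easy from hard samples
          k: int = 3,
      ) -> List[str]:
          predictions = []
          for text in samples:
              candidates = slm_top_k(text)[:k]
              top_label, top_prob = candidates[0]
              if top_prob >= confidence_threshold:
                  # Easy sample: keep the SLM prediction (the "filter" step).
                  predictions.append(top_label)
              else:
                  # Hard sample: ask the LLM to rerank the SLM's top-k candidates (the "rerank" step).
                  predictions.append(llm_rerank(text, [label for label, _ in candidates]))
          return predictions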

2022

  • (Agrawal et al., 2022) ⇒ Monica Agrawal, Stefan Hegselmann, Hunter Lang, Yoon Kim, and David Sontag. (2022). “Large Language Models are Few-Shot Clinical Information Extractors.” In: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022).
    • ABSTRACT: A long-running goal of the clinical NLP community is the extraction of important variables trapped in clinical notes. However, roadblocks have included dataset shift from the general domain and a lack of public clinical corpora and annotations. In this work, we show that large language models, such as InstructGPT (Ouyang et al., 2022), perform well at zero- and few-shot information extraction from clinical text despite not being trained specifically for the clinical domain. Whereas text classification and generation performance have already been studied extensively in such models, here we additionally demonstrate how to leverage them to tackle a diverse set of NLP tasks which require more structured outputs, including span identification, token-level sequence classification, and relation extraction. Further, due to the dearth of available data to evaluate these systems, we introduce new datasets for benchmarking few-shot clinical information extraction based on a manual re-annotation of the CASI dataset (Moon et al., 2014) for new tasks. On the clinical extraction tasks we studied, the GPT-3 systems significantly outperform existing zero- and few-shot baselines.
    • In prompt-based learning (also known as in-context learning), a pretrained language model is adapted to different tasks via priming on natural language prompts — pieces of text that are combined with an input and then fed to the language model to produce an output for that task. This paradigm has been successful for few-shot and zero-shot learning at many general-domain tasks (Brown et al., 2020; Liu et al., 2021; Wei et al., 2021; Sanh et al., 2021).
    • More recently, large language models such as T0 and InstructGPT have re-configured their training objectives to explicitly encourage the model to perform well at such prompts (Sanh et al., 2021; Ouyang et al., 2022). While prompt-based learning can be extended straightforwardly to classification tasks (e.g., multiple choice), more complex tasks require creativity in their implementation (Mishra et al., 2021). For example, coreference resolution is often re-framed as classification, asking which of two antecedents a pronoun refers to (Sanh et al., 2021) or whether a candidate antecedent is correct (Yang et al., 2022). This approach requires a list of antecedent candidates, which requires an additional component (e.g., a noun phrase generator) or many—potentially expensive—queries. Span classification and named entity recognition have been similarly reframed. For example, given a candidate entity X and full model access, the entity type can be predicted via an argmax over the possible types Y of the probability of statements like “X is a Y entity” (Cui et al., 2021). Alternatively, if only a single entity is being queried for a given input, prompting can be as simple as “What is the location?” (Liu et al., 2022a); however, clinical NLP often concerns itself with extraction of multiple concepts. To extract multiple spans simultaneously, Li et al. (2019b) and Li et al. (2019a) use techniques from machine reading comprehension, relying on access to the underlying model and labeled data for training the extraction layer. While InstructGPT (Ouyang et al., 2022) has ∼ 2% or ≤ 1k extraction examples in its training, the LLM output is never converted to a structured form, and extraction examples are only evaluated qualitatively for improvement over other models. That is, only results for classification and generation tasks are quantified.
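    • The entity-typing reformulation quoted above can be sketched as follows; the statement-scoring function is a hypothetical stand-in for a model that returns the probability of a verbalized statement, not the implementation of Cui et al. (2021): each candidate type Y is scored via the statement “X is a Y entity”, and the highest-scoring type is returned.

      # Illustrative prompt-based entity typing via argmax over verbalized statements (a sketch).
      from typing import Callable, List

      def classify_entity_type(
          candidate: str,
          entity_types: List[str],
          statement_prob: Callable[[str], float],  # hypothetical scorer: probability the model assigns to a statement
      ) -> str:
          """Return the entity type whose verbalized statement the model scores highest."""
          scores = {
              entity_type: statement_prob(f"{candidate} is a {entity_type} entity")
              for entity_type in entity_types
          }
          return max(scores, key=scores.get)

    • As the excerpt notes, this scoring style assumes access to the model's output probabilities; API-only models such as InstructGPT are instead typically prompted to generate the extraction directly.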