ContractNLI Dataset


A ContractNLI Dataset is a document-level natural language inference dataset of 607 non-disclosure agreements (NDAs), each annotated against the same 17 fixed hypotheses, that is used to evaluate contract review systems on two tasks: three-class inference classification (entailment, contradiction, or not mentioned) and evidence span identification.



References

2023

2021a

2021b

  • (GitHub, 2021) ⇒ https://stanfordnlp.github.io/contract-nli/
    • QUOTE: ContractNLI is a dataset for document-level natural language inference (NLI) on contracts whose goal is to automate/support a time-consuming procedure of contract review. In this task, a system is given a set of hypotheses (such as “Some obligations of Agreement may survive termination.”) and a contract, and it is asked to classify whether each hypothesis is entailed by, contradicting to or not mentioned by (neutral to) the contract as well as identifying evidence for the decision as spans in the contract.

      [Figure: An overview of document-level NLI for contracts]

      ContractNLI is the first dataset to utilize NLI for contracts and is also the largest corpus of annotated contracts (as of September 2021). ContractNLI is an interesting challenge to work on from a machine learning perspective (the label distribution is imbalanced and it is naturally multi-task, all the while training data being scarce) and from a linguistic perspective (linguistic characteristics of contracts, particularly negations by exceptions, make the problem difficult).

      Details of ContractNLI can be found in our paper that was published in “Findings of EMNLP 2021”. If you have a question regarding our dataset, you can contact us by emailing koreeda@stanford.edu or by creating an issue in this repository.

    • Dataset specification

      More formally, the task consists of:

    • We have 17 hypotheses annotated on 607 non-disclosure agreements (NDAs). The hypotheses are fixed throughout all the contracts including the test dataset.
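
The task described in the quote above (one three-class label per hypothesis, plus evidence given as spans of the contract) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the dataset's actual file schema or its official evaluation code: the class names, field names, hypothesis keys ("h1", "h2"), and the two scoring helpers are all hypothetical, and the gold and predicted annotations are toy values made up for the example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Tuple


class NLILabel(Enum):
    """The three classes named in the task description."""
    ENTAILMENT = "entailment"
    CONTRADICTION = "contradiction"
    NOT_MENTIONED = "not_mentioned"


@dataclass
class Annotation:
    """One decision for one hypothesis on one contract (hypothetical layout)."""
    label: NLILabel
    # Indices of the contract spans cited as evidence for the decision.
    evidence_spans: List[int] = field(default_factory=list)


def label_accuracy(gold: Dict[str, Annotation], pred: Dict[str, Annotation]) -> float:
    """Fraction of hypotheses whose predicted label matches the gold label."""
    if not gold:
        raise ValueError("gold annotations must not be empty")
    correct = sum(
        1 for key, g in gold.items()
        if key in pred and pred[key].label is g.label
    )
    return correct / len(gold)


def evidence_precision_recall(gold: Annotation, pred: Annotation) -> Tuple[float, float]:
    """Precision and recall of predicted evidence span indices for one hypothesis."""
    gold_set, pred_set = set(gold.evidence_spans), set(pred.evidence_spans)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    return precision, recall


# Toy data (invented for illustration): two hypotheses on one contract.
gold = {
    "h1": Annotation(NLILabel.ENTAILMENT, [1]),
    "h2": Annotation(NLILabel.NOT_MENTIONED, []),
}
pred = {
    "h1": Annotation(NLILabel.ENTAILMENT, [0, 1]),
    "h2": Annotation(NLILabel.CONTRADICTION, [2]),
}

accuracy = label_accuracy(gold, pred)
h1_precision, h1_recall = evidence_precision_recall(gold["h1"], pred["h1"])
```

Because the 17 hypotheses are fixed across all contracts, keying annotations by a hypothesis identifier (rather than by the hypothesis text) is a natural choice here; the metrics actually reported for ContractNLI are defined in the authors' paper and may differ from these simple helpers.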