LEDGAR Dataset

A LEDGAR Dataset is a labeled dataset for contract provision classification, derived from the EDGAR Database.

  • Context:
    • It can be created by extracting contract provisions from documents filed with the U.S. Securities and Exchange Commission (SEC) and available on the EDGAR database.
    • It can categorize contract provisions into various legal themes or topics.
    • It can be used primarily for Natural Language Processing tasks, especially in the domain of legal technology and contract analysis.
    • It can aid in the automated understanding and classification of legal documents.
    • It can serve as a valuable resource for training and evaluating machine learning models in legal tech applications.
    • It can simplify the process of contract analysis, which traditionally requires substantial manual effort and legal expertise.
    • It can be integrated into legal tech software for purposes like contract review, risk assessment, and compliance checks.
    • ...
  • Example(s):
  • Counter-Example(s):
    • A general-purpose language dataset not specific to legal documents.
    • Raw financial datasets from EDGAR without specific labeling for NLP tasks.
    • Datasets focused on other forms of legal documents, such as court opinions or legislation, rather than contracts.
  • See: Contract Analysis, Natural Language Processing, Legal Technology, Machine Learning in Law, EDGAR Database, LexGLUE.


References

2024

  • Claude-3 with context of https://huggingface.co/datasets/coastalcph/lex_glue
    • NOTES:
      • LexGLUE includes two datasets related to contracts: LEDGAR and UNFAIR-ToS.
      • LEDGAR contains 80,000 contract provisions, each labeled with its type (e.g., "Terminations", "Indemnifications", "Governing Laws").
      • The LEDGAR dataset poses a single-label multi-class classification task to identify the type of each contract provision, with 100 different classes.
      • The full list of the 100 class labels is:

        "Adjustments", "Agreements", "Amendments", "Anti-Corruption Laws", "Applicable Laws", "Approvals", "Arbitration", "Assignments", "Assigns", "Authority", "Authorizations", "Base Salary", "Benefits", "Binding Effects", "Books", "Brokers", "Capitalization", "Change In Control", "Closings", "Compliance With Laws", "Confidentiality", "Consent To Jurisdiction", "Consents", "Construction", "Cooperation", "Costs", "Counterparts", "Death", "Defined Terms", "Definitions", "Disability", "Disclosures", "Duties", "Effective Dates", "Effectiveness", "Employment", "Enforceability", "Enforcements", "Entire Agreements", "Erisa", "Existence", "Expenses", "Fees", "Financial Statements", "Forfeitures", "Further Assurances", "General", "Governing Laws", "Headings", "Indemnifications", "Indemnity", "Insurances", "Integration", "Intellectual Property", "Interests", "Interpretations", "Jurisdictions", "Liens", "Litigations", "Miscellaneous", "Modifications", "No Conflicts", "No Defaults", "No Waivers", "Non-Disparagement", "Notices", "Organizations", "Participations", "Payments", "Positions", "Powers", "Publicity", "Qualifications", "Records", "Releases", "Remedies", "Representations", "Sales", "Sanctions", "Severability", "Solvency", "Specific Performance", "Submission To Jurisdiction", "Subsidiaries", "Successors", "Survival", "Tax Withholdings", "Taxes", "Terminations", "Terms", "Titles", "Transactions With Affiliates", "Use Of Proceeds", "Vacations", "Venues", "Vesting", "Waiver Of Jury Trials", "Waivers", "Warranties", "Withholdings"
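
The single-label multi-class setup described above can be sketched as a mapping from label names to integer class ids. This is a minimal illustration, not the official preprocessing: `LABELS_EXCERPT`, `encode`, and the sample provision are hypothetical names and data introduced here, and the ids assigned below will not match the ids used in the published dataset. The `load_ledgar` helper assumes the third-party `datasets` package is installed and that the `ledgar` configuration of `coastalcph/lex_glue` is reachable over the network.

```python
# A small excerpt of the label names listed above (the full task has 100 classes).
LABELS_EXCERPT = [
    "Amendments",
    "Assignments",
    "Governing Laws",
    "Indemnifications",
    "Terminations",
]

# Single-label multi-class: every provision maps to exactly one integer class id.
# These ids are local to this sketch and differ from the dataset's own ids.
label2id = {name: idx for idx, name in enumerate(LABELS_EXCERPT)}
id2label = {idx: name for name, idx in label2id.items()}


def encode(example):
    """Turn a (text, label_name) pair into a (text, class_id) pair."""
    text, label_name = example
    return text, label2id[label_name]


def load_ledgar():
    """Load the LEDGAR configuration of LexGLUE from the Hugging Face Hub.

    Assumption: the `datasets` package is installed and network access is
    available. The import is lazy so the rest of the sketch runs without it.
    """
    from datasets import load_dataset

    return load_dataset("coastalcph/lex_glue", "ledgar")


# Hypothetical provision text, written for illustration only.
sample = (
    "This Agreement shall be governed by the laws of the State of New York.",
    "Governing Laws",
)
encoded = encode(sample)
print(encoded[1], id2label[encoded[1]])
```

When the real dataset is loaded, the label names and their ids should be read from the dataset's own metadata rather than rebuilt by hand as done here.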

2020

  • (Tuggener et al., 2020) ⇒ Don Tuggener, Pius von Däniken, Thomas Peetz, and Mark Cieliebak. (2020). "LEDGAR: A Large-scale Multi-label Corpus for Text Classification of Legal Provisions in Contracts." In: Proceedings of the Twelfth Language Resources and Evaluation Conference. [1]
    • NOTES:
      • Here are three examples of the labeled items within the LEDGAR dataset as described in the research paper:
        • "Assignment": This label is applied to provisions that discuss the conditions under which rights and obligations under a contract can be transferred from one party to another. This label would typically cover clauses that include terms about how and when an assignment is permissible and any requirements for consent from other parties involved.
        • "Amendment": Used for provisions that outline how the terms of the contract can be amended or altered. This label captures clauses specifying the procedures that the parties must follow to make changes to the contract, including any necessary approvals or notifications.
        • "Termination": Applies to provisions detailing the circumstances under which a contract can be terminated, the process for termination, and the consequences of termination. This label includes details on both parties' rights and obligations upon termination, such as notice periods, severance, and the handling of confidential information.
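
To make the three label descriptions above concrete, here is a toy keyword-matching baseline. It is a sketch under stated assumptions and is not the method used by Tuggener et al.: the keyword lists, the `classify` function, and the example provisions are all hypothetical and were chosen only to mirror the descriptions. A real system would train a statistical or neural classifier on the labeled provisions.

```python
# Hypothetical keyword stems per label, loosely based on the descriptions above.
KEYWORDS = {
    "Assignment": ["assign", "transfer", "delegate"],
    "Amendment": ["amend", "modif", "alter"],
    "Termination": ["terminat", "expir", "notice period"],
}


def classify(provision: str) -> str:
    """Return the label whose keyword stems occur most often, or "Unknown"."""
    text = provision.lower()
    scores = {
        label: sum(text.count(stem) for stem in stems)
        for label, stems in KEYWORDS.items()
    }
    best_label = max(scores, key=scores.get)
    return best_label if scores[best_label] > 0 else "Unknown"


# Hypothetical provisions, written for illustration only.
examples = [
    "Neither party may assign or transfer its rights under this Agreement "
    "without prior written consent.",
    "This Agreement may be amended only by a written instrument signed by "
    "both parties.",
    "Either party may terminate this Agreement upon thirty days' written notice.",
    "The headings are for convenience only.",
]
predictions = [classify(p) for p in examples]
print(predictions)
```

Such a baseline breaks down quickly once provisions share vocabulary across labels, which is one reason a labeled corpus for supervised training is useful.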