LEDGAR Dataset

From GM-RKB

Revision as of 17:29, 25 April 2024

A LEDGAR Dataset is a labeled dataset for contract provision classification, derived from the EDGAR Database.

  • Context:
    • It can be created by extracting contract provisions from documents filed with the U.S. Securities and Exchange Commission (SEC) and available on the EDGAR database.
    • It can categorize contract provisions into various legal themes or topics.
    • It can be used primarily for Natural Language Processing tasks, especially in the domain of legal technology and contract analysis.
    • It can aid in the automated understanding and classification of legal documents.
    • It can serve as a valuable resource for training and evaluating machine learning models in legal tech applications.
    • It can simplify the process of contract analysis, which traditionally requires substantial manual effort and legal expertise.
    • It can be integrated into legal tech software for purposes like contract review, risk assessment, and compliance checks.
    • ...
  • Example(s):
  • Counter-Example(s):
    • A general-purpose language dataset not specific to legal documents.
    • Raw financial datasets from EDGAR without specific labeling for NLP tasks.
    • Datasets focused on other forms of legal documents, such as court opinions or legislation, rather than contracts.
  • See: Contract Analysis, Natural Language Processing, Legal Technology, Machine Learning in Law, EDGAR Database, LexGLUE.
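The Context item on evaluating machine learning models can be made concrete with a small scoring sketch. It assumes predictions are scored with micro- and macro-averaged F1, a common choice for single-label multi-class tasks; the page itself does not name a metric, and the function name `f1_scores` and the sample labels are hypothetical.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multi-class predictions."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for true_label, pred_label in zip(y_true, y_pred):
        if true_label == pred_label:
            tp[true_label] += 1
        else:
            fp[pred_label] += 1   # predicted class gains a false positive
            fn[true_label] += 1   # true class gains a false negative

    # Macro F1: unweighted mean of per-class F1 over every class seen.
    labels = sorted(set(y_true) | set(y_pred))
    per_class = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        per_class.append(2 * tp[label] / denom if denom else 0.0)
    macro = sum(per_class) / len(per_class)

    # Micro F1: pool the counts across classes before computing F1.
    total_tp = sum(tp.values())
    micro_denom = 2 * total_tp + sum(fp.values()) + sum(fn.values())
    micro = 2 * total_tp / micro_denom if micro_denom else 0.0
    return micro, macro


# Made-up gold labels and predictions for four provisions.
gold = ["Terminations", "Terminations", "Governing Laws", "Notices"]
pred = ["Terminations", "Notices", "Governing Laws", "Notices"]
micro_f1, macro_f1 = f1_scores(gold, pred)
```

With exactly one label per provision, micro F1 equals plain accuracy, while macro F1 gives rare provision types the same weight as frequent ones.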


References

2024

  • Claude-3 with context of https://huggingface.co/datasets/coastalcph/lex_glue
    • NOTES:
      • LexGLUE includes two datasets related to contracts: LEDGAR and UNFAIR-ToS.
      • LEDGAR contains 80,000 contract provisions labeled with their types (e.g., "Terminations", "Indemnifications", "Governing Laws").
      • The LEDGAR dataset poses a single-label, multi-class classification task: assign each contract provision to exactly one of 100 provision types.
      • The full list of classes:

        "Adjustments", "Agreements", "Amendments", "Anti-Corruption Laws", "Applicable Laws", "Approvals", "Arbitration", "Assignments", "Assigns", "Authority", "Authorizations", "Base Salary", "Benefits", "Binding Effects", "Books", "Brokers", "Capitalization", "Change In Control", "Closings", "Compliance With Laws", "Confidentiality", "Consent To Jurisdiction", "Consents", "Construction", "Cooperation", "Costs", "Counterparts", "Death", "Defined Terms", "Definitions", "Disability", "Disclosures", "Duties", "Effective Dates", "Effectiveness", "Employment", "Enforceability", "Enforcements", "Entire Agreements", "Erisa", "Existence", "Expenses", "Fees", "Financial Statements", "Forfeitures", "Further Assurances", "General", "Governing Laws", "Headings", "Indemnifications", "Indemnity", "Insurances", "Integration", "Intellectual Property", "Interests", "Interpretations", "Jurisdictions", "Liens", "Litigations", "Miscellaneous", "Modifications", "No Conflicts", "No Defaults", "No Waivers", "Non-Disparagement", "Notices", "Organizations", "Participations", "Payments", "Positions", "Powers", "Publicity", "Qualifications", "Records", "Releases", "Remedies", "Representations", "Sales", "Sanctions", "Severability", "Solvency", "Specific Performance", "Submission To Jurisdiction", "Subsidiaries", "Successors", "Survival", "Tax Withholdings", "Taxes", "Terminations", "Terms", "Titles", "Transactions With Affiliates", "Use Of Proceeds", "Vacations", "Venues", "Vesting", "Waiver Of Jury Trials", "Waivers", "Warranties", "Withholdings"
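As a rough illustration of what "single-label multi-class" means in practice, the sketch below maps class names to integer ids and pairs each provision with exactly one id. It uses only three of the 100 class names; the example provisions are made up rather than drawn from the dataset, and the id assignment (alphabetical order) is an assumption that need not match the ids used by any published copy of LEDGAR.

```python
# Hypothetical three-class subset of the 100 LEDGAR class names.
CLASS_NAMES = ["Terminations", "Indemnifications", "Governing Laws"]

# Sort the names so the name -> id mapping is deterministic.
LABEL2ID = {name: idx for idx, name in enumerate(sorted(CLASS_NAMES))}
ID2LABEL = {idx: name for name, idx in LABEL2ID.items()}


def encode_example(text, label):
    """Return (text, label_id); each provision carries exactly one label."""
    if label not in LABEL2ID:
        raise KeyError(f"unknown provision type: {label!r}")
    return text.strip(), LABEL2ID[label]


# Made-up provisions for illustration only.
examples = [
    ("This Agreement shall be governed by the laws of the State of New York.",
     "Governing Laws"),
    ("Either party may terminate this Agreement upon thirty days' written notice.",
     "Terminations"),
]
encoded = [encode_example(text, label) for text, label in examples]
```

A classifier trained on such pairs predicts one id per provision, and `ID2LABEL` turns that id back into a readable provision type.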

2023

2020