LEDGAR Dataset
Revision as of 17:19, 25 April 2024
A LEDGAR Dataset is a labeled dataset for contract provision classification, derived from the EDGAR Database.
- Context:
- It can be created by extracting contract provisions from documents filed with the U.S. Securities and Exchange Commission (SEC) and available on the EDGAR database.
- It can categorize contract provisions into various legal themes or topics.
- It can be used primarily for Natural Language Processing tasks, especially in the domain of legal technology and contract analysis.
- It can aid in the automated understanding and classification of legal documents.
- It can serve as a valuable resource for training and evaluating machine learning models in legal tech applications.
- It can simplify the process of contract analysis, which traditionally requires substantial manual effort and legal expertise.
- It can be integrated into legal tech software for purposes like contract review, risk assessment, and compliance checks.
- ...
- Example(s):
- LEDGAR, v202x.
- ...
- Counter-Example(s):
- A general-purpose language dataset not specific to legal documents.
- Raw financial datasets from EDGAR without specific labeling for NLP tasks.
- Datasets focused on other forms of legal documents, such as court opinions or legislation, rather than contracts.
- See: Contract Analysis, Natural Language Processing, Legal Technology, Machine Learning in Law, EDGAR Database, LexGLUE.
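The contract provision classification task described in the Context items can be illustrated with a minimal sketch. It assumes a single-label simplification, a tiny set of invented provisions (not taken from LEDGAR), illustrative label names, and a hand-written multinomial Naive Bayes classifier with add-one smoothing. The helper names `tokenize`, `train`, and `predict` are hypothetical and not part of any LEDGAR tooling.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy provisions written for this sketch; they are NOT LEDGAR data.
# The label names are illustrative provision types.
TRAIN = [
    ("This Agreement shall be governed by the laws of the State of New York.", "Governing Laws"),
    ("This Agreement is governed by and construed under the laws of Delaware.", "Governing Laws"),
    ("All notices shall be in writing and delivered to the address below.", "Notices"),
    ("Any notice required hereunder must be sent in writing by certified mail.", "Notices"),
    ("This Agreement may be amended only by a written instrument signed by both parties.", "Amendments"),
    ("No amendment of this Agreement is effective unless signed by each party.", "Amendments"),
]


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def train(examples):
    """Collect per-label document counts, per-label word counts, and the vocabulary."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return label_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest smoothed log-probability."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)  # class prior
        denom = sum(word_counts[label].values()) + len(vocab)  # add-one smoothing
        for tok in tokenize(text):
            if tok in vocab:  # tokens unseen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAIN)
print(predict(model, "Notices must be delivered in writing."))
```

A model trained on the actual corpus would replace `TRAIN` with labeled provisions from the dataset and would typically use a stronger classifier. The sketch only shows the input and output shape of the task: provision text in, provision category out.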
References
2023
- (Jayakumar et al., 2023) ⇒ Thanmay Jayakumar, Fauzan Farooqui, and Luqman Farooqui. (2023). "Large Language Models Are Legal But They Are Not: Making the Case for a Powerful LegalLLM." In: arXiv preprint arXiv:2311.08890. DOI:10.48550/arXiv.2311.08890
2020
- (Tuggener et al., 2020) ⇒ Don Tuggener, Pius Von Däniken, Thomas Peetz, and Mark Cieliebak. (2020). "LEDGAR: A Large-scale Multi-label Corpus for Text Classification of Legal Provisions in Contracts." In: Proceedings of the Twelfth Language Resources and Evaluation Conference.