Labeled Contract Clause Dataset
A Labeled Contract Clause Dataset is an annotated legal document training dataset that contains contract clauses marked with contract smell labels for model development.
- AKA: Annotated Contract Clause Collection, Contract Quality Training Dataset, Labeled Legal Clause Corpus.
- Context:
- It can typically include Labeled Contract Clause Annotations for multiple quality issue types.
- It can typically contain Labeled Contract Clause Metadata such as contract type and jurisdiction.
- It can typically follow Labeled Contract Clause Annotation Guidelines for consistency.
- It can typically support Labeled Contract Clause Stratification for balanced training.
- It can typically provide Labeled Contract Clause Splits for train/validation/test sets.
- ...
- It can often derive from Labeled Contract Clause Sources like CUAD or proprietary collections.
- It can often undergo Labeled Contract Clause Quality Control through inter-annotator agreement checks.
- It can often enable Labeled Contract Clause Benchmarking for model comparison.
- It can often incorporate Labeled Contract Clause Version Control for reproducibility.
- ...
- It can range from being a Small Labeled Contract Clause Dataset to being a Large-Scale Labeled Contract Clause Dataset, depending on its labeled contract clause dataset size.
- It can range from being a Single-Label Contract Clause Dataset to being a Multi-Label Contract Clause Dataset, depending on its labeled contract clause dataset annotation schema.
- It can range from being a Coarse-Grained Labeled Contract Clause Dataset to being a Fine-Grained Labeled Contract Clause Dataset, depending on its labeled contract clause dataset granularity.
- It can range from being a Domain-Specific Labeled Contract Clause Dataset to being a Cross-Domain Labeled Contract Clause Dataset, depending on its labeled contract clause dataset coverage.
- ...
- It can facilitate Labeled Contract Clause Model Training for detection systems.
- It can enable Labeled Contract Clause Evaluation through test splits.
- It can support Labeled Contract Clause Research in legal NLP.
- It can provide Labeled Contract Clause Baselines for performance comparison.
- ...
- Example(s):
- Public Labeled Contract Clause Datasets, such as:
- CUAD-Based Contract Smell Dataset derived from CUAD annotations.
- LexGLUE Contract Clause Dataset for legal language understanding.
- ContractNLI Labeled Dataset for contract inference tasks.
- MAUD Contract Clause Dataset for M&A due diligence.
- Annotation Method Contract Clause Datasets, such as:
- Expert-Labeled Contract Clause Dataset with lawyer annotations.
- Crowdsourced Contract Clause Dataset using multiple annotators.
- LLM-Generated Contract Clause Dataset via automated labeling.
- Hybrid-Annotated Contract Clause Dataset combining methods.
- Domain-Specific Contract Clause Datasets, such as:
- Employment Contract Clause Dataset for HR agreements.
- Real Estate Contract Clause Dataset for property transactions.
- Technology Contract Clause Dataset for software licenses.
- Financial Contract Clause Dataset for banking agreements.
- ...
- Counter-Example(s):
- Unlabeled Contract Corpus, which lacks labeled contract clause annotations.
- General Legal Dataset, which doesn't focus on contract clauses.
- Code Quality Dataset, which targets software rather than contracts.
- See: Training Dataset, Legal Document Dataset, Annotated Text Dataset, Contract Clause, Machine Learning Dataset, NLP Benchmark Dataset, Document Annotation Task.
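- Illustration: The context items on metadata, stratification, train/validation/test splits, and inter-annotator agreement can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a standard format or a reference implementation: the LabeledClause schema, its field names, the example label strings, and the helper names stratified_split and cohen_kappa are all hypothetical, and it covers only the single-label case.

```python
import random
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledClause:
    """One labeled clause record (hypothetical schema, not a standard format)."""
    clause_id: str
    text: str
    label: str          # single contract smell label, e.g. "ambiguous_term"
    contract_type: str  # metadata, e.g. "employment"
    jurisdiction: str   # metadata, e.g. "US-NY"


def stratified_split(records, fractions=(0.8, 0.1, 0.1), seed=0):
    """Split records into train/validation/test, keeping label proportions similar."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for rec in records:
        by_label[rec.label].append(rec)
    train, val, test = [], [], []
    for label in sorted(by_label):  # sorted for a reproducible order
        group = by_label[label]
        rng.shuffle(group)
        n_train = int(len(group) * fractions[0])
        n_val = int(len(group) * fractions[1])
        train.extend(group[:n_train])
        val.extend(group[n_train:n_train + n_val])
        test.extend(group[n_train + n_val:])  # remainder goes to test
    return train, val, test


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same clauses."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of clauses where both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's own label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:  # both annotators used one identical label throughout
        return 1.0
    return (observed - expected) / (1 - expected)
```

Splitting per label group keeps rare smell labels represented in every split, and fixing the seed supports the reproducibility goal mentioned under version control. A multi-label dataset would need a different stratification strategy, which this sketch does not attempt.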