ACORD Clause Retrieval Dataset
An ACORD Clause Retrieval Dataset is a graded-relevance, expert-annotated legal clause dataset that contains relevance annotations for contract clause retrieval tasks to support precedent-based contract drafting.
- AKA: Atticus Clause Retrieval Dataset, ACORD Dataset, Clause Retrieval Corpus, Precedent Clause Dataset.
- Context:
- It can typically contain 114 Queries paired with 126,000+ Query-Clause Pairs from legal contracts.
- It can typically include 400+ EDGAR Contracts as source documents for clause extraction.
- It can typically provide 1-5 Star Ratings indicating clause quality and relevance levels.
- It can typically support Graded Relevance Evaluation using NDCG metrics.
- It can typically enable Retrieval-Augmented Generation for contract drafting applications.
- ...
- It can often facilitate Few-Shot Clause Learning through high-quality examples.
- It can often benchmark Dense Retrieval Systems against sparse baselines.
- It can often evaluate Cross-Domain Generalization across contract types.
- It can often measure Precision at K for top-ranked clauses.
- ...
- It can range from being a Small ACORD Clause Retrieval Dataset to being a Large ACORD Clause Retrieval Dataset, depending on its dataset size.
- It can range from being a Single-Domain ACORD Clause Retrieval Dataset to being a Multi-Domain ACORD Clause Retrieval Dataset, depending on its contract domain coverage.
- It can range from being a Binary ACORD Clause Retrieval Dataset to being a Graded ACORD Clause Retrieval Dataset, depending on its relevance annotation granularity.
- It can range from being a Static ACORD Clause Retrieval Dataset to being an Evolving ACORD Clause Retrieval Dataset, depending on its update frequency.
- ...
- It can integrate with Clause Similarity Retrieval Mechanisms for system training.
- It can support Retrieval-Augmented Natural Language Generation (RAG) Techniques for clause generation.
- It can benchmark Contract Clause Discovery Tasks through retrieval evaluation.
- It can connect to Contract Management Platforms for clause library building.
- It can interface with AI-based Contract Review Systems for precedent access.
- ...
- Example(s):
- ACORD v1 (2024), with 114 queries and 126,000+ query-clause pairs from EDGAR filings.
- Query Types in ACORD, such as:
- Obligation Queries seeking duty clauses.
- Right Queries finding entitlement provisions.
- Restriction Queries locating limitation clauses.
- Annotation Levels in ACORD, such as:
- 5-Star Clauses representing perfect precedents.
- 3-Star Clauses showing partial relevance.
- 1-Star Clauses indicating minimal applicability.
- ...
- Counter-Example(s):
- Binary Classification Datasets like CUAD, which lack graded relevance.
- Document-Level Datasets without clause-level annotations.
- Synthetic Legal Datasets lacking real contract examples.
- See: Contract Understanding Atticus Dataset (CUAD), LEDGAR Dataset, Clause Similarity Retrieval Mechanism, Contract Clause Discovery Task, Retrieval-Augmented Natural Language Generation (RAG) Technique, Legal Clause Identification Task, Annotated Legal Dataset, Contract Drafting System.
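- Illustration: The graded-relevance evaluation (NDCG) and Precision at K measures mentioned under Context can be sketched in a few lines of Python. This is a minimal sketch and not the dataset's official scoring script. It assumes that the star rating itself is used as a linear gain, that every judged clause for the query appears in the ranked list, and that a clause counts as "relevant" for Precision at K when it has at least 3 stars. The function names and the `min_stars` threshold are hypothetical choices made for illustration.

```python
import math


def dcg_at_k(gains, k):
    """Discounted cumulative gain over the first k gains (log2 rank discount)."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_stars, k):
    """NDCG@k for one query.

    ranked_stars: star ratings (1-5) of the retrieved clauses, in ranked order.
    Assumption: the ideal ranking is the same set of clauses sorted by stars.
    """
    ideal_dcg = dcg_at_k(sorted(ranked_stars, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0
    return dcg_at_k(ranked_stars, k) / ideal_dcg


def precision_at_k(ranked_stars, k, min_stars=3):
    """Fraction of the top-k clauses rated at least min_stars (assumed threshold)."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for stars in ranked_stars[:k] if stars >= min_stars)
    return hits / k


if __name__ == "__main__":
    # Hypothetical ranking for a single query: stars of the clauses a system returned.
    ranking = [5, 1, 3, 2]
    print("NDCG@4:", round(ndcg_at_k(ranking, 4), 4))
    print("P@2:", precision_at_k(ranking, 2))
```

Because NDCG rewards placing 5-star clauses above 3-star ones, it uses the full rating scale, whereas Precision at K collapses the ratings to a binary judgment at the chosen threshold.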