TREC LA Times Dataset


The TREC LA Times Dataset is a TREC corpus of Los Angeles Times news articles (distributed on TREC Disk 5) that ..



  • (Hu et al., 2009) ⇒ Xiaohua Hu, Xiaodan Zhang, Caimei Lu, E. K. Park, and Xiaohua Zhou. (2009). “Exploiting Wikipedia as External Knowledge for Document Clustering.” In: Proceedings of the ACM SIGKDD Conference (KDD-2009). doi:10.1145/1557019.1557066
    • We perform clustering experiments on three datasets: TDT2, LA Times (from TREC), and 20-newsgroups (20NG). We selected ... 18,547 documents from top ten sections of LA Times, ... The ten sections selected from LA Times are Entertainment, Financial, Foreign, Late Final, Letters, Metro, National, Sports, Calendar, and View.
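The section-based selection in the quote above (keeping only documents from ten LA Times sections) can be sketched roughly as follows. This is a minimal illustration, not Hu et al.'s code. It assumes each document has already been reduced to a hypothetical `(doc_id, section)` pair, and it assumes "top ten" means the ten sections with the most documents, which the quote does not state explicitly. Parsing section labels out of the raw TREC files is not shown.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def select_top_sections(
    docs: Iterable[Tuple[str, str]], k: int = 10
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Keep only documents whose section is among the k most frequent sections.

    Assumption: `docs` holds hypothetical (doc_id, section) pairs, and
    "top" is measured by document count per section.
    """
    docs = list(docs)  # materialize so the input can be scanned twice
    counts = Counter(section for _, section in docs)
    # most_common sorts by count, descending; equal counts keep first-seen order
    top = [section for section, _ in counts.most_common(k)]
    keep = set(top)
    selected = [(doc_id, section) for doc_id, section in docs if section in keep]
    return top, selected


if __name__ == "__main__":
    # Toy input with made-up document IDs, for illustration only.
    sample = [("d1", "Metro"), ("d2", "Sports"), ("d3", "Metro"), ("d4", "Letters")]
    top, selected = select_top_sections(sample, k=2)
    print(top)            # ['Metro', 'Sports']
    print(len(selected))  # 3
```

With `k=10` on the full collection, the returned `selected` list would play the role of the section-filtered subset; the exact count depends on how section labels are normalized, which this sketch leaves open.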


  Document collections distributed on the TREC disks include:

  1. AP newswire (Disks 1-3)
  2. Wall Street Journal (Disks 1-2)
  3. San Jose Mercury News (Disk 3)
  4. Financial Times (Disk 4)
  5. Los Angeles Times (Disk 5)
  6. Foreign Broadcast Information Service (FBIS) (Disk 5)