Google AudioSet Dataset

A Google AudioSet Dataset is an audio dataset created by Google Research.




    • QUOTE: The dataset is divided in three disjoint sets: a balanced evaluation set, a balanced training set, and an unbalanced training set. In the balanced evaluation and training sets, we strived for each class to have the same number of examples. The unbalanced training set contains the remainder of annotated segments.
    • Evaluation - eval_segments.csv
      20,383 segments from distinct videos, providing at least 59 examples for each of the 527 sound classes that are used. Because of label co-occurrence, many classes have more examples.
    • Balanced train - balanced_train_segments.csv
      22,176 segments from distinct videos chosen with the same criteria: providing at least 59 examples per class with the fewest number of total segments.
    • Unbalanced train - unbalanced_train_segments.csv
      2,042,985 segments from distinct videos, representing the remainder of the dataset.
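
The three segment files listed above can be inspected with a short script. The following is a minimal sketch, not an official loader. It assumes each CSV begins with comment lines starting with `#` and that each data row has the form `YTID, start_seconds, end_seconds, "label1,label2,..."`; this layout is not stated in the quoted text, so check it against the downloaded files. The names `read_segments`, `class_counts`, and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter


def read_segments(lines):
    """Yield (ytid, start, end, labels) tuples from an iterable of CSV lines.

    Assumption: header/comment lines start with '#', and the label list is a
    single quoted field containing comma-separated class identifiers.
    """
    data = (ln for ln in lines if ln.strip() and not ln.startswith("#"))
    # skipinitialspace=True lets the reader handle the space after each comma,
    # including the one that precedes the quoted label field.
    for row in csv.reader(data, skipinitialspace=True):
        ytid, start, end = row[0], float(row[1]), float(row[2])
        labels = row[3].split(",")
        yield ytid, start, end, labels


def class_counts(segments):
    """Count how many segments carry each class label."""
    counts = Counter()
    for _ytid, _start, _end, labels in segments:
        counts.update(labels)
    return counts


# Hypothetical sample rows standing in for a real file such as eval_segments.csv.
SAMPLE = '''# hypothetical header line
# YTID, start_seconds, end_seconds, positive_labels
vid_a, 0.000, 10.000, "/m/aaa,/m/bbb"
vid_b, 30.000, 40.000, "/m/aaa"
'''

segments = list(read_segments(io.StringIO(SAMPLE)))
counts = class_counts(segments)
fewest = min(counts.values())  # smallest per-class example count in the sample
```

To run this on a real file, pass an open file handle instead of the `io.StringIO` sample, for example `read_segments(open("eval_segments.csv", encoding="utf-8"))`. Because a segment can carry several labels, the per-class counts sum to more than the number of segments, which is the label co-occurrence effect the quote mentions.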