A well-known dataset in MIR is the Million Song Dataset (MSD). It is useful to be able to compare results between this dataset and AcousticBrainz, so we provide a mapping of IDs between the two datasets.
We currently have a mapping for about 250,000 MSD IDs, resulting in 370,000 matches (because an MSD ID may map to more than one MusicBrainz ID).
The code to recreate the data files is available on GitHub at https://github.com/MTG/acousticbrainz-labs/tree/master/msdtombid (see the README.md file in that directory for more information on how to run it).
We provide the following files containing the data of this mapping:
msd-mbid-2016-01-abz-mbids.csv.bz2 (33M): A unique list of MusicBrainz IDs present in AcousticBrainz at the time of the matching
msd-mbid-2016-01-results.json.bz2 (195M): A mapping of MSD IDs and metadata to Recordings in MusicBrainz
msd-mbid-2016-01-results-ab.json.bz2 (60M): The same mapping file, containing only MBIDs present in AcousticBrainz (filtered with the first file)
msd-mbid-2016-01-results-ab.csv.bz2 (13M): The AB mapping file in CSV format, with simplified metadata
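
All of these files are bz2-compressed, so they can be read as a stream without unpacking them on disk first. The following is a minimal Python sketch of loading the list of MusicBrainz IDs from the first file. It assumes that the file holds one MBID per row in its first column and has no header row; this layout is not described above, so check the file before relying on it. The function names `iter_first_column` and `load_mbid_set` are illustrative only and are not part of the msdtombid code.

```python
import bz2
import csv


def iter_first_column(path):
    """Yield the first column of each non-empty row of a bz2-compressed CSV file."""
    # bz2.open in text mode decompresses on the fly, so the archive
    # never has to be extracted to disk.
    with bz2.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        for row in csv.reader(handle):
            # Skip blank rows and rows whose first cell is empty.
            if row and row[0].strip():
                yield row[0].strip()


def load_mbid_set(path):
    """Collect the IDs into a set for fast membership tests.

    Assumption: one MBID per row, first column, no header row.
    """
    return set(iter_first_column(path))
```

For example, `load_mbid_set("msd-mbid-2016-01-abz-mbids.csv.bz2")` would return a set that can be used to check whether a given MBID was present in AcousticBrainz at the time of the matching. If the file turns out to have a header row, that header value will also end up in the set and should be discarded.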
The JSON mapping files have the following format: