At the 17th International Society for Music Information Retrieval Conference (ISMIR 2016), we proposed a method for evaluating the accuracy of a classification model by comparing it against a separately created dataset. Read the paper.

The code which we developed to perform this analysis will be available at

We provide here additional results which we were unable to include in the paper due to size constraints.

This document is currently incomplete; we will add to it during the week of the ISMIR conference.