Cedar Wingate
MUMT 621
Professor Ichiro Fujinaga
28 October 2009
jSymbolic Summary
The field of automatic music classification attempts to measure similarity between pieces of music using computers. To judge similarity, these classification systems must first break each piece of music down into features by which similarity can be measured. In their paper, “jSymbolic: A Feature Extractor for MIDI Files” (2006), Cory McKay and Ichiro Fujinaga describe the broad types into which these features fall. There are three main types: low-level features, derived from audio signal analysis; high-level features, derived from symbolic representations such as MIDI files; and cultural features, derived from Internet data mining (McKay and Fujinaga 2006, 1).
High-level features in particular have several properties that make them desirable for automatic music classification. First, they represent musically meaningful information that musicologists and music theorists can use, such as instrumentation, contour, tempo, dynamics, harmony, and melody. Second, a great deal of symbolic data is already available in MIDI and Humdrum files, and much more could become available as optical music recognition technology advances (McKay and Fujinaga 2006).
jSymbolic is an application developed to extract high-level features from MIDI files for use in automatic music classification. Because these features cannot be reliably extracted from audio signals, MIDI or some other form of symbolic data is necessary. The jSymbolic package is open source and designed to be easily extended (McKay and Fujinaga 2006).
Several goals and issues shaped the design of the application. The main goal was to create a system that could classify a wide variety of musics without needing to be tweaked for each new style, and the design of the features to be extracted was key to accomplishing this. The developers wanted to avoid the “curse of dimensionality” (McKay and Fujinaga 2006), in which using too many features relative to the number of training examples degrades a classifier's performance. They also did not want to restrict their features to the analytical system of any one music-theoretical approach (McKay 2004a, 57). They therefore decided to use a “large catalogue of features, with an emphasis on general features, and to give users the option of selecting which ones they want to extract” (McKay and Fujinaga 2006, 2).
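To make the idea of a selectable catalogue concrete, the following Python sketch stores features in a dictionary and computes only the ones a user requests. It is not jSymbolic's code (jSymbolic is written in Java); the feature names, the `extract_features` function, and the use of a plain list of MIDI pitch numbers as input are hypothetical simplifications for illustration.

```python
from statistics import mean, pstdev
from typing import Callable, Dict, Iterable, List

# Hypothetical catalogue: each entry maps a feature name to a function of a
# list of MIDI pitch numbers. These are illustrative, not jSymbolic's features.
FEATURE_CATALOGUE: Dict[str, Callable[[List[int]], float]] = {
    "mean_pitch": mean,
    "pitch_std_dev": pstdev,
    "pitch_range": lambda pitches: max(pitches) - min(pitches),
    "note_count": len,
}


def extract_features(pitches: List[int], selected: Iterable[str]) -> Dict[str, float]:
    """Compute only the user-selected features from the catalogue."""
    selected = list(selected)  # accept any iterable, e.g. a generator
    unknown = [name for name in selected if name not in FEATURE_CATALOGUE]
    if unknown:
        raise KeyError(f"unknown features: {unknown}")
    return {name: FEATURE_CATALOGUE[name](pitches) for name in selected}


# A user interested only in pitch level and range requests just those two.
print(extract_features([60, 64, 67, 72], ["mean_pitch", "pitch_range"]))
# {'mean_pitch': 65.75, 'pitch_range': 12}
```

Keeping the catalogue separate from the selection lets a large, general set of features be offered while each experiment extracts only a small subset, which is one way to keep the number of dimensions manageable.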
Each feature is one of two basic types: one-dimensional or multi-dimensional. One-dimensional features can be represented by a single value, such as a mean, a standard deviation, or a true/false value. Multi-dimensional features must be represented by more than one value, as with histograms.
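The distinction can be illustrated with a short Python sketch that computes three one-dimensional features and one multi-dimensional feature, a 12-bin pitch-class histogram, from a list of MIDI pitch numbers. The function and feature names are hypothetical and do not reproduce jSymbolic's own feature definitions.

```python
from statistics import mean, pstdev
from typing import Dict, List


def one_dimensional_features(pitches: List[int]) -> Dict[str, object]:
    """Hypothetical features that each reduce a non-empty list of MIDI pitches to one value."""
    return {
        "mean_pitch": mean(pitches),                               # a mean
        "pitch_std_dev": pstdev(pitches),                          # a standard deviation
        "has_notes_above_middle_c": any(p > 60 for p in pitches),  # a true/false value
    }


def pitch_class_histogram(pitches: List[int]) -> List[float]:
    """A multi-dimensional feature: 12 values, one per pitch class (C = 0, ..., B = 11)."""
    counts = [0.0] * 12
    for p in pitches:
        counts[p % 12] += 1
    if pitches:
        counts = [c / len(pitches) for c in counts]  # normalize to proportions
    return counts


triad = [60, 64, 67, 72]  # C4, E4, G4, C5
print(one_dimensional_features(triad))
print(pitch_class_histogram(triad))
# [0.5, 0.0, 0.0, 0.0, 0.25, 0.0, 0.0, 0.25, 0.0, 0.0, 0.0, 0.0]
```

Each one-dimensional feature collapses the four notes into a single number, while the histogram keeps twelve values, showing that half the notes are C and a quarter each are E and G.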
Histograms are a particularly revealing type of feature in jSymbolic. In their article “Style-independent computer-assisted exploratory analysis of large music collections” (2007), McKay and Fujinaga use several diagrams to contrast a Ramones song with a Thelonious Monk song. Their beat histograms show that the Ramones song is visibly less tight on the beat and has multiple tempo emphases across its tempo range, whereas the Thelonious Monk song has a very tight rhythm (McKay and Fujinaga 2007, 70).
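A beat histogram records how strongly a piece's rhythm supports each possible tempo. The Python sketch below assumes one common way of building one: place note onsets, weighted by velocity, on a time grid, then autocorrelate that signal at the lag corresponding to each tempo from 40 to 200 BPM. The input format, frame rate, tempo range, and normalization are assumptions for illustration, not a description of jSymbolic's exact procedure.

```python
from typing import Dict, List, Tuple


def beat_histogram(
    onsets: List[Tuple[float, int]],
    frame_rate: int = 100,  # assumed grid resolution in frames per second
    min_bpm: int = 40,      # assumed tempo range covered by the histogram
    max_bpm: int = 200,
) -> Dict[int, float]:
    """Return a normalized beat histogram mapping BPM -> relative strength.

    onsets: hypothetical input of (onset time in seconds, MIDI velocity) pairs.
    """
    if not onsets:
        return {}

    # Onset-strength signal: velocities summed at each frame containing an onset.
    length = round(max(t for t, _ in onsets) * frame_rate) + 1
    signal = [0.0] * length
    for t, velocity in onsets:
        signal[round(t * frame_rate)] += velocity

    # Autocorrelate at the lag (in frames) that corresponds to each tempo.
    histogram: Dict[int, float] = {}
    for bpm in range(min_bpm, max_bpm + 1):
        lag = round(frame_rate * 60 / bpm)  # frames per beat at this tempo
        histogram[bpm] = sum(
            signal[i] * signal[i + lag] for i in range(max(length - lag, 0))
        )

    # Normalize so the bins sum to 1 whenever any periodicity was found.
    total = sum(histogram.values())
    if total > 0:
        histogram = {bpm: value / total for bpm, value in histogram.items()}
    return histogram


# Sixteen evenly spaced notes, one every 0.5 s, i.e. a steady 120 BPM pulse.
steady = [(i * 0.5, 100) for i in range(16)]
hist = beat_histogram(steady)
print(hist[120] == max(hist.values()))  # True
```

With perfectly even onsets the weight falls into a few sharp peaks (around 120 BPM and its slower multiples); looser timing would smear that weight across neighbouring bins, which is the kind of visual difference between the two songs described above.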
The features were drawn from research in multiple fields, including music theory, ethnomusicology, music cognition, and popular musicology (McKay and Fujinaga 2006). Of the 160 features chosen, 111 have been implemented (McKay 2004a). The features are divided into seven groups: instrumentation, texture, rhythm, dynamics, pitch statistics, melody, and chords. Only the last group, chords, has no implemented features at all.
In another paper, McKay and Fujinaga (2005) investigated which features were most effective for classifying music into a set of genres. “It was found that features based on instrumentation (i.e., timbre) were by far the most important type of features” (McKay and Fujinaga 2005, 9). The study also supported their work on high-level feature extraction by showing that classification results using high-level features were better than results using features extracted from audio signals. The study also used a relatively large number of classes (genres), which suggests that automatic music classification systems may be able to handle larger class sets and perhaps someday approach the number of classes a human expert works with.
References
McKay, C. 2004a. Automatic genre classification of MIDI recordings. M.A. Thesis. McGill University, Canada.
McKay, C. 2004b. Automatic genre classification as a study of the viability of high-level features for music classification. Proceedings of the International Computer Music Conference. 367–70.
McKay, C., and I. Fujinaga. 2005. Automatic music classification and the importance of instrument identification. Proceedings of the Conference on Interdisciplinary Musicology.
McKay, C., and I. Fujinaga. 2006. jSymbolic: A feature extractor for MIDI files. Proceedings of the International Computer Music Conference. 302–5.
McKay, C., and I. Fujinaga. 2007. Style-independent computer-assisted exploratory analysis of large music collections. Journal of Interdisciplinary Music Studies 1 (1): 63–85.