Classifying Music Based on Frequency Content and Audio Data
Craig Dennis
ECE539 Project Proposal
With the growing market for portable digital audio players, the number of digital music files on personal computers has increased. It can be difficult to find songs when you want to listen to a specific genre of music, such as classical, pop or classic rock. Not only must consumers classify their own music, but online distributors must also classify the thousands of songs in their databases for consumers to browse.
How can music be easily classified without human interaction? It would be extremely tedious to go through all of the songs in a large database one by one to classify them. A neural network could be trained to distinguish between three genres of music: classical, pop and classic rock.
For this project, I have taken 30 sample songs from 3 genres of music (classical, pop and classic rock) and analyzed the middle five seconds of each song in order to classify it. The frequency content of the audio files can be extracted using the Fast Fourier Transform in Matlab. The songs were recorded at a sampling rate of 44.1 kHz, so the highest recoverable frequency is 22.05 kHz. Each five-second sample will be broken into 50 ms frames, and the short-time Fourier transform of each frame will be taken. The spectrum of each frame will be divided into six bands: low frequency content (0-200 Hz), lower middle frequency content (200-400 Hz), higher middle frequency content (400-800 Hz) and three higher bands (800-1600 Hz, 1600-3200 Hz and 3200-22050 Hz). These frequency bands can help describe the acoustic characteristics of the sample. The 50 ms frames will then be averaged within 250 ms windows. Five seconds contains 20 such windows, and each window contributes one value per band, so each song is described by 20 × 6 = 120 features.
The frequency bands were chosen because they are the ranges in which different musical instruments are found. Most bass instruments fall within the 50-200 Hz range. Many brass instruments, such as the trumpet and French horn, fall within the 200-800 Hz range. Woodwinds are found at roughly 800-1600 Hz. The higher bands were chosen because many classic rock and pop songs feature distorted guitars, whose noise has high-frequency content.
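The framing and banding arithmetic described above can be checked with a short sketch. This is written in Python rather than Matlab purely for illustration. All names (`feature_layout`, `band_of`, `BAND_EDGES_HZ`) are hypothetical. Treating each band as half-open, so that a boundary frequency belongs to the higher band, is an assumption not stated in the proposal.

```python
# Sketch of the feature layout; all names here are illustrative assumptions.
SAMPLE_RATE_HZ = 44_100      # sampling rate stated in the proposal
CLIP_MS = 5_000              # middle five seconds of each song
FRAME_MS = 50                # length of one short-time Fourier transform frame
WINDOW_MS = 250              # frames are averaged within each window

# Band edges in Hz; the upper edge of one band is the lower edge of the next.
BAND_EDGES_HZ = [0, 200, 400, 800, 1600, 3200, 22_050]


def feature_layout():
    """Return the counts implied by the framing and banding parameters."""
    samples_per_frame = SAMPLE_RATE_HZ * FRAME_MS // 1000
    frames_per_window = WINDOW_MS // FRAME_MS
    windows_per_clip = CLIP_MS // WINDOW_MS
    bands = len(BAND_EDGES_HZ) - 1
    return {
        "samples_per_frame": samples_per_frame,
        "frames_per_window": frames_per_window,
        "windows_per_clip": windows_per_clip,
        "bands": bands,
        "features": windows_per_clip * bands,
        # Spacing between DFT bins for a frame of this length.
        "bin_width_hz": SAMPLE_RATE_HZ / samples_per_frame,
    }


def band_of(frequency_hz):
    """Return the index of the band containing frequency_hz.

    Assumption: each band is half-open, [low, high), except the last,
    which also includes the Nyquist frequency.
    """
    if not 0 <= frequency_hz <= BAND_EDGES_HZ[-1]:
        raise ValueError("frequency outside 0 Hz to Nyquist")
    for index in range(len(BAND_EDGES_HZ) - 1):
        if frequency_hz < BAND_EDGES_HZ[index + 1]:
            return index
    return len(BAND_EDGES_HZ) - 2  # exactly the Nyquist frequency


layout = feature_layout()
print(layout)
```

Under these parameters a 50 ms frame holds 2205 samples, so adjacent transform bins are 20 Hz apart and the lowest band (0-200 Hz) spans only about ten bins. That may be worth keeping in mind when comparing band averages.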
The 120-element feature vectors will be classified using a k-nearest neighbor classifier as well as a multi-layer perceptron neural network.
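As a rough illustration of the nearest-neighbor step, the sketch below labels a query vector by majority vote among its k closest training vectors. It is a minimal sketch and not the project's implementation. It assumes Euclidean distance and k = 3, neither of which the proposal specifies. The function name `knn_classify` and the two-element toy vectors are hypothetical stand-ins for the real 120-element features.

```python
import math
from collections import Counter


def knn_classify(train_vectors, train_labels, query, k=3):
    """Label query by majority vote among its k nearest training vectors."""
    if len(train_vectors) != len(train_labels):
        raise ValueError("each training vector needs exactly one label")
    if not 1 <= k <= len(train_vectors):
        raise ValueError("k must be between 1 and the number of training vectors")
    # Euclidean distance is an assumption; other metrics could be swapped in.
    distances = sorted(
        (math.dist(vector, query), label)
        for vector, label in zip(train_vectors, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Toy data: made-up two-element vectors standing in for real features.
toy_vectors = [(0.9, 0.1), (0.8, 0.2), (0.2, 0.9), (0.1, 0.8), (0.5, 0.5), (0.6, 0.6)]
toy_labels = ["classical", "classical", "classic rock", "classic rock", "pop", "pop"]

predicted = knn_classify(toy_vectors, toy_labels, (0.85, 0.15), k=3)
print(predicted)
```

The same function accepts vectors of any length, so the 120-element feature vectors could be passed in unchanged. Scaling the features first may matter, since bands with larger magnitudes would otherwise dominate the distance.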
References:
Alghoniemy, Masoud. Tewfik, Ahmed H. “Rhythm And Periodicity Detection in Polyphonic Music.” Pg 185-190.
“Audio Topics: The Frequencies of Music.” PBS International, 633 Granite Court
Cheng, Kileen. Nazer, Bobak. Uppuluri, Jyoti. Verret, Ryan. “Beat This A Synchronization Project.”
Zhang, Yibin. Zhou, Jie. “A Study Of Content-Based Music Classification.” Pg 113-116. Department of Automation, Tsinghua University, Beijing 100084, China