Abstract

Music classification is conducive to online music retrieval, but current music classification models find it difficult to accurately identify various types of music, so their classification effect is poor. In order to improve the accuracy of music classification, a music classification model based on multifeature fusion and machine learning algorithms is proposed. First, the music signal is obtained; then, various classification features are extracted from the music signal, and machine learning algorithms are used to describe the relationship between the music signal types and the features. A shallow machine learning model based on logistic regression and a deep belief network model are established, respectively, as music classifiers, and experiments are designed for both models to verify their applicability to music classification. Comparison of the experimental results shows that the classification accuracy of the deep belief network model is higher than that of the logistic regression model, but the number of iterations its accuracy needs to converge is also higher. Compared with other current music classification models, the proposed model reduces the time needed to construct the music classifier, speeds up music classification, and can identify various types of music with high precision. The accuracy of music classification is clearly improved, which verifies the superiority of this music classification model.

1. Introduction

With the continuous development of the economy, science, and technology, the connection between people and music is getting closer and closer. There are many kinds of music on the music market, and music can reduce the pressure of people's life and work. Because different users have different preferences for music types, it is important to quickly and accurately identify the music that users need from large music libraries. The key technology for improving the efficiency of music queries is music classification. Therefore, it is necessary to focus on how to establish a music classification model with better classification performance [1].

In order to establish a music classification model with better classification performance, it is necessary to analyze the defects of current music classification models. Investigation and analysis show that the current best-performing music classification models are based on neural networks. However, as music data keep increasing, the neural-network-based classification model is constructed with low efficiency, has a higher error rate, and easily falls into local minima [2]. Therefore, the neural network structure should be optimized, for example by establishing a music classification model based on a particle swarm optimization (PSO) neural network, which speeds up the optimization of the network and improves the classification effect and accuracy. The music classification model based on the PSO neural network needs to extract music features, and the extracted features must describe the music information [3] so that music types can be identified effectively.

In order to improve the effect of music classification, a music classification model based on multifeature fusion and machine learning algorithms is proposed and compared with other music classification models. The simulation results show that the music classifier in this paper takes less time to construct, classifies music quickly, obtains high-precision classification results, and has obvious advantages. This paper analyzes the influence of timbre and melody on music and their applicability, introduces the Mel cepstrum coefficient reflecting timbre and four characteristics reflecting the pitch, frequency, formant, and frequency-band energy distribution of melody, and finally carries out feature extraction experiments for these four characteristics. Using the four extracted features and combining the two machine learning models of logistic regression and the deep belief network, experiments were designed, analyzed, and compared.

2. Related Work

Music classification was initially performed by experts. Because different experts have different preferences for music, their classification standards differ. At the same time, because human subjective thinking is introduced, the results of expert music classification carry a certain degree of blindness and subjectivity. This makes the results unreliable, and sometimes the training results and the actual situation differ greatly. Later, automatic music classification models appeared, the most common of which is based on the hidden Markov model. Generally speaking, the hidden Markov model is a linear classification technique: it assumes a simple linear relationship between the music type and the feature set [4, 5]. In fact, this is not the case. The relationship between the music type and the feature set is nonlinear, which limits the hidden Markov model and leads to undesirable music classification results [6, 7]. The neural network belongs to nonlinear classification technology and can effectively fit the relationship between the music type and the feature set. However, the neural network has obvious defects. For example, it requires a large number of music training samples; if the training samples are too few, the classification effect is very poor. At the same time, the structure of the neural network is relatively complex [8], which makes the convergence of the music classification model inefficient and the classification time too long.

In terms of music classification, research mainly falls into the following two categories: first, music feature extraction; second, the design of classification algorithms based on music features. The main purpose of music feature extraction is to effectively extract the feature information in music; only when music features are extracted effectively can music be classified well. In the literature, audio signal features are divided into the following four types [9, 10]: long-term features, short-term features, semantic features, and mixed features. From the perspective of human experience and auditory perception, music signals are divided into three categories [11]: pitch, timbre, and melody. Audio signal processing (spectrum analysis) is a widely used method of music signal feature extraction; these methods are very simple to apply [12] and flexible to operate and can directly achieve the purpose of extraction.

Research and analysis of music signals shows that timbre characteristics are the most basic characteristics of audio signals: if the instrument or the voice differs, the corresponding timbre will also differ. In the analysis process, the music is first processed by signal processing methods [13] and divided into several frames. Next, widely used techniques such as spectrum analysis [14] and the fast Fourier transform are used to further process each frame and obtain the required spectrum. Finally, by studying the signal spectrum, the corresponding statistical features, such as the spectral flux (SF), are obtained [15]. Sometimes, in order to make the signal features stronger, the power spectrum can be decomposed into subbands, and features such as the octave-based spectral contrast (OSC) can be extracted during the analysis [16]. The so-called time-domain features of music signals can be regarded as the arrangement of the timbre features of multiframe signals over a time series. Therefore, time-domain features can be obtained by combining the timbre features of audio signals [17], for example through the short-time Fourier transform. In music classification, the classification algorithm is another important research topic. Music classification algorithms based on deep learning are currently the most mainstream and have attracted the most attention from researchers [18, 19]. Active learning algorithms with the SVM as the core form a relatively mature theory and are widely used in classification. Two classification algorithms based on the distance to the classification hyperplane and on sample and version space reduction have been proposed [20], with relatively good classification effects. The Gaussian mixture model [21], logistic regression [22], the decision tree [23], the sparse representation classifier [24], and so on are also models often used in classification. Principal component analysis is often used to extract musical features [24], and non-negative matrix factorization can likewise be applied to extract them [25]. Among the many music classification algorithms, finding the one best suited to the current music scenario can significantly improve the classification effect.
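As a concrete illustration of the spectrum-analysis pipeline described above, the following is a minimal Python sketch, assuming the librosa library and a hypothetical input file song.wav; the STFT parameters and the pooling into clip-level statistics are illustrative assumptions, not settings taken from the cited works.

```python
# A minimal sketch of timbre-feature extraction via short-time spectrum
# analysis: framing/FFT -> spectral flux (SF) -> subband contrast (OSC-like).
import numpy as np
import librosa

y, sr = librosa.load("song.wav", sr=22050, mono=True)  # hypothetical file

# Magnitude spectrogram via the short-time Fourier transform (STFT).
S = np.abs(librosa.stft(y, n_fft=2048, hop_length=512))

# Spectral flux (SF): frame-to-frame change of the magnitude spectrum.
flux = np.sqrt(np.sum(np.diff(S, axis=1) ** 2, axis=0))

# Octave-based spectral contrast: peak/valley contrast per subband.
osc = librosa.feature.spectral_contrast(S=S, sr=sr)

# Pool frame-level features into one clip-level vector for classification.
features = np.hstack([flux.mean(), flux.std(), osc.mean(axis=1), osc.std(axis=1)])
print(features.shape)
```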

Although many music classification methods can classify music signals, their classification effect is currently not satisfactory [26–29], and there are still drawbacks such as classification errors and imperfect classification and retrieval functions. This is mainly because the features extracted at the feature extraction level have poor discriminating ability, and the content information contained in the music cannot be fully used in classification. Therefore, only by finding efficient feature extraction and classification methods can the music classification and retrieval functions be improved.

3. Music Classification Model Based on Multifeature Fusion and Machine Learning Algorithm

3.1. Extraction of Music Classification Features

Music types are mainly identified by their characteristics, so in the actual modeling process, music classification is studied as a pattern recognition problem; feature extraction is therefore critical and directly determines the accuracy of music classification. At present, music is usually modeled and analyzed with a single feature, but the amount of information a single feature carries is limited and cannot comprehensively describe the type of music. Therefore, various features are extracted for music classification. Firstly, the music signal is collected. Since the music signal is continuous, it is necessary to divide it into frames; at the same time, in order to better extract the classification features, the music signal needs to be enhanced. This is shown in Figure 1. The framework includes four parts: a data preprocessing module, a feature extraction module, a multifeature fusion module, and an RVM pattern recognition module.

Data preprocessing module: find the window with rich emotional information according to the mutual information between the emotional music classification data and the emotional labels. If all of the signal data were used in the experiment, the computation would be complicated and would contain a lot of interference information. The time window method is therefore adopted: according to the mutual information between the data matrix and the emotion label, the short time window with the most abundant information is found, and the signal that best represents the emotion is extracted.

Feature extraction module: extract the features of the music classification signals from different angles. Since the premise of feature fusion is that the features being fused have little correlation with one another, time-frequency analysis is carried out on the music signal to extract the wavelet energy features under the different rhythms of the emotional music, as well as the Hurst index and the fractal dimension, which represent the nonlinearity and nonstationarity of the signal.
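The following is a minimal sketch of this module, assuming PyWavelets for the subband energies and a crude rescaled-range (R/S) estimate for the Hurst exponent; the wavelet family, decomposition level, and chunk sizes are illustrative assumptions.

```python
# A minimal sketch of the feature-extraction module: wavelet subband
# energies plus a rescaled-range (R/S) estimate of the Hurst exponent.
import numpy as np
import pywt

def wavelet_energies(x, wavelet="db4", level=5):
    """Relative energy of each wavelet decomposition subband."""
    coeffs = pywt.wavedec(x, wavelet, level=level)
    energies = np.array([np.sum(c ** 2) for c in coeffs])
    return energies / energies.sum()

def hurst_rs(x):
    """Crude rescaled-range estimate of the Hurst exponent."""
    x = np.asarray(x, dtype=float)
    sizes = [len(x) // k for k in (1, 2, 4, 8, 16)]
    rs = []
    for n in sizes:
        vals = []
        for i in range(0, len(x) - n + 1, n):
            c = x[i:i + n]
            z = np.cumsum(c - c.mean())
            r, s = z.max() - z.min(), c.std()
            if s > 0:
                vals.append(r / s)
        rs.append(np.mean(vals))
    # The slope of log(R/S) against log(n) approximates H.
    return np.polyfit(np.log(sizes), np.log(rs), 1)[0]

signal = np.random.randn(8192)  # stand-in for one frame of music signal
print(wavelet_energies(signal), hurst_rs(signal))
```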

Multifeature fusion module: a feature fusion method using joint sparse representation. Feature fusion can make comprehensive use of a variety of music classification features to achieve feature complementarity. The fusion-based emotion recognition method built on joint sparse representation exploits the inherent sparsity of the data and obtains the multifeature fusion result by jointly sparse-coding multiple music classification features.
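A minimal sketch of sparse-representation-based fusion, assuming scikit-learn's sparse_encode; encoding the concatenated features over a concatenated dictionary is a common simplification that only approximates a true joint sparse model, and all dictionary sizes here are hypothetical.

```python
# A minimal sketch of sparse-representation-based feature fusion.
import numpy as np
from sklearn.decomposition import sparse_encode

rng = np.random.default_rng(0)
# Hypothetical learned dictionaries, one per feature type (rows = atoms).
dict_wavelet = rng.standard_normal((50, 32))   # 50 atoms, 32-dim wavelet feature
dict_fractal = rng.standard_normal((50, 8))    # 50 atoms, 8-dim fractal feature

# One test sample described by both feature types.
f_wavelet = rng.standard_normal((1, 32))
f_fractal = rng.standard_normal((1, 8))

# Encourage a shared support by encoding the concatenated feature over the
# concatenated dictionary (a simplification of joint sparse coding).
joint_dict = np.hstack([dict_wavelet, dict_fractal])
joint_feat = np.hstack([f_wavelet, f_fractal])
code = sparse_encode(joint_feat, joint_dict, algorithm="lasso_lars", alpha=0.1)
print(code.shape)  # (1, 50): one fused sparse code over 50 shared atoms
```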

Pattern recognition module: use the RVM for classification and recognition. The music classification data and emotional labels are analyzed, and a suitable kernel function is selected to construct the RVM classification decision surface and decision function based on the emotional music classification signals. The sparsity of the RVM model is guaranteed by iteratively updating the hyperparameters, yielding a decision function of the form

$$y(\mathbf{x}) = \sum_{i=1}^{N} w_i K(\mathbf{x}, \mathbf{x}_i) + b,$$

where $w_i$ is the weighting factor of the $i$th kernel term and $K(\cdot,\cdot)$ is the selected kernel function.

After the music signal is weighted, the frame-splitting operation is realized:

$$x_i(n) = x(iT + n), \quad 0 \le n < N,$$

where $N$ is the frame length and $T$ is the frame shift given by the sampling frequency coefficient of the music signal.

The segmented processing of a frame of the music signal through the time window is obtained as

$$t_w(n) = t(n)\, w(n),$$

where $t(n)$ represents the music signal before segmentation and $w(n)$ represents the time window, whose length determines the size of each segment.

The calculation formula of the short-term average energy of the music signal is as follows:

$$E_i = \sum_{n=0}^{N-1} x_i^2(n),$$

where $x_i(n)$ is the $n$th sample of the $i$th frame.
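Putting the three formulas above together, the following is a minimal numpy sketch of signal weighting (pre-emphasis), windowed framing, and the short-term energy computation; the frame length N, shift T, and pre-emphasis coefficient are assumed values.

```python
# A minimal sketch of pre-emphasis, framing with a time window, and the
# short-term average energy E_i defined above.
import numpy as np

def short_time_energy(x, N=1024, T=512, alpha=0.97):
    # Signal weighting (pre-emphasis): x(n) - alpha * x(n-1).
    x = np.append(x[0], x[1:] - alpha * x[:-1])
    window = np.hamming(N)
    energies = []
    for start in range(0, len(x) - N + 1, T):
        frame = x[start:start + N] * window   # segment with time window w(n)
        energies.append(np.sum(frame ** 2))   # E_i = sum_n x_i(n)^2
    return np.array(energies)

signal = np.random.randn(22050)  # stand-in for one second of music signal
print(short_time_energy(signal)[:5])
```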

In this way, one class of music classification features is obtained. Next, the frequency cepstrum coefficients of the music signal are extracted as the second class of classification features. The specific idea is as follows: firstly, the Fourier transform is used to extract the spectral energy coefficients from the music signal; then, a filtering (convolution) operation and the discrete cosine transform are performed on the spectral energy coefficients to obtain the frequency cepstrum coefficients, which are taken as classification features of the music signal.
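A minimal sketch of this second feature class, assuming librosa, whose MFCC routine wraps the spectrum, filtering, and discrete-cosine-transform steps just described; the file name and the choice of 13 coefficients are assumptions.

```python
# A minimal sketch of Mel-frequency cepstrum coefficient (MFCC) extraction:
# spectrum -> Mel filtering -> DCT, wrapped by librosa.
import numpy as np
import librosa

y, sr = librosa.load("song.wav", sr=22050, mono=True)  # hypothetical file
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, n_fft=2048, hop_length=512)

# Average over frames to obtain one clip-level MFCC feature vector.
mfcc_feature = mfcc.mean(axis=1)
print(mfcc_feature.shape)  # (13,)
```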

The musical signal can be seen as an audio sequence composed of different tones. The fluctuation of tones contains the composer’s emotion when composing the music. The pitch level is determined by pitch frequency; therefore, pitch frequency is a very important parameter in speech signal processing.

Pitch frequency extraction is based on the short-term stability of the speech signal. At present, the common pitch frequency extraction methods are autocorrelation detection, the average magnitude difference function, the peak extraction method, and so on. Considering the stability and smoothness of the pitch signal, the autocorrelation function detection method is selected here to extract the pitch frequency. The short-term autocorrelation function of the speech signal is defined as

$$R_i(k) = \sum_{n=0}^{N-1-k} x_i(n)\, x_i(n+k), \quad 0 \le k < N,$$

where $x_i(n)$ is the $i$th frame of the signal, $N$ is the frame length, and $k$ is the lag.

As shown in Figure 2, the autocorrelation function of the pitched part of an audio clip has obvious peaks. These peaks can be detected to determine the pitch, and the pitch frequency can be estimated from the spacing between adjacent peaks.
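A minimal sketch of this autocorrelation-based detection for a single frame, following the definition of $R_i(k)$ above; the 50–500 Hz search range is an assumed range for melodic pitch.

```python
# A minimal sketch of autocorrelation-based pitch detection for one frame.
import numpy as np

def pitch_autocorr(frame, sr, fmin=50.0, fmax=500.0):
    frame = frame - frame.mean()
    # R(k) = sum_n x(n) x(n+k), kept for all non-negative lags k.
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    kmin, kmax = int(sr / fmax), int(sr / fmin)
    # The lag of the strongest peak in the valid range gives the period.
    k = kmin + np.argmax(r[kmin:kmax])
    return sr / k  # pitch frequency in Hz

sr = 22050
t = np.arange(2048) / sr
frame = np.sin(2 * np.pi * 220.0 * t)     # synthetic 220 Hz tone
print(round(pitch_autocorr(frame, sr), 1))  # ~220.5 Hz (true pitch 220 Hz)
```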

3.2. Building a Music Classification Model Based on Multifeature Fusion Machine Learning

Current mainstream machine learning algorithms are of two kinds: the artificial neural network and the support vector machine (SVM). Their modeling principles differ: the neural network is based on empirical risk minimization, while the SVM is based on the structural risk minimization principle, so the generalization ability of the neural network is significantly lower than that of the SVM. This paper therefore establishes an SVM music classifier with the goal of improving the music classification effect; the SVM is described in detail below.

The core idea of the support vector machine is to construct a hyperplane as a decision surface to separate the two classes of samples, so as to obtain a classifier of the form

$$f(\mathbf{x}) = \operatorname{sgn}\left(\boldsymbol{\omega}^{T}\mathbf{x} + b\right),$$

where $\boldsymbol{\omega}$ is the weight coefficient vector and $b$ is the classification threshold.
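A minimal sketch of such an SVM music classifier, assuming scikit-learn; the feature matrix, labels, and RBF kernel choice are stand-ins rather than the configuration used in the experiments (a linear kernel recovers exactly the hyperplane form above).

```python
# A minimal sketch of an SVM music classifier on fused features.
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 24))   # 200 clips, 24 fused features (stand-in)
y = rng.integers(0, 10, size=200)    # 10 hypothetical music classes

# An RBF kernel handles the nonlinear type-feature relationship; with
# kernel="linear" the model is the hyperplane sgn(w^T x + b) given above.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
clf.fit(X[:150], y[:150])
print(clf.score(X[150:], y[150:]))   # held-out accuracy (random here)
```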

The working principle of the music classification model based on multifeature fusion and machine learning algorithms is as follows: first, music signals are collected and various classification features are extracted from them; then, a machine learning algorithm is used to describe the relationship between the music signal types and the features, and a music classifier is established, as shown in Figure 3.

As can be seen from Figure 3, when constructing the music classification model, various types of original music data are first collected, and the collected music data are denoised. Frame splitting and endpoint detection are then applied to the denoised music to obtain effective music signals. The time-domain and frequency-domain variance features and the energy features of the music are extracted from the effective music signals, and the extracted features are combined into feature vectors. The contribution of the three features to music classification is determined by the grey correlation analysis method, and the features are weighted accordingly to integrate them. The three weighted features serve as the input of a neural network optimized by the particle swarm optimization algorithm, which outputs the music classification results.
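A minimal sketch of the grey-correlation weighting step, under the assumption that each feature series is compared against the label sequence as the reference series; the resolution coefficient rho = 0.5 is the conventional default, and the whole setup is illustrative rather than the paper's exact procedure.

```python
# A minimal sketch of grey relational analysis (GRA) used to weight the
# three feature groups by their relation to the class-label sequence.
import numpy as np

def grey_relational_grade(reference, comparison, rho=0.5):
    """Mean grey relational coefficient of one feature series vs. the reference."""
    ref = (reference - reference.min()) / (np.ptp(reference) + 1e-12)
    cmp_ = (comparison - comparison.min()) / (np.ptp(comparison) + 1e-12)
    delta = np.abs(ref - cmp_)
    coeff = (delta.min() + rho * delta.max()) / (delta + rho * delta.max())
    return coeff.mean()

rng = np.random.default_rng(0)
labels = rng.integers(0, 10, 100).astype(float)   # reference series (stand-in)
features = rng.standard_normal((3, 100))          # 3 feature series (stand-in)

grades = np.array([grey_relational_grade(labels, f) for f in features])
weights = grades / grades.sum()                   # normalized fusion weights
print(weights)  # used to weight the three features before fusion
```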

The basis of automatic classification of music files is the extraction of music signal features, among which the main melody is the main line of the music and the key factor in judging the music style. In general, the main melody of most pieces is pitched higher than the accompaniment, so this article mainly uses the relatively simple skyline theme extraction algorithm to carry out feature extraction on music files (a sketch is given at the end of this subsection). The steps are as follows:

(1) Traverse the original signal of the music file: for notes that are in a polyphonic relationship, delete all notes except the one with the highest pitch. Two notes are in a polyphonic relationship when their sounding intervals overlap in time, that is, when the later note starts before the earlier note ends.

(2) After step (1) is executed, sort the notes by starting time from earliest to latest; if two adjacent notes still overlap, truncate the earlier note at the start of the later one.

An example of feature extraction from a music file using the skyline theme extraction algorithm is shown in Figure 4. The ReLU activation function is adopted in the classification network for the following reasons:

(1) High computational efficiency: when calculating the error in backpropagation, the derivative of the activation function is needed. Compared with sigmoid and similar functions, ReLU does not require an exponential operation; since its derivative is a constant, the amount of computation is very small.

(2) The vanishing gradient problem is avoided: as the derivative of sigmoid is close to 0 near the saturated region, the neural network loses the effect of gradient descent when backpropagating errors, that is, the gradient disappears, so features cannot be learned further. The derivative of ReLU, however, is constant, so the error is transmitted stably over the whole range; in other words, neither gradient disappearance nor gradient explosion occurs.

(3) Overfitting is reduced: because ReLU sets the output of some neurons to 0, the sparsity of the network is improved and the dependence between parameters is reduced. Sparsity is an important property of neural networks, meaning that neurons can better learn the main features.

(4) The convergence rate is much higher than that of sigmoid and similar functions.

However, ReLU requires a reasonable learning rate. If the learning rate is not set reasonably, the gradients fluctuate violently; if a large gradient is produced during training, the neuron is likely to become inactive due to overstimulation and lose the ability to perceive small gradients. Specifically, the gradient of the neuron no longer changes and remains zero (the "dying ReLU" problem).
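Returning to the skyline theme extraction steps listed above, the following is a minimal sketch under the stated interpretation (keep the highest-pitch note among overlapping notes, then truncate the remaining overlaps); representing notes as (start, duration, pitch) tuples is an assumption.

```python
# A minimal sketch of the skyline theme-extraction step: drop lower notes in
# polyphonic overlaps, then truncate remaining overlaps in time order.
def skyline(notes):
    notes = sorted(notes, key=lambda n: (n[0], -n[2]))  # by start, pitch desc
    melody = []
    for start, dur, pitch in notes:
        if melody:
            p_start, p_dur, p_pitch = melody[-1]
            if start < p_start + p_dur:                 # polyphonic overlap
                if pitch <= p_pitch:
                    continue                            # drop the lower note
                melody[-1] = (p_start, start - p_start, p_pitch)  # truncate
        melody.append((start, dur, pitch))
    return melody

chord = [(0.0, 1.0, 60), (0.0, 1.0, 64), (0.5, 1.0, 72), (1.5, 0.5, 67)]
print(skyline(chord))  # [(0.0, 0.5, 64), (0.5, 1.0, 72), (1.5, 0.5, 67)]
```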

In the current market, although there are several mature music data sets, most of them contain only English tracks; data sets of traditional Chinese opera, in particular, are rarely seen. Therefore, to test the classification model, we first need to build a music data set ourselves. In this paper, a crawler tool is used to build the database by crawling opera information and files from websites such as Chinese drama website, drama website, and drama house to the local machine (a sketch of the crawler is given after this list):

(1) First, analyze the structure of the website to find the logical relationship between the operas as well as the correspondence between each opera and its download page.

(2) Analyze the website structure through the Google Chrome developer tools (shortcut key F12) to find the URL corresponding to each page jump and the URL corresponding to each opera download.

(3) Anticrawling strategies: at present, most websites have their own anticrawling strategies to protect their safety and prevent too many crawlers from occupying too many resources and crashing the site. Three countermeasures are adopted in this paper: (a) set a visit interval to reduce the burden on the target website; (b) forge the User-Agent field of the HTTP header; (c) use proxy IPs obtained from third parties.

(4) Tool selection: the crawler system is developed in Python. It sends requests to the target site and gets responses through the Requests library, and the HTML files are parsed using the BeautifulSoup library.

(5) Data storage: the opera files crawled from the network are saved on the server, and the song name, performer, label, and storage path of each opera are stored in the database synchronously.

(6) Data consolidation: the music files crawled from the Internet in this paper contain a total of 4,221 tracks of 7 kinds of opera. As shown in Figure 5, there are 420 Sichuan Opera, 550 Huangmei Opera, 440 Peking Opera, 785 Kunqu Opera, 502 Pingju Opera, 515 Qinqiang Opera, and 1,015 Yue Opera tracks.
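A minimal sketch of the crawler just described, combining the visit interval, the forged User-Agent, optional third-party proxies, Requests for fetching, and BeautifulSoup for parsing; all URLs and CSS selectors here are hypothetical.

```python
# A minimal sketch of the opera-crawling pipeline described above.
import time
import requests
from bs4 import BeautifulSoup

HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}
PROXIES = {}  # e.g., {"http": "http://1.2.3.4:8080"} from a third party

def fetch(url, interval=2.0):
    """Fetch a page, pausing first to reduce the target website's burden."""
    time.sleep(interval)
    resp = requests.get(url, headers=HEADERS, proxies=PROXIES, timeout=10)
    resp.raise_for_status()
    return resp

def crawl_track_links(list_url):
    """Parse a (hypothetical) opera list page for track download links."""
    soup = BeautifulSoup(fetch(list_url).text, "html.parser")
    return [a["href"] for a in soup.select("a.track-download")]

for link in crawl_track_links("https://example-opera-site.cn/kunqu"):
    audio = fetch(link).content
    # Save the file locally and record its metadata in the database here.
```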

4. Results and Analysis

In order to test the music classification effect of the multifeature fusion and machine learning algorithm, music of different types was selected as the experimental object, and a total of 10 kinds of music were collected. The number of samples of each kind of music is shown in Table 1.

In order to analyze the superiority of the multifeature fusion and machine learning algorithm for music classification, the hidden Markov model (HMM) and the single-feature backpropagation neural network (BPNN) were selected for comparative testing. All models used the same test platform, whose parameters are shown in Table 2.

The model in this paper, the HMM, and the BPNN were used, respectively, to train on the 10 categories of music in Table 1 and establish the corresponding music classifiers; their classification accuracies for the 10 categories of music were then calculated, as shown in Figure 6.

As can be seen from Figure 6, with the HMM the classification accuracy for the 10 kinds of music is very low, never exceeding 91%, which is far below the practical application requirements of music classification. This is mainly because the HMM is a linear classification model that cannot accurately capture both the linear and nonlinear relations between music types and features, so the optimal music classification model cannot be established; the classification success rate is therefore low and the error rate high. With the BPNN, the classification accuracy for the 10 kinds of music is higher than that of the HMM, exceeding 92%, which satisfies the practical application requirements of music classification. This is mainly because the BPNN is a nonlinear classification model that can describe both the linear and nonlinear relations between music types and features, so a better classification model is established; however, because it uses a single feature, the classification success rate remains to be further improved.

The classification accuracy of the model in this paper for the 10 kinds of music is more than 96%, far higher than that of the HMM and the BPNN. This is because the proposed model overcomes the shortcoming of the HMM, which can only perform linear classification, and introduces multiple features to describe the music types, which overcomes the limitation of single-feature music classification; the comparison results prove the superiority of the proposed music classification model.

The average classification time of the 3 classification models over the 10 kinds of music with low mutual similarity was counted, with time measured in seconds. The statistical results are shown in Figure 7.

According to the analysis of the average classification times in Figure 7, compared with the HMM and BPNN music classification models, the classification time of the model in this paper is significantly reduced, which significantly speeds up music classification.

In order to verify the classification results of the model, we selected 10 kinds of music from the China United Music Platform as training samples and input the selected music features. Among them, the pitch and duration represent the frequency-domain characteristics of the music, the reverberation time and tone represent its time-domain characteristics, and the short-term energy (RMS) represents its energy characteristics. The music classification model is then used to output the 10 music types; the results are shown in Table 3. It can be seen from Table 3 that the music classification model in this paper can effectively classify the 10 types of music. Comparing the classification results with the actual categories of the music shows that they agree, indicating that the classification accuracy of the proposed model is high.

In order to verify the classification effect of this paper, 100 users were randomly selected from the users of the United Music Platform, and the users were asked to choose the music they were interested in from the 10 kinds of music. Investigation and statistics show that more people like ensemble music, followed by sizhu music and qin music, while the other music types attract relatively few listeners. The classification effect of this paper is compared with that of the neural network in Figure 8. It can be seen from Figure 8 that with the proposed music classification model, ensemble music accounts for 21% of the total, and sizhu music and qin music account for 17% and 14% of the total, respectively, which matches the interests of the 100 selected users. After classification by the traditional neural network model, ensemble music accounts for only 6% of the total, and sizhu music and qin music account for 6% and 7%, respectively, which deviates from the interests of the selected 100 users. In conclusion, the music classification model in this paper has a good classification effect.

5. Conclusion

Music classification is one of the important technologies for improving music retrieval. Aiming at the low accuracy and slow speed of current music classification, and in order to improve the accuracy of music classification, this paper proposes a music classification model based on multifeature fusion and machine learning algorithms. Compared with other music classification models in tests, the results show that the proposed model improves both the classification speed and, significantly, the classification accuracy, giving better overall classification performance and broad application prospects. The three weighted features are taken as the input of the machine learning model, and the network output is the music classification result. Experimental analysis shows that, in adopting multifeature fusion machine learning to realize classification, the proposed model fully considers the time-domain and frequency-domain variance characteristics and the short-term energy characteristics of music and achieves high classification accuracy for the 10 kinds of music. It is also found that the MFCC feature yields high accuracy. However, compared with other models, the accuracy of this model still needs improvement; therefore, follow-up work will mainly enlarge the data set to explore whether the accuracy can be improved without affecting subsequent experiments.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported by the General Project of Hunan Education Department: A Cross-Cultural Tour of Hunan Classic Folk Songs—A Study of Exotic Singing in Ukrainian Music Academy, Project No. 17C0314.