BY-NC-ND 3.0 license Open Access Published by De Gruyter August 21, 2014

OMP-ELM: Orthogonal Matching Pursuit-Based Extreme Learning Machine for Regression

  • Omer F. Alcin, Abdulkadir Sengur, Jiang Qian and Melih C. Ince

Abstract

Extreme learning machine (ELM) is a recent scheme for single hidden layer feed forward networks (SLFNs). It has attracted much interest in the machine intelligence and pattern recognition fields, with numerous real-world applications. The ELM structure has several advantages, such as its adaptability to various problems, a rapid learning rate, and low computational cost. However, it has shortcomings in the following aspects. First, it suffers from irrelevant variables in the input data set. Second, choosing the optimal number of neurons in the hidden layer is not well defined. When the number of hidden nodes exceeds the number of training samples, the ELM may encounter the singularity problem, and its solution may become unstable. To overcome these limitations, several methods have been proposed within the regularization framework. In this article, we consider a greedy method for sparse approximation of the output weight vector of the ELM network. More specifically, the orthogonal matching pursuit (OMP) algorithm is embedded into the ELM. This new technique is named OMP-ELM. OMP-ELM has several advantages over regularized ELM methods, such as lower complexity and immunity to the singularity problem. Experiments on nine commonly used regression problems confirm these advantages. Moreover, OMP-ELM is compared with the ELM method, the regularized ELM scheme, and artificial neural networks.

1 Introduction

Extreme learning machine (ELM) has become an interesting topic in the machine intelligence and pattern recognition communities, with numerous real-world applications [7, 11]. It has an efficient and simple single hidden layer feed forward network (SLFN) architecture, which was originally proposed by Huang et al. [7]. ELM has several advantages over conventional methods, such as its high degree of adaptability to various pattern recognition problems coupled with extremely rapid learning and low computational cost [7, 11].

Besides the above-mentioned advantages, ELM has several drawbacks. First, ELM suffers from irrelevant variables in the data set [11, 16]. Second, choosing the appropriate network parameters (the number of hidden neurons) is an open problem for any application. An ELM network with too few hidden neurons may not model the input data properly, whereas a network with too many hidden neurons may lead to overfitting [11]. Moreover, when the number of hidden neurons exceeds the number of training samples, the ELM may suffer from the singularity problem, just like least squares (LS), and its solution may become unstable. To overcome these limitations, several methods have been proposed in the regularization framework [2, 8, 9, 13–16]. In References [13–15], a solution was proposed for data sets that contain irrelevant variables. The authors used a wrapper method in which a neuron-ranking mechanism was employed through L1 regularization, followed by a pruning operation; thus, an ELM network with an optimal number of hidden neurons was obtained. For the second drawback of the ELM network, the following three-step solution can be considered: first, construct several models with various numbers of hidden neurons; second, compute a generalization error for each model; third, select the model with the minimum generalization error. This process seems reasonable; however, it may be computationally expensive when the data set has high dimensionality. Instead of using this three-step method, several authors proposed regularized ELM [2, 11, 16]. In References [2, 11, 16], the authors added an L2 norm penalty to construct a regularized ELM. This method offers a solution to the singularity problem of the ELM; however, it still requires the regularization parameter to be adjusted. Another regularization method, called least angle regression (LARS) ELM, was proposed in References [13, 16]. LARS-ELM starts from a hidden layer with more neurons than needed; an L1 norm penalty is then added to the ELM to eliminate the irrelevant neurons. This formulation is also known as the lasso problem [3], and the LARS algorithm was used to solve the L1 minimization [3, 16]. Although LARS-ELM is good at determining the most relevant hidden nodes, it still suffers from the singularity problem, especially when the outputs of the selected nodes are correlated [16]. To overcome this limitation of LARS-ELM, an additional L2 norm penalty was considered, yielding an improved ELM known as the naive elastic net [11, 16]. The naive elastic net has no singularity problem [16]; however, this regularization scheme causes coefficient shrinkage and introduces more bias into the ELM [16, 18]. To address these drawbacks, a new algorithm called LARS-elastic net (LARS-EN) ELM was introduced [16]. Essentially, LARS-EN-ELM can be regarded as a rescaled naive elastic net; however, two regularization parameters need to be tuned, which slows down training. In addition, Miche et al. proposed an algorithm called Tikhonov regularization optimally pruned (TROP) ELM for solving the singularity problem of LARS-ELM [11, 15, 16]. The method employs a double regularization scheme, in which an L1 penalty is used to rank the hidden layer neurons, followed by an L2 penalty on the regression weights, and the authors proposed a fast algorithm for parameter tuning. However, TROP-ELM has the following shortcomings [11, 16]. First, it cannot guarantee convergence to the optimal parameters in all cases. Second, the singularity problem persists while computing the LARS solution [11, 16].

In summary, the above-mentioned modified ELM schemes still have several drawbacks, such as the singularity problem, parameter tuning, obtaining the optimal number of hidden neurons, and training complexity. In this article, to overcome these limitations, we consider a greedy method for sparse approximation of the output weight vector of the ELM network. More specifically, we investigate the application of the orthogonal matching pursuit (OMP) algorithm, which has several advantages over the modified ELM methods mentioned above, such as low complexity and immunity to the singularity problem. OMP is a simple and fast iterative algorithm that, at each iteration, selects the dictionary element best correlated with the residual part of the signal [17]. It then forms a new approximant by projecting the signal onto the elements that have already been selected [10, 17]. As in the previously published modified ELM articles, we apply the OMP-ELM method to regression problems.

A regression problem is generally used for prediction and forecasting applications. It is also used to understand which of the independent variables are related to the dependent variable and to explore the forms of these relationships [1]. The main purpose of the proposed method is to select the relevant hidden units; thus, an optimum number of neurons in the hidden layer of the ELM architecture is guaranteed. Moreover, we assume that the initial hidden layer is larger than the ELM network actually needs.

The organization of the article is as follows. In Section 2, we briefly introduce ELM and the modified ELM methods. The investigated OMP-ELM is described in detail in Section 3. Section 4 introduces the data sets and presents the experimental results. Finally, Section 5 summarizes the conclusions of the present study.

2 Theoretical Background

2.1 ELM

The ELM has a simple and efficient architecture in which the hidden layer weights are initialized randomly [7]. The optimization of the ELM architecture is then achieved by using the Moore-Penrose generalized inverse (MPGI). The ELM is a kind of SLFN and can be modeled as

$$y = \sum_{i=1}^{N} \beta_i f(x; w_i, b_i), \quad (1)$$

where N denotes the number of hidden neurons. Each input node of the ELM network is connected to the ith hidden node through the weight $w_i$, and $b_i$ denotes the bias of the ith hidden node. The weight vector $\beta_i$ in Eq. (1) connects the ith hidden node to the output, and f denotes the activation function. After selecting the number of hidden nodes and randomly assigning the other related parameters of the ELM network, for given training samples {x_i, y_i}, the ELM optimization criterion can be defined in an LS form:

$$L(X, Y; \beta) = \|Y - H\beta\|^2, \quad (2)$$

where

$$X = \begin{pmatrix} x_1^T \\ \vdots \\ x_M^T \end{pmatrix}, \quad Y = \begin{pmatrix} y_1^T \\ \vdots \\ y_M^T \end{pmatrix}, \quad \beta = \begin{pmatrix} \beta_1^T \\ \vdots \\ \beta_N^T \end{pmatrix}, \quad \text{and} \quad H = \begin{pmatrix} f(x_1; w_1, b_1) & \cdots & f(x_1; w_N, b_N) \\ \vdots & \ddots & \vdots \\ f(x_M; w_1, b_1) & \cdots & f(x_M; w_N, b_N) \end{pmatrix},$$

where M is the number of input samples. The parameter β can be determined by

$$\beta = H^{\dagger} Y, \quad (3)$$

where $H^{\dagger}$ indicates the MPGI of H.
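To make Eqs. (1)–(3) concrete, the following minimal NumPy sketch trains and applies a basic ELM. This is our own illustration rather than the authors' implementation; the function names `elm_fit`/`elm_predict` and the sigmoid activation are assumptions.

```python
import numpy as np

def elm_fit(X, Y, n_hidden, seed=0):
    """Train a basic ELM: random hidden layer, least-squares output weights (Eqs. (1)-(3))."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))   # input-to-hidden weights w_i
    b = rng.standard_normal(n_hidden)                 # hidden biases b_i
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))            # hidden layer output matrix H (M x N)
    beta = np.linalg.pinv(H) @ Y                      # Eq. (3): beta = H^+ Y via the MPGI
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta                                   # Eq. (1): y = sum_i beta_i f(x; w_i, b_i)
```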

2.2 Improved ELMs

2.2.1 Regularized ELM

As mentioned in Section 1, singularity is an important problem for the ELM procedure. In particular, when the number of hidden neurons N is higher than the number of input samples M, or when several columns of H are correlated, the ELM network suffers from the singularity problem and its solution becomes unstable. To solve this, several authors added an L2 norm penalty term to the LS criterion of Eq. (2) for regularization [2, 8, 9]. The resulting scheme can be defined as follows:

$$L(X, Y; \beta, \lambda_2) = \|Y - H\beta\|^2 + \lambda_2 \|\beta\|^2, \quad (4)$$

where λ2 is the regularization parameter. The parameter β can be obtained by

$$\beta = (H^T H + \lambda_2 I)^{-1} H^T Y, \quad (5)$$

where I is the identity matrix.
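A hedged sketch of the regularized solution in Eq. (5), where `H` is the hidden layer output matrix computed as in the previous sketch (the helper name is ours):

```python
import numpy as np

def regularized_elm_solve(H, Y, lam2):
    """Eq. (5): beta = (H^T H + lam2 * I)^(-1) H^T Y."""
    N = H.shape[1]
    # Solve the linear system rather than forming an explicit matrix inverse.
    return np.linalg.solve(H.T @ H + lam2 * np.eye(N), H.T @ Y)
```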

2.2.2 LARS-ELM

As expressed previously, an ELM network with too few hidden neurons may lead to an underfitted structure, whereas too many hidden neurons may lead to an overfitted one. In LARS-ELM, an ELM architecture with more hidden nodes than needed is first generated, and then an L1 norm penalty is used to eliminate the irrelevant nodes of the network. Consequently, the optimum number of hidden nodes is determined automatically, which yields an improved ELM structure [13]. The improved ELM is given as

$$L(X, Y; \beta, \lambda_1) = \|Y - H\beta\|^2 + \lambda_1 \|\beta\|_1. \quad (6)$$

Equation (6) is known as the lasso problem with penalty factor $\lambda_1$, where $\|\cdot\|_1$ stands for the L1 norm. Equation (6) can be solved efficiently with the LARS algorithm [3].
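As one possible realization of Eq. (6), the sketch below applies scikit-learn's LassoLars solver to the hidden layer matrix H. This is illustrative only: it is not the implementation used in the cited works, and scikit-learn scales the squared-error term by the number of samples.

```python
import numpy as np
from sklearn.linear_model import LassoLars

def lars_elm_solve(H, y, lam1=0.01):
    """Sparse ELM output weights via the LARS solution of the lasso problem in Eq. (6)."""
    model = LassoLars(alpha=lam1, fit_intercept=False)
    model.fit(H, y)                       # ||y - H beta||^2 + lam1 * ||beta||_1 (up to scaling)
    beta = model.coef_
    active = np.flatnonzero(beta)         # hidden nodes that survive the L1 penalty
    return beta, active
```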

2.2.3 LARS-EN-ELM

LARS-ELM may also encounter the singularity problem owing to the condition of the Gram matrix, as it is known that the LARS solution is unstable when the Gram matrix is not full rank [3]. In other words, LARS uses the inverse of the Gram matrix of the selected columns of H when solving the problem. To overcome this shortcoming of LARS-ELM, an L2 norm penalty is added, yielding a novel ELM scheme, namely LARS-EN-ELM. The obtained scheme is as follows:

$$L(X, Y; \beta, \lambda_1, \lambda_2) = \|Y - H\beta\|^2 + \lambda_1 \|\beta\|_1 + \lambda_2 \|\beta\|^2, \quad (7)$$

which is a naive elastic net and can be reformulated as a lasso problem:

$$L(X, Y^*; \beta^*, \lambda) = \|Y^* - H^*\beta^*\|^2 + \lambda \|\beta^*\|_1, \quad (8)$$

where

$$Y^* = \begin{pmatrix} Y \\ 0_{N \times 1} \end{pmatrix}, \quad H^* = (1 + \lambda_2)^{-1/2} \begin{pmatrix} H \\ \sqrt{\lambda_2}\, I_{N \times N} \end{pmatrix},$$

$$\beta^* = \sqrt{1 + \lambda_2}\, \beta, \quad \text{and} \quad \lambda = \frac{\lambda_1}{\sqrt{1 + \lambda_2}}.$$

The expression given in Eq. (8) can be handled by the LARS algorithm. Let

$$\beta^* = \arg\min_{\beta^*} L(X, Y^*; \beta^*, \lambda). \quad (9)$$

Thus, the related solution for Eq. (9) is obtained as follows:

$$\beta = \frac{1}{\sqrt{1 + \lambda_2}}\, \beta^*. \quad (10)$$

Although Eq. (7) handles the singularity problem, it is worth mentioning that the algorithm has two parameters, namely $\lambda_1$ and $\lambda_2$, that need to be tuned. This can be considered a limitation of the LARS-EN-ELM algorithm, as it causes slow training.
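The augmented-system construction of Eq. (8) can be written directly. The sketch below builds Y* and H*, solves the equivalent lasso, and rescales via Eq. (10); it is our own illustrative code, with scikit-learn's LassoLars standing in for the LARS solver (up to its scaling of the objective).

```python
import numpy as np
from sklearn.linear_model import LassoLars

def lars_en_elm_solve(H, y, lam1, lam2):
    """Naive elastic net on the ELM output weights via the augmented lasso of Eq. (8)."""
    M, N = H.shape
    H_star = np.vstack([H, np.sqrt(lam2) * np.eye(N)]) / np.sqrt(1.0 + lam2)  # H* in Eq. (8)
    y_star = np.concatenate([y, np.zeros(N)])                                  # Y* in Eq. (8)
    lam = lam1 / np.sqrt(1.0 + lam2)
    beta_star = LassoLars(alpha=lam, fit_intercept=False).fit(H_star, y_star).coef_  # Eq. (9)
    return beta_star / np.sqrt(1.0 + lam2)                                     # Eq. (10)
```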

2.2.4 TROP-ELM

An algorithm called TROP-ELM was proposed in Reference [15] to handle the singularity problem of LARS-ELM. In TROP-ELM, the LARS algorithm is first used to choose the most relevant hidden nodes for constructing the ELM network. After constructing the ELM network, an L2 norm penalty is added to address the regularization issue. Moreover, the authors sped up the process by tuning the parameters one by one: they first tune λ1 with λ2 = 0, and then tune λ2 with the well-tuned λ1.

By tuning the parameters in this way, the authors tried to obtain the optimum λ1 and λ2. However, this tuning scheme does not yield the optimal parameters unless the two parameters are searched jointly in the two-dimensional parameter space.
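A minimal sketch of this sequential tuning idea (first tune λ1 with λ2 = 0, then tune λ2 with the selected λ1). The grids, the validation split, and the helper names are our own illustrative choices, not those of Reference [15].

```python
import numpy as np

def sequential_tuning(solver, H_tr, y_tr, H_val, y_val,
                      lam1_grid=(1e-3, 1e-2, 1e-1, 1.0),
                      lam2_grid=(0.0, 1e-3, 1e-2, 1e-1)):
    """solver(H, y, lam1, lam2) -> beta; returns (lam1, lam2) chosen one after the other."""
    def val_rmse(lam1, lam2):
        beta = solver(H_tr, y_tr, lam1, lam2)
        return np.sqrt(np.mean((y_val - H_val @ beta) ** 2))

    lam1_best = min(lam1_grid, key=lambda l1: val_rmse(l1, 0.0))        # step 1: lam2 fixed at 0
    lam2_best = min(lam2_grid, key=lambda l2: val_rmse(lam1_best, l2))  # step 2: lam1 fixed
    return lam1_best, lam2_best
```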

2.3 OMP

In this section, the OMP algorithm is discussed briefly; a detailed description can be found in References [10, 17]. The OMP algorithm will be used to select the best hidden nodes of the ELM network; thus, an ELM structure with an optimum number of hidden nodes is formed. Suppose that β is an arbitrary m-sparse signal in ℝ^N. Form an M × N matrix H whose rows are the measurement vectors, and observe that the M measurements of the signal can be collected in an M-dimensional data vector Y = Hβ. We refer to H as the measurement matrix and denote its columns by h_1, ..., h_N. As β has only m non-zero components, the data vector Y = Hβ is a linear combination of m columns of H.

To determine the sparse vector β, we should decide which columns of H contribute to the measurement vector Y. The rationale is to select these columns in an iterative way: at each iteration, one column of H is chosen according to its correlation with the residual of the signal, and then its contribution is subtracted from the residual. After several iterations, the algorithm identifies the correct set of columns. The OMP algorithm is summarized in Table 1.

Table 1

OMP Algorithm.

Input:

M × N measurement matrix H

M-dimensional data vector Y

Sparsity level m
Output:

An estimate β in ℝ^N
Step 1. Initialization: Λ_0 ← Ø, r_0 ← Y, t ← 1.
Step 2. Determine λ_t according to the optimization problem
λ_t ← argmax_{j = 1, ..., N} |⟨r_{t−1}, h_j⟩|.
Step 3. Update the index set and the matrix of selected columns:
Λ_t ← Λ_{t−1} ∪ {λ_t}, H_t ← [H_{t−1} h_{λ_t}].
Step 4. Solve an LS problem to obtain a new estimate:

β_t ← argmin_β ||H_t β − Y||^2.
Step 5. Determine the new values (approximation and residual):

a_t = H_t β_t
r_t ← Y − a_t
Step 6. t ← t + 1, and return to Step 2 if t ≤ m.

In Table 1, steps 4 and 5 show the conceptual structure of OMP. It is essential that the residual r_t is always orthogonal to the columns of H_t; therefore, a new atom is guaranteed to be picked at each step, and H_t has full column rank. The dominant cost of OMP lies in step 2 of Table 1, whose complexity is O(mMN). In addition, at iteration t, an O(tM) cost is incurred by the solution of the LS problem.
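The following is a plain NumPy sketch of the OMP iteration in Table 1 (our code, not the authors'): at each step it picks the column of H most correlated with the residual, re-solves the LS problem on the selected columns, and updates the residual.

```python
import numpy as np

def omp(H, y, m):
    """Orthogonal matching pursuit: m-sparse beta approximately solving y = H beta."""
    M, N = H.shape
    support = []                                   # Lambda_t: indices of selected columns
    residual = y.copy()                            # r_0 <- Y
    beta = np.zeros(N)
    for _ in range(m):
        correlations = np.abs(H.T @ residual)      # step 2: |<r_{t-1}, h_j>|
        correlations[support] = 0.0                # never re-select a column
        support.append(int(np.argmax(correlations)))
        coef, *_ = np.linalg.lstsq(H[:, support], y, rcond=None)  # step 4: LS on H_t
        residual = y - H[:, support] @ coef        # step 5: r_t <- Y - a_t
    beta[support] = coef
    return beta, support
```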

3 Proposed OMP-ELM

As mentioned in the description of OMP given in Section 2.3, the OMP algorithm only needs the sparsity level m to be determined. This parameter specifies how many hidden nodes are actively used to model the input-output relation, because the proposed OMP-ELM scheme generates more hidden nodes than necessary at the beginning. Following the scheme in Reference [6], a sequence of K values of m, decreasing from m_max to m_min on a logarithmic scale, is considered. Consequently, K models are obtained, one for each sparsity level. To select the best model, we employ the root mean square error (RMSE) criterion, which is given as

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{M} (y_i - \hat{y}_i)^2}{M}}, \quad (11)$$

where $y_i$ is the actual value and $\hat{y}_i$ is the value predicted by the proposed OMP-ELM scheme.

The OMP-ELM algorithm is given as follows:

  • Step 1: Initialize the parameters w and b of the function f given in Eq. (1).

  • Step 2: Select the number of hidden nodes N and the sparsity level K.

  • Step 3: Calculate the H matrix.

  • Step 4: Apply the OMP algorithm to obtain the β coefficients.

  • Step 5: Calculate the RMSE for each model corresponding to each sparsity level K.

  • Step 6: Select the β coefficients of the model with the minimum RMSE value.
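Putting these steps together, the following is a minimal end-to-end sketch of OMP-ELM under our reading of the algorithm. It reuses the `omp` function from the earlier sketch; the function names, the sigmoid activation, and the use of training RMSE for model selection are our assumptions, not the authors' code.

```python
import numpy as np

def rmse(y, y_hat):
    return np.sqrt(np.mean((y - y_hat) ** 2))                   # Eq. (11)

def omp_elm_fit(X, y, n_hidden, sparsity_levels, seed=0):
    """Steps 1-6: build H once, run OMP for each candidate sparsity, keep the best beta."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))             # step 1: random w and b
    b = rng.standard_normal(n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))                      # step 3: hidden layer matrix
    best = None
    for m in sparsity_levels:                                   # steps 4-5
        beta, _ = omp(H, y, m)                                  # OMP from the sketch above
        err = rmse(y, H @ beta)                                 # a validation set could be used here
        if best is None or err < best[0]:
            best = (err, beta)                                  # step 6: minimum-RMSE model
    return W, b, best[1]
```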

4 Experimental Studies

The data sets are collected from the University of California at Irvine Machine Learning Repository [5]. The related information about the data sets is given in Table 2.

Table 2

Information about the Selected Data Sets.

Data Set             No. of Attributes   Training Samples   Test Samples
Abalone              8                   2784               1393
Ailerons              40                  9166               4584
Auto Price           15                  106                53
Auto MPG             7                   265                133
Boston Housing       13                  337                169
California Housing   8                   13,760             6880
Delta Ailerons       5                   4752               2377
Delta Elevator       6                   6344               3173
Triazines            60                  106                53

A normalization procedure is employed before applying the proposed OMP-ELM method (the z-score function in Matlab) [12]. The z-score measures the distance of a data point from the mean in terms of the standard deviation. For a random variable X with mean μ and standard deviation σ, the z-score of a value x is calculated as

$$z = \frac{x - \mu}{\sigma}. \quad (12)$$

For the sake of comparison with the other methods, the training and test data sets are constructed in the same way. For all data sets, 10 different random permutations are considered; for each permutation, two-thirds of the samples are used for the training data set and the remaining one-third for the test data set. Moreover, to optimally determine the sparsity level K of the OMP algorithm, a two-step sparsity level search is used. In the first step, a coarse search mechanism is employed in which a sequence of K values is chosen for each data set, as follows. For the Triazines and Auto Price data sets, K is changed from 5 to 50 in increments of 5. For the Delta Elevator, Auto MPG, Abalone, and Boston Housing data sets, K is varied in the range of 20 to 200 in increments of 20, and for Delta Ailerons and the remaining data sets, K is in the range of 100 to 500 in increments of 50. For each K value, the RMSE is calculated as a performance criterion for assessing the suitability of K. The K value that yields the minimum RMSE is chosen as the optimum and passed to the next step. In the second step, a fine search mechanism is employed in which values from K − 10 to K + 10 in increments of 1 are considered; again, the RMSE is calculated for each K, and the optimal K value is the one with the minimum RMSE. In the experiments, we used 100 initializations for each data set, and the mean RMSE and its standard deviation were computed. The results presented in Table 3 are hence the average of the 100 repetitions for each data set. Table 3 also presents the standard deviation of the obtained results for each data set.
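The two-step coarse-then-fine search over the sparsity level K described above can be sketched as follows; the grids mirror those reported in the text, while the callable interface and function name are our own illustrative choices.

```python
def two_step_sparsity_search(score, coarse_grid):
    """score(K) -> validation RMSE; coarse grid first, then K-10..K+10 in steps of 1."""
    K_coarse = min(coarse_grid, key=score)                    # step 1: coarse search
    fine_grid = range(max(1, K_coarse - 10), K_coarse + 11)   # step 2: fine search around it
    return min(fine_grid, key=score)

# Illustrative coarse grids from the text:
#   Triazines / Auto Price:                              range(5, 55, 5)
#   Delta Elevator, Auto MPG, Abalone, Boston Housing:   range(20, 220, 20)
#   Delta Ailerons and the remaining data sets:          range(100, 550, 50)
```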

Table 3

Mean Value and Standard Deviation of RMSE, CC, and MAE for the Proposed Method.

Data Set             Hidden Nodes   Mean RMSE   Mean RMSE σ   Mean CC   Mean CC σ   Mean MAE   Mean MAE σ
Abalone              200 (38)       0.0743      5.27e-4       0.7425    0.0049      0.0533     3e-4
Auto Price           100 (16)       0.0720      0.0082        0.8753    0.0275      0.0554     59e-4
Ailerons             1000 (72)      0.1833      4.099e-4      0.5782    0.0030      0.1419     4e-4
Boston Housing       200 (40)       0.0898      0.0081        0.8858    0.0169      0.0632     36e-4
California Housing   1000 (150)     0.1160      6.884e-4      0.8457    0.0033      0.0883     9e-4
Delta Elevator       200 (33)       0.1084      2.71e-4       0.8011    0.0011      0.0821     2e-4
Delta Ailerons       200 (100)      0.1842      0.0014        0.8503    0.0029      0.1413     14e-4
Servo                100 (20)       0.1056      0.0087        0.8913    0.0162      0.0729     7e-3
Triazines            100 (16)       0.1774      0.0141        0.3526    0.0646      0.1252     67e-4

The Hidden Nodes column reports the maximum number of hidden nodes tested for the proposed method, with the optimal number of nodes selected by OMP-ELM shown within parentheses.

As one can see from Table 3, the minimum mean RMSE value was obtained for the Auto Price data set. Although the Auto Price experiments started with 100 hidden nodes, the smallest number of selected hidden nodes (16) was obtained for this data set. For the Triazines data set, we also obtained the minimum number of hidden nodes; however, the corresponding performance (mean RMSE) was not as good as for the Auto Price data set. The worst results were obtained for the Ailerons, Delta Ailerons, and Triazines data sets, where the calculated mean RMSE values were >0.17. An important observation from Table 3 is that the obtained standard deviations are considerably low, which means that the proposed method yielded similar performances in each trial; this shows the robustness of the proposed OMP-ELM method. Furthermore, the final numbers of hidden nodes tabulated in Table 3 (shown within parentheses) were lower than the initially assigned numbers of hidden nodes, which yields a reduction in complexity compared with the standard ELM method.

For further evaluation of the proposed method, we used the correlation coefficient (CC) and mean absolute error (MAE) criteria [4]. The related results are also tabulated in Table 3; these performance criteria produced results consistent with the RMSE. We also conducted a comparative experiment in which the standard ELM, elastic net, and lasso methods were considered, using the Abalone, Auto Price, Ailerons, Boston Housing, Delta Elevator, and California Housing data sets. The related experimental results are tabulated in Table 4. It should be noted that the experimental results for the ELM, elastic net, and lasso were taken from Reference [11] and are reproduced here for comparison purposes. The results in Table 4 indicate that the proposed OMP-ELM method yielded better results than the standard ELM, elastic net, and lasso methods. In particular, there are great improvements for the Abalone, Auto Price, and Boston Housing data sets in terms of mean RMSE. For the other data sets, our proposal also yielded better mean RMSE values than the standard ELM and the other regularized schemes. In addition, the standard deviations of the proposed method were lower than those of the compared methods; in other words, the proposed method was more robust. Finally, when the final numbers of hidden nodes are compared, our proposal has no obvious superiority.

Table 4

Comparison of the Proposed Method with ANN, Standard ELM, Elastic Net, and Lasso.

Method        Metric     Abalone      Auto Price   Ailerons       Boston Housing   Delta Elevator   California Housing
ANN           M. RMSE    0.1382       0.4257       0.1869         0.1178           0.1109           0.1057
              σ          0.0506       0.1113       0.0038         0.0340           0.0007           0.0018
              H. nodes   40           20           30             10               20               20
ELM           M. RMSE    0.6558       0.4051       0.4527         0.4867           0.6036           0.5071
              σ          0.007        0.041        0.004          0.052            0.002            0.004
              H. nodes   40           20           600            80               100              400
Elastic net   M. RMSE    0.6518       1.036        0.4476         0.4587           0.6045           0.5114
              σ          0.004        0.019        0.005          0.033            0.002            0.004
              H. nodes   200 (26.1)   50 (1.1)     1000 (146.4)   200 (39.8)       200 (25.4)       1000 (153.3)
Lasso         M. RMSE    0.6499       1.0281       0.4471         0.4465           0.6057           0.5078
              σ          0.003        0.023        0.004          0.031            0.002            0.004
              H. nodes   200 (21.6)   50 (2.1)     1000 (142)     200 (41.4)       200 (16.8)       1000 (149.1)
OMP-ELM       M. RMSE    0.0743       0.0720       0.1833         0.0898           0.1084           0.1160
              σ          5.27e-4      0.0082       4.099e-4       0.0081           2.71e-4          6.884e-4
              H. nodes   200 (38)     200 (16)     1000 (72)      200 (40)         200 (33)         1000 (150)

We finally conducted an experiment with an artificial neural network (ANN) structure for comparison purposes. A three-layer ANN structure was considered, with various numbers of neurons in the hidden layer for each data set; the hidden layer sizes are given in Table 4. The tangent sigmoid activation function was used in the hidden layer and a linear activation function in the output layer, and the ANN was trained with the Levenberg-Marquardt back-propagation algorithm. According to the obtained RMSE values, our proposal gave better results than the ANN for the Abalone, Auto Price, Ailerons, Boston Housing, and Delta Elevator data sets; the ANN outperformed OMP-ELM only on the California Housing data set.

5 Conclusions

This article proposed an OMP-based regularized ELM algorithm. More hidden node neurons than necessary are considered at the beginning, and the OMP procedure is then employed to obtain a sparse output weight vector; in other words, irrelevant nodes are pruned by obtaining the sparse output weight vector. In the implementation of the OMP-ELM procedure, we considered a two-step search mechanism for determining the sparsity level, in which a fine search follows a coarse search. Several experiments were conducted to show the performance of the proposed method, and the method was compared with the original ELM and with regularized ELM schemes such as the elastic net and lasso. The regularized ELM algorithms obtained more compact network architectures, but their generalization performance was similar to that of the standard ELM. Our proposed method both provided a compact network and showed better performance than the standard ELM and the regularized ELM schemes. In summary, the proposed OMP-ELM offers a rapid solution with an optimum number of hidden nodes, and its accuracy is better than those of the standard ELM and the regularized ELM methods.


Corresponding author: Abdulkadir Sengur, Technology Faculty, Department of Electric and Electronics Engineering, Firat University, Elazig 23119, Turkey, e-mail:

Acknowledgments

The authors would like to thank the anonymous reviewers for their valuable and fruitful comments, which greatly improved the revised manuscript.

Bibliography

[1] J. S. Armstrong, Illusions in regression analysis, Int. J. Forecast. 28 (2012), 689–694. doi:10.1016/j.ijforecast.2012.02.001.

[2] W. Deng, Q. Zheng and L. Chen, Regularized extreme learning machine, in: Proceedings of IEEE Symposium on Computational Intelligence and Data Mining, pp. 389–395, 2009. doi:10.1109/CIDM.2009.4938676.

[3] B. Efron, T. Hastie, I. Johnstone and R. Tibshirani, Least angle regression, Ann. Statist. 32 (2004), 407–499. doi:10.1214/009053604000000067.

[4] H. Esen, F. Ozgen, M. Esen and A. Sengur, Artificial neural network and wavelet neural network approaches for modelling of a solar air heater, Expert Syst. Appl. 36 (2009), 11240–11248. doi:10.1016/j.eswa.2009.02.073.

[5] A. Frank and A. Asuncion, UCI Machine Learning Repository, 2011, http://archive.ics.uci.edu/ml (accessed 4 August 2013).

[6] J. Friedman, T. Hastie and R. Tibshirani, Regularization paths for generalized linear models via coordinate descent, J. Stat. Softw. 33 (2010), 1–22. doi:10.18637/jss.v033.i01.

[7] G.-B. Huang, Q.-Y. Zhu and C.-K. Siew, Extreme learning machine: theory and applications, Neurocomputing 70 (2006), 489–501. doi:10.1016/j.neucom.2005.12.126.

[8] G.-B. Huang, D.-H. Wang and Y. Lan, Extreme learning machines: a survey, Int. J. Mach. Learn. Cyber. 2 (2011), 107–122. doi:10.1007/s13042-011-0019-y.

[9] G.-B. Huang, H. Zhou, X. Ding and R. Zhang, Extreme learning machine for regression and multi-class classification, IEEE Trans. Syst. Man Cyber. Part B 42 (2012), 513–529. doi:10.1109/TSMCB.2011.2168604.

[10] S. Kunis and H. Rauhut, Random sampling of sparse trigonometric polynomials II – orthogonal matching pursuit versus basis pursuit, Found. Comput. Math. 8 (2008), 737–763. doi:10.1007/s10208-007-9005-x.

[11] J. M. Martinez-Martinez, P. Escandell-Montero, E. S. Olivas, J. D. Martin-Guerrero, R. Magdalena-Benedito and J. Gomez-Sanchis, Regularized extreme learning machine for regression problems, Neurocomputing 74 (2011), 3716–3721. doi:10.1016/j.neucom.2011.06.013.

[12] MATLAB version 7.10.0, The MathWorks Inc., Natick, MA.

[13] Y. Miche, P. Bas, C. Jutten, O. Simula and A. Lendasse, A methodology for building regression models using extreme learning machine: OP-ELM, in: Proceedings of the 16th European Symposium on Artificial Neural Networks, Bruges, Belgium, pp. 1–6, 2008.

[14] Y. Miche, A. Sorjamaa, P. Bas, O. Simula, C. Jutten and A. Lendasse, OP-ELM: optimally pruned extreme learning machine, IEEE Trans. Neural Netw. 21 (2010), 158–162. doi:10.1109/TNN.2009.2036259.

[15] Y. Miche, M. Heeswijk, P. Bas, O. Simula and A. Lendasse, TROP-ELM: a double-regularized ELM using LARS and Tikhonov regularization, Neurocomputing 74 (2011), 2413–2421. doi:10.1016/j.neucom.2010.12.042.

[16] L.-C. Shi and B.-L. Lu, EEG-based vigilance estimation using extreme learning machines, Neurocomputing 102 (2013), 135–143. doi:10.1016/j.neucom.2012.02.041.

[17] J. Tropp and A. Gilbert, Signal recovery from partial information via orthogonal matching pursuit, IEEE Trans. Inf. Theory 53 (2007), 4655–4666. doi:10.1109/TIT.2007.909108.

[18] H. Zou and T. Hastie, Regularization and variable selection via the elastic net, J. R. Stat. Soc. Ser. B 67 (2005), 301–320. doi:10.1111/j.1467-9868.2005.00503.x.

Received: 2014-5-12
Published Online: 2014-8-21
Published in Print: 2015-3-1

©2015 by De Gruyter

This article is distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
