Abstract

In this study, two modified gradient descent (GD) algorithms are proposed for time-delayed models. To estimate the parameters and the time delay simultaneously, a redundant rule method is introduced, which turns the time-delayed model into an augmented model; the two GD algorithms can then be used to identify the time-delayed model. Compared with the traditional GD algorithm, the two modified GD algorithms have the following advantages: (1) they avoid a high-order matrix eigenvalue calculation and thus are more efficient for large-scale systems; (2) they have faster convergence rates and therefore are more practical in engineering. Convergence analysis and simulation examples are presented to illustrate the efficiency of the two algorithms.

1. Introduction

System identification plays an important role in control theory and applications [1–3]. Once the model of a dynamic system is established, one can design robust controllers for it and predict its future dynamics. Many identification algorithms exist, for example, the least squares (LS) algorithm [4, 5], the gradient descent (GD) algorithm [6, 7], and the particle swarm optimization (PSO) algorithm [8, 9]. When the considered model has a high order, the LS algorithm and the PSO algorithm are inefficient because of their heavy computational effort [10–12]. The GD algorithm has a low computational cost but a slow convergence rate [13, 14]. To increase the convergence rate of the GD algorithm, two approaches are usually taken: (1) design a more suitable search direction [15–17]; (2) calculate a better step size [18, 19]. In [20], the best step size of a GD algorithm is given, which involves an eigenvalue calculation. For a high-order matrix, computing the eigenvalues is challenging. To deal with this problem, many suboptimal step-size selection methods have been developed, for example, the stochastic GD algorithm [21, 22], the forgetting factor GD algorithm [18, 19], the projection algorithm, and the steepest GD algorithm [23, 24]. Although these algorithms can increase the convergence rate, they are all sensitive to the considered model; that is, one should design different step sizes for different kinds of models.

Time delays are common in engineering practice. The data of a dynamic system are usually collected by a sensor and then transmitted via a communication channel, and they may encounter a time delay due to network congestion [25, 26]. For time-delayed model identification, Chen proposed a redundant rule-based off-line algorithm that can estimate the parameters and the time delay simultaneously [27]. Since the off-line algorithm cannot update the parameters with newly arrived data, Zhang et al. developed a redundant rule-based recursive LS (RLS) algorithm for bilinear time-delayed systems; the RLS algorithm is an online algorithm [28]. This paper focuses on time-delayed model identification and aims to develop novel identification algorithms with fast convergence rates and low computational cost.

Inspired by the PSO algorithm and the power method [29], we propose two modified GD algorithms for time-delayed models: one is an exhaustive search method that chooses the step size based on the PSO idea, and the other is a power-based GD algorithm that computes the step size using the power method. Both algorithms can obtain a better step size in each iteration without an explicit eigenvalue decomposition; therefore, they have faster convergence rates than the traditional GD algorithm.

This paper is organized as follows: Section 2 describes the time-delayed model and the traditional LS and GD algorithms. In Section 3, two modified GD algorithms are proposed. Section 4 proves the convergence properties of the two algorithms. Section 5 gives two simulation examples. Finally, conclusions are presented in Section 6.

2. Problem Statement

First, some notation is introduced: $I$ denotes an identity matrix of appropriate size; $\|X\|$ denotes the norm of a matrix $X$, defined by $\|X\|^2 = \mathrm{tr}[XX^{\mathrm T}]$; $\rho[X]$ stands for the spectral radius of the matrix $X$; the superscript $\mathrm T$ is defined as the matrix transpose; $\lambda_{\max}[X]$ and $\lambda_{\min}[X]$ denote the maximum and minimum eigenvalues of a matrix $X$, respectively.

2.1. Time-Delayed Model

Consider the following time-delayed model,
$$y(t) = -\sum_{i=1}^{n_a} a_i\,y(t-i) + \sum_{j=1}^{n_b} b_j\,u(t-\tau-j) + v(t),$$
where $y(t)$ and $u(t)$ are the output and input, respectively; $v(t)$ is a Gaussian white noise and satisfies $\mathrm E[v(t)] = 0$, $\mathrm E[v^2(t)] = \sigma^2$; $a_i$ and $b_j$ are the unknown parameters that need to be estimated; $\tau$ is an unknown time delay.

Since the time delay $\tau$ is unknown, the corresponding information vector of the model is unavailable, which makes the traditional GD algorithm inapplicable. To deal with this dilemma, we use the redundant rule method. Assume that the upper bound of the time delay is $M$; this assumption is rational and feasible. For example, when using the RIP protocol in a network, the maximum hop count is 16.

Rewrite the time-delayed model as follows:
$$y(t) = -\sum_{i=1}^{n_a} a_i\,y(t-i) + \sum_{j=1}^{n_b+M} \bar b_j\,u(t-j) + v(t).$$

Define the parameter vector and the information vector as
$$\bar\theta = [a_1,\ldots,a_{n_a},\ \bar b_1,\ldots,\bar b_{n_b+M}]^{\mathrm T} \in \mathbb R^{n}, \quad n = n_a + n_b + M,$$
$$\bar\varphi(t) = [-y(t-1),\ldots,-y(t-n_a),\ u(t-1),\ldots,u(t-n_b-M)]^{\mathrm T} \in \mathbb R^{n}.$$

The input-channel part of the augmented parameter vector is decomposed into the following three parts:
$$[\bar b_1,\ldots,\bar b_{n_b+M}]^{\mathrm T} = [\underbrace{0,\ldots,0}_{\tau},\ b_1,\ldots,b_{n_b},\ \underbrace{0,\ldots,0}_{M-\tau}]^{\mathrm T}.$$

Remark 1. Since the delayed inputs corresponding to the two redundant parts play no role in the output, the redundant parts $[\bar b_1,\ldots,\bar b_{\tau}]$ and $[\bar b_{\tau+n_b+1},\ldots,\bar b_{n_b+M}]$ are both zero vectors. If the parameter estimates of $\bar\theta$ converge to the true values, the two redundant parts equal zero vectors, and we can then obtain the time-delay estimate based on this special structure.
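To make the redundant rule concrete, the following sketch builds the augmented information vector $\bar\varphi(t)$; the stacking of past outputs and $n_b + M$ past inputs follows the decomposition above, while the function name and interface are our own illustrative choices, not from the paper.

    import numpy as np

    # Illustrative sketch of the redundant rule: the information vector stacks
    # n_a past outputs (with negative sign) and n_b + M past inputs, so the
    # estimated input coefficients reveal the delay through their leading zeros.
    def augmented_phi(y, u, t, na, nb, M):
        past_y = [-y[t - i] for i in range(1, na + 1)]
        past_u = [u[t - j] for j in range(1, nb + M + 1)]
        return np.array(past_y + past_u)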

2.2. LS and GD Algorithms

Rewrite the augmented model of the time-delayed model as
$$y(t) = \bar\varphi^{\mathrm T}(t)\,\bar\theta + v(t).$$

Collect $L$ sets of input and output data and define
$$Y = [y(1),\ldots,y(L)]^{\mathrm T} \in \mathbb R^{L}, \quad \Phi = [\bar\varphi(1),\ldots,\bar\varphi(L)]^{\mathrm T} \in \mathbb R^{L\times n}, \quad V = [v(1),\ldots,v(L)]^{\mathrm T} \in \mathbb R^{L}.$$

It gives rise to
$$Y = \Phi\,\bar\theta + V.$$

Define the cost function as follows:
$$J(\theta) = \|Y - \Phi\,\theta\|^2.$$

Using the LS algorithm to estimate the parameters, it follows that
$$\hat\theta_{\mathrm{LS}} = (\Phi^{\mathrm T}\Phi)^{-1}\Phi^{\mathrm T} Y.$$

The LS algorithm must perform a matrix inversion, which may lead to heavy computational effort, especially for large-scale systems, i.e., when $n$ is large.
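As a minimal sketch, the LS estimate above takes a few lines of numpy; solving the normal equations with a linear solver (rather than forming the explicit inverse) is a standard way to soften the cost and conditioning issues just mentioned.

    import numpy as np

    def ls_estimate(Phi, Y):
        # theta_hat = (Phi^T Phi)^{-1} Phi^T Y, via a linear solve instead of
        # an explicit matrix inverse for numerical stability
        return np.linalg.solve(Phi.T @ Phi, Phi.T @ Y)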

To avoid the matrix inverse calculation, the traditional GD (T-GD) algorithm is introduced [20],
$$\hat\theta_{k+1} = \hat\theta_k + \mu\,\Phi^{\mathrm T}(Y - \Phi\,\hat\theta_k), \quad 0 < \mu < \frac{2}{\lambda_{\max}[\Phi^{\mathrm T}\Phi]}.$$

The T-GD algorithm does not need to compute the inverse of the information matrix $\Phi^{\mathrm T}\Phi$, but it requires calculating the eigenvalues of this matrix to choose a suitable step size that keeps the T-GD algorithm convergent. When $\Phi^{\mathrm T}\Phi$ has a high order, computing its eigenvalues is also a challenging problem.
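A sketch of the T-GD iteration under the step-size condition above; note that it still calls an eigenvalue routine to set $\mu$, which is precisely the computation the next section avoids. The iteration count and initial vector are illustrative choices.

    import numpy as np

    def t_gd(Phi, Y, iters=200):
        A = Phi.T @ Phi
        mu = 1.0 / np.linalg.eigvalsh(A).max()   # safe choice: mu < 2 / lambda_max[A]
        theta = np.ones(Phi.shape[1])            # initial estimate: all-ones vector
        for _ in range(iters):
            theta = theta + mu * Phi.T @ (Y - Phi @ theta)
        return theta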

3. Two Modified GD Algorithms

In this section, two modified GD algorithms are developed which aim to avoid eigenvalue calculation and to increase the convergence rate.

3.1. Exhaustive Search-Based GD Algorithm

The PSO algorithm is an intelligent search algorithm: it first assigns many particles (initial parameter estimates) and then computes the personal best estimates and the global best estimate in each iteration [30, 31]. The larger the number of particles, the more easily the estimates reach the true values. Inspired by the PSO algorithm, an exhaustive search-based GD algorithm is developed in this subsection. Its basic idea is to assign several candidate step sizes for the negative gradient direction in each iteration and to keep the step size whose cost function is smallest.

Assume that the parameter estimate in the $k$-th iteration is $\hat\theta_k$; the parameter estimate in the $(k+1)$-th iteration is computed by
$$\hat\theta_{k+1} = \hat\theta_k + \mu_k\,\Phi^{\mathrm T}(Y - \Phi\,\hat\theta_k).$$

If we assign a random step size for the above GD algorithm, we find that (1) a small step size leads to a slow convergence rate, and (2) a large step size may lead to divergence of the GD algorithm. To choose a suitable step size and to avoid the eigenvalue calculation, we assign $m$ candidate step sizes for the GD algorithm in each iteration.

Define an interval in the $k$-th iteration as
$$\Omega_k = (0,\ \bar\mu_k].$$

Choose $m$ uniformly distributed terms in $\Omega_k$, that is,
$$\mu_k^i = \frac{i}{m}\,\bar\mu_k, \quad i = 1,\ldots,m.$$

Based on the $m$ step sizes, we have $m$ parameter estimates, that is,
$$\hat\theta_{k+1}^i = \hat\theta_k + \mu_k^i\,\Phi^{\mathrm T}(Y - \Phi\,\hat\theta_k), \quad i = 1,\ldots,m.$$

Among the $m$ parameter estimates, we choose the best one. Once the $m$ parameter estimates in iteration $k+1$ have been obtained, the corresponding cost functions are computed by
$$J(\hat\theta_{k+1}^i) = \|Y - \Phi\,\hat\theta_{k+1}^i\|^2, \quad i = 1,\ldots,m.$$

Let
$$\hat\theta_{k+1} = \arg\min_{\hat\theta_{k+1}^i,\ i=1,\ldots,m} J(\hat\theta_{k+1}^i).$$

That is, the estimate with the smallest cost function is taken as the parameter estimate in iteration $k+1$.

Then, the steps of the exhaustive search-based GD (ES-GD) algorithm are listed as follows:

Initialise $k = 0$ and $\hat\theta_0 = \mathbf 1_n$, where $\mathbf 1_n$ is a vector whose entries all equal 1
Collect measurable data $Y$ and $\Phi$
Assign the value for $m$
repeat
  Assign the interval $\Omega_k = (0,\ \bar\mu_k]$
  for $i = 1, \ldots, m$, do
    Choose $\mu_k^i = \frac{i}{m}\,\bar\mu_k$
    Update $\hat\theta_{k+1}^i = \hat\theta_k + \mu_k^i\,\Phi^{\mathrm T}(Y - \Phi\,\hat\theta_k)$
    Compute $J(\hat\theta_{k+1}^i) = \|Y - \Phi\,\hat\theta_{k+1}^i\|^2$
  end
  Compare the costs and choose $i^* = \arg\min_i J(\hat\theta_{k+1}^i)$
  Let $\hat\theta_{k+1} = \hat\theta_{k+1}^{i^*}$ and $k = k + 1$
until convergence
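The following Python sketch mirrors the listing above. The initial bound $\bar\mu_0$ and the doubling/halving rule for adjusting it are one simple realisation of the interval adjustment discussed in Remark 2, not a prescription from the paper.

    import numpy as np

    def es_gd(Phi, Y, m=20, mu_bar=1e-3, iters=100):
        theta = np.ones(Phi.shape[1])
        cost = lambda th: np.sum((Y - Phi @ th) ** 2)
        for _ in range(iters):
            grad = Phi.T @ (Y - Phi @ theta)            # negative gradient direction
            mus = mu_bar * np.arange(1, m + 1) / m      # m uniform step sizes in (0, mu_bar]
            cands = [theta + mu * grad for mu in mus]
            costs = [cost(th) for th in cands]
            best = int(np.argmin(costs))
            if costs[best] < cost(theta):
                theta = cands[best]
                if best == m - 1:                       # best candidate on the boundary:
                    mu_bar *= 2.0                       # enlarge the interval (Remark 2)
            else:
                mu_bar /= 2.0                           # all candidates worse: shrink it
        return theta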

Remark 2. As in the PSO algorithm, a larger $m$ can lead to a more accurate parameter estimate $\hat\theta_{k+1}$. However, two problems arise from a poorly chosen interval bound $\bar\mu_k$: (1) if $\bar\mu_k$ is small, all the candidate step sizes make $J(\hat\theta_{k+1}^i) < J(\hat\theta_k)$; in this case, the chosen step size is quite small, and we then assign a larger bound $\bar\mu_{k+1} > \bar\mu_k$; (2) if $\bar\mu_k$ is too large, all the candidate step sizes lead to $J(\hat\theta_{k+1}^i) > J(\hat\theta_k)$; in this case, we should assign a smaller bound $\bar\mu_{k+1} < \bar\mu_k$ to keep the ES-GD algorithm convergent.

Remark 3. The ES-GD algorithm uses the exhaustive search method to choose the step size; the “best” step size in each iteration is better than a randomly chosen one. However, this “best” step size is only the best among the $m$ candidates, not the true optimum. In addition, a larger $m$ can bring the “best” step size closer to the true one, but a larger $m$ also leads to heavier computational effort.

3.2. Power-Based GD Algorithm

In [20], the authors have shown that the best step size for the cost function $J(\theta) = \|Y - \Phi\,\theta\|^2$ is
$$\mu^* = \frac{2}{\lambda_{\max}[\Phi^{\mathrm T}\Phi] + \lambda_{\min}[\Phi^{\mathrm T}\Phi]}.$$

Therefore, to get the actual best step size $\mu^*$, one should compute both the maximum and minimum eigenvalues of the information matrix $\Phi^{\mathrm T}\Phi$.

Since the eigenvalues of a high-order matrix are difficult to compute directly, we next introduce the power method, which obtains the maximum eigenvalue of a matrix iteratively.

For simplicity, let
$$A = \Phi^{\mathrm T}\Phi \in \mathbb R^{n\times n}.$$

Assign an initial non-zero vector $x_0 \in \mathbb R^{n}$, and use the following iteration to generate a sequence $\{x_k\}$,
$$x_{k+1} = A\,x_k, \quad k = 0, 1, 2, \ldots$$

Let
$$\lambda^{(k)} = \frac{\|x_{k+1}\|}{\|x_k\|}.$$

The following lemma is obtained.

Lemma 1. For a symmetric positive definite (SPD) matrix $A$, let the sequence $\{x_k\}$ be computed by the iteration above. Then, the maximum eigenvalue of $A$ is given by
$$\lambda_{\max}[A] = \lim_{k\to\infty} \lambda^{(k)} = \lim_{k\to\infty} \frac{\|x_{k+1}\|}{\|x_k\|}.$$

Proof. Since $A$ is SPD, it has $n$ positive eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$, and their corresponding eigenvectors $v_1, v_2, \ldots, v_n$ are linearly independent. There exist constants $c_1, c_2, \ldots, c_n$, not all equal to zero, such that the initial vector can be written as
$$x_0 = c_1 v_1 + c_2 v_2 + \cdots + c_n v_n.$$
Without loss of generality, assume that the eigenvalues of $A$ satisfy
$$\lambda_1 > \lambda_2 \ge \cdots \ge \lambda_n > 0, \quad c_1 \ne 0.$$
Based on the iteration $x_{k+1} = A\,x_k$, it gives rise to
$$x_k = A^k x_0 = \sum_{i=1}^{n} c_i \lambda_i^k v_i.$$
It follows that
$$x_k = \lambda_1^k \Big[c_1 v_1 + \sum_{i=2}^{n} c_i \Big(\frac{\lambda_i}{\lambda_1}\Big)^k v_i\Big].$$
Since $|\lambda_i/\lambda_1| < 1$ for $i \ge 2$, when $k \to \infty$, we have
$$\Big(\frac{\lambda_i}{\lambda_1}\Big)^k \to 0, \quad i = 2, \ldots, n.$$
Then, $x_k$ can be rewritten as
$$x_k \approx \lambda_1^k c_1 v_1.$$
Therefore, we can get that
$$\lim_{k\to\infty} \frac{\|x_{k+1}\|}{\|x_k\|} = \lambda_1 = \lambda_{\max}[A].$$
The proof is completed.
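Lemma 1 translates directly into code; in the sketch below the iterate is rescaled each step to avoid overflow, which does not change the ratio estimate. The tolerance and iteration cap are illustrative.

    import numpy as np

    def power_max_eig(A, iters=500, tol=1e-10):
        # Power method: lambda_max[A] = lim ||x_{k+1}|| / ||x_k|| for SPD A
        x = np.ones(A.shape[0])               # non-zero initial vector
        x /= np.linalg.norm(x)
        lam = 0.0
        for _ in range(iters):
            x_new = A @ x
            lam_new = np.linalg.norm(x_new)   # equals ||x_{k+1}|| / ||x_k|| since ||x|| = 1
            x = x_new / lam_new               # rescale to unit norm
            if abs(lam_new - lam) < tol:
                break
            lam = lam_new
        return lam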
The power method can only get the maximum eigenvalue of $A$. However, to get the best step size, one should also compute the minimum eigenvalue of $A$. Next, we introduce an effective method to compute the minimum eigenvalue of $A$.
Once the maximum eigenvalue of $A$ is obtained, we assign a new term as follows:
$$\beta = \lambda_{\max}[A] + \epsilon,$$
where $\epsilon$ is a positive constant which is chosen on a case-by-case basis.
Define a new matrix
$$B = \beta I - A.$$
Then, the following lemma can be obtained.

Lemma 2. For a symmetric positive definite matrix $A$, let its eigenvalues be $\lambda_1, \lambda_2, \ldots, \lambda_n$, and let the matrix $B$ be defined as above. Then, the eigenvalues of the matrix $B$ are
$$\beta - \lambda_1,\ \beta - \lambda_2,\ \ldots,\ \beta - \lambda_n.$$

Proof. For an SPD matrix $A$, there exists a nonsingular matrix $P$ which can guarantee
$$P^{-1} A P = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n).$$
Then, the matrix $B$ is written as
$$P^{-1} B P = P^{-1}(\beta I - A)P = \beta I - \mathrm{diag}(\lambda_1,\ldots,\lambda_n) = \mathrm{diag}(\beta-\lambda_1, \ldots, \beta-\lambda_n).$$
Since $\beta = \lambda_{\max}[A] + \epsilon$ and $\epsilon > 0$, we have $\beta - \lambda_i > 0$, $i = 1, \ldots, n$; thus $B$ is also SPD, and its eigenvalues are $\beta - \lambda_i$. For the SPD matrix $B$, the power method can obtain its maximum eigenvalue $\lambda_{\max}[B] = \beta - \lambda_{\min}[A]$, and then the minimum eigenvalue of the matrix $A$ can be computed by
$$\lambda_{\min}[A] = \beta - \lambda_{\max}[B].$$
When the maximum and minimum eigenvalues of the matrix $A$ are obtained, we can get the best step size $\mu^*$.
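Lemma 2 then yields $\lambda_{\min}[A]$ with one more power iteration on the shifted matrix; this sketch reuses power_max_eig from the previous sketch, and the default $\epsilon$ is an arbitrary illustrative value (see Remark 5 for the trade-off in choosing it).

    def min_eig_via_shift(A, eps=1e-3):
        lam_max = power_max_eig(A)             # lambda_max[A] by the power method
        beta = lam_max + eps                   # beta > lambda_max keeps B positive definite
        B = beta * np.eye(A.shape[0]) - A      # eigenvalues of B are beta - lambda_i
        return beta - power_max_eig(B)         # lambda_min[A] = beta - lambda_max[B]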
The steps of the power-based GD (P-GD) algorithm are listed as follows:

Initialise $k = 0$ and $\hat\theta_0 = \mathbf 1_n$, where $\mathbf 1_n$ is a vector whose entries all equal 1
Collect measurable data $Y$ and $\Phi$
Use the power method to compute $\lambda_{\max}[A]$, where $A = \Phi^{\mathrm T}\Phi$
Assign a positive constant $\epsilon$ based on $\lambda_{\max}[A]$ and let $\beta = \lambda_{\max}[A] + \epsilon$
Construct an SPD matrix $B = \beta I - A$
Use the power method to compute the maximum eigenvalue $\lambda_{\max}[B]$ of $B$
Calculate $\lambda_{\min}[A] = \beta - \lambda_{\max}[B]$
Compute the best step size $\mu^* = \frac{2}{\lambda_{\max}[A] + \lambda_{\min}[A]}$
repeat
for $k = 0, 1, 2, \ldots$, do
  Update $\hat\theta_{k+1} = \hat\theta_k + \mu^*\,\Phi^{\mathrm T}(Y - \Phi\,\hat\theta_k)$
end
until convergence
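Putting the pieces together, a sketch of the P-GD loop above, reusing the two power-method helpers; the iteration count, initial vector, and $\epsilon$ are illustrative choices.

    def p_gd(Phi, Y, eps=1e-3, iters=100):
        A = Phi.T @ Phi
        lam_max = power_max_eig(A)
        lam_min = min_eig_via_shift(A, eps)
        mu_star = 2.0 / (lam_max + lam_min)    # best step size from [20]
        theta = np.ones(Phi.shape[1])          # initial estimate: all-ones vector
        for _ in range(iters):
            theta = theta + mu_star * Phi.T @ (Y - Phi @ theta)
        return theta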

Remark 4. If the maximum eigenvalue $\lambda_1$ is not much bigger than the second largest eigenvalue $\lambda_2$, computing the maximum eigenvalue is time-consuming, because the value of $(\lambda_2/\lambda_1)^k$ takes more iterations to converge to zero.

Remark 5. The choice of the positive constant $\epsilon$ is very important: to obtain the maximum eigenvalue $\lambda_{\max}[B]$ quickly, we would do better to choose a small $\epsilon$. On the other hand, due to the estimation error of $\lambda_{\max}[A]$, a small $\epsilon$ may lead to $B = \beta I - A$ not being an SPD matrix.

Remark 6. Recently, a novel GD algorithm, termed the fractional stochastic GD algorithm, has been proposed for parameter estimation. This algorithm is a good complement to the traditional GD algorithm and can be widely used for different kinds of models [32–34].

4. Convergence Properties of the Two Modified GD Algorithms

The convergence properties of the two modified GD algorithms are given in the following, which offer theoretical guidance for researchers.

4.1. Convergence Analysis of the ES-GD Algorithm

Rewrite the ES-GD algorithm as follows:
$$\hat\theta_{k+1} = \hat\theta_k + \mu_k\,\Phi^{\mathrm T}(Y - \Phi\,\hat\theta_k).$$

Subtracting the true value $\bar\theta$ on both sides of the above equation yields
$$\tilde\theta_{k+1} = (I - \mu_k\,\Phi^{\mathrm T}\Phi)\,\tilde\theta_k + \mu_k\,\Phi^{\mathrm T} V,$$
where $\tilde\theta_k := \hat\theta_k - \bar\theta$. Since $V$ is a Gaussian white noise vector and is independent of $\Phi$, the above equation is simplified as
$$\mathrm E[\tilde\theta_{k+1}] = (I - \mu_k\,\Phi^{\mathrm T}\Phi)\,\mathrm E[\tilde\theta_k].$$

Based on the exhaustive search, in each iteration we find an optimal step size $\mu_k$ which guarantees
$$J(\hat\theta_{k+1}) \le J(\hat\theta_k).$$

It gives rise to
$$J(\hat\theta_{k+1}) \le J(\hat\theta_k) \le \cdots \le J(\hat\theta_0),$$
that is, the cost function sequence is non-increasing and bounded below by zero, hence convergent.

Therefore, the ES-GD algorithm is convergent.

4.2. Convergence Analysis of the P-GD Algorithm

The P-GD algorithm is written as
$$\hat\theta_{k+1} = \hat\theta_k + \mu^*\,\Phi^{\mathrm T}(Y - \Phi\,\hat\theta_k), \quad \mu^* = \frac{2}{\lambda_{\max}[A] + \lambda_{\min}[A]}.$$

Subtracting the true value $\bar\theta$ on both sides of the above equation yields
$$\tilde\theta_{k+1} = (I - \mu^*\,\Phi^{\mathrm T}\Phi)\,\tilde\theta_k + \mu^*\,\Phi^{\mathrm T} V, \quad \tilde\theta_k := \hat\theta_k - \bar\theta.$$

For simplicity, let
$$\delta_k = \mathrm E[\tilde\theta_k].$$

Since $V$ is independent of $\Phi$, taking expectations simplifies the above equation to
$$\delta_{k+1} = (I - \mu^* A)\,\delta_k.$$

For an SPD matrix $A$, there exists a nonsingular matrix $P$ which can ensure
$$P^{-1} A P = D = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n).$$

It follows that the above recursion can be transformed into
$$P^{-1}\delta_{k+1} = (I - \mu^* D)\,P^{-1}\delta_k.$$

Clearly, all the diagonal entries of $I - \mu^* D$ satisfy $|1 - \mu^*\lambda_i| < 1$, $i = 1, \ldots, n$, since $0 < \mu^*\lambda_i < 2$; then we have
$$\lim_{k\to\infty} \mathrm E[\tilde\theta_k] = 0.$$

Therefore, the P-GD algorithm is convergent.

Remark 7. In the P-GD algorithm, the maximum absolute value among the diagonal entries of $I - \mu^* D$ is
$$\max_i |1 - \mu^*\lambda_i| = \frac{\lambda_{\max}[A] - \lambda_{\min}[A]}{\lambda_{\max}[A] + \lambda_{\min}[A]} = \frac{\kappa - 1}{\kappa + 1},$$
where $\kappa = \lambda_{\max}[A]/\lambda_{\min}[A]$ is the condition number of the matrix $A$. If the matrix $A$ is ill-conditioned, no matter what the step size is, the convergence rate is always very slow. In this case, we can try to reconstruct a new information matrix or use the fractional stochastic GD algorithm proposed in [32–34] to increase the convergence rate.
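As a worked instance of this bound (illustrative numbers, not taken from the simulations): for $\kappa = 100$,
$$\max_i |1 - \mu^*\lambda_i| = \frac{99}{101} \approx 0.980,$$
so reducing $\|\mathrm E[\tilde\theta_k]\|$ by a factor of 10 requires roughly $\ln 10 / \ln(101/99) \approx 115$ iterations, whereas $\kappa = 10$ gives a rate of $9/11 \approx 0.818$ and needs only about 12 iterations.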

5. Examples

Example 1. Consider a time-delayed model of the form in Section 2. Assume that the time delay is $\tau = 2$ and assign the upper bound $M$; the augmented model is then obtained by the redundant rule. In the simulation, we collect 500 sets of input and output data, where the input $\{u(t)\}$ is a persistent excitation sequence and $\{v(t)\}$ is a Gaussian white noise sequence. Use the T-GD, ES-GD, and P-GD algorithms for the time-delayed model. The parameter estimates and their estimation errors are shown in Figure 1 and Table 1. The elapsed times of the three algorithms are illustrated in Table 2: the second row compares the algorithms running the same number of iterations, and the third row compares them at almost the same estimation error.
Assign a small threshold. Compare the estimates at the 20-th iteration with the threshold; if the absolute value of an estimate is smaller than the threshold, it is set to zero. We can then determine that the time-delay estimate is 2.
In addition, we use the power method to compute the maximum and minimum eigenvalues of the information matrix $\Phi^{\mathrm T}\Phi$; the estimates are shown in Figure 2.
From this simulation, we can draw the following conclusions:
(1) The P-GD algorithm has the fastest convergence rate, followed by the ES-GD algorithm, and the T-GD algorithm has the slowest convergence rate, as shown in Figure 1;
(2) All three algorithms can obtain the parameter estimates and the time-delay estimate simultaneously, as shown in Table 1;
(3) The P-GD algorithm is the most efficient, followed by the ES-GD algorithm, and the T-GD algorithm is the least efficient, as shown in Table 2;
(4) The power method can obtain the maximum and minimum eigenvalues of the information matrix, as shown in Figure 2.

Example 2. A water tank system with a communication channel is used for simulation, see Figure 3, where the input $u(t)$ is the position of the inlet water valve, and the output $y(t)$ is the water level of Tank 2, sampled by a pressure sensor. The communication channel introduces a time delay $\tau$. The water tank system is modeled by a time-delayed model of the form in Section 2 [35]. Using the T-GD, ES-GD, and P-GD algorithms for this model, the parameter estimates and their estimation errors are shown in Figure 4.
This example also shows that the two modified GD algorithms have faster convergence rates than those of the T-GD algorithm.

6. Conclusions

Two modified GD algorithms are proposed for systems with time delay in this paper. The first is the ES-GD algorithm, which does not require eigenvalue calculation; the second is the P-GD algorithm, which obtains the best step size by using the power method. The two modified GD algorithms have faster convergence rates than the T-GD algorithm and can obtain the parameter estimates and time-delay estimates simultaneously. Thus, they can be widely used in engineering practice.

In this paper, we only apply the modified GD algorithms to time-delayed systems. If the systems have other kinds of hidden variables, e.g., missing outputs or model identities, can these two algorithms still be effective? This topic remains an open issue for future work.

Data Availability

All data generated or analyzed during this study are included in this article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the Natural Science Foundation of Jiangsu Province (No. BK20131109).