Abstract

In this study, two modified gradient descent (GD) algorithms are proposed for time-delayed models. To estimate the parameters and the time delay simultaneously, a redundant rule method is introduced, which turns the time-delayed model into an augmented model; the two GD algorithms can then be used to identify the time-delayed model. Compared with the traditional GD algorithm, the two modified GD algorithms have the following advantages: (1) they avoid a high-order matrix eigenvalue calculation and thus are more efficient for large-scale systems; (2) they have faster convergence rates and therefore are more practical in engineering. Convergence analysis and simulation examples are presented to illustrate the efficiency of the two algorithms.

1. Introduction

System identification plays an important role in control theory and applications [1–3]. Once the model of a dynamic system is established, one can design robust controllers for it and predict its future dynamics. Many identification algorithms exist, for example, the least squares (LS) algorithm [4, 5], the gradient descent (GD) algorithm [6, 7], and the particle swarm optimization (PSO) algorithm [8, 9]. When the considered model has a high order, the LS algorithm and the PSO algorithm are inefficient because of their heavy computational effort [10–12]. The GD algorithm has a low computational cost but a slow convergence rate [13, 14]. To increase the convergence rate of the GD algorithm, two approaches are usually taken: (1) design a more suitable search direction [15–17]; (2) calculate a better step size [18, 19]. In [20], the best step size of a GD algorithm is given, which involves an eigenvalue calculation. For a high-order matrix, computing the eigenvalues is challenging. To deal with this problem, many suboptimal step-size selection methods have been developed, for example, the stochastic GD algorithm [21, 22], the forgetting factor GD algorithm [18, 19], the projection algorithm, and the steepest GD algorithm [23, 24]. Although these algorithms can increase the convergence rate, they are all sensitive to the considered model; that is, one should design different step sizes for different kinds of models.

Time delays are common in engineering practice. The data of a dynamic system are usually collected by a sensor and then transmitted via a communication channel, and they may encounter a time delay due to network congestion [25, 26]. For time-delayed model identification, Chen proposed a redundant rule-based off-line algorithm that can estimate the parameters and the time delay simultaneously [27]. Since the off-line algorithm cannot update the parameters with newly arrived data, Zhang et al. developed a redundant rule-based recursive LS (RLS) algorithm for bilinear time-delayed systems; the RLS algorithm is an online algorithm [28]. This paper focuses on time-delayed model identification and aims to develop novel identification algorithms with fast convergence rates and low computational cost.

Inspired by the PSO algorithm and the power method [29], we propose two modified GD algorithms for time-delayed models: one is an exhaustive search method that chooses the step size based on the PSO idea, and the other is a power-based GD algorithm that computes the step size using the power method. Both algorithms can obtain a better step size in each iteration without an explicit eigenvalue decomposition; therefore, they have faster convergence rates than the traditional GD algorithm.

This paper is organized as follows: Section 2 describes the time-delayed model and the traditional LS and GD algorithms. In Section 3, two modified GD algorithms are proposed. Section 4 proves the convergence properties of the two algorithms. Section 5 gives two simulation examples. Finally, conclusions are presented in Section 6.

2. Problem Statement

First, some notation is introduced: $I$ denotes an identity matrix of appropriate size; $\|X\|$ denotes the norm of a matrix $X$, defined by $\|X\|^2 = \mathrm{tr}[XX^{\mathrm T}]$; $\rho[X]$ stands for the spectral radius of the matrix $X$; the superscript $\mathrm T$ is defined as the matrix transpose; $\lambda_{\max}[X]$ and $\lambda_{\min}[X]$ denote the maximum and minimum eigenvalues of a matrix $X$, respectively.

2.1. Time-Delayed Model

Consider the following time-delayed model,
$$y(t) = -\sum_{i=1}^{n_a} a_i\,y(t-i) + \sum_{j=1}^{n_b} b_j\,u(t-\tau-j) + v(t),$$
where $y(t)$ and $u(t)$ are the output and input, respectively; $v(t)$ is a Gaussian white noise and satisfies $\mathrm E[v(t)] = 0$, $\mathrm E[v^2(t)] = \sigma^2$; $a_i$ and $b_j$ are the unknown parameters that need to be estimated; $\tau$ is an unknown time delay.

Since the time delay $\tau$ is unknown, the corresponding information vector of the model is unavailable, which makes the traditional GD algorithm inapplicable. To deal with this dilemma, we use the redundant rule method. Assume that the upper bound of the time delay is $M$; this assumption is rational and feasible. For example, when using the RIP protocol in a network, the maximum hop count is 16.

Rewrite the time-delayed model as follows:
$$y(t) = -\sum_{i=1}^{n_a} a_i\,y(t-i) + \sum_{j=1}^{n_b+M} \bar b_j\,u(t-j) + v(t).$$

Define the parameter vector and the information vector as
$$\bar\theta = [a_1,\ldots,a_{n_a},\ \bar b_1,\ldots,\bar b_{n_b+M}]^{\mathrm T} \in \mathbb R^{n}, \quad n = n_a + n_b + M,$$
$$\bar\varphi(t) = [-y(t-1),\ldots,-y(t-n_a),\ u(t-1),\ldots,u(t-n_b-M)]^{\mathrm T} \in \mathbb R^{n}.$$

The input-channel part of the augmented parameter vector is decomposed into the following three parts:
$$[\bar b_1,\ldots,\bar b_{n_b+M}]^{\mathrm T} = [\underbrace{0,\ldots,0}_{\tau},\ b_1,\ldots,b_{n_b},\ \underbrace{0,\ldots,0}_{M-\tau}]^{\mathrm T}.$$

Remark 1. Since the delayed inputs corresponding to the two redundant parts play no role in the output, the redundant parts $[\bar b_1,\ldots,\bar b_{\tau}]$ and $[\bar b_{\tau+n_b+1},\ldots,\bar b_{n_b+M}]$ are both zero vectors. If the parameter estimates of $\bar\theta$ converge to the true values, the two redundant parts equal zero vectors, and we can then obtain the time-delay estimate based on this special structure.
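To make the redundant rule concrete, the following sketch builds the augmented information vector $\bar\varphi(t)$; the stacking of past outputs and $n_b + M$ past inputs follows the decomposition above, while the function name and interface are our own illustrative choices, not from the paper.

    import numpy as np

    # Illustrative sketch of the redundant rule: the information vector stacks
    # n_a past outputs (with negative sign) and n_b + M past inputs, so the
    # estimated input coefficients reveal the delay through their leading zeros.
    def augmented_phi(y, u, t, na, nb, M):
        past_y = [-y[t - i] for i in range(1, na + 1)]
        past_u = [u[t - j] for j in range(1, nb + M + 1)]
        return np.array(past_y + past_u)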

2.2. LS and GD Algorithms

Rewrite the augmented model of the time-delayed model as
$$y(t) = \bar\varphi^{\mathrm T}(t)\,\bar\theta + v(t).$$

Collect $L$ sets of input and output data and define
$$Y = [y(1),\ldots,y(L)]^{\mathrm T} \in \mathbb R^{L}, \quad \Phi = [\bar\varphi(1),\ldots,\bar\varphi(L)]^{\mathrm T} \in \mathbb R^{L\times n}, \quad V = [v(1),\ldots,v(L)]^{\mathrm T} \in \mathbb R^{L}.$$

It gives rise to
$$Y = \Phi\,\bar\theta + V.$$

Define the cost function as follows:
$$J(\theta) = \|Y - \Phi\,\theta\|^2.$$

Using the LS algorithm to estimate the parameters, it follows that
$$\hat\theta_{\mathrm{LS}} = (\Phi^{\mathrm T}\Phi)^{-1}\Phi^{\mathrm T} Y.$$

The LS algorithm must perform a matrix inversion, which may lead to heavy computational effort, especially for large-scale systems, i.e., when $n$ is large.
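As a minimal sketch, the LS estimate above takes a few lines of numpy; solving the normal equations with a linear solver (rather than forming the explicit inverse) is a standard way to soften the cost and conditioning issues just mentioned.

    import numpy as np

    def ls_estimate(Phi, Y):
        # theta_hat = (Phi^T Phi)^{-1} Phi^T Y, via a linear solve instead of
        # an explicit matrix inverse for numerical stability
        return np.linalg.solve(Phi.T @ Phi, Phi.T @ Y)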

To avoid the matrix inverse calculation, the traditional GD (T-GD) algorithm is introduced [20],
$$\hat\theta_{k+1} = \hat\theta_k + \mu\,\Phi^{\mathrm T}(Y - \Phi\,\hat\theta_k), \quad 0 < \mu < \frac{2}{\lambda_{\max}[\Phi^{\mathrm T}\Phi]}.$$

The T-GD algorithm does not need to compute the inverse of the information matrix $\Phi^{\mathrm T}\Phi$, but it requires calculating the eigenvalues of this matrix to choose a suitable step size that keeps the T-GD algorithm convergent. When $\Phi^{\mathrm T}\Phi$ has a high order, computing its eigenvalues is also a challenging problem.
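A sketch of the T-GD iteration under the step-size condition above; note that it still calls an eigenvalue routine to set $\mu$, which is precisely the computation the next section avoids. The iteration count and initial vector are illustrative choices.

    import numpy as np

    def t_gd(Phi, Y, iters=200):
        A = Phi.T @ Phi
        mu = 1.0 / np.linalg.eigvalsh(A).max()   # safe choice: mu < 2 / lambda_max[A]
        theta = np.ones(Phi.shape[1])            # initial estimate: all-ones vector
        for _ in range(iters):
            theta = theta + mu * Phi.T @ (Y - Phi @ theta)
        return theta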

3. Two Modified GD Algorithms

In this section, two modified GD algorithms are developed which aim to avoid eigenvalue calculation and to increase the convergence rate.

3.1. Exhaustive Search-Based GD Algorithm

The PSO algorithm is an intelligent search algorithm: it first assigns many particles (initial parameter estimates) and then computes the personal best estimates and the global best estimate in each iteration [30, 31]. The larger the number of particles, the more easily the estimates reach the true values. Inspired by the PSO algorithm, an exhaustive search-based GD algorithm is developed in this subsection. Its basic idea is to assign several candidate step sizes for the negative gradient direction in each iteration and to keep the step size whose cost function is smallest.

Assume that the parameter estimate in the $k$-th iteration is $\hat\theta_k$; the parameter estimate in the $(k+1)$-th iteration is computed by
$$\hat\theta_{k+1} = \hat\theta_k + \mu_k\,\Phi^{\mathrm T}(Y - \Phi\,\hat\theta_k).$$

If we assign a random step size for the above GD algorithm, we find that (1) a small step size leads to a slow convergence rate, and (2) a large step size may lead to divergence of the GD algorithm. To choose a suitable step size and to avoid the eigenvalue calculation, we assign $m$ candidate step sizes for the GD algorithm in each iteration.

Define an interval in the $k$-th iteration as
$$\Omega_k = (0,\ \bar\mu_k].$$

Choose $m$ uniformly distributed terms in $\Omega_k$, that is,
$$\mu_k^i = \frac{i}{m}\,\bar\mu_k, \quad i = 1,\ldots,m.$$

Based on the $m$ step sizes, we have $m$ parameter estimates, that is,
$$\hat\theta_{k+1}^i = \hat\theta_k + \mu_k^i\,\Phi^{\mathrm T}(Y - \Phi\,\hat\theta_k), \quad i = 1,\ldots,m.$$

Among the $m$ parameter estimates, we choose the best one. Once the $m$ parameter estimates in iteration $k+1$ have been obtained, the corresponding cost functions are computed by
$$J(\hat\theta_{k+1}^i) = \|Y - \Phi\,\hat\theta_{k+1}^i\|^2, \quad i = 1,\ldots,m.$$

Let
$$\hat\theta_{k+1} = \arg\min_{\hat\theta_{k+1}^i,\ i=1,\ldots,m} J(\hat\theta_{k+1}^i).$$

That is, the estimate with the smallest cost function is taken as the parameter estimate in iteration $k+1$.

Then, the steps of the exhaustive search-based GD (ES-GD) algorithm are listed as follows:

Initialise $k = 0$ and $\hat\theta_0 = \mathbf 1_n$, where $\mathbf 1_n$ is a vector whose entries all equal 1
Collect measurable data $Y$ and $\Phi$
Assign the value for $m$
repeat
  Assign the interval $\Omega_k = (0,\ \bar\mu_k]$
  for $i = 1, \ldots, m$, do
    Choose $\mu_k^i = \frac{i}{m}\,\bar\mu_k$
    Update $\hat\theta_{k+1}^i = \hat\theta_k + \mu_k^i\,\Phi^{\mathrm T}(Y - \Phi\,\hat\theta_k)$
    Compute $J(\hat\theta_{k+1}^i) = \|Y - \Phi\,\hat\theta_{k+1}^i\|^2$
  end
  Compare the costs and choose $i^* = \arg\min_i J(\hat\theta_{k+1}^i)$
  Let $\hat\theta_{k+1} = \hat\theta_{k+1}^{i^*}$ and $k = k + 1$
until convergence
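The following Python sketch mirrors the listing above. The initial bound $\bar\mu_0$ and the doubling/halving rule for adjusting it are one simple realisation of the interval adjustment discussed in Remark 2, not a prescription from the paper.

    import numpy as np

    def es_gd(Phi, Y, m=20, mu_bar=1e-3, iters=100):
        theta = np.ones(Phi.shape[1])
        cost = lambda th: np.sum((Y - Phi @ th) ** 2)
        for _ in range(iters):
            grad = Phi.T @ (Y - Phi @ theta)            # negative gradient direction
            mus = mu_bar * np.arange(1, m + 1) / m      # m uniform step sizes in (0, mu_bar]
            cands = [theta + mu * grad for mu in mus]
            costs = [cost(th) for th in cands]
            best = int(np.argmin(costs))
            if costs[best] < cost(theta):
                theta = cands[best]
                if best == m - 1:                       # best candidate on the boundary:
                    mu_bar *= 2.0                       # enlarge the interval (Remark 2)
            else:
                mu_bar /= 2.0                           # all candidates worse: shrink it
        return theta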

Remark 2. As in the PSO algorithm, a larger $m$ can lead to a more accurate parameter estimate $\hat\theta_{k+1}$. However, two problems arise from a poorly chosen interval bound $\bar\mu_k$: (1) if $\bar\mu_k$ is small, all the candidate step sizes make $J(\hat\theta_{k+1}^i) < J(\hat\theta_k)$; in this case, the chosen step size is quite small, and we then assign a larger bound $\bar\mu_{k+1} > \bar\mu_k$; (2) if $\bar\mu_k$ is too large, all the candidate step sizes lead to $J(\hat\theta_{k+1}^i) > J(\hat\theta_k)$; in this case, we should assign a smaller bound $\bar\mu_{k+1} < \bar\mu_k$ to keep the ES-GD algorithm convergent.

Remark 3. The ES-GD algorithm uses the exhaustive search method to choose the step size; the “best” step size in each iteration is better than a randomly chosen one. However, this “best” step size is only the best among the $m$ candidates, not the true optimum. In addition, a larger $m$ can bring the “best” step size closer to the true one, but a larger $m$ also leads to heavier computational effort.

3.2. Power-Based GD Algorithm

In [20], the authors have shown that the best step size for the cost function $J(\theta) = \|Y - \Phi\,\theta\|^2$ is
$$\mu^* = \frac{2}{\lambda_{\max}[\Phi^{\mathrm T}\Phi] + \lambda_{\min}[\Phi^{\mathrm T}\Phi]}.$$

Therefore, to get the actual best step size $\mu^*$, one should compute both the maximum and minimum eigenvalues of the information matrix $\Phi^{\mathrm T}\Phi$.

Since the eigenvalues of a high-order matrix are difficult to compute directly, we next introduce the power method, which obtains the maximum eigenvalue of a matrix iteratively.

For simplicity, let
$$A = \Phi^{\mathrm T}\Phi \in \mathbb R^{n\times n}.$$

Assign an initial non-zero vector $x_0 \in \mathbb R^{n}$, and use the following iteration to generate a sequence $\{x_k\}$,
$$x_{k+1} = A\,x_k, \quad k = 0, 1, 2, \ldots$$

Let
$$\lambda^{(k)} = \frac{\|x_{k+1}\|}{\|x_k\|}.$$

The following lemma is obtained.

Lemma 1. For a symmetric positive definite (SPD) matrix $A$, let the sequence $\{x_k\}$ be computed by the iteration above. Then, the maximum eigenvalue of $A$ is given by
$$\lambda_{\max}[A] = \lim_{k\to\infty} \lambda^{(k)} = \lim_{k\to\infty} \frac{\|x_{k+1}\|}{\|x_k\|}.$$

Proof. Since $A$ is SPD, it has $n$ positive eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$, and their corresponding eigenvectors $v_1, v_2, \ldots, v_n$ are linearly independent. There exist constants $c_1, c_2, \ldots, c_n$, not all equal to zero, such that the initial vector can be written as
$$x_0 = c_1 v_1 + c_2 v_2 + \cdots + c_n v_n.$$
Without loss of generality, assume that the eigenvalues of $A$ satisfy
$$\lambda_1 > \lambda_2 \ge \cdots \ge \lambda_n > 0, \quad c_1 \ne 0.$$
Based on the iteration $x_{k+1} = A\,x_k$, it gives rise to
$$x_k = A^k x_0 = \sum_{i=1}^{n} c_i \lambda_i^k v_i.$$
It follows that
$$x_k = \lambda_1^k \Big[c_1 v_1 + \sum_{i=2}^{n} c_i \Big(\frac{\lambda_i}{\lambda_1}\Big)^k v_i\Big].$$
Since $|\lambda_i/\lambda_1| < 1$ for $i \ge 2$, when $k \to \infty$, we have
$$\Big(\frac{\lambda_i}{\lambda_1}\Big)^k \to 0, \quad i = 2, \ldots, n.$$
Then, $x_k$ can be rewritten as
$$x_k \approx \lambda_1^k c_1 v_1.$$
Therefore, we can get that
$$\lim_{k\to\infty} \frac{\|x_{k+1}\|}{\|x_k\|} = \lambda_1 = \lambda_{\max}[A].$$
The proof is completed.
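Lemma 1 translates directly into code; in the sketch below the iterate is rescaled each step to avoid overflow, which does not change the ratio estimate. The tolerance and iteration cap are illustrative.

    import numpy as np

    def power_max_eig(A, iters=500, tol=1e-10):
        # Power method: lambda_max[A] = lim ||x_{k+1}|| / ||x_k|| for SPD A
        x = np.ones(A.shape[0])               # non-zero initial vector
        x /= np.linalg.norm(x)
        lam = 0.0
        for _ in range(iters):
            x_new = A @ x
            lam_new = np.linalg.norm(x_new)   # equals ||x_{k+1}|| / ||x_k|| since ||x|| = 1
            x = x_new / lam_new               # rescale to unit norm
            if abs(lam_new - lam) < tol:
                break
            lam = lam_new
        return lam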
The power method can only get the maximum eigenvalue of $A$. However, to get the best step size, one should also compute the minimum eigenvalue of $A$. Next, we introduce an effective method to compute the minimum eigenvalue of $A$.
Once the maximum eigenvalue of $A$ is obtained, we assign a new term as follows:
$$\beta = \lambda_{\max}[A] + \epsilon,$$
where $\epsilon$ is a positive constant which is chosen on a case-by-case basis.
Define a new matrix
$$B = \beta I - A.$$
Then, the following lemma can be obtained.

Lemma 2. For a symmetric positive definite matrix $A$, let its eigenvalues be $\lambda_1, \lambda_2, \ldots, \lambda_n$, and let the matrix $B$ be defined as above. Then, the eigenvalues of the matrix $B$ are
$$\beta - \lambda_1,\ \beta - \lambda_2,\ \ldots,\ \beta - \lambda_n.$$

Proof. For an SPD matrix $A$, there exists a nonsingular matrix $P$ which can guarantee
$$P^{-1} A P = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n).$$
Then, the matrix $B$ is written as
$$P^{-1} B P = P^{-1}(\beta I - A)P = \beta I - \mathrm{diag}(\lambda_1,\ldots,\lambda_n) = \mathrm{diag}(\beta-\lambda_1, \ldots, \beta-\lambda_n).$$
Since $\beta = \lambda_{\max}[A] + \epsilon$ and $\epsilon > 0$, we have $\beta - \lambda_i > 0$, $i = 1, \ldots, n$; thus $B$ is also SPD, and its eigenvalues are $\beta - \lambda_i$. For the SPD matrix $B$, the power method can obtain its maximum eigenvalue $\lambda_{\max}[B] = \beta - \lambda_{\min}[A]$, and then the minimum eigenvalue of the matrix $A$ can be computed by
$$\lambda_{\min}[A] = \beta - \lambda_{\max}[B].$$
When the maximum and minimum eigenvalues of the matrix $A$ are obtained, we can get the best step size $\mu^*$.
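Lemma 2 then yields $\lambda_{\min}[A]$ with one more power iteration on the shifted matrix; this sketch reuses power_max_eig from the previous sketch, and the default $\epsilon$ is an arbitrary illustrative value (see Remark 5 for the trade-off in choosing it).

    def min_eig_via_shift(A, eps=1e-3):
        lam_max = power_max_eig(A)             # lambda_max[A] by the power method
        beta = lam_max + eps                   # beta > lambda_max keeps B positive definite
        B = beta * np.eye(A.shape[0]) - A      # eigenvalues of B are beta - lambda_i
        return beta - power_max_eig(B)         # lambda_min[A] = beta - lambda_max[B]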
The steps of the power-based GD (P-GD) algorithm are listed as follows:

Initialise $k = 0$ and $\hat\theta_0 = \mathbf 1_n$, where $\mathbf 1_n$ is a vector whose entries all equal 1
Collect measurable data $Y$ and $\Phi$
Use the power method to compute $\lambda_{\max}[A]$, where $A = \Phi^{\mathrm T}\Phi$
Assign a positive constant $\epsilon$ based on $\lambda_{\max}[A]$ and let $\beta = \lambda_{\max}[A] + \epsilon$
Construct an SPD matrix $B = \beta I - A$
Use the power method to compute the maximum eigenvalue $\lambda_{\max}[B]$ of $B$
Calculate $\lambda_{\min}[A] = \beta - \lambda_{\max}[B]$
Compute the best step size $\mu^* = \frac{2}{\lambda_{\max}[A] + \lambda_{\min}[A]}$
repeat
for $k = 0, 1, 2, \ldots$, do
  Update $\hat\theta_{k+1} = \hat\theta_k + \mu^*\,\Phi^{\mathrm T}(Y - \Phi\,\hat\theta_k)$
end
until convergence
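Putting the pieces together, a sketch of the P-GD loop above, reusing the two power-method helpers; the iteration count, initial vector, and $\epsilon$ are illustrative choices.

    def p_gd(Phi, Y, eps=1e-3, iters=100):
        A = Phi.T @ Phi
        lam_max = power_max_eig(A)
        lam_min = min_eig_via_shift(A, eps)
        mu_star = 2.0 / (lam_max + lam_min)    # best step size from [20]
        theta = np.ones(Phi.shape[1])          # initial estimate: all-ones vector
        for _ in range(iters):
            theta = theta + mu_star * Phi.T @ (Y - Phi @ theta)
        return theta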

Remark 4. If the maximum eigenvalue $\lambda_1$ is not much bigger than the second largest eigenvalue $\lambda_2$, computing the maximum eigenvalue is time-consuming, because the value of $(\lambda_2/\lambda_1)^k$ takes more iterations to converge to zero.

Remark 5. The choice of the positive constant $\epsilon$ is very important: to obtain the maximum eigenvalue $\lambda_{\max}[B]$ quickly, we would do better to choose a small $\epsilon$. On the other hand, due to the estimation error of $\lambda_{\max}[A]$, a small $\epsilon$ may lead to $B = \beta I - A$ not being an SPD matrix.

Remark 6. Recently, a novel GD algorithm, termed the fractional stochastic GD algorithm, has been proposed for parameter estimation. This algorithm is a good complement to the traditional GD algorithm and can be widely used for different kinds of models [32–34].

4. Convergence Properties of the Two Modified GD Algorithms

The convergence properties of the two modified GD algorithms are given in the following, which offer theoretical guidance for researchers.

4.1. Convergence Analysis of the ES-GD Algorithm

Rewrite the ES-GD algorithm as follows:
$$\hat\theta_{k+1} = \hat\theta_k + \mu_k\,\Phi^{\mathrm T}(Y - \Phi\,\hat\theta_k).$$

Subtracting the true value $\bar\theta$ on both sides of the above equation yields
$$\tilde\theta_{k+1} = (I - \mu_k\,\Phi^{\mathrm T}\Phi)\,\tilde\theta_k + \mu_k\,\Phi^{\mathrm T} V,$$
where $\tilde\theta_k := \hat\theta_k - \bar\theta$. Since $V$ is a Gaussian white noise vector and is independent of $\Phi$, the above equation is simplified as
$$\mathrm E[\tilde\theta_{k+1}] = (I - \mu_k\,\Phi^{\mathrm T}\Phi)\,\mathrm E[\tilde\theta_k].$$

Based on the exhaustive search, in each iteration we find an optimal step size $\mu_k$ which guarantees
$$J(\hat\theta_{k+1}) \le J(\hat\theta_k).$$

It gives rise to
$$J(\hat\theta_{k+1}) \le J(\hat\theta_k) \le \cdots \le J(\hat\theta_0),$$
that is, the cost function sequence is non-increasing and bounded below by zero, hence convergent.

Therefore, the ES-GD algorithm is convergent.

4.2. Convergence Analysis of the P-GD Algorithm

The P-GD algorithm is written as
$$\hat\theta_{k+1} = \hat\theta_k + \mu^*\,\Phi^{\mathrm T}(Y - \Phi\,\hat\theta_k), \quad \mu^* = \frac{2}{\lambda_{\max}[A] + \lambda_{\min}[A]}.$$

Subtracting the true value $\bar\theta$ on both sides of the above equation yields
$$\tilde\theta_{k+1} = (I - \mu^*\,\Phi^{\mathrm T}\Phi)\,\tilde\theta_k + \mu^*\,\Phi^{\mathrm T} V, \quad \tilde\theta_k := \hat\theta_k - \bar\theta.$$

For simplicity, let
$$\delta_k = \mathrm E[\tilde\theta_k].$$

Since $V$ is independent of $\Phi$, taking expectations simplifies the above equation to
$$\delta_{k+1} = (I - \mu^* A)\,\delta_k.$$

For an SPD matrix $A$, there exists a nonsingular matrix $P$ which can ensure
$$P^{-1} A P = D = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n).$$

It follows that the above recursion can be transformed into
$$P^{-1}\delta_{k+1} = (I - \mu^* D)\,P^{-1}\delta_k.$$

Clearly, all the diagonal entries of $I - \mu^* D$ satisfy $|1 - \mu^*\lambda_i| < 1$, $i = 1, \ldots, n$, since $0 < \mu^*\lambda_i < 2$; then we have
$$\lim_{k\to\infty} \mathrm E[\tilde\theta_k] = 0.$$

Therefore, the P-GD algorithm is convergent.

Remark 7. In the P-GD algorithm, the maximum absolute value among the diagonal entries of $I - \mu^* D$ is
$$\max_i |1 - \mu^*\lambda_i| = \frac{\lambda_{\max}[A] - \lambda_{\min}[A]}{\lambda_{\max}[A] + \lambda_{\min}[A]} = \frac{\kappa - 1}{\kappa + 1},$$
where $\kappa = \lambda_{\max}[A]/\lambda_{\min}[A]$ is the condition number of the matrix $A$. If the matrix $A$ is ill-conditioned, no matter what the step size is, the convergence rate is always very slow. In this case, we can try to reconstruct a new information matrix or use the fractional stochastic GD algorithm proposed in [32–34] to increase the convergence rate.
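As a worked instance of this bound (illustrative numbers, not taken from the simulations): for $\kappa = 100$,
$$\max_i |1 - \mu^*\lambda_i| = \frac{99}{101} \approx 0.980,$$
so reducing $\|\mathrm E[\tilde\theta_k]\|$ by a factor of 10 requires roughly $\ln 10 / \ln(101/99) \approx 115$ iterations, whereas $\kappa = 10$ gives a rate of $9/11 \approx 0.818$ and needs only about 12 iterations.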

5. Examples

Example 1. Consider a time-delayed model of the form in Section 2. Assume that the time delay is $\tau = 2$ and assign the upper bound $M$; the augmented model is then obtained by the redundant rule. In the simulation, we collect 500 sets of input and output data, where the input $\{u(t)\}$ is a persistent excitation sequence and $\{v(t)\}$ is a Gaussian white noise sequence. Use the T-GD, ES-GD, and P-GD algorithms for the time-delayed model. The parameter estimates and their estimation errors are shown in Figure 1 and Table 1. The elapsed times of the three algorithms are illustrated in Table 2: the second row compares the algorithms running the same number of iterations, and the third row compares them at almost the same estimation error.
Assign a small threshold. Compare the estimates at the 20-th iteration with the threshold; if the absolute value of an estimate is smaller than the threshold, it is set to zero. We can then determine that the time-delay estimate is 2.
In addition, we use the power method to compute the maximum and minimum eigenvalues of the information matrix $\Phi^{\mathrm T}\Phi$; the estimates are shown in Figure 2.
From this simulation, we can draw the following conclusions:
(1) The P-GD algorithm has the fastest convergence rate, followed by the ES-GD algorithm, and the T-GD algorithm has the slowest convergence rate, as shown in Figure 1;
(2) All three algorithms can obtain the parameter estimates and the time-delay estimate simultaneously, as shown in Table 1;
(3) The P-GD algorithm is the most efficient, followed by the ES-GD algorithm, and the T-GD algorithm is the least efficient, as shown in Table 2;
(4) The power method can obtain the maximum and minimum eigenvalues of the information matrix, as shown in Figure 2.

Example 2. A water tank system with a communication channel is used for simulation, see Figure 3, where the input $u(t)$ is the position of the inlet water valve, and the output $y(t)$ is the water level of Tank 2, sampled by a pressure sensor. The communication channel introduces a time delay $\tau$. The water tank system is modeled by a time-delayed model of the form in Section 2 [35]. Using the T-GD, ES-GD, and P-GD algorithms for this model, the parameter estimates and their estimation errors are shown in Figure 4.
This example also shows that the two modified GD algorithms have faster convergence rates than those of the T-GD algorithm.

6. Conclusions

Two modified GD algorithms are proposed for systems with time delay in this paper. The first is the ES-GD algorithm, which does not require eigenvalue calculation; the second is the P-GD algorithm, which obtains the best step size by using the power method. The two modified GD algorithms have faster convergence rates than the T-GD algorithm and can obtain the parameter estimates and time-delay estimates simultaneously. Thus, they can be widely used in engineering practice.

In this paper, we only apply the modified GD algorithms to time-delayed systems. If the systems have other kinds of hidden variables, e.g., missing outputs or model identities, can these two algorithms still be effective? This topic remains an open issue for future work.

Data Availability

All data generated or analyzed during this study are included in this article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the Natural Science Foundation of Jiangsu Province (No. BK20131109).