Abstract

Predicting suspended sediment load (SSL) in water resource management requires efficient and reliable predictive models. This study considers the support vector regression (SVR) method to predict daily suspended sediment load. Since the SVR has unknown parameters, the observer-teacher-learner-based optimization (OTLBO) method is integrated with the SVR model to provide a novel hybrid predictive model. The SVR combined with the genetic algorithm (SVR-GA) is used as an alternative model. To explore the performance and application of the proposed models, five input combinations of rainfall and discharge data of the Cham Siah River catchment are provided. The predictive models are assessed using various numerical and visual indicators. The results indicate that the SVR-OTLBO model offers a higher prediction performance than the other models employed in the current study. Specifically, the SVR-OTLBO model offers the highest Pearson correlation coefficient (R = 0.9768), Willmott’s Index (WI = 0.9812), ratio of performance to interquartile distance (RPIQ = 0.9201), and modified index of agreement (md = 0.7411) and the lowest relative root mean square error (RRMSE = 0.5371) in comparison with the SVR-GA (R = 0.9704, WI = 0.9794, RPIQ = 0.8521, md = 0.7323, and RRMSE = 0.5617) and SVR (R = 0.9501, WI = 0.9734, RPIQ = 0.3229, md = 0.4338, and RRMSE = 1.0829) models.

1. Introduction

An accurate estimate of the sediment transport load is essential for water engineering purposes such as the design and operation of dams, flood control structures, water conveyance channels, and other hydraulic structures [1]. In this context, forecasting and evaluating the suspended sediment load (SSL) at the catchment scale is a vital hydroenvironmental issue [2, 3]. Despite the importance of SSL, its evaluation is complicated because it depends on multiple hydrological, meteorological, and hydraulic variables [4–6].

So far, various SSL prediction models, such as physical, numerical, and empirical models, have been applied. Physical models are based on the theoretical governing equations of sediment transport, composed of the partial differential equations of mass and flow transport. Although the physical models are the most accurate prediction models, the complexity of solving the governing equations and their dependency on various simplifying assumptions limit their application in practical engineering problems [7]. The numerical models, which have been among the most widespread approaches in recent years, solve the mass and flow transport equations using numerical methods and computer programming [8, 9]. Despite the popularity of this approach, its main drawback is the knowledge it demands about the application, limitations, and abilities of various numerical schemes and techniques. Furthermore, the numerical models require considerable programming expertise and high computing power [10].

Another class of SSL prediction approaches depends on experimental measurements and is known as empirical methods. Among these, the most popular is the sediment rating curve, in which a regression model is usually employed to develop a relationship between discharge and SSL [11, 12]. However, the sediment rating curve method has some methodological constraints. In addition, an essential requirement for this approach is the availability of high-quality experimental data for the curve fitting process [13].

Data-driven models are also efficient tools for predicting the SSL. This approach can draw on the causal factors and consequences of an event without requiring a deep understanding of the underlying complex process [14]. Data-driven models, which simulate a system using data observed from the real system, include a broad range of models such as regression-based models, time series models, and artificial intelligence (AI) models. The regression-based models evaluate the relationship between a dependent variable and several independent variables. In previous water resource engineering studies, regression-based models have been applied to explore sediment load, water level, energy dissipation, and similar essential hydrological parameters [15–17]. Demirci and Baltaci [18] assessed three models based on the fuzzy logic (FL) approach, sediment rating curves (SRC), and multilinear regression. The FL model provided better performance in calculating the SSL than the other models. Singh et al. [19] evaluated different heuristic methods for predicting the SSL, and their results showed that the multilayer perceptron (MLP) offered the best performance. The time series models, which are based on processing sequential input data, consist of statistical methods such as the autoregressive (AR), autoregressive moving average (ARMA), autoregressive integrated moving average (ARIMA), and autoregressive moving average with exogenous inputs (ARMAX) models [20]. However, Moeeni and Bonakdari [21] indicated that the time series models are inadequate for nonlinear hydrological problems such as suspended sediment load modeling.

The AI models are a fast, cost-effective, and appropriate prediction approach that does not require detailed physical information. Obtaining and loading the data is relatively simple, and the prediction accuracy is high [22]. In recent years, artificial neural networks (ANNs), fuzzy-based models, support vector machine (SVM), and support vector regression (SVR) have been employed for predicting the SSL [23–25]. Mustafa et al. [26] used a multilayer perceptron (MLP) with four different training algorithms to forecast the suspended sediment discharge. The results showed that the Levenberg–Marquardt (LM) algorithm performed better than the other training algorithms.

Despite such broad usage of ANN models, they have provided unsatisfactory results in some engineering problems. In previous studies, combined forms of ANNs, known as hybrid ANNs, have been extensively employed to solve such problems. For example, an ARMAX-ANN model was used to estimate SSL and provided better accuracy than the ARMAX and ANN models alone [27]. Adib and Mahmodi [28] predicted SSL by combining the ANN and the genetic algorithm (GA). They found that the hybrid model is more effective than the ANN.

Among SSL prediction models, it has been observed that some models, such as fuzzy logic [29–31] or linear genetic programming (LGP) [32, 33], can predict the SSL with high accuracy on their own. Nevertheless, to improve the prediction accuracy and quality, these AI methods can also be employed in hybrid form, as with artificial neural networks. Generally, hybrid models based on fuzzy logic and ANNs can be trained faster and are more adaptive than ANNs or fuzzy logic applied alone. Samet et al. [34] investigated the prediction performance of the ANN, adaptive neuro-fuzzy inference system (ANFIS), and ANN-GA in forecasting SSL. The results showed that the ANFIS model had the best prediction performance among these models.

In addition to the hybrid fuzzy logic and ANN models, the support vector machine (SVM) is commonly used. The structure of the SVR method is more straightforward than that of fuzzy and ANN models, which enhances the predictive model [35, 36]. As a result, the SVR method can handle problems that arise across hydrological datasets, such as small sample sizes, nonlinearity, and high dimensionality [37, 38]. These advantages make SVR a popular option for simulating and predicting the SSL in river and sediment transport studies. A summary of studies that used the support vector machine model to predict suspended sediment load is given in Table 1.

Although the SVR has various advantages, its structure contains some unknown parameters, which drastically affect the SSL prediction accuracy. To overcome this fundamental limitation, an optimization algorithm is required to determine these parameters [50, 51]. For this reason, researchers are still looking for a robust, reliable model that can solve the complex problem of suspended sediment transport using AI models. This study therefore enhances the SVR model’s performance by combining the SVR with observer-teacher-learner-based optimization (OTLBO), which is employed to determine the optimal parameters of the support vector machine. OTLBO is a heuristic algorithm introduced by Shahrouzi et al. [52]. Furthermore, a second model based on the genetic algorithm and SVR is developed and named SVR-GA. Finally, the models are employed to evaluate the SSL of the Cham Siah River catchment in Iran.

Although the literature shows an increasing trend in applying artificial intelligence models to estimate the SSL, as far as the authors know, the model developed in the present study, SVR-OTLBO, has not been used in water engineering, and in the estimation of SSL in particular.

Because suspended sediment load measurements are scarce in the watersheds of Iran, continuous sediment data are not available. However, rainfall and river discharge data are available in the form of time series. Hence, a dataset of rainfall and river discharge with different lead times is employed as predictors to simulate the SSL for the available observed events. The model, which is developed from these observed SSL events, then simulates the daily sediment load.

The main objectives of this study are as follows: (i) considering a new hybrid intelligence model (the SVR-OTLBO model) for suspended sediment load estimation, (ii) evaluating the predictive ability of the developed model in one of the Iranian rivers (i.e., the Cham Siah River) despite the lack of sediment information, and (iii) developing a predictive model that uses river discharge and rainfall as the main factors in the suspended sediment load.

2. Case Study and Data Collection

To examine the proposed models’ performance, the Cham Siah River catchment in Kohgiluyeh and Boyer-Ahmad Province, southwest of Iran, is used as a case study area. The Cham Siah River catchment, which is shown in Figure 1, covers an area of 793 km². The average annual rainfall and river discharge of the catchment are 623.5 mm and 8.02 m³/s, respectively. Its minimum and maximum elevations are 600 and 1500 m, the average slope of the subbasin is 26.9%, and the annual sediment load is 328711 tons/year.

The daily hydrological data of the catchment, including the discharge, rainfall, and SSL recorded from 1986 to 2015, are used for prediction modeling. The daily rainfall data are obtained from two rain gauges, namely, Saeed Abad, which is located at 50°43ʹ05ʺE, 30°41ʹ34ʺN with an altitude of 690 m, and Dehdasht, which is located at 50°34ʹ27ʺE, 30°47ʹ24ʺN with an altitude of 840 m. The daily river discharge and event-based sediment data are obtained from the Saeed Abad station, which is located at 50°43ʹ37ʺE, 30°43ʹ21ʺN with an altitude of 663 m.

To build the predictive models, the observational data are divided into a training dataset (68%) and a testing dataset (32%). To ensure that all variables receive equal consideration during the training of the models, all the variables are rescaled to a common dimensionless range [14] using the following equation:
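A minimal sketch of the split and rescaling step, under stated assumptions: the rescaling is taken to be min-max scaling onto [0, 1], the split is taken in record order, and the scaling limits are taken from the training part only. None of these three choices is confirmed by the text, and the function names and sample values are hypothetical.

```python
def train_test_split_ordered(rows, train_fraction=0.68):
    """Split rows into a leading training part and a trailing testing part."""
    cut = round(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def min_max_scale(values, lo, hi):
    """Rescale values so that [lo, hi] maps onto [0, 1] (assumed target range)."""
    span = hi - lo
    if span == 0:
        raise ValueError("cannot rescale a constant series")
    return [(v - lo) / span for v in values]


# Hypothetical daily discharge values in m3/s (not data from the study).
discharge = [2.0, 4.0, 10.0, 6.0, 3.0, 8.0, 5.0, 7.0, 9.0, 1.0]
train, test = train_test_split_ordered(discharge)

# Scaling limits come from the training part only (an assumption).
lo, hi = min(train), max(train)
train_scaled = min_max_scale(train, lo, hi)
test_scaled = min_max_scale(test, lo, hi)
print(len(train), len(test))
print(train_scaled)
print(test_scaled)
```

With limits fitted on the training part, testing values can fall slightly outside [0, 1], as the last value here does.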

3. Methods

3.1. Support Vector Regression

Support vector regression (SVR) is the regression form of the support vector machine, which can be used for classification and regression problems [53]. The method can perform a linear classification in two-dimensional space. Moreover, data with more variables can be implicitly mapped into a higher-dimensional space using a nonlinear mapping function. In this context, the main equation of the method is as follows:

$$f(x) = w^{T}\varphi(x) + b,$$

where $f(x)$ indicates the function between the target and input variables, $w$ is the m-dimensional weight vector, $\varphi(x)$ is the mapping function that maps $x$ into the m-dimensional feature vector, and $b$ is the bias term.

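SVR investigates a hard margin for a classifier. Using the following equation (i.e., equation (2), which is called the primal problem), the hard margin can be converted to a soft margin, and the objective function of SVR becomes a minimization problem:

$$\min_{w,\,b,\,\xi,\,\xi^{*}} \; \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{m}\left(\xi_i + \xi_i^{*}\right)$$

$$\text{subject to}\quad y_i - w^{T}\varphi(x_i) - b \le \varepsilon + \xi_i,\qquad w^{T}\varphi(x_i) + b - y_i \le \varepsilon + \xi_i^{*},\qquad \xi_i,\ \xi_i^{*} \ge 0,$$

where $C$ is the penalty, $\xi_i$ and $\xi_i^{*}$ are slack variables, $w$ is the weight vector, $m$ is the number of inputs, $x_i$ are the input variables, $y_i$ is the observational target variable, and $\varepsilon$ is the parameter of the insensitive loss function.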
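As a hedged illustration of the soft-margin objective, the sketch below evaluates the ε-insensitive loss and the resulting objective value for given predictions. It relies on the standard fact that, at the optimum, the slack sum for a sample equals its ε-insensitive loss; the function names and numbers are hypothetical.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Zero inside the epsilon tube, linear in the excess error outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def primal_objective(weights, y_true, y_pred, C, epsilon):
    """0.5 * ||w||^2 plus C times the summed epsilon-insensitive losses."""
    regularizer = 0.5 * sum(w * w for w in weights)
    penalty = C * sum(
        epsilon_insensitive_loss(o, p, epsilon) for o, p in zip(y_true, y_pred)
    )
    return regularizer + penalty


# Hypothetical weights, targets, and predictions.
value = primal_objective(
    weights=[1.0, 2.0],
    y_true=[1.0, 2.0, 3.0],
    y_pred=[1.05, 2.5, 2.0],
    C=10.0,
    epsilon=0.1,
)
print(value)
```

A larger C penalizes points outside the tube more heavily, while a larger ε widens the tube and ignores more of the small errors.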

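So that the results take rational values and inappropriate results are avoided, the constraints can be inserted into the objective function of the above equation. To consider the constraints, the primal problem alters to the following equation:

$$L = \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{m}\left(\xi_i + \xi_i^{*}\right) - \sum_{i=1}^{m}\alpha_i\left(\varepsilon + \xi_i - y_i + w^{T}\varphi(x_i) + b\right) - \sum_{i=1}^{m}\alpha_i^{*}\left(\varepsilon + \xi_i^{*} + y_i - w^{T}\varphi(x_i) - b\right) - \sum_{i=1}^{m}\left(\eta_i\xi_i + \eta_i^{*}\xi_i^{*}\right),$$

where $\alpha_i$, $\alpha_i^{*}$, $\eta_i$, and $\eta_i^{*}$ are Lagrange coefficients, which are multiplied by the constraints. Through applying the Lagrange function and the KKT conditions, equation (3) can be converted to the dual problem as equation (5), where the terms $b$, $w$, $\xi_i^{(*)}$, and $\eta_i^{(*)}$ are eliminated:

$$\max_{\alpha,\,\alpha^{*}} \; -\frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}\left(\alpha_i - \alpha_i^{*}\right)\left(\alpha_j - \alpha_j^{*}\right)K(x_i, x_j) - \varepsilon\sum_{i=1}^{m}\left(\alpha_i + \alpha_i^{*}\right) + \sum_{i=1}^{m} y_i\left(\alpha_i - \alpha_i^{*}\right)$$

$$\text{subject to}\quad \sum_{i=1}^{m}\left(\alpha_i - \alpha_i^{*}\right) = 0,\qquad 0 \le \alpha_i,\ \alpha_i^{*} \le C.$$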

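In equation (4), $K(x_i, x_j) = \varphi(x_i)^{T}\varphi(x_j)$ is the kernel function. Thus, equation (1) can be rewritten as follows:

$$f(x) = \sum_{i=1}^{m}\left(\alpha_i - \alpha_i^{*}\right)K(x_i, x) + b.$$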

The output values are computed based on the values of the obtained parameters, i.e., $b$, the Lagrange coefficients, $C$, and the kernel function parameters. In this study, a radial basis function (RBF) in the form of equation (8) has been chosen as the kernel function:

$$K(x_i, x_j) = \exp\left(-\frac{\lVert x_i - x_j\rVert^{2}}{2\sigma^{2}}\right),$$

where $\sigma$ is the kernel parameter.
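The kernel expansion can be sketched in a few lines. This assumes the Gaussian form of the RBF kernel with a 2σ² denominator, which is one common convention and may differ from the one used in the study by a constant factor. The support vectors and coefficients below are hypothetical; `dual_coefs` holds the differences between each pair of Lagrange coefficients.

```python
import math


def rbf_kernel(x, z, sigma):
    """Gaussian RBF kernel exp(-||x - z||^2 / (2 sigma^2)) (assumed convention)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def svr_predict(x, support_vectors, dual_coefs, b, sigma):
    """Kernel expansion: bias plus the coefficient-weighted kernel values."""
    return b + sum(
        coef * rbf_kernel(sv, x, sigma)
        for sv, coef in zip(support_vectors, dual_coefs)
    )


# Hypothetical one-dimensional model with two support vectors.
prediction = svr_predict(
    x=[0.0],
    support_vectors=[[0.0], [1.0]],
    dual_coefs=[2.0, -1.0],
    b=0.5,
    sigma=1.0,
)
print(prediction)
```

A small σ makes each support vector influence only its immediate neighborhood, while a large σ produces a smoother prediction surface.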

3.2. Description of Optimization Methods

To obtain the optimal values of the SVR parameters (i.e., ε, σ, and C), the observer-teacher-learner-based optimization (OTLBO) and genetic algorithm (GA) methods are used. OTLBO is a powerful metaheuristic optimization method that was first introduced by Shahrouzi et al. [52]. The OTLBO works on the basis of the impact of a teacher and an observer on the learners in a class. Specifically, the method consists of three primary phases of education: (i) via the teacher (known as the teacher stage), (ii) via interplay with the observer (known as the observer stage), and (iii) via interplay with the other learners (known as the learner stage) [54]. In OTLBO, a set of learners is considered as the population, so each candidate vector of design variables is treated as a class member in this optimization algorithm. Further details of OTLBO were given by Shahrouzi et al. [52].

The GA, which is employed in this study, is a popular method in evolutionary computation for solving optimization problems. The GA starts from a randomly generated initial population, and each member of the population carries a chromosome. The chromosome, which is composed of genes, represents a possible solution. The method also has a step known as the iteration loop, in which a new population is generated using selection, crossover, and mutation operations. At each stage, members of the new and old populations are selected based on the value of the objective function [55].

3.3. Description of Hybrid SVR Models

As mentioned above, the SVR parameters, i.e., ε, σ, and C, are the decision variables, which need to be optimized through OTLBO or GA with respect to an objective function. In the case of SVR-OTLBO, the process is composed of the following steps:
(i) The initial values of the decision variables (i.e., ε, σ, and C) are randomly determined.
(ii) The SVR model predicts the initial target values based on the training data. The value of the objective function, which is the correlation coefficient between the observed and predicted target values in this study, is computed.
(iii) The teacher or observer phase is randomly selected to determine the SVR parameters ε, σ, and C, and the objective function is calculated. The learner phase is then started, and the objective function is evaluated in the same way as in the previously selected phase.
(iv) The best solution is updated.
(v) The steps mentioned above are repeated until the termination criterion is satisfied.

The above step-by-step algorithm of the SVR-OTLBO model for predicting the SSL is presented in Figure 2.
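The loop structure can be sketched with the teacher and learner phases of standard teaching-learning-based optimization. This is not the OTLBO of [52]: the observer phase is omitted because its update rule is not given here, and `toy_objective` is a hypothetical stand-in for the correlation obtained by fitting an SVR on the training data. The search bounds are also assumptions.

```python
import random


def toy_objective(params):
    """Hypothetical stand-in for the SVR training correlation (peak value 0)."""
    c, eps, sigma = params
    return -(((c - 10.0) / 100.0) ** 2 + (eps - 0.1) ** 2 + ((sigma - 1.0) / 10.0) ** 2)


def clip(vector, bounds):
    """Keep every decision variable inside its search interval."""
    return [min(max(v, lo), hi) for v, (lo, hi) in zip(vector, bounds)]


def tlbo_maximize(objective, bounds, pop_size=15, iterations=30, seed=1):
    rng = random.Random(seed)
    dim = len(bounds)
    # Step (i): random initial learners (candidate C, epsilon, sigma).
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    # Step (ii): evaluate the objective for every learner.
    fit = [objective(p) for p in pop]
    for _ in range(iterations):
        for i in range(pop_size):
            # Teacher phase: shift towards the best learner relative to the class mean.
            teacher = pop[max(range(pop_size), key=fit.__getitem__)]
            mean = [sum(p[d] for p in pop) / pop_size for d in range(dim)]
            tf = rng.choice((1, 2))  # teaching factor
            cand = clip([pop[i][d] + rng.random() * (teacher[d] - tf * mean[d])
                         for d in range(dim)], bounds)
            cand_fit = objective(cand)
            if cand_fit > fit[i]:  # greedy acceptance
                pop[i], fit[i] = cand, cand_fit
            # Learner phase: move away from a worse classmate or towards a better one.
            j = rng.choice([k for k in range(pop_size) if k != i])
            sign = 1.0 if fit[i] > fit[j] else -1.0
            cand = clip([pop[i][d] + sign * rng.random() * (pop[i][d] - pop[j][d])
                         for d in range(dim)], bounds)
            cand_fit = objective(cand)
            if cand_fit > fit[i]:
                pop[i], fit[i] = cand, cand_fit
    # Step (iv): report the best solution found.
    best = max(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]


BOUNDS = [(0.1, 100.0), (0.001, 1.0), (0.01, 10.0)]  # assumed ranges for C, epsilon, sigma
best_params, best_value = tlbo_maximize(toy_objective, BOUNDS)
print(best_params, best_value)
```

Because a candidate replaces a learner only when it improves the objective, the best value in the population never decreases from one iteration to the next.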

The second hybrid model is based on the SVR-GA method. The following steps are used to develop the model:
(i) The initial decision variables (first population) are randomly determined.
(ii) The SVR model computes the initial target variable; consequently, the initial value of the objective function is calculated.
(iii) The crossover operator is used to generate the offspring and new parameters of SVR. Later, the SVR model is employed to predict the target variable and evaluate the offspring according to objective function values.
(iv) The mutation operator is applied to generate the mutant population. The SVR model is then used to simulate the target variable and assess the mutant population based on objective function values.
(v) The population is sorted, and the repository of the members is updated.
(vi) The steps mentioned above are repeated until the termination criterion is satisfied.

The SVR-GA modeling to predict the SSL is shown in Figure 3.
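A compact real-coded GA following the same sequence of steps might look as follows. The operators (arithmetic crossover, Gaussian mutation of one gene, truncation to the best members) are assumptions chosen for brevity, not the operators used in the study, and `toy_objective` is again a hypothetical stand-in for the SVR-based objective.

```python
import random


def toy_objective(params):
    """Hypothetical stand-in for the SVR training correlation (peak value 0)."""
    c, eps, sigma = params
    return -(((c - 10.0) / 100.0) ** 2 + (eps - 0.1) ** 2 + ((sigma - 1.0) / 10.0) ** 2)


def ga_maximize(objective, bounds, pop_size=20, generations=30,
                n_crossover_pairs=7, n_mutants=6, seed=1):
    rng = random.Random(seed)
    # Steps (i)-(ii): random first population, ranked by objective value.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    pop.sort(key=objective, reverse=True)
    for _ in range(generations):
        # Step (iii): arithmetic crossover; children stay inside the bounds.
        offspring = []
        for _ in range(n_crossover_pairs):
            p1, p2 = rng.sample(pop, 2)
            a = rng.random()
            offspring.append([a * u + (1 - a) * v for u, v in zip(p1, p2)])
            offspring.append([(1 - a) * u + a * v for u, v in zip(p1, p2)])
        # Step (iv): Gaussian mutation of one randomly chosen gene.
        mutants = []
        for _ in range(n_mutants):
            child = list(rng.choice(pop))
            g = rng.randrange(len(bounds))
            lo, hi = bounds[g]
            child[g] = min(max(child[g] + rng.gauss(0.0, 0.1 * (hi - lo)), lo), hi)
            mutants.append(child)
        # Step (v): merge, sort, and keep the best pop_size members.
        pop = sorted(pop + offspring + mutants, key=objective, reverse=True)[:pop_size]
    return pop[0], objective(pop[0])


BOUNDS = [(0.1, 100.0), (0.001, 1.0), (0.01, 10.0)]  # assumed ranges for C, epsilon, sigma
best_params, best_value = ga_maximize(toy_objective, BOUNDS)
print(best_params, best_value)
```

With a real SVR inside the objective, each evaluation means a full model fit, so caching objective values instead of recomputing them during sorting would be a sensible refinement.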

3.4. Assessing the Prediction Performance

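Five indices are used to evaluate the prediction performance of the SVR, SVR-GA, and SVR-OTLBO models. These indices are the Pearson correlation coefficient (R) [56], relative root mean squared error (RRMSE) [57, 58], Willmott’s Index (WI) [59], ratio of performance to interquartile distance (RPIQ) [60], and modified index of agreement (md) [61], as follows:

$$R = \frac{\sum_{i=1}^{m}\left(O_i - \bar{O}\right)\left(X_i - \bar{X}\right)}{\sqrt{\sum_{i=1}^{m}\left(O_i - \bar{O}\right)^{2}\,\sum_{i=1}^{m}\left(X_i - \bar{X}\right)^{2}}},$$

$$RRMSE = \frac{1}{\bar{O}}\sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(O_i - X_i\right)^{2}},$$

$$WI = 1 - \frac{\sum_{i=1}^{m}\left(O_i - X_i\right)^{2}}{\sum_{i=1}^{m}\left(\left|X_i - \bar{O}\right| + \left|O_i - \bar{O}\right|\right)^{2}},$$

$$RPIQ = \frac{Q_3 - Q_1}{\sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(O_i - X_i\right)^{2}}},$$

$$md = 1 - \frac{\sum_{i=1}^{m}\left|O_i - X_i\right|}{\sum_{i=1}^{m}\left(\left|X_i - \bar{O}\right| + \left|O_i - \bar{O}\right|\right)},$$

where $O_i$ and $X_i$ are the observed SSL and predicted SSL, $\bar{O}$ is the average of the observed SSL and $\bar{X}$ is the average of the predicted SSL, $Q_1$ and $Q_3$ are the first (25%) and third (75%) quartile values of the samples, and $m$ is the number of samples.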
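The five indices can be computed with the standard library alone. The sketch below assumes that RRMSE is the RMSE divided by the mean of the observations and that the quartiles are taken from the observed values using the default method of `statistics.quantiles`; both are assumptions, since conventions vary between studies.

```python
import math
import statistics


def rmse(obs, pred):
    return math.sqrt(sum((o - x) ** 2 for o, x in zip(obs, pred)) / len(obs))


def pearson_r(obs, pred):
    o_bar, x_bar = statistics.fmean(obs), statistics.fmean(pred)
    cov = sum((o - o_bar) * (x - x_bar) for o, x in zip(obs, pred))
    var_o = sum((o - o_bar) ** 2 for o in obs)
    var_x = sum((x - x_bar) ** 2 for x in pred)
    return cov / math.sqrt(var_o * var_x)


def rrmse(obs, pred):
    """RMSE relative to the mean observation (assumed normalization)."""
    return rmse(obs, pred) / statistics.fmean(obs)


def willmott_index(obs, pred):
    o_bar = statistics.fmean(obs)
    num = sum((o - x) ** 2 for o, x in zip(obs, pred))
    den = sum((abs(x - o_bar) + abs(o - o_bar)) ** 2 for o, x in zip(obs, pred))
    return 1.0 - num / den


def rpiq(obs, pred):
    """Interquartile distance of the observations divided by the RMSE."""
    q1, _, q3 = statistics.quantiles(obs, n=4)
    return (q3 - q1) / rmse(obs, pred)


def modified_agreement(obs, pred):
    o_bar = statistics.fmean(obs)
    num = sum(abs(o - x) for o, x in zip(obs, pred))
    den = sum(abs(x - o_bar) + abs(o - o_bar) for o, x in zip(obs, pred))
    return 1.0 - num / den


# Hypothetical observed and predicted values (not data from the study).
observed = [1.0, 2.0, 3.0]
predicted = [2.0, 3.0, 4.0]
print(pearson_r(observed, predicted), rrmse(observed, predicted))
print(willmott_index(observed, predicted), modified_agreement(observed, predicted))
```

The constant offset in this example leaves R at 1 while the other indices drop, which is why a correlation measure alone cannot reveal a biased model.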

3.5. Uncertainty Analysis

To evaluate the uncertainty of the models (SVR, SVR-GA, and SVR-OTLBO), the confidence limits of the prediction errors ($CL^{\pm}$) are described as follows [62]:

$$CL^{\pm} = \bar{e} \pm Z_{\alpha} S_e,$$

where $\bar{e}$ and $S_e$ are the mean and standard deviation of the prediction errors, respectively, and $Z_{\alpha}$ is the standard normal variable at the $\alpha$ significance level. A predictive model with a positive value of $\bar{e}$ provides overestimated predictions, while a negative value of $\bar{e}$ indicates underestimated results.
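A short sketch of this calculation follows, assuming that the prediction error is defined as predicted minus observed (so a positive mean indicates overestimation), that the sample standard deviation is used, and that the standard normal variable is 1.96 for a two-sided 5% level. All three are assumptions, and the sample values are hypothetical.

```python
import statistics


def confidence_limits(observed, predicted, z=1.96):
    """Return (lower limit, upper limit, mean error) of the prediction errors."""
    errors = [p - o for o, p in zip(observed, predicted)]
    mean_error = statistics.fmean(errors)
    sd_error = statistics.stdev(errors)  # sample standard deviation
    return mean_error - z * sd_error, mean_error + z * sd_error, mean_error


# Hypothetical observed and predicted SSL values.
lower, upper, mean_error = confidence_limits([10.0, 20.0, 30.0], [12.0, 21.0, 33.0])
print(lower, upper, mean_error)
```

Here the mean error is positive, so this hypothetical model overestimates on average.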

4. Results and Discussion

4.1. Description of the Input Combinations

As a first step, it is essential to explore the best input combinations for predicting the target variable. A number of feature selection methods, including the Pearson correlation, autocorrelation function (ACF), partial autocorrelation function (PACF), and cross-correlation function (CCF), can be used to obtain the optimal predictive variables [23, 63]. Among these, the Pearson correlation is a simple and effective method for identifying appropriate input variables [64–66]. Herein, the input combinations are identified by calculating the correlation between the SSL on the origin day (t) and the input variables, including the river discharge (Qs) from the origin day to four days earlier (t − 4) and the rainfall depth (Rs and Rd) from the origin day to six days earlier (t − 6).

Table 2 presents the correlation coefficients obtained between the SSL(t) and the input variables.

As shown in the table, the highest correlations with the SSL(t) are found for five variables: Rs(t), Rd(t), Qs(t), Rd(t − 1), and Rs(t − 1). Moreover, the correlation coefficients decrease as the lag time of the predictive variables increases. For instance, the correlation coefficients obtained for Rs(t) and Rs(t − 4) are 0.59 and 0.01, respectively.
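The lagged correlation screening can be sketched as follows. The series are synthetic, and for simplicity the sketch assumes a complete daily target series, whereas the SSL in the study is event-based, so only days with an observed SSL would enter the pairs.

```python
import math


def pearson_r(a, b):
    a_bar, b_bar = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - a_bar) * (y - b_bar) for x, y in zip(a, b))
    var_a = sum((x - a_bar) ** 2 for x in a)
    var_b = sum((y - b_bar) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def lagged_pairs(target, predictor, lag):
    """Pair target(t) with predictor(t - lag), dropping days without a partner."""
    if lag == 0:
        return list(target), list(predictor)
    return list(target[lag:]), list(predictor[:-lag])


def lag_correlations(target, predictor, max_lag):
    """Correlation between target(t) and predictor(t - lag) for each lag."""
    return {
        lag: pearson_r(*lagged_pairs(target, predictor, lag))
        for lag in range(max_lag + 1)
    }


# Synthetic example: the target follows the predictor with a one-day delay.
rain = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 8.0, 7.0]
ssl = [0.0] + [2.0 * value for value in rain[:-1]]
correlations = lag_correlations(ssl, rain, max_lag=3)
print(correlations)
```

In this synthetic case the lag-1 correlation is 1 by construction; with real data the coefficients would be compared across lags, as in Table 2.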

Five input combinations, namely, M1 to M5, are built from the variables nominated by the correlation analysis (Table 3). The river discharge is omitted from two combinations (M1 and M2) to assess the impact of the other predictive variables on prediction performance. It should be highlighted that, as explained in the methodology, the input combinations and output data are normalized using equation (1).

4.2. Assessment of the Models’ Performance

To compare the prediction performance of the models used in this study (i.e., SVR, SVR-OTLBO, and SVR-GA), the metric indices obtained for the different input combinations over the testing phase are presented in Table 4. From the table, it is evident that, among the SVR models, SVR-M4 provides the best performance (RRMSE: 1.08, R: 0.95, RPIQ: 0.3229, md: 0.4338, and WI: 0.97). In the case of the SVR-OTLBO models, the lowest RRMSE (0.537) is observed in the SVR-OTLBO-M4. The highest value of R (0.9769) is found in the SVR-OTLBO-M3, while the SVR-OTLBO-M4 provides the highest WI (0.9812), RPIQ (0.9201), and md (0.7411). In general, the SVR-OTLBO-M4 offers better prediction performance than the other SVR-OTLBO models. Regarding the SVR-GA, the values reported in Table 4 indicate that the lowest RRMSE (0.562) is seen in the SVR-GA-M4 model, while the highest R (0.97) and the highest WI (0.979), RPIQ (0.8521), and md (0.7323) are observed, respectively, in the SVR-GA-M5 and SVR-GA-M4 models.

The metrics obtained from the different predictive models (i.e., the SVR, SVR-OTLBO, and SVR-GA models) confirm that the M4 input combination, which comprises Rs(t), Rd(t), Qs(t), and Rd(t − 1), is the best combination for predicting the SSL(t). Hence, the predictive models SVR-M4, SVR-OTLBO-M4, and SVR-GA-M4 are nominated for further assessment.
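For illustration, the M4 input rows could be assembled from the three daily series as sketched below. The specification format (series name, lag in days) and the sample values are hypothetical.

```python
# M4 = Rs(t), Rd(t), Qs(t), Rd(t - 1), written as (series name, lag in days).
M4_SPEC = [("Rs", 0), ("Rd", 0), ("Qs", 0), ("Rd", 1)]


def build_inputs(series, spec):
    """Return one input row per day t that has all required lagged values."""
    max_lag = max(lag for _, lag in spec)
    n_days = len(next(iter(series.values())))
    return [
        [series[name][t - lag] for name, lag in spec]
        for t in range(max_lag, n_days)
    ]


# Hypothetical three-day records (not data from the study).
records = {
    "Rs": [0.0, 5.0, 2.0],
    "Rd": [1.0, 4.0, 3.0],
    "Qs": [7.0, 9.0, 8.0],
}
rows = build_inputs(records, M4_SPEC)
print(rows)
```

The first day is dropped because Rd(t − 1) is not available for it; other combinations only need a different specification list.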

To find the best-fit model among all the models nominated in the present study, the heat map diagram (Figure 4) is used as a visual comparison tool. The diagram uses normalized metrics, in which cells with values of one and zero indicate the highest and lowest performance, respectively. Figure 4 demonstrates that the SVR-OTLBO-M4 has the best performance in both the training and testing phases.
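One way to obtain such normalized cells is min-max scaling of each metric across the models, inverted for error metrics so that one always marks the best model. This scheme is an assumption about how the heat map was built. The values below are the testing-phase R and RRMSE quoted in the abstract for the three model types.

```python
def normalize_metric(values, higher_is_better=True):
    """Scale a metric across models so that 1 is the best and 0 the worst."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    if span == 0:
        return {model: 1.0 for model in values}
    if higher_is_better:
        return {model: (v - lo) / span for model, v in values.items()}
    return {model: (hi - v) / span for model, v in values.items()}


r_values = {"SVR": 0.9501, "SVR-GA": 0.9704, "SVR-OTLBO": 0.9768}
rrmse_values = {"SVR": 1.0829, "SVR-GA": 0.5617, "SVR-OTLBO": 0.5371}

r_cells = normalize_metric(r_values, higher_is_better=True)
rrmse_cells = normalize_metric(rrmse_values, higher_is_better=False)
print(r_cells)
print(rrmse_cells)
```

Note that this scaling exaggerates small gaps: the two hybrid models differ only slightly in R, yet the plain SVR still maps to zero.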

To further explore the performance of the models, the scatter plots of the SSL estimated by the models against the measured SSL are shown in Figure 5 for both the training and testing phases. The values of R² for the selected models are also reported. Based on Figure 5, it is clear that the proposed hybrid models are generally closer to the best-fit line than the SVR model. Furthermore, the SVR-OTLBO-M4 provides the highest values of R² in both the training (R² = 0.958) and testing (R² = 0.953) phases.

To investigate the changes in the simulated data, the box plot is employed. The box plot of the simulated SSL for the selected models is shown in Figure 6. As can be seen, the minimum and maximum of the SSL50 belong to the observed data and the SVR-M4 model, respectively. The relative differences between the observed data and the SVR-M4, SVR-OTLBO-M4, and SVR-GA-M4 models are 215%, 56%, and 47%, respectively. These values demonstrate that the hybrid models are closer to the observed data than the SVR model. Moreover, Figure 6 shows that the minimum and maximum IQR values belong to the observed data (81.5 mg/l) and the SVR-GA-M4 model (112.95 mg/l). The relative differences in IQR between the observed data and the SVR-M4, SVR-OTLBO-M4, and SVR-GA-M4 models are 13.8%, 12.5%, and 38.5%, respectively. This shows that the SVR-OTLBO has the smallest change compared to the other models.

In this study, the Taylor diagram is applied to combine several statistical criteria, including the standard deviation, correlation coefficient, and RMSE [67]. The main aim of the Taylor diagram is to identify the predictive model nearest to the benchmark record (in the present study, the observed SSL). The Taylor diagrams of the selected models are shown in Figure 7. Figure 7 shows that the data simulated by the SVR-OTLBO-M4 model are nearer to the observed data than those of the SVR-M4 or SVR-GA-M4 model in both the training and testing phases. Hence, the performance of the SVR-OTLBO model is higher than that of the other predictive models.
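The three statistics can share one diagram because they are linked by a law-of-cosines relation: the squared centered RMS difference equals the sum of the two variances minus twice the product of the two standard deviations and the correlation coefficient. The sketch below checks this numerically on hypothetical values, using population standard deviations.

```python
import math


def taylor_statistics(observed, predicted):
    """Return (sd_obs, sd_pred, correlation, centered RMS difference)."""
    n = len(observed)
    o_bar, p_bar = sum(observed) / n, sum(predicted) / n
    sd_o = math.sqrt(sum((o - o_bar) ** 2 for o in observed) / n)
    sd_p = math.sqrt(sum((p - p_bar) ** 2 for p in predicted) / n)
    r = sum((o - o_bar) * (p - p_bar) for o, p in zip(observed, predicted)) / (n * sd_o * sd_p)
    crmsd = math.sqrt(
        sum(((p - p_bar) - (o - o_bar)) ** 2 for o, p in zip(observed, predicted)) / n
    )
    return sd_o, sd_p, r, crmsd


# Hypothetical observed and predicted values.
sd_o, sd_p, r, crmsd = taylor_statistics([1.0, 2.0, 3.0, 4.0], [1.5, 1.5, 3.5, 4.5])
identity_rhs = sd_o ** 2 + sd_p ** 2 - 2.0 * sd_o * sd_p * r
print(crmsd ** 2, identity_rhs)
```

Because the difference is centered, a constant bias in the predictions does not move a model's point on the diagram, so bias has to be judged with a separate measure.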

Previous studies have widely used the SSL with different lags as input features to predict the SSL. However, a predictive model based on other hydrological variables, such as discharge or rainfall depth, is more efficient because discharge and rainfall are simpler to measure. In line with this, the hybrid model proposed in the current study effectively estimates the suspended sediment load using rainfall and discharge data as input features.

4.3. Assessment of the Models’ Uncertainty

To measure the uncertainty of the selected models in the present study, the confidence limits of the prediction errors at the 5% significance level over the testing phase are presented in Table 5.

Table 5 shows that the models underestimate the suspended sediment load. Further, the lowest uncertainty band (312.0) is found in the results of the SVR-OTLBO model. Conversely, the SVR model offers the highest uncertainty (369.2). This finding is consistent with the results of the performance metrics, according to which the SVR-OTLBO model outperforms the other models used in the present study.

4.4. Assessment of the Proposed Models against Literature Models

Although the newly developed hybrid model in the current study, SVR-OTLBO, successfully predicts SSL, it is interesting to compare its performance with those obtained in the other studies. Sadeghpourhaji et al. [39] investigated WSVM and SVM models for forecasting SSL, obtaining , respectively. Rashidi et al. [43] developed two predictive models GT-SVM and SVM (RBF kernel), gaining , respectively. Nourani et al. [42] executed SVM for forecasting SSL, gaining . Kumar et al. [41] applied ANN, LASVR, multilinear regression (MLR), classification and regression tree (CART), and M5 to predict SSL, obtaining , respectively. Buyukyildiz and Kumcu [23] predicted daily SSL using scaled conjugate gradient (SCG) algorithm, radial basis neural network (RBNN), generalized regression neural network (GRNN), ANFIS-GP, ANFIS-GC, and SVR, obtaining , respectively. Hssanpour et al. [46] developed a hybrid model based on FCM-SVR for predicting SSL, achieving . Therefore, it can be found that the hybrid models reported in this research have better predictive performance compared to the SVR-based models in the literature.

5. Summary and Conclusions

The present study focused on the development of a hybrid model to estimate the suspended sediment load. For this purpose, the hydrometric and hydroclimatological data of the Cham Siah basin, composed of river discharge, SSL, and rainfall depth data, are employed. The support vector regression method is used to predict the SSL. As a number of parameters in the SVR are unknown, two hybrid models, SVR-GA and SVR-OTLBO, are developed. In this study, five input variables are investigated, and five predictive models are designed for each of the SVR, SVR-GA, and SVR-OTLBO methods. The correlation between the SSL and the input variables is evaluated to identify the most significant input variables. Furthermore, five indices (RRMSE, R, WI, RPIQ, and md) are employed to determine the best-performing models.

In general, the following findings are obtained in this study:
(i) Among all the SVR models, the performance of SVR-M4 is the highest. The SVR-OTLBO-M4 has the best performance compared with the other SVR-OTLBO models, and SVR-GA-M4 is the best-fit model among all the SVR-GA models.
(ii) Among those models with the highest performance, the SVR-OTLBO-M4 has the highest performance in both the testing and training phases.
(iii) The hybrid models’ predicted data are closer to the observational data than the SVR model’s output data. Besides, the SVR-OTLBO-M4 is the predicted model nearest to the observational data.
(iv) Feature selection based on correlation methods is an inadequate approach due to the complexity of hydrological phenomena such as sediment. Using metaheuristic algorithms is an appropriate method for selecting features and finding the best input combinations. This limitation can be solved in the future by developing a multiobjective optimization model based on the OTLBO algorithm.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.