Farnaz Daneshvar Vousoughi; Rasoul Samadzadeh
Abstract
1-Introduction
Water resources management is a vitally important task. The optimum planning of irrigation projects and the development and exploitation of water resources, especially during drought and flood events, depend strictly on the accuracy of the rainfall-runoff modeling tool used. Therefore, different models have already been developed and employed for modeling the rainfall-runoff processes of watersheds (Partovian et al., 2017). In the present study, a wavelet-based pre-processing approach was used in modeling runoff time series via an artificial neural network (ANN). Furthermore, the impacts of de-noising (smoothing) and the wavelet transform on the accuracy of one-month-ahead runoff prediction at the outlet of the Ardabil plain were investigated simultaneously.
2-Methodology
2-1-Case of the Study
The plain of Ardabil (38 – 38 N and 47 – 48 E), located in north-western Iran, covers an area of about 990 km2 (see Fig. 1). In the present study, the trend analysis was carried out on the rainfall (P) and runoff (R) data of three stations, Samian (PS, RS), Gilandeh (PG, RG), and Kozatopraghi (PK, RK), located in the Ardabil plain, for the period from 1977 to 2019. The data were sampled at one-month intervals at all of the stations. Figure 2 shows the locations of the rainfall and runoff stations. In this study, five combinations of input data were used for runoff prediction, as follows:
Comb. 1: RS(t), RS(t-1), PS(t)
Comb. 2: RS(t), RS(t-12), RS(t-24), PS(t)
Comb. 3: RK(t), RG(t), RG(t-1), PG(t-12), RK(t-12)
Fig. (1): Case of the study and the position of rainfall and runoff stations.
2-2-Artificial Neural Network (ANN)
The three-layered feed-forward back-propagation network, which is commonly used in forecasting hydrologic time series, provides a general framework for representing the nonlinear functional mapping between a set of input and output variables.
2-3-Wavelet transform (WT)
In hydrological problems, the time series are usually in discrete rather than continuous format; therefore, the discrete WT was used in the form given by Mallat (1998).
2-4-Wavelet-based de-noising
The wavelet de-noising technique operates as follows: (1) A suitable mother wavelet and a number of resolution levels are selected, and the main time series xi is decomposed into an approximation sub-series at resolution level L and detailed sub-series at the different resolution levels. (2) Values of the detailed sub-series whose absolute value exceeds a fixed threshold are replaced by the difference between their value and the threshold.
2-5-Efficiency criteria in runoff prediction
Two criteria were used to measure the efficiency of the proposed forecasting methods: the root mean square error (RMSE) and the determination coefficient (DC).
3-Results and Discussion
Some temporal features may also exist in the runoff time series due to their highly non-stationary fluctuations. To handle such features, wavelet-based, temporally pre-processed data were fed into the ANNs to improve the accuracy of runoff modeling. Both WT and wavelet-based de-noising were used for modeling the rainfall-runoff process via the ANN model. The Daubechies-4 (db4) mother wavelet closely resembles the runoff signal and can therefore capture the features of the signal, especially the peak values; it was thus selected as the mother wavelet for the decomposition of the runoff time series in this study.
The decomposition of the runoff time series at level L yields L+1 sub-signals: one approximation sub-signal, Pa(t), and L detailed sub-signals, Pdi(t) (i=1, 2, …, L). Decomposition level 3 was taken as the optimum decomposition level. Each decomposed sub-series of the runoff represented a specific seasonal feature of the process. In the WT-ANN (WANN) model, the decomposed sub-series, together with the rainfall and runoff data of each combination, were used in the FFNN to predict one-month-ahead runoff values at the outlet of the Ardabil plain (Samian station). In the second stage, the runoff time series were de-noised via WT, and the de-noised runoff data were used to predict the runoff at Samian station one month ahead. Finally, the ANN model was compared with the ANN models that used pre-processed inputs.
The results of the three models for one-step-ahead runoff forecasting at Samian station are presented in Table 1. The results indicated that the WANN model with comb. 3 achieved better accuracy than the other models. The WANN model with comb. 3 used the runoff data of Gilandeh and Kozatopraghi, which lie upstream, and performed accurately. This shows that the Gilandeh and Kozatopraghi runoff time series played an important role in runoff modeling at Samian. Based on the verification DC values in Table 1, the WANN model improved on the ANN model by 35%, 3.5%, and 17% for combs. 1, 2, and 3, respectively. The ANN model with de-noised inputs showed little improvement in runoff modeling at the outlet of the plain (1%, 6%, and 6.2% for combs. 1, 2, and 3, respectively).
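The decomposition and de-noising steps described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: it uses the Haar wavelet in place of the db4 wavelet used in the study (to stay dependency-free), the function names are hypothetical, and the threshold value is left to the caller because the source does not report one.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_step(signal):
    """One level of the Haar DWT: returns (approximation, detail)."""
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    return approx, detail

def haar_decompose(signal, level):
    """Return [approx_L, detail_L, ..., detail_1], i.e. level + 1 sub-series."""
    if len(signal) % (2 ** level) != 0:
        raise ValueError("signal length must be divisible by 2**level")
    approx, details = list(signal), []
    for _ in range(level):
        approx, detail = haar_step(approx)
        details.append(detail)
    return [approx] + details[::-1]

def soft_threshold(coeffs, thr):
    """Shrink each coefficient toward zero by thr (soft thresholding)."""
    return [math.copysign(max(abs(c) - thr, 0.0), c) for c in coeffs]

def haar_reconstruct(subseries):
    """Invert haar_decompose: merge the approximation with each detail level."""
    approx = subseries[0]
    for detail in subseries[1:]:
        merged = []
        for a, d in zip(approx, detail):
            merged.extend([(a + d) / SQRT2, (a - d) / SQRT2])
        approx = merged
    return approx

def denoise(signal, level, thr):
    """Decompose, soft-threshold the detail sub-series only, then reconstruct."""
    sub = haar_decompose(signal, level)
    cleaned = [sub[0]] + [soft_threshold(d, thr) for d in sub[1:]]
    return haar_reconstruct(cleaned)
```

In a WANN-style setup the list returned by `haar_decompose(series, 3)` would supply the four sub-series used as model inputs, whereas `denoise` returns a single smoothed series of the original length.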
Table (1): Results of the ANN, ANN with denoised data, and WANN models for one-step-ahead predictions

| Input combination | Output variable | Model type | DC (Calibration) | DC (Verification) | RMSE, normalized (Calibration) | RMSE, normalized (Verification) |
|---|---|---|---|---|---|---|
| 1 | RS(t+1) | ANN | 0.592 | 0.434 | 0.065 | 0.051 |
| 1 | RS(t+1) | ANN with denoised data | 0.594 | 0.438 | 0.065 | 0.052 |
| 1 | RS(t+1) | WANN | 0.791 | 0.587 | 0.047 | 0.044 |
| 2 | RS(t+1) | ANN | 0.795 | 0.567 | 0.043 | 0.0434 |
| 2 | RS(t+1) | ANN with denoised data | 0.813 | 0.601 | 0.039 | 0.042 |
| 2 | RS(t+1) | WANN | 0.791 | 0.587 | 0.047 | 0.044 |
| 3 | RS(t+1) | ANN | 0.907 | 0.730 | 0.029 | 0.035 |
| 3 | RS(t+1) | ANN with denoised data | 0.831 | 0.775 | 0.040 | 0.032 |
| 3 | RS(t+1) | WANN | 0.880 | 0.854 | 0.032 | 0.026 |

4-Conclusions
In this study, wavelet-based de-noised data and the WT were employed as data pre-processing techniques for ANN-based rainfall-runoff modeling at the outlet of the Ardabil plain. First, the hydrological time series were smoothed by eliminating the outliers and large noise of the raw observed time series, which may be due to human error, instrument measurement error, or systematic error. Then, different sub-series were generated by decomposing the runoff time series and were used to train the ANN model for rainfall-runoff modeling. The results obtained with processed and unprocessed data were compared. This comparison indicated the merit of the applied data pre-processing approaches, which identify hidden patterns in the data robustly, so that the developed models could simulate and predict runoff values with a lower margin of error and higher confidence. The best results were achieved by using the runoff data decomposed via WT, which provides different training time series containing the same components as the original time series. For future study, it is recommended to examine the efficiency of the proposed data pre-processing method in the rainfall-runoff modeling of other watersheds, since the merit of the method is expected to be more pronounced where the quality of the collected data is degraded by technical limitations.
Furthermore, it is suggested to evaluate the efficiency of the proposed method in modeling the process at other time scales and in modeling other hydrological processes, which may involve distinct noise levels and patterns depending on the type of process.
Keywords: Runoff modeling, Wavelet Transform, Wavelet-based de-noising, Artificial Neural Network (ANN), Ardabil plain
5-References
Donoho, D.L., 1995. De-noising by soft-thresholding. IEEE Transactions on Information Theory. 41(3), 613–627.
Mallat, S.G., 1998. A Wavelet Tour of Signal Processing, second ed. Academic Press, San Diego.
Partovian, A., Nourani, V., Aalami, M.T., 2016. Optimizing Neural Network for Monthly Rainfall-Runoff Modeling with Denoised-Jittered Data. Journal of Tethys. 4(4), 284–294.
Farnaz Daneshvar Vousoughi; Vahid Manafianazar
Volume 5, Issue 17, March 2019, Pages 45-64
Abstract
Groundwater plays an important role in urban and rural water supply and in agriculture. Managing water resources requires accurate and reliable groundwater level forecasting. In this research, data from 15 piezometers in the Ardabil plain were used. A support vector machine (SVM) was applied for one-month-ahead prediction. A clustering tool and the Wavelet Transform (WT) were used for spatial and temporal pre-processing, respectively, and an artificial neural network (ANN) was also used for modeling. For the SVM model, the R2 values in calibration and verification were 0.94 and 0.89, respectively, and applying the WT to the groundwater level data increased model performance by up to 3% and 5% in the calibration and verification parts. The performance of the SVM model was compared with that of the ANN and combined WT-ANN models, for which the R2 values in calibration and verification were 0.94 and 0.88, respectively, and applying the WT increased performance by up to 3% and 7%. The SVM model improved modeling performance, and its combination with the WT gave the best performance among the pre-processing options. The ANN and hybrid WT-ANN models also performed well, with the hybrid WT-ANN models giving slightly better results than the ANN model in some clusters.
Introduction
Recently, Artificial Intelligence (AI) approaches have been developed as a new generation of robust tools for time series forecasting. Among these tools, the Artificial Neural Network (ANN) and the Support Vector Machine (SVM) have been used extensively in different engineering fields. The ability of the commonly used ANN models to approximate nonlinear mappings between inputs and outputs makes them useful for modeling hydrological phenomena. However, ANN-based modeling has some shortcomings, such as overfitting, convergence to local minima, and slow training, which make it difficult to achieve adequate efficiency when dealing with complex hydrological processes [12]. The SVM, proposed in [13], is one of the most compelling forecasting tools and an alternative to the ANN. The SVM is based on the structural risk minimization principle and Vapnik–Chervonenkis theory, and it involves solving a quadratic programming problem; thus, it can theoretically reach the global optimum of the primal problem.
In recent decades, SVMs have been applied in several hydrological fields, including the modeling of groundwater levels (GWLs). In this paper, the combination of SVM and wavelet-based data pre-processing was examined through the proposed Wavelet-SVM (WSVM) model for one-month-ahead GWL modeling. The proposed models were also compared with single SVM, ANN, and Wavelet-ANN (WANN) models.
The plain of Ardabil (38 – 38 N and 47 – 48 E), located in the north-west of Iran, covers an area of about 990 km2. In this plain, 15 piezometers (wells) are operated to measure the GWLs. The data are sampled at one-month intervals at all of the piezometers. The plain is equipped with one runoff gauge at the outlet and 6 rain gauges within the watershed. Fig. 2 shows the positions of the piezometers as well as the rainfall and runoff gauging stations. The monthly rainfall, runoff, and GWL data available from 1988 to 2012 were used in this study. About 18 years of data were used for training, and the remaining 7 years for validation.
Support Vector Machine
SVM is a powerful methodology for solving problems in non-linear classification, function estimation, and density estimation. Via SVM, a non-linear function can be expressed as:
$$f(x) = w^{T}\varphi(x) + u \tag{1}$$
where f indicates the relationship between the input and output, w is the m-dimensional weight vector, φ is the mapping function that maps x into the m-dimensional feature vector, and u is the bias term.
Artificial Neural Network (ANN)
ANN is widely applied in hydrology and water resource studies as a forecasting tool. Among ANNs, feed-forward back-propagation (BP) network models are the most familiar to engineers. Three-layered feed-forward neural networks (FFNNs), which have commonly been used in forecasting hydrologic time series, provide a general framework for representing the nonlinear functional mapping between a set of input and output variables.
The explicit expression for an output value of a three-layered FFNN is given by (Kim and Valdes, 2003):
$$\hat{y}_k = f_o\left[\sum_{j=1}^{M_N} w_{kj}\, f_h\left(\sum_{i=1}^{N_N} w_{ji}\, x_i + w_{jo}\right) + w_{ko}\right] \tag{2}$$
where i, j, and k respectively denote the input layer, hidden layer, and output layer neurons. $w_{ji}$ is a weight in the hidden layer connecting the i-th neuron in the input layer and the j-th neuron in the hidden layer, $w_{jo}$ is the bias for the j-th hidden neuron, $f_h$ is the activation function of the hidden neuron, $w_{kj}$ is a weight in the output layer connecting the j-th neuron in the hidden layer and the k-th neuron in the output layer, $w_{ko}$ is the bias for the k-th output neuron, $f_o$ is the activation function for the output neuron, $x_i$ is the i-th input variable of the input layer, and $\hat{y}_k$ and y are the computed and observed output variables, respectively. $N_N$ and $M_N$ are respectively the numbers of neurons in the input and hidden layers. The weights differ between the hidden and output layers, and their values are adjusted during the network training process.
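The forward pass described by this expression can be sketched as follows. The sketch assumes a sigmoid hidden activation and a linear output activation, which the source does not specify, and the function and argument names are illustrative only.

```python
import math

def sigmoid(z):
    """Logistic activation, assumed here for the hidden layer."""
    return 1.0 / (1.0 + math.exp(-z))

def ffnn_forward(x, w_hidden, b_hidden, w_out, b_out,
                 f_h=sigmoid, f_o=lambda z: z):
    """Forward pass of a three-layered feed-forward network.

    x        : list of N_N input values x_i
    w_hidden : M_N rows of N_N weights w_ji
    b_hidden : M_N hidden biases w_jo
    w_out    : one row of M_N weights w_kj per output neuron
    b_out    : output biases w_ko
    """
    # Hidden layer: weighted sum of inputs plus bias, then activation f_h.
    hidden = [f_h(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    # Output layer: weighted sum of hidden outputs plus bias, then f_o.
    return [f_o(sum(w * h for w, h in zip(row, hidden)) + b)
            for row, b in zip(w_out, b_out)]
```

Training, for example by back-propagation, would adjust `w_hidden`, `b_hidden`, `w_out`, and `b_out`; only the evaluation step is shown here.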
Wavelet transform (WT)
The WT has grown in use and popularity since its inception in the early 1980s, but it has yet to become as widely used as the Fourier transform (Grossman and Morlet, 1984).
In real hydrological problems, the time series are usually discrete rather than continuous; therefore, the discrete WT is usually used in the following form (Mallat, 1998):
$$\psi_{m,n}(t) = \frac{1}{\sqrt{a_0^{m}}}\,\psi\left(\frac{t - n\, b_0\, a_0^{m}}{a_0^{m}}\right) \tag{3}$$
where ψ denotes the mother wavelet; m and n are integers that respectively control the wavelet dilation and translation; a0 is a specified fine dilation step greater than 1; and b0 is the location parameter and must be greater than zero. The most common and simplest choices for the parameters are a0 = 2 and b0 = 1. This power-of-two logarithmic scaling of the dilation and translation is known as the dyadic grid arrangement.
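As a short worked step, assume Eq. (3) takes the standard discrete wavelet form of Mallat (1998), written here with ψ as an assumed symbol for the mother wavelet. Substituting the dyadic choices a0 = 2 and b0 = 1 gives:

```latex
\begin{align*}
\psi_{m,n}(t) &= a_0^{-m/2}\,\psi\!\left(\frac{t - n\,b_0\,a_0^{m}}{a_0^{m}}\right) \\
              &= 2^{-m/2}\,\psi\!\left(2^{-m}\,t - n\right)
              \qquad (a_0 = 2,\ b_0 = 1)
\end{align*}
```

Each increase of m by one therefore doubles the time scale of the wavelet, while n shifts it in steps of 2^m samples.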
Self Organizing Map (SOM)
SOM is an effective software tool for the visualization of high-dimensional data. It implements an orderly mapping of a high-dimensional distribution onto a regular low-dimensional grid. It can thereby convert complex, nonlinear statistical relationships between high-dimensional data items into simple geometric relationships on a low-dimensional display while preserving the topological structure of the data (Kohonen, 1997). SOMs reduce dimensionality by producing a map, usually of 1 or 2 dimensions, that shows the similarities in the data by grouping similar data items together.
The SOM is trained iteratively; initially, the weights are assigned randomly. When the n-dimensional input vector x is sent through the network, the distance between the weight vector w of each SOM neuron and the input is computed. The most common criterion for computing this distance is the Euclidean distance (Kohonen, 1997):
$$\lVert x - w \rVert = \sqrt{\sum_{i=1}^{n} (x_i - w_i)^2} \tag{4}$$
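A minimal sketch of how this distance is typically used in one SOM training step is given below. The winner search follows the Euclidean criterion above; the update rule shown is the usual Kohonen-style move of the winning neuron toward the input, with the learning rate `alpha` as an assumed parameter and the neighbourhood update omitted for brevity. None of these names come from the source.

```python
import math

def euclidean(x, w):
    """Euclidean distance between an input vector x and a weight vector w."""
    return math.sqrt(sum((xi - wi) ** 2 for xi, wi in zip(x, w)))

def best_matching_unit(x, weights):
    """Index of the neuron whose weight vector is closest to x."""
    distances = [euclidean(x, w) for w in weights]
    return min(range(len(weights)), key=distances.__getitem__)

def update_winner(x, weights, alpha):
    """Move the winning neuron's weights a fraction alpha toward x (in place)."""
    j = best_matching_unit(x, weights)
    weights[j] = [wi + alpha * (xi - wi) for xi, wi in zip(x, weights[j])]
    return j
```

Repeating `update_winner` over all input vectors for many iterations, usually with a decaying `alpha`, gradually organizes the neurons so that similar inputs map to the same or nearby units.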
Results and Discussion
This section presents the results of the proposed one-step-ahead GWL modeling with pre-processed data using the SVM and WT-SVM models. The SVM-based results are also compared with those of the ANN-based models.
Results of clustering
Because many piezometers are spread over the Ardabil plain, and because managing groundwater resources is important, it is necessary to consolidate the information about GWLs in the various regions of the plain and to identify the dominant piezometers for predicting the future GWL conditions of the plain. To accomplish the spatial clustering, an SOM was used to identify similar and predominant piezometers. The SOM groups similar piezometers (those with similar temporal patterns and seasonalities) into the same classes.
The clustering of the piezometers into 5 clusters is shown in Table 1. The clusters follow the direction of the main stream flow, which suggests that the groundwater flow regime is probably parallel to the surface water flow toward the outlet in the northwest of the plain. To evaluate the clustering produced by the SOM, the Silhouette coefficient was used as a measure of cluster validity. The Euclidean distance was then used to select the central piezometer of each cluster, which best represents the GWL pattern of that cluster.
Table (1) The results of clustering

| Cluster No. | Piezometers | Silhouette coefficient | Central piezometer |
|---|---|---|---|
| Cluster 1 | P4, P9 | 0.42, 0.34 | P4 |
| Cluster 2 | P2, P12 | 0.46, 0.72 | P12 |
| Cluster 3 | P1, P8, P11 | 0.45, 0.58, 0.11 | P8 |
| Cluster 4 | P6, P7, P10, P14 | 0.41, 0.62, 0.40, 0.54 | P7 |
| Cluster 5 | P3, P5, P13, P15 | 0.65, 0.71, 0.53, 0.51 | P5 |
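The two cluster diagnostics used here can be sketched as follows. The Silhouette coefficient follows its standard definition; the selection of the central piezometer is interpreted as picking the member closest (in Euclidean distance) to the cluster mean, which is an assumption because the source does not spell out the procedure. Function names and the toy data are hypothetical.

```python
import math

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def silhouette(i, points, labels):
    """Silhouette coefficient s(i) = (b - a) / max(a, b) for point i."""
    own = labels[i]
    same = [dist(points[i], points[j]) for j in range(len(points))
            if labels[j] == own and j != i]
    if not same:  # singleton cluster: conventionally scored as 0
        return 0.0
    a = sum(same) / len(same)  # mean distance to the point's own cluster
    b = min(                   # mean distance to the nearest other cluster
        sum(dist(points[i], points[j]) for j in range(len(points)) if labels[j] == c)
        / labels.count(c)
        for c in set(labels) if c != own
    )
    denom = max(a, b)
    return (b - a) / denom if denom > 0 else 0.0

def central_member(points, labels, cluster):
    """Index of the cluster member closest to the cluster mean vector."""
    members = [i for i, lab in enumerate(labels) if lab == cluster]
    dim = len(points[0])
    centroid = [sum(points[i][k] for i in members) / len(members) for k in range(dim)]
    return min(members, key=lambda i: dist(points[i], centroid))
```

Values of s(i) close to 1 indicate that a piezometer sits well inside its cluster, while values near 0 indicate that it lies between clusters.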
Results of SVM and ANN
The one-step-ahead results for the 5 central piezometers of the clusters are shown in Table 2. As mentioned previously, the dominant input variables of each model (column 2, Table 2) were determined by linear correlation; Pi(t) and Ij(t) respectively denote the GWL time series of central piezometer i and the rainfall time series of rain gauge j, and Q(t) is the outflow time series at the outlet of the basin. The one-step-ahead results indicate that all of the models produced acceptable outcomes, which confirms that the representative GWL patterns over the watershed were identified appropriately. Cluster 1 did not show reliable results because the Silhouette coefficient of P4 was lower than 0.5, which indicates that cluster 1 had a weak structure.
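Selecting dominant inputs by linear correlation can be illustrated with the sketch below. It ranks candidate lags of a predictor series by the absolute Pearson correlation with the target one step ahead. The helper names, the `lead` argument, and the ranking rule are assumptions made for illustration; the source does not describe the exact selection procedure.

```python
import math

def pearson(x, y):
    """Pearson linear correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def lagged_correlation(predictor, target, lag, lead=1):
    """Correlation between predictor(t - lag) and target(t + lead)."""
    steps = range(lag, len(target) - lead)
    xs = [predictor[t - lag] for t in steps]
    ys = [target[t + lead] for t in steps]
    return pearson(xs, ys)

def rank_lags(predictor, target, max_lag, lead=1):
    """Lags 0..max_lag sorted by decreasing absolute correlation with the target."""
    scores = {lag: lagged_correlation(predictor, target, lag, lead)
              for lag in range(max_lag + 1)}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)
```

For example, calling `rank_lags` on a rainfall series and a GWL series with `max_lag=3` would order candidate inputs such as I(t), I(t-1), I(t-2), and I(t-3) by their linear association with the GWL one month ahead.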
Piezometers in cluster 3 showed better results than those in cluster 1 despite the heavy groundwater use in that region, because they are close to the outlet of the plain, where water from other regions accumulates. The other clusters showed superior results since they are near the supply and recharge sources and in the highlands of the plain. Therefore, spatial clustering can not only enhance modeling performance by grouping similar time series within the same clusters, but can also identify the piezometers and regions whose data are unrepresentative because of artificial and/or external impacts on the system.
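Table 2 Results of ANN and SVM models for one-step-ahead predictions

| Cluster No. | Input variables | Output variable | Model type | R2 (Calibration) | R2 (Verification) | RMSE, normalized (Calibration) | RMSE, normalized (Verification) |
|---|---|---|---|---|---|---|---|
| Cluster 1 | P4(t), P4(t-1), I4(t), Q(t) | P4(t+1) | SVM | 0.977 | 0.958 | 0.006 | 0.005 |
| Cluster 1 | P4(t), P4(t-1), I4(t), Q(t) | P4(t+1) | ANN | 0.977 | 0.951 | 0.006 | 0.005 |
| Cluster 2 | P12(t), P12(t-1), Q(t) | P12(t+1) | SVM | 0.944 | 0.86 | 0.041 | 0.035 |
| Cluster 2 | P12(t), P12(t-1), Q(t) | P12(t+1) | ANN | 0.935 | 0.869 | 0.044 | 0.034 |
| Cluster 3 | P8(t), P8(t-1), I3(t-1), Q(t-2) | P8(t+1) | SVM | 0.99 | 0.99 | 0.023 | 0.015 |
| Cluster 3 | P8(t), P8(t-1), I3(t-1), Q(t-2) | P8(t+1) | ANN | 0.996 | 0.992 | 0.015 | 0.014 |
| Cluster 4 | P7(t), P7(t-1), I4(t-1), Q(t-1) | P7(t+1) | SVM | 0.819 | 0.667 | 0.038 | 0.023 |
| Cluster 4 | P7(t), P7(t-1), I4(t-1), Q(t-1) | P7(t+1) | ANN | 0.832 | 0.677 | 0.037 | 0.022 |
| Cluster 5 | P5(t), P5(t-1), Q(t-1) | P5(t+1) | SVM | 0.955 | 0.94 | 0.006 | 0.004 |
| Cluster 5 | P5(t), P5(t-1), Q(t-1) | P5(t+1) | ANN | 0.97 | 0.94 | 0.005 | 0.004 |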
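The two performance criteria reported in the table can be computed as sketched below. The source does not give the formulas, so this assumes that R2 is the determination coefficient in the form 1 − SSE/SST and that the normalized RMSE is the RMSE of min-max normalized series; both are labeled assumptions, and the function names are illustrative.

```python
import math

def min_max_normalize(values, lo, hi):
    """Scale values to [0, 1] using the given minimum lo and maximum hi."""
    return [(v - lo) / (hi - lo) for v in values]

def rmse(obs, pred):
    """Root mean square error between observed and computed values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def determination_coefficient(obs, pred):
    """1 - SSE/SST: 1 is a perfect fit, 0 is no better than the observed mean."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst

def normalized_rmse(obs, pred):
    """RMSE after scaling both series by the observed minimum and maximum."""
    lo, hi = min(obs), max(obs)
    return rmse(min_max_normalize(obs, lo, hi), min_max_normalize(pred, lo, hi))
```

Both criteria would be evaluated separately on the calibration and verification periods to obtain the four values reported per model.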
Results of WANN and WSVM models
In addition to spatial patterns, some temporal features may also exist in the GWL process due to the highly non-stationary fluctuations of the time series. To handle such features, wavelet-based, temporally pre-processed data were fed into the ANN or SVM models to improve the modeling accuracy. The hybrid Wavelet-ANN (WANN) and Wavelet-SVM (WSVM) models were designed to capture the non-linear behavior of the GWL process. Because the structure of the Daubechies-4 (db4) mother wavelet closely resembles the GWL signal, it can capture the features of the signal, especially the peak values; it was therefore selected as the mother wavelet for the decomposition of the GWL time series in this study. The decomposition of the main GWL time series at level L yields L+1 sub-signals: one approximation sub-signal, Pa(t), and L detailed sub-signals, Pdi(t) (i=1, 2, …, L). Decomposition level 3 was taken as the optimum decomposition level. The decomposed sub-series of GWL (each resolution level representing a specific seasonal feature of the process), together with the rainfall and runoff data of each cluster, were used in the FFNN and SVM models to predict one-month-ahead GWL values. The results of the WANN and WSVM models for one-step-ahead forecasting are presented in Table 3. These results show that the models were accurate for all clusters during both the training and verification periods. According to Table 3, the WANN model improved on the performance of the ANN model. The performance criteria indicate that the WSVM models yielded slightly better results than the WANN models (except for clusters 1 and 5 in scenario 2).
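Table 3 Results of WANN and WSVM models for one-step-ahead predictions

| Cluster No. | Input variables | Output variable | Model type | R2 (Calibration) | R2 (Verification) | RMSE, normalized (Calibration) | RMSE, normalized (Verification) |
|---|---|---|---|---|---|---|---|
| Cluster 1 | Pi4(t), I4(t), Q(t) | P4(t+1) | WSVM | 0.993 | 0.973 | 0.003 | 0.004 |
| Cluster 1 | Pi4(t), I4(t), Q(t) | P4(t+1) | WANN | 0.988 | 0.975 | 0.005 | 0.004 |
| Cluster 2 | Pi12(t), Q(t) | P12(t+1) | WSVM | 0.962 | 0.901 | 0.033 | 0.029 |
| Cluster 2 | Pi12(t), Q(t) | P12(t+1) | WANN | 0.968 | 0.916 | 0.031 | 0.027 |
| Cluster 3 | Pi8(t), I3(t-1), Q(t-2) | P8(t+1) | WSVM | 0.997 | 0.995 | 0.013 | 0.011 |
| Cluster 3 | Pi8(t), I3(t-1), Q(t-2) | P8(t+1) | WANN | 0.997 | 0.995 | 0.013 | 0.011 |
| Cluster 4 | Pi7(t), I4(t-1), Q(t-1) | P7(t+1) | WSVM | 0.898 | 0.822 | 0.028 | 0.017 |
| Cluster 4 | Pi7(t), I4(t-1), Q(t-1) | P7(t+1) | WANN | 0.922 | 0.861 | 0.025 | 0.015 |
| Cluster 5 | Pi5(t), Q(t-1) | P5(t+1) | WSVM | 0.979 | 0.967 | 0.004 | 0.003 |
| Cluster 5 | Pi5(t), Q(t-1) | P5(t+1) | WANN | 0.971 | 0.963 | 0.005 | 0.003 |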
Concluding Remarks
In this paper, AI-based models were developed for GWL forecasting over the plain of Ardabil, in the north-west of Iran. The inputs of the AI models were monthly rainfall, runoff, and GWL at 15 piezometers over the study area. Data pre-processing via SOM and WT proved useful in improving the AI-based GWL forecasting models. The proposed methodology was applied to the Ardabil plain data to obtain one-month-ahead forecasts of GWL. The entire study area was divided into five clusters with the SOM clustering scheme, and AI modeling was then performed separately for each cluster. To improve model efficiency and account for seasonality effects, the WT, which can capture the multi-scale features of a signal, was used to decompose the GWL time series into sub-signals at different levels. The sub-signals were then used as inputs of the AI models to predict GWLs.
Overall, the results of this study provide promising evidence for combining spatial and temporal data pre-processing methods, and more specifically the SOM and WT methods, to forecast GWL values using AI models. One advantage of the proposed method is that the clustering step makes it possible to identify piezometers and regions with good and bad data quality. To extend the current study, it is recommended to use the presented methodology to forecast the GWL after adding other hydrological time series and variables (e.g., temperature and/or evapotranspiration) to the input layer of the model. Moreover, given the uncertainty of the rainfall process and the ability of the fuzzy concept to handle uncertainties, combining the ANN and fuzzy inference system (FIS) models as an adaptive neural-fuzzy inference system (ANFIS) model could provide useful results.
It would also be useful to apply the proposed methodology to other heterogeneous groundwater systems in order to investigate the overall effect of climatic conditions on the performance of the proposed model.