Document Type: Research
Authors
 Farnaz Daneshvar Vousoughi ^{1}
 Vahid Manafianazar ^{2}
^{1} Assistant Professor, Department of Water Resources Engineering, Faculty of Civil Engineering, Ardabil Branch, Islamic Azad University, Iran
^{2} M.Sc., Department of Water Resources Engineering, Faculty of Civil Engineering, Ahar Branch, Islamic Azad University, Iran
Abstract
Groundwater plays an important role in urban and rural water supply and in agriculture. Water resources management requires accurate and reliable groundwater level (GWL) forecasting. In this research, data from 15 piezometers in the Ardabil plain were used. A support vector machine (SVM) was applied to predict GWL one month ahead. Clustering and the wavelet transform (WT) were used for spatial and temporal preprocessing, respectively, and an artificial neural network (ANN) was also used for modeling. The results showed that the R² values in the calibration and verification stages were 0.94 and 0.89, respectively. Applying the WT to the GWL data improved model performance by up to 3% and 5% in the calibration and verification stages, respectively. The performance of the SVM model was compared with that of the proposed hybrid WT–ANN and ANN models. The results showed that the R² values in the calibration and verification stages were 0.94 and 0.88, respectively, and applying the WT to the GWL data improved performance by up to 3% and 7% in calibration and verification, respectively. The SVM model improved modeling performance, and its combination with the WT gave the best performance among the preprocessing schemes. The ANN and hybrid WT–ANN models also performed well, and the hybrid WT–ANN models gave slightly better results than the ANN model in some clusters.
Introduction
Recently, Artificial Intelligence (AI) approaches have been developed as a new generation of robust tools for time series forecasting. Among such tools, Artificial Neural Networks (ANNs) and Support Vector Machines (SVMs) have been employed extensively in various engineering fields. The ability of the commonly used ANN models to approximate nonlinear mappings between inputs and outputs makes them useful for modeling hydrological phenomena. However, ANN-based modeling has some shortcomings, such as overfitting, convergence to local minima and slow training, which make it difficult to achieve adequate efficiency when dealing with complex hydrological processes [12]. The Support Vector Machine (SVM), proposed in [13], is one of the most effective forecasting tools and an alternative to ANNs. SVM is based on the structural risk minimization principle and Vapnik–Chervonenkis theory and involves solving a quadratic programming problem; thus, in theory, it can obtain the globally optimal solution of the primal problem.
In recent decades, SVMs have been applied in several hydrological fields, including groundwater level (GWL) modeling. In this paper, the combination of SVM with wavelet-based data preprocessing, the proposed Wavelet-SVM (WSVM) model, was examined for modeling GWL one month ahead. The proposed models were also compared with standalone SVM, ANN and Wavelet-ANN (WANN) models. The Ardabil plain (38 – 38 N and 47 – 48 E), located in northwestern Iran, covers an area of about 990 km². In this plain, 15 piezometers (wells) are operated to measure GWLs, and all piezometers are sampled at one-month intervals. The plain is equipped with one runoff gauge at the outlet and six rain gauges within the watershed. Fig. 2 shows the positions of the piezometers and of the rainfall and runoff gauging stations. Monthly rainfall, runoff and GWL data from 1988 to 2012 were available and used in this study. About 18 years of data were used for training and the remaining 7 years for verification.
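The 18-year/7-year division of the 1988 to 2012 monthly record can be sketched as a chronological split. This is a minimal sketch, assuming a complete monthly record and assuming that the training block precedes the verification block; the function `chronological_split` and the placeholder values are hypothetical and are not the study's data.

```python
# Minimal sketch (hypothetical data): chronological split of a monthly record
# into a training block and a verification block. Time order is kept so that
# no future information leaks into training.

def chronological_split(dates, values, first_verification_year):
    """Return (training, verification) lists split at the start of a given year."""
    train = [v for (year, _month), v in zip(dates, values) if year < first_verification_year]
    verify = [v for (year, _month), v in zip(dates, values) if year >= first_verification_year]
    return train, verify

# Monthly time stamps from January 1988 to December 2012 (25 years = 300 months).
dates = [(year, month) for year in range(1988, 2013) for month in range(1, 13)]
# Placeholder values standing in for one piezometer's GWL record (not real data).
gwl = [float(i) for i in range(len(dates))]

# Assumption: 1988-2005 (18 years) for training, 2006-2012 (7 years) for verification.
train, verify = chronological_split(dates, gwl, first_verification_year=1988 + 18)
print(len(dates), len(train), len(verify))  # 300 216 84
```

A random shuffle is deliberately avoided here: for one-step-ahead forecasting, the verification months should come after the training months.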
Support Vector Machine
SVM is a powerful methodology for solving problems in nonlinear classification, function estimation and density estimation. In SVM, the nonlinear function is expressed as:
$$ f(x) = w^{T}\varphi(x) + u \tag{1} $$
where f denotes the relationship between the input and output, w is the m-dimensional weight vector, φ is the mapping function that maps x to an m-dimensional feature vector, and u is the bias term.
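Eq. (1) can be illustrated with an explicit, low-dimensional feature map. This sketch is only illustrative: the feature map `feature_map` (m = 3), the weights and the bias are hypothetical. In an actual SVM, w and u come from solving the quadratic programming problem mentioned in the Introduction, which is not shown here.

```python
import math

def feature_map(x):
    """Hypothetical feature map phi: maps a scalar x to an m = 3 dimensional vector."""
    return [x, x * x, math.sin(x)]

def svm_output(x, w, u):
    """Evaluate Eq. (1): f(x) = w^T phi(x) + u."""
    phi = feature_map(x)
    if len(phi) != len(w):
        raise ValueError("w must have the same dimension m as phi(x)")
    return sum(w_i * phi_i for w_i, phi_i in zip(w, phi)) + u

w = [0.5, -0.2, 0.0]  # hypothetical weight vector (would be learned by the SVM)
u = 1.0               # hypothetical bias term
print(svm_output(2.0, w, u))  # 0.5*2 - 0.2*4 + 0.0*sin(2) + 1.0 = 1.2
```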
Artificial Neural Network (ANN)
ANNs are widely applied as forecasting tools in hydrology and water resources studies. Among ANNs, feed-forward back-propagation (BP) network models are the most common in engineering practice. Three-layered feed-forward neural networks (FFNNs), which are usually used to forecast hydrologic time series, provide a general framework for representing nonlinear functional mappings between a set of input and output variables.
The explicit expression for an output value of a three-layered FFNN is given by (Kim and Valdes, 2003):
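$$ \hat{y}_k = f_o\left[\sum_{j=1}^{M_N} w_{kj}\, f_h\left(\sum_{i=1}^{N_N} w_{ji}\, x_i + w_{j0}\right) + w_{k0}\right] \tag{2} $$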
where i, j and k denote the input-layer, hidden-layer and output-layer neurons, respectively; w_ji is a hidden-layer weight connecting the i-th neuron in the input layer to the j-th neuron in the hidden layer; w_j0 is the bias of the j-th hidden neuron; f_h is the activation function of the hidden neurons; w_kj is an output-layer weight connecting the j-th neuron in the hidden layer to the k-th neuron in the output layer; w_k0 is the bias of the k-th output neuron; f_o is the activation function of the output neuron; x_i is the i-th input variable; and ŷ_k and y_k are the computed and observed output variables, respectively. N_N and M_N are the numbers of neurons in the input and hidden layers, respectively. The hidden-layer and output-layer weights differ, and their values are adjusted during network training.
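The forward pass of Eq. (2) for a single output neuron can be sketched as follows. This is a minimal sketch, assuming a tanh hidden activation and a linear output activation (common choices, not stated in the text); the weights are hypothetical, and the back-propagation training that would set them is not shown.

```python
import math

def ffnn_output(x, w_hidden, b_hidden, w_out, b_out,
                f_h=math.tanh, f_o=lambda z: z):
    """Eq. (2) for a single output neuron k of a three-layer FFNN.

    x        -- N_N input values x_i
    w_hidden -- M_N rows, each holding the N_N weights w_ji of hidden neuron j
    b_hidden -- M_N hidden-neuron biases w_j0
    w_out    -- M_N output weights w_kj
    b_out    -- output-neuron bias w_k0
    f_h, f_o -- activation functions (tanh / identity are assumed defaults)
    """
    hidden = [
        f_h(sum(w_ji * x_i for w_ji, x_i in zip(row, x)) + b_j)
        for row, b_j in zip(w_hidden, b_hidden)
    ]
    return f_o(sum(w_kj * h_j for w_kj, h_j in zip(w_out, hidden)) + b_out)

# Hypothetical network with N_N = 2 inputs and M_N = 2 hidden neurons.
y_hat = ffnn_output(
    x=[0.5, -0.5],
    w_hidden=[[1.0, 1.0], [1.0, -1.0]],
    b_hidden=[0.0, 0.0],
    w_out=[1.0, 1.0],
    b_out=0.1,
)
print(y_hat)  # equals tanh(0) + tanh(1) + 0.1
```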
Wavelet transform (WT)
The WT has grown in use and popularity since its inception in the early 1980s (Grossman and Morlet, 1984), but it has yet to achieve the widespread use of the Fourier transform.
In real hydrological problems, time series are usually discrete rather than continuous; therefore, the discrete WT is usually used in the following form (Mallat, 1998):
$$ \psi_{m,n}(t) = a_0^{-m/2}\, \psi\!\left(\frac{t - n\, b_0\, a_0^{m}}{a_0^{m}}\right) \tag{3} $$
where m and n are integers that control the wavelet dilation and translation, respectively; a_0 is a specified fixed dilation step greater than 1; and b_0 is the location parameter, which must be greater than zero. The most common and simplest choice of parameters is a_0 = 2 and b_0 = 1. This power-of-two logarithmic scaling of the dilation and translation is known as the dyadic grid arrangement.
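Eq. (3) on the dyadic grid (a_0 = 2, b_0 = 1) can be evaluated directly. This sketch uses the Haar mother wavelet only because it has a simple closed form; it is a stand-in and not the db4 wavelet used later in the study. The function names are hypothetical.

```python
import math

def haar(t):
    """Haar mother wavelet, used here only because it has a simple closed form."""
    if 0.0 <= t < 0.5:
        return 1.0
    if 0.5 <= t < 1.0:
        return -1.0
    return 0.0

def discrete_wavelet(t, m, n, a0=2.0, b0=1.0, psi=haar):
    """Eq. (3): psi_{m,n}(t) = a0**(-m/2) * psi((t - n*b0*a0**m) / a0**m).

    The defaults a0 = 2, b0 = 1 give the dyadic grid.
    """
    scale = a0 ** m
    return a0 ** (-m / 2) * psi((t - n * b0 * scale) / scale)

# On the dyadic grid, psi_{1,2} is the Haar wavelet stretched to width 2 and shifted to t = 4.
print(discrete_wavelet(4.5, m=1, n=2))  # positive half: 1/sqrt(2)
print(discrete_wavelet(5.5, m=1, n=2))  # negative half: -1/sqrt(2)
print(discrete_wavelet(3.0, m=1, n=2))  # outside the support: 0.0

# The a0**(-m/2) factor keeps unit energy; check with a midpoint Riemann sum.
dt = 0.001
energy = sum(discrete_wavelet((k + 0.5) * dt, m=1, n=2) ** 2 for k in range(10000)) * dt
print(round(energy, 6))
```

Doubling m doubles the support width and the translation step n·2^m, which is the dyadic arrangement described above.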
Self Organizing Map (SOM)
SOM is an effective software tool for visualizing high-dimensional data. It implements an orderly mapping of a high-dimensional distribution onto a regular low-dimensional grid. It is thereby able to convert complex, nonlinear statistical relationships between high-dimensional data items into simple geometric relationships on a low-dimensional display while preserving the topological structure of the data (Kohonen, 1997). SOMs reduce dimensionality by producing a map, usually of one or two dimensions, on which similar data items are grouped together.
The SOM is trained iteratively, with the weights initially assigned at random. When an n-dimensional input vector x is presented to the network, the distance between the weight vector w of each SOM neuron and the input is computed. The most common distance criterion is the Euclidean distance (Kohonen, 1997):
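$$ d_j = \lVert \mathbf{x} - \mathbf{w}_j \rVert = \sqrt{\sum_{i=1}^{n} \left(x_i - w_{ji}\right)^2} \tag{4} $$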
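The iterative training described above can be sketched for a small one-dimensional SOM. This is a minimal sketch, not the configuration used in the study: the Gaussian neighbourhood, the linear decay of the learning rate and radius, the two-neuron map and the feature vectors are all assumptions chosen for illustration.

```python
import math
import random

def euclidean(x, w):
    """Eq. (4): Euclidean distance between input vector x and weight vector w."""
    return math.sqrt(sum((x_i - w_i) ** 2 for x_i, w_i in zip(x, w)))

def best_matching_unit(x, weights):
    """Index of the neuron whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda j: euclidean(x, weights[j]))

def train_som_1d(data, n_neurons, epochs=50, lr0=0.5, seed=0):
    """Minimal 1-D SOM: random initial weights, winner chosen by Eq. (4),
    Gaussian neighbourhood update with linearly decaying rate and radius (assumed)."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_neurons)]
    radius0 = max(n_neurons / 2.0, 1.0)
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in data:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)
            radius = max(radius0 * (1.0 - frac), 0.5)
            bmu = best_matching_unit(x, weights)
            for j in range(n_neurons):
                # Neighbourhood strength decays with grid distance |j - bmu|.
                h = math.exp(-((j - bmu) ** 2) / (2.0 * radius ** 2))
                weights[j] = [w + lr * h * (x_i - w) for w, x_i in zip(weights[j], x)]
            step += 1
    return weights

# Hypothetical normalized feature vectors for four piezometers (two obvious groups).
data = [[0.10, 0.10], [0.12, 0.08], [0.90, 0.90], [0.88, 0.92]]
weights = train_som_1d(data, n_neurons=2)
labels = [best_matching_unit(x, weights) for x in data]
print(labels)
```

Which neuron index each group receives depends on the random initialization; what matters is that similar inputs share a neuron.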
Results and Discussion
This section presents the results of one-step-ahead GWL modeling with the SVM and WSVM models using preprocessed data. The SVM-based results are also compared with those of the ANN-based models.
Results of clustering
Because there are many piezometers over the Ardabil plain and managing groundwater resources is important, it is necessary to combine the available GWL information from the various regions of the plain and to identify the dominant piezometers for predicting future GWL conditions of the plain. To accomplish the spatial clustering, an SOM was used to identify similar and predominant piezometers. The SOM groups piezometers with similar temporal patterns and seasonalities into the same clusters.
The clustering of the piezometers into five clusters is shown in Table 1. The clusters are arranged along the direction of the main stream flow, which suggests that the groundwater flow regime is probably parallel to the surface water flow toward the outlet in the northwest of the plain. To evaluate the clustering produced by the SOM, the Silhouette coefficient was used as a measure of cluster validity. The Euclidean distance was then used to select the central (centroid) piezometer of each cluster, i.e., the piezometer that best represents the GWL pattern of the cluster.
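The Silhouette coefficient and the choice of a central member can be sketched as follows. The Silhouette formula s(i) = (b_i − a_i)/max(a_i, b_i) is standard; reading "central piezometer" as the member closest to the cluster mean is an assumption, and the 1-D features are hypothetical. For real work, scikit-learn's `sklearn.metrics.silhouette_samples` computes the same per-point coefficient.

```python
import math

def dist(p, q):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def mean_distance(p, points, labels, cluster, skip=None):
    """Mean distance from p to members of `cluster`, optionally skipping index `skip`."""
    ds = [dist(p, q) for j, q in enumerate(points) if labels[j] == cluster and j != skip]
    return sum(ds) / len(ds) if ds else 0.0

def silhouette(points, labels):
    """Silhouette coefficient s(i) = (b_i - a_i) / max(a_i, b_i) for every point i."""
    if len(set(labels)) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    scores = []
    for i, p in enumerate(points):
        if labels.count(labels[i]) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        a = mean_distance(p, points, labels, labels[i], skip=i)  # cohesion
        b = min(mean_distance(p, points, labels, c)               # nearest other cluster
                for c in set(labels) if c != labels[i])
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return scores

def central_member(points, labels, cluster):
    """Hypothetical reading of 'central piezometer': the member closest to the cluster mean."""
    members = [i for i, lab in enumerate(labels) if lab == cluster]
    mean = [sum(points[i][d] for i in members) / len(members) for d in range(len(points[0]))]
    return min(members, key=lambda i: dist(points[i], mean))

# Hypothetical 1-D features for four piezometers in two clusters.
points = [[0.0], [1.0], [10.0], [12.0]]
labels = [0, 0, 1, 1]
print(silhouette(points, labels))
print(central_member([[0.0], [1.0], [3.0]], [0, 0, 0], cluster=0))  # 1: value 1.0 is nearest the mean 4/3
```

Values near 1 indicate a well-separated member, while values below about 0.5 (as for P4 and P9 in Table 1) point to a weaker cluster structure.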
Table 1. Results of clustering
Cluster No. | Piezometers | Silhouette coefficients | Central piezometer
Cluster 1 | P4, P9 | 0.42, 0.34 | P4
Cluster 2 | P2, P12 | 0.46, 0.72 | P12
Cluster 3 | P1, P8, P11 | 0.45, 0.58, 0.11 | P8
Cluster 4 | P6, P7, P10, P14 | 0.41, 0.62, 0.40, 0.54 | P7
Cluster 5 | P3, P5, P13, P15 | 0.65, 0.71, 0.53, 0.51 | P5
Results of SVM and ANN
The one-step-ahead results for the central piezometers of all five clusters are shown in Table 2. As mentioned previously, the dominant input variables of each model (listed as input variables in Table 2) were determined by linear correlation, where P_i(t) and I_j(t) denote the GWL time series of central piezometer i and the rainfall time series of rain gauge j, respectively, and Q(t) is the outflow time series at the basin outlet. The one-step-ahead results indicated that all models produced acceptable outcomes, confirming the appropriate identification of representative GWL patterns over the watershed. Nevertheless, cluster 1 did not show reliable results, because the Silhouette coefficient of P4 was below 0.5, which indicates that cluster 1 had a weak structure.
Despite heavy groundwater withdrawal in its region, cluster 3 showed better results than cluster 1, which was attributed to its proximity to the outlet of the plain, where water from other regions accumulates. The other clusters showed superior results because they are located in the highlands of the plain, near the supply and recharge sources. Therefore, spatial clustering can not only enhance modeling performance by grouping similar time series into the same clusters but also identify piezometers and regions whose data are unrepresentative because of artificial and/or external impacts on the system.
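The input selection by linear correlation described above can be sketched as building lagged candidate series and ranking them by their Pearson correlation with the target P(t+1). The text does not state the selection threshold, so the cut-off used here, the helper names and the series are hypothetical.

```python
import math

def pearson(x, y):
    """Linear (Pearson) correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def lagged_candidates(series, max_lag):
    """Build candidate inputs name(t-L), L = 0..max_lag, aligned with the target P(t+1).

    series -- dict mapping a variable name to its monthly values (equal lengths);
              the key "P" is taken to be the GWL of the central piezometer.
    """
    n = len(series["P"])
    steps = range(max_lag, n - 1)  # months t for which both t-max_lag and t+1 exist
    target = [series["P"][t + 1] for t in steps]
    candidates = {}
    for name, values in series.items():
        for lag in range(max_lag + 1):
            label = f"{name}(t)" if lag == 0 else f"{name}(t-{lag})"
            candidates[label] = [values[t - lag] for t in steps]
    return candidates, target

def select_inputs(candidates, target, threshold):
    """Keep candidates whose |r| with the target reaches a (hypothetical) threshold."""
    return sorted(name for name, x in candidates.items() if abs(pearson(x, target)) >= threshold)

# Hypothetical monthly series: a trending GWL and an alternating runoff signal.
series = {"P": [float(i) for i in range(12)], "Q": [float(i % 2) for i in range(12)]}
candidates, target = lagged_candidates(series, max_lag=2)
print(select_inputs(candidates, target, threshold=0.5))  # ['P(t)', 'P(t-1)', 'P(t-2)']
```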
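Table 2. Results of the ANN and SVM models for one-step-ahead predictions
Cluster No. | Input variables | Output variable | Model | R² (calibration) | R² (verification) | Normalized RMSE (calibration) | Normalized RMSE (verification)
Cluster 1 | P4(t), P4(t−1), I4(t), Q(t) | P4(t+1) | SVM | 0.977 | 0.958 | 0.006 | 0.005
Cluster 1 | P4(t), P4(t−1), I4(t), Q(t) | P4(t+1) | ANN | 0.977 | 0.951 | 0.006 | 0.005
Cluster 2 | P12(t), P12(t−1), Q(t) | P12(t+1) | SVM | 0.944 | 0.86 | 0.041 | 0.035
Cluster 2 | P12(t), P12(t−1), Q(t) | P12(t+1) | ANN | 0.935 | 0.869 | 0.044 | 0.034
Cluster 3 | P8(t), P8(t−1), I3(t−1), Q(t−2) | P8(t+1) | SVM | 0.99 | 0.99 | 0.023 | 0.015
Cluster 3 | P8(t), P8(t−1), I3(t−1), Q(t−2) | P8(t+1) | ANN | 0.996 | 0.992 | 0.015 | 0.014
Cluster 4 | P7(t), P7(t−1), I4(t−1), Q(t−1) | P7(t+1) | SVM | 0.819 | 0.667 | 0.038 | 0.023
Cluster 4 | P7(t), P7(t−1), I4(t−1), Q(t−1) | P7(t+1) | ANN | 0.832 | 0.677 | 0.037 | 0.022
Cluster 5 | P5(t), P5(t−1), Q(t−1) | P5(t+1) | SVM | 0.955 | 0.94 | 0.006 | 0.004
Cluster 5 | P5(t), P5(t−1), Q(t−1) | P5(t+1) | ANN | 0.97 | 0.94 | 0.005 | 0.004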
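The two performance criteria in Table 2 can be sketched as follows. The text does not define them, so both definitions here are assumptions: R² is taken as the coefficient of determination 1 − SSE/SST (the squared Pearson correlation is a common alternative), and the normalized RMSE is computed after min-max scaling with the observed range. The observed and predicted values are hypothetical.

```python
import math

def r2_determination(obs, pred):
    """Coefficient of determination, 1 - SSE/SST (assumed definition of R^2)."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst

def normalized_rmse(obs, pred):
    """RMSE after min-max scaling both series with the observed range (assumed normalization)."""
    lo, hi = min(obs), max(obs)
    span = hi - lo
    sq = [((o - lo) / span - (p - lo) / span) ** 2 for o, p in zip(obs, pred)]
    return math.sqrt(sum(sq) / len(sq))

# Hypothetical observed and predicted GWLs for four months.
obs = [10.0, 12.0, 14.0, 16.0]
pred = [10.0, 12.0, 14.0, 18.0]
print(r2_determination(obs, pred))  # 1 - 4/20 = 0.8 (up to rounding)
print(normalized_rmse(obs, pred))   # sqrt((2/6)**2 / 4) = 1/6 (up to rounding)
```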
Results of WANN and WSVM models
In addition to spatial patterns, the GWL process may also contain temporal features because of the highly nonstationary fluctuations of the time series. To handle such features, wavelet-based temporally preprocessed data were fed into the ANN and SVM models to improve modeling accuracy. The hybrid models, Wavelet-ANN (WANN) and Wavelet-SVM (WSVM), were both designed to capture the nonlinear GWL dynamics. Because the shape of the Daubechies-4 (db4) mother wavelet is similar to that of the GWL signal, it can capture the signal's features, especially peak values; db4 was therefore selected as the mother wavelet for decomposing the GWL time series in this study. Decomposing the original GWL time series at level L yields L+1 sub-signals: one approximation sub-signal, Pa(t), and L detail sub-signals, Pd_i(t) (i = 1, 2, …, L). A decomposition level of 3 was found to be optimal. The decomposed GWL sub-series (each resolution representing a specific seasonal feature of the process), together with the rainfall and runoff data of each cluster, were used in the FFNN and SVM models to predict GWL one month ahead.
The results of the WANN and WSVM models for one-step-ahead forecasting are presented in Table 3. The one-step-ahead WANN and WSVM results were accurate for all clusters during both the training and verification periods. According to Table 3, the WANN model improved on the performance of the ANN model. Based on the performance criteria, all WSVM models yielded slightly better results than the WANN models (except for clusters 1 and 5 in scenario 2).
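The level-L decomposition into L+1 sub-signals can be sketched with Mallat's pyramid algorithm. This sketch uses the Haar wavelet as a simple stand-in for the db4 wavelet used in the study, and it returns decimated coefficient arrays rather than full-length sub-signals Pa(t) and Pd_i(t); the function names and the 8-month segment are hypothetical. With PyWavelets installed, `pywt.wavedec(gwl, 'db4', level=3)` returns the analogous db4 list `[cA3, cD3, cD2, cD1]`.

```python
import math

S = 1.0 / math.sqrt(2.0)  # orthonormal Haar filter coefficient

def haar_step(signal):
    """One level of the Haar DWT: returns (approximation, detail), each half as long."""
    if len(signal) % 2:
        raise ValueError("signal length must be even at every level")
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [S * (a + b) for a, b in pairs]
    detail = [S * (a - b) for a, b in pairs]
    return approx, detail

def wavedec_haar(signal, level):
    """Level-L decomposition returning the L + 1 sub-signals [cA_L, cD_L, ..., cD_1]."""
    approx, details = list(signal), []
    for _ in range(level):
        approx, d = haar_step(approx)
        details.append(d)
    return [approx] + details[::-1]

def waverec_haar(coeffs):
    """Inverse of wavedec_haar (perfect reconstruction)."""
    approx = coeffs[0]
    for detail in coeffs[1:]:
        out = []
        for a, d in zip(approx, detail):
            out.extend([S * (a + d), S * (a - d)])
        approx = out
    return approx

# Hypothetical 8-month GWL segment, decomposed at level 3 as in the text.
gwl = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0, 13.0, 15.0]
coeffs = wavedec_haar(gwl, level=3)
print([len(c) for c in coeffs])  # [1, 1, 2, 4]: one approximation + three details
print(max(abs(a - b) for a, b in zip(waverec_haar(coeffs), gwl)) < 1e-9)  # True
```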
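Table 3. Results of the WANN and WSVM models for one-step-ahead predictions
Cluster No. | Input variables | Output variable | Model | R² (calibration) | R² (verification) | Normalized RMSE (calibration) | Normalized RMSE (verification)
Cluster 1 | P^i_4(t), I4(t), Q(t) | P4(t+1) | WSVM | 0.993 | 0.973 | 0.003 | 0.004
Cluster 1 | P^i_4(t), I4(t), Q(t) | P4(t+1) | WANN | 0.988 | 0.975 | 0.005 | 0.004
Cluster 2 | P^i_12(t), Q(t) | P12(t+1) | WSVM | 0.962 | 0.901 | 0.033 | 0.029
Cluster 2 | P^i_12(t), Q(t) | P12(t+1) | WANN | 0.968 | 0.916 | 0.031 | 0.027
Cluster 3 | P^i_8(t), I3(t−1), Q(t−2) | P8(t+1) | WSVM | 0.997 | 0.995 | 0.013 | 0.011
Cluster 3 | P^i_8(t), I3(t−1), Q(t−2) | P8(t+1) | WANN | 0.997 | 0.995 | 0.013 | 0.011
Cluster 4 | P^i_7(t), I4(t−1), Q(t−1) | P7(t+1) | WSVM | 0.898 | 0.822 | 0.028 | 0.017
Cluster 4 | P^i_7(t), I4(t−1), Q(t−1) | P7(t+1) | WANN | 0.922 | 0.861 | 0.025 | 0.015
Cluster 5 | P^i_5(t), Q(t−1) | P5(t+1) | WSVM | 0.979 | 0.967 | 0.004 | 0.003
Cluster 5 | P^i_5(t), Q(t−1) | P5(t+1) | WANN | 0.971 | 0.963 | 0.005 | 0.003
Note: P^i_j(t) denotes the wavelet sub-signals (approximation Pa(t) and details Pd_i(t)) of the GWL series of piezometer j.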
Concluding Remarks
In this paper, AI-based models (ANN and SVM) were developed for GWL forecasting over the Ardabil plain in northwestern Iran. The inputs of the AI models were monthly rainfall, runoff and GWL data at 15 piezometers over the study area. Data preprocessing via SOM and WT proved useful for improving AI-based GWL forecasting models. The proposed methodology was applied to Ardabil plain data to produce one-month-ahead GWL forecasts. The study area was divided into five clusters with the SOM clustering scheme, and AI modeling was then performed separately for each cluster. To improve model efficiency and account for seasonality effects, the WT, which can capture the multiscale features of a signal, was used to decompose the GWL time series into sub-signals at different levels. The sub-signals were then used as inputs to the AI models to predict GWLs. Overall, the results of this study provide promising evidence for combining spatial and temporal data preprocessing methods, specifically SOM and WT, to forecast GWL values with AI methods. One advantage of the proposed method is that clustering makes it possible to identify piezometers and regions with good and poor data quality.
To extend the current study, it is recommended to apply the presented methodology with additional hydrological time series and variables (e.g., temperature and/or evapotranspiration) in the input layer of the model. Moreover, given the uncertainty of the rainfall process and the ability of fuzzy logic to handle uncertainty, combining ANN and fuzzy inference system (FIS) models into an adaptive neuro-fuzzy inference system (ANFIS) model could provide useful results. It would also be useful to apply the proposed methodology to other heterogeneous groundwater systems in order to investigate the overall effect of climatic conditions on the performance of the proposed model.