The rapid development of urban areas in recent decades has led municipal authorities around the world to relocate urban transportation infrastructure underground [1–4]. In Taiwan, underground public transport systems have been developed and expanded in major cities such as Taipei, Taoyuan, and Kaohsiung. In Taipei alone, all heavy-load transportation routes through the city have been relocated underground, and five major metro lines with a total operating mileage of 131.2 km have been constructed and have been in operation since 1996. Underground construction work is mostly conducted in small, confined spaces subject to numerous uncertainties, making it considerably more difficult than work aboveground [5–7]. Moreover, with the exception of subway stations, most below-ground subway infrastructure is built using the shield tunneling method, in which a tunnel boring machine (TBM) simultaneously excavates the soil ahead, removes the excavated material, and installs a supporting shield structure to stabilize the newly excavated tunnel section. Underground construction is thus not only challenging but also risky. During shield tunneling, factors such as changes in stress, tail void closure, disturbed soil compaction, and lining segment deformation can displace lateral soil layers, causing the ground to settle, bulge, or displace laterally [8]. Therefore, while the TBM is in operation, a safety monitoring system must be active. This system collects site data and supervises TBM maneuvers to prevent excessive ground settlement, which can damage existing urban infrastructure and buildings and trigger disastrous accidents [3, 5, 9–11].
However, settlement data generated by the safety monitoring system alert users only to settling that has already occurred and are thus useful only for developing and implementing post-deformation remedies that prevent a situation from worsening [2, 11]. Shield tunneling safety would benefit greatly from a database created from limited monitoring data and soil layer parameters that could be used to predict settlement conditions, provide early warnings of deformation, and increase reaction times [1, 12–14]. With this aim in mind, settlement monitoring data for Tender CG291 of the Songshan Line of the Taipei MRT system were collected in this study. In these monitoring data, safe-level entries far outnumber alert-level entries, creating an imbalanced dataset. Classification models based on ordinary classification techniques can exhibit serious class-forecasting bias when processing imbalanced data [15], which renders inference models based on artificial intelligence (AI) unable to classify the scarce class accurately. For this reason, effectively processing imbalanced data to prevent forecasting bias is critical for AI-based inference models.
Few researchers have developed AI models for use as autonomous integrated systems to predict ground settlement in tunnel construction. Thus, in this study, a novel combination of the symbiotic organisms search-least squares support vector machine (SOS-LSSVM) and data balancing methods is proposed to predict settlement and help project decision-makers prevent geotechnical disasters. The developed model is at the forefront of efforts to integrate metaheuristics, AI techniques, and data balancing methods to automatically and accurately predict shield-tunnel settlement. For this purpose, factors influencing settlement were investigated, and historical monitoring data were gathered for training the AI model. This prediction model is expected to help design and construction agencies predict settlement and thereby adopt preventive measures against it. The objectives of this study are as follows:
(1) Identifying influential factors for settlement in shield tunneling: the literature on settlement estimation was reviewed for candidate factors, which were further tested statistically in SPSS.
(2) Conducting resampling for imbalanced data: two methods were applied to the imbalanced data, probability distribution data balance sampling (PDDBS) and the synthetic minority oversampling technique (SMOTE).
(3) Establishing a model for settlement prediction: the proposed settlement prediction model for shield tunneling was developed using the symbiotic organisms search-least squares support vector machine (SOS-LSSVM).
(4) Verifying the effectiveness of the proposed model: the prediction results of SOS-LSSVM and four other AI-based models were compared to determine the best performer in terms of prediction accuracy. In addition, the receiver operating characteristic (ROC) curve and the area under the curve (AUC) were used to evaluate the classification accuracy of the data balanced by PDDBS and SMOTE (see the evaluation sketch below). These comparisons verify that the proposed model effectively solves the data imbalance problem.
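As context for objective (4), the following is a minimal sketch, not the study's actual pipeline, of how ROC curves and AUC can be computed for a classifier trained on imbalanced safe/alert data. The arrays X and y, the feature count, and the class ratio are placeholder assumptions; scikit-learn is used purely for illustration.

```python
# Illustrative ROC/AUC evaluation for an imbalanced two-class problem.
# X and y are synthetic stand-ins for the monitoring features and labels.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(999, 8))                 # placeholder monitoring features
y = (rng.random(999) < 0.075).astype(int)     # roughly 7.5% alert-level labels

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

clf = SVC(kernel="rbf", probability=True).fit(X_tr, y_tr)
scores = clf.predict_proba(X_te)[:, 1]        # probability of the alert class

fpr, tpr, _ = roc_curve(y_te, scores)
print("AUC:", roc_auc_score(y_te, scores))
```

In the study itself, the same AUC comparison would be run on the models trained with PDDBS- and SMOTE-balanced data rather than on synthetic inputs.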
In this study, SOS-LSSVM was integrated with the data balance sampling method to create a shield-tunnel settlement prediction system optimized to help prevent ground-settlement-related disasters during tunnel construction. The system, based in the construction control center, uses automatically collected and wirelessly transmitted monitoring data to forecast tunnel settlement status in real time. When predicted settlement levels exceed the warning value, engineers can take appropriate actions to prevent disaster.
In Taiwan, shield tunneling has been in use for over 31 years since its debut in 1976; over that time, TBMs have seen considerable improvements, from the most primitive open-face manual types to the later mechanical, slurry pressure balanced, and earth pressure balanced types. Because of the lack of slurry deposit yards or facilities, the Rapid Transit System in Taipei mostly employs earth pressure balanced TBMs, except for the Xindian Line (CH22), which uses two slurry pressure balanced machines. Shield tunneling can cause ground settlement that negatively affects adjacent structures [5, 16]. The soil layer and surface displacements caused by shield tunneling are related to the type and diameter of the TBM, excavation depth, site conditions, soil properties, and groundwater level. When a TBM is advancing, if the thrust force against the tunnel face is lower than the static earth pressure of the soil layers, the soil releases stress along the tunnel face and moves toward it because the soil layers are under active earth pressure; this leads to ground loss and results in settlement. If the thrust force equals the static earth pressure of the soil layers, the tunnel face remains static. If the thrust force is greater than the static earth pressure, the soil along the tunnel face is pressed forward, causing the ground to bulge. Ground settlement during shield TBM tunneling develops in the following stages: (1) before and during tunnel face excavation, (2) during the passage of the shield skin plate, and (3) after installation of the segmental lining and backfill grouting [17].
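The face-pressure rule described above reduces to a simple three-way comparison. The following is an illustrative sketch only (not from the paper); the function name, tolerance, and pressure values are assumptions.

```python
# Sketch of the qualitative face-stability rule: compare the TBM thrust
# pressure at the tunnel face with the static earth pressure of the soil.
def face_response(thrust_pressure: float, static_earth_pressure: float,
                  tol: float = 1e-6) -> str:
    """Classify the expected ground response at the tunnel face."""
    if thrust_pressure < static_earth_pressure - tol:
        return "ground loss / settlement"   # soil relaxes toward the face
    if thrust_pressure > static_earth_pressure + tol:
        return "heave / bulging"            # soil is pushed forward
    return "stable face"                    # pressures in equilibrium

print(face_response(180.0, 200.0))  # example kPa values, purely illustrative
```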
According to previous studies, various factors contribute to ground settlement, including geometrical, geological (e.g., the strength characteristics and overconsolidation ratio of the soil), and shield operational parameters [4, 7, 8, 11–14, 18]. Fargnoli et al. concluded that face support pressure, grouting pressure, machine stoppage time, and the installation time for one ring of tunnel lining were essential parameters for predicting surface settlement [2]. Luo et al. also indicated that the groundwater condition is an important factor because shield tunneling causes pore water pressure variation [18]. The grouting fill factor and grouting pressure were identified as the most influential parameters when an AI-based algorithm was applied to predict settlements [14].
Establishing a settlement prediction model is necessary for underground construction safety. Analytical, empirical, and numerical methods have been proposed to predict settlement and other tunnel deformations. The most important weakness of these methods is that they fail to consider all parameters contributing to settlement (e.g., ground conditions, operational parameters, and tunnel geometry) [14]. Moreover, because the processes surrounding shield TBM tunneling are complicated, most studies have not provided statistically meaningful relationships between volume loss and operational parameters [17].
Recently, some researchers have successfully used AI-based algorithms, such as artificial neural networks (ANNs), fuzzy logic (FL), support vector machines (SVMs), and gene expression programming (GEP), to establish models for predicting the settlement induced by shield tunneling [7, 14]. Wang et al. successfully applied an adaptive relevance vector machine (aRVM) to predict real-time settlement development [9]. Bouayad and Emeriault proposed a methodology that combines principal component analysis (PCA) with an adaptive neuro-fuzzy inference system (ANFIS) to model the nonlinear behavior of ground surface settlements induced by an earth pressure-balanced TBM [7].
The symbiotic organisms search-least squares SVM (SOS-LSSVM) was developed by Cheng and Prayogo [19] and has proved reliable in prediction tasks [20–22]. SOS-LSSVM uses an advanced metaheuristic to search for optimal hyperparameters and identifies the correlations between input and output variables from historical case data to establish inference models. Previous studies have also shown that the SOS method exhibits excellent performance [19, 23, 24]. In addition to SOS-LSSVM, this study applied the backpropagation neural network (BPNN), the least squares support vector machine (LSSVM), the evolutionary least squares support vector machine inference model (ELSIM) [25, 26], and SVM to estimate settlements for comparison.
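The following is a minimal NumPy sketch of the LSSVM classifier that serves as the base learner in SOS-LSSVM, assuming the standard RBF-kernel formulation; it is not the authors' implementation. The regularization parameter gamma and kernel width sigma are exactly the quantities a metaheuristic such as SOS would tune; here they are fixed, illustrative values.

```python
# Minimal LSSVM classification sketch (RBF kernel), solving the standard
# LSSVM linear system: [[0, y^T], [y, Omega + I/gamma]] [b, alpha] = [0, 1].
import numpy as np

def rbf_kernel(A, B, sigma):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def lssvm_train(X, y, gamma, sigma):
    n = len(y)
    Omega = rbf_kernel(X, X, sigma) * np.outer(y, y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = Omega + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]          # bias b, support values alpha

def lssvm_predict(X_new, X, y, b, alpha, sigma):
    return np.sign(rbf_kernel(X_new, X, sigma) @ (alpha * y) + b)

# Toy usage with labels in {-1, +1}; gamma and sigma would be chosen by SOS.
X = np.random.default_rng(1).normal(size=(40, 3))
y = np.sign(X[:, 0] + 0.1)
b, alpha = lssvm_train(X, y, gamma=10.0, sigma=1.0)
print(lssvm_predict(X[:5], X, y, b, alpha, sigma=1.0))
```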
Data imbalance refers to one class of samples in a dataset overwhelming another class, which has serious consequences for classification. Generally, the term "minority" (MI) refers to the class of scarce samples in the dataset and "majority" (MA) to the dominant class [27]. For example, when a dataset contains 95% majority-class samples and 5% minority-class samples, an inference model will tend to classify all samples as the majority class and achieve 95% accuracy, while its accuracy for the minority class will be 0%. This bias stems from the characteristics and limitations of AI, which requires a large amount of evenly distributed data for training and testing to achieve satisfactory forecasting results.
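The 95%/5% example above can be reproduced in a few lines. This is a trivial illustration with assumed numbers, not data from the study.

```python
# The accuracy paradox: always predicting the majority class looks accurate
# but recalls none of the minority (e.g., alert-level) cases.
import numpy as np

y_true = np.array([0] * 95 + [1] * 5)   # 95% majority, 5% minority
y_pred = np.zeros_like(y_true)          # "always predict majority"

accuracy = (y_pred == y_true).mean()
minority_recall = (y_pred[y_true == 1] == 1).mean()
print(f"accuracy={accuracy:.2f}, minority recall={minority_recall:.2f}")  # 0.95, 0.00
```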
Because the distribution of imbalanced data is skewed, an AI-based inference model trained on such data will produce correspondingly skewed results. The main measures for addressing the data imbalance problem are undersampling and oversampling. In addition, this study introduces a sampling method that uses probability distributions to balance the data and improve classification accuracy.
Undersampling is a technique that decreases the number of MA samples to balance a training dataset, reducing the MA class until it is the same size as the MI class. Undersampling can outperform oversampling when training on imbalanced data; however, it can eliminate potentially useful training samples and thus lower classifier performance.
Excessive MA samples can be eliminated through random selection to balance the two classes. To avoid the uncertainty of random undersampling, Kubat and Matwin proposed an alternative undersampling approach that they considered more appropriate: to mitigate data imbalance, they removed the redundant data in the MA class, followed by the borderline samples close to the boundary between the MA and MI classes as well as the noisy data [28].
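As a point of reference, the following is a minimal sketch of plain random undersampling (the simple random-selection variant mentioned above, not the one-sided selection of Kubat and Matwin); the function name and arguments are assumed for illustration.

```python
# Random undersampling: drop majority-class rows at random until both
# classes have the same number of samples.
import numpy as np

def random_undersample(X, y, majority_label=0, seed=0):
    rng = np.random.default_rng(seed)
    maj_idx = np.flatnonzero(y == majority_label)
    min_idx = np.flatnonzero(y != majority_label)
    keep_maj = rng.choice(maj_idx, size=len(min_idx), replace=False)
    keep = np.concatenate([keep_maj, min_idx])
    rng.shuffle(keep)
    return X[keep], y[keep]
```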
Oversampling increases the number of MI samples to balance a training dataset, expanding the MI class until it is the same size as the MA class. It is a highly popular and effective approach against data imbalance. However, because oversampling adds replicated or closely similar samples to the dataset, it often lengthens training time and can even cause overfitting.
In addition to random oversampling, the synthetic minority oversampling technique (SMOTE) was used in this study. Unlike random oversampling, which duplicates MI samples to expand the class, SMOTE generates synthetic samples by linear interpolation between two nearby samples. Specifically, SMOTE computes the difference between an MI sample and its nearest MI neighbor, multiplies that difference by a random value between 0 and 1, and adds the result to the original sample to generate a new synthetic MI sample.
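The interpolation just described can be sketched as follows. This is a simplified illustration assuming only the single nearest minority neighbor is used (standard SMOTE selects randomly among k nearest neighbors), and it is not the study's implementation.

```python
# Simplified SMOTE-style oversampling: each synthetic sample lies on the
# segment between a minority sample and its nearest minority neighbor.
import numpy as np

def smote_oversample(X_min, n_new, seed=0):
    rng = np.random.default_rng(seed)
    # pairwise distances between minority samples; nearest neighbor per row
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    nn = d.argmin(axis=1)

    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        diff = X_min[nn[i]] - X_min[i]                    # difference to neighbor
        synthetic.append(X_min[i] + rng.random() * diff)  # gap in [0, 1)
    return np.vstack(synthetic)
```

In practice, a ready-made implementation such as SMOTE().fit_resample(X, y) from the imbalanced-learn package could be used instead of hand-rolled code.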
This section describes how the critical factors influencing settlement in shield tunneling were identified. These factors serve as the input variables for the proposed model, which uses SOS-LSSVM and relies on historical case data for training and testing to determine the optimal mapping between input and output variables, thereby predicting tunnel settlement. The flowchart is illustrated in Figure 1.
Step 1. Identify influential preliminary factors
Review studies on shield tunneling and list the factors cited as causes of settlement. Those mentioned most frequently are identified as preliminary influential factors. Then, analyze the preliminary influential factors in SPSS to determine which factors to include.
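The study performs this screening in SPSS; purely as an illustration of the kind of significance check involved, an equivalent test could be scripted as below. The file name, column names, and the choice of a Pearson correlation test are hypothetical assumptions, not the paper's actual procedure.

```python
# Illustrative factor screening: test each candidate factor's correlation
# with settlement and keep those that are statistically significant.
import pandas as pd
from scipy import stats

df = pd.read_csv("monitoring_cases.csv")   # hypothetical case dataset
candidates = ["face_pressure", "grouting_pressure", "cover_depth", "groundwater_level"]

for col in candidates:
    r, p = stats.pearsonr(df[col], df["settlement"])
    flag = "keep" if p < 0.05 else "drop"
    print(f"{col}: r={r:+.2f}, p={p:.3f} -> {flag}")
```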
Step 2. Collect and establish the case dataset
Collect case data according to the required input and output variables and thus establish a complete case dataset that provides the input data.
Step 3. Balance the dataset
A total of 999 data entries were collected for the present study, of which 75 were of alert level; the data were therefore imbalanced. To overcome this problem, this study proposed a new data balancing method: probability distribution data balance sampling (PDDBS). There are two types of PDDBS, PDDBS oversampling and PDDBS median sampling, as shown in Figure 2. PDDBS oversampling balances a dataset by increasing the MI samples until they match the number of MA samples. By contrast, PDDBS median sampling simultaneously increases MI samples and decreases MA samples toward the median total sample size to balance the dataset [29] (a simplified sketch of the oversampling variant follows this subsection).
(1) PDDBS oversampling procedure (Figure 2(a)):
Step a: select one attribute from the dataset and calculate its sample size and R(MI), the number of samples that must be added to the MI class.
Step b: divide the MI class ni(MI) into k intervals.
Step c: calculate the probability of each interval from the normal distribution fitted to the sample, as shown in Figure 3.
Step d: calculate the number of samples S that must be added in each interval from R(MI) and the interval probability (Figure 3).
Step e: generate the values and add them to the MI class.
Step f: examine whether the classes in the dataset are equal in size. If not, balance them again; if they are, the dataset is considered balanced.
(2) PDDBS median sampling procedure (Figure 2(b)):
Step a: select one attribute from the dataset and calculate its sample size, R(MI), the number of samples that must be added to the MI class, and R(MA), the number of samples that must be removed from the MA class.
Step b: divide the MI class ni(MI) into k1 intervals and the MA class ni(MA) into k2 intervals.
Step c: calculate the probability of each interval from the normal distribution fitted to the sample, as shown in Figures 4 and 5.
Step d: calculate the number of MI samples S1 to add and the number of MA samples S2 to remove for each interval (Figure 4).
Step e: generate the S1 values obtained in Step d and add them to the MI class, and remove the S2 samples obtained in Step d directly from the MA class.
Step f: confirm that the classes in the dataset are equal in size. If not, they must be balanced again.
Four methods, namely PDDBS oversampling, PDDBS median sampling, SMOTE oversampling, and SMOTE median sampling, were implemented to thoroughly examine their performance in handling imbalanced classification, given their respective advantages and disadvantages [30–32]. PDDBS provides larger numbers of replicated minority samples but increases the likelihood of overfitting, whereas SMOTE reduces the risk of overfitting but tends to exclude helpful information.
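The following is a hedged NumPy/SciPy sketch of the PDDBS oversampling idea, applied to one attribute at a time as described in Steps a through f. The original equations are not reproduced here; the interval construction, the per-interval allocation S = R(MI) x P, and the uniform value generation within each interval are this sketch's own interpretations of the listed steps, not the authors' formulas.

```python
# Interpretation of PDDBS oversampling for a single minority-class attribute:
# fit a normal distribution, split the attribute range into k intervals, and
# allocate the R(MI) new samples in proportion to each interval's probability.
import numpy as np
from scipy import stats

def pddbs_oversample_attribute(x_min, n_majority, k=10, seed=0):
    rng = np.random.default_rng(seed)
    r_mi = n_majority - len(x_min)                 # Step a: samples to add, R(MI)
    mu, sigma = x_min.mean(), x_min.std(ddof=1)

    edges = np.linspace(x_min.min(), x_min.max(), k + 1)  # Step b: k intervals
    z = (edges - mu) / sigma                               # Step c: normal conversion
    p = np.diff(stats.norm.cdf(z))
    p = p / p.sum()                                        # interval probabilities

    new_values = []
    for lo, hi, pi in zip(edges[:-1], edges[1:], p):
        s = int(round(r_mi * pi))                          # Step d: samples per interval
        new_values.extend(rng.uniform(lo, hi, size=s))     # Step e: generate values
    return np.concatenate([x_min, np.array(new_values)])   # Step f: (near-)balanced class
```

Because of rounding, the resulting class sizes may differ slightly, which is why Step f re-checks the balance before accepting the dataset.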
Also, although oversampling minimizes information loss and produces equal numbers of minority- and majority-class samples, it may cause the classifier to overfit. Finally, although median sampling theoretically removes noise and redundant samples by weighting differences in nominal features according to typical differences in continuous feature values, its sampling performance on some datasets may be poor.