Abstract:
The rapid development of seismic networks and advances in monitoring equipment have made it possible to record a wide range of seismic events, including natural earthquakes and man-made blasts. Nuclear explosions can also be detected through seismic monitoring, and such detection is a crucial part of the verification process of the Comprehensive Nuclear-Test-Ban Treaty. However, distinguishing natural seismic events from those caused by blasting is challenging. Both appear as fluctuating waveforms on seismic records and closely resemble each other, so manual identification is resource-intensive and prone to human error, which can lead to misjudgments and confusion in earthquake catalogs. Such errors can compromise the effectiveness of earthquake early warning systems and emergency response measures. Therefore, automated discrimination between seismic events originating from natural sources and those caused by blasting is of great significance for both earth science research and national defense.
Currently, automatic classification techniques rely predominantly on deep learning, which typically requires large labeled datasets for training. Sufficient high-quality data for nuclear explosion events are difficult to obtain owing to the unique nature of such events, which limits the application of deep learning for this purpose. This paper therefore focuses on discriminating between natural earthquakes and blasting with limited sample data. The test data consist of vertical-component recordings of short-period natural earthquakes and nuclear explosions. The recordings are first preprocessed with the SPA method to remove the trend component. Complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) is then used to decompose each recording into a series of intrinsic mode functions. Wavelet thresholding is applied to reduce noise, and the denoised components are reconstructed to generate the final signals. The preprocessed signals are augmented by translation and noise injection, yielding a final training set of 500 event signals for each category.
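As an illustration of the augmentation step, the following is a minimal sketch in Python with NumPy. It assumes that translation is implemented as a random circular shift and that noise injection adds white Gaussian noise at a fixed signal-to-noise ratio; the function name `augment`, the parameters `max_shift` and `snr_db`, and the synthetic test signal are hypothetical and are not taken from the paper.

```python
import numpy as np

def augment(signal, n_copies, max_shift, snr_db, rng):
    """Return n_copies augmented versions of a 1-D signal.

    Assumptions (not from the paper): translation is a circular shift,
    and injected noise is white Gaussian noise at a fixed SNR in dB.
    """
    signal = np.asarray(signal, dtype=float)
    signal_power = np.mean(signal ** 2)
    # SNR_dB = 10 * log10(signal_power / noise_power)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    copies = np.empty((n_copies, signal.size))
    for i in range(n_copies):
        # Translation: shift by a random number of samples in either direction.
        shift = int(rng.integers(-max_shift, max_shift + 1))
        shifted = np.roll(signal, shift)
        # Noise injection: add zero-mean Gaussian noise of the target power.
        noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.size)
        copies[i] = shifted + noise
    return copies

# Synthetic stand-in for a preprocessed event: a decaying 10 Hz oscillation.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 1000, endpoint=False)
event = np.exp(-5.0 * t) * np.sin(2.0 * np.pi * 10.0 * t)

augmented = augment(event, n_copies=5, max_shift=50, snr_db=20.0, rng=rng)
print(augmented.shape)  # (5, 1000)
```

Repeating such a procedure for each recorded event is one way in which a small set of recordings could be expanded to a fixed number of signals per category.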
Features from the energy spectrum, power spectrum, and cepstrum are extracted to form a high-dimensional small-sample dataset. Given the excellent performance of the eXtreme Gradient Boosting (XGBoost) model in small-sample classification tasks, this study adopts it to classify natural and blasting seismic events. XGBoost aggregates weak classifiers and improves accuracy by applying a second-order Taylor expansion to the objective function, thereby retaining more target-related information. Because the conventional XGBoost model is complex and has numerous parameters, this paper employs a genetic algorithm (GA) to optimize three key hyperparameters that significantly affect classification accuracy: the number of iterations, the maximum tree depth, and the learning rate. The GA is independent of initial conditions, robust, and well suited to complex optimization problems. Taking advantage of these strengths, the GA-XGBoost model is constructed.
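The GA search over the three hyperparameters can be sketched as follows. This is a minimal illustration under stated assumptions, not the configuration used in the paper: the search ranges, population size, tournament selection, uniform crossover, and mutation rate are all hypothetical, and `toy_fitness` is a placeholder standing in for the cross-validated accuracy of an XGBoost classifier trained with the candidate hyperparameters.

```python
import random

# Illustrative search ranges (assumptions, not values from the paper).
BOUNDS = {
    "n_estimators": (50, 500),     # number of boosting iterations (integer)
    "max_depth": (2, 10),          # maximum tree depth (integer)
    "learning_rate": (0.01, 0.3),  # learning rate (float)
}
INT_PARAMS = {"n_estimators", "max_depth"}

def sample_gene(name, rng):
    lo, hi = BOUNDS[name]
    return rng.randint(lo, hi) if name in INT_PARAMS else rng.uniform(lo, hi)

def toy_fitness(ind):
    """Placeholder fitness with its maximum (0) at (200, 6, 0.1).

    In practice this would be replaced by the validation accuracy of a
    classifier trained with the hyperparameters in `ind`.
    """
    return -(((ind["n_estimators"] - 200) / 450) ** 2
             + ((ind["max_depth"] - 6) / 8) ** 2
             + ((ind["learning_rate"] - 0.1) / 0.29) ** 2)

def tournament(pop, scores, rng, k=3):
    # Pick k random individuals and return the fittest of them.
    idx = rng.sample(range(len(pop)), k)
    return pop[max(idx, key=lambda i: scores[i])]

def crossover(a, b, rng):
    # Uniform crossover: each gene is inherited from either parent.
    return {name: (a[name] if rng.random() < 0.5 else b[name]) for name in BOUNDS}

def mutate(ind, rng, rate=0.2):
    # Resample each gene within its bounds with probability `rate`.
    return {name: (sample_gene(name, rng) if rng.random() < rate else ind[name])
            for name in BOUNDS}

def genetic_search(fitness, pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    pop = [{name: sample_gene(name, rng) for name in BOUNDS} for _ in range(pop_size)]
    best, best_score = None, float("-inf")
    for gen in range(generations + 1):
        scores = [fitness(ind) for ind in pop]
        for ind, score in zip(pop, scores):
            if score > best_score:
                best, best_score = ind, score
        if gen == generations:
            break
        # Elitism: carry the best individual found so far into the next generation.
        next_pop = [best]
        while len(next_pop) < pop_size:
            child = crossover(tournament(pop, scores, rng),
                              tournament(pop, scores, rng), rng)
            next_pop.append(mutate(child, rng))
        pop = next_pop
    return best, best_score

best_params, best_score = genetic_search(toy_fitness)
print(best_params, best_score)
```

Because the population is initialized at random and explores the ranges through crossover and mutation, the search does not depend on a user-supplied starting point, which is consistent with the independence from initial conditions noted above.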
In tests on the high-dimensional small-sample dataset, the GA-XGBoost model achieved its highest classification accuracy, 94.094%, on the power spectrum feature set. With the power spectrum feature as input, the GA-XGBoost model outperformed both the LSTM and the grid search (GS)-XGBoost models in accuracy. Notably, compared with GS-XGBoost, the GA-XGBoost model improved the classification accuracy by 2.037% and reduced the runtime from 409.26 seconds to 55.48 seconds, a reduction of over 86%. However, the preprocessing and feature extraction process presented in this paper is relatively complex and requires professional expertise. Moreover, the paper uses a 1500-dimensional power spectrum feature, and different feature dimensions can affect the test results. Hence, the feature dimension should be selected according to the specific data and tests.
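To make the role of the feature dimension concrete, the sketch below computes a fixed-length power spectrum feature with NumPy. It is an assumption-laden illustration rather than the paper's extraction procedure: the periodogram estimate, the zero-padding of short signals, and the band-averaging used to reach exactly `n_features` values are all choices made here for illustration, and the function name `power_spectrum_feature` is hypothetical.

```python
import numpy as np

def power_spectrum_feature(signal, n_features=1500):
    """Return an n_features-dimensional power spectrum feature.

    Assumptions (not from the paper): a periodogram estimate, with
    adjacent frequency bins averaged into n_features bands.
    """
    signal = np.asarray(signal, dtype=float)
    # Zero-pad short signals so that more than n_features bins are available.
    n_fft = max(signal.size, 2 * n_features)
    spectrum = np.fft.rfft(signal, n=n_fft)
    power = np.abs(spectrum) ** 2 / n_fft
    # Average adjacent bins so the output length equals n_features.
    bands = np.array_split(power, n_features)
    return np.array([band.mean() for band in bands])

# Synthetic stand-in for a preprocessed trace: a 10 Hz sine sampled at 100 Hz.
fs = 100.0
t = np.arange(4000) / fs
trace = np.sin(2.0 * np.pi * 10.0 * t)

feature = power_spectrum_feature(trace, n_features=1500)
print(feature.shape)  # (1500,)
```

Changing `n_features` alters the frequency resolution of the feature vector, which is one reason the choice of dimension can influence classification results.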
Although the tests have verified the classification effectiveness of high-dimensional features such as the power spectrum, energy spectrum, and cepstrum, the optimal features may vary across datasets. It is therefore essential to test candidate features and select the most suitable ones for the available data. Finally, while this study explores hyperparameter optimization with a genetic algorithm, newly emerging optimization algorithms merit further investigation for hyperparameter selection. Overall, the tests demonstrate that the GA-XGBoost model offers a balance of accuracy, stability, and efficiency, showing promise for small-sample classification tasks.