Abstract:
Long-term geomagnetic field observations from geomagnetic stations and networks are of great value for studying the spatio-temporal variation, characteristics, and source information of the geomagnetic field. However, owing to interference from infrastructure and human activities (such as high-speed rail, highways, and power grids) as well as sudden instrument failures, the records of some stations are contaminated or missing during certain periods. This paper therefore uses the XGBoost machine learning method to reconstruct the data of stations with severe interference or missing data from the high-quality observations of surrounding stations. Simulation experiments show that the reconstruction residuals of the geomagnetic field components are below 0.1 nT on both magnetically quiet and disturbed days. Further comparison of the experimental statistics and residual curves shows that the reconstruction accuracy depends mainly on the level of geomagnetic activity and on the temporal complexity of the signal to be reconstructed, and that XGBoost achieves higher reconstruction accuracy than a BP neural network. These results suggest that XGBoost-based reconstruction is well suited to nonlinear, complex signals and can be applied effectively to reconstruct observations at stations with severe interference or missing data.
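
The reconstruction idea summarized above can be illustrated with a minimal sketch. It is not the implementation used in the paper. It assumes that the target station's component can be regressed on simultaneous readings from three neighbouring stations, and it uses plain gradient-boosted decision stumps written in pure Python as a simplified stand-in for XGBoost, which additionally uses deeper trees, regularization, and second-order gradient information. The synthetic signals, station count, gap position, and all function names (`fit_stump`, `fit_boosted_stumps`, `predict`, `run_demo`) are hypothetical.

```python
import math
import random


def fit_stump(X, residuals):
    """Return (feature, threshold, left_value, right_value) of the best single split."""
    n, d = len(X), len(X[0])
    total = sum(residuals)
    mean = total / n
    best_gain = -math.inf
    best = (0, math.inf, mean, mean)  # fallback if no valid split exists
    for j in range(d):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += residuals[order[k - 1]]
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue  # cannot split between equal feature values
            right_sum = total - left_sum
            # Maximising this quantity minimises the squared error of the split.
            gain = left_sum ** 2 / k + right_sum ** 2 / (n - k)
            if gain > best_gain:
                best_gain = gain
                best = (j, (lo + hi) / 2, left_sum / k, right_sum / (n - k))
    return best


def fit_boosted_stumps(X, y, n_rounds=200, learning_rate=0.2):
    """Gradient boosting with squared loss: each stump fits the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        j, thr, left, right = fit_stump(X, residuals)
        stumps.append((j, thr, left, right))
        pred = [p + learning_rate * (left if x[j] <= thr else right)
                for p, x in zip(pred, X)]
    return base, learning_rate, stumps


def predict(model, X):
    base, learning_rate, stumps = model
    return [base + learning_rate * sum(l if x[j] <= t else r for j, t, l, r in stumps)
            for x in X]


def run_demo(seed=0):
    """Hide a gap in a synthetic target record and reconstruct it from neighbours."""
    rng = random.Random(seed)
    X, y = [], []
    for t in range(300):
        # Synthetic field variation (nT): slow variation plus a faster disturbance.
        s = 20 * math.sin(2 * math.pi * t / 300) + 5 * math.sin(2 * math.pi * t / 37)
        # Three hypothetical neighbouring stations with different gains and offsets.
        X.append([s + rng.gauss(0, 0.2),
                  0.8 * s + 3 + rng.gauss(0, 0.2),
                  1.1 * s - 2 + rng.gauss(0, 0.2)])
        # Target station responds mildly nonlinearly to the same source signal.
        y.append(0.9 * s + 0.01 * s * s + rng.gauss(0, 0.1))
    gap = list(range(120, 160))  # samples treated as missing at the target station
    gap_set = set(gap)
    train = [t for t in range(300) if t not in gap_set]
    model = fit_boosted_stumps([X[t] for t in train], [y[t] for t in train])
    recon = predict(model, [X[t] for t in gap])
    truth = [y[t] for t in gap]

    def rmse(p):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, truth)) / len(truth))

    train_mean = sum(y[t] for t in train) / len(train)
    return rmse(recon), rmse([train_mean] * len(truth))


if __name__ == "__main__":
    model_rmse, mean_rmse = run_demo()
    print(f"boosted-stump RMSE in gap: {model_rmse:.3f} nT")
    print(f"mean-fill RMSE in gap:     {mean_rmse:.3f} nT")
```

In practice the hand-written booster would be replaced by the XGBoost library itself, and the residuals would be evaluated separately for quiet and disturbed days, as the abstract describes. The residual values produced by this toy example say nothing about the accuracy reported in the paper.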