Earthquake loss prediction based on the random forest method

梁梓豪; 苗鹏宇; Wang Jianming; 王自法

doi:10.11939/jass.20220182

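Summary: Most existing studies based on observed earthquake damage are limited to a specific region and a single structure type, and the data samples they use are very small. To address this, this paper builds a random forest model on 378,037 building damage records from the M_W 9.0 East Japan earthquake of March 11, 2011, uses the earthquake damage state classification issued by the Applied Technology Council (ATC-13) to predict the loss caused by earthquake damage to buildings, and analyzes the feature importance of the factors affecting building loss. The results show that, after applying the synthetic minority over-sampling technique (SMOTE) to address data imbalance and Bayesian optimization to tune the hyperparameters, the random forest prediction model reaches an accuracy of 68.8% on the test set, with recall of 65.0%, 53.6%, 74.8%, and 81.8% for the minor damage, moderate damage, severe damage, and collapse states, respectively. After the model is converted to a binary classification based on life-safety performance, the accuracy rises further to 87.5%, which greatly alleviates the low accuracy for the most severe damage state that limited sample sizes and imbalanced data cause in existing building loss prediction studies. The feature importance analysis of the random forest model shows that epicentral distance, peak ground acceleration (PGA), and v_S30 are the features that most influence the model output.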

Abstract: Rapid and accurate assessment of building damage and its severity is crucial for pre-earthquake disaster prevention and mitigation, post-earthquake emergency response and relief, and rapid reconstruction. Most existing studies based on observed earthquake damage are limited to a specific region and a particular structure type, and use few data samples, so the resulting models generalize poorly. Many factors affect earthquake-induced building loss, and traditional methods cannot fully capture the complex mapping among them. A method that assesses building damage quickly and accurately is therefore essential. Machine learning offers a data-driven approach that can handle complex nonlinear relationships between input and output parameters by learning patterns from large data sets. This paper proposes an earthquake damage prediction model that combines Bayesian optimization, the synthetic minority over-sampling technique (SMOTE), and the random forest algorithm. Bayesian optimization takes prior knowledge into account and updates iteratively until it finds the best-performing hyperparameter combination, which addresses the inefficiency of traditional parameter tuning. SMOTE generates synthetic samples for the minority classes, which addresses the uneven distribution of samples across classes. Based on the random forest model, this paper uses 378,037 building damage records from the March 11, 2011, M_W 9.0 Tohoku-Oki, Japan earthquake, comprehensively considers multidimensional information such as ground motion, site conditions, and structural characteristics, and adopts the earthquake damage classification issued by the Applied Technology Council (ATC-13). The model predicts the loss caused by earthquake damage to buildings and analyzes the feature importance of the factors affecting building damage. The results show that, after using SMOTE to address data imbalance and Bayesian optimization to tune the hyperparameters, the random forest prediction model achieves an accuracy of 68.8% on the test set, with recall rates of 65.0%, 53.6%, 74.8%, and 81.8% for minor damage, moderate damage, severe damage, and collapse, respectively. Converting the model to a binary classification based on life-safety performance raises the accuracy further to 87.5%. These results substantially mitigate problems in existing building loss prediction research, such as limited data, lack of regional generalization, lack of diversity in building attributes, imprecise classification of damage levels, and low accuracy for the most severe damage state. The feature importance analysis of the random forest model shows that epicentral distance, peak ground acceleration (PGA), and v_S30 have the most significant influence on the model output. The earthquake damage assessment model established in this study achieves rapid and relatively accurate prediction of earthquake-induced building damage, which benefits pre-earthquake planning and timely post-earthquake rescue.
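To make two ideas from the abstract concrete, the following is a minimal sketch of SMOTE-style oversampling and of per-class recall, written with the Python standard library only. It is not the authors' implementation: the function names `smote` and `recall_per_class`, the two-dimensional toy features, and the toy labels are all assumptions introduced for illustration, and a real study would more likely rely on an established library.

```python
import math
import random
from collections import Counter


def smote(minority, n_new, k=5, seed=0):
    """Sketch of SMOTE: build n_new synthetic points, each interpolated between
    a minority-class sample and one of its k nearest minority-class neighbours.
    `minority` must contain at least two feature tuples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Candidate neighbours: every other minority sample, nearest first.
        others = minority[:i] + minority[i + 1:]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to mate
        synthetic.append(tuple(b + gap * (m - b) for b, m in zip(base, mate)))
    return synthetic


def recall_per_class(y_true, y_pred):
    """Recall of each class: correctly predicted members / actual members."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Hypothetical toy data: four minority-class samples with two features each.
toy_minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(toy_minority, n_new=6, k=2)

# Hypothetical labels, used only to show how recall is computed per damage state.
toy_true = ["minor", "minor", "collapse", "collapse"]
toy_pred = ["minor", "collapse", "collapse", "collapse"]
toy_recall = recall_per_class(toy_true, toy_pred)
```

Because each synthetic point lies on a segment between two existing minority samples, oversampling densifies the minority region without duplicating records exactly. Reporting recall per damage state, rather than accuracy alone, shows how well the rare but critical states such as collapse are detected.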

Earthquake loss prediction based on random forest algorithm