Classification evaluation of construction sites with thick overburden based on machine learning

王喆恺; 谭慧明; 高志兵

doi:10.11939/jass.20220176

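Summary: Measurement and other errors in the calculation of equivalent shear wave velocity mean that a site's category can change when a single factor varies only slightly. To address this problem, a large set of field test data, including standard penetration values, depths, and shear wave velocities, was collected from sites with thick overburden in the Yancheng area of Jiangsu Province. Machine learning methods were used for training and modeling in order to study the ability of multi-feature models to solve the site classification problem under thick overburden. The results show that the classification accuracy of the random forest model reaches 97.7% once the “equivalent coefficient of variation” is added, and that its generalization ability and its judgment of the sample population are both better than those of the support vector machine model; the model offers a new way to determine the category of construction sites with thick overburden. Comparing the secondary judgment results with those of the exploration reports shows that the random forest model can be used for secondary judgment where the site classification is liable to change, providing a reliable basis for avoiding overly conservative judgments in similar situations on engineering sites.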
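The sensitivity described above can be illustrated with a small calculation. The sketch below assumes the equivalent shear wave velocity definition commonly used with the Chinese seismic design code GB 50011 (v_se = d_0 / Σ(d_i / v_si), with the calculation depth d_0 capped at 20 m) and a simplified boundary of 150 m/s between class Ⅲ and class Ⅳ for overburden thicker than 80 m. The layer values, function names, and the simplified rule are hypothetical illustrations and are not taken from the paper's data; the thresholds should be checked against the code itself.

```python
def equivalent_shear_wave_velocity(layers, max_depth=20.0):
    """Return v_se in m/s for layers given as (thickness_m, vs_m_per_s), top down.

    Assumed definition: v_se = d0 / sum(d_i / v_si), where d0 is the
    calculation depth, capped at max_depth.
    """
    depth = 0.0
    travel_time = 0.0
    for thickness, vs in layers:
        if depth >= max_depth:
            break
        used = min(thickness, max_depth - depth)  # only count soil above max_depth
        travel_time += used / vs
        depth += used
    return depth / travel_time


def thick_overburden_class(v_se, boundary=150.0):
    """Assumed simplified rule for overburden thicker than 80 m and v_se <= 250 m/s."""
    return "IV" if v_se <= boundary else "III"


# Hypothetical borehole: only the top layer's velocity differs (118 vs 122 m/s).
profile = [(5.0, 118.0), (7.0, 150.0), (8.0, 180.0)]
perturbed = [(5.0, 122.0), (7.0, 150.0), (8.0, 180.0)]

for name, layers in (("measured", profile), ("perturbed", perturbed)):
    v_se = equivalent_shear_wave_velocity(layers)
    print(f"{name}: v_se = {v_se:.1f} m/s, class {thick_overburden_class(v_se)}")
```

In this hypothetical profile, a change of about 3% in one layer's measured velocity moves v_se across the assumed boundary, which is the kind of single-factor flip that motivates a multi-feature classifier.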

Abstract: Measurement and other errors in the calculation of equivalent shear wave velocity mean that a slight change in a single factor can easily change a site’s category. To address this problem, a large amount of field test data, including standard penetration values, depths, and shear wave velocities, was collected from sites with thick overburden in the Yancheng area of Jiangsu Province. Machine learning methods were used for training and modeling, and the ability of multi-feature models to solve site classification problems under thick overburden was studied.

In the feasibility analysis, the accuracies of the logistic regression, support vector machine, and random forest models were 0.809, 0.939, and 0.951, respectively. Given the accuracy gaps among these three models, the support vector machine and random forest algorithms were selected as the optimal algorithms for building the model. To account for the integrity of the entire borehole as far as possible, this paper proposes a parameter called the “equivalent coefficient of variation”, which effectively improves the accuracy of the model. For the support vector machine model, the classification performance of linear, polynomial, and Gaussian kernels was compared, and the Gaussian kernel was ultimately selected; the resulting model had an accuracy of 0.951. For the random forest model, classification performance was tested with different numbers of decision trees; 150 decision trees were finally selected, and the resulting model had an accuracy of 0.977.

The accuracies of the support vector machine and random forest models are thus 95.1% and 97.7%, with recall rates of 98.2% and 97.3%, respectively. The AUC (area under the curve) values of both models are 0.98. The classification performance of the random forest model is therefore not inferior to that of the support vector machine model, it adapts better to the sample data, and its recall and accuracy are close to each other, meaning that its judgment on the sample population is more balanced. In summary, the random forest model is the better choice for the problem studied in this paper and can provide a reliable basis for determining the category of sites with thick overburden.

The random forest model was therefore used to determine the site category of the 75 sets of data in the critical sample of this study. Of these, 61 sets were consistent with the judgment of the exploration report and 14 sets differed from it. Moreover, the model’s judgment on class Ⅲ sites was completely consistent with the exploration report. These results indicate that the model not only has excellent judgment ability in non-critical situations but also maintains good judgment ability in critical situations, so it can make secondary judgments for similar engineering problems and provide an effective reference. The judgment results of the random forest model were output in sequence, then organized and verified against the original drilling information. On the critical sample, the model correctly judged eight boreholes, and only two boreholes received a site classification different from that in the exploration report; both were classified as class Ⅳ in the report and as class Ⅲ by the model. In practical engineering, judgments made on site tend to be conservative for safety reasons, and near a class boundary such conservatism shows up as two different site classification results. This can explain the significant divergence between the model’s and the exploration report’s judgments on class Ⅳ sites.
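The accuracy, recall, and agreement figures quoted above follow standard definitions, which can be sketched as below. This is a minimal illustration only: the confusion counts are hypothetical (the paper's confusion matrices are not given here), and the function names are assumptions. Only the 61-of-75 agreement count comes from the abstract.

```python
def accuracy(tp, fp, fn, tn):
    """Share of all samples that were classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def recall(tp, fn):
    """Share of the actual positive-class samples that the model found."""
    return tp / (tp + fn)


def agreement_rate(consistent, total):
    """Share of model judgments that match the exploration report."""
    return consistent / total


# Hypothetical confusion counts, treating one site class as "positive".
tp, fp, fn, tn = 90, 5, 10, 95
print(f"accuracy = {accuracy(tp, fp, fn, tn):.3f}")
print(f"recall   = {recall(tp, fn):.3f}")

# Counts reported in the abstract for the critical sample: 61 of 75 sets agree.
print(f"agreement on critical sample = {agreement_rate(61, 75):.3f}")
```

A model whose accuracy and recall are close to each other, as reported for the random forest model, is not favoring one class at the expense of the other.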
