Review and perspective of Earth Science Knowledge Graph in Big Data Era

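Summary: Earth science is a data-intensive discipline, and knowledge graphs are regarded as an effective way to make full use of the enormous volume of data in the field. Compared with general-purpose large models, knowledge graphs can provide more accurate knowledge and enhance the intelligence and reliability of the outputs of generative large models. This paper first explains the concepts and construction methods of knowledge graphs. It then introduces in detail the Basic Formal Ontology (BFO), a top-level ontology widely used in the sciences, briefly summarizes the knowledge graphs already built in Earth science, and focuses on two Earth science ontologies, the GeoCore Ontology and the Geoscience Ontology (GSO), together with their similarities and differences. Finally, it introduces international geoscience data science programs related to building Earth science knowledge graphs, such as the Deep-time Digital Earth (DDE) program, and discusses the challenges and application prospects for the future development of geoscience knowledge graphs and for the construction of earthquake science knowledge graphs.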

     

    Abstract:
    Earth Science is a data-intensive discipline that spans many subject areas, yet existing technical means do not fully exploit the advantages of its data. Knowledge Graphs (KGs) are widely recognized as an effective approach to harnessing the extensive data in this field. Earth Science Knowledge Graphs can integrate geoscience knowledge, improve research efficiency, and facilitate interdisciplinary collaboration. By analyzing network connections and semantic relationships, they uncover associations and patterns in knowledge and help researchers identify new domains and pose novel research questions. Compared with general-purpose large models, Knowledge Graphs provide more accurate knowledge, which can enhance both the intelligence and the reliability of the outputs generated by such models.
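The idea that network connections reveal knowledge associations can be illustrated with a shortest-path search over triples. The following is a minimal sketch under stated assumptions: a handful of toy triples held in memory (all entity and relation names are hypothetical placeholders, not data from any published graph) are indexed in both directions, and breadth-first search returns the chain of relations that links two entities.

```python
from collections import deque

# Toy (head, relation, tail) triples; all names are hypothetical placeholders.
TRIPLES = [
    ("Earthquake_E1", "occurred_on", "Fault_F1"),
    ("Fault_F1", "cuts", "Formation_A"),
    ("Formation_A", "composed_of", "Sandstone"),
    ("Sandstone", "is_a", "SedimentaryRock"),
    ("Landslide_L1", "triggered_by", "Earthquake_E1"),
]


def build_adjacency(triples):
    """Index triples in both directions so a path may follow a relation either way."""
    adj = {}
    for head, rel, tail in triples:
        adj.setdefault(head, []).append((rel, tail))
        adj.setdefault(tail, []).append((f"inverse_{rel}", head))
    return adj


def find_path(triples, start, goal):
    """Breadth-first search for the shortest relation path between two entities.

    Returns a list of (head, relation, tail) hops, [] if start == goal,
    or None if the two entities are not connected.
    """
    adj = build_adjacency(triples)
    queue = deque([start])
    previous = {start: None}  # entity -> (parent entity, relation used to reach it)
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while previous[node] is not None:
                parent, rel = previous[node]
                path.append((parent, rel, node))
                node = parent
            return list(reversed(path))
        for rel, neighbor in adj.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = (node, rel)
                queue.append(neighbor)
    return None


if __name__ == "__main__":
    for hop in find_path(TRIPLES, "Landslide_L1", "SedimentaryRock"):
        print(hop)
```

Here a landslide is linked to a rock class through five hops that no single triple states directly. A production system would more likely run such queries in a graph database or a graph library than with hand-written search.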
    Firstly, this study explains the concepts and construction methods of Knowledge Graphs in detail. A Knowledge Graph is a data graph designed to collect and convey knowledge about the real world. Its general form is the triple, which consists of a head entity, a tail entity, and the relation between them. Knowledge Graphs have become an important means of organizing structured knowledge and integrating information from multiple data sources within organizations. Their architecture comprises four main components: source data acquisition, knowledge fusion, knowledge computation, and knowledge application. Source data acquisition is the first step in building a Knowledge Graph and focuses on extracting useful information from various types of data. Knowledge fusion is key to resolving the heterogeneity among different Knowledge Graphs and aims to improve their quality through integration. Knowledge computation is the main output capability of a Knowledge Graph and is currently applied in areas such as semantic search, question answering, and visual analytics. Knowledge Graph construction techniques extract information from structured, semi-structured, and unstructured data sources, organize it into knowledge, and present it in graph form. At present, Earth Science Knowledge Graphs are built in two main ways, top-down and bottom-up; the general principle is to combine the two, with flexibility in the order in which they are applied.
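The construction flow described above, with triples extracted bottom-up, fused, checked against a top-down schema, and then queried, can be sketched in a few functions. This is a minimal illustration, not the authors' method: the schema, alias table, and records are hypothetical, and real systems rely on NLP-based extraction and learned entity alignment rather than a hand-written alias table.

```python
# Top-down: a hand-written schema mapping each relation to (head type, tail type).
# All schema entries, records and aliases are hypothetical examples.
SCHEMA = {
    "composed_of": ("Rock", "Mineral"),
    "located_in": ("Rock", "Region"),
}

# Bottom-up: typed statements as they might arrive from different sources.
# Each record: (source, head, head_type, relation, tail, tail_type).
RECORDS = [
    ("db_1", "Granite", "Rock", "composed_of", "Quartz", "Mineral"),
    ("db_2", "granite", "Rock", "composed_of", "quartz", "Mineral"),
    ("db_2", "Basalt", "Rock", "composed_of", "Plagioclase", "Mineral"),
    ("db_3", "Quartz", "Mineral", "located_in", "Region_R1", "Region"),
]

# Knowledge fusion: map surface forms used by different sources to one name.
ALIASES = {"granite": "Granite", "quartz": "Quartz"}


def canonical(name):
    """Resolve an alias to its canonical entity name (identity if unknown)."""
    return ALIASES.get(name, name)


def conforms(head_type, relation, tail_type):
    """Top-down check: the relation is declared and its head/tail types match."""
    return SCHEMA.get(relation) == (head_type, tail_type)


def build_graph(records):
    """Extract triples, fuse aliases and duplicates, and reject non-conforming ones."""
    graph = set()  # a set drops duplicates contributed by different sources
    rejected = []
    for source, head, head_type, relation, tail, tail_type in records:
        triple = (canonical(head), relation, canonical(tail))
        if conforms(head_type, relation, tail_type):
            graph.add(triple)
        else:
            rejected.append((source, triple))
    return graph, rejected


def query(graph, head=None, relation=None, tail=None):
    """Pattern-match triples; a None argument acts as a wildcard."""
    return sorted(
        (h, r, t)
        for h, r, t in graph
        if (head is None or h == head)
        and (relation is None or r == relation)
        and (tail is None or t == tail)
    )


if __name__ == "__main__":
    graph, rejected = build_graph(RECORDS)
    print(query(graph, relation="composed_of"))
    print(rejected)
```

The two spellings of granite and quartz collapse into one triple, and the statement whose types violate the schema is set aside for review instead of entering the graph.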
    Secondly, the study gives a detailed introduction to the Basic Formal Ontology (BFO), a top-level ontology widely used in the sciences. It briefly summarizes existing Knowledge Graphs in the geoscience field and focuses on two Earth Science ontologies, the GeoCore Ontology and the Geoscience Ontology (GSO), comparing their similarities and differences. BFO comprises 38 classes and is designed to support information integration, retrieval, and analysis in scientific research; it has been used in more than 350 ontology projects worldwide. The GeoCore Ontology, built on BFO, is a specialized framework for describing the core concepts of Earth Science, and its development involved rigorously defining a set of general geological concepts. GSO, by contrast, provides a systematic framework for representing key geoscience knowledge and is organized into three hierarchical layers: foundational, geological, and detailed modules. GeoCore can be viewed as an intermediate layer within GSO that can be extended further, whereas GSO has already built its detailed modules. In addition, researchers worldwide use methods such as literature mining, domain-expert interviews, and data mining to extract Earth Science knowledge from literature, databases, and open data, and then construct Knowledge Graphs from it. These Knowledge Graphs are applied in domains including geological exploration, natural disaster prediction, and environmental conservation, and are used in practical projects such as oil and gas exploration, water resource management, and climate change research. Overall, Earth Science Knowledge Graphs have a broad scope of application and provide an essential foundation of data and knowledge for scientific research, decision support, and sustainable development.
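The layering described above, with BFO on top, a domain layer beneath it, and detailed classes at the bottom, can be made concrete with a small subclass table. This is a minimal sketch under loud assumptions: the upper entries are a handful of genuine BFO class names (far fewer than BFO's full set), while the domain and detail classes are hypothetical placeholders, not terms taken from GeoCore or GSO. Real ontologies are normally written in OWL and may use multiple inheritance, which this single-parent table does not model.

```python
# Single-inheritance subclass table: child class -> parent class.
SUBCLASS_OF = {
    # Upper layer: a few Basic Formal Ontology (BFO) classes.
    "continuant": "entity",
    "occurrent": "entity",
    "independent continuant": "continuant",
    "material entity": "independent continuant",
    "process": "occurrent",
    # Domain layer (hypothetical names, not from GeoCore or GSO).
    "EarthMaterial": "material entity",
    "GeologicalProcess": "process",
    # Detail layer (hypothetical names).
    "Rock": "EarthMaterial",
    "IgneousRock": "Rock",
    "Earthquake": "GeologicalProcess",
}


def ancestors(cls):
    """Walk the parent chain from a class up to the root, nearest parent first."""
    chain = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


def is_a(cls, other):
    """True if cls equals other or is a (transitive) subclass of it."""
    return cls == other or other in ancestors(cls)


if __name__ == "__main__":
    print(ancestors("IgneousRock"))
    print(is_a("Earthquake", "continuant"))
```

Because continuant and occurrent are separate branches under BFO's root, anchoring a domain class under one of them (a rock as a material entity, an earthquake as a process) fixes what kind of thing it is for every ontology that reuses the same upper layer.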
    Finally, the study introduces international geoscience data science initiatives related to building Earth Science Knowledge Graphs, such as the Deep-time Digital Earth (DDE) program, and discusses the challenges and application prospects for their future development, with a focus on Earthquake Science. DDE aims to connect and coordinate global deep-time Earth data, promote the worldwide sharing of geoscience knowledge, and support data-driven research on the evolution of the Earth. Besides DDE, many domestic and international organizations and initiatives, such as OneGeology, EarthCube, and LinkedGeoData, are driving the development of Knowledge Graphs in Earth Science. Knowledge Graph construction still faces various challenges, but advances in technology and tools are gradually overcoming them. These challenges are not unique to Earth Science; they are common to Knowledge Graph construction in general. However, the complexity and diversity of Earth Science create difficulties specific to this field. Even so, there is ample room for building and applying Knowledge Graphs in Earth Science, and Large Language Models (LLMs) bring new opportunities. Earthquake Science, an important branch of Earth Science, lies at the intersection of several first-level disciplines, including geology, geophysics, and engineering seismology. However, the application of Knowledge Graphs in Earthquake Science still shows significant gaps, and further research building on existing models is urgently needed.
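One common way to combine a Knowledge Graph with an LLM is to retrieve facts about the entities named in a question and place them in the prompt, so the model answers from curated knowledge. The following is a minimal sketch of that retrieval step only, under stated assumptions: the triples and the prompt template are hypothetical, entity mentions are detected by plain substring matching, and no particular LLM or API is assumed; the returned string would simply be passed to whatever model is in use.

```python
# Hypothetical triples; names are placeholders, not data from a real graph.
TRIPLES = [
    ("Earthquake_E1", "occurred_on", "Fault_F1"),
    ("Earthquake_E1", "located_in", "Region_R1"),
    ("Fault_F1", "has_type", "ThrustFault"),
    ("Landslide_L1", "triggered_by", "Earthquake_E1"),
]


def facts_about(triples, entity):
    """Return every triple in which the entity appears as head or tail."""
    return [t for t in triples if entity in (t[0], t[2])]


def build_prompt(question, triples):
    """Collect facts for each graph entity mentioned in the question (substring match)."""
    entities = {h for h, _, _ in triples} | {t for _, _, t in triples}
    mentioned = sorted(e for e in entities if e in question)
    lines = []
    for entity in mentioned:
        for h, r, t in facts_about(triples, entity):
            line = f"- {h} {r} {t}"
            if line not in lines:  # avoid repeating a fact shared by two entities
                lines.append(line)
    context = "\n".join(lines) if lines else "- (no matching facts)"
    return (
        "Answer using only the facts below.\n"
        f"Facts:\n{context}\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    print(build_prompt("Which fault did Earthquake_E1 occur on?", TRIPLES))
```

Restricting the model to retrieved facts is what lets the graph improve reliability: the answer can be traced back to specific triples, and a question with no matching facts is flagged rather than answered from the model's own guesses.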
    In conclusion, the development of Earth Science Knowledge Graphs will be an ongoing process of evolution and refinement; through sustained technological innovation and interdisciplinary collaboration, it will bring more opportunities and benefits to Earth Science research, decision-making, and public education.

     
