Abstract:
Earth Science relies heavily on data and spans many subject areas, yet existing technologies do not fully exploit the available Earth data. Knowledge Graphs (KGs) are widely recognized as an effective approach to harnessing the extensive data in this field. Earth Science Knowledge Graphs can integrate geoscience knowledge, enhance research efficiency, and facilitate interdisciplinary collaboration. By analyzing network connections and semantic relationships, they uncover knowledge associations and patterns, and aid researchers in identifying new domains and posing novel research questions. Unlike large-scale models on their own, Knowledge Graphs supply precise knowledge that improves both the intelligence and the dependability of the outputs such models generate.
First, this study gives a detailed account of Knowledge Graph concepts and construction methods. A Knowledge Graph is a form of data graph designed to collect and convey real-world knowledge. Its universal representation is the triple, which consists of a head entity, a tail entity, and the relationship between them. Knowledge Graphs have become an important means of organizing structured knowledge and integrating information from multiple data sources within organizations. Their architecture comprises four main components: source data acquisition, knowledge fusion, knowledge computation, and knowledge application. Source data acquisition is the first step in building a Knowledge Graph and focuses on extracting useful information from various types of data. Knowledge fusion addresses the heterogeneity of different Knowledge Graphs and aims to improve their quality through integration. Knowledge computation is the primary output capability of Knowledge Graphs and is currently applied in areas such as semantic search, question answering, and visual analysis. Knowledge Graph construction technology extracts information from structured, semi-structured, and unstructured data sources, organizes it into knowledge, and presents it in graphical form. At present, Earth Science Knowledge Graphs are built mainly with two approaches, top-down and bottom-up; the overarching principle is to combine both while remaining flexible about the order in which they are applied.
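The triple representation described above can be illustrated with a minimal Python sketch. It is not the construction method of any surveyed system; the entity names, relation names (`is_a`, `subclass_of`, `located_in`), and helper functions are hypothetical and chosen only to show how head, relation, and tail fit together.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """One fact: a head entity linked to a tail entity by a relation."""
    head: str
    relation: str
    tail: str


# Illustrative facts only; entity and relation names are assumptions.
TRIPLES = [
    Triple("San Andreas Fault", "is_a", "strike-slip fault"),
    Triple("strike-slip fault", "subclass_of", "fault"),
    Triple("San Andreas Fault", "located_in", "California"),
]


def build_index(triples):
    """Index triples by head entity for simple neighbor lookups."""
    index = defaultdict(list)
    for t in triples:
        index[t.head].append((t.relation, t.tail))
    return index


def tails(index, head, relation):
    """Return every tail linked to `head` by `relation`."""
    return [tail for rel, tail in index.get(head, []) if rel == relation]


index = build_index(TRIPLES)
print(tails(index, "San Andreas Fault", "is_a"))
```

A production Knowledge Graph would more likely store such triples in an RDF store or a graph database; the in-memory index here only makes the data model concrete.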
Second, this study introduces the Basic Formal Ontology (BFO), a top-level ontology widely applied in the scientific domain. It briefly summarizes existing Knowledge Graphs in the geoscience field, with emphasis on the GeoCore Ontology and the Geoscience Ontology (GSO), and highlights their similarities and differences. BFO, comprising 38 classes, is designed to facilitate information integration, retrieval, and analysis in scientific research, and has been employed in over 350 ontology projects worldwide. The GeoCore Ontology, built upon BFO, is a specialized framework for describing the core concepts of Earth Science; its development rigorously defined a set of universal geological concepts. GSO, by contrast, provides a systematic framework for representing key geological knowledge in three hierarchical layers: foundational, geological, and detailed modules. GeoCore can be viewed as an intermediate layer that can be further expanded, whereas GSO already includes detailed modules. In addition, researchers worldwide use methods such as literature mining, domain expert interviews, and data mining to extract Earth Science knowledge from the literature, databases, and open data, and then construct Knowledge Graphs from it. These Knowledge Graphs are applied in domains including geological exploration, natural disaster prediction, and environmental conservation, and are used in practical projects such as oil and gas exploration, water resource management, and climate change research. In summary, the application scope of Earth Science Knowledge Graphs is extensive, and they provide a crucial foundation of data and knowledge for scientific research, decision support, and sustainable development.
Finally, the study introduces international Earth Science data initiatives related to constructing Earth Science Knowledge Graphs, such as the Deep-time Digital Earth (DDE) project, and discusses the challenges and application prospects for their future development, with a focus on earthquake science. The DDE aims to connect and coordinate global deep-time Earth data, promoting the worldwide sharing of geoscientific knowledge and facilitating data-driven research on Earth's evolution. Apart from the DDE, numerous domestic and international organizations and initiatives are driving the development of Knowledge Graphs in Earth Science, such as the OneGeology, EarthCube, and LinkedGeoData projects. Knowledge Graph construction faces various challenges, which advances in technology and tools are gradually overcoming. These challenges are not exclusive to Earth Science; they are prevalent across all Knowledge Graph construction efforts. However, the complexity and diversity of Earth Science create additional difficulties that are specific to this field. Nevertheless, there is ample room for the creation and application of Knowledge Graphs in Earth Science, and the introduction of Large Language Models (LLMs) brings new opportunities. Earthquake science, a crucial branch of Earth Science, lies at the intersection of several primary disciplines, such as geology, geophysics, and engineering seismology. However, the application of Knowledge Graphs in earthquake science still has significant gaps and urgently requires further research that builds upon existing models.
In conclusion, the future development of Earth Science Knowledge Graphs will be an ongoing process of evolution and refinement, bringing more opportunities and benefits for fields such as Earth Science research, decision-making, and public education through sustained technological innovation and interdisciplinary collaboration.