2023 UnifyingLargeLanguageModelsandK

From GM-RKB
(Redirected from Pan, Luo et al., 2023)
Jump to navigation Jump to search

Subject Headings: Knowledge Graph, LLM+KG-based Method.

Notes

Cited By

Quotes

Abstract

Large language models (LLMs), such as ChatGPT and GPT4, are making new waves in the field of natural language processing and artificial intelligence, due to their emergent ability and generalizability. However, LLMs are black-box models, which often fall short of capturing and accessing factual knowledge. In contrast, Knowledge Graphs (KGs), Wikipedia and Huapu for example, are structured knowledge models that explicitly store rich factual knowledge. KGs can enhance LLMs by providing external knowledge for inference and interpretability. Meanwhile, KGs are difficult to construct and evolving by nature, which challenges the existing methods in KGs to generate new facts and represent unseen knowledge. Therefore, it is complementary to unify LLMs and KGs together and simultaneously leverage their advantages. In this article, we present a forward-looking roadmap for the unification of LLMs and KGs. Our roadmap consists of three general frameworks, namely, 1) KG-enhanced LLMs, which incorporate KGs during the pre-training and inference phases of LLMs, or for the purpose of enhancing understanding of the knowledge learned by LLMs; 2) LLM-augmented KGs, that leverage LLMs for different KG tasks such as embedding, completion, construction, graph-to-text generation, and question answering; and 3) [[Synergized LLMs + KG]]s, in which LLMs and KGs play equal roles and work in a mutually beneficial way to enhance both LLMs and KGs for bidirectional reasoning driven by both data and knowledge. We review and summarize existing efforts within these three frameworks in our roadmap and pinpoint their future research directions.

1 INTRODUCTION

Large language models (LLMs) [1] (e.g., BERT [1], RoBERTA [2], and T5 [3]), pre-trained on the large-scale corpus, have shown great performance in various natural language processing (NLP) tasks, such as question answering [4], machine translation [5], and text generation [6]. Recently, the dramatically increasing model size further enables the LLMs with the emergent ability [7], paving the road for applying LLMs as Artificial General Intelligence (AGI). Advanced LLMs like ChatGPT[2] and PaLM2[3], with billions of parameters, exhibit great potential in many complex practical tasks, such as education [8], code generation [9] and recommendation [10].

Fig. 1. Summarization of the pros and cons for LLMs and KGs. LLM pros: General Knowledge [11], Language Processing [12], Generalizability [13]; LLM cons: Implicit Knowledge [14], Hallucination [15], In- decisiveness [16], Black-box [17], Lacking Domain-specific/New Knowledge [18]. KG pros: Structural Knowledge [19], Accuracy [20], Decisive- ness [21], Interpretability [22], Domain-specific Knowledge [23], Evolving Knowledge [24]; KG cons: Incompleteness [25], Lacking Language Understanding [26], Unseen Facts [27].

Despite their success in many applications, LLMs have been criticized for their lack of factual knowledge. Specifically, LLMs memorize facts and knowledge contained in the training corpus [14]. However, further studies reveal that LLMs are not able to recall facts and often experience hallucinations by generating statements that are factually incorrect [15], [28]. For example, LLMs might say “Einstein discovered gravity in 1687” when asked, “When did Einstein discover gravity?”, which contradicts the fact that Isaac Newton formulated the gravitational theory. This issue severely impairs the trustworthiness of LLMs. As black-box models, LLMs are also criticized for their lack of interpretability. LLMs represent knowledge implicitly in their parameters. It is difficult to interpret or validate the knowledge obtained by LLMs. Moreover, LLMs perform reasoning by a probability model, which is an indecisive process [16]. The specific patterns and functions LLMs used to arrive at predictions or decisions are not directly accessible or explainable to humans [17]. Even though some LLMs are equipped to explain their predictions by applying chain-of-thought [29], their reasoning explanations also suffer from the hallucination issue [30]. This severely impairs the application of LLMs in high-stakes scenarios, such as medical diagnosis and legal judgment. For instance, in a medical diagnosis scenario, LLMs may incorrectly diagnose a disease and provide explanations that contradict medical commonsense. This raises another issue that LLMs trained on general corpus might not be able to generalize well to specific domains or new knowledge due to the lack of domain-specific knowledge or new training data [18]. To address the above issues, a potential solution is to in- corporate knowledge graphs (KGs) into LLMs. Knowledge graphs (KGs), storing enormous facts in the way of triples, i.e., (head entity, relation, tail entity), are a structured and decisive manner of knowledge representation (e.g., Wiki- data [20], YAGO [31], and NELL [32]). KGs are crucial for various applications as they offer accurate explicit knowledge [19]. Besides, they are renowned for their symbolic reasoning ability [22], which generates interpretable results. KGs can also actively evolve with new knowledge continuously added in [24]. Additionally, experts can construct domain-specific KGs to provide precise and dependable domain-specific knowledge [23]. Nevertheless, KGs are difficult to construct [33], and current approaches in KGs [25], [27], [34] are inadequate in handling the incomplete and dynamically changing nature of real-world KGs. These approaches fail to effectively model unseen entities and represent new facts. In addition, they often ignore the abundant textual information in KGs. Moreover, existing methods in KGs are often customized for specific KGs or tasks, which are not generalizable enough. Therefore, it is also necessary to utilize LLMs to address the challenges faced in KGs. We summarize the pros and cons of LLMs and KGs in Fig. 1, respectively. Recently, the possibility of unifying LLMs with KGs has attracted increasing attention from researchers and practitioners. LLMs and KGs are inherently interconnected and can mutually enhance each other. In KG-enhanced LLMs, KGs can not only be incorporated into the pre-training and inference stages of LLMs to provide external knowledge [35]–[37], but also used for analyzing LLMs and providing interpretability [14], [38], [39]. 
In LLM-augmented KGs, LLMs have been used in various KG-related tasks, e.g., KG embedding [40], KG completion [26], KG construction [41], KG-to-text generation [42], and KGQA [43], to improve the performance and facilitate the application of KGs. In Syn- ergized LLM + KG, researchers marries the merits of LLMs and KGs to mutually enhance performance in knowledge

representation [44] and reasoning [45], [46]. Although there are some surveys on knowledge-enhanced LLMs [47]–[49], which mainly focus on using KGs as an external knowledge to enhance LLMs, they ignore other possibilities of integrating KGs for LLMs and the potential role of LLMs in KG applications. In this article, we present a forward-looking roadmap for unifying both LLMs and KGs, to leverage their respective strengths and overcome the limitations of each approach, for various downstream tasks. We propose detailed categorization, conduct comprehensive reviews, and pinpoint emerging directions in these fast-growing fields. Our main contributions are summarized as follows: 1) Roadmap. We present a forward-looking roadmap for integrating LLMs and KGs. Our roadmap, consisting of three general frameworks to unify LLMs and KGs, namely, KG-enhanced LLMs, LLM-augmented KGs, and Synergized LLMs + KGs, pro- vides guidelines for the unification of these two distinct but complementary technologies. 2) Categorization and review. For each integration framework of our roadmap, we present a detailed categorization and novel taxonomies of research on unifying LLMs and KGs. In each category, we review the research from the perspectives of differ- ent integration strategies and tasks, which provides more insights into each framework. 3) Coverage of emerging advances. We cover the advanced techniques in both LLMs and KGs. We include the discussion of state-of-the-art LLMs like ChatGPT and GPT-4 as well as the novel KGs e.g., multi-modal knowledge graphs. 4) Summary of challenges and future directions. We highlight the challenges in existing research and present several promising future research directions. The rest of this article is organized as follows. Section 2 first explains the background of LLMs and KGs. Section 3 introduces the roadmap and the overall categorization of this article. Section 4 presents the different KGs-enhanced LLM approaches. Section 5 describes the possible LLM- augmented KG methods. Section 6 shows the approaches of synergizing LLMs and KGs. Section 7 discusses the challenges and future research directions. Finally, Section 8 concludes this paper.

2 BACKGROUND

In this section, we will first briefly introduce a few representative large language models (LLMs) and discuss the prompt engineering that efficiently uses LLMs for varieties of applications. Then, we illustrate the concept of knowledge graphs (KGs) and present different categories of KGs.

2.1 Large Language models (LLMs)

Large language modes (LLMs) pre-trained on large-scale corpus have shown great potential in various NLP tasks [13]. As shown in Fig. 3, most LLMs derive from the Trans- former design [50], which contains the encoder and decoder modules empowered by a self-attention mechanism. Based

110 M

117M-1.5B

175B

175B Unknown

Output Text Input Text

Output Text

Features

Input Text

110M-340M

140M BART

66M

T5 80M-11B

11B T0 mT5 300M-13B

110M-10B GLM

Switch 1.6T

4.1B-269B ST-MoE

20B UL2

Flan-T5 80M-11B

540B

130B

20B

7B-13B

Features

BERT

DistillBert

11M-223M ALBERT

14M-110M ELECTRA

Open-Source Closed-Source

Input Text

RoBERTA 125M-355M

ERNIE 114M

DeBERTa 44M-304M

Fig. 2. Representative large language models (LLMs) in recent years. Open-source models are represented by solid squares, while closed source models are represented by hollow squares.

Encoder

Decoder Feed Forward

Self-Attention Linear Concat Multi-head Dot-Product

ing the entire sentence, such as text classification [53] and named entity recognition [54].

2.1.2 Encoder-decoder LLMs. Encoder-decoder large language models adopt both the

Feed Forward Self-Attention

Encoder-Decoder Attention Self-Attention

Linear

V

Attention

Linear Linear Q K

encoder and decoder module. The encoder module is re- sponsible for encoding the input sentence into a hidden- space, and the decoder is used to generate the target output text. The training strategies in encoder-decoder LLMs can be more flexible. For example, T5 [3] is pre-trained by masking and predicting spans of masking words. UL2 [55] unifies

Fig. 3. An illustration of the Transformer-based LLMs with self-attention mechanism.

on the architecture structure, LLMs can be categorized into three groups: 1) encoder-only LLMs, 2) encoder-decoder LLMs, and 3) decoder-only LLMs. As shown in Fig. 2, we sum- marize several representative LLMs with different model architectures, model sizes, and open-source availabilities.

2.1.1 Encoder-only LLMs. Encoder-only large language models only use the encoder to encode the sentence and understand the relationships between words. The common training paradigm for these model is to predict the mask words in an input sentence. This method is unsupervised and can be trained on the large-scale corpus. Encoder-only LLMs like BERT [1], AL- BERT [51], RoBERTa [2], and ELECTRA [52] require adding an extra prediction head to resolve downstream tasks. These models are most effective for tasks that require understand-

several training targets such as different masking spans and masking frequencies. Encoder-decoder LLMs (e.g., T0 [56], ST-MoE [57], and GLM-130B [58]) are able to directly resolve tasks that generate sentences based on some context, such as summariaztion, translation, and question answering [59].

2.1.3 Decoder-only LLMs. Decoder-only large language models only adopt the de- coder module to generate target output text. The training paradigm for these models is to predict the next word in the sentence. Large-scale decoder-only LLMs can generally perform downstream tasks from a few examples or simple instructions, without adding prediction heads or finetun- ing [60]. Many state-of-the-art LLMs (e.g., Chat-GPT [61] and GPT-44) follow the decoder-only architecture. However, since these models are closed-source, it is challenging for academic researchers to conduct further research. Recently,

1. https://openai.com/product/gpt-4

Instruction

Context

Input Text

Prompt

Wikipedia

Medical Knowledge Graph

Fig. 4. An example of sentiment classification prompt.
Alpaca5 and Vicuna6 are released as open-source decoder- only LLMs. These models are finetuned based on LLaMA

[62] and achieve comparable performance with ChatGPT and GPT-4.

2.2 Prompt Engineering

Prompt engineering is a novel field that focuses on creating and refining prompts to maximize the effectiveness of large language models (LLMs) across various applications and re- search areas [63]. As shown in Fig. 4, a prompt is a sequence of natural language inputs for LLMs that are specified for the task, such as sentiment classification. A prompt could contain several elements, i.e., 1) Instruction, 2) Context, and 3) Input Text. Instruction is a short sentence that instructs the model to perform a specific task. Context provides the context for the input text or few-shot examples. Input Text is the text that needs to be processed by the model. Prompt engineering seeks to improve the capacity of large large language models (e.g.,ChatGPT) in diverse com- plex tasks such as question answering, sentiment classifica- tion, and common sense reasoning. Chain-of-thought (CoT) prompt [64] enables complex reasoning capabilities through intermediate reasoning steps. Liu et al. [65] incorporate external knowledge to design better knowledge-enhanced prompts. Automatic prompt engineer (APE) proposes an automatic prompt generation method to improve the perfor- mance of LLMs [66]. Prompt offers a simple way to utilize the potential of LLMs without finetuning. Proficiency in prompt engineering leads to a better understanding of the strengths and weaknesses of LLMs.

2.3 Knowledge Graphs (KGs)

Knowledge graphs (KGs) store structured knowledge as a collection of triples KG = {(h, r, t) ⊆ E × R × E}, where E and R respectively denote the set of entities and relations. Existing knowledge graphs (KGs) can be classified into four groups based on the stored information: 1) encyclopedic KGs, 2) commonsense KGs, 3) domain-specific KGs, and 4) multimodal KGs. We illustrate the examples of KGs of different categories in Fig. 5.

2. https://github.com/tatsu-lab/stanford alpaca

3. https://lmsys.org/blog/2023-03-30-vicuna/

Pan et al., 2023 - Figure 5.png
Fig. 5. Examples of different categories’ knowledge graphs, i.e., encyclopedic KGs, commonsense KGs, domain-specific KGs, and multi-modal KGs.
2.3.1 Encyclopedic Knowledge Graphs.

Encyclopedic knowledge graphs are the most ubiquitous KGs, which represent the general knowledge in real-world. Encyclopedic knowledge graphs are often constructed by integrating information from diverse and extensive sources, including human experts, encyclopedias, and databases. Wikidata [20] is one of the most widely used encyclopedic knowledge graphs, which incorporates varieties of knowl- edge extracted from articles on Wikipedia. Other typical encyclopedic knowledge graphs, like Freebase [67], Dbpedia [68], and YAGO [31] are also derived from Wikipedia. In addition, NELL [32] is a continuously improving encyclope- dic knowledge graph, which automatically extracts knowl- edge from the web, and uses that knowledge to improve its performance over time. There are several encyclope- dic knowledge graphs available in languages other than English such as CN-DBpedia [69] and Vikidia [70]. The largest knowledge graph, named Knowledge Occean (KO) 7, currently contains 4,8784,3636 entities and 17,3115,8349 relations in both English and Chinese.

2.3.2 Commonsense Knowledge Graphs. Commonsense knowledge graphs formulate the knowledge about daily concepts, e.g., objects, and events, as well as their relationships [71]. Compared with encyclopedic knowledge graphs, commonsense knowledge graphs often model the tacit knowledge extracted from text such as (Car, 4. https://ko.zhonghuapu.com/

Factual Knowledge

Text Input

Structural Fact Domain-specific Knowledge Symbolic-reasoning

Output

KG-related Tasks

General Knowledge Language Processing Generalizability

Output

Knowledge Representation

a. KG-enhanced LLMs b. LLM-augmented KGs c. Synergized LLMs + KGs

Fig. 6. The general roadmap of unifying KGs and LLMs. (a.) KG-enhanced LLMs. (b.) LLM-augmented KGs. (c.) Synergized LLMs + KGs.

TABLE 1 Representative applications of using LLMs and KGs.

Name Category LLMs KGs URL ChatGPT/GPT-4 ERNIE 3.0 Bard Firefly AutoGPT Copilot New Bing Shop.ai Wikidata KO OpenBG Doctor.ai Chat Bot Chat Bot Chat Bot Photo Editing AI Assistant Coding Assistant Web Search Recommendation Knowledge Base Knowledge Base Recommendation Health Care Assistant ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓

✓ ✓ ✓

✓ ✓ ✓ ✓ https://shorturl.at/cmsE0 https://shorturl.at/sCLV9 https://shorturl.at/pDLY6 https://shorturl.at/fkzJV https://shorturl.at/bkoSY https://shorturl.at/lKLUV https://shorturl.at/bimps https://shorturl.at/alCY7 https://shorturl.at/lyMY5 https://shorturl.at/sx238 https://shorturl.at/pDMV9 https://shorturl.at/dhlK0

UsedFor, Drive). ConceptNet [72] contains a wide range of commonsense concepts and relations, which can help computers understand the meanings of words people use. ATOMIC [73], [74] and ASER [75] focus on the causal effects between events, which can be used for commonsense rea- soning. Some other commonsense knowledge graphs, such as TransOMCS [76] and CausalBanK [77] are automatically constructed to provide commonsense knowledge.

2.3.3 Domain-specific Knowledge Graphs Domain-specific knowledge graphs are often constructed to represent knowledge in a specific domain, e.g., medi- cal, biology, and finance [23]. Compared with encyclopedic knowledge graphs, domain-specific knowledge graphs are often smaller in size, but more accurate and reliable. For example, UMLS [78] is a domain-specific knowledge graph in the medical domain, which contains biomedical concepts and their relationships. In addition, there are some domain- specific knowledge graphs in other domains, such as finance [79], geology [80], biology [81], chemistry [82] and geneal- ogy [83].

2.3.4 Multi-modal Knowledge Graphs. Unlike conventional knowledge graphs that only contain textual information, multi-modal knowledge graphs repre- sent facts in multiple modalities such as images, sounds, and videos [84]. For example, IMGpedia [85], MMKG [86], and Richpedia [87] incorporate both the text and image information into the knowledge graphs. These knowledge graphs can be used for various multi-modal tasks such as image-text matching [88], visual question answering [89], and recommendation [90].

2.4 Applications LLMs as KGs have been widely applied in various real-world applications. We summarize some representa- tive applications of using LLMs and KGs in Table 1. ChatGPT/GPT-4 are LLM-based chatbots that can commu- nicate with humans in a natural dialogue format. To im- prove knowledge awareness of LLMs, ERNIE 3.0 and Bard incorporate KGs into their chatbot applications. Instead of Chatbot. Firefly develops a photo editing application that allows users to edit photos by using natural language de- scriptions. Copilot, New Bing, and Shop.ai adopt LLMs to empower their applications in the areas of coding assistant, web search, and recommendation, respectively. Wikidata and KO are two representative knowledge graph applica- tions that are used to provide external knowledge. OpenBG [91] is a knowledge graph designed for recommendation. Doctor.ai develops a health care assistant that incorporates LLMs and KGs to provide medical advice.

3 ROADMAP & CATEGORIZATION

In this section, We first present a road map of explicit frameworks that unify LLMs and KGs. Then, we present the categorization of research on unifying LLMs and KGs.

3.1 Roadmap

The roadmap of unifying KGs and LLMs is illustrated in Fig. 6. In the roadmap, we identify three frameworks for the unification of LLMs and KGs, including KG-enhanced LLMs, LLM-augmented KGs, and Synergized LLMs + KGs.

3.1.1 KG-enhanced LLMs

LLMs are renowned for their ability to learn knowledge from large-scale corpus and achieve state-of-the-art per- formance in various NLP tasks. However, LLMs are often criticized for their hallucination issues [15], and lacking of interpretability. To address these issues, researchers have proposed to enhance LLMs with knowledge graphs (KGs). KGs store enormous knowledge in an explicit and struc- tured way, which can be used to enhance the knowledge awareness of LLMs. Some researchers have proposed to incorporate KGs into LLMs during the pre-training stage, which can help LLMs learn knowledge from KGs [92], [93]. Other researchers have proposed to incorporate KGs into LLMs during the inference stage. By retrieving knowledge

Pan et al., 2023 - Figure 7.jpg
Fig. 7. The general framework of the Synergized LLMs + KGs, which contains four layers: 1) Data, 2) Synergized Model, 3) Technique, and 4) Application.
3.1.2 LLM-augmented KGs

KGs store structure knowledge playing an essential role in many real-word applications [19]. Existing methods in KGs fall short of handling incomplete KGs [25] and processing text corpus to construct KGs [96]. With the generalizability of LLMs, many researchers are trying to harness the power of LLMs for addressing KG-related tasks.

The most straightforward way to apply LLMs as text encoders for KG-related tasks. Researchers take advantage of LLMs to process the textual corpus in the KGs and then use the representations of the text to enrich KGs representa- tion [97]. Some studies also use LLMs to process the original corpus and extract relations and entities for KG construction [98]. Recent studies try to design a KG prompt that can effectively convert structural KGs into a format that can be comprehended by LLMs. In this way, LLMs can be directly applied to KG-related tasks, e.g., KG completion [99] and KG reasoning [100].

3.1.3 Synergized LLMs + KGs

The synergy of LLMs and KGs has attracted increasing attention from researchers these years [40], [42]. LLMs and KGs are two inherently complementary techniques, which should be unified into a general framework to mutually enhance each other.

To further explore the unification, we propose a unified framework of the synergized LLMs + KGs in Fig. 7. The unified framework contains four layers: 1) Data, 2) Synergized Model, 3) Technique, and 4) Application. In the Data layer, LLMs and KGs are used to process the textual and structural data, respectively. With the development of multi-modal LLMs [101] and KGs [102], this framework can be extended to process multi-modal data, such as video, audio, and images. In the Synergized Model layer, LLMs and KGs could synergize with each other to improve their capabilities. In Technique layer, related techniques that have been used in LLMs and KGs can be incorporated into this framework to further enhance the performance. In the Application layer, LLMs and KGs can be integrated to address various real-world applications, such as search engines [103], recommender systems [10], and AI assistants [104].

3.2 Categorization =

To better understand the research on unifying LLMs and KGs, we further provide a fine-grained categorization for each framework in the roadmap. Specifically, we focus on different ways of integrating KGs and LLMs, i.e., KG- enhanced LLMs, KG-augmented LLMs, and Synergized LLMs + KGs. The fine-grained categorization of the research is illustrated in Fig. 8.

KG-enhanced LLMs. Integrating KGs can enhance the performance and interpretability of LLMs in various down- stream tasks. We categorize the research on KG-enhanced LLMs into three groups:

  1. ) KG-enhanced LLM pre-training includes works that apply KGs during the pre-training stage and im- prove the knowledge expression of LLMs.
  2. ) KG-enhanced LLM inference includes research that utilizes KGs during the inference stage of LLMs, which enables LLMs to access the latest knowledge without retraining.
  3. ) KG-enhanced LLM interpretability includes works that use KGs to understand the knowledge learned by LLMs and interpret the reasoning process of LLMs.

LLM-augmented KGs. LLMs can be applied to augment various KG-related tasks. We categorize the research on LLM-augmented KGs into five groups based on the task types:

  1. ) LLM-augmented KG embedding includes studies that apply LLMs to enrich representations of KGs by encoding the textual descriptions of entities and relations.
  2. ) LLM-augmented KG completion includes papers that utilize LLMs to encode text or generate facts for better KGC performance.
  3. ) LLM-augmented KG construction includes works that apply LLMs to address the entity discovery, corefer- ence resolution, and relation extraction tasks for KG construction.
  4. ) LLM-augmented KG-to-text Generation includes re- search that utilizes LLMs to generate natural lan- guage that describes the facts from KGs.
  5. ) LLM-augmented KG question answering includes stud- ies that apply LLMs to bridge the gap between natural language questions and retrieve answers from KGs.

Synergized LLMs + KGs. The synergy of LLMs and KGs aims to integrate LLMs and KGs into a unified framework

Fig. 8. Fine-grained categorization of research on unifying large language models (LLMs) with knowledge graphs (KGs).

to mutually enhance each other. In this categorization, we review the recent attempts of Synergized LLMs + KGs from

TABLE 2 Summary of KG-enhanced LLM methods.

the perspectives of knowledge representation and reasoning. In the following sections (Sec 4, 5, and 6), we will provide details on these categorizations.

4 KG-ENHANCED LLMS

Large language models (LLMs) achieve promising results in many natural language processing tasks. However, LLMs have been criticized for their lack of practical knowledge and tendency to generate factual errors during inference. To address this issue, researchers have proposed integrating knowledge graphs (KGs) to enhance LLMs. In this sec- tion, we first introduce the KG-enhanced LLM pre-training, which aims to inject knowledge into LLMs during the pre- training stage. Then, we introduce the KG-enhanced LLM inference, which enables LLMs to consider the latest knowl- edge while generating sentences. Finally, we introduce the KG-enhanced LLM interpretability, which aims to improve the interpretability of LLMs by using KGs. Table 2 summa- rizes the typical methods that integrate KGs for LLMs.

4.1 KG-enhanced LLM Pre-training

Existing large language models mostly rely on unsupervised training on the large-scale corpus. While these models may exhibit impressive performance on downstream tasks, they often lack practical knowledge relevant to the real world. Previous works that integrate KGs into large language mod- els can be categorized into three parts: 1) Integrating KGs into training objective, 2) Integrating KGs into LLM inputs and 3) Integrating KGs into additional fusion modules.

KG-enhanced LLM pre-training

KG-enhanced LLM inference

KG-enhanced LLM interpretability

4.1.1 Integrating KGs into Training Objective The research efforts in this category focus on designing novel knowledge-aware training objectives. An intuitive idea is to expose more knowledge entities in the pre-training objective. GLM [106] leverages the knowledge graph struc- ture to assign a masking probability. Specifically, entities that can be reached within a certain number of hops are considered to be the most important entities for learning, and they are given a higher masking probability during

Text Representations

Text-knowledge Alignment

Knowledge Graph Representations

Text Sequence Entitiy Input Text: Bob Dylan wrote Blowin’ in the Wind in 1962

Fig. 9. Injecting KG information into LLMs training objective via text- knowledge alignment loss, where h denotes the hidden representation generated by LLMs.

pre-training. Furthermore, E-BERT [107] further controls the balance between the token-level and entity-level training losses. The training loss values are used as indications of the learning process for token and entity, which dynamically de- termines their ratio for the next training epochs. SKEP [105] also follows a similar fusion to inject sentiment knowledge during LLMs pre-training. SKEP first determines words with positive and negative sentiment by utilizing PMI along with a predefined set of seed sentiment words. Then, it assigns a higher masking probability to those identified sentiment words in the word masking objective. The other line of work explicitly leverages the connec- tions with knowledge and input text. As shown in Figure 9, ERNIE [92] proposes a novel word-entity alignment training objective as a pre-training objective. Specifically, ERNIE feeds both sentences and corresponding entities mentioned in the text into LLMs, and then trains the LLMs to pre- dict alignment links between textual tokens and entities in knowledge graphs. Similarly, KALM [93] enhances the input tokens by incorporating entity embeddings and includes an entity prediction pre-training task in addition to the token-only pre-training objective. This approach aims to improve the ability of LLMs to capture knowledge related to entities. Finally, KEPLER [132] directly employs both knowledge graph embedding training objective and Masked token pre-training objective into a shared transformer-based encoder. Deterministic LLM [108] focuses on pre-training language models to capture deterministic factual knowledge. It only masks the span that has a deterministic entity as the question and introduces additional clue contrast learning and clue classification objective. WKLM [110] first replaces entities in the text with other same-type entities and then feeds them into LLMs. The model is further pre-trained to distinguish whether the entities have been replaced or not.

4.1.2 Integrating KGs into LLM Inputs As shown in Fig. 10, this kind of research focus on in- troducing relevant knowledge sub-graph into the inputs of LLMs. Given a knowledge graph triple and the corre- sponding sentences, ERNIE 3.0 [104] represents the triple as a sequence of tokens and directly concatenates them with the sentences. It further randomly masks either the relation token in the triple or tokens in the sentences to better combine knowledge with textual representations. However, such direct knowledge triple concatenation method allows

Input Text: Mr. Darcy gives Elizabeth a letter

Fig. 10. Injecting KG information into LLMs inputs using graph structure.

the tokens in the sentence to intensively interact with the tokens in the knowledge sub-graph, which could result in Knowledge Noise [36]. To solve this issue, K-BERT [36] takes the first step to inject the knowledge triple into the sentence via a visible matrix where only the knowledge entities have access to the knowledge triple information, while the tokens in the sentences can only see each other in the self-attention module. To further reduce Knowledge Noise, Colake [111] proposes a unified word-knowledge graph (shown in Fig. 10) where the tokens in the input sentences form a fully connected word graph where tokens aligned with knowl- edge entities are connected with their neighboring entities. The above methods can indeed inject a large amount of knowledge into LLMs. However, they mostly focus on popular entities and overlook the low-frequent and long- tail ones. DkLLM [112] aims to improve the LLMs repre- sentations towards those entities. DkLLM first proposes a novel measurement to determine long-tail entities and then replaces these selected entities in the text with pseudo token embedding as new input to the large language models. Furthermore, Dict-BERT [113] proposes to leverage exter- nal dictionaries to solve this issue. Specifically, Dict-BERT improves the representation quality of rare words by ap- pending their definitions from the dictionary at the end of input text and trains the language model to locally align rare words representations in input sentences and dictionary definitions as well as to discriminate whether the input text and definition are correctly mapped. 4.1.3 Integrating KGs by Additional Fusion Modules By introducing additional fusion modules into LLMs, the information from KGs can be separately processed and fused into LLMs. As shown in Fig. 11, ERNIE [92] proposes a textual-knowledge dual encoder architecture where a T- encoder first encodes the input sentences, then a K-encoder processes knowledge graphs which are fused them with the textual representation from the T-encoder. BERT-MK [114] employs a similar dual-encoder architecture but it intro- duces additional information of neighboring entities in the knowledge encoder component during the pre-training of

Text Outputs Knowledge Graph Outputs

Fig. 12. Dynamic knowledge graph fusion for LLM inference.

Fig. 11. Integrating KGs into LLMs by additional fusion modules.

LLMs. However, some of the neighboring entities in KGs may not be relevant to the input text, resulting in extra redundancy and noise. CokeBERT [117] focuses on this issue and proposes a GNN-based module to filter out irrelevant KG entities using the input text. JAKET [115] proposes to fuse the entity information in the middle of the large lan- guage model. The first half of the model processes the input text and knowledge entity sequence separately. Then, the outputs of text and entities are combined together. Specif- ically, the entity representations are added to their corre- sponding position of text representations, which are further processed by the second half of the model. K-adapters [116] fuses linguistic and factual knowledge via adapters which only adds trainable multi-layer perception in the middle of the transformer layer while the existing parameters of large language models remain frozen during the knowledge pre- training stage. Such adapters are independent of each other and can be trained in parallel.

4.2 KG-enhanced LLM Inference The above methods could effectively fuse knowledge with the textual representations in the large language models. However, real-world knowledge is subject to change and the limitation of these approaches is that they do not permit updates to the incorporated knowledge without retraining the model. As a result, they may not generalize well to the unseen knowledge during inference [133]. Therefore, considerable research has been devoted to keeping the knowledge space and text space separate and injecting the knowledge while inference. These methods mostly focus on the Question Answering (QA) tasks, because QA requires the model to capture both textual semantic meanings and up-to-date real-world knowledge.

4.2.1 Dynamic Knowledge Fusion A straightforward method is to leverage a two-tower ar- chitecture where one separated module processes the text inputs and the other one processes the relevant knowledge graph inputs [134]. However, this method lacks interaction between text and knowledge. Thus, KagNet [95] proposes to first encode the input KG, and then augment the input textual representation. In contrast, MHGRN [135] uses the final LLM outputs of the input text to guide the reasoning

process on the KGs. Yet, both of them only design a single- direction interaction between the text and KGs. To tackle this issue, QA-GNN [118] proposes to use a GNN-based model to jointly reason over input context and KG information via message passing. Specifically, QA-GNN represents the input textual information as a special node via a pooling operation and connects this node with other entities in KG. However, the textual inputs are only pooled into a single dense vector, limiting the information fusion performance. JointLK [119] then proposes a framework with fine-grained interaction between any tokens in the textual inputs and any KG entities through LM-to-KG and KG-to-LM bi-directional attention mechanism. As shown in Fig. 12, pairwise dot- product scores are calculated over all textual tokens and KG entities, the bi-directional attentive scores are computed sep- arately. In addition, at each jointLK layer, the KGs are also dynamically pruned based on the attention score to allow later layers to focus on more important sub-KG structures. Despite being effective, in JointLK, the fusion process be- tween the input text and KG still uses the final LLM outputs as the input text representations. GreaseLM [120] designs deep and rich interaction between the input text tokens and KG entities at each layer of the LLMs. The architecture and fusion approach are mostly similar to ERNIE [92] discussed in Section 4.1.3, except that GreaseLM does not use the text- only T-encoder to handle input text. 4.2.2 Retrieval-Augmented Knowledge Fusion Different from the above methods that store all knowledge in parameters, as shown in Figure 13, RAG [94] proposes to combine non-parametric and parametric modules to handle the external knowledge. Given the input text, RAG first searches for relevant KG in the non-parametric module via MIPS to obtain several documents. RAG then treats these documents as hidden variables z and feeds them into the output generator, empowered by Seq2Seq LLMs, as additional context information. The research indicates that using different retrieved documents as conditions at different generation steps performs better than only using a single document to guide the whole generation process. The experimental results show that RAG outperforms other parametric-only and non-parametric-only baseline models in open-domain QA. RAG can also generate more specific, diverse, and factual text than other parameter-only base- lines. Story-fragments [123] further improves architecture by adding an additional module to determine salient knowl- edge entities and fuse them into the generator to improve

KGs

Q: Which country is Obama from?

Knowledge Retriever

Retrieved Facts (Obama, BornIn, Honolulu) (Honolulu, LocatedIn, USA)

Backpropagation

LLMs A: USA

Fig. 13. Retrieving external knowledge to enhance the LLM generation.

the quality of generated long stories. EMAT [124] further improves the efficiency of such a system by encoding exter- nal knowledge into a key-value memory and exploiting the fast maximum inner product search for memory querying. REALM [122] proposes a novel knowledge retriever to help the model to retrieve and attend over documents from a large corpus during the pre-training stage and success- fully improves the performance of open-domain question answering. KGLM [121] selects the facts from a knowl- edge graph using the current context to generate factual sentences. With the help of an external knowledge graph, KGLM could describe facts using out-of-domain words or phrases.

4.3 KG-enhanced LLM Interpretability Although LLMs have achieved remarkable success in many NLP tasks, they are still criticized for their lack of inter- pretability. The large language model (LLM) interpretability refers to the understanding and explanation of the inner workings and decision-making processes of a large lan- guage model [17]. This can improve the trustworthiness of LLMs and facilitate their applications in high-stakes scenar- ios such as medical diagnosis and legal judgment. Knowl- edge graphs (KGs) represent the knowledge structurally and can provide good interpretability for the reasoning results. Therefore, researchers try to utilize KGs to improve the interpretability of LLMs, which can be roughly grouped into two categories: 1) KGs for language model probing, and 2) KGs for language model analysis.

4.3.1 KGs for LLM Probing The large language model (LLM) probing aims to under- stand the knowledge stored in LLMs. LLMs, trained on large-scale corpus, are often known as containing enor- mous knowledge. However, LLMs store the knowledge in a hidden way, making it hard to figure out the stored knowledge. Moreover, LLMs suffer from the hallucination problem [15], which results in generating statements that contradict facts. This issue significantly affects the reliability of LLMs. Therefore, it is necessary to probe and verify the knowledge stored in LLMs. LAMA [14] is the first work to probe the knowledge in LLMs by using KGs. As shown in Fig. 14, LAMA first converts the facts in KGs into cloze statements by a pre- defined prompt template and then uses LLMs to predict the missing entity. The prediction results are used to evaluate the knowledge stored in LLMs. For example, we try to

Fig. 14. The general framework of using knowledge graph for language model probing.

probe whether LLMs know the fact (Obama, profession, pres- ident). We first convert the fact triple into a cloze question “Obama’s profession is .” with the object masked. Then, we test if the LLMs can predict the object “president” correctly. However, LAMA ignores the fact that the prompts are inappropriate. For example, the prompt “Obama worked as a ” may be more favorable to the prediction of the blank by the language models than “Obama is a by profession”. Thus, LPAQA [125] proposes a mining and paraphrasing-based method to automatically generate high-quality and diverse prompts for a more accurate assessment of the knowledge contained in the language model. Moreover, Adolphs et al. [127] attempt to use examples to make the language model understand the query, and experiments obtain substantial improvements for BERT-large on the T-REx data. Unlike using manually defined prompt templates, Autoprompt [126] proposes an automated method, which is based on the gradient-guided search to create prompts. Instead of probing the general knowledge by using the encyclopedic and commonsense knowledge graphs, BioLAMA [136] and MedLAMA [128] probe the medical knowledge in LLMs by using medical knowledge graphs. Alex et al. [129] investigate the capacity of LLMs to re- tain less popular factual knowledge. They select unpopular facts from Wikidata knowledge graphs which have low- frequency clicked entities. These facts are then used for the evaluation, where the results indicate that LLMs encounter difficulties with such knowledge, and that scaling fails to appreciably improve memorization of factual knowledge in the tail.

4.3.2 KGs for LLM Analysis Knowledge graphs (KGs) for pre-train language models (LLMs) analysis aims to answer the following questions such as “how do LLMs generate the results?”, and “how do the function and structure work in LLMs?”. To analyze the inference process of LLMs, as shown in Fig. 15, KagNet [38] and QA-GNN [118] make the results generated by LLMs at each reasoning step grounded by knowledge graphs. In this way, the reasoning process of LLMs can be explained by extracting the graph structure from KGs. Shaobo et al. [131] investigate how LLMs generate the results correctly. They adopt the causal-inspired analysis from facts extracted from KGs. This analysis quantitatively measures the word patterns that LLMs depend on to generate the results. The results show that LLMs generate the missing factual more by the positionally closed words rather than the knowledge-

Fig. 15. The general framework of using knowledge graph for language model analysis.

dependent words. Thus, they claim that LLMs are inade- quate to memorize factual knowledge because of the inaccu- rate dependence. To interpret the training of LLMs, Swamy et al. [130] adopt the language model during pre-training to generate knowledge graphs. The knowledge acquired by LLMs during training can be unveiled by the facts in KGs explicitly. To explore how implicit knowledge is stored in parameters of LLMs, Dai et al. [39] propose the concept of knowledge neurons. Specifically, activation of the identified knowledge neurons is highly correlated with knowledge expression. Thus, they explore the knowledge and facts represented by each neuron by suppressing and amplifying knowledge neurons.

5 LLM-AUGMENTED FOR KGS Knowledge graphs are famous for representing knowledge in a structural manner. They have been applied in many downstream tasks such as question answering, recommen- dation, and web search. However, the conventional KGs are often incomplete and existing methods often lack con- sidering textual information. To address these issues, re- cent research has explored integrating LLMs to augment KGs to consider the textual information and improve the performance in downstream tasks. In this section, we will introduce the recent research on LLM-augmented KGs. Rep- resentative works are summarized in Table 3. We will intro- duce the methods that integrate LLMs for KG embedding, KG completion, KG construction, KG-to-text generation, and KG question answering, respectively.

5.1 LLM-augmented KG Embedding Knowledge graph embedding (KGE) aims to map each entity and relation into a low-dimensional vector (embed- ding) space. These embeddings contain both semantic and structural information of KGs, which can be utilized for various tasks such as question answering [182], reasoning [38], and recommendation [183]. Conventional knowledge graph embedding methods mainly rely on the structural information of KGs to optimize a scoring function de- fined on embeddings (e.g., TransE [25], and DisMult [184]).

TABLE 3 Summary of representative LLM-augmented KG methods.

Tasks Method Year Technique Pretrain-KGE [97] 2020 LLMs as Text Encoders KEPLER [40] 2020 LLMs as Text Encoders Nayyeri et al. [137] 2022 LLMs as Text Encoders LLM-augmented KG embedding Huang et al. [138] 2022 LLMs as Text Encoders CoDEx [139] 2022 LLMs as Text Encoders LMKE [140] 2022 LLMs for Joint Text and KG Embedding kNN-KGE [141] 2022 LLMs for Joint Text and KG Embedding LambdaKG [142] 2023 LLMs for Joint Text and KG Embedding KG-BERT [26] 2019 Joint Encoding MTL-KGC [143] 2020 Joint Encoding PKGC [144] 2022 Joint Encoding LASS [145] 2022 Joint Encoding MEM-KGC [146] 2021 MLM Encoding LLM-augmented KG completion OpenWorld KGC [147] 2023 MLM Encoding StAR [148] 2021 Separated Encoding SimKGC [149] 2022 Separated Encoding LP-BERT [150] 2022 Separated Encoding GenKGC [99] 2022 LLM as decoders KGT5 [151] 2022 LLM as decoders KG-S2S [152] 2022 LLM as decoders AutoKG [96] 2023 LLM as decoders LDET [153] 2019 Entity Typing BOX4Types [154] 2021 Entity Typing LRN [155] 2021 Entity Linking TempEL [156] 2023 Entity Linking BertCR [157] 2019 CR (Within-document) Spanbert [158] 2020 CR (Within-document) CDLM [159] 2021 CR (Cross-document) CrossCR [160] 2021 CR (Cross-document) LLM-augmented KG construction CR-RL [161] 2021 CR (Cross-document) SentRE [162] 2019 RE (Sentence-level) Curriculum-RE [163] 2021 RE (Sentence-level) DREEAM [164] 2023 RE (document-level) Kumar et al. [98] 2020 End-to-End Construction Guo et al. [165] 2021 End-to-End Construction Grapher [41] 2021 End-to-End Construction PiVE [166] 2023 End-to-End Construction COMET [167] 2019 Distilling KGs from LLMs BertNet [168] 2022 Distilling KGs from LLMs West et al. [169] 2022 Distilling KGs from LLMs Ribeiro et al [170] 2021 Leveraging Knowledge from LLMs JointGT [42] 2021 Leveraging Knowledge from LLMs LLM-augmented KG-to-text Generation FSKG2Text [171] GAP [172] 2021 2022 Leveraging Knowledge from LLMs Leveraging Knowledge from LLMs GenWiki [173] 2020 Constructing KG-text aligned Corpus KGPT [174] 2020 Constructing KG-text aligned Corpus Luo et al. [175] 2020 Entity/Relation Extractor QA-GNN [118] 2021 Entity/Relation Extractor Lukovnikov et al. [176] 2023 Entity/Relation Extractor Nan et al. [177] 2023 Entity/Relation Extractor LLM-augmented KGQA DEKCOR [178] 2021 Answer Reasoner DRLK [179] 2022 Answer Reasoner OreoLM [180] 2022 Answer Reasoner GreaseLM [120] 2022 Answer Reasoner ReLMKG [181] 2022 Answer Reasoner UniKGQA [43] 2023 Answer Reasoner

5.1.1 LLMs as Text Encoders Pretrain-KGE [97] is a representative method that follows the framework shown in Fig. 16. Given a triple (h, r, t) from KGs, it firsts uses a LLM encoder to encode the textual de- scriptions of entities h, t, and relations r into representations as eh = LLM(Texth), et = LLM(Textt), er = LLM(Textr), (1) where eh, er, and et denotes the initial embeddings of enti- ties h, t, and relations r, respectively. Pretrain-KGE uses the BERT as the LLM encoder in experiments. Then, the initial embeddings are fed into a KGE model to generate the final embeddings vh, vr, and vt. During the KGE training phase, they optimize the KGE model by following the standard KGE loss function as L = [γ + f (vh, vr, vt) − f (v′ , v′ , v′)], (2) where f is the KGE scoring function, γ is a margin hy- perparameter, and v′ , v′ , and v′ are the negative samples.

h r t

However, these approaches often fall short in representing unseen entities and long-tailed relations due to their limited structural connectivity [185], [186]. To address this issue, as shown in Fig. 16, recent research adopt LLMs to enrich representations of KGs by encoding the textual descriptions of entities and relations [40], [97].

In this way, the KGE model could learn adequate struc- ture information, while reserving partial knowledge from LLM enabling better knowledge graph embedding. KEPLER [40] offers a unified model for knowledge embedding and pre-trained language representation. This model not only generates effective text-enhanced knowledge embedding

( Neil Armstrong, BornIn, Wapakoneta)

Fig. 17. LLMs for joint text and knowledge graph embedding.

Fig. 16. LLMs as text encoder for knowledge graph embedding (KGE).

using powerful LLMs but also seamlessly integrates factual knowledge into LLMs. Nayyeri et al. [137] use LLMs to gen- erate the world-level, sentence-level, and document-level representations. They are integrated with graph structure embeddings into a unified vector by Dihedron and Quater- nion representations of 4D hypercomplex numbers. Huang et al. [138] combine LLMs with other vision and graph encoders to learn multi-modal knowledge graph embedding that enhances the performance of downstream tasks. CoDEx [139] presents a novel loss function empowered by LLMs that guides the KGE models in measuring the likelihood of triples by considering the textual information. The proposed loss function is agnostic to model structure that can be incorporated with any KGE model.

5.1.2 LLMs for Joint Text and KG Embedding Instead of using KGE model to consider graph structure, another line of methods directly employs LLMs to incorpo- rate both the graph structure and textual information into the embedding space simultaneously. As shown in Fig. 17, kNN-KGE [141] treats the entities and relations as special tokens in the LLM. During training, it transfers each triple (h, r, t) and corresponding text descriptions into a sentence x as x = [CLS] h Texth[SEP] r [SEP] [MASK] Textt[SEP], (3) where the tailed entities are replaced by [MASK]. The sen- tence is fed into a LLM, which then finetunes the model to predict the masked entity, formulated as PLLM (t|h, r) = P ([MASK]=t|x, Θ), (4) where Θ denotes the parameters of the LLM. The LLM is optimized to maximize the probability of the correct entity t. After training, the corresponding token representations in LLMs are used as embeddings for entities and rela- tions. Similarly, LMKE [140] proposes a contrastive learning method to improve the learning of embeddings generated by LLMs for KGE. Meanwhile, to better capture graph structure, LambdaKG [142] samples 1-hop neighbor entities and concatenates their tokens with the triple as a sentence feeding into LLMs.

5.2 LLM-augmented KG Completion Knowledge Graph Completion (KGC) refers to the task of inferring missing facts in a given knowledge graph. Similar to KGE, conventional KGC methods mainly focused on the structure of the KG, without considering the exten- sive textual information. However, the recent integration of LLMs enables KGC methods to encode text or generate facts for better KGC performance. These methods fall into two distinct categories based on their utilization styles: 1) LLM as Encoders (PaE), and 2) LLM as Generators (PaG).

5.2.1 LLM as Encoders (PaE). As shown in Fig. 18 (a), (b), and (c), This line of work first uses encoder-only LLMs to encode textual information as well as KG facts. Then, they predict the plausibility of the triples by feeding the encoded representation into a predic- tion head, which could be a simple MLP or conventional KG score function (e.g., TransE [25] and TransR [187]). Joint Encoding. Since the encoder-only LLMs (e.g., Bert [1]) are well at encoding text sequences, KG-BERT [26] represents a triple (h, r, t) as a text sequence and encodes it with LLM Fig. 18(a). x = [CLS] Texth [SEP] Textr [SEP] Textt [SEP], (5) The final hidden state of the [CLS] token is fed into a classifier to predict the possibility of the triple, formulated as s = σ(MLP(e[CLS])), (6) where σ(·) denotes the sigmoid function and e[CLS] de- notes the representation encoded by LLMs. To improve the efficacy of KG-BERT, MTL-KGC [143] proposed a Multi- Task Learning for the KGC framework which incorporates additional auxiliary tasks into the model’s training, i.e. prediction (RP) and relevance ranking (RR). PKGC [144] assesses the validity of a triplet (h, r, t) by transforming the triple and its supporting information into natural language sentences with pre-defined templates. These sentences are then processed by LLMs for binary classification. The sup- porting information of the triplet is derived from the at- tributes of h and t with a verbalizing function. For instance, if the triple is (Lebron James, member of sports team, Lakers), the information regarding Lebron James is verbalized as ”Lebron James: American basketball player”. LASS [145] observes that language semantics and graph structures are equally vital to KGC. As a result, LASS is proposed to jointly learn two types of embeddings: semantic embedding

Triple:

Text Sequence: [CLS] Text [SEP] Text [SEP] Text [SEP]

(c) Separated Encoding

Fig. 18. The general framework of adopting LLMs as encoders (PaE) for KG Completion.

and structure embedding. In this method, the full text of a triple is forwarded to the LLM, and the mean pooling of the corresponding LLM outputs for h, r, and t are separately calculated. These embeddings are then passed to a graph- based method, i.e. TransE, to reconstruct the KG structures. MLM Encoding. Instead of encoding the full text of a triple, many works introduce the concept of Masked Lan- guage Model (MLM) to encode KG text (Fig. 18(b)). MEM- KGC [146] uses Masked Entity Model (MEM) classification mechanism to predict the masked entities of the triple. The input text is in the form of x = [CLS] Texth [SEP] Textr [SEP] [MASK] [SEP], (7) Similar to Eq. 4, it tries to maximize the probability that the masked entity is the correct entity t. Additionally, to enable the model to learn unseen entities, MEM-KGC integrates multitask learning for entities and super-class prediction based on the text description of entities: x = [CLS] [MASK] [SEP] Texth [SEP]. (8) OpenWorld KGC [147] expands the MEM-KGC model to address the challenges of open-world KGC with a pipeline framework, where two sequential MLM-based modules are defined: Entity Description Prediction (EDP), an auxiliary module that predicts a corresponding entity with a given textual description; Incomplete Triple Prediction (ITP), the target module that predicts a plausible entity for a given incomplete triple (h, r, ?). EDP first encodes the triple with Eq. 8 and generates the final hidden state, which is then forwarded into ITP as an embedding of the head entity in Eq. 7 to predict target entities. Separated Encoding. As shown in Fig. 18(c), these meth- ods involve partitioning a triple (h, r, t) into two distinct parts, i.e. (h, r) and t, which can be expressed as x(h,r) = [CLS] Texth [SEP] Textr [SEP], (9) xt = [CLS] Textt [SEP]. (10)

Then the two parts are encoded separately by LLMs, and the final hidden states of the [CLS] tokens are used as the rep- resentations of (h, r) and t, respectively. The representations are then fed into a scoring function to predict the possibility of the triple, formulated as s = fscore(e(h,r), et), (11) where fscore denotes the score function like TransE. StAR [148] applies Siamese-style textual encoders on their text, encoding them into separate contextualized rep- resentations. To avoid the combinatorial explosion of textual encoding approaches, e.g., KG-BERT, StAR employs a scor- ing module that involves both deterministic classifier and spatial measurement for representation and structure learn- ing respectively, which also enhances structured knowledge by exploring the spatial characteristics. SimKGC [149] is another instance of leveraging a Siamese textual encoder to encode textual representations. Following the encoding process, SimKGC applies contrastive learning techniques to these representations. This process involves computing the similarity between the encoded representations of a given triple and its positive and negative samples. In particular, the similarity between the encoded representation of the triple and the positive sample is maximized, while the sim- ilarity between the encoded representation of the triple and the negative sample is minimized. This enables SimKGC to learn a representation space that separates plausible and implausible triples. To avoid overfitting textural in- formation, CSPromp-KG [188] employs parameter-efficient prompt learning for KGC. LP-BERT [150] is a hybrid KGC method that combines both MLM Encoding and Separated Encoding. This ap- proach consists of two stages, namely pre-training and fine-tuning. During pre-training, the method utilizes the standard MLM mechanism to pre-train a LLM with KGC data. During the fine-tuning stage, the LLM encodes both parts and is optimized using a contrastive learning strategy (similar to SimKGC [149]).

5.2.2 LLM as Generators (PaG)

Recent works use LLMs as sequence-to-sequence generators in KGC. As presented in Fig. 19 (a) and (b), these approaches involve encoder-decoder or decoder-only LLMs. The LLMs receive a sequence text input of the query triple (h, r, ?) and generate the text of the tail entity t directly. GenKGC [99] uses the large language model BART [5] as the backbone model. Inspired by the in-context learning approach used in GPT-3 [60], where the model concatenates relevant samples to learn correct output answers, GenKGC proposes a relation-guided demonstration technique that includes triples with the same relation to facilitate the model's learning process. In addition, during generation, an entity-aware hierarchical decoding method is proposed to reduce the time complexity. KGT5 [151] introduces a novel KGC model that fulfils four key requirements of such models: scalability, quality, versatility, and simplicity. To address these objectives, the proposed model employs a straightforward T5-small architecture. The model is distinct from previous KGC methods in that it is randomly initialized rather than using pre-trained models.

(Fig. 19 panels: (a) Encoder-Decoder PaG; (b) Decoder-Only PaG.)

Fig. 19. The general framework of adopting LLMs as decoders (PaG) for KG Completion. The En. and De. denote the encoder and decoder, respectively.

KG-S2S [152] is a comprehensive framework that can be applied to various types of KGC tasks, including Static KGC, Temporal KGC, and Few-shot KGC. To achieve this objective, KG-S2S reformulates the standard KG triple by introducing an additional element, forming a quadruple (h, r, t, m), where m represents the additional "condition" element. Although different KGC tasks may refer to different conditions, they typically have a similar textual format, which enables unification across different KGC tasks. The KG-S2S approach incorporates various techniques, such as entity descriptions, soft prompts, and Seq2Seq Dropout, to improve the model's performance. In addition, it utilizes constrained decoding to ensure the generated entities are valid. For closed-source LLMs (e.g., ChatGPT and GPT-4), AutoKG adopts prompt engineering to design customized prompts [96]. As shown in Fig. 20, these prompts contain the task description, few-shot examples, and test input, which instruct LLMs to predict the tail entity for KG completion.
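The following is a minimal sketch of such a prompt for a closed-source LLM, in the spirit of the AutoKG setup (task description, few-shot examples, test input). The wording, the toy examples, and the downstream API call are illustrative assumptions, not the prompts used in the cited work.

```python
# Build an AutoKG-style KG-completion prompt: task description + few-shot examples + test input.
few_shot = [
    ("Joe Biden | born in | ?", "Pennsylvania"),
    ("Honolulu | located in | ?", "USA"),
]

def build_prompt(test_triple: str) -> str:
    lines = ["Complete the knowledge graph triple by predicting the tail entity."]
    for query, answer in few_shot:
        lines.append(f"Triple: {query}\nTail: {answer}")
    lines.append(f"Triple: {test_triple}\nTail:")
    return "\n\n".join(lines)

print(build_prompt("Neil Armstrong | born in | ?"))
# The resulting string would then be sent to an LLM API (e.g., a chat-completion
# endpoint), and the returned text parsed as the predicted tail entity.
```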

5.2.3 Comparison between PaE and PaG

LLMs as Encoders (PaE) applies an additional prediction head on top of the representation encoded by LLMs. Therefore, the PaE framework is much easier to finetune, since we can optimize only the prediction heads while freezing the LLMs. Moreover, the output of the prediction can be easily specified and integrated with existing KGC functions for different KGC tasks. However, during the inference stage, PaE requires computing a score for every candidate in the KG, which can be computationally expensive. Besides, it cannot generalize to unseen entities. Furthermore, PaE requires the representation output of the LLMs, whereas some state-of-the-art LLMs (e.g., GPT-4) are closed-source and do not grant access to the representation output.

LLMs as Generators (PaG), on the other hand, does not need a prediction head and can be used without finetuning or access to representations. Therefore, the PaG framework is suitable for all kinds of LLMs. In addition, PaG directly generates the tail entity, making it efficient at inference without ranking all the candidates, and it easily generalizes to unseen entities.

Fig. 20. The framework of prompt-based PaG for KG Completion.

However, the challenge of PaG is that the generated entities can be diverse and may not lie in the KG. Moreover, a single inference takes longer due to auto-regressive generation. Finally, how to design a powerful prompt that feeds KGs into LLMs is still an open question. Consequently, while PaG has demonstrated promising results for KGC tasks, the trade-off between model complexity and computational efficiency must be carefully considered when selecting an appropriate LLM-based KGC framework.
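As a concrete illustration of the generation-style inference discussed above, the sketch below verbalizes a query triple (h, r, ?) and lets a sequence-to-sequence model generate candidate tail entities; the checkpoint, the verbalization format, and the decoding settings are illustrative assumptions (a real system would first fine-tune on KGC data, as GenKGC and KGT5 do).

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")   # illustrative checkpoint
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# Verbalize the incomplete triple (h, r, ?) as a text query.
query = "predict the tail entity: Joe Biden | born in |"
inputs = tokenizer(query, return_tensors="pt")

# Beam search keeps several candidate tail entities; no ranking over all KG candidates is needed.
outputs = model.generate(**inputs, num_beams=5, num_return_sequences=5, max_new_tokens=10)
for sequence in outputs:
    print(tokenizer.decode(sequence, skip_special_tokens=True))
```

Note that nothing constrains the generated strings to be valid KG entities, which is exactly the PaG limitation discussed above; methods such as KG-S2S address it with constrained decoding.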

5.2.4 Model Analysis

Justin et al. [189] provide a comprehensive analysis of KGC methods integrated with LLMs. Their research investigates the quality of LLM embeddings and finds that they are suboptimal for effective entity ranking. In response, they propose several techniques for processing embeddings to improve their suitability for candidate retrieval. The study also compares different model selection dimensions, such as Embedding Extraction, Query Entity Extraction, and Language Model Selection. Lastly, the authors propose a framework that effectively adapts LLMs for knowledge graph completion.

5.3 LLM-augmented KG Construction

Knowledge graph construction involves creating a structured representation of knowledge within a specific domain. This includes identifying entities and their relationships with each other. The process of knowledge graph construction typically involves multiple stages, including 1) entity discovery, 2) coreference resolution, and 3) relation extraction. Fig. 21 presents the general framework of applying LLMs for each stage of KG construction. More recent approaches have explored 4) end-to-end knowledge graph construction, which involves constructing a complete knowledge graph in one step, or directly 5) distilling knowledge graphs from LLMs.

5.3.1 Entity Discovery

Entity discovery in KG construction refers to the process of identifying and extracting entities from unstructured data sources, such as text documents, web pages, or social media posts, and incorporating them to construct knowledge graphs.


Fig. 21. The general framework of LLM-based KG construction. (Example input text: "Joe Biden was born in Pennsylvania. He serves as the 46th President of the United States.")

Named Entity Recognition (NER) involves identifying and tagging named entities in text data with their positions and classifications. The named entities include people, organizations, locations, and other types of entities. State-of-the-art NER methods usually employ LLMs to leverage their contextual understanding and linguistic knowledge for accurate entity recognition and classification. There are three NER sub-tasks based on the types of NER spans identified, i.e., flat NER, nested NER, and discontinuous NER. 1) Flat NER identifies non-overlapping named entities from input text. It is usually conceptualized as a sequence labelling problem where each token in the text is assigned a unique label based on its position in the sequence [1], [190]–[192]. 2) Nested NER considers complex scenarios which allow a token to belong to multiple entities. Span-based methods [193]–[197] are a popular branch of nested NER; they enumerate all candidate spans and classify them into entity types (including a non-entity type). Parsing-based methods [198]–[200] reveal similarities between nested NER and constituency parsing tasks (predicting nested and non-overlapping spans), and propose to integrate the insights of constituency parsing into nested NER. 3) Discontinuous NER identifies named entities that may not be contiguous in the text. To address this challenge, [201] uses the LLM output to identify entity fragments and determine whether they are overlapped or in succession.

Unlike these task-specific methods, GenerativeNER [202] uses a sequence-to-sequence LLM with a pointer mechanism to generate an entity sequence, which is capable of solving all three types of NER sub-tasks.
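As a concrete example of LLM-based flat NER treated as sequence labelling, the sketch below uses an off-the-shelf token-classification pipeline; the checkpoint name is an illustrative assumption, and any NER model fine-tuned with BIO-style labels would serve.

```python
from transformers import pipeline

# Token-classification pipeline; "simple" aggregation merges sub-tokens into entity spans.
ner = pipeline("token-classification",
               model="dslim/bert-base-NER",
               aggregation_strategy="simple")

text = "Joe Biden was born in Pennsylvania."
for entity in ner(text):
    print(entity["entity_group"], entity["word"], round(entity["score"], 3))
# e.g., PER "Joe Biden" and LOC "Pennsylvania"
```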

Entity Typing (ET) aims to provide fine-grained and ultra-grained type information for a given entity mentioned in context. These methods usually utilize LLMs to encode mentions, context, and types. LDET [153] applies pre-trained ELMo embeddings [190] for word representation and adopts an LSTM as its sentence and mention encoders. BOX4Types [154] recognizes the importance of type dependency and uses BERT to represent the hidden vector and each type in a hyperrectangular (box) space. LRN [155] considers extrinsic and intrinsic dependencies between labels. It encodes the context and entity with BERT and employs these output embeddings to conduct deductive and inductive reasoning. MLMET [203] uses predefined patterns to construct input samples for the BERT MLM and employs [MASK] to predict context-dependent hypernyms of the mention, which can be viewed as type labels. PL [204] and DFET [205] utilize prompt learning for entity typing. LITE [206] formulates entity typing as textual inference and uses RoBERTa-large-MNLI as the backbone network.

Entity Linking (EL), also known as entity disambiguation, involves linking entity mentions appearing in the text to their corresponding entities in a knowledge graph. [207] propose BERT-based end-to-end EL systems that jointly discover and link entities. ELQ [208] employs a fast bi-encoder architecture to jointly perform mention detection and linking in one pass for downstream question answering systems. Unlike previous models that frame EL as matching in vector space, GENRE [209] formulates it as a sequence-to-sequence problem, autoregressively generating a version of the input markup annotated with the unique identifiers of an entity expressed in natural language. GENRE is extended to its multilingual version, mGENRE [210]. Considering the efficiency challenges of generative EL approaches, [211] parallelizes autoregressive linking across all potential mentions and relies on a shallow and efficient decoder. ReFinED [212] proposes an efficient zero-shot-capable EL approach by taking advantage of fine-grained entity types and entity descriptions, which are processed by a LLM-based encoder.

5.3.2 Coreference Resolution (CR)

Coreference resolution aims to find all expressions (i.e., mentions) that refer to the same entity or event in a text.

Within-document CR refers to the CR sub-task where all these mentions are in a single document. Mandar et al. [157] initialize LLM-based coreference resolution by replacing the previous LSTM encoder [213] with BERT. This work is followed by the introduction of SpanBERT [158], which is pre-trained on the BERT architecture with a span-based masked language model (MLM). Inspired by these works, Tuan Manh et al. [214] present a strong baseline by incorporating the SpanBERT encoder into a non-LLM approach, e2e-coref [213]. CorefBERT leverages a Mention Reference Prediction (MRP) task which masks one or several mentions and requires the model to predict the masked mention's corresponding referents. CorefQA [215] formulates coreference resolution as a question answering task, where contextual queries are generated for each candidate mention and the coreferent spans are extracted from the document using the queries. Tuan Manh et al. [216] introduce a gating mechanism and a noisy training method to extract information from event mentions using the SpanBERT encoder.

In order to reduce the large memory footprint faced by large LLM-based NER models, Yuval et al. [217] and Raghuveer et al. [218] propose start-to-end and approximation models, respectively, both utilizing bilinear functions to calculate mention and antecedent scores with reduced reliance on span-level representations.
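A minimal sketch of the bilinear mention-antecedent scoring idea mentioned above might look as follows; the hidden size and the random mention vectors are placeholders for embeddings that would come from an LLM encoder such as SpanBERT.

```python
import torch

hidden = 768                                   # e.g., a BERT-style hidden size
bilinear = torch.nn.Bilinear(hidden, hidden, 1)

def antecedent_score(mention: torch.Tensor, antecedent: torch.Tensor) -> torch.Tensor:
    """Higher score -> the two mentions are more likely coreferent."""
    return bilinear(mention, antecedent)

# Mention embeddings would normally come from an LLM encoder; random vectors stand in here.
m1 = torch.randn(1, hidden)                    # "Joe Biden"
m2 = torch.randn(1, hidden)                    # "He"
print(float(antecedent_score(m2, m1)))
```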

Cross-document CR refers to the sub-task where the mentions that refer to the same entity or event may be spread across multiple documents. CDML [159] proposes a cross-document language modeling method which pre-trains a Longformer [219] encoder on concatenated related documents and employs an MLP for binary classification to determine whether a pair of mentions is coreferent or not. CrossCR [160] utilizes an end-to-end model for cross-document coreference resolution which pre-trains the mention scorer on gold mention spans and uses a pairwise scorer to compare mentions with all spans across all documents. CR-RL [161] proposes an actor-critic deep reinforcement learning-based coreference resolver for cross-document CR.

5.3.3 Relation Extraction (RE)

Relation extraction involves identifying semantic relationships between entities mentioned in natural language text. There are two types of relation extraction methods, i.e., sentence-level RE and document-level RE, according to the scope of the text analyzed.

Sentence-level RE focuses on identifying relations between entities within a single sentence. Peng et al. [162] and TRE [220] introduce LLMs to improve the performance of relation extraction models. BERT-MTB [221] learns relation representations based on BERT by performing the matching-the-blanks task and incorporating designed objectives for relation extraction. Curriculum-RE [163] utilizes curriculum learning to improve relation extraction models by gradually increasing the difficulty of the data during training. RECENT [222] introduces SpanBERT and exploits entity type restriction to reduce the noisy candidate relation types. Jiewen [223] extends RECENT by combining both the entity information and the label information into sentence-level embeddings, which makes the embeddings entity-label aware.

Document-level RE (DocRE) aims to extract relations between entities across multiple sentences within a document. Hong et al. [224] propose a strong baseline for DocRE by replacing the BiLSTM backbone with LLMs. HIN [225] uses LLMs to encode and aggregate entity representations at different levels, including the entity, sentence, and document levels. GLRE [226] is a global-to-local network, which uses a LLM to encode the document information in terms of entity global and local representations as well as context relation representations. SIRE [227] uses two LLM-based encoders to extract intra-sentence and inter-sentence relations. LSR [228] and GAIN [229] propose graph-based approaches which induce graph structures on top of LLMs to better extract relations. DocuNet [230] formulates DocRE as a semantic segmentation task and introduces a U-Net [231] on the LLM encoder to capture local and global dependencies between entities. ATLOP [232] focuses on the multi-label problem in DocRE, which is handled with two techniques, i.e., adaptive thresholding for the classifier and localized context pooling for the LLM. DREEAM [164] further extends and improves ATLOP by incorporating evidence information.
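To illustrate sentence-level RE with a LLM encoder, the sketch below frames it as sequence classification over a sentence with inline entity markers; the checkpoint, the marker convention, and the relation label set are illustrative assumptions, and a real system would fine-tune the classifier head on labelled RE data first.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

relations = ["born_in", "president_of", "no_relation"]     # toy label set
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(relations))        # classifier head is untrained here

# Mark the two entity mentions whose relation we want to classify.
text = "[E1] Joe Biden [/E1] was born in [E2] Pennsylvania [/E2]."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(relations[int(logits.argmax(dim=-1))])               # arbitrary until fine-tuned
```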

5.3.4 End-to-End KG Construction

Currently, researchers are exploring the use of LLMs for end-to-end KG construction.

Fig. 22. The general framework of distilling KGs from LLMs: cloze questions (e.g., "Obama born in [MASK]", "Honolulu is located in [MASK]") are posed to LLMs, and the completions are collected as distilled triples (e.g., (Obama, BornIn, Honolulu), (Honolulu, LocatedIn, USA)).

Kumar et al. [98] propose a unified approach to build KGs from raw text, which contains two LLM-powered components. They first finetune a LLM on named entity recognition tasks to make it capable of recognizing entities in raw text. Then, they propose another "2-model BERT" for solving the relation extraction task, which contains two BERT-based classifiers. The first classifier learns the relation class, whereas the second, binary classifier learns the direction of the relation between the two entities. The predicted triples and relations are then used to construct the KG. Guo et al. [165] propose an end-to-end knowledge extraction model based on BERT, which can be applied to construct KGs from Classical Chinese text. Grapher [41] presents a novel end-to-end multi-stage system. It first utilizes LLMs to generate KG entities, followed by a simple relation construction head, enabling efficient KG construction from textual descriptions. PiVE [166] proposes a prompting-with-iterative-verification framework that utilizes a smaller LLM, such as T5, to correct the errors in KGs generated by a larger LLM (e.g., ChatGPT). To further explore advanced LLMs, AutoKG designs several prompts for different KG construction tasks (e.g., entity typing, entity linking, and relation extraction). It then adopts these prompts to perform KG construction using ChatGPT and GPT-4.

5.3.5 Distilling Knowledge Graphs from LLMs

LLMs have been shown to implicitly encode massive knowledge [14]. As shown in Fig. 22, some research aims to distill knowledge from LLMs to construct KGs. COMET [167] proposes a commonsense transformer model that constructs commonsense KGs by using existing tuples as a seed set of knowledge on which to train. Using this seed set, a LLM learns to adapt its learned representations to knowledge generation and produces novel, high-quality tuples. Experimental results reveal that implicit knowledge from LLMs is transferred to generate explicit knowledge in commonsense KGs. BertNet [168] proposes a novel framework for automatic KG construction empowered by LLMs. It requires only a minimal definition of relations as input, automatically generates diverse prompts, and performs an efficient knowledge search within a given LLM for consistent outputs. The constructed KGs show competitive quality, diversity, and novelty, with a richer set of new and complex relations that cannot be extracted by previous methods. West et al. [169] propose a symbolic knowledge distillation framework that distills symbolic knowledge from LLMs. They first finetune a small student LLM by distilling commonsense facts from a large LLM like GPT-3. Then, the student LLM is utilized to generate commonsense KGs.
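A minimal sketch of the cloze-style distillation idea in Fig. 22 is shown below; the masked-LM checkpoint, the prompt template, and the relation name are illustrative assumptions, and practical systems add filtering or consistency checks over the generated candidates.

```python
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")   # illustrative checkpoint

subject, relation = "Barack Obama", "BornIn"
prompt = f"{subject} was born in [MASK]."

# Keep the top completions as candidate tail entities for the distilled triple.
for candidate in fill_mask(prompt, top_k=3):
    print((subject, relation, candidate["token_str"]), round(candidate["score"], 3))
```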

5.4 LLM-augmented KG-to-text Generation

The goal of knowledge-graph-to-text (KG-to-text) generation is to generate high-quality texts that accurately and consistently describe the input knowledge graph information [233]. KG-to-text generation connects knowledge graphs and texts, significantly improving the applicability of KGs in more realistic NLG scenarios, including storytelling [234] and knowledge-grounded dialogue [235]. However, it is challenging and costly to collect large amounts of graph-text parallel data, resulting in insufficient training and poor generation quality.


Thus, many research efforts resort to either 1) leveraging knowledge from LLMs or 2) constructing large-scale weakly-supervised KG-text corpora to solve this issue.

5.4.1 Leveraging Knowledge from LLMs

As pioneering research efforts in using LLMs for KG-to-text generation, Ribeiro et al. [170] and Kale and Rastogi [236] directly fine-tune various LLMs, including BART and T5, with the goal of transferring LLM knowledge to this task. As shown in Fig. 23, both works simply represent the input graph as a linear traversal and find that such a naive approach successfully outperforms many existing state-of-the-art KG-to-text generation systems. Interestingly, Ribeiro et al. [170] also find that continued pre-training could further improve model performance. However, these methods are unable to explicitly incorporate rich graph semantics in KGs. To enhance LLMs with KG structure information, JointGT [42] proposes to inject KG structure-preserving representations into Seq2Seq large language models. Given input sub-KGs and corresponding text, JointGT first represents the KG entities and their relations as a sequence of tokens, then concatenates them with the textual tokens that are fed into the LLM. After the standard self-attention module, JointGT uses a pooling layer to obtain the contextual semantic representations of knowledge entities and relations. Finally, these pooled KG representations are aggregated in another structure-aware self-attention layer. JointGT also deploys additional pre-training objectives, including KG and text reconstruction tasks given masked inputs, to improve the alignment between text and graph information. Li et al. [171] focus on the few-shot scenario. They first employ a novel breadth-first search (BFS) strategy to better traverse the input KG structure and feed the enhanced linearized graph representations into LLMs for high-quality generated outputs, and then align the GCN-based and LLM-based KG entity representations. Colas et al. [172] first transform the graph into an appropriate representation before linearizing it. Next, each KG node is encoded via a global attention mechanism, followed by a graph-aware attention module, ultimately being decoded into a sequence of tokens. Different from these works, KG-BART [37] keeps the structure of KGs and leverages graph attention to aggregate the rich concept semantics in the sub-KG, which enhances the model's generalization to unseen concept sets.

5.4.2 Constructing Large Weakly KG-text Aligned Corpora

Although LLMs have achieved remarkable empirical success, their unsupervised pre-training objectives are not necessarily aligned well with the task of KG-to-text generation, motivating researchers to develop large-scale KG-text aligned corpora.

Fig. 23. The general framework of KG-to-text generation: the input sub-KG is linearized (e.g., "Barack Obama [SEP] PoliticianOf [SEP] USA [SEP] ... [SEP] Michelle Obama") and a LLM generates the description text (e.g., "Barack Obama is a politician of the USA. He was born in Honolulu and is married to Michelle Obama.").

Jin et al. [173] propose 1.3M unsupervised KG-to-text training pairs collected from Wikipedia. Specifically, they first detect the entities appearing in the text via hyperlinks and named entity detectors, and then only add text that shares a common set of entities with the corresponding knowledge graph, similar to the idea of distant supervision in the relation extraction task [237]. They also provide 1,000+ human-annotated KG-to-text test pairs to verify the effectiveness of the pre-trained KG-to-text models. Similarly, Chen et al. [174] also propose a KG-grounded text corpus collected from the English Wikidump. To ensure the connection between KG and text, they only extract sentences with at least two Wikipedia anchor links. Then, they use the entities from those links to query their surrounding neighbors in Wikidata and calculate the lexical overlap between these neighbors and the original sentences. Finally, only highly overlapping pairs are selected. The authors explore both graph-based and sequence-based encoders and identify their advantages in various tasks and settings.
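To make the linearization-based approach of Section 5.4.1 (cf. Fig. 23) concrete, the sketch below flattens a sub-KG into a token sequence and feeds it to a seq2seq model; the checkpoint, the prompt prefix, and the linearization format are illustrative assumptions, and the cited works fine-tune on graph-text pairs before generation.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

triples = [
    ("Barack Obama", "BornIn", "Honolulu"),
    ("Barack Obama", "MarriedTo", "Michelle Obama"),
]

# Linear traversal of the sub-KG, e.g. "h | r | t [SEP] h | r | t"
linearized = " [SEP] ".join(" | ".join(triple) for triple in triples)

tokenizer = AutoTokenizer.from_pretrained("t5-small")   # illustrative checkpoint
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

inputs = tokenizer("describe the graph: " + linearized, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```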

5.5 LLM-augmented KG Question Answering

Knowledge graph question answering (KGQA) aims to find answers to natural language questions based on the structured facts stored in knowledge graphs [238], [239]. The inevitable challenge in KGQA is to retrieve related facts and extend the reasoning advantage of KGs to QA. Therefore, recent studies adopt LLMs to bridge the gap between natural language questions and structured knowledge graphs [177], [178], [240]. The general framework of applying LLMs for KGQA is illustrated in Fig. 24, where LLMs can be used as 1) entity/relation extractors and 2) answer reasoners.

5.5.1 LLMs as Entity/Relation Extractors

Entity/relation extractors are designed to identify entities and relationships mentioned in natural language questions and retrieve related facts in KGs. Given their proficiency in language comprehension, LLMs can be effectively utilized for this purpose. Lukovnikov et al. [176] are the first to utilize LLMs as classifiers for relation prediction, resulting in a notable improvement in performance compared to shallow neural networks. Nan et al. [177] introduce two LLM-based KGQA frameworks that adopt LLMs to detect mentioned entities and relations. Then, they query the answer in KGs using the extracted entity-relation pairs. QA-GNN [118] uses LLMs to encode the question and candidate answer pairs, which are adopted to estimate the importance of relevant KG entities.


Fig. 24. The general framework of applying LLMs for knowledge graph question answering (KGQA), illustrated with the example question "Where was Neil Armstrong born?".

The entities are retrieved to form a subgraph, where answer reasoning is conducted by a graph neural network. Luo et al. [175] use LLMs to calculate the similarities between relations and questions to retrieve related facts, formulated as

s(r, q) = LLM(r)⊤ LLM(q), (12)

where q denotes the question, r denotes the relation, and LLM(·) generates the representations of q and r, respectively. Furthermore, Zhang et al. [241] propose a LLM-based path retriever to retrieve question-related relations hop-by-hop and construct several paths. The probability of each path can be calculated as

P(p|q) = ∏_{t=1}^{|p|} s(r_t, q), (13)

where p denotes the path and r_t denotes the relation at the t-th hop of p. The retrieved relations and paths can be used as context knowledge to improve the performance of answer reasoners as

P(a|q) = ∑_{p∈P} P(a|p) P(p|q), (14)

where P denotes the set of retrieved paths and a denotes the answer.
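The following is a minimal sketch of Eqs. (12)-(14): relation-question similarity from LLM embeddings, path probabilities as a product over hops, and (implicitly) answer scores as a sum over retrieved paths. The encoder, the sigmoid squashing of the similarity into (0, 1), and the toy paths are illustrative assumptions (real retrievers such as [175], [241] fine-tune the encoder on QA data).

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # illustrative checkpoint
encoder = AutoModel.from_pretrained("bert-base-uncased")

def embed(text: str) -> torch.Tensor:
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        return encoder(**inputs).last_hidden_state[:, 0].squeeze(0)   # [CLS] vector

def sim(relation: str, question: str) -> torch.Tensor:
    # Eq. (12): s(r, q) = LLM(r)^T LLM(q), squashed so it can act like a probability.
    return torch.sigmoid(embed(relation) @ embed(question))

question = "Where was Neil Armstrong born?"
paths = {"birth place": ["person.place_of_birth"],
         "nationality then capital": ["person.nationality", "country.capital"]}

# Eq. (13): P(p|q) = prod_t s(r_t, q); Eq. (14) would then sum P(a|p) P(p|q) over paths.
for name, relations in paths.items():
    p_path = torch.prod(torch.stack([sim(r, question) for r in relations]))
    print(name, float(p_path))
```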

5.5.2 LLMs as Answer Reasoners

Answer reasoners are designed to reason over the retrieved facts and generate answers. LLMs can be used as answer reasoners to generate answers directly. For example, as shown in Fig. 24, DEKCOR [178] concatenates the retrieved facts with the question and candidate answers as

x = [CLS] q [SEP] Related Facts [SEP] a [SEP], (15)

where a denotes a candidate answer. Then, it feeds them into LLMs to predict answer scores. After utilizing LLMs to generate the representation of x as the QA context, DRLK [179] proposes a Dynamic Hierarchical Reasoner to capture the interactions between the QA context and answers for answer prediction.

Yan et al. [240] propose a LLM-based KGQA framework consisting of two stages: (1) retrieve related facts from KGs and (2) generate answers based on the retrieved facts. The first stage is similar to the entity/relation extractors. Given a candidate answer entity a, it extracts a series of paths p_1, ..., p_n from KGs. The second stage is a LLM-based answer reasoner. It first verbalizes the paths by using the entity names and relation names in KGs. Then, it concatenates the question q and all paths p_1, ..., p_n to form an input sample as

x = [CLS] q [SEP] p_1 [SEP] · · · [SEP] p_n [SEP]. (16)

These paths are regarded as the related facts for the candidate answer a. Finally, it uses LLMs to predict whether the hypothesis "a is the answer of q" is supported by those facts, which is formulated as

e_[CLS] = LLM(x), (17)
s = σ(MLP(e_[CLS])), (18)

where x is encoded by a LLM, the representation corresponding to the [CLS] token is fed into an MLP for binary classification, and σ(·) denotes the sigmoid function.

To better guide LLMs to reason through KGs, OreoLM [180] proposes a Knowledge Interaction Layer (KIL) which is inserted between LLM layers. KIL interacts with a KG reasoning module, where it discovers different reasoning paths, and then the reasoning module reasons over the paths to generate answers. GreaseLM [120] fuses the representations from LLMs and graph neural networks to effectively reason over KG facts and language context. UniKGQA [43] unifies fact retrieval and reasoning in a single framework. UniKGQA consists of two modules. The first module is a semantic matching module that uses a LLM to semantically match questions with their corresponding relations. The second module is a matching information propagation module, which propagates the matching information along directed edges on KGs for answer reasoning. Similarly, ReLMKG [181] performs joint reasoning on a large language model and the associated knowledge graph. The question and verbalized paths are encoded by the language model, and different layers of the language model produce outputs that guide a graph neural network to perform message passing. This process utilizes the explicit knowledge contained in the structured knowledge graph for reasoning purposes. StructGPT [242] adopts a customized interface to allow large language models (e.g., ChatGPT) to reason directly on KGs to perform multi-step question answering.
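A minimal sketch of the answer-reasoner formulation in Eqs. (15)-(18) is given below: the question, a verbalized KG path, and a candidate answer are concatenated, encoded, and the [CLS] representation is scored with a sigmoid head. The checkpoint is an illustrative assumption and the classification head is untrained here, so the score is only meaningful after fine-tuning.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # illustrative checkpoint
encoder = AutoModel.from_pretrained("bert-base-uncased")
mlp = torch.nn.Linear(encoder.config.hidden_size, 1)            # untrained scoring head

question = "Where was Neil Armstrong born?"
facts = "Neil Armstrong [SEP] place of birth [SEP] Wapakoneta"  # verbalized KG path
candidate = "Wapakoneta"

# x = [CLS] q [SEP] related facts [SEP] a [SEP]
inputs = tokenizer(question, facts + " [SEP] " + candidate, return_tensors="pt")
with torch.no_grad():
    e_cls = encoder(**inputs).last_hidden_state[:, 0]           # Eq. (17)
    score = torch.sigmoid(mlp(e_cls))                           # Eq. (18)
print(float(score))
```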

6 SYNERGIZED LLMS + KGS

The synergy of LLMs and KGs has attracted increasing attention in recent years, as it marries the merits of LLMs and KGs to mutually enhance performance in various downstream applications. For example, LLMs can be used to understand natural language, while KGs are treated as a knowledge base that provides factual knowledge. The unification of LLMs and KGs could result in a powerful model for knowledge representation and reasoning. In this section, we discuss Synergized LLMs + KGs from two perspectives: 1) knowledge representation and 2) reasoning. We summarize the representative works in Table 4.

TABLE 4. Summary of methods that synergize KGs and LLMs.

Knowledge representation: JointGT [42] (2021), KEPLER [40] (2021), DRAGON [44] (2022), HKLM [243] (2023).

Reasoning: QA-GNN [118] (2021), LARK [45] (2023), Siyuan et al. [46] (2023), RecInDial [244] (2022), KnowledgeDA [245] (2022).

6.1 Knowledge Representation

Text corpora and knowledge graphs both contain enormous knowledge. However, the knowledge in text corpora is usually implicit and unstructured, while the knowledge in KGs is explicit and structured. Therefore, it is necessary to align the knowledge in text corpora and KGs so that it can be represented in a unified way. The general framework of unifying LLMs and KGs for knowledge representation is shown in Fig. 25.

KEPLER [40] presents a unified model for knowledge embedding and pre-trained language representation. KEPLER encodes textual entity descriptions with a LLM as their embeddings, and then jointly optimizes the knowledge embedding and language modeling objectives. JointGT [42] proposes a graph-text joint representation learning model with three pre-training tasks that align the representations of graph and text. DRAGON [44] presents a self-supervised method to pre-train a joint language-knowledge foundation model from text and KGs. It takes text segments and relevant KG subgraphs as input and bidirectionally fuses information from both modalities. DRAGON then utilizes two self-supervised reasoning tasks, i.e., masked language modeling and KG link prediction, to optimize the model parameters. HKLM [243] introduces a unified LLM which incorporates KGs to learn representations of domain-specific knowledge.
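The following is a minimal sketch of a KEPLER-style joint objective: entity descriptions are encoded with a LLM to obtain KG embeddings, and a knowledge-embedding loss (here a TransE-style margin loss, used as a stand-in for KEPLER's actual KE objective) is added to the masked-language-modeling loss. The margin, the loss weight, and the random tensors are illustrative assumptions.

```python
import torch

def transe_score(h, r, t):
    # Lower distance -> more plausible triple, as in TransE.
    return torch.norm(h + r - t, p=2, dim=-1)

def joint_loss(mlm_loss, e_h, e_r, e_t, e_t_neg, margin=1.0, weight=1.0):
    # Margin loss between a true tail and a corrupted (negative) tail.
    ke_loss = torch.relu(margin + transe_score(e_h, e_r, e_t)
                         - transe_score(e_h, e_r, e_t_neg)).mean()
    return mlm_loss + weight * ke_loss

# In KEPLER, e_h and e_t would be LLM encodings of entity descriptions and mlm_loss
# the standard masked-language-modeling loss; random tensors stand in here.
dim = 768
e_h, e_r, e_t, e_t_neg = (torch.randn(4, dim) for _ in range(4))
print(float(joint_loss(torch.tensor(2.3), e_h, e_r, e_t, e_t_neg)))
```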

6.2 Reasoning

To take advantage of both LLMs and KGs, researchers synergize LLMs and KGs to perform reasoning in various applications. In the question answering task, QA-GNN [118] first utilizes LLMs to process the text question and guide the reasoning step on the KGs. In this way, it can bridge the gap between text and structural information, which provides interpretability for the reasoning process. In the knowledge graph reasoning task, LARK [45] proposes a LLM-guided logical reasoning method. It first transforms the conventional logical rules into a language sequence and then asks LLMs to derive the final outputs. Moreover, Siyuan et al. [46] unify structure reasoning and language model pre-training in a single framework. Given a text input, they adopt LLMs to generate the logical query, which is executed on the KGs to obtain structural context. Finally, the structural context is fused with textual information to generate the final output. RecInDial [244] combines knowledge graphs and LLMs to provide personalized recommendations in dialogue systems.


Fig. 25. The general framework of unifying LLMs and KGs for knowledge representation.

KnowledgeDA [245] proposes a unified domain language model development pipeline to enhance the task-specific training procedure with domain knowledge graphs.

7 FUTURE DIRECTIONS

In the previous sections, we have reviewed the recent advances in unifying KGs and LLMs, but there are still many challenges and open problems that need to be addressed. In this section, we discuss the future directions of this research area.

7.1 KGs for Hallucination Detection in LLMs

The hallucination problem in LLMs, i.e., generating factually incorrect content [246], significantly hinders the reliability of LLMs. As discussed in Section 4, existing studies try to utilize KGs to obtain more reliable LLMs through pre-training or KG-enhanced inference. Despite these efforts, the issue of hallucination may persist in the realm of LLMs for the foreseeable future. Consequently, in order to gain the public's trust and enable broader applications, it is imperative to detect and assess instances of hallucination within LLMs and other forms of AI-generated content (AIGC). Existing methods strive to detect hallucination by training a neural classifier on a small set of documents [247], which is neither robust nor powerful enough to handle ever-growing LLMs. Recently, researchers have tried to use KGs as an external source to validate LLMs [248]. Further studies combine LLMs and KGs to achieve a generalized fact-checking model that can detect hallucinations across domains [249]. Therefore, this opens a new door to utilizing KGs for hallucination detection.
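A minimal sketch of the KG-as-validator idea is shown below: triples extracted from LLM output are checked against a reference KG, and unsupported triples are flagged as hallucination candidates. The extraction step, the toy KG, and the exact-match lookup are illustrative assumptions; practical systems also need entity linking and fuzzy matching.

```python
# Toy reference KG of known facts.
reference_kg = {
    ("Neil Armstrong", "born_in", "Wapakoneta"),
    ("Honolulu", "located_in", "USA"),
}

def check_claims(extracted_triples):
    """Return the generated triples that are not supported by the reference KG."""
    return [t for t in extracted_triples if t not in reference_kg]

# Triples extracted (by some upstream RE step) from an LLM's generated answer.
generated = [("Neil Armstrong", "born_in", "Honolulu")]
print(check_claims(generated))   # unsupported triple -> hallucination candidate
```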

7.2 KGs for Editing Knowledge in LLMs

Although LLMs are capable of storing massive real-world knowledge, they cannot quickly update their internal knowledge as real-world situations change.

There are some research efforts on editing knowledge in LLMs [250], [251] without re-training the whole model. Yet, such solutions still suffer from poor performance or computational overhead. Existing studies [252], [253] also propose solutions to edit knowledge in LLMs, but they are limited to handling simple tuple-based knowledge in KGs. In addition, challenges such as catastrophic forgetting and incorrect knowledge editing remain [254], leaving much room for further research.

7.3 KGs for Black-box LLMs Knowledge Injection

Although pre-training and knowledge editing could update LLMs to catch up with the latest knowledge, they still need access to the internal structures and parameters of LLMs. However, many state-of-the-art LLMs (e.g., ChatGPT) only provide APIs for users and developers, making them black boxes to the public. Consequently, it is impossible to follow conventional KG injection approaches described in [95], [134] that change the LLM structure by adding additional knowledge fusion modules. Converting various types of knowledge into text prompts seems to be a feasible solution. However, it is unclear whether such prompts generalize well to new LLMs. Moreover, the prompt-based approach is limited by the input token length of LLMs. Therefore, how to enable effective knowledge injection for black-box LLMs is still an open question [255], [256].
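A minimal sketch of the prompt-based injection route is given below: retrieved KG triples are verbalized and prepended to the user question as context for a black-box LLM. The retrieval step, the wording, and the toy triple are illustrative assumptions.

```python
def inject_knowledge(question: str, triples) -> str:
    """Verbalize KG triples and prepend them to the question as prompt context."""
    facts = "\n".join(f"- {h} {r} {t}." for h, r, t in triples)
    return (f"Use the following facts to answer the question.\n"
            f"Facts:\n{facts}\nQuestion: {question}\nAnswer:")

triples = [("Neil Armstrong", "was born in", "Wapakoneta")]
print(inject_knowledge("Where was Neil Armstrong born?", triples))
# The resulting prompt would be sent to the black-box LLM's API; note that prompt
# length limits cap how much KG knowledge can be injected this way.
```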

7.4 Multi-Modal LLMs for KGs

Current knowledge graphs typically rely on text and graph structure to handle KG-related applications. However, real-world knowledge graphs are often constructed from data of diverse modalities [102], [257], [258]. Therefore, effectively leveraging representations from multiple modalities will be a significant challenge for future research in KGs. One potential solution is to develop methods that can accurately encode and align entities across different modalities [259]. Recently, with the development of multi-modal LLMs [101], [260], leveraging LLMs for modality alignment holds promise in this regard. However, bridging the gap between multi-modal LLMs and KG structure remains a crucial challenge in this field, demanding further investigation and advancements.

7.5 LLMs for Understanding KG Structure

Conventional LLMs trained on plain text data are not designed to understand structured data such as knowledge graphs; enabling LLMs to directly read and reason over KG structure therefore remains an open problem.

7.6 Synergized LLMs and KGs for Bidirectional Reasoning

KGs and LLMs are two complementary technologies that can synergize with each other. However, the synergy of LLMs and KGs has been less explored by existing researchers. A desired synergy of LLMs and KGs would involve leveraging the strengths of both technologies to overcome their individual limitations. LLMs, such as ChatGPT, excel in generating human-like text and understanding natural language, while KGs are structured databases that capture and represent knowledge in a structured manner. By combining their capabilities, we can create a powerful system that benefits from the contextual understanding of LLMs and the structured knowledge representation of KGs. To better unify LLMs and KGs, many advanced techniques need to be incorporated, such as multi-modal learning [261], graph neural networks [262], and continuous learning [263]. Finally, the synergy of LLMs and KGs can be applied to many real-world applications, such as search engines [103], recommender systems [10], [90], and drug discovery.

Given an application problem, we can apply a KG to perform a knowledge-driven search for potential goals and unseen data, and simultaneously use LLMs to perform data/text-driven inference to see what new data or goal items can be derived. When the knowledge-based search is combined with data/text-driven inference, they can mutually validate each other, resulting in efficient and effective solutions powered by dual driving wheels. Therefore, we can anticipate increasing attention to unlocking the potential of integrating KGs and LLMs for diverse downstream applications with both generative and reasoning capabilities in the near future.

8 CONCLUSION

Unifying large language models (LLMs) and knowledge graphs (KGs) is an active research direction that has attracted increasing attention from both academia and industry. In this article, we provide a thorough overview of the recent research in this field. We first introduce different manners of integrating KGs to enhance LLMs. Then, we introduce existing methods that apply LLMs to KGs and establish a taxonomy based on the variety of KG tasks. Finally, we discuss the challenges and future directions in this field. We hope this article can provide a comprehensive understanding of this field and advance future research.

References


Shirui Pan, Linhao Luo, Yufei Wang, Chen Chen, Jiapu Wang, and Xindong Wu (2023). "Unifying Large Language Models and Knowledge Graphs: A Roadmap." doi:10.48550/arXiv.2306.08302
  1. LLMs are also known as pre-trained language models (PLMs).
  2. https://openai.com/blog/chatgpt
  3. https://ai.google/discover/palm2