🌟 Most Impactful RAG Papers
(March 2023 to May 2024)
The concept of Retrieval-Augmented Generation (RAG) was introduced in 2020 through a seminal paper. Since then, there has been significant growth in RAG research, particularly in the past year, driven by the emergence of numerous LLMs. RAG has become one of the most widely used applications of LLMs. The table below summarizes top papers published between March 2023 and May 2024, covering various topics in RAG research; each paper's topic is given in the Tags column.
This table will continue to be updated regularly, so stay tuned for more updates!
| Title | Description | Tags | Date |
|---|---|---|---|
| Similarity is Not All You Need: Endowing Retrieval Augmented Generation with Multi Layered Thoughts | The work highlights the limitations of current large language models in knowledge-intensive tasks due to issues like untimeliness, high costs of knowledge updates, and hallucinations. It introduces METRAG, a Multi-layered Thoughts enhanced Retrieval-Augmented Generation framework, which goes beyond traditional similarity-oriented methods by incorporating both similarity- and utility-oriented thoughts, and uses an LLM as a task-adaptive summarizer. Extensive experiments demonstrate that METRAG significantly improves the performance of retrieval-augmented generation in knowledge-intensive tasks. | RAG Enhancement | May 2024 |
| HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models | The paper introduces HippoRAG, a retrieval framework inspired by the hippocampal indexing theory to enhance knowledge integration in large language models. By combining LLMs, knowledge graphs, and the Personalized PageRank algorithm, HippoRAG mimics human memory processes. Experiments show it significantly outperforms existing methods in multi-hop question answering, offering improved performance, cost efficiency, and speed. | RAG Enhancement | May 2024 |
| Don't Forget to Connect! Improving RAG with Graph-based Reranking | The paper addresses challenges in Retrieval Augmented Generation when documents have partial information or less obvious connections to the context. It introduces G-RAG, a reranker based on graph neural networks (GNNs) that combines document connections and semantic information to enhance RAG. G-RAG outperforms state-of-the-art approaches with a smaller computational footprint, and significantly outperforms PaLM 2 as a reranker, highlighting the importance of effective reranking in RAG. | Retrieval Improvement | May 2024 |
| GNN-RAG: Graph Neural Retrieval for Large Language Model Reasoning | The paper introduces GNN-RAG, a method that combines LLMs and Graph Neural Networks (GNNs) for Knowledge Graph Question Answering (KGQA). GNN-RAG uses GNNs to retrieve answer candidates from dense KG subgraphs and LLMs to reason over extracted paths. This approach significantly improves performance on KGQA benchmarks, outperforming state-of-the-art models, including GPT-4, especially in multi-hop and multi-entity questions. | Domain Specific RAG | May 2024 |
| Observations on Building RAG Systems for Technical Documents | Retrieval augmented generation (RAG) for technical documents is challenging because embeddings often fail to capture domain information. The paper reviews prior art on the important factors affecting RAG and performs experiments to highlight best practices and potential challenges in building RAG systems for technical documents. | RAG Survey | May 2024 |
| RAG and RAU: A Survey on Retrieval-Augmented Language Model in Natural Language Processing | The paper surveys how LLMs tackle NLP challenges, integrating external information to boost performance. It explores Retrieval-Augmented Language Models (RALMs) like RAG and RAU, detailing their evolution, taxonomy, and applications in various NLP tasks. Key components and evaluation methods are discussed, emphasizing strengths, limitations, and avenues for future research to enhance retrieval quality and efficiency. Overall, it offers structured insights into RALMs' potential for advancing NLP. | RAG Survey | April 2024 |
| When to Retrieve: Teaching LLMs to Utilize Information Retrieval Effectively | The paper illustrates how LLMs can effectively integrate with information retrieval (IR) systems, especially when additional context is necessary for answering questions. It suggests that while popular questions are often answered by LLMs' parametric memory, less popular ones benefit from IR usage. A tailored training approach introduces a special token, ⟨RET⟩, for questions where LLMs lack answers, leading to improvements demonstrated by the Adaptive Retrieval LLM (ADAPT-LLM) on the PopQA dataset. Evaluation reveals ADAPT-LLM's ability to use ⟨RET⟩ for questions needing IR, while maintaining high accuracy relying solely on parametric memory. | RAG Enhancement | April 2024 |
| A Survey on Retrieval-Augmented Text Generation for Large Language Models | The paper introduces Retrieval-Augmented Generation (RAG), which combines retrieval methods with deep learning to overcome the static limitations of large language models by integrating real-time external information. Focusing on text, RAG mitigates LLMs' tendency to generate inaccurate responses, enhancing reliability through real-world data. Organized into pre-retrieval, retrieval, post-retrieval, and generation stages, the paper outlines RAG's evolution and evaluates its performance, aiming to consolidate research, clarify its technology, and broaden LLMs' applicability. | RAG Survey | April 2024 |
| RA-ISF: Learning to Answer and Understand from Retrieval Augmentation via Iterative Self-Feedback | RA-ISF proposes Retrieval Augmented Iterative Self-Feedback to enhance large language models' problem-solving abilities by iteratively decomposing tasks and processing them in three submodules. Experiments with models such as GPT-3.5 and Llama2 demonstrate its superiority over existing baselines, notably improving factual reasoning and reducing hallucinations. | RAG Enhancement | March 2024 |
| RAFT: Adapting Language Model to Domain Specific RAG | This paper introduces RAFT (Retrieval Augmented FineTuning), a training approach designed to enhance a pre-trained Large Language Model's ability to answer questions in domain-specific contexts. RAFT focuses on adapting the model to gain new knowledge by fine-tuning it to ignore irrelevant documents retrieved during the question-answering process. By selectively citing relevant information from retrieved documents, RAFT improves the model's reasoning capabilities and performance across various datasets like PubMed, HotpotQA, and Gorilla. | RAG Enhancement | March 2024 |
| Fine Tuning vs. Retrieval Augmented Generation for Less Popular Knowledge | This paper investigates the effectiveness of Retrieval Augmented Generation and fine-tuning (FT) approaches in improving the performance of Large Language Models on low-frequency entities in question answering tasks. While FT shows significant improvement across entities of different popularity levels, RAG outperforms other methods. Furthermore, advancements in retrieval and data augmentation techniques enhance the success of both RAG and FT approaches in customizing LLMs for handling low-frequency entities. | Comparison Paper | March 2024 |
| Improving language models by retrieving from trillions of tokens | This paper introduces RETRO, a Retrieval-Enhanced Transformer, which enhances auto-regressive language models by conditioning on document chunks retrieved from a massive corpus. Despite using significantly fewer parameters compared to existing models like GPT-3 and Jurassic-1, RETRO achieves comparable performance on tasks like question answering after fine-tuning. By combining a frozen BERT retriever, a differentiable encoder, and a chunked cross-attention mechanism, RETRO leverages an order of magnitude more data during prediction. This approach presents new possibilities for improving language models through explicit memory at an unprecedented scale. | RAG Enhanced LLMs | March 2024 |
| RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation | The RAT method enhances large language models' reasoning and generation capabilities in long-horizon tasks by iteratively revising a chain of thoughts with relevant retrieved information. By incorporating retrieval-augmented thoughts into models like GPT-3.5, GPT-4, and CodeLLaMA-7b, RAT significantly improves performance across various tasks, including code generation, mathematical reasoning, creative writing, and embodied task planning, with average rating score increases of 13.63%, 16.96%, 19.2%, and 42.78%, respectively. | RAG Enhancement | March 2024 |
| Instruction-tuned Language Models are Better Knowledge Learners | The paper introduces pre-instruction-tuning (PIT), a method that instruction-tunes on questions before training on documents, rather than after as in the standard approach. PIT significantly enhances LLMs' ability to absorb knowledge from new documents, outperforming standard instruction-tuning by 17.8%, as demonstrated in extensive experiments and ablation studies. | Instruction Tuning | February 2024 |
| Retrieve Only When It Needs: Adaptive Retrieval Augmentation for Hallucination Mitigation in Large Language Models | Hallucinations present a significant challenge for large language models, often resulting from limited internal knowledge. While incorporating external information can mitigate this, it also risks introducing irrelevant details, leading to external hallucinations. In response, the authors introduce Rowen, which selectively augments LLMs with retrieval when it detects inconsistencies across languages, which are indicative of hallucinations. This semantic-aware process balances internal reasoning with external evidence, effectively mitigating hallucinations. Empirical analysis shows Rowen surpasses existing methods in detecting and mitigating hallucinated content in LLM outputs. | RAG Enhancement | February 2024 |
| G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering | The paper introduces GraphQA, a framework enabling users to interactively query textual graphs through conversational interfaces for various real-world applications. They propose G-Retriever, which combines graph neural networks, large language models, and Retrieval-Augmented Generation to navigate large textual graphs effectively. Through soft prompting and optimization techniques, G-Retriever achieves superior performance and scalability while mitigating issues like hallucination. Empirical evaluations across multiple domains demonstrate its effectiveness, showcasing its potential for practical applications. | Retriever Improvement | February 2024 |
| Retrieval-Augmented Data Augmentation for Low-Resource Domain Tasks | Retrieval-Augmented Data Augmentation (RADA) is a method aimed at improving model performance in low-resource settings with limited training data. RADA addresses the challenge of suboptimal and less diverse synthetic data generation by incorporating examples from other datasets. It retrieves relevant instances based on similarities with the given seed data and prompts Large Language Models to generate new samples with contextual information from both original and retrieved samples. Experimental results demonstrate the effectiveness of RADA in training and test-time data augmentation scenarios, outperforming existing LLM-powered data augmentation methods. | Domain Specific RAG | February 2024 |
| RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval | RAPTOR presents a new approach to retrieval-augmented language modeling by introducing a method that constructs a hierarchical summary tree from large documents, enabling more nuanced and comprehensive retrieval of information. Unlike conventional methods that pull short, direct excerpts from texts, RAPTOR's recursive process embeds, clusters, and summarizes text chunks at multiple abstraction levels. This structured retrieval allows for a deeper understanding and integration of information across entire documents, significantly enhancing performance on complex tasks requiring multi-step reasoning. Demonstrated improvements on various benchmarks, including a remarkable 20% absolute accuracy increase on the QuALITY benchmark with GPT-4, underline RAPTOR's potential to revolutionize how models access and leverage extensive knowledge bases, setting new standards for question-answering and beyond. | RAG Enhancement | January 2024 |
| RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture | The paper explores two methods used by developers to integrate proprietary and domain-specific data into Large Language Models: Retrieval-Augmented Generation and Fine-Tuning. It presents a detailed pipeline for applying these methods to LLMs like Llama2-13B, GPT-3.5, and GPT-4, focusing on extracting information, generating questions and answers, fine-tuning, and evaluation. The paper demonstrates the capacity of fine-tuned models to leverage cross-geographic information, enhancing answer similarity significantly, and underscores the broader applicability and benefits of LLMs in various industrial domains. | Comparison Paper | January 2024 |
| Corrective Retrieval Augmented Generation | CRAG introduces a novel strategy to enhance the robustness and accuracy of large language models during retrieval-augmented generation processes. Addressing the potential pitfalls of relying on the relevance of retrieved documents, CRAG employs a retrieval evaluator to gauge the quality and relevance of documents for a given query, enabling adaptive retrieval strategies based on confidence scores. To overcome the limitations of static databases, CRAG integrates large-scale web searches, providing a richer pool of documents. Additionally, its unique decompose-then-recompose algorithm ensures the model focuses on pertinent information while discarding the irrelevant, thereby refining the quality of generation. Designed as a versatile, plug-and-play solution, CRAG significantly enhances RAG-based models' performance across a range of generation tasks, demonstrated through substantial improvements in four diverse datasets. | RAG Enhancement | January 2024 |
| UniMS-RAG: A Unified Multi-source Retrieval-Augmented Generation for Personalized Dialogue Systems | The paper introduces UniMS-RAG, a novel framework designed to address the personalization challenge in dialogue systems by incorporating multiple knowledge sources. It decomposes the task into three sub-tasks: Knowledge Source Selection, Knowledge Retrieval, and Response Generation, and unifies them into a single sequence-to-sequence paradigm during training. This allows the model to dynamically retrieve and evaluate relevant evidence using special tokens, facilitating interaction with diverse knowledge sources. Furthermore, a self-refinement mechanism is proposed to iteratively refine generated responses based on consistency and relevance scores. | Domain Specific RAG | January 2024 |
| Retrieval-Augmented Generation for Large Language Models: A Survey | This survey delves into Retrieval-Augmented Generation as a solution to challenges faced by Large Language Models, including hallucination and outdated knowledge. RAG integrates external databases to enhance accuracy and credibility, particularly for knowledge-intensive tasks, and enables continuous knowledge updates. The paper reviews the evolution of RAG paradigms, covering Naive RAG, Advanced RAG, and Modular RAG, while examining the retrieval, generation, and augmentation techniques. It discusses state-of-the-art technologies and introduces an updated evaluation framework and benchmark, concluding with insights into current challenges and future research directions. | RAG Survey | December 2023 |
| Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models | Chain-of-Noting (CoN) introduces an innovative approach to enhance the robustness and reliability of retrieval-augmented language models (RALMs) by addressing the issue of processing irrelevant or noisy information and improving the model's ability to recognize when it lacks sufficient knowledge to answer a question. CoN's strategy involves creating sequential reading notes on retrieved documents, facilitating a more detailed assessment of their relevance and integrating this evaluation into the answer generation process. This method not only helps in filtering out unhelpful information but also empowers RALMs to more confidently identify and admit when an answer is beyond their current knowledge or data scope. Leveraging ChatGPT for training data creation and implementing CoN on a LLaMa-2 7B model, this approach has demonstrated significant performance improvements over traditional RALMs in open-domain question answering tasks. The results include a notable increase in Exact Match (EM) scores amidst noisy document retrieval and enhanced rejection rates for questions outside the model's pre-training knowledge, underscoring CoN's potential in making RALMs more reliable and trustworthy. | RAG Enhanced LLMs | November 2023 |
| From Classification to Generation: Insights into Crosslingual Retrieval Augmented ICL | The paper introduces CREA-ICL, an innovative method designed to enhance the zero-shot learning capabilities of multilingual pre-trained language models (MPLMs) in low-resource languages through cross-lingual retrieval-augmented in-context learning. By retrieving semantically similar prompts from high-resource languages, this approach seeks to bolster the models' performance across a range of tasks. The findings indicate consistent improvements in classification tasks; however, the approach encounters obstacles when applied to generation tasks. These outcomes provide valuable insights into the distinctions in effectiveness between classification and generation domains when utilizing retrieval-augmented in-context learning, highlighting the nuanced challenges and potential strategies for advancing the application of MPLMs in multilingual settings. | Domain Specific RAG | November 2023 |
| REST: Retrieval-Based Speculative Decoding | The paper introduces REST (Retrieval-Based Speculative Decoding), a novel algorithm aimed at accelerating language model generation. Unlike prior methods, REST leverages retrieval to generate draft tokens based on common phases and patterns observed during text generation. It integrates with existing language models without additional training, achieving speedups of 1.62X to 2.36X on code or text generation tasks when benchmarked on 7B and 13B language models in a single-batch setting. | RAG Enhancement | November 2023 |
| Learning to Filter Context for Retrieval-Augmented Generation | The FILCO method is introduced to enhance the quality of context provided to generation models in retrieval-augmented systems. By identifying useful context and training context filtering models, FILCO aims to mitigate issues arising from irrelevant passages during generation. Experimental results across various knowledge-intensive tasks demonstrate the effectiveness of FILCO in improving output quality, surpassing existing approaches in tasks such as question answering, fact verification, and dialog generation. This method proves beneficial regardless of whether the retrieved context aligns perfectly with the desired output. | RAG Enhancement | November 2023 |
| Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection | Self-RAG introduces a novel approach to enhance the quality and accuracy of large language models by incorporating a process of retrieval and self-reflection. Unlike traditional retrieval-augmented generation models that may retrieve and use external passages indiscriminately, Self-RAG employs a more dynamic method. It enables an LLM to adaptively decide when to retrieve information and critically assess the relevance of retrieved content and its own generated responses through the use of special "reflection tokens." This innovative mechanism allows the model to adjust its behavior based on the specifics of the task at hand, offering a higher degree of control during inference. Testing on a variety of tasks, including open-domain question answering, reasoning, and fact verification, demonstrates that Self-RAG models (with 7B and 13B parameters) surpass both conventional LLMs and other retrieval-augmented models in performance, showcasing notable improvements in generating factual and accurately cited long-form content. | RAG Enhancement | October 2023 |
| Benchmarking Large Language Models in Retrieval-Augmented Generation | This paper tackles the critical task of evaluating how Retrieval-Augmented Generation influences the performance of large language models across a spectrum of capabilities essential for effective RAG application. Through the establishment of the Retrieval-Augmented Generation Benchmark (RGB), a novel corpus designed for RAG evaluation in both English and Chinese, the study meticulously assesses LLMs against four core abilities: noise robustness, negative rejection, information integration, and counterfactual robustness. The analysis of six representative LLMs using RGB exposes their relative strengths and weaknesses, revealing that while these models demonstrate resilience against noise, they falter significantly in rejecting irrelevant information, integrating diverse information sources, and countering false information. The findings underscore the need for further advancements in LLMs to harness the full potential of RAG, highlighting the complexity and challenges of improving LLMs' factual accuracy and decision-making processes. | RAG Evaluation | October 2023 |
| Knowledge-Augmented Language Model Verification | The paper introduces a novel method aimed at improving the factual accuracy of language model responses by incorporating a verification step into the knowledge-augmentation process. Recognizing that LMs often produce factually incorrect answers due to the limitations of their internalized knowledge, this approach enhances text generation by identifying and correcting errors in both the retrieval of relevant external knowledge and the reflection of this knowledge in the generated text. A specialized verifier, a smaller LM trained via instruction-finetuning, is employed to detect inaccuracies in both retrieval and generation. Errors identified by the verifier can be corrected by updating the retrieved knowledge or modifying the generated text. Moreover, the use of an ensemble of outputs guided by different instructions, combined with a single verifier, boosts the verification's reliability. Tested across multiple question answering benchmarks, this method significantly increases the factual accuracy of responses, demonstrating the verifier's effectiveness in pinpointing and addressing errors in knowledge retrieval and text generation. | RAG Enhancement | October 2023 |
| Optimizing Retrieval-augmented Reader Models via Token Elimination | This study introduces an approach to enhance the efficiency of Fusion-in-Decoder (FiD), a retrieval-augmented language model widely used in open-domain tasks like question answering and fact checking. By analyzing the importance of each retrieved passage to the model's performance, the researchers propose a method for selectively eliminating non-critical information at the token level. This token elimination strategy significantly reduces decoding time—by up to 62.2%—with minimal impact on the model's effectiveness, only reducing performance by 2%. Surprisingly, in some instances, this approach not only maintains but also improves the model's performance. This method offers a promising direction for optimizing the balance between computational efficiency and accuracy in retrieval-augmented reader models. | RAG Enhanced LLMs | October 2023 |
| Self-Knowledge Guided Retrieval Augmentation for Large Language Models | SKR (Self-Knowledge guided Retrieval) is a novel method designed to enhance the performance of large language models by intelligently incorporating external knowledge. Recognizing the limitations of LLMs in terms of the completeness and updatability of their knowledge, SKR focuses on improving LLMs' ability to discern what they know and what they don't, allowing them to selectively seek external information. This approach aims to mitigate the issues with retrieval-based methods that sometimes detract from the model's original responses. By enabling LLMs to refer to previously encountered questions and judiciously utilize external resources for new queries, SKR has been shown to outperform existing methods on various datasets, leveraging models like InstructGPT or ChatGPT for improved question-answering capabilities. | Retriever Improvement | October 2023 |
| Tree of Clarifications: Answering Ambiguous Questions with Retrieval-Augmented Large Language Models | The "Tree of Clarifications" (ToC) framework addresses the challenge of ambiguous questions in open-domain question answering by creating a structured tree of potential interpretations, allowing for the generation of comprehensive long-form answers. This method leverages few-shot prompting and external knowledge to recursively disambiguate questions and gather relevant information. ToC not only surpasses other few-shot methods across various metrics but also outperforms fully-supervised approaches in Disambig-F1 and Disambig-ROUGE scores, offering a robust solution to understanding and answering ambiguously posed questions effectively. | RAG Enhanced LLMs | October 2023 |
| Retrieval-Generation Synergy Augmented Large Language Models | The paper introduces a novel iterative framework that combines retrieval and generation processes to enhance large language models for knowledge-intensive tasks. This collaborative approach allows the model to access both parametric knowledge (built into the model itself) and non-parametric knowledge (from external sources) and iteratively refine its understanding and output through interactions between retrieval and generation phases. This synergy is particularly beneficial for complex tasks requiring multi-step reasoning. Tested across single-hop and multi-hop question-answering datasets, the method demonstrates a marked improvement in LLMs' reasoning capabilities, surpassing existing approaches in performance. | RAG Enhancement | October 2023 |
| RECOMP: Improving Retrieval-Augmented LMs with Compression and Selective Augmentation | RECOMP introduces a method to optimize the efficiency of retrieval-augmented language models by compressing retrieved documents into concise summaries before integrating them into the model's context. This approach aims to make the inference process less resource-intensive and helps LMs more effectively discern pertinent information from retrieved documents. RECOMP employs two types of compressors: an extractive compressor, which identifies and uses key sentences from documents, and an abstractive compressor, which creates summaries by combining information from various sources. These compressors are designed to enhance LMs' task performance while generating brief summaries, and they can even omit augmentation entirely when the retrieved documents are not beneficial. | RAG Enhancement | October 2023 |
| Retrieval meets Long Context Large Language Models | The paper delves into the comparative benefits of retrieval-augmentation and extended context windows in large language models, and whether their combination could yield superior results for various downstream tasks. Using two advanced LLMs for analysis, the findings reveal that a model with a smaller context window (4K tokens) supplemented by retrieval-augmentation can match the performance of a model with a larger context window (16K tokens) fine-tuned for long-context tasks, but with significantly lower computational demand. Moreover, incorporating retrieval into LLMs enhances performance across all context window sizes. The standout model, a retrieval-augmented Llama2-70B with a 32K context window, notably outperformed leading models like GPT-3.5-turbo-16k and Davinci003 across various tasks, including question answering and summarization, while also achieving faster generation speeds. This research underscores the effectiveness of retrieval-augmentation in improving LLMs' efficiency and accuracy, offering valuable guidance for future model development strategies. | Comparison Paper | October 2023 |
| Making Retrieval-Augmented Language Models Robust to Irrelevant Context | This paper addresses the challenge of ensuring that retrieval-augmented language models (RALMs) remain effective and accurate, especially when confronted with irrelevant information during multi-hop reasoning tasks. Through an extensive analysis across five open-domain question answering benchmarks, the authors identify instances where retrieval augmentation actually hampers model performance. To combat this, they introduce two strategies: first, a baseline approach using a natural language inference model to filter out passages that don't support the question-answer pairs, ensuring the model isn't misled by irrelevant data. While effective in reducing inaccuracies, this method risks excluding useful information. To refine this approach, the authors develop a technique for enhancing the language model's ability to discern and appropriately use retrieved passages, by training with a combination of relevant and irrelevant contexts. Remarkably, they demonstrate that a modest dataset of just 1,000 examples can significantly improve the model's resilience to irrelevant information without compromising its performance on pertinent examples. | RAG Enhanced LLMs | October 2023 |
| RA-DIT: Retrieval-Augmented Dual Instruction Tuning | RA-DIT presents a novel approach to enhancing retrieval-augmented language models (RALMs) by introducing a two-step, lightweight fine-tuning process that can be applied to any large language model to equip it with retrieval capabilities. The first step focuses on fine-tuning the LLM to better utilize retrieved information, while the second step optimizes the retriever to fetch more relevant information as determined by the LLM's needs. This method stands out by not requiring costly modifications to the model's pre-training phase or relying on less effective post-hoc integration of data stores. Tested across various zero- and few-shot learning benchmarks, RA-DIT achieves unprecedented performance improvements, showcasing its effectiveness in knowledge-intensive tasks and significantly surpassing existing models in both zero-shot and few-shot scenarios. | RAG Enhanced LLMs | October 2023 |
| InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining | InstructRetro builds on the idea of enhancing auto-regressive large language models through retrieval-augmented pretraining, presenting the largest model of its kind, Retro 48B. This model is obtained by continuing to pretrain a 43B GPT model on an additional 100 billion tokens with Retro's augmentation method, retrieving from 1.2 trillion tokens. It demonstrates remarkable improvements in perplexity and factual accuracy while requiring minimal additional computational resources. The process not only showcases the scalability of retrieval-augmented pretraining but also significantly enhances instruction tuning and zero-shot generalization capabilities. InstructRetro, when fine-tuned with instructions, surpasses its GPT counterpart across various tasks, including short-form QA, reading comprehension, long-form QA, and summarization, with notable margins. Interestingly, the study also reveals that removing the encoder and utilizing only the decoder of InstructRetro yields comparable results, suggesting a promising route for optimizing GPT decoders through retrieval-augmented pretraining followed by instruction tuning. | RAG Enhanced LLMs | October 2023 |
| GAR-meets-RAG Paradigm for Zero-Shot Information Retrieval | The GAR-meets-RAG approach combines two paradigms, generation-augmented retrieval (GAR) and retrieval-augmented generation (RAG), to address the zero-shot information retrieval challenge, where no labeled data from the target domain is available. This method iteratively enhances both the retrieval and rewriting stages, significantly improving recall and precision in document ranking without requiring domain-specific training data. By integrating the generative capabilities of large language models with embedding-based retrieval, the proposed methodology addresses the twin challenges of high-recall retrieval and high-precision ranking in a zero-shot context and sets new state-of-the-art results on the BEIR and TREC-DL benchmarks. It achieves remarkable improvements in key metrics like Recall@100 and nDCG@10, showing up to 17% relative gains over prior state-of-the-art results in zero-shot passage retrieval tasks. | Retriever Improvement | October 2023 |
| Retrieve Anything To Augment Large Language Models | The paper proposes LLM-Embedder, a unified model designed to address the challenges faced by large language models by leveraging retrieval augmentation. Unlike conventional methods, LLM-Embedder optimizes retrieval for diverse LLM needs with one model. Training this unified model poses challenges due to the varied semantic relationships targeted by different retrieval tasks. To overcome this, the paper presents optimized training methodologies, including reward formulation, stabilized knowledge distillation, multi-task fine-tuning, and homogeneous negative sampling. These strategies lead to outstanding empirical performance, offering a promising solution for enhancing LLM capabilities through retrieval augmentation. | Retriever Improvement | October 2023 |
| DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines | DSPy introduces a systematic approach for developing and optimizing language model (LM) pipelines, abstracting them as text transformation graphs. These are imperative computational graphs in which LMs are invoked through declarative modules; the modules are parameterized, so they can learn to apply various techniques. A compiler then optimizes any DSPy pipeline to maximize a given metric, allowing sophisticated LM pipelines to be expressed and optimized efficiently. Case studies demonstrate DSPy's effectiveness in outperforming standard prompting and expert-created demonstrations across various tasks, showing competitive performance even with smaller LMs. | RAG Enhancement | October 2023 |
| RegaVAE: A Retrieval-Augmented Gaussian Mixture Variational Auto-Encoder for Language Modeling | RegaVAE, a novel retrieval-augmented language model, addresses two challenges: determining which information to retrieve and integrating it effectively during generation. By considering both source and target text, it encodes them into a latent space using a variational auto-encoder (VAE). Leveraging this compact representation, RegaVAE outperforms existing models in text generation quality and hallucination removal, as demonstrated through theoretical analysis and experiments across diverse datasets. | RAG Enhanced LLMs | October 2023 |
| Text Embeddings Reveal (Almost) As Much As Text | The paper explores text embedding inversion, aiming to reconstruct the original text from its embedding. While a basic model performs poorly, a multi-step approach recovers 92% of 32-token text inputs exactly. The method, trained on embeddings from two embedding models, also recovers personal information such as full names from clinical notes, highlighting potential privacy risks associated with text embeddings. | Embeddings | October 2023 |
| Understanding Retrieval Augmentation for Long-Form Question Answering | This paper investigates the effects of retrieval-augmented language models on long-form question answering. By comparing answers generated from LMs using the same evidence documents, the impact of retrieval augmentation on different LMs is analyzed. The study also examines various attributes of generated answers and evaluates methods for automatically judging attribution to evidence documents. Insights are provided on how retrieval augmentation influences long, knowledge-rich text generation, including attribution patterns and analysis of attribution errors, offering directions for future research in this area. | RAG Enhanced LLMs | October 2023 |
| Generate rather than Retrieve: Large Language Models are Strong Context Generators | This study introduces GenRead, a novel approach for handling knowledge-intensive tasks like open-domain question answering, by leveraging large language models to generate rather than retrieve contextual documents. This method prompts the language model to produce context relevant to the given question, which is then used to determine the final answer. Additionally, GenRead employs a novel clustering-based prompting technique that promotes diversity among the generated documents, covering a broader range of perspectives and thereby enhancing the accuracy of answers. Through rigorous testing across multiple tasks, including QA, fact checking, and dialogue systems, GenRead has been shown to significantly surpass traditional retrieval-based methods, achieving notably higher exact match scores on benchmarks like TriviaQA and WebQ without relying on external knowledge sources. This marks a significant advancement in efficiently accessing and utilizing knowledge for AI tasks. | RAG Enhancement | September 2023 |
| RAGAS: Automated Evaluation of Retrieval Augmented Generation | RAGAs introduces a new way to evaluate Retrieval Augmented Generation systems without the need for human-annotated references. RAG systems enhance language models by fetching information from textual databases, which helps to minimize inaccuracies or "hallucinations" in generated text. Evaluating these systems is complex due to the need to assess the retrieval's relevance, the LLM's ability to use the retrieved information accurately, and the overall quality of the generated text. RAGAs offers a comprehensive set of metrics for assessing these aspects quickly and without human annotations, facilitating more efficient development and refinement of RAG technologies. This is particularly valuable in the rapidly evolving field of large language models. | RAG Evaluation | September 2023 |
| RaLLe: A Framework for Developing and Evaluating Retrieval-Augmented Large Language Models | RaLLe introduces an open-source framework aimed at enhancing the development and evaluation of retrieval-augmented large language models (R-LLMs), specifically for tasks requiring a high degree of factual accuracy, like question-answering. Addressing the lack of transparency in current tools, RaLLe provides a detailed view into each step of the R-LLM process, from retrieval to generation. This enables developers to refine prompts, evaluate the efficacy of different components, and quantitatively measure the performance improvements in their models. Essentially, RaLLe offers a comprehensive toolkit for boosting the effectiveness and precision of R-LLMs in handling complex, knowledge-based tasks. | RAG Enhanced LLMs | August 2023 |
| RAVEN: In-Context Learning with Retrieval Augmented Encoder-Decoder Language Models | The paper presents RAVEN, an approach to improving in-context learning in encoder-decoder language models through retrieval augmentation. By analyzing the ATLAS model, the authors pinpoint challenges like mismatches between training and usage, and limited context availability. RAVEN addresses these by integrating retrieval-augmented masked and prefix language modeling, alongside a novel technique called Fusion-in-Context Learning. This method boosts few-shot learning capabilities without extra training or changes to the model structure. Testing shows RAVEN surpassing ATLAS and holding its ground against some of the most sophisticated models, even with fewer parameters. This study highlights the efficacy and potential of retrieval-augmented models in enhancing in-context learning, paving the way for future advancements in the field. | RAG Enhanced LLMs | August 2023 |
| KnowledGPT: Enhancing Large Language Models with Retrieval and Storage Access on Knowledge Bases | KnowledGPT introduces a novel framework aimed at overcoming the limitations of large language models regarding completeness, timeliness, faithfulness, and adaptability by integrating them with knowledge bases (KBs). This integration allows for enhanced retrieval and storage of knowledge, making LLMs more powerful and versatile. The framework uses "program of thought" prompting to generate search queries in code format, facilitating precise operations within KBs. Additionally, KnowledGPT enables the creation of personalized KBs to store user-specific knowledge. Through comprehensive testing, KnowledGPT has been shown to significantly expand the range of questions LLMs can answer by utilizing both public and personalized knowledge sources, marking a significant step forward in making LLMs more informed and adaptable. | Input Preprocessing | August 2023 |
| Learning to Retrieve In-Context Examples for Large Language Models | This paper introduces a novel framework for improving in-context learning for large language models by iteratively training dense retrievers to identify high-quality examples. The framework involves training a reward model based on LLM feedback to evaluate candidate examples, followed by knowledge distillation to train a bi-encoder based dense retriever. Experimental results across 30 tasks demonstrate significant performance enhancements, showcasing the framework's generalization ability to unseen tasks. Analysis reveals that the model improves performance by retrieving examples with similar patterns, consistently benefiting LLMs of different sizes. | Retriever Improvement | July 2023 |
| Active Retrieval Augmented Generation | This paper explores how to enhance large language models through Active Retrieval Augmented Generation, addressing the common issue of factual inaccuracies or "hallucinations" in generated content. The proposed method, FLARE (Forward-Looking Active REtrieval augmented generation), innovates by not just retrieving information once before generation but actively deciding when and what to retrieve as the generation progresses. This process involves predicting future content needs and using those predictions to fetch relevant information dynamically. Tested across four long-form, knowledge-intensive generation tasks, FLARE shows either superior or competitive performance compared to baseline methods. This approach proves particularly useful in generating lengthy texts where the need for external information can arise at multiple points, showcasing a significant advancement in generating more accurate and reliable content. | Retriever Improvement | May 2023 |
| Augmented Large Language Models with Parametric Knowledge Guiding | The paper presents a novel Parametric Knowledge Guiding (PKG) framework aimed at improving the performance of Large Language Models on domain-specific tasks. By integrating a knowledge-guiding module, PKG allows LLMs to access specialized knowledge without needing to modify the original model parameters. This approach is particularly advantageous for enhancing "black-box" LLMs, which are typically not open for modification or fine-tuning. The PKG framework leverages open-source models for creating an offline knowledge base, addressing both the transparency issues and data privacy concerns associated with proprietary LLMs. The effectiveness of PKG is showcased through significant performance improvements across a variety of knowledge-intensive tasks. | Domain Specific RAG | May 2023 |
| Lift Yourself Up: Retrieval-augmented Text Generation with Self Memory | This paper introduces "selfmem," a framework for retrieval-augmented text generation that addresses the limitations of traditional memory retrieval methods by leveraging the model's own outputs as an unbounded memory pool. This self-memory approach allows for iterative improvements in text generation tasks by using the model's generated content as new memory sources for subsequent generations. Tested across neural machine translation, abstractive text summarization, and dialogue generation tasks, the selfmem framework has shown remarkable performance, setting new benchmarks in several domains. The study also provides a detailed analysis of the framework's components, offering valuable insights for future research in retrieval-augmented text generation. | Memory Improvement | May 2023 |
| Query Rewriting for Retrieval-Augmented Large Language Models | The study proposes a framework for improving retrieval-augmented Large Language Models through query rewriting, named Rewrite-Retrieve-Read. Unlike conventional approaches that focus on enhancing either the retrieval process or the reading comprehension capabilities of LLMs, this framework emphasizes refining the search queries themselves to bridge the gap between the input text and the necessary knowledge for retrieval. By generating an initial query with an LLM and then refining it using a trainable small language model, the approach uses web search engines for more accurate context retrieval. The rewriter is further optimized with reinforcement learning based on feedback from the LLM reader. Demonstrated across open-domain and multiple-choice QA tasks, this method shows significant performance improvements, highlighting its effectiveness and scalability for retrieval-augmented LLM applications. | Input Preprocessing | May 2023 |
| Knowledge Graph-Augmented Language Models for Knowledge-Grounded Dialogue Generation | The paper introduces SURGE, a framework designed to enhance knowledge-grounded dialogue generation by integrating Knowledge Graphs (KGs) into the language model's response process. SURGE improves the relevance and factual accuracy of dialogue responses by retrieving context-specific subgraphs from KGs and encouraging consistency in the generated text through word embedding perturbations and contrastive learning. This approach helps ensure that the dialogue is grounded in accurate and relevant knowledge. Tested on the OpendialKG and KOMODIS datasets, SURGE demonstrates its ability to produce high-quality, knowledge-rich dialogues, addressing the challenge of ensuring the use of pertinent knowledge in dialogue generation. | Retriever Improvement | May 2023 |
| Structure-Aware Language Model Pretraining Improves Dense Retrieval on Structured Data | The SANTA model focuses on improving the retrieval of structured data by teaching language models the characteristics of structured content during pretraining. By aligning structured and unstructured data and homing in on entities within structured data, SANTA creates a shared embedding space for both types of data, enhancing its retrieval capabilities. This method has shown impressive results in tasks like code and product search, even in scenarios where it has not been directly trained, thanks to its specialized pretraining techniques. Essentially, SANTA stands out by teaching language models to better understand and utilize structured data's distinct characteristics. | Retriever Improvement | May 2023 |
| Augmentation-Adapted Retriever Improves Generalization of Language Models as Generic Plug-In | The paper introduces a new approach to retrieval augmentation for language models through the Augmentation-Adapted Retriever (AAR). Unlike previous methods that tightly integrate the retriever and LM, AAR acts as a flexible plug-in, capable of working with various LMs without requiring joint fine-tuning. This adaptability allows AAR to provide relevant external information to enhance LMs on knowledge-intensive tasks, even if these LMs were not part of its initial training set. Tested across a range of model sizes, AAR shows remarkable ability to boost zero-shot generalization capabilities of LMs from small to very large, demonstrating that learning from one LM's preferences can benefit a wide array of others. This research highlights the potential of making retrieval augmentation more universally applicable across different LMs. | Retriever Improvement | May 2023 |
| Enhancing Retrieval-Augmented Large Language Models with Iterative Retrieval-Generation Synergy | The paper introduces Iter-RetGen, a method that enhances retrieval-augmented large language models by initiating a dynamic interaction between retrieval and generation processes. This iterative synergy allows the model to refine its search for external knowledge based on initial outputs and then improve subsequent generations using the newly retrieved information. Unlike other methods that may impose structural constraints by interleaving retrieval with generation, Iter-RetGen treats retrieved knowledge as a unified whole, maintaining generation flexibility. Tested on tasks like multi-hop question answering, fact verification, and commonsense reasoning, Iter-RetGen not only efficiently combines parametric and non-parametric knowledge but also shows superior or competitive results compared to leading models, all while minimizing retrieval and generation overheads. | RAG Enhanced LLMs | May 2023 |
| Prompt-Guided Retrieval Augmentation for Non-Knowledge-Intensive Tasks | This paper introduces PGRA, a two-stage framework designed to enhance non-knowledge-intensive (NKI) tasks using retrieval-augmented methods. Unlike previous research focused on knowledge-intensive tasks, PGRA addresses the unique challenges of NKI tasks by first using a task-agnostic retriever to efficiently select candidate evidence from a shared static index. Then, a prompt-guided reranker tailors the evidence to the specific task needs. The approach not only surpasses existing retrieval-augmented methods in performance but also showcases flexibility across different tasks, marking a significant step forward in applying retrieval augmentation to a broader range of NLP tasks. | Retriever Innovation | May 2023 |
| RET-LLM: Towards a General Read-Write Memory for Large Language Models | RET-LLM introduces a framework that integrates a general write-read memory unit into Large Language Models, addressing their limitation in explicitly storing and retrieving knowledge. This approach, rooted in Davidsonian semantics, allows LLMs to handle information more dynamically, storing knowledge in scalable, updatable triplets. The framework enhances LLMs' performance on question answering tasks, particularly those requiring an understanding of time-dependent information, and outperforms traditional models in both effectiveness and interpretability. | Memory Improvement | May 2023 |
| Chain-of-Knowledge: Grounding Large Language Models via Dynamic Knowledge Adapting over Heterogeneous Sources | Chain-of-Knowledge (CoK) is a framework designed to enhance Large Language Models by dynamically integrating grounding information from diverse sources, aiming to produce more accurate and hallucination-free content. CoK operates through a three-stage process: starting with reasoning preparation, it moves to dynamic knowledge adapting where it corrects initial rationales by incorporating knowledge from relevant domains, and concludes with answer consolidation. Unique to CoK is its ability to utilize both structured (e.g., Wikidata, tables) and unstructured knowledge, facilitated by an adaptive query generator capable of handling various query languages. This methodology ensures a robust foundation for generating factual responses by minimizing errors through a step-by-step rationale correction process. CoK has demonstrated its effectiveness in improving LLMs' performance on a broad spectrum of knowledge-intensive tasks. | Retriever Improvement | May 2023 |
| Shall We Pretrain Autoregressive Language Models with Retrieval? A Comprehensive Study | The paper investigates whether large autoregressive language models should be pretrained with retrieval. The authors conduct a comprehensive analysis of RETRO, a scalable retrieval-augmented LM, comparing it with standard GPT models. Findings reveal that RETRO outperforms GPT in text generation, demonstrating less degeneration, higher factual accuracy, and lower toxicity. Additionally, RETRO excels in knowledge-intensive tasks on the LM Evaluation Harness benchmark. The authors also introduce RETRO++, a variant that improves open-domain QA results, showcasing the potential of pretraining autoregressive LMs with retrieval. | RAG Enhanced LLMs | April 2023 |
| UPRISE: Universal Prompt Retrieval for Improving Zero-Shot Evaluation | UPRISE aims to enhance the versatility of Large Language Models by introducing a method that automatically retrieves suitable prompts for any given zero-shot task without the need for model or task-specific adjustments. This approach proves effective across various tasks and models, even those not seen during training, and demonstrates its capability to reduce the occurrence of hallucinations in models like ChatGPT. UPRISE's lightweight retriever is trained with GPT-Neo-2.7B but shows remarkable performance improvements on a wide range of larger LLMs, highlighting its potential to universally enhance LLM performance. | LLM Generalization | March 2023 |