In 2024, several new benchmarks have been introduced to evaluate Retrieval-Augmented Generation (RAG) systems across various domains and tasks:
RAGBench: This benchmark offers a comprehensive dataset of 100,000 examples spanning five industry-specific domains. It introduces the TRACe evaluation framework, which provides explainable and actionable metrics for assessing RAG systems (arXiv); a sketch of TRACe-style metrics appears after this list.
MultiHop-RAG: Designed to assess a RAG system's ability to handle multi-hop queries, which require retrieving and reasoning over multiple pieces of evidence. The dataset includes a knowledge base and a large collection of multi-hop queries with their corresponding answers and supporting evidence (arXiv); see the evidence-recall sketch after this list.
LegalBench-RAG: Focused on the legal domain, this benchmark evaluates how precisely retrieval mechanisms can locate the exact legal references that support an answer, offering a finer-grained measure of retrieval quality in legal contexts (arXiv); a span-level scoring sketch follows the list.
CRUD-RAG: A comprehensive Chinese-language benchmark that categorizes RAG applications into four types (Create, Read, Update, and Delete) and provides datasets for each category, allowing RAG systems to be evaluated across diverse application scenarios (arXiv).
Ragnarök: A reusable RAG framework that provides baselines for the TREC 2024 Retrieval-Augmented Generation Track. It aims to standardize the evaluation of RAG systems and includes a web-based interface for interactive benchmarking (arXiv).
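
To make the TRACe metrics concrete, here is a minimal sketch of how per-example scores in that spirit could be computed. It assumes sentence-level annotations marking which context sentences are relevant and which the answer actually used; the data structures and function names are illustrative assumptions, not the official RAGBench/TRACe implementation.

```python
from dataclasses import dataclass

@dataclass
class AnnotatedExample:
    """Illustrative stand-in for a RAGBench-style annotated example (hypothetical schema)."""
    context_sentences: list[str]
    relevant: set[int]              # indices of context sentences judged relevant to the query
    utilized: set[int]              # indices of context sentences the response actually drew on
    response_supported: list[bool]  # per response sentence: is it grounded in the context?

def trace_style_scores(ex: AnnotatedExample) -> dict[str, float]:
    """Compute TRACe-inspired metrics for one example (an approximation, not the official code)."""
    n_ctx = len(ex.context_sentences)
    relevance = len(ex.relevant) / n_ctx if n_ctx else 0.0    # how much of the context is relevant
    utilization = len(ex.utilized) / n_ctx if n_ctx else 0.0  # how much of the context the answer used
    completeness = (len(ex.relevant & ex.utilized) / len(ex.relevant)
                    if ex.relevant else 1.0)                   # relevant context reflected in the answer
    adherence = 1.0 if all(ex.response_supported) else 0.0     # every answer sentence is grounded
    return {"relevance": relevance, "utilization": utilization,
            "completeness": completeness, "adherence": adherence}
```

For instance, an answer that draws on only one of three relevant context sentences would score completeness of roughly 0.33, even if everything it does say is grounded (adherence 1.0).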
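For multi-hop evaluation of the kind MultiHop-RAG targets, a common retrieval-side check is whether all gold evidence pieces appear among the top-k retrieved chunks. The sketch below shows that check under an assumed data layout (gold evidence chunk IDs per query); it is not the benchmark's official harness.

```python
def evidence_recall_at_k(
    retrieved_ids: dict[str, list[str]],   # query id -> ranked list of retrieved chunk ids
    gold_evidence: dict[str, set[str]],    # query id -> chunk ids required to answer the query
    k: int = 10,
) -> float:
    """Fraction of queries whose full evidence set is covered by the top-k retrieved chunks."""
    hits = 0
    for qid, gold in gold_evidence.items():
        top_k = set(retrieved_ids.get(qid, [])[:k])
        if gold and gold <= top_k:  # multi-hop: *all* evidence pieces must be present
            hits += 1
    return hits / len(gold_evidence) if gold_evidence else 0.0
```

Requiring the full evidence set, rather than any single supporting chunk, is what separates a multi-hop retrieval check from a standard single-hop recall metric.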
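LegalBench-RAG scores retrieval at the level of precise text spans rather than whole documents. The sketch below computes character-level precision and recall between retrieved snippets and gold reference spans; the span representation is an assumption for illustration, not the benchmark's released code.

```python
def span_precision_recall(
    retrieved_spans: list[tuple[int, int]],  # (start, end) character offsets of retrieved snippets
    gold_spans: list[tuple[int, int]],       # (start, end) offsets of the gold legal references
) -> tuple[float, float]:
    """Character-level precision/recall of retrieved text against gold reference spans."""
    retrieved_chars: set[int] = set()
    for start, end in retrieved_spans:
        retrieved_chars.update(range(start, end))
    gold_chars: set[int] = set()
    for start, end in gold_spans:
        gold_chars.update(range(start, end))
    overlap = len(retrieved_chars & gold_chars)
    precision = overlap / len(retrieved_chars) if retrieved_chars else 0.0
    recall = overlap / len(gold_chars) if gold_chars else 0.0
    return precision, recall
```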