---
title: Ltu Chat
emoji: 🏢
colorFrom: pink
colorTo: red
sdk: streamlit
sdk_version: 1.43.0
app_file: app.py
pinned: false
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# LTU Chat RAG Evaluation

This repository contains a RAG (Retrieval-Augmented Generation) pipeline for LTU (Luleå University of Technology) programme data, along with tools for evaluating it with Ragas.

## Overview

The system uses:
- **Qdrant**: Vector database for storing and retrieving embeddings
- **Haystack**: Framework for building the RAG pipeline
- **Ragas**: Framework for evaluating RAG systems
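
As a rough sketch of how these pieces fit together (the actual wiring lives in `rag_pipeline.py` and may differ), a Haystack query pipeline backed by the local Qdrant store might look like the following. The model names, Nebius endpoint, Qdrant path, and collection name are taken from the Customization section below; the prompt template, the `embedding_dim` value, and the example question are illustrative assumptions:

```python
from haystack import Pipeline
from haystack.utils import Secret
from haystack.components.embedders import OpenAITextEmbedder
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from haystack_integrations.document_stores.qdrant import QdrantDocumentStore
from haystack_integrations.components.retrievers.qdrant import QdrantEmbeddingRetriever

# Local Qdrant store holding the LTU programme embeddings.
document_store = QdrantDocumentStore(
    path="./qdrant_data",
    index="ltu_programmes",
    embedding_dim=4096,  # assumed vector size for BAAI/bge-en-icl; adjust to your collection
)

prompt_template = """
Answer the question using only the context below.
Context:
{% for doc in documents %}{{ doc.content }}
{% endfor %}
Question: {{ question }}
"""

pipeline = Pipeline()
pipeline.add_component("embedder", OpenAITextEmbedder(
    model="BAAI/bge-en-icl",
    api_base_url="https://api.studio.nebius.com/v1/",
    api_key=Secret.from_env_var("NEBIUS_API_KEY"),
))
pipeline.add_component("retriever", QdrantEmbeddingRetriever(document_store=document_store))
pipeline.add_component("prompt", PromptBuilder(template=prompt_template))
pipeline.add_component("llm", OpenAIGenerator(
    model="meta-llama/Llama-3.3-70B-Instruct",
    api_base_url="https://api.studio.nebius.com/v1/",
    api_key=Secret.from_env_var("NEBIUS_API_KEY"),
))

# Embed the query, retrieve matching documents, build the prompt, generate the answer.
pipeline.connect("embedder.embedding", "retriever.query_embedding")
pipeline.connect("retriever.documents", "prompt.documents")
pipeline.connect("prompt.prompt", "llm.prompt")

question = "Which master's programmes does LTU offer?"
result = pipeline.run({"embedder": {"text": question}, "prompt": {"question": question}})
print(result["llm"]["replies"][0])
```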

## Files

- `rag_pipeline.py`: Main RAG pipeline implementation
- `ragas_eval.py`: Script to evaluate the RAG pipeline using Ragas
- `testset.json`: JSONL file containing test questions, reference answers, and contexts
- `testset_generation.py`: Script used to generate the test set

## Requirements

```
streamlit==1.42.2
haystack-ai==2.10.3
qdrant-client==1.13.2
python-dotenv==1.0.1
beautifulsoup4==4.13.3
qdrant-haystack==8.0.0
ragas-haystack==2.1.0
rapidfuzz==3.12.2
pandas
```

## Setup

1. Make sure you have all the required packages installed:
   ```
   pip install -r requirements.txt
   ```

2. Set up your environment variables (optional):
   ```
   export NEBIUS_API_KEY="your_api_key_here"
   ```
   If not set, the script will use the default API key included in the code.
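
For reference, this fallback typically looks roughly like the snippet below (a sketch assuming `python-dotenv`, which is already in the requirements); `DEFAULT_NEBIUS_API_KEY` is a placeholder for whatever default the script ships with:

```python
import os
from dotenv import load_dotenv

# Load variables from a local .env file, if one exists.
load_dotenv()

# Prefer the environment variable; fall back to the key bundled in the code.
DEFAULT_NEBIUS_API_KEY = "replace-with-bundled-default"  # placeholder, not a real key
api_key = os.getenv("NEBIUS_API_KEY", DEFAULT_NEBIUS_API_KEY)
```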

## Running the Evaluation

To evaluate the RAG pipeline using Ragas:

```bash
python ragas_eval.py
```

This will:
1. Load the Qdrant document store from the local directory
2. Load the test set from `testset.json`
3. Run the RAG pipeline on each test question
4. Evaluate the results using Ragas metrics
5. Save the evaluation results to `ragas_evaluation_results.json`
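
In outline, the evaluation loop amounts to the following (a simplified sketch, not the exact contents of `ragas_eval.py`; `run_rag_pipeline` is a hypothetical stand-in for whatever entry point `rag_pipeline.py` exposes):

```python
import json


def run_rag_pipeline(question: str) -> tuple[str, list[str]]:
    """Hypothetical stand-in for the pipeline defined in rag_pipeline.py."""
    raise NotImplementedError("wire this to the real pipeline")


# Load the test set (one JSON object per line).
with open("testset.json", encoding="utf-8") as f:
    testset = [json.loads(line) for line in f if line.strip()]

records = []
for sample in testset:
    # Run the RAG pipeline on each test question.
    answer, retrieved_contexts = run_rag_pipeline(sample["user_input"])
    records.append({
        "user_input": sample["user_input"],
        "response": answer,
        "retrieved_contexts": retrieved_contexts,
        "reference": sample["reference"],
    })

# In ragas_eval.py these records are then scored with the Ragas metrics
# listed below and the scores are written to ragas_evaluation_results.json;
# here we simply persist the collected records as a placeholder.
with open("ragas_evaluation_results.json", "w", encoding="utf-8") as f:
    json.dump(records, f, ensure_ascii=False, indent=2)
```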

## Ragas Metrics

The evaluation uses the following Ragas metrics:

- **Faithfulness**: Measures whether the generated answer is factually consistent with the retrieved contexts
- **Answer Relevancy**: Measures how relevant the answer is to the question
- **Context Precision**: Measures the proportion of retrieved contexts that are relevant to the question
- **Context Recall**: Measures whether the retrieved contexts contain the information needed to answer the question
- **Context Relevancy**: Measures the relevance of the retrieved contexts to the question

## Customization

You can customize the evaluation by modifying the `RAGEvaluator` class parameters:

```python
evaluator = RAGEvaluator(
    embedding_model_name="BAAI/bge-en-icl",
    llm_model_name="meta-llama/Llama-3.3-70B-Instruct",
    qdrant_path="./qdrant_data",
    api_base_url="https://api.studio.nebius.com/v1/",
    collection_name="ltu_programmes"
)
```

## Test Set Format

The test set is a JSONL file where each line contains:
- `user_input`: The question
- `reference`: The reference answer
- `reference_contexts`: List of reference contexts that should be retrieved
- `synthesizer_name`: Name of the synthesizer used to generate the reference answer
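
An illustrative line, just to show the shape of each record (all values here are made up, not taken from the actual test set):

```json
{"user_input": "What specialisations does the programme offer?", "reference": "The programme offers specialisations in ...", "reference_contexts": ["...programme page text used as ground-truth context..."], "synthesizer_name": "example_synthesizer"}
```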