---
title: Ltu Chat
emoji: 🏢
colorFrom: pink
colorTo: red
sdk: streamlit
sdk_version: 1.43.0
app_file: app.py
pinned: false
---
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
# LTU Chat RAG Evaluation
This repository contains a RAG (Retrieval-Augmented Generation) pipeline for the LTU (Luleå University of Technology) programme data, along with evaluation tools using Ragas.
## Overview
The system uses:
- **Qdrant**: Vector database for storing and retrieving embeddings
- **Haystack**: Framework for building the RAG pipeline
- **Ragas**: Framework for evaluating RAG systems
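The rough shape of the retrieval side of such a pipeline is sketched below using the Haystack 2.x and qdrant-haystack APIs. The collection name, storage path, model, and query string are illustrative assumptions; the actual wiring lives in `rag_pipeline.py`.

```python
# Minimal retrieval-side sketch (assumed names, not taken from rag_pipeline.py).
from haystack import Pipeline
from haystack.components.embedders import OpenAITextEmbedder
from haystack.utils import Secret
from haystack_integrations.components.retrievers.qdrant import QdrantEmbeddingRetriever
from haystack_integrations.document_stores.qdrant import QdrantDocumentStore

document_store = QdrantDocumentStore(
    path="./qdrant_data",      # local, file-based Qdrant storage
    index="ltu_programmes",    # collection holding the programme documents
    embedding_dim=4096,        # adjust to the embedding model's vector size
)

query_pipeline = Pipeline()
query_pipeline.add_component(
    "embedder",
    OpenAITextEmbedder(
        model="BAAI/bge-en-icl",
        api_base_url="https://api.studio.nebius.com/v1/",
        api_key=Secret.from_env_var("NEBIUS_API_KEY"),
    ),
)
query_pipeline.add_component(
    "retriever", QdrantEmbeddingRetriever(document_store=document_store, top_k=5)
)
query_pipeline.connect("embedder.embedding", "retriever.query_embedding")

result = query_pipeline.run({"embedder": {"text": "Which master's programmes are taught in English?"}})
for doc in result["retriever"]["documents"]:
    print(doc.content[:100])
```

The generation step would follow the same pattern: a prompt builder and an LLM component connected after the retriever.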
## Files
- `rag_pipeline.py`: Main RAG pipeline implementation
- `ragas_eval.py`: Script to evaluate the RAG pipeline using Ragas
- `testset.json`: JSONL file containing test questions, reference answers, and contexts
- `testset_generation.py`: Script used to generate the test set
## Requirements
```
streamlit==1.42.2
haystack-ai==2.10.3
qdrant-client==1.13.2
python-dotenv==1.0.1
beautifulsoup4==4.13.3
qdrant-haystack==8.0.0
ragas-haystack==2.1.0
rapidfuzz==3.12.2
pandas
```
## Setup
1. Make sure you have all the required packages installed:

   ```bash
   pip install -r requirements.txt
   ```

2. Set up your environment variables (optional):

   ```bash
   export NEBIUS_API_KEY="your_api_key_here"
   ```

   If not set, the script will use the default API key included in the code.
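Since `python-dotenv` is pinned in the requirements, the key can also be kept in a local `.env` file. A minimal loading sketch, assuming the script calls `load_dotenv()` and falls back to a constant baked into the code (the fallback value here is a placeholder):

```python
# Resolve the Nebius API key from the environment or a local .env file.
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env if present; does not override variables already set
api_key = os.getenv("NEBIUS_API_KEY", "default-key-embedded-in-the-code")  # placeholder fallback
```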
## Running the Evaluation
To evaluate the RAG pipeline using Ragas:
```bash
python ragas_eval.py
```
This will:
1. Load the Qdrant document store from the local directory
2. Load the test set from `testset.json`
3. Run the RAG pipeline on each test question
4. Evaluate the results using Ragas metrics
5. Save the evaluation results to `ragas_evaluation_results.json`
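A sketch of that flow is shown below; `run_rag_pipeline()` is a hypothetical stand-in for the real pipeline call in `rag_pipeline.py`.

```python
# Rough shape of ragas_eval.py's workflow (names are assumptions).
import json

def run_rag_pipeline(question: str) -> tuple[str, list[str]]:
    """Placeholder: return a generated answer and the retrieved context strings."""
    return "generated answer", ["retrieved context"]

# Step 2: load the JSONL test set.
with open("testset.json", encoding="utf-8") as f:
    test_cases = [json.loads(line) for line in f if line.strip()]

# Step 3: answer every test question and keep the retrieved contexts.
records = []
for case in test_cases:
    answer, contexts = run_rag_pipeline(case["user_input"])
    records.append({
        "question": case["user_input"],
        "answer": answer,
        "contexts": contexts,
        "ground_truth": case["reference"],
    })

# Step 4 (scoring with Ragas) is sketched in the next section; step 5 then writes
# the output to ragas_evaluation_results.json (the raw records are dumped here for brevity).
with open("ragas_evaluation_results.json", "w", encoding="utf-8") as f:
    json.dump(records, f, ensure_ascii=False, indent=2)
```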
## Ragas Metrics
The evaluation uses the following Ragas metrics:
- **Faithfulness**: Measures if the generated answer is factually consistent with the retrieved contexts
- **Answer Relevancy**: Measures if the answer is relevant to the question
- **Context Precision**: Measures the proportion of retrieved contexts that are relevant
- **Context Recall**: Measures if the retrieved contexts contain the information needed to answer the question
- **Context Relevancy**: Measures the relevance of retrieved contexts to the question
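These names correspond to Ragas metric objects. A minimal scoring sketch using the classic `datasets`-based Ragas interface follows; metric and column names have changed across Ragas releases, so treat the exact identifiers as assumptions rather than the script's actual code.

```python
# Illustrative only: metric and column names vary between Ragas releases,
# and evaluate() needs a judge LLM configured (OpenAI by default via OPENAI_API_KEY).
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (
    answer_relevancy,
    context_precision,
    context_recall,
    faithfulness,
)

# One toy record with the columns this interface expects.
dataset = Dataset.from_dict({
    "question": ["Example question?"],
    "answer": ["Example generated answer."],
    "contexts": [["Example retrieved context passage."]],
    "ground_truth": ["Example reference answer."],
})

scores = evaluate(
    dataset,
    metrics=[faithfulness, answer_relevancy, context_precision, context_recall],
)
print(scores)  # per-metric averages; scores.to_pandas() gives per-sample values
```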
## Customization
You can customize the evaluation by modifying the `RAGEvaluator` class parameters:
```python
evaluator = RAGEvaluator(
    embedding_model_name="BAAI/bge-en-icl",
    llm_model_name="meta-llama/Llama-3.3-70B-Instruct",
    qdrant_path="./qdrant_data",
    api_base_url="https://api.studio.nebius.com/v1/",
    collection_name="ltu_programmes"
)
```
## Test Set Format
The test set is a JSONL file where each line contains:
- `user_input`: The question
- `reference`: The reference answer
- `reference_contexts`: List of reference contexts that should be retrieved
- `synthesizer_name`: Name of the synthesizer used to generate the reference answer
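For reference, a single line of the file looks roughly like this (all values are made-up placeholders):

```json
{"user_input": "Example question about an LTU programme?", "reference": "Example reference answer.", "reference_contexts": ["Example context passage from the programme pages."], "synthesizer_name": "example_synthesizer"}
```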