---
title: NU-CS Policy RAG (Qwen 3B CPU)
emoji: πŸ“š
colorFrom: yellow
colorTo: gray
sdk: gradio
sdk_version: "4.44.0"
app_file: app.py
pinned: false
---

# NU-CS Policy RAG (Qwen 3B β€’ CPU β€’ GGUF) β€” FastAPI Microservice

This Space hosts a **bilingual Retrieval-Augmented Generation (RAG) microservice** for Nile University Computer Science policy FAQs.

- **Retrieval:** FAISS + `intfloat/multilingual-e5-base`
- **LLM:** Qwen2.5-3B-Instruct (quantized **GGUF**) via **llama.cpp** (CPU)
- **API Framework:** FastAPI
- **Endpoint:** `POST /ask`

The service answers questions in **Arabic or English**, strictly grounded in the indexed policy Q&A JSON files. If the answer is not present in the corpus, it returns a fixed "insufficient context" message.

---

## Project Structure

```text
.
β”œβ”€β”€ api.py            # FastAPI app (exposes /ask)
β”œβ”€β”€ app.py            # Entry point (runs uvicorn on port 7860)
β”œβ”€β”€ rag_core.py       # RAG logic: indexing, retrieval, LLM generation
β”œβ”€β”€ requirements.txt
β”œβ”€β”€ README.md
└── data/
    └── pages/        # Policy Q&A JSON files (e.g. 44 files)
```

At runtime, the service also creates:

```text
artifacts/
β”œβ”€β”€ policy.index      # FAISS index
└── policy_docs.pkl   # Serialized passage metadata
models/
└── ...               # Downloaded GGUF model from the Hub
```

## Deploy on Hugging Face Spaces

1. Create a new **Space**
   - Type: **Python** (SDK can stay as shown in the YAML header)
   - Hardware: **CPU Basic** (free tier)
2. Add the following files to the Space repository:
   - `api.py`
   - `app.py`
   - `rag_core.py`
   - `requirements.txt`
   - `README.md`
3. Create the data folder structure:
   ```bash
   mkdir -p data/pages
   ```
4. Commit your **policy JSON files** into `data/pages/` (for example, 44 files like `page_001.json`, `page_002.json`, ...).
5. Push to the Space.
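Step 4 assumes every file in `data/pages/` is valid JSON. The exact schema is defined by `rag_core.py` and not shown in this README, so the `SAMPLE` shape below is purely a hypothetical illustration; the parse check itself works regardless of schema and is worth running before pushing:

```python
import json
from pathlib import Path

# Hypothetical shape of one policy Q&A file -- the real schema used by
# rag_core.py may differ (field names here are illustrative only):
SAMPLE = {
    "page": 1,
    "qa_pairs": [
        {"question_en": "...", "question_ar": "...",
         "answer_en": "...", "answer_ar": "..."},
    ],
}

def validate_pages(data_dir: str = "./data/pages") -> int:
    """Parse every *.json file under data_dir; raises on malformed JSON.

    Returns the number of files that parsed cleanly.
    """
    count = 0
    for path in sorted(Path(data_dir).glob("*.json")):
        json.loads(path.read_text(encoding="utf-8"))
        count += 1
    return count
```

A malformed file will surface here as a `json.JSONDecodeError` naming the file, rather than as an opaque indexing failure on the first Space startup.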
On the first run the Space will:

- Load all JSON files from `./data/pages`
- Build a FAISS index and save it under `./artifacts/`
- Download the GGUF model from the Hub to `./models/`

As long as `app.py` starts a server on port **7860**, Spaces will route traffic to it.

---

### Environment Variables

You can control behavior using environment variables (e.g. in **Settings** β†’ **Variables & secrets** on Hugging Face Spaces):

- `DATA_DIR` β€” path to JSON files (default: `./data/pages`)
- `INDEX_PATH` β€” FAISS index path (default: `./artifacts/policy.index`)
- `DOC_STORE_PATH` β€” pickled documents path (default: `./artifacts/policy_docs.pkl`)
- `EMBED_MODEL` β€” sentence transformer model (default: `intfloat/multilingual-e5-base`)
- `GGUF_REPO_ID` β€” GGUF repo on the Hub (default: `Qwen/Qwen2.5-3B-Instruct-GGUF`)
- `GGUF_FILENAME` β€” GGUF filename (default: `qwen2.5-3b-instruct-q4_k_m.gguf`)
- `TOP_K` β€” default number of passages to retrieve (default: `5`)
- `MAX_CTX_CHARS` β€” max characters of context sent to the LLM (default: `5000`)
- `N_CTX` β€” model context size (default: `4096`)
- `MAX_NEW_TOKENS` β€” max tokens generated by the LLM (default: `140`)

## API Documentation

Once the Space is running, the FastAPI docs are available at:

- Interactive Swagger UI: `https://<owner>-<space-name>.hf.space/docs`
- Raw OpenAPI JSON: `https://<owner>-<space-name>.hf.space/openapi.json`

## Running locally

You can run the microservice on your own machine for development/testing.

### 1. Setup

```bash
python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install -r requirements.txt
```

Prepare data folders:

```bash
mkdir -p data/pages artifacts
# copy your JSON policy files into data/pages/
```

### 2. Start the API

Using uvicorn directly:

```bash
uvicorn api:app --host 0.0.0.0 --port 8000
```

Or via `app.py` (same as HF Spaces):

```bash
python app.py   # listens on port 7860
```
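The environment variables listed earlier apply to local runs as well. As a minimal sketch (assumed, not taken from the actual `rag_core.py`), reading them with their documented defaults looks like this:

```python
import os

# Sketch of reading the documented settings with their defaults.
# The actual rag_core.py may structure this differently.
DATA_DIR = os.getenv("DATA_DIR", "./data/pages")
INDEX_PATH = os.getenv("INDEX_PATH", "./artifacts/policy.index")
DOC_STORE_PATH = os.getenv("DOC_STORE_PATH", "./artifacts/policy_docs.pkl")
EMBED_MODEL = os.getenv("EMBED_MODEL", "intfloat/multilingual-e5-base")
TOP_K = int(os.getenv("TOP_K", "5"))
MAX_NEW_TOKENS = int(os.getenv("MAX_NEW_TOKENS", "140"))
```

Numeric settings arrive as strings from the environment, so cast them explicitly; a stray non-numeric value then fails fast at startup instead of deep inside retrieval.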
### 3. Test the Endpoint

```bash
curl -X POST "http://localhost:8000/ask" \
  -H "Content-Type: application/json" \
  -d '{"question": "What is the attendance policy?", "top_k": 5}'
```

Or open the Swagger UI in your browser:

- `http://localhost:8000/docs`
- or `http://localhost:7860/docs` (if using `python app.py`)
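The same request can be issued from Python with only the standard library. This is a sketch built from the documented request shape (`question`, `top_k`); the response schema is defined by `api.py` and not shown here, so the client just returns the parsed JSON as a dict:

```python
import json
from urllib.request import Request, urlopen

def build_request(question: str, top_k: int = 5,
                  base_url: str = "http://localhost:8000") -> Request:
    """Build the POST /ask request with a JSON body."""
    payload = json.dumps({"question": question, "top_k": top_k}).encode("utf-8")
    return Request(
        f"{base_url}/ask",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def ask(question: str, top_k: int = 5,
        base_url: str = "http://localhost:8000") -> dict:
    """Send the request and return the parsed JSON response as a dict."""
    with urlopen(build_request(question, top_k, base_url)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `ask("What is the attendance policy?")` mirrors the `curl` call above; pass `base_url="http://localhost:7860"` when the service was started via `python app.py`.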