added config in readme
README.md
CHANGED
@@ -1,102 +1,114 @@
---
title: Truthfulness Checker
emoji: 📰
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.4.0
app_file: app.py
pinned: false
license: apache-2.0
---

### **Implementation Steps: Validating Information with Context**

Validating the accuracy, or degree of truthfulness, of a piece of information requires **context**: factual, relevant details surrounding the claim. Here’s how we approach the process step by step:

---

### **Step 1: Retrieving Context from a Knowledge Graph Substitute: FAISS with Semantic Search**
Instead of relying on a **traditional Knowledge Graph (KG)**, we use **FAISS (Facebook AI Similarity Search)**, a library for fast similarity search over dense vectors, as a **scalable and flexible alternative** for semantic search.

#### **Why FAISS is Better than a Traditional KG**
1. **Sentence-Level Retrieval**: Unlike traditional KGs, which often rely on pre-defined **entities and relationships**, FAISS searches dense **embeddings**, so it can match the **semantic meaning** of entire sentences directly.
2. **Scalable and High-Speed Retrieval**: FAISS efficiently handles **millions of embeddings**, making it suitable for real-world workloads.
3. **Flexibility**: It works with embeddings of **unstructured text**, removing the often time-consuming need to pre-process information into entities and relations.
4. **Generalization**: FAISS supports **approximate nearest neighbor (ANN) search**, which retrieves contextually related results even when they are not exact matches.

#### **Dataset Used**
We use the **News Category Dataset** ([Kaggle Link](https://www.kaggle.com/datasets/rmisra/news-category-dataset)), which contains **news headlines and short descriptions** across various categories.

- **Why This Dataset?**
  - It covers a **wide range of topics**, making it useful for general-purpose context building.
  - Headlines and descriptions provide rich text for **semantic embeddings** and similarity search.
  - Categories allow filtering relevant results if required (e.g., "science" or "technology").
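
As a rough illustration of how the dataset could be turned into texts for embedding, the sketch below reads a JSON-lines file and joins each headline with its short description. The field names `headline`, `short_description`, and `category`, the `": "` separator, and the helper name `load_news_texts` are assumptions made for this example, so check them against your copy of the dataset.

```python
import json
from pathlib import Path


def load_news_texts(path, categories=None):
    """Return one 'headline: description' string per record in a JSON-lines file.

    Assumed record layout: a JSON object per line with `headline`,
    `short_description`, and `category` keys.
    """
    # Compare categories case-insensitively so "science" matches "SCIENCE".
    wanted = {c.lower() for c in categories} if categories else None
    texts = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            if wanted and record.get("category", "").lower() not in wanted:
                continue  # optional category filter
            headline = record.get("headline", "").strip()
            description = record.get("short_description", "").strip()
            text = ": ".join(part for part in (headline, description) if part)
            if text:
                texts.append(text)
    return texts
```

Calling `load_news_texts(path, categories=["science"])` would keep only records whose category matches, which is one way to apply the category filtering mentioned above.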

**Process:**
1. We use **SentenceTransformer (all-MiniLM-L6-v2)** to generate embeddings for the query (the input news).
2. We search against pre-computed embeddings stored in a **FAISS index** to retrieve the **top-K most relevant entries**.
3. These results form the **initial context**, capturing related information already present in the dataset.
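
The retrieval steps above can be sketched as follows. This is a minimal sketch, not the project's actual code: the function names are hypothetical, and it assumes an exact inner-product index (`faiss.IndexFlatIP`) over normalized embeddings, which gives cosine similarity. An approximate index type could be swapped in for larger corpora.

```python
def build_index(texts, model_name="all-MiniLM-L6-v2"):
    """Embed `texts` and store them in an exact inner-product FAISS index."""
    # Imported lazily so the module can be loaded without the heavy dependencies.
    import faiss
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    # Normalized vectors make inner product equal to cosine similarity.
    embeddings = model.encode(
        texts, convert_to_numpy=True, normalize_embeddings=True
    ).astype("float32")
    index = faiss.IndexFlatIP(embeddings.shape[1])
    index.add(embeddings)
    return model, index


def top_k(query, model, index, texts, k=5):
    """Return up to `k` (text, similarity) pairs for `query`, best match first."""
    query_vec = model.encode(
        [query], convert_to_numpy=True, normalize_embeddings=True
    ).astype("float32")
    scores, ids = index.search(query_vec, min(k, len(texts)))
    # FAISS pads missing results with id -1, so those are skipped.
    return [(texts[i], float(s)) for s, i in zip(scores[0], ids[0]) if i != -1]
```

In practice the index would be built once (or loaded from disk) and `top_k` called per query; the returned texts become the initial context.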

---

### **Step 2: Online Search for Real-Time Context**
To **augment** the context retrieved from FAISS, we incorporate **real-time online search** using an API.

#### **Why Is Online Search Critical?**
- **Fresh Information**: News and facts evolve, especially in areas like **science, technology, or politics**. Online search gives access to the **latest updates** that may not exist in the static dataset.
- **Diverse Sources**: It broadens the scope by pulling information from **multiple sources**, which helps reduce bias and improve reliability.
- **Fact-Checking**: Search engines often index **trusted fact-checking websites** that we can incorporate into the context.

**Process:**
1. Call a search API with a **search query** derived from the input news.
2. Retrieve relevant snippets, headlines, or summaries.
3. Append these results to the **context** built using FAISS.
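
Because the search API is not specified here, the sketch below keeps it abstract: `search_fn` is a hypothetical wrapper around whichever API is used, assumed to return a list of dicts with `title` and `snippet` keys. The query-building rule (first few words, punctuation removed) is likewise only an illustrative choice.

```python
import re


def build_search_query(news, max_words=12):
    """Drop punctuation and keep the first `max_words` words as the query."""
    words = re.findall(r"\w+", news)
    return " ".join(words[:max_words])


def collect_snippets(news, search_fn, max_results=5):
    """Query `search_fn` and return 'title: snippet' strings.

    `search_fn(query)` is a hypothetical callable assumed to return a list
    of dicts with `title` and `snippet` keys.
    """
    query = build_search_query(news)
    results = search_fn(query)[:max_results]
    snippets = []
    for item in results:
        title = item.get("title", "").strip()
        snippet = item.get("snippet", "").strip()
        text = ": ".join(part for part in (title, snippet) if part)
        if text:
            snippets.append(text)
    return snippets
```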

---

### **Step 3: Building Context from Combined Sources**
The **FAISS retrieval results** and the **online search results** are combined into a **single context string**. This provides a **comprehensive knowledge base** around the input information.

- **Why Combine Both?**
  - FAISS offers **pre-indexed knowledge**, ideal for **static facts** or concepts.
  - Online search complements it with **dynamic and up-to-date insights**, useful for verifying **recent developments**.

This layered context improves the model’s ability to assess the **truthfulness** of the given information.
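
One possible way to merge the two sources into a single string is sketched below. The deduplication rule and the `max_chars` cap are assumptions for illustration; a cap of some kind is useful because classification models accept only a limited input length.

```python
def build_context(faiss_hits, web_snippets, max_chars=2000):
    """Merge both sources into one string, dropping duplicates and capping length."""
    seen = set()
    parts = []
    # FAISS hits come first, followed by online snippets.
    for text in list(faiss_hits) + list(web_snippets):
        cleaned = " ".join(text.split())  # collapse runs of whitespace
        key = cleaned.lower()             # case-insensitive duplicate check
        if cleaned and key not in seen:
            seen.add(key)
            parts.append(cleaned)
    return " ".join(parts)[:max_chars]
```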

---

### **Step 4: Truthfulness Prediction with Zero-Shot Classification Model**
We use **facebook/bart-large-mnli**, a BART-Large model fine-tuned on MNLI and commonly used for **zero-shot classification**, for evaluation.

#### **Why BART-Large-MNLI?**
1. **Zero-Shot Capability**: It can handle claims and hypotheses without **task-specific training**, which suits this flexible, multi-domain use case.
2. **Contextual Matching**: It compares the input claim (news) with the constructed context to assess **semantic consistency**.
3. **High Accuracy**: It is fine-tuned on a **natural language inference** dataset (MultiNLI), making it adept at understanding relationships like **entailment** and **contradiction**.
4. **Multi-Label Support**: It can evaluate multiple labels simultaneously, which is useful for **degrees of truthfulness**.

**Process:**
1. Input the **news** as the claim and the **context** as the hypothesis.
2. Compute a **truthfulness score** between **0 and 1**, where:
   - **0**: Completely **false**.
   - **1**: Completely **true**.
3. Generate **explanations** based on the score and suggest actions (e.g., further verification if uncertain).
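
The scoring step could look roughly like the sketch below, which uses the Hugging Face `transformers` zero-shot pipeline. How the claim and context are combined, the candidate labels `"true"`/`"false"`, and the hypothesis template are all assumptions made for this example, so the project's real prompt layout may differ.

```python
def load_classifier():
    """Create a zero-shot pipeline backed by facebook/bart-large-mnli."""
    # Imported lazily because loading transformers and the model is expensive.
    from transformers import pipeline

    return pipeline("zero-shot-classification", model="facebook/bart-large-mnli")


def truthfulness_score(claim, context, classifier):
    """Return the score in [0, 1] that the classifier assigns to the label 'true'."""
    # Assumed layout: context and claim are placed in one input sequence.
    sequence = f"Context: {context}\nClaim: {claim}"
    result = classifier(
        sequence,
        candidate_labels=["true", "false"],
        hypothesis_template="The claim is {}.",
        multi_label=False,  # scores over the two labels sum to 1
    )
    # The pipeline returns labels sorted by score, so look up "true" by name.
    scores = dict(zip(result["labels"], result["scores"]))
    return float(scores["true"])
```

A typical call would be `truthfulness_score(news, context, load_classifier())`, with the classifier created once and reused across requests.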

---

### **End-to-End Example**
**Input News:**
"Scientists Demonstrate 'Negative Time' In Groundbreaking Quantum Experiment."

**Context Built:**
- **FAISS Search:** Finds prior research on **quantum time reversal** and **entanglement theories**.
- **Online Search:** Retrieves recent articles discussing **quantum breakthroughs** and expert views.

**Model Evaluation:**
- The model compares the news with the combined context and outputs:
  **Score: 0.72** (Likely true).

**Result Explanation:**
```plaintext
News: "Scientists Demonstrate 'Negative Time' In Groundbreaking Quantum Experiment."
Truthfulness Score: 0.72 (Likely true)
Analysis: You can reasonably trust this information, but further verification is always recommended for critical decisions.
```
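
A report in the format shown above could be produced by a small helper like the one below. The score thresholds (0.7 and 0.4) and the wording for the "Uncertain" and "Likely false" bands are hypothetical placeholders; only the "Likely true" text is taken from the example.

```python
def explain(news, score):
    """Format a three-line report for a truthfulness score in [0, 1]."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    # Hypothetical thresholds; tune them for the application.
    if score >= 0.7:
        label = "Likely true"
        analysis = (
            "You can reasonably trust this information, but further "
            "verification is always recommended for critical decisions."
        )
    elif score >= 0.4:
        label = "Uncertain"
        analysis = "The evidence is mixed; verify with additional sources."
    else:
        label = "Likely false"
        analysis = "The context does not support this claim; treat it with caution."
    return (
        f'News: "{news}"\n'
        f"Truthfulness Score: {score:.2f} ({label})\n"
        f"Analysis: {analysis}"
    )
```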

---

### **Why This Approach Works**
1. **Balanced Context**: Combines static knowledge (KG substitute) and dynamic knowledge (real-time search).
2. **Model Flexibility**: The zero-shot model adapts to diverse topics without retraining.
3. **Scalable and Cost-Effective**: Uses pre-trained models, FAISS indexing, and simple APIs for implementation.
4. **Interpretability**: Outputs include confidence scores and explanations for transparency.

This modular approach ensures that the **truthfulness assessment** is **scalable**, **explainable**, and **adaptable** to new domains.