
🧠 RAG Architecture & Vector Embeddings

Overview

GigMatch AI uses Retrieval-Augmented Generation (RAG) with vector embeddings to perform intelligent semantic matching between workers and gigs. This goes far beyond simple keyword matching!

🏗️ Architecture

┌─────────────────────────────────────────────────────────────┐
│                    DATA INGESTION                            │
├─────────────────────────────────────────────────────────────┤
│  50 Workers + 50 Gigs (JSON)                                │
│         ↓                                                     │
│  Text Enrichment (skills, bio, location, etc.)             │
│         ↓                                                     │
│  HuggingFace Embeddings (all-MiniLM-L6-v2)                 │
│         ↓                                                     │
│  Vector Storage (ChromaDB)                                   │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│                    QUERY PIPELINE                            │
├─────────────────────────────────────────────────────────────┤
│  User Query (worker profile or gig post)                    │
│         ↓                                                     │
│  Convert to Search Query                                     │
│         ↓                                                     │
│  Embed Query (HuggingFace)                                  │
│         ↓                                                     │
│  Semantic Search (Vector Similarity)                        │
│         ↓                                                     │
│  Retrieve Top K Results                                      │
│         ↓                                                     │
│  Calculate Match Scores                                      │
│         ↓                                                     │
│  Return Results to Agent                                     │
└─────────────────────────────────────────────────────────────┘

🦙 LlamaIndex Integration

Why LlamaIndex?

  1. Sponsor Recognition - LlamaIndex is a hackathon sponsor 🎉
  2. Production-Ready - Battle-tested RAG framework
  3. Easy Integration - Simple API for vector operations
  4. Flexible - Supports multiple vector stores and embeddings

Implementation

from llama_index.core import VectorStoreIndex, Document, StorageContext
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.vector_stores.chroma import ChromaVectorStore

# Initialize embedding model
embed_model = HuggingFaceEmbedding(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

# Create documents with rich text
worker_doc = Document(
    text=f"Name: {name}, Skills: {skills}, Location: {location}...",
    metadata=worker_data
)

# Wrap the Chroma vector store in a storage context so the index writes to it
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Create vector index
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    embed_model=embed_model
)

# Query
query_engine = index.as_query_engine(similarity_top_k=5)
response = query_engine.query("Looking for plumber in Rome...")

🤗 HuggingFace Embeddings

Model: all-MiniLM-L6-v2

Why this model?

  • ✅ Fast inference (only 23M parameters)
  • ✅ Good quality embeddings (384 dimensions)
  • ✅ Pre-trained on semantic similarity
  • ✅ HuggingFace sponsor recognition 🤗

Performance:

  • Embedding time: ~20ms per text
  • Vector size: 384 dimensions
  • Cosine similarity for matching
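A quick sanity check of the vector size (a minimal sketch, reusing the embed_model created in the snippet above; the sample text is arbitrary):

vec = embed_model.get_text_embedding("pipe repair in Rome")
print(len(vec))  # 384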

How Embeddings Work

  1. Text → Vector: Each worker/gig is converted to a 384-dimensional vector
  2. Semantic Meaning: Similar meanings = similar vectors
  3. Cosine Similarity: The cosine of the angle between vectors; for these sentence embeddings, scores typically fall between 0 and 1 (higher = more similar)
  4. Top K: Return K most similar vectors

Example:

text1 = "Experienced plumber, pipe repair, Rome"
text2 = "Looking for plumbing services, leak fix, Rome"

# After embedding:
vec1 = [0.23, -0.45, 0.67, ...]  # 384 dimensions
vec2 = [0.21, -0.43, 0.69, ...]  # 384 dimensions

# Cosine similarity: 0.94 (very similar!)

📊 ChromaDB Vector Store

Why ChromaDB?

  • ✅ Simple local setup (no server needed)
  • ✅ Fast vector search
  • ✅ Native Python API
  • ✅ Persistence support
  • ✅ Perfect for demo/hackathon

Collections

Workers Collection:

  • 50 worker profiles
  • Indexed by skills, experience, location
  • Searchable by semantic similarity

Gigs Collection:

  • 50 gig posts
  • Indexed by requirements, project details
  • Searchable by semantic similarity
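A minimal sketch of how these two collections might be created with a persistent ChromaDB client (the storage path here is an assumption; the collection names match those used elsewhere in this doc):

import chromadb

# A persistent client keeps the vectors on disk between runs
chroma_client = chromadb.PersistentClient(path="./chroma_db")

workers_collection = chroma_client.get_or_create_collection("workers")
gigs_collection = chroma_client.get_or_create_collection("gigs")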

🎯 Semantic Matching Algorithm

Traditional Keyword Matching (OLD)

# Problem: Only finds exact keyword matches
if "plumbing" in worker_skills and "plumbing" in gig_requirements:
    score += 1  # Match!

Semantic Matching with RAG (NEW)

# Solution: Understands meaning and context

Query: "Need someone to fix leaking pipes"
Embedding: [0.23, -0.45, 0.67, ...]

Worker 1: "Plumber, pipe repair specialist"
Embedding: [0.21, -0.43, 0.69, ...]
Similarity: 0.94 ← HIGH MATCH!

Worker 2: "Electrician, wiring expert"
Embedding: [-0.11, 0.52, -0.33, ...]
Similarity: 0.12 ← LOW MATCH

# Semantic search finds Worker 1 even though 
# the word "plumbing" wasn't explicitly mentioned!

Advantages

  1. Synonym Understanding: "plumber" ≈ "pipe specialist"
  2. Context Awareness: "fix pipes" ≈ "repair plumbing"
  3. Related Concepts: "garden" ≈ "landscaping" ≈ "outdoor"
  4. Phrasing Tolerance: Slight wording variations map to nearby vectors
  5. Fuzzy Matching: Typos and word-form variations still produce useful matches

🔬 Match Score Calculation

Components

  1. Semantic Similarity (70% weight)

    • Cosine similarity from vector embeddings
    • Range: 0.0 to 1.0
    • Higher = better semantic match
  2. Keyword Overlap (20% weight)

    • Exact skill matches
    • Experience level alignment
    • Calculated as: matched_skills / required_skills
  3. Location Match (10% weight)

    • Geographic proximity
    • Remote work consideration
    • Two levels: 1.0 (same location or remote-friendly) or 0.5 (different)

Final Formula

import numpy as np

def cosine_similarity(a, b):
    # Cosine of the angle between the query and document vectors
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

semantic_score = cosine_similarity(query_vec, doc_vec)
keyword_score = len(matched_skills) / len(required_skills)
location_score = 1.0 if location_match else 0.5

final_score = (
    semantic_score * 0.7 +
    keyword_score * 0.2 +
    location_score * 0.1
) * 100  # Convert to 0-100 scale
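Worked example with hypothetical inputs: a semantic score of 0.94, a keyword score of 0.8 (4 of 5 required skills matched), and a location score of 1.0 give (0.94 × 0.7 + 0.8 × 0.2 + 1.0 × 0.1) × 100 = 91.8.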

📈 Performance & Scalability

Current Setup (Demo)

  • 50 workers + 50 gigs = 100 vectors
  • Average query time: ~100ms
  • Embedding model loaded in memory: ~100MB
  • Total memory usage: ~200MB

Production Scaling

For 10,000 entries:

  • ✅ Still fast (<500ms per query)
  • ✅ ChromaDB handles this scale easily
  • ✅ Consider batch embedding for ingestion (see the sketch below)

For 100,000+ entries:

  • Use hosted vector DB (Pinecone, Weaviate)
  • Batch processing for embeddings
  • Caching layer for frequent queries
  • GPU acceleration for embedding
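A hedged sketch of batch embedding during ingestion, using LlamaIndex's batched embedding call on the documents built earlier (names reused from the snippets above):

# Embed many texts in one batched call instead of one call per document
texts = [doc.text for doc in documents]
vectors = embed_model.get_text_embedding_batch(texts, show_progress=True)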

🎨 Benefits for the Hackathon

Why This is WOW

  1. Not Just LLM Calls: Real vector database with semantic search
  2. Sponsor Integration: LlamaIndex 🦙 + HuggingFace 🤗
  3. Production Patterns: Proper RAG architecture
  4. Scalable: Easy to extend to 1000s of entries
  5. Explainable: Can show similarity scores

Demo Impact

Judges will see:

  • ✅ "Powered by LlamaIndex + HuggingFace"
  • ✅ Semantic similarity scores in results
  • ✅ Better matches than keyword search
  • ✅ 100 entries in vector database
  • ✅ Real-time vector search

🔮 Future Enhancements

Easy Wins

  • Add filters (location, budget, experience)
  • Implement hybrid search (semantic + keyword); see the sketch below
  • Add reranking with cross-encoders
  • Cache popular queries
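One way the hybrid search item might look, as a minimal sketch that blends the semantic score with exact keyword overlap (the function and its alpha weighting are hypothetical, not the current implementation):

def hybrid_score(query_skills, worker_skills, semantic_score, alpha=0.7):
    # Blend vector similarity with exact skill-keyword overlap
    overlap = len(set(query_skills) & set(worker_skills)) / max(len(query_skills), 1)
    return alpha * semantic_score + (1 - alpha) * overlap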

Advanced

  • Fine-tune embedding model on gig data
  • Multi-modal embeddings (add images)
  • Graph relationships between skills
  • Temporal embeddings (availability matching)

📚 Code Examples

Creating the Index

# 1. Load data
workers = load_workers_from_json()

# 2. Create documents
documents = []
for worker in workers:
    text = f"""
    Name: {worker['name']}
    Skills: {', '.join(worker['skills'])}
    Experience: {worker['experience']}
    Location: {worker['location']}
    """
    doc = Document(text=text, metadata=worker)
    documents.append(doc)

# 3. Create vector store (get_or_create avoids errors if the collection exists)
chroma_collection = chroma_client.get_or_create_collection("workers")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# 4. Build index
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    embed_model=embed_model
)

Querying the Index

# 1. Create query
query = f"""
Looking for: {', '.join(required_skills)}
Location: {location}
Experience: {experience_level}
"""

# 2. Get query engine
query_engine = index.as_query_engine(similarity_top_k=5)

# 3. Execute query
response = query_engine.query(query)

# 4. Extract results (each source node carries the matched worker + similarity score)
for node in response.source_nodes:
    worker_data = node.metadata
    similarity_score = node.score
    print(f"Match: {worker_data['name']}, Score: {similarity_score:.2f}")

🎯 Key Takeaways

  1. RAG = Better Matches: Semantic understanding > keyword matching
  2. LlamaIndex = Easy: Production RAG in <100 lines of code
  3. HuggingFace = Quality: Great embeddings, sponsor recognition
  4. ChromaDB = Fast: Local vector store, perfect for demo
  5. Scalable = Future-proof: Architecture works at scale

This is what makes GigMatch AI stand out in the hackathon! 🚀