---
license: mit
tags:
- sparseflow
- sparse-attention
- conversational
- efficient
---
# SparseFlow-Chat v5

An efficient conversational AI with **sparse attention**, skipping 87.5% of attention computation.
## 🚀 Performance

| Metric | Value |
|--------|-------|
| Parameters | 39,840,002 |
| Perplexity | 1.00 |
| Token Sparsity | 87.5% |
| Attention Saved | 87.5% |
## 🏗️ Architecture

- **Sparse Token Router**: O(n×k) instead of O(n²) attention
- **Persistent Memory Banks**: store and retrieve knowledge
- **Channel Sparsity**: activates only the top-k channels
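The idea behind the token router can be sketched with a toy top-k attention in NumPy. This is illustrative only, not this repo's actual router: the function name and shapes are made up, and for clarity the sketch still computes a dense score matrix before selecting, whereas a real O(n×k) router would avoid materializing it.

```python
import numpy as np

def sparse_topk_attention(Q, K, V, k):
    """Each query attends only to its k highest-scoring keys, so the
    softmax and weighted sum cost O(n*k) rather than O(n^2)."""
    n, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)                    # (n, n); dense here for clarity
    topk = np.argpartition(scores, -k, axis=1)[:, -k:]  # k best keys per query
    out = np.zeros_like(V)
    for i in range(n):
        s = scores[i, topk[i]]
        w = np.exp(s - s.max())
        w /= w.sum()                                 # softmax over only k keys
        out[i] = w @ V[topk[i]]                      # weighted sum of k values
    return out

rng = np.random.default_rng(0)
n, d, k = 16, 8, 2                                   # k = n/8 -> 87.5% token sparsity
Q, K, V = rng.normal(size=(3, n, d))
out = sparse_topk_attention(Q, K, V, k)
print(out.shape)  # -> (16, 8)
```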
### Complexity Comparison

| Operation | Transformer | SparseFlow | Speedup |
|-----------|-------------|------------|---------|
| Attention | O(n²) | O(n×k) | 8x |
| FFN | O(n×d²) | O(n×k×d) | ~4x |
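The 8x attention speedup follows directly from the 87.5% token sparsity reported above, since each query keeps only k = n/8 keys. A quick sanity check (the sequence length n = 1024 is chosen arbitrarily for the arithmetic):

```python
n = 1024                        # example sequence length (arbitrary)
sparsity = 0.875                # token sparsity from the table above
k = int(n * (1 - sparsity))     # keys kept per query: 128
dense_ops = n * n               # dense attention cost: O(n^2)
sparse_ops = n * k              # sparse attention cost: O(n*k)
print(dense_ops // sparse_ops)  # -> 8
```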
## 💬 Usage

```python
import torch

# Load the checkpoint onto CPU
checkpoint = torch.load("model.pt", map_location="cpu")

# ... initialize the model with config.json, then restore its weights
model.load_state_dict(checkpoint["model"])
model.eval()

# Chat
response = chat("What is the capital of France?")
# -> "The capital of France is Paris."
```
## 📝 Created By

**Logo (Mike Amega)** — [Ame Web Studio](https://github.com/AmeWebStudio)

February 2025