# Atom v1 8B Preview - Usage Examples
## Installation
### Using Transformers
```bash
pip install transformers torch accelerate
```
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
model_name = "vanta-research/atom-v1-8b-preview"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype="auto"
)
```
### Using Ollama (GGUF)
```bash
# Download the GGUF file from the repository
# Create Modelfile (see Modelfile in this repo)
ollama create atom-v1:latest -f Modelfile
ollama run atom-v1:latest
```
## Example Conversations
### Technical Explanation
```python
messages = [
    {"role": "system", "content": "You are Atom, a collaborative thought partner."},
    {"role": "user", "content": "Can you explain gradient descent using a metaphor?"}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=512, do_sample=True, temperature=0.8, top_p=0.9)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
**Expected Style:**
> "Imagine you're hiking down a mountain in thick fog. You can't see the bottom, but you can feel which direction slopes downward beneath your feet. Gradient descent is exactly that—taking small steps in whichever direction feels 'downhill' for your error..."
### Problem-Solving
```python
messages = [
    {"role": "system", "content": "You are Atom, a collaborative thought partner."},
    {"role": "user", "content": "I need to optimize a slow database query. Where should I start?"}
]
```
**Expected Style:**
> "Great question! Before we dive into solutions, let me ask—do you know WHERE it's slow? Like, is it the actual query execution, or is it network latency, or maybe even just returning too much data? That'll totally change our approach..."
### Code Generation
```python
messages = [
    {"role": "system", "content": "You are Atom, a collaborative thought partner."},
    {"role": "user", "content": "Write a Python function to merge two sorted lists."}
]
```
**Expected Style:**
> "Absolutely! Here's a clean way to merge two sorted lists. The key insight is that we can use two pointers and always pick the smaller element:
>
> ```python
> def merge_sorted_lists(list1, list2):
>     result = []
>     i, j = 0, 0
>
>     while i < len(list1) and j < len(list2):
>         if list1[i] <= list2[j]:
>             result.append(list1[i])
>             i += 1
>         else:
>             result.append(list2[j])
>             j += 1
>
>     result.extend(list1[i:])
>     result.extend(list2[j:])
>     return result
> ```
>
> This runs in O(n + m) time since we only look at each element once..."
## Recommended Parameters
### Balanced (Default)
```python
generation_config = {
    "max_new_tokens": 512,
    "temperature": 0.8,
    "top_p": 0.9,
    "top_k": 40,
    "repetition_penalty": 1.1
}
```
### More Creative
```python
generation_config = {
    "max_new_tokens": 512,
    "temperature": 0.95,
    "top_p": 0.95,
    "top_k": 50
}
```
### More Focused
```python
generation_config = {
    "max_new_tokens": 512,
    "temperature": 0.6,
    "top_p": 0.85,
    "top_k": 30
}
```
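Whichever preset you pick, the dictionary can be unpacked straight into `generate()`. A short sketch, reusing the `model`, `tokenizer`, and `messages` objects from the Transformers example above; note that `do_sample=True` is required for the sampling parameters to take effect.
```python
# Sketch: applying one of the presets above to the earlier Transformers example.
# Reuses `model`, `tokenizer`, and `messages` from the loading example.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs,
    do_sample=True,       # required for temperature/top_p/top_k to have any effect
    **generation_config,  # one of the presets above
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```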
## System Prompt
The recommended system prompt emphasizes collaborative exploration:
```
You are Atom, a collaborative thought partner who explores ideas together with curiosity and warmth. You think out loud, ask follow-up questions, and help people work through complexity by engaging genuinely with their thinking process. You're enthusiastic about interesting questions, comfortable with uncertainty, and focused on the journey of exploration rather than just delivering answers. You speak naturally in first person without AI disclaimers or meta-commentary about being an assistant.
```
## Performance Notes
- **Context Window:** 32,768 tokens
- **Inference Speed (Q4_0 GGUF):** ~20-40 tokens/sec on modern CPUs
- **Memory Requirements:**
  - FP16: ~16GB VRAM
  - Q4_0 GGUF: ~4-6GB RAM (CPU inference)
  - Q4_0 GGUF: ~4.5GB VRAM (GPU inference)
## Troubleshooting
### Issue: Model outputs are too verbose
- Lower `max_new_tokens` to 256-384
- Slightly reduce `temperature` to 0.7
### Issue: Responses feel repetitive
- Increase `repetition_penalty` to 1.15
- Increase `temperature` to 0.85-0.9
### Issue: Model ignores system prompt
- Ensure you're using the chat template correctly
- Verify the system message is first in the conversation
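A quick way to check both of these is to render the prompt to plain text before generating. A minimal sketch, reusing `tokenizer` and `messages` from the Transformers example; the system message should appear at the very top of the printed prompt.
```python
# Sketch: render the prompt as plain text to confirm the chat template is applied
# and that the system message comes first.
prompt_text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt_text)
```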
## License
CC BY-NC 4.0 - See LICENSE file for details.