# Atom v1 8B Preview - Usage Examples
## Installation
### Using Transformers
```bash
pip install transformers torch accelerate
```
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
model_name = "vanta-research/atom-v1-8b-preview"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype="auto"
)
```
### Using Ollama (GGUF)
```bash
# Download the GGUF file from the repository
# Create Modelfile (see Modelfile in this repo)
ollama create atom-v1:latest -f Modelfile
ollama run atom-v1:latest
```
## Example Conversations
### Technical Explanation
```python
messages = [
    {"role": "system", "content": "You are Atom, a collaborative thought partner."},
    {"role": "user", "content": "Can you explain gradient descent using a metaphor?"}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=512, do_sample=True, temperature=0.8, top_p=0.9)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
**Expected Style:**
> "Imagine you're hiking down a mountain in thick fog. You can't see the bottom, but you can feel which direction slopes downward beneath your feet. Gradient descent is exactly that—taking small steps in whichever direction feels 'downhill' for your error..."
### Problem-Solving
```python
messages = [
    {"role": "system", "content": "You are Atom, a collaborative thought partner."},
    {"role": "user", "content": "I need to optimize a slow database query. Where should I start?"}
]
```
**Expected Style:**
> "Great question! Before we dive into solutions, let me ask—do you know WHERE it's slow? Like, is it the actual query execution, or is it network latency, or maybe even just returning too much data? That'll totally change our approach..."
### Code Generation
```python
messages = [
    {"role": "system", "content": "You are Atom, a collaborative thought partner."},
    {"role": "user", "content": "Write a Python function to merge two sorted lists."}
]
```
**Expected Style:**
> "Absolutely! Here's a clean way to merge two sorted lists. The key insight is that we can use two pointers and always pick the smaller element:
>
> ```python
> def merge_sorted_lists(list1, list2):
>     result = []
>     i, j = 0, 0
>
>     while i < len(list1) and j < len(list2):
>         if list1[i] <= list2[j]:
>             result.append(list1[i])
>             i += 1
>         else:
>             result.append(list2[j])
>             j += 1
>
>     result.extend(list1[i:])
>     result.extend(list2[j:])
>     return result
> ```
>
> This runs in O(n + m) time since we only look at each element once..."
## Recommended Parameters
### Balanced (Default)
```python
generation_config = {
    "max_new_tokens": 512,
    "temperature": 0.8,
    "top_p": 0.9,
    "top_k": 40,
    "repetition_penalty": 1.1
}
```
### More Creative
```python
generation_config = {
    "max_new_tokens": 512,
    "temperature": 0.95,
    "top_p": 0.95,
    "top_k": 50
}
```
### More Focused
```python
generation_config = {
    "max_new_tokens": 512,
    "temperature": 0.6,
    "top_p": 0.85,
    "top_k": 30
}
```
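Whichever preset you pick, the dictionary can be unpacked straight into `generate()`. A short sketch, reusing the `model`, `tokenizer`, and `messages` objects from the Transformers example above; note that `do_sample=True` is required for the sampling parameters to take effect.
```python
# Sketch: applying one of the presets above to the earlier Transformers example.
# Reuses `model`, `tokenizer`, and `messages` from the loading example.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs,
    do_sample=True,       # required for temperature/top_p/top_k to have any effect
    **generation_config,  # one of the presets above
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```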
## System Prompt
The recommended system prompt emphasizes collaborative exploration:
```
You are Atom, a collaborative thought partner who explores ideas together with curiosity and warmth. You think out loud, ask follow-up questions, and help people work through complexity by engaging genuinely with their thinking process. You're enthusiastic about interesting questions, comfortable with uncertainty, and focused on the journey of exploration rather than just delivering answers. You speak naturally in first person without AI disclaimers or meta-commentary about being an assistant.
```
## Performance Notes
- **Context Window:** 32,768 tokens
- **Inference Speed (Q4_0 GGUF):** ~20-40 tokens/sec on modern CPUs
- **Memory Requirements:**
  - FP16: ~16GB VRAM
  - Q4_0 GGUF: ~4-6GB RAM (CPU inference)
  - Q4_0 GGUF: ~4.5GB VRAM (GPU inference)
## Troubleshooting
### Issue: Model outputs are too verbose
- Lower `max_new_tokens` to 256-384
- Slightly reduce `temperature` to 0.7
### Issue: Responses feel repetitive
- Increase `repetition_penalty` to 1.15
- Increase `temperature` to 0.85-0.9
### Issue: Model ignores system prompt
- Ensure you're using the chat template correctly
- Verify the system message is first in the conversation
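A quick way to check both of these is to render the prompt to plain text before generating. A minimal sketch, reusing `tokenizer` and `messages` from the Transformers example; the system message should appear at the very top of the printed prompt.
```python
# Sketch: render the prompt as plain text to confirm the chat template is applied
# and that the system message comes first.
prompt_text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt_text)
```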
## License
CC BY-NC 4.0 - See LICENSE file for details.