I appreciate the clarification! My goal with this project was purely educational: to understand the mechanics of how vision-language connectors work. While the base weights belong to Google and OpenAI, the implementation of the projection layers and the fine-tuning process were my contributions. I'm still learning the ropes, so I appreciate the call-out on the terminology!
Sagar pallai
AI & ML interests: LLM and Stable Diffusion

Recent Activity

New activity (about 6 hours ago) on sagar007/multigemma: License Violation Alert: Illegally Re-Licensing Google's Gemma Model as "Open Source"

Replied to their post (about 6 hours ago):
I built a Multimodal Vision-Language Model using Gemma-270M + CLIP!
Just finished training my multimodal model on the full LLaVA-Instruct-150K dataset (157K samples) and wanted to share the results!
What I Built:
A vision-language model that can understand images and answer questions about them, combining (a rough connector sketch follows this list):
- Google Gemma-3-270M (language)
- OpenAI CLIP ViT-Large/14 (vision)
- LoRA fine-tuning for efficiency
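For readers curious how a connector like this is usually wired, here is a minimal, hedged sketch of a LLaVA-style projection layer; the architecture and dimensions are my assumptions, not the repo's actual code.

```python
# Minimal sketch of a LLaVA-style projector, NOT the repo's exact code:
# it maps frozen CLIP ViT-L/14 patch features into the Gemma embedding space
# so they can be prepended to the text token embeddings. The dimensions are
# assumptions; read the real values from the two model configs.
import torch
import torch.nn as nn

class VisionProjector(nn.Module):
    def __init__(self, clip_dim: int = 1024, lm_dim: int = 640):
        super().__init__()
        # Two-layer MLP connector in the style of LLaVA-1.5
        self.proj = nn.Sequential(
            nn.Linear(clip_dim, lm_dim),
            nn.GELU(),
            nn.Linear(lm_dim, lm_dim),
        )

    def forward(self, patch_feats: torch.Tensor) -> torch.Tensor:
        # patch_feats: (batch, num_patches, clip_dim) hidden states from CLIP
        return self.proj(patch_feats)  # (batch, num_patches, lm_dim)

# Projected image tokens are then concatenated with the text embeddings:
# inputs_embeds = torch.cat([VisionProjector()(clip_feats), text_embeds], dim=1)
```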
Training Stats:
- 157,712 training samples (full LLaVA dataset)
- 3 epochs on A100 40GB
- ~9 hours training time
- Final loss: 1.333 training / 1.430 validation
- Only 18.6M trainable params (3.4% of 539M total; LoRA config sketch below)
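To make the parameter-efficiency point concrete, here is a hedged sketch of the kind of PEFT/LoRA setup that leaves only a few percent of weights trainable; the rank, alpha, and target modules are my guesses, not the repo's config.

```python
# Hedged illustration of a LoRA setup with a small trainable fraction.
# The hyperparameters and target modules below are assumptions, and the
# 18.6M figure above also includes the vision projector, which LoRA alone
# would not reproduce.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("google/gemma-3-270m")  # gated repo: needs HF auth
lora_cfg = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # prints trainable vs. total parameter counts
```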
https://huggingface.co/sagar007/multigemma
Benchmark Results:
- VQA Accuracy: 53.8%
- Works great for: animal detection, room identification, scene understanding
**Try it yourself** (a programmatic query sketch follows the links):
- Model: https://huggingface.co/sagar007/multigemma
- Demo: https://huggingface.co/spaces/sagar007/Multimodal-Gemma
- GitHub: https://github.com/sagar431/multimodal-gemma-270m
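If you prefer hitting the demo from a script rather than the web UI, a sketch using gradio_client is below; the endpoint name and input order are assumptions, so inspect the Space's API first.

```python
# Hypothetical way to query the demo Space from code using gradio_client.
# The endpoint name and argument order are guesses; call view_api() first
# to see the Space's real signature.
from gradio_client import Client, handle_file

client = Client("sagar007/Multimodal-Gemma")
client.view_api()  # prints the actual endpoints and their parameters

result = client.predict(
    handle_file("cat.jpg"),             # local image file (assumed input)
    "What animal is in this picture?",  # question about the image (assumed input)
    api_name="/predict",                # assumed endpoint name
)
print(result)
```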
Built with PyTorch Lightning + MLflow for experiment tracking. Full MLOps pipeline with CI/CD!
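For context, this is roughly what a Lightning trainer with an MLflow logger looks like; the module and datamodule names are placeholders, not the actual classes from the GitHub repo.

```python
# Rough sketch of Lightning + MLflow wiring; the LightningModule / DataModule
# class names in the commented fit() call are placeholders.
import lightning.pytorch as pl
from lightning.pytorch.loggers import MLFlowLogger

mlflow_logger = MLFlowLogger(
    experiment_name="multimodal-gemma",
    tracking_uri="file:./mlruns",  # local MLflow store; point at a server in CI
)
trainer = pl.Trainer(
    max_epochs=3,
    accelerator="gpu",
    precision="bf16-mixed",
    logger=mlflow_logger,
)
# trainer.fit(MultimodalGemmaModule(), datamodule=LLaVADataModule())  # placeholder names
```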
Would love to hear your feedback!
#multimodal #gemma #clip #llava #vision-language #pytorch