	---
	title: Ministral WebGPU
	emoji: ⚡️
	colorFrom: red
	colorTo: yellow
	sdk: static
	pinned: false
	license: apache-2.0
	short_description: Frontier multimodal AI, running entirely in your browser.
	app_build_command: npm run build
	app_file: dist/index.html
	models:
	- mistralai/Ministral-3-3B-Instruct-2512-ONNX
	- mistralai/Ministral-3-3B-Instruct-2512
	---
	Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

	# AI Multimodal WebGPU Assistant

	Developer: Muhammad Abdullah Rasheed
	Research Assistant @ Cambridge | MSc Data Science & AI '25 | Google WTM Scholar

	## Overview

	This project runs a 3B-parameter multimodal language model entirely client-side, using WebGPU acceleration. Once the model weights have loaded in the browser, inference needs no server and no API calls, so camera frames and prompts stay on your device and there is no network round trip per request.

	## Key Features

	- Privacy-First Architecture: The entire Ministral-3-3B-Instruct model runs locally in your browser using WebGPU, so your video feed never leaves your device
	- Real-Time Multimodal AI: Processes the live camera feed and answers questions about what it sees
	- WebGPU Acceleration: Runs inference on the GPU through the browser's WebGPU API, aiming for near-native performance
	- Zero Backend Dependencies: No API keys, no server calls, no external services required for inference
	- Cross-Platform: Works in modern browsers that support WebGPU
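
Because everything depends on WebGPU being present, a page like this would normally check for it before loading any weights. The following is a minimal sketch of such a check, not the code this Space actually ships. `detectWebGpu`, `NavigatorLike`, and `WebGpuStatus` are hypothetical names; the only real browser API used is `navigator.gpu.requestAdapter()`.

```typescript
// Minimal structural types so the sketch compiles without @webgpu/types.
interface GpuLike {
  requestAdapter(): Promise<object | null>;
}
interface NavigatorLike {
  gpu?: GpuLike;
}

type WebGpuStatus = "available" | "no-adapter" | "unsupported";

// Hypothetical helper: reports whether WebGPU looks usable.
// - "unsupported": the browser does not expose navigator.gpu at all
// - "no-adapter":  the API exists but no suitable GPU adapter was returned
// - "available":   an adapter was returned
async function detectWebGpu(nav: NavigatorLike): Promise<WebGpuStatus> {
  if (!nav.gpu) return "unsupported";
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter ? "available" : "no-adapter";
  } catch {
    // Treat an unexpected failure the same as having no adapter.
    return "no-adapter";
  }
}
```

In a browser, this could be called as `detectWebGpu(navigator as NavigatorLike)` and the result used to show a fallback message instead of starting the model download.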

	## Technical Stack

	- Model: Ministral-3-3B-Instruct (quantized for browser deployment)
	- Runtime: Transformers.js for in-browser inference
	- Compute: WebGPU API for GPU acceleration
	- Frontend: Modern JavaScript with WebAssembly integration
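
To see why quantization matters for browser deployment, here is a rough back-of-the-envelope estimate of weight size at different precisions. It is only a sketch: it treats "3B" as exactly 3 billion parameters, the bit widths are illustrative assumptions (this README does not state which quantization the ONNX export uses), and it ignores activations, the KV cache, and any per-tensor overhead.

```typescript
// Rough weight-size estimate: parameters × bits per weight ÷ 8 bits per byte.
function estimateWeightBytes(paramCount: number, bitsPerWeight: number): number {
  return (paramCount * bitsPerWeight) / 8;
}

// Decimal gigabytes (1 GB = 1e9 bytes).
function toGigabytes(bytes: number): number {
  return bytes / 1e9;
}

const PARAMS = 3e9; // assumption: "3B" taken as exactly 3 billion parameters

for (const bits of [16, 8, 4]) {
  const gb = toGigabytes(estimateWeightBytes(PARAMS, bits));
  console.log(`${bits}-bit weights: ~${gb.toFixed(1)} GB`);
}
// 16-bit weights: ~6.0 GB
// 8-bit weights: ~3.0 GB
// 4-bit weights: ~1.5 GB
```

The gap between roughly 6 GB and roughly 1.5 GB is the difference between a download that is impractical for most visitors and one a browser can plausibly cache.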

	## Use Cases

	- Visual question answering from live camera feed
	- Real-time scene understanding and description
	- Privacy-sensitive AI applications
	- Edge computing demonstrations
	- Educational tool for AI and browser technologies
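
For the visual question answering case, the captured frame and the user's question have to be combined into one prompt. The sketch below builds a chat-style message list in the shape commonly used by multimodal chat templates on Hugging Face (an image placeholder followed by text). Whether this app uses exactly this shape is an assumption, and `buildVqaMessages`, `ChatMessage`, and `ContentPart` are hypothetical names.

```typescript
// One piece of a message: an image placeholder or a text span.
type ContentPart = { type: "image" } | { type: "text"; text: string };

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: ContentPart[];
}

// Hypothetical helper: pairs one camera frame (the image placeholder)
// with a question, optionally preceded by a system prompt.
function buildVqaMessages(question: string, systemPrompt?: string): ChatMessage[] {
  const trimmed = question.trim();
  if (trimmed.length === 0) {
    throw new Error("question must not be empty");
  }
  const messages: ChatMessage[] = [];
  if (systemPrompt) {
    messages.push({ role: "system", content: [{ type: "text", text: systemPrompt }] });
  }
  messages.push({
    role: "user",
    content: [{ type: "image" }, { type: "text", text: trimmed }],
  });
  return messages;
}
```

The image itself is not embedded in the message; in this convention the frame is passed to the processor separately and the `{ type: "image" }` entry only marks where it belongs in the prompt.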

	## Why This Matters

	This project shows that capable language models can move from cloud servers to the edge, where they provide private, low-latency, and accessible intelligence without expensive serving infrastructure.

	## Author

	Muhammad Abdullah Rasheed
	Research Assistant | AI & Machine Learning Researcher
	- 🎓 MSc Data Science & AI '25, Google WTM Scholar
	- 🔬 Research areas: Computer Vision, NLP, Climate AI
	- 💼 Experience: Gesture Recognition, Backend Development, ML Engineering
	- 🔗 [LinkedIn](https://www.linkedin.com/in/muhammad-abdullahrasheed-/) | [GitHub](https://github.com/Abdullahrasheed45) | [HuggingFace](https://huggingface.co/Abdullahrasheed45)

	## License

	Apache-2.0