agent_decoder / README.md

Aziz3

adding config

9154e2d 8 months ago

	---
	title: English Accent Detector
	emoji: 🎤
	colorFrom: blue
	colorTo: purple
	sdk: streamlit
	sdk_version: "1.28.0"
	app_file: app.py
	pinned: false
	---

	# English Accent Detection Tool

	A practical AI tool that analyzes English accents from video content. Built for REM Waste's hiring automation system.

	## 🚀 Live Demo

	Deployed App: [https://accent-detector.streamlit.app](https://accent-detector.streamlit.app)

	## Features

	- Video Processing: Accepts public video URLs (MP4, Loom, etc.)
	- Audio Extraction: Automatically extracts audio from video files
	- Speech Transcription: Converts speech to text using Google Speech Recognition
	- Accent Analysis: Detects English accents with confidence scoring
	- Web Interface: Simple Streamlit UI for easy testing

	## Supported Accents

	- American English
	- British English
	- Australian English
	- Canadian English
	- South African English

	## Quick Start

	### Method 1: Use the Deployed App (Recommended)

	1. Visit: [https://accent-detector.streamlit.app](https://accent-detector.streamlit.app)
	2. Paste a public video URL
	3. Click "Analyze Accent"
	4. View results with confidence scores

	### Method 2: Local Installation

	```bash
	# Clone or download the script
	git clone <repository-url>
	cd accent-detector

	# Install dependencies
	pip install -r requirements.txt

	# Install ffmpeg (required for video processing)
	# On macOS:
	brew install ffmpeg

	# On Ubuntu/Debian:
	sudo apt update && sudo apt install ffmpeg

	# On Windows:
	# Download from https://ffmpeg.org/download.html

	# Run the app
	streamlit run accent_detector.py
	```

	## Installation

	1. Clone this repository and navigate to the project folder.
	2. (Recommended) Create and activate a Python virtual environment:
	```sh
	python3 -m venv ad_venv
	source ad_venv/bin/activate
	```
	3. Install all dependencies:
	```sh
	pip install -r requirements.txt
	```
	4. (Optional, but recommended for better performance) Install Watchdog, which Streamlit uses for more efficient file-change detection:
	```sh
	xcode-select --install # macOS only, for build tools
	pip install watchdog
	```

	## Usage Examples

	### Test URLs
	```
	# Direct MP4 link
	https://sample-videos.com/zip/10/mp4/SampleVideo_1280x720_1mb.mp4

	# Loom video (public)
	https://www.loom.com/share/your-video-id

	# Google Drive (public)
	https://drive.google.com/file/d/your-file-id/view
	```

	### Expected Output
	```json
	{
	  "accent": "American",
	  "confidence": 78.5,
	  "explanation": "High confidence in American accent with strong linguistic indicators.",
	  "all_scores": {
	    "American": 78.5,
	    "British": 23.1,
	    "Australian": 15.7,
	    "Canadian": 19.2,
	    "South African": 8.3
	  }
	}
	```
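
The result can be consumed as a plain dictionary. The following is a minimal sketch, assuming only the keys shown in the example output; `rank_accents` is a hypothetical helper for illustration and is not part of the project.

```python
import json

# Example payload copied from the expected output above.
RESULT_JSON = """
{
  "accent": "American",
  "confidence": 78.5,
  "explanation": "High confidence in American accent with strong linguistic indicators.",
  "all_scores": {
    "American": 78.5,
    "British": 23.1,
    "Australian": 15.7,
    "Canadian": 19.2,
    "South African": 8.3
  }
}
"""


def rank_accents(result: dict) -> list:
    """Return (accent, score) pairs sorted from highest to lowest score.

    Hypothetical helper: assumes `result` has an "all_scores" mapping.
    """
    return sorted(result["all_scores"].items(), key=lambda kv: kv[1], reverse=True)


result = json.loads(RESULT_JSON)
ranking = rank_accents(result)
for accent, score in ranking:
    print(f"{accent}: {score}%")
```

The scores in the example do not sum to 100, so they appear to be independent per-accent scores rather than a probability distribution.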

	## Technical Architecture

	### Core Components

	1. Video Downloader: Downloads videos from public URLs
	2. Audio Extractor: Uses ffmpeg to extract WAV audio
	3. Speech Recognizer: Google Speech Recognition API
	4. Accent Analyzer: Pattern matching for linguistic markers
	5. Web Interface: Streamlit-based UI
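
As an illustration of the audio extraction step, the sketch below builds an ffmpeg command that writes WAV audio. The function names and the 16 kHz mono settings are assumptions (a common choice for speech recognition); the flags the project actually passes to ffmpeg are not shown in this README.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(video_path: str, wav_path: str, sample_rate: int = 16000) -> list:
    """Build an ffmpeg command that extracts mono 16-bit PCM WAV audio.

    Hypothetical helper; sample rate and channel count are assumed defaults.
    """
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it exists
        "-i", video_path,         # input video file
        "-vn",                    # drop the video stream
        "-acodec", "pcm_s16le",   # 16-bit little-endian PCM, the usual WAV codec
        "-ar", str(sample_rate),  # output sample rate in Hz
        "-ac", "1",               # downmix to a single (mono) channel
        wav_path,
    ]


def extract_audio(video_path: str, wav_path: str) -> Path:
    """Run ffmpeg and return the path to the extracted WAV file.

    Raises subprocess.CalledProcessError if ffmpeg exits with an error.
    """
    subprocess.run(build_ffmpeg_command(video_path, wav_path), check=True, capture_output=True)
    return Path(wav_path)
```

Keeping the command construction separate from the `subprocess.run` call makes the flags easy to inspect or test without ffmpeg installed.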

	### Accent Detection Algorithm

	The system analyzes multiple linguistic features:

	- Vocabulary Patterns: Accent-specific word choices
	- Phonetic Markers: Pronunciation characteristics, to the extent they surface in the transcript (see Current Limitations)
	- Spelling Patterns: Regional spelling differences
	- Linguistic Markers: Characteristic phrases and expressions
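
The general idea of marker-based scoring can be sketched as follows. This is not the project's implementation: the marker lists and the `score_markers` function are illustrative assumptions, and only two accents are included to keep the example short.

```python
import re

# Illustrative marker lists only; the project's real patterns are not shown in this README.
ACCENT_MARKERS = {
    "American": ["gonna", "cookies", "elevator", "apartment", "color"],
    "British": ["brilliant", "lovely", "quite", "lift", "colour"],
}


def score_markers(text: str, markers: dict = ACCENT_MARKERS) -> dict:
    """Count whole-word marker hits per accent in a transcript."""
    # Lowercase the transcript and split it into words (apostrophes kept, e.g. "i'm").
    words = re.findall(r"[a-z']+", text.lower())
    return {
        accent: sum(words.count(term) for term in terms)
        for accent, terms in markers.items()
    }


print(score_markers("I'm gonna grab some cookies from the elevator"))
print(score_markers("That's brilliant, quite lovely indeed"))
```

Raw hit counts like these would still need to be normalized (for example by transcript length) before they could be reported as a percentage.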

	### Confidence Scoring

	- 0-20%: Insufficient markers detected
	- 21-50%: Moderate confidence with limited indicators
	- 51-75%: Good confidence with multiple patterns
	- 76-100%: High confidence with strong linguistic evidence
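
The bands above can be expressed as a small lookup. A minimal sketch: `confidence_band` is a hypothetical helper, and treating each band's upper bound as inclusive (so 20.5 falls into the second band) is an assumption, since the README only lists whole-number ranges.

```python
def confidence_band(confidence: float) -> str:
    """Map a confidence percentage (0-100) to the description of its band.

    Assumption: upper bounds are inclusive, so fractional values between
    two listed ranges (e.g. 20.5) fall into the higher band.
    """
    if not 0 <= confidence <= 100:
        raise ValueError(f"confidence must be between 0 and 100, got {confidence}")
    if confidence <= 20:
        return "Insufficient markers detected"
    if confidence <= 50:
        return "Moderate confidence with limited indicators"
    if confidence <= 75:
        return "Good confidence with multiple patterns"
    return "High confidence with strong linguistic evidence"


print(confidence_band(78.5))
```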

	## API Integration

	For programmatic access, use the core `AccentDetector` class:

	```python
	from accent_detector import AccentDetector

	detector = AccentDetector()
	result = detector.process_video("https://your-video-url.com/video.mp4")

	print(f"Accent: {result['accent']}")
	print(f"Confidence: {result['confidence']}%")
	```

	## Deployment

	### Streamlit Cloud (Recommended)

	1. Fork this repository
	2. Connect to Streamlit Cloud
	3. Deploy from your GitHub repo
	4. Share the public URL

	### Docker Deployment

	```dockerfile
	FROM python:3.9-slim

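	# Install system dependencies (and clean the apt cache to keep the image small)
	RUN apt-get update \
	    && apt-get install -y --no-install-recommends ffmpeg \
	    && rm -rf /var/lib/apt/lists/*

	WORKDIR /app
	COPY requirements.txt .
	RUN pip install --no-cache-dir -r requirements.txt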

	COPY . .
	EXPOSE 8501

	CMD ["streamlit", "run", "accent_detector.py", "--server.port=8501", "--server.address=0.0.0.0"]
	```

	## Limitations & Considerations

	### Current Limitations
	- Requires clear speech audio (background noise affects accuracy)
	- Works best with 30+ seconds of speech
	- Free Google Speech Recognition has daily limits
	- Accent detection is based on vocabulary and phrasing patterns in the transcript, not on phonetic analysis of the audio
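
To guard against the short-sample limitation, the extracted WAV file can be checked before analysis. A minimal sketch using only the standard library; the helper names and the `MIN_SPEECH_SECONDS` constant are assumptions based on the "30+ seconds" guidance above.

```python
import wave

MIN_SPEECH_SECONDS = 30.0  # taken from the "30+ seconds" guidance; not a project constant


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_long_enough(path: str, minimum: float = MIN_SPEECH_SECONDS) -> bool:
    """Return True if the WAV file is at least `minimum` seconds long."""
    return wav_duration_seconds(path) >= minimum
```

This measures total audio length, not the amount of actual speech, so a long recording that is mostly silence would still pass the check.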

	### Potential Improvements
	- Integrate phonetic analysis libraries
	- Add more accent varieties (Indian, Irish, etc.)
	- Implement batch processing for multiple videos
	- Add voice activity detection for better audio segmentation

	## Testing

	### Manual Testing
	1. Test with different accent samples
	2. Verify confidence scores are reasonable
	3. Check error handling with invalid URLs
	4. Test with various video formats

	### Automated Testing
	```python
	from accent_detector import AccentDetector


	def test_accent_detection():
	    detector = AccentDetector()

	    # American markers should outscore British ones
	    american_text = "I'm gonna grab some cookies from the elevator"
	    scores = detector.analyze_accent_patterns(american_text)
	    assert scores['American'] > scores['British']

	    # British markers should outscore American ones
	    british_text = "That's brilliant, quite lovely indeed"
	    scores = detector.analyze_accent_patterns(british_text)
	    assert scores['British'] > scores['American']
	```

	## Performance Metrics

	- Video Download: ~10-30 seconds (depends on file size)
	- Audio Extraction: ~5-15 seconds
	- Speech Recognition: ~10-30 seconds
	- Accent Analysis: <1 second
	- Total Processing: ~30-90 seconds per video
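
To reproduce per-stage timings like these on your own hardware, each stage can be wrapped in a timer. A minimal sketch: the `timed` context manager and the stage name are hypothetical, and the loop is a stand-in for a real pipeline stage.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(stage: str, timings: dict):
    """Record the wall-clock duration of the enclosed block under `stage`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Store the elapsed time even if the stage raised an exception.
        timings[stage] = time.perf_counter() - start


timings = {}
with timed("accent_analysis", timings):
    sum(range(1000))  # stand-in for a real stage such as accent analysis

for stage, seconds in timings.items():
    print(f"{stage}: {seconds:.3f}s")
```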

	## Troubleshooting

	### Common Issues

	Error: "Could not understand the audio"
	- Solution: Ensure clear speech, minimal background noise

	Error: "Failed to download video"
	- Solution: Verify URL is public and accessible

	Error: "ffmpeg not found"
	- Solution: Install ffmpeg (see Local Installation) and make sure it is on your PATH

	Low confidence scores
	- Solution: Ensure longer speech samples (30+ seconds)
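
The "ffmpeg not found" case can be detected before any video is processed. A minimal sketch using the standard library; `tool_available` is a hypothetical helper, not part of the project.

```python
import shutil


def tool_available(name: str = "ffmpeg") -> bool:
    """Return True if an executable called `name` is found on the PATH."""
    return shutil.which(name) is not None


if not tool_available():
    print("ffmpeg not found: install it and make sure it is on your PATH")
```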

	### Support

	For technical issues or feature requests:
	1. Check the error messages in the Streamlit interface
	2. Verify all dependencies are installed correctly
	3. Test with known working video URLs

	## License

	MIT License - Free for commercial and personal use.

	---

	Built for REM Waste Interview Challenge
	Practical AI tools for automated hiring decisions