---
title: English Accent Detector
emoji: 🎤
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: "1.28.0"
app_file: app.py
pinned: false
---
# English Accent Detection Tool
A practical AI tool that analyzes English accents from video content. Built for REM Waste's hiring automation system.
## 🚀 Live Demo
**Deployed App:** [https://accent-detector.streamlit.app](https://accent-detector.streamlit.app)
## Features
- **Video Processing**: Accepts public video URLs (MP4, Loom, etc.)
- **Audio Extraction**: Automatically extracts audio from video files
- **Speech Transcription**: Converts speech to text using Google Speech Recognition
- **Accent Analysis**: Detects English accents with confidence scoring
- **Web Interface**: Simple Streamlit UI for easy testing
## Supported Accents
- American English
- British English
- Australian English
- Canadian English
- South African English
## Quick Start
### Method 1: Use the Deployed App (Recommended)
1. Visit: [https://accent-detector.streamlit.app](https://accent-detector.streamlit.app)
2. Paste a public video URL
3. Click "Analyze Accent"
4. View results with confidence scores
### Method 2: Local Installation
```bash
# Clone or download the script
git clone <repository-url>
cd accent-detector
# Install dependencies
pip install -r requirements.txt
# Install ffmpeg (required for video processing)
# On macOS:
brew install ffmpeg
# On Ubuntu/Debian:
sudo apt update && sudo apt install ffmpeg
# On Windows:
# Download from https://ffmpeg.org/download.html
# Run the app
streamlit run accent_detector.py
```
## Installation
1. Clone this repository and navigate to the project folder.
2. (Recommended) Create and activate a Python virtual environment:
```sh
python3 -m venv ad_venv
source ad_venv/bin/activate
```
3. Install all dependencies:
```sh
pip install -r requirements.txt
```
4. (Optional, but recommended for better performance) Install Watchdog:
```sh
xcode-select --install # macOS only, for build tools
pip install watchdog
```
## Usage Examples
### Test URLs
```
# Direct MP4 link
https://sample-videos.com/zip/10/mp4/SampleVideo_1280x720_1mb.mp4
# Loom video (public)
https://www.loom.com/share/your-video-id
# Google Drive (public)
https://drive.google.com/file/d/your-file-id/view
```
### Expected Output
```json
{
  "accent": "American",
  "confidence": 78.5,
  "explanation": "High confidence in American accent with strong linguistic indicators.",
  "all_scores": {
    "American": 78.5,
    "British": 23.1,
    "Australian": 15.7,
    "Canadian": 19.2,
    "South African": 8.3
  }
}
```
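A result shaped like the JSON above can be handled as a plain dictionary. The sketch below assumes the field names shown in the sample; `rank_accents` is a hypothetical helper, not part of the app.

```python
import json

# Sample payload copied from the expected output above.
sample = """
{
  "accent": "American",
  "confidence": 78.5,
  "all_scores": {
    "American": 78.5,
    "British": 23.1,
    "Australian": 15.7,
    "Canadian": 19.2,
    "South African": 8.3
  }
}
"""


def rank_accents(result):
    """Return (accent, score) pairs sorted from highest to lowest score."""
    return sorted(result["all_scores"].items(), key=lambda kv: kv[1], reverse=True)


result = json.loads(sample)
ranking = rank_accents(result)
print(ranking[0])  # ('American', 78.5)
```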
## Technical Architecture
### Core Components
1. **Video Downloader**: Downloads videos from public URLs
2. **Audio Extractor**: Uses ffmpeg to extract WAV audio
3. **Speech Recognizer**: Google Speech Recognition API
4. **Accent Analyzer**: Pattern matching for linguistic markers
5. **Web Interface**: Streamlit-based UI
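As an illustration of the audio extraction step, here is a minimal sketch of how a WAV file might be produced with ffmpeg from Python. The function names, the 16 kHz sample rate, and the mono downmix are assumptions for this example and may differ from what the app does.

```python
import subprocess


def build_ffmpeg_command(video_path, wav_path, sample_rate=16000):
    """Build an ffmpeg command that writes the audio track as 16-bit PCM WAV."""
    return [
        "ffmpeg",
        "-y",                    # overwrite the output file if it exists
        "-i", video_path,        # input video
        "-vn",                   # drop the video stream
        "-acodec", "pcm_s16le",  # 16-bit little-endian PCM (WAV)
        "-ar", str(sample_rate), # sample rate in Hz (assumed value)
        "-ac", "1",              # downmix to mono (assumed)
        wav_path,
    ]


def extract_audio(video_path, wav_path):
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(
        build_ffmpeg_command(video_path, wav_path),
        check=True,
        capture_output=True,
    )
```

Keeping the command builder separate from the `subprocess.run` call makes the arguments easy to inspect without ffmpeg installed.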
### Accent Detection Algorithm
The system analyzes multiple linguistic features:
- **Vocabulary Patterns**: Accent-specific word choices
- **Phonetic Markers**: Pronunciation characteristics
- **Spelling Patterns**: Regional spelling differences
- **Linguistic Markers**: Characteristic phrases and expressions
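To make the idea of pattern matching on a transcript concrete, the following is a simplified sketch. The marker lists and the `score_markers` function are hypothetical and much smaller than anything a real detector would need. The scores here are normalized shares of matched markers, which is a simplification and not necessarily how the app computes its scores.

```python
import re

# Hypothetical marker lists, for illustration only.
MARKERS = {
    "American": ["gonna", "cookies", "elevator", "apartment"],
    "British": ["brilliant", "lovely", "lift", "flat", "quite"],
    "Australian": ["mate", "arvo", "reckon"],
}


def score_markers(text, markers=MARKERS):
    """Score each accent by its share of matched marker words (0-100)."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = {
        accent: sum(words.count(term) for term in terms)
        for accent, terms in markers.items()
    }
    total = sum(counts.values())
    if total == 0:
        # No markers found: every accent scores zero.
        return {accent: 0.0 for accent in markers}
    return {accent: round(100 * n / total, 1) for accent, n in counts.items()}


print(score_markers("I'm gonna grab some cookies from the elevator"))
```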
### Confidence Scoring
- **0-20%**: Insufficient markers detected
- **21-50%**: Moderate confidence with limited indicators
- **51-75%**: Good confidence with multiple patterns
- **76-100%**: High confidence with strong linguistic evidence
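The bands above can be expressed as a small lookup. This is a sketch: `confidence_band` is a hypothetical helper, and placing fractional values such as 20.5 in the next band up is an assumption, since the table only lists whole-number boundaries.

```python
def confidence_band(confidence):
    """Map a confidence percentage (0-100) to the band descriptions above."""
    if not 0 <= confidence <= 100:
        raise ValueError("confidence must be between 0 and 100")
    if confidence <= 20:
        return "Insufficient markers detected"
    if confidence <= 50:
        return "Moderate confidence with limited indicators"
    if confidence <= 75:
        return "Good confidence with multiple patterns"
    return "High confidence with strong linguistic evidence"


print(confidence_band(78.5))  # High confidence with strong linguistic evidence
```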
## API Integration
For programmatic access, use the core `AccentDetector` class:
```python
from accent_detector import AccentDetector
detector = AccentDetector()
result = detector.process_video("https://your-video-url.com/video.mp4")
print(f"Accent: {result['accent']}")
print(f"Confidence: {result['confidence']}%")
```
## Deployment
### Streamlit Cloud (Recommended)
1. Fork this repository
2. Connect to Streamlit Cloud
3. Deploy from your GitHub repo
4. Share the public URL
### Docker Deployment
```dockerfile
FROM python:3.9-slim
# Install system dependencies
RUN apt-get update && apt-get install -y ffmpeg
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 8501
CMD ["streamlit", "run", "accent_detector.py", "--server.port=8501", "--server.address=0.0.0.0"]
```
## Limitations & Considerations
### Current Limitations
- Requires clear speech audio (background noise reduces accuracy)
- Works best with 30+ seconds of speech
- The free Google Speech Recognition service has daily usage limits
- Accent detection is based on vocabulary and phrasing in the transcript, not on phonetic analysis of the audio
### Potential Improvements
- Integrate phonetic analysis libraries
- Add more accent varieties (Indian, Irish, etc.)
- Implement batch processing for multiple videos
- Add voice activity detection for better audio segmentation
## Testing
### Manual Testing
1. Test with different accent samples
2. Verify confidence scores are reasonable
3. Check error handling with invalid URLs
4. Test with various video formats
### Automated Testing
```python
from accent_detector import AccentDetector


def test_accent_detection():
    detector = AccentDetector()

    # American markers should outscore British ones
    american_text = "I'm gonna grab some cookies from the elevator"
    scores = detector.analyze_accent_patterns(american_text)
    assert scores['American'] > scores['British']

    # British markers should outscore American ones
    british_text = "That's brilliant, quite lovely indeed"
    scores = detector.analyze_accent_patterns(british_text)
    assert scores['British'] > scores['American']
```
## Performance Metrics
- **Video Download**: ~10-30 seconds (depends on file size)
- **Audio Extraction**: ~5-15 seconds
- **Speech Recognition**: ~10-30 seconds
- **Accent Analysis**: <1 second
- **Total Processing**: ~30-90 seconds per video
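Stage timings like these can be measured on your own hardware with a small context manager. The sketch below uses only the standard library; `timed` and the stage name are hypothetical, and the workload is a stand-in for a real pipeline step.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(stage, store):
    """Record the wall-clock duration of a stage, in seconds, in `store`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        store[stage] = time.perf_counter() - start


timings = {}
with timed("accent_analysis", timings):
    sum(range(1000))  # placeholder for the real work

for stage, seconds in timings.items():
    print(f"{stage}: {seconds:.4f}s")
```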
## Troubleshooting
### Common Issues
**Error: "Could not understand the audio"**
- Solution: Ensure clear speech, minimal background noise
**Error: "Failed to download video"**
- Solution: Verify URL is public and accessible
**Error: "ffmpeg not found"**
- Solution: Install ffmpeg system dependency
**Low confidence scores**
- Solution: Ensure longer speech samples (30+ seconds)
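To rule out the "ffmpeg not found" case before running the app, one option is to check whether the binary is on `PATH`. This sketch uses the standard library's `shutil.which`; `ffmpeg_available` is a hypothetical helper name.

```python
import shutil


def ffmpeg_available(binary="ffmpeg"):
    """Return True if the given executable can be found on PATH."""
    return shutil.which(binary) is not None


if not ffmpeg_available():
    print("ffmpeg was not found on PATH; install it before processing videos.")
```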
### Support
For technical issues or feature requests:
1. Check the error messages in the Streamlit interface
2. Verify all dependencies are installed correctly
3. Test with known working video URLs
## License
MIT License - Free for commercial and personal use.
---
**Built for REM Waste Interview Challenge**
*Practical AI tools for automated hiring decisions*