VINE HuggingFace Interface - Complete Overview

This directory contains a complete HuggingFace-compatible interface for the VINE (Video Understanding with Natural Language) model. The interface allows you to easily use, share, and deploy your VINE model through the HuggingFace ecosystem.

📁 Directory Structure

vine_hf/
├── __init__.py                 # Package initialization and exports
├── vine_config.py              # VineConfig class (PretrainedConfig)
├── vine_model.py               # VineModel class (PreTrainedModel)
├── vine_pipeline.py            # VinePipeline class (Pipeline)
├── example_usage.py            # Comprehensive usage examples
├── convert_inference.py        # Migration guide from inference.py
├── push_to_hub.py              # Script to push model to HF Hub
├── setup.py                    # Package setup configuration
├── README.md                   # Detailed documentation
└── OVERVIEW.md                 # This file

🏗️ Architecture Components

1. VineConfig (`vine_config.py`)

Inherits from PretrainedConfig
Configures model parameters, segmentation methods, and processing options
Compatible with HuggingFace configuration system

2. VineModel (`vine_model.py`)

Inherits from PreTrainedModel
Implements the core VINE model with three CLIP backbones
Supports categorical, unary, and binary predictions
Provides both forward() and predict() methods

3. VinePipeline (`vine_pipeline.py`)

Inherits from Pipeline
Handles end-to-end video processing workflow
Integrates segmentation (SAM2, Grounding DINO + SAM2)
Provides user-friendly interface for video understanding

🚀 Key Features

✅ Full HuggingFace Compatibility

Compatible with transformers library
Supports AutoModel and pipeline interfaces
Can be pushed to and loaded from HuggingFace Hub

✅ Flexible Segmentation

Support for SAM2 automatic segmentation
Support for Grounding DINO + SAM2 text-guided segmentation
Configurable thresholds and parameters

✅ Multi-Modal Understanding

Categorical classification (object types)
Unary predicates (single object actions)
Binary relations (object-object relationships)

✅ Easy Integration

Simple pipeline interface for end users
Direct model access for researchers
Comprehensive configuration options

📖 Usage Examples

Quick Start with Pipeline

from transformers import pipeline
from vine_hf import VineModel, VinePipeline

# Create pipeline
vine_pipeline = pipeline(
    "vine-video-understanding",
    model="your-username/vine-model",
    trust_remote_code=True
)

# Process video
results = vine_pipeline(
    "video.mp4",
    categorical_keywords=['human', 'dog', 'frisbee'],
    unary_keywords=['running', 'jumping'],
    binary_keywords=['chasing', 'behind']
)

Direct Model Usage

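from vine_hf import VineConfig, VineModel

# Build the model from a configuration object
config = VineConfig(segmentation_method="grounding_dino_sam2")
model = VineModel(config)

# video_tensor, masks_dict, and bboxes_dict are placeholders for inputs you
# have already prepared (video frames plus per-object masks and bounding boxes)
results = model.predict(
    video_frames=video_tensor,
    masks=masks_dict,
    bboxes=bboxes_dict,
    categorical_keywords=['human', 'dog'],
    unary_keywords=['running', 'sitting'],
    binary_keywords=['chasing', 'near']
)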

🔧 Migration from Original Code

The convert_inference.py script shows how to migrate from the original inference.py workflow:

Original Approach:

Manual model loading and configuration
Direct handling of segmentation pipeline
Custom result processing
Complex setup requirements

New HuggingFace Interface:

Standardized model configuration
Automatic preprocessing/postprocessing
Simple pipeline interface
Easy sharing via HuggingFace Hub

📤 Sharing Your Model

Use the push_to_hub.py script to share your trained model:

python vine_hf/push_to_hub.py \
    --weights path/to/your/model.pth \
    --repo your-username/vine-model \
    --login

🛠️ Installation & Setup

Install Dependencies:

pip install transformers torch torchvision opencv-python pillow numpy

Install Segmentation Models (Optional):

- SAM2: https://github.com/facebookresearch/sam2
- Grounding DINO: https://github.com/IDEA-Research/GroundingDINO

Install VINE HF Interface:

cd vine_hf
pip install -e .

🎯 Configuration Options

The VineConfig class supports extensive configuration:

Model Settings: CLIP backbone, hidden dimensions
Segmentation: Method, thresholds, target FPS
Processing: Alpha values, top-k results, video length limits
Performance: Multi-class mode, output format options

📊 Output Format

The interface returns structured predictions:

{
    "categorical_predictions": {obj_id: [(prob, category), ...]},
    "unary_predictions": {(frame, obj): [(prob, action), ...]},
    "binary_predictions": {(frame, pair): [(prob, relation), ...]},
    "confidence_scores": {"categorical": float, "unary": float, "binary": float},
    "summary": {
        "num_objects_detected": int,
        "top_categories": [(category, prob), ...],
        "top_actions": [(action, prob), ...],
        "top_relations": [(relation, prob), ...]
    }
}
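As a minimal sketch of consuming this structure, the helper below picks the highest-probability label for each key in one of the prediction maps. It assumes the layout shown above, where each value is a list of `(prob, label)` tuples. The function name `top_label_per_key` and the sample values are hypothetical and are not part of the VINE interface.

```python
from typing import Dict, Hashable, List, Tuple

Prediction = Tuple[float, str]  # (prob, label), as in the format above


def top_label_per_key(
    predictions: Dict[Hashable, List[Prediction]],
) -> Dict[Hashable, Prediction]:
    """Return the highest-probability (prob, label) pair for each key.

    Keys with an empty prediction list are skipped.
    """
    best = {}
    for key, candidates in predictions.items():
        if candidates:
            best[key] = max(candidates, key=lambda pair: pair[0])
    return best


# Made-up sample shaped like "categorical_predictions"; not real model output.
sample_results = {
    "categorical_predictions": {
        0: [(0.91, "dog"), (0.05, "human")],
        1: [(0.12, "dog"), (0.80, "frisbee")],
    },
}

best_categories = top_label_per_key(sample_results["categorical_predictions"])
for obj_id, (prob, label) in best_categories.items():
    print(f"object {obj_id}: {label} ({prob:.2f})")
```

The same helper should work for the unary and binary maps, since only the key type differs (`(frame, obj)` or `(frame, pair)` instead of `obj_id`).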

🔍 Testing & Validation

Run the example scripts to test your setup:

# Test basic functionality
python vine_hf/example_usage.py

# Test migration from original code  
python vine_hf/convert_inference.py

🤝 Contributing

To contribute or customize:

Modify Configuration: Edit vine_config.py for new parameters
Extend Model: Add functionality to vine_model.py
Enhance Pipeline: Improve preprocessing/postprocessing in vine_pipeline.py
Add Features: Create additional utility scripts

📝 Next Steps

Load Your Weights: Use your trained VINE model weights
Test Segmentation: Set up Grounding DINO and SAM2 models
Validate Results: Compare with original inference.py output
Share Model: Push to HuggingFace Hub for community use
Deploy: Use in applications, demos, or research projects
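For the validation step, one possible way to compare outputs is to check that both runs produce the same labels per key, with probabilities agreeing within a tolerance. This is only a sketch: it assumes the original inference.py output and the new interface's output have both been converted to the `{key: [(prob, label), ...]}` layout described under Output Format. `predictions_match` is a hypothetical helper and is not part of either codebase.

```python
import math
from typing import Dict, Hashable, List, Tuple

Predictions = Dict[Hashable, List[Tuple[float, str]]]


def predictions_match(old: Predictions, new: Predictions, tol: float = 1e-4) -> bool:
    """Check that two prediction maps agree on keys, labels, and probabilities.

    Label order within a key is ignored; probabilities may differ by up to `tol`.
    """
    if old.keys() != new.keys():
        return False
    for key in old:
        old_by_label = {label: prob for prob, label in old[key]}
        new_by_label = {label: prob for prob, label in new[key]}
        if old_by_label.keys() != new_by_label.keys():
            return False
        for label, prob in old_by_label.items():
            if not math.isclose(prob, new_by_label[label], abs_tol=tol, rel_tol=0.0):
                return False
    return True


# Made-up values for illustration; not real model output.
reference = {(0, 1): [(0.7, "running"), (0.3, "jumping")]}
reordered = {(0, 1): [(0.3, "jumping"), (0.70001, "running")]}
drifted = {(0, 1): [(0.6, "running"), (0.3, "jumping")]}

print(predictions_match(reference, reordered))
print(predictions_match(reference, drifted))
```

An absolute tolerance is used because the values are probabilities in a fixed range; the default of `1e-4` is an arbitrary starting point and may need loosening if the two code paths differ in precision or device.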

🐛 Troubleshooting

Common Issues:

Import Errors: Check PYTHONPATH and package installation
Segmentation Failures: Verify Grounding DINO/SAM2 setup
Weight Loading: Adjust weight loading logic in convert_inference.py
CUDA Issues: Check GPU availability and PyTorch installation
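A quick way to narrow down import errors is to check which required modules the current interpreter can locate. The sketch below uses only the standard library. The module list is an assumption derived from the install command in this document (opencv-python is imported as `cv2`, pillow as `PIL`), and `missing_modules` is a hypothetical helper.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(module_names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found by this interpreter."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# Import names for the packages in the install command, plus vine_hf itself.
required = ["transformers", "torch", "torchvision", "cv2", "PIL", "numpy", "vine_hf"]

missing = missing_modules(required)
if missing:
    print("Not importable:", ", ".join(missing))
else:
    print("All required modules were found.")
```

If `torch` is importable, `torch.cuda.is_available()` reports whether PyTorch can see a GPU, which helps separate CUDA problems from installation problems.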

Support:

Check the README.md for detailed documentation
Review example_usage.py for working code examples
Examine convert_inference.py for migration guidance

This HuggingFace interface exposes the original VINE model's categorical, unary, and binary prediction capabilities through standard `transformers` conventions. The standardized interface simplifies sharing, deployment, and integration with existing HuggingFace workflows.