---
title: README
colorFrom: red
colorTo: indigo
sdk: static
pinned: false
---

_Last commit by davanstrien (HF Staff), `261046a`: "Update README with new repositories (synthetic-data, deduplication, openai-oss)"._
# UV Scripts

**Ready-to-run ML tools powered by UV: zero setup, maximum power**

Run state-of-the-art ML workflows with a single command. From OCR to classification, every script runs directly with `uv run`, with no separate dependency installation step.
## What are UV scripts?

UV scripts are self-contained Python scripts that declare their dependencies with [inline metadata](https://docs.astral.sh/uv/guides/scripts/). Run `uv run script.py` and uv installs those dependencies automatically before executing the script.

Perfect for:

- **GPU workflows** on [HF Jobs](https://huggingface.co/docs/huggingface_hub/guides/jobs)
- **Local processing** on your machine
- **Reproducible pipelines** that work anywhere
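
Here is a minimal sketch of what that inline metadata looks like. The shell snippet writes an example script and then prints its metadata block with `sed`; the Python itself is not run. The file name `hello_uv.py` and the `requests` dependency are arbitrary choices for illustration. The metadata is the comment block between `# /// script` and `# ///`, and it declares the Python version and dependencies. With uv installed, `uv run hello_uv.py` would read that block, install `requests` into an isolated environment, and run the script.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Write an example script (hypothetical name) with PEP 723 inline metadata.
cat > hello_uv.py <<'EOF'
# /// script
# requires-python = ">=3.10"
# dependencies = ["requests"]
# ///
import requests

print(requests.get("https://huggingface.co").status_code)
EOF

# Print only the metadata block that uv reads before running the script.
sed -n '/^# \/\/\/ script$/,/^# \/\/\/$/p' hello_uv.py

# With uv installed, this would resolve the dependencies and run the script:
#   uv run hello_uv.py
```

Because the dependencies travel inside the file, the script can be shared as a single URL and run anywhere uv is available.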
## Quick Example

```bash
# Extract text from images with state-of-the-art OCR (no local GPU needed!)
hf jobs uv run --flavor l4x1 \
  https://huggingface.co/datasets/uv-scripts/ocr/raw/main/nanonets-ocr.py \
  your-images your-extracted-text
```
## Browse Scripts

| Script Collection | Description | GPU Required |
| ------------------------------------------------------------------------------- | --------------------------------------------------------- | ------------ |
| [ocr](https://huggingface.co/datasets/uv-scripts/ocr) | Extract text from images with VLMs (LaTeX, tables, forms) | Yes |
| [classification](https://huggingface.co/datasets/uv-scripts/classification) | Text classification with guaranteed valid outputs | Yes |
| [dataset-creation](https://huggingface.co/datasets/uv-scripts/dataset-creation) | Create datasets from PDFs and files | No |
| [vllm](https://huggingface.co/datasets/uv-scripts/vllm) | High-performance inference with vLLM | Yes |
| [synthetic-data](https://huggingface.co/datasets/uv-scripts/synthetic-data) | Generate high-quality synthetic data with CoT reasoning | Yes |
| [deduplication](https://huggingface.co/datasets/uv-scripts/deduplication) | Remove duplicates using semantic similarity | No |
| [openai-oss](https://huggingface.co/datasets/uv-scripts/openai-oss) | Generate responses with visible reasoning traces | Yes |
## Why UV Scripts?

### Zero Setup

No virtual environments to manage, no dependency conflicts, no installation steps. When you run a script, uv reads its inline metadata and installs the declared dependencies into an isolated environment automatically.

### GPU Optimized

Run on a local GPU or scale out to the cloud with [HF Jobs](https://huggingface.co/docs/huggingface_hub/guides/jobs). Same script, different compute.
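
Here is a minimal shell sketch of the "same script, different compute" idea. It assumes a hypothetical wrapper `run_uv_script` and hypothetical environment variables `USE_HF_JOBS`, `HF_FLAVOR`, and `DRY_RUN`; none of these are part of uv or the `hf` CLI. The wrapper sends the same script and arguments either to local `uv run` or to `hf jobs uv run --flavor ...`. With `DRY_RUN=1` it only prints the command it would run.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: run one UV script locally or on HF Jobs.
#   USE_HF_JOBS=1 selects HF Jobs; otherwise the script runs locally.
#   HF_FLAVOR picks the HF Jobs hardware (default: l4x1).
#   DRY_RUN=1 prints the command instead of executing it.
run_uv_script() {
  local script="$1"
  shift
  local -a cmd
  if [[ "${USE_HF_JOBS:-0}" == "1" ]]; then
    cmd=(hf jobs uv run --flavor "${HF_FLAVOR:-l4x1}" "$script" "$@")
  else
    cmd=(uv run "$script" "$@")
  fi
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, `USE_HF_JOBS=1 DRY_RUN=1 run_uv_script script.py in out` prints `hf jobs uv run --flavor l4x1 script.py in out`. The same call without `USE_HF_JOBS=1` prints `uv run script.py in out`. Only the launcher changes; the script and its arguments stay identical.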
## Featured Scripts

### OCR Any Document Dataset

Extract text from images with state-of-the-art accuracy:

```bash
# Handles LaTeX, tables, forms, handwriting
hf jobs uv run --flavor l4x1 \
  https://huggingface.co/datasets/uv-scripts/ocr/raw/main/nanonets-ocr.py \
  your-images extracted-text
```
### Deduplicate Datasets (CPU-Friendly!)

Remove duplicates using semantic similarity; no GPU needed:

```bash
# Fast semantic deduplication on CPU
# Positional args: input dataset, text column to compare, output dataset
uv run https://huggingface.co/datasets/uv-scripts/deduplication/raw/main/semantic-dedupe.py \
  your-dataset text your-dataset-clean \
  --method duplicates --threshold 0.9
```
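
To make `--threshold 0.9` more concrete, consider how semantic deduplication commonly works. Each text is embedded as a vector, and two rows count as duplicates when the cosine similarity of their embeddings reaches a threshold. The sketch below assumes that interpretation; it is not taken from `semantic-dedupe.py`. It uses toy comma-separated vectors in place of real embeddings and defines hypothetical helpers `cosine_similarity` and `is_duplicate` in plain `awk`.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Shared awk code: cosine similarity of two comma-separated vectors.
# Extra function parameters act as awk local variables.
COSINE_AWK='
function cosine(a, b,    x, y, n, i, dot, na, nb) {
  n = split(a, x, ","); split(b, y, ",")
  for (i = 1; i <= n; i++) { dot += x[i] * y[i]; na += x[i] ^ 2; nb += y[i] ^ 2 }
  if (na == 0 || nb == 0) return 0   # treat zero vectors as dissimilar
  return dot / (sqrt(na) * sqrt(nb))
}'

# Hypothetical helper: print the similarity rounded to 4 decimals.
cosine_similarity() {
  awk -v a="$1" -v b="$2" "$COSINE_AWK"' BEGIN { printf "%.4f\n", cosine(a, b) }'
}

# Hypothetical helper: exit status 0 when similarity >= threshold (default 0.9).
is_duplicate() {
  awk -v a="$1" -v b="$2" -v t="${3:-0.9}" "$COSINE_AWK"' BEGIN { exit !(cosine(a, b) >= t) }'
}
```

For example, `cosine_similarity "1,0,0" "0,1,0"` prints `0.0000`, while `is_duplicate "1,2,3" "1,2,3.1"` succeeds because the two vectors point in almost the same direction. Raising the threshold keeps more rows (only near-identical items are merged); lowering it removes more. A real pipeline computes embeddings with a model and avoids comparing every pair in shell.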
### Generate Synthetic Training Data

Create high-quality synthetic data with chain-of-thought reasoning:

```bash
# Generate synthetic math problems with reasoning
hf jobs uv run --flavor l4x1 \
  https://huggingface.co/datasets/uv-scripts/synthetic-data/raw/main/cot-self-instruct.py \
  --seed-dataset math-examples --output-dataset synthetic-math \
  --task-type reasoning --num-samples 1000
```
## Getting Started with HF Jobs

Run any UV script on GPU infrastructure:

```bash
hf jobs uv run --flavor l4x1 \
  https://huggingface.co/datasets/uv-scripts/[collection]/raw/main/[script].py \
  [args]
```

Replace `[collection]` and `[script]` with a collection and script name from the table above, and `[args]` with that script's arguments.
Choose your GPU flavor:

- `l4x1`: good balance for most tasks
- `a10g-large`: more memory for larger models
- `a100-large`: maximum performance
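
Every script URL in this README follows the same pattern as the HF Jobs template, so filling in `[collection]` and `[script]` is plain string substitution. Here is a small sketch with a hypothetical helper `uv_script_url`, which is not part of uv or the `hf` CLI:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: build the raw URL for a script in a uv-scripts collection.
# Usage: uv_script_url <collection> <script-name-without-.py>
uv_script_url() {
  printf 'https://huggingface.co/datasets/uv-scripts/%s/raw/main/%s.py\n' "$1" "$2"
}

# Example: the OCR script from the Quick Example.
url="$(uv_script_url ocr nanonets-ocr)"
echo "$url"

# Pass it on as in the template (remaining arguments are placeholders):
#   hf jobs uv run --flavor l4x1 "$url" your-images your-extracted-text
```

The resulting URL works the same way with `hf jobs uv run` on cloud hardware or with a local `uv run`.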
## Learn More

- [UV Documentation](https://docs.astral.sh/uv/)
- [HF Jobs Guide](https://huggingface.co/docs/huggingface_hub/guides/jobs)
- [Script Examples](https://github.com/astral-sh/uv/tree/main/scripts)

---

_UV Scripts is a community project showcasing the power of [UV](https://github.com/astral-sh/uv) for ML workflows._