Upload README.md with huggingface_hub
README.md
CHANGED
@@ -7,22 +7,23 @@ tags:
 - gui-agent
 - vision-language-model
 - screen-understanding
+- vla
 datasets:
-- TESS-Computer/agentnet
+- TESS-Computer/tess-agentnet
 base_model: HuggingFaceTB/SmolVLM2-500M-Instruct
 pipeline_tag: image-text-to-text
 ---
 
 # TESS-500M
 
-**TESS
+**TESS** is a Vision-Language-Action (VLA) model for computer use, inspired by robotic VLAs. Given a screenshot and natural language instruction, it predicts either a mouse action (click coordinates) or keyboard action (typing/shortcuts).
 
 ## Model Description
 
 - **Base Model**: SmolVLM2-500M-Instruct
 - **Architecture**: SmolVLM + Router + Mouse/Keyboard heads
 - **Parameters**: 508M total, 48M trainable
-- **Training Data**: [
+- **Training Data**: [tess-agentnet](https://huggingface.co/datasets/TESS-Computer/tess-agentnet) (~312K samples)
 
 ## Usage
 
@@ -31,7 +32,7 @@ import torch
 from PIL import Image
 
 # Clone the TESS repo
-# git clone https://github.com/
+# git clone https://github.com/husseinlezzaik/TESS.git
 # cd TESS/model
 
 from test_checkpoint import load_model, predict
@@ -101,9 +102,9 @@ Apache 2.0
 
 ```bibtex
 @misc{tess2024,
-title={TESS:
+title={TESS: A Vision-Language-Action Model for Computer Use},
 author={Hussein Lezzaik},
 year={2024},
-url={https://github.com/
+url={https://github.com/husseinlezzaik/TESS}
 }
 ```
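The updated description says the model predicts either a mouse action or a keyboard action, and the architecture line lists a router in front of the two heads. The diff does not show what `predict` returns, so the following is only a minimal sketch of how such a router could select between two heads. Every name here (`MouseAction`, `KeyboardAction`, `route_action`, `to_pixels`) is hypothetical and not taken from the TESS repo, and the use of normalized click coordinates is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass
class MouseAction:
    # Hypothetical: click position normalized to [0, 1] (an assumption).
    x: float
    y: float


@dataclass
class KeyboardAction:
    # Hypothetical: text to type or a shortcut string such as "ctrl+c".
    text: str


Action = Union[MouseAction, KeyboardAction]


def route_action(
    router_scores: Dict[str, float],
    mouse_xy: Tuple[float, float],
    keyboard_text: str,
) -> Action:
    """Return the output of whichever head the router scores higher."""
    if router_scores["mouse"] >= router_scores["keyboard"]:
        x, y = mouse_xy
        return MouseAction(x=x, y=y)
    return KeyboardAction(text=keyboard_text)


def to_pixels(action: MouseAction, width: int, height: int) -> Tuple[int, int]:
    """Convert a normalized click position to pixel coordinates."""
    return round(action.x * width), round(action.y * height)


if __name__ == "__main__":
    action = route_action({"mouse": 0.9, "keyboard": 0.1}, (0.5, 0.25), "hello")
    if isinstance(action, MouseAction):
        print("click at", to_pixels(action, 1920, 1080))
    else:
        print("type", action.text)
```

Representing the two action kinds as separate types keeps the caller's dispatch explicit: an `isinstance` check decides whether to send a click or keystrokes.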