HusseinLezzaik committed dfb50d7 (verified), 1 parent: e8ea8b5

Upload README.md with huggingface_hub

Files changed (1): README.md (+7 −6)
README.md CHANGED
@@ -7,22 +7,23 @@ tags:
 - gui-agent
 - vision-language-model
 - screen-understanding
+- vla
 datasets:
-- TESS-Computer/agentnet
+- TESS-Computer/tess-agentnet
 base_model: HuggingFaceTB/SmolVLM2-500M-Instruct
 pipeline_tag: image-text-to-text
 ---
 
 # TESS-500M
 
-**TESS (Text-Enabled Screen Sense)** is a Vision-Language-Action model for computer use. Given a screenshot and natural language instruction, it predicts either a mouse action (click coordinates) or keyboard action (typing/shortcuts).
+**TESS** is a Vision-Language-Action (VLA) model for computer use, inspired by robotic VLAs. Given a screenshot and natural language instruction, it predicts either a mouse action (click coordinates) or keyboard action (typing/shortcuts).
 
 ## Model Description
 
 - **Base Model**: SmolVLM2-500M-Instruct
 - **Architecture**: SmolVLM + Router + Mouse/Keyboard heads
 - **Parameters**: 508M total, 48M trainable
-- **Training Data**: [AgentNet](https://huggingface.co/datasets/TESS-Computer/agentnet) (~312K samples)
+- **Training Data**: [tess-agentnet](https://huggingface.co/datasets/TESS-Computer/tess-agentnet) (~312K samples)
 
 ## Usage
 
@@ -31,7 +32,7 @@ import torch
 from PIL import Image
 
 # Clone the TESS repo
-# git clone https://github.com/yourusername/TESS.git
+# git clone https://github.com/husseinlezzaik/TESS.git
 # cd TESS/model
 
 from test_checkpoint import load_model, predict
@@ -101,9 +102,9 @@ Apache 2.0
 
 ```bibtex
 @misc{tess2024,
-  title={TESS: Text-Enabled Screen Sense},
+  title={TESS: A Vision-Language-Action Model for Computer Use},
   author={Hussein Lezzaik},
   year={2024},
-  url={https://github.com/yourusername/TESS}
+  url={https://github.com/husseinlezzaik/TESS}
 }
 ```
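
The model card's "SmolVLM + Router + Mouse/Keyboard heads" architecture implies a dispatch step: the router decides whether the mouse head's click coordinates or the keyboard head's text output becomes the predicted action. A minimal pure-Python sketch of that routing follows; the function name, the logit-sign convention, and the output dict are invented for illustration and are not the actual TESS implementation.

```python
# Illustrative sketch only: a router score selects between a "mouse" head
# (click coordinates) and a "keyboard" head (typed text / shortcut), as the
# model card describes. Names, threshold, and shapes are assumptions.

def route_action(router_logit, mouse_xy=None, keyboard_text=None):
    """Return a mouse action if the router logit is positive,
    otherwise a keyboard action."""
    if router_logit > 0:
        x, y = mouse_xy
        return {"type": "mouse", "x": x, "y": y}
    return {"type": "keyboard", "text": keyboard_text}

# A positive router logit selects the mouse head's coordinates:
print(route_action(1.3, mouse_xy=(0.42, 0.77)))
# A negative one selects the keyboard head's text or shortcut:
print(route_action(-0.8, keyboard_text="ctrl+s"))
```

In the real model both heads would run on the VLM's hidden state and the router would be a learned classifier; the sketch only shows the selection logic between the two action types.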