Default to smaller Llama checkpoint for faster init

Files changed:
- README.md (+2 -2)
- __pycache__/app.cpython-313.pyc (+0 -0)
- app.py (+1 -1)
README.md CHANGED

@@ -20,7 +20,7 @@ endpoint via the `HF_ROUTER_API` environment variable.
 
 | File | Purpose |
 | ---- | ------- |
-| `app.py` | Loads the merged checkpoint on demand (defaults to `Alovestocode/router-
+| `app.py` | Loads the merged checkpoint on demand (defaults to `Alovestocode/router-llama31-merged` for faster startup), exposes a `/v1/generate` API, and ships an interactive Gradio UI for manual testing. |
 | `requirements.txt` | Minimal dependency set (transformers, bitsandbytes, torch, gradio, fastapi). |
 | `.huggingface/spaces.yml` | Configures the Space for ZeroGPU hardware and disables automatic sleep. |
 
@@ -39,7 +39,7 @@ endpoint via the `HF_ROUTER_API` environment variable.
 ```
 
 3. **Configure secrets**
-   - `MODEL_REPO` – defaults to `Alovestocode/router-
+   - `MODEL_REPO` – defaults to `Alovestocode/router-llama31-merged` (override if you need the larger Qwen/Gemma checkpoints)
    - `HF_TOKEN` – token with read access to the merged model
 
 4. **Connect the main router UI**
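For orientation, the `/v1/generate` endpoint mentioned in the table above can be exercised with a small client. The sketch below assumes a JSON payload with `prompt`, `max_new_tokens`, `temperature`, and `top_p` fields and a JSON response; the real schema lives in `app.py`, so treat these field names as illustrative assumptions.

```python
# Hypothetical client for the Space's /v1/generate endpoint.
# Payload field names and the response shape are assumptions, not taken from app.py.
import os

import requests

# HF_ROUTER_API is the env var the main router UI uses to locate this Space.
SPACE_URL = os.environ.get("HF_ROUTER_API", "https://<your-space>.hf.space")

payload = {
    "prompt": "Route this request to the best downstream agent.",
    "max_new_tokens": 600,   # mirrors the MAX_NEW_TOKENS default
    "temperature": 0.2,      # mirrors DEFAULT_TEMPERATURE
    "top_p": 0.9,            # mirrors DEFAULT_TOP_P
}

resp = requests.post(f"{SPACE_URL}/v1/generate", json=payload, timeout=120)
resp.raise_for_status()
print(resp.json())  # exact response keys depend on app.py
```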
__pycache__/app.cpython-313.pyc CHANGED

Binary files a/__pycache__/app.cpython-313.pyc and b/__pycache__/app.cpython-313.pyc differ
app.py CHANGED

@@ -26,7 +26,7 @@ except Exception: # pragma: no cover
 load_dotenv()
 
 
-MODEL_ID = os.environ.get("MODEL_REPO", "Alovestocode/router-
+MODEL_ID = os.environ.get("MODEL_REPO", "Alovestocode/router-llama31-merged")
 MAX_NEW_TOKENS = int(os.environ.get("MAX_NEW_TOKENS", "600"))
 DEFAULT_TEMPERATURE = float(os.environ.get("DEFAULT_TEMPERATURE", "0.2"))
 DEFAULT_TOP_P = float(os.environ.get("DEFAULT_TOP_P", "0.9"))
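As a rough sketch of how these environment-driven defaults are typically consumed: lazy loading keeps ZeroGPU startup fast, and the generation parameters feed straight into `generate()`. This is illustrative only, not the actual body of `app.py`; the device-placement and authentication details are assumptions.

```python
# Sketch: load MODEL_ID lazily and generate with the configured defaults.
# Not the real app.py implementation; device_map and HF_TOKEN usage are assumptions.
import os

from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = os.environ.get("MODEL_REPO", "Alovestocode/router-llama31-merged")
MAX_NEW_TOKENS = int(os.environ.get("MAX_NEW_TOKENS", "600"))
DEFAULT_TEMPERATURE = float(os.environ.get("DEFAULT_TEMPERATURE", "0.2"))
DEFAULT_TOP_P = float(os.environ.get("DEFAULT_TOP_P", "0.9"))

_tokenizer = None
_model = None


def _load():
    """Load the checkpoint on first use so the Space starts quickly."""
    global _tokenizer, _model
    if _model is None:
        _tokenizer = AutoTokenizer.from_pretrained(
            MODEL_ID, token=os.environ.get("HF_TOKEN")
        )
        _model = AutoModelForCausalLM.from_pretrained(
            MODEL_ID,
            device_map="auto",  # assumption: let accelerate place the weights
            token=os.environ.get("HF_TOKEN"),
        )
    return _tokenizer, _model


def generate(prompt: str) -> str:
    """Run one completion with the env-configured sampling defaults."""
    tokenizer, model = _load()
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(
        **inputs,
        max_new_tokens=MAX_NEW_TOKENS,
        temperature=DEFAULT_TEMPERATURE,
        top_p=DEFAULT_TOP_P,
        do_sample=True,
    )
    # Strip the prompt tokens and return only the newly generated text.
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
```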