Alovestocode committed
Commit f5c6fe4 · verified · 1 Parent(s): 534388e

Default to smaller Llama checkpoint for faster init

Files changed (3):
  1. README.md +2 -2
  2. __pycache__/app.cpython-313.pyc +0 -0
  3. app.py +1 -1
README.md CHANGED
@@ -20,7 +20,7 @@ endpoint via the `HF_ROUTER_API` environment variable.
 
  | File | Purpose |
  | ---- | ------- |
- | `app.py` | Loads the merged checkpoint on demand (defaults to `Alovestocode/router-qwen3-32b-merged`), exposes a `/v1/generate` API, and ships an interactive Gradio UI for manual testing. |
+ | `app.py` | Loads the merged checkpoint on demand (defaults to `Alovestocode/router-llama31-merged` for faster startup), exposes a `/v1/generate` API, and ships an interactive Gradio UI for manual testing. |
  | `requirements.txt` | Minimal dependency set (transformers, bitsandbytes, torch, gradio, fastapi). |
  | `.huggingface/spaces.yml` | Configures the Space for ZeroGPU hardware and disables automatic sleep. |
 
@@ -39,7 +39,7 @@ endpoint via the `HF_ROUTER_API` environment variable.
  ```
 
  3. **Configure secrets**
- - `MODEL_REPO` – defaults to `Alovestocode/router-qwen3-32b-merged`
+ - `MODEL_REPO` – defaults to `Alovestocode/router-llama31-merged` (override if you need the larger Qwen/Gemma checkpoints)
  - `HF_TOKEN` – token with read access to the merged model
 
  4. **Connect the main router UI**
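
For context, the README describes connecting the main router UI to this Space's `/v1/generate` endpoint through the `HF_ROUTER_API` environment variable. Below is a minimal client sketch of that call; the request/response field names (`prompt`, `temperature`, `top_p`, `max_new_tokens`) are assumptions not confirmed by this diff, so check `app.py` for the actual schema.

```python
# Minimal sketch: calling the Space's /v1/generate endpoint from the main
# router UI. The base URL comes from HF_ROUTER_API (per the README); the
# JSON field names below are assumed, not taken from app.py.
import os
import requests

ROUTER_API = os.environ.get("HF_ROUTER_API", "https://<your-space>.hf.space")  # placeholder URL

def generate(prompt: str) -> dict:
    resp = requests.post(
        f"{ROUTER_API}/v1/generate",
        json={
            "prompt": prompt,        # assumed field name
            "temperature": 0.2,      # mirrors DEFAULT_TEMPERATURE in app.py
            "top_p": 0.9,            # mirrors DEFAULT_TOP_P in app.py
            "max_new_tokens": 600,   # mirrors MAX_NEW_TOKENS in app.py
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()  # response shape is not shown in this diff

if __name__ == "__main__":
    print(generate("Route this request to the best downstream model."))
```
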
__pycache__/app.cpython-313.pyc CHANGED
Binary files a/__pycache__/app.cpython-313.pyc and b/__pycache__/app.cpython-313.pyc differ
 
app.py CHANGED
@@ -26,7 +26,7 @@ except Exception: # pragma: no cover
  load_dotenv()
 
 
- MODEL_ID = os.environ.get("MODEL_REPO", "Alovestocode/router-qwen3-32b-merged")
+ MODEL_ID = os.environ.get("MODEL_REPO", "Alovestocode/router-llama31-merged")
  MAX_NEW_TOKENS = int(os.environ.get("MAX_NEW_TOKENS", "600"))
  DEFAULT_TEMPERATURE = float(os.environ.get("DEFAULT_TEMPERATURE", "0.2"))
  DEFAULT_TOP_P = float(os.environ.get("DEFAULT_TOP_P", "0.9"))
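
The hunk above only shows the env-driven defaults, not the loading code itself. The sketch below illustrates how those values could feed an on-demand (lazy) load of the merged checkpoint with transformers and bitsandbytes, matching the "loads the merged checkpoint on demand" behaviour the README describes; the 4-bit quantization and exact generation call are assumptions, not what `app.py` necessarily does.

```python
# Rough sketch (not the actual app.py): lazy, on-demand load of the merged
# checkpoint driven by the same environment defaults as in the diff above.
import os
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_ID = os.environ.get("MODEL_REPO", "Alovestocode/router-llama31-merged")
MAX_NEW_TOKENS = int(os.environ.get("MAX_NEW_TOKENS", "600"))
DEFAULT_TEMPERATURE = float(os.environ.get("DEFAULT_TEMPERATURE", "0.2"))
DEFAULT_TOP_P = float(os.environ.get("DEFAULT_TOP_P", "0.9"))

_tokenizer = None
_model = None

def get_model():
    """Load tokenizer and model the first time they are needed, then cache them."""
    global _tokenizer, _model
    if _model is None:
        token = os.environ.get("HF_TOKEN")  # read access to the merged model
        _tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, token=token)
        _model = AutoModelForCausalLM.from_pretrained(
            MODEL_ID,
            token=token,
            device_map="auto",
            # 4-bit quantization via bitsandbytes keeps the memory footprint
            # small; whether app.py actually quantizes is an assumption here.
            quantization_config=BitsAndBytesConfig(load_in_4bit=True),
        )
    return _tokenizer, _model

def generate(prompt: str) -> str:
    tokenizer, model = get_model()
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(
        **inputs,
        max_new_tokens=MAX_NEW_TOKENS,
        temperature=DEFAULT_TEMPERATURE,
        top_p=DEFAULT_TOP_P,
        do_sample=True,
    )
    return tokenizer.decode(output[0], skip_special_tokens=True)
```

Swapping `MODEL_REPO` back to `Alovestocode/router-qwen3-32b-merged` (or another merged checkpoint) requires no code change under this pattern, which is why the commit only touches the default value.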