Alovestocode committed
Commit f5c6fe4 · verified · 1 Parent(s): 534388e

Default to smaller Llama checkpoint for faster init

Files changed (3):
  1. README.md +2 -2
  2. __pycache__/app.cpython-313.pyc +0 -0
  3. app.py +1 -1
README.md CHANGED
@@ -20,7 +20,7 @@ endpoint via the `HF_ROUTER_API` environment variable.
 
  | File | Purpose |
  | ---- | ------- |
- | `app.py` | Loads the merged checkpoint on demand (defaults to `Alovestocode/router-qwen3-32b-merged`), exposes a `/v1/generate` API, and ships an interactive Gradio UI for manual testing. |
+ | `app.py` | Loads the merged checkpoint on demand (defaults to `Alovestocode/router-llama31-merged` for faster startup), exposes a `/v1/generate` API, and ships an interactive Gradio UI for manual testing. |
  | `requirements.txt` | Minimal dependency set (transformers, bitsandbytes, torch, gradio, fastapi). |
  | `.huggingface/spaces.yml` | Configures the Space for ZeroGPU hardware and disables automatic sleep. |
 
@@ -39,7 +39,7 @@ endpoint via the `HF_ROUTER_API` environment variable.
  ```
 
  3. **Configure secrets**
- - `MODEL_REPO` – defaults to `Alovestocode/router-qwen3-32b-merged`
+ - `MODEL_REPO` – defaults to `Alovestocode/router-llama31-merged` (override if you need the larger Qwen/Gemma checkpoints)
  - `HF_TOKEN` – token with read access to the merged model
 
  4. **Connect the main router UI**
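
For context, the README describes connecting the main router UI to this Space's `/v1/generate` endpoint through the `HF_ROUTER_API` environment variable. Below is a minimal client sketch of that call; the request/response field names (`prompt`, `temperature`, `top_p`, `max_new_tokens`) are assumptions not confirmed by this diff, so check `app.py` for the actual schema.

```python
# Minimal sketch: calling the Space's /v1/generate endpoint from the main
# router UI. The base URL comes from HF_ROUTER_API (per the README); the
# JSON field names below are assumed, not taken from app.py.
import os
import requests

ROUTER_API = os.environ.get("HF_ROUTER_API", "https://<your-space>.hf.space")  # placeholder URL

def generate(prompt: str) -> dict:
    resp = requests.post(
        f"{ROUTER_API}/v1/generate",
        json={
            "prompt": prompt,        # assumed field name
            "temperature": 0.2,      # mirrors DEFAULT_TEMPERATURE in app.py
            "top_p": 0.9,            # mirrors DEFAULT_TOP_P in app.py
            "max_new_tokens": 600,   # mirrors MAX_NEW_TOKENS in app.py
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()  # response shape is not shown in this diff

if __name__ == "__main__":
    print(generate("Route this request to the best downstream model."))
```
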
__pycache__/app.cpython-313.pyc CHANGED
Binary files a/__pycache__/app.cpython-313.pyc and b/__pycache__/app.cpython-313.pyc differ
 
app.py CHANGED
@@ -26,7 +26,7 @@ except Exception: # pragma: no cover
  load_dotenv()
 
 
- MODEL_ID = os.environ.get("MODEL_REPO", "Alovestocode/router-qwen3-32b-merged")
+ MODEL_ID = os.environ.get("MODEL_REPO", "Alovestocode/router-llama31-merged")
  MAX_NEW_TOKENS = int(os.environ.get("MAX_NEW_TOKENS", "600"))
  DEFAULT_TEMPERATURE = float(os.environ.get("DEFAULT_TEMPERATURE", "0.2"))
  DEFAULT_TOP_P = float(os.environ.get("DEFAULT_TOP_P", "0.9"))
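
The hunk above only shows the env-driven defaults, not the loading code itself. The sketch below illustrates how those values could feed an on-demand (lazy) load of the merged checkpoint with transformers and bitsandbytes, matching the "loads the merged checkpoint on demand" behaviour the README describes; the 4-bit quantization and exact generation call are assumptions, not what `app.py` necessarily does.

```python
# Rough sketch (not the actual app.py): lazy, on-demand load of the merged
# checkpoint driven by the same environment defaults as in the diff above.
import os
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_ID = os.environ.get("MODEL_REPO", "Alovestocode/router-llama31-merged")
MAX_NEW_TOKENS = int(os.environ.get("MAX_NEW_TOKENS", "600"))
DEFAULT_TEMPERATURE = float(os.environ.get("DEFAULT_TEMPERATURE", "0.2"))
DEFAULT_TOP_P = float(os.environ.get("DEFAULT_TOP_P", "0.9"))

_tokenizer = None
_model = None

def get_model():
    """Load tokenizer and model the first time they are needed, then cache them."""
    global _tokenizer, _model
    if _model is None:
        token = os.environ.get("HF_TOKEN")  # read access to the merged model
        _tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, token=token)
        _model = AutoModelForCausalLM.from_pretrained(
            MODEL_ID,
            token=token,
            device_map="auto",
            # 4-bit quantization via bitsandbytes keeps the memory footprint
            # small; whether app.py actually quantizes is an assumption here.
            quantization_config=BitsAndBytesConfig(load_in_4bit=True),
        )
    return _tokenizer, _model

def generate(prompt: str) -> str:
    tokenizer, model = get_model()
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(
        **inputs,
        max_new_tokens=MAX_NEW_TOKENS,
        temperature=DEFAULT_TEMPERATURE,
        top_p=DEFAULT_TOP_P,
        do_sample=True,
    )
    return tokenizer.decode(output[0], skip_special_tokens=True)
```

Swapping `MODEL_REPO` back to `Alovestocode/router-qwen3-32b-merged` (or another merged checkpoint) requires no code change under this pattern, which is why the commit only touches the default value.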