# LLM Text Generation (Chat)

This suite benchmarks vLLM and TGI on the chat completion task across various models.

## Setup

### Docker images

You can pull the vLLM and TGI Docker images with:

```sh
docker pull mlenergy/vllm:v0.4.2-openai
docker pull mlenergy/tgi:v2.0.2
```
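
The benchmark scripts may launch these containers for you, but if you want to start a server manually, here is a minimal sketch. It assumes the `mlenergy/vllm` image keeps the upstream vLLM OpenAI-compatible server entrypoint and its default port 8000; the model name and mount paths are examples, not something this README specifies:

```sh
# Hypothetical manual launch; assumes the image's entrypoint is the
# standard vLLM OpenAI-compatible API server (default port 8000).
# The model name is only an example.
docker run --gpus all --shm-size 1g \
  -v /data/leaderboard/hfcache:/root/.cache/huggingface \
  -e HF_TOKEN=$HF_TOKEN \
  -p 8000:8000 \
  mlenergy/vllm:v0.4.2-openai \
  --model meta-llama/Llama-2-7b-chat-hf
```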
### Installing benchmark script dependencies

```sh
pip install -r requirements.txt
```
### Starting the NVML container

Changing the GPU power limit requires the `SYS_ADMIN` Linux security capability, which we delegate to a daemon Docker container running a base CUDA image.

```sh
bash ../../common/start_nvml_container.sh
```

With the `nvml` container running, you can change the power limit with something like `docker exec nvml nvidia-smi -i 0 -pl 200`.
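
Before setting a limit, you can query the valid range for a GPU through the same container:

```sh
# Print the current, default, and min/max enforceable power limits for GPU 0.
docker exec nvml nvidia-smi -i 0 \
  --query-gpu=power.limit,power.default_limit,power.min_limit,power.max_limit \
  --format=csv
```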
### HuggingFace cache directory

The scripts assume the HuggingFace cache directory will be under `/data/leaderboard/hfcache` on the node that runs this benchmark.
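
If your cache actually lives elsewhere, one workaround (a sketch on our part, not something the scripts document) is to symlink the expected path to your existing cache:

```sh
# Point the path the scripts expect at an existing HuggingFace cache.
sudo mkdir -p /data/leaderboard
sudo ln -s ~/.cache/huggingface /data/leaderboard/hfcache
```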
## Benchmarking

### Obtaining one datapoint

Export your HuggingFace hub token as the environment variable `$HF_TOKEN`.
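For example:

```sh
export HF_TOKEN=<your HuggingFace hub token>
```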
The script `scripts/benchmark_one_datapoint.py` assumes it is run from the directory that contains `scripts`, like this:

```sh
python scripts/benchmark_one_datapoint.py --help
```
### Obtaining all datapoints for a single model

Run `scripts/benchmark_one_model.py`.
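
Assuming it is a standard argparse-style CLI like the single-datapoint script, you can inspect its options the same way:

```sh
python scripts/benchmark_one_model.py --help
```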
### Running the entire suite with Pegasus

You can use [`pegasus`](https://github.com/jaywonchung/pegasus) to run the entire benchmark suite.
Queue and host files are in [`./pegasus`](./pegasus).
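
As a sketch, assuming Pegasus's queue mode and that it reads its queue and host YAML files from the working directory (check the Pegasus README for the exact invocation):

```sh
# Install pegasus (a Rust-based SSH job runner), then run the queue.
cargo install pegasus-ssh
cd pegasus
pegasus q   # queue mode; assumes the queue/host files in this directory
```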