Feature: llama.cpp server

  Background: Server startup
    Given a server listening on localhost:8080
    And   a model url https://huggingface.co/ggml-org/models/resolve/main/bert-bge-small/ggml-model-f16.gguf
    And   a model file bert-bge-small.gguf
    And   a model alias bert-bge-small
    And   42 as server seed
    And   2 slots
    # the bert-bge-small model has a context size of 512
    # since the generated prompts are as big as the batch size, the batch size must be <= 512
    # ref: https://huggingface.co/BAAI/bge-small-en-v1.5/blob/5c38ec7c405ec4b44b94cc5a9bb96e735b38267a/config.json#L20
    And   128 as batch size
    And   128 as ubatch size
    And   512 KV cache size
    And   enable embeddings endpoint
    Then  the server is starting
    Then  the server is healthy
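For reference, the Background above maps roughly onto `llama-server` command-line flags. This is a sketch only: flag spellings (e.g. `--embedding` vs. `--embeddings`) have varied between llama.cpp versions, so check `llama-server --help` for the build you are running.

```shell
# Hypothetical launch command matching the Background steps above;
# flag names may differ across llama.cpp versions.
llama-server \
  --host localhost --port 8080 \
  --model-url https://huggingface.co/ggml-org/models/resolve/main/bert-bge-small/ggml-model-f16.gguf \
  --model bert-bge-small.gguf \
  --alias bert-bge-small \
  --seed 42 \
  --parallel 2 \
  --batch-size 128 --ubatch-size 128 \
  --ctx-size 512 \
  --embedding
```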

  Scenario: Embedding
    When embeddings are computed for:
      """
      What is the capital of Bulgaria ?
      """
    Then embeddings are generated

  Scenario: Embedding (error: prompt too long)
    When embeddings are computed for:
      """
      Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
      Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
      Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.
      Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
      Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
      Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
      Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.
      Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
      """
    And embeddings request with 500 api error

  Scenario: OAI Embeddings compatibility
    Given a model bert-bge-small
    When an OAI compatible embeddings computation request for:
      """
      What is the capital of Spain ?
      """
    Then embeddings are generated
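The OAI-compatible scenario above posts a request body shaped like the OpenAI embeddings API, which llama.cpp's server mirrors on its `/v1/embeddings` endpoint. A minimal sketch of that body (the helper name is our own, not part of the test suite):

```python
import json

def build_embeddings_request(content, model="bert-bge-small"):
    """Build the JSON body for an OAI-compatible embeddings request.

    Field names ("model", "input") follow the OpenAI embeddings API,
    which llama.cpp's server mirrors on /v1/embeddings.
    """
    return {"model": model, "input": content}

body = build_embeddings_request("What is the capital of Spain ?")
print(json.dumps(body))
# -> {"model": "bert-bge-small", "input": "What is the capital of Spain ?"}
```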

  Scenario: OAI Embeddings compatibility with multiple inputs
    Given a model bert-bge-small
    Given a prompt:
      """
      In which country Paris is located ?
      """
    And a prompt:
      """
      Is Madrid the capital of Spain ?
      """
    When an OAI compatible embeddings computation request for multiple inputs
    Then embeddings are generated
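The multiple-inputs variant only changes the `input` field: the OpenAI embeddings API shape (which llama.cpp mirrors) accepts a list of strings, and the response then carries one embedding per prompt in its `data` array.

```python
# "input" may be a list of prompts; the server returns one embedding
# per entry in the response's "data" array.
body = {
    "model": "bert-bge-small",
    "input": [
        "In which country Paris is located ?",
        "Is Madrid the capital of Spain ?",
    ],
}
print(len(body["input"]))  # 2
```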

  Scenario: Multi users embeddings
    Given a prompt:
      """
      Write a very long story about AI.
      """
    And a prompt:
      """
      Write another very long music lyrics.
      """
    And a prompt:
      """
      Write a very long poem.
      """
    And a prompt:
      """
      Write a very long joke.
      """
    Given concurrent embedding requests
    Then the server is busy
    Then the server is idle
    Then all embeddings are generated
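A driver for the concurrent scenario above might fan the four prompts out over a thread pool sized to the server's 2 slots. This sketch stubs out the HTTP call so it runs standalone; a real driver would POST each prompt to the server's embeddings endpoint instead.

```python
from concurrent.futures import ThreadPoolExecutor

prompts = [
    "Write a very long story about AI.",
    "Write another very long music lyrics.",
    "Write a very long poem.",
    "Write a very long joke.",
]

def request_embedding(prompt):
    # Stub standing in for an HTTP POST to the embeddings endpoint;
    # returns a dummy vector so the sketch runs offline.
    return [float(len(prompt))]

# Two workers mirror the "2 slots" configured in the Background.
with ThreadPoolExecutor(max_workers=2) as pool:
    embeddings = list(pool.map(request_embedding, prompts))

print(len(embeddings))  # one embedding per prompt: 4
```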

  Scenario: Multi users OAI compatibility embeddings
    Given a prompt:
      """
      In which country Paris is located ?
      """
    And a prompt:
      """
      Is Madrid the capital of Spain ?
      """
    And a prompt:
      """
      What is the biggest US city ?
      """
    And a prompt:
      """
      What is the capital of Bulgaria ?
      """
    And a model bert-bge-small
    Given concurrent OAI embedding requests
    Then the server is busy
    Then the server is idle
    Then all embeddings are generated

  Scenario: All embeddings should be the same
    Given 10 fixed prompts
    And a model bert-bge-small
    Given concurrent OAI embedding requests
    Then all embeddings are the same
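The last scenario checks determinism: with the fixed server seed and identical prompts, every response should carry the same vector. A comparison like the one behind "all embeddings are the same" could be sketched as follows (the function name is our own):

```python
import math

def all_embeddings_equal(embeddings, tol=1e-6):
    """Return True if every vector matches the first within tol."""
    first = embeddings[0]
    return all(
        len(vec) == len(first)
        and all(math.isclose(a, b, abs_tol=tol) for a, b in zip(first, vec))
        for vec in embeddings[1:]
    )

# Toy vectors standing in for the 10 collected responses.
print(all_embeddings_equal([[0.1, 0.2, 0.3]] * 10))              # True
print(all_embeddings_equal([[0.1, 0.2, 0.3], [0.1, 0.2, 0.9]]))  # False
```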