zegen-1

NOTE: This is an alpha version for early testing; the model weights, and even the architecture, may change.

This is a keyword generation model that rewrites natural language queries into arrays of keyword search expressions, intended for use in retrieval pipelines where sparse keyword search is available.
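
In such a pipeline, the generated expressions are fanned out across a keyword search backend and the hits merged. Below is a minimal, self-contained sketch of that step; keyword_search and the toy corpus are hypothetical stand-ins for a real Slack search API or sparse index, which this card does not specify.

# Hypothetical stand-in for a Slack search API or sparse keyword index.
def keyword_search(messages: list[str], expression: str) -> list[str]:
    terms = expression.lower().split()
    return [m for m in messages if all(t in m.lower() for t in terms)]

corpus = [
    "Reminder: security training is due by the end of the quarter",
    "Anyone up for lunch on Friday?",
]

# Fan out each generated expression and merge hits; a dict gives
# ordered de-duplication, so a message matched by several
# expressions is kept only once.
hits: dict[str, None] = {}
for expression in ["security training", "training deadline"]:
    for message in keyword_search(corpus, expression):
        hits[message] = None

print(list(hits))
# ['Reminder: security training is due by the end of the quarter']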

How to Use

from vllm import LLM, SamplingParams

query = "When am I supposed to complete my security training?"

llm = LLM(
    model="zeroentropy/zegen-1",
    tensor_parallel_size=1,
    dtype="auto",
    gpu_memory_utilization=0.8,
)

messages = [
    {
        "role": "system",
        "content": """
Your goal is to take a User natural language query, and make a list of Slack API searches for that User query. The presumption is that there are slack messages inside of the User's slack, that answer the User's query, and you are trying to find them.

- Consider synonyms for each of the words in the query
- Consider alternative ways a slack message could be written, that would match a query term, even if it's not an exact synonym. For example, if the query is "How do I reset my password", a target word could be "forgot". Even though, in isolation, no query word is synonymous with "forgot".
- Make a final list of 7 search queries, that you believe will be specific enough to pull 1-10 options, but not too specific that you get 0 results (It's very common to get 0 results from the slack API, so be conservative, prefer 1 word or 2 word searches, but have some searches at every length).
  - Note that capitalization doesn't make a difference, neither does suffixes such as -s or -ing. Don't include stop words either.
- Output your answer as a JSON string array. Do not output anything else.
""".strip(),
    },
    {"role": "user", "content": query},
]

max_new_tokens = 512  # generation cap; the keyword list is short (assumed value)

outputs = llm.chat(
    messages=[messages],
    sampling_params=SamplingParams(
        max_tokens=max_new_tokens,
        temperature=0.0,
    ),
    chat_template_kwargs={"enable_thinking": False},
)

response = outputs[0].outputs[0].text
print(response)
# Expected output:
# [
#     "security training",
#     "security due",
#     "training deadline",
#     "security training deadline",
#     "required training",
#     "onboarding",
#     "compliance"
# ]

Evaluations

On a human-annotated set of evaluations over real private Slack repositories, zegen-1 shows significant accuracy gains over similarly small models and 10x lower latency than similarly accurate models.

Model details

- Model size: 15B params
- Tensor type: BF16 (Safetensors)
- Finetuned from: Qwen/Qwen3-14B