---
base_model: my2000cup/Gaia-LLM-8B
library_name: peft
license: other
tags:
- llama-factory
- lora
- generated_from_trainer
- llama-cpp
- gguf-my-repo
model-index:
- name: train_2025-05-04-14-25-19
  results: []
---

# FM-1976/Gaia-LLM-8B-Q4_K_M-GGUF

This model was converted to GGUF format from [`my2000cup/Gaia-LLM-8B`](https://huggingface.co/my2000cup/Gaia-LLM-8B) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
Refer to the [original model card](https://huggingface.co/my2000cup/Gaia-LLM-8B) for more details on the model.

This model is a fine-tuned version of [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) on the wikipedia_zh, petro_books, datasets001, datasets002, datasets003, datasets004 and datasets006 datasets.

## Model description

Gaia-Petro-LLM is a large language model specialized in the oil and gas industry, fine-tuned from Qwen/Qwen3-8B. It was further pre-trained on a curated 20GB corpus of petroleum engineering texts, including technical documents, academic papers, and domain literature. The model is designed to support domain experts, researchers, and engineers in petroleum-related tasks, providing high-quality, domain-specific language understanding and generation.

## Model Details

- Base Model: Qwen/Qwen3-8B
- Domain: Oil & Gas / Petroleum Engineering
- Corpus Size: ~20GB (petroleum engineering)
- Languages: Primarily Chinese; domain-specific English supported
- Repository: my2000cup/Gaia-LLM-8B

## Intended uses & limitations

Intended uses:

- Technical Q&A in petroleum engineering
- Document summarization for oil & gas reports
- Knowledge extraction from unstructured domain texts
- Education & training in oil & gas technologies

Limitations:

- Not suitable for general-domain tasks outside oil & gas.
- May not be up to date with the latest industry developments (post-2023).
- Not to be used for critical, real-time decision-making without expert review.

## Training and evaluation data

The model was further pre-trained on an in-house text corpus (~20GB) collected from:

- Wikipedia (Chinese, petroleum-related entries)
- Open petroleum engineering books and literature
- Technical standards and manuals

### Context Windows

From the original [tokenizer configuration](https://huggingface.co/my2000cup/Gaia-LLM-8B/blob/main/tokenizer_config.json), the model inherits the base model's context window of **128k tokens**:

```
"model_max_length": 131072
```

Unless you have a really powerful GPU with a lot of VRAM, start with a smaller context window (for example 12k) when running the model with llama.cpp, or offload only part of the layers to the GPU so that more VRAM stays available for the context window.

---

## Use with llama.cpp

Install llama.cpp through brew (works on Mac and Linux)

```bash
brew install llama.cpp
```

Invoke the llama.cpp server or the CLI.

### CLI:

```bash
llama-cli --hf-repo FM-1976/Gaia-LLM-8B-Q4_K_M-GGUF --hf-file gaia-llm-8b-q4_k_m.gguf -p "The meaning to life and the universe is"
```

### Server:

```bash
llama-server --hf-repo FM-1976/Gaia-LLM-8B-Q4_K_M-GGUF --hf-file gaia-llm-8b-q4_k_m.gguf -c 2048
```
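Once `llama-server` is up, it exposes an OpenAI-compatible HTTP API. Below is a minimal sketch of querying it with `curl`, assuming the default host and port (`http://localhost:8080`) and the `/v1/chat/completions` endpoint; the prompt is just an illustrative domain question, so adjust the host, port, and sampling parameters to your setup.

```bash
# Query the running llama-server through its OpenAI-compatible chat endpoint
# (assumes the default http://localhost:8080; change if you passed --host/--port).
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          {"role": "system", "content": "You are a petroleum engineering assistant."},
          {"role": "user", "content": "Explain water flooding in secondary oil recovery."}
        ],
        "temperature": 0.7,
        "max_tokens": 256
      }'
```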
Note: You can also use this checkpoint directly through the [usage steps](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#usage) listed in the Llama.cpp repo as well.

Step 1: Clone llama.cpp from GitHub.

```
git clone https://github.com/ggerganov/llama.cpp
```

Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag along with other hardware-specific flags (for example, LLAMA_CUDA=1 for Nvidia GPUs on Linux).

```
cd llama.cpp && LLAMA_CURL=1 make
```

Step 3: Run inference through the main binary.

```
./llama-cli --hf-repo FM-1976/Gaia-LLM-8B-Q4_K_M-GGUF --hf-file gaia-llm-8b-q4_k_m.gguf -p "The meaning to life and the universe is"
```

or llama-server with a 12k context window

```
./llama-server --hf-repo FM-1976/Gaia-LLM-8B-Q4_K_M-GGUF --hf-file gaia-llm-8b-q4_k_m.gguf -c 12288
```

Use the flag `-ngl XX`, where XX is the number of layers to offload to the GPU. Here is an example of a local run, assuming you have already downloaded the GGUF file. This command loads all 36 layers onto the GPU (better to start with a smaller number, though):

```
./llama-server -m gaia-llm-8b-q4_k_m.gguf -c 12288 -ngl 36
```

---

About the original model:

# Qwen3-8B Chat

## Qwen3 Highlights

Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support, with the following key features:

- **Unique support for seamless switching between thinking mode** (for complex logical reasoning, math, and coding) **and non-thinking mode** (for efficient, general-purpose dialogue) **within a single model**, ensuring optimal performance across various scenarios.
- **Significantly enhanced reasoning capabilities**, surpassing previous QwQ (in thinking mode) and Qwen2.5 instruct models (in non-thinking mode) on mathematics, code generation, and commonsense logical reasoning.
- **Superior human preference alignment**, excelling in creative writing, role-playing, multi-turn dialogues, and instruction following, to deliver a more natural, engaging, and immersive conversational experience.
- **Expertise in agent capabilities**, enabling precise integration with external tools in both thinking and non-thinking modes and achieving leading performance among open-source models in complex agent-based tasks.
- **Support of 100+ languages and dialects** with strong capabilities for **multilingual instruction following** and **translation**.

## Model Overview

**Qwen3-8B** has the following features:

- Type: Causal Language Models
- Training Stage: Pretraining & Post-training
- Number of Parameters: 8.2B
- Number of Parameters (Non-Embedding): 6.95B
- Number of Layers: 36
- Number of Attention Heads (GQA): 32 for Q and 8 for KV
- Context Length: 32,768 tokens natively and 131,072 tokens with YaRN (a hedged llama-server example is sketched at the end of this card).

For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our [blog](https://qwenlm.github.io/blog/qwen3/), [GitHub](https://github.com/QwenLM/Qwen3), and [Documentation](https://qwen.readthedocs.io/en/latest/).
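As a closing note on the extended context: llama.cpp supports YaRN rope scaling, and the sketch below follows the Qwen3 guidance of a scale factor of 4 over the 32,768-token native window. Treat the exact flag names and values as assumptions to verify against your llama.cpp build (`./llama-server --help`), and keep the VRAM warning above in mind, since long contexts are memory-hungry.

```bash
# Hypothetical long-context run of this GGUF with YaRN rope scaling enabled.
# Flag names and values follow the Qwen3 recommendations for llama.cpp;
# verify them against your build before relying on this.
./llama-server -m gaia-llm-8b-q4_k_m.gguf \
  -c 65536 \
  --rope-scaling yarn \
  --rope-scale 4 \
  --yarn-orig-ctx 32768 \
  -ngl 36
```

Qwen's guidance suggests enabling this kind of static scaling only when your prompts actually exceed the native 32k window, as it can degrade quality on shorter inputs.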