---
library_name: transformers
tags:
- llama
- gguf
- safetensors
- finetune
base_model:
- gia-uh/cecilia-2b-v0.1
license: mit
datasets:
- gia-uh/maria-silvia-v1
language:
- es
- en
pipeline_tag: text-generation
---

# Cecilia: The Cuban Language Model

Cecilia is a family of language models continually pretrained on Cuban written text, capturing the linguistic, cultural, and social nuances of Cuban Spanish. These models are designed to support natural language processing tasks with a focus on Cuban language varieties and cultural context.

## About Cecilia FT MS v1

This model is a fine-tuned version of **Cecilia 2B v0.1**, a model continually pre-trained from [Salamandra 2b](https://huggingface.co/BSC-LT/salamandra-2b). It belongs to the **Cecilia** collection and follows the same lineage as [Cecilia 2B v0.1](https://huggingface.co/gia-uh/cecilia-2b-v0.1).

## Model Formats

This repository is a **Hybrid Release** containing:

- **Safetensors:** for use with Hugging Face `transformers`.
- **GGUF (FP16):** for use with `llama.cpp`, `vLLM`, or other local inference tools.

## Quantizations

Official quantized GGUF versions (Q8_0, Q6_K, Q4_K_M) are available in the repository [gia-uh/cecilia-2b-instruct-v1-GGUF](https://huggingface.co/gia-uh/cecilia-2b-instruct-v1-GGUF).

## Quickstart (Transformers)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "gia-uh/cecilia_ft_ms_v1"

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=False)

# Simple inference: encode a prompt and generate a continuation
inputs = tokenizer("Hola, ¿qué bolá?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
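Because the quantized release is labeled *instruct*, the model may expect chat-formatted prompts. The following is a minimal sketch that assumes the tokenizer ships a chat template, which this card does not confirm; if `apply_chat_template` raises an error, use plain-text prompts as in the Quickstart above.

```python
# Assumes the tokenizer defines a chat template; this is not confirmed
# by the model card, so treat it as a sketch rather than official usage.
messages = [{"role": "user", "content": "¿Qué lugares de La Habana me recomiendas visitar?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(input_ids, max_new_tokens=100)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```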
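## Quickstart (GGUF)

One way to run the GGUF weights locally is through the `llama-cpp-python` bindings. The sketch below downloads a quantized file from the official GGUF repository and runs a short completion; the `.gguf` filename is a hypothetical placeholder, so check the repository's file listing for the actual name.

```python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# "cecilia-2b-instruct-v1-Q4_K_M.gguf" is a hypothetical filename;
# look up the real one in the GGUF repository's Files tab.
model_path = hf_hub_download(
    repo_id="gia-uh/cecilia-2b-instruct-v1-GGUF",
    filename="cecilia-2b-instruct-v1-Q4_K_M.gguf",
)

# Load the quantized model and generate a short completion
llm = Llama(model_path=model_path, n_ctx=2048)
output = llm("Hola, ¿qué bolá?", max_tokens=64)
print(output["choices"][0]["text"])
```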