Tweeties in a Tweety World

AI & ML interests

Multilingual and Low-Resource NLP

Team members: François Remy, Pieter Delobelle, Miryam de Lhoneux, Avetisyan, Alfiya Khabibullina, Giuseppe Attanasio, Jessa Bekker
Organization Card

The Tweeties are a series of foundation models, each with a tokenizer native to its target language, for better understanding and generation of text in that language. The models are adapted from existing models using trans-tokenization and then further pre-trained on existing corpora.
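
To make the adaptation step more concrete, here is a toy sketch of the core idea behind trans-tokenization as the linked paper summarizes it: each token in the new, language-specific vocabulary gets an initial embedding computed as a weighted average of embeddings of related source-vocabulary tokens. The function name, the tiny vocabularies, the alignment weights, and the mean-embedding fallback for unaligned tokens are all assumptions made for illustration; this is not the authors' implementation.

```python
from typing import Dict, List

Vector = List[float]


def init_target_embeddings(
    source_embeddings: Dict[str, Vector],
    alignment: Dict[str, Dict[str, float]],
) -> Dict[str, Vector]:
    """Initialize each target token as a weighted average of source embeddings.

    `alignment` maps a target token to {source token: weight}. Both the name
    and the data layout are hypothetical, chosen only for this sketch.
    """
    dim = len(next(iter(source_embeddings.values())))
    count = len(source_embeddings)
    # Assumed fallback for tokens without any usable alignment:
    # the mean of all source embeddings.
    mean = [
        sum(vec[i] for vec in source_embeddings.values()) / count
        for i in range(dim)
    ]

    target: Dict[str, Vector] = {}
    for tgt_token, weights in alignment.items():
        # Keep only positive weights that point at known source tokens.
        known = {
            src: w
            for src, w in weights.items()
            if src in source_embeddings and w > 0
        }
        total = sum(known.values())
        if total == 0:
            target[tgt_token] = list(mean)
            continue
        # Normalize the weights so they sum to 1, then average.
        target[tgt_token] = [
            sum(w * source_embeddings[src][i] for src, w in known.items()) / total
            for i in range(dim)
        ]
    return target


# Made-up 2-dimensional embeddings for an English-like source vocabulary.
source = {"house": [1.0, 0.0], "home": [0.0, 1.0], "cat": [4.0, 4.0]}
# Made-up alignment weights for a Dutch-like target vocabulary.
alignment = {"huis": {"house": 3.0, "home": 1.0}, "kat": {"cat": 1.0}, "zzz": {}}

embeddings = init_target_embeddings(source, alignment)
print(embeddings["huis"])
```

In this toy setup, "huis" lands three quarters of the way toward "house" and one quarter toward "home", while "zzz", which has no alignment, falls back to the mean of the source embeddings. A real pipeline would operate on the full embedding matrices of an existing model and derive the weights from a translation resource, followed by the further pre-training mentioned above.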

Collections 1

Papers on Trans-Tokenization
  • Trans-Tokenization and Cross-lingual Vocabulary Transfers: Language Adaptation of LLMs for Low-Resource NLP

    Paper • 2408.04303 • Published Aug 8, 2024 • 22

models 6

Tweeties/tweety-7b-dutch-v24a

Text Generation • 7B • Updated May 11, 2025 • 433 • 13

Tweeties/tweety-tatar-hydra-mt-7b-v24a

Text Generation • 7B • Updated Aug 9, 2024

Tweeties/tweety-tatar-hydra-base-7b-v24a

Text Generation • 7B • Updated Aug 9, 2024 • 5

Tweeties/tweety-7b-tatar-v24a

Text Generation • 7B • Updated Aug 9, 2024 • 43 • 12

Tweeties/tweety-7b-armenian-v24a

Text Generation • 7B • Updated May 27, 2024 • 1 • 1

Tweeties/tweety-7b-italian-v24a

Text Generation • 7B • Updated May 13, 2024 • 2 • 2

datasets 0

None public yet