Speech tokens and text tokens are treated the same in LLMs; as I stated, the model just learns speech tokens as another language, and it learns to use them in much the same way it learns to use text tokens.
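Concretely, this usually just means the discrete speech codes are extra ids appended after the text vocabulary, so one shared embedding table covers both and the transformer never treats them differently. A minimal PyTorch sketch (the vocabulary sizes and offset scheme here are illustrative assumptions, not any specific model's setup):

```python
import torch
import torch.nn as nn

# Hypothetical sizes, purely for illustration.
TEXT_VOCAB = 32_000    # original text vocabulary
SPEECH_VOCAB = 4_096   # discrete audio codes (e.g. from a neural codec)
D_MODEL = 512

# One shared embedding table: speech codes are just new token ids
# appended after the text ids. The model has no notion of "modality",
# it only ever sees integer ids.
embed = nn.Embedding(TEXT_VOCAB + SPEECH_VOCAB, D_MODEL)

text_ids = torch.tensor([17, 934, 2051])               # ordinary text tokens
speech_ids = torch.tensor([3, 77, 1002]) + TEXT_VOCAB  # offset into the speech range

# A mixed text/speech sequence goes through the exact same forward pass.
mixed = torch.cat([text_ids, speech_ids])
hidden = embed(mixed)  # shape: (6, D_MODEL), identical treatment for both
```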
Unfortunately, reasoning capabilities do decrease, for a few reasons:
- There simply isn't much speech training data that forces the model to reason.
- They are usually trained on relatively little data. For example, most models are trained on trillions of tokens of text but only billions of tokens of audio.
- Small model sizes: most speech models are under 3B parameters, so they just don't have great reasoning capabilities.