mlx-community/mlx_bark

j-csc committed on Feb 3, 2024

Commit 568e35d (verified) · 1 parent: a6af09a

Add details from Suno

Files changed (1)

README.md +39 -0

@@ -33,3 +33,42 @@ huggingface-cli download --local-dir-use-symlinks False --local-dir weights/ mlx
 # Run example (large model)
 python model.py --text="Hello world!" --path weights/ --model large

+```
+The rest of the model card was copied from [the original Bark repository](https://huggingface.co/suno/bark).
+## Model Details
+The following is additional information about the models released here.
+Bark is a series of three transformer models that turn text into audio.
+### Text to semantic tokens
+ - Input: text, tokenized with [BERT tokenizer from Hugging Face](https://huggingface.co/docs/transformers/model_doc/bert#transformers.BertTokenizer)
+ - Output: semantic tokens that encode the audio to be generated
+### Semantic to coarse tokens
+ - Input: semantic tokens
+ - Output: tokens from the first two codebooks of the [EnCodec codec](https://github.com/facebookresearch/encodec) from Facebook
+### Coarse to fine tokens
+ - Input: the first two codebooks from EnCodec
+ - Output: 8 codebooks from EnCodec
+### Architecture
+|           Model           | Parameters | Attention  | Output Vocab size |
+|:-------------------------:|:----------:|------------|:-----------------:|
+|  Text to semantic tokens  |    80/300 M    | Causal     |       10,000      |
+| Semantic to coarse tokens |    80/300 M    | Causal     |     2x 1,024      |
+|   Coarse to fine tokens   |    80/300 M    | Non-causal |     6x 1,024      |
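
The table and the stage descriptions can be cross-checked with a small sketch. It is illustrative only and not part of the commit or of Bark's code: the `Stage` class and its field names are hypothetical, and treating the semantic stage as a single token stream is an assumption based on the table listing no multiplier for it. It shows how the 2 coarse codebooks and the 6 codebooks listed for the fine stage add up to the 8 EnCodec codebooks named as the final output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One of the three models in the pipeline (hypothetical helper type)."""
    name: str
    causal: bool        # attention type from the table
    token_kind: str     # "semantic" or "encodec"
    streams_out: int    # number of token streams (codebooks) predicted
    vocab_size: int     # vocabulary size per stream


# Values copied from the architecture table; one semantic stream is assumed.
STAGES = [
    Stage("text_to_semantic", True, "semantic", 1, 10_000),
    Stage("semantic_to_coarse", True, "encodec", 2, 1_024),
    Stage("coarse_to_fine", False, "encodec", 6, 1_024),
]


def total_encodec_codebooks(stages):
    """Sum the EnCodec codebooks produced across all stages."""
    return sum(s.streams_out for s in stages if s.token_kind == "encodec")


print(total_encodec_codebooks(STAGES))  # 8
```

Only the last stage is non-causal, which matches the table: it fills in the remaining codebooks for a sequence whose coarse tokens are already known.
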
+### Release date
+April 2023
+## Broader Implications
+We anticipate that this model's text-to-audio capabilities can be used to improve accessibility tools in a variety of languages.
+While we hope that this release will enable users to express their creativity and build applications that are a force
+for good, we acknowledge that any text-to-audio model has the potential for dual use. While it is not straightforward
+to clone the voices of known people with Bark, it can still be used for nefarious purposes. To further reduce the chances of unintended use of Bark,
+we also release a simple classifier to detect Bark-generated audio with high accuracy (see the notebooks section of the main repository).