---
license: apache-2.0
pipeline_tag: audio-to-audio
tags:
- music
- art
- piano
- midi
inference: true
spaces:
- yhj137/pianist-transformer-rendering
---

# Pianist Transformer

**Pianist Transformer** is a state-of-the-art model for generating expressive, human-like piano performances from musical scores. In subjective listening studies, its performances were rated statistically indistinguishable from those of a human pianist and were often preferred.

This work is based on the paper: **Pianist Transformer: Towards Expressive Piano Performance Rendering via Scalable Self-Supervised Pre-Training**.

- **Paper:** https://arxiv.org/abs/2512.02652
- **GitHub:** https://github.com/yhj137/PianistTransformer
- **Project Page:** https://yhj137.github.io/pianist-transformer-demo/
- **Online Demo:** https://huggingface.co/spaces/yhj137/pianist-transformer-rendering

## Model Description

Pianist Transformer addresses the data scarcity problem in expressive performance rendering by leveraging a large-scale, self-supervised pre-training strategy. The model first learns the deep principles of musical structure from a massive **10-billion-token** MIDI corpus before being fine-tuned for the final rendering task.

It uses an efficient **135M-parameter asymmetric Transformer architecture** (10-layer encoder, 2-layer decoder) with sequence compression, enabling it to process long musical contexts while keeping inference fast.
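
As a rough illustration of that encoder-decoder asymmetry, the sketch below builds a 10-layer encoder and a 2-layer decoder in PyTorch. The hidden size, head count, and feed-forward width are assumptions chosen only to make the sketch runnable (the card does not specify them), and the sequence-compression mechanism is omitted:

```python
import torch.nn as nn

# Illustrative dimensions only; the card does not state the real ones.
d_model, n_heads, d_ff = 768, 12, 3072

encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, n_heads, d_ff, batch_first=True),
    num_layers=10,  # deep encoder reads the long score context
)
decoder = nn.TransformerDecoder(
    nn.TransformerDecoderLayer(d_model, n_heads, d_ff, batch_first=True),
    num_layers=2,  # shallow decoder keeps autoregressive inference fast
)
```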

### Key Features

* **Human-Level Expressivity:** Generates nuanced performances that rival, and are sometimes preferred over, those of human pianists.
* **Scalable Pre-training:** Overcomes the limitations of small supervised datasets by learning from a vast, diverse corpus of unlabeled piano music.
* **Efficient Architecture:** A custom design provides a strong balance between performance quality and real-world inference latency.
* **DAW-Friendly Output:** Includes a novel post-processing algorithm that converts model output into a standard, fully editable MIDI file with a dynamic tempo map (illustrated below).
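
The "dynamic tempo map" in that last feature means tempo changes are written as standard MIDI `set_tempo` events, which DAWs display on an editable tempo track. As a minimal sketch of the output format (not the paper's post-processing algorithm), here is how such a file can be written with the `mido` library:

```python
import mido

mid = mido.MidiFile(ticks_per_beat=480)
track = mido.MidiTrack()
mid.tracks.append(track)

# A dynamic tempo map is a sequence of set_tempo meta messages.
track.append(mido.MetaMessage('set_tempo', tempo=mido.bpm2tempo(120), time=0))
track.append(mido.Message('note_on', note=60, velocity=80, time=0))
track.append(mido.Message('note_off', note=60, velocity=0, time=480))

# Slow to 96 BPM one beat in; a DAW shows this on its tempo track.
track.append(mido.MetaMessage('set_tempo', tempo=mido.bpm2tempo(96), time=0))
track.append(mido.Message('note_on', note=64, velocity=70, time=0))
track.append(mido.Message('note_off', note=64, velocity=0, time=480))

mid.save('performance.mid')
```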

## How to Use

For detailed instructions on data preparation and inference, please refer to the official GitHub repository. The full pipeline is required to use the model correctly.

**➡️ [Get Started on GitHub](https://github.com/yhj137/PianistTransformer)**
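
To fetch the checkpoint files for use with that pipeline, the standard Hub download call should work. This is a minimal sketch: the `repo_id` below is an assumption based on this card's location, so replace it with the identifier shown at the top of this page if it differs:

```python
from huggingface_hub import snapshot_download

# repo_id is assumed from this model card's URL; adjust if it differs.
local_dir = snapshot_download(repo_id="yhj137/PianistTransformer")
print(f"Checkpoint files downloaded to: {local_dir}")
```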

## Citation

If you use this model in your work, please cite the original paper:

```bibtex
@misc{you2025pianisttransformerexpressivepiano,
      title={Pianist Transformer: Towards Expressive Piano Performance Rendering via Scalable Self-Supervised Pre-Training},
      author={Hong-Jie You and Jie-Jing Shao and Xiao-Wen Yang and Lin-Han Jia and Lan-Zhe Guo and Yu-Feng Li},
      year={2025},
      eprint={2512.02652},
      archivePrefix={arXiv},
      primaryClass={cs.SD}
}
```