---
license: apache-2.0
pipeline_tag: audio-to-audio
tags:
- music
- art
- piano
- midi
inference: true
spaces:
- yhj137/pianist-transformer-rendering
---

# Pianist Transformer

**Pianist Transformer** is a state-of-the-art model for generating expressive, human-like piano performances from musical scores. In subjective listening studies, its performances were rated statistically indistinguishable from those of a human pianist and were often preferred.

This work is based on the paper: **Pianist Transformer: Towards Expressive Piano Performance Rendering via Scalable Self-Supervised Pre-Training**.

- **Paper:** https://arxiv.org/abs/2512.02652
- **GitHub:** https://github.com/yhj137/PianistTransformer
- **Project Page:** https://yhj137.github.io/pianist-transformer-demo/
- **Online Demo:** https://huggingface.co/spaces/yhj137/pianist-transformer-rendering

## Model Description

Pianist Transformer addresses the data scarcity problem in expressive performance rendering by leveraging a large-scale, self-supervised pre-training strategy. The model first learns the deep principles of musical structure from a massive **10-billion-token** MIDI corpus before being fine-tuned for the final rendering task.

It uses an efficient **135M-parameter asymmetric Transformer architecture** (10-layer encoder, 2-layer decoder) with sequence compression, enabling it to process long musical contexts while keeping inference fast.
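
As a rough illustration of that encoder-decoder asymmetry, the sketch below builds a 10-layer encoder and a 2-layer decoder in PyTorch. The hidden size, head count, and feed-forward width are assumptions chosen only to make the sketch runnable (the card does not specify them), and the sequence-compression mechanism is omitted:

```python
import torch.nn as nn

# Illustrative dimensions only; the card does not state the real ones.
d_model, n_heads, d_ff = 768, 12, 3072

encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, n_heads, d_ff, batch_first=True),
    num_layers=10,  # deep encoder reads the long score context
)
decoder = nn.TransformerDecoder(
    nn.TransformerDecoderLayer(d_model, n_heads, d_ff, batch_first=True),
    num_layers=2,  # shallow decoder keeps autoregressive inference fast
)
```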

### Key Features

* **Human-Level Expressivity:** Generates nuanced performances that rival, and are sometimes preferred over, those of human pianists.
* **Scalable Pre-training:** Overcomes the limitations of small supervised datasets by learning from a vast, diverse corpus of unlabeled piano music.
* **Efficient Architecture:** A custom design provides a strong balance between performance quality and real-world inference latency.
* **DAW-Friendly Output:** Includes a novel post-processing algorithm that converts model output into a standard, fully editable MIDI file with a dynamic tempo map (illustrated below).
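
The "dynamic tempo map" in that last feature means tempo changes are written as standard MIDI `set_tempo` events, which DAWs display on an editable tempo track. As a minimal sketch of the output format (not the paper's post-processing algorithm), here is how such a file can be written with the `mido` library:

```python
import mido

mid = mido.MidiFile(ticks_per_beat=480)
track = mido.MidiTrack()
mid.tracks.append(track)

# A dynamic tempo map is a sequence of set_tempo meta messages.
track.append(mido.MetaMessage('set_tempo', tempo=mido.bpm2tempo(120), time=0))
track.append(mido.Message('note_on', note=60, velocity=80, time=0))
track.append(mido.Message('note_off', note=60, velocity=0, time=480))

# Slow to 96 BPM one beat in; a DAW shows this on its tempo track.
track.append(mido.MetaMessage('set_tempo', tempo=mido.bpm2tempo(96), time=0))
track.append(mido.Message('note_on', note=64, velocity=70, time=0))
track.append(mido.Message('note_off', note=64, velocity=0, time=480))

mid.save('performance.mid')
```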

## How to Use

For detailed instructions on data preparation and inference, please refer to the official GitHub repository. The full pipeline is required to use the model correctly.

**➡️ [Get Started on GitHub](https://github.com/yhj137/PianistTransformer)**
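
To fetch the checkpoint files for use with that pipeline, the standard Hub download call should work. This is a minimal sketch: the `repo_id` below is an assumption based on this card's location, so replace it with the identifier shown at the top of this page if it differs:

```python
from huggingface_hub import snapshot_download

# repo_id is assumed from this model card's URL; adjust if it differs.
local_dir = snapshot_download(repo_id="yhj137/PianistTransformer")
print(f"Checkpoint files downloaded to: {local_dir}")
```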

## Citation

If you use this model in your work, please cite the original paper:

```bibtex
@misc{you2025pianisttransformerexpressivepiano,
      title={Pianist Transformer: Towards Expressive Piano Performance Rendering via Scalable Self-Supervised Pre-Training},
      author={Hong-Jie You and Jie-Jing Shao and Xiao-Wen Yang and Lin-Han Jia and Lan-Zhe Guo and Yu-Feng Li},
      year={2025},
      eprint={2512.02652},
      archivePrefix={arXiv},
      primaryClass={cs.SD}
}
```