ChatBench ChatBench Datasets and Simulators (same prompt + fine-tuning set-up) from the ChatBench paper. microsoft/ChatBench Preview • Updated Apr 28, 2025 • 299 • 10 microsoft/chatbench-distilgpt2 Text Generation • 81.9M • Updated Aug 23, 2025 • 76 • 3 microsoft/chatbench-llama3-8b Updated Aug 23, 2025 • 4 • 5 microsoft/chatbench-mistral-7b Updated Aug 23, 2025 • 13 • 4
MediPhi A collection of SLMs based on Phi3.5-mini-instruct adapted to clinical natural language processing tasks: https://arxiv.org/abs/2505.10717 A Modular Approach for Clinical SLMs Driven by Synthetic Data with Pre-Instruction Tuning, Model Merging, and Clinical-Tasks Alignment Paper • 2505.10717 • Published May 15, 2025 • 4 microsoft/MediPhi-Instruct Text Generation • 4B • Updated Dec 15, 2025 • 1.87k • 58 microsoft/MediPhi Text Generation • 4B • Updated Dec 15, 2025 • 517 • 15 microsoft/MediPhi-PubMed Text Generation • 4B • Updated Dec 15, 2025 • 374 • 8
NatureLM microsoft/NatureLM-8x7B 47B • Updated Jun 20, 2025 • 62 • 18 microsoft/NatureLM-8x7B-Inst 47B • Updated Jun 20, 2025 • 378 • 21
Phi-4 Phi-4 family of small language, multi-modal and reasoning models. microsoft/Phi-4-mini-flash-reasoning Text Generation • 4B • Updated Dec 10, 2025 • 1.04k • 262 microsoft/Phi-4-mini-reasoning Text Generation • 4B • Updated Dec 10, 2025 • 12.1k • 213 microsoft/Phi-4-reasoning Text Generation • 15B • Updated Nov 24, 2025 • 5k • 215 microsoft/Phi-4-reasoning-plus Text Generation • 15B • Updated Nov 24, 2025 • 16.3k • 331
Phi-1 Phi-1 family of small language models. microsoft/phi-1 Text Generation • 1B • Updated Nov 24, 2025 • 3.2k • 218 microsoft/phi-1_5 Text Generation • 1B • Updated Nov 24, 2025 • 36.3k • 1.35k Textbooks Are All You Need Paper • 2306.11644 • Published Jun 20, 2023 • 152 Textbooks Are All You Need II: phi-1.5 technical report Paper • 2309.05463 • Published Sep 11, 2023 • 88
BitNet 🔥BitNet family of large language models (1-bit LLMs). microsoft/bitnet-b1.58-2B-4T Text Generation • 0.8B • Updated Dec 17, 2025 • 5.29k • 1.26k microsoft/bitnet-b1.58-2B-4T-bf16 Text Generation • 2B • Updated Dec 17, 2025 • 2.18k • 34 microsoft/bitnet-b1.58-2B-4T-gguf Text Generation • 2B • Updated Dec 17, 2025 • 4.55k • 222 BitNet b1.58 2B4T Technical Report Paper • 2504.12285 • Published Apr 16, 2025 • 81
LLM2CLIP LLM2CLIP makes a SOTA pretrained CLIP model more SOTA than ever. microsoft/LLM2CLIP-EVA02-L-14-336 Zero-Shot Image Classification • Updated Nov 22, 2024 • 69 • 59 microsoft/LLM2CLIP-Openai-L-14-336 Zero-Shot Classification • 0.6B • Updated Nov 24, 2024 • 2.03k • 43 microsoft/LLM2CLIP-EVA02-B-16 Updated Feb 8, 2025 • 58 • 10 microsoft/LLM2CLIP-Openai-B-16 Zero-Shot Classification • 0.4B • Updated Nov 24, 2024 • 572 • 18
TAPEX TAPEX is a family of state-of-the-art table pre-training models that can be used for table-based question answering and table-based fact verification. TAPEX: Table Pre-training via Learning a Neural SQL Executor Paper • 2107.07653 • Published Jul 16, 2021 • 2 microsoft/tapex-large-finetuned-wtq Table Question Answering • 0.4B • Updated Jan 12, 2024 • 1.23k • 77 microsoft/tapex-base-finetuned-wikisql Table Question Answering • Updated Jan 24, 2023 • 841k • 20 microsoft/tapex-large-sql-execution Table Question Answering • 0.4B • Updated Sep 15, 2023 • 119 • 17
LayoutLM The LayoutLM series are Transformer encoders useful for document AI tasks such as invoice parsing, document image classification and DocVQA. microsoft/layoutlmv3-base 0.1B • Updated Apr 10, 2024 • 928k • 472 microsoft/layoutlmv2-base-uncased Updated Sep 16, 2022 • 459k • 66 microsoft/layoutlm-base-uncased 0.1B • Updated Apr 16, 2024 • 97.4k • 61 microsoft/layoutxlm-base Updated Sep 16, 2022 • 6.22k • 73
Orca The Orca family of LMs developed by Microsoft. microsoft/Orca-2-7b Text Generation • Updated Nov 22, 2023 • 2.18k • 223 microsoft/Orca-2-13b Text Generation • Updated Nov 22, 2023 • 4.71k • 666
GIT GIT (Generative Image-to-text Transformer) is a model useful for vision-language tasks such as image/video captioning and question answering. GIT: A Generative Image-to-text Transformer for Vision and Language Paper • 2205.14100 • Published May 27, 2022 • 1 microsoft/git-base Image-to-Text • 0.2B • Updated Apr 24, 2023 • 5.38k • 106 microsoft/git-large Image-to-Text • Updated Feb 8, 2023 • 352 • 17 microsoft/git-base-vqav2 Visual Question Answering • 0.2B • Updated Mar 9, 2024 • 102 • 20
IFMs Industrial Foundation Models microsoft/LLaMA-2-7b-GTL-Delta Text Generation • 7B • Updated Aug 12, 2024 • 48 • 9 microsoft/LLaMA-2-13b-GTL-Delta Text Generation • 13B • Updated Aug 12, 2024 • 46 • 5
VibeVoice Frontier Text-to-Speech Models https://microsoft.github.io/VibeVoice/ microsoft/VibeVoice-1.5B Text-to-Speech • 3B • Updated Sep 1, 2025 • 296k • 2.17k microsoft/VibeVoice-Realtime-0.5B Text-to-Speech • 1B • Updated Dec 12, 2025 • 268k • 1.06k VibeVoice Technical Report Paper • 2508.19205 • Published Aug 26, 2025 • 141
Dayhoff Atlas The models and datasets that comprise the Dayhoff Atlas microsoft/Dayhoff Viewer • Updated Jul 22, 2025 • 1.77B • 5.21k • 7 microsoft/Dayhoff-170m-UR50 Text Generation • 0.2B • Updated 5 days ago • 97 • 3 microsoft/Dayhoff-170m-UR90 Text Generation • 0.2B • Updated 5 days ago • 70 microsoft/Dayhoff-170m-GR Text Generation • 0.2B • Updated 5 days ago • 93
NextCoder NextCoder family of code-editing LMs developed with Selective Knowledge Transfer and its training data. microsoft/NextCoder-7B Text Generation • 8B • Updated Jun 12, 2025 • 19k • 30 microsoft/NextCoder-14B Text Generation • 15B • Updated Jun 12, 2025 • 200 • 16 microsoft/NextCoder-32B Text Generation • 33B • Updated Jun 12, 2025 • 360 • 66 microsoft/NextCoderDataset Viewer • Updated Jul 8, 2025 • 381k • 1.19k • 50
Phi-3 Phi-3 family of small language and multi-modal models. Language models are available in short- and long-context lengths. microsoft/Phi-3.5-mini-instruct Text Generation • 4B • Updated Dec 10, 2025 • 404k • 948 microsoft/Phi-3.5-MoE-instruct Text Generation • 42B • Updated Dec 10, 2025 • 80.9k • 566 microsoft/Phi-3.5-vision-instruct Image-Text-to-Text • 4B • Updated Dec 10, 2025 • 797k • 724 microsoft/Phi-3-mini-4k-instruct Text Generation • 4B • Updated Dec 10, 2025 • 1.33M • 1.37k
Controllable Safety Alignment Artifacts for the paper "Controllable Safety Alignment: Inference-Time Adaptation to Diverse Safety Requirements" (https://arxiv.org/abs/2410.08968) Controllable Safety Alignment: Inference-Time Adaptation to Diverse Safety Requirements Paper • 2410.08968 • Published Oct 11, 2024 • 13 microsoft/CoSApien Viewer • Updated Aug 1, 2025 • 200 • 56 • 2 microsoft/CoSAlign-Test Viewer • Updated May 5, 2025 • 3.2k • 35 • 2 microsoft/CoSAlign-Train Viewer • Updated Aug 1, 2025 • 125k • 87 • 2
MAI-DS-R1 MAI-DS-R1 is a DeepSeek-R1 reasoning model that has been post-trained by the Microsoft AI team. microsoft/MAI-DS-R1 Text Generation • 671B • Updated Dec 15, 2025 • 219 • 293 microsoft/MAI-DS-R1-FP8 Text Generation • 671B • Updated Dec 15, 2025 • 648 • 24
SpeechT5 The SpeechT5 framework consists of a shared seq2seq and six modal-specific (speech/text) pre/post-nets that can address a few audio-related tasks. SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing Paper • 2110.07205 • Published Oct 14, 2021 • 5 microsoft/speecht5_tts Text-to-Speech • Updated Nov 8, 2023 • 73.3k • 821 SpeechT5 Speech Synthesis Demo • 219 microsoft/speecht5_vc Audio-to-Audio • Updated Mar 22, 2023 • 1.65k • 110
Table Transformer The Table Transformer (TATR) is a series of object detection models useful for table extraction from PDF images. microsoft/table-transformer-detection Object Detection • 28.8M • Updated Sep 6, 2023 • 1.47M • 392 microsoft/table-transformer-structure-recognition Object Detection • 28.8M • Updated Sep 6, 2023 • 769k • 207 microsoft/table-transformer-structure-recognition-v1.1-all Object Detection • 28.8M • Updated Nov 18, 2023 • 198k • 79 microsoft/table-transformer-structure-recognition-v1.1-fin Object Detection • 28.8M • Updated Nov 27, 2023 • 338 • 1
Biomedical Models for biomedical research applications, such as radiology report generation and biomedical language understanding. microsoft/maira-2 Text Generation • 7B • Updated Aug 14, 2025 • 9.17k • 68 microsoft/rad-dino-maira-2 Image Feature Extraction • 86.6M • Updated Aug 22, 2024 • 657 • 19 microsoft/rad-dino Image Feature Extraction • 86.6M • Updated Oct 9, 2025 • 83.3k • 68 microsoft/radedit Updated Dec 8, 2025 • 27
UDOP UDOP is a general multimodal model for document AI Unifying Vision, Text, and Layout for Universal Document Processing Paper • 2212.02623 • Published Dec 5, 2022 • 11 microsoft/udop-large Image-Text-to-Text • 0.7B • Updated Dec 2, 2025 • 14.3k • 121 microsoft/udop-large-512 Image-Text-to-Text • 0.7B • Updated Dec 2, 2025 • 46 • 5 microsoft/udop-large-512-300k Image-Text-to-Text • 0.7B • Updated Dec 2, 2025 • 51 • 33
Florence Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks Paper • 2311.06242 • Published Nov 10, 2023 • 95 microsoft/Florence-2-large Image-Text-to-Text • 0.8B • Updated Aug 4, 2025 • 676k • 1.74k microsoft/Florence-2-base Image-Text-to-Text • 0.2B • Updated Aug 4, 2025 • 413k • 331 microsoft/Florence-2-large-ft Image-Text-to-Text • 0.8B • Updated Aug 4, 2025 • 21k • 378
MoCapAct Locomotion policies for hundreds of simulated humanoid locomotion clips and demonstration data for training them. microsoft/mocapact-models Updated Aug 17, 2024 • 9 microsoft/mocapact-data Updated Aug 17, 2024 • 41 • 4 MoCapAct: A Multi-Task Dataset for Simulated Humanoid Control Paper • 2208.07363 • Published Aug 15, 2022 • 1