Models sold directly by Azure
Models sold directly by Azure include all Azure OpenAI models and specific, selected models from top providers. To learn more about these models, see Models Sold Directly by Azure.
Azure OpenAI
Azure OpenAI in Azure AI Foundry Models offers a diverse set of models with different capabilities and price points. Learn more details at Azure OpenAI Model availability. These models include:
- State-of-the-art models designed to tackle reasoning and problem-solving tasks with increased focus and capability
- Models that can understand and generate natural language and code
- Models that can transcribe and translate speech to text
| Models | Description |
|---|---|
| GPT-4.1 series | Latest model release from Azure OpenAI |
| model-router | A model that intelligently selects from a set of underlying chat models to respond to a given prompt. |
| computer-use-preview | An experimental model trained for use with the Responses API computer use tool. |
| GPT-4.5 Preview | The latest GPT model that excels at diverse text and image tasks. |
| o-series models | Reasoning models with advanced problem-solving and increased focus and capability. |
| GPT-4o, GPT-4o mini, and GPT-4 Turbo | The latest, most capable Azure OpenAI models, with multimodal versions that can accept both text and images as input. |
| GPT-4 | A set of models that improve on GPT-3.5 and can understand and generate natural language and code. |
| GPT-3.5 | A set of models that improve on GPT-3 and can understand and generate natural language and code. |
| Embeddings | A set of models that can convert text into numerical vector form to facilitate text similarity. |
| Image generation | A series of models that can generate original images from natural language. |
| Audio | A series of models for speech to text, translation, and text to speech. GPT-4o audio models support either low-latency, “speech in, speech out” conversational interactions or audio generation. |
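
As a rough illustration of the text-similarity use case mentioned in the Embeddings row, the sketch below ranks two documents against a query by cosine similarity. The vectors are made-up four-dimensional placeholders, not output from any Azure OpenAI embedding model, and `cosine_similarity` is a hypothetical helper name chosen for this example.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Made-up vectors standing in for real embeddings (assumption for illustration).
query = [0.1, 0.3, 0.5, 0.1]
doc_close = [0.1, 0.29, 0.52, 0.12]
doc_far = [0.9, -0.2, 0.0, 0.1]

# Sort candidate documents from most to least similar to the query.
ranked = sorted(
    [("doc_close", doc_close), ("doc_far", doc_far)],
    key=lambda item: cosine_similarity(query, item[1]),
    reverse=True,
)
print([name for name, _ in ranked])
```

In practice the vectors would come from an embeddings model and have hundreds or thousands of dimensions; the ranking logic stays the same.
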
DeepSeek models sold directly by Azure
The DeepSeek family of models includes DeepSeek-R1, which is trained with a step-by-step process and excels at reasoning tasks such as language, scientific reasoning, and coding.

| Model | Type | Capabilities | Project type |
|---|---|---|---|
| DeepSeek-R1-0528 | chat-completion (with reasoning content) | - Input: text (163,840 tokens) - Output: (163,840 tokens) - Languages: en and zh - Tool calling: No - Response formats: Text. | Foundry, Hub-based |
| DeepSeek-V3-0324 | chat-completion | - Input: text (131,072 tokens) - Output: (131,072 tokens) - Languages: en and zh - Tool calling: Yes - Response formats: Text, JSON | Foundry, Hub-based |
| DeepSeek-R1 | chat-completion (with reasoning content) | - Input: text (163,840 tokens) - Output: (163,840 tokens) - Languages: en and zh - Tool calling: No - Response formats: Text. | Foundry, Hub-based |
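
The Capabilities column in these tables packs several fields into one cell. A minimal sketch of how such a cell could be split into named fields follows, assuming every field has the form `- Name: value`. The helper names `parse_capabilities` and `token_limit` are hypothetical, and the sample cell is copied from the DeepSeek-R1 row above.

```python
import re

# A field starts at "- " followed by a capitalized label and a colon.
FIELD_SPLIT = re.compile(r"\s*-\s+(?=[A-Z][A-Za-z ]*:)")
# Matches limits written like "(163,840 tokens)".
TOKEN_COUNT = re.compile(r"\(([\d,]+) tokens\)")


def parse_capabilities(cell):
    """Split a Capabilities cell into a dict keyed by lowercased field name."""
    fields = {}
    for part in FIELD_SPLIT.split(cell):
        if not part:
            continue  # skip the empty piece before the first "- "
        key, _, value = part.partition(":")
        fields[key.strip().lower()] = value.strip().rstrip(".")
    return fields


def token_limit(value):
    """Return the token limit in a field value as an int, or None if absent."""
    match = TOKEN_COUNT.search(value)
    return int(match.group(1).replace(",", "")) if match else None


cell = (
    "- Input: text (163,840 tokens) - Output: (163,840 tokens) "
    "- Languages: en and zh - Tool calling: No - Response formats: Text."
)
caps = parse_capabilities(cell)
print(caps["tool calling"])
print(token_limit(caps["input"]))
```

Limits written in another style, such as `1M tokens`, are not recognized by this pattern and yield `None`.
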
Meta models sold directly by Azure
Meta Llama models and tools are a collection of pretrained and fine-tuned generative AI text and image reasoning models. Meta models range in scale to include:
- Small language models (SLMs) like 1B and 3B Base and Instruct models for on-device and edge inferencing
- Mid-size large language models (LLMs) like 7B, 8B, and 70B Base and Instruct models
- High-performance models like Meta Llama 3.1-405B Instruct for synthetic data generation and distillation use cases.
| Model | Type | Capabilities | Project type |
|---|---|---|---|
| Llama-4-Maverick-17B-128E-Instruct-FP8 | chat-completion | - Input: text and images (1M tokens) - Output: text (1M tokens) - Languages: ar, en, fr, de, hi, id, it, pt, es, tl, th, and vi - Tool calling: No* - Response formats: Text | Foundry, Hub-based |
| Llama-3.3-70B-Instruct | chat-completion | - Input: text (128,000 tokens) - Output: text (8,192 tokens) - Languages: en, de, fr, it, pt, hi, es, and th - Tool calling: No* - Response formats: Text | Foundry, Hub-based |
Microsoft models sold directly by Azure
Microsoft models include various model groups such as MAI models, Phi models, healthcare AI models, and more. To see all the available Microsoft models, view the Microsoft model collection in Azure AI Foundry portal.

| Model | Type | Capabilities | Project type |
|---|---|---|---|
| MAI-DS-R1 | chat-completion (with reasoning content) | - Input: text (163,840 tokens) - Output: (163,840 tokens) - Languages: en and zh - Tool calling: No - Response formats: Text. | Foundry, Hub-based |
xAI models sold directly by Azure
xAI’s Grok 3 and Grok 3 Mini models are designed to excel in various enterprise domains. Grok 3, a non-reasoning model pretrained in the Colossus datacenter, is tailored for business use cases such as data extraction, coding, and text summarization, with exceptional instruction-following capabilities. It supports a 131,072-token context window, allowing it to handle extensive inputs while maintaining coherence and depth, and is adept at drawing connections across domains and languages. Grok 3 Mini, by contrast, is a lightweight reasoning model trained to tackle agentic, coding, mathematical, and deep science problems with test-time compute. It also supports a 131,072-token context window for understanding codebases and enterprise documents, excels at using tools to solve complex logical problems in novel environments, and offers raw reasoning traces for user inspection with adjustable thinking budgets.

| Model | Type | Capabilities | Project type |
|---|---|---|---|
| grok-3 | chat-completion | - Input: text (131,072 tokens) - Output: text (131,072 tokens) - Languages: en - Tool calling: Yes - Response formats: Text | Foundry, Hub-based |
| grok-3-mini | chat-completion | - Input: text (131,072 tokens) - Output: text (131,072 tokens) - Languages: en - Tool calling: Yes - Response formats: Text | Foundry, Hub-based |
Models from partners and community
Models from partners and community constitute the majority of the Azure AI Foundry Models and are provided by trusted third-party organizations, partners, research labs, and community contributors. To learn more about these models, see Models from Partners and Community.
Cohere
The Cohere family includes models for chat completions and embeddings, optimized for use cases such as reasoning, summarization, and question answering.

| Model | Type | Capabilities | Project type |
|---|---|---|---|
| Cohere-command-a | chat-completion | - Input: text (131,072 tokens) - Output: text (8,182 tokens) - Languages: en, fr, es, it, de, pt-br, ja, ko, zh-cn, and ar - Tool calling: Yes - Response formats: Text, JSON | Foundry, Hub-based |
| Cohere-command-r-plus-08-2024 | chat-completion | - Input: text (131,072 tokens) - Output: text (4,096 tokens) - Languages: en, fr, es, it, de, pt-br, ja, ko, zh-cn, and ar - Tool calling: Yes - Response formats: Text, JSON | Foundry, Hub-based |
| Cohere-command-r-08-2024 | chat-completion | - Input: text (131,072 tokens) - Output: text (4,096 tokens) - Languages: en, fr, es, it, de, pt-br, ja, ko, zh-cn, and ar - Tool calling: Yes - Response formats: Text, JSON | Foundry, Hub-based |
| embed-v-4-0 | embeddings | - Input: text (512 tokens) and images (2MM pixels) - Output: Vector (256, 512, 1024, 1536 dim.) - Languages: en, fr, es, it, de, pt-br, ja, ko, zh-cn, and ar | Foundry, Hub-based |
| Cohere-embed-v3-english | embeddings | - Input: text and images (512 tokens) - Output: Vector (1024 dim.) - Languages: en | Foundry, Hub-based |
| Cohere-embed-v3-multilingual | embeddings | - Input: text (512 tokens) - Output: Vector (1024 dim.) - Languages: en, fr, es, it, de, pt-br, ja, ko, zh-cn, and ar | Foundry, Hub-based |
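
The embed-v-4-0 row lists four output sizes, so a back-of-the-envelope storage estimate can help when choosing between them. The sketch below assumes each vector component is stored as a 32-bit float and ignores any index or metadata overhead. Both are assumptions for illustration, and `index_size_bytes` is a hypothetical helper name.

```python
BYTES_PER_FLOAT32 = 4  # assumption: components stored as 32-bit floats


def index_size_bytes(num_vectors, dim, bytes_per_value=BYTES_PER_FLOAT32):
    """Raw storage for num_vectors vectors of the given dimensionality."""
    return num_vectors * dim * bytes_per_value


# Output sizes listed for embed-v-4-0 in the table above.
for dim in (256, 512, 1024, 1536):
    size_gb = index_size_bytes(1_000_000, dim) / 1e9
    print(f"{dim:>4} dims: {size_gb:.2f} GB per million vectors")
```

Raw storage grows linearly with the dimension, so the 1536-dimension output needs six times the space of the 256-dimension output for the same corpus.
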
Cohere rerank
| Model | Type | Capabilities | API Reference | Project type |
|---|---|---|---|---|
| Cohere-rerank-v3.5 | rerank text classification | - Input: text - Output: text - Languages: English, Chinese, French, German, Indonesian, Italian, Portuguese, Russian, Spanish, Arabic, Dutch, Hindi, Japanese, Vietnamese | Cohere’s v2/rerank API | Hub-based |
Core42
Core42 includes autoregressive bilingual LLMs for Arabic and English with state-of-the-art capabilities in Arabic.

| Model | Type | Capabilities | Project type |
|---|---|---|---|
| jais-30b-chat | chat-completion | - Input: text (8,192 tokens) - Output: (4,096 tokens) - Languages: en and ar - Tool calling: Yes - Response formats: Text, JSON | Foundry, Hub-based |
Meta
Meta Llama models and tools are a collection of pretrained and fine-tuned generative AI text and image reasoning models. Meta models range in scale to include:
- Small language models (SLMs) like 1B and 3B Base and Instruct models for on-device and edge inferencing
- Mid-size large language models (LLMs) like 7B, 8B, and 70B Base and Instruct models
- High-performance models like Meta Llama 3.1-405B Instruct for synthetic data generation and distillation use cases.
| Model | Type | Capabilities | Project type |
|---|---|---|---|
| Llama-3.2-11B-Vision-Instruct | chat-completion | - Input: text and image (128,000 tokens) - Output: (8,192 tokens) - Languages: en - Tool calling: No* - Response formats: Text | Foundry, Hub-based |
| Llama-3.2-90B-Vision-Instruct | chat-completion | - Input: text and image (128,000 tokens) - Output: (8,192 tokens) - Languages: en - Tool calling: No* - Response formats: Text | Foundry, Hub-based |
| Meta-Llama-3.1-405B-Instruct | chat-completion | - Input: text (131,072 tokens) - Output: (8,192 tokens) - Languages: en, de, fr, it, pt, hi, es, and th - Tool calling: No* - Response formats: Text | Foundry, Hub-based |
| Meta-Llama-3.1-8B-Instruct | chat-completion | - Input: text (131,072 tokens) - Output: (8,192 tokens) - Languages: en, de, fr, it, pt, hi, es, and th - Tool calling: No* - Response formats: Text | Foundry, Hub-based |
| Llama-4-Scout-17B-16E-Instruct | chat-completion | - Input: text and image (128,000 tokens) - Output: text (8,192 tokens) - Tool calling: No - Response formats: Text | Foundry, Hub-based |
Microsoft
Microsoft models include various model groups such as MAI models, Phi models, healthcare AI models, and more. To see all the available Microsoft models, view the Microsoft model collection in Azure AI Foundry portal.

| Model | Type | Capabilities | Project type |
|---|---|---|---|
| Phi-4-mini-instruct | chat-completion | - Input: text (131,072 tokens) - Output: (4,096 tokens) - Languages: ar, zh, cs, da, nl, en, fi, fr, de, he, hu, it, ja, ko, no, pl, pt, ru, es, sv, th, tr, and uk - Tool calling: No - Response formats: Text | Foundry, Hub-based |
| Phi-4-multimodal-instruct | chat-completion | - Input: text, images, and audio (131,072 tokens) - Output: (4,096 tokens) - Languages: ar, zh, cs, da, nl, en, fi, fr, de, he, hu, it, ja, ko, no, pl, pt, ru, es, sv, th, tr, and uk - Tool calling: No - Response formats: Text | Foundry, Hub-based |
| Phi-4 | chat-completion | - Input: text (16,384 tokens) - Output: (16,384 tokens) - Languages: en, ar, bn, cs, da, de, el, es, fa, fi, fr, gu, ha, he, hi, hu, id, it, ja, jv, kn, ko, ml, mr, nl, no, or, pa, pl, ps, pt, ro, ru, sv, sw, ta, te, th, tl, tr, uk, ur, vi, yo, and zh - Tool calling: No - Response formats: Text | Foundry, Hub-based |
| Phi-4-reasoning | chat-completion with reasoning content | - Input: text (32,768 tokens) - Output: text (32,768 tokens) - Languages: en - Tool calling: No - Response formats: Text | Foundry, Hub-based |
| Phi-4-mini-reasoning | chat-completion with reasoning content | - Input: text (128,000 tokens) - Output: text (128,000 tokens) - Languages: en - Tool calling: No - Response formats: Text | Foundry, Hub-based |
Mistral AI
Mistral AI offers two categories of models: premium models such as Mistral Large 2411 and Ministral 3B, and open models such as Mistral Nemo.

| Model | Type | Capabilities | Project type |
|---|---|---|---|
| Codestral-2501 | chat-completion | - Input: text (262,144 tokens) - Output: text (4,096 tokens) - Languages: en - Tool calling: No - Response formats: Text | Foundry, Hub-based |
| Ministral-3B | chat-completion | - Input: text (131,072 tokens) - Output: text (4,096 tokens) - Languages: fr, de, es, it, and en - Tool calling: Yes - Response formats: Text, JSON | Foundry, Hub-based |
| Mistral-Nemo | chat-completion | - Input: text (131,072 tokens) - Output: text (4,096 tokens) - Languages: en, fr, de, es, it, zh, ja, ko, pt, nl, and pl - Tool calling: Yes - Response formats: Text, JSON | Foundry, Hub-based |
| Mistral-small-2503 | chat-completion | - Input: text (32,768 tokens) - Output: text (4,096 tokens) - Languages: fr, de, es, it, and en - Tool calling: Yes - Response formats: Text, JSON | Foundry, Hub-based |
| Mistral-medium-2505 | chat-completion | - Input: text (128,000 tokens), image - Output: text (128,000 tokens) - Tool calling: No - Response formats: Text, JSON | Foundry, Hub-based |
| Mistral-Large-2411 | chat-completion | - Input: text (128,000 tokens) - Output: text (4,096 tokens) - Languages: en, fr, de, es, it, zh, ja, ko, pt, nl, and pl - Tool calling: Yes - Response formats: Text, JSON | Foundry, Hub-based |
| Mistral-OCR-2503 | image to text | - Input: image or PDF pages (1,000 pages, max 50MB PDF file) - Output: text - Tool calling: No - Response formats: Text, JSON, Markdown | Hub-based |
| mistralai-Mistral-7B-Instruct-v01 | chat-completion | - Input: text - Output: text - Languages: en - Response formats: Text | Hub-based |
| mistralai-Mistral-7B-Instruct-v0-2 | chat-completion | - Input: text - Output: text - Languages: en - Response formats: Text | Hub-based |
| mistralai-Mixtral-8x7B-Instruct-v01 | chat-completion | - Input: text - Output: text - Languages: en - Response formats: Text | Hub-based |
| mistralai-Mixtral-8x22B-Instruct-v0-1 | chat-completion | - Input: text (64,000 tokens) - Output: text (4,096 tokens) - Languages: fr, it, de, es, en - Response formats: Text | Hub-based |
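
When picking a model from a table like the one above, the token limits and tool-calling support are usually the deciding fields. The sketch below encodes a few of the Mistral rows by hand and filters them against a set of requirements. The `ModelEntry` class and `select_models` function are hypothetical names for this illustration, and the figures are copied from the table rather than fetched from any API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelEntry:
    name: str
    input_tokens: int
    output_tokens: int
    tool_calling: bool


# Values transcribed from the Mistral AI table above.
CATALOG = [
    ModelEntry("Codestral-2501", 262_144, 4_096, False),
    ModelEntry("Ministral-3B", 131_072, 4_096, True),
    ModelEntry("Mistral-Nemo", 131_072, 4_096, True),
    ModelEntry("Mistral-small-2503", 32_768, 4_096, True),
    ModelEntry("Mistral-medium-2505", 128_000, 128_000, False),
    ModelEntry("Mistral-Large-2411", 128_000, 4_096, True),
]


def select_models(catalog, min_input_tokens=0, min_output_tokens=0, needs_tools=False):
    """Return names of models that satisfy every stated requirement."""
    return [
        m.name
        for m in catalog
        if m.input_tokens >= min_input_tokens
        and m.output_tokens >= min_output_tokens
        and (m.tool_calling or not needs_tools)
    ]


# Models that accept at least 100,000 input tokens and support tool calling.
print(select_models(CATALOG, min_input_tokens=100_000, needs_tools=True))
```

The same filter applies to the other provider tables once their rows are transcribed in the same shape.
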
Nixtla
Nixtla’s TimeGEN-1 is a generative pretrained forecasting and anomaly detection model for time series data. TimeGEN-1 can produce accurate forecasts for new time series without training, using only historical values and exogenous covariates as inputs. Inference with TimeGEN-1 requires Nixtla’s custom inference API.

| Model | Type | Capabilities | Inference API | Project type |
|---|---|---|---|---|
| TimeGEN-1 | Forecasting | - Input: Time series data as JSON or dataframes (with support for multivariate input) - Output: Time series data as JSON - Tool calling: No - Response formats: JSON | Forecast client to interact with Nixtla’s API | Hub-based |
NTT Data
tsuzumi is an autoregressive, language-optimized transformer. The tuned versions use supervised fine-tuning (SFT). tsuzumi handles both Japanese and English with high efficiency.

| Model | Type | Capabilities | Project type |
|---|---|---|---|
| tsuzumi-7b | chat-completion | - Input: text (8,192 tokens) - Output: text (8,192 tokens) - Languages: en and ja - Tool calling: No - Response formats: Text | Hub-based |
Stability AI
The Stability AI collection of image generation models includes Stable Image Core, Stable Image Ultra, and Stable Diffusion 3.5 Large. Stable Diffusion 3.5 Large accepts both image and text input.

| Model | Type | Capabilities | Project type |
|---|---|---|---|
| Stable Diffusion 3.5 Large | Image generation | - Input: text and image (1,000 tokens and 1 image) - Output: One Image - Tool calling: No - Response formats: Image (PNG and JPG) | Hub-based |
| Stable Image Core | Image generation | - Input: text (1,000 tokens) - Output: One Image - Tool calling: No - Response formats: Image (PNG and JPG) | Hub-based |
| Stable Image Ultra | Image generation | - Input: text (1,000 tokens) - Output: One Image - Tool calling: No - Response formats: Image (PNG and JPG) | Hub-based |