Models available in Azure AI Foundry Models

Azure AI Foundry Models gives you access to flagship models in Azure AI Foundry to consume them as APIs with flexible deployment options. This article lists a selection of model offerings and their capabilities, excluding deprecated and legacy models. Depending on what kind of project you’re using in Azure AI Foundry, you might see a different selection of these models. Specifically, if you’re using a Foundry project, built on an Azure AI Foundry resource, you see the models that are available for standard deployment to a Foundry resource. Alternatively, if you’re using a hub-based project, hosted by an Azure AI Foundry hub, you see models that are available for deployment to managed compute and serverless APIs. These model selections do overlap in many cases, since many models support the multiple deployment options. Foundry Models in the model catalog belong to two main categories:

Models sold directly by Azure
Models from partners and community

To learn more about these two categories, and Models from Partners and Community.

Models sold directly by Azure

Models sold directly by Azure include all Azure OpenAI models and specific, selected models from top providers. To learn more about these models, see Models Sold Directly by Azure.

Azure OpenAI

Azure OpenAI in Azure AI Foundry Models offers a diverse set of models with different capabilities and price points. Learn more details at Azure OpenAI Model availability. These models include:

State-of-the-art models designed to tackle reasoning and problem-solving tasks with increased focus and capability
Models that can understand and generate natural language and code
Models that can transcribe and translate speech to text

Models	Description
GPT-4.1 series	Latest model release from Azure OpenAI
model-router	A model that intelligently selects from a set of underlying chat models to respond to a given prompt.
computer-use-preview	An experimental model trained for use with the Responses API computer use tool.
GPT-4.5 Preview	The latest GPT model that excels at diverse text and image tasks.
o-series models	Reasoning models with advanced problem-solving and increased focus and capability.
GPT-4o & GPT-4o mini & GPT-4 Turbo	The latest most capable Azure OpenAI models with multimodal versions, which can accept both text and images as input.
GPT-4	A set of models that improve on GPT-3.5 and can understand and generate natural language and code.
GPT-3.5	A set of models that improve on GPT-3 and can understand and generate natural language and code.
Embeddings	A set of models that can convert text into numerical vector form to facilitate text similarity.
Image generation	A series of models that can generate original images from natural language.
Audio	A series of models for speech to text, translation, and text to speech. GPT-4o audio models support either low-latency, “speech in, speech out” conversational interactions or audio generation.

See this model collection in Azure AI Foundry portal.

DeepSeek models sold directly by Azure

DeepSeek family of models includes DeepSeek-R1, which excels at reasoning tasks using a step-by-step training process, such as language, scientific reasoning, and coding tasks.

Model	Type	Capabilities	Project type
DeepSeek-R1-0528	chat-completion (with reasoning content)	- Input: text (163,840 tokens) - Output: (163,840 tokens) - Languages: `en` and `zh` - Tool calling: No - Response formats: Text.	Foundry, Hub-based
DeepSeek-V3-0324	chat-completion	- Input: text (131,072 tokens) - Output: (131,072 tokens) - Languages: `en` and `zh` - Tool calling: Yes - Response formats: Text, JSON	Foundry, Hub-based
DeepSeek-R1	chat-completion (with reasoning content)	- Input: text (163,840 tokens) - Output: (163,840 tokens) - Languages: `en` and `zh` - Tool calling: No - Response formats: Text.	Foundry, Hub-based

See this model collection in Azure AI Foundry portal.

Meta models sold directly by Azure

Meta Llama models and tools are a collection of pretrained and fine-tuned generative AI text and image reasoning models. Meta models range is scale to include:

Small language models (SLMs) like 1B and 3B Base and Instruct models for on-device and edge inferencing
Mid-size large language models (LLMs) like 7B, 8B, and 70B Base and Instruct models
High-performant models like Meta Llama 3.1-405B Instruct for synthetic data generation and distillation use cases.

Model	Type	Capabilities	Project type
Llama-4-Maverick-17B-128E-Instruct-FP8	chat-completion	- Input: text and images (1M tokens) - Output: text (1M tokens) - Languages: `ar`, `en`, `fr`, `de`, `hi`, `id`, `it`, `pt`, `es`, `tl`, `th`, and `vi` - Tool calling: No* - Response formats: Text	Foundry, Hub-based
Llama-3.3-70B-Instruct	chat-completion	- Input: text (128,000 tokens) - Output: text (8,192 tokens) - Languages: `en`, `de`, `fr`, `it`, `pt`, `hi`, `es`, and `th` - Tool calling: No* - Response formats: Text	Foundry, Hub-based

See this model collection in Azure AI Foundry portal. There are also several Meta models available from partners and community.

Microsoft models sold directly by Azure

Microsoft models include various model groups such as MAI models, Phi models, healthcare AI models, and more. To see all the available Microsoft models, view the Microsoft model collection in Azure AI Foundry portal.

Model	Type	Capabilities	Project type
MAI-DS-R1	chat-completion (with reasoning content)	- Input: text (163,840 tokens) - Output: (163,840 tokens) - Languages: `en` and `zh` - Tool calling: No - Response formats: Text.	Foundry, Hub-based

See the Microsoft model collection in Azure AI Foundry portal. There are also several Microsoft models available from partners and community.

xAI models sold directly by Azure

xAI’s Grok 3 and Grok 3 Mini models are designed to excel in various enterprise domains. Grok 3, a non-reasoning model pretrained by the Colossus datacenter, is tailored for business use cases such as data extraction, coding, and text summarization, with exceptional instruction-following capabilities. It supports a 131,072 token context window, allowing it to handle extensive inputs while maintaining coherence and depth, and is adept at drawing connections across domains and languages. On the other hand, Grok 3 Mini is a lightweight reasoning model trained to tackle agentic, coding, mathematical, and deep science problems with test-time compute. It also supports a 131,072 token context window for understanding codebases and enterprise documents, and excels at using tools to solve complex logical problems in novel environments, offering raw reasoning traces for user inspection with adjustable thinking budgets.

Model	Type	Capabilities	Project type
grok-3¹	chat-completion	- Input: text (131,072 tokens) - Output: text (131,072 tokens) - Languages: `en` - Tool calling: yes - Response formats: text	Foundry, Hub-based
grok-3-mini¹	chat-completion	- Input: text (131,072 tokens) - Output: text (131,072 tokens) - Languages: `en` - Tool calling: yes - Response formats: text	Foundry, Hub-based

See the xAI model collection in Azure AI Foundry portal.

Models from partners and community

Models from partners and community constitute the majority of the Azure AI Foundry Models and are provided by trusted third-party organizations, partners, research labs, and community contributors. To learn more about these models, see Models from Partners and Community.

Cohere

The Cohere family of models includes various models optimized for different use cases, including chat completions and embeddings. Cohere models are optimized for various use cases that include reasoning, summarization, and question answering.

Model	Type	Capabilities	Project type
Cohere-command-a	chat-completion	- Input: text (131,072 tokens) - Output: text (8,182 tokens) - Languages: `en`, `fr`, `es`, `it`, `de`, `pt-br`, `ja`, `ko`, `zh-cn`, and `ar` - Tool calling: Yes - Response formats: Text, JSON	Foundry, Hub-based
Cohere-command-r-plus-08-2024	chat-completion	- Input: text (131,072 tokens) - Output: text (4,096 tokens) - Languages: `en`, `fr`, `es`, `it`, `de`, `pt-br`, `ja`, `ko`, `zh-cn`, and `ar` - Tool calling: Yes - Response formats: Text, JSON	Foundry, Hub-based
Cohere-command-r-08-2024	chat-completion	- Input: text (131,072 tokens) - Output: text (4,096 tokens) - Languages: `en`, `fr`, `es`, `it`, `de`, `pt-br`, `ja`, `ko`, `zh-cn`, and `ar` - Tool calling: Yes - Response formats: Text, JSON	Foundry, Hub-based
embed-v-4-0	embeddings	- Input: text (512 tokens) and images (2MM pixels) - Output: Vector (256, 512, 1024, 1536 dim.) - Languages: `en`, `fr`, `es`, `it`, `de`, `pt-br`, `ja`, `ko`, `zh-cn`, and `ar`	Foundry, Hub-based
Cohere-embed-v3-english	embeddings	- Input: text and images (512 tokens) - Output: Vector (1024 dim.) - Languages: `en`	Foundry, Hub-based
Cohere-embed-v3-multilingual	embeddings	- Input: text (512 tokens) - Output: Vector (1024 dim.) - Languages: `en`, `fr`, `es`, `it`, `de`, `pt-br`, `ja`, `ko`, `zh-cn`, and `ar`	Foundry, Hub-based

Cohere rerank

Model	Type	Capabilities	API Reference	Project type
Cohere-rerank-v3.5	rerank text classification	- Input: text - Output: text - Languages: English, Chinese, French, German, Indonesian, Italian, Portuguese, Russian, Spanish, Arabic, Dutch, Hindi, Japanese, Vietnamese	Cohere’s v2/rerank API	Hub-based

For more details on pricing for Cohere rerank models, see Pricing for Cohere rerank models. See the Cohere model collection in Azure AI Foundry portal.

Core42

Core42 includes autoregressive bi-lingual LLMs for Arabic & English with state-of-the-art capabilities in Arabic.

Model	Type	Capabilities	Project type
jais-30b-chat	chat-completion	- Input: text (8,192 tokens) - Output: (4,096 tokens) - Languages: en and ar - Tool calling: Yes - Response formats: Text, JSON	Foundry, Hub-based

See this model collection in Azure AI Foundry portal.

Model	Type	Capabilities	Project type
Llama-3.2-11B-Vision-Instruct	chat-completion	- Input: text and image (128,000 tokens) - Output: (8,192 tokens) - Languages: `en` - Tool calling: No* - Response formats: Text	Foundry, Hub-based
Llama-3.2-90B-Vision-Instruct	chat-completion	- Input: text and image (128,000 tokens) - Output: (8,192 tokens) - Languages: `en` - Tool calling: No* - Response formats: Text	Foundry, Hub-based
Meta-Llama-3.1-405B-Instruct	chat-completion	- Input: text (131,072 tokens) - Output: (8,192 tokens) - Languages: `en`, `de`, `fr`, `it`, `pt`, `hi`, `es`, and `th` - Tool calling: No* - Response formats: Text	Foundry, Hub-based
Meta-Llama-3.1-8B-Instruct	chat-completion	- Input: text (131,072 tokens) - Output: (8,192 tokens) - Languages: `en`, `de`, `fr`, `it`, `pt`, `hi`, `es`, and `th` - Tool calling: No* - Response formats: Text	Foundry, Hub-based
Llama-4-Scout-17B-16E-Instruct	chat-completion	- Input: text and image (128,000 tokens) - Output: text (8,192 tokens) - Tool calling: No - Response formats: Text	Foundry, Hub-based

Microsoft

Model	Type	Capabilities	Project type
Phi-4-mini-instruct	chat-completion	- Input: text (131,072 tokens) - Output: (4,096 tokens) - Languages: `ar`, `zh`, `cs`, `da`, `nl`, `en`, `fi`, `fr`, `de`, `he`, `hu`, `it`, `ja`, `ko`, `no`, `pl`, `pt`, `ru`, `es`, `sv`, `th`, `tr`, and `uk` - Tool calling: No - Response formats: Text	Foundry, Hub-based
Phi-4-multimodal-instruct	chat-completion	- Input: text, images, and audio (131,072 tokens) - Output: (4,096 tokens) - Languages: `ar`, `zh`, `cs`, `da`, `nl`, `en`, `fi`, `fr`, `de`, `he`, `hu`, `it`, `ja`, `ko`, `no`, `pl`, `pt`, `ru`, `es`, `sv`, `th`, `tr`, and `uk` - Tool calling: No - Response formats: Text	Foundry, Hub-based
Phi-4	chat-completion	- Input: text (16,384 tokens) - Output: (16,384 tokens) - Languages: `en`, `ar`, `bn`, `cs`, `da`, `de`, `el`, `es`, `fa`, `fi`, `fr`, `gu`, `ha`, `he`, `hi`, `hu`, `id`, `it`, `ja`, `jv`, `kn`, `ko`, `ml`, `mr`, `nl`, `no`, `or`, `pa`, `pl`, `ps`, `pt`, `ro`, `ru`, `sv`, `sw`, `ta`, `te`, `th`, `tl`, `tr`, `uk`, `ur`, `vi`, `yo`, and `zh` - Tool calling: No - Response formats: Text	Foundry, Hub-based
Phi-4-reasoning	chat-completion with reasoning content	- Input: text (32,768 tokens) - Output: text (32,768 tokens) - Languages: `en` - Tool calling: No - Response formats: Text	Foundry, Hub-based
Phi-4-mini-reasoning	chat-completion with reasoning content	- Input: text (128,000 tokens) - Output: text (128,000 tokens) - Languages: `en` - Tool calling: No - Response formats: Text	Foundry, Hub-based

See the Microsoft model collection in Azure AI Foundry portal. There are also several Microsoft models available as models sold directly by Azure.

Mistral AI

Mistral AI offers two categories of models: premium models such as Mistral Large 2411 and Ministral 3B, and open models such as Mistral Nemo.

Model	Type	Capabilities	Project type
Codestral-2501	chat-completion	- Input: text (262,144 tokens) - Output: text (4,096 tokens) - Languages: en - Tool calling: No - Response formats: Text	Foundry, Hub-based
Ministral-3B	chat-completion	- Input: text (131,072 tokens) - Output: text (4,096 tokens) - Languages: fr, de, es, it, and en - Tool calling: Yes - Response formats: Text, JSON	Foundry, Hub-based
Mistral-Nemo	chat-completion	- Input: text (131,072 tokens) - Output: text (4,096 tokens) - Languages: `en`, `fr`, `de`, `es`, `it`, `zh`, `ja`, `ko`, `pt`, `nl`, and `pl` - Tool calling: Yes - Response formats: Text, JSON	Foundry, Hub-based
Mistral-small-2503	chat-completion	- Input: text (32,768 tokens) - Output: text (4,096 tokens) - Languages: fr, de, es, it, and en - Tool calling: Yes - Response formats: Text, JSON	Foundry, Hub-based
Mistral-medium-2505	chat-completion	- Input: text (128,000 tokens), image - Output: text (128,000 tokens) - Tool calling: No - Response formats: Text, JSON	Foundry, Hub-based
Mistral-Large-2411	chat-completion	- Input: text (128,000 tokens) - Output: text (4,096 tokens) - Languages: `en`, `fr`, `de`, `es`, `it`, `zh`, `ja`, `ko`, `pt`, `nl`, and `pl` - Tool calling: Yes - Response formats: Text, JSON	Foundry, Hub-based
Mistral-OCR-2503	image to text	- Input: image or PDF pages (1,000 pages, max 50MB PDF file) - Output: text - Tool calling: No - Response formats: Text, JSON, Markdown	Hub-based
mistralai-Mistral-7B-Instruct-v01	chat-completion	- Input: text - Output: text - Languages: en - Response formats: Text	Hub-based
mistralai-Mistral-7B-Instruct-v0-2	chat-completion	- Input: text - Output: text - Languages: en - Response formats: Text	Hub-based
mistralai-Mixtral-8x7B-Instruct-v01	chat-completion	- Input: text - Output: text - Languages: en - Response formats: Text	Hub-based
mistralai-Mixtral-8x22B-Instruct-v0-1	chat-completion	- Input: text (64,000 tokens) - Output: text (4,096 tokens) - Languages: fr, it, de, es, en - Response formats: Text	Hub-based

See this model collection in Azure AI Foundry portal.

Nixtla

Nixtla’s TimeGEN-1 is a generative pretrained forecasting and anomaly detection model for time series data. TimeGEN-1 can produce accurate forecasts for new time series without training, using only historical values and exogenous covariates as inputs. To perform inferencing, TimeGEN-1 requires you to use Nixtla’s custom inference API.

Model	Type	Capabilities	Inference API	Project type
TimeGEN-1	Forecasting	- Input: Time series data as JSON or dataframes (with support for multivariate input) - Output: Time series data as JSON - Tool calling: No - Response formats: JSON	Forecast client to interact with Nixtla’s API	Hub-based

For more details on pricing for Nixtla models, see Nixtla.

NTT Data

tsuzumi is an autoregressive language optimized transformer. The tuned versions use supervised fine-tuning (SFT). tsuzumi handles both Japanese and English language with high efficiency.

Model	Type	Capabilities	Project type
tsuzumi-7b	chat-completion	- Input: text (8,192 tokens) - Output: text (8,192 tokens) - Languages: `en` and `jp` - Tool calling: No - Response formats: Text	Hub-based

Stability AI

The Stability AI collection of image generation models include Stable Image Core, Stable Image Ultra, and Stable Diffusion 3.5 Large. Stable Diffusion 3.5 Large allows for an image and text input.

Model	Type	Capabilities	Project type
Stable Diffusion 3.5 Large	Image generation	- Input: text and image (1,000 tokens and 1 image) - Output: One Image - Tool calling: No - Response formats: Image (PNG and JPG)	Hub-based
Stable Image Core	Image generation	- Input: text (1,000 tokens) - Output: One Image - Tool calling: No - Response formats: Image (PNG and JPG)	Hub-based
Stable Image Ultra	Image generation	- Input: text (1,000 tokens) - Output: One Image - Tool calling: No - Response formats: Image (PNG and JPG)	Hub-based

See this model collection in Azure AI Foundry portal.

Open and custom models

The model catalog offers a larger selection of models, from a bigger range of providers. For these models, you can’t use the option for standard deployment in Azure AI Foundry resources, where models are provided as APIs; rather, to deploy these models, you might be required to host them on your infrastructure, create an AI hub, and provide the underlying compute quota to host the models. Furthermore, these models can be open-access or IP protected. In both cases, you have to deploy them in managed compute offerings in Azure AI Foundry. To get started, see How-to: Deploy to Managed compute.

​Models sold directly by Azure

​Azure OpenAI

​DeepSeek models sold directly by Azure

​Meta models sold directly by Azure

​Microsoft models sold directly by Azure

​xAI models sold directly by Azure

​Models from partners and community

​Cohere

​Cohere rerank

​Core42

​Meta

​Microsoft

​Mistral AI

​Nixtla

​NTT Data

​Stability AI

​Open and custom models

​Related content

Models sold directly by Azure

Azure OpenAI

DeepSeek models sold directly by Azure

Meta models sold directly by Azure

Microsoft models sold directly by Azure

xAI models sold directly by Azure

Models from partners and community

Cohere

Cohere rerank

Core42

Meta

Microsoft

Mistral AI

Nixtla

NTT Data

Stability AI

Open and custom models

Related content