Azure AI Foundry: Complete Platform Guide (2026)
Azure AI Foundry is Microsoft’s unified AI development platform — a model catalog with 1,800+ models, prompt flow orchestration, evaluation tools, and enterprise-grade deployment. It replaced Azure AI Studio in late 2024 and expanded beyond just OpenAI models to include Phi, Mistral, Llama, Cohere, and more. If your organization runs on Azure, Azure AI Foundry is where you build, test, and deploy AI applications. For the full comparison of all three major platforms, see our Cloud AI Platforms guide.
Who this is for:
- Junior engineers: You’re exploring Azure’s AI services and need to understand how Azure AI Foundry fits together
- Senior engineers: You’re evaluating Azure AI Foundry against Bedrock and Vertex AI for enterprise AI workloads
Real-World Problem Context
You’re a GenAI engineer at a company running on Microsoft Azure. Your team needs to build an AI-powered document assistant that searches internal SharePoint documents and answers employee questions. The security team’s requirements: all data stays within the Azure boundary, model invocations must use Entra ID authentication, and the system must support SOC 2 compliance.
Here’s where Azure AI Foundry components fit:
| Requirement | Azure AI Foundry Solution | Alternative (Without Foundry) |
|---|---|---|
| LLM inference | Model catalog → deploy GPT-4o or Phi-4 | Direct OpenAI API (outside Azure boundary) |
| Document search | Azure AI Search (native SharePoint connector) | External vector DB (Pinecone, Weaviate) |
| Orchestration | Prompt flow (visual + code) | Custom LangChain/LangGraph code |
| Auth & RBAC | Entra ID managed identity | API keys (manual rotation, no RBAC) |
| Content safety | Built-in responsible AI filters | Custom guardrails from scratch |
| Evaluation | Built-in eval tools (groundedness, relevance) | Custom eval pipeline |
The biggest mistake teams make: using Azure AI Foundry only for OpenAI models. The platform’s value is the integrated workflow — model selection, prompt engineering, evaluation, and deployment in one place. Teams that treat it as “just an API endpoint” miss the orchestration and evaluation features that save weeks of custom development.
Core Concepts and Mental Model
Think of Azure AI Foundry as a three-layer platform:
- Model Catalog — The model marketplace. Browse 1,800+ models, compare benchmarks, deploy with one click. This is where you pick your model.
- AI Studio — The development environment. Build prompt flows, test in the playground, fine-tune models, run evaluations. This is where you build your application.
- Deployment & Operations — Managed endpoints, content safety, monitoring, and Azure-native security. This is where you run in production.
Key Components
Model Catalog: The headline feature that differentiates Azure AI Foundry from the old Azure OpenAI Service. Instead of only OpenAI models, you now get access to:
- OpenAI: GPT-4o, GPT-4 Turbo, o1, o3-mini, DALL-E, Whisper
- Microsoft: Phi-4, Phi-3.5 (small language models optimized for cost)
- Meta: Llama 3.1, Llama 3.2 (open-weight models)
- Mistral: Mistral Large, Mixtral (European AI models)
- Cohere: Command R+ (retrieval-optimized models)
- 1,700+ more from various providers and the open-source community
Prompt Flow: A visual and code-based orchestration tool for building LLM pipelines. Define your chain as a directed acyclic graph (DAG) — each node is a step (LLM call, Python function, API call, conditional logic). Similar to LangGraph but Azure-native with built-in versioning, evaluation, and deployment.
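The DAG-of-nodes idea is simple enough to sketch in plain Python. This is a conceptual toy, not the Prompt Flow runtime; the node names and lambda bodies are hypothetical stand-ins for an LLM call and a retrieval step:

```python
# Conceptual sketch of a DAG of named nodes, in the spirit of Prompt Flow.
# NOT the Prompt Flow runtime; node functions here are hypothetical stubs.

def run_dag(nodes: dict, inputs: dict) -> dict:
    """Run nodes in declaration order (assumed topological);
    each node reads earlier outputs by name, mirroring ${node.output} refs."""
    results = dict(inputs)
    for name, (fn, deps) in nodes.items():
        results[name] = fn(*[results[d] for d in deps])
    return results

# Two-node flow: retrieve docs, then generate an answer from them
nodes = {
    "retrieve_docs": (lambda q: [f"doc about {q}"], ["question"]),
    "generate_answer": (
        lambda docs, q: f"Answer to '{q}' using {len(docs)} doc(s)",
        ["retrieve_docs", "question"],
    ),
}

out = run_dag(nodes, {"question": "PTO policy"})
print(out["generate_answer"])  # Answer to 'PTO policy' using 1 doc(s)
```

In the real tool the graph lives in `flow.dag.yaml` (shown in Step 3 below) and each node is an LLM template, Python file, or API call rather than a lambda.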
Azure AI Search: The managed vector + keyword search service. For RAG pipelines, it serves as the retrieval layer with native connectors for SharePoint, Azure Blob Storage, and Cosmos DB. Supports hybrid search (vector + semantic + keyword) out of the box.
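Hybrid search merges the vector and keyword result lists; Azure AI Search documents reciprocal rank fusion (RRF) as its fusion method for hybrid queries. A minimal sketch of the scoring idea, with made-up document IDs:

```python
def rrf_merge(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: score(doc) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["doc-7", "doc-2", "doc-9"]   # nearest neighbors by embedding
keyword_hits = ["doc-2", "doc-5", "doc-7"]  # BM25 keyword matches

merged = rrf_merge([vector_hits, keyword_hits])
print(merged[0])  # doc-2: appears high in both lists, so it fuses to the top
```

Documents that appear in both lists accumulate score from each, which is why hybrid retrieval tends to beat either method alone on mixed query workloads.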
Responsible AI: Built-in content safety filters, groundedness detection, and bias evaluation. Applied to both inputs and outputs by default. Configurable per deployment — stricter for public-facing apps, relaxed for internal tools (with Microsoft approval).
Authentication: Entra ID First
Azure AI Foundry uses Entra ID (formerly Azure AD) as the primary authentication method. In production, always use managed identities — no API keys to rotate or leak:
```python
from azure.identity import DefaultAzureCredential
from openai import AzureOpenAI

# Managed identity — no key required, Azure handles rotation
credential = DefaultAzureCredential()
token = credential.get_token("https://cognitiveservices.azure.com/.default")

client = AzureOpenAI(
    azure_endpoint="https://YOUR-RESOURCE.openai.azure.com/",
    azure_ad_token=token.token,
    api_version="2024-12-01-preview",
)
```

Azure AI Foundry Models: Catalog Reference
The model catalog is Azure AI Foundry’s biggest differentiator — 1,800+ models from multiple providers, all deployable through the same API. Here’s a reference by provider and use case:
| Provider | Key Models | Best For | Deployment Type |
|---|---|---|---|
| OpenAI | GPT-4o, GPT-4 Turbo, o1, o3-mini, DALL-E 3, Whisper | General reasoning, code generation, multimodal, audio | Serverless (pay-per-token) |
| Microsoft | Phi-4, Phi-3.5-mini, Phi-3.5-vision | Cost-sensitive tasks, edge deployment, vision | Serverless or Managed Compute |
| Meta | Llama 3.1 (8B/70B/405B), Llama 3.2 | Open-weight models, fine-tuning, on-premise | Managed Compute |
| Mistral | Mistral Large, Mixtral 8x22B | European AI compliance, multilingual | Serverless |
| Cohere | Command R+, Embed v3 | RAG-optimized generation, embeddings | Serverless |
| Open-source | DBRX, Falcon, StarCoder2, 1,700+ more | Specialized tasks, research, fine-tuning | Managed Compute |
Two deployment modes:
- Serverless API — Pay-per-token, zero infrastructure. Model provider sets pricing. Best for GPT-4o, Mistral, Cohere. No GPU management.
- Managed Compute — Hourly compute charges. You control the VM size. Best for open-weight models (Llama, Phi) that you want to fine-tune or run on dedicated infrastructure.
Both modes use the same Azure AI Model Inference API — switch models by changing a single deployment name string, not your code.
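Because only the deployment name changes, model choice can live in configuration rather than application code. A small lookup sketch; the deployment names below are assumptions for illustration, not defaults Azure provides:

```python
# Hypothetical deployment names — replace with the names you chose at deploy time.
DEPLOYMENTS = {
    "reasoning": "gpt4o-prod",      # serverless GPT-4o
    "cheap": "phi4-endpoint",       # managed-compute Phi-4
    "embeddings": "embed-3-large",  # embedding model
}

def deployment_for(task: str) -> str:
    """Resolve a logical task name to the Azure deployment name."""
    try:
        return DEPLOYMENTS[task]
    except KeyError:
        raise ValueError(f"No deployment configured for task '{task}'") from None

# Swapping models is now a config change, not a code change:
print(deployment_for("cheap"))  # phi4-endpoint
```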
Azure AI Foundry SDK Packages
Three Python packages cover the platform:
| Package | Purpose | Install |
|---|---|---|
| `azure-ai-projects` | Hub/project management, connection handling | `pip install azure-ai-projects` |
| `azure-ai-inference` | Model inference (chat, embeddings, image generation) | `pip install azure-ai-inference` |
| `azure-ai-evaluation` | Evaluation metrics (groundedness, relevance, similarity) | `pip install azure-ai-evaluation` |
All three support Entra ID authentication via DefaultAzureCredential. The inference SDK provides a unified interface across all models in the catalog — same API whether you’re calling GPT-4o or Llama 3.1.
Step-by-Step: Building with Azure AI Foundry
These five steps take you from creating a hub and project to deploying a model, building a prompt flow, setting up RAG, and evaluating output quality.
Step 1: Create an Azure AI Foundry Hub and Project
Azure AI Foundry organizes work into hubs (shared resources like compute and connections) and projects (individual workspaces within a hub).
Via the Azure Portal: navigate to Azure AI Foundry → Create a new hub → Create a project within the hub. The hub provides shared compute, connections, and security settings; the project is where you build.
Via the CLI:
```bash
# Create an Azure AI hub
az ml workspace create \
  --name my-ai-hub \
  --resource-group my-rg \
  --kind hub \
  --location eastus

# Create a project within the hub
az ml workspace create \
  --name my-genai-project \
  --resource-group my-rg \
  --kind project \
  --hub-id /subscriptions/SUB_ID/resourceGroups/my-rg/providers/Microsoft.MachineLearningServices/workspaces/my-ai-hub
```

Step 2: Deploy a Model from the Catalog
Browse the model catalog, filter by task (chat, embeddings, image generation), compare benchmarks, and deploy:
```bash
# Deploy GPT-4o as a serverless endpoint
az ml serverless-endpoint create \
  --name gpt4o-prod \
  --resource-group my-rg \
  --workspace-name my-genai-project \
  --model-id azureml://registries/azure-openai/models/gpt-4o/versions/2024-08-06

# Deploy Phi-4 for cost-sensitive workloads
az ml online-endpoint create \
  --name phi4-endpoint \
  --resource-group my-rg \
  --workspace-name my-genai-project
```

Two deployment types:
- Serverless API (pay-per-token) — for OpenAI, Mistral, Cohere, Meta models. No infrastructure management.
- Managed compute (pay-per-hour) — for open-weight models you want to run on dedicated VMs. Full control over scaling.
Step 3: Build a Prompt Flow
Create an orchestration pipeline that retrieves documents and generates answers:
```yaml
# prompt_flow/flow.dag.yaml — defines the pipeline
inputs:
  question:
    type: string

nodes:
  - name: retrieve_docs
    type: python
    source:
      type: code
      path: retrieve.py
    inputs:
      query: ${inputs.question}

  - name: generate_answer
    type: llm
    source:
      type: code
      path: generate.jinja2
    inputs:
      model: gpt-4o
      context: ${retrieve_docs.output}
      question: ${inputs.question}

outputs:
  answer:
    type: string
    reference: ${generate_answer.output}
```

Step 4: Set Up RAG with Azure AI Search
```python
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_KEY"],
    api_version="2024-12-01-preview",
)

response = client.chat.completions.create(
    model="gpt-4o-prod",
    messages=[{"role": "user", "content": "What is our PTO policy?"}],
    extra_body={
        "data_sources": [
            {
                "type": "azure_search",
                "parameters": {
                    "endpoint": "https://YOUR-SEARCH.search.windows.net",
                    "index_name": "company-docs",
                    "authentication": {"type": "api_key", "key": "YOUR-SEARCH-KEY"},
                    "query_type": "vector_semantic_hybrid",
                    "semantic_configuration": "default",
                    "top_n_documents": 5,
                },
            }
        ]
    },
)

# Response includes citations from retrieved documents
print(response.choices[0].message.content)
```

Step 5: Evaluate with Built-in Metrics
Azure AI Foundry includes evaluation metrics for RAG quality:
```python
from azure.ai.evaluation import evaluate, GroundednessEvaluator, RelevanceEvaluator

# model_config: your judge-model configuration (endpoint, deployment), defined elsewhere
result = evaluate(
    data="test_data.jsonl",
    evaluators={
        "groundedness": GroundednessEvaluator(model_config=model_config),
        "relevance": RelevanceEvaluator(model_config=model_config),
    },
)

print(f"Groundedness: {result['groundedness']:.2f}")
print(f"Relevance: {result['relevance']:.2f}")
```

Azure AI Foundry Agent Service
Agent Service is Azure AI Foundry’s managed platform for building autonomous agents — agents that can reason, use tools, and take multi-step actions without human intervention at each step.
When to use Agent Service vs. Prompt Flow:
- Prompt Flow — Deterministic pipelines. You define the exact DAG: step A → step B → step C. Best for RAG, structured workflows, and predictable orchestration.
- Agent Service — Autonomous reasoning. The agent decides which tools to call and in what order. Best for research assistants, customer support bots, and tasks where the sequence of actions depends on intermediate results.
Key capabilities:
- Tool calling — Connect agents to Azure Functions, REST APIs, Azure AI Search, and custom code tools
- Multi-turn conversations — Agents maintain context across turns with managed thread state
- Code Interpreter — Built-in sandboxed Python execution for data analysis, math, and file processing
- File Search — Attach documents to agent threads for on-the-fly RAG without building a separate retrieval pipeline
- Managed state — Agent threads persist across sessions in Azure-managed storage
```python
from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential

# Create an agent with tool access
project = AIProjectClient(
    credential=DefaultAzureCredential(),
    endpoint="https://YOUR-HUB.api.azureml.ms",
)

agent = project.agents.create_agent(
    model="gpt-4o",
    name="research-assistant",
    instructions="You are a research assistant. Use the search tool to find relevant information.",
    tools=[{"type": "code_interpreter"}, {"type": "file_search"}],
)

# Create a thread and run the agent
thread = project.agents.create_thread()
project.agents.create_message(thread_id=thread.id, role="user", content="Analyze Q3 revenue trends")
run = project.agents.create_and_process_run(thread_id=thread.id, agent_id=agent.id)
```

Agent Service vs. building agents yourself: If you’re already using LangGraph or CrewAI for agent orchestration, Agent Service provides the same pattern but Azure-managed — no infrastructure to maintain, built-in content safety, and Entra ID authentication. The trade-off is less flexibility in agent routing and no access to Anthropic models.
Architecture and System View
Azure AI Foundry stacks five tiers from application code down to data — with Entra ID security and Prompt Flow orchestration sitting between your app and the 1,800+ model catalog.
📊 Azure AI Foundry Platform Architecture
[Figure: Azure AI Foundry Architecture. The complete AI development platform on Microsoft Azure]

📊 Azure AI Foundry Development Workflow
[Figure: Azure AI Foundry Development Workflow. From model selection to production deployment]
Private Endpoint Architecture
For production deployments where all traffic must stay within the corporate network:
- Create an Azure Private Endpoint for the Azure AI Foundry hub
- Configure a Private DNS Zone in your VNet
- Create a Private Endpoint for Azure AI Search
- Disable public network access on all resources
- All API traffic resolves to private IP addresses within the VNet — no public internet traversal
This configuration is standard for financial services and healthcare workloads on Azure.
Practical Examples
Model selection, streaming, and function calling are the three patterns engineers reach for most often when building with Azure AI Foundry.
Cost Comparison: Model Selection in Azure AI Foundry
One of Azure AI Foundry’s biggest advantages is model choice. Here’s how costs compare for a document classification task processing 1M documents/month:
| Model | Provider | Input Cost | Output Cost | Monthly Cost (1M docs) | Quality |
|---|---|---|---|---|---|
| GPT-4o | OpenAI | $2.50/M tokens | $10/M tokens | ~$3,750 | Highest |
| GPT-4o-mini | OpenAI | $0.15/M tokens | $0.60/M tokens | ~$225 | Good |
| Phi-4 | Microsoft | ~$0.07/M tokens | ~$0.14/M tokens | ~$63 | Moderate |
| Mistral Large | Mistral | $2/M tokens | $6/M tokens | ~$2,400 | High |
| Llama 3.1 70B | Meta | Compute cost | Compute cost | ~$500 (managed compute) | High |
Using Phi-4 instead of GPT-4o for simple classification saves 98% — from $3,750 to $63/month.
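The savings claim is easy to sanity-check with a rough per-document cost model. The token counts below are assumptions (roughly 1,000 input and 100 output tokens per document); the table’s “~” figures imply slightly different assumptions, but the ~98% savings ratio holds either way:

```python
def monthly_cost(docs: int, in_tokens: int, out_tokens: int,
                 in_price: float, out_price: float) -> float:
    """Monthly cost in dollars; prices are per 1M tokens."""
    return docs * (in_tokens * in_price + out_tokens * out_price) / 1_000_000

# Assumed workload: 1M docs/month, ~1,000 input and ~100 output tokens each
gpt4o = monthly_cost(1_000_000, 1000, 100, in_price=2.50, out_price=10.00)
phi4 = monthly_cost(1_000_000, 1000, 100, in_price=0.07, out_price=0.14)

print(f"GPT-4o: ${gpt4o:,.0f}/month")   # GPT-4o: $3,500/month
print(f"Phi-4:  ${phi4:,.0f}/month")    # Phi-4:  $84/month
print(f"Savings: {1 - phi4 / gpt4o:.0%}")  # Savings: 98%
```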
Streaming Responses
```python
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_KEY"],
    api_version="2024-12-01-preview",
)

def stream_response(user_message: str):
    stream = client.chat.completions.create(
        model="gpt-4o-prod",
        messages=[{"role": "user", "content": user_message}],
        stream=True,
    )
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content
```

Function Calling with Tool Use
```python
import json

tools = [
    {
        "type": "function",
        "function": {
            "name": "search_company_docs",
            "description": "Search internal company documents by topic",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "The search query"},
                    "department": {"type": "string", "enum": ["hr", "finance", "engineering", "legal"]},
                },
                "required": ["query"],
            },
        },
    }
]

messages = [{"role": "user", "content": "What is the engineering team's on-call policy?"}]
response = client.chat.completions.create(model="gpt-4o-prod", messages=messages, tools=tools)

if response.choices[0].finish_reason == "tool_calls":
    tool_call = response.choices[0].message.tool_calls[0]
    args = json.loads(tool_call.function.arguments)
    result = search_company_docs(**args)  # Your implementation
    messages.extend([
        response.choices[0].message,
        {"role": "tool", "tool_call_id": tool_call.id, "content": json.dumps(result)},
    ])
    final = client.chat.completions.create(model="gpt-4o-prod", messages=messages, tools=tools)
    print(final.choices[0].message.content)
```

Azure AI Foundry Pricing and Cost Tiers
Azure AI Foundry pricing depends on how you deploy models. There is no single “Foundry subscription fee” — you pay for the resources you use.
| Deployment Type | Pricing Model | Example Cost | Best For |
|---|---|---|---|
| Serverless API (OpenAI) | Per-token (input + output) | GPT-4o: ~$2.50/1M input, ~$10/1M output | Production inference, variable load |
| Serverless API (3rd party) | Per-token (provider sets price) | Mistral Large: ~$2/1M input, ~$6/1M output | Multi-model testing, Mistral/Cohere |
| Managed Compute | Hourly VM + storage | Standard_NC24ads_A100: ~$3.67/hr | Fine-tuning, open-weight models (Llama, Phi) |
| Azure AI Search | Per search unit (SU) | Basic: ~$75/month (15GB, 5 indexes) | RAG retrieval layer |
| Provisioned Throughput | Reserved capacity (PTU) | Per PTU/month (varies by model) | High-volume, predictable workloads |
Cost optimization strategies:
- Model routing — Route simple queries to Phi-4 (~$0.07/1M input) and complex queries to GPT-4o. This can cut inference costs by 80-95% for workloads where most queries are straightforward.
- Serverless over Managed Compute for variable loads — you only pay when the model is invoked. At low-to-moderate traffic, serverless is almost always cheaper.
- Provisioned Throughput Units (PTUs) for high-volume — if you consistently process 100K+ requests/day, reserved capacity is cheaper per-token than pay-as-you-go.
- Prompt caching — Azure caches common prompt prefixes automatically. System prompts and few-shot examples that repeat across requests are charged at reduced rates.
Estimating your monthly bill:
```text
Customer support bot (10K conversations/day, avg 500 tokens each):
  Model routing: 80% Phi-4, 20% GPT-4o
  Phi-4 cost:  8K × 500 × $0.07/1M = ~$0.28/day
  GPT-4o cost: 2K × 500 × $2.50/1M (in) + $10/1M (out) = ~$12.50/day
  Total inference: ~$13/day → ~$390/month
  Azure AI Search (Basic): $75/month
  Estimated total: ~$465/month
```

Pricing changes frequently. Check the Azure AI Foundry pricing page for current per-token rates.
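The worked estimate can be reproduced in a few lines. Assumptions are copied from the example above, including its accounting quirks: the Phi-4 share is priced on input tokens only, while the GPT-4o share counts the 500 tokens on both the input and output side:

```python
# Mirror the arithmetic of the example above (assumed, not official pricing math)
phi_daily = 8_000 * 500 * 0.07 / 1_000_000            # input-only: ~$0.28/day
gpt_daily = 2_000 * 500 * (2.50 + 10.00) / 1_000_000  # in + out:   ~$12.50/day

monthly = (phi_daily + gpt_daily) * 30 + 75  # + Azure AI Search Basic tier
print(f"~${monthly:,.0f}/month")  # ~$458/month
```

This lands at ~$458 rather than ~$465 because the original rounds the daily inference total up to $13 before multiplying; either way it is an order-of-magnitude estimate, not a quote.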
Trade-offs, Limitations and Failure Modes
No Claude or Anthropic Models
Azure AI Foundry does not include Anthropic’s Claude models. If your evaluation shows Claude produces better results for your use case, you need AWS Bedrock (which has Claude) or the direct Anthropic API. This is the primary model gap compared to Bedrock.
Model Availability Varies by Region
Not all 1,800+ models are available in every Azure region. GPT-4o is broadly available, but newer models and third-party models may be limited to specific regions. Check regional availability before committing to a model — especially for data residency requirements that restrict which regions you can use.
Prompt Flow Learning Curve
Prompt flow is powerful but has a steeper learning curve than raw API calls. Teams that just need a simple chat endpoint may find it over-engineered. Start with direct API calls, migrate to prompt flow when you need multi-step orchestration, evaluation, or versioned deployments.
Content Safety Filter False Positives
The default content safety filter occasionally blocks legitimate content in professional contexts — technical security discussions, medical content, legal analysis. Azure provides a process to request modified filter configurations with documented justification. Allow 2-4 weeks for approval when planning production timelines.
API Version Churn
Azure AI Foundry API versions change frequently. Each version may alter response formats or behavior. Pin to a specific version in production and test new versions in staging before upgrading. The 2024-12-01-preview version is current as of early 2026.
Interview Perspective
These three questions test whether you understand Azure AI Foundry’s platform strategy, enterprise RAG architecture, and trade-offs versus competing managed AI services.
Q1: “What is Azure AI Foundry and how does it differ from Azure OpenAI Service?”
What they’re testing: Do you understand Microsoft’s evolving AI platform strategy?
Strong answer: “Azure AI Foundry is Microsoft’s unified AI development platform — it includes Azure OpenAI Service as one component but goes much further. The model catalog has 1,800+ models beyond OpenAI — Phi, Llama, Mistral, Cohere. It adds prompt flow for orchestration, built-in evaluation tools, and responsible AI features. Think of Azure OpenAI as the engine and Foundry as the whole vehicle. The rebrand from Azure AI Studio happened in late 2024.”
Q2: “How would you build an enterprise RAG system on Azure AI Foundry?”
What they’re testing: Can you design a production system using Azure-native components?
Strong answer: “Azure AI Search for retrieval — it has native SharePoint and Blob Storage connectors, plus hybrid search combining vector, semantic, and keyword. Model deployment through the Foundry catalog — GPT-4o for generation, text-embedding-3-large for embeddings. Authentication via Entra ID managed identities — no API keys. Private endpoints to keep all traffic within the VNet. Content safety filters for responsible AI. Evaluation using Foundry’s built-in groundedness and relevance metrics before production deployment.”
Q3: “When would you choose Azure AI Foundry over AWS Bedrock?”
What they’re testing: Platform selection reasoning.
Strong answer: “Azure AI Foundry for Microsoft-first organizations — the Entra ID integration, SharePoint connectors, and existing Azure compliance infrastructure are major advantages. Bedrock for AWS-first organizations or teams that specifically need Claude (Anthropic), which isn’t available on Azure. Both offer multi-model catalogs, but Azure has more models (1,800+ vs ~30 on Bedrock) and prompt flow for orchestration. The practical decision usually comes down to which cloud your organization already uses.”
Production Perspective
Production deployments on Azure AI Foundry follow four consistent patterns: managed identity authentication, model routing for cost, cross-region failover, and pinned API versions.
Use Managed Identity, Not API Keys
Every Azure workload calling Azure AI Foundry endpoints should authenticate with a managed identity. Managed identities are automatically rotated by Azure, have no secret to store or leak, and integrate with Azure RBAC.
```python
from azure.identity import ManagedIdentityCredential
from openai import AzureOpenAI

credential = ManagedIdentityCredential()

def get_client():
    token = credential.get_token("https://cognitiveservices.azure.com/.default")
    return AzureOpenAI(
        azure_endpoint="https://YOUR-RESOURCE.openai.azure.com/",
        azure_ad_token=token.token,
        api_version="2024-12-01-preview",
    )
```

Model Routing for Cost Optimization
With 1,800+ models available, implement a routing strategy: GPT-4o for complex reasoning, Phi-4 or GPT-4o-mini for simple tasks. A model router can cut costs by 60-90% for workloads with mixed complexity.
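A router does not need to be elaborate; even a length-and-keyword heuristic captures much of the savings. The deployment names, keyword list, and word-count threshold below are illustrative assumptions (production routers often use a small classifier model instead):

```python
# Hypothetical heuristic router; deployment names match the earlier examples.
COMPLEX_HINTS = ("why", "compare", "analyze", "explain", "design")

def pick_deployment(query: str) -> str:
    """Route complex queries to GPT-4o, everything else to Phi-4."""
    q = query.lower()
    if len(q.split()) > 40 or any(hint in q for hint in COMPLEX_HINTS):
        return "gpt4o-prod"     # complex reasoning
    return "phi4-endpoint"      # cheap, simple tasks

print(pick_deployment("Reset my password"))                     # phi4-endpoint
print(pick_deployment("Compare our PTO policy with industry"))  # gpt4o-prod
```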
Cross-Region Failover
Azure AI Foundry resources are regional. For production workloads with availability SLAs, deploy endpoints in two regions and route traffic with Azure API Management. Use circuit breaker patterns to detect and route around regional outages automatically.
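Stripped to its core, the pattern is: try the primary region, trip a circuit after repeated failures, and fall back to the secondary. The failure threshold and the fake endpoints below are assumptions for illustration; in practice this logic usually lives in Azure API Management rather than application code:

```python
from typing import Callable

class RegionalFailover:
    """Call primary until it fails `threshold` times in a row, then use secondary."""

    def __init__(self, primary: Callable[[str], str], secondary: Callable[[str], str],
                 threshold: int = 3):
        self.primary, self.secondary = primary, secondary
        self.threshold = threshold
        self.failures = 0  # consecutive primary failures

    def invoke(self, prompt: str) -> str:
        if self.failures < self.threshold:  # circuit closed: try primary
            try:
                result = self.primary(prompt)
                self.failures = 0           # success resets the counter
                return result
            except Exception:
                self.failures += 1
        return self.secondary(prompt)       # circuit open, or primary just failed

# Fake regional endpoints for illustration only
def east_us(prompt): raise TimeoutError("region down")
def west_us(prompt): return f"[west] {prompt}"

router = RegionalFailover(east_us, west_us)
print(router.invoke("hello"))  # [west] hello  (primary failed, fell back)
```

A fuller implementation would also re-probe the primary after a cooldown (half-open state) so traffic returns once the region recovers.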
Pin API Versions
API versions are specified in every client initialization. They are not backward-compatible — a new version may change response formats. Pin to a specific version in production, test new versions in staging.
For more on choosing between cloud AI platforms — including AWS Bedrock and Google Vertex AI — see our platform comparison guide.
Summary and Key Takeaways
- Azure AI Foundry is the unified platform — model catalog (1,800+ models), prompt flow orchestration, evaluation tools, and responsible AI in one place
- Beyond OpenAI — Phi-4, Llama 3, Mistral, Cohere, and hundreds of open-source models are now available alongside GPT-4o
- Prompt flow replaces custom orchestration — visual and code-based pipeline builder with built-in versioning and deployment
- Entra ID integration — managed identities, RBAC, and private endpoints extend existing Azure security to AI workloads
- Cost optimization through model choice — using Phi-4 instead of GPT-4o for simple tasks can save 98%
- No Claude models — if you need Anthropic’s Claude, use AWS Bedrock or the direct Anthropic API
- The naming keeps changing — Azure AI Studio → Azure AI Foundry (late 2024). APIs are the same, branding differs
Related
- Cloud AI Platforms Compared — Side-by-side comparison of Azure AI Foundry, Bedrock, and Vertex AI
- AWS Bedrock Deep-Dive — The comparable managed platform for AWS-first organizations
- Google Vertex AI Deep-Dive — The comparable managed platform for GCP-first organizations
- Agentic Design Patterns — The ReAct and orchestration patterns used in Foundry agent workflows
- RAG Architecture — How retrieval-augmented generation works under the hood
- GenAI Engineering Tools — The broader tool ecosystem for GenAI engineers
- Azure vs Bedrock — Head-to-head comparison of Azure AI Foundry and AWS Bedrock
Frequently Asked Questions
What is Azure AI Foundry?
Azure AI Foundry (formerly Azure AI Studio) is Microsoft's unified AI development platform. It provides a model catalog with 1,800+ models from OpenAI, Meta, Mistral, Microsoft, and others, plus tools for prompt flow orchestration, fine-tuning, evaluation, and responsible AI. It replaced the separate Azure OpenAI Service portal as the central hub for all AI development on Azure.
Is Azure AI Foundry the same as Azure OpenAI Service?
No. Azure OpenAI Service is one component within Azure AI Foundry — it provides access to OpenAI models (GPT-4o, o1, DALL-E). Azure AI Foundry is the broader platform that includes Azure OpenAI plus 1,800+ additional models (Phi, Mistral, Llama, Cohere), prompt flow for orchestration, evaluation tools, and responsible AI features. Think of Azure OpenAI as the engine and Azure AI Foundry as the entire vehicle.
How does Azure AI Foundry compare to AWS Bedrock?
Both are managed AI platforms with multi-model catalogs. Azure AI Foundry offers 1,800+ models and deep Microsoft ecosystem integration (Entra ID, SharePoint, Teams). AWS Bedrock offers fewer models but includes Claude (Anthropic) which Azure AI Foundry does not. Choose Azure AI Foundry for Microsoft-first organizations, Bedrock for AWS-first organizations or teams that need Claude.
What models are available in Azure AI Foundry?
Azure AI Foundry's model catalog includes OpenAI models (GPT-4o, GPT-4 Turbo, o1, o3-mini), Microsoft models (Phi-4, Phi-3.5), Meta models (Llama 3.1, Llama 3.2), Mistral models (Mistral Large, Mixtral), Cohere models, and many others — over 1,800 models total. Models are deployed as serverless API endpoints or managed compute endpoints depending on the provider.
How do I get started with Azure AI Foundry?
Create an Azure subscription, then navigate to Azure AI Foundry in the Azure Portal. Create a hub (shared resources like compute and connections) and a project within the hub. Deploy a model from the catalog — start with GPT-4o for general use or Phi-4 for cost-sensitive tasks. Use the playground to test prompts before building prompt flow pipelines for production workloads.
What is Prompt Flow in Azure AI Foundry?
Prompt Flow is Azure AI Foundry's visual and code-based orchestration tool for building LLM pipelines. You define your chain as a directed acyclic graph (DAG) — each node is a step (LLM call, Python function, API call, conditional logic). It is similar to LangGraph but Azure-native with built-in versioning, evaluation, and deployment. Use it when you need multi-step orchestration beyond simple API calls.
Can I use open-source models in Azure AI Foundry?
Yes. The model catalog includes 1,800+ open-source and third-party models including Meta's Llama 3.1 and 3.2, Microsoft's Phi-4 and Phi-3.5, Mistral's models, and many community models. Open-source models are deployed as managed compute endpoints (hourly VM billing) rather than serverless endpoints, giving you full control over the infrastructure and the ability to fine-tune.
How much does Azure AI Foundry cost?
There is no single Foundry subscription fee — you pay for the resources you use. Serverless API models (GPT-4o, Mistral) charge per token. Managed compute models (Llama, Phi) charge per hour of VM time. Azure AI Search for RAG starts at roughly $75/month for the Basic tier. Using Phi-4 instead of GPT-4o for simple classification can save 98% — from $3,750 to $63/month for 1M documents.
What are managed endpoints vs serverless endpoints in Azure AI Foundry?
Serverless API endpoints charge per token with no infrastructure management — best for OpenAI, Mistral, and Cohere models with variable load. Managed compute endpoints charge per hour of VM time and give you control over VM size — best for open-weight models (Llama, Phi) that you want to fine-tune or run on dedicated infrastructure. Both use the same Azure AI Model Inference API.
How do I fine-tune models in Azure AI Foundry?
Azure AI Foundry supports fine-tuning for OpenAI models (GPT-4o, GPT-4o-mini) and open-source models (Llama, Phi). Upload your training data in JSONL format, configure hyperparameters (epochs, learning rate), and run the fine-tuning job on managed compute. The fine-tuned model is deployed as a new endpoint. Fine-tuning is best when you need domain-specific behavior that prompt engineering and RAG cannot achieve.
Last updated: March 2026 | Azure AI Foundry (formerly Azure AI Studio / Azure OpenAI Service)