Azure AI Foundry: Complete Platform Guide (2026)
Azure AI Foundry is Microsoft’s unified AI development platform — a model catalog with 1,800+ models, prompt flow orchestration, evaluation tools, and enterprise-grade deployment. It replaced Azure AI Studio in late 2024 and expanded beyond just OpenAI models to include Phi, Mistral, Llama, Cohere, and more. If your organization runs on Azure, Azure AI Foundry is where you build, test, and deploy AI applications. For the full comparison of all three major platforms, see our Cloud AI Platforms guide.
Who this is for:
- Junior engineers: You’re exploring Azure’s AI services and need to understand how Azure AI Foundry fits together
- Senior engineers: You’re evaluating Azure AI Foundry against Bedrock and Vertex AI for enterprise AI workloads
Real-World Problem Context
You’re a GenAI engineer at a company running on Microsoft Azure. Your team needs to build an AI-powered document assistant that searches internal SharePoint documents and answers employee questions. The security team’s requirements: all data stays within the Azure boundary, model invocations must use Entra ID authentication, and the system must support SOC 2 compliance.
Here’s where Azure AI Foundry components fit:
| Requirement | Azure AI Foundry Solution | Alternative (Without Foundry) |
|---|---|---|
| LLM inference | Model catalog → deploy GPT-4o or Phi-4 | Direct OpenAI API (outside Azure boundary) |
| Document search | Azure AI Search (native SharePoint connector) | External vector DB (Pinecone, Weaviate) |
| Orchestration | Prompt flow (visual + code) | Custom LangChain/LangGraph code |
| Auth & RBAC | Entra ID managed identity | API keys (manual rotation, no RBAC) |
| Content safety | Built-in responsible AI filters | Custom guardrails from scratch |
| Evaluation | Built-in eval tools (groundedness, relevance) | Custom eval pipeline |
The biggest mistake teams make: using Azure AI Foundry only for OpenAI models. The platform’s value is the integrated workflow — model selection, prompt engineering, evaluation, and deployment in one place. Teams that treat it as “just an API endpoint” miss the orchestration and evaluation features that save weeks of custom development.
Core Concepts and Mental Model
Think of Azure AI Foundry as a three-layer platform:
- Model Catalog — The model marketplace. Browse 1,800+ models, compare benchmarks, deploy with one click. This is where you pick your model.
- AI Studio — The development environment. Build prompt flows, test in the playground, fine-tune models, run evaluations. This is where you build your application.
- Deployment & Operations — Managed endpoints, content safety, monitoring, and Azure-native security. This is where you run in production.
Key Components
Model Catalog: The headline feature that differentiates Azure AI Foundry from the old Azure OpenAI Service. Instead of only OpenAI models, you now get access to:
- OpenAI: GPT-4o, GPT-4 Turbo, o1, o3-mini, DALL-E, Whisper
- Microsoft: Phi-4, Phi-3.5 (small language models optimized for cost)
- Meta: Llama 3.1, Llama 3.2 (open-weight models)
- Mistral: Mistral Large, Mixtral (European AI models)
- Cohere: Command R+ (retrieval-optimized models)
- 1,700+ more from various providers and the open-source community
Prompt Flow: A visual and code-based orchestration tool for building LLM pipelines. Define your chain as a directed acyclic graph (DAG) — each node is a step (LLM call, Python function, API call, conditional logic). Similar to LangGraph but Azure-native with built-in versioning, evaluation, and deployment.
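The DAG-of-nodes idea is simple enough to sketch in plain Python. This is a conceptual toy, not the Prompt Flow runtime; the node names and lambda bodies are hypothetical stand-ins for an LLM call and a retrieval step:

```python
# Conceptual sketch of a DAG of named nodes, in the spirit of Prompt Flow.
# NOT the Prompt Flow runtime; node functions here are hypothetical stubs.

def run_dag(nodes: dict, inputs: dict) -> dict:
    """Run nodes in declaration order (assumed topological);
    each node reads earlier outputs by name, mirroring ${node.output} refs."""
    results = dict(inputs)
    for name, (fn, deps) in nodes.items():
        results[name] = fn(*[results[d] for d in deps])
    return results

# Two-node flow: retrieve docs, then generate an answer from them
nodes = {
    "retrieve_docs": (lambda q: [f"doc about {q}"], ["question"]),
    "generate_answer": (
        lambda docs, q: f"Answer to '{q}' using {len(docs)} doc(s)",
        ["retrieve_docs", "question"],
    ),
}

out = run_dag(nodes, {"question": "PTO policy"})
print(out["generate_answer"])  # Answer to 'PTO policy' using 1 doc(s)
```

In the real tool the graph lives in `flow.dag.yaml` (shown in Step 3 below) and each node is an LLM template, Python file, or API call rather than a lambda.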
Azure AI Search: The managed vector + keyword search service. For RAG pipelines, it serves as the retrieval layer with native connectors for SharePoint, Azure Blob Storage, and Cosmos DB. Supports hybrid search (vector + semantic + keyword) out of the box.
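Hybrid search merges the vector and keyword result lists; Azure AI Search documents reciprocal rank fusion (RRF) as its fusion method for hybrid queries. A minimal sketch of the scoring idea, with made-up document IDs:

```python
def rrf_merge(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: score(doc) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["doc-7", "doc-2", "doc-9"]   # nearest neighbors by embedding
keyword_hits = ["doc-2", "doc-5", "doc-7"]  # BM25 keyword matches

merged = rrf_merge([vector_hits, keyword_hits])
print(merged[0])  # doc-2: appears high in both lists, so it fuses to the top
```

Documents that appear in both lists accumulate score from each, which is why hybrid retrieval tends to beat either method alone on mixed query workloads.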
Responsible AI: Built-in content safety filters, groundedness detection, and bias evaluation. Applied to both inputs and outputs by default. Configurable per deployment — stricter for public-facing apps, relaxed for internal tools (with Microsoft approval).
Authentication: Entra ID First
Azure AI Foundry uses Entra ID (formerly Azure AD) as the primary authentication method. In production, always use managed identities — no API keys to rotate or leak:
```python
from azure.identity import DefaultAzureCredential
from openai import AzureOpenAI

# Managed identity — no key required, Azure handles rotation
credential = DefaultAzureCredential()
token = credential.get_token("https://cognitiveservices.azure.com/.default")

client = AzureOpenAI(
    azure_endpoint="https://YOUR-RESOURCE.openai.azure.com/",
    azure_ad_token=token.token,
    api_version="2024-12-01-preview",
)
```

Azure AI Foundry Models: Catalog Reference
The model catalog is Azure AI Foundry’s biggest differentiator — 1,800+ models from multiple providers, all deployable through the same API. Here’s a reference by provider and use case:
| Provider | Key Models | Best For | Deployment Type |
|---|---|---|---|
| OpenAI | GPT-4o, GPT-4 Turbo, o1, o3-mini, DALL-E 3, Whisper | General reasoning, code generation, multimodal, audio | Serverless (pay-per-token) |
| Microsoft | Phi-4, Phi-3.5-mini, Phi-3.5-vision | Cost-sensitive tasks, edge deployment, vision | Serverless or Managed Compute |
| Meta | Llama 3.1 (8B/70B/405B), Llama 3.2 | Open-weight models, fine-tuning, on-premise | Managed Compute |
| Mistral | Mistral Large, Mixtral 8x22B | European AI compliance, multilingual | Serverless |
| Cohere | Command R+, Embed v3 | RAG-optimized generation, embeddings | Serverless |
| Open-source | DBRX, Falcon, StarCoder2, 1,700+ more | Specialized tasks, research, fine-tuning | Managed Compute |
Two deployment modes:
- Serverless API — Pay-per-token, zero infrastructure. Model provider sets pricing. Best for GPT-4o, Mistral, Cohere. No GPU management.
- Managed Compute — Hourly compute charges. You control the VM size. Best for open-weight models (Llama, Phi) that you want to fine-tune or run on dedicated infrastructure.
Both modes use the same Azure AI Model Inference API — switch models by changing a single deployment name string, not your code.
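Because only the deployment name changes, model choice can live in configuration rather than application code. A small lookup sketch; the deployment names below are assumptions for illustration, not defaults Azure provides:

```python
# Hypothetical deployment names — replace with the names you chose at deploy time.
DEPLOYMENTS = {
    "reasoning": "gpt4o-prod",      # serverless GPT-4o
    "cheap": "phi4-endpoint",       # managed-compute Phi-4
    "embeddings": "embed-3-large",  # embedding model
}

def deployment_for(task: str) -> str:
    """Resolve a logical task name to the Azure deployment name."""
    try:
        return DEPLOYMENTS[task]
    except KeyError:
        raise ValueError(f"No deployment configured for task '{task}'") from None

# Swapping models is now a config change, not a code change:
print(deployment_for("cheap"))  # phi4-endpoint
```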
Azure AI Foundry SDK Packages
Three Python packages cover the platform:
| Package | Purpose | Install |
|---|---|---|
| `azure-ai-projects` | Hub/project management, connection handling | `pip install azure-ai-projects` |
| `azure-ai-inference` | Model inference (chat, embeddings, image generation) | `pip install azure-ai-inference` |
| `azure-ai-evaluation` | Evaluation metrics (groundedness, relevance, similarity) | `pip install azure-ai-evaluation` |
All three support Entra ID authentication via DefaultAzureCredential. The inference SDK provides a unified interface across all models in the catalog — same API whether you’re calling GPT-4o or Llama 3.1.
Step-by-Step: Building with Azure AI Foundry
These five steps take you from creating a hub and project to deploying a model, building a prompt flow, setting up RAG, and evaluating output quality.
Step 1: Create an Azure AI Foundry Hub and Project
Azure AI Foundry organizes work into hubs (shared resources like compute and connections) and projects (individual workspaces within a hub).
Via the Azure Portal: navigate to Azure AI Foundry → Create a new hub → Create a project within the hub. The hub provides shared compute, connections, and security settings; the project is where you build.
Via the CLI:
```bash
# Create an Azure AI hub
az ml workspace create \
  --name my-ai-hub \
  --resource-group my-rg \
  --kind hub \
  --location eastus

# Create a project within the hub
az ml workspace create \
  --name my-genai-project \
  --resource-group my-rg \
  --kind project \
  --hub-id /subscriptions/SUB_ID/resourceGroups/my-rg/providers/Microsoft.MachineLearningServices/workspaces/my-ai-hub
```

Step 2: Deploy a Model from the Catalog
Browse the model catalog, filter by task (chat, embeddings, image generation), compare benchmarks, and deploy:
```bash
# Deploy GPT-4o as a serverless endpoint
az ml serverless-endpoint create \
  --name gpt4o-prod \
  --resource-group my-rg \
  --workspace-name my-genai-project \
  --model-id azureml://registries/azure-openai/models/gpt-4o/versions/2024-08-06

# Deploy Phi-4 for cost-sensitive workloads
az ml online-endpoint create \
  --name phi4-endpoint \
  --resource-group my-rg \
  --workspace-name my-genai-project
```

Two deployment types:
- Serverless API (pay-per-token) — for OpenAI, Mistral, Cohere, Meta models. No infrastructure management.
- Managed compute (pay-per-hour) — for open-weight models you want to run on dedicated VMs. Full control over scaling.
Step 3: Build a Prompt Flow
Create an orchestration pipeline that retrieves documents and generates answers:
```yaml
# prompt_flow/flow.dag.yaml — defines the pipeline
inputs:
  question:
    type: string

nodes:
  - name: retrieve_docs
    type: python
    source:
      type: code
      path: retrieve.py
    inputs:
      query: ${inputs.question}

  - name: generate_answer
    type: llm
    source:
      type: code
      path: generate.jinja2
    inputs:
      model: gpt-4o
      context: ${retrieve_docs.output}
      question: ${inputs.question}

outputs:
  answer:
    type: string
    reference: ${generate_answer.output}
```

Step 4: Set Up RAG with Azure AI Search
```python
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_KEY"],
    api_version="2024-12-01-preview",
)

response = client.chat.completions.create(
    model="gpt-4o-prod",
    messages=[{"role": "user", "content": "What is our PTO policy?"}],
    extra_body={
        "data_sources": [
            {
                "type": "azure_search",
                "parameters": {
                    "endpoint": "https://YOUR-SEARCH.search.windows.net",
                    "index_name": "company-docs",
                    "authentication": {"type": "api_key", "key": "YOUR-SEARCH-KEY"},
                    "query_type": "vector_semantic_hybrid",
                    "semantic_configuration": "default",
                    "top_n_documents": 5,
                },
            }
        ]
    },
)

# Response includes citations from retrieved documents
print(response.choices[0].message.content)
```

Step 5: Evaluate with Built-in Metrics
Azure AI Foundry includes evaluation metrics for RAG quality:
```python
from azure.ai.evaluation import evaluate, GroundednessEvaluator, RelevanceEvaluator

# model_config: your judge-model configuration (endpoint, deployment), defined elsewhere
result = evaluate(
    data="test_data.jsonl",
    evaluators={
        "groundedness": GroundednessEvaluator(model_config=model_config),
        "relevance": RelevanceEvaluator(model_config=model_config),
    },
)

print(f"Groundedness: {result['groundedness']:.2f}")
print(f"Relevance: {result['relevance']:.2f}")
```

Azure AI Foundry Agent Service
Agent Service is Azure AI Foundry’s managed platform for building autonomous agents — agents that can reason, use tools, and take multi-step actions without human intervention at each step.
When to use Agent Service vs. Prompt Flow:
- Prompt Flow — Deterministic pipelines. You define the exact DAG: step A → step B → step C. Best for RAG, structured workflows, and predictable orchestration.
- Agent Service — Autonomous reasoning. The agent decides which tools to call and in what order. Best for research assistants, customer support bots, and tasks where the sequence of actions depends on intermediate results.
Key capabilities:
- Tool calling — Connect agents to Azure Functions, REST APIs, Azure AI Search, and custom code tools
- Multi-turn conversations — Agents maintain context across turns with managed thread state
- Code Interpreter — Built-in sandboxed Python execution for data analysis, math, and file processing
- File Search — Attach documents to agent threads for on-the-fly RAG without building a separate retrieval pipeline
- Managed state — Agent threads persist across sessions in Azure-managed storage
```python
from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential

# Create an agent with tool access
project = AIProjectClient(
    credential=DefaultAzureCredential(),
    endpoint="https://YOUR-HUB.api.azureml.ms",
)

agent = project.agents.create_agent(
    model="gpt-4o",
    name="research-assistant",
    instructions="You are a research assistant. Use the search tool to find relevant information.",
    tools=[{"type": "code_interpreter"}, {"type": "file_search"}],
)

# Create a thread and run the agent
thread = project.agents.create_thread()
project.agents.create_message(thread_id=thread.id, role="user", content="Analyze Q3 revenue trends")
run = project.agents.create_and_process_run(thread_id=thread.id, agent_id=agent.id)
```

Agent Service vs. building agents yourself: If you’re already using LangGraph or CrewAI for agent orchestration, Agent Service provides the same pattern but Azure-managed — no infrastructure to maintain, built-in content safety, and Entra ID authentication. The trade-off is less flexibility in agent routing and no access to Anthropic models.
Architecture and System View
Azure AI Foundry stacks five tiers from application code down to data — with Entra ID security and Prompt Flow orchestration sitting between your app and the 1,800+ model catalog.
📊 Azure AI Foundry Platform Architecture
[Figure: Azure AI Foundry Architecture. The complete AI development platform on Microsoft Azure]

📊 Azure AI Foundry Development Workflow
[Figure: Azure AI Foundry Development Workflow. From model selection to production deployment]
Private Endpoint Architecture
For production deployments where all traffic must stay within the corporate network:
- Create an Azure Private Endpoint for the Azure AI Foundry hub
- Configure a Private DNS Zone in your VNet
- Create a Private Endpoint for Azure AI Search
- Disable public network access on all resources
- All API traffic resolves to private IP addresses within the VNet — no public internet traversal
This configuration is standard for financial services and healthcare workloads on Azure.
Practical Examples
Model selection, streaming, and function calling are the three patterns engineers reach for most often when building with Azure AI Foundry.
Cost Comparison: Model Selection in Azure AI Foundry
One of Azure AI Foundry’s biggest advantages is model choice. Here’s how costs compare for a document classification task processing 1M documents/month:
| Model | Provider | Input Cost | Output Cost | Monthly Cost (1M docs) | Quality |
|---|---|---|---|---|---|
| GPT-4o | OpenAI | $2.50/M tokens | $10/M tokens | ~$3,750 | Highest |
| GPT-4o-mini | OpenAI | $0.15/M tokens | $0.60/M tokens | ~$225 | Good |
| Phi-4 | Microsoft | ~$0.07/M tokens | ~$0.14/M tokens | ~$63 | Moderate |
| Mistral Large | Mistral | $2/M tokens | $6/M tokens | ~$2,400 | High |
| Llama 3.1 70B | Meta | Compute cost | Compute cost | ~$500 (managed compute) | High |
Using Phi-4 instead of GPT-4o for simple classification saves 98% — from $3,750 to $63/month.
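The savings claim is easy to sanity-check with a rough per-document cost model. The token counts below are assumptions (roughly 1,000 input and 100 output tokens per document); the table’s “~” figures imply slightly different assumptions, but the ~98% savings ratio holds either way:

```python
def monthly_cost(docs: int, in_tokens: int, out_tokens: int,
                 in_price: float, out_price: float) -> float:
    """Monthly cost in dollars; prices are per 1M tokens."""
    return docs * (in_tokens * in_price + out_tokens * out_price) / 1_000_000

# Assumed workload: 1M docs/month, ~1,000 input and ~100 output tokens each
gpt4o = monthly_cost(1_000_000, 1000, 100, in_price=2.50, out_price=10.00)
phi4 = monthly_cost(1_000_000, 1000, 100, in_price=0.07, out_price=0.14)

print(f"GPT-4o: ${gpt4o:,.0f}/month")   # GPT-4o: $3,500/month
print(f"Phi-4:  ${phi4:,.0f}/month")    # Phi-4:  $84/month
print(f"Savings: {1 - phi4 / gpt4o:.0%}")  # Savings: 98%
```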
Streaming Responses
```python
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_KEY"],
    api_version="2024-12-01-preview",
)

def stream_response(user_message: str):
    stream = client.chat.completions.create(
        model="gpt-4o-prod",
        messages=[{"role": "user", "content": user_message}],
        stream=True,
    )
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content
```

Function Calling with Tool Use
```python
import json

tools = [
    {
        "type": "function",
        "function": {
            "name": "search_company_docs",
            "description": "Search internal company documents by topic",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "The search query"},
                    "department": {"type": "string", "enum": ["hr", "finance", "engineering", "legal"]},
                },
                "required": ["query"],
            },
        },
    }
]

messages = [{"role": "user", "content": "What is the engineering team's on-call policy?"}]
response = client.chat.completions.create(model="gpt-4o-prod", messages=messages, tools=tools)

if response.choices[0].finish_reason == "tool_calls":
    tool_call = response.choices[0].message.tool_calls[0]
    args = json.loads(tool_call.function.arguments)
    result = search_company_docs(**args)  # Your implementation
    messages.extend([
        response.choices[0].message,
        {"role": "tool", "tool_call_id": tool_call.id, "content": json.dumps(result)},
    ])
    final = client.chat.completions.create(model="gpt-4o-prod", messages=messages, tools=tools)
    print(final.choices[0].message.content)
```

Azure AI Foundry Pricing and Cost Tiers
Azure AI Foundry pricing depends on how you deploy models. There is no single “Foundry subscription fee” — you pay for the resources you use.
| Deployment Type | Pricing Model | Example Cost | Best For |
|---|---|---|---|
| Serverless API (OpenAI) | Per-token (input + output) | GPT-4o: ~$2.50/1M input, ~$10/1M output | Production inference, variable load |
| Serverless API (3rd party) | Per-token (provider sets price) | Mistral Large: ~$2/1M input, ~$6/1M output | Multi-model testing, Mistral/Cohere |
| Managed Compute | Hourly VM + storage | Standard_NC24ads_A100: ~$3.67/hr | Fine-tuning, open-weight models (Llama, Phi) |
| Azure AI Search | Per search unit (SU) | Basic: ~$75/month (15GB, 5 indexes) | RAG retrieval layer |
| Provisioned Throughput | Reserved capacity (PTU) | Per PTU/month (varies by model) | High-volume, predictable workloads |
Cost optimization strategies:
- Model routing — Route simple queries to Phi-4 (~$0.07/1M input) and complex queries to GPT-4o. This can cut inference costs by 80-95% for workloads where most queries are straightforward.
- Serverless over Managed Compute for variable loads — you only pay when the model is invoked. At low-to-moderate traffic, serverless is almost always cheaper.
- Provisioned Throughput Units (PTUs) for high-volume — if you consistently process 100K+ requests/day, reserved capacity is cheaper per-token than pay-as-you-go.
- Prompt caching — Azure caches common prompt prefixes automatically. System prompts and few-shot examples that repeat across requests are charged at reduced rates.
Estimating your monthly bill:
```text
Customer support bot (10K conversations/day, avg 500 tokens each):
  Model routing: 80% Phi-4, 20% GPT-4o
  Phi-4 cost:  8K × 500 × $0.07/1M = ~$0.28/day
  GPT-4o cost: 2K × 500 × $2.50/1M (in) + $10/1M (out) = ~$12.50/day
  Total inference: ~$13/day → ~$390/month
  Azure AI Search (Basic): $75/month
  Estimated total: ~$465/month
```

Pricing changes frequently. Check the Azure AI Foundry pricing page for current per-token rates.
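The worked estimate can be reproduced in a few lines. Assumptions are copied from the example above, including its accounting quirks: the Phi-4 share is priced on input tokens only, while the GPT-4o share counts the 500 tokens on both the input and output side:

```python
# Mirror the arithmetic of the example above (assumed, not official pricing math)
phi_daily = 8_000 * 500 * 0.07 / 1_000_000            # input-only: ~$0.28/day
gpt_daily = 2_000 * 500 * (2.50 + 10.00) / 1_000_000  # in + out:   ~$12.50/day

monthly = (phi_daily + gpt_daily) * 30 + 75  # + Azure AI Search Basic tier
print(f"~${monthly:,.0f}/month")  # ~$458/month
```

This lands at ~$458 rather than ~$465 because the original rounds the daily inference total up to $13 before multiplying; either way it is an order-of-magnitude estimate, not a quote.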
Trade-offs, Limitations and Failure Modes
No Claude or Anthropic Models
Azure AI Foundry does not include Anthropic’s Claude models. If your evaluation shows Claude produces better results for your use case, you need AWS Bedrock (which has Claude) or the direct Anthropic API. This is the primary model gap compared to Bedrock.
Model Availability Varies by Region
Not all 1,800+ models are available in every Azure region. GPT-4o is broadly available, but newer models and third-party models may be limited to specific regions. Check regional availability before committing to a model — especially for data residency requirements that restrict which regions you can use.
Prompt Flow Learning Curve
Prompt flow is powerful but has a steeper learning curve than raw API calls. Teams that just need a simple chat endpoint may find it over-engineered. Start with direct API calls, migrate to prompt flow when you need multi-step orchestration, evaluation, or versioned deployments.
Content Safety Filter False Positives
The default content safety filter occasionally blocks legitimate content in professional contexts — technical security discussions, medical content, legal analysis. Azure provides a process to request modified filter configurations with documented justification. Allow 2-4 weeks for approval when planning production timelines.
API Version Churn
Azure AI Foundry API versions change frequently. Each version may alter response formats or behavior. Pin to a specific version in production and test new versions in staging before upgrading. The 2024-12-01-preview version is current as of early 2026.
Interview Perspective
These three questions test whether you understand Azure AI Foundry’s platform strategy, enterprise RAG architecture, and trade-offs versus competing managed AI services.
Q1: “What is Azure AI Foundry and how does it differ from Azure OpenAI Service?”
What they’re testing: Do you understand Microsoft’s evolving AI platform strategy?
Strong answer: “Azure AI Foundry is Microsoft’s unified AI development platform — it includes Azure OpenAI Service as one component but goes much further. The model catalog has 1,800+ models beyond OpenAI — Phi, Llama, Mistral, Cohere. It adds prompt flow for orchestration, built-in evaluation tools, and responsible AI features. Think of Azure OpenAI as the engine and Foundry as the whole vehicle. The rebrand from Azure AI Studio happened in late 2024.”
Q2: “How would you build an enterprise RAG system on Azure AI Foundry?”
What they’re testing: Can you design a production system using Azure-native components?
Strong answer: “Azure AI Search for retrieval — it has native SharePoint and Blob Storage connectors, plus hybrid search combining vector, semantic, and keyword. Model deployment through the Foundry catalog — GPT-4o for generation, text-embedding-3-large for embeddings. Authentication via Entra ID managed identities — no API keys. Private endpoints to keep all traffic within the VNet. Content safety filters for responsible AI. Evaluation using Foundry’s built-in groundedness and relevance metrics before production deployment.”
Q3: “When would you choose Azure AI Foundry over AWS Bedrock?”
What they’re testing: Platform selection reasoning.
Strong answer: “Azure AI Foundry for Microsoft-first organizations — the Entra ID integration, SharePoint connectors, and existing Azure compliance infrastructure are major advantages. Bedrock for AWS-first organizations or teams that specifically need Claude (Anthropic), which isn’t available on Azure. Both offer multi-model catalogs, but Azure has more models (1,800+ vs ~30 on Bedrock) and prompt flow for orchestration. The practical decision usually comes down to which cloud your organization already uses.”
Production Perspective
Production deployments on Azure AI Foundry follow four consistent patterns: managed identity authentication, model routing for cost, cross-region failover, and pinned API versions.
Use Managed Identity, Not API Keys
Every Azure workload calling Azure AI Foundry endpoints should authenticate with a managed identity. Managed identities are automatically rotated by Azure, have no secret to store or leak, and integrate with Azure RBAC.
```python
from azure.identity import ManagedIdentityCredential
from openai import AzureOpenAI

credential = ManagedIdentityCredential()

def get_client():
    token = credential.get_token("https://cognitiveservices.azure.com/.default")
    return AzureOpenAI(
        azure_endpoint="https://YOUR-RESOURCE.openai.azure.com/",
        azure_ad_token=token.token,
        api_version="2024-12-01-preview",
    )
```

Model Routing for Cost Optimization
With 1,800+ models available, implement a routing strategy: GPT-4o for complex reasoning, Phi-4 or GPT-4o-mini for simple tasks. A model router can cut costs by 60-90% for workloads with mixed complexity.
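A router does not need to be elaborate; even a length-and-keyword heuristic captures much of the savings. The deployment names, keyword list, and word-count threshold below are illustrative assumptions (production routers often use a small classifier model instead):

```python
# Hypothetical heuristic router; deployment names match the earlier examples.
COMPLEX_HINTS = ("why", "compare", "analyze", "explain", "design")

def pick_deployment(query: str) -> str:
    """Route complex queries to GPT-4o, everything else to Phi-4."""
    q = query.lower()
    if len(q.split()) > 40 or any(hint in q for hint in COMPLEX_HINTS):
        return "gpt4o-prod"     # complex reasoning
    return "phi4-endpoint"      # cheap, simple tasks

print(pick_deployment("Reset my password"))                     # phi4-endpoint
print(pick_deployment("Compare our PTO policy with industry"))  # gpt4o-prod
```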
Cross-Region Failover
Azure AI Foundry resources are regional. For production workloads with availability SLAs, deploy endpoints in two regions and route traffic with Azure API Management. Use circuit breaker patterns to detect and route around regional outages automatically.
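Stripped to its core, the pattern is: try the primary region, trip a circuit after repeated failures, and fall back to the secondary. The failure threshold and the fake endpoints below are assumptions for illustration; in practice this logic usually lives in Azure API Management rather than application code:

```python
from typing import Callable

class RegionalFailover:
    """Call primary until it fails `threshold` times in a row, then use secondary."""

    def __init__(self, primary: Callable[[str], str], secondary: Callable[[str], str],
                 threshold: int = 3):
        self.primary, self.secondary = primary, secondary
        self.threshold = threshold
        self.failures = 0  # consecutive primary failures

    def invoke(self, prompt: str) -> str:
        if self.failures < self.threshold:  # circuit closed: try primary
            try:
                result = self.primary(prompt)
                self.failures = 0           # success resets the counter
                return result
            except Exception:
                self.failures += 1
        return self.secondary(prompt)       # circuit open, or primary just failed

# Fake regional endpoints for illustration only
def east_us(prompt): raise TimeoutError("region down")
def west_us(prompt): return f"[west] {prompt}"

router = RegionalFailover(east_us, west_us)
print(router.invoke("hello"))  # [west] hello  (primary failed, fell back)
```

A fuller implementation would also re-probe the primary after a cooldown (half-open state) so traffic returns once the region recovers.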
Pin API Versions
API versions are specified in every client initialization. They are not backward-compatible — a new version may change response formats. Pin to a specific version in production, test new versions in staging.
For more on choosing between cloud AI platforms — including AWS Bedrock and Google Vertex AI — see our platform comparison guide.
Summary and Key Takeaways
- Azure AI Foundry is the unified platform — model catalog (1,800+ models), prompt flow orchestration, evaluation tools, and responsible AI in one place
- Beyond OpenAI — Phi-4, Llama 3, Mistral, Cohere, and hundreds of open-source models are now available alongside GPT-4o
- Prompt flow replaces custom orchestration — visual and code-based pipeline builder with built-in versioning and deployment
- Entra ID integration — managed identities, RBAC, and private endpoints extend existing Azure security to AI workloads
- Cost optimization through model choice — using Phi-4 instead of GPT-4o for simple tasks can save 98%
- No Claude models — if you need Anthropic’s Claude, use AWS Bedrock or the direct Anthropic API
- The naming keeps changing — Azure AI Studio → Azure AI Foundry (late 2024). APIs are the same, branding differs
Related
- Cloud AI Platforms Compared — Side-by-side comparison of Azure AI Foundry, Bedrock, and Vertex AI
- AWS Bedrock Deep-Dive — The comparable managed platform for AWS-first organizations
- Google Vertex AI Deep-Dive — The comparable managed platform for GCP-first organizations
- Agentic Design Patterns — The ReAct and orchestration patterns used in Foundry agent workflows
- RAG Architecture — How retrieval-augmented generation works under the hood
- GenAI Engineering Tools — The broader tool ecosystem for GenAI engineers
- Azure vs Bedrock — Head-to-head comparison of Azure AI Foundry and AWS Bedrock
Frequently Asked Questions
What is Azure AI Foundry?
Azure AI Foundry (formerly Azure AI Studio) is Microsoft's unified AI development platform. It provides a model catalog with 1,800+ models from OpenAI, Meta, Mistral, Microsoft, and others, plus tools for prompt flow orchestration, fine-tuning, evaluation, and responsible AI. It replaced the separate Azure OpenAI Service portal as the central hub for all AI development on Azure.
Is Azure AI Foundry the same as Azure OpenAI Service?
No. Azure OpenAI Service is one component within Azure AI Foundry — it provides access to OpenAI models (GPT-4o, o1, DALL-E). Azure AI Foundry is the broader platform that includes Azure OpenAI plus 1,800+ additional models (Phi, Mistral, Llama, Cohere), prompt flow for orchestration, evaluation tools, and responsible AI features. Think of Azure OpenAI as the engine and Azure AI Foundry as the entire vehicle.
How does Azure AI Foundry compare to AWS Bedrock?
Both are managed AI platforms with multi-model catalogs. Azure AI Foundry offers 1,800+ models and deep Microsoft ecosystem integration (Entra ID, SharePoint, Teams). AWS Bedrock offers fewer models but includes Claude (Anthropic) which Azure AI Foundry does not. Choose Azure AI Foundry for Microsoft-first organizations, Bedrock for AWS-first organizations or teams that need Claude.
What models are available in Azure AI Foundry?
Azure AI Foundry's model catalog includes OpenAI models (GPT-4o, GPT-4 Turbo, o1, o3-mini), Microsoft models (Phi-4, Phi-3.5), Meta models (Llama 3.1, Llama 3.2), Mistral models (Mistral Large, Mixtral), Cohere models, and many others — over 1,800 models total. Models are deployed as serverless API endpoints or managed compute endpoints depending on the provider.
How do I get started with Azure AI Foundry?
Create an Azure subscription, then navigate to Azure AI Foundry in the Azure Portal. Create a hub (shared resources like compute and connections) and a project within the hub. Deploy a model from the catalog — start with GPT-4o for general use or Phi-4 for cost-sensitive tasks. Use the playground to test prompts before building prompt flow pipelines for production workloads.
What is Prompt Flow in Azure AI Foundry?
Prompt Flow is Azure AI Foundry's visual and code-based orchestration tool for building LLM pipelines. You define your chain as a directed acyclic graph (DAG) — each node is a step (LLM call, Python function, API call, conditional logic). It is similar to LangGraph but Azure-native with built-in versioning, evaluation, and deployment. Use it when you need multi-step orchestration beyond simple API calls.
Can I use open-source models in Azure AI Foundry?
Yes. The model catalog includes 1,800+ open-source and third-party models including Meta's Llama 3.1 and 3.2, Microsoft's Phi-4 and Phi-3.5, Mistral's models, and many community models. Open-source models are deployed as managed compute endpoints (hourly VM billing) rather than serverless endpoints, giving you full control over the infrastructure and the ability to fine-tune.
How much does Azure AI Foundry cost?
There is no single Foundry subscription fee — you pay for the resources you use. Serverless API models (GPT-4o, Mistral) charge per token. Managed compute models (Llama, Phi) charge per hour of VM time. Azure AI Search for RAG starts at roughly $75/month for the Basic tier. Using Phi-4 instead of GPT-4o for simple classification can save 98% — from $3,750 to $63/month for 1M documents.
What are managed endpoints vs serverless endpoints in Azure AI Foundry?
Serverless API endpoints charge per token with no infrastructure management — best for OpenAI, Mistral, and Cohere models with variable load. Managed compute endpoints charge per hour of VM time and give you control over VM size — best for open-weight models (Llama, Phi) that you want to fine-tune or run on dedicated infrastructure. Both use the same Azure AI Model Inference API.
How do I fine-tune models in Azure AI Foundry?
Azure AI Foundry supports fine-tuning for OpenAI models (GPT-4o, GPT-4o-mini) and open-source models (Llama, Phi). Upload your training data in JSONL format, configure hyperparameters (epochs, learning rate), and run the fine-tuning job on managed compute. The fine-tuned model is deployed as a new endpoint. Fine-tuning is best when you need domain-specific behavior that prompt engineering and RAG cannot achieve.
Last updated: March 2026 | Azure AI Foundry (formerly Azure AI Studio / Azure OpenAI Service)