Build an Agentic RAG App for Under £25 a Month with Azure SQL and OpenAI
The Challenge
Every AI demo I see starts with "just spin up a vector database, add a search index, deploy an embedding service, configure an orchestrator..." and by the time you've provisioned the infrastructure, you've spent more on setup than the prototype is worth.
The result is a gap between teams who can experiment with AI and teams who can't. Large enterprises with dedicated platform teams build RAG pipelines. Everyone else reads blog posts about it.
For startups, small teams, and developers just trying to validate an idea, the barrier isn't knowledge — it's cost and complexity. You shouldn't need five Azure services and a Kubernetes cluster to ask natural language questions over a dataset.
What's Changed
Davide Mauri from the Azure SQL team has built a reference architecture that runs an agentic RAG solution on nearly zero budget. The full stack — backend, frontend, AI inference — costs under £25 a month at 500 queries per day. At lower volumes, it's effectively free.
Here's the architecture:
Azure SQL (free tier) handles everything data-related: storing documents, metadata, vectors, and running hybrid search (vector + full-text with reciprocal rank fusion). The stored procedure connects directly to Azure OpenAI, does the embedding, runs the semantic search, and orchestrates the entire RAG pattern — all inside the database.
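The reference implementation runs this fusion step in T-SQL inside the stored procedure. As a language-neutral sketch of the reciprocal rank fusion (RRF) scoring it applies, here it is in Python (the function name and the k=60 constant are illustrative, though k=60 is the value commonly used in RRF literature):

```python
def rrf_fuse(vector_ranking, keyword_ranking, k=60):
    """Merge two ranked lists of document ids with reciprocal rank fusion.

    Each document's score is the sum of 1 / (k + rank) over every ranking
    it appears in, so a document ranked well by BOTH searches beats one
    that tops only a single list.
    """
    scores = {}
    for ranking in (vector_ranking, keyword_ranking):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest combined score first
    return sorted(scores, key=scores.get, reverse=True)

# "b" appears near the top of both rankings, so it wins overall
fused = rrf_fuse(["a", "b", "c"], ["b", "c", "d"])
# → ["b", "c", "a", "d"]
```

The appeal of RRF is that it needs no score normalisation: vector distances and full-text relevance scores live on different scales, but ranks are always comparable.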
Data API Builder on Azure Container Apps (free tier) exposes the stored procedures as REST or GraphQL endpoints automatically. No custom backend code, no drivers, no middleware. Authentication, authorisation, pagination, and caching are handled out of the box.
Azure Static Web Apps (free tier) hosts a React frontend that calls the API. It integrates with Entra ID for auth and deploys from GitHub on every push.
Azure OpenAI is the only paid component. At roughly 5,000 tokens per end-to-end query using GPT-4o, 500 queries per day keeps the monthly bill under the £25 target.
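The arithmetic behind that claim is worth making explicit. The token volume follows directly from the figures above; the per-token price below is an assumption for illustration only, since real Azure OpenAI pricing varies by model, region, and input/output split — always check the current price sheet:

```python
# Back-of-envelope token volume from the figures in the article
tokens_per_query = 5_000
queries_per_day = 500
days_per_month = 30

tokens_per_month = tokens_per_query * queries_per_day * days_per_month
# 75,000,000 tokens per month

# ASSUMPTION: an illustrative blended price in GBP per million tokens,
# not a quoted Azure rate -- substitute the live pricing for your region.
assumed_price_per_million = 0.25

estimated_monthly_cost = tokens_per_month / 1_000_000 * assumed_price_per_million
# ≈ £18.75 at the assumed rate, comfortably under the £25 budget
```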
Why This Matters
The clever bit is using Azure SQL as the orchestration layer. Instead of chaining together LangChain, a vector database, an embedding service, and an LLM API, the stored procedure does all of it: vectorise the query, determine whether a SQL query or semantic search is needed, run hybrid search with RRF reranking, send context to GPT-4o, and return structured results. One stored procedure. One database call from the frontend.
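The "determine whether a SQL query or semantic search is needed" step is the routing decision at the heart of the agentic behaviour. The source doesn't detail how the stored procedure makes this call, so the toy heuristic below is purely illustrative of the two-way routing idea — a real implementation would likely ask the LLM itself to classify the question:

```python
def route_query(question: str) -> str:
    """Toy stand-in for the query-classification step.

    ILLUSTRATIVE ONLY: the reference stored procedure's actual
    classification mechanism is not shown here. This keyword heuristic
    just demonstrates the shape of the routing decision.
    """
    structured_markers = ("count", "how many", "list all", "built by")
    if any(marker in question.lower() for marker in structured_markers):
        return "sql"       # filterable/aggregable -> plain SQL query
    return "semantic"      # open-ended -> hybrid vector + full-text search

route_query("how many samples use Python")   # → "sql"
route_query("explain hybrid search to me")   # → "semantic"
```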
This isn't a toy. The reference implementation powers the actual Azure SQL AI samples website, with 60+ code samples searchable via natural language. It handles queries like "find samples using Semantic Kernel built by David" — understanding intent, running the appropriate search strategy, and explaining why each result is relevant.
Getting Started
The full source code is available at aka.ms/budgetbytes. Here's the practical setup:
- Provision Azure SQL free tier. You get 32GB storage with full vector search, JSON support, and stored procedure capabilities at no cost. This is your data layer and your AI orchestration layer in one.
- Deploy Data API Builder on Azure Container Apps. The free tier gives you 180,000 vCPU-seconds and 2 million requests per month. Data API Builder is stateless and container-native — it reads your database schema and exposes endpoints automatically.
- Create a Static Web App. Point it at your GitHub repo. The free tier covers custom domains and Entra ID integration. The React frontend calls the REST endpoints exposed by Data API Builder.
- Connect Azure OpenAI. You'll need an embedding model and a completion model (GPT-4o works well). The stored procedure handles the API calls directly from T-SQL.
- Push and go. Both Static Web Apps and Container Apps integrate with GitHub Actions. Every git push deploys your updates.
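From the frontend's point of view, the whole pipeline collapses into one HTTP call to the Data API Builder endpoint. The sketch below builds that request in Python; the entity name `FindSamples` and the parameter name `question` are assumptions for illustration — use whatever entity name your own dab-config defines:

```python
import json

def build_dab_request(base_url: str, entity: str, params: dict):
    """Build the URL and JSON body for executing a stored procedure
    exposed as a Data API Builder entity (POST, parameters in the body).

    `entity` and the keys of `params` must match your own Data API
    Builder configuration; the values used below are hypothetical.
    """
    url = f"{base_url.rstrip('/')}/api/{entity}"
    body = json.dumps(params)
    return url, body

url, body = build_dab_request(
    "https://my-app.example.net",        # hypothetical Container Apps host
    "FindSamples",                       # hypothetical DAB entity name
    {"question": "find samples using Semantic Kernel"},
)
# POST `body` to `url` (with your auth header); DAB returns JSON rows
```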
One practical tip: start by adapting the stored procedure to your own dataset. The AI orchestration logic — query classification, hybrid search, LLM prompting — is reusable. Swap in your own documents and metadata, adjust the structured output schema, and you've got a working agentic RAG app.
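"Adjust the structured output schema" means deciding what shape of JSON the completion model must return for each answer. A minimal illustrative shape, expressed as a JSON Schema in Python — the field names here are assumptions, not the reference app's actual schema:

```python
import json

# ILLUSTRATIVE schema for the LLM's structured output: a synthesised
# answer plus per-result relevance explanations. Field names are
# assumptions -- adapt them to your own documents and metadata.
result_schema = {
    "type": "object",
    "properties": {
        "answer": {"type": "string"},
        "results": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "sample_id": {"type": "integer"},
                    "title": {"type": "string"},
                    "relevance": {"type": "string"},  # why this result matched
                },
                "required": ["sample_id", "title", "relevance"],
            },
        },
    },
    "required": ["answer", "results"],
}

schema_json = json.dumps(result_schema)  # pass along with the prompt
```

Constraining the model to a schema like this is what lets the frontend render results and their explanations without any brittle text parsing.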
What This Means
The "agentic" part of this architecture is worth calling out. The stored procedure doesn't just do retrieval. It reasons about the query, decides the search strategy, and synthesises an answer with explanations. That's agent behaviour, running inside a database engine, with no external orchestration framework.
For teams evaluating RAG, this is a strong starting point because it eliminates the infrastructure decisions that usually slow things down. You don't need to choose between Pinecone and Qdrant. You don't need to configure LangChain chains. The database handles it.
And the cost story is compelling for proof-of-concept work. If your prototype validates, the same architecture scales — Azure SQL moves to Hyperscale, Container Apps scales horizontally, and OpenAI handles millions of requests. No architectural rework needed.
The honest caveat: running AI orchestration inside stored procedures isn't how most teams think about application architecture. It trades the flexibility of a Python orchestration layer for simplicity and cost efficiency. For complex multi-agent workflows, you'll outgrow this pattern. But for single-purpose RAG over structured data, it's hard to beat on the cost-to-value ratio.
Leon Godwin, Principal Cloud Evangelist at Cloud Direct