
The chatbot worked perfectly — for exactly one tenant.

I was 14 hours into a hackathon, running on caffeine and the naive assumption that building a multi-tenant AI chatbot platform was mostly an LLM problem. I had a single chatbot responding to user messages with Gemini, streaming tokens beautifully, calling tools on command. Then I duplicated the config for a second tenant, and everything fell apart.

The second chatbot responded with the first chatbot's personality. Tool calls routed to the wrong org's backend. The system prompt was hardcoded in one place and I hadn't thought about what "multi-tenant" actually means when your entire application state lives inside a prompt string.

That hackathon forced me to solve the hardest problem in building AI chatbot platforms: managing multiple contexts — deciding what gets injected into the LLM per chatbot, per request, with different persona and tool configurations for each tenant. I ended up winning the hackathon with this solution. Here's how it works.


The Real Problem: Context Management

The LLM itself is the easy part. Gemini handles the language. The hard problem is everything around it — specifically, how do you serve dozens of different chatbot personas from a single codebase, where each chatbot has its own system prompt, its own set of enabled tools, its own personality, and its own domain knowledge?

In the early days of the Vercel AI SDK, there weren't many frameworks or patterns for this. Most tutorials showed you how to build a single chatbot. Nobody was writing about what happens when you need fifty of them running on the same infrastructure with completely different behaviors.

I spent the first few hours of the hackathon trying to figure out how to handle different persona and tool configurations for what was essentially an AI agent. The SDK was brand new, documentation was thin in places, and I was improvising.


The Architecture

The solution I landed on: dynamic slug routing. Each org gets a unique slug ID. When a request hits the API, the slug determines everything — which system prompt to build, which tools to enable, which persona to use.


Each chatbot is not a separate model or deployment; instead, a dynamic system prompt is constructed from the org's stored config on every request. The slug in the URL is the key to everything.


Dynamic Slug Routing and Context Injection

When a user opens a chatbot, the slug ID in the URL determines which org config to fetch. That config contains everything: the bot's name, personality, domain knowledge, enabled tools, and behavioral rules.
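The post never shows the shape of that config, so here's a minimal sketch of what an OrgConfig might look like, inferred from the fields the prompt and tool builders reference below. Field names beyond those are my own illustration, not the original schema.

```typescript
// Hypothetical per-org config document, inferred from the fields used
// by buildSystemPrompt and buildToolSet. Illustrative, not the real schema.
interface ToolConfig {
  name: string;        // e.g. 'book_ticket'
  description: string; // surfaced to the model in the system prompt
}

interface OrgConfig {
  _id: string;
  slug: string;         // unique URL identifier, e.g. 'acme-events'
  botName: string;      // persona name the bot stays in character as
  entityName: string;   // the org the bot represents
  description: string;  // domain knowledge injected into the prompt
  persona: string;      // tone and personality instructions
  contactEmail: string; // fallback for queries the bot can't handle
  enabledTools: ToolConfig[];
}

// A sample tenant document as it might be stored in MongoDB:
const acme: OrgConfig = {
  _id: 'org_123',
  slug: 'acme-events',
  botName: 'Max',
  entityName: 'Acme Events',
  description: 'Acme runs tech conferences across Europe.',
  persona: 'Friendly, concise, slightly informal.',
  contactEmail: 'support@acme.events',
  enabledTools: [
    { name: 'book_ticket', description: 'Book a ticket for an event' },
  ],
};
```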

import { streamText } from 'ai';
import { google } from '@ai-sdk/google';

export async function POST(
  req: Request,
  { params }: { params: { slug: string } }
) {
  const orgConfig = await getOrgConfig(params.slug);
  if (!orgConfig) return new Response('Not found', { status: 404 });

  const { messages } = await req.json();

  const result = streamText({
    model: google('gemini-1.5-pro'),
    system: buildSystemPrompt(orgConfig),
    messages,
    tools: buildToolSet(orgConfig),
    maxSteps: 3,
  });

  return result.toDataStreamResponse();
}

The buildToolSet function is where the per-tenant tool configuration happens. Each org picks which tools they want enabled — an event company gets book_ticket, a restaurant gets reserve_table, a university gets check_admission_status. Same platform, completely different capabilities per slug.

export function buildSystemPrompt(org: OrgConfig): string {
  const toolList = org.enabledTools
    .map((t) => `- ${t.name}: ${t.description}`)
    .join('\n');

  return `You are ${org.botName}, a helpful assistant for ${org.entityName}.

## About ${org.entityName}
${org.description}

## Personality
${org.persona}

## Available Tools
${toolList}

## Rules
- Never reveal you are powered by an AI model
- Stay in character as ${org.botName} at all times
- Only use tools when intent is explicit and confirmed
- If you can't help: direct users to ${org.contactEmail}`.trim();
}

The Caching Problem

The first version fetched the org config from MongoDB on every single request. During the hackathon demo, with judges hitting the chatbot repeatedly, I noticed response latency creeping up. The system prompt build itself is cheap, but the database round-trip on every message adds up — especially when the same org's config hasn't changed in hours.

The fix: a simple in-memory cache with a TTL.

const configCache = new Map<string, { data: OrgConfig; expiry: number }>();
const CACHE_TTL = 5 * 60 * 1000; // 5 minutes

export async function getOrgConfig(slug: string): Promise<OrgConfig | null> {
  const cached = configCache.get(slug);
  if (cached && cached.expiry > Date.now()) {
    return cached.data;
  }

  const config = await db.collection('orgs').findOne({ slug });
  if (!config) return null;

  configCache.set(slug, {
    data: config as OrgConfig,
    expiry: Date.now() + CACHE_TTL,
  });

  return config as OrgConfig;
}

Info: This was a hackathon, so in-memory caching was fine. In production, you'd use Redis or a similar distributed cache — especially if you're running multiple serverless instances that don't share memory.

The 5-minute TTL means if an org updates their config, it takes at most 5 minutes to propagate. For a hackathon demo, that's instant enough. The latency improvement was immediately noticeable — subsequent messages in the same conversation felt snappier because the config was already cached.
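If waiting out the TTL ever matters, one small addition avoids it: evict the cache entry whenever an org saves its config, so the next request re-reads fresh data. A sketch using the same Map-based cache; the invalidation hook and its call site are my assumption, not code from the original project.

```typescript
// TTL cache with explicit invalidation on write. The shape mirrors the
// cache above; invalidateOnUpdate is a hypothetical hook for the admin
// save path — the point is the `delete` call.
type OrgConfig = { slug: string; botName: string };

const configCache = new Map<string, { data: OrgConfig; expiry: number }>();
const CACHE_TTL = 5 * 60 * 1000; // 5 minutes

function cacheSet(slug: string, data: OrgConfig) {
  configCache.set(slug, { data, expiry: Date.now() + CACHE_TTL });
}

function cacheGet(slug: string): OrgConfig | null {
  const hit = configCache.get(slug);
  return hit && hit.expiry > Date.now() ? hit.data : null;
}

// Called after persisting the updated config to the database:
// the next chat request misses the cache and fetches the new config.
function invalidateOnUpdate(slug: string) {
  configCache.delete(slug);
}

cacheSet('acme-events', { slug: 'acme-events', botName: 'Max' });
invalidateOnUpdate('acme-events');
// cacheGet('acme-events') now misses until the next DB fetch repopulates it.
```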


Tool Orchestration

Each tool is Zod-typed and returns structured objects. Never throw from a tool — if it throws, the stream crashes.

import { tool } from 'ai';
import { z } from 'zod';

export function buildToolSet(org: OrgConfig) {
  const toolMap: Record<string, any> = {
    book_ticket: tool({
      description: 'Book a ticket for an event',
      parameters: z.object({
        eventId: z.string(),
        quantity: z.number().min(1).max(10),
        attendeeName: z.string(),
        attendeeEmail: z.string().email(),
      }),
      execute: async (params) => {
        try {
          return await bookTicket({ orgId: org._id, ...params });
        } catch {
          return { success: false, error: 'Booking failed. Try again.' };
        }
      },
    }),
    // ... other tools
  };

  // Only return tools this org has enabled
  return Object.fromEntries(
    org.enabledTools
      .filter((t) => toolMap[t.name])
      .map((t) => [t.name, toolMap[t.name]])
  );
}

The maxSteps: 3 parameter is critical and poorly documented. Without it, Gemini calls a tool and stops — the response is just the raw tool invocation with no synthesis. With it, the chain completes: call tool, read result, respond to the user in natural language. I lost an hour to this during the hackathon before finding it buried in the Vercel AI SDK source.
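To make that concrete, here's a toy simulation of the step loop — a conceptual sketch, not the SDK's internals. The stub model emits a tool call on its first step and only a later step turns the result into prose, so a step budget of 1 ends on the raw invocation while 3 reaches the natural-language answer.

```typescript
// Toy model of multi-step tool execution, illustrating why maxSteps matters.
// Conceptual sketch only — not the Vercel AI SDK's actual implementation.
type Step = { type: 'tool-call' | 'text'; content: string };

// Stub "model": calls a tool first, then synthesizes a reply from its result.
function stubModel(history: Step[]): Step {
  if (history.length === 0) {
    return { type: 'tool-call', content: 'book_ticket({ quantity: 2 })' };
  }
  return { type: 'text', content: 'Done! Two tickets are booked.' };
}

function run(maxSteps: number): Step[] {
  const history: Step[] = [];
  for (let i = 0; i < maxSteps; i++) {
    const step = stubModel(history);
    history.push(step);
    if (step.type === 'text') break; // a plain-text answer ends the chain
  }
  return history;
}

// run(1) ends on the raw tool invocation — what users saw without maxSteps.
// run(3) continues: tool call, then a natural-language response.
```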


What I Learned

This was a dev-only hackathon project — it never saw real users. But building it taught me things I wouldn't have learned from a tutorial.

The multi-context problem is the real problem. Everyone focuses on prompt engineering for a single chatbot. The moment you have two tenants with different tools and personas, you're solving a routing and configuration problem, not an AI problem. The slug-based dynamic context injection pattern is simple, but I didn't find it written up anywhere at the time.

Cache aggressively, invalidate simply. The org config doesn't change often. Fetching it from the database on every message is wasteful. A TTL-based cache solved the latency issue with minimal complexity. Start with the simplest caching strategy that works and complicate it only when you have evidence you need to.

The Vercel AI SDK was the right bet. When I built this, the SDK had just launched and the ecosystem was thin. But the abstractions — streamText, typed tools, maxSteps for multi-step execution — let me build the entire tool orchestration layer in hours instead of days. The SDK's model-agnostic design meant swapping between providers during development was a one-line change.

Hackathons force architectural decisions. With unlimited time, I would have over-engineered this — probably built some elaborate plugin system for tools, maybe a custom DSL for system prompts. The time constraint forced me to pick the simplest pattern that could support multiple tenants: a slug, a database lookup, and a prompt builder function. That simplicity is why it worked.

If I were building this for production today, I'd add Redis caching, per-org rate limiting, rolling conversation summaries for long chats, and proper usage tracking with a background queue. But the core pattern — dynamic slug routing with per-tenant context injection — wouldn't change. That's the part I got right on the first try.
