The Problem
Education businesses — coaching institutes, training centers, placement agencies — face a specific kind of sales problem. Leads arrive from many channels: a website chatbot at 2 AM, a WhatsApp message asking about fees, a Facebook Lead Ad form submission. Each prospect asks roughly the same questions. What courses are available? What does it cost? Is there placement support?
The response window is narrow. A prospective student who doesn't hear back within minutes moves on. Most teams we spoke to were managing this with spreadsheets, personal WhatsApp accounts, and manual follow-ups. Leads slipped between channels. There was no shared conversation history, no automated triage, and no visibility into which counsellor was handling what.
We wanted to build a system that solved this end-to-end: capture the lead from any channel, respond immediately with accurate, institution-specific information, collect callback details when needed, and hand off to a human when the conversation required it.
The Build
The backend is a FastAPI application organized into domain-driven modules — leads, chat, WhatsApp, campaigns, knowledge base, automations — each with its own router, service layer, and models. PostgreSQL handles relational data. Qdrant stores vector embeddings for the RAG pipeline. Redis serves triple duty: response caching, circuit breaker state, and Celery message broker. The React frontend is a Vite-built dashboard using Tailwind CSS, served through Nginx as a reverse proxy.
The conversation engine is built as a LangGraph state machine. When a message arrives — from the website chatbot or WhatsApp — it enters a graph with eight nodes: context building, intent classification, greeting, AI-powered Q&A, callback collection, escalation, lead closure, and a wait state for gibberish or empty inputs. GPT-4o-mini classifies the user's intent and routes to the appropriate node. A fast-path optimization bypasses the LLM entirely when the lead is already mid-callback-collection, routing directly to the callback handler based on database state rather than waiting for a classification round-trip.
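The routing step can be sketched in plain Python, without the LangGraph dependency. This is an illustration under assumptions, not the production graph: the intent labels, the node identifiers, the LeadState fields, and the classify callable (a stand-in for the GPT-4o-mini call) are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical intent labels mapped to the destination nodes named above.
INTENT_TO_NODE = {
    "greeting": "greeting",
    "question": "ai_qa",
    "callback": "callback_collection",
    "escalate": "escalation",
    "close": "lead_closure",
    "gibberish": "wait",
}


@dataclass
class LeadState:
    message: str
    # Assumed flag, loaded from the lead's database row before routing.
    collecting_callback: bool = False


def route(state: LeadState, classify: Callable[[str], str]) -> str:
    """Return the name of the next node for this message."""
    # Fast path: a lead that is mid-callback-collection skips the classifier,
    # so no LLM round-trip happens for this message.
    if state.collecting_callback:
        return "callback_collection"
    intent = classify(state.message)
    # Unknown labels fall back to the wait node instead of raising.
    return INTENT_TO_NODE.get(intent, "wait")
```

In a LangGraph build, a function like route would typically be registered as the condition on a conditional edge leaving the classification node.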
The RAG pipeline uses dual-source retrieval. Course-specific queries — fees, duration, eligibility — pull from a structured catalog stored in PostgreSQL. This data is authoritative, fast, and always current. Broader questions (career advice, industry context) hit Qdrant's vector store, populated by documents the institution uploads through the dashboard. A query classifier, built with regex fast-paths and an LLM fallback, decides which source to use. Both results are combined into a context window and passed to GPT-4o-mini for generation. The system also normalizes Hinglish input — common in Indian education markets — before classification.
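A rough sketch of the classifier's shape, assuming a keyword regex as the fast path and an injected callable as the LLM fallback. The pattern, the two labels, and the function names are illustrative assumptions, not the rules used in production.

```python
import re
from typing import Callable

# Hypothetical fast-path pattern for catalog-style questions
# (fees, duration, eligibility). A real deployment would tune this list.
CATALOG_PATTERN = re.compile(
    r"\b(fee|fees|cost|price|duration|eligib\w*)\b", re.IGNORECASE
)


def classify_query(text: str, llm_fallback: Callable[[str], str]) -> str:
    """Return 'catalog' or 'vector' for a user query."""
    # Fast path: an obvious keyword match never reaches the model.
    if CATALOG_PATTERN.search(text):
        return "catalog"
    # Slow path: ask the model, but only trust labels we know about.
    label = llm_fallback(text)
    return label if label in {"catalog", "vector"} else "vector"
```

Defaulting to the vector store on an unexpected label is a design choice in this sketch: a broad retrieval is a safer failure mode than a structured lookup against the wrong course.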
WhatsApp integration runs through Meta's Cloud API. Inbound messages hit a webhook endpoint that verifies HMAC-SHA256 signatures, deduplicates via Redis to prevent double-processing, resolves the workspace from the phone number, and triggers the conversation graph. Outbound messaging supports individual replies and bulk campaigns through pre-approved Meta templates, dispatched asynchronously via Celery workers. Facebook Lead Ads follow a similar webhook pattern: form submissions are signature-verified, mapped to CRM fields through a configurable field-mapping layer, and routed through the assignment rules engine.
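The first two webhook steps can be sketched as follows. Meta sends the signature in the X-Hub-Signature-256 header as "sha256=" followed by a hex digest of the raw request body, keyed with the app secret. The SeenMessages class is an in-memory stand-in we introduce for illustration; with Redis the same check would typically be a single SET with the NX and EX options.

```python
import hashlib
import hmac


def verify_signature(app_secret: str, raw_body: bytes, header_value: str) -> bool:
    """Validate a header of the form 'sha256=<hex digest>' against the raw body."""
    prefix = "sha256="
    if not header_value.startswith(prefix):
        return False
    expected = hmac.new(app_secret.encode(), raw_body, hashlib.sha256).hexdigest()
    received = header_value[len(prefix):]
    # Constant-time comparison; bytes avoid a TypeError on non-ASCII input.
    return hmac.compare_digest(expected.encode(), received.encode("utf-8"))


class SeenMessages:
    """In-memory stand-in for a Redis-backed dedup check (illustrative only)."""

    def __init__(self) -> None:
        self._seen = set()

    def first_time(self, message_id: str) -> bool:
        # Returns True only for the first delivery of a given message id.
        if message_id in self._seen:
            return False
        self._seen.add(message_id)
        return True
```

The signature must be computed over the exact bytes received, before any JSON parsing, because re-serializing the payload can change whitespace and key order.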
Multi-tenancy is enforced through a workspace_id carried in every JWT. A dependency-injected WorkspaceContext scopes every database query and every Qdrant vector retrieval to that workspace, so no state is shared between tenants. The RAG knowledge base is versioned per workspace, which allows zero-downtime updates when an institution refreshes its course material: the active version pointer swaps atomically once the new embeddings are ready.
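The version-pointer idea can be illustrated with a toy in-memory store. This is a sketch under assumptions: the KnowledgeBase class, its method names, and the substring search are stand-ins for the real PostgreSQL pointer and Qdrant retrieval, which the text above does not detail.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeBase:
    # workspace_id -> active version number
    active: dict = field(default_factory=dict)
    # (workspace_id, version) -> list of text chunks
    chunks: dict = field(default_factory=dict)

    def stage(self, workspace_id: str, version: int, new_chunks: list) -> None:
        """Write a new version without exposing it to readers."""
        self.chunks[(workspace_id, version)] = list(new_chunks)

    def activate(self, workspace_id: str, version: int) -> None:
        """Swap the pointer; readers see the old or the new version, never a mix."""
        if (workspace_id, version) not in self.chunks:
            raise ValueError("version has not been staged")
        self.active[workspace_id] = version

    def search(self, workspace_id: str, term: str) -> list:
        """Every lookup is keyed by workspace, so tenants cannot see each other."""
        version = self.active.get(workspace_id)
        candidates = self.chunks.get((workspace_id, version), [])
        return [c for c in candidates if term.lower() in c.lower()]
```

Staging version 2 while version 1 is active leaves search results unchanged until activate is called, which is the property that makes the refresh zero-downtime.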
Key Engineering Decisions
A few decisions shaped the system's reliability in production:
Redis-backed circuit breaker. The breaker state lives in Redis, not application memory. This means it is shared across the API server and Celery workers and survives container restarts. When OpenAI or Meta's API starts failing, the breaker opens after five consecutive errors, blocks requests for 60 seconds (30 for WhatsApp), then allows a single test request through. If it succeeds, traffic resumes. This prevents cascading failures from overwhelming a struggling upstream dependency.
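The breaker's state machine, sketched with a plain dict where the production system uses Redis. The class and field names are assumptions made for illustration. A dict gives no cross-process atomicity, so a Redis version would need atomic operations for the counter and the probe flag.

```python
import time


class CircuitBreaker:
    """Closed -> open after N consecutive failures -> one probe after cooldown."""

    def __init__(self, store, name, threshold=5, cooldown=60.0, clock=time.time):
        self.store = store        # shared mapping; Redis in the setup described above
        self.name = name
        self.threshold = threshold
        self.cooldown = cooldown  # seconds; the text uses 30 for WhatsApp
        self.clock = clock
        self.store.setdefault(
            name, {"failures": 0, "opened_at": None, "probing": False}
        )

    def allow(self) -> bool:
        s = self.store[self.name]
        if s["opened_at"] is None:
            return True           # closed: traffic flows normally
        if self.clock() - s["opened_at"] < self.cooldown:
            return False          # open: still cooling down
        if s["probing"]:
            return False          # half-open: a test request is already in flight
        s["probing"] = True
        return True               # half-open: let exactly one test request through

    def record_success(self) -> None:
        # Any success resets the consecutive-failure count and closes the breaker.
        self.store[self.name] = {"failures": 0, "opened_at": None, "probing": False}

    def record_failure(self) -> None:
        s = self.store[self.name]
        s["failures"] += 1
        # A failed probe, or reaching the threshold, (re)opens the breaker.
        if s["probing"] or s["failures"] >= self.threshold:
            s["opened_at"] = self.clock()
            s["probing"] = False
```

Because the state lives in the store rather than on the instance, two CircuitBreaker objects built over the same store and name observe the same open or closed state, which mirrors the sharing between the API server and the Celery workers.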
Fail-fast on cache. If Redis fails to connect at startup in production, the application crashes intentionally. Docker restarts the container. Without caching, the database would be hammered under concurrent load — we chose a loud failure over silent degradation.
Conversation logs as JSONB, not a messages table. Chat history is stored as a capped JSON array (last 20 messages) directly on the lead record. This avoids a separate table and the joins that come with it. For short, transactional conversations — a student asking about a course, not a months-long support thread — this trade-off holds well. It also simplifies the context window assembly for the LLM: one read, not a join.
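The capping logic is small enough to show in full. This sketch assumes each entry is a dict with role and text keys, which is our guess at the shape rather than the actual schema.

```python
MAX_MESSAGES = 20  # cap stated above


def append_message(log: list, role: str, text: str, limit: int = MAX_MESSAGES) -> list:
    """Return a new log with the message appended, trimmed to the last `limit` entries."""
    updated = log + [{"role": role, "text": text}]
    # Negative slicing keeps only the most recent entries.
    return updated[-limit:]
```

The returned list is what would be serialized back into the JSONB column on the lead record, and it can be handed to the LLM as conversation context without further queries.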
Celery for document processing. PDFs and Word documents uploaded for RAG are parsed, chunked into ~512-token segments, embedded via OpenAI, and upserted into Qdrant, all in a Celery worker outside the FastAPI event loop. The worker runs with task_acks_late and a prefetch multiplier of one: a task is acknowledged only after it finishes, so a job interrupted by a worker crash can be redelivered rather than lost, and each worker process reserves one task at a time instead of holding a queue of large documents in memory. This keeps the API responsive even during large uploads.
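The chunking step can be approximated as below. This is a simplification under a stated assumption: whitespace-separated words stand in for tokens, whereas a real pipeline would count tokens with the embedding model's tokenizer. The function names are ours.

```python
def chunk_tokens(tokens: list, size: int = 512) -> list:
    """Split a token list into consecutive segments of at most `size` tokens."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def chunk_text(text: str, size: int = 512) -> list:
    """Chunk text, treating whitespace-separated words as tokens (an approximation)."""
    return [" ".join(segment) for segment in chunk_tokens(text.split(), size)]
```

Each resulting chunk would then be embedded and upserted as one vector, with the workspace and version attached as payload so retrieval can filter on them.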
Phone-based deduplication. Leads are matched on normalized last-10-digit phone numbers, not names or emails. This handles format variations (+91, 0-prefixed, raw digits) cheaply and prevents duplicate records when the same person inquires through the chatbot and then messages on WhatsApp.
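The normalization described above amounts to stripping non-digits and keeping the last ten. A minimal sketch, with a function name of our choosing:

```python
import re


def normalize_phone(raw: str) -> str:
    """Strip everything except digits, then keep the last ten."""
    digits = re.sub(r"\D", "", raw)
    # Inputs with fewer than ten digits are returned as-is (digits only).
    return digits[-10:]
```

With this, "+91 98765 43210", "09876543210", and "98765-43210" all reduce to the same key, so a chatbot lead and a later WhatsApp message from the same number resolve to one record.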
The Outcome
The platform handles lead capture, AI-driven response, and counsellor handoff as a single flow. Inquiries that previously went unanswered overnight now get an immediate, context-aware reply drawn from the institution's actual course catalog and uploaded documents — not generic filler. Counsellors see a unified timeline of every interaction across website, WhatsApp, and Facebook, with leads automatically distributed through configurable assignment rules.
The entire stack — PostgreSQL, Qdrant, Redis, FastAPI, Celery, and the React frontend — runs on a single VPS behind Nginx, orchestrated with Docker Compose. Five containers, one machine, strict tenant isolation. It is not a complex infrastructure story. It is a focused one.
The system is designed to be the kind of tool we wanted to exist for small-to-mid education businesses: opinionated enough to work out of the box, flexible enough to handle multiple institutions with different course catalogs and conversation styles, and honest enough to escalate to a human when the AI doesn't have the answer.