Intercom Fin AI

B2B SaaS · Technical PRD · RAG Architecture · AI Evals

Intercom Fin AI: RAG-Powered Resolution Engine

A 17-page technical PRD for a production-ready multi-tenant RAG system — dual-model routing, 9-point failure analysis, permission-aware retrieval, token cost modeling, and a complete system prompt specification.

Type: Technical PRD

Domain: B2B SaaS / AI Support

Target: Series B–D SaaS

Informed by: Real consulting work


The Problem

B2B SaaS support teams are drowning in fragmented knowledge. Agents switch between Intercom Articles, Notion, Confluence, Slack, admin tools, and macros to answer a single ticket, copying and pasting along the way and hoping they haven't leaked an internal link to a customer.

The Before State

Context-switching across 5+ tools per ticket. Inconsistent answers. Internal links accidentally shared with customers. New agents take months to onboard because knowledge is tribal. The result: slow responses, inconsistent quality, and growing "knowledge debt."

Why RAG — Not Fine-Tuning or Pure Prompting

Approach	Problem
Pure Prompting	Can't fit hundreds of articles in-context. "Lost in the middle" issues degrade answer quality on long prompts.
Fine-Tuning	Bakes knowledge into weights — makes per-tenant isolation, GDPR "right to be forgotten," and rapid policy updates impractical.
RAG	Separates reasoning (LLM) from knowledge (indexed docs + live APIs). Supports instant re-indexing, citation tracking, and permission-aware retrieval per tenant.

Technical Architecture — 7 Layers

Layer	What It Does
1. Data Ingestion	Connectors for Intercom Articles, PDFs, Notion/Confluence, CRM/admin DB, past conversations. Normalize, parse, optional OCR, generate semantic metadata per chunk.
2. Chunking & Embedding	Semantic, header-aware recursive chunking (400–800 tokens per chunk). Metadata attached to each chunk: tenant_id, visibility, roles, plan, source_type. Stored in a multi-tenant vector DB with namespace filtering.
3. Query Processing	Safety + intent classification → query refinement → pre-filtered vector query (tenant + visibility + role/plan) → retrieve top 40 → rerank to top 5–10.
4. Dynamic Context	If query needs live data (billing, feature flags), call registered tools: billing API, CRM, admin DB. Inject structured outputs alongside retrieved chunks.
5. LLM Generation	Needle (GPT-4o mini / Claude Haiku) for standard queries. Sword (GPT-4o / Claude Sonnet) triggered on low confidence, multi-doc synthesis, VIP users.
6. Response Post-processing	Validate: hallucination heuristics, policy keywords, missing citations. Render citation UI for end-users; source snippets + macro suggestions for agents.
7. Observability	Structured logs per step: query, chunks, tools used, model version, latency, cost, confidence. LangSmith/Arize integration for pipeline traces.
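
Layer 2 can be illustrated with a minimal sketch of header-aware chunking. It is a simplification under stated assumptions: headings are Markdown-style lines starting with "#" and separated by blank lines, tokens are approximated by whitespace-separated words (a real pipeline would use the embedding model's tokenizer), and oversized single paragraphs are kept whole rather than split recursively. The names Chunk and chunk_document are hypothetical, not part of the PRD.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    heading: str
    metadata: dict = field(default_factory=dict)


def chunk_document(doc: str, metadata: dict, max_tokens: int = 800) -> list[Chunk]:
    """Split on headings first, then pack paragraphs up to max_tokens words."""
    chunks: list[Chunk] = []
    heading, buffer, size = "", [], 0

    def flush() -> None:
        # Emit the paragraphs collected so far as one chunk under the current heading.
        nonlocal buffer, size
        if buffer:
            chunks.append(Chunk("\n\n".join(buffer), heading, dict(metadata)))
        buffer, size = [], 0

    for block in doc.split("\n\n"):
        block = block.strip()
        if not block:
            continue
        if block.startswith("#"):
            # A new heading always closes the previous chunk.
            flush()
            heading = block.lstrip("#").strip()
            continue
        n = len(block.split())  # word count as a rough token estimate
        if buffer and size + n > max_tokens:
            flush()
        buffer.append(block)
        size += n
    flush()
    return chunks


doc = (
    "# Reset password\n\nStep one two three.\n\n"
    "Warning: do not share links.\n\n# Billing\n\nInvoices are monthly."
)
chunks = chunk_document(doc, {"tenant_id": "t1", "visibility": "public"})
```

Because paragraphs under one heading are packed together until the size limit is reached, a short "Warning" paragraph stays in the same chunk as the procedure above it whenever both fit within the limit.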

Dual-Model Strategy: Needle vs Sword

🪡 Needle (Default)

GPT-4o mini · Claude Haiku · Gemini Flash

Intent detection, FAQ answers, short responses. ~1–3K input + 400–800 output tokens. Fast, cheap.

85–90% of all queries resolved here.

⚔️ Sword (On-Demand)

GPT-4o · Claude Sonnet · Gemini Pro

Triggered by: confidence <0.7, multi-doc synthesis, complex policy questions, VIP users. ~3–5K input tokens.

10–15% of queries escalate here.
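
The escalation triggers listed above can be sketched as a small routing function. This is an illustration only: QueryContext, pick_model, and the multi_doc_min cutoff of three source documents are assumptions, and only the 0.7 confidence threshold comes from the PRD.

```python
from dataclasses import dataclass


@dataclass
class QueryContext:
    confidence: float     # router/retrieval confidence in [0, 1]
    source_docs: int      # distinct documents among the reranked chunks
    complex_policy: bool  # set by the intent classifier (hypothetical flag)
    vip_user: bool


def pick_model(ctx: QueryContext, threshold: float = 0.7, multi_doc_min: int = 3) -> str:
    """Return 'sword' when any escalation trigger fires, else 'needle'."""
    if ctx.confidence < threshold:
        return "sword"
    if ctx.source_docs >= multi_doc_min:  # treated here as multi-doc synthesis
        return "sword"
    if ctx.complex_policy or ctx.vip_user:
        return "sword"
    return "needle"


choice = pick_model(QueryContext(confidence=0.65, source_docs=1,
                                 complex_policy=False, vip_user=False))
```

Keeping the triggers in one pure function makes the Needle/Sword split easy to replay against logged queries when tuning the threshold.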

Token Economics

Target COGS: ≤$0.15 per resolution against $0.99 billed per resolution, which is roughly an 85% gross margin (84.8% at the $0.15 ceiling; a strict ≥85% requires COGS ≤ $0.1485). With semantic caching, prompt caching, and small-to-large routing, modeled Year-1 costs on 50K MAUs / 75K queries/month stay well below revenue.
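
The margin arithmetic can be checked with a few lines of Python. The $0.99 price, $0.15 COGS ceiling, token ranges, and 85/15 routing split come from the PRD; the per-million-token prices and the 800 output tokens assumed for Sword are hypothetical placeholders, not vendor pricing. LLM calls are also only one part of COGS (embeddings, vector DB, reranking, and tool calls are not modeled here).

```python
def gross_margin(price: float, cogs: float) -> float:
    """Gross margin as a fraction of price."""
    return (price - cogs) / price


def max_cogs(price: float, target_margin: float) -> float:
    """Highest COGS that still meets the target margin."""
    return price * (1 - target_margin)


def call_cost(input_tokens: int, output_tokens: int,
              in_price_per_m: float, out_price_per_m: float) -> float:
    """Cost of one LLM call given per-million-token prices."""
    return (input_tokens * in_price_per_m + output_tokens * out_price_per_m) / 1_000_000


def blended_cost(needle_share: float, needle_cost: float, sword_cost: float) -> float:
    """Expected LLM cost per query under small-to-large routing."""
    return needle_share * needle_cost + (1 - needle_share) * sword_cost


margin_at_ceiling = gross_margin(0.99, 0.15)   # about 0.848
cogs_for_85 = max_cogs(0.99, 0.85)             # about 0.1485

# Hypothetical prices (USD per million tokens), upper end of the token ranges.
needle = call_cost(3_000, 800, in_price_per_m=0.20, out_price_per_m=0.80)
sword = call_cost(5_000, 800, in_price_per_m=3.00, out_price_per_m=12.00)
llm_cost_per_query = blended_cost(0.85, needle, sword)
```

Under these placeholder prices the blended LLM cost per query sits far below the $0.15 ceiling, which is the kind of headroom the routing strategy is meant to protect.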

9-Point RAG Failure Analysis

For each failure point: what breaks, how to detect it, what the PM specifies to fix it.

#	Failure Point	Detection Signal	PM Fix
1	Data Quality / OCR	Unusual embedding norms; "bad answer" feedback clustered by source	Source quality score exposed to AI Ops Manager; mark certain PDFs "human-reviewed only"
2	Chunking Strategy	Golden eval failures on multi-step questions; LLM-Judge scoring low completeness	Tune chunk sizes; "Warning/Note" must stay in same chunk as the procedure it qualifies
3	Embedding Quality	Low recall on golden questions where correct chunk is known	Upgrade embedding model; add synonym/alias dictionaries to metadata
4	Search/Retrieval	Ground-truth chunk not in retrieved set on eval queries	Per-source priority (Articles > PDFs > tickets); filter knobs in AI Ops UI
5	Re-ranking Failures	Offline eval on reranker; A/B test reranking models	Fine-tune reranker on domain-specific pairs; add source authority + recency features
6	Prompt / Augmentation	Hallucination feedback; "unfaithful but plausible" golden eval answers	Hard constraint: "Use only provided snippets; if insufficient, say you cannot answer and escalate"
7	Model Quality	Golden set comparison across models; hallucination rate per model	Switch/upgrade models; adjust Needle→Sword routing thresholds
8	Data Drift	Time-based mismatch: doc last-updated vs. retrieval hit rates; spike in agent edits on specific topics	Near-real-time re-index on article publish webhooks; doc gating before feature rollouts
9	User Behavior	Router flags low clarity / multi-intent; safety guardrails detect prompt injection	Ask clarifying questions on low-confidence intent; split multi-intent queries; reject unsafe with friendly refusal
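
Several of the detection signals above rely on a golden dataset where the correct chunk is known. A minimal sketch of how a failed golden question might be attributed to a pipeline stage follows; attribute_failure and the record layout are hypothetical, and the sketch cannot tell an embedding problem (#3) from a search problem (#4), since both show up as the gold chunk missing from the retrieved set.

```python
from collections import Counter


def attribute_failure(gold_chunk_id: str, retrieved_ids: list[str],
                      reranked_ids: list[str], answer_correct: bool) -> str:
    """Name the earliest stage at which a golden question went wrong."""
    if gold_chunk_id not in retrieved_ids:
        return "retrieval"    # failure points 3-4: chunk never retrieved
    if gold_chunk_id not in reranked_ids:
        return "reranking"    # failure point 5: retrieved but dropped by the reranker
    if not answer_correct:
        return "generation"   # failure points 6-7: chunk was in context, answer still wrong
    return "ok"


# Hypothetical eval records: (gold id, retrieved ids, reranked ids, judged correct)
records = [
    ("c1", ["c1", "c7", "c9"], ["c1", "c7"], True),
    ("c2", ["c5", "c6", "c8"], ["c5", "c6"], False),
    ("c3", ["c3", "c4", "c5"], ["c4", "c5"], False),
    ("c4", ["c4", "c2", "c1"], ["c4", "c2"], False),
]
summary = Counter(attribute_failure(*r) for r in records)
```

Grouping golden-set failures this way indicates which row of the table to act on first.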

Permission Architecture

Every chunk carries metadata that enforces access control at retrieval time, not generation time: once restricted text is in the prompt, it is too late to reliably keep it out of the answer.

Chunk Metadata Schema

tenant_id · source_type · visibility (public / customer_internal / agent_internal) · required_role(s) · required_plan(s) · locale · version

Every retrieval query includes mandatory pre-filters: WHERE tenant_id = <tenant> AND visibility IN allowed_visibilities AND required_role IN user_roles AND required_plan IN user_plans
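
The pre-filter can be sketched as an in-memory predicate over chunk metadata. In production this logic would be pushed into the vector DB query rather than applied after retrieval; the sketch only illustrates the rule. Requester and is_retrievable are hypothetical names, the role check follows the required_role(s) field in the metadata schema, and treating an empty required list as "no restriction" is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requester:
    tenant_id: str
    allowed_visibilities: frozenset
    roles: frozenset
    plans: frozenset


def is_retrievable(meta: dict, who: Requester) -> bool:
    """Fail closed: missing tenant or visibility metadata never matches."""
    if meta.get("tenant_id") != who.tenant_id:
        return False
    if meta.get("visibility") not in who.allowed_visibilities:
        return False
    required_roles = set(meta.get("required_roles", ()))
    if required_roles and not (required_roles & who.roles):
        return False
    required_plans = set(meta.get("required_plans", ()))
    if required_plans and not (required_plans & who.plans):
        return False
    return True


customer = Requester("acme", frozenset({"public", "customer_internal"}),
                     frozenset(), frozenset({"pro"}))
chunks = [
    {"tenant_id": "acme", "visibility": "public"},
    {"tenant_id": "acme", "visibility": "agent_internal"},
    {"tenant_id": "acme", "visibility": "public", "required_plans": ["enterprise"]},
    {"tenant_id": "globex", "visibility": "public"},
]
visible = [c for c in chunks if is_retrievable(c, customer)]
```

Only the first chunk survives for this requester: the internal SOP, the enterprise-only article, and the other tenant's article are all excluded before anything reaches the model.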

Edge Cases Handled

Visibility change (Internal → Public): Watcher re-indexes permissions within minutes of change
Agent offboarding: SSO (Okta/SAML) revocation immediately removes access — no delay
Plan downgrade: Retrieval narrows automatically; Fin suggests upgrade flow instead of leaking enterprise content

Success Metrics

≥50% Automated Resolution Rate target

≥90% CSAT for AI-handled conversations

≤$0.15 Target COGS per resolution

Metric	Type	Definition
Automated Resolution Rate	Business	% conversations resolved without human intervention
CSAT (AI-handled)	Business	Satisfaction score for Fin-resolved tickets — target ≥90%
Precision@K	RAG	Share of golden-dataset queries whose correct chunk appears in the top K retrieved results (strictly a hit rate, i.e. recall@K for a single gold chunk)
Faithfulness Score	RAG	LLM-as-judge 1–5 scale vs. ground-truth answers
Hallucination Rate	RAG	% answers introducing unsupported facts or violating citation rules
Gross Margin	Business	COGS per resolution ≤$0.15 against $0.99 billing, an approximately 85% GM target (84.8% at the $0.15 ceiling)
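
The rate-style metrics in the table reduce to simple counts over labeled data. A minimal sketch, assuming each golden question has exactly one correct chunk and that resolution and hallucination labels are already available as booleans; hit_rate_at_k and rate are hypothetical helper names.

```python
def hit_rate_at_k(golden: list[tuple[str, list[str]]], k: int) -> float:
    """Fraction of golden questions whose gold chunk appears in the top k results.

    Each item is (gold_chunk_id, ranked list of retrieved chunk ids).
    """
    if not golden:
        return 0.0
    hits = sum(1 for gold, ranked in golden if gold in ranked[:k])
    return hits / len(golden)


def rate(flags: list[bool]) -> float:
    """Share of True values, e.g. resolved conversations or hallucinated answers."""
    return sum(flags) / len(flags) if flags else 0.0


golden = [
    ("a", ["a", "b"]),
    ("c", ["x", "c", "y"]),
    ("d", ["x", "y", "z"]),
]
top1 = hit_rate_at_k(golden, k=1)
top2 = hit_rate_at_k(golden, k=2)
resolution_rate = rate([True, False, False, True])
```

Tracking the same function at several values of K shows whether a miss is a ranking problem (found at K=10, missed at K=3) or a retrieval problem (missed at every K).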

System Prompt Specification

Hard Rules (Excerpt)

You are "Fin," an AI support agent for a multi-tenant SaaS product.

Use only provided snippets and tool outputs. Do not invent or speculate.

If retrieved context is insufficient, outdated, or contradictory — clearly say you are not certain and recommend escalation to a human agent.

Never expose content marked internal-only to end-users — summarize as needed in internal notes for agents only.

Always include citations: after each factual claim, reference the snippet IDs that support it.
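
The citation rule lends itself to an automated post-processing check. The sketch below assumes a citation format of "[S<number>]" placed before the sentence's closing punctuation, treats every sentence as a factual claim, and uses a naive sentence splitter; none of these conventions are specified in the PRD, and validate_citations is a hypothetical helper.

```python
import re

CITATION = re.compile(r"\[(S\d+)\]")


def validate_citations(answer: str, snippet_ids: set[str]) -> list[str]:
    """Return a list of problems; an empty list means the answer passes."""
    problems = []
    # Naive split: a sentence ends at ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", answer.strip()) if s]
    for sentence in sentences:
        cited = CITATION.findall(sentence)
        if not cited:
            problems.append(f"missing citation: {sentence}")
        for cid in cited:
            if cid not in snippet_ids:
                # Cited ID was never provided to the model: likely fabricated.
                problems.append(f"unknown snippet {cid}: {sentence}")
    return problems


answer = ("Open Settings and choose Security [S1]. "
          "Click Reset password [S2]. You will get an email.")
issues = validate_citations(answer, {"S1"})
```

Here the check flags two problems: the second sentence cites a snippet that was never provided, and the third carries no citation at all. Either would be grounds to regenerate or escalate rather than send the reply.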

5 Example Interactions

Scenario	Query	Fin's Behavior
Simple FAQ	"How do I reset my password?"	Needle model → retrieve public article → 2–3 step answer + citation → no tools needed
Negative Constraint	"How do I set up round-robin for Twitter DMs?"	Explain not supported for Twitter; cite both assignment docs and channel limitation — never invent a workaround
Policy + Dynamic	"My trial ended yesterday, can I extend it?"	Tool call for trial_end_date → retrieve trial policy → personalized yes/no with rationale + citation
Internal SOP	Agent on Legacy plan deprecation issue	Retrieve internal SOP → generate internal note with workaround → propose customer-safe reply (no internal jargon)
Escalation	"There's a bug with the new webhook system"	Retrieve docs → tool check for known incidents → if unclear: gather info, escalate to #support-eng with summarized context + tags
