Architecture (Vibe Coder Guide)
Deep dive into the RAG pipeline for developers who want to understand how it works.
RAG Pipeline Overview
LaunchChat uses Retrieval-Augmented Generation (RAG) to answer questions based on your documentation.
1. Ingestion
Parse → Chunk → Embed → Store
2. Retrieval
Query → Vector Search → Rank
3. Generation
Context → LLM → Answer + Citations
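The three stages compose into a single flow. Here is a deliberately toy sketch of that composition (every function below is an illustrative stand-in, not LaunchChat's real internals; the "embedding" is a one-dimensional length vector just to show the shape of the data):

```javascript
// Illustrative stand-ins for the three pipeline stages -- not the real internals.
const parse = (source) => source.text;                     // 1. Ingestion: parse
const chunk = (text) => text.split(". ").filter(Boolean);  //    ...chunk
const embed = (c) => ({ chunk: c, vector: [c.length] });   //    ...embed (toy 1-D vector)

// 2. Retrieval: nearest chunks to the query vector
const retrieve = (store, queryVector) =>
  [...store]
    .sort((a, b) =>
      Math.abs(a.vector[0] - queryVector[0]) - Math.abs(b.vector[0] - queryVector[0]))
    .slice(0, 2);

// 3. Generation: answer plus citations back to the source chunks
const generate = (context) => ({
  answer: `Based on ${context.length} sources...`,
  citations: context.map((c) => c.chunk),
});

// Ingestion
const store = chunk(
  parse({ text: "LaunchChat is a widget. It answers questions. It cites sources." })
).map(embed);

// Retrieval + generation
const result = generate(retrieve(store, [20]));
```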
Ingestion Pipeline
1. Parsing
Content is parsed from various sources into plain text:
- Notion: Block-by-block parsing preserving hierarchy
- DOCX: Extracted via mammoth.js
- Markdown: Parsed with remark/unified
- Website: Crawled and cleaned of navigation/footer
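One way to picture this step is a dispatch table from source type to parser. The sketch below is hypothetical wiring: the real pipeline uses mammoth.js and remark/unified as noted above, so the parser bodies here are stand-ins that only show the shape of the step:

```javascript
// Hypothetical dispatch from source type to a plain-text parser.
// Parser bodies are stand-ins; the real pipeline uses mammoth.js, remark, etc.
const parsers = {
  notion:   (doc) => doc.blocks.map((b) => b.text).join("\n"), // block-by-block
  docx:     (doc) => doc.raw,  // stand-in for mammoth.js extraction
  markdown: (doc) => doc.raw,  // stand-in for remark/unified
  website:  (doc) => doc.raw.replace(/<nav>.*?<\/nav>/gs, ""), // strip navigation
};

function parseSource(type, doc) {
  const parser = parsers[type];
  if (!parser) throw new Error(`Unsupported source type: ${type}`);
  return parser(doc);
}
```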
2. Chunking
Text is split into overlapping chunks for optimal retrieval:
```javascript
{
  targetSize: 400,        // tokens per chunk
  overlap: 50,            // token overlap between chunks
  preserveHeadings: true, // keep heading context
  minChunkSize: 100       // minimum viable chunk
}
```

Each chunk preserves its parent heading hierarchy for context.
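A minimal sketch of overlapping chunking using that config shape (whitespace-split words stand in for tokens here; a real implementation would use an actual tokenizer):

```javascript
// Sketch of overlapping chunking. Words stand in for tokens;
// a real implementation would tokenize properly.
function chunkText(text, { targetSize = 400, overlap = 50, minChunkSize = 100 } = {}) {
  const tokens = text.split(/\s+/).filter(Boolean);
  const chunks = [];
  const step = targetSize - overlap; // advance by targetSize minus the overlap
  for (let start = 0; start < tokens.length; start += step) {
    const piece = tokens.slice(start, start + targetSize);
    if (piece.length < minChunkSize && chunks.length > 0) break; // drop tiny tail
    chunks.push(piece.join(" "));
  }
  return chunks;
}
```

With `overlap: 1`, the last token of each chunk reappears as the first token of the next, which is what keeps sentence context intact across chunk boundaries.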
3. Embedding
Chunks are converted to 1536-dimensional vectors:
- Model: text-embedding-3-small
- Dimensions: 1536
- Provider: OpenAI (via OpenRouter)

4. Storage
Vectors are stored in PostgreSQL with pgvector extension:
```sql
-- content_chunks table
id: uuid
knowledge_base_id: uuid
page_id: string
page_title: string
content: text
embedding: vector(1536)
parent_heading: string
```

Retrieval Strategy
Hybrid Search
We use a two-stage retrieval process:
- Vector Search: Cosine similarity using pgvector's <=> operator
- Keyword Fallback: If vector results have low similarity, we add keyword-matched chunks
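The fallback logic can be sketched like this (the 0.5 floor and the `keywordSearch` function are illustrative assumptions, not the actual LaunchChat code):

```javascript
// Sketch of two-stage hybrid retrieval. The similarity floor and the
// keywordSearch stand-in are assumptions for illustration.
const SIMILARITY_FLOOR = 0.5;

function hybridSearch(vectorResults, keywordSearch, query, limit = 5) {
  const best = Math.max(0, ...vectorResults.map((r) => r.similarity));
  const results = [...vectorResults];
  if (best < SIMILARITY_FLOOR) {
    // Low-confidence vector results: mix in keyword-matched chunks,
    // de-duplicating by chunk id.
    const seen = new Set(results.map((r) => r.id));
    for (const hit of keywordSearch(query)) {
      if (!seen.has(hit.id)) results.push(hit);
    }
  }
  return results.slice(0, limit);
}
```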
Similarity Scoring
```sql
-- Vector similarity query
SELECT *, 1 - (embedding <=> query_embedding) AS similarity
FROM content_chunks
WHERE knowledge_base_id = $1
ORDER BY embedding <=> query_embedding
LIMIT 5;
```

Answer Generation
Confidence Scoring
Before generating, we calculate a confidence score:
```javascript
// Capped at 1.0
const confidence = Math.min(
  1.0,
  bestSimilarity + (hasMultipleChunks ? 0.1 : 0) + 0.2
);

if (confidence < threshold) {
  return refusalMessage; // Don't hallucinate
}
```

Citation Extraction
The LLM is instructed to use [Source N] format. We parse these and link to original pages:
```javascript
// Extract citations from the answer.
const citationPattern = /\[Source (\d+)\]/g;
// matchAll returns an iterator, so spread it into an array before mapping.
const matches = [...answer.matchAll(citationPattern)];

// Map each [Source N] back to its retrieved chunk (sources are 1-indexed).
const citations = matches.map((m) => chunks[Number(m[1]) - 1]);
```

Documentation Best Practices
Structure your docs for optimal AI retrieval:
Do
- Use clear, descriptive headings
- Keep sections focused on one topic
- Include examples and code snippets
- Define terms and acronyms
- Update docs when features change
Avoid
- Very long pages without structure
- Duplicate content across pages
- Outdated or contradictory info
- Heavy use of images without alt text
- Navigation-only pages
AI Prompt Template
Copy this prompt into Cursor, Windsurf, or Claude Code to help integrate LaunchChat:
```text
I'm integrating LaunchChat, an AI-powered support widget.

Widget Setup:
1. Add to HTML:
   <script>window.LaunchChatConfig = {widgetId: "ID"}</script>
   <script src="https://domain.com/widget.js" async></script>
2. For React/Next.js, create a client component that:
   - Sets window.LaunchChatConfig
   - Dynamically loads widget.js
   - Cleans up on unmount

API Reference:
- window.LaunchChatWidget.open() - Open chat
- window.LaunchChatWidget.close() - Close chat
- window.LaunchChatWidget.on(event, callback) - Listen to events
- Events: 'open', 'close', 'message', 'escalate', 'feedback'

Help me integrate this into my [FRAMEWORK] app.
```
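The client-component steps in the prompt (set config, load the script, clean up on unmount) can be sketched framework-agnostically; a React component would call this helper from `useEffect` and return its cleanup function. The helper name is made up, and `win`/`doc` are injected only so the sketch is testable outside a browser:

```javascript
// Hypothetical helper implementing the React/Next.js steps from the prompt:
// set window.LaunchChatConfig, dynamically load widget.js, clean up on unmount.
function mountLaunchChat(widgetId, win = window, doc = document) {
  win.LaunchChatConfig = { widgetId };        // 1. set the global config
  const script = doc.createElement("script"); // 2. load widget.js dynamically
  script.src = "https://domain.com/widget.js";
  script.async = true;
  doc.head.appendChild(script);
  return () => {                              // 3. cleanup on unmount
    script.parentNode?.removeChild(script);
    delete win.LaunchChatConfig;
  };
}
```

In a React component this becomes `useEffect(() => mountLaunchChat("ID"), []);`, since `useEffect` treats the returned function as the unmount cleanup.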