Architecture (Vibe Coder Guide)

Deep dive into the RAG pipeline for developers who want to understand how it works.

RAG Pipeline Overview

LaunchChat uses Retrieval-Augmented Generation (RAG) to answer questions based on your documentation.

1. Ingestion

Parse → Chunk → Embed → Store

2. Retrieval

Query → Vector Search → Rank

3. Generation

Context → LLM → Answer + Citations

Ingestion Pipeline

1. Parsing

Content is parsed from various sources into plain text:

  • Notion: Block-by-block parsing preserving hierarchy
  • DOCX: Extracted via mammoth.js
  • Markdown: Parsed with remark/unified
  • Website: Crawled and cleaned of navigation/footer

2. Chunking

Text is split into overlapping chunks for optimal retrieval:

{
  targetSize: 400,    // tokens per chunk
  overlap: 50,        // token overlap between chunks
  preserveHeadings: true,  // keep heading context
  minChunkSize: 100   // minimum viable chunk
}

Each chunk preserves its parent heading hierarchy for context.
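The settings above can be sketched as a simple sliding-window splitter. This is a minimal illustration, not the production implementation: it approximates tokens with whitespace-split words, whereas a real pipeline would use a proper tokenizer (e.g. tiktoken).

```typescript
interface ChunkOptions {
  targetSize: number;   // "tokens" (words here) per chunk
  overlap: number;      // overlap between consecutive chunks
  minChunkSize: number; // drop trailing fragments smaller than this
}

// Hypothetical helper: split text into overlapping chunks.
// Words stand in for tokens to keep the sketch self-contained.
function chunkText(text: string, opts: ChunkOptions): string[] {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  const step = opts.targetSize - opts.overlap;
  for (let start = 0; start < words.length; start += step) {
    const slice = words.slice(start, start + opts.targetSize);
    // Skip a tiny trailing fragment; it is already covered by the overlap.
    if (slice.length < opts.minChunkSize && chunks.length > 0) break;
    chunks.push(slice.join(" "));
    if (start + opts.targetSize >= words.length) break;
  }
  return chunks;
}
```

With `targetSize: 400` and `overlap: 50`, consecutive chunks share their last/first 50 words, so a sentence cut at a chunk boundary still appears whole in one of the two chunks.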

3. Embedding

Chunks are converted to 1536-dimensional vectors:

Model: text-embedding-3-small
Dimensions: 1536
Provider: OpenAI (via OpenRouter)
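As a rough sketch, embedding a batch of chunks is a single POST to an OpenAI-compatible embeddings endpoint. The URL, header names, and helper name below are assumptions based on the standard API shape, not LaunchChat's actual client code:

```typescript
// Hypothetical request builder for an OpenAI-compatible /embeddings endpoint.
// (OpenRouter exposes the same request shape at its own base URL.)
function buildEmbeddingRequest(apiKey: string, chunks: string[]) {
  return {
    url: "https://api.openai.com/v1/embeddings",
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({
      model: "text-embedding-3-small", // returns 1536-dimensional vectors
      input: chunks,                   // batch multiple chunks per call
    }),
  };
}
```

Batching chunks into one request keeps ingestion fast and cheap compared with one call per chunk.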

4. Storage

Vectors are stored in PostgreSQL with pgvector extension:

-- content_chunks table
id: uuid
knowledge_base_id: uuid
page_id: string
page_title: string
content: text
embedding: vector(1536)
parent_heading: string
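The schema above corresponds to roughly the following DDL. The index type and names are assumptions for illustration; any pgvector-supported index (HNSW or IVFFlat) over the cosine operator class would serve:

```sql
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE content_chunks (
  id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  knowledge_base_id uuid NOT NULL,
  page_id text NOT NULL,
  page_title text,
  content text NOT NULL,
  embedding vector(1536),
  parent_heading text
);

-- Approximate-nearest-neighbour index for cosine distance
CREATE INDEX content_chunks_embedding_idx
  ON content_chunks USING hnsw (embedding vector_cosine_ops);
```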

Retrieval Strategy

Hybrid Search

We use a two-stage retrieval process:

  1. Vector Search: Cosine similarity using pgvector's <=> operator
  2. Keyword Fallback: If vector results have low similarity, we add keyword-matched chunks
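The two-stage fallback can be sketched as a merge step. The threshold default, limit, and function names here are illustrative assumptions, not the production values:

```typescript
interface Chunk {
  id: string;
  content: string;
  similarity: number; // cosine similarity from the vector search
}

// Hypothetical hybrid merge: keep the vector hits, and if even the best
// match is weak, top up with keyword-matched chunks not already present.
function hybridMerge(
  vectorHits: Chunk[],
  keywordHits: Chunk[],
  minSimilarity = 0.5,
  limit = 5,
): Chunk[] {
  const best = vectorHits[0]?.similarity ?? 0;
  if (best >= minSimilarity) return vectorHits.slice(0, limit);
  const seen = new Set(vectorHits.map((c) => c.id));
  const extras = keywordHits.filter((c) => !seen.has(c.id));
  return [...vectorHits, ...extras].slice(0, limit);
}
```

Deduplicating by chunk id matters because a chunk that matches both searches would otherwise appear twice in the LLM context.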

Similarity Scoring

-- Vector similarity query
SELECT *, 1 - (embedding <=> query_embedding) as similarity
FROM content_chunks
WHERE knowledge_base_id = $1
ORDER BY embedding <=> query_embedding
LIMIT 5
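pgvector's `<=>` operator returns cosine *distance*, which is why the query computes `1 - distance` to get similarity. The same computation in application code, as a minimal sketch:

```typescript
// Cosine similarity between two equal-length vectors, matching
// `1 - (embedding <=> query_embedding)` in the pgvector query above.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Identical directions score 1, orthogonal vectors score 0, which is why the similarity values feed naturally into the confidence formula below 1.0.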

Answer Generation

Confidence Scoring

Before generating, we calculate a confidence score:

confidence = bestSimilarity + (hasMultipleChunks ? 0.1 : 0) + 0.2
// Capped at 1.0

if (confidence < threshold) {
  return refusalMessage;  // Don't hallucinate
}
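The pseudocode above can be made concrete as follows. The `+0.2` base and `+0.1` multi-chunk bonus come from the formula above; the default threshold is an assumption for illustration:

```typescript
// Confidence score per the formula above, capped at 1.0.
function confidenceScore(bestSimilarity: number, hasMultipleChunks: boolean): number {
  const raw = bestSimilarity + (hasMultipleChunks ? 0.1 : 0) + 0.2;
  return Math.min(raw, 1.0);
}

// Hypothetical pre-generation guard: refuse rather than hallucinate.
function shouldAnswer(
  bestSimilarity: number,
  hasMultipleChunks: boolean,
  threshold = 0.5, // assumed value for illustration
): boolean {
  return confidenceScore(bestSimilarity, hasMultipleChunks) >= threshold;
}
```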

Citation Extraction

The LLM is instructed to use [Source N] format. We parse these and link to original pages:

// Extract citations from the answer
const citationPattern = /\[Source (\d+)\]/g;
const matches = [...answer.matchAll(citationPattern)];

// Map each [Source N] back to its source chunk (N is 1-indexed)
const citations = matches.map(m => chunks[Number(m[1]) - 1]);

Documentation Best Practices

Structure your docs for optimal AI retrieval:

Do

  • Use clear, descriptive headings
  • Keep sections focused on one topic
  • Include examples and code snippets
  • Define terms and acronyms
  • Update docs when features change

Avoid

  • Very long pages without structure
  • Duplicate content across pages
  • Outdated or contradictory info
  • Heavy use of images without alt text
  • Navigation-only pages

AI Prompt Template

Copy this prompt into Cursor, Windsurf, or Claude Code to help integrate LaunchChat:

I'm integrating LaunchChat, an AI-powered support widget.

Widget Setup:
1. Add to HTML: <script>window.LaunchChatConfig = {widgetId: "ID"}</script>
   <script src="https://domain.com/widget.js" async></script>

2. For React/Next.js, create a client component that:
   - Sets window.LaunchChatConfig
   - Dynamically loads widget.js
   - Cleans up on unmount

API Reference:
- window.LaunchChatWidget.open() - Open chat
- window.LaunchChatWidget.close() - Close chat
- window.LaunchChatWidget.on(event, callback) - Listen to events
- Events: 'open', 'close', 'message', 'escalate', 'feedback'

Help me integrate this into my [FRAMEWORK] app.