FinSAH: Our pragmatic BERT-first AI for grounded answers
FinSAH is our production-focused question-answering stack: a fast extractive model (DistilBERT family), lightweight retrieval over your content, and deterministic business rules such as FAQs and safety fallbacks. It is engineered to answer confidently from your data (documents, profiles, and blogs) by extracting spans rather than generating free-form text.
Why this approach
- Grounded: Answers are extracted from your content; no free-form hallucinations.
- Deterministic: FAQs and rules ensure consistent output for common questions.
- Portable: No heavy GPU inference in your stack; we call a hosted QA endpoint.
- Economical: Small model + simple retrieval keeps latency and cost low.
Architecture at a glance
- Ingestion: We load multiple corpora (profile data, blog posts, legal/disclaimers, and PDFs such as resumes) into simple TypeScript modules. PDFs are chunked into sentences/paragraphs with IDs.
- Retrieval: A keyword-overlap scorer picks the top-k chunks across corpora (see the sketch after this list). No vector DB is required at this scale; the scorer is tuned to prefer dense, entity-rich chunks.
- QA: A hosted DistilBERT extractive QA endpoint selects the best span within the retrieved context.
- Orchestration: The conversational flow runs in stages: FAQ match → general cross-corpus QA → profile-only QA → history summary → curated fallback. If QA is uncertain, we synthesize an answer from the retrieved context.
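To make the retrieval stage concrete, here is a minimal sketch of a keyword-overlap scorer with an entity boost, in the spirit of the helpers in `src/utils/bert.ts`. The `Chunk` shape, the boost weights, and the regexes are illustrative assumptions, not the repo's actual code.

```ts
// Illustrative keyword-overlap retrieval; names and weights are assumptions.
interface Chunk {
  id: string;
  corpus: string; // e.g. "profile", "blog", "pdf"
  text: string;
}

const STOPWORDS = new Set(["the", "a", "an", "is", "are", "of", "to", "and", "in", "for"]);

function tokenize(text: string): string[] {
  return text.toLowerCase().split(/\W+/).filter((t) => t.length > 1 && !STOPWORDS.has(t));
}

// Score a chunk by query-term overlap, boosting entity-rich chunks
// (capitalized words, email addresses, phone-like digit runs).
function scoreChunk(queryTokens: Set<string>, chunk: Chunk): number {
  let overlap = 0;
  for (const t of tokenize(chunk.text)) if (queryTokens.has(t)) overlap++;
  const entityBoost =
    (chunk.text.match(/\b[A-Z][a-z]+\b/g)?.length ?? 0) * 0.1 +
    (/\S+@\S+\.\S+/.test(chunk.text) ? 1 : 0) +
    (/\d{3}[-.\s]?\d{3,4}/.test(chunk.text) ? 1 : 0);
  return overlap + entityBoost;
}

function topK(query: string, chunks: Chunk[], k = 5): Chunk[] {
  const q = new Set(tokenize(query));
  return [...chunks]
    .map((c) => ({ c, s: scoreChunk(q, c) }))
    .sort((a, b) => b.s - a.s)
    .slice(0, k)
    .map((x) => x.c);
}
```

At this corpus size a linear scan over every chunk is cheap, which is why no vector index is needed yet.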
Key components in this repo
- Retrieval and QA: `src/utils/bert.ts` provides retrieval helpers, prompt builders, and `answerWithBert` with context synthesis.
- Conversational API: `src/app/api/conversational-chat/route.ts` runs the multi-stage pipeline and returns sources + stage (a sketch of this staged flow follows the list).
- Profile chat API: `src/app/api/profile-chat/route.ts` focuses on the profile corpus, with regex fallbacks for phone/email.
- FAQ data: `src/data/faqs.ts` holds the curated, scored matcher for deterministic answers.
- PDF ingestion: `scripts/ingest-pdf.*` → `src/data/alinewCorpus.ts`.
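The staged flow in `conversational-chat/route.ts` can be sketched as a cascade in which each stage either answers confidently or hands off to the next. This is a compilable sketch, not the repo's code: the helper signatures and the 0.3 confidence threshold are assumptions for illustration.

```ts
// Hypothetical staged pipeline; helper shapes and threshold are assumptions.
type StageAnswer = { answer: string; stage: string; sources: string[] };

// Stubs standing in for the real FAQ matcher, retriever, and QA utilities.
declare function matchFaq(q: string): { answer: string } | null;
declare function retrieve(q: string, corpus: "all" | "profile"): { text: string; sources: string[] };
declare function answerWithBert(q: string, context: string): Promise<{ answer: string; score: number }>;
declare function summarizeHistory(history: string[]): string;
declare function synthesizeFromContext(q: string, context: string): string;

async function runPipeline(question: string, history: string[]): Promise<StageAnswer> {
  // Stage 1: a deterministic FAQ match wins outright.
  const faq = matchFaq(question);
  if (faq) return { answer: faq.answer, stage: "faq", sources: ["faqs"] };

  // Stage 2: extractive QA across all corpora.
  const all = retrieve(question, "all");
  const general = await answerWithBert(question, all.text);
  if (general.score > 0.3)
    return { answer: general.answer, stage: "general-qa", sources: all.sources };

  // Stage 3: profile-only QA for person-specific questions.
  const prof = retrieve(question, "profile");
  const profile = await answerWithBert(question, prof.text);
  if (profile.score > 0.3)
    return { answer: profile.answer, stage: "profile-qa", sources: prof.sources };

  // Stage 4: summarize conversation history when the question refers back.
  if (history.length > 0 && /earlier|before|you said/i.test(question))
    return { answer: summarizeHistory(history), stage: "history", sources: [] };

  // Stage 5: curated fallback synthesized from retrieved context.
  return { answer: synthesizeFromContext(question, all.text), stage: "fallback", sources: all.sources };
}
```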
Training and “FinSAH” tuning
Rather than fine-tuning a large generative model, FinSAH prioritizes high-precision extraction + retrieval and uses business rules for stability. Where domain-specific phrasing matters, we tune:
- Corpus curation: normalize/clean content; ensure key entities appear in chunks.
- Retriever weights: favor proper nouns, emails, phones, and section titles.
- FAQ coverage: capture recurring asks with canonical answers and versioning.
- Fallback synthesis: if extraction yields low confidence, generate concise summaries from the retrieved text (sketched below).
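The fallback synthesis can be thought of as lightweight extractive summarization: when span extraction is unsure, return the retrieved sentences that best overlap the question. A minimal sketch, assuming plain-string chunks:

```ts
// Illustrative low-confidence fallback: stitch together the most
// query-relevant sentences from the retrieved chunks.
function synthesizeSummary(question: string, chunks: string[], maxSentences = 3): string {
  const qTokens = new Set(question.toLowerCase().split(/\W+/).filter(Boolean));
  const sentences = chunks.flatMap((c) => c.split(/(?<=[.!?])\s+/));
  return sentences
    .map((s) => ({
      s,
      overlap: s.toLowerCase().split(/\W+/).filter((t) => qTokens.has(t)).length,
    }))
    .sort((a, b) => b.overlap - a.overlap)
    .slice(0, maxSentences)
    .map((x) => x.s.trim())
    .join(" ");
}
```

Because every sentence comes verbatim from the corpus, the fallback stays grounded even when the extractive model is unsure.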
Evaluation and safety
- Smoke tests: `scripts/smoke-tests.js` covers FAQ, general, and profile Q&A (a sketch follows this list).
- No bare "I don't know": instead of reporting uncertainty, the system returns grounded, useful summaries.
- Source reporting: the API returns sources and stage so answers can be audited.
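A smoke test against the conversational API might look like the sketch below (written in TypeScript for consistency with the rest of this post; the request/response field names are assumptions based on the API description above).

```ts
// Minimal smoke test: POST a question and assert the response carries
// an answer, a stage label, and sources (exact field names assumed).
async function smokeTest(baseUrl: string): Promise<void> {
  const res = await fetch(`${baseUrl}/api/conversational-chat`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ message: "What services do you offer?" }),
  });
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  const data = await res.json();
  for (const field of ["answer", "stage", "sources"]) {
    if (!(field in data)) throw new Error(`Missing field: ${field}`);
  }
  console.log(`ok: stage=${data.stage}, sources=${data.sources.length}`);
}

smokeTest(process.env.BASE_URL ?? "http://localhost:3000").catch((err) => {
  console.error(err);
  process.exit(1);
});
```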
Deployment
The app runs on the Next.js App Router with API routes. QA uses a hosted endpoint (HF-style) configured via an environment token, so there is no native dependency or GPU requirement in your deployment.
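For reference, a call to an HF-style hosted question-answering endpoint looks roughly like the following; the model name and the `HF_API_TOKEN` variable are illustrative assumptions, and the response shape follows the Hugging Face Inference API convention for extractive QA models.

```ts
// Sketch of a hosted extractive QA call (HF Inference API style).
// Model name and HF_API_TOKEN env var are illustrative assumptions.
interface QaResult { answer: string; score: number; start: number; end: number }

async function extractiveQa(question: string, context: string): Promise<QaResult> {
  const res = await fetch(
    "https://api-inference.huggingface.co/models/distilbert-base-cased-distilled-squad",
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${process.env.HF_API_TOKEN}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ inputs: { question, context } }),
    }
  );
  if (!res.ok) throw new Error(`QA endpoint error: ${res.status}`);
  return res.json() as Promise<QaResult>;
}
```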
When to add a vector index or LLM
For larger corpora or free-form generation, we can add embeddings-based retrieval or a generative LLM layer. FinSAH's design keeps this upgrade path simple while maintaining today's cost and reliability; a sketch of the retrieval upgrade follows.
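An embeddings-based retriever can sit behind the same top-k interface as the keyword scorer, so swapping it in touches only one function. The `embed` stub below stands in for whatever embedding provider you would choose; everything here is a sketch of the upgrade path, not shipped code.

```ts
// Hypothetical embeddings-based retriever behind the same top-k interface.
// `embed` is a placeholder for any embedding provider.
declare function embed(text: string): Promise<number[]>;

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

async function topKByEmbedding(query: string, chunks: { id: string; text: string }[], k = 5) {
  const qVec = await embed(query);
  const scored = await Promise.all(
    chunks.map(async (c) => ({ c, s: cosine(qVec, await embed(c.text)) }))
  );
  return scored.sort((a, b) => b.s - a.s).slice(0, k).map((x) => x.c);
}
```

In practice you would precompute and cache chunk embeddings at ingestion time rather than embedding on every query as this sketch does.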