FinSAH: Our pragmatic BERT-first AI for grounded answers

FinSAH is our production-focused question answering stack. It combines a fast extractive model (DistilBERT family), lightweight retrieval over your content, and deterministic business rules such as FAQ matching and safety fallbacks. Because every answer is a span extracted from your own data (documents, profiles, and blog posts), it answers confidently without hallucinating.

Why this approach

Architecture at a glance

  1. Ingestion: We load multiple corpora (profile data, blog posts, legal disclaimers, and PDFs such as resumes) into simple TypeScript modules. PDFs are chunked into sentences and paragraphs, each assigned a stable ID.
  2. Retrieval: A keyword-overlap scorer picks the top-k chunks across corpora (a minimal scorer sketch follows this list). No vector DB is required at this scale; the scorer is tuned to prefer dense, entity-rich chunks.
  3. QA: A hosted DistilBERT extractive QA endpoint selects the best span within the retrieved context.
  4. Orchestration: Conversational flow runs through stages: FAQ match → general cross-corpus QA → profile-only QA → history summary → curated fallback (sketched after this list). If extractive QA is uncertain, we also synthesize an answer from the retrieved context.
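
The retrieval step in (2) is small enough to show inline. Below is a minimal sketch of a keyword-overlap scorer, assuming chunks carry IDs from ingestion; the `Chunk` shape and the `tokenize`/`scoreChunk`/`topK` names are illustrative, not the repo's actual exports.

```ts
// Illustrative types and names; the real module layout may differ.
export interface Chunk {
  id: string;     // e.g. a stable ID assigned at ingestion
  corpus: string; // "profile" | "blog" | "legal" | "pdf"
  text: string;
}

const STOPWORDS = new Set(["the", "a", "an", "and", "or", "of", "to", "in", "is"]);

function tokenize(s: string): string[] {
  return s.toLowerCase().split(/\W+/).filter(t => t.length > 2 && !STOPWORDS.has(t));
}

// Score = fraction of query tokens found in the chunk, plus a density
// term so short, entity-rich chunks beat long, diluted ones.
function scoreChunk(queryTokens: string[], chunk: Chunk): number {
  const chunkTokens = new Set(tokenize(chunk.text));
  const overlap = queryTokens.filter(t => chunkTokens.has(t)).length;
  if (overlap === 0) return 0;
  const density = overlap / Math.sqrt(chunkTokens.size || 1);
  return overlap / queryTokens.length + density;
}

export function topK(query: string, chunks: Chunk[], k = 3): Chunk[] {
  const qt = tokenize(query);
  return chunks
    .map(c => ({ c, s: scoreChunk(qt, c) }))
    .filter(x => x.s > 0)
    .sort((a, b) => b.s - a.s)
    .slice(0, k)
    .map(x => x.c);
}
```

The density term is what biases the scorer toward dense, entity-rich chunks, as noted in step (2).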

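The staged flow in (4) reduces to an ordered list of handlers, each of which either answers or defers to the next. A minimal sketch, with stage names mirroring the list above and the non-FAQ stages stubbed out:

```ts
type Stage = (question: string, history: string[]) => Promise<string | null>;

// Stage 1: deterministic FAQ match over a curated map (a business rule).
const FAQS: Record<string, string> = {
  "what is finsah": "FinSAH is our extractive QA stack over your own content.",
};
const faqMatch: Stage = async (q) =>
  FAQS[q.trim().toLowerCase().replace(/\?$/, "")] ?? null;

// Stages 2-4 are stubbed here; in the real repo they call retrieval + QA.
const crossCorpusQA: Stage = async () => null;
const profileOnlyQA: Stage = async () => null;
const historySummary: Stage = async () => null;

const FALLBACK = "I don't have a grounded answer for that yet.";

export async function answer(question: string, history: string[] = []): Promise<string> {
  for (const stage of [faqMatch, crossCorpusQA, profileOnlyQA, historySummary]) {
    const result = await stage(question, history);
    if (result) return result; // first confident stage wins
  }
  return FALLBACK; // curated fallback keeps replies safe and on-message
}
```
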
Key components in this repo

Training and “FinSAH” tuning

Rather than fine-tuning a large generative model, FinSAH prioritizes high-precision extraction + retrieval and uses business rules for stability. Where domain-specific phrasing matters, we tune:

Evaluation and safety

Deployment

The app runs on the Next.js App Router with API routes. QA calls a hosted, HF-style endpoint configured via an environment token, so your deployment has no native dependencies or GPU requirements.
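
As a sketch of what such a route can look like (the model id, `HF_TOKEN` variable name, and route path are assumptions, not the repo's actual configuration):

```ts
// app/api/qa/route.ts — illustrative App Router handler.
import { NextResponse } from "next/server";

const MODEL = "distilbert-base-cased-distilled-squad";
const QA_URL = `https://api-inference.huggingface.co/models/${MODEL}`;

export async function POST(req: Request) {
  const { question, context } = await req.json();
  const res = await fetch(QA_URL, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.HF_TOKEN}`, // env-configured token
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ inputs: { question, context } }),
  });
  if (!res.ok) {
    return NextResponse.json({ error: "QA endpoint unavailable" }, { status: 502 });
  }
  // The HF question-answering task returns { answer, score, start, end }.
  const { answer, score } = await res.json();
  return NextResponse.json({ answer, score });
}
```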

When to add a vector index or LLM

For larger corpora or free-form generation, we can add embeddings-based retrieval or a generative LLM layer. FinSAH’s design keeps this upgrade path simple while maintaining today’s cost and reliability.
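
One way to keep that path simple is to put today's scorer behind a small interface so an embeddings-backed retriever can be swapped in without touching orchestration. A sketch, reusing the illustrative `Chunk` and `topK` names from the retrieval example above:

```ts
// Reuses the Chunk type and topK() from the retrieval sketch above.
export interface Retriever {
  retrieve(query: string, k: number): Promise<Chunk[]>;
}

// Today: wraps the keyword-overlap scorer.
export class KeywordRetriever implements Retriever {
  constructor(private chunks: Chunk[]) {}
  async retrieve(query: string, k: number): Promise<Chunk[]> {
    return topK(query, this.chunks, k);
  }
}

// Later: the same contract, backed by a vector index (hypothetical client).
export class EmbeddingRetriever implements Retriever {
  constructor(
    private index: { query(v: number[], k: number): Promise<Chunk[]> },
    private embed: (text: string) => Promise<number[]>,
  ) {}
  async retrieve(query: string, k: number): Promise<Chunk[]> {
    return this.index.query(await this.embed(query), k);
  }
}
```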

Try the chatRAG chatbots service