
AI Platform — Current State

This document describes the as-built Feoda AI assistant as it exists on 2026-04-22. It is the documentation-site assistant deployed in Phase 2 of the platform roadmap. It does not describe the target corporate AI Platform — see target-architecture.md (planned) for that, and subsystems/ for per-subsystem designs.

The corresponding business strategy is governed by the Corporate AI Strategy. All providers used here must appear on the Approved Provider List.


1. Goals and non-goals (of the current build)

Goals

  • G1. Answer staff questions about Feoda's products, processes, clients, and technical knowledge using the documentation in this repository as the single source of truth.
  • G2. Cite the exact files used to produce each answer.
  • G3. Run on the existing Vercel deployment with no separate backend or vector database.
  • G4. Keep retrieval cost at zero or near zero (free Groq tier) and answer cost predictable (Anthropic Claude per-call).

Non-goals (of the current build)

  • NG1. Multi-tenant client-facing access.
  • NG2. Vector embeddings or semantic indexing.
  • NG3. Persistent server-side conversation storage. Conversation history lives in the client's sessionStorage only.
  • NG4. Identity-bound or role-aware access. The whole site sits behind a single shared Basic Auth credential.
  • NG5. Fine-tuning or custom model training.

These non-goals are deliberate scope choices for the as-built system. They are not statements about the target platform — the target platform must address most of them, and that work belongs in target-architecture.md and the relevant action plans.


2. System overview

                ┌──────────────────────────────────────────┐
                │        Vercel project (single)           │
                │                                          │
  Browser  ──▶  │  middleware.js (Basic Auth)              │
                │           │                              │
                │           ▼                              │
                │  Static MkDocs site (HTML/CSS/JS)        │
                │           │                              │
                │           │ chat-widget.js (POST)        │
                │           ▼                              │
                │  api/chat.js  (Vercel Serverless Fn)     │
                │     ├── walkDocs() reads .md from disk   │
                │     ├── Groq API  ── retrieval ──┐       │
                │     ├── keyword fallback ────────┤       │
                │     └── Anthropic API ── answering       │
                │                                          │
                └──────────────────────────────────────────┘

Everything ships in one Vercel deploy. The .md files are present on the function's filesystem because they are part of the deployed bundle.


3. Request flow

A single chat turn:

  1. User types in the widget rendered by assets/js/chat-widget.js.
  2. Widget POSTs { question, history } to /api/chat. history is the last six turns from sessionStorage.
  3. api/chat.js builds an annotated tree of all .md paths (excluding mkdocs/, api/, assets/, doc-templates/, _template/, node_modules/, .git/, .github/).
  4. Retrieval — Groq. The function asks Groq (llama-3.1-8b-instant) which files are relevant. Timeout 8 s. Returns up to 10 file paths.
  5. Retrieval — keyword fallback. A synonym-expanded keyword scorer always runs in parallel and its top results are merged with Groq's. This is the safety net for any Groq failure or ambiguous question.
  6. The merged set (≤12 paths) is unioned with ALWAYS_INCLUDE (README.md, company/about-feoda.md, company/glossary.md).
  7. Each file is read from disk and concatenated into a context block.
  8. Answering — Claude. The function calls Anthropic (claude-sonnet-4-5) with the system prompt, the prior conversation turns, and the new question with embedded docs. Timeout 45 s.
  9. Claude is instructed to emit a <!-- SOURCES: ... --> marker listing only the files it actually used. The function strips the marker from the displayed answer and returns { answer, sources }.
  10. Widget renders the answer (lightweight Markdown) and a collapsed list of source links.
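
The merge in steps 5–6 can be sketched as follows (a minimal illustration; `mergeRetrieval` and its exact ordering are assumptions, not the literal api/chat.js code):

```javascript
// Merge Groq's picks with the keyword-fallback results, cap the merged set,
// then union with the always-included files.
const ALWAYS_INCLUDE = ["README.md", "company/about-feoda.md", "company/glossary.md"];

function mergeRetrieval(groqPaths, keywordPaths, cap = 12) {
  const merged = [];
  // Groq's picks come first; keyword results fill the remaining slots.
  for (const p of [...groqPaths, ...keywordPaths]) {
    if (!merged.includes(p) && merged.length < cap) merged.push(p);
  }
  // Union with the always-included set, deduplicated.
  for (const p of ALWAYS_INCLUDE) {
    if (!merged.includes(p)) merged.push(p);
  }
  return merged;
}
```

Note the cap applies to the merged Groq/keyword picks; the three always-included files sit on top, matching the "≤ 12 + 3" budget in §7.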

4. Component reference

4.1 Frontend widget — assets/js/chat-widget.js

  • Vanilla JS, injected on every MkDocs page via extra_javascript (and styled by assets/css/chat-widget.css).
  • State persisted to sessionStorage under feoda-chat-state (history, last-rendered HTML, open/closed).
  • "Clear" button resets local history.
  • Source links convert path/to/file.md to /path/to/file/ and README.md to /.
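
A sketch of that link mapping (illustrative; this assumes MkDocs-style pretty URLs rather than quoting the widget verbatim):

```javascript
// Map a repository doc path to the published site URL.
// README.md is the site root; other .md files become directory-style URLs.
function sourceUrl(docPath) {
  if (docPath === "README.md") return "/";
  return "/" + docPath.replace(/\.md$/, "") + "/";
}
```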

4.2 Serverless handler — api/chat.js

  • Stateless. No server-side persistence.
  • CORS open (*) — acceptable while the whole site is behind Basic Auth.
  • Reads files from process.cwd(). Vercel deploys the repo root, so all .md files are present.
  • Two outbound HTTP calls per request: Groq + Anthropic. Both have explicit AbortController timeouts.
  • Logs Groq failures to stderr and continues with the keyword-fallback set.

4.3 Retrieval — Groq

  • Model: llama-3.1-8b-instant (free tier, sub-second).
  • Input: question + annotated file tree (≈2 KB).
  • Output: JSON array of paths. The handler regex-extracts the array and strips any annotation tail.
  • Cost: $0 within the free tier.
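
The array extraction might look like this (a hedged sketch: the real handler's regex and annotation format may differ):

```javascript
// Pull the first JSON array out of the model reply, then keep only the
// path portion of each entry, dropping any trailing annotation text.
function extractPaths(modelReply) {
  const match = modelReply.match(/\[[\s\S]*?\]/); // first [...] block in the reply
  if (!match) return [];
  let arr;
  try {
    arr = JSON.parse(match[0]);
  } catch {
    return []; // malformed JSON: fall back to the keyword scorer's results
  }
  return arr
    .filter((entry) => typeof entry === "string")
    .map((entry) => (entry.match(/^[\w./-]+\.md/) || [entry.trim()])[0]);
}
```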

4.4 Answering — Anthropic Claude

  • Model: claude-sonnet-4-5.
  • max_tokens: 4000.
  • System prompt anchors Feoda context (products, clients, tone) and the source-marker contract.
  • The whole history is replayed each turn. Cost grows roughly linearly with conversation length; the six-turn cap on the client bounds it.
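
The per-turn payload, with the turn cap re-applied server-side, might be assembled like this (names and message shapes are illustrative assumptions):

```javascript
// Build the Anthropic messages array: recent history, then the new question
// with the retrieved docs embedded. A "turn" is a user/assistant pair, so
// six turns = up to twelve prior messages.
function buildMessages(history, question, contextBlock, maxTurns = 6) {
  const recent = history.slice(-maxTurns * 2); // defence in depth: re-cap server-side
  return [
    ...recent,
    { role: "user", content: contextBlock + "\n\nQuestion: " + question },
  ];
}
```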

4.5 Auth — middleware.js

  • Single shared Basic Auth credential covering the entire site (docs and /api/chat).
  • No per-user identity. No per-role scoping.
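
The shared-credential check amounts to something like this (a Node-style sketch; Vercel Edge middleware would use `atob` rather than `Buffer`, and the real middleware.js may differ):

```javascript
// Validate an "Authorization: Basic <base64(user:pass)>" header against the
// single shared credential. Split on the first ":" only, since the password
// may itself contain colons.
function checkBasicAuth(authHeader, user, pass) {
  if (!authHeader || !authHeader.startsWith("Basic ")) return false;
  const decoded = Buffer.from(authHeader.slice(6), "base64").toString("utf8");
  const i = decoded.indexOf(":");
  if (i === -1) return false;
  return decoded.slice(0, i) === user && decoded.slice(i + 1) === pass;
}
```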

5. Data and trust boundaries

| Boundary | Trust | Notes |
| --- | --- | --- |
| Browser → Vercel Edge | Untrusted | Basic Auth gate. CORS open. |
| chat-widget.js → api/chat.js | Untrusted (same origin) | No auth header sent on the API call; the page itself is gated by Basic Auth. |
| api/chat.js ↔ Groq | Trusted by Feoda | Free tier. Internal documentation content is sent in the request. Provider listed on Approved Provider List. |
| api/chat.js ↔ Anthropic | Trusted by Feoda | Documentation content + question + recent history sent. Provider listed on Approved Provider List. |
| api/chat.js ↔ filesystem (.md) | Trusted | Bundled with the deploy. No write paths from the function. |

Data classification. All documentation in this repository is currently treated as Internal. No client PII or live billing data is exposed via this assistant.

No client data egress today. Client folders contain implementation knowledge (configurations, requirements, integration specs). They do not contain student PII, financial records, or live transaction data.


6. Configuration and secrets

All secrets live in Vercel project environment variables. Never in code or in this repository.

| Variable | Used by | Source |
| --- | --- | --- |
| GROQ_API_KEY | api/chat.js | console.groq.com |
| ANTHROPIC_API_KEY | api/chat.js | console.anthropic.com |
| BASIC_AUTH_USER | middleware.js | manually set in Vercel |
| BASIC_AUTH_PASS | middleware.js | manually set in Vercel |

Rotation policy: see Approved Provider List entries (cadence will be set as part of AP-03).


7. Limits, performance, and cost

| Dimension | Current |
| --- | --- |
| Cold-start latency | ~1–2 s (Vercel Node serverless) |
| Retrieval latency (Groq) | < 1 s typical, hard cap 8 s |
| Answer latency (Claude) | 3–10 s typical, hard cap 45 s |
| Files retrieved per turn | ≤ 12 + 3 always-included |
| History per request | 6 most recent turns (client-enforced) |
| Claude max_tokens per turn | 4000 |
| Per-question cost | Groq $0 + Claude pay-per-token |

Scaling triggers. Move to embeddings/vector retrieval when any of the following holds: corpus exceeds ~500 files, average context exceeds ~80 KB per request, or Groq retrieval precision degrades below an acceptable threshold (to be defined as Outcome KPI in baselines.md).
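
Those triggers reduce to a simple predicate (the file and size thresholds are the provisional numbers above; the precision floor is a placeholder until baselines.md defines the KPI):

```javascript
// Returns true when any scaling trigger fires.
function shouldMigrateToVectors({ fileCount, avgContextKB, retrievalPrecision }) {
  const PRECISION_FLOOR = 0.7; // placeholder value, to be set in baselines.md
  return fileCount > 500 || avgContextKB > 80 || retrievalPrecision < PRECISION_FLOOR;
}
```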


8. Observability — current gaps

| Signal | State |
| --- | --- |
| Per-request structured log | Partial |
| Groq vs keyword-fallback hit rate | Missing |
| Files retrieved per question | Missing |
| Question text retention | None |
| Error alerting | Missing |
| Cost per question | Missing |

This section is the input list for the audit-logging work in AP-05. Closing these gaps is part of the target architecture, not the current build.


9. Known limitations of the current build

These are statements about what the as-built system does not do today. Resolutions live in target-architecture.md and the relevant action plans.

  1. No identity. Single shared Basic Auth credential. No per-user logging, no per-user rate limiting, no per-user scope.
  2. No role-based document scoping. Every authenticated user can ask about every document. Client-specific folders are not isolated from each other.
  3. CORS is open. Acceptable only because the entire site is gated by Basic Auth.
  4. No server-side audit trail. Question text, retrieved files, and answers are not persisted anywhere by Feoda. (Provider-side retention is governed by Anthropic and Groq policies.)
  5. No rate limiting. A misbehaving client can drive Anthropic cost without bound.
  6. No multi-channel access. Chat is only available inside the Basic-Auth-gated docs site.
  7. Single-region serverless. No latency or residency engineering.
  8. No automated evals. Answer quality is not measured against a regression set.

10. Status log

| Date | Note |
| --- | --- |
| 2026-04-22 | Initial as-built description. Future-state content moved out to target-architecture.md (to be authored). |
---
title: AI Platform Architecture
description: Architecture of the Feoda Documentation AI Assistant — retrieval, answering, deployment, and roadmap to role-aware multi-channel access.
author: Head of Technology
created: 2026-04-22
last-updated: 2026-04-22
status: draft
audience: Technology, Engineering
tags: [architecture, ai, chat, vercel, groq, claude]
owner: Head of Technology
related-documents:
- ../../../ROADMAP.md
- ../../../company/strategy/ai-strategy.md
- ../../../company/strategy/approved-providers.md
- ../../../api/chat.js
- ../../../assets/js/chat-widget.js
---

AI Platform Architecture

This document is the technical reference for Feoda's internal AI Documentation Assistant. It describes the as-built Phase 2 system, the deployment topology, the security model, and the path to Phase 3 (role-aware access) and Phase 4 (client-facing channels).

The corresponding business strategy is governed by the Corporate AI Strategy. All providers used here must appear on the Approved Provider List.


1. Goals and non-goals

Goals

  • G1. Answer staff questions about Feoda's products, processes, clients, and technical knowledge using the documentation in this repository as the single source of truth.
  • G2. Cite the exact files used to produce each answer.
  • G3. Run on the existing Vercel deployment with no separate backend or vector database.
  • G4. Keep retrieval cost at zero or near zero (free Groq tier) and answer cost predictable (Anthropic Claude per-call).
  • G5. Provide a path to role-based document scoping (Phase 3) without re-architecting.

Non-goals (Phase 2)

  • NG1. Multi-tenant client-facing access. Deferred to Phase 4.
  • NG2. Vector embeddings or semantic indexing. Deferred until corpus exceeds token budget.
  • NG3. Persistent server-side conversation storage. Conversation history lives in the client's sessionStorage only.
  • NG4. Fine-tuning or custom model training.

2. System overview

                ┌──────────────────────────────────────────┐
                │        Vercel project (single)           │
                │                                          │
  Browser  ──▶  │  middleware.js (Basic Auth — Phase 1)    │
                │           │                              │
                │           ▼                              │
                │  Static MkDocs site (HTML/CSS/JS)        │
                │           │                              │
                │           │ chat-widget.js (POST)        │
                │           ▼                              │
                │  api/chat.js  (Vercel Serverless Fn)     │
                │     ├── walkDocs() reads .md from disk   │
                │     ├── Groq API  ── retrieval ──┐       │
                │     ├── keyword fallback ────────┤       │
                │     └── Anthropic API ── answering       │
                │                                          │
                └──────────────────────────────────────────┘

Everything ships in one Vercel deploy. The .md files are present on the function's filesystem because they are part of the deployed bundle.


3. Request flow

A single chat turn:

  1. User types in the widget rendered by assets/js/chat-widget.js.
  2. Widget POSTs { question, history } to /api/chat. history is the last six turns from sessionStorage.
  3. api/chat.js builds an annotated tree of all .md paths (excluding mkdocs/, api/, assets/, doc-templates/, _template/, node_modules/, .git/, .github/).
  4. Retrieval — Groq. The function asks Groq (llama-3.1-8b-instant) which files are relevant. Timeout 8 s. Returns up to 10 file paths.
  5. Retrieval — keyword fallback. A synonym-expanded keyword scorer always runs in parallel and its top results are merged with Groq's. This is the safety net for any Groq failure or ambiguous question.
  6. The merged set (≤12 paths) is unioned with ALWAYS_INCLUDE (README.md, company/about-feoda.md, company/glossary.md).
  7. Each file is read from disk and concatenated into a context block.
  8. Answering — Claude. The function calls Anthropic (claude-sonnet-4-5) with the system prompt, the prior conversation turns, and the new question with embedded docs. Timeout 45 s.
  9. Claude is instructed to emit a <!-- SOURCES: ... --> marker listing only the files it actually used. The function strips the marker from the displayed answer and returns { answer, sources }.
  10. Widget renders the answer (lightweight Markdown) and a collapsed list of source links.
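
Step 9's marker handling can be sketched like this (illustrative parsing; the marker format is as described above, but the code is not the literal handler):

```javascript
// Split Claude's raw reply into the displayed answer and the source list,
// stripping the <!-- SOURCES: ... --> marker from the visible text.
function splitAnswer(raw) {
  const m = raw.match(/<!--\s*SOURCES:\s*([^>]*?)-->/);
  const sources = m ? m[1].split(",").map((s) => s.trim()).filter(Boolean) : [];
  const answer = raw.replace(/<!--\s*SOURCES:[\s\S]*?-->/, "").trim();
  return { answer, sources };
}
```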

4. Component reference

4.1 Frontend widget — assets/js/chat-widget.js

  • Vanilla JS, injected on every MkDocs page via extra_javascript (and styled by assets/css/chat-widget.css).
  • State persisted to sessionStorage under feoda-chat-state (history, last-rendered HTML, open/closed).
  • "Clear" button resets local history.
  • Source links convert path/to/file.md to /path/to/file/ and README.md to /.

4.2 Serverless handler — api/chat.js

  • Stateless. No server-side persistence.
  • CORS open (*) — acceptable while the whole site is behind Basic Auth. Tighten to the docs domain when Basic Auth is replaced (Phase 3).
  • Reads files from process.cwd(). Vercel deploys the repo root, so all .md files are present.
  • Two outbound HTTP calls per request: Groq + Anthropic. Both have explicit AbortController timeouts.
  • Logs Groq failures to stderr and continues with the keyword-fallback set.

4.3 Retrieval — Groq

  • Model: llama-3.1-8b-instant (free tier, sub-second).
  • Input: question + annotated file tree (≈2 KB).
  • Output: JSON array of paths. The handler regex-extracts the array and strips any annotation tail.
  • Cost: $0 within the free tier.

4.4 Answering — Anthropic Claude

  • Model: claude-sonnet-4-5.
  • max_tokens: 4000.
  • System prompt anchors Feoda context (products, clients, tone) and the source-marker contract.
  • The whole history is replayed each turn. Cost grows roughly linearly with conversation length; the six-turn cap on the client bounds it.

4.5 Auth — middleware.js

  • Phase 1 only. Single shared Basic Auth credential.
  • Replaced by Clerk in Phase 3 (see §8).

5. Data and trust boundaries

| Boundary | Trust | Notes |
| --- | --- | --- |
| Browser → Vercel Edge | Untrusted | Basic Auth gate. CORS to be tightened post-Phase 1. |
| chat-widget.js → api/chat.js | Untrusted (same origin) | No auth header sent today; the page itself is gated by Basic Auth. |
| api/chat.js ↔ Groq | Trusted by Feoda | Free tier. Internal documentation content is sent in the request. Provider must remain on Approved Provider List. |
| api/chat.js ↔ Anthropic | Trusted by Feoda | Documentation content + question + recent history sent. Provider must remain on Approved Provider List. |
| api/chat.js ↔ filesystem (.md) | Trusted | Bundled with the deploy. No write paths from the function. |

Data classification. All documentation in this repository is currently treated as Internal. No client PII or live billing data is exposed via this assistant. Any future expansion of the corpus must be checked against the AI Strategy data-classification rules before being indexed.

No client data egress today. Client folders contain implementation knowledge (configurations, requirements, integration specs). They do not contain student PII, financial records, or live transaction data.


6. Configuration and secrets

All secrets live in Vercel project environment variables. Never in code or in this repository.

| Variable | Used by | Source |
| --- | --- | --- |
| GROQ_API_KEY | api/chat.js | console.groq.com |
| ANTHROPIC_API_KEY | api/chat.js | console.anthropic.com |
| BASIC_AUTH_USER | middleware.js | Phase 1 — to be removed |
| BASIC_AUTH_PASS | middleware.js | Phase 1 — to be removed |

Rotation policy: see Approved Provider List entries (cadence will be set as part of AP-03).


7. Limits, performance, and cost

| Dimension | Current |
| --- | --- |
| Cold-start latency | ~1–2 s (Vercel Node serverless) |
| Retrieval latency (Groq) | < 1 s typical, hard cap 8 s |
| Answer latency (Claude) | 3–10 s typical, hard cap 45 s |
| Files retrieved per turn | ≤ 12 + 3 always-included |
| History per request | 6 most recent turns (client-enforced) |
| Claude max_tokens per turn | 4000 |
| Per-question cost | Groq $0 + Claude pay-per-token |

Scaling triggers. Move to embeddings/vector retrieval when any of the following holds: corpus exceeds ~500 files, average context exceeds ~80 KB per request, or Groq retrieval precision degrades below an acceptable threshold (to be defined as Outcome KPI in baselines.md).


8. Phase 3 — role-aware access (planned)

Phase 3 replaces Basic Auth with Clerk and makes both the docs site and the AI assistant role-aware.

Required changes

  1. Auth. middleware.js is replaced with Clerk Edge middleware. Anonymous access is blocked.
  2. Role propagation. The widget reads the Clerk session and forwards { role, client_tag } (or a verified JWT) to /api/chat.
  3. Server-side verification. api/chat.js verifies the Clerk session token. Never trust role claims from the request body alone.
  4. Scope filter. Before passing allFiles into retrieval, filter by path prefix:
       • admin, staff → all paths
       • client:pymble → clients/pymble/** plus a small allow-listed shared set (e.g. solutions/arm/** overview pages, company/glossary.md)
       • and similarly per client tag
  5. Source-link sanitisation. The widget must hide source links the user is not authorised to view (it already links only to docs the user could navigate to, but defence in depth is required when scopes diverge).
  6. Rate limiting. Per-user rate limits in api/chat.js (e.g. a token bucket keyed on the Clerk user id) to bound cost per identity.
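
The scope filter could be as small as this sketch (the prefix map is an illustrative assumption, not the canonical role mapping, which §11 leaves open):

```javascript
// Map a role or client tag to allowed path prefixes, then filter the file
// list before retrieval ever sees it. "**" means unrestricted.
const SCOPES = {
  admin: ["**"],
  staff: ["**"],
  "client:pymble": ["clients/pymble/", "solutions/arm/", "company/glossary.md"],
};

function scopeFiles(allFiles, roleOrTag) {
  const prefixes = SCOPES[roleOrTag] || []; // unknown identity: empty scope, never default wide
  if (prefixes.includes("**")) return allFiles;
  return allFiles.filter((f) => prefixes.some((p) => f === p || f.startsWith(p)));
}
```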

Migration order

  1. Provision Clerk, configure roles, invite staff.
  2. Deploy Clerk middleware alongside Basic Auth in a feature-flagged branch.
  3. Switch api/chat.js to verify Clerk sessions but keep an allow-all scope filter.
  4. Enable per-role scope filter and validate against test accounts for each client.
  5. Remove Basic Auth and the legacy env vars.

9. Phase 4 — multi-channel client access (planned)

The Phase 2 architecture is reused, with channel adapters added in front of the same api/chat.js:

  • WhatsApp. Meta Cloud API webhook → Vercel function that maps the WhatsApp identity to a Feoda client tag, then calls the same answering pipeline with the appropriate scope.
  • Telegram. Telegram Bot webhook, same pattern.
  • Standalone client chat page. Public-facing route under /client-chat/*, gated by the same Clerk-issued client identity.

In every channel the scope filter and Approved Provider List rules apply identically. Channel adapters are translation layers, not policy layers.
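
The "translation layer, not policy layer" rule means an adapter can stay tiny. A hypothetical WhatsApp adapter might look like this (all names and shapes are assumptions, not a Meta Cloud API reference):

```javascript
// Translate an inbound WhatsApp message into the common chat-request shape.
// identityMap is a hypothetical phone-number → client-tag lookup; policy
// (scoping, provider rules) stays in the shared pipeline behind api/chat.js.
function whatsappToChatRequest(webhookMsg, identityMap) {
  const clientTag = identityMap[webhookMsg.from];
  if (!clientTag) return null; // unknown sender: reject rather than default to a wide scope
  return { question: webhookMsg.text, scope: clientTag };
}
```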


10. Observability (current gaps)

| Signal | State | Action |
| --- | --- | --- |
| Per-request structured log | Partial | Standardise log line format; emit JSON per request to stdout |
| Groq vs keyword-fallback hit rate | Missing | Add counter to log line |
| Files retrieved per question | Missing | Log retrieved set + Claude-claimed sources |
| Question text retention | None | Decide retention policy under AP-05 (Audit Logging) |
| Error alerting | Missing | Wire Vercel logs → email/Slack on 5xx |
| Cost per question | Missing | Aggregate Anthropic usage from API responses |

This section is the input list for the audit-logging work in AP-05.
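
The per-request JSON log line proposed above might take a shape like this (field names are suggestions, not an agreed schema; question text is deliberately omitted pending the AP-05 retention decision):

```javascript
// Build one structured log line per chat request, covering the gaps in the
// table: retrieval mode, retrieved vs claimed sources, latency, and token
// usage for cost-per-question aggregation.
function buildLogLine({ retrievalMode, retrieved, claimedSources, latencyMs, usage }) {
  return JSON.stringify({
    ts: new Date().toISOString(),
    retrievalMode,          // "groq", "keyword-fallback", or "merged"
    retrievedCount: retrieved.length,
    retrieved,              // paths sent to Claude
    claimedSources,         // paths Claude's SOURCES marker reported
    latencyMs,
    usage,                  // Anthropic usage object, for cost aggregation
  });
}
```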


11. Open questions

  1. Source-link permissions in Phase 3. What is the canonical mapping from Clerk role to allow-listed path prefixes? To be defined alongside Clerk rollout.
  2. History retention. Is the client-only sessionStorage retention sufficient for governance, or do we need a server-side audit trail of questions and answers? Resolve under AP-05.
  3. Embedding migration trigger. Define the precise corpus-size and latency thresholds at which we switch to a vector index.
  4. Provider redundancy. If Anthropic is unavailable, do we degrade to a Groq-only answer or return a "service degraded" message? Default today: error.
  5. Client-facing channel persona. How much of the system prompt changes when the same backend is exposed via WhatsApp/Telegram? Phase 4 design item.

12. Status log

| Date | Note |
| --- | --- |
| 2026-04-22 | Initial draft. As-built description of Phase 2 plus Phase 3/4 plan. |