AI Platform — Current State¶
This document describes the as-built Feoda AI assistant as it exists on 2026-04-22. It is the documentation-site assistant deployed in Phase 2 of the platform roadmap. It does not describe the target corporate AI Platform — see target-architecture.md (planned) for that, and subsystems/ for per-subsystem designs.
The corresponding business strategy is governed by the Corporate AI Strategy. All providers used here must appear on the Approved Provider List.
1. Goals and non-goals (of the current build)¶
Goals¶
- G1. Answer staff questions about Feoda's products, processes, clients, and technical knowledge using the documentation in this repository as the single source of truth.
- G2. Cite the exact files used to produce each answer.
- G3. Run on the existing Vercel deployment with no separate backend or vector database.
- G4. Keep retrieval cost at zero or near zero (free Groq tier) and answer cost predictable (Anthropic Claude per-call).
Non-goals (of the current build)¶
- NG1. Multi-tenant client-facing access.
- NG2. Vector embeddings or semantic indexing.
- NG3. Persistent server-side conversation storage. Conversation history lives in the client's `sessionStorage` only.
- NG4. Identity-bound or role-aware access. The whole site sits behind a single shared Basic Auth credential.
- NG5. Fine-tuning or custom model training.
These non-goals are deliberate scope choices for the as-built system. They are not statements about the target platform — the target platform must address most of them, and that work belongs in target-architecture.md and the relevant action plans.
2. System overview¶
┌──────────────────────────────────────────┐
│ Vercel project (single) │
│ │
Browser ──▶ │ middleware.js (Basic Auth) │
│ │ │
│ ▼ │
│ Static MkDocs site (HTML/CSS/JS) │
│ │ │
│ │ chat-widget.js (POST) │
│ ▼ │
│ api/chat.js (Vercel Serverless Fn) │
│ ├── walkDocs() reads .md from disk │
│ ├── Groq API ── retrieval ──┐ │
│ ├── keyword fallback ────────┤ │
│ └── Anthropic API ── answering │
│ │
└──────────────────────────────────────────┘
Everything ships in one Vercel deploy. The .md files are present on the function's filesystem because they are part of the deployed bundle.
3. Request flow¶
A single chat turn:
- User types in the widget rendered by assets/js/chat-widget.js.
- Widget POSTs `{ question, history }` to `/api/chat`. `history` is the last six turns from `sessionStorage`.
- api/chat.js builds an annotated tree of all `.md` paths (excluding `mkdocs/`, `api/`, `assets/`, `doc-templates/`, `_template/`, `node_modules/`, `.git/`, `.github/`).
- Retrieval — Groq. The function asks Groq (`llama-3.1-8b-instant`) which files are relevant. Timeout 8 s. Returns up to 10 file paths.
- Retrieval — keyword fallback. A synonym-expanded keyword scorer always runs in parallel and its top results are merged with Groq's. This is the safety net for any Groq failure or ambiguous question.
- The merged set (≤12 paths) is unioned with `ALWAYS_INCLUDE` (`README.md`, `company/about-feoda.md`, `company/glossary.md`).
- Each file is read from disk and concatenated into a context block.
- Answering — Claude. The function calls Anthropic (`claude-sonnet-4-5`) with the system prompt, the prior conversation turns, and the new question with embedded docs. Timeout 45 s.
- Claude is instructed to emit a `<!-- SOURCES: ... -->` marker listing only the files it actually used. The function strips the marker from the displayed answer and returns `{ answer, sources }`.
- Widget renders the answer (lightweight Markdown) and a collapsed list of source links.
4. Component reference¶
4.1 Frontend widget — assets/js/chat-widget.js¶
- Vanilla JS, injected on every MkDocs page via `extra_javascript` (and styled by assets/css/chat-widget.css).
- State persisted to `sessionStorage` under `feoda-chat-state` (history, last-rendered HTML, open/closed).
- "Clear" button resets local history.
- Source links convert `path/to/file.md` to `/path/to/file/` and `README.md` to `/`.
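The source-link conversion in the last bullet reduces to a one-liner; `docUrl` is an illustrative name, not the widget's actual function:

```javascript
// Map a repository .md path to the URL MkDocs serves it at:
// path/to/file.md -> /path/to/file/, and README.md -> site root.
function docUrl(mdPath) {
  if (mdPath === "README.md") return "/";
  return "/" + mdPath.replace(/\.md$/, "") + "/";
}
```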
4.2 Serverless handler — api/chat.js¶
- Stateless. No server-side persistence.
- CORS open (`*`) — acceptable while the whole site is behind Basic Auth.
- Reads files from `process.cwd()`. Vercel deploys the repo root, so all `.md` files are present.
- Two outbound HTTP calls per request: Groq + Anthropic. Both have explicit `AbortController` timeouts.
- Logs Groq failures to stderr and continues with the keyword-fallback set.
4.3 Retrieval — Groq¶
- Model: `llama-3.1-8b-instant` (free tier, sub-second).
- Input: question + annotated file tree (≈2 KB).
- Output: JSON array of paths. The handler regex-extracts the array and strips any annotation tail.
- Cost: $0 within the free tier.
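The "regex-extracts the array" step can be sketched as below. The exact annotation format the model appends is not specified here, so the tail-stripping separator is an assumption:

```javascript
// Illustrative sketch: pull the first JSON array out of a model reply
// and keep only string entries, capped at `max` paths. The " — "
// separator used for annotation tails is an assumed format.
function extractPaths(reply, max = 10) {
  const match = reply.match(/\[[\s\S]*?\]/);
  if (!match) return [];
  try {
    return JSON.parse(match[0])
      .filter((p) => typeof p === "string")
      .map((p) => p.split(" — ")[0].trim())
      .slice(0, max);
  } catch {
    return []; // malformed JSON → empty set; the keyword fallback covers it
  }
}
```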
4.4 Answering — Anthropic Claude¶
- Model: `claude-sonnet-4-5`. `max_tokens: 4000`.
- System prompt anchors Feoda context (products, clients, tone) and the source-marker contract.
- The whole `history` is replayed each turn. Cost grows roughly linearly with conversation length; the six-turn cap on the client bounds it.
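The source-marker contract from §3 reduces to a small post-processing step; `splitAnswer` is an illustrative name for it:

```javascript
// Strip the <!-- SOURCES: ... --> marker from Claude's reply and
// return the cleaned answer plus the claimed source paths.
function splitAnswer(raw) {
  const re = /<!--\s*SOURCES:\s*([\s\S]*?)-->/;
  const match = raw.match(re);
  const sources = match
    ? match[1].split(",").map((s) => s.trim()).filter(Boolean)
    : [];
  return { answer: raw.replace(re, "").trim(), sources };
}
```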
4.5 Auth — middleware.js¶
- Single shared Basic Auth credential covering the entire site (docs and `/api/chat`).
- No per-user identity. No per-role scoping.
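A minimal sketch of the credential check behind this gate, assuming the env vars from §6. Node's `Buffer` is used for brevity where the Edge runtime would use `atob`, and a real middleware must also send a 401 with a `WWW-Authenticate` challenge when the check fails:

```javascript
// Compare an incoming Authorization header against the shared credential.
// Sketch only — not the actual middleware.js.
function checkBasicAuth(authHeader, user, pass) {
  if (!authHeader || !authHeader.startsWith("Basic ")) return false;
  const decoded = Buffer.from(authHeader.slice(6), "base64").toString("utf8");
  return decoded === `${user}:${pass}`;
}
```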
5. Data and trust boundaries¶
| Boundary | Trust | Notes |
|---|---|---|
| Browser → Vercel Edge | Untrusted | Basic Auth gate. CORS open. |
| `chat-widget.js` ↔ `api/chat.js` | Untrusted (same origin) | No auth header sent on the API call; the page itself is gated by Basic Auth. |
| `api/chat.js` ↔ Groq | Trusted by Feoda | Free tier. Internal documentation content is sent in the request. Provider listed on the Approved Provider List. |
| `api/chat.js` ↔ Anthropic | Trusted by Feoda | Documentation content + question + recent history sent. Provider listed on the Approved Provider List. |
| `api/chat.js` ↔ filesystem (`.md`) | Trusted | Bundled with the deploy. No write paths from the function. |
Data classification. All documentation in this repository is currently treated as Internal. No client PII or live billing data is exposed via this assistant.
No client data egress today. Client folders contain implementation knowledge (configurations, requirements, integration specs). They do not contain student PII, financial records, or live transaction data.
6. Configuration and secrets¶
All secrets live in Vercel project environment variables. Never in code or in this repository.
| Variable | Used by | Source |
|---|---|---|
| `GROQ_API_KEY` | `api/chat.js` | console.groq.com |
| `ANTHROPIC_API_KEY` | `api/chat.js` | console.anthropic.com |
| `BASIC_AUTH_USER` | `middleware.js` | manually set in Vercel |
| `BASIC_AUTH_PASS` | `middleware.js` | manually set in Vercel |
Rotation policy: see Approved Provider List entries (cadence will be set as part of AP-03).
7. Limits, performance, and cost¶
| Dimension | Current |
|---|---|
| Cold-start latency | ~1–2 s (Vercel Node serverless) |
| Retrieval latency (Groq) | < 1 s typical, hard cap 8 s |
| Answer latency (Claude) | 3–10 s typical, hard cap 45 s |
| Files retrieved per turn | ≤ 12 + 3 always-included |
| History per request | 6 most recent turns (client-enforced) |
| Claude `max_tokens` per turn | 4000 |
| Per-question cost | Groq $0 + Claude pay-per-token |
Scaling triggers. Move to embeddings/vector retrieval when any of the following holds: corpus exceeds ~500 files, average context exceeds ~80 KB per request, or Groq retrieval precision degrades below an acceptable threshold (to be defined as Outcome KPI in baselines.md).
8. Observability — current gaps¶
| Signal | State |
|---|---|
| Per-request structured log | Partial |
| Groq vs keyword-fallback hit rate | Missing |
| Files retrieved per question | Missing |
| Question text retention | None |
| Error alerting | Missing |
| Cost per question | Missing |
This section is the input list for the audit-logging work in AP-05. Closing these gaps is part of the target architecture, not the current build.
9. Known limitations of the current build¶
These are statements about what the as-built system does not do today. Resolutions live in target-architecture.md and the relevant action plans.
- No identity. Single shared Basic Auth credential. No per-user logging, no per-user rate limiting, no per-user scope.
- No role-based document scoping. Every authenticated user can ask about every document. Client-specific folders are not isolated from each other.
- CORS is open. Acceptable only because the entire site is gated by Basic Auth.
- No server-side audit trail. Question text, retrieved files, and answers are not persisted anywhere by Feoda. (Provider-side retention is governed by Anthropic and Groq policies.)
- No rate limiting. A misbehaving client can drive Anthropic cost without bound.
- No multi-channel access. Chat is only available inside the Basic-Auth-gated docs site.
- Single-region serverless. No latency or residency engineering.
- No automated evals. Answer quality is not measured against a regression set.
10. Status log¶
| Date | Note |
|---|---|
| 2026-04-22 | Initial as-built description. Future-state content moved out to target-architecture.md (to be authored). |