# AI Platform — Target Architecture
This document defines the target-state corporate Feoda AI Platform. It is the technical realisation of the Corporate AI Strategy commitment to embed AI at Level 4 maturity across all Feoda departments and to extend AI capability to clients through Feoda-hosted channels.
The target architecture is delivered through vertical slices (V1–V5+) — each slice is shippable end-to-end and adds one new capability while reusing all foundations. The first slice (V1) migrates the existing documentation assistant — currently described in current-state.md — onto the platform foundations defined here, validating the whole stack with the lowest-risk subsystem.
Status. This is a draft target. It reflects decisions taken on 2026-04-23 between the Head of Technology and the AI assistant supporting the platform design. Open questions are listed in §13. Specific vendor and product choices may evolve as the first vertical slice is built; substantive changes will be tracked in §14.
Revision — 2026-04-28. Vendor choices for the gateway, vector index, and source connectors were re-evaluated against the Corporate AI Strategy governance requirements and the principle that the strategy fixes goals and governance, not vendors. The revision swaps Portkey → Cloudflare AI Gateway + a thin policy Worker, Pinecone Assistant → Cloudflare Vectorize + D1 FTS5 + reranker-via-gateway, and Nango → native connectors (Nango retained as future option). The agentic answering pattern (single orchestrator with a tool catalog) is added as the default in §5.6, replacing one-shot retrieval. Identity, audit, ACL, PII, allowlist, rate-limit, and spend-cap commitments are unchanged — see §14 for the full diff and rationale.
## 1. Goals and non-goals
### Goals
- G1. Provide a single, governed AI platform that any Feoda subsystem (channel or workflow) can consume.
- G2. Enforce policy in code, not in process: the Approved Provider List is technically enforced at the gateway; non-listed providers cannot be called.
- G3. Identity-bind every AI call to a real principal so that audit, rate-limiting, and access scoping are possible.
- G4. Isolate client data at the retrieval layer: no client tenant can see another client's documents through the AI, even via prompt injection.
- G5. Stay cost-efficient. Default to managed services with free tiers for MVP; only pay when a service crosses scale that justifies it.
- G6. Keep the platform portable: subsystem code must run unchanged on Cloudflare Workers, on a small always-on box, or on AWS later if cost or residency demands a move.
- G7. Deliver vertically. Each milestone is a thin end-to-end slice that proves the foundations, not a horizontal foundation built in isolation.
### Non-goals
- NG1. Building any of the platform foundations in-house when a credible managed alternative exists.
- NG2. Mandating data residency. UAE PDPL, KSA PDPL, and Australian APPs all permit US-based AI providers under DPA + consent + PII-minimisation. Residency engineering is reserved for specific client contractual demands.
- NG3. Embedding AI inside the ARM/ERP/EPM products themselves. Product-embedded AI is governed by separate product-architecture documents.
- NG4. Per-client model fine-tuning. The platform uses prompt overlays and RAG ACLs for per-client behaviour; fine-tuning is reserved for the rare case where prompt + retrieval cannot achieve the outcome.
## 2. System overview
┌──────────────────────────── Identity ────────────────────────────┐
│ T1 Feoda staff → Microsoft Entra ID (M365 SSO, free) │
│ T2 Client staff → Federated SSO / Clerk email+password │
│ T3 End-users → Clerk OTP / verified channel ID │
└──────────────────────────────────────────────────────────────────┘
│ all normalise to one principal
▼
┌──────────────────────── Channels & Workflows ────────────────────┐
│ Channels (inbound): Docs widget · Teams bot · Web widget · │
│ In-product chat (ARM) · Telegram · WApp │
│ Workflows (event-driven outbound): │
│ Code review · SDR · Support triage · │
│ Reporting · Digest │
└──────────────────────────────────────────────────────────────────┘
│ every AI call
▼
┌──── Policy Worker (Cloudflare) + AI Gateway (Cloudflare) ────────┐
│ Policy Worker (in-house, ~one file): │
│ • Hard provider/model allowlist (Approved Provider List) │
│ • PII redaction in transit (UAE Emirates ID, KSA national ID, │
│ AU TFN, emails, phones, PANs) │
│ • Attaches principal_id as gateway rate-limit / budget key │
│ • Emits structured audit event │
│ AI Gateway (managed): │
│ • Per-principal rate limits and spend caps │
│ • Provider failover, response cache │
│ • Logpush → Axiom │
└──────────────────────────────────────────────────────────────────┘
│
┌───────────────┼────────────────┐
▼ ▼ ▼
┌──────────┐ ┌────────────┐ ┌──────────┐
│ Models │ │ Retrieval │ │ Tools │
│ Anthropic│ │ Vectorize │ │ Native │
│ direct + │ │ + D1 FTS5 │ │ first │
│ OpenRouter│ │ + reranker │ │ (GitHub │
│ (failover│ │ │ │ Action, │
│ + bulk) │ │ ACL by │ │ NetSuite│
│ │ │ client_tag │ │ RESTlet,│
│ Claude │ │ + audience │ │ M365 │
│ Sonnet │ │ + namespace│ │ Graph) │
│ default │ │ per tenant │ │ Nango as │
│ │ │ │ │ fallback │
└──────────┘ └────────────┘ └──────────┘
▲
│ ingestion (event-driven sync)
┌───────────────┼────────────────┐
│ │ │
GitHub repo SharePoint / NetSuite /
(this docs) Drive / Gmail ARM DB / CRM
(source of (source of (source of
truth) truth) truth)
Three principles are non-negotiable in this picture:
- Single gateway. Every AI call from every subsystem goes through the Policy Worker → AI Gateway path. There is no second path.
- Source-of-truth stays put. Documents live in GitHub, SharePoint, NetSuite, etc. Vectorize (and D1 FTS5) is an index over them, not a competing store. Authority always resolves back to the original system.
- Identity is mandatory. No anonymous AI call exists in the platform. Every call carries a verified principal.
## 3. Identity layer
### 3.1 Three user tiers
| Tier | Population | Identity provider | Auth methods | Default scope |
|---|---|---|---|---|
| T1 Feoda staff | Internal Feoda employees | Microsoft Entra ID (bundled with our M365 subscription, free) | M365 SSO, MFA via Entra | All internal docs, all client docs (subject to role) |
| T2 Client staff | School/university staff (Pymble finance, Saint Edwards admin, Al Faisal IT, etc.) | Federated SSO where the client has Entra B2B / Google Workspace SAML; Clerk email+password as fallback | SSO if available, else email+password+MFA | Their own client tenant only |
| T3 Client end-users | Parents, students, families | Clerk for web (email+password / OTP / magic link); verified channel identity for messaging (WhatsApp number, Telegram ID) | Per channel | Their own records only — never another family's data |
All three tiers normalise to a single internal principal:
principal {
id: surrogate UUID // never a name or email
tier: T1 | T2 | T3
role: admin | staff | client_admin | client_staff | end_user | …
client_tag: null | "pymble" | "saint-edwards" | "al-faisal" | …
identifier: opaque pointer back to the identity provider's user record
}
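As a sketch, the principal record maps naturally onto a TypeScript type, with a normaliser per identity tier. The Clerk payload shape and the `surrogateIdFor` helper below are illustrative assumptions, not Clerk's actual API; in practice the surrogate lookup reads the sealed identity table described in §8.4.

```typescript
type Tier = "T1" | "T2" | "T3";

interface Principal {
  id: string;          // surrogate UUID — never a name or email
  tier: Tier;
  role: string;        // e.g. "staff:engineering", "end_user:pymble"
  client_tag: string | null;
  identifier: string;  // opaque pointer back to the IdP's user record
}

// Hypothetical: look up (or mint) the surrogate UUID for an IdP user id.
// A deterministic stand-in here so the sketch is self-contained.
function surrogateIdFor(idpUserId: string): string {
  return "uuid-for-" + idpUserId;
}

// Normalise a T3 (Clerk) identity to the single internal principal shape.
function normaliseClerkEndUser(clerkUserId: string, clientTag: string): Principal {
  return {
    id: surrogateIdFor(clerkUserId),
    tier: "T3",
    role: `end_user:${clientTag}`,
    client_tag: clientTag,
    identifier: `clerk:${clerkUserId}`,
  };
}
```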
Audit logs and the gateway only ever see principal.id and client_tag. Resolving an id to a real name or email requires a separate, audited operation in the support tool — see §8.4.
### 3.2 Why Entra for staff and Clerk for clients
Entra is already paid for through our M365 subscription, supports SAML/OIDC, and gives staff one-click SSO using the credentials they already use for Outlook and SharePoint. There is no reason to layer a second IdP for internal users. Clerk is reserved for client-facing tiers because (a) we cannot put external users in the Feoda Entra tenant without polluting it, and (b) Clerk has cleaner support for OTP, magic links, and social login at zero cost up to 50k MAU.
Two separate Clerk applications are provisioned: one for T2 (client staff portal) and one for T3 (end-user channels), so the surfaces are clearly separated and either can evolve independently.
### 3.3 Roles and permissions
Roles are defined and assigned in the platform's own admin UI, not in the IdPs. The IdP only authenticates; the platform authorises. This keeps role evolution decoupled from identity vendor changes.
Initial role catalogue (extensible by the admin):
- admin — Feoda Head of Technology and a deputy. Full platform configuration access.
- staff:engineering, staff:sales, staff:support, staff:delivery, staff:finance, staff:hr, staff:executive — department roles for T1.
- client_admin:<slug>, client_staff:<slug> — per-client roles for T2.
- end_user:<slug> — per-client end-user role for T3.
Role-to-scope mapping (which docs each role can ask the AI about) is enforced at the retrieval layer through Vectorize namespace selection and metadata filters — see §5.4.
## 4. AI gateway
### 4.1 Choice: Cloudflare AI Gateway + in-house Policy Worker
The gateway is split into two cooperating components, both running on Cloudflare:
- Policy Worker — a small in-house Cloudflare Worker (~1 file, no external dependencies) that every subsystem calls. It enforces the Approved Provider/model allowlist, runs PII redaction on the prompt, attaches the principal_id as the rate-limit/budget key, and emits the structured audit event. This is where the strategy's policy-in-code commitment lives.
- Cloudflare AI Gateway (managed) — sits behind the Policy Worker. Provides per-key rate limits, per-key spend caps, provider failover, response cache, and Logpush to Axiom. Upstreams configured: Anthropic (default), OpenRouter (failover + bulk/cheap models), Cohere or Voyage (reranker only).
No subsystem code calls a model provider directly; all calls are proxied through the Policy Worker → AI Gateway path.
Why this combination instead of Portkey, or going direct, or AWS Bedrock / Vertex AI:
- vs Portkey (our 2026-04-23 choice, now superseded — see §14): Cloudflare AI Gateway covers rate limits, spend caps, failover, cache, and audit natively at $0 within our scale. Portkey's two unique features (hard allowlist enforcement and PII redaction) are not better than ~100 lines of Worker code we'd need anyway, since Portkey's regex set is mediocre for UAE/KSA national IDs. Removing Portkey saves $49/mo and one third-party governance dependency.
- vs direct provider SDKs: every subsystem would need its own keys, its own rate-limit logic, its own audit, its own PII handling. Policy becomes process, not code.
- vs AWS Bedrock / Vertex: vendor lock-in, opaque pricing, fewer models, region restrictions. The marginal capability gain does not outweigh the lock-in cost for our scale.
- vs LiteLLM self-hosted: free in software, adds an always-on service to operate. Reserved as a future option if AI Gateway pricing or capability becomes limiting.
Provider note: Cloudflare, OpenRouter, and the chosen reranker (Cohere or Voyage) are not yet on the Approved Provider List (approved-providers.md) — AP-03 is in progress. They must complete AP-03 evaluation (DPA, residency review, Strategy Council ratification) before production traffic.
### 4.2 Hard provider/model allowlist
The Approved Provider List is mirrored as a hardcoded constant in the Policy Worker (and as the configured upstream set in the AI Gateway — defence in depth). A subsystem that requests a model whose provider is not on the list receives a 403 from the Policy Worker before any upstream call is made. Listing a new provider requires:
- Provider added to approved-providers.md via the AI Strategy governance process (AP-03).
- DPA signed.
- Configuration applied to Policy Worker (code change, code-reviewed) and AI Gateway (config change) by an admin role.
There is no runtime override. The check is in code, not config that a subsystem can pass through.
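A minimal sketch of the in-code check, assuming a hardcoded provider→models map (the model names below are illustrative). Because the map is a compiled constant, changing it requires a code-reviewed deploy — there is nothing a subsystem can override at runtime.

```typescript
// Mirror of the Approved Provider List as a hardcoded constant.
// Entries here are illustrative, not the real list.
const APPROVED: Record<string, string[]> = {
  anthropic: ["claude-sonnet-4-5"],
  openrouter: ["meta-llama/llama-3.1-8b-instruct"],
};

// Resolve a requested model to its approved provider, or null if unlisted.
function providerFor(model: string): string | null {
  for (const [provider, models] of Object.entries(APPROVED)) {
    if (models.includes(model)) return provider;
  }
  return null;
}

// In the Worker's fetch handler: reject before any upstream call is made.
function checkModel(model: string): Response | null {
  if (providerFor(model) === null) {
    return new Response(
      JSON.stringify({ error: "provider_not_approved", model }),
      { status: 403, headers: { "content-type": "application/json" } },
    );
  }
  return null; // allowed — continue to PII redaction and forwarding
}
```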
### 4.3 Per-principal rate limits and spend caps
The Policy Worker attaches principal.id as the AI Gateway rate-limit and budget key on every request. The AI Gateway applies:
- Request rate: e.g. 60 requests/minute for staff, 10 requests/minute for end-users (tunable per role).
- Spend cap: a daily token-cost ceiling per principal, per role. When breached, requests return 429 and an alert fires (Axiom monitor → email/Teams).
Defaults are conservative; an admin can lift caps for known power users via a config-only change.
### 4.4 PII redaction in transit
The Policy Worker runs a regex-based PII pass on every outgoing prompt. The pattern set is curated for our jurisdictions and is more complete than any third-party gateway provides today:
- UAE Emirates ID (15-digit format with checksum)
- KSA national ID and Iqama number
- Australian TFN, ABN, Medicare number
- India Aadhaar (planned market)
- Email addresses, international phone numbers, payment card numbers (Luhn-validated), IBANs
Matches are replaced with stable placeholders ([EMIRATES_ID_1], [EMAIL_1] …) before the prompt leaves the Worker. The original is not stored — the redacted form is what reaches the upstream model and what appears in audit logs. This keeps the platform on the safe side of PDPL/APP regardless of what a subsystem accidentally tries to send.
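The stable-placeholder mechanism can be sketched as follows. The patterns below are deliberately simplified illustrations — the production set is curated per jurisdiction (and, for example, Luhn-validates card numbers, which this sketch does not).

```typescript
// Each pattern family gets numbered placeholders; repeated mentions of the
// same value map to the same token, so the redacted prompt stays coherent.
const PATTERNS: [string, RegExp][] = [
  ["EMAIL", /[\w.+-]+@[\w-]+\.[\w.]+/g],
  ["EMIRATES_ID", /\b784-?\d{4}-?\d{7}-?\d\b/g], // simplified, no checksum
  ["PHONE", /\+\d{7,15}\b/g],
];

function redact(prompt: string): string {
  let out = prompt;
  for (const [label, re] of PATTERNS) {
    const seen = new Map<string, string>(); // match text -> placeholder
    out = out.replace(re, (match) => {
      if (!seen.has(match)) seen.set(match, `[${label}_${seen.size + 1}]`);
      return seen.get(match)!;
    });
  }
  return out;
}
```

Only the redacted form leaves the Worker; the `seen` map is discarded, so the original values are never stored.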
The PII redaction rule does not replace the broader PII policy in §10. It is a defence-in-depth control.
### 4.5 Policy Worker contract
Every subsystem calls the Policy Worker with a single POST /v1/chat (and /v1/embed, /v1/rerank) request shape:
{
principal: { id, tier, role, client_tag },
subsystem: "docs-assistant" | "code-review" | …,
request_id: <uuid>,
model: "claude-sonnet-4-5" | …,
messages: [...],
tools?: [...]
}
The Worker verifies the signed principal token, checks the model is on the allowlist, redacts PII, forwards to AI Gateway with the principal binding, and on response emits the audit event before returning the result to the caller. A subsystem cannot bypass any of these steps — there is no second path to the AI Gateway.
## 5. Retrieval (RAG)
### 5.1 Choice: Cloudflare Vectorize + D1 FTS5 + reranker-via-gateway
Retrieval is composed from three Cloudflare-native layers:
- Cloudflare Vectorize — dense vector index. Per-tenant namespaces give hard isolation; metadata filters give per-document ACL. Pricing: $0.04 per million queried vectors, $0.05 per GB-month. The MVP corpus is well inside the free allowance.
- Cloudflare D1 (SQLite) FTS5 — keyword/lexical index. Built into D1 at no extra cost. Provides BM25 scoring for the sparse leg of hybrid search.
- Reranker (Cohere Rerank or Voyage) — called via the AI Gateway as one of its upstreams. Used to re-score the fused top-N candidates before sending to the answering model.
The full retrieval pipeline (§5.7) runs Vectorize and D1 FTS5 in parallel, fuses results with Reciprocal Rank Fusion in the Worker, and reranks the top candidates. This matches Pinecone Assistant's hybrid + rerank quality while keeping everything on Cloudflare and removing one paid vendor.
Why this combination instead of Pinecone Assistant (our 2026-04-23 choice, now superseded — see §14):
- Same governance fit. Per-tenant namespaces + metadata filters cover ACL identically (§5.4).
- Same retrieval quality. Hybrid (dense + sparse) and reranking are achieved through composition rather than a single product. The agent loop already does multi-step tool calls, so adding rerank as a step is natural.
- One fewer vendor. Pinecone is a separate provider with its own DPA, residency profile, and AP-03 entry. Vectorize lives under Cloudflare's existing footprint.
- Cost. Vectorize is materially cheaper than Pinecone's $50/mo paid tier and stays in free territory longer.
Why not the alternatives that were rejected on 2026-04-23 (still rejected for the same reasons):
- vs DIY (Qdrant + pgvector): operational burden does not scale to a small team.
- vs Bedrock Knowledge Bases: vendor lock-in, opaque pricing, weaker per-document ACL.
- vs OpenAI File Search: no row-level ACL.
- vs Vertex AI RAG Engine: ties retrieval to GCP.
- vs Cohere Compass: enterprise pricing not friendly to small teams.
Provider note: Cloudflare and the chosen reranker need to clear AP-03 before production (see §4.1).
### 5.2 Source-of-truth stays in the originating system
Vectorize and D1 FTS5 are indices, not stores of record. For every source:
| Source | Source of truth | Sync mechanism | Refresh trigger |
|---|---|---|---|
| Documentation | This GitHub repo | GitHub Action → Cloudflare Queue → ingest Worker | On push to main |
| Microsoft 365 (SharePoint, Drive) | M365 tenant | Microsoft Graph subscription → ingest Worker | Webhook on file change, hourly poll fallback |
| NetSuite (records, saved searches) | NetSuite tenant | RESTlet → ingest Worker | Saved-search trigger or hourly poll |
| Code repositories | GitHub | GitHub Action → ingest | On push to main |
| Support tickets | Helpdesk (TBD) | Webhook → ingest Worker | Webhook |
| CRM | TBD | Webhook → ingest Worker | Webhook |
| Meeting transcripts | This GitHub repo (meetings/) | Same as docs | On push |
Connectors are written natively against each source (Microsoft Graph, NetSuite RESTlet, GitHub webhooks). Nango is retained as a fallback option for any source where the native build cost exceeds a working day; see §6.
If a sync goes wrong, the source of truth is intact and both indices are rebuildable from scratch. The platform never edits the source.
### 5.3 One Vectorize index per environment, namespace per tenant, metadata for filters
One Vectorize index per environment (dev, staging, prod). Within each index, a Vectorize namespace per tenant (internal, pymble, saint-edwards, al-faisal) gives hard isolation: a query into the pymble namespace cannot return a vector from saint-edwards regardless of metadata filter mistakes. Cross-tenant content (e.g. shared solution docs) lives in internal.
Per-document metadata carries:
{
source: "docs" | "sharepoint" | "netsuite" | "github" | …
source_path: original location (URL or path)
client_tag: null | "pymble" | "saint-edwards" | "al-faisal" | …
audience: "internal" | "client_staff" | "end_user"
classification: "public" | "internal" | "confidential" | "restricted"
tier_required: "T1" | "T2" | "T3"
doc_type: "sprint-retro" | "design" | "meeting" | "requirement" | …
last_synced: ISO timestamp
last_modified: ISO timestamp (from source)
}
The corresponding D1 FTS5 row carries the same metadata as columns so the same ACL filter applies on the lexical leg. Per-query metadata filters enforce ACL — see §5.4.
### 5.4 ACL at retrieval time (not at the answer layer)
The model never sees a document the principal is not authorised to read. The Policy Worker derives a (namespace_set, metadata_filter) pair from the principal and passes it to every retrieval tool call (Vectorize, D1 FTS5, and any future structured tool):
- T1 admin → all namespaces, no metadata filter (sees everything).
- T1 staff:<dept> → all namespaces; audience IN [internal, client_staff, end_user] AND classification != restricted (restricted requires explicit role grant).
- T2 client_staff:pymble → namespaces [internal, pymble]; client_tag IN [null, pymble] AND audience IN [client_staff, end_user] AND classification != restricted.
- T3 end_user:pymble → namespace [pymble] only; client_tag = pymble AND audience = end_user.
Filtering is enforced at the index level (namespace selection in Vectorize, WHERE clause in D1) — not in post-processing. This is the same rule used in Phase 3 of the documentation site, generalised.
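A sketch of the derivation, assuming a Mongo-style filter object (`$in`, `$ne`) that would be rendered into a Vectorize metadata filter on the dense leg and a WHERE clause on the D1 leg — the exact filter syntax is an assumption of this sketch:

```typescript
interface Acl {
  namespaces: string[];                   // "*" = all namespaces
  filter: Record<string, unknown> | null; // null = no metadata filter
}

// Derive the (namespace_set, metadata_filter) pair from the principal.
function deriveAcl(p: { tier: string; role: string; client_tag: string | null }): Acl {
  if (p.tier === "T1" && p.role === "admin") {
    return { namespaces: ["*"], filter: null }; // sees everything
  }
  if (p.tier === "T1") {
    // staff:<dept> — everything except restricted
    return { namespaces: ["*"], filter: { classification: { $ne: "restricted" } } };
  }
  if (p.tier === "T2" && p.client_tag) {
    return {
      namespaces: ["internal", p.client_tag],
      filter: {
        client_tag: { $in: [null, p.client_tag] },
        audience: { $in: ["client_staff", "end_user"] },
        classification: { $ne: "restricted" },
      },
    };
  }
  if (p.tier === "T3" && p.client_tag) {
    return {
      namespaces: [p.client_tag],
      filter: { client_tag: p.client_tag, audience: "end_user" },
    };
  }
  throw new Error("unmappable principal"); // fail closed
}
```

Failing closed on any unmappable principal is the important property: a retrieval call without a derivable ACL never runs.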
### 5.5 Smart retrieval, with the docs-assistant pattern as a fallback
The original "ask an LLM which files are relevant" pattern (see current-state.md §3) remains useful for small corpora and degraded-mode operation. If Vectorize is down or returns insufficient context, the agent loop (§5.6) falls back to the file-tree pattern restricted to the GitHub-sourced subset using its list_dir and read_file tools. Subsystems consume "retrieved context", not "the result of a vector search".
### 5.6 Agentic answering: agent loop and tool catalog
The answering model is run as an agent, not as a one-shot RAG pipeline. The model receives the question + a tool catalog and decides which tools to call, in what order, until it has enough evidence to answer. This is the same pattern used by Claude Code, Cursor, and Perplexity Pro, and it materially improves retrieval accuracy on questions where simple semantic similarity returns the wrong document type (e.g. "what's the status of the standalone ARM project?" — a sprint-retro question that semantic search alone tends to answer with design docs).
The tool catalog (V1):
| Tool | Purpose | ACL |
|---|---|---|
| list_dir(path) | List files under a path in the indexed corpus | Filter applied to results |
| read_file(path, line_range?) | Read a specific document or section | Permission check per file before read |
| grep(pattern, path?) | Exact-string / regex search | Filter applied to results |
| semantic_search(query, doc_type?, top_k=10) | Hybrid vector + lexical search (§5.7) | Namespace + metadata filter from principal |
| get_recent(folder, since?, limit=20) | Most recently modified docs in a folder; for currency-sensitive questions | Filter applied |
| list_meetings(since?, topic?) | Lookup of meeting extractions | Filter applied |
Later tool additions (V2+): netsuite_query, ticket_search, web_search, remember, recall. Each new capability is a new tool, not a new architecture.
Every tool call goes through the Policy Worker (audit, ACL, rate limits apply per call). The agent loop is bounded by max-iterations (e.g. 8) and max-cost (per-principal spend cap from §4.3) — runaway loops self-terminate.
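The bounded loop can be sketched as below. `callModel` and `runTool` are hypothetical stand-ins for the Policy Worker round-trips; the termination rules (max iterations, max cost) are the point of the sketch.

```typescript
interface Step {
  toolCalls?: { name: string; args: unknown }[]; // model wants more evidence
  answer?: string;                               // model is done
  costUsd: number;                               // cost of this model call
}

async function agentLoop(
  question: string,
  callModel: (transcript: unknown[]) => Promise<Step>,
  runTool: (name: string, args: unknown) => Promise<unknown>,
  maxIters = 8,
  maxCostUsd = 0.5,
): Promise<string> {
  const transcript: unknown[] = [{ role: "user", content: question }];
  let spent = 0;
  for (let i = 0; i < maxIters; i++) {
    const step = await callModel(transcript);
    spent += step.costUsd;
    if (step.answer !== undefined) return step.answer;
    if (spent >= maxCostUsd) break; // runaway cost — self-terminate
    for (const call of step.toolCalls ?? []) {
      // Each tool call goes back through the Policy Worker (audit + ACL).
      transcript.push({ role: "tool", name: call.name, result: await runTool(call.name, call.args) });
    }
  }
  return "I could not gather enough evidence within the allowed budget.";
}
```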
### 5.7 Hybrid retrieval pipeline (inside semantic_search)
query + principal
│
├──► Vectorize (dense) ──► top_k=20 candidates ─┐
│ (namespace + metadata filter) │
│ ├──► RRF fusion ──► top_k=15
├──► D1 FTS5 (BM25) ──► top_k=20 candidates ──┘ │
│ (same WHERE clause for ACL) ▼
│ Reranker (Cohere/Voyage)
│ via AI Gateway
│ │
│ ▼
│ top_k=5 reranked passages
All three calls (Vectorize, D1, reranker) inherit the same principal-derived ACL constraint. The fused top-K is reranked, then returned to the agent. The agent decides whether the evidence is sufficient or whether to call another tool.
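The fusion step in the Worker is plain Reciprocal Rank Fusion: each candidate's fused score is the sum of 1/(k + rank) across the lists it appears in, with k = 60 as the conventional constant. A minimal sketch, where the two input lists are the already-ACL-filtered dense and sparse top-k result IDs:

```typescript
// Fuse any number of ranked ID lists with Reciprocal Rank Fusion.
function rrfFuse(rankedLists: string[][], k = 60, topN = 15): string[] {
  const scores = new Map<string, number>();
  for (const list of rankedLists) {
    list.forEach((id, rank) => {
      // rank is 0-based here, so the first hit contributes 1/(k+1).
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, topN)
    .map(([id]) => id);
}
```

Documents that appear high in both legs dominate the fused list, which is exactly the behaviour wanted before the reranker re-scores the survivors.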
## 6. Source connectors
### 6.1 Choice: native connectors first, Nango as fallback
Connectors are built natively against each source. The MVP set is small (GitHub, Microsoft 365, NetSuite) and each has a stable, well-documented API:
- GitHub — GitHub Actions on push to main post to a Cloudflare Queue; the ingest Worker consumes from the queue. ~50 lines.
- Microsoft 365 (SharePoint, OneDrive, Outlook) — Microsoft Graph change-notification subscriptions → ingest Worker webhook. The Worker uses an app-level token (no per-user OAuth at this stage). Falls back to hourly delta poll if a subscription expires.
- NetSuite — a thin SuiteScript RESTlet plus a saved-search trigger calls the ingest Worker on record change. The Worker uses NetSuite token-based auth (TBA) at the platform level.
- GitHub webhooks for code review (V2) — handled the same way as documentation ingest.
Nango is retained as a fallback option for any future source where the native build cost exceeds a working day (e.g. a long-tail CRM, a niche helpdesk). The platform's connector interface is defined first; whether the implementation is native or Nango is an internal detail.
Why native first instead of Nango (our 2026-04-23 choice, now superseded — see §14): for the three sources that matter in V1–V3, native connectors are quick to build, have no per-vendor cost, and avoid an additional vendor dependency that has to clear AP-03 before production. Nango's value compounds only as the connector count grows past ~5; we add it then.
### 6.2 Source authentication pattern
Every source authenticates the platform to itself, not the user to the source. This is intentional: the platform is the data processor; ACL is enforced inside the platform, not by relying on the source's per-user permissions. This keeps retrieval fast and consistent across sources, and lets us support sources (NetSuite saved searches, ticket systems) where per-user OAuth would be impractical.
For sources that hold user-level secrets (e.g. a user's personal Drive), we add per-user OAuth with the user's consent recorded at first use — built natively or via Nango as appropriate.
### 6.3 Ingestion pipeline
Source change
│
▼
Native connector (or Nango) emits event
│
▼
Cloudflare Queue ──► ingest Worker
│ • fetch document
│ • extract text + metadata
│ • compute classification, audience, client_tag, tier_required
│ • semantic chunk
│ • (optional) Anthropic Contextual Retrieval enrichment via OpenRouter/Llama (cheap bulk model)
│ • call Policy Worker → embedding model (Vectorize embedding upsert)
│ • upsert lexical row to D1 FTS5
▼
Audit log entry → Axiom
The ingest Worker is idempotent and rerunnable. A failed sync logs to Axiom with the source path and reason; a daily Cron Worker reconciliation job retries.
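One way idempotency falls out cheaply — and an assumption of this sketch rather than a stated design decision — is deriving chunk IDs deterministically from (source, path, chunk index), so a rerun upserts the same Vectorize and D1 rows instead of duplicating them. FNV-1a is used here only so the sketch is self-contained; any stable hash works.

```typescript
// Deterministic chunk ID: same input always yields the same ID,
// so re-running a sync is an upsert, not an insert.
function chunkId(source: string, path: string, index: number): string {
  const key = `${source}:${path}:${index}`;
  let h = 0x811c9dc5; // FNV-1a 32-bit offset basis
  for (let i = 0; i < key.length; i++) {
    h ^= key.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // FNV prime, kept in uint32 range
  }
  return `${source}-${h.toString(16)}`;
}
```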
## 7. Channels and workflows
The platform serves two distinct subsystem families.
### 7.1 Channels (inbound, human-initiated)
A channel is a surface where a human asks a question or starts a conversation.
Priority order (V1 → V5+):
- Documentation Assistant (V1) — the existing widget, migrated onto the new stack. Proves end-to-end. See current-state.md for as-built; the V1 migration is the first thing built on the platform.
- Microsoft Teams bot (later) — internal channel for staff. Uses Entra SSO automatically.
- Public web widget (later) — Feoda-hosted chat for prospects and general public.
- In-product chat (ARM) (later) — embedded in the ARM application for client staff.
- Telegram bot (later) — channel for end-users in markets where Telegram dominates.
- WhatsApp Business (deferred) — channel for parent/family communication. Deferred until business volume justifies the Meta Cloud API setup cost.
Slack is not on the list. Feoda does not use Slack internally.
### 7.2 Workflows (event-driven and human-initiated, AI-executed)
A workflow is an event-triggered task that the AI executes without a human conversation. Each workflow has an explicit trigger, a defined input set, a defined output, and an audit trail.
Initial workflows (V2 → V5+):
- Developer code review (V2) — triggered by GitHub PR webhook. Reviews diff, comments inline, suggests improvements consistent with conventions. Proves the code-repo source connector and the workflow pattern.
- SDR / lead workflow (V3) — triggered by NetSuite Lead record creation or update. Scores the lead, drafts outreach, files a follow-up task. Proves the NetSuite source connector and outbound automation.
- Support triage (V4) — triggered by helpdesk webhook. Classifies the ticket, suggests reply, escalates to a human if confidence is low. Proves the ticket source connector and human-in-the-loop pattern.
- Reporting and digest (later) — scheduled (cron). Generates weekly client-status reports, internal usage digests.
Workflows and channels share the same gateway, identity (every workflow runs as a service principal owned by an admin role), retrieval, and audit foundations.
## 8. Audit, observability, and support
### 8.1 Choice: Axiom
Axiom is the managed log store. Free tier: 500 GB ingest, 100 GB storage per month — generous enough that the platform will not pay for logs during the entire MVP and well into production. Paid begins at $25/month plus usage.
Why Axiom over Better Stack / Datadog: Axiom is purpose-built for high-volume structured event data, has a free tier that comfortably covers our scale, and is AI-native (built for log queries that themselves use AI).
### 8.2 What is logged
Every AI call emits one structured event. Schema:
{
ts: ISO timestamp
principal_id: surrogate UUID (NEVER a name or email)
tier: T1 | T2 | T3
role: …
client_tag: null | …
subsystem: "docs-assistant" | "code-review" | "sdr" | …
channel_or_trigger: "web-widget" | "github-pr-webhook" | …
request_id: correlation UUID
provider: "anthropic" | "google" | "openai" | "groq" | …
model: "claude-sonnet-4-5" | …
retrieved_docs: [{ source, source_path, client_tag, score }]
prompt_tokens: int
completion_tokens: int
cost_usd: float
latency_ms: int
outcome: "ok" | "error" | "rate_limited" | "blocked_provider" | …
error_class: …
}
The prompt and completion text are not logged by default — only their token counts and the retrieved-documents list. This is the privacy-first default. A separate, admin-only "diagnostic mode" can be enabled per-principal for a bounded time window for debugging.
### 8.3 Retention
Tiered retention policy:
- Hot, queryable: 90 days. Covers operational incident response and recent audits.
- Warm, queryable on request: 1 year. Covers compliance reviews and the bulk of typical school-year audits.
- Cold, aggregated and anonymised only: 7 years. Covers the AU education-sector 7-year norm and the upper bound of UAE PDPL "as long as necessary".
- Purge: anything past 7 years is deleted.
Right-to-erasure is supported via a per-principal delete API. Deletion is logged (the deletion log is exempt from erasure).
### 8.4 Surrogate identity in logs (support pattern)
Audit logs contain principal.id (a UUID), never names or emails. When a support case requires resolving a UUID to a real user, an admin invokes a dedicated identity-resolution tool in the platform admin UI. That tool:
- Requires
adminrole. - Logs the lookup itself (who resolved which
principal.idand why). - Returns the user's name and email from a separate, sealed identity table.
- Is rate-limited.
This pattern keeps PII out of the high-volume log store while still letting Support reproduce a user's session when needed.
### 8.5 Eval and regression testing
A lightweight eval suite is maintained per subsystem:
- A small set of golden questions with expected source documents (not expected exact answers — exact-match scoring is too brittle for LLMs).
- Run on every deploy of the affected subsystem. Failures block the deploy.
- Run nightly against the live model versions to catch silent regressions when a provider rolls a model update.
The suite is intentionally small (≤ 30 questions per subsystem) and is curated by the subsystem owner. We resist the urge to grow it past what a human will actually triage.
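A golden case therefore pairs a question with the source documents expected in the retrieved set, and the check is containment, not text match. A minimal harness sketch — `retrieve` is a hypothetical stand-in for the subsystem's retrieval pipeline:

```typescript
interface GoldenCase {
  question: string;
  expectedSources: string[]; // source_path values that must be retrieved
}

// Pass = every expected source appears in what retrieval returned.
async function runEval(
  cases: GoldenCase[],
  retrieve: (q: string) => Promise<string[]>,
): Promise<{ passed: number; failed: string[] }> {
  const failed: string[] = [];
  for (const c of cases) {
    const got = await retrieve(c.question);
    if (!c.expectedSources.every((s) => got.includes(s))) {
      failed.push(c.question);
    }
  }
  return { passed: cases.length - failed.length, failed };
}
```

A non-empty `failed` list blocks the deploy; the nightly run reports it instead.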
## 9. Hosting and runtime
### 9.1 Cloudflare Workers for the platform
The gateway, ingest workers, channel adapters, and workflow runners are all deployed to Cloudflare Workers. Free tier: 100k requests/day (~3M/month), 10ms CPU per request. Paid begins at $5/month.
Why Cloudflare:
- Cost. Free tier covers the MVP entirely. Vercel's serverless function pricing scales with invocations + bandwidth + duration; Cloudflare Workers does not charge for bandwidth.
- No egress fees. Cloudflare R2 (document staging, model artifacts) is $0 egress; AWS S3 / Vercel Blob are not.
- Edge presence in our region. Cloudflare has POPs in Dubai, Riyadh, and Jeddah. Vercel does not.
- Same domain, both Pages and Workers. A future Cloudflare-hosted docs site (Pages) and the Workers gateway can co-exist on one domain with path-based routing.
### 9.2 Storage
- Cloudflare R2 — staging for raw documents pulled by ingest workers, model artifacts, eval fixtures.
- Cloudflare Vectorize — vector index (managed); the lexical leg lives in D1 FTS5 (§5.1).
- Cloudflare D1 (SQLite serverless) — platform metadata: principals, roles, role-to-scope mappings, source registrations, workflow definitions.
- Source systems — actual document content stays in GitHub, M365, NetSuite, etc.
### 9.3 Documentation site hosting (parallel evaluation)
The MkDocs documentation site stays on Vercel on main. A parallel cloudflare-pilot branch deploys the same site to Cloudflare Pages for evaluation. Both deploys auto-build from GitHub. The two run side-by-side until a decision is taken to retire one.
The platform gateway and backend code goes to Cloudflare Workers from day one regardless — that decision is independent of the docs-site hosting decision.
### 9.4 Portability
All platform code is written as portable HTTP handlers (Hono framework or equivalent), not as Workers-only API. Switching the gateway from Cloudflare Workers to a small always-on box (Hetzner, DigitalOcean, AWS) becomes a deployment-config change, not a rewrite. This protects against future Cloudflare pricing changes and against any residency-driven move to AWS me-central-1.
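Concretely, handlers are written against the web-standard Request/Response types that Cloudflare Workers, Hono, and Node 18+ all share, so only the entry-point export is runtime-specific. A sketch (the handler body and field names are illustrative):

```typescript
// Portable handler: no Workers-only API, just web-standard Request/Response.
async function handleChat(req: Request): Promise<Response> {
  if (req.method !== "POST") {
    return new Response("method not allowed", { status: 405 });
  }
  const body = (await req.json()) as { request_id?: string };
  // ... allowlist check, PII redaction, forward to AI Gateway ...
  return new Response(
    JSON.stringify({ ok: true, request_id: body.request_id ?? null }),
    { headers: { "content-type": "application/json" } },
  );
}

// Cloudflare Workers entry point — the only runtime-specific line.
// A Node deployment would instead mount handleChat behind an HTTP adapter.
export default { fetch: handleChat };
```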
10. Data classification and PII policy¶
10.1 Classification¶
Every document indexed by the platform carries one of three classifications:
| Classification | Examples | AI providers allowed |
|---|---|---|
| public | Marketing, public website content, published case studies | All approved providers |
| internal | Process docs, technical docs, client implementation docs (no PII) | All approved providers |
| confidential | Anything containing PII, financial transaction data, or contractual material flagged confidential | Subject to §10.2 |
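A sketch of how the classification table could be enforced in code at the Policy Worker, in line with goal G2 (policy in code, not process). The provider identifiers and helper name are illustrative assumptions, not the platform's real API:

```typescript
// Illustrative enforcement of §10.1: classification gates which
// providers a request may reach. Provider IDs are assumptions.
type Classification = "public" | "internal" | "confidential";

const APPROVED_PROVIDERS = new Set(["anthropic", "openrouter", "cohere"]);

function providerAllowed(cls: Classification, provider: string): boolean {
  // Hard allowlist first (G2): non-listed providers cannot be called at all.
  if (!APPROVED_PROVIDERS.has(provider)) return false;
  // confidential material is subject to §10.2: deny on the fast path;
  // any opt-in handling happens through the separate curated index.
  if (cls === "confidential") return false;
  return true;
}
```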
10.2 No live PII in prompts¶
Live student PII, parent/family PII, and live financial transaction data are not sent to external AI providers. The platform enforces this at three points:
- At ingestion: the ingest worker classifies documents. Any document flagged `confidential` because it contains live PII is not embedded into Vectorize by default; it is added to a separate, manually-curated index that is opt-in per subsystem.
- At the gateway: the gateway's PII redaction (§4.4) catches anything that slips through and replaces detected PII with placeholders before forwarding to the model provider.
- At the policy level: subsystems that need to operate on PII must use pseudonymised identifiers (a stable surrogate for the underlying record) rather than the PII itself. The mapping from pseudonym to record stays inside the platform.
This applies across UAE PDPL, KSA PDPL, and Australian APP jurisdictions — a single policy that satisfies all three.
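The policy-level rule above (stable surrogate, mapping stays internal) could be sketched as follows. Using a keyed HMAC is an assumption made here for illustration — it yields a deterministic surrogate without exposing the underlying identifier — and the `psn_` prefix and key handling are hypothetical:

```typescript
import { createHmac } from "node:crypto";

// Illustrative pseudonymisation: the same record always maps to the
// same surrogate, so subsystems can join on it, but the surrogate
// reveals nothing about the PII. The pseudonym→record mapping lives
// only inside the platform, per §10.2.
function pseudonymise(recordId: string, secretKey: string): string {
  const digest = createHmac("sha256", secretKey).update(recordId).digest("hex");
  return "psn_" + digest.slice(0, 16);
}
```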
10.3 Cross-border posture¶
Until a specific client contract demands in-country residency:
- US-hosted AI providers (Anthropic, OpenAI, Google, Cohere/Voyage) and Cloudflare (US/EU edge) are the default.
- OpenRouter is used as a routing/failover provider; the underlying model providers it dispatches to must themselves be on the Approved Provider List.
- Each provider used must have a signed DPA on file with Feoda, referenced in the Approved Provider List. Cloudflare, OpenRouter, and the chosen reranker are pending AP-03 and must complete it before production traffic.
- Parent/staff consent for AI processing is collected at enrolment by client institutions; Feoda does not collect end-user consent directly except for users it onboards through public channels.
- For onboarding a KSA client specifically, a one-page PDPL compliance addendum is added to the standard MSA (template TBD). No platform changes required.
11. Vendor and cost summary¶
| Component | Vendor | Free tier sufficient for MVP? | Paid tier starts at |
|---|---|---|---|
| Identity (T1 staff, Phase 1) | Cloudflare Access (Entra-backed) | Yes | $3/user/mo beyond 50 users |
| Identity (T2/T3 client, Phase 3) | Clerk | Yes (50k MAU) | $20/month |
| Policy Worker (governance plane) | In-house, on Cloudflare Workers | Yes | (counts under Workers) |
| Gateway (managed plane) | Cloudflare AI Gateway | Yes | Free; pay only for Logpush egress |
| Default model | Anthropic Claude Sonnet 4.5 (direct) | No (pay-per-token) | Pay-per-token |
| Model failover / bulk | OpenRouter | No (pay-per-token + ~5.5% markup) | Pay-per-token |
| Reranker | Cohere Rerank or Voyage | Yes (free dev tier) | Pay-per-token |
| Vector index | Cloudflare Vectorize | Yes (free up to 5M vectors) | $0.04 / M queried, $0.05 / GB-mo |
| Lexical index | Cloudflare D1 FTS5 | Yes | Counts under D1 ($5/mo beyond free) |
| Async / ingestion | Cloudflare Queues + Workflows | Yes | $0.40 / M operations beyond free |
| Source connectors | Native (Graph API, NetSuite RESTlet, GitHub Actions) | Yes (own code) | $0 |
| Source connectors (fallback) | Nango | Yes (10 conns, 100k reqs) | $50/month |
| Audit logs (long-term) | Axiom | Yes (500 GB ingest / 100 GB storage) | $25/month + usage |
| Hosting (gateway, workers, ingest) | Cloudflare Workers | Yes (100k req/day) | $5/month |
| Storage (raw docs, artifacts) | Cloudflare R2 | Yes (10 GB storage, $0 egress) | $0.015/GB beyond |
| Metadata DB | Cloudflare D1 | Yes | $5/month beyond free |
| Docs site (current) | Vercel | Yes | varies |
| Docs site (target) | Cloudflare Pages | Yes (unlimited bandwidth) | $25/month for Pro |
Indicative monthly cost when the platform is past the free tier in every component: ~$80–140 / month at low volume — materially lower than the 2026-04-23 estimate of $185–270 because Portkey ($49), Pinecone ($50), and Nango ($50) are removed.
A separate live tracker for all SaaS subscriptions (renewals, owners, costs) lives in a Notion database. Its governance reference is recorded under company/operations/vendor-register.md (to be created).
12. Vertical-slice delivery plan¶
The platform is built one shippable end-to-end slice at a time. Each slice adds one new capability and reuses everything from prior slices.
V1 — Documentation Assistant on the new stack¶
Builds the foundations end-to-end with the lowest-risk subsystem.
- Identity: Cloudflare Access (Entra-backed) for T1 staff; Clerk added later in Phase 3.
- Gateway: Policy Worker + Cloudflare AI Gateway with Anthropic (default) and OpenRouter (failover) upstreams; hard provider/model allowlist enforced in the Policy Worker.
- Retrieval: Cloudflare Vectorize (per-tenant namespaces) + D1 FTS5 + reranker via gateway. Indexed from the GitHub repo via a GitHub Action posting to a Cloudflare Queue.
- Answering: agent loop with the V1 tool catalog (`list_dir`, `read_file`, `grep`, `semantic_search`, `get_recent`, `list_meetings`).
- Audit: Workers Logs → Logpush → Axiom, with the surrogate-identity logging pattern.
- Hosting: Cloudflare Workers; the existing Vercel docs site widget continues to call the new platform endpoint.
Delivery posture: the new platform deploys to a cloudflare-platform branch in parallel with the live Vercel V1 (ai-platform-v1). Cutover happens only after the eval suite passes against the live model versions and the user explicitly approves. Vercel V1 is not retired until cutover is confirmed.
Outcome: every architectural foundation is exercised by a real subsystem. The existing docs-site widget keeps working throughout.
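The V1 retrieval stack merges Vectorize (semantic) and D1 FTS5 (lexical) candidate lists before handing them to the reranker. A common merge step — assumed here as an illustration, since this document specifies the components rather than the fusion method — is reciprocal rank fusion:

```typescript
// Reciprocal rank fusion (RRF): each candidate scores 1/(k + rank + 1)
// per list it appears in; documents found by both retrievers rise to
// the top. k=60 is the conventional smoothing constant. Doc IDs here
// are illustrative.
function fuseRanks(lists: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```

The fused list (typically the top 20–50 candidates) then goes to the Cohere/Voyage reranker via the gateway for final ordering.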
V2 — Developer code-review workflow¶
Adds the workflow category and the first source-connector pattern.
- New: code-repo source ingestion (GitHub Action → Vectorize).
- New: workflow trigger pattern (GitHub PR webhook → Cloudflare Worker → gateway).
- Reuses: identity (service principal owned by `admin`), gateway, RAG, audit, hosting.
Outcome: workflows are a first-class subsystem, not an afterthought.
V3 — SDR / lead workflow¶
Adds NetSuite source connector and outbound action pattern.
- New: NetSuite source connector (native RESTlet; Nango retained as fallback, per §6.1).
- New: outbound action pattern (writing back to NetSuite, sending drafts to staff).
- Reuses: everything else.
Outcome: the platform proves it can both read from and (carefully, with human-in-the-loop) act on NetSuite.
V4 — Support triage workflow¶
Adds ticket source connector, multi-step reasoning, and human handoff.
- New: helpdesk source connector.
- New: confidence-aware escalation pattern.
- Reuses: everything else.
Outcome: the support team has a working AI triage assistant; the platform proves human-in-the-loop.
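The confidence-aware escalation pattern V4 introduces can be sketched minimally as follows — the threshold value, field names, and routing shapes are illustrative assumptions, not a committed design:

```typescript
// Sketch of confidence-aware escalation: triage results below a
// confidence threshold route to a human instead of auto-responding,
// which is the human-in-the-loop guarantee V4 exists to prove.
interface TriageResult {
  category: string;
  confidence: number; // 0..1, as reported by the triage model
}

type Route = { kind: "auto-reply" } | { kind: "human"; reason: string };

function routeTicket(r: TriageResult, threshold = 0.8): Route {
  if (r.confidence < threshold) {
    return { kind: "human", reason: `confidence ${r.confidence} below ${threshold}` };
  }
  return { kind: "auto-reply" };
}
```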
V5+ — Channel expansion¶
Microsoft Teams bot, public web widget, in-product chat in ARM, Telegram, WhatsApp (deferred). Each is a new channel adapter that consumes the same foundations. No platform-level change should be required to add a channel by V5.
Adding T2/T3 identity tiers¶
T2 (client staff) and T3 (end-user) tiers are built when V5 channels need them — not before. V1–V4 are T1-only.
13. Open questions¶
| # | Question | Owner | Resolution path |
|---|---|---|---|
| OQ-1 | Which helpdesk system will V4 integrate with? | Head of Support | Decide before V4 begins |
| OQ-2 | Which CRM (or NetSuite-as-CRM) feeds the SDR workflow? | Head of Sales | Decide before V3 begins |
| OQ-3 | Final confirmation of role-to-scope mapping for T2 and T3 — which docs each role can see | Head of Technology | Workshop before V5 |
| OQ-4 | Microsoft Teams bot publication: per-tenant or platform-wide? | Head of Technology | Decide before Teams bot work |
| OQ-5 | KSA PDPL compliance addendum to MSA template | Legal advisor | Before onboarding any KSA client |
| OQ-6 | Eval-suite curation owner per subsystem | Each subsystem owner | At V1 completion |
| OQ-7 | Subscription register format and ownership | Head of Technology | Create company/operations/vendor-register.md |
| OQ-8 | When (if ever) to retire Vercel for docs hosting | Head of Technology | After Cloudflare Pages parallel deploy is validated |
| OQ-9 | Do workflows ever bypass PII masking when both source and destination are internal Feoda systems and no external provider is involved? | Head of Technology | Define before V3 |
14. Status log¶
| Date | Note |
|---|---|
| 2026-04-23 | Initial draft. Vendor decisions: Entra (T1), Clerk (T2/T3), Portkey + OpenRouter, Pinecone, Nango, Axiom, Cloudflare Workers/R2/D1. Vertical-slice plan V1–V5 defined. Vercel docs site preserved; Cloudflare Pages pilot to run in parallel on cloudflare-pilot branch. |
| 2026-04-28 | Revision after re-reading the Corporate AI Strategy, applying the principle that the strategy fixes goals and governance, not vendors. Vendor changes: Portkey → Cloudflare AI Gateway + in-house Policy Worker (§4); Pinecone Assistant → Cloudflare Vectorize + D1 FTS5 + Cohere/Voyage reranker via gateway (§5.1, §5.7); Nango → native connectors first, Nango retained as fallback (§6.1). Architecture changes: agentic answering pattern with a tool catalog added as the default (§5.6), replacing one-shot retrieval; Vectorize per-tenant namespaces give hard isolation (§5.3); ingestion gains an optional Anthropic Contextual Retrieval enrichment step routed through OpenRouter for cost (§6.3); identity for T1 in V1 uses Cloudflare Access (Entra-backed) instead of direct Entra integration. Governance commitments unchanged: hard allowlist, per-principal rate limits + spend caps, PII redaction in transit, surrogate-identity audit logs, retrieval-time ACL, no live PII in prompts, source-of-truth-stays-put. Indicative monthly cost falls from ~$185–270 to ~$80–140. New AP-03 entries required: Cloudflare, OpenRouter, Cohere/Voyage. New branch for the build: cloudflare-platform; Vercel V1 on ai-platform-v1 runs in parallel until explicit cutover. |