The journey.
From a one-evening Groq spike in late 2024 to a 40-engine cascade with voice, RAG, multi-agent workflows and a Monday-morning digest in directors' inboxes. Engineered in-house, one commit at a time.
First spike — one Groq call
An evening prototype: ~80 lines of Next.js, a single fetch to Llama 3.1 70B on Groq, no streaming. The concept was proven in one sitting.
Supabase auth + persistent chats
Wired up @memo.co.uk-only auth, persistent chat sessions in Postgres, SSE-streamed responses and a basic sidebar. Memo Nexus and Memo AI now share a login.
Multi-provider cascade
First fallback chain: Groq → SambaNova → Cerebras → OpenRouter. If one provider returns HTTP 429 (rate limited), the next picks up in under 50 ms. This is the architecture that defines Memo AI today.
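The fallback behaviour can be sketched roughly as follows. This is a minimal illustration, not Memo AI's actual code: the `Provider` and `ProviderResponse` shapes, the `runCascade` name, and the choice to fall through only on a 429 are all assumptions made for the example.

```typescript
// Hypothetical shapes for illustration; not taken from the Memo AI codebase.
interface ProviderResponse {
  status: number; // HTTP-style status: 200 = answered, 429 = rate limited
  text: string;
}

interface Provider {
  name: string;
  call: (prompt: string) => Promise<ProviderResponse>;
}

interface CascadeOutcome {
  provider: string;  // the provider that finally answered
  text: string;
  skipped: string[]; // providers passed over because they were rate limited
}

// Walk the chain in order; fall through to the next provider only on a 429.
async function runCascade(providers: Provider[], prompt: string): Promise<CascadeOutcome> {
  const skipped: string[] = [];
  for (const provider of providers) {
    const response = await provider.call(prompt);
    if (response.status === 200) {
      return { provider: provider.name, text: response.text, skipped };
    }
    if (response.status !== 429) {
      // Assumption: failures other than rate limits are surfaced, not retried elsewhere.
      throw new Error(`${provider.name} failed with status ${response.status}`);
    }
    skipped.push(provider.name);
  }
  throw new Error("Every provider in the cascade is rate limited");
}
```

In a real deployment each `call` would wrap a provider's HTTP client. Injecting it as a function keeps the chain order a matter of configuration and makes the fallback path easy to test with stubs.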
Six specialised model tiers
Smart / Reasoner / Live / Fast / Coder / Vision: each with its own cascade, daily limits and recommended use cases. An auto-router classifies intent and picks the tier for you.
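To make the routing idea concrete, here is a deliberately simple sketch. The tier names come from the text; the keyword rules, the `pickTier` function and the word-count threshold are hypothetical stand-ins, and the production router may well classify intent with a model rather than patterns.

```typescript
type Tier = "smart" | "reasoner" | "live" | "fast" | "coder" | "vision";

interface RouteInput {
  text: string;
  hasImage: boolean;
}

// Hypothetical keyword rules, checked in order; the first match wins.
const RULES: Array<{ tier: Tier; pattern: RegExp }> = [
  { tier: "coder", pattern: /\b(code|function|bug|typescript|python|sql|regex)\b/i },
  { tier: "live", pattern: /\b(today|latest|news|weather|exchange rate)\b/i },
  { tier: "reasoner", pattern: /\b(prove|step by step|why|compare)\b/i },
];

function pickTier(input: RouteInput): Tier {
  // Assumption: any attached image goes straight to the vision tier.
  if (input.hasImage) return "vision";
  for (const rule of RULES) {
    if (rule.pattern.test(input.text)) return rule.tier;
  }
  // Assumption: short prompts go to the fast tier, everything else to smart.
  const wordCount = input.text.trim().split(/\s+/).length;
  return wordCount <= 8 ? "fast" : "smart";
}
```

For example, `pickTier({ text: "What's the weather in Leeds today?", hasImage: false })` selects the live tier, while a two-word greeting falls through to fast.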
Gemini grounded search
Live mode goes online with Gemini 2.5 Flash + Google Search grounding. Memo AI starts answering questions about today's news, today's weather, today's exchange rates.
Document understanding
PDF / Excel / Word reading wired in — extraction via Gemini's 1M-token context for PDFs, mammoth for Word, xlsx for spreadsheets. Up to 10 files per message.
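The dispatch step described above might look something like this sketch. The extractor labels mirror the tools named in the text, but the extension map, the `planExtraction` function and the handling of unsupported files are assumptions for illustration only.

```typescript
type Extractor = "gemini-pdf" | "mammoth" | "xlsx";

const MAX_FILES_PER_MESSAGE = 10; // the limit stated in the text

// Assumed mapping from file extension to the extractor named in the text.
const BY_EXTENSION = new Map<string, Extractor>([
  ["pdf", "gemini-pdf"],
  ["docx", "mammoth"],
  ["xlsx", "xlsx"],
]);

interface ExtractionPlan {
  routed: Array<{ file: string; extractor: Extractor }>;
  unsupported: string[];
}

function planExtraction(files: string[]): ExtractionPlan {
  if (files.length > MAX_FILES_PER_MESSAGE) {
    throw new Error(`At most ${MAX_FILES_PER_MESSAGE} files per message`);
  }
  const plan: ExtractionPlan = { routed: [], unsupported: [] };
  for (const file of files) {
    // Take everything after the last dot, case-insensitively.
    const dot = file.lastIndexOf(".");
    const ext = dot >= 0 ? file.slice(dot + 1).toLowerCase() : "";
    const extractor = BY_EXTENSION.get(ext);
    if (extractor) {
      plan.routed.push({ file, extractor });
    } else {
      plan.unsupported.push(file);
    }
  }
  return plan;
}
```

Planning the routing before any extraction runs means an oversized or unsupported upload can be rejected without spending tokens on it.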
Image gen + image edit
Cloudflare FLUX.2 klein across 4 accounts for generation. Image-to-image editing with a verifier loop (a second AI confirms the colour really changed).
Memory across chats
Auto-extracts facts about each user after every conversation, stores them in a per-user memory table, injects relevant facts into every future chat. ChatGPT-style memory — built-in, free, owned by Memo.
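The "inject relevant facts" step could be sketched as below. Everything here is illustrative: the `MemoryFact` shape, the word-overlap relevance score and the prompt wording are assumptions, and a production pipeline might rank facts with embeddings instead.

```typescript
// Hypothetical row shape for the per-user memory table.
interface MemoryFact {
  userId: string;
  fact: string;
}

// Lowercase words longer than three characters, used as a crude relevance signal.
function tokenize(text: string): Set<string> {
  const words = text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  return new Set(words.filter((w) => w.length > 3));
}

// Pick the current user's facts that share the most words with the new message.
function selectRelevant(facts: MemoryFact[], userId: string, message: string, limit = 3): string[] {
  const query = tokenize(message);
  return facts
    .filter((f) => f.userId === userId)
    .map((f) => {
      const words = Array.from(tokenize(f.fact));
      return { fact: f.fact, score: words.filter((w) => query.has(w)).length };
    })
    .filter((scored) => scored.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map((scored) => scored.fact);
}

// Append the selected facts to the base system prompt, if there are any.
function buildSystemPrompt(base: string, facts: string[]): string {
  if (facts.length === 0) return base;
  const lines = facts.map((f) => `- ${f}`).join("\n");
  return `${base}\n\nKnown facts about this user:\n${lines}`;
}
```

Filtering by `userId` before scoring keeps one user's facts out of another user's prompt, which matters more than the exact ranking method.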
PWA install + mobile polish
Installable on iPhone and Android home screens. Safe-area insets, offline page, versioned service worker. Login form tuned to never trigger iOS Safari zoom.
Memo data tools
AI starts answering questions about your leave balance, your attendance, who's in the office today, your expense claims, your calendar, your colleagues' contact details — by querying Memo Nexus tables directly.
Personas + folders + export
HR Advisor, Email Drafter, Translator, Code Reviewer, Designer, Researcher — pick a persona per chat. Group conversations into folders. Export any chat to branded PDF.
v2.0 — voice · RAG · agents
Hands-free voice loop (Groq Whisper + Cloudflare TTS). Personal RAG knowledge base on pgvector. Artifacts / Canvas side panel. In-browser Python interpreter. Multi-agent workflow runner. MCP-style API for external tools. AI-generated weekly digest emailed to directors every Monday. Same engine now also powers Design Mate and Deals.
Maintained weekly
Models, cascades and quotas reviewed every Sunday — dead model IDs removed, faster providers promoted, keys rotated. The memory pipeline is retuned weekly using anonymised usage patterns. New features ship in small batches behind reviewer-gated commits. Always on, always evolving.
Every Sunday: cascade health-check (40-key probe across all 9 providers), dead model IDs removed, faster providers promoted to the top of each tier, OpenRouter free-model catalogue refreshed.
Every Monday 08:00 UK: AI-generated weekly digest lands in every director's inbox — last week's leave, expenses, attendance anomalies and notice-board activity, with 2-4 concrete action items.
Continuously: memory pipeline retuned, per-user rate limits adjusted to actual usage, reviewer-gated commits, TypeScript-strict + production-build-clean on every push.
On every API key rotation: .env.local and Vercel prod synced via the management API, dead keys removed from all environments.
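As an illustration of the Sunday step (remove dead model IDs, promote faster providers), a tier could be reordered from probe results along these lines. The `ProbeResult` shape, the `reorderTier` name and the latency-only ranking are assumptions for this sketch, and the model IDs in the usage note are placeholders.

```typescript
// Hypothetical result of probing one provider/model pair during the health-check.
interface ProbeResult {
  provider: string;
  modelId: string;
  ok: boolean;       // false when the model ID is dead or the key is rejected
  latencyMs: number; // measured round-trip time for the probe request
}

// Drop failed probes, then order the survivors fastest-first.
function reorderTier(probes: ProbeResult[]): string[] {
  return probes
    .filter((p) => p.ok)
    .sort((a, b) => a.latencyMs - b.latencyMs)
    .map((p) => `${p.provider}/${p.modelId}`);
}
```

Given probes for `groq/model-a` (120 ms), `cerebras/model-b` (80 ms) and a failed `openrouter/model-c`, the function would return the Cerebras entry first, then Groq, and omit the dead OpenRouter entry.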