# KodiLingo — Founder Starter Kit

> **What this is.** A ready-to-use artifact pack that turns the KodiLingo concept note into a
> buildable plan: the wedge, the architecture + database, a costed unit-economics model, the
> privacy/compliance workstream, the prioritization data, and a **runnable offline walking-skeleton
> MVP**. Read this file first, then work the four-track plan below.

---

## What KodiLingo is (the wedge)

KodiLingo is the only tool where a **Kenyan CBE (Competency-Based Education) coding lesson IS an English-literacy exercise for the
same Grade 4–8 child**. A learner (a) arranges code blocks to solve a coding micro-challenge, and
(b) must answer **in English** what their code does. That single fused step is the entire defensible
wedge. It runs **offline-first** as a PWA served from a per-school box, with a thin cloud tier only for
sync and the parent/admin portal. Everything else (pronunciation scoring, AI tutor, robotics, VR,
Swahili UI, galleries) is deferred until after the pilot. **Breadth is the enemy.**
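
A minimal sketch of that fused step follows. It assumes a challenge is scored in two parts: (1) an exact match on the ordered block IDs, and (2) a deterministic keyword check on the English answer. `FusedChallenge`, `StepResult`, `check_fused_step`, the block IDs and the keyword groups are all hypothetical illustrations, not the code in `mvp/`.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class FusedChallenge:
    """Hypothetical shape of one fused coding + English micro-challenge."""
    expected_blocks: tuple[str, ...]            # block IDs, in the order that solves it
    required_terms: tuple[frozenset[str], ...]  # each group: at least one word must appear


@dataclass(frozen=True)
class StepResult:
    code_ok: bool
    english_ok: bool

    @property
    def passed(self) -> bool:
        # The step only counts when BOTH halves pass: that is the fused wedge.
        return self.code_ok and self.english_ok


def check_fused_step(ch: FusedChallenge, blocks: list[str], answer: str) -> StepResult:
    code_ok = tuple(blocks) == ch.expected_blocks
    # Deterministic check: lowercase word tokens, no AI involved.
    words = set(re.findall(r"[a-z0-9]+", answer.lower()))
    english_ok = all(words & group for group in ch.required_terms)
    return StepResult(code_ok, english_ok)


# Hypothetical challenge: "move the robot forward twice, then turn left".
robot = FusedChallenge(
    expected_blocks=("move_forward", "move_forward", "turn_left"),
    required_terms=(
        frozenset({"move", "moves", "goes", "walks"}),
        frozenset({"twice", "two", "2"}),
        frozenset({"left"}),
    ),
)
```

Exact-order matching is the simplest deterministic rule. A challenge with several valid programs would need to compare simulator output instead. A keyword check will also accept some ungrammatical answers, which may be tolerable for a first learning signal.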

---

## Files in this pack (every file listed below exists)

### Top level
| File | Purpose |
|---|---|
| [`../KodiLingo - Strengthening the Concept.md`](../KodiLingo%20-%20Strengthening%20the%20Concept.md) | The original analysis: the 8 weaknesses, how each is strengthened, the "do this first" sequence and the Phase 0–3 roadmap. The *why* behind everything here. |

### `docs/` — the plan
| File | Purpose |
|---|---|
| [`00-README.md`](00-README.md) | This index + the four-track plan + roadmap (you are here). |
| [`01-wedge-and-positioning.md`](01-wedge-and-positioning.md) | The wedge stated precisely, the target user, the explicit v1 NOT-list, the discovery kill-rule, the pilot success/kill criteria. |
| [`02-competitor-matrix.md`](02-competitor-matrix.md) | Competitor × feature matrix (Elimutab, MsingiPack, eLimu, Kytabu, Eneza, Zeraki, Scratch, Code.org, Tynker, Kodable, Duolingo ABC) and the empty cell KodiLingo owns. |
| [`03-architecture-ADR.md`](03-architecture-ADR.md) | The architecture decision: offline-first PWA + per-school box (nginx + FastAPI/Node + SQLite WAL) + thin cloud; the 9-table model; 4-role auth; the sync design. |
| [`04-ai-tiering.md`](04-ai-tiering.md) | Every "AI" feature tagged Tier 0 (on-device) / 1 (LAN) / 2 (cloud, queued), with offline behavior and a data-egress column; what's NOT feasible on target phones; the two gated spikes. |
| [`05-privacy-pack.md`](05-privacy-pack.md) | Kenya Data Protection Act (DPA) 2019 + ODPC obligations as a working pack: controllership map, registration checklist, pseudonymous child identity, consent flows, retention, DPIA skeleton, vendor data-processing-agreement rule. |
| [`06-unit-economics.md`](06-unit-economics.md) | The cost model in prose: the cost bombs, caching/quota strategy, the free/paid/premium tiering, the ~US$200–300 box, the break-even idea. Companion to the COGS CSV and the in-browser `cogs.html` model below. |
| [`07-hardware-tiering.md`](07-hardware-tiering.md) | Virtual-first device API; Tier 0 simulator / Tier 1 funded micro:bit / Tier 2 deferred VR; the MakeCode embed decision; Kio Kit delivery; the hardware-pilot funding ask. |
| [`08-operating-plan.md`](08-operating-plan.md) | The living one-pager: North Star, metric tree, the 10 day-one events, the learning-outcome instrument, phased table, the hiring ladder, funder-readiness checklist. |
| [`09-safeguarding-policy.md`](09-safeguarding-policy.md) | Child safeguarding (distinct from data privacy): avatars/handles only, no public chat, no v1 gallery, escalation path — procurement-ready. |
| [`10-ip-license-inventory.md`](10-ip-license-inventory.md) | Open-source license obligations (Blockly, MakeCode, p5.js, LanguageTool, Vosk, whisper.cpp, etc.) — settle before shipping/fundraising. |
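
`docs/04-ai-tiering.md` tags each AI feature as Tier 0, 1 or 2 and gives its offline behavior. A minimal routing sketch follows. It assumes Tier 2 calls are queued for the next sync rather than blocking a lesson. The `Tier` enum, `route()` and its return labels are hypothetical; the real per-feature offline behavior is whatever docs/04 specifies.

```python
from enum import IntEnum


class Tier(IntEnum):
    ON_DEVICE = 0  # Tier 0: runs on the learner's phone, always available
    LAN = 1        # Tier 1: runs on the per-school box over the school LAN
    CLOUD = 2      # Tier 2: cloud only, queued when there is no uplink


def route(tier: Tier, box_reachable: bool, uplink_up: bool) -> str:
    """Hypothetical dispatcher: decide where an AI request runs right now."""
    if tier == Tier.ON_DEVICE:
        return "run_on_device"
    if tier == Tier.LAN:
        # No box means no Tier-1 service; the lesson falls back rather than waits.
        return "run_on_box" if box_reachable else "fallback"
    # Tier 2: never block a lesson on the internet.
    if box_reachable and uplink_up:
        return "send_to_cloud"
    return "queue_for_sync"
```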

### `data/` — editable data files
| File | Purpose |
|---|---|
| [`../data/feature-prioritization-RICE.csv`](../data/feature-prioritization-RICE.csv) | ~64 features from the original 13 pillars, RICE-scored, with MoSCoW / depends_on / phase. Draw your MVP line here. |
| [`../data/data-inventory.csv`](../data/data-inventory.csv) | One row per data element: who collects it, lawful basis, purpose, whether it leaves Kenya, retention. Feeds the DPIA. |
| [`../data/unit-economics-COGS.csv`](../data/unit-economics-COGS.csv) | The cost-driver rows (unit costs, uses/student, cache-hit rates) that feed the cost model in `cogs.html`. Edit and recompute. |
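
RICE is conventionally Reach × Impact × Confidence ÷ Effort. A minimal sketch for drawing the MVP line follows. It assumes the CSV has numeric `reach`, `impact`, `confidence` (as a fraction) and `effort` columns. Those column names, the made-up sample rows and the "stop at the first feature that overflows the effort budget" rule are illustrative assumptions, not the contents of `feature-prioritization-RICE.csv`.

```python
import csv
import io


def rice(row: dict) -> float:
    """RICE score = reach * impact * confidence / effort."""
    return (float(row["reach"]) * float(row["impact"])
            * float(row["confidence"]) / float(row["effort"]))


def mvp_line(rows: list[dict], effort_budget: float) -> list[str]:
    """Rank by RICE and keep features until the next one would overflow the budget."""
    chosen, spent = [], 0.0
    for row in sorted(rows, key=rice, reverse=True):
        effort = float(row["effort"])
        if spent + effort > effort_budget:
            break  # this is the MVP line: everything below it waits
        chosen.append(row["feature"])
        spent += effort
    return chosen


# Made-up sample rows (effort in hypothetical person-weeks); the real CSV has more columns.
SAMPLE = """feature,reach,impact,confidence,effort
fused_english_step,400,3,0.8,2
teacher_dashboard,40,2,0.8,1
pronunciation_scoring,400,2,0.3,6
vr_lab,100,1,0.3,8
"""
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
```

With a budget of 4, this keeps the fused step and the teacher dashboard and cuts everything below them.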

### `db/` — the database
| File | Purpose |
|---|---|
| [`../db/schema.sql`](../db/schema.sql) | The full **9-table** SQLite schema (WAL-ready) + the append-only `event_queue` sync spine, row-level scoping, no-PII student rows, and a small seed. Loads clean in `sqlite3`. |
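
The idea behind an append-only queue is that the box only ever appends events and later marks them synced, so nothing a learner did offline can be silently overwritten. A minimal sketch of that pattern follows, using Python's standard `sqlite3` module. It assumes a simplified table; the column names, the two guard triggers and the helper functions are hypothetical and are not copied from `db/schema.sql`.

```python
import json
import sqlite3
import time

DDL = """
CREATE TABLE IF NOT EXISTS event_queue (
    id             INTEGER PRIMARY KEY AUTOINCREMENT,
    learner_handle TEXT    NOT NULL,  -- pseudonymous handle, never a child's name
    event_type     TEXT    NOT NULL,
    payload        TEXT    NOT NULL,  -- JSON
    created_at     INTEGER NOT NULL,  -- unix seconds on the box
    synced_at      INTEGER            -- NULL until the cloud acknowledges the row
);
-- Append-only: rows can never be deleted, and only synced_at may change.
CREATE TRIGGER IF NOT EXISTS event_queue_no_delete BEFORE DELETE ON event_queue
BEGIN SELECT RAISE(ABORT, 'event_queue is append-only'); END;
CREATE TRIGGER IF NOT EXISTS event_queue_no_edit
BEFORE UPDATE OF learner_handle, event_type, payload, created_at ON event_queue
BEGIN SELECT RAISE(ABORT, 'event_queue is append-only'); END;
"""


def open_box_db(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")  # readers (dashboard) don't block the writer
    conn.executescript(DDL)
    return conn


def append_event(conn, learner_handle: str, event_type: str, payload: dict) -> int:
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO event_queue (learner_handle, event_type, payload, created_at)"
            " VALUES (?, ?, ?, ?)",
            (learner_handle, event_type, json.dumps(payload), int(time.time())),
        )
    return cur.lastrowid


def pending(conn, limit: int = 500) -> list[tuple]:
    """Oldest-first batch of events the cloud has not acknowledged yet."""
    return conn.execute(
        "SELECT id, learner_handle, event_type, payload FROM event_queue"
        " WHERE synced_at IS NULL ORDER BY id LIMIT ?", (limit,)
    ).fetchall()


def mark_synced(conn, ids: list[int]) -> None:
    now = int(time.time())
    with conn:
        conn.executemany("UPDATE event_queue SET synced_at = ? WHERE id = ?",
                         [(now, i) for i in ids])
```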

### Cost model (interactive, in-browser)
| File | Purpose |
|---|---|
| [`../cogs.html`](../cogs.html) | Reproducible per-student COGS + break-even model that runs in any browser from `data/unit-economics-COGS.csv` — no R, no server. Open it from the site home or on GitHub Pages, and drag the sliders. |
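
The per-student cost logic can be stated in a few lines. A minimal sketch follows, under two assumptions:

- Monthly COGS per student is the sum over cost drivers of unit cost × uses per student per month × (1 − cache-hit rate).
- The box is paid back from the margin between a per-student price and that COGS.

The column names, every number in the sample and the price are made-up placeholders, not the figures in `unit-economics-COGS.csv` or `cogs.html`.

```python
import csv
import io
import math


def monthly_cogs_per_student(rows: list[dict]) -> float:
    """Sum of unit_cost * uses * share of uses that miss the cache."""
    return sum(
        float(r["unit_cost_usd"]) * float(r["uses_per_student_month"])
        * (1.0 - float(r["cache_hit_rate"]))
        for r in rows
    )


def box_payback_months(box_cost: float, students: int,
                       price: float, cogs: float) -> float:
    """Months until per-student margin repays one school box (inf if it never does)."""
    margin = price - cogs
    if margin <= 0:
        return math.inf
    return box_cost / (students * margin)


# Placeholder cost drivers; cache hits are assumed to cost nothing.
SAMPLE = """driver,unit_cost_usd,uses_per_student_month,cache_hit_rate
tts_clip,0.002,100,0.9
cloud_grading,0.01,20,0.5
sync_storage,0.05,1,0.0
"""
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
cogs = monthly_cogs_per_student(rows)                        # about US$0.17
months = box_payback_months(250, 40, price=0.75, cogs=cogs)  # about 10.8
```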

### `src/` — code stubs
| File | Purpose |
|---|---|
| [`../src/ai/providers.js`](../src/ai/providers.js) | The **provider-abstraction layer** (swappable ASR/TTS/grading/tutor engines, local + cloud drivers, quota + cache guards) — build this *before* coupling to any vendor SDK. |
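
What follows is a Python illustration of the pattern that row describes, not a port of `providers.js`. Callers depend on a small interface, and a guard wraps any concrete engine with a cache and a per-learner quota, so a cloud vendor is never called unmetered. `TTSProvider`, `GuardedTTS`, `QuotaExceeded` and `FakeLocalTTS` are hypothetical names, not the API of `providers.js`.

```python
import hashlib
from typing import Protocol


class TTSProvider(Protocol):
    """Any text-to-speech engine: local (on the box) or cloud (vendor SDK)."""
    def synthesize(self, text: str) -> bytes: ...


class QuotaExceeded(Exception):
    pass


class GuardedTTS:
    """Wraps any TTSProvider with a content cache and a per-learner call quota."""

    def __init__(self, inner: TTSProvider, quota_per_learner: int):
        self.inner = inner
        self.quota = quota_per_learner
        self.cache: dict[str, bytes] = {}
        self.used: dict[str, int] = {}

    def synthesize(self, learner_handle: str, text: str) -> bytes:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key in self.cache:
            return self.cache[key]  # cache hits are free and don't count against quota
        if self.used.get(learner_handle, 0) >= self.quota:
            raise QuotaExceeded(learner_handle)
        audio = self.inner.synthesize(text)
        self.used[learner_handle] = self.used.get(learner_handle, 0) + 1
        self.cache[key] = audio
        return audio


class FakeLocalTTS:
    """Stand-in engine for tests; a real driver would call a local or cloud TTS."""

    def __init__(self) -> None:
        self.calls = 0

    def synthesize(self, text: str) -> bytes:
        self.calls += 1
        return text.encode("utf-8")
```

Because the cache is keyed on content rather than on the learner, a clip generated once can serve the whole school. Under this design, most of the cost saving would come from that reuse.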

### `mvp/` — the runnable walking skeleton
| Path | Purpose |
|---|---|
| [`../mvp/`](../mvp/) | A dependency-free, **offline** PWA that actually runs the fused loop: toy block UI (stand-in for MakeCode) + typed-English step + deterministic check + the 10-event SQLite-shaped queue + a teacher dashboard. See [`../mvp/README.md`](../mvp/README.md) to run it. Production swaps the toy UI for embedded MakeCode/Blockly — **never build a block engine.** |

---

## Do this first (four parallel tracks)

You are solo, part-time, R/data background, unfunded. **Assume part-time roughly doubles every
timeline.** Run these in parallel — the human-recruitment track is the real bottleneck, so start it
on day one.

### Track A — Week 1: paper artifacts (you, alone, no code)
- [ ] Read `docs/01`–`docs/10`; confirm nothing contradicts the wedge.
- [ ] Lock the wedge sentence and the v1 NOT-list on one page you can show a teacher.
- [ ] Open `data/feature-prioritization-RICE.csv`, sanity-check the scores, and draw your MVP line.
- [ ] Write the discovery-interview script for the 15–20 interviews (teacher pain, ICT budget line, device + WiFi reality).

### Track B — Weeks 1–3: people, paperwork, pipeline (highest leverage)
- [ ] **Recruit one practicing CBE teacher (Grades 4–8) as co-author.** It is a co-founder-grade dependency, because content is the critical path. (~US$300–800/mo when paid.)
- [ ] **Register with the ODPC.** Education is a mandatory-registration sector with **no small-business exemption**. ~KES 4,000 initial / KES 2,000 renewal, 24-mo validity. **You are the DPO.** 72h breach-notice obligation. (See `docs/05-privacy-pack.md`.)
- [ ] Line up Kenyan data-protection counsel for the child-data DPA/consent posture (~US$1,500–4,000).
- [ ] Run **15–20 discovery interviews**; validate price against the **school ICT line item**, not tuition.
- [ ] Pilot outreach: find 1 school (BRCK Kio Kit on-site, or a Pi/mini-PC + BYOD) for an n=20–40 signal study.

### Track C — Weeks 2–12: build the offline skeleton (you + the artifacts here)
- [ ] Run `mvp/` and load `db/schema.sql`; confirm the fused step + deterministic check + event queue work fully offline.
- [ ] Co-author the first 5–8 lessons with the Track-B teacher.
- [ ] Open `cogs.html` (the in-browser cost model) and sanity-check the paid-tier COGS against `data/unit-economics-COGS.csv` before any vendor SDK.
- [ ] Buy **one** micro:bit (~US$20) only to validate the hardware-agnostic device API — **not a class set.**
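
A quick programmatic check can complement running `mvp/` by hand for the first Track C item. A minimal sketch follows, using Python's standard `sqlite3` module. It assumes `db/schema.sql` loads via `executescript`, since the pack says it loads clean in `sqlite3`. `smoke_test_schema` is a hypothetical helper. It reports the table list rather than asserting it, so you can compare that list against the 9-table model yourself.

```python
import sqlite3
from pathlib import Path


def smoke_test_schema(schema_path: str, db_path: str) -> dict:
    """Load a schema file into a throwaway SQLite file and report basic health checks."""
    conn = sqlite3.connect(db_path)
    try:
        conn.executescript(Path(schema_path).read_text(encoding="utf-8"))
        # WAL needs a file-backed database; the PRAGMA returns the mode actually in force.
        mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
        integrity = conn.execute("PRAGMA integrity_check").fetchone()[0]
        tables = sorted(
            name for (name,) in conn.execute(
                "SELECT name FROM sqlite_master"
                " WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
            )
        )
        return {"journal_mode": mode, "integrity": integrity, "tables": tables}
    finally:
        conn.close()
```

From the pack root, `smoke_test_schema("db/schema.sql", "kodilingo-smoke.db")` should report `wal` and `ok` if the schema is sound. Delete the smoke database afterwards.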

### Track D — ongoing: don't get distracted
- [ ] Anything on the deferred list stays deferred until after the pilot signal. If it isn't the wedge, it waits.

---

## Roadmap (Phase 0–3)

| Phase | Calendar (part-time ≈ 2×) | Goal | What ships | Hard NOs |
|---|---|---|---|---|
| **Phase 0** | 0–3 mo | Build the wedge MVP | Offline PWA: toy-block UI + fused typed-English step + deterministic check + 9-table SQLite + event queue; 5–8 CBE lessons | pronunciation scoring, AI tutor/grading, robotics, VR, LLM box, portals, Swahili UI, gallery |
| **Phase 1** | 3–9 mo | Prove learning + retention | 1-school pilot; fixed pre/post Blockly task (n=20–40, honest signal); embed MakeCode/Blockly for real; weekly-active North Star tracked | scaling, paid cloud tier, second engineer |
| **Phase 2** | 9–18 mo | Prove it travels + costs work | 2nd/3rd school; sync hub + Tier-1 LAN services; validate ~US$200–300 box + ~US$0.35–0.55/student/mo paid-tier COGS | nationwide, VR, robotics class sets |
| **Phase 3** | post-funding | Expand | AI tutor, adaptive learning, robotics pilot, virtual labs, gallery, VR (premium) — vendor-abstracted, behind the privacy + cost guards | building any of it before the pilot signal |
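
Phase 1 tracks a weekly-active North Star. A minimal sketch of computing it from queued events follows. It assumes "weekly active" means a learner with at least one event in the trailing 7 days. That definition, `weekly_active` and the `min_events` threshold are assumptions; the authoritative definition belongs in `docs/08-operating-plan.md`.

```python
from datetime import datetime, timedelta
from typing import Iterable


def weekly_active(events: Iterable[tuple[str, datetime]], as_of: datetime,
                  min_events: int = 1) -> int:
    """Count learners with at least min_events events in the 7 days ending at as_of."""
    window_start = as_of - timedelta(days=7)
    counts: dict[str, int] = {}
    for handle, ts in events:
        if window_start < ts <= as_of:
            counts[handle] = counts.get(handle, 0) + 1
    return sum(1 for c in counts.values() if c >= min_events)
```

Dividing the count by enrolled learners gives a rate that can be compared across schools of different sizes.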

---

## Bottom line

The work that de-risks KodiLingo is mostly **paper and discipline before code**: lock the wedge, run
discovery with a real kill-rule, recruit the CBE teacher, start ODPC registration — then build the thin
offline loop in `mvp/` while the lessons are co-authored. Within ~6 months you'll know, from real
Kenyan kids on real low-end devices, whether KodiLingo should exist.
