Phase1: Schema + skill event tracking
All checks were successful
Build and Push / build (push) Successful in 1m35s

2026-05-24 00:21:29 +02:00
parent faf49119ea
commit 6baca1a459
2 changed files with 64 additions and 256 deletions

README.md

@@ -1,273 +1,54 @@
-# PieCed Portal — Billing Phase 1 (drop-in replacement)
+# PieCed Portal — Billing Phase 1 patch (suspend-via-admin fix)
-Schema + event tracking. No UI yet (that lands in Phase 2).
-This zip mirrors the `pieced-portal/` repo root — extract over your
-existing source tree to apply.
+Single-file fix on top of the Phase 1 v2 drop.
-**v2 fix:** stripped stray backticks from SQL comments that were
-closing the `MIGRATION_SQL` template literal early. If you got
-"Expected a semicolon" at db.ts:335 with v1, this build is the fix.
+## What it fixes
----
+The admin panel's suspend/resume button hits
+`/api/admin/tenants/[name]/suspend` (a different route from the
+customer-side `/api/tenants/[name]/suspend`). The v2 drop only
+hooked the customer route — admin suspends were going to K8s
+without producing a row in `tenant_suspension_events`.
-## Files in this drop
+This patch adds the same `recordSuspensionEvent` hook to the
+admin route. No other code paths affected; no schema changes.
+## Files
 ```
-src/lib/db.ts MODIFIED
-src/types/index.ts MODIFIED
-src/app/api/admin/requests/[id]/approve/route.ts MODIFIED
-src/app/api/tenants/[name]/route.ts MODIFIED
-src/app/api/tenants/[name]/suspend/route.ts MODIFIED
-src/app/api/admin/tenants/[name]/delete/route.ts MODIFIED
-src/app/api/admin/billing/backfill/route.ts NEW
+src/app/api/admin/tenants/[name]/suspend/route.ts MODIFIED
 ```
-No `package.json` changes — Phase 1 uses only deps already present.
-### What changed
-`src/lib/db.ts`
-- Extended `MIGRATION_SQL` with 11 new tables (idempotent — uses
-`CREATE TABLE IF NOT EXISTS`)
-- Added a new "Billing — Phase 1" section at the bottom with ~25
-helper functions
-`src/types/index.ts`
-- 6 new interfaces appended at the bottom
-`src/app/api/admin/requests/[id]/approve/route.ts`
-- Imports `recordTenantCreated`, `recordSkillEvents`,
-`recordSuspensionEvent` from `@/lib/db`
-- Resume path: records a `resumed` suspension event after
-`patchTenantSpec({suspend: false})`
-- Provision path: records `recordTenantCreated` + initial
-`enabled` events after `createTenant`
-`src/app/api/tenants/[name]/route.ts`
-- Imports `recordSkillEvents`
-- After `patchTenantSpec` succeeds and the patch touched
-`packages`, computes the diff (added/removed) and writes events.
-Diff is computed against the patched CR (the returned state)
-so events match what K8s committed.
-`src/app/api/tenants/[name]/suspend/route.ts`
-- Imports `recordSuspensionEvent`
-- Records `suspended` or `resumed` after the patch succeeds
-`src/app/api/admin/tenants/[name]/delete/route.ts`
-- Imports `recordTenantDeleted`
-- Stamps `deleted_at` on the lifecycle row after `deleteTenant`
-`src/app/api/admin/billing/backfill/route.ts` (new)
-- `POST /api/admin/billing/backfill` — platform-only, idempotent
-- Reads every live PiecedTenant CR, mirrors creationTimestamp,
-current `spec.packages`, and `status.suspendedAt` into the new
-tables. Run once after deploy to bootstrap historical data.
-### Tables added (Postgres, all idempotent)
-```
-platform_pricing single-row pricing config
-skill_pricing per-package daily price (optional)
-tenant_billing_lifecycle per-tenant created_at + deleted_at
-tenant_skill_events append-only enable/disable log
-tenant_suspension_events append-only suspend/resume log
-org_billing_config per-org billing posture (pay-by-invoice,
-stripe id, auto-cron toggles)
-org_payment_methods Stripe payment methods (Phase 4)
-invoice_number_counters gapless per-year counter
-invoices immutable issued invoices (Phase 2)
-invoice_lines invoice line items (Phase 2)
-invoice_reminders sent reminders + their PDFs (Phase 6)
-```
-The invoice/lines/reminders tables ship now so Phase 2 doesn't need
-a second migration, but no code writes to them until Phase 2.
-### Design properties
-* Every billing-tracking call is wrapped in `try/catch`. A logging
-failure never blocks the K8s operation.
-* PATCH-diff is computed against the *returned* CR state, not the
-pre-patch state, so events match what K8s actually committed.
-* Event tables are append-only. Historical billing can be
-recomputed reproducibly.
-* `tenant_billing_lifecycle` mirrors created_at + deleted_at so
-deleted tenants still have a final-invoice anchor.
-* All money is `NUMERIC`: 10,2 for CHF amounts, 10,5 for per-unit
-prices.
----
 ## Deploy
-1. Extract this zip over your `pieced-portal/` source tree
-2. Build & push:
-```
-./buildanddeploy.sh # or your usual flow
-```
-3. Bump the image tag in `gitops/apps/portal/deployment.yaml`,
-commit, push. ArgoCD picks it up.
-4. On pod boot, the next DB query auto-runs `MIGRATION_SQL` (your
-existing `ensureSchema` pattern). No manual `psql` needed.
+Extract over your `pieced-portal/` tree, rebuild, redeploy as
+usual. After the new image is running, verify:
----
+1. Suspend any test tenant from the `/admin` panel.
+2. Check the events table:
-## Testing (in order — don't skip steps)
-### Step 1 — Migration ran
-After the new pod is `Ready`, exec into the portal DB and verify
-all 11 new tables exist:
-```
-kubectl -n portal exec -it portal-db-1 -- \
-psql -U portal -d portal -c "\dt"
+```bash
+kubectl -n pieced-system exec -it portal-db-1 -- psql -U postgres -d portal -c \
+"SELECT * FROM tenant_suspension_events ORDER BY id DESC LIMIT 5;"
 ```
-You should see the new tables alongside the existing ones.
+Expect a fresh `suspended` row for the tenant you just toggled.
-Sanity-check the single-row pricing config seed:
+3. Resume → expect a `resumed` row.
-```
-kubectl -n portal exec -it portal-db-1 -- \
-psql -U portal -d portal -c "SELECT * FROM platform_pricing;"
-```
+## Why I missed this
-Expected: one row, all zeros, vat_rate_chli=8.10.
+Both routes share the same shape (PATCH/POST that sets
+`spec.suspend`), but they differ on:
-### Step 2 — Backfill existing tenants
+- URL path (`/api/admin/tenants/...` vs `/api/tenants/...`)
+- Method (POST vs PATCH)
+- Authorization (platform-only vs owner+platform)
+- Caller (admin panel vs customer cancel button)
-Run the backfill once. From a logged-in admin browser tab DevTools
-console:
-```js
-await fetch('/api/admin/billing/backfill', { method: 'POST' })
-.then(r => r.json())
-```
-Expected response (numbers will vary):
-```
-{
-"message": "Backfill complete.",
-"tenantsExamined": 4,
-"lifecycleInserted": 4,
-"eventsInserted": 12,
-"suspensionEventsInserted": 0
-}
-```
-Run it a SECOND time — all three "Inserted" counts should be 0
-(idempotency check).
-### Step 3 — Verify backfill data
-```
-kubectl -n portal exec -it portal-db-1 -- psql -U portal -d portal
-```
-```sql
-SELECT tenant_name, zitadel_org_id, created_at, deleted_at
-FROM tenant_billing_lifecycle ORDER BY created_at;
-SELECT tenant_name, skill_id, event_kind, occurred_at
-FROM tenant_skill_events ORDER BY tenant_name, occurred_at;
-```
-Cross-check against the live CR:
-```
-kubectl get piecedtenants -o jsonpath='{range .items[*]}{.metadata.name}{": "}{.spec.packages}{"\n"}{end}'
-```
-Every package currently in `spec.packages` should have a matching
-`enabled` event row.
-### Step 4 — Live skill toggle
-From the customer-facing tenant detail page, enable a package not
-previously present (e.g. `searxng-local-search`):
-```sql
-SELECT * FROM tenant_skill_events
-WHERE tenant_name = 'your-test-tenant'
-ORDER BY id DESC LIMIT 3;
-```
-Expect a fresh `enabled` row. Disable the package → expect a
-`disabled` row on top.
-### Step 5 — Live suspend toggle
-Cancel a test tenant from the customer-side button:
-```sql
-SELECT * FROM tenant_suspension_events
-WHERE tenant_name = 'your-test-tenant'
-ORDER BY id DESC LIMIT 3;
-```
-Expect a `suspended` row. Resume via the admin approval flow →
-expect a `resumed` row.
-### Step 6 — Live delete
-Delete a test tenant from the admin panel:
-```sql
-SELECT tenant_name, created_at, deleted_at
-FROM tenant_billing_lifecycle
-WHERE tenant_name = 'your-deleted-tenant';
-```
-`deleted_at` should be stamped with roughly "now".
-### Step 7 — Pricing rows survive (optional)
-Direct-INSERT a price into `platform_pricing` and `skill_pricing`,
-restart the portal pod, confirm rows survive:
-```sql
-UPDATE platform_pricing
-SET tenant_monthly_fee_chf = 49.00,
-tenant_setup_fee_chf = 99.00,
-threema_message_chf = 0.005
-WHERE id = 1;
-INSERT INTO skill_pricing (skill_id, daily_price_chf)
-VALUES ('searxng-local-search', 0.10)
-ON CONFLICT (skill_id) DO UPDATE
-SET daily_price_chf = EXCLUDED.daily_price_chf;
-```
-No application behaviour changes from these — they're inert until
-Phase 2 starts computing invoices.
----
-## Rollback
-The migration is additive — no existing columns/tables touched.
-To roll back:
-1. Re-deploy the previous portal image (revert the tag in gitops)
-2. New tables remain in the DB but are unreferenced. Leave them
-in place — Phase 2 will use them again. Or drop them:
-```sql
-DROP TABLE IF EXISTS invoice_reminders, invoice_lines, invoices,
-invoice_number_counters, org_payment_methods, org_billing_config,
-tenant_suspension_events, tenant_skill_events,
-tenant_billing_lifecycle, skill_pricing, platform_pricing CASCADE;
-```
----
-## What's NOT in this phase (by design)
-* No customer-facing /billing page
-* No admin pricing UI
-* No invoice generation
-* No PDF rendering
-* No Stripe wiring
-* No reminders or cron
-These are Phases 2-6.
+When I grepped for the suspend hook target I matched on the
+customer endpoint and didn't audit cross-cutting admin
+duplicates. I've since checked every site that calls
+`patchTenantSpec`, `createTenant`, or `deleteTenant` — this was
+the only missed billing-relevant one. Other `patchTenantSpec`
+sites are confirmed non-billing (openClawImage, channelUsers).
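Both README versions describe which spec patches should produce billing rows. A `suspend` change writes a suspension event. A `packages` change writes enabled/disabled skill events, diffed against the CR returned by `patchTenantSpec`. `openClawImage` and `channelUsers` patches are non-billing. Below is a minimal TypeScript sketch of that decision, written under assumptions. `TenantSpecPatch`, `BillingEvent` and `billingEventsForPatch` are hypothetical names, not the portal's actual types or helpers. The real routes write rows through `recordSkillEvents` / `recordSuspensionEvent` rather than returning a list.

```typescript
// Hypothetical shapes: the real spec and event types in src/types/index.ts
// are not shown in this commit.
interface TenantSpecPatch {
  suspend?: boolean;
  packages?: string[];
  openClawImage?: string; // non-billing
  channelUsers?: unknown; // non-billing
}

type BillingEvent =
  | { table: "tenant_suspension_events"; kind: "suspended" | "resumed" }
  | { table: "tenant_skill_events"; kind: "enabled" | "disabled"; skillId: string };

/**
 * Decide which billing rows a committed spec patch should produce.
 * `before` is the package list read before patching; `after` is the
 * package list on the CR returned by the patch (what K8s committed).
 */
function billingEventsForPatch(
  patch: TenantSpecPatch,
  before: readonly string[],
  after: readonly string[]
): BillingEvent[] {
  const events: BillingEvent[] = [];

  if (patch.suspend !== undefined) {
    events.push({
      table: "tenant_suspension_events",
      kind: patch.suspend ? "suspended" : "resumed",
    });
  }

  // Only diff packages when the patch actually touched them.
  // Linear scans are fine for the handful of packages a tenant has.
  if (patch.packages !== undefined) {
    for (const id of after) {
      if (before.indexOf(id) === -1) {
        events.push({ table: "tenant_skill_events", kind: "enabled", skillId: id });
      }
    }
    for (const id of before) {
      if (after.indexOf(id) === -1) {
        events.push({ table: "tenant_skill_events", kind: "disabled", skillId: id });
      }
    }
  }

  // openClawImage / channelUsers changes intentionally produce no events.
  return events;
}
```

Returning plain event descriptors keeps the decision pure and easy to test. The caller would still wrap the actual database writes in `try/catch`, so a logging failure never blocks the K8s operation.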


@@ -1,6 +1,7 @@
 import { NextResponse } from "next/server";
 import { requirePlatformRole } from "@/lib/session";
 import { getTenant, patchTenantSpec } from "@/lib/k8s";
+import { recordSuspensionEvent } from "@/lib/db";
 import { safeError } from "@/lib/errors";
 /**
@@ -29,6 +30,32 @@ export async function POST(
   try {
     const updated = await patchTenantSpec(name, { suspend });
+    // Billing — Phase 1: record the transition. Mirrors the same
+    // hook in the customer-side suspend route so admin actions
+    // also produce events. Best-effort; logging failures don't
+    // block the response.
+    try {
+      const orgId =
+        tenant.metadata.labels?.["pieced.ch/zitadel-org-id"] ?? null;
+      if (orgId) {
+        await recordSuspensionEvent(
+          name,
+          orgId,
+          suspend ? "suspended" : "resumed"
+        );
+      } else {
+        console.warn(
+          `billing: tenant ${name} has no zitadel-org-id label; suspension event not recorded`
+        );
+      }
+    } catch (e) {
+      console.error(
+        `billing: failed to record suspension event for ${name}:`,
+        e
+      );
+    }
     return NextResponse.json({
       message: suspend ? "Tenant suspended." : "Tenant resumed.",
       tenant: updated,
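The block added to the admin route follows the best-effort pattern the README describes: look up the org id label, skip with a warning if it is missing, and never let a recording failure escape. One way to factor that into a shared helper is sketched below under assumptions. The recorder is injected with the call shape seen in the diff (`name, orgId, kind`). `TenantLike` stands in for the PiecedTenant CR type. `recordSuspensionBestEffort` is a hypothetical helper name, not something exported by `@/lib/db`.

```typescript
type SuspensionKind = "suspended" | "resumed";

// Assumed to match how the route calls recordSuspensionEvent(name, orgId, kind);
// injected here so the helper can be exercised without a database.
type SuspensionRecorder = (name: string, orgId: string, kind: SuspensionKind) => Promise<void>;

// Minimal stand-in for the PiecedTenant CR (hypothetical).
interface TenantLike {
  metadata: { labels?: Record<string, string> };
}

const ORG_LABEL = "pieced.ch/zitadel-org-id";

/**
 * Best-effort billing hook: never throws, so a logging failure cannot
 * block the K8s response. Returns a status so callers and tests can
 * see which branch was taken.
 */
async function recordSuspensionBestEffort(
  record: SuspensionRecorder,
  name: string,
  tenant: TenantLike,
  suspend: boolean
): Promise<"recorded" | "skipped-no-org" | "failed"> {
  const orgId = tenant.metadata.labels?.[ORG_LABEL] ?? null;
  if (!orgId) {
    console.warn(
      `billing: tenant ${name} has no zitadel-org-id label; suspension event not recorded`
    );
    return "skipped-no-org";
  }
  try {
    await record(name, orgId, suspend ? "suspended" : "resumed");
    return "recorded";
  } catch (e) {
    console.error(`billing: failed to record suspension event for ${name}:`, e);
    return "failed";
  }
}
```

With such a helper, the inline block in the route could shrink to `await recordSuspensionBestEffort(recordSuspensionEvent, name, tenant, suspend);`, assuming `recordSuspensionEvent` matches the injected signature. Sharing one helper between the customer-side and admin suspend routes is one way to keep the two hooks from drifting apart.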