Compare commits

23 Commits

Author SHA1 Message Date
Tim Leikauf
1085a54761 fix: prompt_styles.themenfeld_id → kategorie_id with FK to categories
- Column renamed (idempotent ALTER TABLE)
- FK constraint to categories added
- Seed populates kategorie_id via category-name lookup (independent of UUIDs)
- Route prompt-styles.js adjusted

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-20 21:15:28 +02:00
Tim Leikauf
dbd077e239 docs: CLAUDE.md – image generation pipeline (prompt_styles, picture_jobs) 2026-06-20 21:04:13 +02:00
Tim Leikauf
01f6df67f3 feat: API routes for prompt_styles and picture_jobs
- GET/POST/PATCH/DELETE /api/prompt-styles
- GET/POST/PATCH/DELETE /api/picture-jobs incl. M2M /words
- Both routes registered in index.js

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-20 21:02:36 +02:00
Tim Leikauf
1335136dcb feat: prompt_styles + picture_jobs tables + seed data
New tables via db-migrate.js:
- prompt_styles (fix/atmosphere/setting, themenfeld_id, text_en) incl. 38 seed entries from CSV
- picture_jobs (FKs to categories, prompt_styles, pictures) + status enum
- picture_job_words (M2M junction for word_ids)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-20 20:54:33 +02:00
455969bdec fix: unique index on words.titel_en as a partial index + robust upsert without ON CONFLICT
- Migration: partial index WHERE IS NOT NULL, dedup beforehand, no silent catch
- Route: INSERT with .catch(23505) → UPDATE instead of ON CONFLICT (partial index incompatible)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-18 21:14:21 +02:00
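The insert-then-update fallback described in this commit could look roughly like the sketch below. It is not the actual route code: `upsertWordByTitle` is a hypothetical helper, the `id` column in `RETURNING` is assumed, and `query` is assumed to behave like the `query(text, params)` export of `src/db.js`, rejecting with an error whose `code` is `'23505'` on a unique violation.

```javascript
// Hypothetical helper: upsert a word by titel_en without ON CONFLICT.
// `query` is injected so the sketch does not depend on a real database.
async function upsertWordByTitle(query, titelEn, concM) {
  try {
    const res = await query(
      `INSERT INTO words (titel_en, conc_m, status, requested_at)
       VALUES ($1, $2, 'requested', NOW())
       RETURNING id`,
      [titelEn, concM]
    );
    return { action: 'inserted', row: res.rows[0] };
  } catch (err) {
    // 23505 is PostgreSQL's unique_violation; anything else is a real error.
    if (err.code !== '23505') throw err;
    const res = await query(
      `UPDATE words SET conc_m = $2 WHERE titel_en = $1 RETURNING id`,
      [titelEn, concM]
    );
    return { action: 'updated', row: res.rows[0] };
  }
}
```

Only the unique-violation code is swallowed, so connection or syntax errors still reach the caller.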
1d25f84f5d fix: POST /api/words accepts conc_m + ON CONFLICT (titel_en) DO UPDATE
Enables safe CSV upsert via the API (Brysbaert import).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-18 21:06:53 +02:00
294608de22 fix: move the enrich-batch endpoint into the words router (it was registered after the 404 handler in index.js)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-18 21:03:20 +02:00
7ba6b7120b feat: words table – Brysbaert import + hierarchical categories + batch enrichment
- categories: parent_id (self-referential) + 49 subcategories seeded
- words: new columns conc_m, dom_pos, level, themenfeld_id + unique index on titel_en
- enrich_batches + word_generative tables
- src/lib/enrichWords.js: batch enrichment (DE/SV translation, part of speech, CEFR, topic area)
- src/routes/wordGenerative.js: CRUD for the AI image pipeline
- src/routes/words.js: filters dom_pos/level/themenfeld_id/has_conc_m + picture_count
- scripts/import-brysbaert.js: CSV import script (run locally against the prod DB)
- POST /api/words/enrich-batch as a manual trigger

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-18 20:41:52 +02:00
1605d2cdd1 docs: CLAUDE.md – progress/gamification (level curve, progress contract, achievements)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-17 22:17:03 +02:00
61b3bcb5ff feat: achievements – unlock detection + listing
- Table user_achievements (migration in db-migrate.js)
- src/lib/achievements.js: definitions + dedup-safe unlocking
  (ON CONFLICT DO NOTHING … RETURNING → only new rows), listing with status
- /auth/progress returns unlocked_achievements (defensively wrapped)
- New route GET /auth/achievements

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-17 21:53:49 +02:00
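The dedup-safe unlock named in this commit relies on `INSERT … ON CONFLICT DO NOTHING … RETURNING` handing back only the rows that were really inserted. A minimal sketch under stated assumptions: the function name `unlockAchievements` and the injected `query` are hypothetical, while the table and column names follow the `user_achievements` migration shown further down.

```javascript
// Hypothetical sketch: unlock several achievements and report only the new ones.
async function unlockAchievements(query, userId, keys) {
  if (keys.length === 0) return [];
  const res = await query(
    `INSERT INTO user_achievements (user_id, achievement_key)
     SELECT $1::uuid, unnest($2::text[])
     ON CONFLICT (user_id, achievement_key) DO NOTHING
     RETURNING achievement_key`,
    [userId, keys]
  );
  // Rows skipped by DO NOTHING are not returned, so this list is "only new".
  return res.rows.map((r) => r.achievement_key);
}
```

Because the primary key `(user_id, achievement_key)` does the deduplication, two concurrent requests cannot both report the same achievement as newly unlocked.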
bb863640c0 feat: progressive level curve + atomic /auth/progress contract
- levelForEp/levelInfo (level 1 at 20 EP instead of a fixed 500 EP per level), src/lib/leveling.js
- /auth/me returns level + ep_into_level + ep_to_next_level
- /auth/progress returns prev_level, streak_increased, daily_ep, daily_goal_ep, goal_just_reached
  (a CTE captures the pre-update values so level-ups and streak-ups can be detected atomically)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-17 21:43:36 +02:00
806e25c3ff docs: CLAUDE.md – document the background job & category data flow
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 14:48:03 +02:00
339a3ed27d fix: better word categorization, less "Sonstiges" (Other)
- Taxonomy extended with "Eigenschaften" (adjectives) and "Verben & Handlungen"
  (verbs & actions) → parts of speech now have a home instead of "Sonstiges".
- Classifier sharpened: clear part-of-speech/topic rules, "Sonstiges"
  only as a last resort; the immediate path now uses example sentences and
  smaller batches (15) for noticeably more accurate matches.
- ?reset=true: discard existing assignments and reclassify.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 14:39:28 +02:00
bd18a9c303 fix: set greeting for en/sv reliably
The ON CONFLICT update did not take effect for already existing en/sv rows
(the greeting stayed NULL). Instead, use an explicit, idempotent
UPDATE for de/en/sv (Hallo/Hi/Hej).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 14:34:04 +02:00
d66cff3f61 feat: automatic word categorization (Batches API + immediate backfill)
Fixed taxonomy of ~20 categories seeded (de/en/sv, published; existing
categories are reused) + table category_batches.

src/lib/classifyWords.js: finds words used in pairs that have no
category and classifies them via Haiku against the fixed list.
- Hourly job via the Message Batches API (asynchronous, ~50% cheaper):
  submit/collect ticks, run in index.js after boot + hourly.
- Immediate synchronous one-shot backfill (classifyWordsSync) for
  live testing without the 24h delay.
Both materialize pair_categories via derivePairCategories.

POST /api/categories/auto-assign (admin): ?sync=true = immediate backfill,
otherwise one batch tick. Decoupled from generate-words and publish.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 14:27:09 +02:00
9738d3e35a feat: profile categories + greeting in the target language
languages.greeting (de/en/sv seeded), new pair_categories table
(derived from statement- and object-linked words via
word_categories) incl. backfill for already published pairs.
derivePairCategories() is called on publish (pairs + pipeline).
/auth/me returns language_target_greeting, /auth/stats returns
categories[] with points per category for the profile.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 12:55:57 +02:00
508d6993ee feat: feed pagination – exclude completed pairs and pairs supplied by the client
GET /auth/feed now excludes pairs from user_pair_progress (cross-session)
as well as already loaded pairs passed via ?exclude=<uuids> (in-session).
An empty response signals to the client: no more cards.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 11:53:05 +02:00
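Since `?exclude=<uuids>` is client-supplied, the list presumably has to be validated before it reaches SQL. A minimal sketch, assuming a comma-separated list; the helper name `parseExcludeParam` and the cap of 200 IDs are hypothetical and not taken from the route.

```javascript
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Hypothetical helper: turn "?exclude=<uuid>,<uuid>" into a clean, deduplicated list.
function parseExcludeParam(raw, max = 200) {
  if (typeof raw !== 'string' || raw === '') return [];
  const ids = raw
    .split(',')
    .map((s) => s.trim().toLowerCase())
    .filter((s) => UUID_RE.test(s)); // silently drop anything that is not a UUID
  return [...new Set(ids)].slice(0, max);
}
```

The resulting array can then be passed as a single parameter, for example in a clause such as `AND NOT (p.id = ANY($2::uuid[]))`, which keeps the query parameterized regardless of how many IDs the client sends.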
e44d896f9e feat: object token cleanup + sharper LLM prompt (head compound vs. modifier)
The LLM fallback also linked object words when they were only the
modifier of another thing (e.g. "jordgubbsfältet"/Erdbeerfeld, a strawberry field,
tagged as Erdbeere, a strawberry). The rule is now explicit:
- keep: the word itself, inflection/plural/definite form, head compound
  ("Landschildkröte"=Schildkröte), synonym ("Stiefel"/"Lederstiefel"=Schuh)
- remove: object word used only as a modifier ("Erdbeerfeld" != Erdbeere)

- locateSurfaceLLM prompt sharpened with this rule + examples (prevents
  future mis-tagging).
- New cleanup mode: POST /api/pipeline/retag-objects {"cleanup":true}
  checks existing object tokens via LLM and removes the wrong ones. Clearly
  good forms (exact/lemma+ending) are kept without the LLM.
- Helpers in objectTagging.js: objectTokensInSentence, isSimpleObjectForm, untagToken.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-13 19:56:13 +02:00
434839e1d4 feat: tokenize object words deterministically (forward + backfill)
Object highlighting (chip + image region) depends on {{label.o:uuid}} tokens in the
sentence. Until now these came only from the LLM noun markup, which Haiku often
omitted -> the object stayed untokenized (e.g. "ryggsäcken"/Rucksack) although it was
correctly linked.

- src/lib/objectTagging.js: deterministic, inflection-tolerant tagger
  (Swedish definite form -en/-et/...), idempotent, protects existing tokens.
- generatePairs.resolveNounMarkup: sweep as a safety net + titel_sv in the lookup.
- pipeline.retagPair/retagObjects: per-pair re-tokenization (hybrid LLM fallback
  only for objects confirmed in another language), backfill per image/all images.
- POST /api/pipeline/retag-objects (dry_run/use_llm/picture_id).

Only changes sentence text fields -> audio/alignment remain valid.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-13 19:37:12 +02:00
f0f768ff2c feat: progress tracking – user_daily_activity, daily goal & GET /auth/stats
- New table user_daily_activity (daily history) + column daily_goal_ep
- POST /auth/progress also writes the daily activity
- GET /auth/me returns daily_goal_ep
- New GET /auth/stats: daily history, daily goal, totals, real skills per answer_type
- New PUT /auth/goal to set the daily goal

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-13 16:40:57 +02:00
895d7c56a1 feat: placeholders in auto-generation + token leak fix
- Pair generation marks nouns with [surface|lemma] markup and resolves them to
  {{label.o:objectId}} / {{label.w:wordId}} (words are auto-created)
- Pipeline also translates + voices placeholder words from the sentences
- translateText no longer hallucinates ⟦PHn⟧ tokens (no token prompt without
  tokens, defensive stripping); TTS/review resolve leaked tokens
- POST /api/pipeline/repair-tokens repairs existing sentences + audio

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-12 22:43:39 +02:00
25d1e89446 feat: deep delete for pairs and images (questions/statements/audio/objects cascade)
DELETE /pairs/:id now also cleans up unreferenced questions/statements together with
their audio files (DB + S3). DELETE /pictures/:id additionally deletes
the objects linked only to this image, incl. their pairs.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-12 21:25:15 +02:00
ddbd879dab feat: AI review step in the pipeline (proofreading before audio)
All pairs of an image (de/en/sv) are sent together with the image to Sonnet
to check spelling, translation consistency and plausibility.
Corrections are applied before audio generation; existing audio
of corrected cells is invalidated. Review errors are not fatal.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 21:41:20 +02:00
27 changed files with 2845 additions and 66 deletions

42
CLAUDE.md Executable file

@@ -0,0 +1,42 @@
# CLAUDE.md
REST API for the snakkimo project. Node/Express + PostgreSQL (`pg`, no ORM), image assets on Hetzner Object Storage (S3-compatible). Detailed API docs in [README.md](README.md).
## Commands
- `npm run dev`: local server with nodemon (hot reload)
- `npm start`: production (`node src/index.js`)
- No tests / no linter configured.
## Architecture
- Entry point: [src/index.js](src/index.js) registers all routes; every `/api/*` route is protected by the `auth` middleware.
- **Migrations run automatically at boot** ([src/db-migrate.js](src/db-migrate.js)) before the server listens. Keep them idempotent: `CREATE TABLE IF NOT EXISTS`, column renames with `.catch(() => {})`. There is **no** separate migration tool; add schema changes here.
- `src/db.js` exports `query(text, params)` and `pool`. Always use parameterized queries (`$1, $2 …`), never string interpolation of user input.
- `src/routes/`: one file per entity. `src/lib/`, `src/middleware/`, `src/s3.js`, `src/voices.js` hold shared logic.
- **Background job (auto-categorization):** [src/index.js](src/index.js) runs `runCategorizationTick()` ([src/lib/classifyWords.js](src/lib/classifyWords.js)) ~30 s after boot and then hourly. It classifies words that are used in pairs but have no category via the **Anthropic Message Batches API** (Haiku, asynchronous, ~50 % cheaper) against the fixed taxonomy and materializes `pair_categories`. ⚠️ Requires `ANTHROPIC_API_KEY` and incurs real LLM costs, **even locally with `npm run dev`**. Trigger manually: `POST /api/categories/auto-assign` (`?sync=true` = immediate/synchronous instead of batch, `&reset=true` = discard existing assignments and reclassify).
- **Category data flow:** Categories are attached to words (`word_categories`; the fixed taxonomy is seeded in [src/db-migrate.js](src/db-migrate.js)). `pair_categories` is derived from them ([src/lib/pairCategories.js](src/lib/pairCategories.js) `derivePairCategories`), both on pair publish (`routes/pairs.js`, `routes/pipeline.js`) and in the job. `GET /auth/stats` uses it to return the points per category for the profile; `GET /auth/me` returns `language_target_greeting` (column `languages.greeting`, seeded for de/en/sv). The async batch status is stored in `category_batches`.
## Progress / Gamification ([src/routes/auth.js](src/routes/auth.js))
- **Level curve = single source of truth:** [src/lib/leveling.js](src/lib/leveling.js) (`levelForEp`/`levelInfo`, progressive curve: cumulative `5·n·(n+3)` EP, level 1 at 20 EP). Used in `GET /auth/me` (returns `level` + `ep_into_level` + `ep_to_next_level`) **and** `POST /auth/progress`. The frontend mirrors the same curve only as a fallback; make curve changes here.
- **`POST /auth/progress`** records EP/streak/pair statistics and returns the **milestone contract**: `{ total_ep, level, prev_level, streak_days, streak_increased, daily_ep, daily_goal_ep, goal_just_reached, unlocked_achievements }`. A CTE also captures the **pre-update values** so that level-ups and streak-ups can be detected atomically. `daily_goal_ep` is set via `PUT /auth/goal` (clamped 5500).
- **Achievements:** [src/lib/achievements.js](src/lib/achievements.js) defines the achievements and unlocks them **dedup-safely** (`INSERT … ON CONFLICT DO NOTHING RETURNING` → only new rows). Persisted in table `user_achievements` (migration in `db-migrate.js`). `/auth/progress` calls `evaluateAchievements` (defensively wrapped, it must not break the booking); `GET /auth/achievements` lists all of them with their status for the profile.
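The cumulative curve `5·n·(n+3)` can be made concrete with a small sketch. The function names match those mentioned above, but the bodies are an illustration rather than the contents of `src/lib/leveling.js`; the helper `epForLevel` and the assumption that a user below 20 EP is at level 0 are mine.

```javascript
// Cumulative EP required to reach level n: 5·n·(n+3), so level 1 needs 20 EP.
function epForLevel(n) {
  return 5 * n * (n + 3);
}

// Highest level whose cumulative threshold is covered by `ep`.
function levelForEp(ep) {
  const total = Math.max(0, Math.floor(Number(ep) || 0));
  let level = 0;
  while (epForLevel(level + 1) <= total) level++;
  return level;
}

// Level plus progress within the current level (field names as returned by /auth/me).
function levelInfo(ep) {
  const total = Math.max(0, Math.floor(Number(ep) || 0));
  const level = levelForEp(total);
  return {
    level,
    ep_into_level: total - epForLevel(level),
    ep_to_next_level: epForLevel(level + 1) - total,
  };
}
```

With this curve the thresholds are 20, 50 and 90 EP for levels 1 to 3, so each level costs 10 EP more than the previous one.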
## Conventions
- **Code comments in German**, code/identifiers in English (follow the existing code).
- Route handler pattern: `async (req, res, next) => { try { … } catch (err) { next(err); } }`. Pass errors through to the central error handler in `index.js`; do not send a 500 yourself.
- List endpoints: take `limit`/`offset` from the query and hard-cap `limit` (e.g. `Math.min(parseInt(limit), 500)`).
- Check status fields against a `STATUSES` whitelist → `400` on violation.
- **Language suffixes: `_de`, `_en`, `_sv`.** `_se` is deprecated (wrong ISO 639-1 code) and is renamed to `_sv` at boot; never create new `_se` columns.
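Taken together, the handler, pagination and whitelist conventions could be sketched as follows. The helpers `parseLimit`, `parseOffset` and `makeListHandler`, the default limit of 50, the `STATUSES` values and the SQL are assumptions for illustration; `query` is injected to mirror the `query(text, params)` export of `src/db.js`.

```javascript
// Assumed values (both appear in this diff); the real whitelist may be longer.
const STATUSES = ['requested', 'published'];

// Parse ?limit with a radix, fall back on bad input, and enforce a hard cap.
function parseLimit(raw, fallback = 50, max = 500) {
  const n = Number.parseInt(raw, 10);
  if (Number.isNaN(n) || n < 1) return fallback;
  return Math.min(n, max);
}

function parseOffset(raw) {
  const n = Number.parseInt(raw, 10);
  return Number.isNaN(n) || n < 0 ? 0 : n;
}

// List handler in the documented shape: try/catch, errors forwarded via next(err).
function makeListHandler(query) {
  return async (req, res, next) => {
    try {
      const { status } = req.query;
      if (status !== undefined && !STATUSES.includes(status)) {
        return res.status(400).json({ error: 'invalid status' });
      }
      const result = await query(
        `SELECT * FROM words
         WHERE ($1::text IS NULL OR status = $1)
         ORDER BY titel_en LIMIT $2 OFFSET $3`,
        [status ?? null, parseLimit(req.query.limit), parseOffset(req.query.offset)]
      );
      res.json(result.rows);
    } catch (err) {
      next(err); // central error handler decides on the 500
    }
  };
}
```

Unlike a bare `Math.min(parseInt(limit), 500)`, the helper also handles a missing or non-numeric `limit`, where `parseInt` would yield `NaN`.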
## Image generation pipeline
- **`prompt_styles`**: seed table with ready-made prompt building blocks (types: `fix` / `atmosphere` / `setting`). `themenfeld_id` groups settings by topic (plain UUID, no FK). 38 entries are seeded at boot. Route: `/api/prompt-styles`.
- **`picture_jobs`**: job queue for image generation. References a category, up to three prompt styles and, after generation, an image. M2M words via `picture_job_words`. Status flow: `pending → generating → done | failed`. Route: `/api/picture-jobs` (incl. `/words` subroute for M2M).
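The status flow `pending → generating → done | failed` can be expressed as a small transition table. Whether the route actually enforces transitions, and whether a failed job may be retried, is not stated here; the table and the helper `canTransition` below are assumptions.

```javascript
// Allowed next states per status; `done` and `failed` are treated as terminal here.
const TRANSITIONS = {
  pending: ['generating'],
  generating: ['done', 'failed'],
  done: [],
  failed: [],
};

// Hypothetical guard a PATCH handler could call before updating picture_jobs.status.
function canTransition(from, to) {
  return (
    Object.prototype.hasOwnProperty.call(TRANSITIONS, from) &&
    TRANSITIONS[from].includes(to)
  );
}
```

A handler following the whitelist convention would answer `400` when `canTransition(current, requested)` is false.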
## Auth (two paths, see [src/middleware/auth.js](src/middleware/auth.js))
1. Static tokens from `API_TOKENS` (comma-separated) → server-to-server / admin, no role check.
2. JWT from `/auth/login` · `/auth/register`. The `end-user` role deliberately gets **403** on all `/api/*` routes (app gating).
Public (no auth): `GET /health`, `/auth/*`.
Config via `.env` (see [.env.example](.env.example)). Deployment via Coolify/Docker.
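The static-token path can be illustrated with a few helpers. This is a sketch, not the contents of `src/middleware/auth.js`: the function names are hypothetical, the `Bearer` scheme is inferred from `scripts/upload-pictures.mjs`, and the constant-time comparison via `crypto.timingSafeEqual` is my own choice rather than something the source confirms.

```javascript
const crypto = require('crypto');

// Split the comma-separated API_TOKENS value into a clean list.
function parseApiTokens(raw) {
  return (raw || '')
    .split(',')
    .map((t) => t.trim())
    .filter(Boolean);
}

// Extract the token from an "Authorization: Bearer <token>" header value.
function bearerToken(header) {
  const m = /^Bearer\s+(.+)$/i.exec(header || '');
  return m ? m[1].trim() : null;
}

// Compare against every configured token without early exit on content.
function isStaticToken(candidate, tokens) {
  const a = Buffer.from(String(candidate));
  return tokens.some((t) => {
    const b = Buffer.from(t);
    return a.length === b.length && crypto.timingSafeEqual(a, b);
  });
}
```

A middleware would try `isStaticToken(bearerToken(req.headers.authorization), tokens)` first and only fall back to JWT verification, including the `end-user` 403 rule, when that check fails.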

@@ -0,0 +1,71 @@
// One-time import of the Brysbaert concreteness CSV into the words table.
// Usage: node scripts/import-brysbaert.js <path-to-csv>
// Sets titel_en + conc_m; status = 'requested'. Existing rows (same titel_en)
// only get conc_m updated; all other fields remain unchanged.
require('dotenv').config({ path: require('path').join(__dirname, '..', '.env') });
const { query, pool } = require('../src/db');
const fs = require('fs');
const readline = require('readline');
async function main() {
const csvPath = process.argv[2];
if (!csvPath) {
console.error('Verwendung: node scripts/import-brysbaert.js <pfad-zur-csv>');
process.exit(1);
}
if (!fs.existsSync(csvPath)) {
console.error(`Datei nicht gefunden: ${csvPath}`);
process.exit(1);
}
const rl = readline.createInterface({
input: fs.createReadStream(csvPath),
crlfDelay: Infinity,
});
let header = true;
let inserted = 0;
let updated = 0;
let skipped = 0;
let errors = 0;
for await (const line of rl) {
if (header) { header = false; continue; }
const trimmed = line.trim();
if (!trimmed) continue;
// The last comma separates word and score (the word may contain spaces)
const comma = trimmed.lastIndexOf(',');
if (comma === -1) { skipped++; continue; }
const word = trimmed.slice(0, comma).trim();
const conc = parseFloat(trimmed.slice(comma + 1).trim());
if (!word || isNaN(conc)) { skipped++; continue; }
try {
const res = await query(
`INSERT INTO words (titel_en, conc_m, status, requested_at)
VALUES ($1, $2, 'requested', NOW())
ON CONFLICT (titel_en) DO UPDATE SET conc_m = EXCLUDED.conc_m
RETURNING (xmax = 0) AS is_insert`,
[word, conc]
);
if (res.rows[0]?.is_insert) inserted++;
else updated++;
} catch (err) {
errors++;
if (errors <= 5) console.error(`Fehler bei "${word}":`, err.message);
}
}
console.log(`Import abgeschlossen:`);
console.log(` ${inserted} neu eingefügt`);
console.log(` ${updated} aktualisiert (conc_m)`);
if (skipped) console.log(` ${skipped} Zeilen übersprungen (leer/ungültig)`);
if (errors) console.log(` ${errors} Fehler`);
await pool.end();
}
main().catch(err => { console.error(err); process.exit(1); });

119
scripts/upload-pictures.mjs Normal file

@@ -0,0 +1,119 @@
#!/usr/bin/env node
/**
* Uploads all images from a directory to Hetzner S3 + pictures table.
* Re-encodes each file to WebP at 85% quality via cwebp.
*
* Usage:
* TOKEN=your-dev-token node scripts/upload-pictures.mjs /path/to/folder
* TOKEN=... BASE_URL=https://hyggecraftery.com/api/snakkimo node scripts/upload-pictures.mjs /path/to/folder
*/
import { readdir, readFile, unlink, writeFile } from 'fs/promises';
import { execSync } from 'child_process';
import { join, basename, extname } from 'path';
import { tmpdir } from 'os';
import { randomUUID } from 'crypto';
const TOKEN = process.env.TOKEN;
const BASE_URL = (process.env.BASE_URL || 'https://hyggecraftery.com/api/snakkimo/api').replace(/\/$/, '');
const CONCURRENCY = 4;
if (!TOKEN) {
console.error('ERROR: TOKEN env var required. Run: TOKEN=your-token node scripts/upload-pictures.mjs <dir>');
process.exit(1);
}
const dir = process.argv[2];
if (!dir) {
console.error('ERROR: Pass the image directory as argument.');
process.exit(1);
}
function extractDesign(filename) {
const name = basename(filename, extname(filename));
// Strip trailing _xxxxxxxx hash (8 hex chars)
return name.replace(/_[0-9a-f]{8}$/i, '').replace(/_/g, ' ');
}
async function apiPost(path, body) {
const res = await fetch(`${BASE_URL}${path}`, {
method: 'POST',
headers: { Authorization: `Bearer ${TOKEN}`, 'Content-Type': 'application/json' },
body: JSON.stringify(body),
});
if (!res.ok) throw new Error(`POST ${path} → ${res.status}: ${await res.text()}`);
return res.json();
}
async function apiUpload(pictureId, fileBuffer) {
const form = new FormData();
const blob = new Blob([fileBuffer], { type: 'image/webp' });
form.append('file', blob, `${pictureId}.webp`);
const res = await fetch(`${BASE_URL}/pictures/${pictureId}/upload`, {
method: 'POST',
headers: { Authorization: `Bearer ${TOKEN}` },
body: form,
});
if (!res.ok) throw new Error(`upload → ${res.status}: ${await res.text()}`);
return res.json();
}
async function processFile(filePath) {
const filename = basename(filePath);
const design = extractDesign(filename);
const tmpFile = join(tmpdir(), `${randomUUID()}.webp`);
try {
// Re-encode to webp at 85% quality
execSync(`cwebp -q 85 "${filePath}" -o "${tmpFile}" -quiet`, { stdio: 'pipe' });
const buffer = await readFile(tmpFile);
// 1. Create picture record
const picture = await apiPost('/pictures', { design });
// 2. Upload file
await apiUpload(picture.id, buffer);
return { ok: true, design, id: picture.id };
} finally {
await unlink(tmpFile).catch(() => {});
}
}
async function run() {
const files = (await readdir(dir))
.filter(f => /\.(webp|jpg|jpeg|png)$/i.test(f))
.map(f => join(dir, f));
console.log(`Found ${files.length} files. Uploading with concurrency ${CONCURRENCY}...\n`);
let done = 0;
const errors = [];
// Process in chunks of CONCURRENCY
for (let i = 0; i < files.length; i += CONCURRENCY) {
const chunk = files.slice(i, i + CONCURRENCY);
const results = await Promise.allSettled(chunk.map(processFile));
for (let j = 0; j < results.length; j++) {
const r = results[j];
done++;
if (r.status === 'fulfilled') {
console.log(`[${done}/${files.length}] ✓ ${r.value.design} (${r.value.id})`);
} else {
const name = basename(chunk[j]);
console.error(`[${done}/${files.length}] ✗ ${name}: ${r.reason.message}`);
errors.push({ file: name, error: r.reason.message });
}
}
}
console.log(`\nDone. ${done - errors.length} succeeded, ${errors.length} failed.`);
if (errors.length) {
console.error('\nFailed files:');
errors.forEach(e => console.error(` ${e.file}: ${e.error}`));
}
}
run().catch(err => { console.error(err); process.exit(1); });

404
src/db-migrate.js Normal file → Executable file

@@ -1,6 +1,6 @@
const { query } = require('./db');
-async function migrate() {
+async function migrateCore() {
// Rename _se → _sv (Swedish ISO 639-1 correction)
const renames = [
['words', 'titel_se', 'titel_sv'],
@@ -131,6 +131,58 @@ async function migrate() {
)
`);
// Seed the fixed everyday taxonomy (de/en/sv, published). Basis for the automatic
// word categorization (src/lib/classifyWords.js) and the category points in the profile.
// Idempotent: an existing category (e.g. "Tiere") is reused, no duplicates.
const CATEGORY_TAXONOMY = [
['Lebensmittel', 'Food', 'Mat'],
['Tiere', 'Animals', 'Djur'],
['Körper', 'Body', 'Kropp'],
['Kleidung', 'Clothing', 'Kläder'],
['Familie & Menschen','Family & People', 'Familj & människor'],
['Beruf & Arbeit', 'Job & Work', 'Jobb & arbete'],
['Haushalt', 'Household', 'Hushåll'],
['Wohnen & Möbel', 'Home & Furniture', 'Hem & möbler'],
['Natur & Pflanzen', 'Nature & Plants', 'Natur & växter'],
['Wetter', 'Weather', 'Väder'],
['Verkehr & Reisen', 'Transport & Travel', 'Transport & resor'],
['Stadt & Gebäude', 'City & Buildings', 'Stad & byggnader'],
['Schule & Bildung', 'School & Education', 'Skola & utbildning'],
['Technik & Geräte', 'Technology & Devices','Teknik & apparater'],
['Sport & Freizeit', 'Sports & Leisure', 'Sport & fritid'],
['Gefühle', 'Emotions', 'Känslor'],
['Farben', 'Colors', 'Färger'],
['Zahlen & Zeit', 'Numbers & Time', 'Tal & tid'],
['Werkzeuge', 'Tools', 'Verktyg'],
['Eigenschaften', 'Properties', 'Egenskaper'],
['Verben & Handlungen','Verbs & Actions', 'Verb & handlingar'],
['Sonstiges', 'Other', 'Övrigt'],
];
for (const [de, en, sv] of CATEGORY_TAXONOMY) {
await query(
`INSERT INTO categories (titel_de, titel_en, titel_sv, status, requested_at, published_at)
SELECT $1, $2, $3, 'published', NOW(), NOW()
WHERE NOT EXISTS (SELECT 1 FROM categories WHERE lower(titel_de) = lower($1))`,
[de, en, sv]
).catch(() => {});
}
// Raise existing matches to published (e.g. the old "Tiere" category)
await query(
`UPDATE categories
SET status = 'published', published_at = COALESCE(published_at, NOW())
WHERE lower(titel_de) = ANY($1) AND status <> 'published'`,
[CATEGORY_TAXONOMY.map(([de]) => de.toLowerCase())]
).catch(() => {});
// Asynchronous categorization batch (Message Batches API): remember the status across boots/redeploys
await query(`
CREATE TABLE IF NOT EXISTS category_batches (
batch_id TEXT PRIMARY KEY,
status TEXT NOT NULL DEFAULT 'submitted',
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
)
`);
await query(`
CREATE TABLE IF NOT EXISTS questions (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
@@ -331,6 +383,41 @@ async function migrate() {
)
`);
// M2M: pairs <-> categories, derived from the linked words (statements + objects).
// Materialized on publish (src/lib/pairCategories.js). Basis for the category points in the profile.
await query(`
CREATE TABLE IF NOT EXISTS pair_categories (
pair_id UUID NOT NULL REFERENCES pairs(id) ON DELETE CASCADE,
category_id UUID NOT NULL REFERENCES categories(id) ON DELETE CASCADE,
PRIMARY KEY (pair_id, category_id)
)
`);
// Backfill: derive categories for already published pairs. Idempotent (ON CONFLICT DO NOTHING);
// practically a no-op after the first run, since new pairs materialize their categories on publish.
await query(`
INSERT INTO pair_categories (pair_id, category_id)
SELECT DISTINCT pid, category_id FROM (
SELECT p.id AS pid, wc.category_id
FROM pairs p
JOIN (
SELECT statement_id, word_id FROM statement_positive_words
UNION
SELECT statement_id, word_id FROM statement_negative_words
) sw ON sw.statement_id IN (p.positive_statement_id, p.negative_statement_id)
JOIN word_categories wc ON wc.word_id = sw.word_id
WHERE p.status = 'published'
UNION
SELECT op.pair_id AS pid, wc.category_id
FROM object_pairs op
JOIN pairs p2 ON p2.id = op.pair_id AND p2.status = 'published'
JOIN object_words ow ON ow.object_id = op.object_id
JOIN word_categories wc ON wc.word_id = ow.word_id
) src
WHERE category_id IS NOT NULL
ON CONFLICT (pair_id, category_id) DO NOTHING
`).catch(() => {});
// pairs.answer_type → single TEXT (was TEXT[], now back to single value + new 'question' type)
await query(`ALTER TABLE pairs DROP CONSTRAINT IF EXISTS pairs_answer_type_check`).catch(() => {});
await query(`
@@ -444,6 +531,9 @@ async function migrate() {
FOR EACH ROW EXECUTE FUNCTION update_updated_at()
`);
// Greeting per language (in the language itself, e.g. sv = "Hej"), used for the personal profile salutation
await query(`ALTER TABLE languages ADD COLUMN IF NOT EXISTS greeting TEXT`).catch(() => {});
// user_names
await query(`
CREATE TABLE IF NOT EXISTS user_names (
@@ -489,11 +579,25 @@ async function migrate() {
// Full unique constraint (not partial) so ON CONFLICT works cleanly
await query(`CREATE UNIQUE INDEX IF NOT EXISTS languages_short_en_idx ON languages (short_en)`).catch(() => {});
await query(`
-INSERT INTO languages (short_en, titel_de, titel_en, titel_sv, status, published_at)
+INSERT INTO languages (short_en, titel_de, titel_en, titel_sv, greeting, status, published_at)
VALUES
-('en', 'Englisch', 'English', 'Engelska', 'published', NOW()),
-('sv', 'Schwedisch', 'Swedish', 'Svenska', 'published', NOW())
-ON CONFLICT (short_en) DO UPDATE SET status = EXCLUDED.status, published_at = COALESCE(languages.published_at, EXCLUDED.published_at)
+('en', 'Englisch', 'English', 'Engelska', 'Hi', 'published', NOW()),
+('sv', 'Schwedisch', 'Swedish', 'Svenska', 'Hej', 'published', NOW())
+ON CONFLICT (short_en) DO UPDATE SET
+status = EXCLUDED.status,
+published_at = COALESCE(languages.published_at, EXCLUDED.published_at),
+greeting = COALESCE(languages.greeting, EXCLUDED.greeting)
+`).catch(() => {});
+// Backfill the greeting robustly (the ON CONFLICT update above does not reliably
+// take effect for already existing en/sv rows → do it here explicitly and idempotently).
+await query(`
+UPDATE languages
+SET greeting = CASE short_en
+WHEN 'de' THEN 'Hallo'
+WHEN 'en' THEN 'Hi'
+WHEN 'sv' THEN 'Hej'
+END
+WHERE short_en IN ('de', 'en', 'sv') AND greeting IS NULL
`).catch(() => {});
// Seed bbox for watermelon test object (only if bbox_x is still NULL)
@@ -534,6 +638,31 @@ async function migrate() {
FOR EACH ROW EXECUTE FUNCTION update_last_seen_at()
`);
// user_daily_activity: daily history for the streak calendar, weekly graph and daily goal
await query(`
CREATE TABLE IF NOT EXISTS user_daily_activity (
user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
activity_date DATE NOT NULL,
ep_earned INTEGER NOT NULL DEFAULT 0,
cards_done INTEGER NOT NULL DEFAULT 0,
correct_count INTEGER NOT NULL DEFAULT 0,
PRIMARY KEY (user_id, activity_date)
)
`);
// Daily goal (EP/day) on the app profile
await query(`ALTER TABLE users_public ADD COLUMN IF NOT EXISTS daily_goal_ep INTEGER NOT NULL DEFAULT 30`).catch(() => {});
// Unlocked achievements per user (one row per achievement, dedup-safe)
await query(`
CREATE TABLE IF NOT EXISTS user_achievements (
user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
achievement_key VARCHAR(40) NOT NULL,
unlocked_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
PRIMARY KEY (user_id, achievement_key)
)
`);
// audios
await query(`
CREATE TABLE IF NOT EXISTS audios (
@@ -642,6 +771,140 @@ async function migrate() {
ON CONFLICT (key) DO NOTHING
`).catch(() => {});
// ── Brysbaert extensions ─────────────────────────────────────────────────
// parent_id on categories (self-referential, top-level category → subcategory)
await query(`ALTER TABLE categories ADD COLUMN IF NOT EXISTS parent_id UUID REFERENCES categories(id) ON DELETE SET NULL`).catch(() => {});
// Seed subcategories. The existing 22 entries are the top-level categories (parent_id = NULL).
const SUBCATEGORY_TAXONOMY = [
// Lebensmittel
['Obst', 'Fruit', 'Frukt', 'Lebensmittel'],
['Gemüse', 'Vegetables', 'Grönsaker', 'Lebensmittel'],
['Fleisch & Fisch', 'Meat & Fish', 'Kött & fisk', 'Lebensmittel'],
['Backwaren & Getreide', 'Baked Goods & Grains', 'Bröd & spannmål', 'Lebensmittel'],
['Milchprodukte', 'Dairy', 'Mejeriprodukter', 'Lebensmittel'],
['Getränke', 'Drinks', 'Drycker', 'Lebensmittel'],
['Gewürze & Kräuter', 'Spices & Herbs', 'Kryddor & örter', 'Lebensmittel'],
['Süßigkeiten & Snacks', 'Sweets & Snacks', 'Sötsaker & snacks', 'Lebensmittel'],
// Tiere
['Haustiere', 'Pets', 'Husdjur', 'Tiere'],
['Wildtiere', 'Wild Animals', 'Vilda djur', 'Tiere'],
['Vögel', 'Birds', 'Fåglar', 'Tiere'],
['Reptilien & Amphibien', 'Reptiles & Amphibians', 'Reptiler & amfibier', 'Tiere'],
['Insekten & Spinnen', 'Insects & Spiders', 'Insekter & spindlar', 'Tiere'],
['Meerestiere', 'Sea Animals', 'Havsdjur', 'Tiere'],
// Körper
['Kopf & Gesicht', 'Head & Face', 'Huvud & ansikte', 'Körper'],
['Rumpf', 'Torso', 'Bål', 'Körper'],
['Arme & Beine', 'Arms & Legs', 'Armar & ben', 'Körper'],
['Innere Organe', 'Internal Organs', 'Inre organ', 'Körper'],
['Körperpflege', 'Personal Care', 'Kroppsvård', 'Körper'],
// Kleidung
['Oberbekleidung', 'Tops & Outerwear', 'Överkläder', 'Kleidung'],
['Unterbekleidung', 'Underwear', 'Underkläder', 'Kleidung'],
['Kopfbedeckung', 'Headwear', 'Huvudbonader', 'Kleidung'],
['Schuhe & Socken', 'Shoes & Socks', 'Skor & strumpor', 'Kleidung'],
['Accessoires', 'Accessories', 'Accessoarer', 'Kleidung'],
// Familie & Menschen
['Familienmitglieder', 'Family Members', 'Familjemedlemmar', 'Familie & Menschen'],
['Berufe & Titel', 'Professions & Titles', 'Yrken & titlar', 'Familie & Menschen'],
['Beziehungen', 'Relationships', 'Relationer', 'Familie & Menschen'],
// Haushalt
['Küchenutensilien', 'Kitchen Utensils', 'Köksredskap', 'Haushalt'],
['Reinigung & Pflege', 'Cleaning & Care', 'Rengöring & vård', 'Haushalt'],
['Verpackung & Behälter', 'Packaging & Containers', 'Förpackningar & behållare','Haushalt'],
// Wohnen & Möbel
['Zimmer & Räume', 'Rooms & Spaces', 'Rum & utrymmen', 'Wohnen & Möbel'],
['Möbel', 'Furniture', 'Möbler', 'Wohnen & Möbel'],
['Beleuchtung & Elektro', 'Lighting & Electronics', 'Belysning & el', 'Wohnen & Möbel'],
// Natur & Pflanzen
['Pflanzen & Blumen', 'Plants & Flowers', 'Växter & blommor', 'Natur & Pflanzen'],
['Bäume & Sträucher', 'Trees & Shrubs', 'Träd & buskar', 'Natur & Pflanzen'],
['Landschaftsmerkmale', 'Landscape Features', 'Landskapsdrag', 'Natur & Pflanzen'],
['Gesteine & Böden', 'Rocks & Soils', 'Stenar & jordar', 'Natur & Pflanzen'],
// Verkehr & Reisen
['Fahrzeuge (Land)', 'Land Vehicles', 'Landfordon', 'Verkehr & Reisen'],
['Fahrzeuge (Wasser & Luft)', 'Water & Air Vehicles', 'Vatten- & luftfordon', 'Verkehr & Reisen'],
['Straße & Infrastruktur', 'Roads & Infrastructure', 'Vägar & infrastruktur', 'Verkehr & Reisen'],
// Stadt & Gebäude
['Gebäude & Orte', 'Buildings & Places', 'Byggnader & platser', 'Stadt & Gebäude'],
['Innenräume & Bereiche', 'Indoor Spaces & Areas', 'Inomhusutrymmen', 'Stadt & Gebäude'],
// Technik & Geräte
['Haushaltsgeräte', 'Household Appliances', 'Hushållsapparater', 'Technik & Geräte'],
['Elektronik & Computer', 'Electronics & Computers', 'Elektronik & datorer', 'Technik & Geräte'],
['Werkzeuge & Maschinen', 'Tools & Machines', 'Verktyg & maskiner', 'Technik & Geräte'],
// Sport & Freizeit
['Sport & Bewegung', 'Sports & Exercise', 'Sport & rörelse', 'Sport & Freizeit'],
['Spiele & Spielzeug', 'Games & Toys', 'Spel & leksaker', 'Sport & Freizeit'],
['Kunst & Musik', 'Arts & Music', 'Konst & musik', 'Sport & Freizeit'],
];
for (const [de, en, sv, parentDe] of SUBCATEGORY_TAXONOMY) {
await query(
`INSERT INTO categories (titel_de, titel_en, titel_sv, status, published_at, parent_id)
SELECT $1, $2, $3, 'published', NOW(),
(SELECT id FROM categories WHERE lower(titel_de) = lower($4) AND parent_id IS NULL LIMIT 1)
WHERE NOT EXISTS (SELECT 1 FROM categories WHERE lower(titel_de) = lower($1))`,
[de, en, sv, parentDe]
).catch(() => {});
}
// New columns on words (Brysbaert import + enrichment)
await query(`ALTER TABLE words ADD COLUMN IF NOT EXISTS conc_m NUMERIC(4,2)`).catch(() => {});
await query(`ALTER TABLE words ADD COLUMN IF NOT EXISTS dom_pos VARCHAR(20)`).catch(() => {});
await query(`ALTER TABLE words ADD COLUMN IF NOT EXISTS level VARCHAR(5)`).catch(() => {});
await query(`ALTER TABLE words ADD COLUMN IF NOT EXISTS themenfeld_id UUID`).catch(() => {});
await query(`ALTER TABLE words ADD CONSTRAINT words_themenfeld_id_fkey FOREIGN KEY (themenfeld_id) REFERENCES categories(id) ON DELETE SET NULL`).catch(() => {});
await query(`ALTER TABLE words DROP CONSTRAINT IF EXISTS words_dom_pos_check`).catch(() => {});
await query(`ALTER TABLE words ADD CONSTRAINT words_dom_pos_check CHECK (dom_pos IN ('noun', 'verb', 'adjective', 'other'))`).catch(() => {});
await query(`ALTER TABLE words DROP CONSTRAINT IF EXISTS words_level_check`).catch(() => {});
await query(`ALTER TABLE words ADD CONSTRAINT words_level_check CHECK (level IN ('A1', 'A2', 'B1'))`).catch(() => {});
// Unique index on titel_en: prerequisite for ON CONFLICT in the CSV import.
// Partial (WHERE titel_en IS NOT NULL) so that existing NULL rows do not block the index.
// Remove duplicate non-null titel_en rows first, then create the index.
await query(`
DELETE FROM words w
USING (
SELECT titel_en, MAX(created_at) AS keep_at
FROM words WHERE titel_en IS NOT NULL
GROUP BY titel_en HAVING COUNT(*) > 1
) dup
WHERE w.titel_en = dup.titel_en AND w.created_at < dup.keep_at
`).catch(() => {});
await query(
`CREATE UNIQUE INDEX IF NOT EXISTS words_titel_en_key ON words (titel_en) WHERE titel_en IS NOT NULL`
);
// enrich_batches: status tracking for word-enrichment batches (analogous to category_batches)
await query(`
CREATE TABLE IF NOT EXISTS enrich_batches (
batch_id TEXT PRIMARY KEY,
status TEXT NOT NULL DEFAULT 'submitted',
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
)
`);
// word_generative: pipeline for AI-generated word images
await query(`
CREATE TABLE IF NOT EXISTS word_generative (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
word_id UUID NOT NULL REFERENCES words(id) ON DELETE CASCADE,
prompt TEXT,
status VARCHAR(20) NOT NULL DEFAULT 'pending'
CHECK (status IN ('pending', 'generating', 'generated', 'accepted', 'rejected')),
picture_link TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
)
`);
await query(`
DROP TRIGGER IF EXISTS word_generative_updated_at ON word_generative;
CREATE TRIGGER word_generative_updated_at
BEFORE UPDATE ON word_generative
FOR EACH ROW EXECUTE FUNCTION update_updated_at()
`);
// ── Migrate old {{uuid}} placeholders → new {{label.w:uuid}} / {{label.o:uuid}} ──
await migratePlaceholders();
@@ -731,4 +994,135 @@ async function migratePlaceholders() {
if (count > 0) console.log(`Placeholder migration: updated ${count} rows`);
}
// ── Prompt-Styles & Picture-Jobs ──────────────────────────────────────────────
async function migratePromptStyles() {
await query(`
CREATE TABLE IF NOT EXISTS prompt_styles (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
type VARCHAR(20) NOT NULL CHECK (type IN ('fix', 'atmosphere', 'setting')),
kategorie_id UUID,
text_en TEXT NOT NULL
)
`);
// Rename themenfeld_id → kategorie_id (idempotent)
await query(`ALTER TABLE prompt_styles RENAME COLUMN themenfeld_id TO kategorie_id`).catch(() => {});
// Add the FK to categories after the fact (idempotent)
await query(`
ALTER TABLE prompt_styles
ADD CONSTRAINT prompt_styles_kategorie_fk
FOREIGN KEY (kategorie_id) REFERENCES categories(id) ON DELETE SET NULL
`).catch(() => {});
// Seed data from prompt_styles.csv (idempotent per id, kategorie_id initially null)
const seeds = [
{ id: 'b0f5c2a4-a95d-426f-a01c-0edc53e719b8', type: 'fix', text_en: 'hyperrealistic photography, natural unposed moment, shot on Canon EOS R5, ambient natural light, no color grading, razor sharp details, photorealistic textures, each object clearly visible and spatially separated, 8k' },
{ id: '62015070-1fbe-40b8-b293-8c39ae5994c3', type: 'atmosphere', text_en: 'misty autumn morning, golden hour light breaking through cool gray clouds, frost on the ground, dew on surfaces' },
{ id: 'd644f215-25b9-49be-87ea-629d7d8acb78', type: 'atmosphere', text_en: 'bright summer midday, harsh direct sunlight, vivid colors, dry warm air' },
{ id: 'da0a5339-37f5-47be-ba63-1fbf6c1e9f90', type: 'atmosphere', text_en: 'overcast spring day, soft diffused light, fresh green tones, slightly cool atmosphere' },
{ id: '11a8edb4-90a3-48a8-8407-31056644b55a', type: 'atmosphere', text_en: 'golden winter afternoon, low sun casting long shadows, bare trees, cold crisp air' },
{ id: '97bad727-6555-4f48-9a68-17dd5ce85535', type: 'atmosphere', text_en: 'early morning blue hour, soft cool light, calm and quiet atmosphere, slight fog' },
{ id: '6de167ef-5a87-4333-9325-cc31ccd9db05', type: 'atmosphere', text_en: 'warm summer evening, golden orange glow, long shadows, relaxed atmosphere' },
{ id: '082cc098-4c26-4d9a-b3a1-209dd9e507ea', type: 'setting', text_en: 'open green meadow with wooden fence, rolling hills in soft background, natural habitat' },
{ id: 'f0ef007a-c763-4c40-99c0-1bd17901739e', type: 'setting', text_en: 'dense forest edge with dappled light, mossy ground, wild and untouched environment' },
{ id: 'b809f859-2592-4207-8111-7da05e7057c9', type: 'setting', text_en: 'cozy living room corner, warm home environment, soft natural light from window' },
{ id: '28dac228-c335-46d2-9b40-481dc9e2b373', type: 'setting', text_en: 'shallow clear river bank, rocky ground, water reflections, natural wetland' },
{ id: '89cfbdf7-7fbc-439a-9265-73f18124e372', type: 'setting', text_en: 'rustic wooden kitchen counter, natural light from nearby window, linen cloth underneath' },
{ id: 'e7faf2ec-78e1-43bc-b870-c363f7ec2032', type: 'setting', text_en: 'outdoor farmers market stall, weathered wooden crates, morning light, earthy atmosphere' },
{ id: '45dc2aee-d223-4952-943d-cdbe86b7e8c3', type: 'setting', text_en: 'garden harvest scene, soil and greenery visible, freshly picked produce on ground' },
{ id: '5589aa12-ee74-4041-9443-40e9cfa538fd', type: 'setting', text_en: 'simple white kitchen table, clean minimal background, soft indoor daylight' },
{ id: '738365f1-b000-4dde-8e99-9b90f6984b79', type: 'setting', text_en: 'neutral light studio setting, clean background, soft natural sidelight, medical clarity' },
{ id: '98f1c118-b333-43ba-9167-870af883b5ae', type: 'setting', text_en: 'warm bathroom environment, mirror and soft light, everyday personal care setting' },
{ id: '2b81a5c9-7328-41e9-b08e-0d98d9a5c78f', type: 'setting', text_en: 'flat lay on light wooden surface, natural window light, clean and minimal styling' },
{ id: '2a3a4eed-ba32-4b21-8dad-1cf5679b00fb', type: 'setting', text_en: 'cozy bedroom setting, clothes laid out on bed, soft morning light' },
{ id: 'c816e95e-5edc-4ae9-8c0d-9c71a5a4dfb6', type: 'setting', text_en: 'outdoor market rack, hangers visible, casual everyday atmosphere' },
{ id: '33af0241-c19d-4429-91b5-0359c1f973e4', type: 'setting', text_en: 'warm living room, family home atmosphere, soft afternoon light through curtains' },
{ id: '153e70c4-f011-42af-ba0f-8ab82bf920ab', type: 'setting', text_en: 'outdoor garden or backyard, relaxed family setting, natural daylight' },
{ id: '9fe7fc4a-6578-4ee0-8a8e-a885e89e58c1', type: 'setting', text_en: 'bright kitchen countertop, clean and organized, natural window light' },
{ id: '46dab63b-7b3d-45e7-9ea9-4a4a67e9fabd', type: 'setting', text_en: 'utility room or bathroom shelf, everyday cleaning supplies visible, practical setting' },
{ id: '28246e90-4ac8-444f-be23-de401365d38d', type: 'setting', text_en: 'cozy Scandinavian living room, warm tones, natural materials, soft indirect light' },
{ id: '5143c10f-d717-4698-88f5-f1598d0eeef9', type: 'setting', text_en: 'bright airy bedroom, white walls, minimal furniture, morning sunlight' },
{ id: 'd23d7050-dc22-4226-8a5b-79e75f11de8b', type: 'setting', text_en: 'open countryside landscape, wide sky, natural untouched terrain, peaceful atmosphere' },
{ id: '34c6a784-7a32-4f84-a06d-f546c9c9fbea', type: 'setting', text_en: 'forest floor close-up, mossy rocks, fallen leaves, soft filtered light through canopy' },
{ id: '1fc61dd9-57c6-4eba-8328-37cbf5fc135e', type: 'setting', text_en: 'garden bed with rich dark soil, plants at various growth stages, earthy tones' },
{ id: '3244f090-f2a2-4806-875a-88038598fc5e', type: 'setting', text_en: 'quiet suburban street, cobblestone or asphalt road, parked vehicles, everyday scene' },
{ id: '36d80c19-13ea-4672-b2e9-8ceedb4ab178', type: 'setting', text_en: 'rural road with open fields, minimal traffic, wide sky, natural light' },
{ id: '98957b0a-f415-4282-9b3d-863a9bf03a77', type: 'setting', text_en: 'busy European city street, historic buildings in background, natural daylight' },
{ id: '66fa361a-e062-4adc-9c9a-3e01ac8dbbe0', type: 'setting', text_en: 'quiet town square, fountain or bench visible, calm everyday atmosphere' },
{ id: '2dba4303-c743-419f-a7e8-06b6d54ba91d', type: 'setting', text_en: 'clean modern workspace, desk surface, natural sidelight, organized tools' },
{ id: 'a78df43b-8897-40dd-9ccf-de29ff9bf5da', type: 'setting', text_en: 'garage or workshop setting, workbench with tools, practical everyday environment' },
{ id: '949774d1-0678-4683-9b8e-e5568f648ba8', type: 'setting', text_en: 'outdoor park or sports field, open space, natural daylight, active atmosphere' },
{ id: '9b35a717-03dd-41aa-a60e-90dff8bc5aaf', type: 'setting', text_en: 'cozy indoor hobby room, soft warm light, creative materials visible' },
];
for (const s of seeds) {
await query(
`INSERT INTO prompt_styles (id, type, text_en)
SELECT $1, $2, $3
WHERE NOT EXISTS (SELECT 1 FROM prompt_styles WHERE id = $1)`,
[s.id, s.type, s.text_en]
).catch(() => {});
}
// Fill kategorie_id via category-name lookup (idempotent, independent of category UUIDs)
const THEME_MAP = [
{ en: 'Animals', ids: ['082cc098-4c26-4d9a-b3a1-209dd9e507ea', 'f0ef007a-c763-4c40-99c0-1bd17901739e', 'b809f859-2592-4207-8111-7da05e7057c9', '28dac228-c335-46d2-9b40-481dc9e2b373'] },
{ en: 'Food', ids: ['89cfbdf7-7fbc-439a-9265-73f18124e372', 'e7faf2ec-78e1-43bc-b870-c363f7ec2032', '45dc2aee-d223-4952-943d-cdbe86b7e8c3', '5589aa12-ee74-4041-9443-40e9cfa538fd'] },
{ en: 'Body', ids: ['738365f1-b000-4dde-8e99-9b90f6984b79', '98f1c118-b333-43ba-9167-870af883b5ae'] },
{ en: 'Clothing', ids: ['2b81a5c9-7328-41e9-b08e-0d98d9a5c78f', '2a3a4eed-ba32-4b21-8dad-1cf5679b00fb', 'c816e95e-5edc-4ae9-8c0d-9c71a5a4dfb6'] },
{ en: 'Family & People', ids: ['33af0241-c19d-4429-91b5-0359c1f973e4', '153e70c4-f011-42af-ba0f-8ab82bf920ab'] },
{ en: 'Household', ids: ['9fe7fc4a-6578-4ee0-8a8e-a885e89e58c1', '46dab63b-7b3d-45e7-9ea9-4a4a67e9fabd'] },
{ en: 'Home & Furniture', ids: ['28246e90-4ac8-444f-be23-de401365d38d', '5143c10f-d717-4698-88f5-f1598d0eeef9'] },
{ en: 'Nature & Plants', ids: ['d23d7050-dc22-4226-8a5b-79e75f11de8b', '34c6a784-7a32-4f84-a06d-f546c9c9fbea', '1fc61dd9-57c6-4eba-8328-37cbf5fc135e'] },
{ en: 'Transport & Travel',ids: ['3244f090-f2a2-4806-875a-88038598fc5e', '36d80c19-13ea-4672-b2e9-8ceedb4ab178'] },
{ en: 'City & Buildings', ids: ['98957b0a-f415-4282-9b3d-863a9bf03a77', '66fa361a-e062-4adc-9c9a-3e01ac8dbbe0'] },
{ en: 'Tools', ids: ['2dba4303-c743-419f-a7e8-06b6d54ba91d', 'a78df43b-8897-40dd-9ccf-de29ff9bf5da'] },
{ en: 'Sports & Leisure', ids: ['949774d1-0678-4683-9b8e-e5568f648ba8', '9b35a717-03dd-41aa-a60e-90dff8bc5aaf'] },
];
for (const { en, ids } of THEME_MAP) {
await query(
`UPDATE prompt_styles
SET kategorie_id = (SELECT id FROM categories WHERE lower(titel_en) = lower($1) LIMIT 1)
WHERE id = ANY($2::uuid[])
AND kategorie_id IS DISTINCT FROM
(SELECT id FROM categories WHERE lower(titel_en) = lower($1) LIMIT 1)`,
[en, ids]
).catch(() => {});
}
}
async function migratePictureJobs() {
await query(`
CREATE TABLE IF NOT EXISTS picture_jobs (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
kategorie_id UUID REFERENCES categories(id) ON DELETE SET NULL,
prompt_fix UUID REFERENCES prompt_styles(id) ON DELETE SET NULL,
prompt_atmosphere UUID REFERENCES prompt_styles(id) ON DELETE SET NULL,
prompt_setting UUID REFERENCES prompt_styles(id) ON DELETE SET NULL,
prompt_final TEXT,
status VARCHAR(20) NOT NULL DEFAULT 'pending'
CHECK (status IN ('pending', 'generating', 'done', 'failed')),
picture_id UUID REFERENCES pictures(id) ON DELETE SET NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
)
`);
await query(`
CREATE TABLE IF NOT EXISTS picture_job_words (
picture_job_id UUID NOT NULL REFERENCES picture_jobs(id) ON DELETE CASCADE,
word_id UUID NOT NULL REFERENCES words(id) ON DELETE CASCADE,
PRIMARY KEY (picture_job_id, word_id)
)
`);
}
async function migrate() {
await migrateCore();
await migratePromptStyles();
await migratePictureJobs();
}
module.exports = migrate;
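Review note on the `titel_en` dedup step in `migrateCore()`: the sketch below mirrors the `DELETE ... USING` statement in memory to make its keep rule visible. It is a minimal illustration, not part of the migration; `dedupeKeepNewest` and `sampleRows` are hypothetical names, and `created_at` is modelled as a plain number.

```javascript
// Hypothetical in-memory mirror of the dedup DELETE; not part of the migration.
// Rows are plain objects; created_at is any value comparable with < (here: numbers).
function dedupeKeepNewest(rows) {
  // Newest created_at per non-null titel_en (the MAX(created_at) ... GROUP BY part).
  const newest = new Map();
  for (const r of rows) {
    if (r.titel_en === null) continue;
    const prev = newest.get(r.titel_en);
    if (prev === undefined || r.created_at > prev) newest.set(r.titel_en, r.created_at);
  }
  // Drop only rows strictly older than the newest one (w.created_at < dup.keep_at).
  return rows.filter(r => r.titel_en === null || !(r.created_at < newest.get(r.titel_en)));
}

const sampleRows = [
  { id: 1, titel_en: 'apple', created_at: 1 },
  { id: 2, titel_en: 'apple', created_at: 2 },
  { id: 3, titel_en: null, created_at: 1 },
  { id: 4, titel_en: 'pear', created_at: 5 },
  { id: 5, titel_en: 'pear', created_at: 5 }, // same timestamp as id 4
];
console.log(dedupeKeepNewest(sampleRows).map(r => r.id)); // [ 2, 3, 4, 5 ]
```

Rows 4 and 5 both survive because the comparison is strict. If the real table can contain duplicates with identical `created_at` values, the following `CREATE UNIQUE INDEX` (which has no `.catch`) would presumably fail and abort the migration; a tie-breaker on `id` might be worth considering.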

15
src/index.js Normal file → Executable file

@@ -44,6 +44,9 @@ app.use('/api/audios', auth, require('./routes/audios'));
app.use('/api/tts-settings', auth, require('./routes/tts-settings'));
app.use('/api/claude', auth, require('./routes/claude'));
app.use('/api/pipeline', auth, require('./routes/pipeline'));
app.use('/api/word-generative', auth, require('./routes/wordGenerative'));
app.use('/api/prompt-styles', auth, require('./routes/prompt-styles'));
app.use('/api/picture-jobs', auth, require('./routes/picture-jobs'));
// 404
app.use((req, res) => {
@@ -62,5 +65,17 @@ migrate()
// Resume stuck pipeline runs (e.g. after a redeploy)
require('./lib/pipeline').resumePending()
.catch(err => console.error('Pipeline-Resume fehlgeschlagen:', err));
// Automatic word categorization (Message Batches API): shortly after boot, then hourly.
// Submit/collect ticks, decoupled from generate-words and publish.
const { runCategorizationTick } = require('./lib/classifyWords');
const { runEnrichTick } = require('./lib/enrichWords');
const HOUR = 60 * 60 * 1000;
const tick = () => runCategorizationTick().catch(err => console.error('Auto-Kategorisierung:', err.message));
const enrichTick = () => runEnrichTick().catch(err => console.error('Auto-Anreicherung:', err.message));
setTimeout(tick, 30_000);
setTimeout(enrichTick, 60_000);
setInterval(tick, HOUR);
setInterval(enrichTick, HOUR);
})
.catch(err => { console.error('Migration failed:', err); process.exit(1); });

70
src/lib/achievements.js Normal file

@@ -0,0 +1,70 @@
const { query } = require('../db');
const { levelForEp } = require('./leveling');
// Achievement definitions. check(s) receives the user's aggregated metrics.
const DEFS = [
{ key: 'first_card', label: 'Erster Schritt', icon: '🌱', check: s => s.total_cards >= 1 },
{ key: 'cards_50', label: '50 Karten', icon: '📦', check: s => s.total_cards >= 50 },
{ key: 'cards_100', label: '100 Karten', icon: '💯', check: s => s.total_cards >= 100 },
{ key: 'streak_3', label: '3 Tage am Stück', icon: '🔥', check: s => s.streak_days >= 3 },
{ key: 'streak_7', label: '7 Tage am Stück', icon: '🔥', check: s => s.streak_days >= 7 },
{ key: 'streak_30', label: '30 Tage am Stück',icon: '🏅', check: s => s.streak_days >= 30 },
{ key: 'level_5', label: 'Level 5', icon: '⭐', check: s => s.level >= 5 },
{ key: 'level_10', label: 'Level 10', icon: '🌟', check: s => s.level >= 10 },
{ key: 'category_master', label: 'Themen-Meister', icon: '🏆', check: s => s.max_cat >= 25 },
];
const BY_KEY = Object.fromEntries(DEFS.map(d => [d.key, d]));
// Aggregated metrics for the achievement checks (a single query).
async function aggregates(userId, known = {}) {
const r = await query(
`SELECT
COALESCE((SELECT SUM(seen_count) FROM user_pair_progress WHERE user_id = $1), 0)::int AS total_cards,
COALESCE((SELECT MAX(pts) FROM (
SELECT SUM(upp.earned_points) AS pts
FROM user_pair_progress upp
JOIN pair_categories pc ON pc.pair_id = upp.pair_id
WHERE upp.user_id = $1
GROUP BY pc.category_id
) s), 0)::int AS max_cat`,
[userId]
);
return { total_cards: r.rows[0].total_cards, max_cat: r.rows[0].max_cat, ...known };
}
// Evaluates achievements and unlocks new ones. Returns ONLY the newly unlocked ones
// (ON CONFLICT DO NOTHING … RETURNING yields only newly inserted rows).
async function evaluateAchievements(userId, { total_ep, streak_days }) {
const level = levelForEp(total_ep || 0);
const agg = await aggregates(userId, { total_ep, streak_days, level });
const satisfied = DEFS.filter(d => d.check(agg)).map(d => d.key);
if (!satisfied.length) return [];
const values = satisfied.map((_, i) => `($1, $${i + 2})`).join(', ');
const r = await query(
`INSERT INTO user_achievements (user_id, achievement_key)
VALUES ${values}
ON CONFLICT (user_id, achievement_key) DO NOTHING
RETURNING achievement_key`,
[userId, ...satisfied]
);
return r.rows.map(row => {
const d = BY_KEY[row.achievement_key];
return { key: d.key, label: d.label, icon: d.icon };
});
}
// All achievements with their unlock status (for the profile section).
async function listAchievements(userId) {
const r = await query(
`SELECT achievement_key, unlocked_at FROM user_achievements WHERE user_id = $1`,
[userId]
);
const unlocked = new Map(r.rows.map(x => [x.achievement_key, x.unlocked_at]));
return DEFS.map(d => ({
key: d.key, label: d.label, icon: d.icon,
unlocked: unlocked.has(d.key),
unlocked_at: unlocked.get(d.key) || null,
}));
}
module.exports = { evaluateAchievements, listAchievements, DEFS };
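Review note on `evaluateAchievements()`: the multi-row insert shares `$1` for the user id and numbers the achievement keys from `$2` upward. The sketch below isolates that placeholder construction so it can be checked without a database. `buildAchievementInsert` and the sample ids are hypothetical, introduced only for illustration.

```javascript
// Hypothetical helper that isolates the placeholder construction; names are illustrative.
function buildAchievementInsert(userId, keys) {
  // $1 is the shared user id; each key gets its own placeholder starting at $2.
  const values = keys.map((_, i) => `($1, $${i + 2})`).join(', ');
  return {
    text:
      `INSERT INTO user_achievements (user_id, achievement_key) VALUES ${values} ` +
      `ON CONFLICT (user_id, achievement_key) DO NOTHING RETURNING achievement_key`,
    params: [userId, ...keys],
  };
}

const built = buildAchievementInsert('u-1', ['first_card', 'streak_3']);
console.log(built.text);
console.log(built.params);
```

Because only newly inserted rows come back from `RETURNING`, calling the real function repeatedly with the same satisfied keys should report each achievement once.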

309
src/lib/classifyWords.js Normal file

@@ -0,0 +1,309 @@
// Automatic word categorization via the Anthropic Message Batches API (asynchronous, ~50% cheaper).
// Decoupled from the generate-words prompt and the publish flow: an hourly job (src/index.js)
// finds words that are used in pairs but have no category yet, has Haiku classify them
// against the fixed taxonomy (src/db-migrate.js), and then materializes pair_categories.
const { query } = require('../db');
const { resolvePlaceholdersToLabels } = require('./placeholders');
const { derivePairCategories } = require('./pairCategories');
const ANTHROPIC_BASE = 'https://api.anthropic.com';
const MODEL = 'claude-haiku-4-5-20251001';
const BATCH_LIMIT = 500; // max. words per submit (the Batches API allows up to 100k)
const MAX_EXAMPLES = 3;
let running = false; // overlap guard between ticks
function headers() {
const apiKey = process.env.ANTHROPIC_API_KEY;
if (!apiKey) throw new Error('ANTHROPIC_API_KEY nicht konfiguriert');
return { 'Content-Type': 'application/json', 'x-api-key': apiKey, 'anthropic-version': '2023-06-01' };
}
// Load published categories → Map (lower(titel_de|titel_en) → {id, titel_de}) + name list for the prompt.
async function loadCategories() {
const r = await query(`SELECT id, titel_de, titel_en FROM categories WHERE status = 'published'`);
const byName = new Map();
for (const c of r.rows) {
if (c.titel_de) byName.set(c.titel_de.toLowerCase(), c);
if (c.titel_en) byName.set(c.titel_en.toLowerCase(), c);
}
return { rows: r.rows, byName };
}
// Words without a category that are used in pairs (statements or objects).
async function findUncategorizedUsedWords(limit = BATCH_LIMIT) {
const r = await query(
`SELECT w.id, w.titel_de, w.titel_en, w.titel_sv
FROM words w
WHERE NOT EXISTS (SELECT 1 FROM word_categories wc WHERE wc.word_id = w.id)
AND (
EXISTS (SELECT 1 FROM statement_positive_words spw WHERE spw.word_id = w.id)
OR EXISTS (SELECT 1 FROM statement_negative_words snw WHERE snw.word_id = w.id)
OR EXISTS (SELECT 1 FROM object_words ow WHERE ow.word_id = w.id)
)
AND COALESCE(w.titel_de, w.titel_en, w.titel_sv) IS NOT NULL
ORDER BY w.created_at DESC
LIMIT $1`,
[limit]
);
return r.rows;
}
// Up to `max` English example sentences containing the word (tokens → labels, without uuid).
async function examplesForWord(wordId, max = MAX_EXAMPLES) {
const out = [];
const seen = new Set();
const push = (s) => {
const t = resolvePlaceholdersToLabels(s || '').trim();
if (t && !seen.has(t.toLowerCase())) { seen.add(t.toLowerCase()); out.push(t); }
};
const stmt = await query(
`SELECT s.positive_sentence_en AS s
FROM statement_positive_words spw JOIN statements s ON s.id = spw.statement_id
WHERE spw.word_id = $1 AND s.positive_sentence_en IS NOT NULL
UNION
SELECT s.negative_sentence_en
FROM statement_negative_words snw JOIN statements s ON s.id = snw.statement_id
WHERE snw.word_id = $1 AND s.negative_sentence_en IS NOT NULL
LIMIT 10`,
[wordId]
);
for (const row of stmt.rows) { push(row.s); if (out.length >= max) return out; }
const qs = await query(
`SELECT DISTINCT q.sentence_en AS s
FROM object_words ow
JOIN object_pairs op ON op.object_id = ow.object_id
JOIN pairs p ON p.id = op.pair_id
JOIN questions q ON q.id = p.question_id
WHERE ow.word_id = $1 AND q.sentence_en IS NOT NULL
LIMIT 10`,
[wordId]
);
for (const row of qs.rows) { push(row.s); if (out.length >= max) break; }
return out;
}
// Shared classification rules. Strongly discourages "Sonstiges" and gives part-of-speech hints.
const CLASSIFY_RULES =
`Rules:\n` +
`- Pick the SINGLE best-fitting category by its exact German name.\n` +
`- Most concrete nouns DO fit a topic: animals→Tiere, food/fruit/vegetables→Lebensmittel, ` +
`sky/star/fire/water/mountain/plants→Natur & Pflanzen, furniture/window/carpet/cushion→Wohnen & Möbel, ` +
`street/building/lamp post→Stadt & Gebäude, books/pages→Schule & Bildung.\n` +
`- Adjectives / properties (warm, fast, sweet, old, fragile, transparent…) → "Eigenschaften".\n` +
`- Verbs / actions → "Verben & Handlungen".\n` +
`- Use "Sonstiges" ONLY as a true last resort when nothing else fits at all.`;
function buildPrompt(word, examples, categoryNamesDe) {
const title = word.titel_en || word.titel_de || word.titel_sv || '';
const titleDe = word.titel_de ? ` (de: "${word.titel_de}")` : '';
const ex = examples.length
? `\n\nExample sentences using the word:\n${examples.map(e => `- ${e}`).join('\n')}`
: '';
return (
`Categories (German names):\n${categoryNamesDe.join(', ')}\n\n${CLASSIFY_RULES}\n\n` +
`Classify this single vocabulary word.\n\nWord: "${title}"${titleDe}${ex}\n\n` +
`Reply with JSON only: {"category":"<exact German category name>"}`
);
}
// Submit words as a batch (one request per word, custom_id = word.id). Returns the batch_id.
async function submitBatch(words, categoryNamesDe) {
const system = 'Du bist ein präziser Klassifizierer. Antworte AUSSCHLIESSLICH mit gültigem JSON, ohne Markdown.';
const requests = [];
for (const w of words) {
const examples = await examplesForWord(w.id);
requests.push({
custom_id: w.id,
params: {
model: MODEL,
max_tokens: 64,
system,
messages: [{ role: 'user', content: buildPrompt(w, examples, categoryNamesDe) }],
},
});
}
const res = await fetch(`${ANTHROPIC_BASE}/v1/messages/batches`, {
method: 'POST', headers: headers(), body: JSON.stringify({ requests }),
});
if (!res.ok) {
const err = await res.text().catch(() => '');
throw new Error(`Batch-Submit fehlgeschlagen (${res.status}): ${err.slice(0, 300)}`);
}
const data = await res.json();
await query(`INSERT INTO category_batches (batch_id, status) VALUES ($1, 'submitted') ON CONFLICT DO NOTHING`, [data.id]);
return data.id;
}
// Re-derive pair_categories for all pairs that reference one of the words.
async function rederivePairsForWords(wordIds) {
if (!wordIds.length) return;
const pairs = await query(
`SELECT DISTINCT p.id FROM pairs p
WHERE p.positive_statement_id IN (SELECT statement_id FROM statement_positive_words WHERE word_id = ANY($1))
OR p.positive_statement_id IN (SELECT statement_id FROM statement_negative_words WHERE word_id = ANY($1))
OR p.negative_statement_id IN (SELECT statement_id FROM statement_positive_words WHERE word_id = ANY($1))
OR p.negative_statement_id IN (SELECT statement_id FROM statement_negative_words WHERE word_id = ANY($1))
OR p.id IN (SELECT op.pair_id FROM object_pairs op
JOIN object_words ow ON ow.object_id = op.object_id
WHERE ow.word_id = ANY($1))`,
[wordIds]
);
if (pairs.rows.length) await derivePairCategories(pairs.rows.map(p => p.id)).catch(() => {});
}
// Synchronous Claude call (/v1/messages), used for the immediate one-shot backfill (no 24h batch delay).
async function messagesCall(system, user, maxTokens = 2000) {
const res = await fetch(`${ANTHROPIC_BASE}/v1/messages`, {
method: 'POST', headers: headers(),
body: JSON.stringify({ model: MODEL, max_tokens: maxTokens, system, messages: [{ role: 'user', content: user }] }),
});
if (!res.ok) { const t = await res.text().catch(() => ''); throw new Error(`Claude ${res.status}: ${t.slice(0, 200)}`); }
const data = await res.json();
let raw = (data.content?.[0]?.text || '').trim();
const md = raw.match(/```(?:json)?\s*([\s\S]+?)\s*```/);
if (md) raw = md[1];
return JSON.parse(raw);
}
function parseCategory(text) {
if (!text) return null;
let raw = text.trim();
const md = raw.match(/```(?:json)?\s*([\s\S]+?)\s*```/);
if (md) raw = md[1];
try { return (JSON.parse(raw).category || '').toString().trim() || null; }
catch { return null; }
}
// Collect the batch if it has finished: apply results (word_categories + pair_categories).
// Returns { ended, linked }.
async function collectBatch(batchId) {
const res = await fetch(`${ANTHROPIC_BASE}/v1/messages/batches/${batchId}`, { headers: headers() });
if (!res.ok) {
// Batch unknown/deleted → clean up the entry so the next tick can submit again
if (res.status === 404) await query(`DELETE FROM category_batches WHERE batch_id = $1`, [batchId]);
return { ended: false, linked: 0 };
}
const batch = await res.json();
if (batch.processing_status !== 'ended' || !batch.results_url) return { ended: false, linked: 0 };
const { byName } = await loadCategories();
const fallback = byName.get('sonstiges') || null;
const r = await fetch(batch.results_url, { headers: headers() });
if (!r.ok) return { ended: false, linked: 0 };
const jsonl = await r.text();
const linkedWordIds = [];
for (const line of jsonl.split('\n')) {
const trimmed = line.trim();
if (!trimmed) continue;
let entry;
try { entry = JSON.parse(trimmed); } catch { continue; }
if (entry.result?.type !== 'succeeded') continue;
const wordId = entry.custom_id;
const text = entry.result.message?.content?.[0]?.text;
const name = parseCategory(text);
const cat = (name && byName.get(name.toLowerCase())) || fallback;
if (!cat) continue;
await query(
`INSERT INTO word_categories (word_id, category_id) VALUES ($1, $2) ON CONFLICT DO NOTHING`,
[wordId, cat.id]
).catch(() => {});
linkedWordIds.push(wordId);
}
// Re-derive pair_categories for the affected pairs
await rederivePairsForWords(linkedWordIds);
await query(`DELETE FROM category_batches WHERE batch_id = $1`, [batchId]);
return { ended: true, linked: linkedWordIds.length };
}
// One tick: collect the open batch; otherwise submit a new batch for uncategorized words.
async function runCategorizationTick() {
if (running) return { skipped: true };
running = true;
try {
const open = await query(`SELECT batch_id FROM category_batches ORDER BY created_at ASC LIMIT 1`);
if (open.rows.length) {
const { ended, linked } = await collectBatch(open.rows[0].batch_id);
return { collected: ended, linked, batchId: open.rows[0].batch_id };
}
const words = await findUncategorizedUsedWords();
if (!words.length) return { remaining: 0 };
const { rows } = await loadCategories();
const names = rows.map(c => c.titel_de).filter(Boolean);
const batchId = await submitBatch(words, names);
return { submitted: words.length, batchId };
} finally {
running = false;
}
}
// Immediate one-shot backfill (synchronous, no 24h batch delay): classifies existing
// uncategorized words used in pairs, in chunks via /v1/messages, and materializes
// pair_categories directly. Intended for the live test; the hourly job remains for ongoing supply.
async function classifyWordsSync({ max = 2000, reset = false } = {}) {
if (running) return { skipped: true };
running = true;
try {
const { rows: catRows, byName } = await loadCategories();
const names = catRows.map(c => c.titel_de).filter(Boolean);
const fallback = byName.get('sonstiges') || null;
const system = 'Du bist ein präziser Klassifizierer. Antworte AUSSCHLIESSLICH mit gültigem JSON, ohne Markdown.';
let processed = 0, linked = 0;
// reset → discard existing assignments and reclassify with the improved logic/taxonomy
if (reset) await query(`DELETE FROM word_categories`).catch(() => {});
while (processed < max) {
const words = await findUncategorizedUsedWords(Math.min(15, max - processed));
if (!words.length) break;
const lines = [];
for (const w of words) {
const t = w.titel_en || w.titel_de || w.titel_sv || '';
const de = w.titel_de && w.titel_de !== t ? ` (de: ${w.titel_de})` : '';
const ex = await examplesForWord(w.id, 2);
const exStr = ex.length ? ` | e.g.: ${ex.map(e => `"${e}"`).join('; ')}` : '';
lines.push(`${w.id}\t${t}${de}${exStr}`);
}
const user =
`Categories (German names):\n${names.join(', ')}\n\n${CLASSIFY_RULES}\n\n` +
`Classify each vocabulary word below.\nWords (id<TAB>title | examples):\n${lines.join('\n')}\n\n` +
`Reply with JSON only: {"assignments":[{"id":"<id>","category":"<exact German category name>"}]}`;
let assignments = [];
try {
const data = await messagesCall(system, user, 1500);
assignments = Array.isArray(data.assignments) ? data.assignments : [];
} catch { /* Fehler → ganze Charge bekommt Fallback, damit der Lauf fortschreitet */ }
const byId = new Map(assignments.map(a => [String(a.id), a.category]));
const linkedIds = [];
for (const w of words) {
const name = byId.get(String(w.id));
const cat = (name && byName.get(String(name).toLowerCase())) || fallback;
if (!cat) continue;
await query(
`INSERT INTO word_categories (word_id, category_id) VALUES ($1, $2) ON CONFLICT DO NOTHING`,
[w.id, cat.id]
).catch(() => {});
linkedIds.push(w.id);
}
await rederivePairsForWords(linkedIds);
processed += words.length;
linked += linkedIds.length;
if (!linkedIds.length) break; // Sicherung gegen Endlosschleife (z. B. fehlende Fallback-Kategorie)
}
return { processed, linked };
} finally {
running = false;
}
}
module.exports = { runCategorizationTick, classifyWordsSync, findUncategorizedUsedWords, collectBatch, submitBatch };
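
The name lookup with fallback that classifyWordsSync applies to each model answer can be looked at in isolation. The following is a minimal sketch, assuming an in-memory `byName` map; `resolveCategories` and the sample rows are hypothetical and not part of this commit.

```javascript
// Hypothetical helper: map model assignments to category rows, falling back
// to a default category when the model returns an unknown or missing name.
function resolveCategories(words, assignments, byName, fallbackKey = 'sonstiges') {
  const fallback = byName.get(fallbackKey) || null;
  const byId = new Map(assignments.map(a => [String(a.id), a.category]));
  const links = [];
  for (const w of words) {
    const name = byId.get(String(w.id));
    const cat = (name && byName.get(String(name).toLowerCase())) || fallback;
    if (cat) links.push({ wordId: w.id, categoryId: cat.id });
  }
  return links;
}

// Illustrative data only: two known categories, three words, two model answers.
const byName = new Map([
  ['obst', { id: 'c1' }],
  ['sonstiges', { id: 'c9' }],
]);
const links = resolveCategories(
  [{ id: 1 }, { id: 2 }, { id: 3 }],
  [{ id: '1', category: 'Obst' }, { id: '2', category: 'Unbekannt' }],
  byName
);
console.log(links);
```

Word 2 (unknown category name) and word 3 (no answer at all) both land in the fallback category, which is what keeps the loop in the diff making progress even when a model call fails.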

src/lib/deleteCascade.js

@@ -0,0 +1,77 @@
const { query } = require('../db');
const { deleteFile, keyFromUrl } = require('../s3');
// Audios (DB-Rows + S3-Dateien) einer Quelle entfernen.
async function deleteAudiosFor(sourceTable, sourceId) {
const audios = await query(
`SELECT id, audio_link FROM audios WHERE source_table = $1 AND source_id = $2`,
[sourceTable, sourceId]
);
for (const a of audios.rows) {
const key = keyFromUrl(a.audio_link);
if (key) await deleteFile(key).catch(() => {});
await query('DELETE FROM audios WHERE id = $1', [a.id]);
}
}
// Pair inkl. Frage, Statements und deren Audios löschen.
// Frage/Statements bleiben stehen, wenn ein anderes Pair sie noch referenziert.
// Objekte werden nicht angefasst (object_pairs kaskadiert per FK).
async function deletePairDeep(pairId) {
const existing = await query(
`SELECT question_id, positive_statement_id, negative_statement_id FROM pairs WHERE id = $1`,
[pairId]
);
if (!existing.rows.length) return false;
const { question_id, positive_statement_id, negative_statement_id } = existing.rows[0];
await query('DELETE FROM pairs WHERE id = $1', [pairId]);
if (question_id) {
const ref = await query('SELECT 1 FROM pairs WHERE question_id = $1 LIMIT 1', [question_id]);
if (!ref.rows.length) {
await deleteAudiosFor('questions', question_id);
await query('DELETE FROM questions WHERE id = $1', [question_id]);
}
}
const stmtIds = [...new Set([positive_statement_id, negative_statement_id].filter(Boolean))];
for (const stmtId of stmtIds) {
const ref = await query(
'SELECT 1 FROM pairs WHERE positive_statement_id = $1 OR negative_statement_id = $1 LIMIT 1',
[stmtId]
);
if (!ref.rows.length) {
await deleteAudiosFor('statements', stmtId);
await query('DELETE FROM statements WHERE id = $1', [stmtId]);
}
}
return true;
}
// Alle Objekte eines Bildes löschen (inkl. deren Pairs), sofern das Objekt
// ausschließlich mit diesem Bild verknüpft ist.
async function deletePictureObjectsDeep(pictureId) {
const objects = await query(
`SELECT object_id FROM object_pictures WHERE picture_id = $1`,
[pictureId]
);
for (const { object_id } of objects.rows) {
const other = await query(
`SELECT 1 FROM object_pictures WHERE object_id = $1 AND picture_id <> $2 LIMIT 1`,
[object_id, pictureId]
);
if (other.rows.length) continue;
const pairs = await query(
`SELECT pair_id FROM object_pairs WHERE object_id = $1`,
[object_id]
);
for (const { pair_id } of pairs.rows) await deletePairDeep(pair_id);
await query('DELETE FROM objects WHERE id = $1', [object_id]);
}
}
module.exports = { deletePairDeep, deletePictureObjectsDeep };
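
The rule in deletePairDeep (delete the pair first, then remove its question and statements only if no other pair still references them) can be sketched without a database. This is an in-memory illustration under that assumption; `deletePair` and the `db` object are hypothetical stand-ins, and audio cleanup is left out.

```javascript
// Hypothetical in-memory model: delete a pair, then delete its question and
// statements only if no remaining pair still references them.
function deletePair(db, pairId) {
  const pair = db.pairs.get(pairId);
  if (!pair) return false;
  db.pairs.delete(pairId);
  const remaining = [...db.pairs.values()];
  if (pair.questionId && !remaining.some(p => p.questionId === pair.questionId)) {
    db.questions.delete(pair.questionId);
  }
  // De-duplicate: the same statement may be both the positive and the negative one.
  const stmtIds = new Set([pair.positiveStatementId, pair.negativeStatementId].filter(Boolean));
  for (const id of stmtIds) {
    const used = remaining.some(p => p.positiveStatementId === id || p.negativeStatementId === id);
    if (!used) db.statements.delete(id);
  }
  return true;
}

const db = {
  pairs: new Map([
    ['p1', { questionId: 'q1', positiveStatementId: 's1', negativeStatementId: 's2' }],
    ['p2', { questionId: 'q1', positiveStatementId: 's3', negativeStatementId: 's2' }],
  ]),
  questions: new Set(['q1']),
  statements: new Set(['s1', 's2', 's3']),
};
const deleted = deletePair(db, 'p1');
console.log(deleted, [...db.questions], [...db.statements]);
```

Only `s1` disappears here, because `q1` and `s2` are still used by `p2`.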

src/lib/enrichWords.js

@@ -0,0 +1,229 @@
// Automatische Wort-Anreicherung über die Anthropic Message Batches API (asynchron, ~50 % günstiger).
// Ziel: Brysbaert-Importwörter (titel_en + conc_m gesetzt) nach DE+SV übersetzen und mit
// dom_pos, CEFR-level und themenfeld_id versehen. Folgt dem Muster von classifyWords.js.
const { query } = require('../db');
const ANTHROPIC_BASE = 'https://api.anthropic.com';
const MODEL = 'claude-haiku-4-5-20251001';
const BATCH_LIMIT = 500;
let running = false;
function headers() {
const apiKey = process.env.ANTHROPIC_API_KEY;
if (!apiKey) throw new Error('ANTHROPIC_API_KEY nicht konfiguriert');
return { 'Content-Type': 'application/json', 'x-api-key': apiKey, 'anthropic-version': '2023-06-01' };
}
// Load all published categories (sub- and top-level).
// Returns a byName map (lower(titel_de|titel_en) → row) plus a name list with subcategories first.
async function loadAllCategories() {
const r = await query(
`SELECT id, titel_de, titel_en, parent_id FROM categories WHERE status = 'published'`
);
const byName = new Map();
for (const c of r.rows) {
if (c.titel_de) byName.set(c.titel_de.toLowerCase(), c);
if (c.titel_en) byName.set(c.titel_en.toLowerCase(), c);
}
// Unterkategorien zuerst → Batch-Prompt bevorzugt granulare Einträge
const subcats = r.rows.filter(c => c.parent_id).map(c => c.titel_de).filter(Boolean);
const topCats = r.rows.filter(c => !c.parent_id).map(c => c.titel_de).filter(Boolean);
return { byName, names: [...subcats, ...topCats] };
}
// Wörter die angereichert werden sollen: haben conc_m + titel_en, aber fehlendes DE/dom_pos/themenfeld.
async function findWordsToEnrich(limit = BATCH_LIMIT) {
const r = await query(
`SELECT id, titel_en FROM words
WHERE conc_m IS NOT NULL
AND titel_en IS NOT NULL
AND (titel_de IS NULL OR dom_pos IS NULL OR themenfeld_id IS NULL)
ORDER BY created_at DESC
LIMIT $1`,
[limit]
);
return r.rows;
}
function buildEnrichPrompt(word, categoryNames) {
return (
`Themenfelder (bevorzuge Unterkategorien wie "Obst", "Haustiere", "Kopf & Gesicht" statt der Oberkategorie):\n` +
`${categoryNames.join(', ')}\n\n` +
`Wort (Englisch): "${word.titel_en}"\n\n` +
`Regeln:\n` +
`- titel_de / titel_sv: Grundform ohne Artikel\n` +
`- dom_pos: noun | verb | adjective | other\n` +
`- level: A1 | A2 | B1 | null (null wenn B2+ oder unklar)\n` +
`- themenfeld: exakter Name aus der Liste oben, Fallback "Sonstiges"\n\n` +
`Antworte NUR mit JSON:\n` +
`{"titel_de":"...","titel_sv":"...","dom_pos":"noun","level":"A1","themenfeld":"Obst"}`
);
}
// Wort-Update in DB (COALESCE: Neuwert wenn vorhanden, sonst bestehender Wert bleibt).
async function applyEnrichResult(wordId, result, byName) {
if (!result) return;
const fallback = byName.get('sonstiges') || null;
const cat = (result.themenfeld && byName.get(result.themenfeld.toLowerCase())) || fallback;
await query(
`UPDATE words SET
titel_de = COALESCE($2, titel_de),
titel_sv = COALESCE($3, titel_sv),
dom_pos = COALESCE($4, dom_pos),
level = COALESCE($5, level),
themenfeld_id = COALESCE($6, themenfeld_id)
WHERE id = $1`,
[wordId, result.titel_de || null, result.titel_sv || null,
result.dom_pos || null, result.level || null, cat?.id || null]
).catch(() => {});
// Auto-Promote: requested → translated wenn jetzt alle 3 Sprachen gefüllt sind
await query(
`UPDATE words SET status = 'translated'
WHERE id = $1 AND status = 'requested'
AND titel_de IS NOT NULL AND titel_en IS NOT NULL AND titel_sv IS NOT NULL`,
[wordId]
).catch(() => {});
}
// ── Asynchroner Batch-Weg ──────────────────────────────────────────────────
async function submitEnrichBatch(words, categoryNames) {
const system = 'Du bist ein präziser Lexikograph. Antworte AUSSCHLIESSLICH mit gültigem JSON, ohne Markdown.';
const requests = words.map(w => ({
custom_id: w.id,
params: {
model: MODEL,
max_tokens: 150,
system,
messages: [{ role: 'user', content: buildEnrichPrompt(w, categoryNames) }],
},
}));
const res = await fetch(`${ANTHROPIC_BASE}/v1/messages/batches`, {
method: 'POST', headers: headers(), body: JSON.stringify({ requests }),
});
if (!res.ok) {
const err = await res.text().catch(() => '');
throw new Error(`Enrich-Batch-Submit fehlgeschlagen (${res.status}): ${err.slice(0, 300)}`);
}
const data = await res.json();
await query(
`INSERT INTO enrich_batches (batch_id, status) VALUES ($1, 'submitted') ON CONFLICT DO NOTHING`,
[data.id]
);
return data.id;
}
function parseJson(text) {
if (!text) return null;
let raw = text.trim();
const md = raw.match(/```(?:json)?\s*([\s\S]+?)\s*```/);
if (md) raw = md[1];
try { return JSON.parse(raw); } catch { return null; }
}
async function collectEnrichBatch(batchId) {
const res = await fetch(`${ANTHROPIC_BASE}/v1/messages/batches/${batchId}`, { headers: headers() });
if (!res.ok) {
if (res.status === 404) await query(`DELETE FROM enrich_batches WHERE batch_id = $1`, [batchId]);
return { ended: false, enriched: 0 };
}
const batch = await res.json();
if (batch.processing_status !== 'ended' || !batch.results_url) return { ended: false, enriched: 0 };
const { byName } = await loadAllCategories();
const r = await fetch(batch.results_url, { headers: headers() });
if (!r.ok) return { ended: false, enriched: 0 };
let enriched = 0;
for (const line of (await r.text()).split('\n')) {
const trimmed = line.trim();
if (!trimmed) continue;
let entry;
try { entry = JSON.parse(trimmed); } catch { continue; }
if (entry.result?.type !== 'succeeded') continue;
const parsed = parseJson(entry.result.message?.content?.[0]?.text);
await applyEnrichResult(entry.custom_id, parsed, byName);
if (parsed) enriched++;
}
await query(`DELETE FROM enrich_batches WHERE batch_id = $1`, [batchId]);
return { ended: true, enriched };
}
// Ein Tick: offenen Batch einsammeln; sonst neuen Batch für unbereicherte Wörter einreichen.
async function runEnrichTick() {
if (running) return { skipped: true };
running = true;
try {
const open = await query(`SELECT batch_id FROM enrich_batches ORDER BY created_at ASC LIMIT 1`);
if (open.rows.length) {
const { ended, enriched } = await collectEnrichBatch(open.rows[0].batch_id);
return { collected: ended, enriched, batchId: open.rows[0].batch_id };
}
const words = await findWordsToEnrich();
if (!words.length) return { remaining: 0 };
const { names } = await loadAllCategories();
const batchId = await submitEnrichBatch(words, names);
return { submitted: words.length, batchId };
} finally {
running = false;
}
}
// ── Synchroner Weg für ?sync=true ─────────────────────────────────────────
async function enrichWordsSync({ max = 500 } = {}) {
if (running) return { skipped: true };
running = true;
try {
const { byName, names } = await loadAllCategories();
const system = 'Du bist ein präziser Lexikograph. Antworte AUSSCHLIESSLICH mit gültigem JSON, ohne Markdown.';
let processed = 0;
let enriched = 0;
while (processed < max) {
const words = await findWordsToEnrich(Math.min(20, max - processed));
if (!words.length) break;
const items = words.map((w, i) => `${i + 1}. "${w.titel_en}" (id: ${w.id})`).join('\n');
const user =
`Themenfelder (bevorzuge Unterkategorien):\n${names.join(', ')}\n\n` +
`Regeln:\n` +
`- titel_de / titel_sv: Grundform ohne Artikel\n` +
`- dom_pos: noun | verb | adjective | other\n` +
`- level: A1 | A2 | B1 | null\n` +
`- themenfeld: exakter Name aus der Liste, Fallback "Sonstiges"\n\n` +
`Wörter:\n${items}\n\n` +
`Antworte NUR mit JSON:\n` +
`{"results":[{"id":"<uuid>","titel_de":"...","titel_sv":"...","dom_pos":"noun","level":"A1","themenfeld":"Obst"}]}`;
let results = [];
try {
const res = await fetch(`${ANTHROPIC_BASE}/v1/messages`, {
method: 'POST', headers: headers(),
body: JSON.stringify({ model: MODEL, max_tokens: 3000, system, messages: [{ role: 'user', content: user }] }),
});
if (!res.ok) throw new Error(`HTTP ${res.status}`);
const data = await res.json();
const parsed = parseJson(data.content?.[0]?.text);
results = Array.isArray(parsed?.results) ? parsed.results : [];
} catch { /* request failed: results stays empty, so the guard below ends the run */ }
for (const r of results) {
await applyEnrichResult(r.id, r, byName);
enriched++;
}
processed += words.length;
if (!results.length) break; // Sicherung gegen Endlosschleife
}
return { processed, enriched };
} finally {
running = false;
}
}
module.exports = { runEnrichTick, enrichWordsSync };
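
How collectEnrichBatch walks the results body can be shown on a small sample. The sketch below assumes a JSONL body in which each entry carries only the fields the code above reads (`custom_id`, `result.type`, `result.message.content[0].text`); it is not a full description of the API response. `parseLooseJson` and `parseBatchResults` are hypothetical names.

```javascript
// Build the Markdown fence at runtime so this example can itself sit in a fenced block.
const FENCE = '`'.repeat(3);
const FENCED_JSON_RE = new RegExp(FENCE + '(?:json)?\\s*([\\s\\S]+?)\\s*' + FENCE);

// Tolerant JSON parse: accepts plain JSON or JSON wrapped in a Markdown fence.
function parseLooseJson(text) {
  if (!text) return null;
  let raw = text.trim();
  const md = raw.match(FENCED_JSON_RE);
  if (md) raw = md[1];
  try { return JSON.parse(raw); } catch { return null; }
}

// Turn a JSONL results body into [{ id, data }], skipping blank lines,
// unparseable lines, non-succeeded entries and answers that are not JSON.
function parseBatchResults(body) {
  const out = [];
  for (const line of body.split('\n')) {
    const trimmed = line.trim();
    if (!trimmed) continue;
    let entry;
    try { entry = JSON.parse(trimmed); } catch { continue; }
    if (entry.result?.type !== 'succeeded') continue;
    const data = parseLooseJson(entry.result.message?.content?.[0]?.text);
    if (data) out.push({ id: entry.custom_id, data });
  }
  return out;
}

// Made-up sample body: one clean answer, one blank line, one failed entry,
// one fenced answer, one line that is not JSON at all.
const ok = text => ({ type: 'succeeded', message: { content: [{ type: 'text', text }] } });
const sample = [
  JSON.stringify({ custom_id: 'w1', result: ok('{"titel_de":"Apfel"}') }),
  '',
  JSON.stringify({ custom_id: 'w2', result: { type: 'errored' } }),
  JSON.stringify({ custom_id: 'w3', result: ok(FENCE + 'json\n{"titel_de":"Birne"}\n' + FENCE) }),
  'not json',
].join('\n');
const parsedResults = parseBatchResults(sample);
console.log(parsedResults);
```

Skipping bad lines instead of throwing means a single malformed entry does not block the rest of the batch from being applied.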


@@ -1,6 +1,7 @@
// Pair generation via Claude (Vision) plus server-side persistence.
// Used by lib/pipeline.js (automatic) and routes/claude.js (manual endpoint).
const { query } = require('../db');
const { tagObjectWords } = require('./objectTagging');
const ANTHROPIC_API_URL = 'https://api.anthropic.com/v1/messages';
const GENERATE_MODEL = process.env.GENERATE_MODEL || 'claude-haiku-4-5-20251001';
@@ -39,7 +40,12 @@ async function generatePairsForObject({ imageUrl, objects, selectedObjectId, cou
`Bei yes_no: mix aus answer:true und answer:false. Bei word: positive_words 1–3 passende Wörter, negative_words genau 3 falsche Wörter.\n\n` +
`Regeln: Alle Sätze und Wörter auf Deutsch. Sätze müssen natürlich klingen. Keine Wiederholungen. ` +
`Wörter beim type "word" sind AUSSCHLIESSLICH Nomen ("pos":"noun") oder Adjektive ("pos":"adjective") — ` +
- `KEINE Verben, Pronomen, Artikel, Präpositionen oder Funktionswörter. Gib für jedes Wort das "pos"-Feld an.`;
+ `KEINE Verben, Pronomen, Artikel, Präpositionen oder Funktionswörter. Gib für jedes Wort das "pos"-Feld an.\n\n` +
`NOMEN-MARKUP: Markiere in ALLEN Sätzen (question, positive, negative) jedes Nomen mit ` +
`[Oberflächenform|Grundform] — die Oberflächenform ist das Wort exakt wie es im Satz steht (Beugung/Mehrzahl), ` +
`die Grundform ist Nominativ Singular ohne Artikel. Beispiel: "Die [Wolken|Wolke] schweben am [Himmel|Himmel]." ` +
`Markiere NUR Nomen — keine Verben, Adjektive, Pronomen oder Funktionswörter. ` +
`Die Wörter in positive_words/negative_words bekommen KEIN Markup.`;
const res = await fetch(ANTHROPIC_API_URL, {
method: 'POST',
@@ -64,7 +70,62 @@ async function generatePairsForObject({ imageUrl, objects, selectedObjectId, cou
if (md) raw = md[1];
const parsed = JSON.parse(raw);
if (!Array.isArray(parsed.pairs)) throw new Error('Ungültiges JSON-Format von Claude (pairs fehlt)');
- return parsed.pairs.map(normalizePair).filter(Boolean);
+ const pairs = parsed.pairs.map(normalizePair).filter(Boolean);
for (const p of pairs) {
for (const f of ['question', 'positive', 'negative']) {
if (p[f]) p[f] = await resolveNounMarkup(p[f], objects, selectedObjectId);
}
}
return pairs;
}
// ── Nomen-Markup → Placeholder ───────────────────────────────────────────────
// Claude markiert Nomen als [Oberflächenform|Grundform]. Hier wird daraus:
// - {{surface.o:objectId}} wenn die Grundform ein Objekt-Wort des Bildes ist
// (Zielobjekt hat Vorrang),
// - sonst {{surface.w:wordId}} mit find-or-create des Wortes (status 'requested').
const NOUN_MARKUP_RE = /\[([^\[\]|]+)(?:\|([^\[\]|]*))?\]/g;
async function resolveNounMarkup(text, objects, selectedObjectId) {
// Objekt-Wort-Lookup: lemma (lowercase) → objectId, Zielobjekt zuerst
const objectByLemma = new Map();
const ordered = [...(objects || [])].sort((a, b) =>
(a.id === selectedObjectId ? -1 : 0) - (b.id === selectedObjectId ? -1 : 0));
for (const obj of ordered) {
for (const w of obj.words || []) {
for (const t of [w.titel_de, w.titel_en, w.titel_sv]) {
const key = (t || '').trim().toLowerCase();
if (key && !objectByLemma.has(key)) objectByLemma.set(key, obj.id);
}
}
}
// Erst alle Markups einsammeln (Word-Erstellung ist async, replace nicht)
const matches = [...text.matchAll(NOUN_MARKUP_RE)];
const replacements = new Map();
for (const m of matches) {
if (replacements.has(m[0])) continue;
const surface = m[1].trim();
const lemma = (m[2] || '').trim() || surface;
if (!surface) { replacements.set(m[0], lemma); continue; }
const objectId = objectByLemma.get(lemma.toLowerCase()) || objectByLemma.get(surface.toLowerCase());
if (objectId) {
replacements.set(m[0], `{{${surface}.o:${objectId}}}`);
} else {
try {
const wordId = await findOrCreateWord(lemma);
replacements.set(m[0], `{{${surface}.w:${wordId}}}`);
} catch {
replacements.set(m[0], surface); // DB-Fehler → Wort unmarkiert lassen
}
}
}
let out = text;
for (const [from, to] of replacements) out = out.split(from).join(to);
// Sicherheitsnetz: Objekt-Wörter, die das Modell NICHT als [..]-Nomen markiert hat,
// deterministisch nachtokenisieren (der deutsche Satz wird hier verarbeitet).
out = tagObjectWords(out, 'de', objects);
return out;
}
// Word entries may be {"w":"...","pos":"..."} objects or plain strings.
@@ -72,10 +133,11 @@ async function generatePairsForObject({ imageUrl, objects, selectedObjectId, cou
function cleanWordList(list) {
if (!Array.isArray(list)) return [];
const out = [];
const unmark = s => s.replace(NOUN_MARKUP_RE, (_, surface) => surface.trim());
for (const item of list) {
- if (typeof item === 'string') { const t = item.trim(); if (t) out.push(t); continue; }
+ if (typeof item === 'string') { const t = unmark(item).trim(); if (t) out.push(t); continue; }
if (item && typeof item === 'object') {
- const t = (item.w || item.word || item.text || '').toString().trim();
+ const t = unmark((item.w || item.word || item.text || '').toString()).trim();
const pos = (item.pos || '').toString().toLowerCase();
if (t && (!pos || pos === 'noun' || pos === 'adjective')) out.push(t);
}
@@ -176,4 +238,4 @@ async function persistPair(p, objectId) {
return pair.id;
}
- module.exports = { generatePairsForObject, persistPair, findOrCreateWord };
+ module.exports = { generatePairsForObject, persistPair, findOrCreateWord, resolveNounMarkup };

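The conversion from `[surface|lemma]` markup to placeholders is easier to follow on a concrete sentence. The sketch below is a simplified, synchronous stand-in for resolveNounMarkup: the object lookup is a plain Map, and the database-backed findOrCreateWord is replaced by a caller-supplied function. `resolveMarkup`, the `obj-1` id and the `word-…` ids are hypothetical; the real ids are UUIDs.

```javascript
// Same pattern as NOUN_MARKUP_RE in the diff: [surface|lemma], lemma optional.
const MARKUP_RE = /\[([^\[\]|]+)(?:\|([^\[\]|]*))?\]/g;

// Hypothetical synchronous variant: object lemmas win, everything else
// becomes a word placeholder via the supplied wordIdFor callback.
function resolveMarkup(text, objectByLemma, wordIdFor) {
  return text.replace(MARKUP_RE, (_, rawSurface, rawLemma) => {
    const surface = rawSurface.trim();
    const lemma = (rawLemma || '').trim() || surface;
    if (!surface) return lemma; // nothing to label, keep the bare lemma
    const objectId = objectByLemma.get(lemma.toLowerCase());
    if (objectId) return `{{${surface}.o:${objectId}}}`;
    return `{{${surface}.w:${wordIdFor(lemma)}}}`;
  });
}

const resolved = resolveMarkup(
  'Die [Wolken|Wolke] schweben am [Himmel|Himmel].',
  new Map([['wolke', 'obj-1']]),
  lemma => `word-${lemma.toLowerCase()}`
);
console.log(resolved);
// → Die {{Wolken.o:obj-1}} schweben am {{Himmel.w:word-himmel}}.
```

The placeholder keeps the inflected surface form for display while the id points at the base form, so the client can render the sentence unchanged and still link each noun.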
src/lib/leveling.js

@@ -0,0 +1,30 @@
// Progressive Level-Kurve — Single Source of Truth fürs Backend.
// Kumulative EP, die für Level n nötig sind: 5·n·(n+3).
// Level 1 → 20 EP, Level 2 → 50, Level 3 → 90, Level 4 → 140, Level 5 → 200 …
// Früh schnelle Level (erste Level fallen in der ersten Session), danach sanft steiler.
function epForLevel(level) {
if (level <= 0) return 0;
return 5 * level * (level + 3);
}
// Largest n with 5n² + 15n ≤ ep → n ≤ (−15 + √(225 + 20·ep)) / 10
function levelForEp(ep) {
const e = Math.max(0, ep || 0);
return Math.floor((-15 + Math.sqrt(225 + 20 * e)) / 10);
}
// Level + Fortschritt innerhalb des Levels (für Momentum-Anzeige im Client).
function levelInfo(ep) {
const e = Math.max(0, ep || 0);
const level = levelForEp(e);
const base = epForLevel(level);
const next = epForLevel(level + 1);
return {
level,
ep_into_level: e - base,
ep_to_next_level: next - e,
ep_for_next_level: next - base,
};
}
module.exports = { epForLevel, levelForEp, levelInfo };
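
The closed-form inverse follows from solving 5n² + 15n − ep ≤ 0 for n, which gives n ≤ (−15 + √(225 + 20·ep)) / 10. As a sanity check, the sketch below compares that formula against a plain linear search; `levelForEpSlow` is a hypothetical reference implementation added only for this comparison.

```javascript
// Cumulative EP needed to reach level n: 5·n·(n+3), as in the diff.
function epForLevel(level) {
  return level <= 0 ? 0 : 5 * level * (level + 3);
}

// Closed-form inverse: largest n with 5n² + 15n ≤ ep.
function levelForEp(ep) {
  const e = Math.max(0, ep || 0);
  return Math.floor((-15 + Math.sqrt(225 + 20 * e)) / 10);
}

// Reference implementation by linear search, used only to cross-check.
function levelForEpSlow(ep) {
  let n = 0;
  while (epForLevel(n + 1) <= ep) n++;
  return n;
}

// Collect every EP value where the two implementations disagree.
const mismatches = [];
for (let ep = 0; ep <= 5000; ep++) {
  if (levelForEp(ep) !== levelForEpSlow(ep)) mismatches.push(ep);
}
console.log(levelForEp(19), levelForEp(20), levelForEp(49), levelForEp(50), mismatches.length);
// → 0 1 1 2 0
```

At each threshold ep = 5·n·(n+3) the radicand equals (10n + 15)², a perfect square, so the square root is exact and the level changes at precisely the documented EP values.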

src/lib/objectTagging.js

@@ -0,0 +1,183 @@
// Deterministisches Tokenisieren von OBJEKT-Wörtern in Sätzen.
//
// Hintergrund: Objekt-Tokens ({{label.o:objectId}}) entstehen bisher nur aus dem
// Nomen-Markup [Oberfläche|Grundform], das das Generierungs-Modell setzen SOLL. Tut es das
// nicht (häufig bei kleinen Modellen), fehlt der Token komplett und das Frontend kann das
// Objekt weder als Chip noch als Bildregion hervorheben.
//
// Dieser Tagger findet Objekt-Wörter direkt im Satz anhand der Wort-Titel der Objekte des
// Bildes und ist damit unabhängig vom LLM-Markup. Er wird genutzt:
// - Forward: als Sicherheitsnetz in generatePairs.resolveNounMarkup / pipeline.translatePair
// - Backfill: scripts/backfill-object-tokens.js über bestehende Daten
//
// Important: existing tokens ({{…}}, ⟦PHn:…⟧) are left untouched, and ONLY object tokens
// (.o:) are created; this tagger never touches word tokens (.w:).
const { PLACEHOLDER_RE } = require('./placeholders');
// Flexions-Endungen je Sprache (bestimmte Form / Plural / Genitiv), längere zuerst, damit der
// Regex greedy die längste Form greift (z.B. "ryggsäcken" statt nur "ryggsäck").
const SUFFIXES = {
sv: ['ens', 'ets', 'na', 'en', 'et', 'or', 'ar', 'er', 'n', 'a', 's'],
de: ['en', 'es', 'er', 'em', 'e', 'n', 's'],
en: ['es', 's'],
};
// Lemmas shorter than this are matched ONLY exactly (no inflection); otherwise short
// words such as "bi" would match too much ("bil", "bin", …).
const MIN_LEN_FOR_SUFFIX = 4;
function escapeRegex(s) {
return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}
// Bestehende Tokens (sowohl {{label.type:uuid}} als auch ⟦PHn:label⟧) erkennen, damit wir
// nicht in sie hineinschreiben.
const EXISTING_TOKEN_RE = /\{\{[^.{}]+\.[wo]:[0-9a-f-]{36}\}\}|⟦PH\d+:[^⟧]*⟧/g;
// Baut aus den Objekten der Sprache eine Liste { lemma, lemmaLc, objectId }, längste zuerst.
function buildLemmas(objects, lang) {
const out = [];
const seen = new Set();
for (const obj of objects || []) {
for (const w of obj.words || []) {
const title = (w[`titel_${lang}`] || '').trim();
if (!title) continue;
const key = title.toLowerCase();
if (seen.has(key)) continue;
seen.add(key);
out.push({ lemma: title, lemmaLc: key, objectId: obj.id });
}
}
out.sort((a, b) => b.lemma.length - a.lemma.length);
return out;
}
// Tagged eine zusammenhängende Klartext-Passage (ohne bestehende Tokens).
function tagPlainSegment(text, lemmas, suffixes) {
if (!text) return text;
// Ein kombinierter Regex über alle Lemmata (längste zuerst). Pro Lemma optional eine
// Flexions-Endung, sofern lang genug. Wortgrenzen via Unicode-Lookarounds (statt \b, das
// bei å/ä/ö/ü unzuverlässig ist).
const alts = lemmas.map(({ lemma }) => {
const esc = escapeRegex(lemma);
if (lemma.length >= MIN_LEN_FOR_SUFFIX && suffixes.length) {
return `${esc}(?:${suffixes.map(escapeRegex).join('|')})?`;
}
return esc;
});
if (!alts.length) return text;
const re = new RegExp(`(?<![\\p{L}\\p{N}])(${alts.join('|')})(?![\\p{L}\\p{N}])`, 'giu');
return text.replace(re, (surface) => {
const sLc = surface.toLowerCase();
// Passendes Objekt bestimmen: längstes Lemma, das Präfix der Oberfläche ist und dessen
// Rest eine erlaubte (oder leere) Endung ist.
for (const { lemma, lemmaLc, objectId } of lemmas) {
if (!sLc.startsWith(lemmaLc)) continue;
const rest = sLc.slice(lemmaLc.length);
const restOk = rest === '' ||
(lemma.length >= MIN_LEN_FOR_SUFFIX && suffixes.includes(rest));
if (restOk) return `{{${surface}.o:${objectId}}}`;
}
return surface; // kein sauberer Treffer → unverändert lassen
});
}
// Hauptfunktion: tagged Objekt-Wörter in `sentence` für Sprache `lang`.
// `objects`: [{ id, words: [{titel_de,titel_en,titel_sv}] }]
function tagObjectWords(sentence, lang, objects) {
if (!sentence) return sentence;
const lemmas = buildLemmas(objects, lang);
if (!lemmas.length) return sentence;
const suffixes = SUFFIXES[lang] || [];
// Satz in [Klartext, Token, Klartext, …] zerlegen; nur Klartext-Teile taggen.
let out = '';
let last = 0;
EXISTING_TOKEN_RE.lastIndex = 0;
let m;
while ((m = EXISTING_TOKEN_RE.exec(sentence)) !== null) {
out += tagPlainSegment(sentence.slice(last, m.index), lemmas, suffixes);
out += m[0]; // bestehenden Token unverändert übernehmen
last = m.index + m[0].length;
}
out += tagPlainSegment(sentence.slice(last), lemmas, suffixes);
return out;
}
// Wickelt das erste Vorkommen von `surface` (exakte Zeichenkette, an Wortgrenzen, NICHT
// innerhalb eines bestehenden Tokens) in einen Objekt-Token. Für den LLM-Fallback, der die
// gebeugte Oberflächenform liefert, die der deterministische Tagger nicht erkannt hat.
function wrapSurface(sentence, surface, objectId) {
const surf = (surface || '').trim();
if (!sentence || !surf) return sentence;
let out = '';
let done = false;
EXISTING_TOKEN_RE.lastIndex = 0;
const segments = [];
let m, cursor = 0;
// Klartext-Segmente (außerhalb bestehender Tokens) sammeln
while ((m = EXISTING_TOKEN_RE.exec(sentence)) !== null) {
segments.push({ text: sentence.slice(cursor, m.index), start: cursor, token: false });
segments.push({ text: m[0], start: m.index, token: true });
cursor = m.index + m[0].length;
}
segments.push({ text: sentence.slice(cursor), start: cursor, token: false });
for (const seg of segments) {
if (done || seg.token) { out += seg.text; continue; }
// Erstes Vorkommen an Wortgrenzen im Klartext-Segment ersetzen
const re = new RegExp(`(?<![\\p{L}\\p{N}])(${escapeRegex(surf)})(?![\\p{L}\\p{N}])`, 'u');
const mm = seg.text.match(re);
if (mm) {
out += seg.text.slice(0, mm.index) + `{{${mm[1]}.o:${objectId}}}` + seg.text.slice(mm.index + mm[1].length);
done = true;
} else {
out += seg.text;
}
}
return out;
}
// Liefert die Menge der Objekt-IDs, die in einem Satz als Objekt-Token vorkommen.
function objectIdsInSentence(sentence) {
const ids = new Set();
for (const mm of String(sentence || '').matchAll(PLACEHOLDER_RE)) {
if (mm[2] === 'o') ids.add(mm[3]);
}
return ids;
}
// Alle OBJEKT-Tokens eines Satzes als { full, label, oid }.
const OBJ_TOKEN_RE = /\{\{([^.{}]+)\.o:([0-9a-f-]{36})\}\}/g;
function objectTokensInSentence(sentence) {
const out = [];
for (const m of String(sentence || '').matchAll(OBJ_TOKEN_RE)) out.push({ full: m[0], label: m[1], oid: m[2] });
return out;
}
// Is `label` a SAFELY good form of object `oid` in `lang`? (Exact match, or lemma plus a regular
// ending.) Such tokens need not go to the LLM for the cleanup check; they are unambiguously fine.
function isSimpleObjectForm(label, lang, objects, oid) {
const o = (objects || []).find(x => x.id === oid);
if (!o) return false;
const L = (label || '').toLowerCase();
const sfx = SUFFIXES[lang] || [];
for (const w of o.words || []) {
const lemma = (w[`titel_${lang}`] || '').trim().toLowerCase();
if (!lemma) continue;
if (L === lemma) return true;
if (sfx.some(s => L === lemma + s)) return true;
}
return false;
}
// Entfernt ein bestimmtes Objekt-Token (alle Vorkommen) → nur das Label bleibt stehen.
function untagToken(sentence, full, label) {
return String(sentence || '').split(full).join(label);
}
module.exports = {
tagObjectWords, wrapSurface, buildLemmas, objectIdsInSentence,
objectTokensInSentence, isSimpleObjectForm, untagToken,
};
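
The tagger uses Unicode lookarounds instead of `\b` for word boundaries. The reason shows up in two small cases: without the `u` flag, `\w` covers only ASCII letters, digits and underscore, so an umlaut counts as a non-word character. The sketch below illustrates both failure modes; `findWholeWord` is a hypothetical helper built on the same lookaround pattern as the diff.

```javascript
// Escape regex metacharacters in a literal string.
function escapeRegex(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: index of `word` when it is not glued to other letters
// or digits (Unicode-aware), otherwise -1.
function findWholeWord(text, word) {
  const re = new RegExp(`(?<![\\p{L}\\p{N}])${escapeRegex(word)}(?![\\p{L}\\p{N}])`, 'iu');
  const m = text.match(re);
  return m ? m.index : -1;
}

// False positive with \b: "ä" is a non-word character, so a boundary is seen
// inside "Läufer" and "Ufer" matches in the middle of the word.
const asciiHitsInsideWord = /\bUfer\b/i.test('Der Läufer rennt');
// False negative with \b: space and "Ö" are both non-word characters, so
// there is no boundary in front of "Öl".
const asciiFindsUmlautWord = /\bÖl\b/.test('Das Öl ist heiß');

const unicodeInsideWord = findWholeWord('Der Läufer rennt', 'Ufer');
const unicodeUmlautWord = findWholeWord('Das Öl ist heiß', 'Öl');
console.log(asciiHitsInsideWord, asciiFindsUmlautWord, unicodeInsideWord, unicodeUmlautWord);
// → true false -1 4
```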

src/lib/pairCategories.js

@@ -0,0 +1,42 @@
const { query } = require('../db');
// Leitet die Kategorien eines (oder mehrerer) Pairs aus den verknüpften Wörtern ab und
// materialisiert sie in pair_categories. Quellen:
// - Statements (positiv/negativ) → statement_*_words → word_categories
// - Objekte → object_words → word_categories
// (Questions haben keine Wort-M2M und entfallen.)
// Re-Run-sicher: löscht vorhandene Zuordnungen der betroffenen Pairs und schreibt neu,
// damit eine erneute Veröffentlichung nach Inhaltsänderungen die Kategorien aktualisiert.
async function derivePairCategories(pairIds) {
const ids = (Array.isArray(pairIds) ? pairIds : [pairIds]).filter(Boolean);
if (!ids.length) return 0;
await query(`DELETE FROM pair_categories WHERE pair_id = ANY($1)`, [ids]);
const r = await query(
`INSERT INTO pair_categories (pair_id, category_id)
SELECT DISTINCT pid, category_id FROM (
SELECT p.id AS pid, wc.category_id
FROM pairs p
JOIN (
SELECT statement_id, word_id FROM statement_positive_words
UNION
SELECT statement_id, word_id FROM statement_negative_words
) sw ON sw.statement_id IN (p.positive_statement_id, p.negative_statement_id)
JOIN word_categories wc ON wc.word_id = sw.word_id
WHERE p.id = ANY($1)
UNION
SELECT op.pair_id AS pid, wc.category_id
FROM object_pairs op
JOIN object_words ow ON ow.object_id = op.object_id
JOIN word_categories wc ON wc.word_id = ow.word_id
WHERE op.pair_id = ANY($1)
) src
WHERE category_id IS NOT NULL
ON CONFLICT (pair_id, category_id) DO NOTHING`,
[ids]
);
return r.rowCount;
}
module.exports = { derivePairCategories };
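
What the SQL derives can be restated in plain data terms: a pair inherits every category of every word reachable through its two statements or through its objects. The following in-memory sketch mirrors that rule under simplified assumptions; `deriveCategories` and all ids are hypothetical, and the Maps stand in for the join tables.

```javascript
// Hypothetical in-memory version of the derivation: collect the words of both
// statements and of all linked objects, then union their categories.
function deriveCategories(pair, { statementWords, objectWords, wordCategories }) {
  const wordIds = new Set();
  for (const sid of [pair.positiveStatementId, pair.negativeStatementId]) {
    for (const w of statementWords.get(sid) || []) wordIds.add(w);
  }
  for (const oid of pair.objectIds || []) {
    for (const w of objectWords.get(oid) || []) wordIds.add(w);
  }
  const categories = new Set();
  for (const w of wordIds) {
    for (const c of wordCategories.get(w) || []) categories.add(c);
  }
  return [...categories].sort(); // sorted only to make the output stable
}

const tables = {
  statementWords: new Map([['s1', ['w1', 'w2']], ['s2', ['w3']]]),
  objectWords: new Map([['o1', ['w1', 'w4']]]),
  wordCategories: new Map([
    ['w1', ['fruit']],
    ['w3', ['animals']],
    ['w4', ['fruit', 'food']],
  ]),
};
const derived = deriveCategories(
  { positiveStatementId: 's1', negativeStatementId: 's2', objectIds: ['o1'] },
  tables
);
console.log(derived);
// → [ 'animals', 'food', 'fruit' ]
```

Words without a category (here `w2`) contribute nothing, which corresponds to the inner join on word_categories in the query.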


@@ -1,10 +1,14 @@
- // Automatic content pipeline per picture: generate pairs → translate → audio → ready.
+ // Automatic content pipeline per picture: generate pairs → translate → AI review → audio → ready.
// In-process queue with a single worker (rate-limit friendly). Every step is idempotent,
// i.e. a resume after a crash or redeploy skips work that is already done.
const { query } = require('../db');
- const { LANGS, fillMissingRow } = require('./translate');
+ const { LANGS, fillMissingRow, callClaude } = require('./translate');
const { PLACEHOLDER_RE } = require('./placeholders');
const { tagObjectWords, wrapSurface, objectIdsInSentence,
objectTokensInSentence, isSimpleObjectForm, untagToken } = require('./objectTagging');
const { translateWordGroup } = require('./pairContent');
const { generatePairsForObject, persistPair } = require('./generatePairs');
const { reviewPicturePairs } = require('./reviewPairs');
const { generateAndStore, describeError } = require('../routes/audios');
const queue = [];
@@ -85,6 +89,254 @@ async function loadPairs(pictureId) {
ORDER BY p.id`, [pictureId])).rows;
}
// Sentence fields of ONE pair (table/id/col/lang): questions + statements.
function pairSentenceFields(p) {
const fields = [];
const add = (table, id, cols) => { if (id) for (const col of cols) fields.push({ table, id, col, lang: col.slice(-2) }); };
add('questions', p.question_id, ['sentence_de', 'sentence_en', 'sentence_sv']);
add('statements', p.positive_statement_id, ['positive_sentence_de', 'positive_sentence_en', 'positive_sentence_sv']);
add('statements', p.negative_statement_id, ['negative_sentence_de', 'negative_sentence_en', 'negative_sentence_sv']);
return fields;
}
// LLM-Fallback: exakte (gebeugte) Oberflächenform eines Objektworts in einem Satz finden.
// WICHTIG: nur zurückgeben, wenn das Wort das Objekt SELBST bezeichnet (Wort, Beugung, Mehrzahl,
// Kopf-Kompositum wie „Landschildkröte" für „Schildkröte", oder Synonym wie „Stiefel" für
// „Schuh"). NICHT, wenn das Objektwort nur BESTIMMUNGSWORT eines anderen Dings ist
// (z.B. „Erdbeerfeld"/„Erdbeerpflanze" ≠ Erdbeere).
async function locateSurfaceLLM(sentence, label) {
try {
const data = await callClaude({
system: 'Du findest die Oberflächenform eines Objektworts in einem Satz. Antworte AUSSCHLIESSLICH mit gültigem JSON.',
user: `Satz: "${sentence}"\nObjekt (Grundform/Bedeutung): "${label}"\n\n` +
`Gib die EXAKTE Zeichenkette zurück, mit der dieses Objekt im Satz benannt ist — als Wort, ` +
`Beugung/Mehrzahl/bestimmte Form, Kopf-Kompositum (Objektwort ist das GRUNDWORT, z.B. ` +
`"Landschildkröte" für "Schildkröte") oder Synonym (z.B. "Stiefel"/"Lederstiefel" für "Schuh").\n` +
`Gib null zurück, wenn das Objekt NICHT vorkommt ODER nur als BESTIMMUNGSWORT eines anderen ` +
`Dings (z.B. "Erdbeerfeld"/"Erdbeerpflanze" bezeichnet Feld/Pflanze, NICHT die Erdbeere).\n` +
`Format: {"surface":"…"|null}`,
maxTokens: 80,
});
const s = data && typeof data.surface === 'string' ? data.surface.trim() : null;
if (!s) return null;
// Accept only if the form really occurs in the sentence (at word boundaries).
return new RegExp(`(?<![\\p{L}\\p{N}])${s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')}(?![\\p{L}\\p{N}])`, 'u').test(sentence) ? s : null;
} catch { return null; }
}
// LLM check for the cleanup: does the tagged `label` really denote the object `objWord`?
// true ⇒ keep (word/inflection/head compound/synonym), false ⇒ remove the token
// (modifier of some other thing). On error or ambiguity: keep (conservative).
async function denotesObjectLLM(sentence, label, objWord) {
try {
const data = await callClaude({
system: 'Du beurteilst, ob ein markiertes Wort wirklich das genannte Objekt bezeichnet. Antworte AUSSCHLIESSLICH mit gültigem JSON.',
user: `Objekt: "${objWord}"\nSatz: "${sentence}"\nMarkiertes Wort: "${label}"\n\n` +
`Bezeichnet "${label}" das Objekt "${objWord}" SELBST? JA bei: dem Wort, einer Beugung/` +
`Mehrzahl/bestimmten Form, einem Kompositum mit "${objWord}" als GRUNDWORT (z.B. ` +
`"Landschildkröte" für "Schildkröte"), oder einem Synonym (z.B. "Stiefel"/"Lederstiefel" ` +
`für "Schuh"). NEIN, wenn "${objWord}" nur BESTIMMUNGSWORT eines ANDEREN Dings ist (z.B. ` +
`"Erdbeerfeld"/"Erdbeerpflanze" ist ein Feld/eine Pflanze, NICHT die Erdbeere).\n` +
`Format: {"denotes": true|false}`,
maxTokens: 40,
});
return data && typeof data.denotes === 'boolean' ? data.denotes : true;
} catch { return true; }
}
// Retroactively tokenizes OBJECT words in the sentences of ONE pair.
// Deterministic (tagObjectWords); optionally a hybrid LLM fallback for inflected forms that the
// deterministic pass missed, but ONLY for objects already confirmed as a token in another
// language of the same pair (minimal calls, no hallucinations).
// Idempotent. `dryRun` ⇒ no UPDATE. Returns the changed fields { table,id,col,lang,before,after }.
async function retagPair(p, objects, { dryRun = false, useLLM = false } = {}) {
const fields = pairSentenceFields(p);
if (!fields.length) return [];
// Load the current texts (grouped per table/row)
const byRow = new Map(); // `${table}|${id}` → { table, id, cols:Set }
for (const f of fields) {
const k = `${f.table}|${f.id}`;
if (!byRow.has(k)) byRow.set(k, { table: f.table, id: f.id, cols: new Set() });
byRow.get(k).cols.add(f.col);
}
const text = {}; // `${table}|${id}|${col}` → string
for (const { table, id, cols } of byRow.values()) {
const colList = [...cols];
const row = (await query(`SELECT ${colList.join(', ')} FROM ${table} WHERE id = $1`, [id])).rows[0] || {};
for (const col of colList) text[`${table}|${id}|${col}`] = row[col] || '';
}
const key = f => `${f.table}|${f.id}|${f.col}`;
// 1) Deterministic sweep (in-memory)
const tagged = {};
for (const f of fields) {
const before = text[key(f)];
tagged[key(f)] = before && before.trim() ? tagObjectWords(before, f.lang, objects) : before;
}
// 2) Hybrid LLM fallback: for object IDs tokenized in ≥1 language, search the languages where they are still missing.
if (useLLM) {
const presentByObj = new Map(); // objectId → Set<lang>
for (const f of fields) for (const oid of objectIdsInSentence(tagged[key(f)])) {
if (!presentByObj.has(oid)) presentByObj.set(oid, new Set());
presentByObj.get(oid).add(f.lang);
}
const labelOf = (oid, lang) => {
const o = objects.find(x => x.id === oid);
for (const w of o?.words || []) if ((w[`titel_${lang}`] || '').trim()) return w[`titel_${lang}`].trim();
return null;
};
for (const f of fields) {
const cur = tagged[key(f)];
if (!cur || !cur.trim()) continue;
for (const [oid, langs] of presentByObj) {
if (langs.has(f.lang)) continue; // already tokenized in this language
if (objectIdsInSentence(cur).has(oid)) continue; // (safety check)
const label = labelOf(oid, f.lang);
if (!label) continue;
const surface = await locateSurfaceLLM(cur, label);
if (surface) tagged[key(f)] = wrapSurface(tagged[key(f)], surface, oid);
}
}
}
// 3) Diff + (optionally) write
const changes = [];
for (const { table, id, cols } of byRow.values()) {
const set = {};
for (const col of cols) {
const k = `${table}|${id}|${col}`;
if (tagged[k] !== text[k]) {
set[col] = tagged[k];
changes.push({ table, id, col, lang: col.slice(-2), before: text[k], after: tagged[k] });
}
}
const cells = Object.keys(set);
if (!dryRun && cells.length) {
await query(
`UPDATE ${table} SET ${cells.map((c, i) => `${c} = $${i + 1}`).join(', ')} WHERE id = $${cells.length + 1}`,
[...cells.map(c => set[c]), id]);
}
}
return changes;
}
// Cleanup of ONE pair: removes OBJECT tokens whose label does not really denote the object
// (modifier of some other thing, e.g. „Erdbeerfeld" tagged as Erdbeere). Clearly valid forms
// (exact / lemma+ending) are kept without an LLM call; only the ambiguous tokens go to the LLM.
async function cleanPair(p, objects, { dryRun = false } = {}) {
const fields = pairSentenceFields(p);
if (!fields.length) return [];
const byRow = new Map();
for (const f of fields) {
const k = `${f.table}|${f.id}`;
if (!byRow.has(k)) byRow.set(k, { table: f.table, id: f.id, cols: new Set() });
byRow.get(k).cols.add(f.col);
}
const text = {};
for (const { table, id, cols } of byRow.values()) {
const colList = [...cols];
const row = (await query(`SELECT ${colList.join(', ')} FROM ${table} WHERE id = $1`, [id])).rows[0] || {};
for (const col of colList) text[`${table}|${id}|${col}`] = row[col] || '';
}
const key = f => `${f.table}|${f.id}|${f.col}`;
const labelOf = (oid, lang) => {
const o = objects.find(x => x.id === oid);
for (const w of o?.words || []) if ((w[`titel_${lang}`] || '').trim()) return w[`titel_${lang}`].trim();
return null;
};
const cleaned = {};
for (const f of fields) {
let cur = text[key(f)];
cleaned[key(f)] = cur;
if (!cur || !cur.trim()) continue;
for (const tok of objectTokensInSentence(cur)) {
if (isSimpleObjectForm(tok.label, f.lang, objects, tok.oid)) continue; // clearly fine
const objWord = labelOf(tok.oid, f.lang);
if (!objWord) continue; // unknown → leave untouched
const ok = await denotesObjectLLM(cur, tok.label, objWord);
if (!ok) { cur = untagToken(cur, tok.full, tok.label); cleaned[key(f)] = cur; }
}
}
const changes = [];
for (const { table, id, cols } of byRow.values()) {
const set = {};
for (const col of cols) {
const k = `${table}|${id}|${col}`;
if (cleaned[k] !== text[k]) {
set[col] = cleaned[k];
changes.push({ table, id, col, lang: col.slice(-2), before: text[k], after: cleaned[k] });
}
}
const cells = Object.keys(set);
if (!dryRun && cells.length) {
await query(
`UPDATE ${table} SET ${cells.map((c, i) => `${c} = $${i + 1}`).join(', ')} WHERE id = $${cells.length + 1}`,
[...cells.map(c => set[c]), id]);
}
}
return changes;
}
// Backfill/retag for one picture or all pictures. Returns a summary report.
// `cleanup:true` ⇒ instead of tagging, wrongly tokenized object words (modifier of some
// other thing) are removed via the LLM check.
async function retagObjects({ pictureId = null, dryRun = false, useLLM = false, cleanup = false } = {}) {
const picIds = pictureId
? [pictureId]
: (await query(`SELECT id FROM pictures ORDER BY created_at`)).rows.map(r => r.id);
const report = { pictures: 0, pairs: 0, changedPairs: 0, changedFields: 0, dryRun, useLLM, cleanup, samples: [] };
for (const pid of picIds) {
const objects = await loadObjects(pid);
if (!objects.length) continue;
const pairs = await loadPairs(pid);
report.pictures++;
for (const p of pairs) {
report.pairs++;
let changes = [];
try {
changes = cleanup
? await cleanPair(p, objects, { dryRun })
: await retagPair(p, objects, { dryRun, useLLM });
} catch (err) { console.error(`Retag-Fehler bei Pair ${p.id}:`, err.message); continue; }
if (changes.length) {
report.changedPairs++;
report.changedFields += changes.length;
if (report.samples.length < 25)
report.samples.push({ pair: p.id, changes: changes.map(c => ({ col: c.col, after: c.after })) });
}
}
}
return report;
}
// Word IDs of all {{label.w:uuid}} placeholders in the sentences of the pairs.
// These words are created during generation (nouns in the sentence) and are not linked via
// statement_words/object_words, so they must be included for translation + audio.
async function collectPlaceholderWordIds(pairs) {
const ids = new Set();
const scan = text => {
for (const m of String(text || '').matchAll(PLACEHOLDER_RE)) if (m[2] === 'w') ids.add(m[3]);
};
const questionIds = [...new Set(pairs.map(p => p.question_id).filter(Boolean))];
const stmtIds = [...new Set(pairs.flatMap(p => [p.positive_statement_id, p.negative_statement_id]).filter(Boolean))];
if (questionIds.length) {
const r = await query(
`SELECT sentence_de, sentence_en, sentence_sv FROM questions WHERE id = ANY($1)`, [questionIds]);
r.rows.forEach(row => Object.values(row).forEach(scan));
}
if (stmtIds.length) {
const r = await query(
`SELECT positive_sentence_de, positive_sentence_en, positive_sentence_sv,
negative_sentence_de, negative_sentence_en, negative_sentence_sv
FROM statements WHERE id = ANY($1)`, [stmtIds]);
r.rows.forEach(row => Object.values(row).forEach(scan));
}
return ids;
}
async function runPicture(pictureId) { async function runPicture(pictureId) {
// Claim — nur Bilder, die in der Pipeline sind // Claim — nur Bilder, die in der Pipeline sind
const claim = await query( const claim = await query(
@@ -152,6 +404,39 @@ async function runPicture(pictureId) {
progress.translatedPairs++; progress.translatedPairs++;
await setStep(pictureId, 'translate', progress); await setStep(pictureId, 'translate', progress);
} }
// Deterministically retokenize object words that the model did not mark as nouns
// (safety net; existing tokens are left untouched).
for (const p of pairs) {
try { await retagPair(p, objects); }
catch (err) { console.error(`Objekt-Tagging-Fehler bei Pair ${p.id}:`, err.message); }
}
// Also translate the noun words from sentence placeholders ({{label.w:id}})
try {
for (const wid of await collectPlaceholderWordIds(pairs)) {
try { await fillMissingRow('words', wid, ['titel']); }
catch (err) { progress.translateFailures++; console.error(`Translate-Fehler bei Wort ${wid}:`, err.message); }
}
} catch (err) { console.error(`Placeholder-Wörter sammeln fehlgeschlagen:`, err.message); }
// ── Step 2.5: AI review: all pairs + the picture go to Sonnet for proofreading ────
// (spelling, translation consistency, plausibility against the picture). Corrections
// are written to the DB before audio generation; as with translation, errors are not
// fatal: audio still runs and the run is not aborted.
progress.reviewedPairs = 0;
progress.correctionsApplied = 0;
progress.reviewFailures = 0;
await setStep(pictureId, 'review', progress);
try {
await reviewPicturePairs({
pictureId, pictureUrl: picture.picture_link, pairs, progress,
onProgress: () => setStep(pictureId, 'review', progress),
});
} catch (err) {
progress.reviewFailures++;
console.error(`Review-Fehler bei Bild ${pictureId}:`, err.message);
}
await setStep(pictureId, 'review', progress);
// ── Step 3: Audio für alle Sätze + Wörter des Bildes in allen Sprachen ────── // ── Step 3: Audio für alle Sätze + Wörter des Bildes in allen Sprachen ──────
try { try {
@@ -277,6 +562,8 @@ async function collectAudioUnits(pictureId, pairs) {
JOIN object_pictures op ON op.object_id = ow.object_id JOIN object_pictures op ON op.object_id = ow.object_id
WHERE op.picture_id = $1`, [pictureId]); WHERE op.picture_id = $1`, [pictureId]);
ow.rows.forEach(x => wordIds.add(x.word_id)); ow.rows.forEach(x => wordIds.add(x.word_id));
// + noun words from sentence placeholders ({{label.w:id}})
(await collectPlaceholderWordIds(pairs)).forEach(id => wordIds.add(id));
const sources = []; const sources = [];
if (questionIds.length) { if (questionIds.length) {
@@ -343,4 +630,4 @@ async function generateWithBackoff(u) {
} }
} }
module.exports = { enqueue, resumePending, loadPairs, collectAudioUnits, generateWithBackoff, translatePair }; module.exports = { enqueue, resumePending, loadObjects, loadPairs, collectAudioUnits, generateWithBackoff, translatePair, retagPair, retagObjects };
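
The word-boundary guard that `locateSurfaceLLM` applies to the model's answer can be tried in isolation. The sketch below is a minimal standalone version, assuming a Node.js runtime with lookbehind and Unicode property escapes; the helper names `escapeRegExp` and `occursAsWholeWord` are hypothetical and do not appear in the diff.

```javascript
// Escape regex metacharacters so the surface form is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// True if `surface` occurs in `sentence` with no letter or digit directly
// before or after it (Unicode-aware, so "ö" counts as a letter).
function occursAsWholeWord(sentence, surface) {
  const re = new RegExp(
    `(?<![\\p{L}\\p{N}])${escapeRegExp(surface)}(?![\\p{L}\\p{N}])`, 'u');
  return re.test(sentence);
}

// "Schildkröte" as a standalone word is accepted ...
console.log(occursAsWholeWord('Die Schildkröte schläft.', 'Schildkröte'));
// ... but not when it is only part of a longer word.
console.log(occursAsWholeWord('Die Landschildkröte schläft.', 'Schildkröte'));
```

The guard only confirms that the returned string is literally present; whether a compound such as "Landschildkröte" counts as the object is left to the prompt, which may return the whole compound as the surface form.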


@@ -5,14 +5,25 @@ const PLACEHOLDER_RE = /\{\{([^.{}]+)\.(w|o):([0-9a-f-]{36})\}\}/g;
// Legacy-Form ohne Label: {{uuid}} — sollte migriert sein, defensiv trotzdem entfernen. // Legacy-Form ohne Label: {{uuid}} — sollte migriert sein, defensiv trotzdem entfernen.
const LEGACY_PLACEHOLDER_RE = /\{\{\s*[0-9a-f-]{36}\s*\}\}/g; const LEGACY_PLACEHOLDER_RE = /\{\{\s*[0-9a-f-]{36}\s*\}\}/g;
// Protection token during translation/review: ⟦PHn:label⟧. It must never end up in the DB;
// if it does (Claude hallucination), it is defensively resolved to its label everywhere.
const TOKEN_RE = /⟦(PH\d+):([^⟧]*)⟧/g;
// Removes leaked ⟦PHn:label⟧ tokens from a text → only the label remains.
function stripLeakedTokens(text) {
if (!text) return text;
return String(text).replace(TOKEN_RE, (_, _key, label) => label.trim());
}
// Macht aus "Ist das ein {{Apfel.w:1234-…}}?" → "Ist das ein Apfel?" (für TTS/Anzeige). // Macht aus "Ist das ein {{Apfel.w:1234-…}}?" → "Ist das ein Apfel?" (für TTS/Anzeige).
function resolvePlaceholdersToLabels(text) { function resolvePlaceholdersToLabels(text) {
if (!text) return ''; if (!text) return '';
return String(text) return String(text)
.replace(PLACEHOLDER_RE, (_, label) => label) .replace(PLACEHOLDER_RE, (_, label) => label)
.replace(LEGACY_PLACEHOLDER_RE, '') .replace(LEGACY_PLACEHOLDER_RE, '')
.replace(TOKEN_RE, (_, _key, label) => label.trim())
.replace(/\s{2,}/g, ' ') .replace(/\s{2,}/g, ' ')
.trim(); .trim();
} }
module.exports = { PLACEHOLDER_RE, resolvePlaceholdersToLabels }; module.exports = { PLACEHOLDER_RE, TOKEN_RE, stripLeakedTokens, resolvePlaceholdersToLabels };
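
A short, self-contained sketch of how the two placeholder forms resolve to plain text. The regular expressions are copied from the hunk above; the function name `resolveToLabels`, the example UUID, and the sample sentences are made up for illustration, and the legacy `{{uuid}}` branch is left out for brevity.

```javascript
// Stored form: {{label.w:uuid}} or {{label.o:uuid}}
const PLACEHOLDER_RE = /\{\{([^.{}]+)\.(w|o):([0-9a-f-]{36})\}\}/g;
// Transient protection token: ⟦PHn:label⟧
const TOKEN_RE = /⟦(PH\d+):([^⟧]*)⟧/g;

// Reduce both forms to their label and collapse leftover whitespace.
function resolveToLabels(text) {
  if (!text) return '';
  return String(text)
    .replace(PLACEHOLDER_RE, (_, label) => label)
    .replace(TOKEN_RE, (_, _key, label) => label.trim())
    .replace(/\s{2,}/g, ' ')
    .trim();
}

const uuid = '123e4567-e89b-12d3-a456-426614174000'; // made-up example ID
const stored = `Ist das ein {{Apfel.w:${uuid}}}?`;
const leaked = 'Das ist ein ⟦PH1: Apfel ⟧.';

console.log(resolveToLabels(stored)); // Ist das ein Apfel?
console.log(resolveToLabels(leaked)); // Das ist ein Apfel.
```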

src/lib/reviewPairs.js Normal file

@@ -0,0 +1,225 @@
// AI review of the pipeline: all pairs of a picture (all languages) plus the picture itself
// go to Sonnet for proofreading (spelling, translation consistency,
// plausibility against the picture). Corrections are written to the DB before audio
// generation; existing audios of the corrected cells are deleted
// so that Step 3 regenerates them from the new text.
const { query } = require('../db');
const { callClaude, tokenize, LANGS } = require('./translate');
const { TOKEN_RE, stripLeakedTokens } = require('./placeholders');
const { deleteFile, keyFromUrl } = require('../s3');
const REVIEW_MODEL = process.env.REVIEW_MODEL || process.env.TRANSLATE_MODEL || 'claude-sonnet-4-5';
const BATCH_SIZE = 15; // pairs per Claude call (the picture is sent along with each batch)
// Refs of the form "q:<uuid>:sentence_de": compact in the prompt, unique in the itemMap.
const TABLE_PREFIX = { questions: 'q', statements: 's', words: 'w' };
function makeItem(table, id, field, lang, text) {
// Resolve leaked ⟦PHn:…⟧ remnants in the source text first; otherwise Claude sees them as
// real tokens and the token-count validation blocks every correction of that row.
const { tokenized, tokens } = tokenize(stripLeakedTokens(text));
return {
ref: `${TABLE_PREFIX[table]}:${id}:${field}_${lang}`,
table, id, column: `${field}_${lang}`, field, lang,
tokenized, tokens,
};
}
// Load all non-empty text cells of the pairs + the object words of the picture.
// Returns { pairBlocks, wordBlock, itemMap }; itemMap: ref → item (whitelist).
async function loadReviewItems(pictureId, pairs) {
const itemMap = new Map();
const add = (table, row, field, lang) => {
const text = (row[`${field}_${lang}`] || '').trim();
if (!text) return null;
const item = makeItem(table, row.id, field, lang, text);
if (!itemMap.has(item.ref)) itemMap.set(item.ref, item);
return itemMap.get(item.ref);
};
const questionIds = [...new Set(pairs.map(p => p.question_id).filter(Boolean))];
const stmtIds = [...new Set(pairs.flatMap(p => [p.positive_statement_id, p.negative_statement_id]).filter(Boolean))];
const questions = new Map();
if (questionIds.length) {
const r = await query(
`SELECT id, sentence_de, sentence_en, sentence_sv FROM questions WHERE id = ANY($1)`, [questionIds]);
r.rows.forEach(row => questions.set(row.id, row));
}
const statements = new Map();
if (stmtIds.length) {
const r = await query(
`SELECT id, positive_sentence_de, positive_sentence_en, positive_sentence_sv,
negative_sentence_de, negative_sentence_en, negative_sentence_sv
FROM statements WHERE id = ANY($1)`, [stmtIds]);
r.rows.forEach(row => statements.set(row.id, row));
}
// Words: via the statement links of the word pairs + object_words of the picture
const stmtWords = new Map(); // statementId → [wordId]
const wordIds = new Set();
if (stmtIds.length) {
for (const link of ['statement_positive_words', 'statement_negative_words']) {
const r = await query(`SELECT statement_id, word_id FROM ${link} WHERE statement_id = ANY($1)`, [stmtIds]);
for (const x of r.rows) {
if (!stmtWords.has(x.statement_id)) stmtWords.set(x.statement_id, []);
stmtWords.get(x.statement_id).push(x.word_id);
wordIds.add(x.word_id);
}
}
}
const objectWordIds = new Set();
const ow = await query(
`SELECT ow.word_id FROM object_words ow
JOIN object_pictures op ON op.object_id = ow.object_id
WHERE op.picture_id = $1`, [pictureId]);
ow.rows.forEach(x => { objectWordIds.add(x.word_id); wordIds.add(x.word_id); });
const words = new Map();
if (wordIds.size) {
const r = await query(
`SELECT id, titel_de, titel_en, titel_sv FROM words WHERE id = ANY($1) AND status <> 'blocked'`,
[[...wordIds]]);
r.rows.forEach(row => words.set(row.id, row));
}
// Assemble the prompt blocks per pair
const lines = (table, row, field) =>
LANGS.map(l => add(table, row, field, l)).filter(Boolean)
.map(it => ` ${it.ref} [${it.lang}]: "${it.tokenized}"`);
const pairBlocks = [];
for (const p of pairs) {
const block = [`PAIR (answer_type: ${p.answer_type}):`];
const q = p.question_id && questions.get(p.question_id);
if (q) block.push(...lines('questions', q, 'sentence'));
for (const [stmtId, label] of [[p.positive_statement_id, 'positive_sentence'],
[p.negative_statement_id, 'negative_sentence']]) {
const s = stmtId && statements.get(stmtId);
if (!s) continue;
if (p.answer_type === 'word') {
for (const wid of stmtWords.get(stmtId) || []) {
const w = words.get(wid);
if (w) block.push(...lines('words', w, 'titel'));
}
} else {
block.push(...lines('statements', s, label));
}
}
if (block.length > 1) pairBlocks.push(block.join('\n'));
}
const wordLines = [];
for (const wid of objectWordIds) {
const w = words.get(wid);
if (w) wordLines.push(...lines('words', w, 'titel'));
}
const wordBlock = wordLines.length ? `BILD-WÖRTER (Vokabeln zum Bild):\n${wordLines.join('\n')}` : null;
return { pairBlocks, wordBlock, itemMap };
}
function buildReviewPrompt(pictureUrl, blocks) {
const system =
'Du bist Lektor für eine Kinder-Sprachlern-App (Deutsch, Englisch, Schwedisch). ' +
'Du prüfst Lerninhalte zu einem Bild auf (a) Rechtschreibung und Grammatik je Sprache, ' +
'(b) korrekte und konsistente Übersetzung zwischen Deutsch/Englisch/Schwedisch — die Sprachfassungen ' +
'einer Zeile müssen dieselbe Bedeutung haben, (c) Plausibilität zum Bild. ' +
'Korrigiere NUR echte Fehler, behalte Stil und Länge bei. ' +
'Antworte AUSSCHLIESSLICH mit gültigem JSON, ohne Markdown, ohne Erklärungen.';
const text =
`Prüfe die folgenden Inhalte zum beigefügten Bild. Jede Zeile hat eine Referenz (ref), ` +
`eine Sprache und den Text.\n\n` +
`WICHTIG: Tokens der Form ⟦PHn:wort⟧ sind geschützte Platzhalter. Du darfst das Wort INNERHALB ` +
`des Tokens korrigieren, aber das Token-Format muss exakt erhalten bleiben (⟦PHn:wort⟧). ` +
`Kein Token darf gelöscht, verdoppelt oder erfunden werden.\n\n` +
blocks.join('\n\n') + '\n\n' +
`Antwort-Format — NUR Zeilen, die wirklich einen Fehler enthalten (sonst leeres Array):\n` +
`{"corrections":[{"ref":"<ref>","corrected":"<korrigierter Text>"}]}`;
return {
system,
user: [
{ type: 'image', source: { type: 'url', url: pictureUrl } },
{ type: 'text', text },
],
};
}
// The token sets before/after correction must be identical; no stray fragments.
function validateCorrection(item, corrected) {
if (typeof corrected !== 'string' || !corrected.trim()) return { ok: false, reason: 'leer' };
const keys = [...corrected.matchAll(TOKEN_RE)].map(m => m[1]).sort();
const expected = item.tokens.map(t => t.key).sort();
if (keys.length !== expected.length || keys.some((k, i) => k !== expected[i]))
return { ok: false, reason: 'Platzhalter-Tokens verändert' };
const stripped = corrected.replace(TOKEN_RE, '');
if (/[⟦⟧]|\{\{|\}\}/.test(stripped)) return { ok: false, reason: 'Fragment im Text' };
// Detokenize: ⟦PHn:label⟧ → {{label.type:uuid}} (the label may have been corrected).
// The pattern includes the opening ⟦ so the whole token is replaced, and a replacer
// function keeps "$" sequences in a label from being read as replacement patterns.
const labels = {};
for (const m of corrected.matchAll(TOKEN_RE)) labels[m[1]] = m[2].trim();
let out = corrected;
for (const t of item.tokens) {
const label = labels[t.key] || t.sourceLabel;
out = out.replace(new RegExp(`⟦${t.key}:[^⟧]*⟧`, 'g'), () => `{{${label}.${t.type}:${t.uuid}}}`);
}
return { ok: true, detokenized: out.trim() };
}
// Delete existing audios of the corrected cell (incl. S3) so that Step 3 regenerates them.
async function invalidateAudio(table, id, field, lang) {
const r = await query(
`SELECT id, audio_link FROM audios
WHERE source_table=$1 AND source_id=$2 AND source_field=$3 AND language=$4`,
[table, id, field, lang]);
for (const row of r.rows) {
const k = keyFromUrl(row.audio_link);
if (k) await deleteFile(k).catch(() => {});
await query(`DELETE FROM audios WHERE id = $1`, [row.id]);
}
}
async function applyCorrection(item, newText) {
await query(`UPDATE ${item.table} SET ${item.column} = $1 WHERE id = $2`, [newText, item.id]);
await invalidateAudio(item.table, item.id, item.field, item.lang);
}
// Main entry point: reviews all pairs of a picture in batches; never throws. Errors
// are counted in progress.reviewFailures and the pipeline keeps running.
async function reviewPicturePairs({ pictureId, pictureUrl, pairs, progress, onProgress }) {
if (!pictureUrl || !pairs.length) return;
const { pairBlocks, wordBlock, itemMap } = await loadReviewItems(pictureId, pairs);
if (!pairBlocks.length && !wordBlock) return;
const batches = [];
for (let i = 0; i < pairBlocks.length; i += BATCH_SIZE)
batches.push(pairBlocks.slice(i, i + BATCH_SIZE));
if (!batches.length) batches.push([]);
if (wordBlock) batches[0] = [wordBlock, ...batches[0]];
for (const batch of batches) {
try {
const { system, user } = buildReviewPrompt(pictureUrl, batch);
const data = await callClaude({ system, user, maxTokens: 8000, model: REVIEW_MODEL });
const corrections = Array.isArray(data.corrections) ? data.corrections : [];
for (const c of corrections) {
const item = itemMap.get(c && c.ref);
if (!item) continue; // unknown ref → discard
const v = validateCorrection(item, c.corrected);
if (!v.ok) {
console.warn(`Review: Korrektur für ${c.ref} verworfen (${v.reason})`);
continue;
}
await applyCorrection(item, v.detokenized);
progress.correctionsApplied++;
}
} catch (err) {
progress.reviewFailures++;
console.error(`Review-Batch-Fehler bei Bild ${pictureId}:`, err.message);
}
progress.reviewedPairs = Math.min(progress.reviewedPairs + BATCH_SIZE, pairs.length);
if (onProgress) await onProgress();
}
}
module.exports = { reviewPicturePairs, loadReviewItems, buildReviewPrompt, validateCorrection, invalidateAudio };
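
The protect, correct, restore round trip behind `validateCorrection` can be sketched end to end. `tokenize` lives in `./translate` and is not part of this diff, so the version below is a hypothetical stand-in that only assumes tokens carry `key`, `type`, `uuid` and `sourceLabel`; `sameTokenKeys` and `detokenize` are likewise illustrative names, not the module's API.

```javascript
const PLACEHOLDER_RE = /\{\{([^.{}]+)\.(w|o):([0-9a-f-]{36})\}\}/g;
const TOKEN_RE = /⟦(PH\d+):([^⟧]*)⟧/g;

// Hypothetical stand-in for tokenize(): {{label.type:uuid}} → ⟦PHn:label⟧.
function tokenize(text) {
  const tokens = [];
  const tokenized = String(text).replace(PLACEHOLDER_RE, (_, label, type, uuid) => {
    const key = `PH${tokens.length + 1}`;
    tokens.push({ key, type, uuid, sourceLabel: label });
    return `⟦${key}:${label}⟧`;
  });
  return { tokenized, tokens };
}

// A correction is acceptable only if it contains exactly the original token keys.
function sameTokenKeys(tokens, corrected) {
  const got = [...corrected.matchAll(TOKEN_RE)].map(m => m[1]).sort();
  const want = tokens.map(t => t.key).sort();
  return got.length === want.length && got.every((k, i) => k === want[i]);
}

// Restore placeholders, taking the (possibly corrected) label from the token.
function detokenize(tokens, corrected) {
  const byKey = new Map(tokens.map(t => [t.key, t]));
  return corrected.replace(TOKEN_RE, (whole, key, label) => {
    const t = byKey.get(key);
    return t ? `{{${label.trim() || t.sourceLabel}.${t.type}:${t.uuid}}}` : whole;
  });
}

const uuid = '123e4567-e89b-12d3-a456-426614174000'; // made-up example ID
const { tokenized, tokens } = tokenize(`Das ist ein {{Apfle.w:${uuid}}}.`);
const corrected = tokenized.replace('Apfle', 'Apfel'); // simulated reviewer fix
console.log(sameTokenKeys(tokens, corrected), detokenize(tokens, corrected));
```

Checking the sorted key lists rather than token positions lets the reviewer reorder tokens within a sentence while still rejecting any that were dropped, duplicated or invented.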


@@ -18,7 +18,7 @@ const TRANSLATE_CONFIG = {
// ── Placeholder-Schutz ──────────────────────────────────────────────────────── // ── Placeholder-Schutz ────────────────────────────────────────────────────────
// Format im Quelltext: {{label.w:uuid}} oder {{label.o:uuid}} // Format im Quelltext: {{label.w:uuid}} oder {{label.o:uuid}}
const { PLACEHOLDER_RE } = require('./placeholders'); const { PLACEHOLDER_RE, stripLeakedTokens } = require('./placeholders');
// Sätze für Claude vorbereiten: jedes Placeholder durch ⟦PHn:label⟧-Token ersetzen. // Sätze für Claude vorbereiten: jedes Placeholder durch ⟦PHn:label⟧-Token ersetzen.
// Token-Format ist absichtlich exotisch, damit Claude es nicht versehentlich ändert. // Token-Format ist absichtlich exotisch, damit Claude es nicht versehentlich ändert.
@@ -54,7 +54,7 @@ function detokenize(translated, tokens, labelsFromClaude) {
return { text: out, missingTokens: tokens.filter(t => !seen.has(t.key)).map(t => t.key) }; return { text: out, missingTokens: tokens.filter(t => !seen.has(t.key)).map(t => t.key) };
} }
async function callClaude({ system, user, maxTokens = 2000 }) { async function callClaude({ system, user, maxTokens = 2000, model = TRANSLATE_MODEL }) {
const apiKey = process.env.ANTHROPIC_API_KEY; const apiKey = process.env.ANTHROPIC_API_KEY;
if (!apiKey) { const e = new Error('ANTHROPIC_API_KEY nicht konfiguriert'); e.status = 500; throw e; } if (!apiKey) { const e = new Error('ANTHROPIC_API_KEY nicht konfiguriert'); e.status = 500; throw e; }
@@ -69,7 +69,7 @@ async function callClaude({ system, user, maxTokens = 2000 }) {
method: 'POST', method: 'POST',
headers: { 'Content-Type': 'application/json', 'x-api-key': apiKey, 'anthropic-version': '2023-06-01' }, headers: { 'Content-Type': 'application/json', 'x-api-key': apiKey, 'anthropic-version': '2023-06-01' },
body: JSON.stringify({ body: JSON.stringify({
model: TRANSLATE_MODEL, max_tokens: maxTokens, system, model, max_tokens: maxTokens, system,
messages: [{ role: 'user', content: user }], messages: [{ role: 'user', content: user }],
}), }),
}); });
@@ -98,17 +98,24 @@ async function translateText({ text, from, to }) {
if (!text || !text.trim()) return ''; if (!text || !text.trim()) return '';
const { tokenized, tokens } = tokenize(text); const { tokenized, tokens } = tokenize(text);
const system = 'Du bist ein professioneller Übersetzer. Antworte AUSSCHLIESSLICH mit gültigem JSON, ohne Markdown, ohne Erklärungen.'; const system = 'Du bist ein professioneller Übersetzer. Antworte AUSSCHLIESSLICH mit gültigem JSON, ohne Markdown, ohne Erklärungen.';
const user = `Übersetze diesen Text von ${LANG_LABEL[from] || from} nach ${LANG_LABEL[to] || to}.\n\n` + // Token-Erklärung NUR wenn der Text wirklich Tokens enthält — sonst halluziniert
`WICHTIG: Tokens der Form ⟦PHn:wort⟧ sind Platzhalter. Übersetze NUR das Wort innerhalb des Tokens, ` + // Claude gelegentlich ⟦PHn:…⟧-Tokens in die Übersetzung hinein.
`behalte das Token-Format exakt bei (⟦PHn:übersetztesWort⟧). Passe die Beugung des Wortes an den umgebenden Satz an ` + const user = tokens.length
`(Mehrzahl/Kasus). Die Token-Reihenfolge im Satz darfst du frei wählen wie es natürlich klingt.\n\n` + ? `Übersetze diesen Text von ${LANG_LABEL[from] || from} nach ${LANG_LABEL[to] || to}.\n\n` +
`Quelltext:\n${tokenized}\n\n` + `WICHTIG: Tokens der Form ⟦PHn:wort⟧ sind Platzhalter. Übersetze NUR das Wort innerhalb des Tokens, ` +
`Antwort-Format:\n{"translated":"...","labels":{${tokens.map(t => `"${t.key}":"<übersetztes Wort>"`).join(',')}}}`; `behalte das Token-Format exakt bei (⟦PHn:übersetztesWort⟧). Passe die Beugung des Wortes an den umgebenden Satz an ` +
`(Mehrzahl/Kasus). Die Token-Reihenfolge im Satz darfst du frei wählen wie es natürlich klingt.\n\n` +
`Quelltext:\n${tokenized}\n\n` +
`Antwort-Format:\n{"translated":"...","labels":{${tokens.map(t => `"${t.key}":"<übersetztes Wort>"`).join(',')}}}`
: `Übersetze diesen Text von ${LANG_LABEL[from] || from} nach ${LANG_LABEL[to] || to}.\n\n` +
`Quelltext:\n${tokenized}\n\n` +
`Antwort-Format:\n{"translated":"..."}`;
const data = await callClaude({ system, user }); const data = await callClaude({ system, user });
if (typeof data.translated !== 'string') throw new Error('Ungültiges JSON: translated fehlt'); if (typeof data.translated !== 'string') throw new Error('Ungültiges JSON: translated fehlt');
const { text: detok } = detokenize(data.translated, tokens, data.labels || {}); const { text: detok } = detokenize(data.translated, tokens, data.labels || {});
return detok; // Defensiv: von Claude erfundene/umnummerierte Tokens dürfen nie in die DB
return stripLeakedTokens(detok);
} }
// ── Auto-Status für Wörter (Spiegel zum Trigger in words.js) ────────────────── // ── Auto-Status für Wörter (Spiegel zum Trigger in words.js) ──────────────────


@@ -2,6 +2,8 @@ const router = require('express').Router();
const bcrypt = require('bcryptjs'); const bcrypt = require('bcryptjs');
const jwt = require('jsonwebtoken'); const jwt = require('jsonwebtoken');
const { query } = require('../db'); const { query } = require('../db');
const { levelForEp, levelInfo } = require('../lib/leveling');
const { evaluateAchievements, listAchievements } = require('../lib/achievements');
function signToken(user) { function signToken(user) {
return jwt.sign( return jwt.sign(
@@ -138,9 +140,11 @@ router.get('/me', requireJwt, async (req, res, next) => {
un.username, un.username,
COALESCE(up.total_ep, 0) AS total_ep, COALESCE(up.total_ep, 0) AS total_ep,
COALESCE(up.streak_days, 0) AS streak_days, COALESCE(up.streak_days, 0) AS streak_days,
COALESCE(up.daily_goal_ep, 30) AS daily_goal_ep,
up.last_practice_at, up.last_practice_at,
ln.id AS language_native_id, ln.short_en AS language_native_short, ln.titel_de AS language_native_titel, ln.id AS language_native_id, ln.short_en AS language_native_short, ln.titel_de AS language_native_titel,
lt.id AS language_target_id, lt.short_en AS language_target_short, lt.titel_de AS language_target_titel lt.id AS language_target_id, lt.short_en AS language_target_short, lt.titel_de AS language_target_titel,
lt.greeting AS language_target_greeting
FROM users u FROM users u
LEFT JOIN users_public up ON up.user_id = u.id LEFT JOIN users_public up ON up.user_id = u.id
LEFT JOIN user_names un ON un.id = up.username_id LEFT JOIN user_names un ON un.id = up.username_id
@@ -151,7 +155,7 @@ router.get('/me', requireJwt, async (req, res, next) => {
); );
if (!r.rows.length) return res.status(404).json({ error: 'User not found' }); if (!r.rows.length) return res.status(404).json({ error: 'User not found' });
const row = r.rows[0]; const row = r.rows[0];
row.level = Math.floor((row.total_ep || 0) / 500); Object.assign(row, levelInfo(row.total_ep)); // level + ep_into_level + ep_to_next_level
res.json(row); res.json(row);
} catch (err) { next(err); } } catch (err) { next(err); }
}); });
@@ -177,27 +181,185 @@ router.post('/progress', requireJwt, async (req, res, next) => {
[userId, pair_id, isCorrect ? 1 : 0, isCorrect ? 0 : 1, pts] [userId, pair_id, isCorrect ? 1 : 0, isCorrect ? 0 : 1, pts]
); );
// EP + Streak auf users_public; Streak: +1 bei neuem Tag, Reset bei Lücke > 1 Tag // Tagesverlauf upserten (für Streak-Kalender, Wochengraph, Tagesziel).
// RETURNING ep_earned = NEW daily total → lets the daily-goal crossing be detected.
const day = await query(
`INSERT INTO user_daily_activity (user_id, activity_date, ep_earned, cards_done, correct_count)
VALUES ($1, CURRENT_DATE, $2, 1, $3)
ON CONFLICT (user_id, activity_date) DO UPDATE SET
ep_earned = user_daily_activity.ep_earned + $2,
cards_done = user_daily_activity.cards_done + 1,
correct_count = user_daily_activity.correct_count + $3
RETURNING ep_earned`,
[userId, pts, isCorrect ? 1 : 0]
);
// EP + streak on users_public; streak: +1 on a new day, reset after a gap > 1 day.
// The CTE also captures the pre-update values so that level-ups/streak-ups can be detected atomically.
const upd = await query( const upd = await query(
`UPDATE users_public SET `WITH prev AS (
total_ep = total_ep + $2, SELECT total_ep AS prev_ep, streak_days AS prev_streak
FROM users_public WHERE user_id = $1
)
UPDATE users_public up SET
total_ep = up.total_ep + $2,
streak_days = CASE streak_days = CASE
WHEN last_practice_at IS NULL THEN 1 WHEN up.last_practice_at IS NULL THEN 1
WHEN last_practice_at::date = CURRENT_DATE THEN streak_days WHEN up.last_practice_at::date = CURRENT_DATE THEN up.streak_days
WHEN last_practice_at::date = CURRENT_DATE - INTERVAL '1 day' THEN streak_days + 1 WHEN up.last_practice_at::date = CURRENT_DATE - INTERVAL '1 day' THEN up.streak_days + 1
ELSE 1 ELSE 1
END, END,
last_practice_at = NOW() last_practice_at = NOW()
WHERE user_id = $1 FROM prev
RETURNING total_ep, streak_days`, WHERE up.user_id = $1
RETURNING up.total_ep, up.streak_days, up.daily_goal_ep, prev.prev_ep, prev.prev_streak`,
[userId, pts] [userId, pts]
); );
if (!upd.rows.length) if (!upd.rows.length)
return res.status(409).json({ error: 'Kein Profil vorhanden. Bitte zuerst Profil anlegen.' }); return res.status(409).json({ error: 'Kein Profil vorhanden. Bitte zuerst Profil anlegen.' });
const { total_ep, streak_days } = upd.rows[0]; const r = upd.rows[0];
res.json({ total_ep, streak_days, level: Math.floor(total_ep / 500) }); const daily_ep = day.rows[0]?.ep_earned ?? pts;
const daily_goal_ep = r.daily_goal_ep || 30;
// Evaluate achievements (only newly unlocked ones are returned). Errors here must not
// roll back the booking → defensively default to an empty list.
let unlocked_achievements = [];
try {
unlocked_achievements = await evaluateAchievements(userId, {
total_ep: r.total_ep, streak_days: r.streak_days,
});
} catch (e) { /* achievements are optional; the booking is already committed */ }
res.json({
total_ep: r.total_ep,
level: levelForEp(r.total_ep),
prev_level: levelForEp(r.prev_ep),
streak_days: r.streak_days,
streak_increased: r.streak_days > r.prev_streak,
daily_ep,
daily_goal_ep,
// Threshold crossing: reached now, but not yet reached before (without this card)
goal_just_reached: daily_ep >= daily_goal_ep && (daily_ep - pts) < daily_goal_ep,
unlocked_achievements,
});
} catch (err) { next(err); }
});
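The streak rule in the SQL `CASE` and the `goal_just_reached` expression can be read as two small pure functions. The following is a minimal sketch for illustration only: `dayNumber`, `nextStreak` and `goalJustReached` are hypothetical names that do not appear in the diff, and dates are assumed to arrive as `YYYY-MM-DD` strings.

```javascript
// Hypothetical helpers (not part of the diff) that mirror the SQL CASE and the
// goal_just_reached expression of the /progress route.

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Whole UTC days since the epoch for a 'YYYY-MM-DD' string.
function dayNumber(isoDate) {
  return Math.floor(Date.parse(`${isoDate}T00:00:00Z`) / MS_PER_DAY);
}

// Streak after practising on `today`, given the previous practice date (or null).
function nextStreak(prevStreak, lastPracticeDate, today) {
  if (lastPracticeDate === null) return 1;      // first practice ever
  const gap = dayNumber(today) - dayNumber(lastPracticeDate);
  if (gap === 0) return prevStreak;             // same day: unchanged
  if (gap === 1) return prevStreak + 1;         // consecutive day: +1
  return 1;                                     // any other gap: reset
}

// True only for the card that lifts the daily total across the goal.
function goalJustReached(dailyEp, pts, dailyGoalEp) {
  return dailyEp >= dailyGoalEp && dailyEp - pts < dailyGoalEp;
}

console.log(nextStreak(4, '2026-06-19', '2026-06-20')); // 5
console.log(goalJustReached(32, 5, 30));                // true
```

Because `goalJustReached` compares the total with and without the current card, a client can show the goal celebration exactly once per day without keeping state of its own.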
// GET /auth/achievements: all achievements with their unlock status (for the profile section)
router.get('/achievements', requireJwt, async (req, res, next) => {
try {
res.json(await listAchievements(req.user.userId));
} catch (err) { next(err); }
});
// GET /auth/stats: progress data for the profile (history, daily goal, skills)
router.get('/stats', requireJwt, async (req, res, next) => {
try {
const userId = req.user.userId;
// Daily history of the last ~84 days (for the heatmap calendar + weekly graph)
const daily = await query(
`SELECT to_char(activity_date, 'YYYY-MM-DD') AS date, ep_earned AS ep, cards_done AS cards, correct_count AS correct
FROM user_daily_activity
WHERE user_id = $1 AND activity_date >= CURRENT_DATE - INTERVAL '83 days'
ORDER BY activity_date ASC`,
[userId]
);
// Today (for the daily-goal ring) + the daily goal from the profile
const today = await query(
`SELECT COALESCE(da.ep_earned, 0) AS ep, COALESCE(da.cards_done, 0) AS cards,
COALESCE(up.daily_goal_ep, 30) AS daily_goal_ep
FROM users_public up
LEFT JOIN user_daily_activity da
ON da.user_id = up.user_id AND da.activity_date = CURRENT_DATE
WHERE up.user_id = $1`,
[userId]
);
// Overall statistics from user_pair_progress
const totals = await query(
`SELECT COUNT(*)::int AS pairs_practiced,
COALESCE(SUM(seen_count), 0)::int AS total_seen,
COALESCE(SUM(correct_count), 0)::int AS total_correct
FROM user_pair_progress
WHERE user_id = $1`,
[userId]
);
// Skills: actual accuracy per answer_type of the pair.
// Mapping answer_type → skill label: word/question → Vokabular, text → Lesen, yes_no → Verständnis.
const skillRows = await query(
`SELECT p.answer_type,
COALESCE(SUM(upp.correct_count), 0)::int AS correct,
COALESCE(SUM(upp.seen_count), 0)::int AS seen
FROM user_pair_progress upp
JOIN pairs p ON p.id = upp.pair_id
WHERE upp.user_id = $1
GROUP BY p.answer_type`,
[userId]
);
const SKILL_MAP = { word: 'Vokabular', question: 'Vokabular', text: 'Lesen', yes_no: 'Verständnis' };
const skillAcc = {}; // label -> { correct, seen }
for (const r of skillRows.rows) {
const label = SKILL_MAP[r.answer_type] || 'Sonstige';
const acc = (skillAcc[label] ||= { correct: 0, seen: 0 });
acc.correct += r.correct;
acc.seen += r.seen;
}
// Fixed order so the radar chart stays stable; value = accuracy (0..1)
const skills = ['Vokabular', 'Lesen', 'Verständnis'].map((label) => {
const acc = skillAcc[label];
return { label, value: acc && acc.seen > 0 ? acc.correct / acc.seen : 0, seen: acc?.seen || 0 };
});
// Points per category (food/animals/occupation …), derived via pair_categories.
// A pair with several categories deliberately counts toward each of them.
const categoryRows = await query(
`SELECT c.id, c.titel_de AS label,
COALESCE(SUM(upp.earned_points), 0)::int AS points,
COALESCE(SUM(upp.seen_count), 0)::int AS seen
FROM user_pair_progress upp
JOIN pair_categories pc ON pc.pair_id = upp.pair_id
JOIN categories c ON c.id = pc.category_id
WHERE upp.user_id = $1
GROUP BY c.id, c.titel_de
HAVING SUM(upp.earned_points) > 0
ORDER BY points DESC`,
[userId]
);
const t = totals.rows[0] || { pairs_practiced: 0, total_seen: 0, total_correct: 0 };
const td = today.rows[0] || { ep: 0, cards: 0, daily_goal_ep: 30 };
res.json({
daily: daily.rows,
today: { ep: td.ep, cards: td.cards, daily_goal_ep: td.daily_goal_ep },
totals: {
pairs_practiced: t.pairs_practiced,
total_seen: t.total_seen,
total_correct: t.total_correct,
accuracy: t.total_seen > 0 ? t.total_correct / t.total_seen : 0,
},
skills,
categories: categoryRows.rows,
});
} catch (err) { next(err); }
});
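The skill aggregation in `/stats` folds several `answer_type` rows into a fixed set of labels. As a sketch of that step in isolation, the function below takes plain row objects instead of a query result; `aggregateSkills` and `SKILL_ORDER` are hypothetical names introduced here, while the mapping and the label order are copied from the route.

```javascript
// Mapping and order as used in the /stats route; the function name is hypothetical.
const SKILL_MAP = { word: 'Vokabular', question: 'Vokabular', text: 'Lesen', yes_no: 'Verständnis' };
const SKILL_ORDER = ['Vokabular', 'Lesen', 'Verständnis'];

// rows: [{ answer_type, correct, seen }], one row per answer_type.
function aggregateSkills(rows) {
  const acc = {}; // label -> { correct, seen }
  for (const r of rows) {
    const label = SKILL_MAP[r.answer_type] || 'Sonstige';
    const a = (acc[label] ||= { correct: 0, seen: 0 });
    a.correct += r.correct;
    a.seen += r.seen;
  }
  // Fixed order; labels without data get value 0 rather than NaN.
  return SKILL_ORDER.map((label) => {
    const a = acc[label];
    return { label, value: a && a.seen > 0 ? a.correct / a.seen : 0, seen: a?.seen || 0 };
  });
}

const sample = [
  { answer_type: 'word', correct: 3, seen: 4 },
  { answer_type: 'question', correct: 1, seen: 4 },
  { answer_type: 'yes_no', correct: 2, seen: 2 },
];
console.log(aggregateSkills(sample));
```

Note that rows mapped to `Sonstige` are accumulated but never returned, because only the three labels in the fixed order are emitted.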
// PUT /auth/goal: set the daily goal (EP/day)
router.put('/goal', requireJwt, async (req, res, next) => {
try {
const goal = Math.max(5, Math.min(500, parseInt(req.body?.daily_goal_ep) || 0));
const upd = await query(
`UPDATE users_public SET daily_goal_ep = $2 WHERE user_id = $1 RETURNING daily_goal_ep`,
[req.user.userId, goal]
);
if (!upd.rows.length) return res.status(409).json({ error: 'Kein Profil vorhanden.' });
res.json({ daily_goal_ep: upd.rows[0].daily_goal_ep });
} catch (err) { next(err); } } catch (err) { next(err); }
}); });


@@ -1,8 +1,22 @@
const router = require('express').Router(); const router = require('express').Router();
const { query } = require('../db'); const { query } = require('../db');
const { runCategorizationTick, classifyWordsSync } = require('../lib/classifyWords');
const STATUSES = ['requested', 'blocked', 'published']; const STATUSES = ['requested', 'blocked', 'published'];
// POST /api/categories/auto-assign: trigger categorization.
// ?sync=true → immediate one-shot backfill of existing words (synchronous, no 24h delay)
// ?sync=true&reset=true → discard existing assignments and reclassify everything
// otherwise → one asynchronous batch tick (submit/collect via the Message Batches API)
router.post('/auto-assign', async (req, res, next) => {
try {
const sync = req.query.sync === 'true' || req.body?.sync === true;
const reset = req.query.reset === 'true' || req.body?.reset === true;
const result = sync ? await classifyWordsSync({ reset }) : await runCategorizationTick();
res.json(result);
} catch (err) { next(err); }
});
const STATUS_TIMESTAMP = { const STATUS_TIMESTAMP = {
requested: 'requested_at', requested: 'requested_at',
published: 'published_at', published: 'published_at',


@@ -39,12 +39,20 @@ function collectIds(lists, filterType) {
router.get('/', requireJwt, async (req, res, next) => { router.get('/', requireJwt, async (req, res, next) => {
try { try {
const lang = ['de', 'en', 'sv'].includes(req.query.lang) ? req.query.lang : 'de'; const lang = ['de', 'en', 'sv'].includes(req.query.lang) ? req.query.lang : 'de';
const limit = Math.min(parseInt(req.query.limit) || 20, 100); const limit = Math.min(parseInt(req.query.limit) || 20, 100);
const userId = req.user.userId;
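// Pairs the client has already loaded (in-session dedupe); accept only valid UUIDs.
// The pattern checks the canonical 8-4-4-4-12 shape so that malformed values
// (e.g. 36 hyphens) cannot reach the ::uuid[] cast and fail the query.
const exclude = String(req.query.exclude || '')
.split(',')
.map(s => s.trim())
.filter(s => /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i.test(s));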
// 1. Random pairs — only fully ready content: // 1. Random pairs — only fully ready content:
// pair published + linked question/statements published + a published picture exists. // pair published + linked question/statements published + a published picture exists.
// (Audio coverage is additionally enforced in Phase 2.) // (Audio coverage is additionally enforced in Phase 2.)
// Pagination: pairs already completed (user_pair_progress) and pairs already loaded
// by the client are excluded; an empty response means there are no more cards.
const pairsRes = await query( const pairsRes = await query(
`SELECT p.id, p.answer_type, p.status, p.difficulty_level, `SELECT p.id, p.answer_type, p.status, p.difficulty_level,
p.question_id, p.positive_statement_id, p.negative_statement_id p.question_id, p.positive_statement_id, p.negative_statement_id
@@ -61,9 +69,13 @@ router.get('/', requireJwt, async (req, res, next) => {
JOIN object_pictures pic ON pic.object_id = op.object_id JOIN object_pictures pic ON pic.object_id = op.object_id
JOIN pictures pp ON pp.id = pic.picture_id JOIN pictures pp ON pp.id = pic.picture_id
WHERE op.pair_id = p.id AND pp.status = 'published') WHERE op.pair_id = p.id AND pp.status = 'published')
AND NOT EXISTS (
SELECT 1 FROM user_pair_progress upp
WHERE upp.pair_id = p.id AND upp.user_id = $2)
AND p.id <> ALL($3::uuid[])
ORDER BY random() ORDER BY random()
LIMIT $1`, LIMIT $1`,
[limit] [limit, userId, exclude]
); );
if (!pairsRes.rows.length) return res.json([]); if (!pairsRes.rows.length) return res.json([]);
const pairs = pairsRes.rows; const pairs = pairsRes.rows;
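The `exclude` handling can be exercised on its own. This is a sketch under stated assumptions: `parseExclude` and `UUID_RE` are hypothetical names, and the pattern checks the canonical 8-4-4-4-12 UUID shape, which is one reasonable way to keep malformed values away from the `$3::uuid[]` parameter.

```javascript
// Hypothetical helper mirroring the exclude parsing of the feed route.
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Turns a raw ?exclude=a,b,c query value into an array of well-formed UUIDs.
function parseExclude(raw) {
  return String(raw || '')
    .split(',')
    .map((s) => s.trim())
    .filter((s) => UUID_RE.test(s));
}

const raw = ' 3f2b8c1e-9a4d-4c6b-8f1a-2d7e5b9c0a11 ,not-a-uuid,------------------------------------';
console.log(parseExclude(raw));       // [ '3f2b8c1e-9a4d-4c6b-8f1a-2d7e5b9c0a11' ]
console.log(parseExclude(undefined)); // []
```

An empty array is a valid value for `p.id <> ALL($3::uuid[])`: the condition is then true for every row, so a request without `exclude` behaves like an unfiltered one.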


@@ -2,6 +2,8 @@ const router = require('express').Router();
const { query } = require('../db'); const { query } = require('../db');
const { fillMissingRow } = require('../lib/translate'); const { fillMissingRow } = require('../lib/translate');
const { loadPairContext, computeReadiness, loadPairContent, translateWordGroup } = require('../lib/pairContent'); const { loadPairContext, computeReadiness, loadPairContent, translateWordGroup } = require('../lib/pairContent');
const { deletePairDeep } = require('../lib/deleteCascade');
const { derivePairCategories } = require('../lib/pairCategories');
const STATUSES = ['draft', 'reviewed', 'blocked', 'published']; const STATUSES = ['draft', 'reviewed', 'blocked', 'published'];
const ANSWER_TYPES = new Set(['yes_no', 'text', 'question', 'word']); const ANSWER_TYPES = new Set(['yes_no', 'text', 'question', 'word']);
@@ -130,6 +132,11 @@ router.patch('/:id', async (req, res, next) => {
values values
); );
if (!result.rows.length) return res.status(404).json({ error: 'Not found' }); if (!result.rows.length) return res.status(404).json({ error: 'Not found' });
// On publish, derive the categories from the linked words (best effort).
if (req.body.status === 'published')
await derivePairCategories(result.rows[0].id).catch(() => {});
res.json(result.rows[0]); res.json(result.rows[0]);
} catch (err) { next(err); } } catch (err) { next(err); }
}); });
@@ -294,15 +301,17 @@ router.post('/:id/publish', async (req, res, next) => {
`UPDATE pairs SET status='published', published_at=COALESCE(published_at,$2) WHERE id=$1 RETURNING *`, `UPDATE pairs SET status='published', published_at=COALESCE(published_at,$2) WHERE id=$1 RETURNING *`,
[p.id, now]); [p.id, now]);
await derivePairCategories(p.id).catch(() => {});
res.json({ ...upd.rows[0], published_languages: [lang] }); res.json({ ...upd.rows[0], published_languages: [lang] });
} catch (err) { next(err); } } catch (err) { next(err); }
}); });
// DELETE /api/pairs/:id // DELETE /api/pairs/:id: pair + (unreferenced) question/statements + their audios (DB+S3)
router.delete('/:id', async (req, res, next) => { router.delete('/:id', async (req, res, next) => {
try { try {
const result = await query('DELETE FROM pairs WHERE id = $1 RETURNING id', [req.params.id]); const deleted = await deletePairDeep(req.params.id);
if (!result.rows.length) return res.status(404).json({ error: 'Not found' }); if (!deleted) return res.status(404).json({ error: 'Not found' });
res.status(204).end(); res.status(204).end();
} catch (err) { next(err); } } catch (err) { next(err); }
}); });

src/routes/picture-jobs.js (new file, 130 lines)

@@ -0,0 +1,130 @@
const router = require('express').Router();
const { query } = require('../db');
const STATUSES = ['pending', 'generating', 'done', 'failed'];
// GET /api/picture-jobs
router.get('/', async (req, res, next) => {
try {
const { status, limit = 50, offset = 0 } = req.query;
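// Fall back to the defaults when limit/offset are not numeric (parseInt would yield NaN).
const params = [Math.min(parseInt(limit, 10) || 50, 500), parseInt(offset, 10) || 0];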
const conditions = [];
if (status) { conditions.push(`pj.status = $${params.length + 1}`); params.push(status); }
const where = conditions.length ? `WHERE ${conditions.join(' AND ')}` : '';
const result = await query(
`SELECT pj.*,
COALESCE(json_agg(DISTINCT pjw.word_id) FILTER (WHERE pjw.word_id IS NOT NULL), '[]') AS word_ids
FROM picture_jobs pj
LEFT JOIN picture_job_words pjw ON pjw.picture_job_id = pj.id
${where}
GROUP BY pj.id
ORDER BY pj.created_at DESC
LIMIT $1 OFFSET $2`,
params
);
res.json(result.rows);
} catch (err) { next(err); }
});
// GET /api/picture-jobs/:id
router.get('/:id', async (req, res, next) => {
try {
const result = await query(
`SELECT pj.*,
COALESCE(json_agg(DISTINCT pjw.word_id) FILTER (WHERE pjw.word_id IS NOT NULL), '[]') AS word_ids
FROM picture_jobs pj
LEFT JOIN picture_job_words pjw ON pjw.picture_job_id = pj.id
WHERE pj.id = $1
GROUP BY pj.id`,
[req.params.id]
);
if (!result.rows.length) return res.status(404).json({ error: 'Not found' });
res.json(result.rows[0]);
} catch (err) { next(err); }
});
// GET /api/picture-jobs/:id/words
router.get('/:id/words', async (req, res, next) => {
try {
const result = await query(
`SELECT w.* FROM words w
JOIN picture_job_words pjw ON pjw.word_id = w.id
WHERE pjw.picture_job_id = $1`,
[req.params.id]
);
res.json(result.rows);
} catch (err) { next(err); }
});
// POST /api/picture-jobs
router.post('/', async (req, res, next) => {
try {
const { kategorie_id, prompt_fix, prompt_atmosphere, prompt_setting, prompt_final, word_ids } = req.body;
const result = await query(
`INSERT INTO picture_jobs (kategorie_id, prompt_fix, prompt_atmosphere, prompt_setting, prompt_final)
VALUES ($1, $2, $3, $4, $5) RETURNING *`,
[kategorie_id || null, prompt_fix || null, prompt_atmosphere || null, prompt_setting || null, prompt_final || null]
);
const job = result.rows[0];
if (Array.isArray(word_ids) && word_ids.length) {
for (const wid of word_ids) {
await query(
`INSERT INTO picture_job_words (picture_job_id, word_id) VALUES ($1, $2) ON CONFLICT DO NOTHING`,
[job.id, wid]
).catch(() => {});
}
}
res.status(201).json({ ...job, word_ids: word_ids || [] });
} catch (err) { next(err); }
});
// PATCH /api/picture-jobs/:id
router.patch('/:id', async (req, res, next) => {
try {
const allowed = ['kategorie_id', 'prompt_fix', 'prompt_atmosphere', 'prompt_setting', 'prompt_final', 'status', 'picture_id'];
const fields = Object.keys(req.body).filter(k => allowed.includes(k));
if (!fields.length) return res.status(400).json({ error: 'No valid fields provided' });
if (req.body.status && !STATUSES.includes(req.body.status))
return res.status(400).json({ error: `status must be one of: ${STATUSES.join(', ')}` });
const setClauses = fields.map((f, i) => `${f} = $${i + 1}`).join(', ');
const result = await query(
`UPDATE picture_jobs SET ${setClauses} WHERE id = $${fields.length + 1} RETURNING *`,
[...fields.map(f => req.body[f]), req.params.id]
);
if (!result.rows.length) return res.status(404).json({ error: 'Not found' });
res.json(result.rows[0]);
} catch (err) { next(err); }
});
// PUT /api/picture-jobs/:id/words/:wordId
router.put('/:id/words/:wordId', async (req, res, next) => {
try {
await query(
`INSERT INTO picture_job_words (picture_job_id, word_id) VALUES ($1, $2) ON CONFLICT DO NOTHING`,
[req.params.id, req.params.wordId]
);
res.status(204).end();
} catch (err) { next(err); }
});
// DELETE /api/picture-jobs/:id/words/:wordId
router.delete('/:id/words/:wordId', async (req, res, next) => {
try {
await query(
`DELETE FROM picture_job_words WHERE picture_job_id = $1 AND word_id = $2`,
[req.params.id, req.params.wordId]
);
res.status(204).end();
} catch (err) { next(err); }
});
// DELETE /api/picture-jobs/:id
router.delete('/:id', async (req, res, next) => {
try {
const result = await query('DELETE FROM picture_jobs WHERE id = $1 RETURNING id', [req.params.id]);
if (!result.rows.length) return res.status(404).json({ error: 'Not found' });
res.status(204).end();
} catch (err) { next(err); }
});
module.exports = router;
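The PATCH route builds its `SET` clause from a whitelist, so only known column names ever reach the SQL text while all values stay parameterized. A minimal sketch of that pattern as a standalone function follows; `buildUpdate` is a hypothetical name and is not part of the diff.

```javascript
// Hypothetical helper that isolates the whitelist + placeholder pattern of the PATCH routes.
// Column names come only from `allowed`; values are passed as query parameters.
function buildUpdate(table, allowed, body, id) {
  const fields = Object.keys(body).filter((k) => allowed.includes(k));
  if (!fields.length) return null; // caller answers 400 "No valid fields provided"
  const setClauses = fields.map((f, i) => `${f} = $${i + 1}`).join(', ');
  return {
    text: `UPDATE ${table} SET ${setClauses} WHERE id = $${fields.length + 1} RETURNING *`,
    values: [...fields.map((f) => body[f]), id],
  };
}

const q = buildUpdate('picture_jobs', ['status', 'prompt_final'], { status: 'done', hacked: 1 }, 'job-1');
console.log(q.text);   // UPDATE picture_jobs SET status = $1 WHERE id = $2 RETURNING *
console.log(q.values); // [ 'done', 'job-1' ]
```

Keys outside the whitelist (here `hacked`) are dropped silently rather than rejected, which matches the behavior of the route.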


@@ -3,6 +3,7 @@ const multer = require('multer');
const { v4: uuidv4 } = require('uuid'); const { v4: uuidv4 } = require('uuid');
const { query } = require('../db'); const { query } = require('../db');
const { uploadFile, deleteFile, keyFromUrl } = require('../s3'); const { uploadFile, deleteFile, keyFromUrl } = require('../s3');
const { deletePictureObjectsDeep } = require('../lib/deleteCascade');
const upload = multer({ storage: multer.memoryStorage(), limits: { fileSize: 20 * 1024 * 1024 } }); const upload = multer({ storage: multer.memoryStorage(), limits: { fileSize: 20 * 1024 * 1024 } });
@@ -153,12 +154,15 @@ router.patch('/:id', async (req, res, next) => {
} catch (err) { next(err); } } catch (err) { next(err); }
}); });
// DELETE /api/pictures/:id: delete the record + Hetzner file // DELETE /api/pictures/:id: delete the record + Hetzner file,
// including the picture's objects and their pairs (questions/statements/audios cascade)
router.delete('/:id', async (req, res, next) => { router.delete('/:id', async (req, res, next) => {
try { try {
const existing = await query('SELECT picture_link FROM pictures WHERE id = $1', [req.params.id]); const existing = await query('SELECT picture_link FROM pictures WHERE id = $1', [req.params.id]);
if (!existing.rows.length) return res.status(404).json({ error: 'Not found' }); if (!existing.rows.length) return res.status(404).json({ error: 'Not found' });
await deletePictureObjectsDeep(req.params.id);
const key = keyFromUrl(existing.rows[0].picture_link); const key = keyFromUrl(existing.rows[0].picture_link);
if (key) await deleteFile(key).catch(() => {}); if (key) await deleteFile(key).catch(() => {});


@@ -3,9 +3,11 @@ const router = require('express').Router();
const { query } = require('../db'); const { query } = require('../db');
const { LANGS } = require('../lib/translate'); const { LANGS } = require('../lib/translate');
const { loadPairContext, computeReadiness, loadPairContent } = require('../lib/pairContent'); const { loadPairContext, computeReadiness, loadPairContent } = require('../lib/pairContent');
const { enqueue, loadPairs, collectAudioUnits, generateWithBackoff, translatePair } = require('../lib/pipeline'); const { enqueue, loadPairs, collectAudioUnits, generateWithBackoff, translatePair, retagObjects } = require('../lib/pipeline');
const { describeError } = require('./audios'); const { describeError } = require('./audios');
const { PLACEHOLDER_RE } = require('../lib/placeholders'); const { PLACEHOLDER_RE, TOKEN_RE, stripLeakedTokens } = require('../lib/placeholders');
const { invalidateAudio } = require('../lib/reviewPairs');
const { derivePairCategories } = require('../lib/pairCategories');
// ── Object-word detection in sentences (for manual assignment during review) ── // ── Object-word detection in sentences (for manual assignment during review) ──
@@ -241,6 +243,95 @@ router.post('/picture/:id/audio-fill', async (req, res, next) => {
} catch (err) { next(err); } } catch (err) { next(err); }
}); });
// POST /api/pipeline/repair-tokens: data repair that removes leaked ⟦PHn:…⟧ tokens
// (hallucinated by Claude during translation, before the fix) from all sentences.
// Affected audios are deleted and regenerated right away from the repaired text.
router.post('/repair-tokens', async (req, res, next) => {
try {
const hasToken = v => { TOKEN_RE.lastIndex = 0; return TOKEN_RE.test(v || ''); };
const result = { cells_fixed: 0, audios_regenerated: 0, audios_failed: 0, details: [] };
const targets = [
{ table: 'questions', fields: ['sentence'] },
{ table: 'statements', fields: ['positive_sentence', 'negative_sentence'] },
{ table: 'words', fields: ['titel'] },
];
// 1) Clean the text cells, then delete and regenerate the associated audios
for (const t of targets) {
const cols = t.fields.flatMap(f => LANGS.map(l => `${f}_${l}`));
const r = await query(
`SELECT id, ${cols.join(', ')} FROM ${t.table}
WHERE ${cols.map(c => `${c} LIKE '%⟦PH%'`).join(' OR ')}`);
for (const row of r.rows) {
for (const f of t.fields) {
for (const l of LANGS) {
const col = `${f}_${l}`;
if (!hasToken(row[col])) continue;
const fixed = stripLeakedTokens(row[col]).replace(/\s{2,}/g, ' ').trim();
await query(`UPDATE ${t.table} SET ${col} = $1 WHERE id = $2`, [fixed, row.id]);
await invalidateAudio(t.table, row.id, f, l);
result.cells_fixed++;
const detail = { table: t.table, id: row.id, column: col, fixed };
try {
await generateWithBackoff({ text: fixed, language: l, source_table: t.table, source_id: row.id, source_field: f });
result.audios_regenerated++;
} catch (err) {
result.audios_failed++;
detail.audio_error = describeError(err);
}
result.details.push(detail);
}
}
}
}
// 2) Audios whose spoken text still contains tokens (the cell may already have been
// corrected elsewhere) → delete and regenerate them from the current cell text
const audios = await query(
`SELECT id, source_table, source_id, source_field, language FROM audios WHERE text LIKE '%⟦PH%'`);
for (const a of audios.rows) {
const r = await query(
`SELECT ${a.source_field}_${a.language} AS text FROM ${a.source_table} WHERE id = $1`, [a.source_id]);
const text = (r.rows[0]?.text || '').trim();
await invalidateAudio(a.source_table, a.source_id, a.source_field, a.language);
const detail = { table: 'audios', id: a.id, column: `${a.source_field}_${a.language}` };
if (text) {
try {
await generateWithBackoff({ text, language: a.language, source_table: a.source_table, source_id: a.source_id, source_field: a.source_field });
result.audios_regenerated++;
} catch (err) {
result.audios_failed++;
detail.audio_error = describeError(err);
}
}
result.details.push(detail);
}
res.json(result);
} catch (err) { next(err); }
});
// POST /api/pipeline/retag-objects: backfill that re-tokenizes object words in existing sentences
// (deterministic, plus an optional hybrid LLM fallback for inflected forms).
// Body: { picture_id?, dry_run?, use_llm?, cleanup? }. Without picture_id it runs over ALL pictures.
// cleanup:true ⇒ instead of tagging, wrongly tokenized object words (the object word occurs only as
// the modifier of a compound that names a different thing, e.g. „Erdbeerfeld") are removed again after an LLM check.
// Only the sentence text fields change; audio/alignment stay valid (the spoken text is unchanged).
router.post('/retag-objects', async (req, res, next) => {
try {
const pictureId = req.body?.picture_id || null;
const dryRun = !!req.body?.dry_run;
const useLLM = !!req.body?.use_llm;
const cleanup = !!req.body?.cleanup;
if (pictureId) {
const pr = await query(`SELECT id FROM pictures WHERE id = $1`, [pictureId]);
if (!pr.rows.length) return res.status(404).json({ error: 'Bild nicht gefunden' });
}
const report = await retagObjects({ pictureId, dryRun, useLLM, cleanup });
res.json(report);
} catch (err) { next(err); }
});
// GET /api/pipeline/settings // GET /api/pipeline/settings
router.get('/settings', async (req, res, next) => { router.get('/settings', async (req, res, next) => {
try { try {
@@ -316,6 +407,9 @@ router.post('/picture/:id/publish', async (req, res, next) => {
await query(`UPDATE pairs SET status='published', published_at=COALESCE(published_at,$2) await query(`UPDATE pairs SET status='published', published_at=COALESCE(published_at,$2)
WHERE id = ANY($1)`, [pairIds, now]); WHERE id = ANY($1)`, [pairIds, now]);
// Derive the categories of the published pairs from their words (best effort).
await derivePairCategories(pairIds).catch(() => {});
// Linked words: only 'generated' → 'published' (translated is kept for picture generation // Linked words: only 'generated' → 'published' (translated is kept for picture generation
// in the ServerMonitor flow; published would skip that step) // in the ServerMonitor flow; published would skip that step)
let publishedWords = 0; let publishedWords = 0;


@@ -0,0 +1,74 @@
const router = require('express').Router();
const { query } = require('../db');
const TYPES = ['fix', 'atmosphere', 'setting'];
// GET /api/prompt-styles
router.get('/', async (req, res, next) => {
try {
const { type, limit = 100, offset = 0 } = req.query;
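// Fall back to the defaults when limit/offset are not numeric (parseInt would yield NaN).
const params = [Math.min(parseInt(limit, 10) || 100, 500), parseInt(offset, 10) || 0];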
const conditions = [];
if (type) { conditions.push(`type = $${params.length + 1}`); params.push(type); }
const where = conditions.length ? `WHERE ${conditions.join(' AND ')}` : '';
const result = await query(
`SELECT * FROM prompt_styles ${where} ORDER BY type, id LIMIT $1 OFFSET $2`,
params
);
res.json(result.rows);
} catch (err) { next(err); }
});
// GET /api/prompt-styles/:id
router.get('/:id', async (req, res, next) => {
try {
const result = await query('SELECT * FROM prompt_styles WHERE id = $1', [req.params.id]);
if (!result.rows.length) return res.status(404).json({ error: 'Not found' });
res.json(result.rows[0]);
} catch (err) { next(err); }
});
// POST /api/prompt-styles
router.post('/', async (req, res, next) => {
try {
const { type, kategorie_id, text_en } = req.body;
if (!type || !TYPES.includes(type))
return res.status(400).json({ error: `type must be one of: ${TYPES.join(', ')}` });
if (!text_en)
return res.status(400).json({ error: 'text_en is required' });
const result = await query(
`INSERT INTO prompt_styles (type, kategorie_id, text_en) VALUES ($1, $2, $3) RETURNING *`,
[type, kategorie_id || null, text_en]
);
res.status(201).json(result.rows[0]);
} catch (err) { next(err); }
});
// PATCH /api/prompt-styles/:id
router.patch('/:id', async (req, res, next) => {
try {
const allowed = ['type', 'kategorie_id', 'text_en'];
const fields = Object.keys(req.body).filter(k => allowed.includes(k));
if (!fields.length) return res.status(400).json({ error: 'No valid fields provided' });
if (req.body.type && !TYPES.includes(req.body.type))
return res.status(400).json({ error: `type must be one of: ${TYPES.join(', ')}` });
const setClauses = fields.map((f, i) => `${f} = $${i + 1}`).join(', ');
const result = await query(
`UPDATE prompt_styles SET ${setClauses} WHERE id = $${fields.length + 1} RETURNING *`,
[...fields.map(f => req.body[f]), req.params.id]
);
if (!result.rows.length) return res.status(404).json({ error: 'Not found' });
res.json(result.rows[0]);
} catch (err) { next(err); }
});
// DELETE /api/prompt-styles/:id
router.delete('/:id', async (req, res, next) => {
try {
const result = await query('DELETE FROM prompt_styles WHERE id = $1 RETURNING id', [req.params.id]);
if (!result.rows.length) return res.status(404).json({ error: 'Not found' });
res.status(204).end();
} catch (err) { next(err); }
});
module.exports = router;
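The list routes take `limit` and `offset` from the query string, where any value may be missing, non-numeric or negative. The following is a sketch of one way to normalize both before they become SQL parameters; `toInt` and `parsePaging` are hypothetical names, and the lower bounds (limit of at least 1, offset of at least 0) are an assumption of this example rather than something the diff enforces.

```javascript
// Hypothetical helpers: parse an integer with a fallback instead of NaN.
function toInt(value, fallback) {
  const n = Number.parseInt(value, 10);
  return Number.isNaN(n) ? fallback : n;
}

// Clamp limit to [1, maxLimit] and offset to [0, ∞).
function parsePaging(queryParams, { defaultLimit = 50, maxLimit = 500 } = {}) {
  const limit = Math.min(Math.max(toInt(queryParams.limit, defaultLimit), 1), maxLimit);
  const offset = Math.max(toInt(queryParams.offset, 0), 0);
  return { limit, offset };
}

console.log(parsePaging({ limit: 'abc', offset: '-3' })); // { limit: 50, offset: 0 }
console.log(parsePaging({ limit: '9999' }));              // { limit: 500, offset: 0 }
console.log(parsePaging({}, { defaultLimit: 100 }));      // { limit: 100, offset: 0 }
```

A route could then use `const { limit, offset } = parsePaging(req.query, { defaultLimit: 100 })` and pass both values as `$1` and `$2`.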


@@ -0,0 +1,69 @@
const router = require('express').Router();
const { query } = require('../db');
const STATUSES = ['pending', 'generating', 'generated', 'accepted', 'rejected'];
// GET /api/word-generative
router.get('/', async (req, res, next) => {
try {
const { status, word_id, limit = 50, offset = 0 } = req.query;
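// Fall back to the defaults when limit/offset are not numeric (parseInt would yield NaN).
const params = [Math.min(parseInt(limit, 10) || 50, 500), parseInt(offset, 10) || 0];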
const conditions = [];
if (status) { conditions.push(`status = $${params.length + 1}`); params.push(status); }
if (word_id) { conditions.push(`word_id = $${params.length + 1}`); params.push(word_id); }
const where = conditions.length ? `WHERE ${conditions.join(' AND ')}` : '';
const result = await query(
`SELECT * FROM word_generative ${where} ORDER BY created_at DESC LIMIT $1 OFFSET $2`,
params
);
res.json(result.rows);
} catch (err) { next(err); }
});
// POST /api/word-generative
router.post('/', async (req, res, next) => {
try {
const { word_id, prompt, status } = req.body;
if (!word_id) return res.status(400).json({ error: 'word_id ist erforderlich' });
if (status && !STATUSES.includes(status))
return res.status(400).json({ error: `status muss eines sein von: ${STATUSES.join(', ')}` });
const result = await query(
`INSERT INTO word_generative (word_id, prompt, status)
VALUES ($1, $2, $3) RETURNING *`,
[word_id, prompt || null, status || 'pending']
);
res.status(201).json(result.rows[0]);
} catch (err) { next(err); }
});
// PATCH /api/word-generative/:id
router.patch('/:id', async (req, res, next) => {
try {
const allowed = ['prompt', 'status', 'picture_link'];
const fields = Object.keys(req.body).filter(k => allowed.includes(k));
if (!fields.length) return res.status(400).json({ error: 'Keine gültigen Felder angegeben' });
if (req.body.status && !STATUSES.includes(req.body.status))
return res.status(400).json({ error: `status muss eines sein von: ${STATUSES.join(', ')}` });
const setClauses = fields.map((f, i) => `${f} = $${i + 1}`).join(', ');
const values = [...fields.map(f => req.body[f]), req.params.id];
const result = await query(
`UPDATE word_generative SET ${setClauses} WHERE id = $${fields.length + 1} RETURNING *`,
values
);
if (!result.rows.length) return res.status(404).json({ error: 'Not found' });
res.json(result.rows[0]);
} catch (err) { next(err); }
});
// DELETE /api/word-generative/:id
router.delete('/:id', async (req, res, next) => {
try {
const result = await query(
`DELETE FROM word_generative WHERE id = $1 RETURNING id`, [req.params.id]
);
if (!result.rows.length) return res.status(404).json({ error: 'Not found' });
res.status(204).end();
} catch (err) { next(err); }
});
module.exports = router;


@@ -1,5 +1,6 @@
const router = require('express').Router(); const router = require('express').Router();
const { query } = require('../db'); const { query } = require('../db');
const { runEnrichTick, enrichWordsSync } = require('../lib/enrichWords');
const STATUSES = ['requested', 'translated', 'generated', 'blocked', 'published']; const STATUSES = ['requested', 'translated', 'generated', 'blocked', 'published'];
@@ -9,14 +10,32 @@ const STATUS_TIMESTAMP = {
blocked: 'blocked_at', blocked: 'blocked_at',
}; };
// POST /api/words/enrich-batch: manual trigger for word enrichment
router.post('/enrich-batch', async (req, res, next) => {
try {
const sync = req.query.sync === 'true';
if (sync) {
const max = parseInt(req.query.max) || 500;
return res.json(await enrichWordsSync({ max }));
}
res.json(await runEnrichTick());
} catch (err) { next(err); }
});
// GET /api/words // GET /api/words
router.get('/', async (req, res, next) => { router.get('/', async (req, res, next) => {
try { try {
const { status, titel_de, search, limit = 50, offset = 0 } = req.query; const { status, titel_de, search, dom_pos, level, themenfeld_id, has_conc_m,
limit = 50, offset = 0 } = req.query;
const params = [Math.min(parseInt(limit), 500), parseInt(offset)]; const params = [Math.min(parseInt(limit), 500), parseInt(offset)];
const conditions = []; const conditions = [];
if (status) { conditions.push(`w.status = $${params.length + 1}`); params.push(status); } if (status) { conditions.push(`w.status = $${params.length + 1}`); params.push(status); }
if (titel_de) { conditions.push(`lower(w.titel_de) = lower($${params.length + 1})`); params.push(titel_de); } if (titel_de) { conditions.push(`lower(w.titel_de) = lower($${params.length + 1})`); params.push(titel_de); }
if (dom_pos) { conditions.push(`w.dom_pos = $${params.length + 1}`); params.push(dom_pos); }
if (level) { conditions.push(`w.level = $${params.length + 1}`); params.push(level); }
if (themenfeld_id) { conditions.push(`w.themenfeld_id = $${params.length + 1}`); params.push(themenfeld_id); }
if (has_conc_m === 'true') conditions.push(`w.conc_m IS NOT NULL`);
if (has_conc_m === 'false') conditions.push(`w.conc_m IS NULL`);
if (search) { if (search) {
const p = `%${search.toLowerCase()}%`; const p = `%${search.toLowerCase()}%`;
conditions.push(`(lower(w.titel_de) LIKE $${params.length + 1} OR lower(w.titel_en) LIKE $${params.length + 1} OR lower(w.titel_sv) LIKE $${params.length + 1})`); conditions.push(`(lower(w.titel_de) LIKE $${params.length + 1} OR lower(w.titel_en) LIKE $${params.length + 1} OR lower(w.titel_sv) LIKE $${params.length + 1})`);
@@ -26,12 +45,14 @@ router.get('/', async (req, res, next) => {
const result = await query( const result = await query(
`SELECT w.*, `SELECT w.*,
COALESCE(json_agg(DISTINCT p.id) FILTER (WHERE p.id IS NOT NULL), '[]') AS picture_ids, COALESCE(json_agg(DISTINCT p.id) FILTER (WHERE p.id IS NOT NULL), '[]') AS picture_ids,
COALESCE(json_agg(DISTINCT c.id) FILTER (WHERE c.id IS NOT NULL), '[]') AS category_ids COALESCE(json_agg(DISTINCT c.id) FILTER (WHERE c.id IS NOT NULL), '[]') AS category_ids,
COUNT(DISTINCT wp2.picture_id)::int AS picture_count
FROM words w FROM words w
LEFT JOIN word_pictures wp ON wp.word_id = w.id LEFT JOIN word_pictures wp ON wp.word_id = w.id
LEFT JOIN pictures p ON p.id = wp.picture_id LEFT JOIN pictures p ON p.id = wp.picture_id
LEFT JOIN word_categories wc ON wc.word_id = w.id LEFT JOIN word_categories wc ON wc.word_id = w.id
LEFT JOIN categories c ON c.id = wc.category_id LEFT JOIN categories c ON c.id = wc.category_id
LEFT JOIN word_pictures wp2 ON wp2.word_id = w.id
${where} ${where}
GROUP BY w.id GROUP BY w.id
ORDER BY w.created_at DESC ORDER BY w.created_at DESC
@@ -50,18 +71,32 @@ function autoTranslatedStatus(row) {
// POST /api/words // POST /api/words
router.post('/', async (req, res, next) => { router.post('/', async (req, res, next) => {
try { try {
const { titel_de, titel_en, titel_sv, difficulty_level, status } = req.body; const { titel_de, titel_en, titel_sv, difficulty_level, status, conc_m } = req.body;
if (status && !STATUSES.includes(status)) if (status && !STATUSES.includes(status))
return res.status(400).json({ error: `status must be one of: ${STATUSES.join(', ')}` }); return res.status(400).json({ error: `status must be one of: ${STATUSES.join(', ')}` });
// Auto: all 3 languages supplied directly + no explicit status → 'translated' // Auto: all 3 languages supplied directly + no explicit status → 'translated'
const allLangs = titel_de && titel_en && titel_sv; const allLangs = titel_de && titel_en && titel_sv;
const effectiveStatus = status || (allLangs ? 'translated' : 'requested'); const effectiveStatus = status || (allLangs ? 'translated' : 'requested');
const result = await query( // Upsert: insert a new row or, on a duplicate titel_en, update only conc_m
`INSERT INTO words (titel_de, titel_en, titel_sv, difficulty_level, status, requested_at) let result = await query(
VALUES ($1, $2, $3, $4, $5, NOW()) RETURNING *`, `INSERT INTO words (titel_de, titel_en, titel_sv, difficulty_level, status, conc_m, requested_at)
[titel_de || null, titel_en || null, titel_sv || null, difficulty_level || null, effectiveStatus] VALUES ($1, $2, $3, $4, $5, $6, NOW()) RETURNING *, true AS is_insert`,
); [titel_de || null, titel_en || null, titel_sv || null,
res.status(201).json({ ...result.rows[0], picture_ids: [], category_ids: [] }); difficulty_level || null, effectiveStatus, conc_m ?? null]
).catch(async err => {
if (err.code === '23505' && titel_en) {
// Duplicate titel_en → update conc_m and return the existing row
const upd = await query(
`UPDATE words SET conc_m = $1 WHERE titel_en = $2 RETURNING *, false AS is_insert`,
[conc_m ?? null, titel_en]
);
return upd;
}
throw err;
});
const row = result.rows[0];
const { is_insert, ...word } = row;
res.status(is_insert ? 201 : 200).json({ ...word, picture_ids: [], category_ids: [] });
} catch (err) { next(err); } } catch (err) { next(err); }
}); });
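The new POST handler replaces `ON CONFLICT` with an insert-then-update fallback keyed on PostgreSQL's unique-violation code `23505`. The sketch below isolates that control flow so it can be exercised without a database. `upsertConcM` and its injected `query` parameter are hypothetical names for illustration only, and the column list is shortened compared with the route.

```javascript
// Hypothetical helper (not part of the diff): try INSERT first, and fall back
// to UPDATE only when the database reports a unique violation (SQLSTATE 23505).
async function upsertConcM(query, { titel_en, conc_m }) {
  try {
    const ins = await query(
      'INSERT INTO words (titel_en, conc_m) VALUES ($1, $2) RETURNING *',
      [titel_en ?? null, conc_m ?? null]
    );
    return { created: true, word: ins.rows[0] };
  } catch (err) {
    // Any other error, or a violation without a titel_en to match on, is rethrown.
    if (err.code !== '23505' || !titel_en) throw err;
    const upd = await query(
      'UPDATE words SET conc_m = $1 WHERE titel_en = $2 RETURNING *',
      [conc_m ?? null, titel_en]
    );
    return { created: false, word: upd.rows[0] };
  }
}
```

A caller could map `created` to the HTTP status the same way the route maps `is_insert`: 201 for a new row, 200 for an updated one.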
@@ -69,7 +104,8 @@ router.post('/', async (req, res, next) => {
router.patch('/:id', async (req, res, next) => { router.patch('/:id', async (req, res, next) => {
try { try {
const allowed = ['titel_de', 'titel_en', 'titel_sv', 'status', const allowed = ['titel_de', 'titel_en', 'titel_sv', 'status',
'difficulty_level', 'requested_at', 'published_at', 'blocked_at']; 'difficulty_level', 'requested_at', 'published_at', 'blocked_at',
'conc_m', 'dom_pos', 'level', 'themenfeld_id'];
const fields = Object.keys(req.body).filter(k => allowed.includes(k)); const fields = Object.keys(req.body).filter(k => allowed.includes(k));
if (!fields.length) return res.status(400).json({ error: 'No valid fields provided' }); if (!fields.length) return res.status(400).json({ error: 'No valid fields provided' });
@@ -117,12 +153,14 @@ router.get('/:id', async (req, res, next) => {
const result = await query( const result = await query(
`SELECT w.*, `SELECT w.*,
COALESCE(json_agg(DISTINCT p.id) FILTER (WHERE p.id IS NOT NULL), '[]') AS picture_ids, COALESCE(json_agg(DISTINCT p.id) FILTER (WHERE p.id IS NOT NULL), '[]') AS picture_ids,
COALESCE(json_agg(DISTINCT c.id) FILTER (WHERE c.id IS NOT NULL), '[]') AS category_ids COALESCE(json_agg(DISTINCT c.id) FILTER (WHERE c.id IS NOT NULL), '[]') AS category_ids,
COUNT(DISTINCT wp2.picture_id)::int AS picture_count
FROM words w FROM words w
LEFT JOIN word_pictures wp ON wp.word_id = w.id LEFT JOIN word_pictures wp ON wp.word_id = w.id
LEFT JOIN pictures p ON p.id = wp.picture_id LEFT JOIN pictures p ON p.id = wp.picture_id
LEFT JOIN word_categories wc ON wc.word_id = w.id LEFT JOIN word_categories wc ON wc.word_id = w.id
LEFT JOIN categories c ON c.id = wc.category_id LEFT JOIN categories c ON c.id = wc.category_id
LEFT JOIN word_pictures wp2 ON wp2.word_id = w.id
WHERE w.id = $1 WHERE w.id = $1
GROUP BY w.id`, GROUP BY w.id`,
[req.params.id] [req.params.id]