Compare commits: `96ae76f295...main` (23 commits)

| SHA1 |
|---|
| 1085a54761 |
| dbd077e239 |
| 01f6df67f3 |
| 1335136dcb |
| 455969bdec |
| 1d25f84f5d |
| 294608de22 |
| 7ba6b7120b |
| 1605d2cdd1 |
| 61b3bcb5ff |
| bb863640c0 |
| 806e25c3ff |
| 339a3ed27d |
| bd18a9c303 |
| d66cff3f61 |
| 9738d3e35a |
| 508d6993ee |
| e44d896f9e |
| 434839e1d4 |
| f0f768ff2c |
| 895d7c56a1 |
| 25d1e89446 |
| ddbd879dab |

CLAUDE.md · Executable file · 42 changed lines
@@ -0,0 +1,42 @@

# CLAUDE.md

REST API for the snakkimo project. Node/Express + PostgreSQL (`pg`, no ORM), image assets on Hetzner Object Storage (S3-compatible). Detailed API documentation in [README.md](README.md).

## Commands

- `npm run dev`: local server with nodemon (hot reload)
- `npm start`: production (`node src/index.js`)
- No tests and no linter are configured.

## Architecture

- Entry point: [src/index.js](src/index.js) registers all routes; every `/api/*` route is protected by the `auth` middleware.
- **Migrations run automatically at boot** ([src/db-migrate.js](src/db-migrate.js)), before the server starts listening. Keep them idempotent: `CREATE TABLE IF NOT EXISTS`, column renames with `.catch(() => {})`. There is **no** separate migration tool; put schema changes here.
- `src/db.js` exports `query(text, params)` and `pool`. Always use parameterized queries (`$1, $2 …`), never interpolate user input into SQL strings.
- `src/routes/`: one file per entity. `src/lib/`, `src/middleware/`, `src/s3.js` and `src/voices.js` hold shared logic.
- **Background job (auto-categorization):** [src/index.js](src/index.js) runs `runCategorizationTick()` ([src/lib/classifyWords.js](src/lib/classifyWords.js)) about 30 s after boot and then hourly. It classifies words that are used in pairs but have no category yet against the fixed taxonomy via the **Anthropic Message Batches API** (Haiku, asynchronous, ~50 % cheaper) and materializes `pair_categories`. ⚠️ Requires `ANTHROPIC_API_KEY` and incurs real LLM costs, **also locally under `npm run dev`**. Trigger manually: `POST /api/categories/auto-assign` (`?sync=true` = run immediately and synchronously instead of as a batch, `&reset=true` = discard existing assignments and reclassify).
- **Category data flow:** Categories are attached to words (`word_categories`; the fixed taxonomy is seeded in [src/db-migrate.js](src/db-migrate.js)). `pair_categories` is derived from them ([src/lib/pairCategories.js](src/lib/pairCategories.js), `derivePairCategories`), both on pair publish (`routes/pairs.js`, `routes/pipeline.js`) and in the job. `GET /auth/stats` uses it to return the points per category for the profile. `GET /auth/me` returns `language_target_greeting` (column `languages.greeting`, seeded for de/en/sv). Async batch status is stored in `category_batches`.

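The boot-time schedule of the categorization job (first run about 30 s after boot, then hourly) can be sketched as follows. This is not the code in `src/index.js`: `scheduleTicks` and its options are hypothetical names. The timer functions are injectable so the schedule can be tested without waiting. The overlap guard, which skips a tick while the previous one is still running, is an added assumption.

```javascript
// Hypothetical scheduler for the categorization job: first run after
// initialDelayMs (~30 s after boot), then every intervalMs (hourly).
// Timer functions are injectable so the schedule can be tested without waiting.
function scheduleTicks(tick, {
  initialDelayMs = 30 * 1000,
  intervalMs = 60 * 60 * 1000,
  setTimeoutFn = setTimeout,
  setIntervalFn = setInterval,
  clearTimeoutFn = clearTimeout,
  clearIntervalFn = clearInterval,
  onError = (err) => console.error('categorization tick failed:', err),
} = {}) {
  let running = false; // assumption: skip a tick while the previous one is still running

  const run = async () => {
    if (running) return;
    running = true;
    try {
      await tick();
    } catch (err) {
      onError(err); // a failed tick must not crash the server
    } finally {
      running = false;
    }
  };

  let intervalHandle;
  const timeoutHandle = setTimeoutFn(() => {
    run();
    intervalHandle = setIntervalFn(run, intervalMs);
  }, initialDelayMs);

  // Returns a stop function, e.g. for a graceful shutdown.
  return () => {
    clearTimeoutFn(timeoutHandle);
    if (intervalHandle !== undefined) clearIntervalFn(intervalHandle);
  };
}
```

In this sketch the hourly interval starts after the first run rather than at boot. The source does not say which of the two the real job does.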
## Progress / gamification ([src/routes/auth.js](src/routes/auth.js))

- **Level curve = single source of truth:** [src/lib/leveling.js](src/lib/leveling.js) (`levelForEp`/`levelInfo`, a progressive curve: reaching level n takes a cumulative `5·n·(n+3)` EP, so level 1 is reached at 20 EP). It is used in `GET /auth/me` (which returns `level` + `ep_into_level` + `ep_to_next_level`) **and** in `POST /auth/progress`. The frontend mirrors the same curve only as a fallback; make curve changes here.
- **`POST /auth/progress`** records EP, streak and pair statistics and returns the **milestone contract**: `{ total_ep, level, prev_level, streak_days, streak_increased, daily_ep, daily_goal_ep, goal_just_reached, unlocked_achievements }`. A CTE also captures the **pre-update values**, so level-ups and streak increases can be detected atomically. `daily_goal_ep` is set via `PUT /auth/goal` (clamped to 5–500).
- **Achievements:** [src/lib/achievements.js](src/lib/achievements.js) defines the achievements and unlocks them **dedup-safely** (`INSERT … ON CONFLICT DO NOTHING RETURNING` returns only newly unlocked rows). They are persisted in the table `user_achievements` (migration in `db-migrate.js`). `/auth/progress` calls `evaluateAchievements`, defensively wrapped so that it cannot break the progress update. `GET /auth/achievements` lists all achievements with their status for the profile.

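A minimal sketch of the documented curve, with a closed-form inverse and a helper that turns pre- and post-update values into milestone flags. It rests on two assumptions: a user below 20 EP is at level 0, and `ep_to_next_level` means the EP still missing. `src/lib/leveling.js` remains the authority on both. `epForLevel` and `milestoneFlags` are hypothetical helper names.

```javascript
// Cumulative EP required to reach level n: 5·n·(n+3) → 20, 50, 90, 140, …
function epForLevel(n) {
  return 5 * n * (n + 3);
}

// Highest level whose threshold has been reached.
function levelForEp(ep) {
  const e = Math.max(0, Math.floor(ep));
  // Closed form from 5n² + 15n = e, then correct any floating-point rounding.
  let n = Math.floor((-3 + Math.sqrt(9 + (4 * e) / 5)) / 2);
  while (epForLevel(n + 1) <= e) n++;
  while (n > 0 && epForLevel(n) > e) n--;
  return n;
}

function levelInfo(ep) {
  const e = Math.max(0, Math.floor(ep));
  const level = levelForEp(e);
  return {
    level,
    ep_into_level: e - epForLevel(level),
    ep_to_next_level: epForLevel(level + 1) - e, // assumption: EP still missing
  };
}

// before/after: user values before and after booking the EP (what the CTE captures).
function milestoneFlags(before, after, dailyGoalEp) {
  return {
    level: levelForEp(after.totalEp),
    prev_level: levelForEp(before.totalEp),
    streak_increased: after.streakDays > before.streakDays,
    goal_just_reached: before.dailyEp < dailyGoalEp && after.dailyEp >= dailyGoalEp,
  };
}
```

For example, `levelInfo(60)` yields `{ level: 2, ep_into_level: 10, ep_to_next_level: 30 }`, because level 2 starts at 50 EP and level 3 at 90 EP.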
## Conventions

- **Code comments in German**, code and identifiers in English (follow the existing code).
- Route handler pattern: `async (req, res, next) => { try { … } catch (err) { next(err); } }`. Pass errors on to the central error handler in `index.js`; do not send 500s yourself.
- List endpoints: read `limit`/`offset` from the query string and hard-cap `limit` (e.g. `Math.min(parseInt(limit), 500)`).
- Check status fields against a `STATUSES` whitelist and respond with `400` on a violation.
- **Language suffixes: `_de`, `_en`, `_sv`.** `_se` is obsolete (wrong ISO 639-1 code) and is renamed to `_sv` at boot; never create new `_se` columns.

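A minimal sketch of a list handler that follows these conventions: the try/catch shape that forwards to `next(err)`, a hard-capped `limit`, and a `STATUSES` whitelist that yields `400`. `makeListHandler`, `parsePagination`, the injected `listRows` (standing in for a parameterized query), the example status values and the default page size of 50 are all assumptions. One caveat about the inline example above: it yields `NaN` when `limit` is missing, because `parseInt(undefined)` is `NaN` and `Math.min` propagates it. The sketch falls back to a default instead.

```javascript
// Example whitelist; the values are illustrative, each route defines its own STATUSES.
const STATUSES = ['requested', 'published'];
const MAX_LIMIT = 500;
const DEFAULT_LIMIT = 50; // assumption: no default page size is documented

// limit/offset from the query string; limit is hard-capped, NaN/negative values fall back.
function parsePagination(query) {
  const limit = Number.parseInt(query.limit, 10);
  const offset = Number.parseInt(query.offset, 10);
  return {
    limit: Number.isFinite(limit) && limit > 0 ? Math.min(limit, MAX_LIMIT) : DEFAULT_LIMIT,
    offset: Number.isFinite(offset) && offset >= 0 ? offset : 0,
  };
}

// Route handler in the documented shape; listRows stands in for a parameterized query.
function makeListHandler(listRows) {
  return async (req, res, next) => {
    try {
      const { limit, offset } = parsePagination(req.query);
      const { status } = req.query;
      if (status !== undefined && !STATUSES.includes(status)) {
        return res.status(400).json({ error: `invalid status: ${status}` });
      }
      const rows = await listRows({ status, limit, offset });
      res.json(rows);
    } catch (err) {
      next(err); // the central error handler in index.js decides on the 500
    }
  };
}
```

In Express this would be mounted as e.g. `router.get('/', makeListHandler(...))`. The `listRows` callback would run `query('SELECT … LIMIT $1 OFFSET $2', [limit, offset])`.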
## Image generation pipeline

- **`prompt_styles`**: seed table with ready-made prompt building blocks (types: `fix` / `atmosphere` / `setting`). `themenfeld_id` groups settings by topic (plain UUID, no FK). 38 entries are seeded at boot. Route: `/api/prompt-styles`.
- **`picture_jobs`**: job queue for image generation. A job references a category, up to three prompt styles and, once generated, a picture. Words are linked M2M via `picture_job_words`. Status flow: `pending → generating → done | failed`. Route: `/api/picture-jobs` (including a `/words` subroute for the M2M link).

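The `picture_jobs` status flow can be written down as a small transition table, so that routes reject illegal jumps such as `pending → done`. This is a sketch under assumptions. `canTransition` and `assertTransition` are hypothetical helpers. Treating `done`/`failed` as terminal (no retry path) is a reading of the documented flow. The `err.status = 400` convention for the central error handler is assumed, not confirmed.

```javascript
// Documented flow: pending → generating → done | failed. done/failed are terminal here.
const PICTURE_JOB_TRANSITIONS = new Map([
  ['pending', ['generating']],
  ['generating', ['done', 'failed']],
  ['done', []],
  ['failed', []],
]);

function canTransition(from, to) {
  return (PICTURE_JOB_TRANSITIONS.get(from) ?? []).includes(to);
}

// Throws an error carrying an HTTP status so a route can hand it to next(err).
function assertTransition(from, to) {
  if (!canTransition(from, to)) {
    const err = new Error(`invalid picture_jobs transition: ${from} → ${to}`);
    err.status = 400; // assumption: the error handler reads err.status
    throw err;
  }
}
```

An in-memory check alone can race if two workers pick up the same job. In Postgres the same rule can be enforced with a conditional update such as `UPDATE picture_jobs SET status = $2 WHERE id = $1 AND status = $3`, checking that exactly one row was affected.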
## Auth (two paths, see [src/middleware/auth.js](src/middleware/auth.js))

1. Static tokens from `API_TOKENS` (comma-separated) for server-to-server / admin use, no role check.
2. JWT from `/auth/login` · `/auth/register`. The role `end-user` deliberately gets **403** on all `/api/*` routes (app gating).

Public (no auth): `GET /health`, `/auth/*`.

Configuration via `.env` (see [.env.example](.env.example)). Deployment via Coolify/Docker.

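A sketch of how the two auth paths can be combined in one middleware. `makeAuth` is a hypothetical name. `verifyJwt` is injected and stands in for whatever JWT library the project uses; it must return the payload or throw. Answering `401` for a missing or invalid token is an assumption. The `Authorization: Bearer <token>` header format matches what `scripts/upload-pictures.mjs` sends.

```javascript
// Hypothetical sketch of the two auth paths. verifyJwt is injected (it would wrap the
// project's JWT library) and must return the token payload or throw.
function makeAuth({ apiTokens, verifyJwt }) {
  // Path 1: static tokens from API_TOKENS (comma-separated).
  const staticTokens = new Set(
    String(apiTokens ?? '').split(',').map((t) => t.trim()).filter(Boolean),
  );

  return function auth(req, res, next) {
    const header = req.headers?.authorization ?? '';
    const match = /^Bearer\s+(\S+)$/i.exec(header.trim());
    if (!match) return res.status(401).json({ error: 'missing bearer token' });
    const token = match[1];

    // Server-to-server / admin: no role check.
    if (staticTokens.has(token)) {
      req.auth = { kind: 'static' };
      return next();
    }

    // Path 2: JWT from /auth/login or /auth/register.
    let payload;
    try {
      payload = verifyJwt(token);
    } catch {
      return res.status(401).json({ error: 'invalid token' });
    }

    // App gating: end users are deliberately blocked on /api/*.
    if (payload?.role === 'end-user') return res.status(403).json({ error: 'forbidden' });

    req.auth = { kind: 'jwt', user: payload };
    return next();
  };
}
```

The static tokens are matched with a plain `Set` lookup here. A hardened version might compare them with `crypto.timingSafeEqual` to avoid timing side channels.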
scripts/import-brysbaert.js · Normal file · 71 changed lines
@@ -0,0 +1,71 @@

```javascript
// One-off import of the Brysbaert concreteness CSV into the words table.
// Usage: node scripts/import-brysbaert.js <path-to-csv>
// Sets titel_en + conc_m; status = 'requested'. Existing rows (same titel_en)
// only get conc_m updated; all other fields stay unchanged.

require('dotenv').config({ path: require('path').join(__dirname, '..', '.env') });
const { query, pool } = require('../src/db');
const fs = require('fs');
const readline = require('readline');

async function main() {
  const csvPath = process.argv[2];
  if (!csvPath) {
    console.error('Usage: node scripts/import-brysbaert.js <path-to-csv>');
    process.exit(1);
  }
  if (!fs.existsSync(csvPath)) {
    console.error(`File not found: ${csvPath}`);
    process.exit(1);
  }

  const rl = readline.createInterface({
    input: fs.createReadStream(csvPath),
    crlfDelay: Infinity,
  });

  let header = true;
  let inserted = 0;
  let updated = 0;
  let skipped = 0;
  let errors = 0;

  for await (const line of rl) {
    if (header) { header = false; continue; }
    const trimmed = line.trim();
    if (!trimmed) continue;

    // The last comma separates word and score (the word may contain spaces)
    const comma = trimmed.lastIndexOf(',');
    if (comma === -1) { skipped++; continue; }
    const word = trimmed.slice(0, comma).trim();
    const conc = parseFloat(trimmed.slice(comma + 1).trim());

    if (!word || isNaN(conc)) { skipped++; continue; }

    try {
      // words_titel_en_key is a partial unique index, so the conflict target must repeat its
      // predicate; otherwise Postgres cannot infer the index and the INSERT fails.
      // (xmax = 0) is true for a freshly inserted row and false for a row updated via ON CONFLICT.
      const res = await query(
        `INSERT INTO words (titel_en, conc_m, status, requested_at)
         VALUES ($1, $2, 'requested', NOW())
         ON CONFLICT (titel_en) WHERE titel_en IS NOT NULL DO UPDATE SET conc_m = EXCLUDED.conc_m
         RETURNING (xmax = 0) AS is_insert`,
        [word, conc]
      );
      if (res.rows[0]?.is_insert) inserted++;
      else updated++;
    } catch (err) {
      errors++;
      // Only log the first few errors to keep the output readable
      if (errors <= 5) console.error(`Error for "${word}":`, err.message);
    }
  }

  console.log(`Import finished:`);
  console.log(`  ${inserted} newly inserted`);
  console.log(`  ${updated} updated (conc_m)`);
  if (skipped) console.log(`  ${skipped} lines skipped (empty/invalid)`);
  if (errors) console.log(`  ${errors} errors`);

  await pool.end();
}

main().catch(err => { console.error(err); process.exit(1); });
```
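The import loop's parsing rule can be pulled out into a pure function, which makes the "last comma wins" behaviour easy to check in isolation. `parseConcretenessLine` is a hypothetical helper that mirrors the loop body above. It assumes, like the script, a two-column `word,score` file. If the CSV still carried further numeric columns (the published norms include more than the mean, e.g. a standard deviation), the last comma would pick up the wrong field, so the input is presumably pre-reduced.

```javascript
// Mirrors the import loop: the last comma separates word and score, so the word may
// contain spaces (or even commas). Returns null for lines the script skips.
function parseConcretenessLine(line) {
  const trimmed = line.trim();
  if (!trimmed) return null;
  const comma = trimmed.lastIndexOf(',');
  if (comma === -1) return null;
  const word = trimmed.slice(0, comma).trim();
  const conc = parseFloat(trimmed.slice(comma + 1).trim());
  if (!word || Number.isNaN(conc)) return null;
  return { word, conc };
}
```

For example, `parseConcretenessLine('ice cream,4.96')` returns `{ word: 'ice cream', conc: 4.96 }`.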

scripts/upload-pictures.mjs · Normal file · 119 changed lines
@@ -0,0 +1,119 @@

```javascript
#!/usr/bin/env node
/**
 * Uploads all images from a directory to Hetzner S3 + the pictures table.
 * Re-encodes each file to WebP at 85% quality via cwebp.
 * Requires Node 18+ (global fetch, FormData, Blob) and cwebp on the PATH.
 *
 * Usage:
 *   TOKEN=your-dev-token node scripts/upload-pictures.mjs /path/to/folder
 *   TOKEN=... BASE_URL=https://hyggecraftery.com/api/snakkimo/api node scripts/upload-pictures.mjs /path/to/folder
 */

import { readdir, readFile, unlink } from 'fs/promises';
import { execFile } from 'child_process';
import { promisify } from 'util';
import { join, basename, extname } from 'path';
import { tmpdir } from 'os';
import { randomUUID } from 'crypto';

const execFileAsync = promisify(execFile);

const TOKEN = process.env.TOKEN;
// Like the default, BASE_URL includes the /api prefix; requests go to `${BASE_URL}/pictures`.
const BASE_URL = (process.env.BASE_URL || 'https://hyggecraftery.com/api/snakkimo/api').replace(/\/$/, '');
const CONCURRENCY = 4;

if (!TOKEN) {
  console.error('ERROR: TOKEN env var required. Run: TOKEN=your-token node scripts/upload-pictures.mjs <dir>');
  process.exit(1);
}

const dir = process.argv[2];
if (!dir) {
  console.error('ERROR: Pass the image directory as argument.');
  process.exit(1);
}

function extractDesign(filename) {
  const name = basename(filename, extname(filename));
  // Strip trailing _xxxxxxxx hash (8 hex chars), then turn underscores into spaces
  return name.replace(/_[0-9a-f]{8}$/i, '').replace(/_/g, ' ');
}

async function apiPost(path, body) {
  const res = await fetch(`${BASE_URL}${path}`, {
    method: 'POST',
    headers: { Authorization: `Bearer ${TOKEN}`, 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`POST ${path} → ${res.status}: ${await res.text()}`);
  return res.json();
}

async function apiUpload(pictureId, fileBuffer) {
  const form = new FormData();
  const blob = new Blob([fileBuffer], { type: 'image/webp' });
  form.append('file', blob, `${pictureId}.webp`);
  // No explicit Content-Type: fetch sets multipart/form-data including the boundary
  const res = await fetch(`${BASE_URL}/pictures/${pictureId}/upload`, {
    method: 'POST',
    headers: { Authorization: `Bearer ${TOKEN}` },
    body: form,
  });
  if (!res.ok) throw new Error(`upload → ${res.status}: ${await res.text()}`);
  return res.json();
}

async function processFile(filePath) {
  const filename = basename(filePath);
  const design = extractDesign(filename);
  const tmpFile = join(tmpdir(), `${randomUUID()}.webp`);

  try {
    // Re-encode to WebP at 85% quality. execFile passes the arguments without a shell
    // (filenames containing quotes or `$` are safe) and, unlike execSync, does not block
    // the event loop, so the files of one chunk are really encoded in parallel.
    await execFileAsync('cwebp', ['-q', '85', filePath, '-o', tmpFile, '-quiet']);

    const buffer = await readFile(tmpFile);

    // 1. Create picture record
    const picture = await apiPost('/pictures', { design });

    // 2. Upload file
    await apiUpload(picture.id, buffer);

    return { ok: true, design, id: picture.id };
  } finally {
    await unlink(tmpFile).catch(() => {});
  }
}

async function run() {
  const files = (await readdir(dir))
    .filter(f => /\.(webp|jpg|jpeg|png)$/i.test(f))
    .map(f => join(dir, f));

  console.log(`Found ${files.length} files. Uploading with concurrency ${CONCURRENCY}...\n`);

  let done = 0;
  const errors = [];

  // Process in chunks of CONCURRENCY
  for (let i = 0; i < files.length; i += CONCURRENCY) {
    const chunk = files.slice(i, i + CONCURRENCY);
    const results = await Promise.allSettled(chunk.map(processFile));

    for (let j = 0; j < results.length; j++) {
      const r = results[j];
      done++;
      if (r.status === 'fulfilled') {
        console.log(`[${done}/${files.length}] ✓ ${r.value.design} (${r.value.id})`);
      } else {
        const name = basename(chunk[j]);
        console.error(`[${done}/${files.length}] ✗ ${name}: ${r.reason.message}`);
        errors.push({ file: name, error: r.reason.message });
      }
    }
  }

  console.log(`\nDone. ${done - errors.length} succeeded, ${errors.length} failed.`);
  if (errors.length) {
    console.error('\nFailed files:');
    errors.forEach(e => console.error(`  ${e.file}: ${e.error}`));
  }
}

run().catch(err => { console.error(err); process.exit(1); });
```
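The chunked loop above waits for the slowest file in each chunk before starting the next chunk. A worker pool keeps all `CONCURRENCY` slots busy instead. The following is a sketch of that alternative. `mapWithConcurrency` is a hypothetical helper, not part of the script. It returns results in the same `{ status, value | reason }` shape as `Promise.allSettled`, so the existing logging loop could consume them unchanged.

```javascript
// Hypothetical helper: run `worker` over `items` with at most `limit` calls in flight.
// Unlike fixed chunks, the next item starts as soon as any slot frees up.
// Results keep input order and mirror the Promise.allSettled result shape.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;

  async function runner() {
    // Reading and incrementing `next` happens synchronously, so no two runners share an index.
    while (next < items.length) {
      const i = next++;
      try {
        results[i] = { status: 'fulfilled', value: await worker(items[i], i) };
      } catch (reason) {
        results[i] = { status: 'rejected', reason };
      }
    }
  }

  const slots = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: slots }, () => runner()));
  return results;
}
```

In the script this would replace the chunk loop with `const results = await mapWithConcurrency(files, CONCURRENCY, processFile);`.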

src/db-migrate.js · Normal file → Executable file · 404 changed lines

```diff
@@ -1,6 +1,6 @@
 const { query } = require('./db');
 
-async function migrate() {
+async function migrateCore() {
   // Rename _se → _sv (Swedish ISO 639-1 correction)
   const renames = [
     ['words', 'titel_se', 'titel_sv'],
```

```diff
@@ -131,6 +131,58 @@ async function migrate() {
     )
   `);
 
+  // Seed the fixed everyday taxonomy (de/en/sv, published). Basis for the automatic
+  // word categorization (src/lib/classifyWords.js) and the category points in the profile.
+  // Idempotent: an existing category (e.g. "Tiere") is reused, no duplicates.
+  const CATEGORY_TAXONOMY = [
+    ['Lebensmittel', 'Food', 'Mat'],
+    ['Tiere', 'Animals', 'Djur'],
+    ['Körper', 'Body', 'Kropp'],
+    ['Kleidung', 'Clothing', 'Kläder'],
+    ['Familie & Menschen', 'Family & People', 'Familj & människor'],
+    ['Beruf & Arbeit', 'Job & Work', 'Jobb & arbete'],
+    ['Haushalt', 'Household', 'Hushåll'],
+    ['Wohnen & Möbel', 'Home & Furniture', 'Hem & möbler'],
+    ['Natur & Pflanzen', 'Nature & Plants', 'Natur & växter'],
+    ['Wetter', 'Weather', 'Väder'],
+    ['Verkehr & Reisen', 'Transport & Travel', 'Transport & resor'],
+    ['Stadt & Gebäude', 'City & Buildings', 'Stad & byggnader'],
+    ['Schule & Bildung', 'School & Education', 'Skola & utbildning'],
+    ['Technik & Geräte', 'Technology & Devices', 'Teknik & apparater'],
+    ['Sport & Freizeit', 'Sports & Leisure', 'Sport & fritid'],
+    ['Gefühle', 'Emotions', 'Känslor'],
+    ['Farben', 'Colors', 'Färger'],
+    ['Zahlen & Zeit', 'Numbers & Time', 'Tal & tid'],
+    ['Werkzeuge', 'Tools', 'Verktyg'],
+    ['Eigenschaften', 'Properties', 'Egenskaper'],
+    ['Verben & Handlungen', 'Verbs & Actions', 'Verb & handlingar'],
+    ['Sonstiges', 'Other', 'Övrigt'],
+  ];
+  for (const [de, en, sv] of CATEGORY_TAXONOMY) {
+    await query(
+      `INSERT INTO categories (titel_de, titel_en, titel_sv, status, requested_at, published_at)
+       SELECT $1, $2, $3, 'published', NOW(), NOW()
+       WHERE NOT EXISTS (SELECT 1 FROM categories WHERE lower(titel_de) = lower($1))`,
+      [de, en, sv]
+    ).catch(() => {});
+  }
+  // Promote existing matches to published (e.g. the old "Tiere" category)
+  await query(
+    `UPDATE categories
+     SET status = 'published', published_at = COALESCE(published_at, NOW())
+     WHERE lower(titel_de) = ANY($1) AND status <> 'published'`,
+    [CATEGORY_TAXONOMY.map(([de]) => de.toLowerCase())]
+  ).catch(() => {});
+
+  // Asynchronous categorization batch (Message Batches API): remember its status across boots/redeploys
+  await query(`
+    CREATE TABLE IF NOT EXISTS category_batches (
+      batch_id   TEXT PRIMARY KEY,
+      status     TEXT NOT NULL DEFAULT 'submitted',
+      created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
+    )
+  `);
+
   await query(`
     CREATE TABLE IF NOT EXISTS questions (
       id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
```

```diff
@@ -331,6 +383,41 @@ async function migrate() {
     )
   `);
 
+  // M2M: pairs <-> categories, derived from the linked words (statements + objects).
+  // Materialized on publish (src/lib/pairCategories.js). Basis for the category points in the profile.
+  await query(`
+    CREATE TABLE IF NOT EXISTS pair_categories (
+      pair_id     UUID NOT NULL REFERENCES pairs(id) ON DELETE CASCADE,
+      category_id UUID NOT NULL REFERENCES categories(id) ON DELETE CASCADE,
+      PRIMARY KEY (pair_id, category_id)
+    )
+  `);
+
+  // Backfill: derive categories for already-published pairs. Idempotent (ON CONFLICT DO NOTHING);
+  // effectively a no-op after the first run, since new pairs materialize their categories on publish.
+  await query(`
+    INSERT INTO pair_categories (pair_id, category_id)
+    SELECT DISTINCT pid, category_id FROM (
+      SELECT p.id AS pid, wc.category_id
+      FROM pairs p
+      JOIN (
+        SELECT statement_id, word_id FROM statement_positive_words
+        UNION
+        SELECT statement_id, word_id FROM statement_negative_words
+      ) sw ON sw.statement_id IN (p.positive_statement_id, p.negative_statement_id)
+      JOIN word_categories wc ON wc.word_id = sw.word_id
+      WHERE p.status = 'published'
+      UNION
+      SELECT op.pair_id AS pid, wc.category_id
+      FROM object_pairs op
+      JOIN pairs p2 ON p2.id = op.pair_id AND p2.status = 'published'
+      JOIN object_words ow ON ow.object_id = op.object_id
+      JOIN word_categories wc ON wc.word_id = ow.word_id
+    ) src
+    WHERE category_id IS NOT NULL
+    ON CONFLICT (pair_id, category_id) DO NOTHING
+  `).catch(() => {});
+
   // pairs.answer_type → single TEXT (was TEXT[], now back to single value + new 'question' type)
   await query(`ALTER TABLE pairs DROP CONSTRAINT IF EXISTS pairs_answer_type_check`).catch(() => {});
   await query(`
```
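The backfill query encodes a simple rule: a pair receives every category of every word linked to its positive or negative statement, or to one of its objects. A hypothetical in-memory model of that rule follows. It is not the code in `src/lib/pairCategories.js`. `derivePairCategoryIds` and the Map-based lookups are illustrative, and `statementWords` stands for the union of `statement_positive_words` and `statement_negative_words`. The SQL additionally restricts the backfill to published pairs, which the sketch leaves to the caller.

```javascript
// Hypothetical in-memory model of the pair_categories derivation: a pair gets every
// category of every word linked to its positive/negative statement or to its objects.
// The lookups are Maps from id → array of ids.
function derivePairCategoryIds(pair, { statementWords, objectWords, wordCategories }) {
  const wordIds = new Set();
  for (const statementId of [pair.positiveStatementId, pair.negativeStatementId]) {
    for (const wordId of statementWords.get(statementId) ?? []) wordIds.add(wordId);
  }
  for (const objectId of pair.objectIds ?? []) {
    for (const wordId of objectWords.get(objectId) ?? []) wordIds.add(wordId);
  }
  const categoryIds = new Set();
  for (const wordId of wordIds) {
    for (const categoryId of wordCategories.get(wordId) ?? []) categoryIds.add(categoryId);
  }
  // Sorted for a stable result; the SQL version relies on DISTINCT + ON CONFLICT instead.
  return [...categoryIds].sort();
}
```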

```diff
@@ -444,6 +531,9 @@ async function migrate() {
     FOR EACH ROW EXECUTE FUNCTION update_updated_at()
   `);
 
+  // Greeting per language (in the language itself, e.g. sv = "Hej"), used for the personal salutation in the profile
+  await query(`ALTER TABLE languages ADD COLUMN IF NOT EXISTS greeting TEXT`).catch(() => {});
+
   // user_names
   await query(`
     CREATE TABLE IF NOT EXISTS user_names (
```

```diff
@@ -489,11 +579,25 @@ async function migrate() {
   // Full unique constraint (not partial) so ON CONFLICT works cleanly
   await query(`CREATE UNIQUE INDEX IF NOT EXISTS languages_short_en_idx ON languages (short_en)`).catch(() => {});
   await query(`
-    INSERT INTO languages (short_en, titel_de, titel_en, titel_sv, status, published_at)
+    INSERT INTO languages (short_en, titel_de, titel_en, titel_sv, greeting, status, published_at)
     VALUES
-      ('en', 'Englisch', 'English', 'Engelska', 'published', NOW()),
-      ('sv', 'Schwedisch', 'Swedish', 'Svenska', 'published', NOW())
-    ON CONFLICT (short_en) DO UPDATE SET status = EXCLUDED.status, published_at = COALESCE(languages.published_at, EXCLUDED.published_at)
+      ('en', 'Englisch', 'English', 'Engelska', 'Hi', 'published', NOW()),
+      ('sv', 'Schwedisch', 'Swedish', 'Svenska', 'Hej', 'published', NOW())
+    ON CONFLICT (short_en) DO UPDATE SET
+      status = EXCLUDED.status,
+      published_at = COALESCE(languages.published_at, EXCLUDED.published_at),
+      greeting = COALESCE(languages.greeting, EXCLUDED.greeting)
+  `).catch(() => {});
+  // Backfill the greeting robustly (the ON CONFLICT update above does not reliably take effect
+  // for already existing en/sv rows, so set it explicitly here, idempotently).
+  await query(`
+    UPDATE languages
+    SET greeting = CASE short_en
+      WHEN 'de' THEN 'Hallo'
+      WHEN 'en' THEN 'Hi'
+      WHEN 'sv' THEN 'Hej'
+    END
+    WHERE short_en IN ('de', 'en', 'sv') AND greeting IS NULL
   `).catch(() => {});
 
   // Seed bbox for watermelon test object (only if bbox_x is still NULL)
```

```diff
@@ -534,6 +638,31 @@ async function migrate() {
     FOR EACH ROW EXECUTE FUNCTION update_last_seen_at()
   `);
 
+  // user_daily_activity: per-day history for the streak calendar, weekly chart and daily goal
+  await query(`
+    CREATE TABLE IF NOT EXISTS user_daily_activity (
+      user_id       UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
+      activity_date DATE NOT NULL,
+      ep_earned     INTEGER NOT NULL DEFAULT 0,
+      cards_done    INTEGER NOT NULL DEFAULT 0,
+      correct_count INTEGER NOT NULL DEFAULT 0,
+      PRIMARY KEY (user_id, activity_date)
+    )
+  `);
+
+  // Daily goal (EP/day) on the app profile
+  await query(`ALTER TABLE users_public ADD COLUMN IF NOT EXISTS daily_goal_ep INTEGER NOT NULL DEFAULT 30`).catch(() => {});
+
+  // Unlocked achievements per user (one row per achievement, dedup-safe)
+  await query(`
+    CREATE TABLE IF NOT EXISTS user_achievements (
+      user_id         UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
+      achievement_key VARCHAR(40) NOT NULL,
+      unlocked_at     TIMESTAMPTZ NOT NULL DEFAULT NOW(),
+      PRIMARY KEY (user_id, achievement_key)
+    )
+  `);
+
   // audios
   await query(`
     CREATE TABLE IF NOT EXISTS audios (
```
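Because `user_daily_activity` holds one row per user and day, a streak can be derived from the set of active dates alone. This sketch rests on two assumptions: a streak is the number of consecutive days with a row, ending today, or ending yesterday if nothing has been recorded today yet. Dates are `'YYYY-MM-DD'` strings interpreted in UTC. The real day boundary depends on how `activity_date` is written. `currentStreak` is a hypothetical helper.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// 'YYYY-MM-DD' → whole days since the Unix epoch (UTC).
function dayNumber(isoDate) {
  return Math.floor(Date.parse(`${isoDate}T00:00:00Z`) / DAY_MS);
}

// Consecutive active days ending today, or ending yesterday if today has no row yet.
function currentStreak(activityDates, today) {
  const days = new Set(activityDates.map(dayNumber));
  let cursor = dayNumber(today);
  if (!days.has(cursor)) cursor -= 1; // assumption: today is still "open"
  let streak = 0;
  while (days.has(cursor)) {
    streak += 1;
    cursor -= 1;
  }
  return streak;
}
```

Counting on whole UTC day numbers, rather than on date strings, keeps month and leap-year boundaries correct without special cases.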
@@ -642,6 +771,140 @@ async function migrate() {
|
|||||||
ON CONFLICT (key) DO NOTHING
|
ON CONFLICT (key) DO NOTHING
|
||||||
`).catch(() => {});
|
`).catch(() => {});
|
||||||
|
|
||||||
|
// ── Brysbaert-Erweiterungen ─────────────────────────────────────────────────
|
||||||
|
|
||||||
|
// parent_id auf categories (self-referential, Oberkategorie → Unterkategorie)
|
||||||
|
await query(`ALTER TABLE categories ADD COLUMN IF NOT EXISTS parent_id UUID REFERENCES categories(id) ON DELETE SET NULL`).catch(() => {});
|
||||||
|
|
||||||
|
// Unterkategorien seeden. Die bestehenden 22 Einträge sind die Oberkategorien (parent_id = NULL).
|
||||||
|
const SUBCATEGORY_TAXONOMY = [
|
||||||
|
// Lebensmittel
|
||||||
|
['Obst', 'Fruit', 'Frukt', 'Lebensmittel'],
|
||||||
|
['Gemüse', 'Vegetables', 'Grönsaker', 'Lebensmittel'],
|
||||||
|
['Fleisch & Fisch', 'Meat & Fish', 'Kött & fisk', 'Lebensmittel'],
|
||||||
|
['Backwaren & Getreide', 'Baked Goods & Grains', 'Bröd & spannmål', 'Lebensmittel'],
|
||||||
|
['Milchprodukte', 'Dairy', 'Mejeriprodukter', 'Lebensmittel'],
|
||||||
|
['Getränke', 'Drinks', 'Drycker', 'Lebensmittel'],
|
||||||
|
['Gewürze & Kräuter', 'Spices & Herbs', 'Kryddor & örter', 'Lebensmittel'],
|
||||||
|
['Süßigkeiten & Snacks', 'Sweets & Snacks', 'Sötsaker & snacks', 'Lebensmittel'],
|
||||||
|
// Tiere
|
||||||
|
['Haustiere', 'Pets', 'Husdjur', 'Tiere'],
|
||||||
|
['Wildtiere', 'Wild Animals', 'Vilda djur', 'Tiere'],
|
||||||
|
['Vögel', 'Birds', 'Fåglar', 'Tiere'],
|
||||||
|
['Reptilien & Amphibien', 'Reptiles & Amphibians', 'Reptiler & amfibier', 'Tiere'],
|
||||||
|
['Insekten & Spinnen', 'Insects & Spiders', 'Insekter & spindlar', 'Tiere'],
|
||||||
|
['Meerestiere', 'Sea Animals', 'Havsdjur', 'Tiere'],
|
||||||
|
// Körper
|
||||||
|
['Kopf & Gesicht', 'Head & Face', 'Huvud & ansikte', 'Körper'],
|
||||||
|
['Rumpf', 'Torso', 'Bål', 'Körper'],
|
||||||
|
['Arme & Beine', 'Arms & Legs', 'Armar & ben', 'Körper'],
|
||||||
|
['Innere Organe', 'Internal Organs', 'Inre organ', 'Körper'],
|
||||||
|
['Körperpflege', 'Personal Care', 'Kroppsvård', 'Körper'],
|
||||||
|
// Kleidung
|
||||||
|
['Oberbekleidung', 'Tops & Outerwear', 'Överkläder', 'Kleidung'],
|
||||||
|
['Unterbekleidung', 'Underwear', 'Underkläder', 'Kleidung'],
|
||||||
|
['Kopfbedeckung', 'Headwear', 'Huvudbonader', 'Kleidung'],
|
||||||
|
['Schuhe & Socken', 'Shoes & Socks', 'Skor & strumpor', 'Kleidung'],
|
||||||
|
['Accessoires', 'Accessories', 'Accessoarer', 'Kleidung'],
|
||||||
|
// Familie & Menschen
|
||||||
|
['Familienmitglieder', 'Family Members', 'Familjemedlemmar', 'Familie & Menschen'],
|
||||||
|
['Berufe & Titel', 'Professions & Titles', 'Yrken & titlar', 'Familie & Menschen'],
|
||||||
|
['Beziehungen', 'Relationships', 'Relationer', 'Familie & Menschen'],
|
||||||
|
// Haushalt
|
||||||
|
['Küchenutensilien', 'Kitchen Utensils', 'Köksredskap', 'Haushalt'],
|
||||||
|
['Reinigung & Pflege', 'Cleaning & Care', 'Rengöring & vård', 'Haushalt'],
|
||||||
|
['Verpackung & Behälter', 'Packaging & Containers', 'Förpackningar & behållare','Haushalt'],
|
||||||
|
// Wohnen & Möbel
|
||||||
|
['Zimmer & Räume', 'Rooms & Spaces', 'Rum & utrymmen', 'Wohnen & Möbel'],
|
||||||
|
['Möbel', 'Furniture', 'Möbler', 'Wohnen & Möbel'],
|
||||||
|
['Beleuchtung & Elektro', 'Lighting & Electronics', 'Belysning & el', 'Wohnen & Möbel'],
|
||||||
|
// Natur & Pflanzen
|
||||||
|
['Pflanzen & Blumen', 'Plants & Flowers', 'Växter & blommor', 'Natur & Pflanzen'],
|
||||||
|
['Bäume & Sträucher', 'Trees & Shrubs', 'Träd & buskar', 'Natur & Pflanzen'],
|
||||||
|
['Landschaftsmerkmale', 'Landscape Features', 'Landskapsdrag', 'Natur & Pflanzen'],
|
||||||
|
['Gesteine & Böden', 'Rocks & Soils', 'Stenar & jordar', 'Natur & Pflanzen'],
|
||||||
|
// Verkehr & Reisen
|
||||||
|
['Fahrzeuge (Land)', 'Land Vehicles', 'Landfordon', 'Verkehr & Reisen'],
|
||||||
|
['Fahrzeuge (Wasser & Luft)', 'Water & Air Vehicles', 'Vatten- & luftfordon', 'Verkehr & Reisen'],
|
||||||
|
['Straße & Infrastruktur', 'Roads & Infrastructure', 'Vägar & infrastruktur', 'Verkehr & Reisen'],
|
||||||
|
// Stadt & Gebäude
|
||||||
|
['Gebäude & Orte', 'Buildings & Places', 'Byggnader & platser', 'Stadt & Gebäude'],
|
||||||
|
['Innenräume & Bereiche', 'Indoor Spaces & Areas', 'Inomhusutrymmen', 'Stadt & Gebäude'],
|
||||||
|
// Technik & Geräte
|
||||||
|
['Haushaltsgeräte', 'Household Appliances', 'Hushållsapparater', 'Technik & Geräte'],
|
||||||
|
['Elektronik & Computer', 'Electronics & Computers', 'Elektronik & datorer', 'Technik & Geräte'],
|
||||||
|
['Werkzeuge & Maschinen', 'Tools & Machines', 'Verktyg & maskiner', 'Technik & Geräte'],
|
||||||
|
// Sport & Freizeit
|
||||||
|
['Sport & Bewegung', 'Sports & Exercise', 'Sport & rörelse', 'Sport & Freizeit'],
|
||||||
|
['Spiele & Spielzeug', 'Games & Toys', 'Spel & leksaker', 'Sport & Freizeit'],
|
||||||
|
['Kunst & Musik', 'Arts & Music', 'Konst & musik', 'Sport & Freizeit'],
|
||||||
|
];
|
||||||
|
for (const [de, en, sv, parentDe] of SUBCATEGORY_TAXONOMY) {
|
||||||
|
await query(
|
||||||
|
`INSERT INTO categories (titel_de, titel_en, titel_sv, status, published_at, parent_id)
|
||||||
|
SELECT $1, $2, $3, 'published', NOW(),
|
||||||
|
(SELECT id FROM categories WHERE lower(titel_de) = lower($4) AND parent_id IS NULL LIMIT 1)
|
||||||
|
WHERE NOT EXISTS (SELECT 1 FROM categories WHERE lower(titel_de) = lower($1))`,
|
||||||
|
      [de, en, sv, parentDe]
    ).catch(() => {});
  }

  // New columns on words (Brysbaert import + enrichment)
  await query(`ALTER TABLE words ADD COLUMN IF NOT EXISTS conc_m NUMERIC(4,2)`).catch(() => {});
  await query(`ALTER TABLE words ADD COLUMN IF NOT EXISTS dom_pos VARCHAR(20)`).catch(() => {});
  await query(`ALTER TABLE words ADD COLUMN IF NOT EXISTS level VARCHAR(5)`).catch(() => {});
  await query(`ALTER TABLE words ADD COLUMN IF NOT EXISTS themenfeld_id UUID`).catch(() => {});
  await query(`ALTER TABLE words ADD CONSTRAINT words_themenfeld_id_fkey FOREIGN KEY (themenfeld_id) REFERENCES categories(id) ON DELETE SET NULL`).catch(() => {});
  await query(`ALTER TABLE words DROP CONSTRAINT IF EXISTS words_dom_pos_check`).catch(() => {});
  await query(`ALTER TABLE words ADD CONSTRAINT words_dom_pos_check CHECK (dom_pos IN ('noun', 'verb', 'adjective', 'other'))`).catch(() => {});
  await query(`ALTER TABLE words DROP CONSTRAINT IF EXISTS words_level_check`).catch(() => {});
  await query(`ALTER TABLE words ADD CONSTRAINT words_level_check CHECK (level IN ('A1', 'A2', 'B1'))`).catch(() => {});

  // Unique index on titel_en: required for ON CONFLICT in the CSV import.
  // Partial (WHERE titel_en IS NOT NULL), so rows without an English title are left out of the index.
  // First remove duplicate non-null titel_en values (keeping the most recently created row), then create the index.
  await query(`
    DELETE FROM words w
    USING (
      SELECT titel_en, MAX(created_at) AS keep_at
      FROM words WHERE titel_en IS NOT NULL
      GROUP BY titel_en HAVING COUNT(*) > 1
    ) dup
    WHERE w.titel_en = dup.titel_en AND w.created_at < dup.keep_at
  `).catch(() => {});
  await query(
    `CREATE UNIQUE INDEX IF NOT EXISTS words_titel_en_key ON words (titel_en) WHERE titel_en IS NOT NULL`
  );

  // enrich_batches: status tracking for word-enrichment batches (analogous to category_batches)
  await query(`
    CREATE TABLE IF NOT EXISTS enrich_batches (
      batch_id TEXT PRIMARY KEY,
      status TEXT NOT NULL DEFAULT 'submitted',
      created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
    )
  `);

  // word_generative: pipeline for AI-generated word images
  await query(`
    CREATE TABLE IF NOT EXISTS word_generative (
      id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
      word_id UUID NOT NULL REFERENCES words(id) ON DELETE CASCADE,
      prompt TEXT,
      status VARCHAR(20) NOT NULL DEFAULT 'pending'
        CHECK (status IN ('pending', 'generating', 'generated', 'accepted', 'rejected')),
      picture_link TEXT,
      created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
      updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
    )
  `);
  await query(`
    DROP TRIGGER IF EXISTS word_generative_updated_at ON word_generative;
    CREATE TRIGGER word_generative_updated_at
      BEFORE UPDATE ON word_generative
      FOR EACH ROW EXECUTE FUNCTION update_updated_at()
  `);

  // ── Migrate old {{uuid}} placeholders → new {{label.w:uuid}} / {{label.o:uuid}} ──
  await migratePlaceholders();

@@ -731,4 +994,135 @@ async function migratePlaceholders() {
  if (count > 0) console.log(`Placeholder migration: updated ${count} rows`);
}

// ── Prompt styles & picture jobs ──────────────────────────────────────────────

async function migratePromptStyles() {
  await query(`
    CREATE TABLE IF NOT EXISTS prompt_styles (
      id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
      type VARCHAR(20) NOT NULL CHECK (type IN ('fix', 'atmosphere', 'setting')),
      kategorie_id UUID,
      text_en TEXT NOT NULL
    )
  `);

  // Rename themenfeld_id → kategorie_id (idempotent)
  await query(`ALTER TABLE prompt_styles RENAME COLUMN themenfeld_id TO kategorie_id`).catch(() => {});

  // Retrofit the FK to categories (idempotent)
  await query(`
    ALTER TABLE prompt_styles
      ADD CONSTRAINT prompt_styles_kategorie_fk
      FOREIGN KEY (kategorie_id) REFERENCES categories(id) ON DELETE SET NULL
  `).catch(() => {});

  // Seed data from prompt_styles.csv (idempotent per id; kategorie_id starts out null)
  const seeds = [
    { id: 'b0f5c2a4-a95d-426f-a01c-0edc53e719b8', type: 'fix', text_en: 'hyperrealistic photography, natural unposed moment, shot on Canon EOS R5, ambient natural light, no color grading, razor sharp details, photorealistic textures, each object clearly visible and spatially separated, 8k' },
    { id: '62015070-1fbe-40b8-b293-8c39ae5994c3', type: 'atmosphere', text_en: 'misty autumn morning, golden hour light breaking through cool gray clouds, frost on the ground, dew on surfaces' },
    { id: 'd644f215-25b9-49be-87ea-629d7d8acb78', type: 'atmosphere', text_en: 'bright summer midday, harsh direct sunlight, vivid colors, dry warm air' },
    { id: 'da0a5339-37f5-47be-ba63-1fbf6c1e9f90', type: 'atmosphere', text_en: 'overcast spring day, soft diffused light, fresh green tones, slightly cool atmosphere' },
    { id: '11a8edb4-90a3-48a8-8407-31056644b55a', type: 'atmosphere', text_en: 'golden winter afternoon, low sun casting long shadows, bare trees, cold crisp air' },
    { id: '97bad727-6555-4f48-9a68-17dd5ce85535', type: 'atmosphere', text_en: 'early morning blue hour, soft cool light, calm and quiet atmosphere, slight fog' },
    { id: '6de167ef-5a87-4333-9325-cc31ccd9db05', type: 'atmosphere', text_en: 'warm summer evening, golden orange glow, long shadows, relaxed atmosphere' },
    { id: '082cc098-4c26-4d9a-b3a1-209dd9e507ea', type: 'setting', text_en: 'open green meadow with wooden fence, rolling hills in soft background, natural habitat' },
    { id: 'f0ef007a-c763-4c40-99c0-1bd17901739e', type: 'setting', text_en: 'dense forest edge with dappled light, mossy ground, wild and untouched environment' },
    { id: 'b809f859-2592-4207-8111-7da05e7057c9', type: 'setting', text_en: 'cozy living room corner, warm home environment, soft natural light from window' },
    { id: '28dac228-c335-46d2-9b40-481dc9e2b373', type: 'setting', text_en: 'shallow clear river bank, rocky ground, water reflections, natural wetland' },
    { id: '89cfbdf7-7fbc-439a-9265-73f18124e372', type: 'setting', text_en: 'rustic wooden kitchen counter, natural light from nearby window, linen cloth underneath' },
    { id: 'e7faf2ec-78e1-43bc-b870-c363f7ec2032', type: 'setting', text_en: 'outdoor farmers market stall, weathered wooden crates, morning light, earthy atmosphere' },
    { id: '45dc2aee-d223-4952-943d-cdbe86b7e8c3', type: 'setting', text_en: 'garden harvest scene, soil and greenery visible, freshly picked produce on ground' },
    { id: '5589aa12-ee74-4041-9443-40e9cfa538fd', type: 'setting', text_en: 'simple white kitchen table, clean minimal background, soft indoor daylight' },
    { id: '738365f1-b000-4dde-8e99-9b90f6984b79', type: 'setting', text_en: 'neutral light studio setting, clean background, soft natural sidelight, medical clarity' },
    { id: '98f1c118-b333-43ba-9167-870af883b5ae', type: 'setting', text_en: 'warm bathroom environment, mirror and soft light, everyday personal care setting' },
    { id: '2b81a5c9-7328-41e9-b08e-0d98d9a5c78f', type: 'setting', text_en: 'flat lay on light wooden surface, natural window light, clean and minimal styling' },
    { id: '2a3a4eed-ba32-4b21-8dad-1cf5679b00fb', type: 'setting', text_en: 'cozy bedroom setting, clothes laid out on bed, soft morning light' },
    { id: 'c816e95e-5edc-4ae9-8c0d-9c71a5a4dfb6', type: 'setting', text_en: 'outdoor market rack, hangers visible, casual everyday atmosphere' },
    { id: '33af0241-c19d-4429-91b5-0359c1f973e4', type: 'setting', text_en: 'warm living room, family home atmosphere, soft afternoon light through curtains' },
    { id: '153e70c4-f011-42af-ba0f-8ab82bf920ab', type: 'setting', text_en: 'outdoor garden or backyard, relaxed family setting, natural daylight' },
    { id: '9fe7fc4a-6578-4ee0-8a8e-a885e89e58c1', type: 'setting', text_en: 'bright kitchen countertop, clean and organized, natural window light' },
    { id: '46dab63b-7b3d-45e7-9ea9-4a4a67e9fabd', type: 'setting', text_en: 'utility room or bathroom shelf, everyday cleaning supplies visible, practical setting' },
    { id: '28246e90-4ac8-444f-be23-de401365d38d', type: 'setting', text_en: 'cozy Scandinavian living room, warm tones, natural materials, soft indirect light' },
    { id: '5143c10f-d717-4698-88f5-f1598d0eeef9', type: 'setting', text_en: 'bright airy bedroom, white walls, minimal furniture, morning sunlight' },
    { id: 'd23d7050-dc22-4226-8a5b-79e75f11de8b', type: 'setting', text_en: 'open countryside landscape, wide sky, natural untouched terrain, peaceful atmosphere' },
    { id: '34c6a784-7a32-4f84-a06d-f546c9c9fbea', type: 'setting', text_en: 'forest floor close-up, mossy rocks, fallen leaves, soft filtered light through canopy' },
    { id: '1fc61dd9-57c6-4eba-8328-37cbf5fc135e', type: 'setting', text_en: 'garden bed with rich dark soil, plants at various growth stages, earthy tones' },
    { id: '3244f090-f2a2-4806-875a-88038598fc5e', type: 'setting', text_en: 'quiet suburban street, cobblestone or asphalt road, parked vehicles, everyday scene' },
    { id: '36d80c19-13ea-4672-b2e9-8ceedb4ab178', type: 'setting', text_en: 'rural road with open fields, minimal traffic, wide sky, natural light' },
    { id: '98957b0a-f415-4282-9b3d-863a9bf03a77', type: 'setting', text_en: 'busy European city street, historic buildings in background, natural daylight' },
    { id: '66fa361a-e062-4adc-9c9a-3e01ac8dbbe0', type: 'setting', text_en: 'quiet town square, fountain or bench visible, calm everyday atmosphere' },
    { id: '2dba4303-c743-419f-a7e8-06b6d54ba91d', type: 'setting', text_en: 'clean modern workspace, desk surface, natural sidelight, organized tools' },
    { id: 'a78df43b-8897-40dd-9ccf-de29ff9bf5da', type: 'setting', text_en: 'garage or workshop setting, workbench with tools, practical everyday environment' },
    { id: '949774d1-0678-4683-9b8e-e5568f648ba8', type: 'setting', text_en: 'outdoor park or sports field, open space, natural daylight, active atmosphere' },
    { id: '9b35a717-03dd-41aa-a60e-90dff8bc5aaf', type: 'setting', text_en: 'cozy indoor hobby room, soft warm light, creative materials visible' },
  ];

  for (const s of seeds) {
    await query(
      `INSERT INTO prompt_styles (id, type, text_en)
       SELECT $1, $2, $3
       WHERE NOT EXISTS (SELECT 1 FROM prompt_styles WHERE id = $1)`,
      [s.id, s.type, s.text_en]
    ).catch(() => {});
  }

  // Fill kategorie_id by category name (idempotent, independent of category UUIDs)
  const THEME_MAP = [
    { en: 'Animals', ids: ['082cc098-4c26-4d9a-b3a1-209dd9e507ea', 'f0ef007a-c763-4c40-99c0-1bd17901739e', 'b809f859-2592-4207-8111-7da05e7057c9', '28dac228-c335-46d2-9b40-481dc9e2b373'] },
    { en: 'Food', ids: ['89cfbdf7-7fbc-439a-9265-73f18124e372', 'e7faf2ec-78e1-43bc-b870-c363f7ec2032', '45dc2aee-d223-4952-943d-cdbe86b7e8c3', '5589aa12-ee74-4041-9443-40e9cfa538fd'] },
    { en: 'Body', ids: ['738365f1-b000-4dde-8e99-9b90f6984b79', '98f1c118-b333-43ba-9167-870af883b5ae'] },
    { en: 'Clothing', ids: ['2b81a5c9-7328-41e9-b08e-0d98d9a5c78f', '2a3a4eed-ba32-4b21-8dad-1cf5679b00fb', 'c816e95e-5edc-4ae9-8c0d-9c71a5a4dfb6'] },
    { en: 'Family & People', ids: ['33af0241-c19d-4429-91b5-0359c1f973e4', '153e70c4-f011-42af-ba0f-8ab82bf920ab'] },
    { en: 'Household', ids: ['9fe7fc4a-6578-4ee0-8a8e-a885e89e58c1', '46dab63b-7b3d-45e7-9ea9-4a4a67e9fabd'] },
    { en: 'Home & Furniture', ids: ['28246e90-4ac8-444f-be23-de401365d38d', '5143c10f-d717-4698-88f5-f1598d0eeef9'] },
    { en: 'Nature & Plants', ids: ['d23d7050-dc22-4226-8a5b-79e75f11de8b', '34c6a784-7a32-4f84-a06d-f546c9c9fbea', '1fc61dd9-57c6-4eba-8328-37cbf5fc135e'] },
    { en: 'Transport & Travel', ids: ['3244f090-f2a2-4806-875a-88038598fc5e', '36d80c19-13ea-4672-b2e9-8ceedb4ab178'] },
    { en: 'City & Buildings', ids: ['98957b0a-f415-4282-9b3d-863a9bf03a77', '66fa361a-e062-4adc-9c9a-3e01ac8dbbe0'] },
    { en: 'Tools', ids: ['2dba4303-c743-419f-a7e8-06b6d54ba91d', 'a78df43b-8897-40dd-9ccf-de29ff9bf5da'] },
    { en: 'Sports & Leisure', ids: ['949774d1-0678-4683-9b8e-e5568f648ba8', '9b35a717-03dd-41aa-a60e-90dff8bc5aaf'] },
  ];

  for (const { en, ids } of THEME_MAP) {
    await query(
      `UPDATE prompt_styles
       SET kategorie_id = (SELECT id FROM categories WHERE lower(titel_en) = lower($1) LIMIT 1)
       WHERE id = ANY($2::uuid[])
         AND kategorie_id IS DISTINCT FROM
             (SELECT id FROM categories WHERE lower(titel_en) = lower($1) LIMIT 1)`,
      [en, ids]
    ).catch(() => {});
  }
}
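
Because every seed and mapping query above ends in `.catch(() => {})`, a mistyped UUID in `THEME_MAP` would simply match no row and go unnoticed. A minimal sketch of an offline consistency check, assuming the `seeds` and `THEME_MAP` shapes used in `migratePromptStyles()`; the helper name `checkThemeMap` is hypothetical, and the rule that only `setting` styles carry a category is an assumption inferred from the current lists, not something the migration enforces.

```javascript
// Hypothetical helper (not part of db-migrate.js): cross-checks a THEME_MAP against the seed list.
// Assumed shapes, as in migratePromptStyles():
//   seeds:    [{ id, type, text_en }]
//   themeMap: [{ en, ids: [seedId, ...] }]
function checkThemeMap(seeds, themeMap) {
  const seedTypes = new Map(seeds.map(s => [s.id, s.type]));
  const unknownIds = []; // ids in the map that match no seed row
  const nonSetting = []; // ids pointing at a 'fix' or 'atmosphere' style (assumed to be unintended)
  const seenTwice = [];  // ids assigned to more than one category (the later UPDATE would win)
  const seen = new Set();

  for (const { ids } of themeMap) {
    for (const id of ids) {
      if (!seedTypes.has(id)) unknownIds.push(id);
      else if (seedTypes.get(id) !== 'setting') nonSetting.push(id);
      if (seen.has(id)) seenTwice.push(id);
      seen.add(id);
    }
  }
  return { unknownIds, nonSetting, seenTwice };
}
```

Running it once against the real `seeds` and `THEME_MAP` (for example from a throwaway script) would flag a typo before it silently leaves a style without a category.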

async function migratePictureJobs() {
  await query(`
    CREATE TABLE IF NOT EXISTS picture_jobs (
      id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
      kategorie_id UUID REFERENCES categories(id) ON DELETE SET NULL,
      prompt_fix UUID REFERENCES prompt_styles(id) ON DELETE SET NULL,
      prompt_atmosphere UUID REFERENCES prompt_styles(id) ON DELETE SET NULL,
      prompt_setting UUID REFERENCES prompt_styles(id) ON DELETE SET NULL,
      prompt_final TEXT,
      status VARCHAR(20) NOT NULL DEFAULT 'pending'
        CHECK (status IN ('pending', 'generating', 'done', 'failed')),
      picture_id UUID REFERENCES pictures(id) ON DELETE SET NULL,
      created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
    )
  `);

  await query(`
    CREATE TABLE IF NOT EXISTS picture_job_words (
      picture_job_id UUID NOT NULL REFERENCES picture_jobs(id) ON DELETE CASCADE,
      word_id UUID NOT NULL REFERENCES words(id) ON DELETE CASCADE,
      PRIMARY KEY (picture_job_id, word_id)
    )
  `);
}

async function migrate() {
  await migrateCore();
  await migratePromptStyles();
  await migratePictureJobs();
}

module.exports = migrate;
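
The partial unique index `words_titel_en_key` exists so that the CSV import can use `ON CONFLICT`. With a partial index, PostgreSQL only infers it as the conflict arbiter if the `ON CONFLICT` clause repeats the index predicate (`WHERE titel_en IS NOT NULL`). A minimal sketch of a parameterized multi-row upsert builder; the function name `buildWordUpsert`, the chosen column list and the overwrite-on-conflict semantics are assumptions for illustration, not the project's actual import code, and it assumes all other `words` columns are nullable or have defaults.

```javascript
// Hypothetical sketch: multi-row upsert into words, keyed on titel_en.
// Column names come from the migration above; which columns an import really writes is an assumption.
const UPSERT_COLUMNS = ['titel_en', 'conc_m', 'dom_pos', 'level'];

function buildWordUpsert(rows) {
  if (!rows.length) throw new Error('no rows');
  // PostgreSQL rejects a DO UPDATE that hits the same row twice in one statement,
  // so duplicate titel_en values must be resolved before building the query.
  const keys = rows.map(r => r.titel_en).filter(k => k != null);
  if (new Set(keys).size !== keys.length) throw new Error('duplicate titel_en in one statement');

  const params = [];
  const tuples = rows.map(row => {
    const placeholders = UPSERT_COLUMNS.map(col => {
      params.push(row[col] ?? null); // missing values are sent as NULL
      return `$${params.length}`;
    });
    return `(${placeholders.join(', ')})`;
  });
  // Every column except the conflict key is overwritten with the incoming value.
  const updates = UPSERT_COLUMNS
    .filter(col => col !== 'titel_en')
    .map(col => `${col} = EXCLUDED.${col}`)
    .join(', ');
  const text =
    `INSERT INTO words (${UPSERT_COLUMNS.join(', ')}) VALUES ${tuples.join(', ')} ` +
    // Repeating the index predicate lets PostgreSQL infer the partial index words_titel_en_key.
    `ON CONFLICT (titel_en) WHERE titel_en IS NOT NULL DO UPDATE SET ${updates}`;
  return { text, params };
}
```

The result plugs directly into the project's `query(text, params)`. Rows with a `NULL` `titel_en` never conflict and are always inserted as new words.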

15
src/index.js
Normal file → Executable file
@@ -44,6 +44,9 @@ app.use('/api/audios', auth, require('./routes/audios'));
app.use('/api/tts-settings', auth, require('./routes/tts-settings'));
app.use('/api/claude', auth, require('./routes/claude'));
app.use('/api/pipeline', auth, require('./routes/pipeline'));
app.use('/api/word-generative', auth, require('./routes/wordGenerative'));
app.use('/api/prompt-styles', auth, require('./routes/prompt-styles'));
app.use('/api/picture-jobs', auth, require('./routes/picture-jobs'));

// 404
app.use((req, res) => {
@@ -62,5 +65,17 @@ migrate()
    // Resume stuck pipeline runs (e.g. after a redeploy)
    require('./lib/pipeline').resumePending()
      .catch(err => console.error('Pipeline-Resume fehlgeschlagen:', err));

    // Automatic word categorization (Message Batches API): shortly after boot (30 s), then hourly.
    // Submit/collect ticks, decoupled from generate-words and publishing.
    // Word enrichment (runEnrichTick) is scheduled the same way, starting after 60 s.
    const { runCategorizationTick } = require('./lib/classifyWords');
    const { runEnrichTick } = require('./lib/enrichWords');
    const HOUR = 60 * 60 * 1000;
    const tick = () => runCategorizationTick().catch(err => console.error('Auto-Kategorisierung:', err.message));
    const enrichTick = () => runEnrichTick().catch(err => console.error('Auto-Anreicherung:', err.message));
    setTimeout(tick, 30_000);
    setTimeout(enrichTick, 60_000);
    setInterval(tick, HOUR);
    setInterval(enrichTick, HOUR);
  })
  .catch(err => { console.error('Migration failed:', err); process.exit(1); });

70
src/lib/achievements.js
Normal file
@@ -0,0 +1,70 @@
const { query } = require('../db');
const { levelForEp } = require('./leveling');

// Achievement definitions. check(s) receives the user's aggregated stats.
const DEFS = [
  { key: 'first_card',      label: 'Erster Schritt',   icon: '🌱', check: s => s.total_cards >= 1 },
  { key: 'cards_50',        label: '50 Karten',        icon: '📦', check: s => s.total_cards >= 50 },
  { key: 'cards_100',       label: '100 Karten',       icon: '💯', check: s => s.total_cards >= 100 },
  { key: 'streak_3',        label: '3 Tage am Stück',  icon: '🔥', check: s => s.streak_days >= 3 },
  { key: 'streak_7',        label: '7 Tage am Stück',  icon: '🔥', check: s => s.streak_days >= 7 },
  { key: 'streak_30',       label: '30 Tage am Stück', icon: '🏅', check: s => s.streak_days >= 30 },
  { key: 'level_5',         label: 'Level 5',          icon: '⭐', check: s => s.level >= 5 },
  { key: 'level_10',        label: 'Level 10',         icon: '🌟', check: s => s.level >= 10 },
  { key: 'category_master', label: 'Themen-Meister',   icon: '🏆', check: s => s.max_cat >= 25 },
];
const BY_KEY = Object.fromEntries(DEFS.map(d => [d.key, d]));

// Aggregated stats for the achievement checks (a single query).
async function aggregates(userId, known = {}) {
  const r = await query(
    `SELECT
       COALESCE((SELECT SUM(seen_count) FROM user_pair_progress WHERE user_id = $1), 0)::int AS total_cards,
       COALESCE((SELECT MAX(pts) FROM (
         SELECT SUM(upp.earned_points) AS pts
         FROM user_pair_progress upp
         JOIN pair_categories pc ON pc.pair_id = upp.pair_id
         WHERE upp.user_id = $1
         GROUP BY pc.category_id
       ) s), 0)::int AS max_cat`,
    [userId]
  );
  return { total_cards: r.rows[0].total_cards, max_cat: r.rows[0].max_cat, ...known };
}

// Evaluates achievements and unlocks new ones. Returns ONLY the newly unlocked ones
// (ON CONFLICT DO NOTHING … RETURNING yields only the rows that were actually inserted).
async function evaluateAchievements(userId, { total_ep, streak_days }) {
  const level = levelForEp(total_ep || 0);
  const agg = await aggregates(userId, { total_ep, streak_days, level });
  const satisfied = DEFS.filter(d => d.check(agg)).map(d => d.key);
  if (!satisfied.length) return [];
  const values = satisfied.map((_, i) => `($1, $${i + 2})`).join(', ');
  const r = await query(
    `INSERT INTO user_achievements (user_id, achievement_key)
     VALUES ${values}
     ON CONFLICT (user_id, achievement_key) DO NOTHING
     RETURNING achievement_key`,
    [userId, ...satisfied]
  );
  return r.rows.map(row => {
    const d = BY_KEY[row.achievement_key];
    return { key: d.key, label: d.label, icon: d.icon };
  });
}
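
`evaluateAchievements()` writes every satisfied key in a single `INSERT`, sharing `$1` (the user id) across all rows and numbering the keys from `$2`. A standalone sketch of just that query construction, so the placeholder numbering can be checked without a database; the name `buildAchievementInsert` is hypothetical and the function is not part of achievements.js.

```javascript
// Hypothetical standalone version of the VALUES construction in evaluateAchievements():
// $1 is reused by every row (the user id); the achievement keys follow as $2, $3, ...
function buildAchievementInsert(userId, keys) {
  if (!keys.length) return null; // evaluateAchievements() returns [] early in this case
  const values = keys.map((_, i) => `($1, $${i + 2})`).join(', ');
  return {
    text:
      `INSERT INTO user_achievements (user_id, achievement_key) VALUES ${values} ` +
      `ON CONFLICT (user_id, achievement_key) DO NOTHING RETURNING achievement_key`,
    params: [userId, ...keys],
  };
}
```

Re-sending already unlocked keys is harmless: the conflicting rows are skipped, and `RETURNING` only reports the ones that were new.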

// All achievements with their unlock status (for the profile section).
async function listAchievements(userId) {
  const r = await query(
    `SELECT achievement_key, unlocked_at FROM user_achievements WHERE user_id = $1`,
    [userId]
  );
  const unlocked = new Map(r.rows.map(x => [x.achievement_key, x.unlocked_at]));
  return DEFS.map(d => ({
    key: d.key, label: d.label, icon: d.icon,
    unlocked: unlocked.has(d.key),
    unlocked_at: unlocked.get(d.key) || null,
  }));
}

module.exports = { evaluateAchievements, listAchievements, DEFS };

309
src/lib/classifyWords.js
Normal file
@@ -0,0 +1,309 @@
// Automatic word categorization via the Anthropic Message Batches API (asynchronous, ~50% cheaper).
// Decoupled from the generate-words prompt and the publish flow: an hourly job (src/index.js)
// finds words that are used in pairs but have no category yet, has Haiku classify them
// against the fixed taxonomy (src/db-migrate.js), and then materializes pair_categories.
const { query } = require('../db');
const { resolvePlaceholdersToLabels } = require('./placeholders');
const { derivePairCategories } = require('./pairCategories');

const ANTHROPIC_BASE = 'https://api.anthropic.com';
const MODEL = 'claude-haiku-4-5-20251001';
const BATCH_LIMIT = 500; // max. words per submit (the Batches API allows up to 100k requests per batch)
const MAX_EXAMPLES = 3;

let running = false; // overlap guard between ticks

function headers() {
  const apiKey = process.env.ANTHROPIC_API_KEY;
  if (!apiKey) throw new Error('ANTHROPIC_API_KEY nicht konfiguriert');
  return { 'Content-Type': 'application/json', 'x-api-key': apiKey, 'anthropic-version': '2023-06-01' };
}

// Load published categories → Map (lower(titel_de | titel_en) → category row {id, titel_de, titel_en}),
// plus the rows themselves for the prompt's list of names.
async function loadCategories() {
  const r = await query(`SELECT id, titel_de, titel_en FROM categories WHERE status = 'published'`);
  const byName = new Map();
  for (const c of r.rows) {
    if (c.titel_de) byName.set(c.titel_de.toLowerCase(), c);
    if (c.titel_en) byName.set(c.titel_en.toLowerCase(), c);
  }
  return { rows: r.rows, byName };
}
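
The `byName` map from `loadCategories()` is what turns the model's free-text answer into a category row: `collectBatch()` looks the name up case-insensitively and falls back to the category named "Sonstiges" (which only works if such a published category exists). A minimal pure sketch of that lookup; `buildCategoryIndex` and `resolveCategory` are hypothetical names, and the sample categories in the test are illustrative only.

```javascript
// Hypothetical pure sketch mirroring loadCategories() + the lookup in collectBatch().
function buildCategoryIndex(rows) {
  const byName = new Map();
  for (const c of rows) {
    // German and English titles share one map; a later row wins if two titles collide.
    if (c.titel_de) byName.set(c.titel_de.toLowerCase(), c);
    if (c.titel_en) byName.set(c.titel_en.toLowerCase(), c);
  }
  return byName;
}

function resolveCategory(byName, name) {
  const fallback = byName.get('sonstiges') || null; // null if no "Sonstiges" category is published
  return (name && byName.get(name.toLowerCase())) || fallback;
}
```

Because unknown or unparseable answers land in the fallback, a word is only left uncategorized when neither a match nor a "Sonstiges" category exists.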

// Words without a category that are used in pairs (via statements or objects).
async function findUncategorizedUsedWords(limit = BATCH_LIMIT) {
  const r = await query(
    `SELECT w.id, w.titel_de, w.titel_en, w.titel_sv
     FROM words w
     WHERE NOT EXISTS (SELECT 1 FROM word_categories wc WHERE wc.word_id = w.id)
       AND (
         EXISTS (SELECT 1 FROM statement_positive_words spw WHERE spw.word_id = w.id)
         OR EXISTS (SELECT 1 FROM statement_negative_words snw WHERE snw.word_id = w.id)
         OR EXISTS (SELECT 1 FROM object_words ow WHERE ow.word_id = w.id)
       )
       AND COALESCE(w.titel_de, w.titel_en, w.titel_sv) IS NOT NULL
     ORDER BY w.created_at DESC
     LIMIT $1`,
    [limit]
  );
  return r.rows;
}

// Up to `max` English example sentences containing the word (tokens → labels, without uuid).
async function examplesForWord(wordId, max = MAX_EXAMPLES) {
  const out = [];
  const seen = new Set();
  const push = (s) => {
    const t = resolvePlaceholdersToLabels(s || '').trim();
    if (t && !seen.has(t.toLowerCase())) { seen.add(t.toLowerCase()); out.push(t); }
  };

  const stmt = await query(
    `SELECT s.positive_sentence_en AS s
     FROM statement_positive_words spw JOIN statements s ON s.id = spw.statement_id
     WHERE spw.word_id = $1 AND s.positive_sentence_en IS NOT NULL
     UNION
     SELECT s.negative_sentence_en
     FROM statement_negative_words snw JOIN statements s ON s.id = snw.statement_id
     WHERE snw.word_id = $1 AND s.negative_sentence_en IS NOT NULL
     LIMIT 10`,
    [wordId]
  );
  for (const row of stmt.rows) { push(row.s); if (out.length >= max) return out; }

  const qs = await query(
    `SELECT DISTINCT q.sentence_en AS s
     FROM object_words ow
     JOIN object_pairs op ON op.object_id = ow.object_id
     JOIN pairs p ON p.id = op.pair_id
     JOIN questions q ON q.id = p.question_id
     WHERE ow.word_id = $1 AND q.sentence_en IS NOT NULL
     LIMIT 10`,
    [wordId]
  );
  for (const row of qs.rows) { push(row.s); if (out.length >= max) break; }
  return out;
}
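
`examplesForWord()` relies on `resolvePlaceholdersToLabels` from `./placeholders` (not part of this diff) to strip the token syntax before sentences go into the prompt. Based only on the token format named in the migration comment (`{{label.w:uuid}}` / `{{label.o:uuid}}`), a regex-based sketch could look like this; `resolveTokensToLabels` is a hypothetical name, and the real helper may behave differently, for example with legacy `{{uuid}}` tokens, which this sketch leaves untouched.

```javascript
// Hypothetical re-implementation for illustration; the real helper lives in src/lib/placeholders.js.
// Assumed token format: {{<label>.w:<uuid>}} for words, {{<label>.o:<uuid>}} for objects.
const TOKEN_RE = /\{\{([^{}]+?)\.(?:w|o):[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\}\}/gi;

function resolveTokensToLabels(sentence) {
  // Keep only the label part of each token; text outside tokens is unchanged.
  return sentence.replace(TOKEN_RE, (_match, label) => label);
}
```

Stripping the UUIDs keeps the prompt short and avoids the model treating ids as part of the word.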

// Shared classification rules. Strongly discourages "Sonstiges" and adds part-of-speech hints.
const CLASSIFY_RULES =
  `Rules:\n` +
  `- Pick the SINGLE best-fitting category by its exact German name.\n` +
  `- Most concrete nouns DO fit a topic: animals→Tiere, food/fruit/vegetables→Lebensmittel, ` +
  `sky/star/fire/water/mountain/plants→Natur & Pflanzen, furniture/window/carpet/cushion→Wohnen & Möbel, ` +
  `street/building/lamp post→Stadt & Gebäude, books/pages→Schule & Bildung.\n` +
  `- Adjectives / properties (warm, fast, sweet, old, fragile, transparent…) → "Eigenschaften".\n` +
  `- Verbs / actions → "Verben & Handlungen".\n` +
  `- Use "Sonstiges" ONLY as a true last resort when nothing else fits at all.`;

function buildPrompt(word, examples, categoryNamesDe) {
  const title = word.titel_en || word.titel_de || word.titel_sv || '';
  const titleDe = word.titel_de ? ` (de: "${word.titel_de}")` : '';
  const ex = examples.length
    ? `\n\nExample sentences using the word:\n${examples.map(e => `- ${e}`).join('\n')}`
    : '';
  return (
    `Categories (German names):\n${categoryNamesDe.join(', ')}\n\n${CLASSIFY_RULES}\n\n` +
    `Classify this single vocabulary word.\n\nWord: "${title}"${titleDe}${ex}\n\n` +
    `Reply with JSON only: {"category":"<exact German category name>"}`
  );
}

// Submit words as a batch (one request per word, custom_id = word.id). Returns the batch_id.
async function submitBatch(words, categoryNamesDe) {
  const system = 'Du bist ein präziser Klassifizierer. Antworte AUSSCHLIESSLICH mit gültigem JSON, ohne Markdown.';
  const requests = [];
  for (const w of words) {
    const examples = await examplesForWord(w.id);
    requests.push({
      custom_id: w.id,
      params: {
        model: MODEL,
        max_tokens: 64,
        system,
        messages: [{ role: 'user', content: buildPrompt(w, examples, categoryNamesDe) }],
      },
    });
  }
  const res = await fetch(`${ANTHROPIC_BASE}/v1/messages/batches`, {
    method: 'POST', headers: headers(), body: JSON.stringify({ requests }),
  });
  if (!res.ok) {
    const err = await res.text().catch(() => '');
    throw new Error(`Batch-Submit fehlgeschlagen (${res.status}): ${err.slice(0, 300)}`);
  }
  const data = await res.json();
  await query(`INSERT INTO category_batches (batch_id, status) VALUES ($1, 'submitted') ON CONFLICT DO NOTHING`, [data.id]);
  return data.id;
}
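
`submitBatch()` sends one entry per word: a `custom_id` (the word UUID, so each result can be matched back to its word) and `params`, which is an ordinary Messages API request body. Separating request construction from the network call makes it testable without an API key. A minimal sketch; `buildBatchRequests` and its `promptFor` callback are hypothetical stand-ins for the loop above and for `buildPrompt(word, examples, categoryNamesDe)`.

```javascript
// Hypothetical pure variant of the request-building loop in submitBatch().
// promptFor(word) stands in for buildPrompt(word, examples, categoryNamesDe).
function buildBatchRequests(words, { model, system, maxTokens = 64, promptFor }) {
  return words.map(w => ({
    custom_id: w.id, // echoed back in the results file; collectBatch() treats it as the word id
    params: {
      model,
      max_tokens: maxTokens, // a one-line JSON answer needs very few tokens
      system,
      messages: [{ role: 'user', content: promptFor(w) }],
    },
  }));
}
```

The returned array is what gets posted as `JSON.stringify({ requests })` to `/v1/messages/batches`.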

// Re-derive pair_categories for all pairs that reference any of the words.
async function rederivePairsForWords(wordIds) {
  if (!wordIds.length) return;
  const pairs = await query(
    `SELECT DISTINCT p.id FROM pairs p
     WHERE p.positive_statement_id IN (SELECT statement_id FROM statement_positive_words WHERE word_id = ANY($1))
        OR p.positive_statement_id IN (SELECT statement_id FROM statement_negative_words WHERE word_id = ANY($1))
        OR p.negative_statement_id IN (SELECT statement_id FROM statement_positive_words WHERE word_id = ANY($1))
        OR p.negative_statement_id IN (SELECT statement_id FROM statement_negative_words WHERE word_id = ANY($1))
        OR p.id IN (SELECT op.pair_id FROM object_pairs op
                    JOIN object_words ow ON ow.object_id = op.object_id
                    WHERE ow.word_id = ANY($1))`,
    [wordIds]
  );
  if (pairs.rows.length) await derivePairCategories(pairs.rows.map(p => p.id)).catch(() => {});
}

// Synchronous Claude call (/v1/messages) for the immediate one-shot backfill
// (no waiting on a batch, which can take up to 24 h).
async function messagesCall(system, user, maxTokens = 2000) {
  const res = await fetch(`${ANTHROPIC_BASE}/v1/messages`, {
    method: 'POST', headers: headers(),
    body: JSON.stringify({ model: MODEL, max_tokens: maxTokens, system, messages: [{ role: 'user', content: user }] }),
  });
  if (!res.ok) { const t = await res.text().catch(() => ''); throw new Error(`Claude ${res.status}: ${t.slice(0, 200)}`); }
  const data = await res.json();
  let raw = (data.content?.[0]?.text || '').trim();
  const md = raw.match(/```(?:json)?\s*([\s\S]+?)\s*```/);
  if (md) raw = md[1];
  return JSON.parse(raw);
}

function parseCategory(text) {
  if (!text) return null;
  let raw = text.trim();
  const md = raw.match(/```(?:json)?\s*([\s\S]+?)\s*```/);
  if (md) raw = md[1];
  try { return (JSON.parse(raw).category || '').toString().trim() || null; }
  catch { return null; }
}

// Collect the batch if it has finished: apply the results (word_categories + pair_categories).
// Returns { ended, linked }.
async function collectBatch(batchId) {
  const res = await fetch(`${ANTHROPIC_BASE}/v1/messages/batches/${batchId}`, { headers: headers() });
  if (!res.ok) {
    // Batch unknown/deleted → clean up the entry so the next tick can submit again
    if (res.status === 404) await query(`DELETE FROM category_batches WHERE batch_id = $1`, [batchId]);
    return { ended: false, linked: 0 };
  }
  const batch = await res.json();
  if (batch.processing_status !== 'ended' || !batch.results_url) return { ended: false, linked: 0 };

  const { byName } = await loadCategories();
  const fallback = byName.get('sonstiges') || null;

  const r = await fetch(batch.results_url, { headers: headers() });
  if (!r.ok) return { ended: false, linked: 0 };
  const jsonl = await r.text();

  const linkedWordIds = [];
  for (const line of jsonl.split('\n')) {
    const trimmed = line.trim();
    if (!trimmed) continue;
    let entry;
    try { entry = JSON.parse(trimmed); } catch { continue; }
    if (entry.result?.type !== 'succeeded') continue;
    const wordId = entry.custom_id;
    const text = entry.result.message?.content?.[0]?.text;
    const name = parseCategory(text);
    const cat = (name && byName.get(name.toLowerCase())) || fallback;
    if (!cat) continue;
    await query(
      `INSERT INTO word_categories (word_id, category_id) VALUES ($1, $2) ON CONFLICT DO NOTHING`,
      [wordId, cat.id]
    ).catch(() => {});
    linkedWordIds.push(wordId);
  }

  // Re-derive pair_categories for the affected pairs
  await rederivePairsForWords(linkedWordIds);

  await query(`DELETE FROM category_batches WHERE batch_id = $1`, [batchId]);
  return { ended: true, linked: linkedWordIds.length };
}
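
The results file that `collectBatch()` downloads is JSONL: one JSON object per line, carrying the `custom_id` and a `result` whose `type` is `succeeded`, or something else (for example `errored` or `expired`) that the code skips. The parsing half of that loop can be isolated as a pure function, sketched below; `parseBatchResults` is a hypothetical name, and `parseCategory` is copied from above with the fence regex written as `` `{3} `` (an equivalent pattern).

```javascript
// Copy of parseCategory() from classifyWords.js; `{3} matches the same three backticks as the original regex.
function parseCategory(text) {
  if (!text) return null;
  let raw = text.trim();
  const md = raw.match(/`{3}(?:json)?\s*([\s\S]+?)\s*`{3}/); // unwrap a Markdown code fence if present
  if (md) raw = md[1];
  try { return (JSON.parse(raw).category || '').toString().trim() || null; }
  catch { return null; }
}

// Hypothetical pure sketch of the JSONL loop in collectBatch(): returns { wordId, category }
// for each succeeded entry; blank lines, malformed lines and non-succeeded results are skipped.
function parseBatchResults(jsonl) {
  const out = [];
  for (const line of jsonl.split('\n')) {
    const trimmed = line.trim();
    if (!trimmed) continue;
    let entry;
    try { entry = JSON.parse(trimmed); } catch { continue; }
    if (entry.result?.type !== 'succeeded') continue;
    const text = entry.result.message?.content?.[0]?.text;
    // category is null when the answer was not valid JSON; collectBatch() then uses the fallback.
    out.push({ wordId: entry.custom_id, category: parseCategory(text) });
  }
  return out;
}
```

Keeping the database writes out of this function lets the malformed-answer cases be tested against canned JSONL.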

// One tick: collect the open batch; otherwise submit a new batch for uncategorized words.
async function runCategorizationTick() {
  if (running) return { skipped: true };
  running = true;
  try {
    const open = await query(`SELECT batch_id FROM category_batches ORDER BY created_at ASC LIMIT 1`);
    if (open.rows.length) {
      const { ended, linked } = await collectBatch(open.rows[0].batch_id);
      return { collected: ended, linked, batchId: open.rows[0].batch_id };
    }
    const words = await findUncategorizedUsedWords();
    if (!words.length) return { remaining: 0 };
    const { rows } = await loadCategories();
    const names = rows.map(c => c.titel_de).filter(Boolean);
    const batchId = await submitBatch(words, names);
    return { submitted: words.length, batchId };
  } finally {
    running = false;
  }
}
|
||||||
|
|
||||||
|
// Sofortiger One-Shot-Backfill (synchron, ohne 24h-Batch-Verzug): klassifiziert bestehende,
|
||||||
|
// in Pairs verwendete Wörter ohne Kategorie in Schüben per /v1/messages und materialisiert
|
||||||
|
// pair_categories direkt. Für den Live-Test gedacht; der Stundenjob bleibt für laufenden Nachschub.
|
||||||
|
async function classifyWordsSync({ max = 2000, reset = false } = {}) {
|
||||||
|
if (running) return { skipped: true };
|
||||||
|
running = true;
|
||||||
|
try {
|
||||||
|
const { rows: catRows, byName } = await loadCategories();
|
||||||
|
const names = catRows.map(c => c.titel_de).filter(Boolean);
|
||||||
|
const fallback = byName.get('sonstiges') || null;
|
||||||
|
const system = 'Du bist ein präziser Klassifizierer. Antworte AUSSCHLIESSLICH mit gültigem JSON, ohne Markdown.';
|
||||||
|
let processed = 0, linked = 0;
|
||||||
|
|
||||||
|
// reset → bestehende Zuordnungen verwerfen und mit verbesserter Logik/Taxonomie neu klassifizieren
|
||||||
|
if (reset) await query(`DELETE FROM word_categories`).catch(() => {});
|
||||||
|
|
||||||
|
while (processed < max) {
|
||||||
|
const words = await findUncategorizedUsedWords(Math.min(15, max - processed));
|
||||||
|
if (!words.length) break;
|
||||||
|
|
||||||
|
const lines = [];
|
||||||
|
for (const w of words) {
|
||||||
|
const t = w.titel_en || w.titel_de || w.titel_sv || '';
|
||||||
|
const de = w.titel_de && w.titel_de !== t ? ` (de: ${w.titel_de})` : '';
|
||||||
|
const ex = await examplesForWord(w.id, 2);
|
||||||
|
const exStr = ex.length ? ` | e.g.: ${ex.map(e => `"${e}"`).join('; ')}` : '';
|
||||||
|
lines.push(`${w.id}\t${t}${de}${exStr}`);
|
||||||
|
}
|
||||||
|
const user =
|
||||||
|
`Categories (German names):\n${names.join(', ')}\n\n${CLASSIFY_RULES}\n\n` +
|
||||||
|
`Classify each vocabulary word below.\nWords (id<TAB>title | examples):\n${lines.join('\n')}\n\n` +
|
||||||
|
`Reply with JSON only: {"assignments":[{"id":"<id>","category":"<exact German category name>"}]}`;
|
||||||
|
|
||||||
|
let assignments = [];
|
||||||
|
try {
|
||||||
|
const data = await messagesCall(system, user, 1500);
|
||||||
|
assignments = Array.isArray(data.assignments) ? data.assignments : [];
|
||||||
|
} catch { /* Fehler → ganze Charge bekommt Fallback, damit der Lauf fortschreitet */ }
|
||||||
|
|
||||||
|
const byId = new Map(assignments.map(a => [String(a.id), a.category]));
|
||||||
|
const linkedIds = [];
|
||||||
|
for (const w of words) {
|
||||||
|
const name = byId.get(String(w.id));
|
||||||
|
const cat = (name && byName.get(String(name).toLowerCase())) || fallback;
|
||||||
|
if (!cat) continue;
|
||||||
|
await query(
|
||||||
|
`INSERT INTO word_categories (word_id, category_id) VALUES ($1, $2) ON CONFLICT DO NOTHING`,
|
||||||
|
[w.id, cat.id]
|
||||||
|
).catch(() => {});
|
||||||
|
linkedIds.push(w.id);
|
||||||
|
}
|
||||||
|
await rederivePairsForWords(linkedIds);
|
||||||
|
|
||||||
|
processed += words.length;
|
||||||
|
linked += linkedIds.length;
|
||||||
|
if (!linkedIds.length) break; // Sicherung gegen Endlosschleife (z. B. fehlende Fallback-Kategorie)
|
||||||
|
}
|
||||||
|
return { processed, linked };
|
||||||
|
} finally {
|
||||||
|
running = false;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
module.exports = { runCategorizationTick, classifyWordsSync, findUncategorizedUsedWords, collectBatch, submitBatch };
|
||||||
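The result-collection loop in `collectBatch` reads the batch results as JSON Lines. Each line is one JSON object with a `custom_id` and a `result`, and only entries whose `result.type` is `succeeded` are used. The sketch below pulls that parsing step out into a pure function so it can be tried without a database or API key. `parseBatchResults` is a hypothetical name and is not part of the module. The sample lines only imitate the shape the loop expects and are not real API output.

```javascript
// Hypothetical helper mirroring the parsing loop in collectBatch:
// turns a JSON Lines results body into [{ id, text }] for succeeded entries only.
function parseBatchResults(jsonl) {
  const out = [];
  for (const line of jsonl.split('\n')) {
    const trimmed = line.trim();
    if (!trimmed) continue; // blank lines, e.g. a trailing newline
    let entry;
    try { entry = JSON.parse(trimmed); } catch { continue; } // malformed lines are skipped
    if (entry.result?.type !== 'succeeded') continue; // errored/canceled/expired entries are ignored
    out.push({ id: entry.custom_id, text: entry.result.message?.content?.[0]?.text ?? null });
  }
  return out;
}

// Sample body imitating the expected shape (not real API output).
const sample = [
  JSON.stringify({ custom_id: 'w1', result: { type: 'succeeded', message: { content: [{ type: 'text', text: 'Obst' }] } } }),
  JSON.stringify({ custom_id: 'w2', result: { type: 'errored' } }),
  'not json',
  '',
].join('\n');

console.log(parseBatchResults(sample)); // [ { id: 'w1', text: 'Obst' } ]
```

Like the original loop, this skips bad lines instead of throwing, so one failed entry does not block the rest of the batch. Words whose entry failed stay uncategorized, so `findUncategorizedUsedWords` will presumably pick them up again on a later tick.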
77
src/lib/deleteCascade.js
Normal file
@@ -0,0 +1,77 @@
const { query } = require('../db');
const { deleteFile, keyFromUrl } = require('../s3');

// Remove the audios of a source (DB rows + S3 files).
async function deleteAudiosFor(sourceTable, sourceId) {
  const audios = await query(
    `SELECT id, audio_link FROM audios WHERE source_table = $1 AND source_id = $2`,
    [sourceTable, sourceId]
  );
  for (const a of audios.rows) {
    const key = keyFromUrl(a.audio_link);
    if (key) await deleteFile(key).catch(() => {});
    await query('DELETE FROM audios WHERE id = $1', [a.id]);
  }
}

// Delete a pair including its question, statements and their audios.
// The question/statements are kept if another pair still references them.
// Objects are not touched (object_pairs cascades via FK).
async function deletePairDeep(pairId) {
  const existing = await query(
    `SELECT question_id, positive_statement_id, negative_statement_id FROM pairs WHERE id = $1`,
    [pairId]
  );
  if (!existing.rows.length) return false;
  const { question_id, positive_statement_id, negative_statement_id } = existing.rows[0];

  await query('DELETE FROM pairs WHERE id = $1', [pairId]);

  if (question_id) {
    const ref = await query('SELECT 1 FROM pairs WHERE question_id = $1 LIMIT 1', [question_id]);
    if (!ref.rows.length) {
      await deleteAudiosFor('questions', question_id);
      await query('DELETE FROM questions WHERE id = $1', [question_id]);
    }
  }

  const stmtIds = [...new Set([positive_statement_id, negative_statement_id].filter(Boolean))];
  for (const stmtId of stmtIds) {
    const ref = await query(
      'SELECT 1 FROM pairs WHERE positive_statement_id = $1 OR negative_statement_id = $1 LIMIT 1',
      [stmtId]
    );
    if (!ref.rows.length) {
      await deleteAudiosFor('statements', stmtId);
      await query('DELETE FROM statements WHERE id = $1', [stmtId]);
    }
  }

  return true;
}

// Delete all objects of a picture (including their pairs), provided the object
// is linked exclusively to this picture.
async function deletePictureObjectsDeep(pictureId) {
  const objects = await query(
    `SELECT object_id FROM object_pictures WHERE picture_id = $1`,
    [pictureId]
  );
  for (const { object_id } of objects.rows) {
    const other = await query(
      `SELECT 1 FROM object_pictures WHERE object_id = $1 AND picture_id <> $2 LIMIT 1`,
      [object_id, pictureId]
    );
    if (other.rows.length) continue;

    const pairs = await query(
      `SELECT pair_id FROM object_pairs WHERE object_id = $1`,
      [object_id]
    );
    for (const { pair_id } of pairs.rows) await deletePairDeep(pair_id);

    await query('DELETE FROM objects WHERE id = $1', [object_id]);
  }
}

module.exports = { deletePairDeep, deletePictureObjectsDeep };
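`deletePairDeep` removes a question or statement only when no remaining pair references it. Before checking, it deduplicates the two statement ids, so a statement used on both sides is handled once. The in-memory sketch below models only that decision, with a plain array standing in for the `pairs` table. `deletePairInMemory` and the data shapes are illustrative assumptions. The real module runs each check as a `SELECT 1 … LIMIT 1` query and also removes audios and S3 files.

```javascript
// Hypothetical in-memory model of the reference checks in deletePairDeep.
// `pairs` stands in for the pairs table; nothing here touches a database or S3.
function deletePairInMemory(pairs, pairId) {
  const idx = pairs.findIndex(p => p.id === pairId);
  if (idx === -1) return null; // analogous to `return false` when the pair does not exist
  const [pair] = pairs.splice(idx, 1); // delete the pair itself first

  // The question is orphaned only if no remaining pair still uses it.
  const orphanedQuestions = pair.question_id && !pairs.some(p => p.question_id === pair.question_id)
    ? [pair.question_id]
    : [];

  // Deduplicate first, then keep statements that no remaining pair references on either side.
  const stmtIds = [...new Set([pair.positive_statement_id, pair.negative_statement_id].filter(Boolean))];
  const orphanedStatements = stmtIds.filter(id =>
    !pairs.some(p => p.positive_statement_id === id || p.negative_statement_id === id));

  return { orphanedQuestions, orphanedStatements };
}

const pairs = [
  { id: 'p1', question_id: 'q1', positive_statement_id: 's1', negative_statement_id: 's2' },
  { id: 'p2', question_id: 'q1', positive_statement_id: 's2', negative_statement_id: 's3' },
];
console.log(deletePairInMemory(pairs, 'p1'));
// { orphanedQuestions: [], orphanedStatements: [ 's1' ] }
```

Deleting the pair row before running the reference checks matters. If the checks ran first, the pair being deleted would still count as a reference, and nothing would ever look orphaned.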
229
src/lib/enrichWords.js
Normal file
@@ -0,0 +1,229 @@
// Automatic word enrichment via the Anthropic Message Batches API (asynchronous, ~50 % cheaper).
// Goal: translate Brysbaert import words (titel_en + conc_m set) into DE+SV and add
// dom_pos, CEFR level and themenfeld_id. Follows the pattern of classifyWords.js.
const { query } = require('../db');

const ANTHROPIC_BASE = 'https://api.anthropic.com';
const MODEL = 'claude-haiku-4-5-20251001';
const BATCH_LIMIT = 500;

let running = false;

function headers() {
  const apiKey = process.env.ANTHROPIC_API_KEY;
  if (!apiKey) throw new Error('ANTHROPIC_API_KEY nicht konfiguriert');
  return { 'Content-Type': 'application/json', 'x-api-key': apiKey, 'anthropic-version': '2023-06-01' };
}

// Load all published categories (subcategories and top-level categories).
// Returns a byName map (lower(titel_de|titel_en) → row) plus an ordered list of names.
async function loadAllCategories() {
  const r = await query(
    `SELECT id, titel_de, titel_en, parent_id FROM categories WHERE status = 'published'`
  );
  const byName = new Map();
  for (const c of r.rows) {
    if (c.titel_de) byName.set(c.titel_de.toLowerCase(), c);
    if (c.titel_en) byName.set(c.titel_en.toLowerCase(), c);
  }
  // Subcategories first → the batch prompt favors granular entries
  const subcats = r.rows.filter(c => c.parent_id).map(c => c.titel_de).filter(Boolean);
  const topCats = r.rows.filter(c => !c.parent_id).map(c => c.titel_de).filter(Boolean);
  return { byName, names: [...subcats, ...topCats] };
}

// Words to enrich: they have conc_m + titel_en but are missing DE, dom_pos or themenfeld.
async function findWordsToEnrich(limit = BATCH_LIMIT) {
  const r = await query(
    `SELECT id, titel_en FROM words
     WHERE conc_m IS NOT NULL
       AND titel_en IS NOT NULL
       AND (titel_de IS NULL OR dom_pos IS NULL OR themenfeld_id IS NULL)
     ORDER BY created_at DESC
     LIMIT $1`,
    [limit]
  );
  return r.rows;
}

function buildEnrichPrompt(word, categoryNames) {
  return (
    `Themenfelder (bevorzuge Unterkategorien wie "Obst", "Haustiere", "Kopf & Gesicht" statt der Oberkategorie):\n` +
    `${categoryNames.join(', ')}\n\n` +
    `Wort (Englisch): "${word.titel_en}"\n\n` +
    `Regeln:\n` +
    `- titel_de / titel_sv: Grundform ohne Artikel\n` +
    `- dom_pos: noun | verb | adjective | other\n` +
    `- level: A1 | A2 | B1 | null (null wenn B2+ oder unklar)\n` +
    `- themenfeld: exakter Name aus der Liste oben, Fallback "Sonstiges"\n\n` +
    `Antworte NUR mit JSON:\n` +
    `{"titel_de":"...","titel_sv":"...","dom_pos":"noun","level":"A1","themenfeld":"Obst"}`
  );
}

// Update the word in the DB (COALESCE: use the new value if present, otherwise keep the existing one).
async function applyEnrichResult(wordId, result, byName) {
  if (!result) return;
  const fallback = byName.get('sonstiges') || null;
  const cat = (result.themenfeld && byName.get(result.themenfeld.toLowerCase())) || fallback;

  await query(
    `UPDATE words SET
       titel_de = COALESCE($2, titel_de),
       titel_sv = COALESCE($3, titel_sv),
       dom_pos = COALESCE($4, dom_pos),
       level = COALESCE($5, level),
       themenfeld_id = COALESCE($6, themenfeld_id)
     WHERE id = $1`,
    [wordId, result.titel_de || null, result.titel_sv || null,
     result.dom_pos || null, result.level || null, cat?.id || null]
  ).catch(() => {});

  // Auto-promote: requested → translated once all 3 languages are filled
  await query(
    `UPDATE words SET status = 'translated'
     WHERE id = $1 AND status = 'requested'
       AND titel_de IS NOT NULL AND titel_en IS NOT NULL AND titel_sv IS NOT NULL`,
    [wordId]
  ).catch(() => {});
}

// ── Asynchronous batch path ─────────────────────────────────────────────────

async function submitEnrichBatch(words, categoryNames) {
  const system = 'Du bist ein präziser Lexikograph. Antworte AUSSCHLIESSLICH mit gültigem JSON, ohne Markdown.';
  const requests = words.map(w => ({
    custom_id: w.id,
    params: {
      model: MODEL,
      max_tokens: 150,
      system,
      messages: [{ role: 'user', content: buildEnrichPrompt(w, categoryNames) }],
    },
  }));

  const res = await fetch(`${ANTHROPIC_BASE}/v1/messages/batches`, {
    method: 'POST', headers: headers(), body: JSON.stringify({ requests }),
  });
  if (!res.ok) {
    const err = await res.text().catch(() => '');
    throw new Error(`Enrich-Batch-Submit fehlgeschlagen (${res.status}): ${err.slice(0, 300)}`);
  }
  const data = await res.json();
  await query(
    `INSERT INTO enrich_batches (batch_id, status) VALUES ($1, 'submitted') ON CONFLICT DO NOTHING`,
    [data.id]
  );
  return data.id;
}

function parseJson(text) {
  if (!text) return null;
  let raw = text.trim();
  const md = raw.match(/```(?:json)?\s*([\s\S]+?)\s*```/);
  if (md) raw = md[1];
  try { return JSON.parse(raw); } catch { return null; }
}

async function collectEnrichBatch(batchId) {
  const res = await fetch(`${ANTHROPIC_BASE}/v1/messages/batches/${batchId}`, { headers: headers() });
  if (!res.ok) {
    if (res.status === 404) await query(`DELETE FROM enrich_batches WHERE batch_id = $1`, [batchId]);
    return { ended: false, enriched: 0 };
  }
  const batch = await res.json();
  if (batch.processing_status !== 'ended' || !batch.results_url) return { ended: false, enriched: 0 };

  const { byName } = await loadAllCategories();
  const r = await fetch(batch.results_url, { headers: headers() });
  if (!r.ok) return { ended: false, enriched: 0 };

  let enriched = 0;
  for (const line of (await r.text()).split('\n')) {
    const trimmed = line.trim();
    if (!trimmed) continue;
    let entry;
    try { entry = JSON.parse(trimmed); } catch { continue; }
    if (entry.result?.type !== 'succeeded') continue;
    const parsed = parseJson(entry.result.message?.content?.[0]?.text);
    await applyEnrichResult(entry.custom_id, parsed, byName);
    if (parsed) enriched++;
  }

  await query(`DELETE FROM enrich_batches WHERE batch_id = $1`, [batchId]);
  return { ended: true, enriched };
}

// One tick: collect the open batch if there is one; otherwise submit a new batch for words not yet enriched.
async function runEnrichTick() {
  if (running) return { skipped: true };
  running = true;
  try {
    const open = await query(`SELECT batch_id FROM enrich_batches ORDER BY created_at ASC LIMIT 1`);
    if (open.rows.length) {
      const { ended, enriched } = await collectEnrichBatch(open.rows[0].batch_id);
      return { collected: ended, enriched, batchId: open.rows[0].batch_id };
    }
    const words = await findWordsToEnrich();
    if (!words.length) return { remaining: 0 };
    const { names } = await loadAllCategories();
    const batchId = await submitEnrichBatch(words, names);
    return { submitted: words.length, batchId };
  } finally {
    running = false;
  }
}

// ── Synchronous path for ?sync=true ─────────────────────────────────────────

async function enrichWordsSync({ max = 500 } = {}) {
  if (running) return { skipped: true };
  running = true;
  try {
    const { byName, names } = await loadAllCategories();
    const system = 'Du bist ein präziser Lexikograph. Antworte AUSSCHLIESSLICH mit gültigem JSON, ohne Markdown.';
    let processed = 0;
    let enriched = 0;

    while (processed < max) {
      const words = await findWordsToEnrich(Math.min(20, max - processed));
      if (!words.length) break;

      const items = words.map((w, i) => `${i + 1}. "${w.titel_en}" (id: ${w.id})`).join('\n');
      const user =
        `Themenfelder (bevorzuge Unterkategorien):\n${names.join(', ')}\n\n` +
        `Regeln:\n` +
        `- titel_de / titel_sv: Grundform ohne Artikel\n` +
        `- dom_pos: noun | verb | adjective | other\n` +
        `- level: A1 | A2 | B1 | null\n` +
        `- themenfeld: exakter Name aus der Liste, Fallback "Sonstiges"\n\n` +
        `Wörter:\n${items}\n\n` +
        `Antworte NUR mit JSON:\n` +
        `{"results":[{"id":"<uuid>","titel_de":"...","titel_sv":"...","dom_pos":"noun","level":"A1","themenfeld":"Obst"}]}`;

      let results = [];
      try {
        const res = await fetch(`${ANTHROPIC_BASE}/v1/messages`, {
          method: 'POST', headers: headers(),
          body: JSON.stringify({ model: MODEL, max_tokens: 3000, system, messages: [{ role: 'user', content: user }] }),
        });
        if (!res.ok) throw new Error(`HTTP ${res.status}`);
        const data = await res.json();
        const parsed = parseJson(data.content?.[0]?.text);
        results = Array.isArray(parsed?.results) ? parsed.results : [];
      } catch { /* skip this chunk and move on to the next round */ }

      for (const r of results) {
        await applyEnrichResult(r.id, r, byName);
        enriched++;
      }
      processed += words.length;
      if (!results.length) break; // Guard against an infinite loop
    }
    return { processed, enriched };
  } finally {
    running = false;
  }
}

module.exports = { runEnrichTick, enrichWordsSync };
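`applyEnrichResult` uses `COALESCE($n, column)` so that a field the model left empty never overwrites a stored value. It then promotes a `requested` word to `translated` once `titel_de`, `titel_en` and `titel_sv` are all non-null. The in-memory sketch below reproduces those two `UPDATE`s on a plain object to make the merge rule concrete. `applyEnrichInMemory` is hypothetical, and the `themenfeld_id` category lookup is left out.

```javascript
// Hypothetical in-memory model of the two UPDATE statements in applyEnrichResult.
// `|| null` mirrors how the module turns empty model output into SQL NULL,
// and `?? current` mirrors COALESCE($n, column).
function applyEnrichInMemory(word, result) {
  const pick = (fresh, current) => (fresh || null) ?? current;
  const next = {
    ...word,
    titel_de: pick(result.titel_de, word.titel_de),
    titel_sv: pick(result.titel_sv, word.titel_sv),
    dom_pos: pick(result.dom_pos, word.dom_pos),
    level: pick(result.level, word.level),
  };
  // Auto-promote, mirroring the `IS NOT NULL` checks on all three titles.
  if (next.status === 'requested' &&
      next.titel_de != null && next.titel_en != null && next.titel_sv != null) {
    next.status = 'translated';
  }
  return next;
}

const before = { id: 'w1', titel_en: 'apple', titel_de: null, titel_sv: null, dom_pos: null, level: 'A2', status: 'requested' };
const after = applyEnrichInMemory(before, { titel_de: 'Apfel', titel_sv: 'äpple', dom_pos: 'noun', level: null });
console.log(after.level, after.status); // A2 translated
```

A side effect of this design is that when the model reports a level as `null` (B2+ or unclear), an existing stored level is kept rather than cleared.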
@@ -1,6 +1,7 @@
 // Pair generation via Claude (vision) + server-side persistence.
 // Used by lib/pipeline.js (automatic) and routes/claude.js (manual endpoint).
 const { query } = require('../db');
+const { tagObjectWords } = require('./objectTagging');
 
 const ANTHROPIC_API_URL = 'https://api.anthropic.com/v1/messages';
 const GENERATE_MODEL = process.env.GENERATE_MODEL || 'claude-haiku-4-5-20251001';
@@ -39,7 +40,12 @@ async function generatePairsForObject({ imageUrl, objects, selectedObjectId, cou
     `Bei yes_no: mix aus answer:true und answer:false. Bei word: positive_words 1–3 passende Wörter, negative_words genau 3 falsche Wörter.\n\n` +
     `Regeln: Alle Sätze und Wörter auf Deutsch. Sätze müssen natürlich klingen. Keine Wiederholungen. ` +
     `Wörter beim type "word" sind AUSSCHLIESSLICH Nomen ("pos":"noun") oder Adjektive ("pos":"adjective") — ` +
-    `KEINE Verben, Pronomen, Artikel, Präpositionen oder Funktionswörter. Gib für jedes Wort das "pos"-Feld an.`;
+    `KEINE Verben, Pronomen, Artikel, Präpositionen oder Funktionswörter. Gib für jedes Wort das "pos"-Feld an.\n\n` +
+    `NOMEN-MARKUP: Markiere in ALLEN Sätzen (question, positive, negative) jedes Nomen mit ` +
+    `[Oberflächenform|Grundform] — die Oberflächenform ist das Wort exakt wie es im Satz steht (Beugung/Mehrzahl), ` +
+    `die Grundform ist Nominativ Singular ohne Artikel. Beispiel: "Die [Wolken|Wolke] schweben am [Himmel|Himmel]." ` +
+    `Markiere NUR Nomen — keine Verben, Adjektive, Pronomen oder Funktionswörter. ` +
+    `Die Wörter in positive_words/negative_words bekommen KEIN Markup.`;
 
   const res = await fetch(ANTHROPIC_API_URL, {
     method: 'POST',
@@ -64,7 +70,62 @@ async function generatePairsForObject({ imageUrl, objects, selectedObjectId, cou
   if (md) raw = md[1];
   const parsed = JSON.parse(raw);
   if (!Array.isArray(parsed.pairs)) throw new Error('Ungültiges JSON-Format von Claude (pairs fehlt)');
-  return parsed.pairs.map(normalizePair).filter(Boolean);
+  const pairs = parsed.pairs.map(normalizePair).filter(Boolean);
+  for (const p of pairs) {
+    for (const f of ['question', 'positive', 'negative']) {
+      if (p[f]) p[f] = await resolveNounMarkup(p[f], objects, selectedObjectId);
+    }
+  }
+  return pairs;
+}
+
+// ── Noun markup → placeholder ───────────────────────────────────────────────
+// Claude marks nouns as [surface form|base form]. This turns each markup into:
+//   - {{surface.o:objectId}} if the base form is an object word of the picture
+//     (the target object takes precedence),
+//   - otherwise {{surface.w:wordId}}, with find-or-create of the word (status 'requested').
+const NOUN_MARKUP_RE = /\[([^\[\]|]+)(?:\|([^\[\]|]*))?\]/g;
+
+async function resolveNounMarkup(text, objects, selectedObjectId) {
+  // Object-word lookup: lemma (lowercase) → objectId, target object first
+  const objectByLemma = new Map();
+  const ordered = [...(objects || [])].sort((a, b) =>
+    (a.id === selectedObjectId ? -1 : 0) - (b.id === selectedObjectId ? -1 : 0));
+  for (const obj of ordered) {
+    for (const w of obj.words || []) {
+      for (const t of [w.titel_de, w.titel_en, w.titel_sv]) {
+        const key = (t || '').trim().toLowerCase();
+        if (key && !objectByLemma.has(key)) objectByLemma.set(key, obj.id);
+      }
+    }
+  }
+
+  // Collect all markups first (word creation is async, String.replace is not)
+  const matches = [...text.matchAll(NOUN_MARKUP_RE)];
+  const replacements = new Map();
+  for (const m of matches) {
+    if (replacements.has(m[0])) continue;
+    const surface = m[1].trim();
+    const lemma = (m[2] || '').trim() || surface;
+    if (!surface) { replacements.set(m[0], lemma); continue; }
+    const objectId = objectByLemma.get(lemma.toLowerCase()) || objectByLemma.get(surface.toLowerCase());
+    if (objectId) {
+      replacements.set(m[0], `{{${surface}.o:${objectId}}}`);
+    } else {
+      try {
+        const wordId = await findOrCreateWord(lemma);
+        replacements.set(m[0], `{{${surface}.w:${wordId}}}`);
+      } catch {
+        replacements.set(m[0], surface); // DB error → leave the word unmarked
+      }
+    }
+  }
+  let out = text;
+  for (const [from, to] of replacements) out = out.split(from).join(to);
+  // Safety net: deterministically tokenize object words that the model did NOT mark
+  // as [..] nouns (the German sentence is processed here).
+  out = tagObjectWords(out, 'de', objects);
+  return out;
 }
 
 // Word entries can be {"w":"...","pos":"..."} or plain strings.
@@ -72,10 +133,11 @@ async function generatePairsForObject({ imageUrl, objects, selectedObjectId, cou
 function cleanWordList(list) {
   if (!Array.isArray(list)) return [];
   const out = [];
+  const unmark = s => s.replace(NOUN_MARKUP_RE, (_, surface) => surface.trim());
   for (const item of list) {
-    if (typeof item === 'string') { const t = item.trim(); if (t) out.push(t); continue; }
+    if (typeof item === 'string') { const t = unmark(item).trim(); if (t) out.push(t); continue; }
     if (item && typeof item === 'object') {
-      const t = (item.w || item.word || item.text || '').toString().trim();
+      const t = unmark((item.w || item.word || item.text || '').toString()).trim();
       const pos = (item.pos || '').toString().toLowerCase();
       if (t && (!pos || pos === 'noun' || pos === 'adjective')) out.push(t);
     }
@@ -176,4 +238,4 @@ async function persistPair(p, objectId) {
   return pair.id;
 }
 
-module.exports = { generatePairsForObject, persistPair, findOrCreateWord };
+module.exports = { generatePairsForObject, persistPair, findOrCreateWord, resolveNounMarkup };
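The new `resolveNounMarkup` collects all `[surface|lemma]` matches first because `findOrCreateWord` is asynchronous, and `String.prototype.replace` cannot await its callback. When the lookup is synchronous, the same mapping fits into a single `replace` call. The sketch below assumes a prebuilt `objectByLemma` map and a synchronous `lookupWordId` stub in place of the database. Both are hypothetical. The `tagObjectWords` safety net that the real function runs afterwards is omitted.

```javascript
// Same pattern as NOUN_MARKUP_RE in the diff: [surface] or [surface|lemma].
const NOUN_MARKUP_RE = /\[([^\[\]|]+)(?:\|([^\[\]|]*))?\]/g;

// Hypothetical synchronous variant of resolveNounMarkup.
// objectByLemma: Map of lowercase lemma → objectId (target object inserted first).
// lookupWordId: stand-in for findOrCreateWord, returns a word id for a lemma.
function resolveMarkupSketch(text, objectByLemma, lookupWordId) {
  return text.replace(NOUN_MARKUP_RE, (_, rawSurface, rawLemma) => {
    const surface = rawSurface.trim();
    const lemma = (rawLemma || '').trim() || surface; // missing lemma → use the surface form
    if (!surface) return lemma;
    const objectId = objectByLemma.get(lemma.toLowerCase()) || objectByLemma.get(surface.toLowerCase());
    if (objectId) return `{{${surface}.o:${objectId}}}`; // object word of the picture
    return `{{${surface}.w:${lookupWordId(lemma)}}}`;    // any other noun
  });
}

const objectByLemma = new Map([['wolke', 'obj-1']]);
const result = resolveMarkupSketch(
  'Die [Wolken|Wolke] schweben am [Himmel|Himmel].',
  objectByLemma,
  lemma => `word-${lemma.toLowerCase()}`,
);
console.log(result); // Die {{Wolken.o:obj-1}} schweben am {{Himmel.w:word-himmel}}.
```

The async original also deduplicates repeated markups through its `replacements` map. As a result, `findOrCreateWord` runs once per distinct markup rather than once per occurrence.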
30
src/lib/leveling.js
Normal file
@@ -0,0 +1,30 @@
// Progressive level curve: single source of truth for the backend.
// Cumulative EP required for level n: 5·n·(n+3).
// Level 1 → 20 EP, Level 2 → 50, Level 3 → 90, Level 4 → 140, Level 5 → 200 …
// Quick levels early on (the first levels are reached in the first session), then gently steeper.
function epForLevel(level) {
  if (level <= 0) return 0;
  return 5 * level * (level + 3);
}

// Highest n with 5n²+15n ≤ ep → n ≤ (−15 + √(225 + 20·ep)) / 10
function levelForEp(ep) {
  const e = Math.max(0, ep || 0);
  return Math.floor((-15 + Math.sqrt(225 + 20 * e)) / 10);
}

// Level + progress within the level (for the momentum display in the client).
function levelInfo(ep) {
  const e = Math.max(0, ep || 0);
  const level = levelForEp(e);
  const base = epForLevel(level);
  const next = epForLevel(level + 1);
  return {
    level,
    ep_into_level: e - base,
    ep_to_next_level: next - e,
    ep_for_next_level: next - base,
  };
}

module.exports = { epForLevel, levelForEp, levelInfo };
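The inverse in `levelForEp` comes from solving the threshold inequality with the quadratic formula. 5n² + 15n ≤ ep is equivalent to n² + 3n − ep/5 ≤ 0, whose positive root is (−3 + √(9 + 4·ep/5)) / 2. Multiplying numerator and denominator by 5 gives (−15 + √(225 + 20·ep)) / 10. At each exact threshold ep = 5n(n+3), the radicand equals (10n + 15)², so the square root is exact and `Math.floor` cannot land one level too low. The sketch below copies the two functions and cross-checks the closed form against a brute-force search. The brute-force helper is only a test aid and is not part of the module.

```javascript
// Copies of epForLevel and levelForEp from src/lib/leveling.js.
function epForLevel(level) {
  if (level <= 0) return 0;
  return 5 * level * (level + 3);
}

function levelForEp(ep) {
  const e = Math.max(0, ep || 0);
  return Math.floor((-15 + Math.sqrt(225 + 20 * e)) / 10);
}

// Hypothetical test aid: highest n whose cumulative threshold is still <= ep.
function levelForEpBruteForce(ep) {
  let n = 0;
  while (epForLevel(n + 1) <= ep) n++;
  return n;
}

// Cross-check the closed form over a range of EP values.
for (let ep = 0; ep <= 5000; ep++) {
  if (levelForEp(ep) !== levelForEpBruteForce(ep)) throw new Error(`mismatch at ep=${ep}`);
}
console.log([0, 19, 20, 49, 50, 200].map(levelForEp)); // [ 0, 0, 1, 1, 2, 5 ]
```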
183
src/lib/objectTagging.js
Normal file
183
src/lib/objectTagging.js
Normal file
@@ -0,0 +1,183 @@
|
|||||||
|
// Deterministisches Tokenisieren von OBJEKT-Wörtern in Sätzen.
|
||||||
|
//
|
||||||
|
// Hintergrund: Objekt-Tokens ({{label.o:objectId}}) entstehen bisher nur aus dem
|
||||||
|
// Nomen-Markup [Oberfläche|Grundform], das das Generierungs-Modell setzen SOLL. Tut es das
|
||||||
|
// nicht (häufig bei kleinen Modellen), fehlt der Token komplett und das Frontend kann das
|
||||||
|
// Objekt weder als Chip noch als Bildregion hervorheben.
|
||||||
|
//
|
||||||
|
// Dieser Tagger findet Objekt-Wörter direkt im Satz – anhand der Wort-Titel der Objekte des
|
||||||
|
// Bildes – und ist damit unabhängig vom LLM-Markup. Er wird genutzt:
|
||||||
|
// - Forward: als Sicherheitsnetz in generatePairs.resolveNounMarkup / pipeline.translatePair
|
||||||
|
// - Backfill: scripts/backfill-object-tokens.js über bestehende Daten
|
||||||
|
//
|
||||||
|
// Wichtig: bereits vorhandene Tokens ({{…}}, ⟦PHn:…⟧) bleiben unangetastet, und es werden NUR
|
||||||
|
// Objekt-Tokens (.o:) erzeugt – Wort-Tokens (.w:) fasst dieser Tagger nicht an.
|
||||||
|
|
||||||
|
const { PLACEHOLDER_RE } = require('./placeholders');
|
||||||
|
|
||||||
|
// Flexions-Endungen je Sprache (bestimmte Form / Plural / Genitiv), längere zuerst, damit der
|
||||||
|
// Regex greedy die längste Form greift (z.B. "ryggsäcken" statt nur "ryggsäck").
|
||||||
|
const SUFFIXES = {
|
||||||
|
sv: ['ens', 'ets', 'na', 'en', 'et', 'or', 'ar', 'er', 'n', 'a', 's'],
|
||||||
|
de: ['en', 'es', 'er', 'em', 'e', 'n', 's'],
|
||||||
|
en: ['es', 's'],
|
||||||
|
};
|
||||||
|
// Lemmata, die kürzer als das sind, werden NUR exakt gematcht (keine Flexion) – sonst matchen
|
||||||
|
// kurze Wörter wie "bi" zu viel ("bil", "bin", …).
|
||||||
|
const MIN_LEN_FOR_SUFFIX = 4;
|
||||||
|
|
||||||
|
function escapeRegex(s) {
|
||||||
|
return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
|
||||||
|
}
|
||||||
|
|
||||||
|
// Bestehende Tokens (sowohl {{label.type:uuid}} als auch ⟦PHn:label⟧) erkennen, damit wir
|
||||||
|
// nicht in sie hineinschreiben.
|
||||||
|
const EXISTING_TOKEN_RE = /\{\{[^.{}]+\.[wo]:[0-9a-f-]{36}\}\}|⟦PH\d+:[^⟧]*⟧/g;
|
||||||
|
|
||||||
|
// Baut aus den Objekten der Sprache eine Liste { lemma, lemmaLc, objectId }, längste zuerst.
|
||||||
|
function buildLemmas(objects, lang) {
|
||||||
|
const out = [];
|
||||||
|
const seen = new Set();
|
||||||
|
for (const obj of objects || []) {
|
||||||
|
for (const w of obj.words || []) {
|
||||||
|
const title = (w[`titel_${lang}`] || '').trim();
|
||||||
|
if (!title) continue;
|
||||||
|
const key = title.toLowerCase();
|
||||||
|
if (seen.has(key)) continue;
|
||||||
|
seen.add(key);
|
||||||
|
out.push({ lemma: title, lemmaLc: key, objectId: obj.id });
|
||||||
|
}
|
||||||
|
}
|
||||||
|
out.sort((a, b) => b.lemma.length - a.lemma.length);
|
||||||
|
return out;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Tagged eine zusammenhängende Klartext-Passage (ohne bestehende Tokens).
|
||||||
|
function tagPlainSegment(text, lemmas, suffixes) {
|
||||||
|
if (!text) return text;
|
||||||
|
// Ein kombinierter Regex über alle Lemmata (längste zuerst). Pro Lemma optional eine
|
||||||
|
// Flexions-Endung, sofern lang genug. Wortgrenzen via Unicode-Lookarounds (statt \b, das
|
||||||
|
// bei å/ä/ö/ü unzuverlässig ist).
|
||||||
|
const alts = lemmas.map(({ lemma }) => {
|
||||||
|
const esc = escapeRegex(lemma);
|
||||||
|
if (lemma.length >= MIN_LEN_FOR_SUFFIX && suffixes.length) {
|
||||||
|
return `${esc}(?:${suffixes.map(escapeRegex).join('|')})?`;
|
||||||
|
}
|
||||||
|
return esc;
|
||||||
|
});
|
||||||
|
if (!alts.length) return text;
|
||||||
|
const re = new RegExp(`(?<![\\p{L}\\p{N}])(${alts.join('|')})(?![\\p{L}\\p{N}])`, 'giu');
|
||||||
|
|
||||||
|
return text.replace(re, (surface) => {
|
||||||
|
const sLc = surface.toLowerCase();
|
||||||
|
// Passendes Objekt bestimmen: längstes Lemma, das Präfix der Oberfläche ist und dessen
|
||||||
|
// Rest eine erlaubte (oder leere) Endung ist.
|
||||||
|
for (const { lemma, lemmaLc, objectId } of lemmas) {
|
||||||
|
if (!sLc.startsWith(lemmaLc)) continue;
|
||||||
|
const rest = sLc.slice(lemmaLc.length);
|
||||||
|
const restOk = rest === '' ||
|
||||||
|
(lemma.length >= MIN_LEN_FOR_SUFFIX && suffixes.includes(rest));
|
||||||
|
if (restOk) return `{{${surface}.o:${objectId}}}`;
|
||||||
|
}
|
||||||
|
return surface; // kein sauberer Treffer → unverändert lassen
|
||||||
|
});
|
||||||
|
}

// Main function: tags object words in `sentence` for language `lang`.
// `objects`: [{ id, words: [{titel_de,titel_en,titel_sv}] }]
function tagObjectWords(sentence, lang, objects) {
  if (!sentence) return sentence;
  const lemmas = buildLemmas(objects, lang);
  if (!lemmas.length) return sentence;
  const suffixes = SUFFIXES[lang] || [];

  // Split the sentence into [plain text, token, plain text, …]; tag only the plain-text parts.
  let out = '';
  let last = 0;
  EXISTING_TOKEN_RE.lastIndex = 0;
  let m;
  while ((m = EXISTING_TOKEN_RE.exec(sentence)) !== null) {
    out += tagPlainSegment(sentence.slice(last, m.index), lemmas, suffixes);
    out += m[0]; // copy the existing token unchanged
    last = m.index + m[0].length;
  }
  out += tagPlainSegment(sentence.slice(last), lemmas, suffixes);
  return out;
}
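The two functions above can be illustrated with a self-contained sketch that folds them into one hypothetical helper, `tagSentence`. `SUFFIXES`, `MIN_LEN_FOR_SUFFIX`, `escapeRegex`, `buildLemmas` and `EXISTING_TOKEN_RE` are defined outside this excerpt. The values below are therefore assumptions for illustration only: a short German suffix list, a minimum length of 4, and a token pattern of the form `{{label.w|o:uuid}}`.

```javascript
// ASSUMED values; the real SUFFIXES / MIN_LEN_FOR_SUFFIX are defined elsewhere in the module.
const MIN_LEN_FOR_SUFFIX = 4;
const SUFFIXES = { de: ['e', 'en', 'n', 's'] };
// ASSUMED shape of existing tokens: {{label.w:uuid}} or {{label.o:uuid}}
const EXISTING_TOKEN_RE = /\{\{[^{}]+\.(?:w|o):[0-9a-f-]{36}\}\}/g;

const escapeRegex = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// lemmas: [{ lemma, lemmaLc, objectId }], expected to be sorted longest first
function tagPlain(text, lemmas, suffixes) {
  const alts = lemmas.map(({ lemma }) =>
    lemma.length >= MIN_LEN_FOR_SUFFIX && suffixes.length
      ? `${escapeRegex(lemma)}(?:${suffixes.map(escapeRegex).join('|')})?`
      : escapeRegex(lemma));
  if (!alts.length) return text;
  // Unicode-aware word boundaries instead of \b
  const re = new RegExp(`(?<![\\p{L}\\p{N}])(${alts.join('|')})(?![\\p{L}\\p{N}])`, 'giu');
  return text.replace(re, (surface) => {
    const sLc = surface.toLowerCase();
    for (const { lemma, lemmaLc, objectId } of lemmas) {
      if (!sLc.startsWith(lemmaLc)) continue;
      const rest = sLc.slice(lemmaLc.length);
      if (rest === '' || (lemma.length >= MIN_LEN_FOR_SUFFIX && suffixes.includes(rest))) {
        return `{{${surface}.o:${objectId}}}`;
      }
    }
    return surface;
  });
}

// Hypothetical helper: tags only the text between existing tokens, like tagObjectWords.
function tagSentence(sentence, lemmas, suffixes) {
  let out = '';
  let last = 0;
  EXISTING_TOKEN_RE.lastIndex = 0;
  let m;
  while ((m = EXISTING_TOKEN_RE.exec(sentence)) !== null) {
    out += tagPlain(sentence.slice(last, m.index), lemmas, suffixes) + m[0];
    last = m.index + m[0].length;
  }
  return out + tagPlain(sentence.slice(last), lemmas, suffixes);
}
```

With the lemma `Ente`, the sketch tags the inflected form "Enten". It leaves the compound "Entenhaus" alone because a letter follows the match, and it copies an existing `{{…}}` token verbatim.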

// Wraps the first occurrence of `surface` (exact string, at word boundaries, NOT inside an
// existing token) in an object token. Used by the LLM fallback, which supplies the inflected
// surface form that the deterministic tagger did not recognize.
function wrapSurface(sentence, surface, objectId) {
  const surf = (surface || '').trim();
  if (!sentence || !surf) return sentence;
  let out = '';
  let done = false;
  EXISTING_TOKEN_RE.lastIndex = 0;
  const segments = [];
  let m, cursor = 0;
  // Collect plain-text segments (outside existing tokens)
  while ((m = EXISTING_TOKEN_RE.exec(sentence)) !== null) {
    segments.push({ text: sentence.slice(cursor, m.index), start: cursor, token: false });
    segments.push({ text: m[0], start: m.index, token: true });
    cursor = m.index + m[0].length;
  }
  segments.push({ text: sentence.slice(cursor), start: cursor, token: false });

  for (const seg of segments) {
    if (done || seg.token) { out += seg.text; continue; }
    // Replace the first occurrence at word boundaries within the plain-text segment
    const re = new RegExp(`(?<![\\p{L}\\p{N}])(${escapeRegex(surf)})(?![\\p{L}\\p{N}])`, 'u');
    const mm = seg.text.match(re);
    if (mm) {
      out += seg.text.slice(0, mm.index) + `{{${mm[1]}.o:${objectId}}}` + seg.text.slice(mm.index + mm[1].length);
      done = true;
    } else {
      out += seg.text;
    }
  }
  return out;
}
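The behavior of `wrapSurface` (first match only, case-sensitive, never inside an existing token) can be checked with the compact sketch below. `wrapFirst` is a hypothetical name. The token pattern is an assumption, because `EXISTING_TOKEN_RE` is not part of this excerpt.

```javascript
// ASSUMED token format: {{label.w:uuid}} / {{label.o:uuid}}
const TOKEN_RE = /\{\{[^{}]+\.(?:w|o):[0-9a-f-]{36}\}\}/g;
const escapeRe = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Wraps only the FIRST exact, word-bounded occurrence of `surface` that lies outside
// existing tokens; later occurrences stay untouched.
function wrapFirst(sentence, surface, objectId) {
  const surf = (surface || '').trim();
  if (!sentence || !surf) return sentence;
  const re = new RegExp(`(?<![\\p{L}\\p{N}])${escapeRe(surf)}(?![\\p{L}\\p{N}])`, 'u');
  let out = '';
  let cursor = 0;
  let done = false;
  // Wraps the match in a plain-text segment, unless a match was already wrapped
  const emitPlain = (text) => {
    const mm = done ? null : text.match(re);
    if (!mm) return text;
    done = true;
    return text.slice(0, mm.index) + `{{${mm[0]}.o:${objectId}}}` + text.slice(mm.index + mm[0].length);
  };
  TOKEN_RE.lastIndex = 0;
  let m;
  while ((m = TOKEN_RE.exec(sentence)) !== null) {
    out += emitPlain(sentence.slice(cursor, m.index)) + m[0]; // tokens are copied verbatim
    cursor = m.index + m[0].length;
  }
  return out + emitPlain(sentence.slice(cursor));
}
```

Because the search is exact and bounded by Unicode letters and digits, an LLM answer like "schildkröten" will not wrap part of "Landschildkröten".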

// Returns the set of object IDs that occur as object tokens in a sentence.
function objectIdsInSentence(sentence) {
  const ids = new Set();
  for (const mm of String(sentence || '').matchAll(PLACEHOLDER_RE)) {
    if (mm[2] === 'o') ids.add(mm[3]);
  }
  return ids;
}

// All OBJECT tokens of a sentence as { full, label, oid }.
const OBJ_TOKEN_RE = /\{\{([^.{}]+)\.o:([0-9a-f-]{36})\}\}/g;
function objectTokensInSentence(sentence) {
  const out = [];
  for (const m of String(sentence || '').matchAll(OBJ_TOKEN_RE)) out.push({ full: m[0], label: m[1], oid: m[2] });
  return out;
}

// Is `label` a DEFINITELY good form of object `oid` in `lang`? (Exact match, or lemma plus a
// regular suffix.) Such tokens are clearly fine and do not need to go to the LLM during the
// cleanup check.
function isSimpleObjectForm(label, lang, objects, oid) {
  const o = (objects || []).find(x => x.id === oid);
  if (!o) return false;
  const L = (label || '').toLowerCase();
  const sfx = SUFFIXES[lang] || [];
  for (const w of o.words || []) {
    const lemma = (w[`titel_${lang}`] || '').trim().toLowerCase();
    if (!lemma) continue;
    if (L === lemma) return true;
    if (sfx.some(s => L === lemma + s)) return true;
  }
  return false;
}

// Removes a specific object token (all occurrences), leaving only the label in place.
function untagToken(sentence, full, label) {
  return String(sentence || '').split(full).join(label);
}
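The cleanup path combines token parsing with untagging. The standalone sketch below reuses the `OBJ_TOKEN_RE` pattern shown above under the hypothetical names `objectTokens` and `untag`. It shows that only `.o:` tokens are picked up and that untagging leaves the plain label behind.

```javascript
// Same pattern as OBJ_TOKEN_RE above: a label without '.', '{', '}', then ".o:" and a 36-char UUID.
const OBJ_TOKEN_RE = /\{\{([^.{}]+)\.o:([0-9a-f-]{36})\}\}/g;

// Lists all object tokens; word tokens ({{label.w:uuid}}) do not match.
function objectTokens(sentence) {
  return [...String(sentence || '').matchAll(OBJ_TOKEN_RE)]
    .map((m) => ({ full: m[0], label: m[1], oid: m[2] }));
}

// split/join replaces every occurrence of the exact token without regex escaping.
function untag(sentence, full, label) {
  return String(sentence || '').split(full).join(label);
}
```

In a cleanup run, a token such as "Erdbeerfeld" tagged as the object Erdbeere would be judged a modifier and reduced to plain text, while the neighbouring "Erdbeere" token survives.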

module.exports = {
  tagObjectWords, wrapSurface, buildLemmas, objectIdsInSentence,
  objectTokensInSentence, isSimpleObjectForm, untagToken,
};
42
src/lib/pairCategories.js
Normal file
@@ -0,0 +1,42 @@
const { query } = require('../db');

// Derives the categories of one (or more) pairs from the linked words and materializes
// them in pair_categories. Sources:
// - Statements (positive/negative) → statement_*_words → word_categories
// - Objects → object_words → word_categories
// (Questions have no word M2M link and are skipped.)
// Safe to re-run: deletes the existing assignments of the affected pairs and rewrites them,
// so that republishing after content changes updates the categories.
async function derivePairCategories(pairIds) {
  const ids = (Array.isArray(pairIds) ? pairIds : [pairIds]).filter(Boolean);
  if (!ids.length) return 0;

  await query(`DELETE FROM pair_categories WHERE pair_id = ANY($1)`, [ids]);

  const r = await query(
    `INSERT INTO pair_categories (pair_id, category_id)
     SELECT DISTINCT pid, category_id FROM (
       SELECT p.id AS pid, wc.category_id
       FROM pairs p
       JOIN (
         SELECT statement_id, word_id FROM statement_positive_words
         UNION
         SELECT statement_id, word_id FROM statement_negative_words
       ) sw ON sw.statement_id IN (p.positive_statement_id, p.negative_statement_id)
       JOIN word_categories wc ON wc.word_id = sw.word_id
       WHERE p.id = ANY($1)
       UNION
       SELECT op.pair_id AS pid, wc.category_id
       FROM object_pairs op
       JOIN object_words ow ON ow.object_id = op.object_id
       JOIN word_categories wc ON wc.word_id = ow.word_id
       WHERE op.pair_id = ANY($1)
     ) src
     WHERE category_id IS NOT NULL
     ON CONFLICT (pair_id, category_id) DO NOTHING`,
    [ids]
  );
  return r.rowCount;
}

module.exports = { derivePairCategories };
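To make the SQL above easier to follow, here is an in-memory sketch of the same derivation. It is an illustration only, not the module's code. The `db` object with arrays named after the tables is a hypothetical stand-in for PostgreSQL. As in the query, words from both link tables count for either statement of a pair, and object words count via `object_pairs`.

```javascript
// Hypothetical in-memory stand-in for the tables used by derivePairCategories.
// Returns Map<pairId, Set<categoryId>>, i.e. the rows the INSERT would produce.
function derivePairCategoriesInMemory(pairIds, db) {
  const ids = new Set((Array.isArray(pairIds) ? pairIds : [pairIds]).filter(Boolean));
  const result = new Map();
  // Adds all (non-null) categories of one word to a pair; Set mirrors SELECT DISTINCT
  const addWord = (pid, wordId) => {
    for (const wc of db.wordCategories) {
      if (wc.word_id !== wordId || wc.category_id == null) continue;
      if (!result.has(pid)) result.set(pid, new Set());
      result.get(pid).add(wc.category_id);
    }
  };
  // Source 1: statement words (UNION of both link tables)
  const stmtWords = [...db.statementPositiveWords, ...db.statementNegativeWords];
  for (const p of db.pairs) {
    if (!ids.has(p.id)) continue;
    for (const sw of stmtWords) {
      if (sw.statement_id === p.positive_statement_id || sw.statement_id === p.negative_statement_id) {
        addWord(p.id, sw.word_id);
      }
    }
  }
  // Source 2: object words via object_pairs
  for (const op of db.objectPairs) {
    if (!ids.has(op.pair_id)) continue;
    for (const ow of db.objectWords) if (ow.object_id === op.object_id) addWord(op.pair_id, ow.word_id);
  }
  return result;
}
```

The total number of (pair, category) entries in the returned map corresponds to the `rowCount` the real function returns after its DELETE/INSERT.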
@@ -1,10 +1,14 @@
// Automatic content pipeline per picture: generate pairs → translate → AI review → audio → ready.
// In-process queue with a single worker (rate-limit friendly). Every step is idempotent,
// i.e. a resume after a crash/redeploy skips work that is already done.
const { query } = require('../db');
const { LANGS, fillMissingRow, callClaude } = require('./translate');
const { PLACEHOLDER_RE } = require('./placeholders');
const { tagObjectWords, wrapSurface, objectIdsInSentence,
  objectTokensInSentence, isSimpleObjectForm, untagToken } = require('./objectTagging');
const { translateWordGroup } = require('./pairContent');
const { generatePairsForObject, persistPair } = require('./generatePairs');
const { reviewPicturePairs } = require('./reviewPairs');
const { generateAndStore, describeError } = require('../routes/audios');

const queue = [];
@@ -85,6 +89,254 @@ async function loadPairs(pictureId) {
     ORDER BY p.id`, [pictureId])).rows;
}

// Sentence fields of ONE pair (table/id/col/lang): questions + statements.
function pairSentenceFields(p) {
  const fields = [];
  const add = (table, id, cols) => { if (id) for (const col of cols) fields.push({ table, id, col, lang: col.slice(-2) }); };
  add('questions', p.question_id, ['sentence_de', 'sentence_en', 'sentence_sv']);
  add('statements', p.positive_statement_id, ['positive_sentence_de', 'positive_sentence_en', 'positive_sentence_sv']);
  add('statements', p.negative_statement_id, ['negative_sentence_de', 'negative_sentence_en', 'negative_sentence_sv']);
  return fields;
}

// LLM fallback: find the exact (inflected) surface form of an object word in a sentence.
// IMPORTANT: only return it if the word denotes the object ITSELF (the word, an inflection,
// a plural, a head compound such as "Landschildkröte" for "Schildkröte", or a synonym such as
// "Stiefel" for "Schuh"). NOT if the object word is only the MODIFIER of another thing
// (e.g. "Erdbeerfeld"/"Erdbeerpflanze" ≠ Erdbeere).
async function locateSurfaceLLM(sentence, label) {
  try {
    const data = await callClaude({
      system: 'Du findest die Oberflächenform eines Objektworts in einem Satz. Antworte AUSSCHLIESSLICH mit gültigem JSON.',
      user: `Satz: "${sentence}"\nObjekt (Grundform/Bedeutung): "${label}"\n\n` +
        `Gib die EXAKTE Zeichenkette zurück, mit der dieses Objekt im Satz benannt ist — als Wort, ` +
        `Beugung/Mehrzahl/bestimmte Form, Kopf-Kompositum (Objektwort ist das GRUNDWORT, z.B. ` +
        `"Landschildkröte" für "Schildkröte") oder Synonym (z.B. "Stiefel"/"Lederstiefel" für "Schuh").\n` +
        `Gib null zurück, wenn das Objekt NICHT vorkommt ODER nur als BESTIMMUNGSWORT eines anderen ` +
        `Dings (z.B. "Erdbeerfeld"/"Erdbeerpflanze" bezeichnet Feld/Pflanze, NICHT die Erdbeere).\n` +
        `Format: {"surface":"…"|null}`,
      maxTokens: 80,
    });
    const s = data && typeof data.surface === 'string' ? data.surface.trim() : null;
    if (!s) return null;
    // Accept the form only if it really occurs in the sentence (at word boundaries).
    return new RegExp(`(?<![\\p{L}\\p{N}])${s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')}(?![\\p{L}\\p{N}])`, 'u').test(sentence) ? s : null;
  } catch { return null; }
}
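The last step of `locateSurfaceLLM` is what keeps the LLM fallback from inventing text. A parsed model reply is accepted only if the proposed form occurs verbatim in the sentence at Unicode word boundaries. The sketch below isolates that check. `acceptSurface` is a hypothetical name, and `data` stands for the already-parsed JSON that `callClaude` is assumed to return.

```javascript
// Escapes regex metacharacters (same character class as in locateSurfaceLLM).
const escapeRe = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Accepts a model-proposed surface form only if it appears verbatim in the sentence at
// Unicode word boundaries; null, non-strings and partial words are rejected (→ null).
function acceptSurface(sentence, data) {
  const s = data && typeof data.surface === 'string' ? data.surface.trim() : null;
  if (!s) return null;
  const re = new RegExp(`(?<![\\p{L}\\p{N}])${escapeRe(s)}(?![\\p{L}\\p{N}])`, 'u');
  return re.test(sentence) ? s : null;
}
```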

// LLM check for the cleanup: does the tagged `label` really denote the object `objWord`?
// true ⇒ keep (word/inflection/head compound/synonym), false ⇒ remove the token
// (modifier of another thing). On error or ambiguity: keep (conservative).
async function denotesObjectLLM(sentence, label, objWord) {
  try {
    const data = await callClaude({
      system: 'Du beurteilst, ob ein markiertes Wort wirklich das genannte Objekt bezeichnet. Antworte AUSSCHLIESSLICH mit gültigem JSON.',
      user: `Objekt: "${objWord}"\nSatz: "${sentence}"\nMarkiertes Wort: "${label}"\n\n` +
        `Bezeichnet "${label}" das Objekt "${objWord}" SELBST? JA bei: dem Wort, einer Beugung/` +
        `Mehrzahl/bestimmten Form, einem Kompositum mit "${objWord}" als GRUNDWORT (z.B. ` +
        `"Landschildkröte" für "Schildkröte"), oder einem Synonym (z.B. "Stiefel"/"Lederstiefel" ` +
        `für "Schuh"). NEIN, wenn "${objWord}" nur BESTIMMUNGSWORT eines ANDEREN Dings ist (z.B. ` +
        `"Erdbeerfeld"/"Erdbeerpflanze" ist ein Feld/eine Pflanze, NICHT die Erdbeere).\n` +
        `Format: {"denotes": true|false}`,
      maxTokens: 40,
    });
    return data && typeof data.denotes === 'boolean' ? data.denotes : true;
  } catch { return true; }
}

// Re-tokenizes OBJECT words in the sentences of ONE pair.
// Deterministic (tagObjectWords), with an optional hybrid LLM fallback for inflected forms that
// were not recognized deterministically. The fallback runs ONLY for objects that are already
// confirmed as a token in another language of the same pair (minimal calls, no hallucinations).
// Idempotent. `dryRun` ⇒ no UPDATE. Returns the changed fields { table,id,col,lang,before,after }.
async function retagPair(p, objects, { dryRun = false, useLLM = false } = {}) {
  const fields = pairSentenceFields(p);
  if (!fields.length) return [];
  // Load the current texts (grouped per table/row)
  const byRow = new Map(); // `${table}|${id}` → { table, id, cols:Set }
  for (const f of fields) {
    const k = `${f.table}|${f.id}`;
    if (!byRow.has(k)) byRow.set(k, { table: f.table, id: f.id, cols: new Set() });
    byRow.get(k).cols.add(f.col);
  }
  const text = {}; // `${table}|${id}|${col}` → string
  for (const { table, id, cols } of byRow.values()) {
    const colList = [...cols];
    const row = (await query(`SELECT ${colList.join(', ')} FROM ${table} WHERE id = $1`, [id])).rows[0] || {};
    for (const col of colList) text[`${table}|${id}|${col}`] = row[col] || '';
  }
  const key = f => `${f.table}|${f.id}|${f.col}`;

  // 1) Deterministic sweep (in memory)
  const tagged = {};
  for (const f of fields) {
    const before = text[key(f)];
    tagged[key(f)] = before && before.trim() ? tagObjectWords(before, f.lang, objects) : before;
  }

  // 2) Hybrid LLM fallback: for object IDs tokenized in ≥1 language, search the languages
  //    where they are still missing.
  if (useLLM) {
    const presentByObj = new Map(); // objectId → Set<lang>
    for (const f of fields) for (const oid of objectIdsInSentence(tagged[key(f)])) {
      if (!presentByObj.has(oid)) presentByObj.set(oid, new Set());
      presentByObj.get(oid).add(f.lang);
    }
    const labelOf = (oid, lang) => {
      const o = objects.find(x => x.id === oid);
      for (const w of o?.words || []) if ((w[`titel_${lang}`] || '').trim()) return w[`titel_${lang}`].trim();
      return null;
    };
    for (const f of fields) {
      const cur = tagged[key(f)];
      if (!cur || !cur.trim()) continue;
      for (const [oid, langs] of presentByObj) {
        if (langs.has(f.lang)) continue; // already tokenized in this language
        if (objectIdsInSentence(cur).has(oid)) continue; // (safety check)
        const label = labelOf(oid, f.lang);
        if (!label) continue;
        const surface = await locateSurfaceLLM(cur, label);
        if (surface) tagged[key(f)] = wrapSurface(tagged[key(f)], surface, oid);
      }
    }
  }

  // 3) Diff + (optionally) write
  const changes = [];
  for (const { table, id, cols } of byRow.values()) {
    const set = {};
    for (const col of cols) {
      const k = `${table}|${id}|${col}`;
      if (tagged[k] !== text[k]) {
        set[col] = tagged[k];
        changes.push({ table, id, col, lang: col.slice(-2), before: text[k], after: tagged[k] });
      }
    }
    const cells = Object.keys(set);
    if (!dryRun && cells.length) {
      await query(
        `UPDATE ${table} SET ${cells.map((c, i) => `${c} = $${i + 1}`).join(', ')} WHERE id = $${cells.length + 1}`,
        [...cells.map(c => set[c]), id]);
    }
  }
  return changes;
}
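Step 2 of `retagPair` only queries the LLM where an object is already tokenized in at least one language of the pair but is missing in another. The pure-function sketch below reproduces that selection without the network calls. The field shape `{ key, lang, text }` and the name `fallbackCandidates` are hypothetical simplifications of the `fields`/`tagged` bookkeeping above.

```javascript
// Object tokens only: {{label.o:uuid}} (same pattern as the object branch of PLACEHOLDER_RE).
const OBJ_ID_RE = /\{\{[^.{}]+\.o:([0-9a-f-]{36})\}\}/g;

function objectIds(text) {
  return new Set([...String(text || '').matchAll(OBJ_ID_RE)].map((m) => m[1]));
}

// fields: [{ key, lang, text }], where text already went through the deterministic sweep.
// Returns the (field, object) combinations that would be sent to the LLM fallback.
function fallbackCandidates(fields) {
  const presentByObj = new Map(); // objectId -> Set<lang>
  for (const f of fields) {
    for (const oid of objectIds(f.text)) {
      if (!presentByObj.has(oid)) presentByObj.set(oid, new Set());
      presentByObj.get(oid).add(f.lang);
    }
  }
  const out = [];
  for (const f of fields) {
    if (!f.text || !f.text.trim()) continue; // empty cells are skipped
    for (const [oid, langs] of presentByObj) {
      if (!langs.has(f.lang)) out.push({ key: f.key, oid });
    }
  }
  return out;
}
```

Restricting candidates this way bounds the number of LLM calls by the number of objects the deterministic tagger already found. It also means the model is never asked to locate an object that no language mentions.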

// Cleanup of ONE pair: removes OBJECT tokens whose label does not really denote the object
// (modifier of another thing, e.g. "Erdbeerfeld" tagged as Erdbeere). Clearly good forms
// (exact / lemma + suffix) are kept without the LLM; only the ambiguous tokens go to the LLM.
async function cleanPair(p, objects, { dryRun = false } = {}) {
  const fields = pairSentenceFields(p);
  if (!fields.length) return [];
  const byRow = new Map();
  for (const f of fields) {
    const k = `${f.table}|${f.id}`;
    if (!byRow.has(k)) byRow.set(k, { table: f.table, id: f.id, cols: new Set() });
    byRow.get(k).cols.add(f.col);
  }
  const text = {};
  for (const { table, id, cols } of byRow.values()) {
    const colList = [...cols];
    const row = (await query(`SELECT ${colList.join(', ')} FROM ${table} WHERE id = $1`, [id])).rows[0] || {};
    for (const col of colList) text[`${table}|${id}|${col}`] = row[col] || '';
  }
  const key = f => `${f.table}|${f.id}|${f.col}`;
  const labelOf = (oid, lang) => {
    const o = objects.find(x => x.id === oid);
    for (const w of o?.words || []) if ((w[`titel_${lang}`] || '').trim()) return w[`titel_${lang}`].trim();
    return null;
  };

  const cleaned = {};
  for (const f of fields) {
    let cur = text[key(f)];
    cleaned[key(f)] = cur;
    if (!cur || !cur.trim()) continue;
    for (const tok of objectTokensInSentence(cur)) {
      if (isSimpleObjectForm(tok.label, f.lang, objects, tok.oid)) continue; // clearly fine
      const objWord = labelOf(tok.oid, f.lang);
      if (!objWord) continue; // unknown → leave untouched
      const ok = await denotesObjectLLM(cur, tok.label, objWord);
      if (!ok) { cur = untagToken(cur, tok.full, tok.label); cleaned[key(f)] = cur; }
    }
  }

  const changes = [];
  for (const { table, id, cols } of byRow.values()) {
    const set = {};
    for (const col of cols) {
      const k = `${table}|${id}|${col}`;
      if (cleaned[k] !== text[k]) {
        set[col] = cleaned[k];
        changes.push({ table, id, col, lang: col.slice(-2), before: text[k], after: cleaned[k] });
      }
    }
    const cells = Object.keys(set);
    if (!dryRun && cells.length) {
      await query(
        `UPDATE ${table} SET ${cells.map((c, i) => `${c} = $${i + 1}`).join(', ')} WHERE id = $${cells.length + 1}`,
        [...cells.map(c => set[c]), id]);
    }
  }
  return changes;
}
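Both `retagPair` and `cleanPair` build the same dynamic `UPDATE`. Values are bound as `$n` parameters, but table and column names cannot be parameterized. In the code above they come from the fixed lists in `pairSentenceFields`, which keeps them safe. The sketch below factors that statement builder out and adds an explicit whitelist check. `buildUpdate` and `ALLOWED` are hypothetical and do not exist in the source.

```javascript
// Hypothetical whitelist mirroring the tables/columns produced by pairSentenceFields.
const ALLOWED = {
  questions: ['sentence_de', 'sentence_en', 'sentence_sv'],
  statements: ['positive_sentence_de', 'positive_sentence_en', 'positive_sentence_sv',
    'negative_sentence_de', 'negative_sentence_en', 'negative_sentence_sv'],
};

// Returns { text, values } for a parameterized UPDATE of one row, or null if nothing changed.
function buildUpdate(table, set, id) {
  const cells = Object.keys(set);
  if (!cells.length) return null;
  const allowed = ALLOWED[table];
  if (!allowed || cells.some((c) => !allowed.includes(c))) {
    throw new Error(`table/column not allowed: ${table}`);
  }
  return {
    // column i gets placeholder $(i+1); the row id is the last parameter
    text: `UPDATE ${table} SET ${cells.map((c, i) => `${c} = $${i + 1}`).join(', ')} WHERE id = $${cells.length + 1}`,
    values: [...cells.map((c) => set[c]), id],
  };
}
```

The returned object could then be passed as `query(u.text, u.values)` to the `query(text, params)` helper from `src/db.js`.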

// Backfill/retag across one picture or all pictures. Returns a summary.
// `cleanup:true` ⇒ instead of tagging, wrongly tokenized object words (modifier of another
// thing) are removed after an LLM check.
async function retagObjects({ pictureId = null, dryRun = false, useLLM = false, cleanup = false } = {}) {
  const picIds = pictureId
    ? [pictureId]
    : (await query(`SELECT id FROM pictures ORDER BY created_at`)).rows.map(r => r.id);
  const report = { pictures: 0, pairs: 0, changedPairs: 0, changedFields: 0, dryRun, useLLM, cleanup, samples: [] };
  for (const pid of picIds) {
    const objects = await loadObjects(pid);
    if (!objects.length) continue;
    const pairs = await loadPairs(pid);
    report.pictures++;
    for (const p of pairs) {
      report.pairs++;
      let changes = [];
      try {
        changes = cleanup
          ? await cleanPair(p, objects, { dryRun })
          : await retagPair(p, objects, { dryRun, useLLM });
      } catch (err) { console.error(`Retag-Fehler bei Pair ${p.id}:`, err.message); continue; }
      if (changes.length) {
        report.changedPairs++;
        report.changedFields += changes.length;
        if (report.samples.length < 25)
          report.samples.push({ pair: p.id, changes: changes.map(c => ({ col: c.col, after: c.after })) });
      }
    }
  }
  return report;
}

// Word IDs of all {{label.w:uuid}} placeholders in the sentences of the pairs.
// These words are created during generation (nouns in the sentence) and are not attached to
// statement_words/object_words, so they must be included explicitly for translation + audio.
async function collectPlaceholderWordIds(pairs) {
  const ids = new Set();
  const scan = text => {
    for (const m of String(text || '').matchAll(PLACEHOLDER_RE)) if (m[2] === 'w') ids.add(m[3]);
  };
  const questionIds = [...new Set(pairs.map(p => p.question_id).filter(Boolean))];
  const stmtIds = [...new Set(pairs.flatMap(p => [p.positive_statement_id, p.negative_statement_id]).filter(Boolean))];
  if (questionIds.length) {
    const r = await query(
      `SELECT sentence_de, sentence_en, sentence_sv FROM questions WHERE id = ANY($1)`, [questionIds]);
    r.rows.forEach(row => Object.values(row).forEach(scan));
  }
  if (stmtIds.length) {
    const r = await query(
      `SELECT positive_sentence_de, positive_sentence_en, positive_sentence_sv,
              negative_sentence_de, negative_sentence_en, negative_sentence_sv
       FROM statements WHERE id = ANY($1)`, [stmtIds]);
    r.rows.forEach(row => Object.values(row).forEach(scan));
  }
  return ids;
}
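Without the database access, the scanning part of `collectPlaceholderWordIds` reduces to the pure function sketched below. `wordIdsInTexts` is a hypothetical name. The regex is the `PLACEHOLDER_RE` shown in the hunk header of the placeholders module further down. Only `.w:` placeholders contribute IDs, and `null` cells are tolerated, as with `String(text || '')` in the original.

```javascript
// Pattern from the placeholders module: {{label.w:uuid}} (word) or {{label.o:uuid}} (object)
const PLACEHOLDER_RE = /\{\{([^.{}]+)\.(w|o):([0-9a-f-]{36})\}\}/g;

// Collects the IDs of word placeholders from any number of sentence cells (null-safe).
function wordIdsInTexts(texts) {
  const ids = new Set();
  for (const text of texts) {
    for (const m of String(text || '').matchAll(PLACEHOLDER_RE)) {
      if (m[2] === 'w') ids.add(m[3]); // group 2 = kind, group 3 = UUID
    }
  }
  return ids;
}
```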

async function runPicture(pictureId) {
  // Claim: only pictures that are in the pipeline
  const claim = await query(
@@ -152,6 +404,39 @@ async function runPicture(pictureId) {
    progress.translatedPairs++;
    await setStep(pictureId, 'translate', progress);
  }
  // Deterministically re-tokenize object words that the model did not mark as nouns
  // (safety net; existing tokens are left untouched).
  for (const p of pairs) {
    try { await retagPair(p, objects); }
    catch (err) { console.error(`Objekt-Tagging-Fehler bei Pair ${p.id}:`, err.message); }
  }

  // Also translate the noun words from sentence placeholders ({{label.w:id}})
  try {
    for (const wid of await collectPlaceholderWordIds(pairs)) {
      try { await fillMissingRow('words', wid, ['titel']); }
      catch (err) { progress.translateFailures++; console.error(`Translate-Fehler bei Wort ${wid}:`, err.message); }
    }
  } catch (err) { console.error(`Placeholder-Wörter sammeln fehlgeschlagen:`, err.message); }

  // ── Step 2.5: AI review: all pairs + the picture go to Sonnet for proofreading ────
  // (spelling, translation consistency, plausibility with respect to the picture). Corrections
  // are written to the DB before audio generation. As with translation, errors are not
  // fatal: audio still runs and the run is not aborted.
  progress.reviewedPairs = 0;
  progress.correctionsApplied = 0;
  progress.reviewFailures = 0;
  await setStep(pictureId, 'review', progress);
  try {
    await reviewPicturePairs({
      pictureId, pictureUrl: picture.picture_link, pairs, progress,
      onProgress: () => setStep(pictureId, 'review', progress),
    });
  } catch (err) {
    progress.reviewFailures++;
    console.error(`Review-Fehler bei Bild ${pictureId}:`, err.message);
  }
  await setStep(pictureId, 'review', progress);

  // ── Step 3: audio for all sentences + words of the picture in all languages ──────
  try {
@@ -277,6 +562,8 @@ async function collectAudioUnits(pictureId, pairs) {
       JOIN object_pictures op ON op.object_id = ow.object_id
       WHERE op.picture_id = $1`, [pictureId]);
  ow.rows.forEach(x => wordIds.add(x.word_id));
  // + noun words from sentence placeholders ({{label.w:id}})
  (await collectPlaceholderWordIds(pairs)).forEach(id => wordIds.add(id));

  const sources = [];
  if (questionIds.length) {
@@ -343,4 +630,4 @@ async function generateWithBackoff(u) {
  }
}

module.exports = { enqueue, resumePending, loadObjects, loadPairs, collectAudioUnits, generateWithBackoff, translatePair, retagPair, retagObjects };
@@ -5,14 +5,25 @@ const PLACEHOLDER_RE = /\{\{([^.{}]+)\.(w|o):([0-9a-f-]{36})\}\}/g;
// Legacy form without a label: {{uuid}}. Should have been migrated; still removed defensively.
const LEGACY_PLACEHOLDER_RE = /\{\{\s*[0-9a-f-]{36}\s*\}\}/g;

// Protection token used during translation/review: ⟦PHn:label⟧. It must never end up in the DB;
// if it does anyway (Claude hallucination), it is defensively resolved to its label everywhere.
const TOKEN_RE = /⟦(PH\d+):([^⟧]*)⟧/g;

// Removes leaked ⟦PHn:label⟧ tokens from a text → only the label remains.
function stripLeakedTokens(text) {
  if (!text) return text;
  return String(text).replace(TOKEN_RE, (_, _key, label) => label.trim());
}

// Turns "Ist das ein {{Apfel.w:1234-…}}?" into "Ist das ein Apfel?" (for TTS/display).
function resolvePlaceholdersToLabels(text) {
  if (!text) return '';
  return String(text)
    .replace(PLACEHOLDER_RE, (_, label) => label)
    .replace(LEGACY_PLACEHOLDER_RE, '')
    .replace(TOKEN_RE, (_, _key, label) => label.trim())
    .replace(/\s{2,}/g, ' ')
    .trim();
}

module.exports = { PLACEHOLDER_RE, TOKEN_RE, stripLeakedTokens, resolvePlaceholdersToLabels };
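As a quick check of the resolution chain above, the standalone copy below can be run against the three input kinds it handles: current `{{label.w|o:uuid}}` placeholders, legacy label-less `{{uuid}}` placeholders, and leaked `⟦PHn:label⟧` protection tokens. The regexes are taken verbatim from the module. Only the example sentences are made up.

```javascript
const PLACEHOLDER_RE = /\{\{([^.{}]+)\.(w|o):([0-9a-f-]{36})\}\}/g;
const LEGACY_PLACEHOLDER_RE = /\{\{\s*[0-9a-f-]{36}\s*\}\}/g;
const TOKEN_RE = /⟦(PH\d+):([^⟧]*)⟧/g;

// Same chain as resolvePlaceholdersToLabels: labels in, legacy placeholders out,
// leaked protection tokens resolved, whitespace collapsed.
function resolvePlaceholdersToLabels(text) {
  if (!text) return '';
  return String(text)
    .replace(PLACEHOLDER_RE, (_, label) => label)
    .replace(LEGACY_PLACEHOLDER_RE, '')
    .replace(TOKEN_RE, (_, _key, label) => label.trim())
    .replace(/\s{2,}/g, ' ')
    .trim();
}
```

The whitespace collapse matters for the legacy case: removing a label-less placeholder between two words would otherwise leave a double space in the TTS input.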
225
src/lib/reviewPairs.js
Normal file
@@ -0,0 +1,225 @@
// AI review in the pipeline: all pairs of a picture (all languages) plus the picture itself
// go to Sonnet for proofreading (spelling, translation consistency, plausibility with respect
// to the picture). Corrections are written to the DB before audio generation; existing audio
// files of the corrected cells are deleted so that Step 3 regenerates them from the new text.
const { query } = require('../db');
const { callClaude, tokenize, LANGS } = require('./translate');
const { TOKEN_RE, stripLeakedTokens } = require('./placeholders');
const { deleteFile, keyFromUrl } = require('../s3');

const REVIEW_MODEL = process.env.REVIEW_MODEL || process.env.TRANSLATE_MODEL || 'claude-sonnet-4-5';
const BATCH_SIZE = 15; // pairs per Claude call (the picture is sent along with each batch)

// Refs of the form "q:<uuid>:sentence_de": compact in the prompt, unique in the itemMap.
const TABLE_PREFIX = { questions: 'q', statements: 's', words: 'w' };

function makeItem(table, id, field, lang, text) {
  // Resolve leaked ⟦PHn:…⟧ leftovers in the source text first; otherwise Claude sees them as
  // real tokens and the token-count validation blocks every correction of that line.
  const { tokenized, tokens } = tokenize(stripLeakedTokens(text));
  return {
    ref: `${TABLE_PREFIX[table]}:${id}:${field}_${lang}`,
    table, id, column: `${field}_${lang}`, field, lang,
    tokenized, tokens,
  };
}
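The comments mention two safeguards for model output: refs must appear in the `itemMap` whitelist, and the number of protection tokens must not change. The code that applies corrections is not part of this excerpt. The sketch below is therefore a hypothetical illustration of how such a filter could look. It assumes a correction format of `[{ ref, text }]` and counts `⟦PHn:…⟧` tokens in the `tokenized` text, rather than relying on the unknown shape of `tokens`.

```javascript
const TABLE_PREFIX = { questions: 'q', statements: 's', words: 'w' };
// Builds a ref the same way makeItem does: "<prefix>:<id>:<field>_<lang>".
const makeRef = (table, id, field, lang) => `${TABLE_PREFIX[table]}:${id}:${field}_${lang}`;

// Counts ⟦PHn:…⟧ protection tokens in a string.
const countTokens = (s) => (String(s).match(/⟦PH\d+:[^⟧]*⟧/g) || []).length;

// HYPOTHETICAL correction format [{ ref, text }]: keeps only corrections whose ref is in the
// whitelist (itemMap) and whose token count equals that of the tokenized original.
function acceptCorrections(itemMap, corrections) {
  return corrections.filter((c) => {
    const item = itemMap.get(c.ref);
    return Boolean(item) && typeof c.text === 'string' &&
      countTokens(c.text) === countTokens(item.tokenized);
  });
}
```

A correction that drops a protected noun, or that targets a ref the prompt never contained (for example, a language outside `LANGS`), is discarded instead of being written to the DB.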

// Load all filled text cells of the pairs + the object words of the picture.
// Returns { pairBlocks, wordBlock, itemMap }; itemMap: ref → item (whitelist).
async function loadReviewItems(pictureId, pairs) {
  const itemMap = new Map();
  const add = (table, row, field, lang) => {
    const text = (row[`${field}_${lang}`] || '').trim();
    if (!text) return null;
    const item = makeItem(table, row.id, field, lang, text);
    if (!itemMap.has(item.ref)) itemMap.set(item.ref, item);
    return itemMap.get(item.ref);
  };

  const questionIds = [...new Set(pairs.map(p => p.question_id).filter(Boolean))];
  const stmtIds = [...new Set(pairs.flatMap(p => [p.positive_statement_id, p.negative_statement_id]).filter(Boolean))];

  const questions = new Map();
  if (questionIds.length) {
    const r = await query(
      `SELECT id, sentence_de, sentence_en, sentence_sv FROM questions WHERE id = ANY($1)`, [questionIds]);
    r.rows.forEach(row => questions.set(row.id, row));
  }
  const statements = new Map();
  if (stmtIds.length) {
    const r = await query(
      `SELECT id, positive_sentence_de, positive_sentence_en, positive_sentence_sv,
              negative_sentence_de, negative_sentence_en, negative_sentence_sv
       FROM statements WHERE id = ANY($1)`, [stmtIds]);
    r.rows.forEach(row => statements.set(row.id, row));
  }

  // Words: via the statement links of the word pairs + the object_words of the picture
  const stmtWords = new Map(); // statementId → [wordId]
  const wordIds = new Set();
  if (stmtIds.length) {
    for (const link of ['statement_positive_words', 'statement_negative_words']) {
      const r = await query(`SELECT statement_id, word_id FROM ${link} WHERE statement_id = ANY($1)`, [stmtIds]);
      for (const x of r.rows) {
        if (!stmtWords.has(x.statement_id)) stmtWords.set(x.statement_id, []);
        stmtWords.get(x.statement_id).push(x.word_id);
        wordIds.add(x.word_id);
      }
    }
  }
  const objectWordIds = new Set();
  const ow = await query(
    `SELECT ow.word_id FROM object_words ow
     JOIN object_pictures op ON op.object_id = ow.object_id
     WHERE op.picture_id = $1`, [pictureId]);
  ow.rows.forEach(x => { objectWordIds.add(x.word_id); wordIds.add(x.word_id); });

  const words = new Map();
  if (wordIds.size) {
    const r = await query(
      `SELECT id, titel_de, titel_en, titel_sv FROM words WHERE id = ANY($1) AND status <> 'blocked'`,
      [[...wordIds]]);
    r.rows.forEach(row => words.set(row.id, row));
  }
||||||
|
|
||||||
|
// Prompt-Blöcke pro Pair zusammensetzen
|
||||||
|
const lines = (table, row, field) =>
|
||||||
|
LANGS.map(l => add(table, row, field, l)).filter(Boolean)
|
||||||
|
.map(it => ` ${it.ref} [${it.lang}]: "${it.tokenized}"`);
|
||||||
|
|
||||||
|
const pairBlocks = [];
|
||||||
|
for (const p of pairs) {
|
||||||
|
const block = [`PAIR (answer_type: ${p.answer_type}):`];
|
||||||
|
const q = p.question_id && questions.get(p.question_id);
|
||||||
|
if (q) block.push(...lines('questions', q, 'sentence'));
|
||||||
|
for (const [stmtId, label] of [[p.positive_statement_id, 'positive_sentence'],
|
||||||
|
[p.negative_statement_id, 'negative_sentence']]) {
|
||||||
|
const s = stmtId && statements.get(stmtId);
|
||||||
|
if (!s) continue;
|
||||||
|
if (p.answer_type === 'word') {
|
||||||
|
for (const wid of stmtWords.get(stmtId) || []) {
|
||||||
|
const w = words.get(wid);
|
||||||
|
if (w) block.push(...lines('words', w, 'titel'));
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
block.push(...lines('statements', s, label));
|
||||||
|
}
|
||||||
|
}
|
||||||
|
if (block.length > 1) pairBlocks.push(block.join('\n'));
|
||||||
|
}
|
||||||
|
|
||||||
|
const wordLines = [];
|
||||||
|
for (const wid of objectWordIds) {
|
||||||
|
const w = words.get(wid);
|
||||||
|
if (w) wordLines.push(...lines('words', w, 'titel'));
|
||||||
|
}
|
||||||
|
const wordBlock = wordLines.length ? `BILD-WÖRTER (Vokabeln zum Bild):\n${wordLines.join('\n')}` : null;
|
||||||
|
|
||||||
|
return { pairBlocks, wordBlock, itemMap };
|
||||||
|
}
|
||||||
|
|
||||||
|
function buildReviewPrompt(pictureUrl, blocks) {
  const system =
    'Du bist Lektor für eine Kinder-Sprachlern-App (Deutsch, Englisch, Schwedisch). ' +
    'Du prüfst Lerninhalte zu einem Bild auf (a) Rechtschreibung und Grammatik je Sprache, ' +
    '(b) korrekte und konsistente Übersetzung zwischen Deutsch/Englisch/Schwedisch — die Sprachfassungen ' +
    'einer Zeile müssen dieselbe Bedeutung haben, (c) Plausibilität zum Bild. ' +
    'Korrigiere NUR echte Fehler, behalte Stil und Länge bei. ' +
    'Antworte AUSSCHLIESSLICH mit gültigem JSON, ohne Markdown, ohne Erklärungen.';
  const text =
    `Prüfe die folgenden Inhalte zum beigefügten Bild. Jede Zeile hat eine Referenz (ref), ` +
    `eine Sprache und den Text.\n\n` +
    `WICHTIG: Tokens der Form ⟦PHn:wort⟧ sind geschützte Platzhalter. Du darfst das Wort INNERHALB ` +
    `des Tokens korrigieren, aber das Token-Format muss exakt erhalten bleiben (⟦PHn:wort⟧). ` +
    `Kein Token darf gelöscht, verdoppelt oder erfunden werden.\n\n` +
    blocks.join('\n\n') + '\n\n' +
    `Antwort-Format — NUR Zeilen, die wirklich einen Fehler enthalten (sonst leeres Array):\n` +
    `{"corrections":[{"ref":"<ref>","corrected":"<korrigierter Text>"}]}`;
  return {
    system,
    user: [
      { type: 'image', source: { type: 'url', url: pictureUrl } },
      { type: 'text', text },
    ],
  };
}

// Token sets before/after the correction must be identical; no stray fragments.
function validateCorrection(item, corrected) {
  if (typeof corrected !== 'string' || !corrected.trim()) return { ok: false, reason: 'leer' };
  const keys = [...corrected.matchAll(TOKEN_RE)].map(m => m[1]).sort();
  const expected = item.tokens.map(t => t.key).sort();
  if (keys.length !== expected.length || keys.some((k, i) => k !== expected[i]))
    return { ok: false, reason: 'Platzhalter-Tokens verändert' };
  const stripped = corrected.replace(TOKEN_RE, '');
  if (/[⟦⟧]|\{\{|\}\}/.test(stripped)) return { ok: false, reason: 'Fragment im Text' };

  // Detokenize: ⟦PHn:label⟧ → {{label.type:uuid}} (the label itself may have been corrected)
  const labels = {};
  for (const m of corrected.matchAll(TOKEN_RE)) labels[m[1]] = m[2].trim();
  let out = corrected;
  for (const t of item.tokens) {
    const label = labels[t.key] || t.sourceLabel;
    // Replacer function, so a `$` in the label is not expanded as a replacement pattern
    out = out.replace(new RegExp(`⟦${t.key}:[^⟧]*⟧`, 'g'), () => `{{${label}.${t.type}:${t.uuid}}}`);
  }
  return { ok: true, detokenized: out.trim() };
}

// Delete the existing audios of the corrected cell (incl. S3) so that step 3 regenerates them.
async function invalidateAudio(table, id, field, lang) {
  const r = await query(
    `SELECT id, audio_link FROM audios
     WHERE source_table=$1 AND source_id=$2 AND source_field=$3 AND language=$4`,
    [table, id, field, lang]);
  for (const row of r.rows) {
    const k = keyFromUrl(row.audio_link);
    if (k) await deleteFile(k).catch(() => {});
    await query(`DELETE FROM audios WHERE id = $1`, [row.id]);
  }
}

async function applyCorrection(item, newText) {
  // table/column come from the itemMap built in loadReviewItems, not from the model's response
  await query(`UPDATE ${item.table} SET ${item.column} = $1 WHERE id = $2`, [newText, item.id]);
  await invalidateAudio(item.table, item.id, item.field, item.lang);
}

// Main entry point: reviews all pairs of a picture in batches and never throws. Errors
// are counted in progress.reviewFailures and the pipeline keeps running.
async function reviewPicturePairs({ pictureId, pictureUrl, pairs, progress, onProgress }) {
  if (!pictureUrl || !pairs.length) return;
  let loaded;
  try {
    loaded = await loadReviewItems(pictureId, pairs);
  } catch (err) {
    // A failed load also counts as a review failure, so the no-throw contract holds
    progress.reviewFailures++;
    console.error(`Review load error for picture ${pictureId}:`, err.message);
    return;
  }
  const { pairBlocks, wordBlock, itemMap } = loaded;
  if (!pairBlocks.length && !wordBlock) return;

  const batches = [];
  for (let i = 0; i < pairBlocks.length; i += BATCH_SIZE)
    batches.push(pairBlocks.slice(i, i + BATCH_SIZE));
  if (!batches.length) batches.push([]);
  if (wordBlock) batches[0] = [wordBlock, ...batches[0]];

  for (const batch of batches) {
    try {
      const { system, user } = buildReviewPrompt(pictureUrl, batch);
      const data = await callClaude({ system, user, maxTokens: 8000, model: REVIEW_MODEL });
      const corrections = Array.isArray(data.corrections) ? data.corrections : [];
      for (const c of corrections) {
        const item = itemMap.get(c && c.ref);
        if (!item) continue; // unknown ref → discard
        const v = validateCorrection(item, c.corrected);
        if (!v.ok) {
          console.warn(`Review: Korrektur für ${c.ref} verworfen (${v.reason})`);
          continue;
        }
        await applyCorrection(item, v.detokenized);
        progress.correctionsApplied++;
      }
    } catch (err) {
      progress.reviewFailures++;
      console.error(`Review-Batch-Fehler bei Bild ${pictureId}:`, err.message);
    }
    progress.reviewedPairs = Math.min(progress.reviewedPairs + BATCH_SIZE, pairs.length);
    if (onProgress) await onProgress();
  }
}

module.exports = { reviewPicturePairs, loadReviewItems, buildReviewPrompt, validateCorrection, invalidateAudio };
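The review step depends on a placeholder round trip: `{{label.type:uuid}}` placeholders are sent to the model as `⟦PHn:label⟧` tokens, and a correction is accepted only if it contains exactly the same token keys. `TOKEN_RE` and the shape of `item.tokens` are defined outside this excerpt, so the standalone sketch below assumes `TOKEN_RE = /⟦(PH\d+):([^⟧]*)⟧/g` and tokens shaped like `{ key, type, uuid, sourceLabel }`, inferred from how `validateCorrection` uses them. The name `checkAndDetokenize` is hypothetical; this illustrates the check and is not the module's code.

```javascript
// Assumed token pattern ⟦PH<n>:<label>⟧; the real TOKEN_RE is not part of this excerpt.
const TOKEN_RE = /⟦(PH\d+):([^⟧]*)⟧/g;

// Hypothetical standalone version of the check in validateCorrection().
function checkAndDetokenize(tokens, corrected) {
  // 1. Same token keys before and after, compared as sorted lists (a multiset check).
  const keys = [...corrected.matchAll(TOKEN_RE)].map(m => m[1]).sort();
  const expected = tokens.map(t => t.key).sort();
  if (keys.length !== expected.length || keys.some((k, i) => k !== expected[i])) {
    return { ok: false, reason: 'tokens changed' };
  }
  // 2. No half-tokens or raw {{ }} placeholders outside the tokens.
  const stripped = corrected.replace(TOKEN_RE, '');
  if (/[⟦⟧]|\{\{|\}\}/.test(stripped)) return { ok: false, reason: 'fragment in text' };
  // 3. Swap each token back to {{label.type:uuid}}, keeping a corrected label if present.
  const labels = {};
  for (const m of corrected.matchAll(TOKEN_RE)) labels[m[1]] = m[2].trim();
  let out = corrected;
  for (const t of tokens) {
    const label = labels[t.key] || t.sourceLabel;
    out = out.replace(new RegExp(`⟦${t.key}:[^⟧]*⟧`, 'g'), () => `{{${label}.${t.type}:${t.uuid}}}`);
  }
  return { ok: true, text: out.trim() };
}

const tokens = [{ key: 'PH1', type: 'w', uuid: 'abc', sourceLabel: 'Hund' }];
console.log(checkAndDetokenize(tokens, 'Der ⟦PH1:Hund⟧ bellt.'));
// { ok: true, text: 'Der {{Hund.w:abc}} bellt.' }
console.log(checkAndDetokenize(tokens, 'Der Hund bellt.'));
// { ok: false, reason: 'tokens changed' }
```

Passing a replacer function to `replace` matters here: with a plain replacement string, a label containing `$&` or `$1` would be expanded instead of copied literally.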
@@ -18,7 +18,7 @@ const TRANSLATE_CONFIG = {
 
 // ── Placeholder protection ────────────────────────────────────────────────────
 // Format in the source text: {{label.w:uuid}} or {{label.o:uuid}}
-const { PLACEHOLDER_RE } = require('./placeholders');
+const { PLACEHOLDER_RE, stripLeakedTokens } = require('./placeholders');
 
 // Prepare sentences for Claude: replace each placeholder with a ⟦PHn:label⟧ token.
 // The token format is deliberately exotic so that Claude does not change it by accident.
@@ -54,7 +54,7 @@ function detokenize(translated, tokens, labelsFromClaude) {
   return { text: out, missingTokens: tokens.filter(t => !seen.has(t.key)).map(t => t.key) };
 }
 
-async function callClaude({ system, user, maxTokens = 2000 }) {
+async function callClaude({ system, user, maxTokens = 2000, model = TRANSLATE_MODEL }) {
   const apiKey = process.env.ANTHROPIC_API_KEY;
   if (!apiKey) { const e = new Error('ANTHROPIC_API_KEY nicht konfiguriert'); e.status = 500; throw e; }
 
@@ -69,7 +69,7 @@ async function callClaude({ system, user, maxTokens = 2000 }) {
     method: 'POST',
     headers: { 'Content-Type': 'application/json', 'x-api-key': apiKey, 'anthropic-version': '2023-06-01' },
     body: JSON.stringify({
-      model: TRANSLATE_MODEL, max_tokens: maxTokens, system,
+      model, max_tokens: maxTokens, system,
       messages: [{ role: 'user', content: user }],
     }),
   });
@@ -98,17 +98,24 @@ async function translateText({ text, from, to }) {
   if (!text || !text.trim()) return '';
   const { tokenized, tokens } = tokenize(text);
   const system = 'Du bist ein professioneller Übersetzer. Antworte AUSSCHLIESSLICH mit gültigem JSON, ohne Markdown, ohne Erklärungen.';
-  const user = `Übersetze diesen Text von ${LANG_LABEL[from] || from} nach ${LANG_LABEL[to] || to}.\n\n` +
-    `WICHTIG: Tokens der Form ⟦PHn:wort⟧ sind Platzhalter. Übersetze NUR das Wort innerhalb des Tokens, ` +
-    `behalte das Token-Format exakt bei (⟦PHn:übersetztesWort⟧). Passe die Beugung des Wortes an den umgebenden Satz an ` +
-    `(Mehrzahl/Kasus). Die Token-Reihenfolge im Satz darfst du frei wählen wie es natürlich klingt.\n\n` +
-    `Quelltext:\n${tokenized}\n\n` +
-    `Antwort-Format:\n{"translated":"...","labels":{${tokens.map(t => `"${t.key}":"<übersetztes Wort>"`).join(',')}}}`;
+  // Explain the tokens ONLY if the text actually contains tokens; otherwise
+  // Claude occasionally hallucinates ⟦PHn:…⟧ tokens into the translation.
+  const user = tokens.length
+    ? `Übersetze diesen Text von ${LANG_LABEL[from] || from} nach ${LANG_LABEL[to] || to}.\n\n` +
+      `WICHTIG: Tokens der Form ⟦PHn:wort⟧ sind Platzhalter. Übersetze NUR das Wort innerhalb des Tokens, ` +
+      `behalte das Token-Format exakt bei (⟦PHn:übersetztesWort⟧). Passe die Beugung des Wortes an den umgebenden Satz an ` +
+      `(Mehrzahl/Kasus). Die Token-Reihenfolge im Satz darfst du frei wählen wie es natürlich klingt.\n\n` +
+      `Quelltext:\n${tokenized}\n\n` +
+      `Antwort-Format:\n{"translated":"...","labels":{${tokens.map(t => `"${t.key}":"<übersetztes Wort>"`).join(',')}}}`
+    : `Übersetze diesen Text von ${LANG_LABEL[from] || from} nach ${LANG_LABEL[to] || to}.\n\n` +
+      `Quelltext:\n${tokenized}\n\n` +
+      `Antwort-Format:\n{"translated":"..."}`;
 
   const data = await callClaude({ system, user });
   if (typeof data.translated !== 'string') throw new Error('Ungültiges JSON: translated fehlt');
   const { text: detok } = detokenize(data.translated, tokens, data.labels || {});
-  return detok;
+  // Defensive: tokens invented or renumbered by Claude must never reach the DB
+  return stripLeakedTokens(detok);
 }
 
 // ── Auto-status for words (mirrors the trigger in words.js) ───────────────────
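`translateText` now passes its result through `stripLeakedTokens` from `./placeholders`, whose implementation is not part of this diff. The sketch below shows one plausible shape for such a final pass; the name `stripLeakedTokensSketch` and the decision to keep the label text of a leaked token are assumptions. Any `⟦PHn:label⟧` token that survived detokenization is unwrapped to its label, lone `⟦`/`⟧` characters are dropped, and valid `{{label.type:uuid}}` placeholders are left untouched.

```javascript
// Hypothetical sketch; the real stripLeakedTokens in ./placeholders may behave differently.
function stripLeakedTokensSketch(text) {
  return text
    // Unwrap leftover ⟦PHn:label⟧ tokens to their (trimmed) label text.
    .replace(/⟦PH\d+:([^⟧]*)⟧/g, (_, label) => label.trim())
    // Remove any lone bracket characters that remain.
    .replace(/[⟦⟧]/g, '')
    // Collapse the double spaces a removal can leave behind.
    .replace(/ {2,}/g, ' ')
    .trim();
}

console.log(stripLeakedTokensSketch('The {{dog.w:abc}} sees a ⟦PH7:cat⟧.'));
// The {{dog.w:abc}} sees a cat.
```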
@@ -2,6 +2,8 @@ const router = require('express').Router();
 const bcrypt = require('bcryptjs');
 const jwt = require('jsonwebtoken');
 const { query } = require('../db');
+const { levelForEp, levelInfo } = require('../lib/leveling');
+const { evaluateAchievements, listAchievements } = require('../lib/achievements');
 
 function signToken(user) {
   return jwt.sign(
@@ -138,9 +140,11 @@ router.get('/me', requireJwt, async (req, res, next) => {
              un.username,
              COALESCE(up.total_ep, 0) AS total_ep,
              COALESCE(up.streak_days, 0) AS streak_days,
+             COALESCE(up.daily_goal_ep, 30) AS daily_goal_ep,
              up.last_practice_at,
              ln.id AS language_native_id, ln.short_en AS language_native_short, ln.titel_de AS language_native_titel,
-             lt.id AS language_target_id, lt.short_en AS language_target_short, lt.titel_de AS language_target_titel
+             lt.id AS language_target_id, lt.short_en AS language_target_short, lt.titel_de AS language_target_titel,
+             lt.greeting AS language_target_greeting
        FROM users u
        LEFT JOIN users_public up ON up.user_id = u.id
        LEFT JOIN user_names un ON un.id = up.username_id
@@ -151,7 +155,7 @@ router.get('/me', requireJwt, async (req, res, next) => {
     );
     if (!r.rows.length) return res.status(404).json({ error: 'User not found' });
     const row = r.rows[0];
-    row.level = Math.floor((row.total_ep || 0) / 500);
+    Object.assign(row, levelInfo(row.total_ep)); // level + ep_into_level + ep_to_next_level
     res.json(row);
   } catch (err) { next(err); }
 });
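`/me` now merges `levelInfo(row.total_ep)` into the response instead of computing `Math.floor(total_ep / 500)` inline. `../lib/leveling` is not part of this diff, so the curve it uses is unknown. The hypothetical sketch below only illustrates the three returned fields under the assumption that the old linear rule (one level per 500 EP) still applies; `EP_PER_LEVEL`, `levelForEpSketch` and `levelInfoSketch` are illustrative names.

```javascript
// Hypothetical sketch; assumes the old linear rule of 500 EP per level.
const EP_PER_LEVEL = 500;

// Level reached for a given EP total (null/undefined counts as 0 EP).
const levelForEpSketch = ep => Math.floor((ep || 0) / EP_PER_LEVEL);

// Same field names as the comment in /me: level, ep_into_level, ep_to_next_level.
function levelInfoSketch(totalEp) {
  const ep = totalEp || 0;
  const level = levelForEpSketch(ep);
  const epIntoLevel = ep - level * EP_PER_LEVEL;
  return { level, ep_into_level: epIntoLevel, ep_to_next_level: EP_PER_LEVEL - epIntoLevel };
}

console.log(levelInfoSketch(1234)); // { level: 2, ep_into_level: 234, ep_to_next_level: 266 }
```

If the real module uses a non-linear curve, only the two helper bodies would change; the response shape stays the same.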
@@ -177,27 +181,185 @@ router.post('/progress', requireJwt, async (req, res, next) => {
       [userId, pair_id, isCorrect ? 1 : 0, isCorrect ? 0 : 1, pts]
     );
 
-    // EP + streak on users_public; streak: +1 on a new day, reset after a gap > 1 day
+    // Upsert the daily history (for the streak calendar, weekly graph and daily goal).
+    // RETURNING ep_earned = the NEW daily total, so crossing the daily goal can be detected.
+    const day = await query(
+      `INSERT INTO user_daily_activity (user_id, activity_date, ep_earned, cards_done, correct_count)
+       VALUES ($1, CURRENT_DATE, $2, 1, $3)
+       ON CONFLICT (user_id, activity_date) DO UPDATE SET
+         ep_earned = user_daily_activity.ep_earned + $2,
+         cards_done = user_daily_activity.cards_done + 1,
+         correct_count = user_daily_activity.correct_count + $3
+       RETURNING ep_earned`,
+      [userId, pts, isCorrect ? 1 : 0]
+    );
+
+    // EP + streak on users_public; streak: +1 on a new day, reset after a gap > 1 day.
+    // The CTE captures the pre-update values so level-ups/streak-ups can be detected atomically.
     const upd = await query(
-      `UPDATE users_public SET
-         total_ep = total_ep + $2,
+      `WITH prev AS (
+         SELECT total_ep AS prev_ep, streak_days AS prev_streak
+         FROM users_public WHERE user_id = $1
+       )
+       UPDATE users_public up SET
+         total_ep = up.total_ep + $2,
          streak_days = CASE
-           WHEN last_practice_at IS NULL THEN 1
-           WHEN last_practice_at::date = CURRENT_DATE THEN streak_days
-           WHEN last_practice_at::date = CURRENT_DATE - INTERVAL '1 day' THEN streak_days + 1
+           WHEN up.last_practice_at IS NULL THEN 1
+           WHEN up.last_practice_at::date = CURRENT_DATE THEN up.streak_days
+           WHEN up.last_practice_at::date = CURRENT_DATE - INTERVAL '1 day' THEN up.streak_days + 1
            ELSE 1
          END,
          last_practice_at = NOW()
-       WHERE user_id = $1
-       RETURNING total_ep, streak_days`,
+       FROM prev
+       WHERE up.user_id = $1
+       RETURNING up.total_ep, up.streak_days, up.daily_goal_ep, prev.prev_ep, prev.prev_streak`,
       [userId, pts]
     );
 
     if (!upd.rows.length)
       return res.status(409).json({ error: 'Kein Profil vorhanden. Bitte zuerst Profil anlegen.' });
 
-    const { total_ep, streak_days } = upd.rows[0];
-    res.json({ total_ep, streak_days, level: Math.floor(total_ep / 500) });
+    const r = upd.rows[0];
+    const daily_ep = day.rows[0]?.ep_earned ?? pts;
+    const daily_goal_ep = r.daily_goal_ep || 30;
+
+    // Evaluate achievements (only newly unlocked ones are returned). Errors here must not
+    // fail the request once the progress is booked, so fall back to an empty list.
+    let unlocked_achievements = [];
+    try {
+      unlocked_achievements = await evaluateAchievements(userId, {
+        total_ep: r.total_ep, streak_days: r.streak_days,
+      });
+    } catch (e) { /* achievements are optional; the booking is already stored */ }
+
+    res.json({
+      total_ep: r.total_ep,
+      level: levelForEp(r.total_ep),
+      prev_level: levelForEp(r.prev_ep),
+      streak_days: r.streak_days,
+      streak_increased: r.streak_days > r.prev_streak,
+      daily_ep,
+      daily_goal_ep,
+      // Threshold crossing: reached now, but not yet reached without this card
+      goal_just_reached: daily_ep >= daily_goal_ep && (daily_ep - pts) < daily_goal_ep,
+      unlocked_achievements,
+    });
+  } catch (err) { next(err); }
+});
+
+// GET /auth/achievements: all achievements with their unlock status (for the profile section)
+router.get('/achievements', requireJwt, async (req, res, next) => {
+  try {
+    res.json(await listAchievements(req.user.userId));
+  } catch (err) { next(err); }
+});
+
+// GET /auth/stats: progress data for the profile (history, daily goal, skills)
+router.get('/stats', requireJwt, async (req, res, next) => {
+  try {
+    const userId = req.user.userId;
+
+    // Daily history of the last 84 days, today included (for the heatmap calendar + weekly graph)
+    const daily = await query(
+      `SELECT to_char(activity_date, 'YYYY-MM-DD') AS date, ep_earned AS ep, cards_done AS cards, correct_count AS correct
+       FROM user_daily_activity
+       WHERE user_id = $1 AND activity_date >= CURRENT_DATE - INTERVAL '83 days'
+       ORDER BY activity_date ASC`,
+      [userId]
+    );
+
+    // Today (for the daily-goal ring) + the daily goal from the profile
+    const today = await query(
+      `SELECT COALESCE(da.ep_earned, 0) AS ep, COALESCE(da.cards_done, 0) AS cards,
+              COALESCE(up.daily_goal_ep, 30) AS daily_goal_ep
+       FROM users_public up
+       LEFT JOIN user_daily_activity da
+         ON da.user_id = up.user_id AND da.activity_date = CURRENT_DATE
+       WHERE up.user_id = $1`,
+      [userId]
+    );
+
+    // Overall statistics from user_pair_progress
+    const totals = await query(
+      `SELECT COUNT(*)::int AS pairs_practiced,
+              COALESCE(SUM(seen_count), 0)::int AS total_seen,
+              COALESCE(SUM(correct_count), 0)::int AS total_correct
+       FROM user_pair_progress
+       WHERE user_id = $1`,
+      [userId]
+    );
+
+    // Skills: actual accuracy per answer_type of the pair.
+    // Mapping answer_type → skill label: word/question → Vokabular (vocabulary),
+    // text → Lesen (reading), yes_no → Verständnis (comprehension).
+    const skillRows = await query(
+      `SELECT p.answer_type,
+              COALESCE(SUM(upp.correct_count), 0)::int AS correct,
+              COALESCE(SUM(upp.seen_count), 0)::int AS seen
+       FROM user_pair_progress upp
+       JOIN pairs p ON p.id = upp.pair_id
+       WHERE upp.user_id = $1
+       GROUP BY p.answer_type`,
+      [userId]
+    );
+
+    const SKILL_MAP = { word: 'Vokabular', question: 'Vokabular', text: 'Lesen', yes_no: 'Verständnis' };
+    const skillAcc = {}; // label -> { correct, seen }
+    for (const r of skillRows.rows) {
+      const label = SKILL_MAP[r.answer_type] || 'Sonstige';
+      const acc = (skillAcc[label] ||= { correct: 0, seen: 0 });
+      acc.correct += r.correct;
+      acc.seen += r.seen;
+    }
+    // Fixed order so the radar chart stays stable; value = accuracy (0..1)
+    const skills = ['Vokabular', 'Lesen', 'Verständnis'].map((label) => {
+      const acc = skillAcc[label];
+      return { label, value: acc && acc.seen > 0 ? acc.correct / acc.seen : 0, seen: acc?.seen || 0 };
+    });
+
+    // Points per category (food/animals/jobs …), derived via pair_categories.
+    // A pair with several categories deliberately counts toward each of them.
+    const categoryRows = await query(
+      `SELECT c.id, c.titel_de AS label,
+              COALESCE(SUM(upp.earned_points), 0)::int AS points,
+              COALESCE(SUM(upp.seen_count), 0)::int AS seen
+       FROM user_pair_progress upp
+       JOIN pair_categories pc ON pc.pair_id = upp.pair_id
+       JOIN categories c ON c.id = pc.category_id
+       WHERE upp.user_id = $1
+       GROUP BY c.id, c.titel_de
+       HAVING SUM(upp.earned_points) > 0
+       ORDER BY points DESC`,
+      [userId]
+    );
+
+    const t = totals.rows[0] || { pairs_practiced: 0, total_seen: 0, total_correct: 0 };
+    const td = today.rows[0] || { ep: 0, cards: 0, daily_goal_ep: 30 };
+
+    res.json({
+      daily: daily.rows,
+      today: { ep: td.ep, cards: td.cards, daily_goal_ep: td.daily_goal_ep },
+      totals: {
+        pairs_practiced: t.pairs_practiced,
+        total_seen: t.total_seen,
+        total_correct: t.total_correct,
+        accuracy: t.total_seen > 0 ? t.total_correct / t.total_seen : 0,
+      },
+      skills,
+      categories: categoryRows.rows,
+    });
+  } catch (err) { next(err); }
+});
+
+// PUT /auth/goal: set the daily goal (EP per day), clamped to 5..500
+router.put('/goal', requireJwt, async (req, res, next) => {
+  try {
+    const goal = Math.max(5, Math.min(500, parseInt(req.body?.daily_goal_ep) || 0));
+    const upd = await query(
+      `UPDATE users_public SET daily_goal_ep = $2 WHERE user_id = $1 RETURNING daily_goal_ep`,
+      [req.user.userId, goal]
+    );
+    if (!upd.rows.length) return res.status(409).json({ error: 'Kein Profil vorhanden.' });
+    res.json({ daily_goal_ep: upd.rows[0].daily_goal_ep });
   } catch (err) { next(err); }
 });
 
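`POST /progress` encodes two small rules: the streak `CASE` expression in SQL and the `goal_just_reached` threshold check. The plain-JavaScript sketch below restates both, for example for unit tests or a client-side preview. The function names are hypothetical; dates are compared as UTC `YYYY-MM-DD` strings, whereas the SQL uses the database session's `CURRENT_DATE`, so results can differ around midnight if the time zones differ. The default goal of 30 EP matches the `|| 30` / `COALESCE(..., 30)` fallbacks.

```javascript
// Hypothetical helpers mirroring the SQL/JS rules in POST /progress (not part of the codebase).

// UTC calendar date of a Date as 'YYYY-MM-DD'.
const isoDate = d => d.toISOString().slice(0, 10);

// Streak rule from the CASE expression: first practice → 1, same day → unchanged,
// exactly one day after the last practice → +1, any longer gap → reset to 1.
function nextStreak(lastPracticeAt, streakDays, now = new Date()) {
  if (!lastPracticeAt) return 1;
  const today = isoDate(now);
  const yesterday = isoDate(new Date(now.getTime() - 24 * 60 * 60 * 1000));
  const last = isoDate(lastPracticeAt);
  if (last === today) return streakDays;
  if (last === yesterday) return streakDays + 1;
  return 1;
}

// True only for the card that crosses the goal: reached now, not reached before this card.
function goalJustReached(dailyEp, pts, dailyGoalEp = 30) {
  return dailyEp >= dailyGoalEp && dailyEp - pts < dailyGoalEp;
}

const now = new Date('2024-05-02T10:00:00Z');
console.log(nextStreak(new Date('2024-05-01T20:00:00Z'), 4, now)); // 5
console.log(nextStreak(new Date('2024-04-29T20:00:00Z'), 4, now)); // 1
console.log(goalJustReached(32, 5)); // true  (27 → 32 crosses 30)
console.log(goalJustReached(37, 5)); // false (already at 32 before this card)
```

Because the handler compares `daily_ep - pts` against the goal, the flag fires exactly once per day, on the card that crosses the threshold.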
@@ -1,8 +1,22 @@
 const router = require('express').Router();
 const { query } = require('../db');
+const { runCategorizationTick, classifyWordsSync } = require('../lib/classifyWords');
 
 const STATUSES = ['requested', 'blocked', 'published'];
 
+// POST /api/categories/auto-assign: trigger categorization.
+//   ?sync=true             → immediate one-shot backfill of existing words (synchronous, no wait of up to 24 h for a batch)
+//   ?sync=true&reset=true  → discard existing assignments and reclassify everything
+//   otherwise              → one asynchronous batch tick (submit/collect via the Message Batches API)
+router.post('/auto-assign', async (req, res, next) => {
+  try {
+    const sync = req.query.sync === 'true' || req.body?.sync === true;
+    const reset = req.query.reset === 'true' || req.body?.reset === true;
+    const result = sync ? await classifyWordsSync({ reset }) : await runCategorizationTick();
+    res.json(result);
+  } catch (err) { next(err); }
+});
+
 const STATUS_TIMESTAMP = {
   requested: 'requested_at',
   published: 'published_at',
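The three modes of `POST /api/categories/auto-assign` differ only in their query flags. The minimal sketch below builds, but does not send, the request for each mode. The helper name and base URL are hypothetical, and the authentication header expected by the `auth` middleware is not visible in this diff, so it is left out.

```javascript
// Hypothetical helper: builds the auto-assign request without sending it.
// No flags → one asynchronous batch tick; sync → synchronous backfill;
// reset only has an effect together with sync, as in the route handler.
function buildAutoAssignRequest(baseUrl, { sync = false, reset = false } = {}) {
  const url = new URL('/api/categories/auto-assign', baseUrl);
  if (sync) url.searchParams.set('sync', 'true');
  if (sync && reset) url.searchParams.set('reset', 'true');
  return { url: url.toString(), init: { method: 'POST' } };
}

console.log(buildAutoAssignRequest('http://localhost:3000').url);
// http://localhost:3000/api/categories/auto-assign
console.log(buildAutoAssignRequest('http://localhost:3000', { sync: true, reset: true }).url);
// http://localhost:3000/api/categories/auto-assign?sync=true&reset=true
```

The result can be passed to `fetch(url, init)` once the auth header is added. Both modes call the Anthropic API and therefore cost money; the synchronous mode does so immediately rather than through the discounted batch path.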
@@ -39,12 +39,20 @@ function collectIds(lists, filterType) {
 
 router.get('/', requireJwt, async (req, res, next) => {
   try {
     const lang = ['de', 'en', 'sv'].includes(req.query.lang) ? req.query.lang : 'de';
     const limit = Math.min(parseInt(req.query.limit) || 20, 100);
+    const userId = req.user.userId;
+    // Pairs the client has already loaded (in-session dedupe); accept only well-formed UUIDs,
+    // otherwise the $3::uuid[] cast below fails with a database error.
+    const exclude = String(req.query.exclude || '')
+      .split(',')
+      .map(s => s.trim())
+      .filter(s => /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i.test(s));
 
     // 1. Random pairs — only fully ready content:
     // pair published + linked question/statements published + a published picture exists.
     // (Audio coverage is additionally enforced in Phase 2.)
+    // Pagination: pairs the user has already answered (user_pair_progress) and pairs the
+    // client has already loaded are excluded; an empty response means there are no more cards.
     const pairsRes = await query(
       `SELECT p.id, p.answer_type, p.status, p.difficulty_level,
               p.question_id, p.positive_statement_id, p.negative_statement_id
@@ -61,9 +69,13 @@ router.get('/', requireJwt, async (req, res, next) => {
            JOIN object_pictures pic ON pic.object_id = op.object_id
            JOIN pictures pp ON pp.id = pic.picture_id
            WHERE op.pair_id = p.id AND pp.status = 'published')
+         AND NOT EXISTS (
+           SELECT 1 FROM user_pair_progress upp
+           WHERE upp.pair_id = p.id AND upp.user_id = $2)
+         AND p.id <> ALL($3::uuid[])
        ORDER BY random()
        LIMIT $1`,
-      [limit]
+      [limit, userId, exclude]
     );
     if (!pairsRes.rows.length) return res.json([]);
     const pairs = pairsRes.rows;
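The paginated route relies on a simple contract: the client sends the IDs it has already loaded in this session as a comma-separated `exclude` parameter and stops when it receives an empty array. The sketch below shows both sides. `parseExclude` uses a strict UUID pattern; a looser shape check such as `/^[0-9a-f-]{36}$/i` would also accept a string of 36 hyphens, which the `$3::uuid[]` cast then rejects with a database error. `nextPageUrl` and the path `/api/feed` are hypothetical placeholders, since the route file's name is not visible in this diff.

```javascript
// Server side: keep only well-formed UUIDs from ?exclude=.
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
function parseExclude(raw) {
  return String(raw || '')
    .split(',')
    .map(s => s.trim())
    .filter(s => UUID_RE.test(s));
}

// Client side (hypothetical helper): URL for the next page, excluding cards already
// loaded in this session. An empty array in the response means "no more cards".
function nextPageUrl(baseUrl, loadedIds, { lang = 'de', limit = 20 } = {}) {
  const url = new URL(baseUrl);
  url.searchParams.set('lang', lang);
  url.searchParams.set('limit', String(limit));
  if (loadedIds.length) url.searchParams.set('exclude', loadedIds.join(','));
  return url.toString();
}

const id = '3f2b8c1e-9a4d-4e6f-8b7a-1c2d3e4f5a6b';
console.log(parseExclude(` ${id}, not-a-uuid,${'-'.repeat(36)}`)); // only the valid UUID survives
console.log(nextPageUrl('https://example.test/api/feed', [id], { limit: 10 }));
```

Because the exclusion list grows with every page, very long sessions produce long URLs; the server-side `NOT EXISTS` on `user_pair_progress` already covers answered cards, so the client only needs to send IDs that were loaded but not yet answered.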
@@ -2,6 +2,8 @@ const router = require('express').Router();
 const { query } = require('../db');
 const { fillMissingRow } = require('../lib/translate');
 const { loadPairContext, computeReadiness, loadPairContent, translateWordGroup } = require('../lib/pairContent');
+const { deletePairDeep } = require('../lib/deleteCascade');
+const { derivePairCategories } = require('../lib/pairCategories');
 
 const STATUSES = ['draft', 'reviewed', 'blocked', 'published'];
 const ANSWER_TYPES = new Set(['yes_no', 'text', 'question', 'word']);
@@ -130,6 +132,11 @@ router.patch('/:id', async (req, res, next) => {
       values
     );
     if (!result.rows.length) return res.status(404).json({ error: 'Not found' });
+
+    // On publish, derive the categories from the linked words (best effort).
+    if (req.body.status === 'published')
+      await derivePairCategories(result.rows[0].id).catch(() => {});
+
     res.json(result.rows[0]);
   } catch (err) { next(err); }
 });
@@ -294,15 +301,17 @@ router.post('/:id/publish', async (req, res, next) => {
       `UPDATE pairs SET status='published', published_at=COALESCE(published_at,$2) WHERE id=$1 RETURNING *`,
       [p.id, now]);
 
+    await derivePairCategories(p.id).catch(() => {});
+
     res.json({ ...upd.rows[0], published_languages: [lang] });
   } catch (err) { next(err); }
 });
 
-// DELETE /api/pairs/:id
+// DELETE /api/pairs/:id: pair + its (unreferenced) question/statements + their audios (DB + S3)
 router.delete('/:id', async (req, res, next) => {
   try {
-    const result = await query('DELETE FROM pairs WHERE id = $1 RETURNING id', [req.params.id]);
-    if (!result.rows.length) return res.status(404).json({ error: 'Not found' });
+    const deleted = await deletePairDeep(req.params.id);
+    if (!deleted) return res.status(404).json({ error: 'Not found' });
     res.status(204).end();
   } catch (err) { next(err); }
 });
130
src/routes/picture-jobs.js
Normal file
@@ -0,0 +1,130 @@
const router = require('express').Router();
const { query } = require('../db');

const STATUSES = ['pending', 'generating', 'done', 'failed'];

// GET /api/picture-jobs
router.get('/', async (req, res, next) => {
  try {
    const { status, limit = 50, offset = 0 } = req.query;
    // Fall back to sane values on non-numeric or negative input instead of sending NaN to Postgres
    const params = [
      Math.min(Math.max(parseInt(limit, 10) || 50, 1), 500),
      Math.max(parseInt(offset, 10) || 0, 0),
    ];
    const conditions = [];
    if (status) { conditions.push(`pj.status = $${params.length + 1}`); params.push(status); }
    const where = conditions.length ? `WHERE ${conditions.join(' AND ')}` : '';
    const result = await query(
      `SELECT pj.*,
              COALESCE(json_agg(DISTINCT pjw.word_id) FILTER (WHERE pjw.word_id IS NOT NULL), '[]') AS word_ids
       FROM picture_jobs pj
       LEFT JOIN picture_job_words pjw ON pjw.picture_job_id = pj.id
       ${where}
       GROUP BY pj.id
       ORDER BY pj.created_at DESC
       LIMIT $1 OFFSET $2`,
      params
    );
    res.json(result.rows);
  } catch (err) { next(err); }
});

// GET /api/picture-jobs/:id
router.get('/:id', async (req, res, next) => {
  try {
    const result = await query(
      `SELECT pj.*,
              COALESCE(json_agg(DISTINCT pjw.word_id) FILTER (WHERE pjw.word_id IS NOT NULL), '[]') AS word_ids
       FROM picture_jobs pj
       LEFT JOIN picture_job_words pjw ON pjw.picture_job_id = pj.id
       WHERE pj.id = $1
       GROUP BY pj.id`,
      [req.params.id]
    );
    if (!result.rows.length) return res.status(404).json({ error: 'Not found' });
    res.json(result.rows[0]);
  } catch (err) { next(err); }
});

// GET /api/picture-jobs/:id/words
router.get('/:id/words', async (req, res, next) => {
  try {
    const result = await query(
      `SELECT w.* FROM words w
       JOIN picture_job_words pjw ON pjw.word_id = w.id
|
||||||
|
WHERE pjw.picture_job_id = $1`,
|
||||||
|
[req.params.id]
|
||||||
|
);
|
||||||
|
res.json(result.rows);
|
||||||
|
} catch (err) { next(err); }
|
||||||
|
});
|
||||||
|
|
||||||
|
// POST /api/picture-jobs
|
||||||
|
router.post('/', async (req, res, next) => {
|
||||||
|
try {
|
||||||
|
const { kategorie_id, prompt_fix, prompt_atmosphere, prompt_setting, prompt_final, word_ids } = req.body;
|
||||||
|
const result = await query(
|
||||||
|
`INSERT INTO picture_jobs (kategorie_id, prompt_fix, prompt_atmosphere, prompt_setting, prompt_final)
|
||||||
|
VALUES ($1, $2, $3, $4, $5) RETURNING *`,
|
||||||
|
[kategorie_id || null, prompt_fix || null, prompt_atmosphere || null, prompt_setting || null, prompt_final || null]
|
||||||
|
);
|
||||||
|
const job = result.rows[0];
|
||||||
|
if (Array.isArray(word_ids) && word_ids.length) {
|
||||||
|
for (const wid of word_ids) {
|
||||||
|
await query(
|
||||||
|
`INSERT INTO picture_job_words (picture_job_id, word_id) VALUES ($1, $2) ON CONFLICT DO NOTHING`,
|
||||||
|
[job.id, wid]
|
||||||
|
).catch(() => {});
|
||||||
|
}
|
||||||
|
}
|
||||||
|
res.status(201).json({ ...job, word_ids: word_ids || [] });
|
||||||
|
} catch (err) { next(err); }
|
||||||
|
});
|
||||||
|
|
||||||
|
// PATCH /api/picture-jobs/:id
|
||||||
|
router.patch('/:id', async (req, res, next) => {
|
||||||
|
try {
|
||||||
|
const allowed = ['kategorie_id', 'prompt_fix', 'prompt_atmosphere', 'prompt_setting', 'prompt_final', 'status', 'picture_id'];
|
||||||
|
const fields = Object.keys(req.body).filter(k => allowed.includes(k));
|
||||||
|
if (!fields.length) return res.status(400).json({ error: 'No valid fields provided' });
|
||||||
|
if (req.body.status && !STATUSES.includes(req.body.status))
|
||||||
|
return res.status(400).json({ error: `status must be one of: ${STATUSES.join(', ')}` });
|
||||||
|
const setClauses = fields.map((f, i) => `${f} = $${i + 1}`).join(', ');
|
||||||
|
const result = await query(
|
||||||
|
`UPDATE picture_jobs SET ${setClauses} WHERE id = $${fields.length + 1} RETURNING *`,
|
||||||
|
[...fields.map(f => req.body[f]), req.params.id]
|
||||||
|
);
|
||||||
|
if (!result.rows.length) return res.status(404).json({ error: 'Not found' });
|
||||||
|
res.json(result.rows[0]);
|
||||||
|
} catch (err) { next(err); }
|
||||||
|
});
|
||||||
|
|
||||||
|
// PUT /api/picture-jobs/:id/words/:wordId
|
||||||
|
router.put('/:id/words/:wordId', async (req, res, next) => {
|
||||||
|
try {
|
||||||
|
await query(
|
||||||
|
`INSERT INTO picture_job_words (picture_job_id, word_id) VALUES ($1, $2) ON CONFLICT DO NOTHING`,
|
||||||
|
[req.params.id, req.params.wordId]
|
||||||
|
);
|
||||||
|
res.status(204).end();
|
||||||
|
} catch (err) { next(err); }
|
||||||
|
});
|
||||||
|
|
||||||
|
// DELETE /api/picture-jobs/:id/words/:wordId
|
||||||
|
router.delete('/:id/words/:wordId', async (req, res, next) => {
|
||||||
|
try {
|
||||||
|
await query(
|
||||||
|
`DELETE FROM picture_job_words WHERE picture_job_id = $1 AND word_id = $2`,
|
||||||
|
[req.params.id, req.params.wordId]
|
||||||
|
);
|
||||||
|
res.status(204).end();
|
||||||
|
} catch (err) { next(err); }
|
||||||
|
});
|
||||||
|
|
||||||
|
// DELETE /api/picture-jobs/:id
|
||||||
|
router.delete('/:id', async (req, res, next) => {
|
||||||
|
try {
|
||||||
|
const result = await query('DELETE FROM picture_jobs WHERE id = $1 RETURNING id', [req.params.id]);
|
||||||
|
if (!result.rows.length) return res.status(404).json({ error: 'Not found' });
|
||||||
|
res.status(204).end();
|
||||||
|
} catch (err) { next(err); }
|
||||||
|
});
|
||||||
|
|
||||||
|
module.exports = router;
|
||||||
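`POST /api/picture-jobs` inserts the job and then each word link with a separate `query()` call. It swallows link errors with `.catch(() => {})` and echoes the requested `word_ids` back, even if some links failed. If all-or-nothing behaviour is wanted, the usual pattern is a transaction on a single client. The sketch below shows that pattern under a few assumptions:

- `client` behaves like a node-postgres client from `pool.connect()`, where `query(text, params)` resolves to `{ rows }`. `src/db.js` exports `pool`.
- `createPictureJob` is a hypothetical name.

Errors are deliberately not swallowed here. In PostgreSQL a failed statement aborts the rest of the transaction anyway.

```javascript
// Hypothetical transactional variant of POST /api/picture-jobs (not the
// route's code). `client` is assumed to be a node-postgres client from
// pool.connect(): client.query(text, params) resolves to { rows }.
async function createPictureJob(client, body, wordIds = []) {
  await client.query('BEGIN');
  try {
    const { rows } = await client.query(
      `INSERT INTO picture_jobs (kategorie_id, prompt_fix, prompt_atmosphere, prompt_setting, prompt_final)
       VALUES ($1, $2, $3, $4, $5) RETURNING *`,
      [body.kategorie_id || null, body.prompt_fix || null, body.prompt_atmosphere || null,
       body.prompt_setting || null, body.prompt_final || null]
    );
    const job = rows[0];
    const uniqueIds = [...new Set(wordIds)];
    for (const wid of uniqueIds) {
      // No .catch(() => {}): a bad word_id rolls back the whole job.
      await client.query(
        `INSERT INTO picture_job_words (picture_job_id, word_id) VALUES ($1, $2) ON CONFLICT DO NOTHING`,
        [job.id, wid]
      );
    }
    await client.query('COMMIT');
    return { ...job, word_ids: uniqueIds };
  } catch (err) {
    await client.query('ROLLBACK');
    throw err;
  }
}
```

A caller would check out the client with `pool.connect()` and release it in a `finally` block. That part is omitted here.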
@@ -3,6 +3,7 @@ const multer = require('multer');
 const { v4: uuidv4 } = require('uuid');
 const { query } = require('../db');
 const { uploadFile, deleteFile, keyFromUrl } = require('../s3');
+const { deletePictureObjectsDeep } = require('../lib/deleteCascade');
 
 const upload = multer({ storage: multer.memoryStorage(), limits: { fileSize: 20 * 1024 * 1024 } });
 
@@ -153,12 +154,15 @@ router.patch('/:id', async (req, res, next) => {
   } catch (err) { next(err); }
 });
 
-// DELETE /api/pictures/:id — delete entry + Hetzner file
+// DELETE /api/pictures/:id — delete entry + Hetzner file,
+// including the picture's objects and their pairs (questions/statements/audios cascade)
 router.delete('/:id', async (req, res, next) => {
   try {
     const existing = await query('SELECT picture_link FROM pictures WHERE id = $1', [req.params.id]);
     if (!existing.rows.length) return res.status(404).json({ error: 'Not found' });
+
+    await deletePictureObjectsDeep(req.params.id);
 
     const key = keyFromUrl(existing.rows[0].picture_link);
     if (key) await deleteFile(key).catch(() => {});
 
@@ -3,9 +3,11 @@ const router = require('express').Router();
 const { query } = require('../db');
 const { LANGS } = require('../lib/translate');
 const { loadPairContext, computeReadiness, loadPairContent } = require('../lib/pairContent');
-const { enqueue, loadPairs, collectAudioUnits, generateWithBackoff, translatePair } = require('../lib/pipeline');
+const { enqueue, loadPairs, collectAudioUnits, generateWithBackoff, translatePair, retagObjects } = require('../lib/pipeline');
 const { describeError } = require('./audios');
-const { PLACEHOLDER_RE } = require('../lib/placeholders');
+const { PLACEHOLDER_RE, TOKEN_RE, stripLeakedTokens } = require('../lib/placeholders');
+const { invalidateAudio } = require('../lib/reviewPairs');
+const { derivePairCategories } = require('../lib/pairCategories');
 
 // ── Object-word detection in sentences (for manual assignment during review) ──
 
@@ -241,6 +243,95 @@ router.post('/picture/:id/audio-fill', async (req, res, next) => {
   } catch (err) { next(err); }
 });
 
+// POST /api/pipeline/repair-tokens — data repair: remove leaked ⟦PHn:…⟧ tokens
+// (a Claude hallucination during translation, before the fix) from all sentences.
+// Affected audios are deleted and immediately regenerated from the repaired text.
+router.post('/repair-tokens', async (req, res, next) => {
+  try {
+    const hasToken = v => { TOKEN_RE.lastIndex = 0; return TOKEN_RE.test(v || ''); };
+    const result = { cells_fixed: 0, audios_regenerated: 0, audios_failed: 0, details: [] };
+    const targets = [
+      { table: 'questions', fields: ['sentence'] },
+      { table: 'statements', fields: ['positive_sentence', 'negative_sentence'] },
+      { table: 'words', fields: ['titel'] },
+    ];
+
+    // 1) Clean the text cells + delete and regenerate the associated audios
+    for (const t of targets) {
+      const cols = t.fields.flatMap(f => LANGS.map(l => `${f}_${l}`));
+      const r = await query(
+        `SELECT id, ${cols.join(', ')} FROM ${t.table}
+         WHERE ${cols.map(c => `${c} LIKE '%⟦PH%'`).join(' OR ')}`);
+      for (const row of r.rows) {
+        for (const f of t.fields) {
+          for (const l of LANGS) {
+            const col = `${f}_${l}`;
+            if (!hasToken(row[col])) continue;
+            const fixed = stripLeakedTokens(row[col]).replace(/\s{2,}/g, ' ').trim();
+            await query(`UPDATE ${t.table} SET ${col} = $1 WHERE id = $2`, [fixed, row.id]);
+            await invalidateAudio(t.table, row.id, f, l);
+            result.cells_fixed++;
+            const detail = { table: t.table, id: row.id, column: col, fixed };
+            try {
+              await generateWithBackoff({ text: fixed, language: l, source_table: t.table, source_id: row.id, source_field: f });
+              result.audios_regenerated++;
+            } catch (err) {
+              result.audios_failed++;
+              detail.audio_error = describeError(err);
+            }
+            result.details.push(detail);
+          }
+        }
+      }
+    }
+
+    // 2) Audios whose spoken text still contains tokens (the cell may already have been
+    // fixed elsewhere) → delete them and regenerate from the current cell text
+    const audios = await query(
+      `SELECT id, source_table, source_id, source_field, language FROM audios WHERE text LIKE '%⟦PH%'`);
+    for (const a of audios.rows) {
+      const r = await query(
+        `SELECT ${a.source_field}_${a.language} AS text FROM ${a.source_table} WHERE id = $1`, [a.source_id]);
+      const text = (r.rows[0]?.text || '').trim();
+      await invalidateAudio(a.source_table, a.source_id, a.source_field, a.language);
+      const detail = { table: 'audios', id: a.id, column: `${a.source_field}_${a.language}` };
+      if (text) {
+        try {
+          await generateWithBackoff({ text, language: a.language, source_table: a.source_table, source_id: a.source_id, source_field: a.source_field });
+          result.audios_regenerated++;
+        } catch (err) {
+          result.audios_failed++;
+          detail.audio_error = describeError(err);
+        }
+      }
+      result.details.push(detail);
+    }
+
+    res.json(result);
+  } catch (err) { next(err); }
+});
+
+// POST /api/pipeline/retag-objects — backfill: re-tokenize object words in existing sentences
+// (deterministic + optional hybrid LLM fallback for inflected forms).
+// Body: { picture_id?, dry_run?, use_llm?, cleanup? }. Without picture_id, runs over ALL pictures.
+// cleanup:true ⇒ instead of tagging, wrongly tokenized object words (the object word is only the
+// modifier of another thing, e.g. "Erdbeerfeld", strawberry field) are removed again after an LLM check.
+// Only changes the sentence text fields; audio/alignment stay valid (spoken text unchanged).
+router.post('/retag-objects', async (req, res, next) => {
+  try {
+    const pictureId = req.body?.picture_id || null;
+    const dryRun = !!req.body?.dry_run;
+    const useLLM = !!req.body?.use_llm;
+    const cleanup = !!req.body?.cleanup;
+    if (pictureId) {
+      const pr = await query(`SELECT id FROM pictures WHERE id = $1`, [pictureId]);
+      if (!pr.rows.length) return res.status(404).json({ error: 'Bild nicht gefunden' });
+    }
+    const report = await retagObjects({ pictureId, dryRun, useLLM, cleanup });
+    res.json(report);
+  } catch (err) { next(err); }
+});
+
 // GET /api/pipeline/settings
 router.get('/settings', async (req, res, next) => {
   try {
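`repair-tokens` resets `TOKEN_RE.lastIndex` before every `test()`. With the `/g` flag a regex object is stateful: each `test()` resumes at `lastIndex`. Reusing the regex without a reset can therefore miss every second match.

`TOKEN_RE` and `stripLeakedTokens` live in `src/lib/placeholders.js`, which this diff does not show. The pattern below is an assumed stand-in for tokens like `⟦PH1:…⟧`. `repairCell` is a hypothetical helper that assumes stripping drops the whole token.

```javascript
// Hypothetical stand-in: the real TOKEN_RE and stripLeakedTokens live in
// src/lib/placeholders.js (not in this diff). This pattern assumes tokens of
// the form ⟦PH<digits>:<anything but ⟧>⟧.
const TOKEN_RE = /⟦PH\d+:[^⟧]*⟧/g;

// With the /g flag, test() resumes at lastIndex, so a shared regex must be
// reset before each check; this is what the route's hasToken does.
const hasToken = v => { TOKEN_RE.lastIndex = 0; return TOKEN_RE.test(v || ''); };

// Assumes stripping drops the whole token; collapsing whitespace afterwards
// repairs the double spaces the removal leaves behind.
function repairCell(text) {
  return (text || '').replace(TOKEN_RE, '').replace(/\s{2,}/g, ' ').trim();
}

const cell = 'Das ⟦PH1:Haus⟧ ist rot.';
const naive = /⟦PH\d+:[^⟧]*⟧/g;
console.log(naive.test(cell), naive.test(cell)); // true false (second call starts after the match)
console.log(hasToken(cell), hasToken(cell));     // true true
console.log(repairCell(cell));                   // Das ist rot.
```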
@@ -316,6 +407,9 @@ router.post('/picture/:id/publish', async (req, res, next) => {
     await query(`UPDATE pairs SET status='published', published_at=COALESCE(published_at,$2)
                  WHERE id = ANY($1)`, [pairIds, now]);
 
+    // Derive the categories of the published pairs from their words (best effort).
+    await derivePairCategories(pairIds).catch(() => {});
+
     // Linked words: only 'generated' → 'published' ('translated' stays for image generation
     // in the ServerMonitor flow; 'published' would skip that step)
     let publishedWords = 0;
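This commit calls `derivePairCategories` with a single id (`p.id`, `result.rows[0].id`) and with an array (`pairIds`). Each call is wrapped in `.catch(() => {})`. The function body is not shown.

One way to accept both call styles is to normalize to a de-duplicated array, which suits a `= ANY($1)` parameter. A wrapper can keep the best-effort semantics while logging instead of discarding the error. `toIdArray` and `deriveBestEffort` are hypothetical names.

```javascript
// Hypothetical helpers: derivePairCategories (src/lib/pairCategories.js) is
// called with one id and with an array of ids, but its body is not shown.
function toIdArray(idOrIds) {
  const list = Array.isArray(idOrIds) ? idOrIds : [idOrIds];
  // Drop empty entries and duplicates before binding as `= ANY($1)`.
  return [...new Set(list.filter(id => id !== null && id !== undefined))];
}

// Best-effort wrapper: like `.catch(() => {})`, a failure never breaks the
// publish request, but it is logged instead of silently discarded.
async function deriveBestEffort(derive, idOrIds, log = console.warn) {
  const ids = toIdArray(idOrIds);
  if (!ids.length) return;
  try {
    await derive(ids);
  } catch (err) {
    log(`category derivation failed for ${ids.length} pair(s): ${err.message}`);
  }
}

console.log(toIdArray(42), toIdArray([3, 3, null, 4])); // [ 42 ] [ 3, 4 ]
```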
src/routes/prompt-styles.js (Normal file, 74 lines added)
@@ -0,0 +1,74 @@
+const router = require('express').Router();
+const { query } = require('../db');
+
+const TYPES = ['fix', 'atmosphere', 'setting'];
+
+// GET /api/prompt-styles
+router.get('/', async (req, res, next) => {
+  try {
+    const { type, limit = 100, offset = 0 } = req.query;
+    const params = [Math.min(parseInt(limit), 500), parseInt(offset)];
+    const conditions = [];
+    if (type) { conditions.push(`type = $${params.length + 1}`); params.push(type); }
+    const where = conditions.length ? `WHERE ${conditions.join(' AND ')}` : '';
+    const result = await query(
+      `SELECT * FROM prompt_styles ${where} ORDER BY type, id LIMIT $1 OFFSET $2`,
+      params
+    );
+    res.json(result.rows);
+  } catch (err) { next(err); }
+});
+
+// GET /api/prompt-styles/:id
+router.get('/:id', async (req, res, next) => {
+  try {
+    const result = await query('SELECT * FROM prompt_styles WHERE id = $1', [req.params.id]);
+    if (!result.rows.length) return res.status(404).json({ error: 'Not found' });
+    res.json(result.rows[0]);
+  } catch (err) { next(err); }
+});
+
+// POST /api/prompt-styles
+router.post('/', async (req, res, next) => {
+  try {
+    const { type, kategorie_id, text_en } = req.body;
+    if (!type || !TYPES.includes(type))
+      return res.status(400).json({ error: `type must be one of: ${TYPES.join(', ')}` });
+    if (!text_en)
+      return res.status(400).json({ error: 'text_en is required' });
+    const result = await query(
+      `INSERT INTO prompt_styles (type, kategorie_id, text_en) VALUES ($1, $2, $3) RETURNING *`,
+      [type, kategorie_id || null, text_en]
+    );
+    res.status(201).json(result.rows[0]);
+  } catch (err) { next(err); }
+});
+
+// PATCH /api/prompt-styles/:id
+router.patch('/:id', async (req, res, next) => {
+  try {
+    const allowed = ['type', 'kategorie_id', 'text_en'];
+    const fields = Object.keys(req.body).filter(k => allowed.includes(k));
+    if (!fields.length) return res.status(400).json({ error: 'No valid fields provided' });
+    if (req.body.type && !TYPES.includes(req.body.type))
+      return res.status(400).json({ error: `type must be one of: ${TYPES.join(', ')}` });
+    const setClauses = fields.map((f, i) => `${f} = $${i + 1}`).join(', ');
+    const result = await query(
+      `UPDATE prompt_styles SET ${setClauses} WHERE id = $${fields.length + 1} RETURNING *`,
+      [...fields.map(f => req.body[f]), req.params.id]
+    );
+    if (!result.rows.length) return res.status(404).json({ error: 'Not found' });
+    res.json(result.rows[0]);
+  } catch (err) { next(err); }
+});
+
+// DELETE /api/prompt-styles/:id
+router.delete('/:id', async (req, res, next) => {
+  try {
+    const result = await query('DELETE FROM prompt_styles WHERE id = $1 RETURNING id', [req.params.id]);
+    if (!result.rows.length) return res.status(404).json({ error: 'Not found' });
+    res.status(204).end();
+  } catch (err) { next(err); }
+});
+
+module.exports = router;
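The list routes in this diff parse paging the same way: `Math.min(parseInt(limit), 500)` and `parseInt(offset)`. They then append filters as `$3`, `$4`, and so on. A non-numeric `?limit=abc` yields `NaN`, and a negative number passes straight through. PostgreSQL should reject both as LIMIT/OFFSET values, which would surface as a 500.

The hypothetical helpers below keep the same numbering scheme but fall back to defaults. `parsePaging`, `buildEqualityFilters` and the `defaultLimit` option are illustrative names, not part of the codebase.

```javascript
// Hypothetical helper: same clamp as the routes (max 500), but NaN or negative
// input falls back to the defaults instead of reaching Postgres.
function parsePaging(query, { defaultLimit = 50, maxLimit = 500 } = {}) {
  const limit = Number.parseInt(query.limit, 10);
  const offset = Number.parseInt(query.offset, 10);
  return [
    Number.isNaN(limit) || limit < 0 ? defaultLimit : Math.min(limit, maxLimit),
    Number.isNaN(offset) || offset < 0 ? 0 : offset,
  ];
}

// Appends equality filters as $3, $4, … after LIMIT ($1) and OFFSET ($2),
// mirroring the routes' numbering. Column names must come from code, never
// from the request; only the values are bound as parameters.
function buildEqualityFilters(params, filters) {
  const conditions = [];
  for (const [column, value] of Object.entries(filters)) {
    if (value === undefined || value === '') continue; // filter not requested
    params.push(value);
    conditions.push(`${column} = $${params.length}`);
  }
  return conditions.length ? `WHERE ${conditions.join(' AND ')}` : '';
}

const params = parsePaging({ limit: '9999', offset: 'abc' });
const where = buildEqualityFilters(params, { type: 'fix', kategorie_id: undefined });
console.log(where, params); // WHERE type = $3 [ 500, 0, 'fix' ]
```

`prompt-styles` defaults to 100 rows rather than 50, which `parsePaging(req.query, { defaultLimit: 100 })` would cover.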
src/routes/wordGenerative.js (Normal file, 69 lines added)
@@ -0,0 +1,69 @@
+const router = require('express').Router();
+const { query } = require('../db');
+
+const STATUSES = ['pending', 'generating', 'generated', 'accepted', 'rejected'];
+
+// GET /api/word-generative
+router.get('/', async (req, res, next) => {
+  try {
+    const { status, word_id, limit = 50, offset = 0 } = req.query;
+    const params = [Math.min(parseInt(limit), 500), parseInt(offset)];
+    const conditions = [];
+    if (status) { conditions.push(`status = $${params.length + 1}`); params.push(status); }
+    if (word_id) { conditions.push(`word_id = $${params.length + 1}`); params.push(word_id); }
+    const where = conditions.length ? `WHERE ${conditions.join(' AND ')}` : '';
+    const result = await query(
+      `SELECT * FROM word_generative ${where} ORDER BY created_at DESC LIMIT $1 OFFSET $2`,
+      params
+    );
+    res.json(result.rows);
+  } catch (err) { next(err); }
+});
+
+// POST /api/word-generative
+router.post('/', async (req, res, next) => {
+  try {
+    const { word_id, prompt, status } = req.body;
+    if (!word_id) return res.status(400).json({ error: 'word_id ist erforderlich' });
+    if (status && !STATUSES.includes(status))
+      return res.status(400).json({ error: `status muss eines sein von: ${STATUSES.join(', ')}` });
+    const result = await query(
+      `INSERT INTO word_generative (word_id, prompt, status)
+       VALUES ($1, $2, $3) RETURNING *`,
+      [word_id, prompt || null, status || 'pending']
+    );
+    res.status(201).json(result.rows[0]);
+  } catch (err) { next(err); }
+});
+
+// PATCH /api/word-generative/:id
+router.patch('/:id', async (req, res, next) => {
+  try {
+    const allowed = ['prompt', 'status', 'picture_link'];
+    const fields = Object.keys(req.body).filter(k => allowed.includes(k));
+    if (!fields.length) return res.status(400).json({ error: 'Keine gültigen Felder angegeben' });
+    if (req.body.status && !STATUSES.includes(req.body.status))
+      return res.status(400).json({ error: `status muss eines sein von: ${STATUSES.join(', ')}` });
+    const setClauses = fields.map((f, i) => `${f} = $${i + 1}`).join(', ');
+    const values = [...fields.map(f => req.body[f]), req.params.id];
+    const result = await query(
+      `UPDATE word_generative SET ${setClauses} WHERE id = $${fields.length + 1} RETURNING *`,
+      values
+    );
+    if (!result.rows.length) return res.status(404).json({ error: 'Not found' });
+    res.json(result.rows[0]);
+  } catch (err) { next(err); }
+});
+
+// DELETE /api/word-generative/:id
+router.delete('/:id', async (req, res, next) => {
+  try {
+    const result = await query(
+      `DELETE FROM word_generative WHERE id = $1 RETURNING id`, [req.params.id]
+    );
+    if (!result.rows.length) return res.status(404).json({ error: 'Not found' });
+    res.status(204).end();
+  } catch (err) { next(err); }
+});
+
+module.exports = router;
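The PATCH handlers for picture jobs, prompt styles, word-generative entries and words share one pattern:

- Only keys in the `allowed` array are interpolated into the SQL as column names.
- The values are bound as `$1…$n`.
- The row id is bound last.

The whitelist is what keeps the dynamic `SET` clause safe. The hypothetical `buildPatch` below factors that pattern out. It is not part of the codebase, and the per-route status/type validation would stay separate.

```javascript
// Hypothetical helper mirroring the PATCH routes: only whitelisted keys become
// columns, values are bound as $1…$n, and the row id is bound last.
function buildPatch(table, allowed, body, id) {
  const fields = Object.keys(body).filter(k => allowed.includes(k));
  if (!fields.length) return null; // the routes answer 400 in this case
  const setClauses = fields.map((f, i) => `${f} = $${i + 1}`).join(', ');
  return {
    text: `UPDATE ${table} SET ${setClauses} WHERE id = $${fields.length + 1} RETURNING *`,
    values: [...fields.map(f => body[f]), id],
  };
}

// Unknown keys, including ones that look like SQL, are simply ignored.
const patch = buildPatch('word_generative', ['prompt', 'status', 'picture_link'],
  { status: 'accepted', id: 99, 'status; DROP TABLE x': 1 }, 12);
console.log(patch.text);   // UPDATE word_generative SET status = $1 WHERE id = $2 RETURNING *
console.log(patch.values); // [ 'accepted', 12 ]
```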
@@ -1,5 +1,6 @@
 const router = require('express').Router();
 const { query } = require('../db');
+const { runEnrichTick, enrichWordsSync } = require('../lib/enrichWords');
 
 const STATUSES = ['requested', 'translated', 'generated', 'blocked', 'published'];
 
@@ -9,14 +10,32 @@ const STATUS_TIMESTAMP = {
   blocked: 'blocked_at',
 };
 
+// POST /api/words/enrich-batch — manual trigger for word enrichment
+router.post('/enrich-batch', async (req, res, next) => {
+  try {
+    const sync = req.query.sync === 'true';
+    if (sync) {
+      const max = parseInt(req.query.max) || 500;
+      return res.json(await enrichWordsSync({ max }));
+    }
+    res.json(await runEnrichTick());
+  } catch (err) { next(err); }
+});
+
 // GET /api/words
 router.get('/', async (req, res, next) => {
   try {
-    const { status, titel_de, search, limit = 50, offset = 0 } = req.query;
+    const { status, titel_de, search, dom_pos, level, themenfeld_id, has_conc_m,
+            limit = 50, offset = 0 } = req.query;
     const params = [Math.min(parseInt(limit), 500), parseInt(offset)];
     const conditions = [];
     if (status) { conditions.push(`w.status = $${params.length + 1}`); params.push(status); }
     if (titel_de) { conditions.push(`lower(w.titel_de) = lower($${params.length + 1})`); params.push(titel_de); }
+    if (dom_pos) { conditions.push(`w.dom_pos = $${params.length + 1}`); params.push(dom_pos); }
+    if (level) { conditions.push(`w.level = $${params.length + 1}`); params.push(level); }
+    if (themenfeld_id) { conditions.push(`w.themenfeld_id = $${params.length + 1}`); params.push(themenfeld_id); }
+    if (has_conc_m === 'true') conditions.push(`w.conc_m IS NOT NULL`);
+    if (has_conc_m === 'false') conditions.push(`w.conc_m IS NULL`);
     if (search) {
       const p = `%${search.toLowerCase()}%`;
       conditions.push(`(lower(w.titel_de) LIKE $${params.length + 1} OR lower(w.titel_en) LIKE $${params.length + 1} OR lower(w.titel_sv) LIKE $${params.length + 1})`);
@@ -26,12 +45,14 @@ router.get('/', async (req, res, next) => {
     const result = await query(
       `SELECT w.*,
               COALESCE(json_agg(DISTINCT p.id) FILTER (WHERE p.id IS NOT NULL), '[]') AS picture_ids,
-              COALESCE(json_agg(DISTINCT c.id) FILTER (WHERE c.id IS NOT NULL), '[]') AS category_ids
+              COALESCE(json_agg(DISTINCT c.id) FILTER (WHERE c.id IS NOT NULL), '[]') AS category_ids,
+              COUNT(DISTINCT wp2.picture_id)::int AS picture_count
        FROM words w
        LEFT JOIN word_pictures wp ON wp.word_id = w.id
        LEFT JOIN pictures p ON p.id = wp.picture_id
        LEFT JOIN word_categories wc ON wc.word_id = w.id
        LEFT JOIN categories c ON c.id = wc.category_id
+       LEFT JOIN word_pictures wp2 ON wp2.word_id = w.id
        ${where}
        GROUP BY w.id
        ORDER BY w.created_at DESC
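The new `picture_count` column comes from a second join, `word_pictures wp2`, on the same condition (`word_id = w.id`) as the existing `wp` join. Every extra picture link therefore multiplies the rows per word once more, and the `DISTINCT` inside the aggregates has to absorb the duplicates.

The toy model below uses made-up data, with plain arrays standing in for tables; `fanOut` and `countDistinct` are hypothetical names. It counts joined rows with and without the second join. The `pictures` join is left out because it is 1:1 with `wp` and does not multiply rows. Under these assumptions, `COUNT(DISTINCT wp.picture_id)` would yield the same count without the extra join.

```javascript
// Toy model of the LEFT JOIN fan-out in GET /api/words (made-up data).
// Word 1 has 2 picture links and 3 category links; word 2 has none.
const wordPictures = [{ word_id: 1, picture_id: 'a' }, { word_id: 1, picture_id: 'b' }];
const wordCategories = [1, 2, 3].map(i => ({ word_id: 1, category_id: 10 + i }));

const orNull = list => (list.length ? list : [null]); // LEFT JOIN keeps one NULL row

function fanOut(wordId, { secondPictureJoin }) {
  const wp = orNull(wordPictures.filter(r => r.word_id === wordId).map(r => r.picture_id));
  const wc = orNull(wordCategories.filter(r => r.word_id === wordId).map(r => r.category_id));
  const wp2 = secondPictureJoin ? wp : [undefined]; // same join condition as wp
  const rows = [];
  for (const pic of wp) for (const cat of wc) for (const pic2 of wp2) rows.push({ pic, cat, pic2 });
  return rows;
}

// COUNT(DISTINCT col) ignores NULLs.
const countDistinct = (rows, key) => new Set(rows.map(r => r[key]).filter(v => v != null)).size;

const withWp2 = fanOut(1, { secondPictureJoin: true });
const withoutWp2 = fanOut(1, { secondPictureJoin: false });
console.log(withWp2.length, withoutWp2.length);                              // 12 6
console.log(countDistinct(withWp2, 'pic2'), countDistinct(withoutWp2, 'pic')); // 2 2
```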
@@ -50,18 +71,32 @@ function autoTranslatedStatus(row) {
 // POST /api/words
 router.post('/', async (req, res, next) => {
   try {
-    const { titel_de, titel_en, titel_sv, difficulty_level, status } = req.body;
+    const { titel_de, titel_en, titel_sv, difficulty_level, status, conc_m } = req.body;
     if (status && !STATUSES.includes(status))
       return res.status(400).json({ error: `status must be one of: ${STATUSES.join(', ')}` });
     // Auto: all 3 languages supplied directly + no explicit status → 'translated'
     const allLangs = titel_de && titel_en && titel_sv;
     const effectiveStatus = status || (allLangs ? 'translated' : 'requested');
-    const result = await query(
-      `INSERT INTO words (titel_de, titel_en, titel_sv, difficulty_level, status, requested_at)
-       VALUES ($1, $2, $3, $4, $5, NOW()) RETURNING *`,
-      [titel_de || null, titel_en || null, titel_sv || null, difficulty_level || null, effectiveStatus]
-    );
-    res.status(201).json({ ...result.rows[0], picture_ids: [], category_ids: [] });
+    // Upsert: insert a new row, or on a duplicate titel_en only update conc_m
+    let result = await query(
+      `INSERT INTO words (titel_de, titel_en, titel_sv, difficulty_level, status, conc_m, requested_at)
+       VALUES ($1, $2, $3, $4, $5, $6, NOW()) RETURNING *, true AS is_insert`,
+      [titel_de || null, titel_en || null, titel_sv || null,
+       difficulty_level || null, effectiveStatus, conc_m ?? null]
+    ).catch(async err => {
+      if (err.code === '23505' && titel_en) {
+        // Duplicate titel_en → update conc_m and return the existing row
+        const upd = await query(
+          `UPDATE words SET conc_m = $1 WHERE titel_en = $2 RETURNING *, false AS is_insert`,
+          [conc_m ?? null, titel_en]
+        );
+        return upd;
+      }
+      throw err;
+    });
+    const row = result.rows[0];
+    const { is_insert, ...word } = row;
+    res.status(is_insert ? 201 : 200).json({ ...word, picture_ids: [], category_ids: [] });
   } catch (err) { next(err); }
 });
 
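`POST /api/words` now makes two decisions. The first picks the status: an explicit status wins, otherwise all three titles give `'translated'` and anything less gives `'requested'`. The second picks the response: the SQL tags each returned row with `is_insert`, which selects 201 or 200 and is stripped from the body.

The functions below restate that logic in isolation; their names and the 409 guard are hypothetical. Two behaviours of the route as written may be worth a look:

- The duplicate branch binds `conc_m ?? null`, so re-posting an existing `titel_en` without `conc_m` clears the stored value. `SET conc_m = COALESCE($1, conc_m)` would keep it.
- The diff does not show which unique constraints exist. If a `23505` came from a constraint other than the one on `titel_en`, the UPDATE could match no row. `result.rows[0]` would then be `undefined`, and destructuring it would throw into `next(err)`.

```javascript
// Hypothetical helpers extracted from the route for illustration.
function effectiveStatus({ titel_de, titel_en, titel_sv, status }) {
  const allLangs = titel_de && titel_en && titel_sv;
  return status || (allLangs ? 'translated' : 'requested');
}

function toResponse(row) {
  // Hypothetical guard: the duplicate branch may update zero rows.
  if (!row) return { status: 409, body: { error: 'Conflict' } };
  const { is_insert, ...word } = row; // is_insert only selects the status code
  return { status: is_insert ? 201 : 200, body: { ...word, picture_ids: [], category_ids: [] } };
}

console.log(effectiveStatus({ titel_de: 'Hund', titel_en: 'dog', titel_sv: 'hund' })); // translated
console.log(toResponse({ id: 7, titel_en: 'dog', conc_m: 4.2, is_insert: false }).status); // 200
```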
@@ -69,7 +104,8 @@ router.post('/', async (req, res, next) => {
 router.patch('/:id', async (req, res, next) => {
   try {
     const allowed = ['titel_de', 'titel_en', 'titel_sv', 'status',
-                     'difficulty_level', 'requested_at', 'published_at', 'blocked_at'];
+                     'difficulty_level', 'requested_at', 'published_at', 'blocked_at',
+                     'conc_m', 'dom_pos', 'level', 'themenfeld_id'];
     const fields = Object.keys(req.body).filter(k => allowed.includes(k));
     if (!fields.length) return res.status(400).json({ error: 'No valid fields provided' });
 
@@ -117,12 +153,14 @@ router.get('/:id', async (req, res, next) => {
     const result = await query(
       `SELECT w.*,
               COALESCE(json_agg(DISTINCT p.id) FILTER (WHERE p.id IS NOT NULL), '[]') AS picture_ids,
-              COALESCE(json_agg(DISTINCT c.id) FILTER (WHERE c.id IS NOT NULL), '[]') AS category_ids
+              COALESCE(json_agg(DISTINCT c.id) FILTER (WHERE c.id IS NOT NULL), '[]') AS category_ids,
+              COUNT(DISTINCT wp2.picture_id)::int AS picture_count
        FROM words w
        LEFT JOIN word_pictures wp ON wp.word_id = w.id
        LEFT JOIN pictures p ON p.id = wp.picture_id
        LEFT JOIN word_categories wc ON wc.word_id = w.id
        LEFT JOIN categories c ON c.id = wc.category_id
+       LEFT JOIN word_pictures wp2 ON wp2.word_id = w.id
        WHERE w.id = $1
        GROUP BY w.id`,
       [req.params.id]