# Changelog
All notable changes to the Skraak MCP Server are documented here.
## [2026-03-07] JSON Schema for AviaNZ .data Files
**New feature:** Added JSON Schema (Draft 2020-12) for validating AviaNZ .data annotation files.
**Added:**
- `db/avianz_data_schema.json` — Comprehensive schema for .data file format
**Schema coverage:**
- Root array with metadata object first, then segment arrays
- Meta object with `Operator`, `Reviewer`, `Duration` (optional, allows extra fields)
- Segment array: 5-element tuple `[starttime, endtime, freq_low, freq_high, labels]`
- Label object with required `species` and `certainty` (0-100)
- Optional fields: `filter`, `calltype`, `comment` (max 140 chars)
- Additional properties allowed on all objects (extensibility)
- Pattern constraint: `species` must not contain `>` separator
**Validation tests:**
- Missing required fields caught
- Certainty range (0-100) enforced
- Comment length (max 140) enforced
- Minimal valid files accepted
## [2026-03-07] Comment Feature in Classify TUI
**New feature:** Press spacebar in the classify TUI to add/edit comments on labels.
**Changes:**
- `utils/data_file.go` — Added `Comment` field to `Label` struct, parse/write handling
- `tools/calls_classify.go` — Added `SetComment()` and `GetCurrentComment()` methods, `Comment` field in `BindingResult`
- `tui/classify.go` — Added `commentMode`/`commentText` state, spacebar opens dialog, text input handling, dialog rendering
**AviaNZ spec compliance:** The spec allows "any additional attributes defined for this call" as key-value pairs. Comments are stored as `"comment": "text"` in the label object.
**Usage:**
- `[space]` — Open comment dialog (pre-fills existing comment)
- Type comment (max 140 chars, ASCII only)
- `[enter]` — Save comment
- `[esc]` — Cancel (discard changes)
- `[backspace]` — Delete last character
- `[ctrl+u]` — Clear all
**Help text:** `[esc]quit [,]prev [.]next [space]comment [enter]play [shift+enter]½speed`
## [2026-03-04] Half-Speed Audio Playback in Classify TUI
**New feature:** Press Shift+Enter in the classify TUI to play audio at half speed.
**Changes:**
- `utils/resample.go` — **NEW** Linear interpolation resampling for speed changes
- `utils/audio_player.go` — Added `PlayAtSpeed(samples, sampleRate, speed)` method
- `tools/calls_classify.go` — Added `PlaybackSpeed` field to `ClassifyState`
- `tui/classify.go` — Detect Shift+Enter modifier, display "▶ Playing 0.5x..." in status
- `tui/classify.go` — Changed quit key from `q` to `Escape` (frees `q` for bindings)
**Usage:** `[esc]quit [enter]play [shift+enter]½speed`
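A minimal sketch of linear-interpolation resampling, assuming the same idea as `utils/resample.go` (the real signature may differ): stretching the sample buffer by `1/speed` so that playback at the original sample rate sounds slower.

```go
package main

import "fmt"

// resampleLinear stretches samples by 1/speed using linear interpolation:
// speed 0.5 doubles the sample count, so playing the result at the
// original sample rate sounds half as fast.
func resampleLinear(samples []float64, speed float64) []float64 {
	if len(samples) == 0 || speed <= 0 {
		return nil
	}
	out := make([]float64, int(float64(len(samples))/speed))
	for i := range out {
		pos := float64(i) * speed
		j := int(pos)
		if j >= len(samples)-1 {
			out[i] = samples[len(samples)-1] // clamp at the final sample
			continue
		}
		frac := pos - float64(j)
		out[i] = samples[j]*(1-frac) + samples[j+1]*frac
	}
	return out
}

func main() {
	out := resampleLinear([]float64{0, 1, 2, 3}, 0.5)
	fmt.Println(len(out), out[1]) // 8 0.5
}
```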
## [2026-03-04] Performance Optimizations for calls-from-preds
**Problem:** Processing 7617 WAV files took 16 minutes due to excessive I/O and sequential processing.
**Changes:**
- `utils/wav_metadata.go` — Added `ParseWAVHeaderMinimal()` that reads only 4KB instead of 200KB per file (50× less I/O). Added separate buffer pool for minimal headers.
- `tools/calls_from_preds.go` — Added parallel processing with 8 workers for .data file generation. Small batches (<10 files) use sequential processing to avoid goroutine overhead.
- `tools/calls_from_preds.go` — Added `ProgressHandler` callback type for progress reporting during long operations.
- `cmd/calls.go` — Added progress indicator showing "Processing WAV files: X/Y (Z%)" during .data file writing.
**Expected improvement:** ~8× faster on multi-core systems due to parallel processing + reduced I/O overhead.
## [2026-03-04] Add iTerm2 Inline Image Protocol Support
**New feature:** Added `--iterm` flag for terminals supporting the iTerm2 Inline Image Protocol (WezTerm, iTerm2, VS Code terminal).
- `utils/terminal_image.go` — Added `ProtocolITerm` enum value and `WriteITermImage()` using charm's `x/ansi/iterm2` package; PNG-encodes then base64-encodes for the iTerm2 escape sequence
- `tools/calls_show_images.go` — Added `ITerm` field to `CallsShowImagesInput`, checked before `Sixel` in protocol selection
- `tools/calls_classify.go` — Added `ITerm` field to `ClassifyConfig`
- `cmd/calls.go` — Added `--iterm` flag to `show-images` subcommand
- `cmd/calls_classify.go` — Added `--iterm` flag to `classify` subcommand
- `tui/classify.go` — Renamed `sixelImageCmd` to `inlineImageCmd` with protocol parameter; changed conditionals from `== ProtocolSixel` to `!= ProtocolKitty` so both sixel and iTerm2 use the same inline rendering path
- `utils/terminal_image_test.go` — Tests for `WriteITermImage`, `WriteImage` routing, and `ClearImages` no-op
## [2026-02-28] Fix Kitty Image Rendering at 448px in Classify TUI
**Bug fix:** After upgrading the spectrogram display from 224x224 to 448x448 pixels, old image artifacts persisted between segment navigations at the larger size.
- `utils/kitty_image.go` — Chunked Kitty protocol transmission (4096-byte chunks) per spec; small images still sent as single payload
- `tui/classify.go` — Return `tea.ClearScreen` on navigation keys (`,`, `.`, bindings) to force full redraw and reliable image clearing
- `tui/classify.go` — `ResizeImage` call updated from 224x224 to 448x448
- `utils/kitty_image_test.go` — Tests for single-chunk, multi-chunk, and clear behavior
## [2026-02-28] Audio Playback in Classify TUI
**New feature:** Press Enter to play the current segment's audio during classification.
- Added `utils/audio_player.go` — wraps ebitengine/oto v3 for PCM playback
- Oto context created lazily on first play, reused across segments
- Converts `[]float64` samples → signed int16 LE for oto
- Playback stops automatically on navigation (`,`/`.`), binding keys, and quit
- "▶ Playing..." indicator shown in segment info line
- New dependency: `github.com/ebitengine/oto/v3` (requires `libasound2-dev` on Linux)
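The sample conversion is the standard normalized-float-to-PCM mapping; a sketch (the real code lives in `utils/audio_player.go`):

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// floatToPCM16LE converts normalized [-1,1] float64 samples to the
// little-endian signed 16-bit PCM bytes oto expects, clamping overshoot.
func floatToPCM16LE(samples []float64) []byte {
	out := make([]byte, 2*len(samples))
	for i, s := range samples {
		if s > 1 {
			s = 1
		} else if s < -1 {
			s = -1
		}
		v := int16(s * 32767)
		binary.LittleEndian.PutUint16(out[2*i:], uint16(v))
	}
	return out
}

func main() {
	b := floatToPCM16LE([]float64{0, 1, -1})
	fmt.Println(b) // [0 0 255 127 1 128]
}
```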
## [2026-02-22] New CLI Command: calls-from-preds
**New feature:** Extract clustered bird calls from ML predictions CSV files.
**Usage:**
```bash
./skraak calls-from-preds --csv predictions.csv > calls.json
```
**How it works:**
1. Reads prediction CSV (`file`, `start_time`, `end_time` columns, plus per-species ebird_code columns holding 1/0 values)
2. Auto-detects clip duration from first row
3. Groups detections by (file, ebird_code) and sorts by start_time
4. Clusters consecutive detections where gap ≤ 3 × clip_duration
5. Filters out single detections (configurable via constant)
**Constants (easily changeable):**
```go
CLUSTER_GAP_MULTIPLIER     = 3 // Gap threshold = 3 × clip_duration
MIN_DETECTIONS_PER_CLUSTER = 1 // Filter single detections
```
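The clustering step (4 above) can be sketched on sorted start times. This is a simplified stand-in for the logic in `tools/calls_from_preds.go`, operating on one (file, ebird_code) group at a time:

```go
package main

import "fmt"

// clusterStarts groups sorted detection start times into clusters,
// opening a new cluster whenever the gap between consecutive detections
// exceeds gapMultiplier × clipDur.
func clusterStarts(starts []float64, clipDur, gapMultiplier float64) [][]float64 {
	var clusters [][]float64
	for i, s := range starts {
		if i == 0 || s-starts[i-1] > gapMultiplier*clipDur {
			clusters = append(clusters, []float64{s})
		} else {
			last := len(clusters) - 1
			clusters[last] = append(clusters[last], s)
		}
	}
	return clusters
}

func main() {
	// clip duration 4s, gap threshold 12s: 0, 4, 8 cluster together; 60 starts anew.
	clusters := clusterStarts([]float64{0, 4, 8, 60}, 4, 3)
	fmt.Println(len(clusters), len(clusters[0])) // 2 3
}
```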
**Performance:** 400k+ rows processed in ~0.67 seconds
**Output example:**
```json
{
  "calls": [
    {"file": "path.WAV", "start_time": 0, "end_time": 32, "ebird_code": "tomtit1", "detections": 11}
  ],
  "total_calls": 62593,
  "species_count": {"tomtit1": 12636, ...},
  "files_count": 14017
}
```
**Files:**
- `tools/calls_from_preds.go` — Core clustering logic
- `cmd/calls_from_preds.go` — CLI handler
---
## [2026-02-21] Remove import_audio_file MCP Tool
**Breaking change:** Removed `import_audio_file` MCP tool. Use CLI command `skraak import file` for single file imports.
**Rationale:** The MCP tool was redundant since:
1. Single file imports are better suited for CLI use (requires file path on local machine)
2. `import_audio_files` handles batch imports efficiently via MCP
3. Reduces MCP tool count from 11 to 10
**Changes:**
- **`cmd/mcp.go`** — Removed `import_audio_file` tool registration and adapter
- **`tools/import_file.go`** — Kept for CLI use only
- **`cmd/import.go`** — CLI command `skraak import file` unchanged
**Migration:** Use CLI command instead:
```bash
./skraak import file --db ./db/skraak.duckdb --dataset abc123 --location loc456 --cluster clust789 --path /path/to/file.wav
```
---
## [2026-02-21] Verb-First CLI Commands
**Breaking change:** Replaced resource-first CLI commands with natural language verb-first structure.
**Before:**
```bash
./skraak dataset create --name "Test"
./skraak location update --id abc123 --name "Updated"
```
**After:**
```bash
./skraak create dataset --name "Test"
./skraak update location --id abc123 --name "Updated"
```
**Changes:**
- **`main.go`** — Removed legacy `dataset`, `location`, `cluster`, `pattern` commands
- **`cmd/create.go`** — New verb-first create handler
- **`cmd/update.go`** — New verb-first update handler
- **`cmd/dataset.go`, `cmd/location.go`, `cmd/cluster.go`, `cmd/pattern.go`** — Exported create/update functions
- **Shell scripts** — Updated `test_bulk_import.sh` and `test_event_log.sh` to use new syntax
**Benefits:**
- Natural language flow: "create dataset" vs "dataset create"
- Consistent with `skraak import file/folder/bulk` pattern
- More intuitive for users
- Maintains clean tool separation in `@tools/` directory
**Migration:** Legacy commands now return an "Unknown command" error, forcing adoption of the new syntax.
---
## [2026-02-21] Fix Event Log Pointer Serialization
**Bug fix:** Event log contained pointer addresses instead of values for nullable database fields (`*float64`, `*GainLevel`, etc.), causing replay failures.
**Root cause:** `marshalParam()` in `db/tx_logger.go` didn't handle pointer types for numeric values or named type aliases (like `db.GainLevel`). These fell through to `fmt.Sprintf("%v", pointer)` which printed memory addresses like `"0x38a7bfb12078"`.
**Example of corrupted data:**
```json
"parameters": ["file_id", "2025-05-18T18:30:00+13:00", "248AB50053AB1B4A", "0x38a7bfb12078", "0x38a7bfb12088", "0x38a7bfb12090"]
```
The last three values should have been `gain`, `battery_v`, `temp_c` but were pointer addresses.
**Fixed:**
- `db/tx_logger.go` — Added explicit cases for all pointer types (`*int`, `*int64`, `*float64`, `*bool`, etc.)
- `db/tx_logger.go` — Added reflection-based fallback in default case to handle pointer-to-named-type (e.g., `*GainLevel`)
- `cmd/replay.go` — Increased `bufio.Scanner` buffer from 64KB to 20MB to handle large event lines (17,000 files = ~16 MB JSON line)
**Tests added:**
- `db/tx_logger_test.go` — Tests for `*int`, `*int64`, `*float64`, `*float32`, `*bool` with nil and value cases
- `db/tx_logger_test.go` — Tests for named type aliases and pointer-to-named-type
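The reflection fallback can be sketched as follows. `GainLevel` here is a local stand-in for the named type in `db`, and `marshalParam` is a simplified shape of the fixed function, not a copy of it:

```go
package main

import (
	"fmt"
	"reflect"
)

// GainLevel stands in for a named type whose pointer previously fell
// through to fmt.Sprintf("%v") and printed a memory address.
type GainLevel string

// marshalParam dereferences any pointer via reflection before
// serializing, so *float64, *GainLevel, etc. log their values,
// and nil pointers log as null.
func marshalParam(v any) any {
	rv := reflect.ValueOf(v)
	if rv.Kind() == reflect.Ptr {
		if rv.IsNil() {
			return nil
		}
		return marshalParam(rv.Elem().Interface())
	}
	return v
}

func main() {
	g := GainLevel("high")
	f := 3.5
	var nilF *float64
	fmt.Println(marshalParam(&g), marshalParam(&f), marshalParam(nilF)) // high 3.5 <nil>
}
```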
---
## [2026-02-19] Fix Update Commands - Preserve Unset Fields
**Bug fix:** Update commands were overwriting existing values with empty strings when optional flags weren't provided.
**Root cause:** CLI code set pointers to empty strings even when flags weren't provided, causing tools layer to interpret them as intentional empty values.
**Fixed:**
- `cmd/dataset.go` — `runDatasetUpdate()` now only sets pointer fields when flags have non-empty values
- `cmd/location.go` — `runLocationUpdate()` now only sets pointer fields when flags have non-empty values
- `cmd/cluster.go` — Already correct (only sets fields when provided)
- `cmd/pattern.go` — Already correct (only sets fields when provided)
**Tests added:**
- `tools/update_test.go` — Unit tests verifying update preserves unset fields for all entity types
---
## [2026-02-19] Schema Simplification - Remove species_dataset and ebird_taxonomy_v2024
**Database schema changes:**
- Dropped `species_dataset` table — all species now available across all datasets
- Dropped `ebird_taxonomy_v2024` table — use `WHERE taxonomy_version = '2024'` on `ebird_taxonomy` instead
**Rationale:**
- Simplifies species management (no duplicate species names across datasets)
- Reduces schema complexity (one fewer join for species lookups)
- `ebird_taxonomy_v2024` was redundant; filtering `ebird_taxonomy` directly is sufficient
**Code changes:**
- `tools/export.go` — Simplified manifest: `species` and `call_type` now "copy" (full table)
- `tools/export.go` — Removed `buildDerivedTableCreate()`, `populateDerivedTable()`, simplified `buildReferencedQuery()`
- `tools/import_ml_selections.go` — Species lookup no longer joins `species_dataset`
- `resources/schema.go` — Removed tables from list
- `db/schema_test.go` — Removed obsolete test cases
- `prompts/examples.go` — Updated taxonomy schema description
**Export manifest changes:**
- `species_dataset` → removed (no longer exists)
- `ebird_taxonomy_v2024` → removed (no longer exists)
- `species` → changed from "referenced" to "copy"
- `call_type` → changed from "referenced" to "copy"
- `filter` → changed from "referenced" to "copy"
- All "referenced" and "derived" handling code removed
---
## [2026-02-19] Dataset Export for Collaboration and Testing
**New feature: Export a dataset with all related data to a new database**
**Purpose:** Enable dataset-level exports for collaboration (export, modify, replay changes), testing (small focused test DBs), and archival.
**Architecture:**
- Schema read from embedded `db/schema.sql` (DDL statements extracted dynamically)
- Table copy order computed from FK relationships using `duckdb_constraints()`
- ATTACH mechanism for efficient cross-database copying
- Declarative manifest defines table relationships
**Added:**
- `tools/export.go` — `ExportDataset()` with table manifest and copy logic
- `cmd/export.go` — `skraak export dataset` CLI command
- `db/schema.go` — Schema utilities: `ReadSchemaSQL()`, `ExtractDDLStatements()`, `GetFKOrder()`
- `shell_scripts/test_export.sh` — Integration test script
**Command:**
```bash
skraak export dataset --db skraak.duckdb --id abc123 --output export.duckdb
skraak export dataset --db skraak.duckdb --id abc123 --output export.duckdb --dry-run
skraak export dataset --db skraak.duckdb --id abc123 --output export.duckdb --force
```
**What's exported:**
- Dataset row and all owned data (locations, clusters, files, selections, labels)
- Reference tables copied in full (`ebird_taxonomy`, `species`, `call_type`, `cyclic_recording_pattern`, `filter`)
- Empty event log created for capturing changes
**Design decisions:**
- Reading the schema from `schema.sql` keeps exports schema-resilient (new columns are included automatically)
- FK order computed dynamically via `duckdb_constraints()` function
- Close source DB before output DB (DuckDB single-connection limit)
- `SELECT *` copies all columns without hard-coding
**Testing:**
- `db/schema_test.go` — Unit tests for DDL extraction and FK ordering
- Integration tests verify row counts match source
- Error handling tests for missing dataset, existing file
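The FK-order computation amounts to a topological sort: emit a table only once everything it references has been emitted. A sketch under the assumption of an acyclic FK graph; in the real code the `deps` map is built from `duckdb_constraints()` rather than written by hand:

```go
package main

import (
	"fmt"
	"sort"
)

// fkOrder returns a copy order in which every table appears after the
// tables it references. deps maps table → tables it has FKs to.
// Assumes the graph is acyclic.
func fkOrder(deps map[string][]string) []string {
	placed := map[string]bool{}
	var order []string
	for len(order) < len(deps) {
		var ready []string
		for t, ds := range deps {
			if placed[t] {
				continue
			}
			ok := true
			for _, d := range ds {
				if !placed[d] {
					ok = false
					break
				}
			}
			if ok {
				ready = append(ready, t)
			}
		}
		sort.Strings(ready) // deterministic order among peers
		for _, t := range ready {
			placed[t] = true
			order = append(order, t)
		}
	}
	return order
}

func main() {
	deps := map[string][]string{
		"dataset":  {},
		"location": {"dataset"},
		"cluster":  {"location"},
		"file":     {"cluster"},
	}
	fmt.Println(fkOrder(deps)) // [dataset location cluster file]
}
```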
---
## [2026-02-18] Event Log for Database Mutation Replay
**New feature: SQL-level event logging for backup synchronization**
**Purpose:** Capture all mutating SQL operations (INSERT, UPDATE, DELETE) to enable replay on backup databases for synchronization.
**Architecture:**
- Transaction wrapper (`db.LoggedTx`) intercepts all mutations
- Logged only on successful commit (rollback discards recorded queries)
- Events written to JSONL file (`<database>.events.jsonl`)
- Prepared statements fully supported via `LoggedStmt` wrapper
**Added:**
- `db/tx_logger.go` — LoggedTx, LoggedStmt, TransactionEvent types
- `cmd/replay.go` — `skraak replay events` CLI command
- `shell_scripts/test_event_log.sh` — Integration test script
**Modified:**
- All CLI commands initialize event log with defer close
- All tools use `db.BeginLoggedTx()` instead of `database.BeginTx()`
- `utils/cluster_import.go` updated for batch imports
**Event format (JSONL):**
```json
{
  "id": "V1StGXR8_Z5jdHi6B-myT",
  "timestamp": "2026-02-18T14:30:22+13:00",
  "tool": "create_or_update_dataset",
  "queries": [
    {"sql": "INSERT INTO ...", "parameters": [...]}
  ],
  "success": true,
  "duration_ms": 45
}
```
**Replay command:**
```bash
skraak replay events --db backup.duckdb --log skraak.duckdb.events.jsonl
skraak replay events --db backup.duckdb --log events.jsonl --dry-run
skraak replay events --db backup.duckdb --log events.jsonl --last 10
```
**Key design decisions:**
- SQL-level (not tool-level) for complete fidelity including imports
- Tool name included for context/debugging
- Only successful transactions logged
- Failed events skipped during replay
- `--continue` flag to proceed past errors
**Testing:**
- `db/tx_logger_test.go` — 123 unit tests, 75.9% coverage
- Pure function tests (isMutation, marshalParam, JSON marshaling)
- Integration tests with real DuckDB and file system
- Race detector verified
---
## [2026-02-11] CLI Refactoring — Two-Layer Architecture
**Major refactoring: Separated core logic from MCP types, added CLI commands**
**Problem:** All tool functions were tightly coupled to MCP SDK types (`*mcp.CallToolRequest`, `*mcp.CallToolResult`). This meant functionality could only be invoked via MCP protocol — no CLI access for power users.
**Solution:** Two-layer architecture separating core logic from MCP adapters.
**Created:**
- `cmd/mcp.go` — MCP server setup + 10 thin adapter wrappers (~3 lines each)
- `cmd/import.go` — `skraak import bulk` CLI command with flag parsing
- `cmd/sql.go` — `skraak sql` CLI command for ad-hoc queries
**Modified (mechanical, all tools/):**
- Removed `*mcp.CallToolRequest` parameter (was never used — `req` always ignored)
- Removed `*mcp.CallToolResult` from returns (was always empty `&mcp.CallToolResult{}`)
- Removed `import "github.com/modelcontextprotocol/go-sdk/mcp"` from all tool files
- Updated test files (`integration_test.go`, `pattern_test.go`) to match new signatures
- Updated `main.go` to pure dispatcher: `mcp | import | sql`
**Architecture:**
```
main.go           → pure dispatcher
cmd/mcp.go        → MCP server + adapter wrappers (ONLY file importing mcp SDK)
cmd/import.go     → CLI: skraak import bulk --db ... --dataset ... --csv ... --log ...
cmd/sql.go        → CLI: skraak sql --db ... "SELECT ..."
tools/*.go        → core logic, NO mcp dependency (plain Go structs in/out)
utils/, db/, etc. → unchanged
```
**Benefits:**
- CLI access for power users without MCP
- Token savings (CLI avoids MCP protocol overhead)
- Code sharing between CLI and MCP
- MCP SDK contained to one file
- All tests pass
---
## [2026-02-10] Bulk File Import Cluster Assignment Bug Fix
**Critical Bug Fix: Files now correctly distributed across multiple clusters for same location**
**Problem:** When the same location appeared multiple times in the CSV with different date ranges, all files ended up in the last cluster created instead of being distributed across their respective clusters.
**Root Cause:** The `clusterIDMap` used only `LocationID` as the key, causing each new cluster for the same location to overwrite the previous one in the map.
**Solution:** Changed map key from `LocationID` to composite key `LocationID|DateRange`.
**Modified:**
- `tools/bulk_file_import.go` (lines 125, 171-172, 183-184)
**Impact:**
- Data integrity restored
- Multiple date ranges per location now work correctly
- Simple 3-line fix, backwards compatible
---
## [2026-02-07] File Modification Time Fallback
**Enhancement: Added file modification time as third timestamp fallback**
**Problem:** Small clusters (1-2 files) failed variance-based filename disambiguation because the algorithm needs multiple samples to determine date format (YYYYMMDD vs YYMMDD vs DDMMYY).
**Timestamp Resolution Order:**
```
1. AudioMoth comment → timestamp
2. Filename parsing → timestamp
3. File modification time → timestamp (NEW!)
4. FAIL (skip file with error)
```
**Modified:**
- `utils/cluster_import.go` - Added FileModTime fallback in `batchProcessFiles()`
**Benefits:**
- Fewer failures in small clusters
- No performance impact
- Backwards compatible
- Simple 10-line change
---
## [2026-02-07] Cluster Import Logic Extraction
**Major refactoring: Extracted shared cluster import logic into utils module**
**Key Insight:** A cluster is the atomic unit of import (one SD card / one recording session / one folder).
**Created:**
- `utils/cluster_import.go` (553 lines) - Single source of truth for cluster imports
- `ImportCluster()` - Main entry point
- `scanClusterFiles()` - Recursive WAV file scanning
- `batchProcessFiles()` - Batch processing with variance-based parsing
- `insertClusterFiles()` - Transactional insertion
**Modified:**
- `tools/import_files.go` - 75% code reduction (650 lines → 161 lines)
- `tools/bulk_file_import.go` - Bug fixes:
- **CRITICAL BUG FIXED:** Now inserts into `file_dataset` table (was missing!)
- **CRITICAL BUG FIXED:** Now inserts into `moth_metadata` table (was missing!)
**Benefits:**
- Bug fixed: 68,043 orphaned files found in test database
- ~500 lines of duplicated code eliminated
- Single source of truth for all import logic
---
## [2026-02-06] Tool Consolidation
**Consolidated 8 write/update tools → 4 create_or_update tools**
**Deleted:**
- 8 separate create/update tool files
**Added:**
- `tools/dataset.go` - `create_or_update_dataset`
- `tools/location.go` - `create_or_update_location`
- `tools/cluster.go` - `create_or_update_cluster`
- `tools/pattern.go` - `create_or_update_pattern`
**Design:**
- Omit `id` field → CREATE mode (generates nanoid)
- Provide `id` field → UPDATE mode (verifies exists)
**Benefits:**
- Tool count: 14 → 10
- ~31% less code (~320 lines removed)
- Shared validation logic
---
## [2026-02-06] Test Script Consolidation
**Rationalized and consolidated shell test scripts**
**Removed redundant scripts:**
- 6 incomplete/redundant test scripts
**Current test suite (8 scripts):**
1. `get_time.sh` - Time tool
2. `test_sql.sh` - SQL query tool
3. `test_tools.sh` - All create_or_update tools
4. `test_import_file.sh` - Single file import
5. `test_import_selections.sh` - ML selection import
6. `test_bulk_import.sh` - Bulk CSV import
7. `test_resources_prompts.sh` - Resources/prompts
8. `test_all_prompts.sh` - All 6 prompts
---
## [2026-02-06] Bulk File Import Tool
**New Feature: CSV-based bulk import across multiple locations and clusters**
**Added:**
- `tools/bulk_file_import.go` - CSV-based bulk import (~500 lines)
**Features:**
- CSV-driven import for multiple locations
- Auto-cluster creation
- Progress logging to file
- Summary statistics
**CSV Format:**
```csv
location_name,location_id,directory_path,date_range,sample_rate,file_count
Site A,loc123456789,/path/to/recordings,2024-01,48000,150
```
---
## [2026-02-02] Single File Import Tool
**New Feature: Import individual WAV files**
**Added:**
- `tools/import_file.go` - Single file import implementation (~300 lines)
**Features:**
- Import one WAV file at a time with detailed feedback
- Same processing pipeline as batch import
- Duplicate detection with `is_duplicate` flag
- Atomic operation (succeeds completely or fails)
---
## [2026-01-29] ML Selection Import Tool
**New Feature: Import ML-detected kiwi call selections from folder structure**
**Added:**
- `utils/selection_parser.go` - Selection parsing utilities
- `utils/selection_parser_test.go` - 34 test cases
- `tools/import_ml_selections.go` - MCP tool (~1050 lines)
**Features:**
- Folder structure: `Clips_{filter_name}_{date}/Species/CallType/*.wav+.png`
- Two-pass file matching (exact, then fuzzy)
- Comprehensive validation
- Transactional import
---
## [2026-01-28] Comprehensive Go Unit Testing
**Added comprehensive unit test suite**
**Added:**
- `utils/astronomical_test.go` - 11 test cases
- `utils/audiomoth_parser_test.go` - 36 test cases
- `utils/filename_parser_test.go` - 60 test cases
- `utils/wav_metadata_test.go` - 22 test cases
- `utils/xxh64_test.go` - 6 test cases
**Coverage:**
- 170+ tests total
- 91.5% code coverage
---
## [2026-01-26] Generic SQL Tool + Codebase Rationalization
**Major architectural change: Replaced 6 specialized tools with generic SQL**
**Deleted:**
- 6 specialized query tools (datasets, locations, clusters, files)
- 2 obsolete test scripts
**Added:**
- `tools/sql.go` - Generic `execute_sql` tool (~200 lines)
- `shell_scripts/test_sql.sh` - Comprehensive SQL test suite
**Modified:**
- `prompts/examples.go` - Rewritten to teach SQL patterns
**Benefits:**
- Full SQL expressiveness (JOINs, aggregates, CTEs)
- Infinite query possibilities vs 6 fixed queries
- More aligned with MCP philosophy
- Smaller codebase (2 tools instead of 8)
**Security:**
- Database read-only
- Validation blocks write operations
- Parameterized queries prevent SQL injection
- Row limits prevent overwhelming responses
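A sketch of the kind of guard described above (not the actual validation in `tools/sql.go`, and deliberately naive — the read-only database handle is the real backstop):

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// validateReadOnly accepts only SELECT (or WITH ... SELECT) statements
// and rejects obvious write keywords. Defense in depth alongside the
// read-only database connection.
func validateReadOnly(query string) error {
	q := strings.ToUpper(strings.TrimSpace(query))
	if !strings.HasPrefix(q, "SELECT") && !strings.HasPrefix(q, "WITH") {
		return errors.New("only SELECT queries are allowed")
	}
	for _, kw := range []string{"INSERT ", "UPDATE ", "DELETE ", "DROP ", "ALTER ", "CREATE "} {
		if strings.Contains(q, kw) {
			return errors.New("write operations are blocked")
		}
	}
	return nil
}

func main() {
	fmt.Println(validateReadOnly("SELECT * FROM dataset") == nil)                // true
	fmt.Println(validateReadOnly("DROP TABLE dataset") == nil)                   // false
	fmt.Println(validateReadOnly("WITH x AS (SELECT 1) SELECT * FROM x") == nil) // true
}
```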
---
## [2026-01-26] Shell Scripts Organization
**Reorganized all shell scripts into `shell_scripts/` directory**
- Keeps project root clean
- All scripts updated with correct relative paths