**Claude Code config:**

```bash
claude mcp add --transport stdio skraak_mcp -- /home/david/go/src/skraak/skraak mcp --db /home/david/go/src/skraak/db/skraak.duckdb
claude mcp add --transport stdio test_mcp -- /home/david/go/src/skraak/skraak mcp --db /home/david/go/src/skraak/db/test.duckdb

# Remove:
claude mcp remove skraak_mcp
```
**⚠️ EXTREMELY IMPORTANT**: When testing shell scripts or any end-to-end functionality, **ALWAYS** use the test database, **NEVER** the production database!
**Documentation:**
- CHANGELOG.md: Detailed change history with rationale
- CLAUDE.md: Essential patterns, policies, and quick reference
- This file is expensive (loaded every session), so keep it concise
## 🚨 Critical Database Safety

- `db/skraak.duckdb` = **PRODUCTION** (1.19M files, 139 locations)
- `db/test.duckdb` = **TEST** (safe for testing)
- **Always specify `../db/test.duckdb` explicitly when testing**

### ALWAYS Use Test Database for Testing

**IMPORTANT:** All shell scripts are located in the `shell_scripts/` directory to keep the project organized. Test scripts accept an optional database path argument and default to `../db/test.duckdb` for safety, but you should still pass the path explicitly and pipe output to a file.

**CORRECT approach:**

```bash
# All shell scripts are in the shell_scripts/ directory
cd shell_scripts

# ALWAYS use the test database and pipe output to a file
./test_sql.sh ../db/test.duckdb > test.txt 2>&1

# Then use targeted searches to verify results
rg -i "error" test.txt                # Check for errors
rg '"result":' test.txt | wc -l       # Count successful responses
rg '"isError":true' test.txt | wc -l  # Count validation errors (expected)
```

**WRONG - DO NOT use the production database:**

```bash
# ❌ NEVER DO THIS - RISKS CORRUPTING THE PRODUCTION DATABASE
./test_sql.sh ../db/skraak.duckdb
```

**WRONG - DO NOT dump output to the terminal:**

```bash
# ❌ DON'T DO THIS - may crash the session with massive output
cd shell_scripts && ./test_sql.sh
```

**Why this matters:**
- `db/skraak.duckdb` is the **PRODUCTION** database with real data
- `db/test.duckdb` is the **TEST** database for safe testing
- Even though the database is opened read-only, repeated connections during testing can cause lock issues
- DuckDB may create temporary files (.wal, .tmp) that can interfere with production access
- Test scripts make many rapid connections that can stress the database

**Testing rules:**
- **Always pipe to a file** (prevents token overflow from large output)
- Navigate to `shell_scripts/` before running tests
- Verify with `rg '"result":' test.txt | wc -l`
---
## Test Scripts

All scripts live in `shell_scripts/` and accept an optional database path argument.

**Core functionality:**
1. **get_time.sh** - Quick test of the get_current_time tool (no database needed)
2. **test_sql.sh [db_path]** - Tests the execute_sql tool with various queries
   - Covers simple SELECTs, parameterized queries, JOINs, aggregates, security validation
   - Always pipe to a file and use the test database!

**Write tools (create/update):**
3. **test_tools.sh [db_path]** - Comprehensive test of all 4 create_or_update tools
   - Covers create_or_update_dataset, create_or_update_location, create_or_update_cluster, create_or_update_pattern
   - Tests both create mode (no id) and update mode (with id)
   - Tests both valid inputs (should succeed) and invalid inputs (should fail)

**Import tools:**
4. **test_import_file.sh [db_path]** - Tests the import_audio_file tool (single file import)
5. **test_import_selections.sh [db_path]** - Tests the import_ml_selections tool setup
6. **test_bulk_import.sh [db_path]** - Tests the bulk_file_import tool (CSV-based)

**Resources and prompts:**
7. **test_resources_prompts.sh [db_path]** - Tests resources and prompts
8. **test_all_prompts.sh [db_path]** - Tests all 6 prompts

### Verifying Test Success

Pipe script output to a file, then check it with targeted searches: `rg '"result":' test.txt | wc -l` counts successful responses, and `rg '"isError":true' test.txt | wc -l` counts expected validation errors.

## Package Organization

- **`utils/`** - Reusable helpers (no MCP types, no `*Input`/`*Output` structs)
- **`tools/`** - MCP/CLI tools (one file per tool, defines input/output types)
- **`cmd/mcp.go`** - MCP adapters (the only file importing the MCP SDK)
- **`cmd/*.go`** - CLI commands (parse flags, call tools, print JSON)

**Simple rule:** If it's called by `cmd/`, it goes in `tools/`. If it's called by `tools/`, it goes in `utils/`.
---
## Project Overview

### Architecture

The Skraak MCP Server is a Model Context Protocol (MCP) server **and CLI tool** written in Go that provides a **generic SQL query interface** for an acoustic monitoring system. It follows a two-layer architecture:

- **Tools** (`tools/`): MCP/CLI tool implementations. Each file = one tool. Defines input/output types.
- **Utils** (`utils/`): Reusable helper functions. Called by tools, never by CLI/MCP directly.
- **MCP adapters** (`cmd/mcp.go`): Thin wrappers bridging MCP types to tool functions.
- **CLI commands** (`cmd/`): Parse flags, call tool functions, print JSON results.

MCP capabilities:
- **Tools** (model-controlled): Generic SQL query execution + time utility
- **Resources** (application-driven): Full database schema for context
- **Prompts** (user-controlled): SQL workflow templates that teach query patterns

### Philosophy: Schema + Generic SQL > Specialized Tools

**Why generic SQL:**
- LLMs can construct any query given the schema (infinite flexibility)
- No rigid tool APIs to learn (just SQL)
- Full SQL expressiveness: JOINs, aggregates, CTEs, subqueries
- Prompts teach SQL patterns instead of tool calling
**Code flow:**

```
main.go     → CLI dispatcher (mcp | import | sql | dataset | ...)
cmd/mcp.go  → MCP server + thin adapters (ONLY MCP SDK import)
cmd/*.go    → CLI commands (flags → tools → JSON output)
tools/*.go  → Core logic (plain Go structs, no MCP dependency)
utils/*.go  → Reusable helpers
db/         → Database connection + types
```
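To illustrate the layering, here is a minimal, self-contained sketch of the pattern: core logic works with plain Go structs (as in `tools/`), and a thin adapter serializes them for the transport layer (as `cmd/mcp.go` does for the MCP SDK). The names below are illustrative, not the actual skraak signatures.

```go
// Sketch of the two-layer pattern: the tool layer has no protocol
// dependency; adapters marshal plain structs for transport.
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// TimeOutput is a plain struct: the tool layer has no MCP dependency.
type TimeOutput struct {
	ISO      string `json:"iso"`
	Timezone string `json:"timezone"`
	Unix     int64  `json:"unix"`
}

// GetCurrentTime is the core tool function, callable from CLI or MCP.
func GetCurrentTime() (TimeOutput, error) {
	now := time.Now()
	zone, _ := now.Zone()
	return TimeOutput{ISO: now.Format(time.RFC3339), Timezone: zone, Unix: now.Unix()}, nil
}

// mcpAdapter stands in for the ~3-line wrappers in cmd/mcp.go:
// call the tool, then serialize the result for the protocol layer.
func mcpAdapter() (string, error) {
	out, err := GetCurrentTime()
	if err != nil {
		return "", err
	}
	b, err := json.Marshal(out)
	return string(b), err
}

func main() {
	// CLI path: print JSON directly (conceptually what `skraak time` does).
	s, err := mcpAdapter()
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```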
**Previous specialized tools were limiting:**
- Each tool = one fixed query
- Couldn't filter beyond hardcoded parameters
- Couldn't JOIN tables or use aggregates
- Created artificial boundaries
### Directory Structure

```
├── main.go                       # CLI dispatcher (mcp | import | sql | dataset | ...)
├── cmd/                          # Command entry points (only MCP importer)
│   ├── mcp.go                    # MCP server setup + adapter wrappers (ONLY file with MCP SDK)
│   ├── import.go                 # CLI: skraak import bulk ...
│   └── sql.go                    # CLI: skraak sql ...
├── db/
│   ├── db.go                     # Database connection (read-only mode)
│   ├── types.go                  # Type definitions
│   ├── schema.sql                # Database schema (348 lines)
│   ├── skraak.duckdb             # Production database ⚠️
│   └── test.duckdb               # Test database (use for testing) ✅
├── tools/                        # MCP/CLI tool implementations (11 tools, MCP-free)
│   ├── time.go                   # get_current_time
│   ├── sql.go                    # execute_sql (generic query)
│   ├── import_files.go           # import_audio_files (batch WAV import)
│   ├── import_file.go            # import_audio_file (single WAV file import)
│   ├── import_unstructured.go    # import_unstructured (unstructured dataset import)
│   ├── import_ml_selections.go   # import_ml_selections (ML detection import)
│   ├── bulk_file_import.go       # bulk_file_import (CSV-based bulk import)
│   ├── dataset.go                # create_or_update_dataset
│   ├── location.go               # create_or_update_location
│   ├── cluster.go                # create_or_update_cluster
│   └── pattern.go                # create_or_update_pattern
├── resources/
│   └── schema.go                 # Schema resources (full & per-table)
├── prompts/
│   └── examples.go               # SQL workflow templates (6 prompts)
├── utils/                        # Pure utility functions (reusable)
│   ├── cluster_import.go         # Centralized cluster import logic (553 lines)
│   ├── astronomical.go           # Solar/civil night, moon phase calculations
│   ├── astronomical_test.go      # Tests (11 test cases)
│   ├── audiomoth_parser.go       # AudioMoth WAV comment parsing
│   ├── audiomoth_parser_test.go  # Tests (36 test cases)
│   ├── filename_parser.go        # Filename timestamp parsing + timezone
│   ├── filename_parser_test.go   # Tests (60 test cases)
│   ├── selection_parser.go       # ML selection filename/folder parsing
│   ├── selection_parser_test.go  # Tests (34 test cases)
│   ├── validation.go             # Input validation helpers
│   ├── validation_test.go        # Validation tests
│   ├── wav_metadata.go           # WAV file header parsing
│   ├── wav_metadata_test.go      # Tests (22 test cases)
│   ├── xxh64.go                  # XXH64 hash computation
│   └── xxh64_test.go             # Tests (6 test cases)
└── shell_scripts/                # Shell test scripts (end-to-end MCP tests)
    ├── get_time.sh               # Time tool test (no database)
    ├── test_sql.sh               # SQL tool tests
    ├── test_tools.sh             # All write tools tests (create/update)
    ├── test_import_file.sh       # Single file import tests
    ├── test_import_selections.sh # ML selection import tests
    ├── test_bulk_import.sh       # Bulk file import tests
    ├── test_resources_prompts.sh # Resources/prompts tests
    ├── test_all_prompts.sh       # All 6 prompts tests
    └── TESTING.md                # Comprehensive testing documentation
```
## Tools

### Read Tools (2)

- `get_current_time` - Returns the current system time with timezone and Unix timestamp
- `execute_sql` - Execute arbitrary SQL SELECT queries
  - **Supports**: SELECT, WITH (CTEs), parameterized queries (? placeholders)
  - **Security**: The database is read-only (enforced by DuckDB), plus forbidden-keyword validation
  - **Limits**: Default 1000 rows (max 10000) to prevent overwhelming responses
  - **Output**: Generic JSON results with column metadata
  - **Use with**: Schema resources, to construct any query you need
### Import Tools (5)

- `import_audio_files` - Batch import WAV files from a folder into a cluster (structured datasets)
  - Automatically parses AudioMoth and filename timestamps
  - Calculates XXH64 hashes, extracts metadata
  - Computes astronomical data (solar/civil night, moon phase)
  - Skips duplicates (by hash), imports in a single transaction
- `import_audio_file` - Import a single WAV file (structured datasets)
  - **Input**: Absolute path to the WAV file, plus dataset/location/cluster IDs
  - **Processing**: Same pipeline as batch import (AudioMoth/filename timestamps, hash, metadata, astronomical data)
  - **Output**: Detailed file metadata including file_id, hash, duration, sample_rate, timestamps
  - **Duplicate detection**: Returns `is_duplicate=true` if the file hash already exists
  - **Use case**: Import individual files without scanning folders
- `bulk_file_import` - CSV-driven multi-location import (auto-creates clusters)
- `import_ml_selections` - Import ML-detected kiwi call selections
  - **Input**: Folder structure `Clips_{filter_name}_{date}/Species/CallType/*.wav+.png`
  - **Parses**: Selection filenames `{base}-{start}-{end}.wav`
  - **Validates**: Filter, species, call types, files, selection bounds
  - **Two-pass file matching**: Exact match, then date_time pattern match
  - **Inserts**: selection → label (species) → label_subtype (call type)
  - **Transaction**: All-or-nothing import with comprehensive error reporting
- `import_unstructured` - Import WAV files into an unstructured dataset (CLI only)
  - **Input**: Folder path and dataset ID (the dataset must have type 'unstructured')
  - **Processing**: Minimal metadata: hash, duration, sample_rate, file modification time as the timestamp
  - **No hierarchy**: location_id and cluster_id are NULL
  - **No astronomical data**: maybe_solar_night, maybe_civil_night, moon_phase are NULL
  - **Duplicate detection**: By hash; skips files that already exist in the database (no linking, no modification)
  - **Use case**: Import miscellaneous recordings without location/cluster structure
  - **CLI**: `skraak import unstructured --db ./db/skraak.duckdb --dataset abc123 --path /path/to/folder`

**Dataset type requirements:**
- `import_audio_files`, `import_audio_file`, `bulk_file_import`, and `import_ml_selections` require `'structured'` datasets
- `import_unstructured` requires `'unstructured'` datasets

**Example (`import_audio_file`):**

```json
{
  "name": "import_audio_file",
  "arguments": {
    "file_path": "/path/to/recording.wav",
    "dataset_id": "abc123xyz789",
    "location_id": "def456uvw012",
    "cluster_id": "ghi789rst345"
  }
}
```

**Output:**

```json
{
  "file_id": "nB3xK8pLm9qR5sT7uV2wX",
  "file_name": "recording.wav",
  "hash": "a1b2c3d4e5f6g7h8",
  "duration_seconds": 60.0,
  "sample_rate": 250000,
  "timestamp_local": "2024-01-15T20:30:00+13:00",
  "is_audiomoth": true,
  "is_duplicate": false,
  "processing_time": "250ms"
}
```

### Write Tools (4)

- `create_or_update_dataset` - Create (omit `id`) or update (provide `id`) a dataset
- `create_or_update_location` - Create or update a location with GPS coordinates and timezone
- `create_or_update_cluster` - Create or update a cluster within a location
- `create_or_update_pattern` - Create or update a cyclic recording pattern (record/sleep cycle)

**Duplicate handling (create mode):**
- Pattern: returns the existing row if record_s/sleep_s match
- Dataset: returns the existing row if the name matches
- Location: returns the existing row if the name matches within the dataset
- Cluster: returns the existing row if the name matches within the location

### Security

**The database is opened read-only** (db/db.go:27):

```go
readOnlyPath := dbPath + "?access_mode=read_only"
```

**Validation layers:**
1. Regex validation: the query must start with SELECT or WITH
2. Forbidden keywords: blocks INSERT/UPDATE/DELETE/DROP/CREATE/ALTER
3. Row limiting: prevents overwhelming responses

All write operations are blocked at both the database and validation levels, and parameterized queries prevent SQL injection. A sketch of these checks follows.
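To make the layering concrete, here is a minimal sketch of checks like these in Go. The names and exact regex patterns are illustrative, not the actual skraak implementation (which lives in `tools/sql.go`).

```go
// Hypothetical sketch of the execute_sql validation layers described above.
// Layer 3 (row limiting) applies to results and is omitted here.
package main

import (
	"fmt"
	"regexp"
)

var (
	// Layer 1: the query must start with SELECT or WITH.
	selectOnly = regexp.MustCompile(`(?i)^\s*(SELECT|WITH)\b`)
	// Layer 2: reject write/DDL keywords outright.
	forbidden = regexp.MustCompile(`(?i)\b(INSERT|UPDATE|DELETE|DROP|CREATE|ALTER)\b`)
)

func validateQuery(query string) error {
	if !selectOnly.MatchString(query) {
		return fmt.Errorf("query must start with SELECT or WITH")
	}
	if kw := forbidden.FindString(query); kw != "" {
		return fmt.Errorf("forbidden keyword: %s", kw)
	}
	return nil
}

func main() {
	fmt.Println(validateQuery("SELECT * FROM dataset WHERE active = true")) // <nil>
	fmt.Println(validateQuery("SELECT 1; DROP TABLE dataset"))              // forbidden keyword: DROP
}
```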
### Application-Level Validation

Beyond the read-only database and keyword checks, tools validate ID formats, numeric bounds, string lengths, and entity existence.

**ID format validation:**
- Short IDs (12 chars): dataset, location, cluster, pattern, species, filter
- Long IDs (21 chars): file, selection, label
- Validates alphanumeric format (nanoid compatible)

**String length validation:**
- Names: max 140 characters (location, cluster)
- Dataset names: max 255 characters
- Descriptions: max 255 characters
- Paths: max 255 characters
- Timezone IDs: max 40 characters

**Entity validation:**
- Existence checks before foreign key references
- Active status validation (cannot update inactive entities)
- Dataset type validation for imports (structured vs unstructured; see Import Tools above)
- Hierarchy consistency (the location must belong to the dataset, the cluster to the location)

### Import Pipeline

- Timestamp fallback chain: AudioMoth comment → filename parsing → file modification time (sketched below)
- XXH64 hashing, WAV metadata extraction, astronomical calculations (solar/civil night, moon phase)
- Centralized in `utils/cluster_import.go`
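The fallback chain can be sketched as follows. This is a minimal illustration with hypothetical names, not the actual `batchProcessFiles()` implementation in `utils/cluster_import.go`.

```go
// Hypothetical sketch of the timestamp fallback chain described above.
package main

import (
	"errors"
	"fmt"
	"time"
)

// fileInfo carries the candidate timestamps extracted earlier in the
// pipeline; any field may be zero if that source was unavailable.
type fileInfo struct {
	AudioMothTime time.Time // parsed from the AudioMoth WAV comment
	FilenameTime  time.Time // parsed from the filename
	FileModTime   time.Time // filesystem mtime (assumed location-local time)
}

// resolveTimestamp applies the fallback chain:
// AudioMoth comment → filename → file modification time → fail.
func resolveTimestamp(f fileInfo) (time.Time, error) {
	switch {
	case !f.AudioMothTime.IsZero():
		return f.AudioMothTime, nil
	case !f.FilenameTime.IsZero():
		return f.FilenameTime, nil
	case !f.FileModTime.IsZero():
		return f.FileModTime, nil // silent fallback, per the 2026-02-07 change
	default:
		return time.Time{}, errors.New("no usable timestamp; skipping file")
	}
}

func main() {
	ts, err := resolveTimestamp(fileInfo{FileModTime: time.Now()})
	fmt.Println(ts, err)
}
```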
## Resources & Prompts

### Resources

- `schema://full` - Complete 348-line database schema (SQL)
- Per-table schema resources (see `resources/schema.go`)

**Valid table names**: dataset, location, cluster, file, selection, label, species, species_group, genus, family_group, order_group, family, order, class, phylum, kingdom, kiwi_call, call, syllable, and more (see schema.sql)

### Prompts

Six SQL workflow templates, each teaching SQL patterns with complete examples:

1. `query_active_datasets` - Dataset querying with SELECT and GROUP BY
2. `explore_database_schema` - Interactive schema exploration (resource-based)
3. `explore_location_hierarchy` - Hierarchy navigation with JOINs
4. `query_location_data` - Location analysis with filtering and aggregates
5. `analyze_cluster_files` - File analysis with aggregate functions
6. `system_status_check` - Comprehensive health check workflow
## SQL Examples
### Basic Queries

**Get locations for a dataset (parameterized):**

```json
{
  "query": "SELECT id, name, latitude, longitude FROM location WHERE dataset_id = ? AND active = true",
  "parameters": ["vgIr9JSH_lFj"]
}
```

```sql
-- Parameterized (use execute_sql with a parameters array)
SELECT * FROM location WHERE dataset_id = ? AND active = true;
```

### Aggregates

**Cluster file statistics:**

```sql
SELECT
  COUNT(*) as total_files,
  SUM(duration) as total_duration,
  AVG(duration) as avg_duration,
  MIN(timestamp_local) as first_recording,
  MAX(timestamp_local) as last_recording,
  SUM(CASE WHEN maybe_solar_night THEN 1 ELSE 0 END) as night_files
FROM file
WHERE cluster_id = ? AND active = true;
```

### JOINs and Complex Analysis

**Geographic distribution:**

```sql
SELECT
  d.name as dataset,
  COUNT(DISTINCT l.id) as locations,
  AVG(l.latitude) as avg_latitude,
  AVG(l.longitude) as avg_longitude
FROM dataset d
LEFT JOIN location l ON d.id = l.dataset_id
WHERE d.active = true
GROUP BY d.name
ORDER BY d.name;
```
## Database Information

### Database Path

Default: `./db/skraak.duckdb`
**Best practices:** Always `WHERE active = true`, use parameterized queries for IDs, use `LEFT JOIN` to include parent records, use `COUNT(DISTINCT)` when joining.
### Key Tables

- **dataset** - Project datasets (organise/test/train types)
- **location** - Recording locations with GPS coordinates (139 active locations)
- **cluster** - Grouped recordings at a location
- **file** - Individual audio files with metadata
- **label** - Annotations and classifications
- **species** - Taxonomy information
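For programmatic access outside the CLI, a read-only connection looks roughly like this. This is a sketch assuming the commonly used `github.com/marcboeker/go-duckdb` driver; the project's actual connection code lives in `db/db.go`.

```go
// Sketch: open DuckDB read-only and run a parameterized query.
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/marcboeker/go-duckdb"
)

func main() {
	// Append access_mode=read_only so DuckDB rejects all writes,
	// mirroring the readOnlyPath pattern shown under Security.
	db, err := sql.Open("duckdb", "./db/test.duckdb?access_mode=read_only")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	var n int
	// Parameterized query, as recommended for IDs and user input.
	if err := db.QueryRow("SELECT COUNT(*) FROM file WHERE active = ?", true).Scan(&n); err != nil {
		log.Fatal(err)
	}
	fmt.Println("active files:", n)
}
```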
---
## CLI Usage

```bash
# Ad-hoc SQL queries
./skraak sql --db ./db/test.duckdb "SELECT COUNT(*) FROM file WHERE active = true"
./skraak sql --db ./db/test.duckdb --limit 10 "SELECT * FROM dataset WHERE active = true"

# Dataset management
./skraak dataset create --db ./db/test.duckdb --name "My Dataset" --type unstructured
./skraak dataset update --db ./db/test.duckdb --id abc123 --name "Updated Name"

# Location management
./skraak location create --db ./db/test.duckdb --dataset abc123 --name "Site A" --lat -36.85 --lon 174.76 --timezone Pacific/Auckland
./skraak location update --db ./db/test.duckdb --id loc123 --name "Updated Name"

# Cluster management
./skraak cluster create --db ./db/test.duckdb --dataset abc123 --location loc456 --name "2024-01" --sample-rate 250000
./skraak cluster update --db ./db/test.duckdb --id cluster123 --name "Updated Name"

# Recording pattern management
./skraak pattern create --db ./db/test.duckdb --record 60 --sleep 1740
./skraak pattern update --db ./db/test.duckdb --id pattern123 --record 30

# Import commands
./skraak import bulk --db ./db/test.duckdb --dataset abc123 --csv import.csv --log progress.log
./skraak import file --db ./db/test.duckdb --dataset abc123 --location loc456 --cluster clust789 --path /path/to/file.wav
./skraak import folder --db ./db/test.duckdb --dataset abc123 --location loc456 --cluster clust789 --path /path/to/folder
./skraak import selections --db ./db/test.duckdb --dataset abc123 --cluster clust789 --path /path/to/Clips_filter_date
```

Examples use `test.duckdb`; substitute `./db/skraak.duckdb` only for real production work.
```bash
# Utility commands
./skraak xxhash --file recording.wav    # Compute XXH64 hash (same format as the DB)
./skraak metadata --file recording.wav  # Extract WAV metadata
./skraak time                           # Current time as JSON

# JSON output allows piping to jq
./skraak xxhash --file recording.wav | jq '.hash'
./skraak metadata --file recording.wav | jq '.duration_seconds'
./skraak time | jq '.unix'
./skraak dataset create --db ./db/test.duckdb --name "Test" --type unstructured | jq '.dataset.id'

# JSON output allows chaining commands
HASH=$(./skraak xxhash --file recording.wav | jq -r '.hash')
DURATION=$(./skraak metadata --file recording.wav | jq -r '.duration_seconds')
DATASET_ID=$(./skraak dataset create --db ./db/test.duckdb --name "New Dataset" --type unstructured | jq -r '.dataset.id')
```
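Under the hood, `skraak xxhash` computes an XXH64 digest. Here is a minimal sketch of equivalent hashing, assuming the `github.com/cespare/xxhash/v2` library; the actual implementation is in `utils/xxh64.go` and may differ.

```go
// Sketch: stream a file through an XXH64 digest and print the
// 16-character lowercase hex form used in the database.
package main

import (
	"fmt"
	"io"
	"os"

	"github.com/cespare/xxhash/v2"
)

func hashFile(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()

	d := xxhash.New()
	if _, err := io.Copy(d, f); err != nil {
		return "", err
	}
	return fmt.Sprintf("%016x", d.Sum64()), nil
}

func main() {
	h, err := hashFile("recording.wav")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(h)
}
```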
## Testing
```bash
# From shell_scripts/

# SQL tool tests (ALWAYS use test.duckdb and pipe to a file!)
./test_sql.sh ../db/test.duckdb > test.txt 2>&1
rg '"result":' test.txt | wc -l      # Should show 8 responses (6 successful + 2 validations)
rg '"isError":true' test.txt | wc -l # Should show 2 (security tests)

# Write tools
./test_tools.sh ../db/test.duckdb > test.txt 2>&1

# Import tools
./test_import_file.sh ../db/test.duckdb > test.txt 2>&1
./test_bulk_import.sh ../db/test.duckdb > test.txt 2>&1

# Resources and prompts
./test_resources_prompts.sh ../db/test.duckdb > test_resources.txt 2>&1
cat test_resources.txt | jq '.'

# All prompts
./test_all_prompts.sh ../db/test.duckdb > test_prompts.txt 2>&1
rg '"result":' test_prompts.txt | wc -l
```
### Tool Call Examples

**Basic query:**

```json
{
  "name": "execute_sql",
  "arguments": {
    "query": "SELECT * FROM dataset WHERE active = true"
  }
}
```

**Parameterized query (recommended for user input):**

```json
{
  "name": "execute_sql",
  "arguments": {
    "query": "SELECT * FROM location WHERE dataset_id = ?",
    "parameters": ["vgIr9JSH_lFj"]
  }
}
```

**With a custom row limit:**

```json
{
  "name": "execute_sql",
  "arguments": {
    "query": "SELECT * FROM file WHERE active = true",
    "limit": 100
  }
}
```

### SQL Best Practices

1. **Always use WHERE active = true** for the main tables (dataset, location, cluster, file)
2. **Use parameterized queries** (? placeholders) when filtering by IDs
3. **Use LEFT JOIN** to include parent records even if children don't exist
4. **Use COUNT(DISTINCT)** when joining to avoid double-counting
5. **Use LIMIT** to restrict large result sets
6. **Use DATE_TRUNC** to group temporal data
7. **Use CASE WHEN** for conditional aggregates (e.g., count night vs day files)

## Common Issues

- **Query results too large:** Use a LIMIT clause (default 1000, max 10000)
- **Server exits immediately:** Normal behavior; the server runs in stdio mode, waiting for JSON-RPC input
- **No response from a tool call:** You must initialize the connection with the `initialize` method before calling tools (see the example below)
- **Database connection failed:** Check that the database path exists and is readable
- **SQL syntax error:** Check the query syntax and table/column names (use the schema resources)
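For the initialization issue above: a stdio session must begin with an `initialize` exchange before any tool call. A sketch of the first messages (field details depend on the MCP protocol revision in use; this is illustrative, not copied from the test scripts):

```json
{"jsonrpc": "2.0", "id": 1, "method": "initialize", "params": {"protocolVersion": "2024-11-05", "capabilities": {}, "clientInfo": {"name": "test-client", "version": "0.1.0"}}}
{"jsonrpc": "2.0", "method": "notifications/initialized"}
{"jsonrpc": "2.0", "id": 2, "method": "tools/call", "params": {"name": "get_current_time", "arguments": {}}}
```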
---

## Go Unit Testing

### Test Coverage

The project includes comprehensive unit tests for all utility packages: 170+ tests with **91.5% code coverage**.

**Test files:**
- `utils/astronomical_test.go` - Astronomical calculations (solar/civil night, moon phase)
- `utils/audiomoth_parser_test.go` - AudioMoth WAV comment parsing
- `utils/filename_parser_test.go` - Filename timestamp parsing with timezone handling
- `utils/selection_parser_test.go` - ML selection filename/folder parsing
- `utils/wav_metadata_test.go` - WAV file metadata extraction
- `utils/xxh64_test.go` - XXH64 hash computation

**Coverage areas:**
- Date format detection (YYYYMMDD, YYMMDD, DDMMYY) with variance-based disambiguation
- Timezone offset calculation with the fixed-offset strategy; DST transition handling
- UTC conversion correctness
- AudioMoth metadata parsing (all gain levels, temperature, battery)
- WAV header parsing (duration, sample rate, channels, INFO chunks)
- XXH64 hash validation
- ML selection filename parsing (base-start-end format) and folder name parsing (Clips_filter_date format)
- WAV/PNG pair validation and date/time pattern extraction for fuzzy file matching
- Edge cases (invalid dates, leap years, case sensitivity)

### Running Go Tests

```bash
go test ./...           # All tests
go test ./utils/        # Specific package
go test -v ./utils/     # Verbose output
go test -cover ./utils/ # Coverage report

# Generate an HTML coverage profile
go test -coverprofile=coverage.out ./utils/
go tool cover -html=coverage.out
```

### Test Organization

Tests follow Go conventions:
- Test files are named `*_test.go`; test functions are named `Test*`
- Table-driven tests where appropriate
- Edge cases and error conditions included
- Match the TypeScript test suite from the original project

**Key differences from the TypeScript tests:**
- Go separates filename parsing from timezone application (better design)
- Go validates dates strictly (TypeScript's Date constructor auto-corrects)
- Console logging tests are omitted (not applicable to MCP servers)
- All essential functionality is covered with equivalent or better tests
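As an illustration of the table-driven style, here is a minimal example; the function and cases are hypothetical, not taken from the skraak test suite.

```go
// Sketch of a table-driven test in the style used by utils/*_test.go.
package utils

import "testing"

// isLeapYear is a stand-in function under test.
func isLeapYear(y int) bool {
	return y%4 == 0 && (y%100 != 0 || y%400 == 0)
}

func TestIsLeapYear(t *testing.T) {
	cases := []struct {
		name string
		year int
		want bool
	}{
		{"typical leap", 2024, true},
		{"century non-leap", 1900, false},
		{"400-divisible leap", 2000, true},
		{"non-leap", 2023, false},
	}
	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			if got := isLeapYear(tc.year); got != tc.want {
				t.Errorf("isLeapYear(%d) = %v, want %v", tc.year, got, tc.want)
			}
		})
	}
}
```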
Remember to restart Claude Desktop after configuration changes.

## Recent Changes

### CLI Refactoring: Two-Layer Architecture (2026-02-11)

**Major refactoring: Separated core logic from MCP types, added CLI commands**

**Problem:** All tool functions were tightly coupled to MCP SDK types (`*mcp.CallToolRequest`, `*mcp.CallToolResult`). This meant functionality could only be invoked via the MCP protocol, with no CLI access for power users.
**Created:**
- `cmd/mcp.go` - MCP server setup + 10 thin adapter wrappers (~3 lines each)
- `cmd/import.go` - `skraak import bulk` CLI command with flag parsing
- `cmd/sql.go` - `skraak sql` CLI command for ad-hoc queries

**Modified (mechanical, all tools/):**
- Removed the `*mcp.CallToolRequest` parameter (never used; `req` was always ignored)
- Removed `*mcp.CallToolResult` from returns (always an empty `&mcp.CallToolResult{}`)
- Removed `import "github.com/modelcontextprotocol/go-sdk/mcp"` from all tool files
- Updated test files (`integration_test.go`, `pattern_test.go`) to match the new signatures
- Updated `main.go` to a pure dispatcher: `mcp | import | sql`

**Architecture:**

```
main.go       → pure dispatcher
cmd/mcp.go    → MCP server + adapter wrappers (ONLY file importing the MCP SDK)
cmd/import.go → CLI: skraak import bulk --db ... --dataset ... --csv ... --log ...
cmd/sql.go    → CLI: skraak sql --db ... "SELECT ..."
tools/*.go    → core logic, NO mcp dependency (plain Go structs in/out)
utils/, db/   → unchanged
```

**CLI usage:**

```bash
# MCP server (unchanged)
skraak mcp --db ./db/skraak.duckdb

# Power-user CLI commands (new)
skraak sql --db ./db/skraak.duckdb "SELECT COUNT(*) FROM file WHERE active = true"
skraak sql --db ./db/skraak.duckdb --limit 10 "SELECT * FROM dataset"
skraak import bulk --db ./db/skraak.duckdb --dataset abc123 --csv import.csv --log progress.log
```

**Benefits:**
- ✅ **CLI access:** Power users can run imports and queries without MCP
- ✅ **Token savings:** CLI commands avoid MCP protocol overhead
- ✅ **Code sharing:** CLI and MCP call identical core functions
- ✅ **MCP SDK contained:** Only `cmd/mcp.go` imports the MCP SDK
- ✅ **Extensible:** New CLI commands just need a file in `cmd/` calling `tools/`
- ✅ **No logic changes:** All core tool logic unchanged, just signature cleanup
- ✅ **All tests pass:** `go test ./...`, all 8 shell test scripts verified
### Bulk File Import Cluster Assignment Bug Fix (2026-02-10)

**Critical bug fix: Files are now correctly distributed across multiple clusters for the same location**

**Problem:** When the same location appeared multiple times in the CSV with different date ranges, all files ended up in the **last cluster created** instead of being distributed across their respective clusters.

**Root cause:** The `clusterIDMap` used only `LocationID` as the key, so each new cluster for the same location overwrote the previous one in the map.

**Example of the bug:**

```csv
A12,loc123,/path/2019,2019,8000,864
A12,loc123,/path/2020,2020,8000,180
A12,loc123,/path/2022,2022,8000,180
A12,loc123,/path/2024,2024,8000,549
```

- **Before fix:** 4 clusters created, ALL 1773 files go into the 2024 cluster
- **After fix:** 4 clusters created, files distributed correctly (864 in 2019, 180 in 2020, etc.)

**Solution:** Changed the map key from `LocationID` to the composite key `LocationID|DateRange`.

**Modified:** `tools/bulk_file_import.go` (lines 125, 171-172, 183-184)
- Line 125: Updated the map comment to reflect the composite key
- Lines 171-172: Store with `compositeKey := loc.LocationID + "|" + loc.DateRange`
- Lines 183-184: Retrieve with the same composite key

**Impact:**
- ✅ **Data integrity restored:** Files now go to the correct clusters
- ✅ **Multiple date ranges per location:** Now works correctly
- ✅ **No breaking changes:** Simple 3-line fix
- ✅ **Backwards compatible:** Single-location CSV rows work identically

**Verification:**

```sql
SELECT
  l.name as location_name,
  c.name as cluster_name,
  COUNT(f.id) as file_count
FROM cluster c
JOIN location l ON c.location_id = l.id
LEFT JOIN file f ON f.cluster_id = c.id
WHERE l.name = 'A12'
GROUP BY c.id, l.name, c.name
ORDER BY c.name;
```

**Note:** Data previously imported with the buggy code must be re-imported to fix cluster assignments.

---

### File Modification Time Fallback (2026-02-07)

**Enhancement: Added file modification time as a third timestamp fallback**

**Problem:** Small clusters (1-2 files) failed variance-based filename disambiguation because the algorithm needs multiple samples to determine the date format (YYYYMMDD vs YYMMDD vs DDMMYY).

**Solution:** Added file modification time as the third fallback in the timestamp resolution chain.

**Timestamp resolution order:**

```
1. AudioMoth comment → timestamp
2. Filename parsing → timestamp
3. File modification time → timestamp (NEW!)
4. FAIL (skip file with error)
```

**Modified:** `utils/cluster_import.go` - Added the FileModTime fallback in `batchProcessFiles()`
- Silent fallback (no warning logged)
- Assumes FileModTime is in the location timezone
- Reduces import failures in small clusters

**Benefits:**
- ✅ **Fewer failures:** Small clusters (1-2 files) no longer fail when filename parsing can't disambiguate
- ✅ **No performance impact:** FileModTime is already extracted in `ParseWAVHeader()`
- ✅ **Backwards compatible:** Only helps files that would have failed
- ✅ **Simple:** 10 lines of code, defensive checks, no complexity

**Use case:** A user has 1-2 files with unparseable filenames (e.g., `recording001.wav`). Previously the import failed; now it uses FileModTime.

**See also:** `TIMESTAMP_FALLBACK_PLAN.md` - Complete implementation plan

---

### Cluster Import Logic Extraction (2026-02-07)

**Major refactoring: Extracted shared cluster import logic into a utils module**

**Key insight:** A cluster is the atomic unit of import (one SD card / one recording session / one folder).

**Created:** `utils/cluster_import.go` (553 lines) - Single source of truth for cluster imports
- `ImportCluster()` - Main entry point used by both import_files.go and bulk_file_import.go
- `scanClusterFiles()` - Recursive WAV file scanning
- `batchProcessFiles()` - Batch processing with variance-based filename timestamp parsing
- `insertClusterFiles()` - Transactional database insertion
- Moved the `FileImportError` type from tools/ to utils/

**Modified:**
- `tools/import_files.go` - **75% code reduction** (650 lines → 161 lines)
  - Now just calls `utils.ImportCluster()` for all the heavy lifting
  - Removed ~500 lines of duplicated logic
- `tools/bulk_file_import.go` - **Bug fixes + simplification**
  - **🐛 CRITICAL BUG FIXED:** Now inserts into the `file_dataset` table (was missing!)
  - **🐛 CRITICAL BUG FIXED:** Now inserts into the `moth_metadata` table (was missing!)
  - Now uses the shared `utils.ImportCluster()` logic
  - Files are no longer orphaned from datasets
- `tools/import_file.go` - Added helper wrappers for compatibility

**Benefits:**
- ✅ **Bug fixed:** 68,043 orphaned files were found in the test database (confirming the bug was real)
- ✅ **Single source of truth:** All cluster import logic in one place
- ✅ **Code reduction:** ~500 lines of duplicated code eliminated
- ✅ **Consistency:** Single-cluster and multi-cluster imports use identical logic
- ✅ **Maintainability:** Changes to import logic are made in one place
- ✅ **Performance:** No regression, same batch processing as before

**Architecture:**

```
Before:
tools/import_files.go     (650 lines) - Custom logic
tools/bulk_file_import.go (460 lines) - Different logic (BUGGY)

After:
utils/cluster_import.go   (553 lines) - Shared logic
tools/import_files.go     (161 lines) - Calls utils.ImportCluster()
tools/bulk_file_import.go (393 lines) - Calls utils.ImportCluster()
```

**See also:** `plan.md` (refactoring plan with implementation checklist), `REFACTORING_SUMMARY.md` (detailed summary of changes), `VERIFICATION_RESULTS.md` (test results and database analysis)

---

### Generic SQL Tool + Codebase Rationalization (2026-01-26)

**Major architectural change: Replaced 6 specialized tools with the generic SQL approach**

**Deleted:**
- `tools/dataset.go` - query_datasets tool
- `tools/location.go` - query_locations, query_locations_by_dataset tools
- `tools/cluster.go` - query_clusters, query_clusters_by_location tools
- `tools/file.go` - query_files_by_cluster tool
- `shell_scripts/test_new_tools.sh` - Obsolete test script
- `shell_scripts/test_mcp.sh` - Obsolete test script

**Added:**
- `tools/sql.go` - Generic execute_sql tool (~200 lines)
- `shell_scripts/test_sql.sh` - Comprehensive SQL test suite

**Modified:**
- `main.go` - Removed 6 tool registrations, kept only get_current_time and execute_sql
- `prompts/examples.go` - Completely rewritten to teach SQL patterns instead of tool calls
- All 6 prompts now include SQL examples with SELECT, JOIN, GROUP BY, aggregates

**Benefits:**
- Full SQL expressiveness (JOINs, aggregates, CTEs, subqueries), previously impossible
- Infinite query possibilities vs 6 fixed queries
- More aligned with the MCP philosophy (context over APIs)
- LLMs can answer any question given the schema
- Smaller codebase (2 tools instead of 8)
- More maintainable (no new tool for each query pattern)

**Security:**
- Database already read-only (verified in db/db.go)
- Validation layers block write operations
- Parameterized queries prevent SQL injection
- Row limits prevent overwhelming responses

**Migration notes:**
- Old tool calls must be replaced with SQL queries
- All old functionality remains available via SQL
- Prompts provide SQL examples for common patterns
- Schema resources provide full context for query construction

### Shell Scripts Organization (2026-01-26)

- Reorganized all shell scripts into the `shell_scripts/` directory
- Keeps the project root clean and organized
- All scripts updated with correct relative paths

### Comprehensive Go Unit Testing (2026-01-28)

**Added a comprehensive unit test suite for utility packages**

**Added:**
- `utils/astronomical_test.go` - 11 test cases for astronomical calculations
- `utils/audiomoth_parser_test.go` - 36 test cases for AudioMoth parsing
- `utils/filename_parser_test.go` - 60 test cases for filename/timezone parsing
- `utils/wav_metadata_test.go` - 22 test cases for WAV metadata extraction
- `utils/xxh64_test.go` - 6 test cases for hash computation

**Test coverage:**
- **Total: 136 tests** at the time, covering **91.5%** of statements
- All tests ported from the TypeScript test suite
- Additional Go-specific tests for date validation

**Key test areas:**
- Filename parsing: YYYYMMDD, YYMMDD, DDMMYY formats with variance-based disambiguation
- Timezone handling: fixed-offset strategy, DST transitions (Auckland, US timezones)
- UTC conversion: mathematical correctness validation
- AudioMoth: comment parsing, all gain levels, timezone formats
- WAV metadata: duration, sample rate, INFO chunks
- Astronomical: solar/civil night, moon phase calculations
- ML selection parsing: filename format, folder structure, WAV/PNG pairing
- Edge cases: invalid dates, leap years, case sensitivity
### ML Selection Import Tool

**New feature: Import ML-detected kiwi call selections from a folder structure**

**Added:**
- `utils/selection_parser.go` - Selection filename and folder parsing utilities
- `utils/selection_parser_test.go` - Comprehensive tests (34 test cases)
- `tools/import_ml_selections.go` - MCP tool implementation (~1050 lines)
- `shell_scripts/test_import_selections.sh` - Integration test script

**Features:**
- **Folder structure support**: `Clips_{filter_name}_{date}/Species/CallType/*.wav+.png`
- **Filename parsing**: Extracts the `{base}-{start}-{end}.wav` format
- **Two-pass file matching**:
  1. Exact match by filename
  2. Fuzzy match by date_time pattern (handles prefix variations)
- **Comprehensive validation**:
  - Filter exists in the database
  - Species linked to the dataset
  - Call types exist for the species
  - Files exist in the cluster
  - Selection bounds within file duration
- **Transactional import**: All-or-nothing with error collection
- **Database relations**: selection → label (species) → label_subtype (call type)

**Usage example:**

```bash
# Folder structure:
# Clips_opensoundscape-kiwi-1.0_2025-11-14/
# └── Brown Kiwi/
#     ├── Male - Solo/
#     │   ├── A05-20250517_214501-102-133.wav
#     │   └── A05-20250517_214501-102-133.png
#     └── Female - Solo/
#         └── ...
```

```json
{
  "name": "import_ml_selections",
  "arguments": {
    "folder_path": "/path/to/Clips_opensoundscape-kiwi-1.0_2025-11-14",
    "dataset_id": "abc123xyz789",
    "cluster_id": "def456uvw012"
  }
}
```

**Validation features:**
- Batch validation of all entities before any database writes
- Comprehensive error reporting (filename, species, call type, stage)
- Fuzzy file matching handles filename prefix variations
- Strict selection bounds checking (end_time ≤ file.duration)
- Ambiguous match detection (multiple files with the same date_time pattern)

**Test coverage:**
- 34 unit tests for selection parsing utilities
- Various filename formats (with/without dashes, decimal times)
- Folder name parsing (filter + date extraction)
- WAV/PNG pair validation
- date_time pattern extraction (8-digit and 6-digit formats)

**Tool count**: 7 total tools at the time (read: 2, write: 4, import: 2); later updates raised this to 10.

### Single File Import Tool (2026-02-02)

**New feature: Import individual WAV files with the `import_audio_file` tool**

**Added:**
- `tools/import_file.go` - Single file import implementation (~300 lines)
- `shell_scripts/test_import_file.sh` - Integration test script

**Features:**
- **Single file import**: Import one WAV file at a time with detailed feedback
- **Same processing pipeline**: Reuses all utilities from batch import (AudioMoth parsing, timestamp extraction, hash computation, astronomical calculations)
- **Shared helper functions**: Reuses `validateImportInput()`, `getLocationData()`, `ensureClusterPath()` from import_files.go
- **Detailed output**: Returns file_id, hash, duration, sample_rate, timestamps, processing time
- **Duplicate detection**: Checks the hash before insertion; returns `is_duplicate=true` if it exists
- **Fail-fast errors**: Single file import is atomic; it succeeds completely or fails with a clear error message

**Input:**

```json
{
  "file_path": "/absolute/path/to/file.wav",
  "dataset_id": "12-char-id",
  "location_id": "12-char-id",
  "cluster_id": "12-char-id"
}
```

**Output:**

```json
{
  "file_id": "21-char-nanoid",
  "file_name": "filename.wav",
  "hash": "16-char-xxh64-hex",
  "duration_seconds": 60.0,
  "sample_rate": 250000,
  "timestamp_local": "2024-01-15T20:30:00+13:00",
  "is_audiomoth": true,
  "is_duplicate": false,
  "processing_time": "250ms"
}
```

**Use cases:**
- Import files one at a time with detailed feedback per file
- Programmatic import where you already know the file path
- Import files from different locations without folder scanning
- Get immediate feedback on duplicate detection
- Alternative to batch import for small numbers of files

**Tool count**: 8 total tools at the time (read: 2, write: 4, import: 3); later updates raised this to 10.

### Bulk File Import Tool (2026-02-06)

**New feature: CSV-based bulk import across multiple locations and clusters**

**Added:**
- `tools/bulk_file_import.go` - CSV-based bulk import implementation (~500 lines)

**Features:**
- **CSV-driven import**: A single CSV file specifies multiple locations, directories, and clusters
- **Auto-cluster creation**: Automatically creates clusters if they don't exist for a location/date_range
- **Progress logging**: Real-time progress logging to a file (monitor with `tail -f log_file.log`)
- **Synchronous/fail-fast**: Processes sequentially with immediate error reporting
- **Summary statistics**: Returns detailed counts for locations, clusters, files, duplicates, errors
- **Shared utilities**: Reuses all WAV processing utilities (AudioMoth, timestamps, hash, astronomical)

**CSV format:**

```csv
location_name,location_id,directory_path,date_range,sample_rate,file_count
Site A,loc123456789,/path/to/recordings,2024-01,48000,150
Site B,loc987654321,/path/to/recordings,2024-02,250000,200
```

**Input:**

```json
{
  "dataset_id": "12-char-id",
  "csv_path": "/absolute/path/to/import.csv",
  "log_file_path": "/absolute/path/to/progress.log"
}
```

**Output:**

```json
{
  "total_locations": 10,
  "clusters_created": 5,
  "clusters_existing": 5,
  "total_files_scanned": 1500,
  "files_imported": 1200,
  "files_duplicate": 250,
  "files_error": 50,
  "processing_time": "5m30s",
  "errors": []
}
```

**Use cases:**
- Bulk import across many locations in one operation
- Automated import pipelines with CSV generation
- Large-scale data migration from existing systems
- Batch processing with progress monitoring via a log file

**Comparison with other import tools:**
- `import_audio_files`: Single folder → single cluster
- `import_audio_file`: Single file → single cluster
- `import_ml_selections`: ML detection folder structure → selections
- `bulk_file_import`: CSV with multiple folders → multiple clusters (auto-creates)

**Tool count**: Now 10 total tools (read: 2, write: 4, import: 4)

### Test Script Consolidation (2026-02-06)

**Rationalized and consolidated shell test scripts for better organization**

**Removed redundant scripts:**
- `test_import_simple.sh` - Only tested registration (redundant)
- `test_import_tool.sh` - Incomplete, just schema validation
- `test_write_simple.sh` - Incomplete happy-path test
- `test_write_tools.sh` - Replaced by the comprehensive test_tools.sh
- `test_write_e2e.sh` - Required manual ID replacement (not automated)
- `test_update_tools.sh` - Replaced by test_tools.sh

**Added comprehensive test scripts:**
- `test_tools.sh` - All 4 create_or_update tools (create + update modes) with validation
- `test_bulk_import.sh` - Tests the bulk_file_import tool with CSV parsing

**Updated documentation:**
- `shell_scripts/TESTING.md` - Complete rewrite with the current tool set
- Removed references to deleted tools (query_datasets, etc.)
- Added examples for all current tools
- Added SQL query examples (JOINs, aggregates, temporal analysis)
- Added a troubleshooting section and best practices

**Current test suite (8 scripts):**
1. `get_time.sh` - Time tool (no database)
2. `test_sql.sh` - SQL query tool (comprehensive)
3. `test_tools.sh` - All create_or_update tools (4 tools, create + update modes)
4. `test_import_file.sh` - Single file import
5. `test_import_selections.sh` - ML selection import
6. `test_bulk_import.sh` - Bulk CSV import
7. `test_resources_prompts.sh` - Resources/prompts
8. `test_all_prompts.sh` - All 6 prompts

**Benefits:**
- Cleaner shell_scripts directory (8 scripts vs 14)
- Better organization by functionality
- No redundant/incomplete tests
- Comprehensive coverage of all 10 tools
- Up-to-date documentation matching the current codebase
- All tests default to test.duckdb for safety

### Tool Consolidation: 8 write/update tools → 4 create_or_update tools (2026-02-06)

**Consolidated 4 create_* + 4 update_* tools into 4 create_or_update_* tools**

**Deleted (9 files):**
- `tools/write_dataset.go`, `tools/write_location.go`, `tools/write_cluster.go`, `tools/write_pattern.go`
- `tools/update_dataset.go`, `tools/update_location.go`, `tools/update_cluster.go`, `tools/update_pattern.go`
- `tools/write_pattern_test.go`

**Added (4 files + 1 test):**
- `tools/dataset.go` - `create_or_update_dataset` (create when no id, update when id provided)
- `tools/location.go` - `create_or_update_location`
- `tools/cluster.go` - `create_or_update_cluster`
- `tools/pattern.go` - `create_or_update_pattern`
- `tools/pattern_test.go` - Updated tests for the consolidated pattern tool

**Modified:**
- `main.go` - 8 tool registrations → 4
- `tools/integration_test.go` - Updated to use the new ClusterInput/CreateOrUpdateCluster types
- `shell_scripts/test_tools.sh` - Updated to test 4 tools (both create and update modes)
- `shell_scripts/test_bulk_import.sh` - Updated tool names
- `shell_scripts/TESTING.md` - Updated documentation
- `CLAUDE.md` - Updated tool counts, directory structure, documentation

**Design:**
- Omit the `id` field → CREATE mode (generates a nanoid, inserts, returns the entity)
- Provide the `id` field → UPDATE mode (verifies existence, builds a dynamic UPDATE, returns the entity)
- Shared validation logic per entity (e.g., coordinate bounds, name length)
- Both modes return the full entity (update previously returned only a success boolean)

**Benefits:**
- Tool count reduced from 14 → 10 (fewer tools for the LLM to reason about)
- File count reduced from 8 → 4 (fewer files to maintain)
- ~31% less code (~320 lines removed)
- Shared validation logic eliminates duplication
- Consistent return types (both modes return the entity)

---

## Current Status

**Last Updated**: 2026-02-15 NZDT
**Status**: CLI refactoring complete; two-layer architecture ✅
**Architecture**: `tools/` = core logic (MCP-free), `cmd/mcp.go` = MCP adapters, `cmd/*.go` = CLI commands
**Current Tools**: 11 (read: 2, write: 4, import: 5); 10 via MCP, 1 CLI-only (import_unstructured)
**CLI Commands**: `skraak mcp`, `skraak sql`, `skraak dataset`, `skraak location`, `skraak cluster`, `skraak pattern`, `skraak import {bulk|file|folder|selections|unstructured}`
**Test Scripts**: 8 comprehensive shell scripts + verify_database_state.sh
**Test Coverage**: 170+ Go unit tests (91.5% coverage)
**Import Logic**: Centralized in utils/cluster_import.go (553 lines)
**Timestamp Fallback**: AudioMoth → Filename → FileModTime (reduces failures in small clusters)
**Code Quality**: ~500 lines of duplication eliminated (75% reduction in import_files.go)
**Current Data**: 1.19M files, 139 locations, 8 active datasets
**Databases**: skraak.duckdb (production ⚠️), test.duckdb (testing ✅)
**Known Issue**: test.duckdb contains 68K orphaned files from the old buggy import (historical data)