**Claude Code config:**

```bash
claude mcp add --transport stdio skraak_mcp -- /home/david/go/src/skraak/skraak mcp --db /home/david/go/src/skraak/db/skraak.duckdb
claude mcp add --transport stdio test_mcp -- /home/david/go/src/skraak/skraak mcp --db /home/david/go/src/skraak/db/test.duckdb

# Remove:
claude mcp remove skraak_mcp
```
**⚠️ EXTREMELY IMPORTANT**: When testing shell scripts or any end-to-end functionality, **ALWAYS** use the test database, **NEVER** the production database!
**Documentation:**
- CHANGELOG.md: Detailed change history with rationale
- CLAUDE.md: Essential patterns, policies, and quick reference
- This file is expensive (loaded every session), so keep it concise
## 🚨 Critical Database Safety

- `db/skraak.duckdb` = **PRODUCTION** (1.19M files, 139 locations)
- `db/test.duckdb` = **TEST** (safe for testing)
- **Always specify `../db/test.duckdb` explicitly when testing**

### ALWAYS Use Test Database for Testing

**IMPORTANT:** All shell scripts are located in the `shell_scripts/` directory to keep the project organized. Test scripts accept an optional database path argument and default to `../db/test.duckdb` for safety, but you should still pass the path explicitly and pipe output to a file.

**CORRECT approach:**

```bash
# All shell scripts are in the shell_scripts/ directory
cd shell_scripts

# ALWAYS use the test database and pipe output to a file
./test_sql.sh ../db/test.duckdb > test.txt 2>&1

# Then use targeted searches to verify results
rg -i "error" test.txt                # Check for errors
rg '"result":' test.txt | wc -l       # Count successful responses
rg '"isError":true' test.txt | wc -l  # Count validation errors (expected)
```

**WRONG - DO NOT use the production database:**

```bash
# ❌ NEVER DO THIS - RISKS CORRUPTING THE PRODUCTION DATABASE
./test_sql.sh ../db/skraak.duckdb
```

**WRONG - DO NOT dump output to the terminal:**

```bash
# ❌ DON'T DO THIS - may crash the session with massive output
cd shell_scripts && ./test_sql.sh
```

**Why this matters:**
- `db/skraak.duckdb` is the **PRODUCTION** database with real data
- `db/test.duckdb` is the **TEST** database for safe testing
- Even though the database is opened read-only, repeated connections during testing can cause lock issues
- DuckDB may create temporary files (.wal, .tmp) that can interfere with production access
- Test scripts make many rapid connections that can stress the database

**Testing rules:**
- **Always pipe to a file** (prevents token overflow from large output)
- Navigate to `shell_scripts/` before running tests
- Verify with `rg '"result":' test.txt | wc -l`
---
## Test Scripts

All scripts live in `shell_scripts/` and accept an optional database path argument.

**Core functionality:**
1. **get_time.sh** - Quick test of the get_current_time tool (no database needed)
2. **test_sql.sh [db_path]** - Tests the execute_sql tool with various queries
   - Covers simple SELECTs, parameterized queries, JOINs, aggregates, security validation
   - Always pipe to a file and use the test database!

**Write tools (create/update):**
3. **test_tools.sh [db_path]** - Comprehensive test of all 4 create_or_update tools
   - Covers create_or_update_dataset, create_or_update_location, create_or_update_cluster, create_or_update_pattern
   - Tests both create mode (no id) and update mode (with id)
   - Tests both valid inputs (should succeed) and invalid inputs (should fail)

**Import tools:**
4. **test_import_file.sh [db_path]** - Tests the import_audio_file tool (single file import)
5. **test_import_selections.sh [db_path]** - Tests the import_ml_selections tool setup
6. **test_bulk_import.sh [db_path]** - Tests the bulk_file_import tool (CSV-based)

**Resources and prompts:**
7. **test_resources_prompts.sh [db_path]** - Tests resources and prompts
8. **test_all_prompts.sh [db_path]** - Tests all 6 prompts

### Verifying Test Success

Pipe script output to a file, then check it with targeted searches: `rg '"result":' test.txt | wc -l` counts successful responses, and `rg '"isError":true' test.txt | wc -l` counts expected validation errors.

## Package Organization

- **`utils/`** - Reusable helpers (no MCP types, no `*Input`/`*Output` structs)
- **`tools/`** - MCP/CLI tools (one file per tool, defines input/output types)
- **`cmd/mcp.go`** - MCP adapters (the only file importing the MCP SDK)
- **`cmd/*.go`** - CLI commands (parse flags, call tools, print JSON)

**Simple rule:** If it's called by `cmd/`, it goes in `tools/`. If it's called by `tools/`, it goes in `utils/`.
---
## Project Overview

### Architecture

The Skraak MCP Server is a Model Context Protocol (MCP) server **and CLI tool** written in Go that provides a **generic SQL query interface** for an acoustic monitoring system. It follows a two-layer architecture:

- **Tools** (`tools/`): MCP/CLI tool implementations. Each file = one tool. Defines input/output types.
- **Utils** (`utils/`): Reusable helper functions. Called by tools, never by CLI/MCP directly.
- **MCP adapters** (`cmd/mcp.go`): Thin wrappers bridging MCP types to tool functions.
- **CLI commands** (`cmd/`): Parse flags, call tool functions, print JSON results.

MCP capabilities:
- **Tools** (model-controlled): Generic SQL query execution + time utility
- **Resources** (application-driven): Full database schema for context
- **Prompts** (user-controlled): SQL workflow templates that teach query patterns

### Philosophy: Schema + Generic SQL > Specialized Tools

**Why generic SQL:**
- LLMs can construct any query given the schema (infinite flexibility)
- No rigid tool APIs to learn (just SQL)
- Full SQL expressiveness: JOINs, aggregates, CTEs, subqueries
- Prompts teach SQL patterns instead of tool calling
**Code flow:**

```
main.go     → CLI dispatcher (mcp | import | sql | dataset | ...)
cmd/mcp.go  → MCP server + thin adapters (ONLY MCP SDK import)
cmd/*.go    → CLI commands (flags → tools → JSON output)
tools/*.go  → Core logic (plain Go structs, no MCP dependency)
utils/*.go  → Reusable helpers
db/         → Database connection + types
```
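To illustrate the layering, here is a minimal, self-contained sketch of the pattern: core logic works with plain Go structs (as in `tools/`), and a thin adapter serializes them for the transport layer (as `cmd/mcp.go` does for the MCP SDK). The names below are illustrative, not the actual skraak signatures.

```go
// Sketch of the two-layer pattern: the tool layer has no protocol
// dependency; adapters marshal plain structs for transport.
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// TimeOutput is a plain struct: the tool layer has no MCP dependency.
type TimeOutput struct {
	ISO      string `json:"iso"`
	Timezone string `json:"timezone"`
	Unix     int64  `json:"unix"`
}

// GetCurrentTime is the core tool function, callable from CLI or MCP.
func GetCurrentTime() (TimeOutput, error) {
	now := time.Now()
	zone, _ := now.Zone()
	return TimeOutput{ISO: now.Format(time.RFC3339), Timezone: zone, Unix: now.Unix()}, nil
}

// mcpAdapter stands in for the ~3-line wrappers in cmd/mcp.go:
// call the tool, then serialize the result for the protocol layer.
func mcpAdapter() (string, error) {
	out, err := GetCurrentTime()
	if err != nil {
		return "", err
	}
	b, err := json.Marshal(out)
	return string(b), err
}

func main() {
	// CLI path: print JSON directly (conceptually what `skraak time` does).
	s, err := mcpAdapter()
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```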
**Previous specialized tools were limiting:**
- Each tool = one fixed query
- Couldn't filter beyond hardcoded parameters
- Couldn't JOIN tables or use aggregates
- Created artificial boundaries
### Directory Structure

```
├── main.go                       # CLI dispatcher (mcp | import | sql | dataset | ...)
├── cmd/                          # Command entry points (only MCP importer)
│   ├── mcp.go                    # MCP server setup + adapter wrappers (ONLY file with MCP SDK)
│   ├── import.go                 # CLI: skraak import bulk ...
│   └── sql.go                    # CLI: skraak sql ...
├── db/
│   ├── db.go                     # Database connection (read-only mode)
│   ├── types.go                  # Type definitions
│   ├── schema.sql                # Database schema (348 lines)
│   ├── skraak.duckdb             # Production database ⚠️
│   └── test.duckdb               # Test database (use for testing) ✅
├── tools/                        # MCP/CLI tool implementations (11 tools, MCP-free)
│   ├── time.go                   # get_current_time
│   ├── sql.go                    # execute_sql (generic query)
│   ├── import_files.go           # import_audio_files (batch WAV import)
│   ├── import_file.go            # import_audio_file (single WAV file import)
│   ├── import_unstructured.go    # import_unstructured (unstructured dataset import)
│   ├── import_ml_selections.go   # import_ml_selections (ML detection import)
│   ├── bulk_file_import.go       # bulk_file_import (CSV-based bulk import)
│   ├── dataset.go                # create_or_update_dataset
│   ├── location.go               # create_or_update_location
│   ├── cluster.go                # create_or_update_cluster
│   └── pattern.go                # create_or_update_pattern
├── resources/
│   └── schema.go                 # Schema resources (full & per-table)
├── prompts/
│   └── examples.go               # SQL workflow templates (6 prompts)
├── utils/                        # Pure utility functions (reusable)
│   ├── cluster_import.go         # Centralized cluster import logic (553 lines)
│   ├── astronomical.go           # Solar/civil night, moon phase calculations
│   ├── astronomical_test.go      # Tests (11 test cases)
│   ├── audiomoth_parser.go       # AudioMoth WAV comment parsing
│   ├── audiomoth_parser_test.go  # Tests (36 test cases)
│   ├── filename_parser.go        # Filename timestamp parsing + timezone
│   ├── filename_parser_test.go   # Tests (60 test cases)
│   ├── selection_parser.go       # ML selection filename/folder parsing
│   ├── selection_parser_test.go  # Tests (34 test cases)
│   ├── validation.go             # Input validation helpers
│   ├── validation_test.go        # Validation tests
│   ├── wav_metadata.go           # WAV file header parsing
│   ├── wav_metadata_test.go      # Tests (22 test cases)
│   ├── xxh64.go                  # XXH64 hash computation
│   └── xxh64_test.go             # Tests (6 test cases)
└── shell_scripts/                # Shell test scripts (end-to-end MCP tests)
    ├── get_time.sh               # Time tool test (no database)
    ├── test_sql.sh               # SQL tool tests
    ├── test_tools.sh             # All write tools tests (create/update)
    ├── test_import_file.sh       # Single file import tests
    ├── test_import_selections.sh # ML selection import tests
    ├── test_bulk_import.sh       # Bulk file import tests
    ├── test_resources_prompts.sh # Resources/prompts tests
    ├── test_all_prompts.sh       # All 6 prompts tests
    └── TESTING.md                # Comprehensive testing documentation
```
## Tools

### Read Tools (2)

- `get_current_time` - Returns the current system time with timezone and Unix timestamp
- `execute_sql` - Execute arbitrary SQL SELECT queries
  - **Supports**: SELECT, WITH (CTEs), parameterized queries (? placeholders)
  - **Security**: The database is read-only (enforced by DuckDB), plus forbidden-keyword validation
  - **Limits**: Default 1000 rows (max 10000) to prevent overwhelming responses
  - **Output**: Generic JSON results with column metadata
  - **Use with**: Schema resources, to construct any query you need
### Import Tools (5)

- `import_audio_files` - Batch import WAV files from a folder into a cluster (structured datasets)
  - Automatically parses AudioMoth and filename timestamps
  - Calculates XXH64 hashes, extracts metadata
  - Computes astronomical data (solar/civil night, moon phase)
  - Skips duplicates (by hash), imports in a single transaction
- `import_audio_file` - Import a single WAV file (structured datasets)
  - **Input**: Absolute path to the WAV file, plus dataset/location/cluster IDs
  - **Processing**: Same pipeline as batch import (AudioMoth/filename timestamps, hash, metadata, astronomical data)
  - **Output**: Detailed file metadata including file_id, hash, duration, sample_rate, timestamps
  - **Duplicate detection**: Returns `is_duplicate=true` if the file hash already exists
  - **Use case**: Import individual files without scanning folders
- `bulk_file_import` - CSV-driven multi-location import (auto-creates clusters)
- `import_ml_selections` - Import ML-detected kiwi call selections
  - **Input**: Folder structure `Clips_{filter_name}_{date}/Species/CallType/*.wav+.png`
  - **Parses**: Selection filenames `{base}-{start}-{end}.wav`
  - **Validates**: Filter, species, call types, files, selection bounds
  - **Two-pass file matching**: Exact match, then date_time pattern match
  - **Inserts**: selection → label (species) → label_subtype (call type)
  - **Transaction**: All-or-nothing import with comprehensive error reporting
- `import_unstructured` - Import WAV files into an unstructured dataset (CLI only)
  - **Input**: Folder path and dataset ID (the dataset must have type 'unstructured')
  - **Processing**: Minimal metadata: hash, duration, sample_rate, file modification time as the timestamp
  - **No hierarchy**: location_id and cluster_id are NULL
  - **No astronomical data**: maybe_solar_night, maybe_civil_night, moon_phase are NULL
  - **Duplicate detection**: By hash; skips files that already exist in the database (no linking, no modification)
  - **Use case**: Import miscellaneous recordings without location/cluster structure
  - **CLI**: `skraak import unstructured --db ./db/skraak.duckdb --dataset abc123 --path /path/to/folder`

**Dataset type requirements:**
- `import_audio_files`, `import_audio_file`, `bulk_file_import`, and `import_ml_selections` require `'structured'` datasets
- `import_unstructured` requires `'unstructured'` datasets

**Example (`import_audio_file`):**

```json
{
  "name": "import_audio_file",
  "arguments": {
    "file_path": "/path/to/recording.wav",
    "dataset_id": "abc123xyz789",
    "location_id": "def456uvw012",
    "cluster_id": "ghi789rst345"
  }
}
```

**Output:**

```json
{
  "file_id": "nB3xK8pLm9qR5sT7uV2wX",
  "file_name": "recording.wav",
  "hash": "a1b2c3d4e5f6g7h8",
  "duration_seconds": 60.0,
  "sample_rate": 250000,
  "timestamp_local": "2024-01-15T20:30:00+13:00",
  "is_audiomoth": true,
  "is_duplicate": false,
  "processing_time": "250ms"
}
```

### Write Tools (4)

- `create_or_update_dataset` - Create (omit `id`) or update (provide `id`) a dataset
- `create_or_update_location` - Create or update a location with GPS coordinates and timezone
- `create_or_update_cluster` - Create or update a cluster within a location
- `create_or_update_pattern` - Create or update a cyclic recording pattern (record/sleep cycle)

**Duplicate handling (create mode):**
- Pattern: returns the existing row if record_s/sleep_s match
- Dataset: returns the existing row if the name matches
- Location: returns the existing row if the name matches within the dataset
- Cluster: returns the existing row if the name matches within the location

### Security

**The database is opened read-only** (db/db.go:27):

```go
readOnlyPath := dbPath + "?access_mode=read_only"
```

**Validation layers:**
1. Regex validation: the query must start with SELECT or WITH
2. Forbidden keywords: blocks INSERT/UPDATE/DELETE/DROP/CREATE/ALTER
3. Row limiting: prevents overwhelming responses

All write operations are blocked at both the database and validation levels, and parameterized queries prevent SQL injection. A sketch of these checks follows.
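To make the layering concrete, here is a minimal sketch of checks like these in Go. The names and exact regex patterns are illustrative, not the actual skraak implementation (which lives in `tools/sql.go`).

```go
// Hypothetical sketch of the execute_sql validation layers described above.
// Layer 3 (row limiting) applies to results and is omitted here.
package main

import (
	"fmt"
	"regexp"
)

var (
	// Layer 1: the query must start with SELECT or WITH.
	selectOnly = regexp.MustCompile(`(?i)^\s*(SELECT|WITH)\b`)
	// Layer 2: reject write/DDL keywords outright.
	forbidden = regexp.MustCompile(`(?i)\b(INSERT|UPDATE|DELETE|DROP|CREATE|ALTER)\b`)
)

func validateQuery(query string) error {
	if !selectOnly.MatchString(query) {
		return fmt.Errorf("query must start with SELECT or WITH")
	}
	if kw := forbidden.FindString(query); kw != "" {
		return fmt.Errorf("forbidden keyword: %s", kw)
	}
	return nil
}

func main() {
	fmt.Println(validateQuery("SELECT * FROM dataset WHERE active = true")) // <nil>
	fmt.Println(validateQuery("SELECT 1; DROP TABLE dataset"))              // forbidden keyword: DROP
}
```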
### Application-Level Validation

Beyond the read-only database and keyword checks, tools validate ID formats, numeric bounds, string lengths, and entity existence.

**ID format validation:**
- Short IDs (12 chars): dataset, location, cluster, pattern, species, filter
- Long IDs (21 chars): file, selection, label
- Validates alphanumeric format (nanoid compatible)

**String length validation:**
- Names: max 140 characters (location, cluster)
- Dataset names: max 255 characters
- Descriptions: max 255 characters
- Paths: max 255 characters
- Timezone IDs: max 40 characters

**Entity validation:**
- Existence checks before foreign key references
- Active status validation (cannot update inactive entities)
- Dataset type validation for imports (structured vs unstructured; see Import Tools above)
- Hierarchy consistency (the location must belong to the dataset, the cluster to the location)

### Import Pipeline

- Timestamp fallback chain: AudioMoth comment → filename parsing → file modification time (sketched below)
- XXH64 hashing, WAV metadata extraction, astronomical calculations (solar/civil night, moon phase)
- Centralized in `utils/cluster_import.go`
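The fallback chain can be sketched as follows. This is a minimal illustration with hypothetical names, not the actual `batchProcessFiles()` implementation in `utils/cluster_import.go`.

```go
// Hypothetical sketch of the timestamp fallback chain described above.
package main

import (
	"errors"
	"fmt"
	"time"
)

// fileInfo carries the candidate timestamps extracted earlier in the
// pipeline; any field may be zero if that source was unavailable.
type fileInfo struct {
	AudioMothTime time.Time // parsed from the AudioMoth WAV comment
	FilenameTime  time.Time // parsed from the filename
	FileModTime   time.Time // filesystem mtime (assumed location-local time)
}

// resolveTimestamp applies the fallback chain:
// AudioMoth comment → filename → file modification time → fail.
func resolveTimestamp(f fileInfo) (time.Time, error) {
	switch {
	case !f.AudioMothTime.IsZero():
		return f.AudioMothTime, nil
	case !f.FilenameTime.IsZero():
		return f.FilenameTime, nil
	case !f.FileModTime.IsZero():
		return f.FileModTime, nil // silent fallback, per the 2026-02-07 change
	default:
		return time.Time{}, errors.New("no usable timestamp; skipping file")
	}
}

func main() {
	ts, err := resolveTimestamp(fileInfo{FileModTime: time.Now()})
	fmt.Println(ts, err)
}
```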
## Resources & Prompts

### Resources

- `schema://full` - Complete 348-line database schema (SQL)
- Per-table schema resources (see `resources/schema.go`)

**Valid table names**: dataset, location, cluster, file, selection, label, species, species_group, genus, family_group, order_group, family, order, class, phylum, kingdom, kiwi_call, call, syllable, and more (see schema.sql)

### Prompts

Six SQL workflow templates, each teaching SQL patterns with complete examples:

1. `query_active_datasets` - Dataset querying with SELECT and GROUP BY
2. `explore_database_schema` - Interactive schema exploration (resource-based)
3. `explore_location_hierarchy` - Hierarchy navigation with JOINs
4. `query_location_data` - Location analysis with filtering and aggregates
5. `analyze_cluster_files` - File analysis with aggregate functions
6. `system_status_check` - Comprehensive health check workflow
## SQL Examples
### Basic Queries

**Get locations for a dataset (parameterized):**

```json
{
  "query": "SELECT id, name, latitude, longitude FROM location WHERE dataset_id = ? AND active = true",
  "parameters": ["vgIr9JSH_lFj"]
}
```

```sql
-- Parameterized (use execute_sql with a parameters array)
SELECT * FROM location WHERE dataset_id = ? AND active = true;
```

### Aggregates

**Cluster file statistics:**

```sql
SELECT
  COUNT(*) as total_files,
  SUM(duration) as total_duration,
  AVG(duration) as avg_duration,
  MIN(timestamp_local) as first_recording,
  MAX(timestamp_local) as last_recording,
  SUM(CASE WHEN maybe_solar_night THEN 1 ELSE 0 END) as night_files
FROM file
WHERE cluster_id = ? AND active = true;
```

### JOINs and Complex Analysis

**Geographic distribution:**

```sql
SELECT
  d.name as dataset,
  COUNT(DISTINCT l.id) as locations,
  AVG(l.latitude) as avg_latitude,
  AVG(l.longitude) as avg_longitude
FROM dataset d
LEFT JOIN location l ON d.id = l.dataset_id
WHERE d.active = true
GROUP BY d.name
ORDER BY d.name;
```
## Database Information

### Database Path

Default: `./db/skraak.duckdb`
**Best practices:** Always `WHERE active = true`, use parameterized queries for IDs, use `LEFT JOIN` to include parent records, use `COUNT(DISTINCT)` when joining.
### Key Tables

- **dataset** - Project datasets (organise/test/train types)
- **location** - Recording locations with GPS coordinates (139 active locations)
- **cluster** - Grouped recordings at a location
- **file** - Individual audio files with metadata
- **label** - Annotations and classifications
- **species** - Taxonomy information
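For programmatic access outside the CLI, a read-only connection looks roughly like this. This is a sketch assuming the commonly used `github.com/marcboeker/go-duckdb` driver; the project's actual connection code lives in `db/db.go`.

```go
// Sketch: open DuckDB read-only and run a parameterized query.
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/marcboeker/go-duckdb"
)

func main() {
	// Append access_mode=read_only so DuckDB rejects all writes,
	// mirroring the readOnlyPath pattern shown under Security.
	db, err := sql.Open("duckdb", "./db/test.duckdb?access_mode=read_only")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	var n int
	// Parameterized query, as recommended for IDs and user input.
	if err := db.QueryRow("SELECT COUNT(*) FROM file WHERE active = ?", true).Scan(&n); err != nil {
		log.Fatal(err)
	}
	fmt.Println("active files:", n)
}
```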
---
## CLI Usage

```bash
# Ad-hoc SQL queries
./skraak sql --db ./db/test.duckdb "SELECT COUNT(*) FROM file WHERE active = true"
./skraak sql --db ./db/test.duckdb --limit 10 "SELECT * FROM dataset WHERE active = true"

# Dataset management
./skraak dataset create --db ./db/test.duckdb --name "My Dataset" --type unstructured
./skraak dataset update --db ./db/test.duckdb --id abc123 --name "Updated Name"

# Location management
./skraak location create --db ./db/test.duckdb --dataset abc123 --name "Site A" --lat -36.85 --lon 174.76 --timezone Pacific/Auckland
./skraak location update --db ./db/test.duckdb --id loc123 --name "Updated Name"

# Cluster management
./skraak cluster create --db ./db/test.duckdb --dataset abc123 --location loc456 --name "2024-01" --sample-rate 250000
./skraak cluster update --db ./db/test.duckdb --id cluster123 --name "Updated Name"

# Recording pattern management
./skraak pattern create --db ./db/test.duckdb --record 60 --sleep 1740
./skraak pattern update --db ./db/test.duckdb --id pattern123 --record 30

# Import commands
./skraak import bulk --db ./db/test.duckdb --dataset abc123 --csv import.csv --log progress.log
./skraak import file --db ./db/test.duckdb --dataset abc123 --location loc456 --cluster clust789 --path /path/to/file.wav
./skraak import folder --db ./db/test.duckdb --dataset abc123 --location loc456 --cluster clust789 --path /path/to/folder
./skraak import selections --db ./db/test.duckdb --dataset abc123 --cluster clust789 --path /path/to/Clips_filter_date
```

Examples use `test.duckdb`; substitute `./db/skraak.duckdb` only for real production work.
```bash
# Utility commands
./skraak xxhash --file recording.wav    # Compute XXH64 hash (same format as the DB)
./skraak metadata --file recording.wav  # Extract WAV metadata
./skraak time                           # Current time as JSON

# JSON output allows piping to jq
./skraak xxhash --file recording.wav | jq '.hash'
./skraak metadata --file recording.wav | jq '.duration_seconds'
./skraak time | jq '.unix'
./skraak dataset create --db ./db/test.duckdb --name "Test" --type unstructured | jq '.dataset.id'

# JSON output allows chaining commands
HASH=$(./skraak xxhash --file recording.wav | jq -r '.hash')
DURATION=$(./skraak metadata --file recording.wav | jq -r '.duration_seconds')
DATASET_ID=$(./skraak dataset create --db ./db/test.duckdb --name "New Dataset" --type unstructured | jq -r '.dataset.id')
```
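Under the hood, `skraak xxhash` computes an XXH64 digest. Here is a minimal sketch of equivalent hashing, assuming the `github.com/cespare/xxhash/v2` library; the actual implementation is in `utils/xxh64.go` and may differ.

```go
// Sketch: stream a file through an XXH64 digest and print the
// 16-character lowercase hex form used in the database.
package main

import (
	"fmt"
	"io"
	"os"

	"github.com/cespare/xxhash/v2"
)

func hashFile(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()

	d := xxhash.New()
	if _, err := io.Copy(d, f); err != nil {
		return "", err
	}
	return fmt.Sprintf("%016x", d.Sum64()), nil
}

func main() {
	h, err := hashFile("recording.wav")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(h)
}
```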
## Testing
```bash
# From shell_scripts/

# SQL tool tests (ALWAYS use test.duckdb and pipe to a file!)
./test_sql.sh ../db/test.duckdb > test.txt 2>&1
rg '"result":' test.txt | wc -l      # Should show 8 responses (6 successful + 2 validations)
rg '"isError":true' test.txt | wc -l # Should show 2 (security tests)

# Write tools
./test_tools.sh ../db/test.duckdb > test.txt 2>&1

# Import tools
./test_import_file.sh ../db/test.duckdb > test.txt 2>&1
./test_bulk_import.sh ../db/test.duckdb > test.txt 2>&1

# Resources and prompts
./test_resources_prompts.sh ../db/test.duckdb > test_resources.txt 2>&1
cat test_resources.txt | jq '.'

# All prompts
./test_all_prompts.sh ../db/test.duckdb > test_prompts.txt 2>&1
rg '"result":' test_prompts.txt | wc -l
```
### Tool Call Examples

**Basic query:**

```json
{
  "name": "execute_sql",
  "arguments": {
    "query": "SELECT * FROM dataset WHERE active = true"
  }
}
```

**Parameterized query (recommended for user input):**

```json
{
  "name": "execute_sql",
  "arguments": {
    "query": "SELECT * FROM location WHERE dataset_id = ?",
    "parameters": ["vgIr9JSH_lFj"]
  }
}
```

**With a custom row limit:**

```json
{
  "name": "execute_sql",
  "arguments": {
    "query": "SELECT * FROM file WHERE active = true",
    "limit": 100
  }
}
```

### SQL Best Practices

1. **Always use WHERE active = true** for the main tables (dataset, location, cluster, file)
2. **Use parameterized queries** (? placeholders) when filtering by IDs
3. **Use LEFT JOIN** to include parent records even if children don't exist
4. **Use COUNT(DISTINCT)** when joining to avoid double-counting
5. **Use LIMIT** to restrict large result sets
6. **Use DATE_TRUNC** to group temporal data
7. **Use CASE WHEN** for conditional aggregates (e.g., count night vs day files)

## Common Issues

- **Query results too large:** Use a LIMIT clause (default 1000, max 10000)
- **Server exits immediately:** Normal behavior; the server runs in stdio mode, waiting for JSON-RPC input
- **No response from a tool call:** You must initialize the connection with the `initialize` method before calling tools (see the example below)
- **Database connection failed:** Check that the database path exists and is readable
- **SQL syntax error:** Check the query syntax and table/column names (use the schema resources)
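For the initialization issue above: a stdio session must begin with an `initialize` exchange before any tool call. A sketch of the first messages (field details depend on the MCP protocol revision in use; this is illustrative, not copied from the test scripts):

```json
{"jsonrpc": "2.0", "id": 1, "method": "initialize", "params": {"protocolVersion": "2024-11-05", "capabilities": {}, "clientInfo": {"name": "test-client", "version": "0.1.0"}}}
{"jsonrpc": "2.0", "method": "notifications/initialized"}
{"jsonrpc": "2.0", "id": 2, "method": "tools/call", "params": {"name": "get_current_time", "arguments": {}}}
```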
---

## Go Unit Testing

### Test Coverage

The project includes comprehensive unit tests for all utility packages: 170+ tests with **91.5% code coverage**.

**Test files:**
- `utils/astronomical_test.go` - Astronomical calculations (solar/civil night, moon phase)
- `utils/audiomoth_parser_test.go` - AudioMoth WAV comment parsing
- `utils/filename_parser_test.go` - Filename timestamp parsing with timezone handling
- `utils/selection_parser_test.go` - ML selection filename/folder parsing
- `utils/wav_metadata_test.go` - WAV file metadata extraction
- `utils/xxh64_test.go` - XXH64 hash computation

**Coverage areas:**
- Date format detection (YYYYMMDD, YYMMDD, DDMMYY) with variance-based disambiguation
- Timezone offset calculation with the fixed-offset strategy; DST transition handling
- UTC conversion correctness
- AudioMoth metadata parsing (all gain levels, temperature, battery)
- WAV header parsing (duration, sample rate, channels, INFO chunks)
- XXH64 hash validation
- ML selection filename parsing (base-start-end format) and folder name parsing (Clips_filter_date format)
- WAV/PNG pair validation and date/time pattern extraction for fuzzy file matching
- Edge cases (invalid dates, leap years, case sensitivity)

### Running Go Tests

```bash
go test ./...           # All tests
go test ./utils/        # Specific package
go test -v ./utils/     # Verbose output
go test -cover ./utils/ # Coverage report

# Generate an HTML coverage profile
go test -coverprofile=coverage.out ./utils/
go tool cover -html=coverage.out
```

### Test Organization

Tests follow Go conventions:
- Test files are named `*_test.go`; test functions are named `Test*`
- Table-driven tests where appropriate
- Edge cases and error conditions included
- Match the TypeScript test suite from the original project

**Key differences from the TypeScript tests:**
- Go separates filename parsing from timezone application (better design)
- Go validates dates strictly (TypeScript's Date constructor auto-corrects)
- Console logging tests are omitted (not applicable to MCP servers)
- All essential functionality is covered with equivalent or better tests
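As an illustration of the table-driven style, here is a minimal example; the function and cases are hypothetical, not taken from the skraak test suite.

```go
// Sketch of a table-driven test in the style used by utils/*_test.go.
package utils

import "testing"

// isLeapYear is a stand-in function under test.
func isLeapYear(y int) bool {
	return y%4 == 0 && (y%100 != 0 || y%400 == 0)
}

func TestIsLeapYear(t *testing.T) {
	cases := []struct {
		name string
		year int
		want bool
	}{
		{"typical leap", 2024, true},
		{"century non-leap", 1900, false},
		{"400-divisible leap", 2000, true},
		{"non-leap", 2023, false},
	}
	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			if got := isLeapYear(tc.year); got != tc.want {
				t.Errorf("isLeapYear(%d) = %v, want %v", tc.year, got, tc.want)
			}
		})
	}
}
```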
Remember to restart Claude Desktop after configuration changes.

## Recent Changes

### CLI Refactoring: Two-Layer Architecture (2026-02-11)

**Major refactoring: Separated core logic from MCP types, added CLI commands**

**Problem:** All tool functions were tightly coupled to MCP SDK types (`*mcp.CallToolRequest`, `*mcp.CallToolResult`). This meant functionality could only be invoked via the MCP protocol, with no CLI access for power users.
**Created:**
- `cmd/mcp.go` - MCP server setup + 10 thin adapter wrappers (~3 lines each)
- `cmd/import.go` - `skraak import bulk` CLI command with flag parsing
- `cmd/sql.go` - `skraak sql` CLI command for ad-hoc queries

**Modified (mechanical, all tools/):**
- Removed the `*mcp.CallToolRequest` parameter (never used; `req` was always ignored)
- Removed `*mcp.CallToolResult` from returns (always an empty `&mcp.CallToolResult{}`)
- Removed `import "github.com/modelcontextprotocol/go-sdk/mcp"` from all tool files
- Updated test files (`integration_test.go`, `pattern_test.go`) to match the new signatures
- Updated `main.go` to a pure dispatcher: `mcp | import | sql`

**Architecture:**

```
main.go       → pure dispatcher
cmd/mcp.go    → MCP server + adapter wrappers (ONLY file importing the MCP SDK)
cmd/import.go → CLI: skraak import bulk --db ... --dataset ... --csv ... --log ...
cmd/sql.go    → CLI: skraak sql --db ... "SELECT ..."
tools/*.go    → core logic, NO mcp dependency (plain Go structs in/out)
utils/, db/   → unchanged
```

**CLI usage:**

```bash
# MCP server (unchanged)
skraak mcp --db ./db/skraak.duckdb

# Power-user CLI commands (new)
skraak sql --db ./db/skraak.duckdb "SELECT COUNT(*) FROM file WHERE active = true"
skraak sql --db ./db/skraak.duckdb --limit 10 "SELECT * FROM dataset"
skraak import bulk --db ./db/skraak.duckdb --dataset abc123 --csv import.csv --log progress.log
```

**Benefits:**
- ✅ **CLI access:** Power users can run imports and queries without MCP
- ✅ **Token savings:** CLI commands avoid MCP protocol overhead
- ✅ **Code sharing:** CLI and MCP call identical core functions
- ✅ **MCP SDK contained:** Only `cmd/mcp.go` imports the MCP SDK
- ✅ **Extensible:** New CLI commands just need a file in `cmd/` calling `tools/`
- ✅ **No logic changes:** All core tool logic unchanged, just signature cleanup
- ✅ **All tests pass:** `go test ./...`, all 8 shell test scripts verified
### Bulk File Import Cluster Assignment Bug Fix (2026-02-10)

**Critical bug fix: Files are now correctly distributed across multiple clusters for the same location**

**Problem:** When the same location appeared multiple times in the CSV with different date ranges, all files ended up in the **last cluster created** instead of being distributed across their respective clusters.

**Root cause:** The `clusterIDMap` used only `LocationID` as the key, so each new cluster for the same location overwrote the previous one in the map.

**Example of the bug:**

```csv
A12,loc123,/path/2019,2019,8000,864
A12,loc123,/path/2020,2020,8000,180
A12,loc123,/path/2022,2022,8000,180
A12,loc123,/path/2024,2024,8000,549
```

- **Before fix:** 4 clusters created, ALL 1773 files go into the 2024 cluster
- **After fix:** 4 clusters created, files distributed correctly (864 in 2019, 180 in 2020, etc.)

**Solution:** Changed the map key from `LocationID` to the composite key `LocationID|DateRange`.

**Modified:** `tools/bulk_file_import.go` (lines 125, 171-172, 183-184)
- Line 125: Updated the map comment to reflect the composite key
- Lines 171-172: Store with `compositeKey := loc.LocationID + "|" + loc.DateRange`
- Lines 183-184: Retrieve with the same composite key

**Impact:**
- ✅ **Data integrity restored:** Files now go to the correct clusters
- ✅ **Multiple date ranges per location:** Now works correctly
- ✅ **No breaking changes:** Simple 3-line fix
- ✅ **Backwards compatible:** Single-location CSV rows work identically

**Verification:**

```sql
SELECT
  l.name as location_name,
  c.name as cluster_name,
  COUNT(f.id) as file_count
FROM cluster c
JOIN location l ON c.location_id = l.id
LEFT JOIN file f ON f.cluster_id = c.id
WHERE l.name = 'A12'
GROUP BY c.id, l.name, c.name
ORDER BY c.name;
```

**Note:** Data previously imported with the buggy code must be re-imported to fix cluster assignments.

---

### File Modification Time Fallback (2026-02-07)

**Enhancement: Added file modification time as a third timestamp fallback**

**Problem:** Small clusters (1-2 files) failed variance-based filename disambiguation because the algorithm needs multiple samples to determine the date format (YYYYMMDD vs YYMMDD vs DDMMYY).

**Solution:** Added file modification time as the third fallback in the timestamp resolution chain.

**Timestamp resolution order:**

```
1. AudioMoth comment → timestamp
2. Filename parsing → timestamp
3. File modification time → timestamp (NEW!)
4. FAIL (skip file with error)
```

**Modified:** `utils/cluster_import.go` - Added the FileModTime fallback in `batchProcessFiles()`
- Silent fallback (no warning logged)
- Assumes FileModTime is in the location timezone
- Reduces import failures in small clusters

**Benefits:**
- ✅ **Fewer failures:** Small clusters (1-2 files) no longer fail when filename parsing can't disambiguate
- ✅ **No performance impact:** FileModTime is already extracted in `ParseWAVHeader()`
- ✅ **Backwards compatible:** Only helps files that would have failed
- ✅ **Simple:** 10 lines of code, defensive checks, no complexity

**Use case:** A user has 1-2 files with unparseable filenames (e.g., `recording001.wav`). Previously the import failed; now it uses FileModTime.

**See also:** `TIMESTAMP_FALLBACK_PLAN.md` - Complete implementation plan

---

### Cluster Import Logic Extraction (2026-02-07)

**Major refactoring: Extracted shared cluster import logic into a utils module**

**Key insight:** A cluster is the atomic unit of import (one SD card / one recording session / one folder).

**Created:** `utils/cluster_import.go` (553 lines) - Single source of truth for cluster imports
- `ImportCluster()` - Main entry point used by both import_files.go and bulk_file_import.go
- `scanClusterFiles()` - Recursive WAV file scanning
- `batchProcessFiles()` - Batch processing with variance-based filename timestamp parsing
- `insertClusterFiles()` - Transactional database insertion
- Moved the `FileImportError` type from tools/ to utils/

**Modified:**
- `tools/import_files.go` - **75% code reduction** (650 lines → 161 lines)
  - Now just calls `utils.ImportCluster()` for all the heavy lifting
  - Removed ~500 lines of duplicated logic
- `tools/bulk_file_import.go` - **Bug fixes + simplification**
  - **🐛 CRITICAL BUG FIXED:** Now inserts into the `file_dataset` table (was missing!)
  - **🐛 CRITICAL BUG FIXED:** Now inserts into the `moth_metadata` table (was missing!)
  - Now uses the shared `utils.ImportCluster()` logic
  - Files are no longer orphaned from datasets
- `tools/import_file.go` - Added helper wrappers for compatibility

**Benefits:**
- ✅ **Bug fixed:** 68,043 orphaned files were found in the test database (confirming the bug was real)
- ✅ **Single source of truth:** All cluster import logic in one place
- ✅ **Code reduction:** ~500 lines of duplicated code eliminated
- ✅ **Consistency:** Single-cluster and multi-cluster imports use identical logic
- ✅ **Maintainability:** Changes to import logic are made in one place
- ✅ **Performance:** No regression, same batch processing as before

**Architecture:**

```
Before:
tools/import_files.go     (650 lines) - Custom logic
tools/bulk_file_import.go (460 lines) - Different logic (BUGGY)

After:
utils/cluster_import.go   (553 lines) - Shared logic
tools/import_files.go     (161 lines) - Calls utils.ImportCluster()
tools/bulk_file_import.go (393 lines) - Calls utils.ImportCluster()
```

**See also:** `plan.md` (refactoring plan with implementation checklist), `REFACTORING_SUMMARY.md` (detailed summary of changes), `VERIFICATION_RESULTS.md` (test results and database analysis)

---

### Generic SQL Tool + Codebase Rationalization (2026-01-26)

**Major architectural change: Replaced 6 specialized tools with the generic SQL approach**

**Deleted:**
- `tools/dataset.go` - query_datasets tool
- `tools/location.go` - query_locations, query_locations_by_dataset tools
- `tools/cluster.go` - query_clusters, query_clusters_by_location tools
- `tools/file.go` - query_files_by_cluster tool
- `shell_scripts/test_new_tools.sh` - Obsolete test script
- `shell_scripts/test_mcp.sh` - Obsolete test script

**Added:**
- `tools/sql.go` - Generic execute_sql tool (~200 lines)
- `shell_scripts/test_sql.sh` - Comprehensive SQL test suite

**Modified:**
- `main.go` - Removed 6 tool registrations, kept only get_current_time and execute_sql
- `prompts/examples.go` - Completely rewritten to teach SQL patterns instead of tool calls
- All 6 prompts now include SQL examples with SELECT, JOIN, GROUP BY, aggregates

**Benefits:**
- Full SQL expressiveness (JOINs, aggregates, CTEs, subqueries), previously impossible
- Infinite query possibilities vs 6 fixed queries
- More aligned with the MCP philosophy (context over APIs)
- LLMs can answer any question given the schema
- Smaller codebase (2 tools instead of 8)
- More maintainable (no new tool for each query pattern)

**Security:**
- Database already read-only (verified in db/db.go)
- Validation layers block write operations
- Parameterized queries prevent SQL injection
- Row limits prevent overwhelming responses

**Migration notes:**
- Old tool calls must be replaced with SQL queries
- All old functionality remains available via SQL
- Prompts provide SQL examples for common patterns
- Schema resources provide full context for query construction

### Shell Scripts Organization (2026-01-26)

- Reorganized all shell scripts into the `shell_scripts/` directory
- Keeps the project root clean and organized
- All scripts updated with correct relative paths

### Comprehensive Go Unit Testing (2026-01-28)

**Added a comprehensive unit test suite for utility packages**

**Added:**
- `utils/astronomical_test.go` - 11 test cases for astronomical calculations
- `utils/audiomoth_parser_test.go` - 36 test cases for AudioMoth parsing
- `utils/filename_parser_test.go` - 60 test cases for filename/timezone parsing
- `utils/wav_metadata_test.go` - 22 test cases for WAV metadata extraction
- `utils/xxh64_test.go` - 6 test cases for hash computation

**Test coverage:**
- **Total: 136 tests** at the time, covering **91.5%** of statements
- All tests ported from the TypeScript test suite
- Additional Go-specific tests for date validation

**Key test areas:**
- Filename parsing: YYYYMMDD, YYMMDD, DDMMYY formats with variance-based disambiguation
- Timezone handling: fixed-offset strategy, DST transitions (Auckland, US timezones)
- UTC conversion: mathematical correctness validation
- AudioMoth: comment parsing, all gain levels, timezone formats
- WAV metadata: duration, sample rate, INFO chunks
- Astronomical: solar/civil night, moon phase calculations
- ML selection parsing: filename format, folder structure, WAV/PNG pairing
- Edge cases: invalid dates, leap years, case sensitivity
### ML Selection Import Tool

**New feature: Import ML-detected kiwi call selections from a folder structure**

**Added:**
- `utils/selection_parser.go` - Selection filename and folder parsing utilities
- `utils/selection_parser_test.go` - Comprehensive tests (34 test cases)
- `tools/import_ml_selections.go` - MCP tool implementation (~1050 lines)
- `shell_scripts/test_import_selections.sh` - Integration test script

**Features:**
- **Folder structure support**: `Clips_{filter_name}_{date}/Species/CallType/*.wav+.png`
- **Filename parsing**: Extracts the `{base}-{start}-{end}.wav` format
- **Two-pass file matching**:
  1. Exact match by filename
  2. Fuzzy match by date_time pattern (handles prefix variations)
- **Comprehensive validation**:
  - Filter exists in the database
  - Species linked to the dataset
  - Call types exist for the species
  - Files exist in the cluster
  - Selection bounds within file duration
- **Transactional import**: All-or-nothing with error collection
- **Database relations**: selection → label (species) → label_subtype (call type)

**Usage example:**

```bash
# Folder structure:
# Clips_opensoundscape-kiwi-1.0_2025-11-14/
# └── Brown Kiwi/
#     ├── Male - Solo/
#     │   ├── A05-20250517_214501-102-133.wav
#     │   └── A05-20250517_214501-102-133.png
#     └── Female - Solo/
#         └── ...
```

```json
{
  "name": "import_ml_selections",
  "arguments": {
    "folder_path": "/path/to/Clips_opensoundscape-kiwi-1.0_2025-11-14",
    "dataset_id": "abc123xyz789",
    "cluster_id": "def456uvw012"
  }
}
```

**Validation features:**
- Batch validation of all entities before any database writes
- Comprehensive error reporting (filename, species, call type, stage)
- Fuzzy file matching handles filename prefix variations
- Strict selection bounds checking (end_time ≤ file.duration)
- Ambiguous match detection (multiple files with the same date_time pattern)

**Test coverage:**
- 34 unit tests for selection parsing utilities
- Various filename formats (with/without dashes, decimal times)
- Folder name parsing (filter + date extraction)
- WAV/PNG pair validation
- date_time pattern extraction (8-digit and 6-digit formats)

**Tool count**: 7 total tools at the time (read: 2, write: 4, import: 2); later updates raised this to 10.

### Single File Import Tool (2026-02-02)

**New feature: Import individual WAV files with the `import_audio_file` tool**

**Added:**
- `tools/import_file.go` - Single file import implementation (~300 lines)
- `shell_scripts/test_import_file.sh` - Integration test script

**Features:**
- **Single file import**: Import one WAV file at a time with detailed feedback
- **Same processing pipeline**: Reuses all utilities from batch import (AudioMoth parsing, timestamp extraction, hash computation, astronomical calculations)
- **Shared helper functions**: Reuses `validateImportInput()`, `getLocationData()`, `ensureClusterPath()` from import_files.go
- **Detailed output**: Returns file_id, hash, duration, sample_rate, timestamps, processing time
- **Duplicate detection**: Checks the hash before insertion; returns `is_duplicate=true` if it exists
- **Fail-fast errors**: Single file import is atomic; it succeeds completely or fails with a clear error message

**Input:**

```json
{
  "file_path": "/absolute/path/to/file.wav",
  "dataset_id": "12-char-id",
  "location_id": "12-char-id",
  "cluster_id": "12-char-id"
}
```

**Output:**

```json
{
  "file_id": "21-char-nanoid",
  "file_name": "filename.wav",
  "hash": "16-char-xxh64-hex",
  "duration_seconds": 60.0,
  "sample_rate": 250000,
  "timestamp_local": "2024-01-15T20:30:00+13:00",
  "is_audiomoth": true,
  "is_duplicate": false,
  "processing_time": "250ms"
}
```

**Use cases:**
- Import files one at a time with detailed feedback per file
- Programmatic import where you already know the file path
- Import files from different locations without folder scanning
- Get immediate feedback on duplicate detection
- Alternative to batch import for small numbers of files

**Tool count**: 8 total tools at the time (read: 2, write: 4, import: 3); later updates raised this to 10.

### Bulk File Import Tool (2026-02-06)

**New feature: CSV-based bulk import across multiple locations and clusters**

**Added:**
- `tools/bulk_file_import.go` - CSV-based bulk import implementation (~500 lines)

**Features:**
- **CSV-driven import**: A single CSV file specifies multiple locations, directories, and clusters
- **Auto-cluster creation**: Automatically creates clusters if they don't exist for a location/date_range
- **Progress logging**: Real-time progress logging to a file (monitor with `tail -f log_file.log`)
- **Synchronous/fail-fast**: Processes sequentially with immediate error reporting
- **Summary statistics**: Returns detailed counts for locations, clusters, files, duplicates, errors
- **Shared utilities**: Reuses all WAV processing utilities (AudioMoth, timestamps, hash, astronomical)

**CSV format:**

```csv
location_name,location_id,directory_path,date_range,sample_rate,file_count
Site A,loc123456789,/path/to/recordings,2024-01,48000,150
Site B,loc987654321,/path/to/recordings,2024-02,250000,200
```

**Input:**

```json
{
  "dataset_id": "12-char-id",
  "csv_path": "/absolute/path/to/import.csv",
  "log_file_path": "/absolute/path/to/progress.log"
}
```

**Output:**

```json
{
  "total_locations": 10,
  "clusters_created": 5,
  "clusters_existing": 5,
  "total_files_scanned": 1500,
  "files_imported": 1200,
  "files_duplicate": 250,
  "files_error": 50,
  "processing_time": "5m30s",
  "errors": []
}
```

**Use cases:**
- Bulk import across many locations in one operation
- Automated import pipelines with CSV generation
- Large-scale data migration from existing systems
- Batch processing with progress monitoring via a log file

**Comparison with other import tools:**
- `import_audio_files`: Single folder → single cluster
- `import_audio_file`: Single file → single cluster
- `import_ml_selections`: ML detection folder structure → selections
- `bulk_file_import`: CSV with multiple folders → multiple clusters (auto-creates)

**Tool count**: Now 10 total tools (read: 2, write: 4, import: 4)

### Test Script Consolidation (2026-02-06)

**Rationalized and consolidated shell test scripts for better organization**

**Removed redundant scripts:**
- `test_import_simple.sh` - Only tested registration (redundant)
- `test_import_tool.sh` - Incomplete, just schema validation
- `test_write_simple.sh` - Incomplete happy-path test
- `test_write_tools.sh` - Replaced by the comprehensive test_tools.sh
- `test_write_e2e.sh` - Required manual ID replacement (not automated)
- `test_update_tools.sh` - Replaced by test_tools.sh

**Added comprehensive test scripts:**
- `test_tools.sh` - All 4 create_or_update tools (create + update modes) with validation
- `test_bulk_import.sh` - Tests the bulk_file_import tool with CSV parsing

**Updated documentation:**
- `shell_scripts/TESTING.md` - Complete rewrite with the current tool set
- Removed references to deleted tools (query_datasets, etc.)
- Added examples for all current tools
- Added SQL query examples (JOINs, aggregates, temporal analysis)
- Added a troubleshooting section and best practices

**Current test suite (8 scripts):**
1. `get_time.sh` - Time tool (no database)
2. `test_sql.sh` - SQL query tool (comprehensive)
3. `test_tools.sh` - All create_or_update tools (4 tools, create + update modes)
4. `test_import_file.sh` - Single file import
5. `test_import_selections.sh` - ML selection import
6. `test_bulk_import.sh` - Bulk CSV import
7. `test_resources_prompts.sh` - Resources/prompts
8. `test_all_prompts.sh` - All 6 prompts

**Benefits:**
- Cleaner shell_scripts directory (8 scripts vs 14)
- Better organization by functionality
- No redundant/incomplete tests
- Comprehensive coverage of all 10 tools
- Up-to-date documentation matching the current codebase
- All tests default to test.duckdb for safety

### Tool Consolidation: 8 write/update tools → 4 create_or_update tools (2026-02-06)

**Consolidated 4 create_* + 4 update_* tools into 4 create_or_update_* tools**

**Deleted (9 files):**
- `tools/write_dataset.go`, `tools/write_location.go`, `tools/write_cluster.go`, `tools/write_pattern.go`
- `tools/update_dataset.go`, `tools/update_location.go`, `tools/update_cluster.go`, `tools/update_pattern.go`
- `tools/write_pattern_test.go`

**Added (4 files + 1 test):**
- `tools/dataset.go` - `create_or_update_dataset` (create when no id, update when id provided)
- `tools/location.go` - `create_or_update_location`
- `tools/cluster.go` - `create_or_update_cluster`
- `tools/pattern.go` - `create_or_update_pattern`
- `tools/pattern_test.go` - Updated tests for the consolidated pattern tool

**Modified:**
- `main.go` - 8 tool registrations → 4
- `tools/integration_test.go` - Updated to use the new ClusterInput/CreateOrUpdateCluster types
- `shell_scripts/test_tools.sh` - Updated to test 4 tools (both create and update modes)
- `shell_scripts/test_bulk_import.sh` - Updated tool names
- `shell_scripts/TESTING.md` - Updated documentation
- `CLAUDE.md` - Updated tool counts, directory structure, documentation

**Design:**
- Omit the `id` field → CREATE mode (generates a nanoid, inserts, returns the entity)
- Provide the `id` field → UPDATE mode (verifies existence, builds a dynamic UPDATE, returns the entity)
- Shared validation logic per entity (e.g., coordinate bounds, name length)
- Both modes return the full entity (update previously returned only a success boolean)

**Benefits:**
- Tool count reduced from 14 → 10 (fewer tools for the LLM to reason about)
- File count reduced from 8 → 4 (fewer files to maintain)
- ~31% less code (~320 lines removed)
- Shared validation logic eliminates duplication
- Consistent return types (both modes return the entity)

---

## Current Status

**Last Updated**: 2026-02-15 NZDT
**Status**: CLI refactoring complete; two-layer architecture ✅
**Architecture**: `tools/` = core logic (MCP-free), `cmd/mcp.go` = MCP adapters, `cmd/*.go` = CLI commands
**Current Tools**: 11 (read: 2, write: 4, import: 5); 10 via MCP, 1 CLI-only (import_unstructured)
**CLI Commands**: `skraak mcp`, `skraak sql`, `skraak dataset`, `skraak location`, `skraak cluster`, `skraak pattern`, `skraak import {bulk|file|folder|selections|unstructured}`
**Test Scripts**: 8 comprehensive shell scripts + verify_database_state.sh
**Test Coverage**: 170+ Go unit tests (91.5% coverage)
**Import Logic**: Centralized in utils/cluster_import.go (553 lines)
**Timestamp Fallback**: AudioMoth → Filename → FileModTime (reduces failures in small clusters)
**Code Quality**: ~500 lines of duplication eliminated (75% reduction in import_files.go)
**Current Data**: 1.19M files, 139 locations, 8 active datasets
**Databases**: skraak.duckdb (production ⚠️), test.duckdb (testing ✅)
**Known Issue**: test.duckdb contains 68K orphaned files from the old buggy import (historical data)