# Claude's Notes - Skraak MCP Server

This file contains important reminders and best practices for working with the Skraak MCP Server codebase.

## ⚠️ CRITICAL TESTING REMINDER

### Running Test Scripts with Large Output

**NEVER** run test scripts directly without piping to a file. Large outputs can cause token overflow.

**CORRECT APPROACH:**
```bash
# All shell scripts are in shell_scripts/ directory
cd shell_scripts

# Pipe output to file first
./test_sql.sh > test.txt 2>&1

# Then use targeted searches to verify results
rg -i "error" test.txt                    # Check for errors
rg '"result":' test.txt | wc -l           # Count successful responses
rg '"isError":true' test.txt | wc -l      # Count validation errors (expected)
```

**WRONG APPROACH:**
```bash
# ❌ DON'T DO THIS - may crash with massive output
cd shell_scripts && ./test_sql.sh
```

### Available Test Scripts

**IMPORTANT: All shell scripts are located in the `shell_scripts/` directory** to keep the project organized.

All test scripts accept an optional database path argument (defaults to `../db/skraak.duckdb`):

1. **test_sql.sh** - Tests execute_sql tool with various queries
   - Tests: simple SELECT, parameterized queries, JOINs, aggregates, security validation
   - Always pipe to file!
2. **test_resources_prompts.sh** - Tests resources and prompts
3. **test_all_prompts.sh** - Tests all 6 prompts
4. **get_time.sh** - Quick test of get_current_time tool

### Verifying Test Success

After piping to test.txt, check for:
```bash
# Count successful responses (should equal number of successful tests)
rg '"result":' test.txt | wc -l

# Count validation errors (expected for security tests)
rg '"isError":true' test.txt | wc -l

# No unexpected errors
rg -i '"error"' test.txt | grep -v '"isError"'
```

## Project Overview

### Architecture

The Skraak MCP Server is a Model Context Protocol (MCP) server written in Go that provides a **generic SQL query interface** for an acoustic monitoring system. It follows MCP's three-primitive architecture with an LLM-friendly design:

- **Tools** (model-controlled): Generic SQL query execution + time utility
- **Resources** (application-driven): Full database schema for context
- **Prompts** (user-controlled): SQL workflow templates that teach query patterns

### Philosophy: Schema + Generic SQL > Specialized Tools

**Why Generic SQL:**
- LLMs can construct any query given the schema (infinite flexibility)
- No rigid tool APIs to learn (just SQL)
- Full SQL expressiveness: JOINs, aggregates, CTEs, subqueries
- Prompts teach SQL patterns instead of tool calling

**Previous specialized tools were limiting:**
- Each tool = one fixed query
- Couldn't filter beyond hardcoded parameters
- Couldn't JOIN tables or use aggregates
- Created artificial boundaries

**With schema + generic SQL:**
- Infinite query possibilities
- LLM constructs appropriate query for each question
- Full DuckDB SQL feature set available
- More aligned with MCP philosophy (provide context, not APIs)

### Directory Structure

```
skraak_mcp/
├── main.go                       # Server entry point, tool registration
├── db/
│   ├── db.go                     # Database connection (read-only mode)
│   ├── types.go                  # Type definitions
│   └── schema.sql                # Database schema (348 lines)
├── tools/                        # Tool implementations
│   ├── time.go                   # get_current_time (utility tool)
│   └── sql.go                    # execute_sql (generic query tool)
├── resources/
│   └── schema.go                 # Schema resources (full & per-table)
├── prompts/
│   └── examples.go               # SQL workflow templates (6 prompts)
└── shell_scripts/                # Shell test scripts
    ├── test_sql.sh               # SQL tool tests
    ├── test_resources_prompts.sh # Resources/prompts tests
    ├── test_all_prompts.sh       # All 6 prompts tests
    └── get_time.sh               # Time tool test
```

## Available Tools

### Time Tool
- `get_current_time` - Returns current system time with timezone and Unix timestamp

### Generic SQL Query Tool
- `execute_sql` - Execute arbitrary read-only SQL queries
  - **Supports**: SELECT, WITH (CTEs), parameterized queries (? placeholders)
  - **Security**: Database is read-only (enforced by DuckDB), forbidden keyword validation
  - **Limits**: Default 1000 rows (max 10000) to prevent overwhelming responses
  - **Output**: Generic JSON results with column metadata
  - **Use with**: Schema resources to construct any query you need

### Security

**Database is read-only** (db/db.go:27):
```go
readOnlyPath := dbPath + "?access_mode=read_only"
```

**Validation layers:**
1. Regex validation: Must start with SELECT or WITH
2. Forbidden keywords: Blocks INSERT/UPDATE/DELETE/DROP/CREATE/ALTER
3. Row limiting: Prevents overwhelming responses

All write operations are blocked at two independent levels: the read-only database connection and the query validation.
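The three validation layers can be sketched in Go. This is a minimal illustration, not the actual `tools/sql.go` implementation; function and variable names here are hypothetical, and the real code may differ (for example, in how it treats keywords inside string literals). Note the `\b` word boundaries, which avoid false positives on column names such as `created_at`.

```go
package main

import (
	"fmt"
	"regexp"
)

// Layer 1: the query must start with SELECT or WITH.
var startsWithSelect = regexp.MustCompile(`(?i)^\s*(SELECT|WITH)\b`)

// Layer 2: block write-oriented keywords. Word boundaries (\b) avoid
// false positives on identifiers like created_at or last_update.
var forbiddenKeyword = regexp.MustCompile(`(?i)\b(INSERT|UPDATE|DELETE|DROP|CREATE|ALTER)\b`)

// validateQuery applies layers 1 and 2 (hypothetical name).
func validateQuery(query string) error {
	if !startsWithSelect.MatchString(query) {
		return fmt.Errorf("query must start with SELECT or WITH")
	}
	if kw := forbiddenKeyword.FindString(query); kw != "" {
		return fmt.Errorf("forbidden keyword in query: %s", kw)
	}
	return nil
}

// Layer 3: clamp the row limit (default 1000, max 10000).
func clampLimit(limit int) int {
	if limit <= 0 {
		return 1000
	}
	if limit > 10000 {
		return 10000
	}
	return limit
}

func main() {
	fmt.Println(validateQuery("SELECT * FROM dataset WHERE active = true")) // <nil>
	fmt.Println(validateQuery("DROP TABLE dataset"))                        // rejected
	fmt.Println(clampLimit(0), clampLimit(50000))                           // 1000 10000
}
```

Even if this validation were bypassed, the `access_mode=read_only` connection means DuckDB itself would refuse any write.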

## Resources

### Schema Resources
- `schema://full` - Complete 348-line database schema (SQL)
- `schema://table/{table_name}` - Individual table definitions

**Valid table names**: dataset, location, cluster, file, selection, label, species, species_group, genus, family_group, order_group, family, order, class, phylum, kingdom, kiwi_call, call, syllable, and more (see schema.sql)

## Prompts

Six SQL workflow templates that teach query patterns:

1. **query_active_datasets** - Dataset querying with SQL SELECT and GROUP BY
2. **explore_database_schema** - Interactive schema exploration (resource-based)
3. **explore_location_hierarchy** - Hierarchy navigation with SQL JOINs
4. **query_location_data** - Location analysis with SQL filtering and aggregates
5. **analyze_cluster_files** - File analysis with SQL aggregate functions
6. **system_status_check** - Comprehensive health check workflow

All prompts teach SQL patterns with complete examples.

## Example SQL Queries

### Basic Queries

**Get all active datasets:**
```sql
SELECT id, name, type, description, active
FROM dataset
WHERE active = true
ORDER BY type, name;
```

**Get locations for a dataset (parameterized):**
```json
{
  "query": "SELECT id, name, latitude, longitude FROM location WHERE dataset_id = ? AND active = true",
  "parameters": ["vgIr9JSH_lFj"]
}
```

### JOINs (Now Possible!)

**Dataset hierarchy with counts:**
```sql
SELECT
    d.name as dataset,
    COUNT(DISTINCT l.id) as location_count,
    COUNT(DISTINCT c.id) as cluster_count,
    COUNT(f.id) as file_count
FROM dataset d
LEFT JOIN location l ON d.id = l.dataset_id
LEFT JOIN cluster c ON l.id = c.location_id
LEFT JOIN file f ON c.id = f.cluster_id
WHERE d.active = true
GROUP BY d.name
ORDER BY d.name;
```

### Aggregates (Now Possible!)

**Cluster file statistics:**
```sql
SELECT
    COUNT(*) as total_files,
    SUM(duration) as total_duration,
    AVG(duration) as avg_duration,
    MIN(timestamp_local) as first_recording,
    MAX(timestamp_local) as last_recording,
    SUM(CASE WHEN maybe_solar_night THEN 1 ELSE 0 END) as night_files
FROM file
WHERE cluster_id = ? AND active = true;
```

### Complex Analysis (New Possibilities!)

**Geographic distribution:**
```sql
SELECT
    d.name as dataset,
    COUNT(DISTINCT l.id) as locations,
    AVG(l.latitude) as avg_latitude,
    AVG(l.longitude) as avg_longitude
FROM dataset d
LEFT JOIN location l ON d.id = l.dataset_id
WHERE d.active = true
GROUP BY d.name;
```

**Temporal coverage:**
```sql
SELECT
    DATE_TRUNC('day', timestamp_local) as day,
    COUNT(*) as recordings,
    SUM(duration) as total_seconds
FROM file
WHERE active = true
  AND timestamp_local >= '2024-01-01'
GROUP BY day
ORDER BY day
LIMIT 100;
```

## Database Information

### Database Path
Default: `./db/skraak.duckdb`

### Key Tables
- **dataset** - Project datasets (organise/test/train types)
- **location** - Recording locations with GPS coordinates (139 active locations)
- **cluster** - Grouped recordings at locations
- **file** - Individual audio files with metadata
- **label** - Annotations and classifications
- **species** - Taxonomy information

## Building and Running

### Build
```bash
go build -o skraak_mcp
```

### Run (stdio mode - waits for MCP protocol input)
```bash
./skraak_mcp ./db/skraak.duckdb
```

### Quick Tests
```bash
# Navigate to shell_scripts directory
cd shell_scripts

# Quick time check
./get_time.sh

# SQL tool tests (PIPE TO FILE!)
./test_sql.sh > test.txt 2>&1
rg '"result":' test.txt | wc -l  # Should show 8 responses (6 successful + 2 validations)
rg '"isError":true' test.txt | wc -l  # Should show 2 (security tests)

# Resources and prompts
./test_resources_prompts.sh | jq '.'

# All prompts test
./test_all_prompts.sh > test_prompts.txt 2>&1
rg '"result":' test_prompts.txt | wc -l
```

## SQL Query Tips

### Using execute_sql Tool

**Basic query:**
```json
{
  "name": "execute_sql",
  "arguments": {
    "query": "SELECT * FROM dataset WHERE active = true"
  }
}
```

**Parameterized query (recommended for user input):**
```json
{
  "name": "execute_sql",
  "arguments": {
    "query": "SELECT * FROM location WHERE dataset_id = ?",
    "parameters": ["vgIr9JSH_lFj"]
  }
}
```

**With custom row limit:**
```json
{
  "name": "execute_sql",
  "arguments": {
    "query": "SELECT * FROM file WHERE active = true",
    "limit": 100
  }
}
```

### SQL Best Practices

1. **Always use WHERE active = true** for main tables (dataset, location, cluster, file)
2. **Use parameterized queries** (? placeholders) for filtering by IDs
3. **Use LEFT JOIN** to include parent records even if children don't exist
4. **Use COUNT(DISTINCT)** when joining to avoid double-counting
5. **Use LIMIT** to restrict large result sets
6. **Use DATE_TRUNC** to group temporal data
7. **Use CASE WHEN** for conditional aggregates (e.g., count night vs day files)

## Common Issues and Solutions

### Query Results Too Large
**Problem**: Query returns too many rows
**Solution**: Use LIMIT clause (default 1000, max 10000)

### Server Exits Immediately
**Normal behavior**: Server runs in stdio mode, waiting for JSON-RPC input on stdin

### No Response from Tool Call
**Check**: Must initialize connection first with `initialize` method before calling tools

### Database Connection Failed
**Check**: Database path exists and is readable

### SQL Syntax Error
**Check**: Query syntax, table names (use schema resources), column names

## Claude Desktop Configuration

Add to `~/.config/Claude/claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "skraak_mcp": {
      "command": "/home/david/go/src/skraak_mcp/skraak_mcp",
      "args": ["/home/david/go/src/skraak_mcp/db/skraak.duckdb"]
    }
  }
}
```

Remember to restart Claude Desktop after configuration changes.

## Recent Changes

### Latest Update: Generic SQL Tool + Codebase Rationalization (2026-01-26)

**Major architectural change: Replaced 6 specialized tools with generic SQL approach**

**Deleted:**
- `tools/dataset.go` - query_datasets tool
- `tools/location.go` - query_locations, query_locations_by_dataset tools
- `tools/cluster.go` - query_clusters, query_clusters_by_location tools
- `tools/file.go` - query_files_by_cluster tool
- `shell_scripts/test_new_tools.sh` - Obsolete test script
- `shell_scripts/test_mcp.sh` - Obsolete test script

**Added:**
- `tools/sql.go` - Generic execute_sql tool (~200 lines)
- `shell_scripts/test_sql.sh` - Comprehensive SQL test suite

**Modified:**
- `main.go` - Removed 6 tool registrations, kept only get_current_time and execute_sql
- `prompts/examples.go` - Completely rewritten to teach SQL patterns instead of tool calls
- All 6 prompts now include SQL examples with SELECT, JOIN, GROUP BY, aggregates

**Benefits:**
- Full SQL expressiveness (JOINs, aggregates, CTEs, subqueries) - **previously impossible**
- Infinite query possibilities vs 6 fixed queries
- More aligned with MCP philosophy (context over APIs)
- LLMs can answer any question given the schema
- Smaller codebase (2 tools instead of 8)
- More maintainable (no new tool for each query pattern)

**Security:**
- Database already read-only (verified in db/db.go)
- Validation layers block write operations
- Parameterized queries prevent SQL injection
- Row limits prevent overwhelming responses

**Migration Notes:**
- Old tool calls must be replaced with SQL queries
- All old functionality is still available via SQL
- Prompts provide SQL examples for common patterns
- Schema resources provide full context for query construction

### Previous Update: Shell Scripts Organization (2026-01-26)
- Reorganized all shell scripts into `shell_scripts/` directory
- Keeps project root clean and organized
- All scripts updated with correct relative paths

---

**Last Updated**: 2026-01-26 13:15 NZDT
**Status**: Generic SQL tool operational, all prompts rewritten, tests passing
**Current Tools**: 2 (get_current_time, execute_sql)
**Current Database**: 139 locations, 8 active datasets, read-only mode enforced