# Claude's Notes - Skraak MCP Server
This file contains important reminders and best practices for working with the Skraak MCP Server codebase.
## ⚠️ CRITICAL TESTING REMINDER
### Running Test Scripts with Large Output
**NEVER** run test scripts directly without piping to a file. Large outputs can cause token overflow.
**CORRECT APPROACH:**
```bash
# All shell scripts are in shell_scripts/ directory
cd shell_scripts
# Pipe output to file first
./test_sql.sh > test.txt 2>&1
# Then use targeted searches to verify results
rg -i "error" test.txt # Check for errors
rg '"result":' test.txt | wc -l # Count successful responses
rg '"isError":true' test.txt | wc -l # Count validation errors (expected)
```
**WRONG APPROACH:**
```bash
# ❌ DON'T DO THIS - may crash with massive output
cd shell_scripts && ./test_sql.sh
```
### Available Test Scripts
**IMPORTANT: All shell scripts are located in the `shell_scripts/` directory** to keep the project organized.
All test scripts accept an optional database path argument (defaults to `../db/skraak.duckdb`):
1. **test_sql.sh** - Tests execute_sql tool with various queries
   - Tests: simple SELECT, parameterized queries, JOINs, aggregates, security validation
   - Always pipe to file!
2. **test_resources_prompts.sh** - Tests resources and prompts
3. **test_all_prompts.sh** - Tests all 6 prompts
4. **get_time.sh** - Quick test of get_current_time tool
### Verifying Test Success
After piping to test.txt, check for:
```bash
# Count successful responses (should equal number of successful tests)
rg '"result":' test.txt | wc -l
# Count validation errors (expected for security tests)
rg '"isError":true' test.txt | wc -l
# No unexpected errors
rg -i '"error"' test.txt | grep -v '"isError"'
```
## Project Overview
### Architecture
The Skraak MCP Server is a Model Context Protocol (MCP) server written in Go that provides a **generic SQL query interface** for an acoustic monitoring system. It follows MCP's three-primitive architecture with an LLM-friendly design:
- **Tools** (model-controlled): Generic SQL query execution + time utility
- **Resources** (application-driven): Full database schema for context
- **Prompts** (user-controlled): SQL workflow templates that teach query patterns
### Philosophy: Schema + Generic SQL > Specialized Tools
**Why Generic SQL:**
- LLMs can construct any query given the schema (infinite flexibility)
- No rigid tool APIs to learn (just SQL)
- Full SQL expressiveness: JOINs, aggregates, CTEs, subqueries
- Prompts teach SQL patterns instead of tool calling
**Previous specialized tools were limiting:**
- Each tool = one fixed query
- Couldn't filter beyond hardcoded parameters
- Couldn't JOIN tables or use aggregates
- Created artificial boundaries
**With schema + generic SQL:**
- Infinite query possibilities
- LLM constructs appropriate query for each question
- Full DuckDB SQL feature set available
- More aligned with MCP philosophy (provide context, not APIs)
### Directory Structure
```
skraak_mcp/
├── main.go                          # Server entry point, tool registration
├── db/
│   ├── db.go                        # Database connection (read-only mode)
│   ├── types.go                     # Type definitions
│   └── schema.sql                   # Database schema (348 lines)
├── tools/                           # Tool implementations
│   ├── time.go                      # get_current_time (utility tool)
│   └── sql.go                       # execute_sql (generic query tool)
├── resources/
│   └── schema.go                    # Schema resources (full & per-table)
├── prompts/
│   └── examples.go                  # SQL workflow templates (6 prompts)
└── shell_scripts/                   # Shell test scripts
    ├── test_sql.sh                  # SQL tool tests
    ├── test_resources_prompts.sh    # Resources/prompts tests
    ├── test_all_prompts.sh          # All 6 prompts tests
    └── get_time.sh                  # Time tool test
```
## Available Tools
### Time Tool
- `get_current_time` - Returns current system time with timezone and Unix timestamp
### Generic SQL Query Tool
- `execute_sql` - Execute arbitrary SQL SELECT queries
- **Supports**: SELECT, WITH (CTEs), parameterized queries (? placeholders)
- **Security**: Database is read-only (enforced by DuckDB), forbidden keyword validation
- **Limits**: Default 1000 rows (max 10000) to prevent overwhelming responses
- **Output**: Generic JSON results with column metadata
- **Use with**: Schema resources to construct any query you need
### Security
**Database is read-only** (db/db.go:27):
```go
readOnlyPath := dbPath + "?access_mode=read_only"
```
**Validation layers:**
1. Regex validation: Must start with SELECT or WITH
2. Forbidden keywords: Blocks INSERT/UPDATE/DELETE/DROP/CREATE/ALTER
3. Row limiting: Prevents overwhelming responses
All write operations are blocked at both database and validation levels.
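A minimal Go sketch of these two validation layers (hypothetical `validateQuery`; the real checks live in tools/sql.go and may differ in detail):

```go
package main

import (
	"fmt"
	"regexp"
)

// selectOrWith requires the statement to begin with SELECT or WITH (for CTEs).
var selectOrWith = regexp.MustCompile(`(?i)^\s*(SELECT|WITH)\b`)

// forbidden matches write/DDL keywords as whole words, so column names like
// created_at do not trigger a false positive on CREATE.
var forbidden = regexp.MustCompile(`(?i)\b(INSERT|UPDATE|DELETE|DROP|CREATE|ALTER)\b`)

// validateQuery sketches the layers described above. The database itself is
// also opened read-only, so this is defense in depth, not the only barrier.
func validateQuery(query string) error {
	if !selectOrWith.MatchString(query) {
		return fmt.Errorf("query must start with SELECT or WITH")
	}
	if kw := forbidden.FindString(query); kw != "" {
		return fmt.Errorf("forbidden keyword: %s", kw)
	}
	return nil
}

func main() {
	fmt.Println(validateQuery("SELECT id FROM dataset")) // passes: <nil>
	fmt.Println(validateQuery("DROP TABLE dataset"))     // rejected with an error
}
```

Word-boundary matching matters here: a naive substring check would reject legitimate queries that merely mention a column such as `created_at`.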
## Resources
### Schema Resources
- `schema://full` - Complete 348-line database schema (SQL)
- `schema://table/{table_name}` - Individual table definitions
**Valid table names**: dataset, location, cluster, file, selection, label, species, species_group, genus, family_group, order_group, family, order, class, phylum, kingdom, kiwi_call, call, syllable, and more (see schema.sql)
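A sketch of how a `schema://table/{table_name}` URI might be parsed and checked against an allowlist. Both `tableFromURI` and the table subset below are hypothetical; the real handler lives in resources/schema.go and the authoritative table list comes from db/schema.sql:

```go
package main

import (
	"fmt"
	"strings"
)

// validTables is an illustrative subset of the tables listed above.
var validTables = map[string]bool{
	"dataset": true, "location": true, "cluster": true, "file": true,
	"selection": true, "label": true, "species": true,
}

// tableFromURI extracts and validates the table name from a
// schema://table/{table_name} resource URI.
func tableFromURI(uri string) (string, error) {
	const prefix = "schema://table/"
	if !strings.HasPrefix(uri, prefix) {
		return "", fmt.Errorf("not a table schema URI: %s", uri)
	}
	name := strings.TrimPrefix(uri, prefix)
	if !validTables[name] {
		return "", fmt.Errorf("unknown table: %s", name)
	}
	return name, nil
}

func main() {
	name, err := tableFromURI("schema://table/location")
	fmt.Println(name, err)
}
```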
## Prompts
Six SQL workflow templates that teach query patterns:
1. **query_active_datasets** - Dataset querying with SQL SELECT and GROUP BY
2. **explore_database_schema** - Interactive schema exploration (resource-based)
3. **explore_location_hierarchy** - Hierarchy navigation with SQL JOINs
4. **query_location_data** - Location analysis with SQL filtering and aggregates
5. **analyze_cluster_files** - File analysis with SQL aggregate functions
6. **system_status_check** - Comprehensive health check workflow
All prompts teach SQL patterns with complete examples.
## Example SQL Queries
### Basic Queries
**Get all active datasets:**
```sql
SELECT id, name, type, description, active
FROM dataset
WHERE active = true
ORDER BY type, name;
```
**Get locations for a dataset (parameterized):**
```json
{
  "query": "SELECT id, name, latitude, longitude FROM location WHERE dataset_id = ? AND active = true",
  "parameters": ["vgIr9JSH_lFj"]
}
```
### JOINs (Now Possible!)
**Dataset hierarchy with counts:**
```sql
SELECT
  d.name as dataset,
  COUNT(DISTINCT l.id) as location_count,
  COUNT(DISTINCT c.id) as cluster_count,
  COUNT(f.id) as file_count
FROM dataset d
LEFT JOIN location l ON d.id = l.dataset_id
LEFT JOIN cluster c ON l.id = c.location_id
LEFT JOIN file f ON c.id = f.cluster_id
WHERE d.active = true
GROUP BY d.name
ORDER BY d.name;
```
### Aggregates (Now Possible!)
**Cluster file statistics:**
```sql
SELECT
  COUNT(*) as total_files,
  SUM(duration) as total_duration,
  AVG(duration) as avg_duration,
  MIN(timestamp_local) as first_recording,
  MAX(timestamp_local) as last_recording,
  SUM(CASE WHEN maybe_solar_night THEN 1 ELSE 0 END) as night_files
FROM file
WHERE cluster_id = ? AND active = true;
```
### Complex Analysis (New Possibilities!)
**Geographic distribution:**
```sql
SELECT
  d.name as dataset,
  COUNT(DISTINCT l.id) as locations,
  AVG(l.latitude) as avg_latitude,
  AVG(l.longitude) as avg_longitude
FROM dataset d
LEFT JOIN location l ON d.id = l.dataset_id
WHERE d.active = true
GROUP BY d.name;
```
**Temporal coverage:**
```sql
SELECT
  DATE_TRUNC('day', timestamp_local) as day,
  COUNT(*) as recordings,
  SUM(duration) as total_seconds
FROM file
WHERE active = true
  AND timestamp_local >= '2024-01-01'
GROUP BY day
ORDER BY day
LIMIT 100;
```
## Database Information
### Database Path
Default: `./db/skraak.duckdb`
### Key Tables
- **dataset** - Project datasets (organise/test/train types)
- **location** - Recording locations with GPS coordinates (139 active locations)
- **cluster** - Grouped recordings at locations
- **file** - Individual audio files with metadata
- **label** - Annotations and classifications
- **species** - Taxonomy information
## Building and Running
### Build
```bash
go build -o skraak_mcp
```
### Run (stdio mode - waits for MCP protocol input)
```bash
./skraak_mcp ./db/skraak.duckdb
```
### Quick Tests
```bash
# Navigate to shell_scripts directory
cd shell_scripts
# Quick time check
./get_time.sh
# SQL tool tests (PIPE TO FILE!)
./test_sql.sh > test.txt 2>&1
rg '"result":' test.txt | wc -l # Should show 8 responses (6 successful + 2 validations)
rg '"isError":true' test.txt | wc -l # Should show 2 (security tests)
# Resources and prompts
./test_resources_prompts.sh | jq '.'
# All prompts test
./test_all_prompts.sh > test_prompts.txt 2>&1
rg '"result":' test_prompts.txt | wc -l
```
## SQL Query Tips
### Using execute_sql Tool
**Basic query:**
```json
{
  "name": "execute_sql",
  "arguments": {
    "query": "SELECT * FROM dataset WHERE active = true"
  }
}
```
**Parameterized query (recommended for user input):**
```json
{
  "name": "execute_sql",
  "arguments": {
    "query": "SELECT * FROM location WHERE dataset_id = ?",
    "parameters": ["vgIr9JSH_lFj"]
  }
}
```
**With custom row limit:**
```json
{
  "name": "execute_sql",
  "arguments": {
    "query": "SELECT * FROM file WHERE active = true",
    "limit": 100
  }
}
```
### SQL Best Practices
1. **Always use WHERE active = true** for main tables (dataset, location, cluster, file)
2. **Use parameterized queries** (? placeholders) for filtering by IDs
3. **Use LEFT JOIN** to include parent records even if children don't exist
4. **Use COUNT(DISTINCT)** when joining to avoid double-counting
5. **Use LIMIT** to restrict large result sets
6. **Use DATE_TRUNC** to group temporal data
7. **Use CASE WHEN** for conditional aggregates (e.g., count night vs day files)
## Common Issues and Solutions
### Query Results Too Large
**Problem**: Query returns too many rows
**Solution**: Use LIMIT clause (default 1000, max 10000)
### Server Exits Immediately
**Normal behavior** - The server runs in stdio mode and waits for JSON-RPC messages on stdin; it exits as soon as stdin closes (e.g. when launched interactively with no input)
### No Response from Tool Call
**Check**: Must initialize connection first with `initialize` method before calling tools
### Database Connection Failed
**Check**: Database path exists and is readable
### SQL Syntax Error
**Check**: Query syntax, table names (use schema resources), column names
## Claude Desktop Configuration
Add to `~/.config/Claude/claude_desktop_config.json`:
```json
{
  "mcpServers": {
    "skraak_mcp": {
      "command": "/home/david/go/src/skraak_mcp/skraak_mcp",
      "args": ["/home/david/go/src/skraak_mcp/db/skraak.duckdb"]
    }
  }
}
```
Remember to restart Claude Desktop after configuration changes.
## Recent Changes
### Latest Update: Generic SQL Tool + Codebase Rationalization (2026-01-26)
**Major architectural change: Replaced 6 specialized tools with generic SQL approach**
**Deleted:**
- `tools/dataset.go` - query_datasets tool
- `tools/location.go` - query_locations, query_locations_by_dataset tools
- `tools/cluster.go` - query_clusters, query_clusters_by_location tools
- `tools/file.go` - query_files_by_cluster tool
- `shell_scripts/test_new_tools.sh` - Obsolete test script
- `shell_scripts/test_mcp.sh` - Obsolete test script
**Added:**
- `tools/sql.go` - Generic execute_sql tool (~200 lines)
- `shell_scripts/test_sql.sh` - Comprehensive SQL test suite
**Modified:**
- `main.go` - Removed 6 tool registrations, kept only get_current_time and execute_sql
- `prompts/examples.go` - Completely rewritten to teach SQL patterns instead of tool calls
- All 6 prompts now include SQL examples with SELECT, JOIN, GROUP BY, aggregates
**Benefits:**
- Full SQL expressiveness (JOINs, aggregates, CTEs, subqueries) - **previously impossible**
- Infinite query possibilities vs 6 fixed queries
- More aligned with MCP philosophy (context over APIs)
- LLMs can answer any question given the schema
- Smaller codebase (2 tools instead of 8)
- More maintainable (no new tool for each query pattern)
**Security:**
- Database already read-only (verified in db/db.go)
- Validation layers block write operations
- Parameterized queries prevent SQL injection
- Row limits prevent overwhelming responses
**Migration Notes:**
- Old tool calls must be replaced with SQL queries
- All old functionality is still available via SQL
- Prompts provide SQL examples for common patterns
- Schema resources provide full context for query construction
### Previous Update: Shell Scripts Organization (2026-01-26)
- Reorganized all shell scripts into `shell_scripts/` directory
- Keeps project root clean and organized
- All scripts updated with correct relative paths
---
**Last Updated**: 2026-01-26 13:15 NZDT
**Status**: Generic SQL tool operational, all prompts rewritten, tests passing
**Current Tools**: 2 (get_current_time, execute_sql)
**Current Database**: 139 locations, 8 active datasets, read-only mode enforced