added shell script integration tests.

quietlight
Apr 30, 2026, 10:08 PM
DS22DKV35FHPKEBSCUCE5VYYUSYBFN46YUP2JZCN72YX7SKQRLVQC

Dependencies

  • [2] FCCJNYCV more tests for utils/
  • [3] KZKLAINJ run out of space on nest, cleaned out

Change contents

  • replacement in utils/wav_metadata_test.go at line 274
    [2.15670][2.15670:15751]()
    data := []byte{0x01, 0x00, 0x02, 0x00} // 4 bytes, treated as 2x 16-bit samples
    [2.15670]
    [2.15751]
    data := []byte{0x01, 0x00, 0x02, 0x00} // 4 bytes, treated as 2x 16-bit samples
  • replacement in utils/terminal_image_test.go at line 195
    [2.16241][2.16241:16449]()
    {100, 224}, // below minimum, clamped up
    {224, 224}, // at minimum
    {448, 448}, // in range
    {600, 600}, // in range
    {896, 896}, // at maximum
    {1000, 896}, // above maximum, clamped down
    [2.16241]
    [2.16449]
    {100, 224}, // below minimum, clamped up
    {224, 224}, // at minimum
    {448, 448}, // in range
    {600, 600}, // in range
    {896, 896}, // at maximum
    {1000, 896}, // above maximum, clamped down
  • edit in utils/mapping.go at line 118
    [3.93015]
    [3.93015]
    // Skip sentinel values — they are never looked up in the DB
    if sm.Species == MappingNegative || sm.Species == MappingIgnore {
    continue
    }
  • replacement in utils/colormap_test.go at line 9
    [2.3145][2.3145:3229]()
    idx int
    wantR uint8
    wantG uint8
    wantB uint8
    desc string
    [2.3145]
    [2.3229]
    idx int
    wantR uint8
    wantG uint8
    wantB uint8
    desc string
  • edit in utils/check_duplicate_hash_test.go at line 5
    [2.6650][2.6650:6660]()
    "errors"
  • edit in utils/check_duplicate_hash_test.go at line 7
    [2.6673][2.6673:6919]()
    // mockDB implements the DB interface for testing
    type mockDB struct {
    queryRowFunc func(query string, args ...any) *sql.Row
    }
    func (m *mockDB) Query(query string, args ...any) (*sql.Rows, error) {
    return nil, errors.New("not implemented")
    }
  • edit in utils/check_duplicate_hash_test.go at line 8
    [2.6920][2.6920:7489]()
    func (m *mockDB) QueryRow(query string, args ...any) *sql.Row {
    return m.queryRowFunc(query, args...)
    }
    // *sql.Row can't be constructed directly, and wrapping Scan behind a
    // helper via sql.Open with an in-memory driver isn't feasible here.
    // Instead, we use a real test database or a lightweight SQLite/DuckDB.
    //
    // For simplicity, we test CheckDuplicateHash by verifying the
    // logic paths through a lightweight mock that uses a real *sql.DB with
    // an in-memory DuckDB.
  • file addition: test_mapping_validation.sh (---r------)
    [3.638309]
    #!/bin/bash
    # Test ValidateMappingAgainstDB via the `import segments` CLI command
    # Creates temporary .data files and mapping.json, then validates mapping errors
    # Uses fresh copy of production DB in /tmp (auto-cleaned)
    source "$(dirname "$0")/test_lib.sh"
    echo "=== Testing Mapping Validation ==="
    echo ""
    check_binary
    # Create fresh test database
    DB_PATH=$(fresh_test_db)
    trap "cleanup_test_db '$DB_PATH'" EXIT
    echo "Using fresh test database: $DB_PATH"
    echo ""
    SKRAAK="$PROJECT_DIR/skraak"
    # Create test entities
    echo "Setup: Creating test dataset, location, cluster"
    DATASET_RESULT=$($SKRAAK create dataset --db "$DB_PATH" --name "Mapping Validation Test" --type structured 2>/dev/null)
    DATASET_ID=$(echo "$DATASET_RESULT" | jq -r '.dataset.id // empty')
    if [ -z "$DATASET_ID" ]; then
    echo -e "${RED}✗ Failed to create test dataset${NC}"
    exit 1
    fi
    LOCATION_RESULT=$($SKRAAK create location --db "$DB_PATH" --dataset "$DATASET_ID" --name "MapTest Site" --lat -41.2865 --lon 174.7762 --timezone "Pacific/Auckland" 2>/dev/null)
    LOCATION_ID=$(echo "$LOCATION_RESULT" | jq -r '.location.id // empty')
    if [ -z "$LOCATION_ID" ]; then
    echo -e "${RED}✗ Failed to create test location${NC}"
    exit 1
    fi
    CLUSTER_RESULT=$($SKRAAK create cluster --db "$DB_PATH" --dataset "$DATASET_ID" --location "$LOCATION_ID" --name "MapTest Cluster" --sample-rate 16000 2>/dev/null)
    CLUSTER_ID=$(echo "$CLUSTER_RESULT" | jq -r '.cluster.id // empty')
    if [ -z "$CLUSTER_ID" ]; then
    echo -e "${RED}✗ Failed to create test cluster${NC}"
    exit 1
    fi
    echo " Dataset: $DATASET_ID"
    echo " Location: $LOCATION_ID"
    echo " Cluster: $CLUSTER_ID"
    echo ""
    # Import WAV files into the cluster first (segments require existing file records)
    WAV_DIR="/tmp/skraak_map_test_$$"
    mkdir -p "$WAV_DIR"
    generate_wav "$WAV_DIR/test_recording.wav" 1 16000
    $SKRAAK import folder --db "$DB_PATH" --dataset "$DATASET_ID" --location "$LOCATION_ID" --cluster "$CLUSTER_ID" --folder "$WAV_DIR" --recursive=false > /dev/null 2>&1
    IMPORT_COUNT=$($SKRAAK sql --db "$DB_PATH" "SELECT COUNT(*) as cnt FROM file WHERE cluster_id = '$CLUSTER_ID' AND active = true" 2>/dev/null | jq -r '.rows[0].cnt')
    if [ "$IMPORT_COUNT" != "1" ]; then
    echo -e "${RED}✗ Failed to import test WAV file (found $IMPORT_COUNT files)${NC}"
    rm -rf "$WAV_DIR"
    exit 1
    fi
    echo -e "${GREEN}✓${NC} Imported 1 WAV file for segment testing"
    echo ""
    # Helper: create a .data file for the test WAV
    # Format: [meta, [start, end, freqLow, freqHigh, [labels]]]
    # Labels have: species, certainty, filter
    create_data_file() {
    local wav_path="$1"
    local species="$2"
    local calltype="$3"
    local filter="$4"
    local data_path="${wav_path}.data"
    local label_json
    if [ -n "$calltype" ]; then
    label_json="{\"species\":\"$species\",\"certainty\":100,\"filter\":\"$filter\",\"calltype\":\"$calltype\"}"
    else
    label_json="{\"species\":\"$species\",\"certainty\":100,\"filter\":\"$filter\"}"
    fi
    echo "[[\"Operator\",\"None\",1],[0,0.5,100,7900,[$label_json]]]" > "$data_path"
    }
    # Helper: run import segments and capture output
    run_import_segments() {
    local mapping_path="$1"
    $SKRAAK import segments --db "$DB_PATH" --dataset "$DATASET_ID" --location "$LOCATION_ID" --cluster "$CLUSTER_ID" --folder "$WAV_DIR" --mapping "$mapping_path" 2>&1 || true
    }
    # -------------------------------------------------------
    # Test 1: Valid mapping - species exists in DB
    # -------------------------------------------------------
    echo "Test 1: Valid mapping (species exists in DB)"
    VALID_MAPPING="/tmp/skraak_valid_mapping_$$.json"
    cat > "$VALID_MAPPING" << 'EOF'
    {
    "Roroa": {"species": "Roroa", "calltypes": {"Male": "Male - Solo"}}
    }
    EOF
    create_data_file "$WAV_DIR/test_recording.wav" "Roroa" "Male" "Manual"
    # This should pass mapping validation (may fail later for other reasons, but no mapping error)
    RESULT=$(run_import_segments "$VALID_MAPPING")
    if ! echo "$RESULT" | grep -qi "species in .data but not in mapping\|mapped species not found in DB\|calltypes not found in DB"; then
    echo -e "${GREEN}✓${NC} No mapping validation errors for valid mapping"
    ((TESTS_RUN++)) || true; ((TESTS_PASSED++)) || true
    else
    echo -e "${RED}✗${NC} Unexpected mapping validation error:"
    echo "$RESULT" | grep -i "mapping\|species\|calltype" | head -5
    ((TESTS_RUN++)) || true; ((TESTS_FAILED++)) || true
    fi
    rm -f "$VALID_MAPPING" "$WAV_DIR/test_recording.wav.data"
    # -------------------------------------------------------
    # Test 2: Species in .data but not in mapping
    # -------------------------------------------------------
    echo ""
    echo "Test 2: Species in .data but not in mapping"
    INCOMPLETE_MAPPING="/tmp/skraak_incomplete_mapping_$$.json"
    cat > "$INCOMPLETE_MAPPING" << 'EOF'
    {
    "SomeOtherSpecies": {"species": "Roroa"}
    }
    EOF
    create_data_file "$WAV_DIR/test_recording.wav" "Kiwi" "" "Manual"
    RESULT=$(run_import_segments "$INCOMPLETE_MAPPING")
    if echo "$RESULT" | grep -qi "species in .data but not in mapping"; then
    echo -e "${GREEN}✓${NC} Correctly detected unmapped species in .data"
    ((TESTS_RUN++)) || true; ((TESTS_PASSED++)) || true
    else
    echo -e "${RED}✗${NC} Should have detected unmapped species 'Kiwi'"
    echo "$RESULT" | head -5
    ((TESTS_RUN++)) || true; ((TESTS_FAILED++)) || true
    fi
    rm -f "$INCOMPLETE_MAPPING" "$WAV_DIR/test_recording.wav.data"
    # -------------------------------------------------------
    # Test 3: Mapped species not found in DB
    # -------------------------------------------------------
    echo ""
    echo "Test 3: Mapped species not found in DB"
    PHANTOM_MAPPING="/tmp/skraak_phantom_mapping_$$.json"
    cat > "$PHANTOM_MAPPING" << 'EOF'
    {
    "Kiwi": {"species": "PhantomSpecies"}
    }
    EOF
    create_data_file "$WAV_DIR/test_recording.wav" "Kiwi" "" "Manual"
    RESULT=$(run_import_segments "$PHANTOM_MAPPING")
    if echo "$RESULT" | grep -qi "mapped species not found in DB\|not found in DB"; then
    echo -e "${GREEN}✓${NC} Correctly detected species not in DB"
    ((TESTS_RUN++)) || true; ((TESTS_PASSED++)) || true
    else
    echo -e "${RED}✗${NC} Should have detected 'PhantomSpecies' not in DB"
    echo "$RESULT" | head -5
    ((TESTS_RUN++)) || true; ((TESTS_FAILED++)) || true
    fi
    rm -f "$PHANTOM_MAPPING" "$WAV_DIR/test_recording.wav.data"
    # -------------------------------------------------------
    # Test 4: Calltype not found in DB
    # -------------------------------------------------------
    echo ""
    echo "Test 4: Calltype not found in DB"
    BAD_CT_MAPPING="/tmp/skraak_bad_ct_mapping_$$.json"
    cat > "$BAD_CT_MAPPING" << 'EOF'
    {
    "Roroa": {"species": "Roroa", "calltypes": {"Male": "NonexistentCall"}}
    }
    EOF
    create_data_file "$WAV_DIR/test_recording.wav" "Roroa" "Male" "Manual"
    RESULT=$(run_import_segments "$BAD_CT_MAPPING")
    if echo "$RESULT" | grep -qi "calltypes not found in DB"; then
    echo -e "${GREEN}✓${NC} Correctly detected calltype not in DB"
    ((TESTS_RUN++)) || true; ((TESTS_PASSED++)) || true
    else
    echo -e "${RED}✗${NC} Should have detected 'NonexistentCall' calltype not in DB"
    echo "$RESULT" | head -5
    ((TESTS_RUN++)) || true; ((TESTS_FAILED++)) || true
    fi
    rm -f "$BAD_CT_MAPPING" "$WAV_DIR/test_recording.wav.data"
    # -------------------------------------------------------
    # Test 5: __NEGATIVE__ sentinel - should not error
    # -------------------------------------------------------
    echo ""
    echo "Test 5: __NEGATIVE__ sentinel (no DB lookup)"
    NEG_MAPPING="/tmp/skraak_neg_mapping_$$.json"
    cat > "$NEG_MAPPING" << 'EOF'
    {
    "Noise": {"species": "__NEGATIVE__"}
    }
    EOF
    create_data_file "$WAV_DIR/test_recording.wav" "Noise" "" "Manual"
    RESULT=$(run_import_segments "$NEG_MAPPING")
    # __NEGATIVE__ species are NOT looked up in DB, so no "mapped species not found" error
    if ! echo "$RESULT" | grep -qi "mapped species not found in DB.*__NEGATIVE__\|Phantom"; then
    echo -e "${GREEN}✓${NC} __NEGATIVE__ sentinel not looked up in DB"
    ((TESTS_RUN++)) || true; ((TESTS_PASSED++)) || true
    else
    echo -e "${RED}✗${NC} __NEGATIVE__ should not trigger DB species lookup"
    echo "$RESULT" | head -5
    ((TESTS_RUN++)) || true; ((TESTS_FAILED++)) || true
    fi
    rm -f "$NEG_MAPPING" "$WAV_DIR/test_recording.wav.data"
    # -------------------------------------------------------
    # Test 6: __IGNORE__ sentinel - should not error
    # -------------------------------------------------------
    echo ""
    echo "Test 6: __IGNORE__ sentinel (no DB lookup)"
    IGNORE_MAPPING="/tmp/skraak_ignore_mapping_$$.json"
    cat > "$IGNORE_MAPPING" << 'EOF'
    {
    "Skip": {"species": "__IGNORE__"}
    }
    EOF
    create_data_file "$WAV_DIR/test_recording.wav" "Skip" "" "Manual"
    RESULT=$(run_import_segments "$IGNORE_MAPPING")
    if ! echo "$RESULT" | grep -qi "mapped species not found in DB.*__IGNORE__"; then
    echo -e "${GREEN}✓${NC} __IGNORE__ sentinel not looked up in DB"
    ((TESTS_RUN++)) || true; ((TESTS_PASSED++)) || true
    else
    echo -e "${RED}✗${NC} __IGNORE__ should not trigger DB species lookup"
    echo "$RESULT" | head -5
    ((TESTS_RUN++)) || true; ((TESTS_FAILED++)) || true
    fi
    rm -f "$IGNORE_MAPPING" "$WAV_DIR/test_recording.wav.data"
    # Cleanup
    rm -rf "$WAV_DIR"
    echo ""
    print_summary
  • edit in shell_scripts/test_lib.sh at line 53
    [3.661421]
    [3.661421]
    }
    # Generate a minimal valid WAV file (1-channel, 16-bit PCM, silence)
    # Usage: generate_wav <output_path> [duration_seconds] [sample_rate]
    # Default: 1 second, 16000 Hz sample rate
    # Requires: python3
    generate_wav() {
    local output_path="$1"
    local duration_sec="${2:-1}"
    local sample_rate="${3:-16000}"
    python3 -c "
    import struct, sys
    sr=$sample_rate; dur=$duration_sec; n=sr*dur; ds=n*2; fs=36+ds
    with open('$output_path','wb') as f:
        f.write(b'RIFF')
        f.write(struct.pack('<I', fs))
        f.write(b'WAVE')
        f.write(b'fmt ')
        f.write(struct.pack('<IHHIIHH', 16, 1, 1, sr, sr*2, 2, 16))
        f.write(b'data')
        f.write(struct.pack('<I', ds))
        f.write(b'\x00' * ds)
    "
  • file addition: test_cluster_import.sh (---r------)
    [3.638309]
    #!/bin/bash
    # Test cluster_import.go via the CLI
    # Exercises ImportCluster, GetLocationData, EnsureClusterPath, batchProcessFiles, insertClusterFiles
    # Uses fresh copy of production DB in /tmp (auto-cleaned)
    source "$(dirname "$0")/test_lib.sh"
    echo "=== Testing Cluster Import ==="
    echo ""
    check_binary
    # Create fresh test database
    DB_PATH=$(fresh_test_db)
    trap "cleanup_test_db '$DB_PATH'" EXIT
    echo "Using fresh test database: $DB_PATH"
    echo ""
    SKRAAK="$PROJECT_DIR/skraak"
    # Create test entities in DB
    echo "Setup: Creating test dataset, location, cluster"
    DATASET_RESULT=$($SKRAAK create dataset --db "$DB_PATH" --name "Cluster Import Test" --type structured 2>/dev/null)
    DATASET_ID=$(echo "$DATASET_RESULT" | jq -r '.dataset.id // empty')
    if [ -z "$DATASET_ID" ]; then
    echo -e "${RED}✗ Failed to create test dataset${NC}"
    echo "$DATASET_RESULT" | head -5
    exit 1
    fi
    echo " Dataset: $DATASET_ID"
    LOCATION_RESULT=$($SKRAAK create location --db "$DB_PATH" --dataset "$DATASET_ID" --name "Test Site" --lat -41.2865 --lon 174.7762 --timezone "Pacific/Auckland" 2>/dev/null)
    LOCATION_ID=$(echo "$LOCATION_RESULT" | jq -r '.location.id // empty')
    if [ -z "$LOCATION_ID" ]; then
    echo -e "${RED}✗ Failed to create test location${NC}"
    exit 1
    fi
    echo " Location: $LOCATION_ID"
    CLUSTER_RESULT=$($SKRAAK create cluster --db "$DB_PATH" --dataset "$DATASET_ID" --location "$LOCATION_ID" --name "Test Cluster" --sample-rate 16000 2>/dev/null)
    CLUSTER_ID=$(echo "$CLUSTER_RESULT" | jq -r '.cluster.id // empty')
    if [ -z "$CLUSTER_ID" ]; then
    echo -e "${RED}✗ Failed to create test cluster${NC}"
    exit 1
    fi
    echo " Cluster: $CLUSTER_ID"
    echo ""
    # Helper: extract JSON object from mixed stdout/stderr output
    extract_json() {
    echo "$1" | grep -A 1000 '^{' | head -100
    }
    # Create test WAV files, each with unique content to avoid hash collisions
    WAV_DIR="/tmp/skraak_cluster_test_$$"
    mkdir -p "$WAV_DIR"
    generate_wav "$WAV_DIR/test_recording_01.wav" 1 16000
    # Make the second file unique by giving it a different duration (2 s instead of 1 s)
    generate_wav "$WAV_DIR/test_recording_02.wav" 2 16000
    echo -e "${GREEN}✓${NC} Created 2 test WAV files in $WAV_DIR"
    echo ""
    # -------------------------------------------------------
    # Test 1: Happy path - import folder
    # -------------------------------------------------------
    echo "Test 1: Import folder with valid WAV files"
    RESULT=$($SKRAAK import folder --db "$DB_PATH" --dataset "$DATASET_ID" --location "$LOCATION_ID" --cluster "$CLUSTER_ID" --folder "$WAV_DIR" --recursive=false 2>&1)
    JSON=$(extract_json "$RESULT")
    IMPORTED=$(echo "$JSON" | jq -r '.summary.imported_files // empty')
    FAILED=$(echo "$JSON" | jq -r '.summary.failed_files // 0')
    if [ "$IMPORTED" = "2" ] && [ "$FAILED" = "0" ]; then
    echo -e "${GREEN}✓${NC} Imported 2 files, 0 failures"
    ((TESTS_RUN++)) || true; ((TESTS_PASSED++)) || true
    else
    echo -e "${RED}✗${NC} Expected imported=2 failed=0, got imported=$IMPORTED failed=$FAILED"
    echo "$RESULT" | head -10
    ((TESTS_RUN++)) || true; ((TESTS_FAILED++)) || true
    fi
    # -------------------------------------------------------
    # Test 2: DB state - file records exist
    # -------------------------------------------------------
    echo ""
    echo "Test 2: Verify file records in database"
    FILE_COUNT=$($SKRAAK sql --db "$DB_PATH" "SELECT COUNT(*) as cnt FROM file WHERE cluster_id = '$CLUSTER_ID' AND active = true" 2>/dev/null | jq -r '.rows[0].cnt')
    if [ "$FILE_COUNT" = "2" ]; then
    echo -e "${GREEN}✓${NC} Found 2 file records for cluster"
    ((TESTS_RUN++)) || true; ((TESTS_PASSED++)) || true
    else
    echo -e "${RED}✗${NC} Expected 2 file records, got $FILE_COUNT"
    ((TESTS_RUN++)) || true; ((TESTS_FAILED++)) || true
    fi
    # -------------------------------------------------------
    # Test 3: DB state - file_dataset junction records
    # -------------------------------------------------------
    echo ""
    echo "Test 3: Verify file_dataset junction records"
    FD_COUNT=$($SKRAAK sql --db "$DB_PATH" "SELECT COUNT(*) as cnt FROM file_dataset WHERE dataset_id = '$DATASET_ID'" 2>/dev/null | jq -r '.rows[0].cnt')
    if [ "$FD_COUNT" = "2" ]; then
    echo -e "${GREEN}✓${NC} Found 2 file_dataset records"
    ((TESTS_RUN++)) || true; ((TESTS_PASSED++)) || true
    else
    echo -e "${RED}✗${NC} Expected 2 file_dataset records, got $FD_COUNT"
    ((TESTS_RUN++)) || true; ((TESTS_FAILED++)) || true
    fi
    # -------------------------------------------------------
    # Test 4: DB state - astronomical data computed
    # -------------------------------------------------------
    echo ""
    echo "Test 4: Verify astronomical data was computed"
    ASTRO_RESULT=$($SKRAAK sql --db "$DB_PATH" "SELECT maybe_solar_night, maybe_civil_night, moon_phase FROM file WHERE cluster_id = '$CLUSTER_ID' AND active = true LIMIT 1" 2>/dev/null)
    SOLAR_NIGHT=$(echo "$ASTRO_RESULT" | jq -r '.rows[0].maybe_solar_night')
    MOON_PHASE=$(echo "$ASTRO_RESULT" | jq -r '.rows[0].moon_phase')
    if [ "$SOLAR_NIGHT" != "null" ] && [ "$MOON_PHASE" != "null" ]; then
    echo -e "${GREEN}✓${NC} Astronomical data present (solar_night=$SOLAR_NIGHT, moon_phase=$MOON_PHASE)"
    ((TESTS_RUN++)) || true; ((TESTS_PASSED++)) || true
    else
    echo -e "${RED}✗${NC} Astronomical data missing"
    echo "$ASTRO_RESULT" | head -5
    ((TESTS_RUN++)) || true; ((TESTS_FAILED++)) || true
    fi
    # -------------------------------------------------------
    # Test 5: DB state - duration correct
    # -------------------------------------------------------
    echo ""
    echo "Test 5: Verify file duration is correct"
    # Two files were imported (1 s and 2 s); order by duration so the 1 s file is the one checked
    DURATION=$($SKRAAK sql --db "$DB_PATH" "SELECT duration FROM file WHERE cluster_id = '$CLUSTER_ID' AND active = true ORDER BY duration LIMIT 1" 2>/dev/null | jq -r '.rows[0].duration')
    # Duration should be approximately 1.0 (within 0.1 tolerance)
    if [ "$DURATION" != "null" ] && [ "$DURATION" != "" ]; then
    # Use awk for float comparison
    APPROX_OK=$(echo "$DURATION" | awk '{if ($1 > 0.9 && $1 < 1.1) print "yes"; else print "no"}')
    if [ "$APPROX_OK" = "yes" ]; then
    echo -e "${GREEN}✓${NC} Duration = $DURATION (≈ 1.0s)"
    ((TESTS_RUN++)) || true; ((TESTS_PASSED++)) || true
    else
    echo -e "${RED}✗${NC} Duration = $DURATION (expected ≈ 1.0)"
    ((TESTS_RUN++)) || true; ((TESTS_FAILED++)) || true
    fi
    else
    echo -e "${RED}✗${NC} Duration not found"
    ((TESTS_RUN++)) || true; ((TESTS_FAILED++)) || true
    fi
    # -------------------------------------------------------
    # Test 6: EnsureClusterPath - cluster path set after import
    # -------------------------------------------------------
    echo ""
    echo "Test 6: Verify cluster path was set (EnsureClusterPath)"
    CLUSTER_PATH=$($SKRAAK sql --db "$DB_PATH" "SELECT path FROM cluster WHERE id = '$CLUSTER_ID'" 2>/dev/null | jq -r '.rows[0].path')
    if [ -n "$CLUSTER_PATH" ] && [ "$CLUSTER_PATH" != "null" ]; then
    echo -e "${GREEN}✓${NC} Cluster path set: $CLUSTER_PATH"
    ((TESTS_RUN++)) || true; ((TESTS_PASSED++)) || true
    else
    echo -e "${RED}✗${NC} Cluster path not set"
    ((TESTS_RUN++)) || true; ((TESTS_FAILED++)) || true
    fi
    # -------------------------------------------------------
    # Test 7: Duplicate detection - re-import should skip
    # -------------------------------------------------------
    echo ""
    echo "Test 7: Duplicate detection (re-import same folder)"
    # Need a new cluster to re-import into (hash dedup is global)
    CLUSTER2_RESULT=$($SKRAAK create cluster --db "$DB_PATH" --dataset "$DATASET_ID" --location "$LOCATION_ID" --name "Test Cluster 2" --sample-rate 16000 2>/dev/null)
    CLUSTER2_ID=$(echo "$CLUSTER2_RESULT" | jq -r '.cluster.id // empty')
    RESULT2=$($SKRAAK import folder --db "$DB_PATH" --dataset "$DATASET_ID" --location "$LOCATION_ID" --cluster "$CLUSTER2_ID" --folder "$WAV_DIR" --recursive=false 2>&1)
    JSON2=$(extract_json "$RESULT2")
    SKIPPED=$(echo "$JSON2" | jq -r '.summary.skipped_files // 0')
    IMPORTED2=$(echo "$JSON2" | jq -r '.summary.imported_files // 0')
    if [ "$SKIPPED" = "2" ] && [ "$IMPORTED2" = "0" ]; then
    echo -e "${GREEN}✓${NC} Skipped 2 duplicates, imported 0 new"
    ((TESTS_RUN++)) || true; ((TESTS_PASSED++)) || true
    else
    echo -e "${RED}✗${NC} Expected skipped=2 imported=0, got skipped=$SKIPPED imported=$IMPORTED2"
    echo "$RESULT2" | head -10
    ((TESTS_RUN++)) || true; ((TESTS_FAILED++)) || true
    fi
    # -------------------------------------------------------
    # Test 8: Empty folder - no WAV files
    # -------------------------------------------------------
    echo ""
    echo "Test 8: Empty folder (no WAV files)"
    EMPTY_DIR="/tmp/skraak_empty_test_$$"
    mkdir -p "$EMPTY_DIR"
    CLUSTER3_RESULT=$($SKRAAK create cluster --db "$DB_PATH" --dataset "$DATASET_ID" --location "$LOCATION_ID" --name "Test Cluster 3" --sample-rate 16000 2>/dev/null)
    CLUSTER3_ID=$(echo "$CLUSTER3_RESULT" | jq -r '.cluster.id // empty')
    RESULT3=$($SKRAAK import folder --db "$DB_PATH" --dataset "$DATASET_ID" --location "$LOCATION_ID" --cluster "$CLUSTER3_ID" --folder "$EMPTY_DIR" 2>&1)
    JSON3=$(extract_json "$RESULT3")
    TOTAL_FILES=$(echo "$JSON3" | jq -r '.summary.total_files // empty')
    if [ "$TOTAL_FILES" = "0" ]; then
    echo -e "${GREEN}✓${NC} Empty folder returns total_files=0"
    ((TESTS_RUN++)) || true; ((TESTS_PASSED++)) || true
    else
    echo -e "${RED}✗${NC} Expected total_files=0, got $TOTAL_FILES"
    echo "$RESULT3" | head -5
    ((TESTS_RUN++)) || true; ((TESTS_FAILED++)) || true
    fi
    rm -rf "$EMPTY_DIR"
    # -------------------------------------------------------
    # Test 9: Invalid location ID
    # -------------------------------------------------------
    echo ""
    echo "Test 9: Invalid location ID (should fail)"
    CLUSTER4_RESULT=$($SKRAAK create cluster --db "$DB_PATH" --dataset "$DATASET_ID" --location "$LOCATION_ID" --name "Test Cluster 4" --sample-rate 16000 2>/dev/null)
    CLUSTER4_ID=$(echo "$CLUSTER4_RESULT" | jq -r '.cluster.id // empty')
    RESULT4=$($SKRAAK import folder --db "$DB_PATH" --dataset "$DATASET_ID" --location "INVALID_ID_123" --cluster "$CLUSTER4_ID" --folder "$WAV_DIR" 2>&1 || true)
    if echo "$RESULT4" | grep -qi "error\|failed\|not found"; then
    echo -e "${GREEN}✓${NC} Correctly rejected invalid location ID"
    ((TESTS_RUN++)) || true; ((TESTS_PASSED++)) || true
    else
    echo -e "${RED}✗${NC} Should have rejected invalid location ID"
    echo "$RESULT4" | head -5
    ((TESTS_RUN++)) || true; ((TESTS_FAILED++)) || true
    fi
    # -------------------------------------------------------
    # Test 10: Non-existent folder
    # -------------------------------------------------------
    echo ""
    echo "Test 10: Non-existent folder (should fail)"
    RESULT5=$($SKRAAK import folder --db "$DB_PATH" --dataset "$DATASET_ID" --location "$LOCATION_ID" --cluster "$CLUSTER_ID" --folder /nonexistent/path 2>&1 || true)
    if echo "$RESULT5" | grep -qi "error\|not accessible\|not found\|no such"; then
    echo -e "${GREEN}✓${NC} Correctly rejected non-existent folder"
    ((TESTS_RUN++)) || true; ((TESTS_PASSED++)) || true
    else
    echo -e "${RED}✗${NC} Should have rejected non-existent folder"
    echo "$RESULT5" | head -5
    ((TESTS_RUN++)) || true; ((TESTS_FAILED++)) || true
    fi
    # -------------------------------------------------------
    # Test 11: Recursive vs non-recursive
    # -------------------------------------------------------
    echo ""
    echo "Test 11: Recursive scan finds WAV in subfolder"
    mkdir -p "$WAV_DIR/subfolder"
    generate_wav "$WAV_DIR/subfolder/nested_file.wav" 1 16000
    CLUSTER5_RESULT=$($SKRAAK create cluster --db "$DB_PATH" --dataset "$DATASET_ID" --location "$LOCATION_ID" --name "Test Cluster 5" --sample-rate 16000 2>/dev/null)
    CLUSTER5_ID=$(echo "$CLUSTER5_RESULT" | jq -r '.cluster.id // empty')
    # The nested file has the same hash as test_recording_01.wav (identical 1 s of silence), so it will be skipped
    # But total_files should still be 3 (2 top-level + 1 nested)
    RESULT6=$($SKRAAK import folder --db "$DB_PATH" --dataset "$DATASET_ID" --location "$LOCATION_ID" --cluster "$CLUSTER5_ID" --folder "$WAV_DIR" --recursive=true 2>&1)
    JSON6=$(extract_json "$RESULT6")
    TOTAL=$(echo "$JSON6" | jq -r '.summary.total_files // empty')
    if [ "$TOTAL" = "3" ]; then
    echo -e "${GREEN}✓${NC} Recursive scan found 3 WAV files"
    ((TESTS_RUN++)) || true; ((TESTS_PASSED++)) || true
    else
    echo -e "${RED}✗${NC} Expected total_files=3, got $TOTAL"
    echo "$RESULT6" | head -5
    ((TESTS_RUN++)) || true; ((TESTS_FAILED++)) || true
    fi
    # Cleanup
    rm -rf "$WAV_DIR"
    echo ""
    print_summary
  • edit in CLAUDE.md at line 108
    [3.1197511]
    [3.1197511]
    ./test_cluster_import.sh # cluster_import.go integration
    ./test_mapping_validation.sh # ValidateMappingAgainstDB integration
  • edit in CLAUDE.md at line 115
    [3.1197641]
    [3.1197641]
    ### Untested Code — Intentional Skips
    | File | Why Skipped |
    |------|------------|
    | `utils/audio_player.go` | Thin wrapper over `oto` (requires audio hardware). No testable logic without hardware; `Resample` already covered in `resample_test.go` |
  • edit in CHANGELOG.md at line 4
    [3.1198010]
    [3.1198010]
    ## [2026-05-01] Add integration tests for cluster_import and mapping validation
    Added shell script integration tests covering previously untested code paths
    in `utils/cluster_import.go` and `utils/mapping.go`.
    **New files:**
    - `shell_scripts/test_cluster_import.sh` — 11 tests exercising ImportCluster,
    GetLocationData, EnsureClusterPath, batchProcessFiles, insertClusterFiles
    - `shell_scripts/test_mapping_validation.sh` — 6 tests exercising
    ValidateMappingAgainstDB via `import segments` CLI
    **Infrastructure:**
    - Added `generate_wav()` helper to `test_lib.sh` (uses python3 to create
    valid minimal WAV files for test fixtures)
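As a rough sanity check on fixtures produced by `generate_wav()`, the expected file size can be computed from the header layout it writes: 44 header bytes followed by 2 bytes per sample for mono 16-bit PCM. The sketch below is illustrative only; `wav_expected_size` is a hypothetical helper name and is not part of `test_lib.sh`.

```shell
#!/bin/bash
# Hypothetical helper (not in test_lib.sh): expected size in bytes of a
# mono, 16-bit PCM WAV file with a 44-byte header.
# Usage: wav_expected_size [duration_seconds] [sample_rate]
wav_expected_size() {
    local duration_sec="${1:-1}"
    local sample_rate="${2:-16000}"
    # 44 header bytes + (samples * 2 bytes per 16-bit sample * 1 channel)
    echo $(( 44 + duration_sec * sample_rate * 2 ))
}

wav_expected_size 1 16000   # prints 32044
wav_expected_size 2 16000   # prints 64044
```

A test script could compare this number against the on-disk size of a generated fixture before importing it, assuming the fixture was written with the same mono 16-bit layout.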
  • edit in CHANGELOG.md at line 20
    [3.1198011]
    [3.1198011]
    **Bug fix:**
    - `ValidateMappingAgainstDB` now skips `__NEGATIVE__` and `__IGNORE__`
    sentinels when building the set of species to validate against the DB.
    Previously these sentinels were incorrectly queried in the species table,
    causing a false validation error.
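The sentinel-skipping behaviour described above can be sketched in shell for illustration. This is not the Go implementation in `utils/mapping.go`; `species_to_validate` and the two variable names are hypothetical, and only the sentinel strings themselves come from this entry.

```shell
#!/bin/bash
# Sentinel strings as named in this changelog entry.
MAPPING_NEGATIVE="__NEGATIVE__"
MAPPING_IGNORE="__IGNORE__"

# Hypothetical helper: print only the mapped species names that would
# need a DB lookup, skipping the two sentinels.
species_to_validate() {
    local species
    for species in "$@"; do
        if [ "$species" = "$MAPPING_NEGATIVE" ] || [ "$species" = "$MAPPING_IGNORE" ]; then
            continue  # sentinels are never looked up in the DB
        fi
        echo "$species"
    done
}

# Prints "Roroa" and "PhantomSpecies", one per line.
species_to_validate "Roroa" "__NEGATIVE__" "PhantomSpecies" "__IGNORE__"
```

Filtering before the lookup means a mapping such as `{"Noise": {"species": "__NEGATIVE__"}}` never reaches the species query at all, which is the case Tests 5 and 6 in `test_mapping_validation.sh` exercise.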
    **Skipped:**
    - `utils/audio_player.go` — thin `oto` wrapper requiring audio hardware;
    no testable logic independent of the external package. Documented in
    CLAUDE.md as intentional skip.