---
name: bulk-import-setup
description: Set up bulk import for acoustic monitoring datasets - fuzzy match existing location NAMES, create new ones with GPS parsing if log.txt present in folder, and generate import CSV with comprehensive logging
---
# Bulk Import Setup for Acoustic Monitoring Datasets
This skill helps you prepare acoustic monitoring datasets for bulk import by analyzing locations, matching existing records, creating new locations with optional GPS parsing, and generating import CSV files.
## When to Use This Skill
Use this skill when the user needs to:
- Import a batch of acoustic recordings organized in location folders
- Create multiple location records for a dataset
- Match folder names to existing database locations (fuzzy matching)
- Parse GPS coordinates from log files
- Generate CSV files for bulk file import
## Workflow Overview
1. **Analyze** directory structure and existing locations
2. **Fuzzy match** folder names to database locations (NAME ONLY - never GPS!)
3. **Parse GPS log.txt files** from folders
4. **Prepare CSV** with date-range cluster names and file counts
5. **Check with user** before creating any new locations
6. **Create locations** (if confirmed) with GPS or fallback coordinates
7. **Update existing locations** with GPS data (if they have assumed coords)
8. **Update CSV** with final location IDs
9. **Verify** and provide bulk import command
## Critical Rules
### 1. Match on Names, NEVER on GPS Coordinates
**Never match on GPS coordinates!** Recorders can be physically close but represent different locations.
```python
# ❌ WRONG - Don't compare GPS distances
if abs(loc1_lat - loc2_lat) < 0.001:  # NO!
    return "match"
# ✅ CORRECT - Only compare normalized names
normalized_name1 = normalize_location_name(folder_name)
normalized_name2 = normalize_location_name(existing_name)
score = SequenceMatcher(None, normalized_name1, normalized_name2).ratio()
```
### 2. GPS Log Files are INSIDE Folders
**Parse log files by folder structure**, not by separate log file list.
```python
# ✅ CORRECT - Log file is inside the folder
folder_path = Path("/path/to/mok_bl76/")
log_matches = list(folder_path.glob("*log.txt"))  # Inside the folder!
log_path = log_matches[0] if log_matches else None
# Extract folder name from path for mapping
folder_name = Path(folder_path).name # "mok_bl76"
gps_data[folder_name] = parse_gps_log(log_path)
```
### 3. Update Matched Locations with GPS Data
If fuzzy matching finds an existing location with "assumed" coordinates, **update it** with parsed GPS data when available.
```python
if matched_location:
    desc = matched_location['description']
    # Check if location has assumed Takaka coordinates
    if "assumed" in desc or "Takaka" in desc or "pending GPS" in desc:
        # We have GPS data - UPDATE the location!
        if gps_data:
            update_location(matched_location['id'], gps_data['lat'], gps_data['lon'])
```
### 4. Cluster Names = Date Ranges (NOT Location Names!)
**Cluster names represent time periods**, not locations.
```python
# ❌ WRONG - Using location name
cluster_name = "mok_bl76"
# ✅ CORRECT - Using date range from files
first_file = "mok_bl76_20250422_173008.wav"
last_file = "mok_bl76_20250503_064508.wav"
cluster_name = "2025-04-22 to 2025-05-03"
```
### 5. CSV Must Have 6 Columns (Including file_count)
The bulk import expects **exactly 6 columns**:
```csv
location_name,location_id,directory_path,date_range,sample_rate,file_count
mok_bl11,HA4TDv3DlhjX,/media/david/Misc-2/Manu o Kahurangi kiwi survey (3)/Data/K3_mokbl_boulder lake track/mok_bl11,2024-12,8000,68
```
**Missing file_count will cause**: `Error: CSV row 2 has insufficient columns (expected 6, got 5)`
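A quick pre-flight check catches this before the import runs. This is a sketch; it uses Python's `csv` module purely for reading, and `validate_import_csv` is a helper name introduced here, not part of the skraak tooling:

```python
import csv

EXPECTED_COLUMNS = 6  # location_name, location_id, directory_path, date_range, sample_rate, file_count

def validate_import_csv(csv_path: str) -> list:
    """Return a list of error strings; an empty list means the CSV passes the check."""
    errors = []
    with open(csv_path, newline='') as f:
        for row_num, row in enumerate(csv.reader(f), start=1):
            if len(row) != EXPECTED_COLUMNS:
                errors.append(f"Row {row_num} has {len(row)} columns (expected {EXPECTED_COLUMNS})")
            elif row_num > 1 and not row[5].isdigit():
                errors.append(f"Row {row_num}: file_count '{row[5]}' is not an integer")
    return errors
```

Run it right after generating the CSV, before handing the user the bulk import command.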
## Step 1: Analyze Directory Structure
First, understand the organization:
```python
# Key questions to answer:
# - How many location folders?
# - Naming patterns (dash/underscore)?
# - GPS log files present?
# - File counts per location?
# - Date formats in filenames?
```
**Check for:**
- Log files: `LOCATION-log.txt` or `log.txt`
- GPS data format: `GPS (lat,long): -40.123456,172.123456`
- Filename date patterns: `YYYYMMDD` or `DDMMYY` or `YYMMDD`
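These patterns can be probed up front with a couple of regexes. A sketch: the GPS pattern matches the log format shown above; the filename pattern is an assumption covering both 8-digit and 6-digit date tokens at the start of, or after an underscore in, the filename:

```python
import re

# GPS line as it appears in the log files, e.g. "GPS (lat,long): -40.123456,172.123456"
GPS_RE = re.compile(r'GPS \(lat,long\): (-?\d+\.\d+),(-?\d+\.\d+)')

# Date token in filenames: 8 digits (YYYYMMDD) or 6 digits (DDMMYY / YYMMDD)
DATE_RE = re.compile(r'(?:^|_)(\d{6}|\d{8})_')

def probe_gps_line(line: str):
    """Return (lat, lon) floats if the line carries a GPS fix, else None."""
    m = GPS_RE.search(line)
    return (float(m.group(1)), float(m.group(2))) if m else None
```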
## Step 2: Fuzzy Match Existing Locations
**IMPORTANT**: Always query existing locations first to avoid duplicates!
### Fuzzy Matching Rules
**CRITICAL: Use strict threshold (0.90) to avoid false matches!**
```python
import re
from difflib import SequenceMatcher
from typing import Dict, List, Optional

def normalize_location_name(name: str) -> str:
    """Normalize location name for fuzzy matching."""
    name = name.lower()
    name = re.sub(r'[_\-\s]+', '', name)     # Remove underscores, hyphens, spaces
    name = re.sub(r'(low|high)$', '', name)  # Remove Low/High suffix
    return name

def fuzzy_match_score(s1: str, s2: str) -> float:
    """Calculate fuzzy match score between two strings."""
    return SequenceMatcher(None, normalize_location_name(s1), normalize_location_name(s2)).ratio()

def fuzzy_match_location(folder_name: str, existing_locations: List[Dict]) -> Optional[Dict]:
    """
    Fuzzy match folder name to existing location BY NAME ONLY.
    IMPORTANT:
    - Returns match ONLY if similarity > 0.90
    - NEVER compares GPS coordinates (locations can be close but different!)
    This threshold prevents false matches like:
    - "mok_bl6P" matching to "mok_bl16" (score: 0.86) ✗ Different locations!
    - "mok_tuna51" matching to "mok_tuna31" (score: 0.89) ✗ Different locations!
    But allows:
    - "mok_bl6P_Low" matching to "mok_bl6P" (score: 1.00) ✓ Same location
    """
    normalized_folder = normalize_location_name(folder_name)
    best_match = None
    best_score = 0.0
    for loc in existing_locations:
        normalized_loc = normalize_location_name(loc['name'])
        score = SequenceMatcher(None, normalized_folder, normalized_loc).ratio()
        if score > best_score:
            best_score = score
            best_match = loc
    # Strict threshold - only match if very similar
    if best_score > 0.90:
        print(f"  Matched '{folder_name}' -> '{best_match['name']}' (score: {best_score:.3f})")
        # Check if matched location needs GPS update
        if 'description' in best_match:
            desc = best_match['description'].lower()
            if 'assumed' in desc or 'takaka' in desc or 'pending gps' in desc:
                print("  ⚠️ Location has assumed coords - check for GPS data to update")
        return best_match
    return None
```
**Why 0.90 threshold?**
- Too low (0.80-0.85): False matches → wrong location assignments
- 0.90: Only matches very similar names (case/punctuation differences)
- Example: `mok_bl6P_Low` (score: 1.00) ✓ vs `mok_bl16` (score: 0.86) ✗
### Query Existing Locations
```python
cmd = [
    SKRAAK_BIN, "sql",
    "--db", db_path,
    f"SELECT name, id FROM location WHERE dataset_id = '{dataset_id}' AND active = true",
    "--limit", "500"
]
```
**Note**: If the production DB is locked, ask whether to use the backup database for querying, but note that locations will be created in production later.
**Note**: Don't forget to check sample rates on a few files using the skraak metadata command, or sox.
## Step 3: Prepare CSV with Analysis
**Create preparation script** that:
1. Matches folders to existing locations
2. Identifies which locations need to be created
3. Analyzes file metadata (count, sample rate, date range)
4. Generates CSV with placeholder IDs for new locations
5. Writes comprehensive log file
### CSV Format
**CRITICAL: Must have exactly 6 columns with date-range cluster names!**
```csv
location_name,location_id,directory_path,date_range,sample_rate,file_count
mok_bl11,HA4TDv3DlhjX,/media/david/Misc-2/Manu o Kahurangi kiwi survey (3)/Data/K3_mokbl_boulder lake track/mok_bl11,2024-12,8000,68
```
**Key points:**
- `location_name`: Folder name (used for matching and reference)
- `location_id`: Actual 12-character nanoid, or `<CREATE:name>` placeholder for new locations
- `directory_path`: Full absolute path to the folder
- `date_range`: **Date range** (e.g., "2025-04-22 to 2025-05-03") - NEVER a location name!
- `sample_rate`: Integer (e.g., 8000, 32000)
- `file_count`: Integer count of WAV files in folder
### Log File Format
Create detailed log showing:
- Each location processed
- Match status (existing or new)
- File counts and metadata
- Summary statistics
- Next steps
```python
LOG_FILE = BASE_DIR / "import_log.txt"
with open(LOG_FILE, 'w') as log_file:
    log(f"Processing: {folder_name}", log_file)
    if matched:
        log(f"  ✓ MATCH: {folder} → {existing} (ID: {id})", log_file)
    else:
        log(f"  + NEW: {folder} (will create)", log_file)
    log(f"  Files: {count}, Sample Rate: {rate} Hz, Date: {date}", log_file)
```
## Step 4: Check with User Before Creating Locations
**CRITICAL**: Never create locations without user confirmation!
Present summary and ask:
```
PLAN READY - AWAITING CONFIRMATION
===================================
Matched existing: 17
Need to create: 5
Total files: 8,003
Locations to be created:
- new-location-1 (420 files)
- new-location-2 (350 files)
...
Shall I proceed to create these 5 new locations?
```
**Wait for user approval** before executing any location creation!
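Building that summary from the match results is mechanical. A sketch, where `matched` and `to_create` are hypothetical dicts mapping location name to file count produced by the preparation step:

```python
def format_plan(matched: dict, to_create: dict) -> str:
    """Build the confirmation prompt shown to the user before any writes."""
    total_files = sum(matched.values()) + sum(to_create.values())
    lines = [
        "PLAN READY - AWAITING CONFIRMATION",
        "===================================",
        f"Matched existing: {len(matched)}",
        f"Need to create: {len(to_create)}",
        f"Total files: {total_files:,}",
    ]
    if to_create:
        lines.append("Locations to be created:")
        lines += [f"  - {name} ({count} files)" for name, count in sorted(to_create.items())]
        lines.append(f"Shall I proceed to create these {len(to_create)} new locations?")
    return "\n".join(lines)
```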
## Step 5: Create Locations (If Confirmed)
### GPS Parsing (If Available)
```python
def parse_gps_from_log(log_file: Path) -> Tuple[Optional[float], Optional[float], int]:
    """
    Parse and average GPS coordinates from log file.
    IMPORTANT: Use errors='ignore' to handle encoding issues!
    Some log files contain non-UTF-8 bytes that will crash without this.
    """
    if not log_file.exists():
        return None, None, 0
    gps_pattern = re.compile(r'GPS \(lat,long\): (-?\d+\.\d+),(-?\d+\.\d+)')
    latitudes = []
    longitudes = []
    # CRITICAL: errors='ignore' prevents UnicodeDecodeError
    with open(log_file, 'r', encoding='utf-8', errors='ignore') as f:
        for line in f:
            match = gps_pattern.search(line)
            if match:
                latitudes.append(float(match.group(1)))
                longitudes.append(float(match.group(2)))
    if latitudes:
        avg_lat = sum(latitudes) / len(latitudes)
        avg_lon = sum(longitudes) / len(longitudes)
        return (avg_lat, avg_lon, len(latitudes))
    return None, None, 0
```
### Location Creation
```python
# Use CLI tool (not MCP) for scripting
cmd = [
    "./skraak", "create", "location",
    "--db", db_path,
    "--dataset", dataset_id,
    "--name", location_name,
    "--lat", str(latitude),
    "--lon", str(longitude),
    "--timezone", "Pacific/Auckland",  # NZ default
    "--description", description
]
result = subprocess.run(cmd, capture_output=True, text=True, check=True)
response = json.loads(result.stdout)
location_id = response["location"]["id"]
```
### Description Format
**With GPS:**
```python
description = f"Averaged from {fix_count} GPS readings"
# Example: "Averaged from 14 GPS readings"
```
**Without GPS (fallback to Takaka if Kahurangi data, Te Anau if Fiordland data):**
```python
description = "Location assumed to be Takaka pending GPS data"
# Note: Takaka coords are -40.85085, 172.80703
```
**Generic (user preference - if provided):**
```python
description = "Friends of Cobb - Assumed location (Takaka) - no GPS data available"
```
## Step 6: File Metadata Analysis
### Count WAV Files (RECURSIVE!)
**CRITICAL: Always count recursively!** Files may be organized in daily subfolders.
```python
def count_wav_files(folder_path: str) -> int:
    """
    Count WAV files recursively (searches all subdirectories).
    IMPORTANT: Don't use glob("*.wav") - it only searches top level!
    Files are often organized in daily subfolders:
        location/
            2025-03-08/
                file1.wav
                file2.wav
            2025-03-09/
                file3.wav
    """
    count = 0
    for root, dirs, files in os.walk(folder_path):
        for file in files:
            if file.lower().endswith('.wav'):
                count += 1
    return count
```
### Extract Sample Rate (RECURSIVE!)
```python
def get_sample_rate(folder_path: str) -> int:
    """
    Get sample rate from first WAV file (searches recursively).
    IMPORTANT: Search recursively since files may be in subfolders!
    """
    for root, dirs, files in os.walk(folder_path):
        for file in files:
            if file.lower().endswith('.wav'):
                file_path = os.path.join(root, file)
                # Read WAV header (sample rate at bytes 24-27)
                try:
                    with open(file_path, 'rb') as f:
                        f.seek(24)
                        return int.from_bytes(f.read(4), byteorder='little')
                except OSError:
                    continue
    return 8000  # Default fallback
```
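As a cross-check on the raw-header read, the stdlib `wave` module reports the same value and also validates the RIFF header. A sketch (`sample_rate_via_wave` is a helper name introduced here); note it raises on non-RIFF or compressed files, so keep the byte-level read as the workhorse:

```python
import wave

def sample_rate_via_wave(wav_path: str) -> int:
    """Read the sample rate via the stdlib wave module (validates the header)."""
    with wave.open(wav_path, 'rb') as w:
        return w.getframerate()
```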
### Parse Date Range (RECURSIVE!)
**Support both formats:**
```python
def extract_date_from_filename(filename: str) -> Optional[str]:
    """Extract date from filename like mok_bl6P_20250308_200003.wav"""
    match = re.search(r'(\d{8})_', filename)
    if match:
        date_str = match.group(1)
        # Convert YYYYMMDD to YYYY-MM
        return f"{date_str[:4]}-{date_str[4:6]}"
    return None

def determine_date_range(folder_path: str) -> str:
    """
    Determine date range from WAV filenames (searches recursively).
    IMPORTANT: Search recursively since files may be in daily subfolders!
    """
    dates = set()
    for root, dirs, files in os.walk(folder_path):
        for file in files:
            if file.lower().endswith('.wav'):
                date = extract_date_from_filename(file)
                if date:
                    dates.add(date)
    if dates:
        sorted_dates = sorted(dates)
        if len(sorted_dates) == 1:
            return sorted_dates[0]
        return f"{sorted_dates[0]} to {sorted_dates[-1]}"
    return "unknown"
```
## Step 7: Update CSV and Verify
After creating locations, update CSV with real IDs:
```python
# Update TO_BE_CREATED → real location IDs
# Regenerate CSV with all real IDs
```
### Verification Queries
```sql
-- Check all locations created
SELECT id, name, latitude, longitude, SUBSTRING(description, 1, 50)
FROM location
WHERE dataset_id = 'DATASET_ID'
AND name LIKE 'PREFIX%'
AND active = true
ORDER BY name;
-- Verify row count matches expected
```
### CSV Verification
```bash
# Should be N+1 rows (header + N locations)
wc -l import.csv
# Sum file counts
cat import.csv | tail -n +2 | cut -d',' -f6 | paste -sd+ | bc
```
## Database Safety
### Use Backup for Queries if Production Locked
```python
# Query backup database for existing locations
# Check first - expect something like "/home/david/go/src/skraak/db/backup_YYYY-MM-DD_X.duckdb"
QUERY_DB = "/home/david/go/src/skraak/db/backup_YYYY-MM-DD_X.duckdb"
# Create locations in production (when unlocked)
PROD_DB = "/home/david/go/src/skraak/db/skraak.duckdb"
```
### Always Default to Test Database
⚠️ **Unless user explicitly specifies production:**
```python
DB_PATH = "/home/david/go/src/skraak/db/test.duckdb" # Default
```
## Example Usage
### Example 1: All Locations Already Exist
**User Request:**
```
Setup bulk import for Friends of Cobb dataset:
- Dataset ID: QZ0tlUrX4Nyi
- 17 location folders
- Use Takaka coordinates (-40.85085, 172.80703)
- Generate CSV at: /path/to/foc_import.csv
```
**Your Response:**
1. Query existing locations (17 found)
2. Fuzzy match all 17 folders → all matched!
3. Generate CSV with real location IDs
4. Report: "All locations exist, CSV ready for import"
5. Provide bulk import command
### Example 2: Create New Locations (MOK Import)
**User Request:**
```
Import Manu o Kahurangi kiwi survey:
- 23 location folders
- 8 have GPS log files
- 15 need Takaka default coords
- Generate CSV at: /path/to/mok_import.csv
```
**Your Workflow:**
1. **Parse GPS logs** (8 locations):
```python
GPS_DATA = {
"mok_bl6P_Low": {"lat": -40.823070, "lon": 172.582703, "fixes": 14},
"mok_bl6T_Low": {"lat": -40.797182, "lon": 172.594582, "fixes": 14},
...
}
```
2. **Query existing locations** with strict fuzzy matching (threshold 0.90):
- Result: No matches found (all new locations)
3. **Count files recursively** (files in daily subfolders):
- Total: 15,212 WAV files across 23 locations
4. **Present plan to user**:
```
PLAN: Create 23 new locations
- 8 with GPS coordinates (averaged from log files)
- 15 with Takaka defaults
Total files: 15,212
Proceed? [yes/no]
```
5. **After approval, create all locations**:
```python
for location in locations:
if location in GPS_DATA:
gps = GPS_DATA[location]
description = f"Averaged from {gps['fixes']} GPS readings"
else:
gps = {'lat': DEFAULT_LAT, 'lon': DEFAULT_LON}
description = "Location assumed to be Takaka pending GPS data"
location_id = create_location(location, gps['lat'], gps['lon'], description)
```
6. **Generate CSV** with real location IDs
7. **Verify**:
- Query database to confirm all 23 locations created
- Verify CSV has 23 rows + header
- Provide bulk import command
## Output Summary Template
```
======================================================================
BULK IMPORT SETUP COMPLETE
======================================================================
Dataset: Dataset Name (ID: abc123)
Database: /path/to/db
Locations processed: 17/17
- Existing (matched): 17
- New (created): 0
- Failed: 0
Total files: 8,003
CSV Generated: /path/to/import.csv
Log file: /path/to/import_log.txt
Ready for bulk import!
Run:
cd /home/david/go/src/skraak
./skraak import bulk --db ./db/skraak.duckdb \
--dataset DATASET_ID \
--csv "/path/to/import.csv" \
--log /path/to/bulk_import.log
```
## All-in-One Python Script Template
For complex imports with many locations, create a comprehensive Python script:
```python
#!/usr/bin/env python3
"""
Bulk import setup: Parse GPS, create locations, generate CSV
"""
import os
import re
import json
import subprocess
from pathlib import Path
from difflib import SequenceMatcher
from typing import Dict, List, Optional, Tuple
# Configuration
DATASET_ID = "abc123xyz789"
DB_PATH = "./db/skraak.duckdb"
TIMEZONE = "Pacific/Auckland"
BASE_PATH = "/path/to/data"
DEFAULT_LAT = -40.85085
DEFAULT_LON = 172.80703
# GPS log files (if available)
GPS_LOGS = ["location1/log.txt", "location2/log.txt"]
# Location folders
FOLDERS = ["location1", "location2", ...]
def parse_gps_log(log_path: str) -> Optional[Tuple[float, float, int]]:
    """Parse GPS log and return (lat, lon, fix_count)"""
    if not os.path.exists(log_path):
        return None
    gps_pattern = re.compile(r'GPS \(lat,long\): (-?\d+\.\d+),(-?\d+\.\d+)')
    coords = []
    with open(log_path, 'r', encoding='utf-8', errors='ignore') as f:
        for line in f:
            match = gps_pattern.search(line)
            if match:
                coords.append((float(match.group(1)), float(match.group(2))))
    if coords:
        avg_lat = sum(c[0] for c in coords) / len(coords)
        avg_lon = sum(c[1] for c in coords) / len(coords)
        return (avg_lat, avg_lon, len(coords))
    return None

def count_wav_files_recursive(folder_path: str) -> int:
    """Count WAV files recursively"""
    count = 0
    for root, dirs, files in os.walk(folder_path):
        count += sum(1 for f in files if f.lower().endswith('.wav'))
    return count

def get_sample_rate_recursive(folder_path: str) -> int:
    """Get sample rate from first WAV file found"""
    for root, dirs, files in os.walk(folder_path):
        for file in files:
            if file.lower().endswith('.wav'):
                try:
                    with open(os.path.join(root, file), 'rb') as f:
                        f.seek(24)
                        return int.from_bytes(f.read(4), byteorder='little')
                except OSError:
                    continue
    return 8000

def create_location(name: str, lat: float, lon: float, desc: str) -> str:
    """Create location via CLI and return ID"""
    cmd = [
        "./skraak", "create", "location",
        "--db", DB_PATH,
        "--dataset", DATASET_ID,
        "--name", name,
        "--lat", str(lat),
        "--lon", str(lon),
        "--timezone", TIMEZONE,
        "--description", desc
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)['location']['id']

def main():
    # Parse GPS logs
    gps_data = {}
    for log_file in GPS_LOGS:
        result = parse_gps_log(os.path.join(BASE_PATH, log_file))
        if result:
            folder = log_file.split('/')[0]
            gps_data[folder] = {'lat': result[0], 'lon': result[1], 'fixes': result[2]}

    # Process locations
    location_map = {}
    for folder in FOLDERS:
        folder_path = os.path.join(BASE_PATH, folder)
        # Determine coordinates and description
        if folder in gps_data:
            lat = gps_data[folder]['lat']
            lon = gps_data[folder]['lon']
            desc = f"Averaged from {gps_data[folder]['fixes']} GPS readings"
        else:
            lat = DEFAULT_LAT
            lon = DEFAULT_LON
            desc = "Location assumed to be Takaka pending GPS data"
        # Create location
        location_id = create_location(folder, lat, lon, desc)
        # Get metadata
        location_map[folder] = {
            'id': location_id,
            'files': count_wav_files_recursive(folder_path),
            'sample_rate': get_sample_rate_recursive(folder_path)
        }

    # Generate CSV (use manual writes, not csv module, to avoid \r\n line endings)
    csv_path = os.path.join(BASE_PATH, "import.csv")
    with open(csv_path, 'w') as f:
        f.write("location_name,location_id,directory_path,date_range,sample_rate,file_count\n")
        for folder, info in location_map.items():
            f.write(f"{folder},{info['id']},{BASE_PATH}/{folder},2025-02,{info['sample_rate']},{info['files']}\n")

    print(f"✓ Created {len(location_map)} locations")
    print(f"✓ Generated CSV: {csv_path}")

if __name__ == '__main__':
    main()
```
## Common Patterns
### Date Format Detection
```python
# YYYYMMDD: 20231115_050004
# DDMMYY: 011123_050004
# YYMMDD: 231101_050004
# Heuristic: If first 4 chars > 2000, likely YYYYMMDD
# Can use variance to disambiguate YYMMDD/DDMMYY (many days, fewer years)
```
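A sketch of that heuristic (`classify_date_token` is a name introduced here; the variance check is an assumption: across a deployment, day digits vary far more than year digits):

```python
def classify_date_token(tokens: list) -> str:
    """
    Classify 6/8-digit filename date tokens as YYYYMMDD, DDMMYY, or YYMMDD.
    tokens: the date strings extracted from many filenames at one location.
    """
    first = tokens[0]
    if len(first) == 8 and int(first[:4]) > 2000:
        return "YYYYMMDD"
    # 6 digits: compare how much the leading vs trailing pair varies.
    # Day digits change often within a deployment; year digits hardly ever do.
    lead = {t[:2] for t in tokens}
    trail = {t[4:6] for t in tokens}
    return "DDMMYY" if len(lead) > len(trail) else "YYMMDD"
```

It needs tokens from several days of recordings to disambiguate the 6-digit forms; with a single day it defaults to YYMMDD.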
### Multiple Sample Rates
```python
# Don't assume uniform - check each location
# Common: 8000 Hz, 16000 Hz, 32000 Hz, 250000 Hz
```
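Rather than trusting the first file, a tally of every rate in a folder makes mixed-rate locations visible. A sketch (`tally_sample_rates` is a name introduced here; it assumes the same canonical RIFF header layout, sample rate at bytes 24-27, used elsewhere in this skill):

```python
import os
from collections import Counter

def tally_sample_rates(folder_path: str) -> Counter:
    """Count how many WAV files use each sample rate (recursive)."""
    rates = Counter()
    for root, _dirs, files in os.walk(folder_path):
        for name in files:
            if name.lower().endswith('.wav'):
                try:
                    with open(os.path.join(root, name), 'rb') as f:
                        f.seek(24)  # sample rate field in a canonical RIFF header
                        rates[int.from_bytes(f.read(4), 'little')] += 1
                except OSError:
                    pass
    return rates
```

If the Counter has more than one key, flag the location to the user before import rather than picking a rate silently.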
### Trip References in Names
```python
# Location names may include trip references:
# foc-thup-foc01 → Friends of Cobb, trip 1
# foc-thup-fof11 → Friends of Flora, trip 11
```
## Troubleshooting
### Production Database Locked
**Solution**: Use backup database for queries, create locations later:
```python
# Step 1: Query backup for matches
# Step 2: Prepare CSV with analysis
# Step 3: Check with user
# Step 4: Wait for prod unlock
# Step 5: Check prod agrees with backup
# Step 6: Create locations in prod
# Step 7: Update CSV with real IDs
```
### Ambiguous Fuzzy Matches
**Problem**: One database location matches multiple folders
**Solution**: Be conservative - create new locations for each folder
```python
# foc-thup-f could match:
# - foc-thup-foc01
# - foc-thup-fof11
# create new locations for each folder
```
### Zero-Byte Files
**Handling**: Count them but note in logs
```python
wav_files = list(directory.rglob("*.wav"))  # recursive, consistent with file counting
file_count = len(wav_files)
zero_byte_files = [f for f in wav_files if f.stat().st_size == 0]
if zero_byte_files:
log(f" ⚠ {len(zero_byte_files)} zero-byte files found", log_file)
```
## Key Lessons (From Production Imports)
### 🔴 CRITICAL: Stricter Fuzzy Matching (0.9 threshold)
**Problem:** Threshold of 0.80 caused false matches:
- `mok_bl6P` matched to `mok_bl16` (score: 0.86) → WRONG! Different locations
- `mok_tuna51` matched to `mok_tuna31` (score: 0.89) → WRONG! Different locations
**Solution:** Use 0.9 threshold:
```python
if best_score > 0.9:  # NOT 0.80!
    return best_match
```
### 🔴 CRITICAL: Always Search Recursively
**Problem:** Using `glob("*.wav")` missed files in subdirectories.
**Solution:** Use `os.walk()` everywhere:
```python
for root, dirs, files in os.walk(folder_path):  # NOT glob()!
    for file in files:
        if file.lower().endswith('.wav'):
            ...  # process file
```
**Applies to:**
- File counting
- Sample rate detection
- Date range parsing
### 🔴 CRITICAL: Handle Encoding Errors
**Problem:** Some GPS log files contain non-UTF-8 bytes → UnicodeDecodeError
**Solution:** Always use `errors='ignore'`:
```python
with open(log_file, 'r', encoding='utf-8', errors='ignore') as f:  # errors='ignore'!
    ...
```
### 🟡 IMPORTANT: Remove Suffixes in Normalization
Normalize location names to handle suffix variations:
```python
name = re.sub(r'[_\-\s]+', '', name.lower())  # Remove separators first
name = re.sub(r'(low|high)$', '', name)       # Then remove Low/High suffix
# mok_bl6P_Low → mokbl6p (for matching)
```
### 🟡 IMPORTANT: All-in-One Python Script
For complex imports (10+ locations), create a single Python script that:
1. Parses GPS logs
2. Counts files recursively
3. Gets metadata
4. Creates locations via CLI
5. Generates CSV
**Benefits:**
- One script to run - a single place to debug if something fails
- Easy to debug
- Reusable for similar imports
- Complete audit trail
## Guidelines
- Always check existing locations before creating new ones (avoid duplicates)
- Use **strict** fuzzy matching (0.9 threshold) to avoid false matches
- **Always search recursively** with `os.walk()` (files in subfolders!)
- Prepare CSV before creating locations (allows user review)
- Never create locations without explicit user confirmation
- Write comprehensive logs for debugging and records
- Support multiple date formats (YYYYMMDD, DDMMYY and YYMMDD)
- Handle missing GPS gracefully with fallback coordinates
- **Handle encoding errors** with `errors='ignore'` when reading logs
- Verify results with SQL queries and CSV checks
- Provide ready-to-run bulk import command
## Success Criteria
✅ No duplicate locations created
✅ All folders matched or new locations planned
✅ CSV has correct format with all required columns
✅ Log file provides audit trail
✅ User confirmed before any database writes
✅ Verification queries show expected results
✅ Bulk import command provided