---
name: dataset-report
description: Generate a report showing dataset breakdown - locations, file counts per location and per dataset - handling both structured and unstructured dataset types
---
# Dataset Report
Generate a summary report of all active datasets showing location counts, file totals, and files per location.
## When to Use
When the user asks for a dataset overview, summary, breakdown, or report.
## Query
Structured datasets link files through `location > cluster > file`. Unstructured datasets link files through the `file_dataset` junction table with no location hierarchy.
```sql
WITH structured AS (
SELECT
d.name AS dataset,
d.type,
COUNT(DISTINCT l.id) AS locations,
COUNT(f.id) AS total_files
FROM dataset d
LEFT JOIN location l ON d.id = l.dataset_id AND l.active = true
LEFT JOIN cluster c ON l.id = c.location_id AND c.active = true
LEFT JOIN file f ON c.id = f.cluster_id AND f.active = true
WHERE d.active = true AND d.type != 'unstructured'
GROUP BY d.name, d.type
),
unstructured AS (
SELECT
d.name AS dataset,
d.type,
0 AS locations,
COUNT(fd.file_id) AS total_files
FROM dataset d
LEFT JOIN file_dataset fd ON d.id = fd.dataset_id
LEFT JOIN file f ON fd.file_id = f.id AND f.active = true
WHERE d.active = true AND d.type = 'unstructured'
GROUP BY d.name, d.type
),
combined AS (
SELECT * FROM structured
UNION ALL
SELECT * FROM unstructured
)
SELECT
dataset,
type,
locations,
total_files,
CASE WHEN locations > 0
THEN ROUND(total_files::DECIMAL / locations, 0)
ELSE NULL END AS files_per_location
FROM combined
ORDER BY total_files DESC
```
## Output
Present results as a markdown table with a summary line showing totals.
- Use `-` for files_per_location on unstructured datasets (no location hierarchy)
- Format large numbers with commas for readability
- Include dataset type column