---
name: dataset-report
description: Generate a report showing dataset breakdown - locations, file counts per location and per dataset - handling both structured and unstructured dataset types
---

# Dataset Report

Generate a summary report of all active datasets showing location counts, file totals, and files per location.

## When to Use

When the user asks for a dataset overview, summary, breakdown, or report.

## Query

Structured datasets link files through `location > cluster > file`. Unstructured datasets link files through the `file_dataset` junction table with no location hierarchy.

```sql
WITH structured AS (
    SELECT
        d.name AS dataset,
        d.type,
        COUNT(DISTINCT l.id) AS locations,
        COUNT(f.id) AS total_files
    FROM dataset d
    LEFT JOIN location l ON d.id = l.dataset_id AND l.active = true
    LEFT JOIN cluster c ON l.id = c.location_id AND c.active = true
    LEFT JOIN file f ON c.id = f.cluster_id AND f.active = true
    WHERE d.active = true AND d.type != 'unstructured'
    GROUP BY d.name, d.type
),
unstructured AS (
    SELECT
        d.name AS dataset,
        d.type,
        0 AS locations,
        COUNT(fd.file_id) AS total_files
    FROM dataset d
    LEFT JOIN file_dataset fd ON d.id = fd.dataset_id
    LEFT JOIN file f ON fd.file_id = f.id AND f.active = true
    WHERE d.active = true AND d.type = 'unstructured'
    GROUP BY d.name, d.type
),
combined AS (
    SELECT * FROM structured
    UNION ALL
    SELECT * FROM unstructured
)
SELECT
    dataset,
    type,
    locations,
    total_files,
    CASE WHEN locations > 0
         THEN ROUND(total_files::DECIMAL / locations, 0)
         ELSE NULL END AS files_per_location
FROM combined
ORDER BY total_files DESC
```

## Output

Present results as a markdown table with a summary line showing totals.

- Use `-` for files_per_location on unstructured datasets (no location hierarchy)
- Format large numbers with commas for readability
- Include dataset type column