# Disk Pressure & Automatic Eviction

Automatic partition eviction when ClickHouse local disk usage exceeds configurable watermarks.
nano continuously monitors ClickHouse disk usage and automatically drops the oldest daily partitions when local storage fills up. This prevents disk-full outages without manual intervention.
Disk pressure monitoring is only active in dual database mode (ClickHouse enabled). It runs on the elected leader node in multi-pod deployments.
## How It Works
The disk pressure service runs a check cycle on a configurable interval (default: every 60 seconds):
- Queries ClickHouse `system.disks` for current disk usage
- Classifies usage into a pressure level based on watermark thresholds
- Takes action based on the level — dropping partitions, emitting notifications, or pausing ingestion
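The cycle above can be sketched as a simple loop over pluggable hooks. This is an illustration only, not the service's actual API: `query_usage`, `classify`, and `act` are hypothetical stand-ins for the three steps.

```python
import time

def run_disk_pressure_loop(query_usage, classify, act, interval_secs=60, cycles=None):
    """Hypothetical outer loop: every interval, read disk usage,
    classify it into a pressure level, and take the matching action."""
    n = 0
    while cycles is None or n < cycles:
        usage = query_usage()    # e.g. a SELECT against system.disks
        level = classify(usage)  # "normal" | "elevated" | "critical" | "emergency"
        act(level, usage)        # drop partitions / notify / pause ingestion
        n += 1
        if cycles is None or n < cycles:
            time.sleep(interval_secs)
```

The `cycles` parameter exists only so the sketch can be run for a bounded number of iterations; the real service runs indefinitely.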
### Pressure Levels
| Level | Trigger | Behavior |
|---|---|---|
| Normal | Below high watermark | No action. Resolves any active health issue and resumes ingestion. |
| Elevated | Above high watermark (60%) | Drops oldest partitions until usage falls below the low watermark. |
| Critical | Above critical threshold (85%) | Same as elevated, plus emits a disk pressure warning notification. |
| Emergency | Above emergency threshold (90%) | Same as critical, plus optionally pauses log ingestion. |
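The mapping from usage fraction to level can be expressed as a small threshold function. The defaults below mirror the documented watermarks; the function name and signature are assumptions for illustration.

```python
def classify_pressure(usage, high=0.60, critical=0.85, emergency=0.90):
    """Map a disk usage fraction (0.0-1.0) to a pressure level,
    checking the highest threshold first."""
    if usage >= emergency:
        return "emergency"
    if usage >= critical:
        return "critical"
    if usage >= high:
        return "elevated"
    return "normal"
```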
### Partition Drop Strategy
When pressure is elevated or higher, the service drops daily partitions using a FIFO (oldest-first) strategy:
- Tables affected: `logs`, `signals`, `ingestion_errors`, `identity_observations`, `nat_candidates`
- Safety limit: Maximum 5 partitions dropped per check cycle
- Cool-down: 2-second pause between drops to let ClickHouse settle
- Target: Drops continue until usage falls below the low watermark (default 50%)
Dropped partitions cannot be recovered. If you need long-term retention, configure Storage Tiering to move data to S3 before it ages out.
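The oldest-first eviction loop can be sketched as follows. The hooks `list_partitions`, `drop_partition`, and `current_usage` are hypothetical; daily partitions are assumed to sort chronologically by name (e.g. `2024-01-01`).

```python
def evict_partitions(list_partitions, drop_partition, current_usage,
                     low_watermark=0.50, max_drops=5):
    """FIFO eviction sketch: drop the oldest daily partitions until
    usage falls below the low watermark or the per-cycle safety limit
    of max_drops is reached."""
    dropped = []
    for partition in sorted(list_partitions()):  # oldest first
        if current_usage() < low_watermark or len(dropped) >= max_drops:
            break
        drop_partition(partition)  # e.g. ALTER TABLE ... DROP PARTITION
        dropped.append(partition)
        # the real service pauses ~2s here to let ClickHouse settle
    return dropped
```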
## Configuration
All settings are configured via environment variables. The defaults are designed for production use — most deployments won't need to change them.
| Variable | Default | Description |
|---|---|---|
| `DISK_PRESSURE_CHECK_INTERVAL_SECS` | 60 | Seconds between disk usage checks |
| `DISK_PRESSURE_HIGH_WATERMARK` | 0.60 | Fraction of disk usage that triggers partition eviction |
| `DISK_PRESSURE_LOW_WATERMARK` | 0.50 | Target usage fraction — eviction stops when usage drops below this |
| `DISK_PRESSURE_CRITICAL_THRESHOLD` | 0.85 | Fraction that triggers critical-level warnings |
| `DISK_PRESSURE_EMERGENCY_THRESHOLD` | 0.90 | Fraction that triggers emergency-level warnings and optional ingestion pause |
| `DISK_PRESSURE_PAUSE_INGESTION` | false | Whether to pause log ingestion at emergency level |
Watermark values are fractions from 0.0 to 1.0 representing the share of total disk space used. For example, 0.60 means 60%.
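Because the levels build on one another, the fractions only make sense when ordered low < high < critical < emergency. A small loader that reads the documented variables and checks that ordering might look like this (the function itself is hypothetical; the variable names and defaults come from the table above):

```python
import os

def load_watermarks(env=os.environ):
    """Read watermark fractions from the environment, falling back to
    the documented defaults, and sanity-check their ordering."""
    low = float(env.get("DISK_PRESSURE_LOW_WATERMARK", "0.50"))
    high = float(env.get("DISK_PRESSURE_HIGH_WATERMARK", "0.60"))
    critical = float(env.get("DISK_PRESSURE_CRITICAL_THRESHOLD", "0.85"))
    emergency = float(env.get("DISK_PRESSURE_EMERGENCY_THRESHOLD", "0.90"))
    if not (0.0 < low < high < critical < emergency <= 1.0):
        raise ValueError("watermarks must satisfy 0 < low < high < critical < emergency <= 1")
    return {"low": low, "high": high, "critical": critical, "emergency": emergency}
```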
### Example: Conservative Settings
For environments with large disks and predictable growth:
```sh
export DISK_PRESSURE_HIGH_WATERMARK=0.75
export DISK_PRESSURE_LOW_WATERMARK=0.65
export DISK_PRESSURE_CRITICAL_THRESHOLD=0.90
export DISK_PRESSURE_EMERGENCY_THRESHOLD=0.95
```

### Example: Aggressive Settings
For small-disk environments where space is tight:
```sh
export DISK_PRESSURE_HIGH_WATERMARK=0.50
export DISK_PRESSURE_LOW_WATERMARK=0.40
export DISK_PRESSURE_CRITICAL_THRESHOLD=0.75
export DISK_PRESSURE_EMERGENCY_THRESHOLD=0.85
export DISK_PRESSURE_PAUSE_INGESTION=true
```

## Automatic Skip: Storage Tiering and ClickHouse Cloud
Disk pressure eviction is automatically skipped when storage tiering to S3-compatible storage is active. When the system detects that tiering is enabled and in active status, partition drops are bypassed entirely because:
- ClickHouse TTL rules handle data movement — cold partitions are automatically moved to S3/R2 object storage by ClickHouse itself, freeing local disk space without dropping data
- Data is preserved — customers using tiering are paying for offsite storage specifically to retain historical data, so dropping partitions would defeat the purpose
- Local disk pressure is self-correcting — as TTL moves age data off local disk, space is freed naturally
When tiering is active, the disk pressure service logs:
```
Storage tiering is active — skipping partition drops.
ClickHouse TTL rules will move data to S3/R2 automatically.
```

The same applies to ClickHouse Cloud deployments where storage is managed by the cloud provider. Since ClickHouse Cloud uses shared object storage under the hood, local disk pressure is not a concern and the eviction system has no partitions to drop.
To configure storage tiering, see Storage Tiering.
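The skip decision described above reduces to a simple gate in front of the eviction step. This is a sketch, not the service's actual control flow; `tiering_active`, `clickhouse_cloud`, `evict`, and `log` are hypothetical hooks.

```python
def maybe_evict(tiering_active, clickhouse_cloud, evict, log):
    """Skip partition drops entirely when tiering (or cloud-managed
    storage) is handling data movement; otherwise run eviction."""
    if tiering_active or clickhouse_cloud:
        log("Storage tiering is active — skipping partition drops. "
            "ClickHouse TTL rules will move data to S3/R2 automatically.")
        return []
    return evict()
```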
## Notifications
Disk pressure events generate notifications visible to all admin users in the nano UI.
### Disk Pressure Warning
Sent once per pressure episode when critical or emergency level is reached:
- Title: `Disk pressure {severity}: ClickHouse at {percentage}%`
- Link: Redirects to Settings > Retention for admin action
### Partition Dropped
Sent after each partition is dropped:
- Title: `Partition {date} dropped due to disk pressure`
- Details: Lists the 5 daily tables affected
Notifications are deduplicated — you won't receive repeated warnings for the same active pressure episode. When usage returns to normal, the health issue is automatically resolved.
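One way to picture "once per pressure episode" deduplication: track whether a warning has already been sent, and reset that flag only when the level returns to normal. A minimal sketch (the state layout and function names are assumptions):

```python
def make_warning_gate():
    """Return a callback that records at most one warning per pressure
    episode; returning to normal ends the episode and re-arms it."""
    state = {"warned": False}
    sent = []
    def on_level(level):
        if level in ("critical", "emergency"):
            if not state["warned"]:
                sent.append(f"Disk pressure {level}")
                state["warned"] = True
        elif level == "normal":
            state["warned"] = False  # episode over; health issue resolved
        return sent
    return on_level
```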
## Monitoring via API
The disk pressure status is included in the storage overview endpoint:
```sh
curl -X GET "http://localhost:3000/api/settings/storage/overview" \
  -H "Authorization: Bearer <token>"
```

The response includes a `disk_pressure` object with:
| Field | Description |
|---|---|
| `usage_fraction` | Current disk usage (0.0–1.0) |
| `total_bytes` / `used_bytes` / `free_bytes` | Absolute disk metrics |
| `level` | Current pressure level (`normal`, `elevated`, `critical`, `emergency`) |
| `estimated_retention_days` | Projected days of capacity remaining |
| `partitions_dropped` | Counter of partitions dropped since service start |
| `ingestion_paused` | Whether ingestion is currently paused |
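A minimal polling client for this endpoint, using only Python's standard library, might look like the following. The endpoint path and field names are as documented above; error handling and the base URL/token are left to the caller.

```python
import json
import urllib.request

def fetch_disk_pressure(base_url, token):
    """Fetch the storage overview and return its disk_pressure object."""
    req = urllib.request.Request(
        f"{base_url}/api/settings/storage/overview",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        overview = json.load(resp)
    return overview["disk_pressure"]

def summarize(dp):
    """One-line summary of a disk_pressure object, e.g. for alerting."""
    return (f"{dp['level']}: {dp['usage_fraction']:.0%} used, "
            f"~{dp['estimated_retention_days']} days of capacity left")
```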
## Audit Trail
All disk pressure actions are logged to the audit table:
- `partition_dropped` — Emitted when a partition is dropped, with partition date and affected tables
- `disk_pressure_critical` — Emitted when critical or emergency level is reached
These audit events are searchable via the standard search interface, providing a complete history of automatic storage management actions.
## Related Documentation
- Storage & Retention Settings — TTL configuration, database modes, and storage tiering setup
- Deployment Architecture — Infrastructure planning and disk sizing