Storage & Retention Settings
nano provides flexible storage options for balancing performance, cost, and compliance requirements. This page covers all storage configuration options, including database modes, retention policies, and advanced storage tiering.
Storage Architecture Overview
nano supports two storage architectures:
PostgreSQL Only Mode (Legacy)
- All data stored in PostgreSQL with TimescaleDB extension
- Best for: Small to medium deployments (< 1 TB of logs)
- Advantages: Simple setup, single database to manage
- Limitations: Higher storage costs, slower queries on large datasets
Dual Database Mode (Recommended)
- Log telemetry stored in ClickHouse for optimal query performance
- Metadata stored in PostgreSQL (rules, alerts, dashboards, settings)
- Best for: Medium to large deployments (> 1 TB of logs)
- Advantages: Better performance, lower storage costs, advanced tiering options
Storage Tabs Overview
The Storage & Retention settings page contains multiple tabs based on your configuration:
ClickHouse Tab (Log Storage)
Available when dual database mode is enabled. Manages the primary log storage system.
Storage Statistics
- Total Size: Compressed size of all log data with compression ratio
- Total Logs: Number of log entries stored
- Partitions: Number of date-based partitions (typically one per day)
- Parts: Number of data files (ClickHouse's storage units)
- Date Range: Oldest and newest log timestamps
TTL Retention Configuration
- Retention Period: How long to keep logs (1-3650 days)
- Auto-deletion: ClickHouse automatically deletes expired data
- Force TTL: Manually trigger immediate deletion of expired data
TTL Behavior:
- Data is partitioned by day for efficient deletion
- TTL is applied during background merge operations
- "Force TTL Now" immediately processes all partitions
- Deleted data cannot be recovered
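The retention period presumably maps onto a ClickHouse table TTL. The following is a minimal sketch of what equivalent statements might look like, assuming a table named `logs` with a `DateTime` column named `timestamp`. The column name is an assumption, and whether nano issues these exact statements is not stated on this page. The helper only prints SQL and executes nothing.

```shell
# Hypothetical helper: print ClickHouse statements that set a day-based TTL
# and ask ClickHouse to re-evaluate the TTL on existing parts.
# Table and column names are illustrative assumptions.
ttl_sql() {
  table="$1"; column="$2"; days="$3"
  # Accept only whole numbers.
  case "$days" in
    ''|*[!0-9]*) echo "retention must be a whole number of days" >&2; return 1 ;;
  esac
  # Enforce the documented 1-3650 day range.
  if [ "$days" -lt 1 ] || [ "$days" -gt 3650 ]; then
    echo "retention must be between 1 and 3650 days" >&2
    return 1
  fi
  printf 'ALTER TABLE %s MODIFY TTL %s + INTERVAL %s DAY;\n' "$table" "$column" "$days"
  printf 'ALTER TABLE %s MATERIALIZE TTL;\n' "$table"
}

ttl_sql logs timestamp 90
```

The printed statements could be reviewed and then passed to `clickhouse-client --query`. Because deletion is permanent, printing before executing leaves room for a final check.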
PostgreSQL Tab (Metadata/Legacy)
In Dual Database Mode
Shows information about metadata storage:
- Detection rules and configurations
- Alert definitions and history
- Dashboard configurations
- User settings and preferences
- Parser configurations
- AI/ML model settings
In PostgreSQL Only Mode
Provides full storage management:
Storage Statistics:
- Total Size: Size of all tables including indexes
- Total Logs: Number of log entries
- Chunks: TimescaleDB time-based chunks
- Compressed Chunks: Number of compressed chunks
- Date Range: Data time span
Retention Policy Configuration:
- Enable/Disable: Toggle automatic retention
- Retention Period: Days to keep data (1-3650)
- Manual Execution: Force retention cleanup immediately
TimescaleDB Features:
- Automatic chunk compression after 1 day
- Background retention job execution
- Efficient time-based data deletion
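For reference, these features correspond to standard TimescaleDB policy functions. The sketch below prints the calls that would express "compress after 1 day" and "drop after N days". The hypertable name `logs` is an assumption, and compression must already be enabled on the hypertable for the compression policy to be accepted. Nothing is executed.

```shell
# Hypothetical helper: print TimescaleDB policy calls matching the settings
# described above. The hypertable name is an illustrative assumption.
timescale_policy_sql() {
  hypertable="$1"; retention_days="$2"
  case "$retention_days" in
    ''|*[!0-9]*) echo "retention must be a whole number of days" >&2; return 1 ;;
  esac
  # Compress chunks once they are older than 1 day.
  printf "SELECT add_compression_policy('%s', INTERVAL '1 day');\n" "$hypertable"
  # Drop chunks once they are older than the retention period.
  printf "SELECT add_retention_policy('%s', INTERVAL '%s days');\n" "$hypertable" "$retention_days"
}

timescale_policy_sql logs 90
```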
Storage Tiering Tab (Advanced)
Available only in dual database mode with ClickHouse. Provides cost-effective long-term storage.
Storage Tiering Deep Dive
Storage tiering automatically moves data between storage tiers based on age and, optionally, on local disk utilization.
Tier Architecture
Hot Tier (Local SSD)
- Storage: Local NVMe/SSD storage
- Performance: Fastest query response times
- Use case: Recent data requiring frequent access
- Configuration: 1-365 days (recommended: 7-60 days)
Auto-Move Behavior Options:
- TTL Only: Move data based purely on age
- 90% Full: Also move data when disk reaches 90% capacity
- 80% Full: Also move data when disk reaches 80% capacity
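The three options can be read as one age rule plus an optional disk-pressure rule. The sketch below is a simplified model of that decision, not nano's actual implementation. The function name and arguments are hypothetical, and a threshold of `0` stands in for "TTL Only".

```shell
# Decide whether a piece of data should leave the hot tier.
# Arguments: age_days hot_days disk_used_pct threshold_pct
# A threshold_pct of 0 models "TTL Only" (no disk-based trigger).
should_move_to_warm() {
  age_days="$1"; hot_days="$2"; disk_used_pct="$3"; threshold_pct="$4"
  # Age rule: always applies.
  if [ "$age_days" -ge "$hot_days" ]; then
    echo "move (age)"
    return 0
  fi
  # Disk-pressure rule: applies only when a threshold is configured.
  if [ "$threshold_pct" -gt 0 ] && [ "$disk_used_pct" -ge "$threshold_pct" ]; then
    echo "move (disk)"
    return 0
  fi
  echo "keep"
}

should_move_to_warm 45 30 50 0    # older than the hot window
should_move_to_warm 10 30 92 90   # recent data, but the disk is past 90%
should_move_to_warm 10 30 85 0    # TTL Only ignores disk usage
```

In practice a disk-pressure trigger would move the oldest data first rather than evaluate each item in isolation; the model above leaves that ordering out.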
Warm Tier (S3-backed)
- Storage: S3-compatible object storage
- Performance: Slower queries but still accessible via ClickHouse
- Use case: Historical data for investigations and compliance
- Configuration: Total retention period in days; data stays in the warm tier from the end of the hot period until this limit is reached
Cold Tier (Archive)
- Storage: S3 archive storage classes
- Performance: Export-only, not queryable
- Use case: Long-term compliance and backup
- Configuration: Optional, for data older than warm tier
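To make the boundaries concrete, the sketch below classifies data by age, assuming the warm-tier setting is the total retention period measured from ingestion. The function name and the `cold-or-expired` label are illustrative; what happens past the warm limit depends on whether a cold tier is configured.

```shell
# Classify data by age, given the hot window and the total (warm) retention.
# Arguments: age_days hot_days warm_days
tier_for_age() {
  age_days="$1"; hot_days="$2"; warm_days="$3"
  if [ "$age_days" -lt "$hot_days" ]; then
    echo "hot"
  elif [ "$age_days" -lt "$warm_days" ]; then
    echo "warm"
  else
    # Past total retention: archived if a cold tier exists, otherwise removed.
    echo "cold-or-expired"
  fi
}

tier_for_age 5 30 365
tier_for_age 120 30 365
tier_for_age 400 30 365
```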
S3 Configuration Options
Basic S3 Settings
- S3 Bucket: Target bucket for tiered data
- Region: AWS region or equivalent for other providers
- Custom Endpoint: For MinIO, Cloudflare R2, Backblaze B2
- Path-style Access: Required for MinIO and some S3-compatible services
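The path-style setting changes how the bucket name appears in request URLs. The following sketch shows the two addressing styles side by side. The helper name, the MinIO host, and the object key are made up for illustration.

```shell
# Print the URL of an object under either S3 addressing style.
# Arguments: endpoint_host bucket key style(path|virtual)
s3_object_url() {
  endpoint_host="$1"; bucket="$2"; key="$3"; style="$4"
  case "$style" in
    # Path-style: the bucket is the first path segment.
    path)    printf 'https://%s/%s/%s\n' "$endpoint_host" "$bucket" "$key" ;;
    # Virtual-hosted style: the bucket is a subdomain of the endpoint.
    virtual) printf 'https://%s.%s/%s\n' "$bucket" "$endpoint_host" "$key" ;;
    *) echo "style must be 'path' or 'virtual'" >&2; return 1 ;;
  esac
}

s3_object_url s3.us-east-1.amazonaws.com my-siem-logs data/part.bin virtual
s3_object_url minio.example.internal:9000 my-siem-logs data/part.bin path
```

Self-hosted services often lack the wildcard DNS that virtual-hosted addressing needs, which is why path-style access is typically required for them.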
Supported S3 Providers
- AWS S3: Native support, all storage classes
- MinIO: Self-hosted S3-compatible storage
- Cloudflare R2: Cost-effective alternative to S3
- Backblaze B2: Budget-friendly cloud storage
- Google Cloud Storage: S3-compatible API
- Azure Blob Storage: Through an S3-compatible gateway (Azure Blob Storage does not expose a native S3 API)
Credentials Management
- Access Key ID: S3 access credentials
- Secret Access Key: S3 secret credential, stored encrypted at rest
- Connection Testing: Verify connectivity before applying
- Credential Status: Visual indicator of configuration state
Tiering Configuration Process
1. Enable Storage Tiering
Toggle the main switch to enable S3 storage tiering functionality.
2. Configure S3 Settings
- Set bucket name and region
- Configure custom endpoint if using non-AWS provider
- Enable path-style access for MinIO
3. Set Credentials
- Enter S3 access key and secret key
- Test connection to verify configuration
- Credentials are encrypted and stored securely
4. Configure Tier Thresholds
- Hot Tier: Set days for local storage (7, 14, 30, 60 days)
- Auto-Move: Choose disk utilization trigger
- Warm Tier: Set total retention period (90, 180, 365, 730 days)
- Cold Tier: Optional archive tier for compliance
5. Apply Configuration
- Save configuration to database
- Apply to ClickHouse to activate tiering
- Monitor status and storage distribution
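When scripting these steps, it can help to validate the thresholds before sending them. The sketch below builds the tiering request body using the field names from the API Configuration section. The validation rules (hot days within 1-365, warm days greater than hot days) are inferred from this page rather than taken from the API itself, and the helper does not JSON-escape its string arguments.

```shell
# Build the tiering request body after basic sanity checks.
# Arguments: bucket region hot_days warm_days
tiering_payload() {
  bucket="$1"; region="$2"; hot_days="$3"; warm_days="$4"
  for n in "$hot_days" "$warm_days"; do
    case "$n" in
      ''|*[!0-9]*) echo "day values must be whole numbers" >&2; return 1 ;;
    esac
  done
  if [ "$hot_days" -lt 1 ] || [ "$hot_days" -gt 365 ]; then
    echo "hot_days must be between 1 and 365" >&2; return 1
  fi
  # warm_days is the total retention period, so it must exceed hot_days.
  if [ "$warm_days" -le "$hot_days" ]; then
    echo "warm_days must be greater than hot_days" >&2; return 1
  fi
  printf '{"enabled": true, "s3_bucket": "%s", "s3_region": "%s", "hot_days": %s, "warm_days": %s}\n' \
    "$bucket" "$region" "$hot_days" "$warm_days"
}

payload="$(tiering_payload my-siem-logs us-east-1 30 365)" && echo "$payload"

# To save and apply it:
#   curl -X PUT "http://localhost:8080/api/settings/tiering" \
#     -H "Content-Type: application/json" -d "$payload"
#   curl -X POST "http://localhost:8080/api/settings/tiering/apply"
```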
Storage Distribution Monitoring
When tiering is active, monitor data distribution across tiers:
Hot Tier Metrics
- Size: Amount of data on local storage
- Row Count: Number of recent log entries
- Performance: Query response times
Warm Tier Metrics
- Size: Amount of data in S3 storage
- Row Count: Number of archived log entries
- Cost: S3 storage and request costs
Total Metrics
- Combined Size: Total across all tiers
- Combined Rows: Total log entries
- Cost Savings: Compared to all-local storage
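As a rough illustration of the cost-savings figure, the sketch below compares an all-local layout with a tiered one. The per-GB monthly prices are placeholder arguments, not real quotes, and S3 request and transfer charges are ignored.

```shell
# Estimate monthly storage cost with and without tiering.
# Arguments: hot_gb warm_gb local_price_per_gb s3_price_per_gb
tiering_savings() {
  awk -v hot="$1" -v warm="$2" -v lp="$3" -v sp="$4" 'BEGIN {
    all_local = (hot + warm) * lp      # everything kept on local disk
    tiered    = hot * lp + warm * sp   # warm portion priced as object storage
    printf "all_local=%.2f tiered=%.2f savings=%.2f\n", all_local, tiered, all_local - tiered
  }'
}

tiering_savings 500 4500 0.10 0.023
```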
Performance Considerations
Query Performance by Tier
| Tier | Query Speed | Use Case | Cost |
|---|---|---|---|
| Hot (Local SSD) | < 100ms | Real-time analysis | High |
| Warm (S3) | 1-10s | Historical investigation | Medium |
| Cold (Archive) | Export only | Compliance | Low |
Storage Cost Optimization
Hot Tier Sizing
- Small environments: 7-14 days hot storage
- Medium environments: 14-30 days hot storage
- Large environments: 30-60 days hot storage
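A back-of-the-envelope way to turn a hot-tier window into disk capacity is sketched below. The inputs (daily raw ingest, compression ratio, and the maximum fill percentage to stay under) are assumptions to be replaced with measured values from the Storage Statistics panel. Results are rounded up to whole gigabytes.

```shell
# Estimate local disk needed for the hot tier, in GB.
# Arguments: daily_raw_gb hot_days compression_ratio max_fill_pct
hot_tier_disk_gb() {
  daily_raw_gb="$1"; hot_days="$2"; compression_ratio="$3"; max_fill_pct="$4"
  # Compressed size of the hot window, rounded up.
  compressed=$(( (daily_raw_gb * hot_days + compression_ratio - 1) / compression_ratio ))
  # Capacity needed so that this data stays under the fill limit, rounded up.
  echo $(( (compressed * 100 + max_fill_pct - 1) / max_fill_pct ))
}

# 50 GB/day raw, 30 hot days, 10x compression, stay under 80% full
hot_tier_disk_gb 50 30 10 80
```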
Warm Tier Strategy
- Compliance: Match regulatory requirements (90 days, 1 year, 7 years)
- Investigation: Keep 6-12 months for threat hunting
- Budget: Balance storage costs with access needs
Auto-Move Tuning
- Conservative: TTL only (predictable behavior)
- Balanced: 90% full trigger (handles traffic spikes)
- Aggressive: 80% full trigger (maximizes local storage efficiency)
Retention Best Practices
Initial Configuration
- Start conservative: Begin with longer retention periods
- Monitor usage: Track query patterns and data access
- Adjust gradually: Reduce retention based on actual needs
- Test recovery: Verify backup and restore procedures
Compliance Considerations
- Regulatory requirements: Match industry standards (SOX, HIPAA, PCI-DSS)
- Legal hold: Ability to preserve data for litigation
- Data sovereignty: Ensure S3 region compliance
- Encryption: Enable S3 encryption for sensitive data
Operational Management
- Monitoring: Set up alerts for storage capacity and costs
- Backup: Regular backups of configuration and metadata
- Testing: Periodic restore tests and disaster recovery drills
- Documentation: Maintain records of retention policies and changes
Troubleshooting
Common Issues
ClickHouse TTL Issues
Symptoms: Old data not being deleted
Solutions:
- Check TTL configuration: SHOW CREATE TABLE logs
- Force TTL: Use "Force TTL Now" button
- Verify partitions: Check partition dates and sizes
- Monitor merges: TTL applies during background merges
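The partition check can also be done directly against ClickHouse's `system.parts` table. The sketch below prints a query that summarizes active parts per partition. The table name `logs` is taken from the step above, and the helper name is hypothetical.

```shell
# Print a query that summarizes active parts per partition for one table.
partition_report_sql() {
  table="$1"
  cat <<EOF
SELECT partition, count() AS parts, sum(rows) AS rows,
       formatReadableSize(sum(bytes_on_disk)) AS size
FROM system.parts
WHERE table = '$table' AND active
GROUP BY partition
ORDER BY partition;
EOF
}

partition_report_sql logs

# To run it: clickhouse-client --query "$(partition_report_sql logs)"
```

Partitions older than the retention period that still appear in the output have not yet been processed by a merge.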
Storage Tiering Problems
Symptoms: Data not moving to S3
Solutions:
- Verify S3 credentials and permissions
- Check bucket accessibility and region
- Review ClickHouse logs for S3 errors
- Test connection using the test button
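To confirm where data actually lives, `system.parts` also records the disk each part is stored on. The sketch below prints a query grouping active parts by disk. The disk names in the output depend on how the storage policy was configured, and the table name `logs` is an assumption.

```shell
# Print a query that shows how a table's active parts are spread across disks.
disk_distribution_sql() {
  table="$1"
  cat <<EOF
SELECT disk_name, count() AS parts, sum(rows) AS rows,
       formatReadableSize(sum(bytes_on_disk)) AS size
FROM system.parts
WHERE table = '$table' AND active
GROUP BY disk_name;
EOF
}

disk_distribution_sql logs

# To run it: clickhouse-client --query "$(disk_distribution_sql logs)"
```

If every part is still on the local disk well past the hot-tier window, the tiering configuration has probably not been applied to ClickHouse.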
Performance Issues
Symptoms: Slow queries on tiered data
Solutions:
- Optimize query patterns for time-based filtering
- Increase hot tier retention for frequently accessed data
- Use appropriate S3 storage class for access patterns
- Monitor S3 request costs and optimize query frequency
Error Messages
| Error | Cause | Solution |
|---|---|---|
| "TTL not applied" | Background merges not running | Force TTL or restart ClickHouse |
| "S3 access denied" | Invalid credentials or permissions | Update S3 credentials and bucket policy |
| "Partition not found" | Data already deleted | Check retention settings and restore from backup |
| "Disk space full" | Hot tier storage exhausted | Reduce hot tier retention or add storage |
API Configuration
Programmatic storage management via REST API:
# Get storage overview
curl -X GET "http://localhost:8080/api/settings/storage/overview"
# Update PostgreSQL retention
curl -X PUT "http://localhost:8080/api/settings/retention" \
-H "Content-Type: application/json" \
-d '{"enabled": true, "retention_days": 90}'
# Update ClickHouse TTL
curl -X PUT "http://localhost:8080/api/settings/clickhouse/retention" \
-H "Content-Type: application/json" \
-d '{"retention_days": 365}'
# Configure storage tiering
curl -X PUT "http://localhost:8080/api/settings/tiering" \
-H "Content-Type: application/json" \
-d '{
"enabled": true,
"s3_bucket": "my-siem-logs",
"s3_region": "us-east-1",
"hot_days": 30,
"warm_days": 365
}'
# Apply tiering configuration
curl -X POST "http://localhost:8080/api/settings/tiering/apply"
Related Documentation
- Data Management - User guide for data lifecycle
- Deployment Architecture - Infrastructure planning and scaling