Query Safety Limits
Configure query-level OOM protection with bounded array aggregation, row expansion limits, and post-processing caps
nano includes configurable safety limits that prevent individual queries from exhausting server memory. These limits target specific high-risk query patterns while preserving the flexibility analysts need for threat hunting and incident response.
Why These Limits Exist
Certain nPL query patterns can consume unbounded memory if left unchecked:
- Array aggregation (`values()`, `list()`) collects all matching values into in-memory arrays per group
- Multi-value expansion (`mvexpand`) multiplies rows via `arrayJoin` before any `LIMIT` is applied
- Post-processing aggregation: when stats/top/rare commands run after enrichment, they buffer all rows in a HashMap
- Streaming cache: large result sets buffered for re-display can consume significant memory
These limits complement the per-query ClickHouse resource settings (see Search Concurrency). While ClickHouse enforces max_memory_usage at the database level, these limits operate at the application level to bound specific patterns that can be expensive even within ClickHouse's memory budget.
Settings
All settings are configurable via Settings > Search > Query Safety Limits. Changes take effect within 60 seconds — no restart required.
Max Array Aggregation Size
| Setting | `max_group_array_size` |
|---|---|
| Default | 10,000 |
| Range | 100 – 1,000,000 |
Controls the maximum number of elements in groupArray() and groupUniqArray() calls generated by values() and list() aggregation functions. When the limit is reached, ClickHouse silently truncates the array — no error is thrown, but results may be incomplete.
Affects these nPL patterns:

```
* | stats values(user) by src_ip           # groupUniqArray(10000)(user)
* | stats list(message) by user            # groupArray(10000)(message)
* | streamstats values(dest_ip) by user    # window function variant
* | transaction user                       # groupArray(1000)(message) for _raw_events
```

Sizing guidance:
| Deployment | Recommended | Rationale |
|---|---|---|
| Small (< 50 GB/day) | 10,000 (default) | Sufficient for most values() use cases |
| Medium (50–200 GB/day) | 25,000 | Higher-cardinality environments (more unique users, IPs) |
| Enterprise (> 200 GB/day) | 50,000–100,000 | Large-scale hunting where completeness matters |
For the transaction command specifically, the array limit for _raw_events is capped at the command's maxevents parameter (default: 1,000) regardless of this setting.
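The silent-truncation behavior described above can be modeled in a few lines. This is an illustrative Python sketch of `groupUniqArray(N)` semantics, not the actual ClickHouse implementation; the function and field names are hypothetical.

```python
def group_uniq_array(rows, key, field, max_size=10_000):
    """Collect up to max_size unique `field` values per `key` group.
    Values beyond the cap are silently dropped -- no error is raised,
    mirroring groupUniqArray(max_size)(field)."""
    groups = {}
    for row in rows:
        bucket = groups.setdefault(row[key], [])
        value = row[field]
        if value not in bucket and len(bucket) < max_size:
            bucket.append(value)
    return groups

rows = [{"src_ip": "10.0.0.1", "user": f"u{i}"} for i in range(5)]
print(group_uniq_array(rows, "src_ip", "user", max_size=3))
# {'10.0.0.1': ['u0', 'u1', 'u2']} -- the last two users are dropped silently
```

The key point the sketch illustrates: a `values()` result that looks complete may not be, which is why larger deployments with high-cardinality fields raise this limit.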
Max Mvexpand Rows
| Setting | `max_mvexpand_rows` |
|---|---|
| Default | 100,000 |
| Range | 1,000 – 10,000,000 |
The default row limit applied to mvexpand when the query doesn't specify an explicit limit=N. This is important because arrayJoin() expansion happens before any SQL LIMIT — without a cap, a single field with large arrays can multiply a modest result set into billions of rows.
Example:
```
* | mvexpand dns_answers             # applies LIMIT 100000
* | mvexpand dns_answers limit=500   # uses explicit limit, ignores this setting
```

Sizing guidance:
| Deployment | Recommended | Rationale |
|---|---|---|
| Small | 100,000 (default) | Safe for DNS answer expansion, multi-value fields |
| Medium | 250,000 | Larger datasets with more multi-value fields |
| Enterprise | 500,000–1,000,000 | High-volume environments with complex array fields |
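Why the cap must apply during expansion, not after, is easiest to see in a small model. This hypothetical Python sketch mirrors the behavior (names are illustrative, not nano's internals):

```python
def mvexpand(rows, field, limit=100_000):
    """Expand a multi-value field into one row per element, stopping
    at `limit` output rows. The cap is enforced while expanding --
    a later LIMIT would arrive after the row explosion already happened."""
    out = []
    for row in rows:
        for value in row.get(field) or [None]:
            if len(out) >= limit:
                return out
            out.append({**row, field: value})
    return out

rows = [{"query": "example.com", "dns_answers": ["1.1.1.1", "8.8.8.8", "9.9.9.9"]}]
print(len(mvexpand(rows, "dns_answers", limit=2)))  # 2 -- expansion stops at the cap
```

With 1,000 input rows each carrying a 1,000-element array, uncapped expansion would emit a million rows before any `LIMIT` ran; the in-loop check bounds that multiplication.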
Max Post-Processing Groups
| Setting | `max_post_processing_groups` |
|---|---|
| Default | 1,000,000 |
| Range | 10,000 – 10,000,000 |
Caps the number of groups in Rust-side post-processing for stats, top, and rare commands. This only applies to post-lateral commands that can't run in ClickHouse SQL (e.g., stats after a lookup enrichment or after prevalence filtering). When the cap is reached, additional groups are silently dropped and a warning is logged server-side.
Most stats/top/rare queries execute directly in ClickHouse and are not affected by this setting — ClickHouse handles GROUP BY cardinality with external aggregation (spill-to-disk).
Sizing guidance:
| Deployment | Recommended | Rationale |
|---|---|---|
| All sizes | 1,000,000 (default) | Matches the default SQL result limit; rarely needs adjustment |
Increase only if you see truncation warnings in server logs for legitimate post-enrichment aggregations.
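The capped HashMap aggregation amounts to refusing new keys once the limit is hit. A minimal sketch, assuming a simple count aggregation (illustrative Python; the real Rust implementation differs):

```python
def capped_stats_count(rows, key, max_groups=1_000_000):
    """Count rows per group. Once max_groups distinct keys exist,
    rows belonging to NEW keys are silently dropped; truncated=True
    signals that a server-side warning would be logged."""
    groups, truncated = {}, False
    for row in rows:
        k = row[key]
        if k not in groups:
            if len(groups) >= max_groups:
                truncated = True
                continue
            groups[k] = 0
        groups[k] += 1
    return groups, truncated

rows = [{"user": f"u{i % 4}"} for i in range(8)]
counts, truncated = capped_stats_count(rows, "user", max_groups=2)
print(counts, truncated)  # {'u0': 2, 'u1': 2} True -- u2/u3 were dropped
```

Note that rows for groups already inside the map still count correctly; only previously unseen groups are discarded, which is why truncation can go unnoticed without the logged warning.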
Max Streaming Cache Rows
| Setting | `max_streaming_cache_rows` |
|---|---|
| Default | 50,000 |
| Range | 1,000 – 1,000,000 |
Controls how many rows are buffered for the streaming result cache. When a search streams results via SSE, rows are simultaneously collected in memory for caching (so switching tabs and returning doesn't re-execute the query). If the result set exceeds this limit, caching is skipped — the query still streams correctly to the client, but won't be available for instant re-display.
Sizing guidance:
| Deployment | Recommended | Rationale |
|---|---|---|
| Small (limited RAM) | 25,000 | Reduce memory footprint per concurrent query |
| Medium | 50,000 (default) | Good balance between cache hit rate and memory |
| Enterprise (high RAM) | 100,000–200,000 | More results cached for tab-switching workflows |
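The stream-and-buffer pattern can be sketched as follows (hypothetical Python; `send` stands in for the SSE write path, which is an assumption, not nano's actual API):

```python
def stream_with_cache(rows, send, max_cache_rows=50_000):
    """Stream every row to the client via `send`, while keeping a side
    buffer for instant re-display. If the result set exceeds the cap,
    the buffer is discarded and caching is skipped -- streaming itself
    is never interrupted."""
    cache = []
    for row in rows:
        send(row)                        # always delivered to the client
        if cache is not None:
            if len(cache) < max_cache_rows:
                cache.append(row)
            else:
                cache = None             # limit exceeded: drop the buffer
    return cache                         # None means "not cacheable"

sent = []
cache = stream_with_cache(range(5), sent.append, max_cache_rows=3)
print(len(sent), cache)  # 5 None -- all rows streamed, nothing cached
```

Dropping the whole buffer (rather than keeping a truncated prefix) avoids serving a silently incomplete cached result on re-display.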
Block on Cost Analysis Errors
| Setting | `block_on_cost_errors` |
|---|---|
| Default | Off |
When enabled, queries that trigger Error-severity cost analysis warnings are rejected before execution. The analyst receives a clear error message with the specific issue and a suggestion for how to fix the query.
Error-severity warnings include:
| Warning Code | Trigger |
|---|---|
| `UNBOUNDED_DEDUP` | dedup without prior head or aggregation |
| `UNBOUNDED_TRANSACTION` | transaction without both maxspan and maxevents |
| `UNBOUNDED_EVENTSTATS` | eventstats without prior limit or aggregation |
| `UNBOUNDED_STREAMSTATS` | streamstats without prior limit or aggregation |
When to enable: Recommended for shared environments where untrained analysts might run expensive queries. In single-analyst or dev environments, leaving this off allows more flexibility.
When disabled (default), these warnings are still shown in the search results UI as advisory messages — analysts can see the warning and choose to refine their query.
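The gate itself is a simple severity check before execution; a hypothetical sketch (structure and field names are illustrative):

```python
def should_block(warnings, block_on_cost_errors=False):
    """Reject the query only when blocking is enabled AND at least one
    Error-severity cost warning is present. Otherwise every warning is
    advisory and the query runs."""
    return block_on_cost_errors and any(
        w["severity"] == "Error" for w in warnings
    )

warnings = [{"code": "UNBOUNDED_DEDUP", "severity": "Error"}]
print(should_block(warnings))         # False -- default: advisory only
print(should_block(warnings, True))   # True  -- rejected before execution
```

Warning- and Info-severity findings never block, even with the setting enabled; only the four Error codes listed above can reject a query.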
Cost Analysis Warnings
Beyond the configurable limits above, nano's query cost analyzer generates warnings for patterns that may be expensive. These are always shown in the search UI regardless of the block_on_cost_errors setting:
| Code | Severity | Description |
|---|---|---|
| `WILDCARD_SEARCH` | Info | Query uses `*` without field filters |
| `REGEX_FILTER` | Info | Query uses regex matching (slower than exact/prefix) |
| `SEQUENCE_FUNNEL_NO_FILTER` | Info | sequence/funnel without prior filters |
| `LARGE_LIMIT` | Info | head/tail requesting > 50,000 results |
| `UNBOUNDED_SORT` | Warning | sort without prior limit |
| `UNBOUNDED_MVEXPAND` | Warning | mvexpand using server default limit |
| `TINY_TIMECHART_SPAN` | Warning | timechart with span < 10 seconds |
| `LARGE_BIN_HOP_FANOUT` | Warning | bin hop creating > 100 overlapping windows per event |
| `STATS_VALUES_LIST_UNBOUNDED` | Warning | values()/list() without prior head |
| `UNBOUNDED_DEDUP` | Error | dedup without prior limit or aggregation |
| `UNBOUNDED_TRANSACTION` | Error | transaction without maxspan and maxevents |
| `UNBOUNDED_EVENTSTATS` | Error | eventstats without prior limit or aggregation |
| `UNBOUNDED_STREAMSTATS` | Error | streamstats without prior limit or aggregation |
API
Query safety limits can be managed programmatically:
```shell
# Get current limits
curl -H "Authorization: Bearer $TOKEN" \
  https://your-instance/api/settings/search/query-limits

# Update limits
curl -X PUT -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "max_group_array_size": 25000,
    "max_mvexpand_rows": 250000,
    "max_post_processing_groups": 1000000,
    "max_streaming_cache_rows": 100000,
    "block_on_cost_errors": false
  }' \
  https://your-instance/api/settings/search/query-limits
```

Requires the `settings:system` permission.