prevalence
Filter or enrich events based on artifact prevalence. Identify rare or suspicious indicators.
Description
The prevalence command leverages nano's prevalence tracking to filter events based on how common artifacts are across your environment. This is powerful for detecting rare file hashes, newly seen domains, or unusual patterns.
Prevalence data tracks how many hosts have observed each artifact and when it was first seen.
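To make the tracked data concrete, here is a minimal Python sketch of the idea, not nano's actual implementation. The names `ArtifactStats` and `build_prevalence` are hypothetical, and applying the window to `first_seen` as well as to the host count is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class ArtifactStats:
    """Per-artifact summary: which hosts saw it and when it first appeared."""
    hosts: set = field(default_factory=set)
    first_seen: Optional[datetime] = None

    @property
    def host_count(self) -> int:
        return len(self.hosts)


def build_prevalence(observations, now, window=timedelta(days=30)):
    """Summarize (artifact, host, timestamp) tuples seen within `window` of `now`."""
    stats = {}
    cutoff = now - window
    for artifact, host, seen_at in observations:
        if seen_at < cutoff:
            continue  # assumption: observations older than the window are ignored
        entry = stats.setdefault(artifact, ArtifactStats())
        entry.hosts.add(host)  # a set, so repeat sightings on one host count once
        if entry.first_seen is None or seen_at < entry.first_seen:
            entry.first_seen = seen_at
    return stats


now = datetime(2026, 1, 31)
observations = [
    ("evil.example", "host-a", datetime(2026, 1, 30)),
    ("cdn.example", "host-a", datetime(2026, 1, 5)),
    ("cdn.example", "host-b", datetime(2026, 1, 10)),
    ("cdn.example", "host-b", datetime(2026, 1, 12)),
]
prevalence = build_prevalence(observations, now)
recent_only = build_prevalence(observations, now, window=timedelta(days=7))
```

In this sketch `cdn.example` ends up with two hosts under the 30-day window and drops out entirely under a 7-day window, which is the kind of difference the window argument is meant to expose.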
Columns vs Command: For simple host count filtering, you can also use the pre-computed prevalence_* columns (e.g., WHERE prevalence_min < 5) which have zero query cost. Use this command when you need:
- first_seen filtering: domain_first_seen > now() - 24h
- Enrichment mode: enrich=true to add prevalence metadata
- Custom time windows: window=7d instead of the default 30 days
See Prevalence Tracking for details.
Syntax
Filter mode:
... | prevalence <field> <operator> <value> [<field> <operator> <value> ...] [window=<duration>]
Enrich mode:
... | prevalence enrich=true [window=<duration>]
Filter Fields
Used with filter mode (prevalence <field> <operator> <value>):
| Field | Description |
|---|---|
| hash_prevalence | Number of hosts that have seen this file hash |
| domain_prevalence | Number of hosts that have seen this domain |
| hash_first_seen | Timestamp when hash was first observed |
| domain_first_seen | Timestamp when domain was first observed |
Enrichment Fields
When using enrich=true, the following fields are added to each event. These can be used in downstream commands like where, sort, stats, and table.
| Field | Type | Description |
|---|---|---|
| host_count | number | Number of unique hosts that have observed this artifact |
| is_rare | boolean (0/1) | Whether the artifact is below the rarity threshold |
| prevalence_score | number (0-100) | Score from 0 (never seen) to 100 (seen everywhere); lower means rarer. See Prevalence Scoring below |
| prevalence_type | string | Artifact type: domain, hash, ip, or comma-separated if multiple |
| prevalence_artifact | string | The actual artifact value being tracked |
| prevalence_first_seen | timestamp | When the artifact was first observed in your environment |
| prevalence_last_seen | timestamp | When the artifact was most recently observed |
| first_seen | timestamp | Alias for prevalence_first_seen |
| last_seen | timestamp | Alias for prevalence_last_seen |
| total_occurrences | number | Total number of times this artifact has been seen |
Optional Arguments
window
Syntax: window=<duration>
Description: Time window for prevalence calculation
Default: 30d
enrich
Syntax: enrich=true
Description: Add prevalence fields without filtering
Examples
Rare file hashes
file_hash=*
| prevalence hash_prevalence < 5
Newly seen domains
* | prevalence domain_first_seen > now() - INTERVAL 24 HOUR
Combined conditions
* | prevalence hash_prevalence < 3 AND hash_first_seen > now() - INTERVAL 7 DAY window=30d
Enrich with prevalence data
* | prevalence enrich=true
| table file_hash, host_count, prevalence_first_seen
Rare process execution
process_name=*
| prevalence hash_prevalence <= 10 window=7d
| table timestamp, process_name, file_hash, hash_prevalence, src_host
New domain connections
dest_domain=*
| prevalence domain_first_seen > now() - INTERVAL 1 DAY
| stats count() by dest_domain, domain_first_seen
Suspicious downloads
action=file_download
| prevalence hash_prevalence < 5 AND domain_prevalence < 10
Rare and new
* | prevalence hash_prevalence < 3 AND hash_first_seen > now() - INTERVAL 3 DAY
| where bytes > 1000000
Enrich and filter by host count
* | prevalence enrich=true
| where host_count < 5
Find rare artifacts
sourcetype=squid_proxy
| prevalence enrich=true
| where is_rare=1
| table timestamp, user, prevalence_artifact, prevalence_type, host_count, prevalence_score
Recently seen artifacts sorted by rarity
* | prevalence enrich=true
| where prevalence_first_seen > "2026-01-01"
| sort prevalence_score
Aggregate by artifact
sourcetype=squid_proxy
| prevalence enrich=true
| where host_count < 3
| stats count by prevalence_artifact, prevalence_type, host_count, prevalence_first_seen
Prevalence Scoring
The prevalence_score field is a 0-100 score that reflects how common an artifact is across your environment. It's calculated relative to your rarity threshold setting (default: 3 hosts).
| Score | Band | Condition | Meaning |
|---|---|---|---|
| 0 | Never seen | host_count = 0 | Artifact has never been observed |
| 1-20 | Very rare | host_count < threshold | Below your rarity threshold |
| 21-50 | Rare | threshold ≤ host_count < threshold×2 | Around the rarity boundary |
| 51-80 | Uncommon | threshold×2 ≤ host_count < threshold×10 | Seen on several hosts but not widespread |
| 81-100 | Common | host_count ≥ threshold×10 | Widespread across your environment |
Within each band, the score scales linearly. For example, with a rarity threshold of 5:
| host_count | prevalence_score | Band |
|---|---|---|
| 0 | 0 | Never seen |
| 2 | 8 | Very rare |
| 5 | 20 | Very rare (at threshold) |
| 8 | 38 | Rare |
| 30 | 69 | Uncommon |
| 100+ | ~96-100 | Common |
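The band conditions can be expressed as a short Python sketch. It is only an illustration: `prevalence_band` is a hypothetical helper, it returns the band name rather than the numeric score, and it follows the Condition column literally, so the exact handling of host_count equal to the threshold in the product may differ.

```python
def prevalence_band(host_count: int, threshold: int = 3) -> str:
    """Map a host count to a band name using the Condition column literally."""
    if host_count < 0 or threshold < 1:
        raise ValueError("host_count must be >= 0 and threshold >= 1")
    if host_count == 0:
        return "Never seen"
    if host_count < threshold:
        return "Very rare"
    if host_count < threshold * 2:
        return "Rare"
    if host_count < threshold * 10:
        return "Uncommon"
    return "Common"


# Band names for a rarity threshold of 5.
for count in (0, 2, 8, 30, 100):
    print(count, prevalence_band(count, threshold=5))
```

Because every boundary is a multiple of the threshold, raising the threshold widens all bands at once, which is why a higher setting classifies more artifacts as rare.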
Configuring the rarity threshold
The rarity threshold controls where the scoring bands start. A higher threshold means more artifacts are classified as rare. You can configure it in Settings > Prevalence or via API:
curl -X PUT https://your-instance/api/settings/prevalence \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"rarity_threshold": 5}'
See Prevalence Settings for all configuration options.
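If you script this change rather than calling curl by hand, the same request could be built with Python's standard library, as in the sketch below. The helper name `build_threshold_request` and the key `example-key` are hypothetical, and the JSON Content-Type header is an assumption about what the API expects. The sketch only constructs the request and does not send it.

```python
import json
import urllib.request


def build_threshold_request(base_url: str, api_key: str, threshold: int) -> urllib.request.Request:
    """Build (but do not send) the PUT request that sets rarity_threshold."""
    body = json.dumps({"rarity_threshold": threshold}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/settings/prevalence",
        data=body,
        method="PUT",
        headers={
            "Authorization": f"Bearer {api_key}",
            # Assumption: the API expects a JSON content type.
            "Content-Type": "application/json",
        },
    )


request = build_threshold_request("https://your-instance", "example-key", 5)
# To actually send it: urllib.request.urlopen(request)
```

Keeping construction separate from sending makes it easy to inspect the method, URL, and body before anything touches a live instance.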
Usage Notes
Automatic calculation: Prevalence is calculated automatically from your log data.
Window parameter: Adjusts the time range for prevalence calculation. Shorter windows count fewer observations, so more artifacts appear rare.
Performance: Prevalence queries are optimized but may be slower on very large datasets.
Enrichment mode: Use enrich=true to add prevalence fields without filtering.
Multiple conditions: All conditions must be satisfied (AND logic).
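The AND semantics of multiple conditions can be illustrated with a small Python sketch. This is not how nano evaluates filters internally: `matches_all` and the `OPERATORS` table are hypothetical, and the set of supported operators shown here is an assumption.

```python
import operator
from datetime import datetime, timedelta

# Assumed operator set, for illustration only.
OPERATORS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "=": operator.eq,
}


def matches_all(stats: dict, conditions: list) -> bool:
    """Return True only if every (field, op, value) condition holds."""
    return all(OPERATORS[op](stats[field], value) for field, op, value in conditions)


now = datetime(2026, 1, 31)
conditions = [
    ("hash_prevalence", "<", 3),
    ("hash_first_seen", ">", now - timedelta(days=7)),
]
rare_and_new = {"hash_prevalence": 1, "hash_first_seen": datetime(2026, 1, 29)}
rare_but_old = {"hash_prevalence": 1, "hash_first_seen": datetime(2025, 11, 1)}

print(matches_all(rare_and_new, conditions))  # True
print(matches_all(rare_but_old, conditions))  # False
```

The second record is rare but fails the first_seen condition, so it is dropped: a single failing condition is enough to exclude an event.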
Known Limitations
When using enrich=true, commands after prevalence are applied in post-processing.
Commands that work after prevalence enrich
| Command | Status | Example |
|---|---|---|
| where | ✅ | \| where host_count < 5 |
| table | ✅ | \| table prevalence_artifact, host_count, prevalence_score |
| fields | ✅ | \| fields + prevalence_* |
| head/tail | ✅ | \| head 100 |
| sort | ✅ | \| sort prevalence_score |
| stats | ✅ | \| stats avg(host_count) by src_host |
| top/rare | ✅ | \| top prevalence_artifact |
| eval | ✅ | \| eval rarity = if(host_count < 3, "rare", "common") |
| rex | ✅ | \| rex field=prevalence_artifact "(?<prefix>.{8})" |
| dedup | ✅ | \| dedup prevalence_artifact |
| rename | ✅ | \| rename host_count as rarity |
| fillnull | ✅ | \| fillnull value=0 host_count |
| timechart | ✅ | \| timechart avg(host_count) |
Commands that don't work after prevalence enrich
| Command | Reason |
|---|---|
| inputlookup | Requires separate enrichment pipeline |
| lookup | Requires separate enrichment pipeline |
| streamstats | Not yet implemented in post-processing |
Note: Filter mode (prevalence hash_prevalence < 5) uses optimized SQL JOINs and doesn't have these limitations.