prevalence

Filter or enrich events based on artifact prevalence. Identify rare or suspicious indicators.

Description

The prevalence command uses nano's prevalence tracking to filter or enrich events based on how common artifacts are across your environment. This is useful for detecting rare file hashes, newly seen domains, or other unusual artifacts.

Prevalence data tracks how many hosts have observed each artifact and when it was first seen.
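To make these two tracked quantities concrete, here is a minimal Python sketch of prevalence bookkeeping. It is an illustration only, not nano's implementation; the class name `PrevalenceTracker` and its methods are hypothetical.

```python
from datetime import datetime, timedelta


class PrevalenceTracker:
    """Toy model (hypothetical): per artifact, which hosts saw it and when."""

    def __init__(self):
        # artifact -> {host: most recent time that host observed the artifact}
        self._sightings = {}
        # artifact -> earliest observation time across all hosts
        self._first_seen = {}

    def observe(self, artifact, host, when):
        """Record that `host` observed `artifact` at time `when`."""
        hosts = self._sightings.setdefault(artifact, {})
        hosts[host] = max(when, hosts.get(host, when))
        first = self._first_seen.get(artifact)
        if first is None or when < first:
            self._first_seen[artifact] = when

    def host_count(self, artifact, now, window=timedelta(days=30)):
        """Count distinct hosts that observed the artifact within the window."""
        cutoff = now - window
        hosts = self._sightings.get(artifact, {})
        return sum(1 for seen in hosts.values() if seen >= cutoff)

    def first_seen(self, artifact):
        """Return the earliest observation time, or None if never seen."""
        return self._first_seen.get(artifact)


now = datetime(2026, 1, 31)
tracker = PrevalenceTracker()
tracker.observe("evil.example", "host-a", now - timedelta(days=2))
tracker.observe("evil.example", "host-b", now - timedelta(days=40))
```

With the default 30-day window only `host-a` counts toward the host count, while the first-seen time still reflects the older observation on `host-b`.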

Columns vs Command: For simple host count filtering, you can also use the pre-computed prevalence_* columns (e.g., WHERE prevalence_min < 5) which have zero query cost. Use this command when you need:

first_seen filtering: domain_first_seen > now() - INTERVAL 24 HOUR
Enrichment mode: enrich=true to add prevalence metadata
Custom time windows: window=7d instead of the default 30 days

See Prevalence Tracking for details.

Syntax

Filter mode:

... | prevalence <field> <operator> <value> [AND <field> <operator> <value> ...] [window=<duration>]

Enrich mode:

... | prevalence enrich=true [window=<duration>]

Filter Fields

Used with filter mode (prevalence <field> <operator> <value>):

Field	Description
`hash_prevalence`	Number of hosts that have seen this file hash
`domain_prevalence`	Number of hosts that have seen this domain
`hash_first_seen`	Timestamp when hash was first observed
`domain_first_seen`	Timestamp when domain was first observed

Enrichment Fields

When using enrich=true, the following fields are added to each event. These can be used in downstream commands like where, sort, stats, and table.

Field	Type	Description
`host_count`	number	Number of unique hosts that have observed this artifact
`is_rare`	boolean (0/1)	Whether the artifact is below the rarity threshold
`prevalence_score`	number (0-100)	Prevalence score: 0 = never seen, 100 = everywhere, so lower means rarer. See Prevalence Scoring below
`prevalence_type`	string	Artifact type: `domain`, `hash`, `ip`, or comma-separated if multiple
`prevalence_artifact`	string	The actual artifact value being tracked
`prevalence_first_seen`	timestamp	When the artifact was first observed in your environment
`prevalence_last_seen`	timestamp	When the artifact was most recently observed
`first_seen`	timestamp	Alias for `prevalence_first_seen`
`last_seen`	timestamp	Alias for `prevalence_last_seen`
`total_occurrences`	number	Total number of times this artifact has been seen

Optional Arguments

window
Syntax: window=<duration>
Description: Time window for prevalence calculation
Default: 30d

enrich
Syntax: enrich=true
Description: Add prevalence fields without filtering

Examples

Rare file hashes

file_hash=*
| prevalence hash_prevalence < 5

Newly seen domains

* | prevalence domain_first_seen > now() - INTERVAL 24 HOUR

Combined conditions

* | prevalence hash_prevalence < 3 AND hash_first_seen > now() - INTERVAL 7 DAY window=30d

Enrich with prevalence data

* | prevalence enrich=true
  | table file_hash, host_count, prevalence_first_seen

Rare process execution

process_name=*
| prevalence hash_prevalence <= 10 window=7d
| table timestamp, process_name, file_hash, hash_prevalence, src_host

New domain connections

dest_domain=*
| prevalence domain_first_seen > now() - INTERVAL 1 DAY
| stats count() by dest_domain, domain_first_seen

Suspicious downloads

action=file_download
| prevalence hash_prevalence < 5 AND domain_prevalence < 10

Rare and new

* | prevalence hash_prevalence < 3 AND hash_first_seen > now() - INTERVAL 3 DAY
  | where bytes > 1000000

Enrich and filter by host count

* | prevalence enrich=true
  | where host_count < 5

Find rare artifacts

sourcetype=squid_proxy
| prevalence enrich=true
  | where is_rare=1
  | table timestamp, user, prevalence_artifact, prevalence_type, host_count, prevalence_score

Recently seen artifacts sorted by rarity

* | prevalence enrich=true
  | where prevalence_first_seen > "2026-01-01"
  | sort prevalence_score

Aggregate by artifact

sourcetype=squid_proxy
| prevalence enrich=true
  | where host_count < 3
  | stats count by prevalence_artifact, prevalence_type, host_count, prevalence_first_seen

Prevalence Scoring

The prevalence_score field is a 0-100 score that reflects how common an artifact is across your environment. It's calculated relative to your rarity threshold setting (default: 3 hosts).

Score	Band	Condition	Meaning
0	Never seen	`host_count = 0`	Artifact has never been observed
1-20	Very rare	`host_count < threshold`	Below your rarity threshold
21-50	Rare	`threshold ≤ host_count < threshold×2`	Around the rarity boundary
51-80	Uncommon	`threshold×2 ≤ host_count < threshold×10`	Seen on several hosts but not widespread
81-100	Common	`host_count ≥ threshold×10`	Widespread across your environment

Within each band, the score scales linearly. For example, with a rarity threshold of 5:

host_count	prevalence_score	Band
0	0	Never seen
2	8	Very rare
5	20	Very rare (at threshold)
8	38	Rare
30	69	Uncommon
100+	~96-100	Common
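When post-processing results outside nano, it can be handy to turn a score back into its band name. The helper below is a small sketch that uses only the score ranges from the band table above; the function name `score_band` is hypothetical and not part of nano.

```python
def score_band(prevalence_score):
    """Map a prevalence_score (0-100) to its band name from the band table."""
    if not 0 <= prevalence_score <= 100:
        raise ValueError("prevalence_score must be between 0 and 100")
    if prevalence_score == 0:
        return "Never seen"
    if prevalence_score <= 20:
        return "Very rare"
    if prevalence_score <= 50:
        return "Rare"
    if prevalence_score <= 80:
        return "Uncommon"
    return "Common"
```

For the example rows above, scores of 8 and 20 map to "Very rare", 38 to "Rare", and 69 to "Uncommon".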

Configuring the rarity threshold

The rarity threshold controls where the scoring bands start. A higher threshold means more artifacts are classified as rare. You can configure it in Settings > Prevalence or via API:

curl -X PUT https://your-instance/api/settings/prevalence \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"rarity_threshold": 5}'

See Prevalence Settings for all configuration options.
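For scripted configuration, the same PUT request can be sketched in Python with the standard library. This is a hedged example: the helper name `build_threshold_request` and the key `example-key` are hypothetical, and the `Content-Type` header is an assumption that the endpoint expects a JSON body. The request is only built here, not sent.

```python
import json
import urllib.request


def build_threshold_request(base_url, api_key, rarity_threshold):
    """Build (but do not send) the PUT request for the prevalence settings."""
    body = json.dumps({"rarity_threshold": rarity_threshold}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/settings/prevalence",
        data=body,
        method="PUT",
        headers={
            "Authorization": f"Bearer {api_key}",
            # Assumption: the endpoint expects a JSON request body.
            "Content-Type": "application/json",
        },
    )


req = build_threshold_request("https://your-instance", "example-key", 5)
# To actually send it: urllib.request.urlopen(req)
```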

Usage Notes

Automatic calculation: Prevalence is calculated automatically from your log data.

Window parameter: Sets the time range used for the prevalence calculation. Shorter windows consider fewer observations, so artifacts are more likely to appear rare.

Performance: Prevalence queries are optimized but may be slower on very large datasets.

Enrichment mode: Use enrich=true to add prevalence fields without filtering.

Multiple conditions: Join conditions with AND. An event passes only if every condition is satisfied.
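The AND semantics can be modeled in a few lines of Python. This sketch is an illustration of the logic, not nano's query engine; `matches_all` is a hypothetical helper, and the operator table covers only the comparison operators that appear in the examples on this page.

```python
import operator

# Only the comparison operators that appear in the examples on this page.
OPERATORS = {"<": operator.lt, "<=": operator.le, ">": operator.gt}


def matches_all(values, conditions):
    """Return True only if every (field, op, threshold) condition holds."""
    return all(
        OPERATORS[op](values[field], threshold)
        for field, op, threshold in conditions
    )


# Mirrors: prevalence hash_prevalence < 5 AND domain_prevalence < 10
conditions = [("hash_prevalence", "<", 5), ("domain_prevalence", "<", 10)]
rare_everywhere = {"hash_prevalence": 2, "domain_prevalence": 3}
rare_hash_only = {"hash_prevalence": 2, "domain_prevalence": 40}
```

Here `rare_everywhere` passes both conditions, while `rare_hash_only` is dropped because its domain is seen on too many hosts.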

Known Limitations

When using enrich=true, commands after prevalence run in a post-processing stage, so only the commands listed below are supported.

Commands that work after prevalence enrich

Command	Status	Example
`where`	✅	`| where host_count < 5`
`table`	✅	`| table prevalence_artifact, host_count, prevalence_score`
`fields`	✅	`| fields + prevalence_*`
`head/tail`	✅	`| head 100`
`sort`	✅	`| sort prevalence_score`
`stats`	✅	`| stats avg(host_count) by src_host`
`top/rare`	✅	`| top prevalence_artifact`
`eval`	✅	`| eval rarity = if(host_count < 3, "rare", "common")`
`rex`	✅	`| rex field=prevalence_artifact "(?<prefix>.{8})"`
`dedup`	✅	`| dedup prevalence_artifact`
`rename`	✅	`| rename host_count as rarity`
`fillnull`	✅	`| fillnull value=0 host_count`
`timechart`	✅	`| timechart avg(host_count)`

Commands that don't work after prevalence enrich

Command	Reason
`inputlookup`	Requires separate enrichment pipeline
`lookup`	Requires separate enrichment pipeline
`streamstats`	Not yet implemented in post-processing

Note: Filter mode (prevalence hash_prevalence < 5) uses optimized SQL JOINs and doesn't have these limitations.

Related Commands

where - Additional filtering after prevalence
stats - Aggregate prevalence results
lookup - Enrich with other data sources
