dedup
Remove duplicate events based on specified field values.
Description
The dedup command removes duplicate events by keeping only the first or last occurrence of events with identical values in the specified fields. This is useful for finding unique values, removing redundant data, or identifying distinct entities.
Unlike stats with dc(), which only counts unique values, dedup preserves the full event data while removing duplicates.
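The difference between counting distinct values and deduplicating can be illustrated with a small Python model. This is only a sketch: events are represented as plain dictionaries with made-up values, and the loop mimics the documented default of keeping the first occurrence rather than reproducing the engine's implementation.

```python
# Illustrative model only: events are plain dicts with invented sample values.
events = [
    {"user": "alice", "action": "login", "src_ip": "10.0.0.1"},
    {"user": "bob", "action": "login", "src_ip": "10.0.0.2"},
    {"user": "alice", "action": "logout", "src_ip": "10.0.0.1"},
]

# What a distinct count yields: a single number, with no event data attached.
distinct_users = len({e["user"] for e in events})

# What dedup yields: one complete event per distinct user (first occurrence).
seen = set()
kept = []
for e in events:
    if e["user"] not in seen:
        seen.add(e["user"])
        kept.append(e)
```

Here distinct_users is just a count, while kept still carries the action and src_ip fields of each retained event.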
Syntax
... | dedup <field> [, <field> ...] [keepfirst=<bool>] [keeplast=<bool>]
Required Arguments
field
One or more fields to use for deduplication. Events with identical values across all specified fields are considered duplicates.
Optional Arguments
keepfirst
Syntax: keepfirst=true|false
Description: Keep the first occurrence of each unique combination
Default: true
keeplast
Syntax: keeplast=true|false
Description: Keep the last occurrence of each unique combination
Default: false
Note: If both are false, no events are returned. If both are true, keepfirst takes precedence.
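The flag rules above can be modeled in a few lines of Python. This is a hedged sketch, not the engine's code: the dedup function, the sample events, and the n field (a sequence marker) are hypothetical, and returning the kept events in their original relative order is an assumption.

```python
def dedup(events, fields, keepfirst=True, keeplast=False):
    """Model of the documented flag rules (hypothetical helper, not engine code)."""
    if not keepfirst and not keeplast:
        return []  # documented: if both are false, no events are returned
    chosen = {}  # key (tuple of field values) -> index of the event to keep
    for i, event in enumerate(events):
        key = tuple(event.get(f) for f in fields)
        if keepfirst:
            chosen.setdefault(key, i)  # first occurrence wins, also when both are true
        else:
            chosen[key] = i  # keeplast only: later occurrences overwrite earlier ones
    # Assumption: kept events come back in their original relative order.
    return [events[i] for i in sorted(chosen.values())]


events = [
    {"src_ip": "10.0.0.1", "dest_port": 22, "n": 1},
    {"src_ip": "10.0.0.1", "dest_port": 443, "n": 2},
    {"src_ip": "10.0.0.1", "dest_port": 22, "n": 3},
]
fields = ["src_ip", "dest_port"]

first = dedup(events, fields)
last = dedup(events, fields, keepfirst=False, keeplast=True)
both = dedup(events, fields, keepfirst=True, keeplast=True)
none = dedup(events, fields, keepfirst=False, keeplast=False)
```

In this model, both behaves exactly like first, which matches the rule that keepfirst takes precedence when both flags are true.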
Examples
Remove duplicate users
* | dedup user
Keeps only the first event for each unique user.
Unique source IPs
* | dedup src_ip
Returns one event per unique source IP address.
Keep last occurrence
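* | dedup user keepfirst=false keeplast=true
Keeps the last occurrence of each user in the current event order. Because keepfirst defaults to true and takes precedence when both options are true, it is disabled explicitly.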
Dedup by multiple fields
* | dedup src_ip, dest_port
Keeps one event per unique combination of source IP and destination port.
Find unique user-action pairs
* | dedup user, action
| table user, action, timestamp
Shows each unique user-action combination.
Unique hosts with details
* | dedup src_host
| table src_host, src_ip, os_type, last_seen
Lists unique hosts with their details.
Most recent activity per user
* | sort -timestamp
| dedup user
Shows the most recent event for each user.
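Why sorting first yields the most recent event can be seen in a short Python model of the same pipeline. The events and numeric timestamps below are invented for illustration, and the code is a sketch of the semantics rather than the engine's implementation.

```python
# Invented sample events; timestamps are plain integers for simplicity.
events = [
    {"user": "alice", "timestamp": 100, "action": "login"},
    {"user": "bob", "timestamp": 150, "action": "login"},
    {"user": "alice", "timestamp": 200, "action": "logout"},
]

# Equivalent of `sort -timestamp`: newest events first.
newest_first = sorted(events, key=lambda e: e["timestamp"], reverse=True)

# Equivalent of `dedup user` with the default keepfirst=true:
# setdefault stores only the first event seen for each user.
latest_per_user = {}
for e in newest_first:
    latest_per_user.setdefault(e["user"], e)

result = list(latest_per_user.values())
```

After the descending sort, the first occurrence of each user is also that user's newest event, so the default keepfirst behavior is enough.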
Unique file hashes
file_hash=*
| dedup file_hash
| table file_hash, file_name, first_seen, src_host
Lists unique files observed in the environment.
Distinct IP-port combinations
* | dedup src_ip, dest_ip, dest_port
| stats count() as unique_connections
Counts unique network connections.
Remove duplicate alerts
alert_name="Brute Force Detected"
| dedup src_ip, target_user
| table timestamp, src_ip, target_user, alert_severity
Shows one alert per IP-user combination.
Latest status per endpoint
* | sort -timestamp
| dedup endpoint
| table endpoint, status, response_time, timestamp
Shows current status of each endpoint.
Unique domains accessed
* | dedup domain
| table domain, first_seen, src_ip
Lists all unique domains accessed.
Find first occurrence
action=login
| sort timestamp
| dedup user keepfirst=true
| table user, timestamp, src_ip
Shows when each user first logged in.
Unique error messages
severity=error
| dedup message
| table message, count, first_seen
Lists distinct error messages.
Dedup before aggregation
* | dedup user, src_ip
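| stats count() as unique_user_ip_pairs by action
Counts unique user-IP pairs, grouped by the action of the event kept for each pair. To count a pair once under every action it appears with, add action to the dedup fields.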
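Because dedup keeps a single event per user-IP pair, a pair that appears with several actions is counted only under the action of its kept event. The Python sketch below models both variants of the query; the dedup helper and the sample events are hypothetical and only illustrate the semantics.

```python
from collections import Counter

# Invented sample events: alice/10.0.0.1 appears with two different actions.
events = [
    {"user": "alice", "src_ip": "10.0.0.1", "action": "login"},
    {"user": "alice", "src_ip": "10.0.0.1", "action": "logout"},
    {"user": "bob", "src_ip": "10.0.0.2", "action": "login"},
]


def dedup(events, fields):
    """Keep the first event per distinct combination of `fields` (illustrative)."""
    seen, kept = set(), []
    for e in events:
        key = tuple(e.get(f) for f in fields)
        if key not in seen:
            seen.add(key)
            kept.append(e)
    return kept


# Model of: dedup user, src_ip | stats count() by action
by_kept_action = Counter(e["action"] for e in dedup(events, ["user", "src_ip"]))

# Model of: dedup user, src_ip, action | stats count() by action
per_action = Counter(e["action"] for e in dedup(events, ["user", "src_ip", "action"]))
```

In the first variant the logout event is dropped as a duplicate of alice's login, so logout does not appear in the counts at all.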
Usage Notes
Order matters: The order of events affects which one is kept. Use sort before dedup to control which occurrence is preserved.
Default behavior: By default, dedup keeps the first occurrence (keepfirst=true).
Multiple fields: When deduplicating by multiple fields, all specified fields must match for events to be considered duplicates.
Performance: dedup is efficient, but it must remember each distinct value combination it has seen, so very high-cardinality fields may use significant memory.
Null values: Events with null values in dedup fields are treated as having the same value (null).
Case sensitivity: String field comparisons are case-sensitive. "User" and "user" are different.
vs. stats dc(): Use dedup when you need the full event data. Use stats dc() when you only need the count.
Preserves fields: Unlike stats, dedup preserves all fields from the kept events.
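The notes on null values, case sensitivity, and memory can be illustrated with a small Python model. This is a sketch under stated assumptions: the dedup helper and the id field are hypothetical, null is modeled as Python's None, and the code is not the engine's implementation.

```python
def dedup(events, field):
    """Keep the first event per distinct value of `field` (illustrative model)."""
    # `seen` holds one entry per distinct value, which is why
    # high-cardinality fields cost memory.
    seen, kept = set(), []
    for e in events:
        value = e[field]  # compared exactly, so "User" and "user" differ
        if value not in seen:
            seen.add(value)
            kept.append(e)
    return kept


events = [
    {"id": 1, "user": "User"},
    {"id": 2, "user": "user"},
    {"id": 3, "user": None},  # null modeled as None (an assumption)
    {"id": 4, "user": None},  # duplicate of the previous null value
    {"id": 5, "user": "user"},
]

kept_ids = [e["id"] for e in dedup(events, "user")]
```

Events 1 and 2 both survive because the comparison is case-sensitive, while event 4 is dropped because all nulls count as the same value.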