nano SIEM
User Guide

Field Types & Search Performance

nano organizes log data into different field categories, each optimized for specific use cases and search patterns. Understanding these field types helps you write faster, more efficient queries.

Overview

Raw Log → Parser → Field Extraction → Storage

        ┌────────────────┼────────────────┐
        ↓                ↓                ↓
   Metadata         Normalized          UDM
   (Always)         (Common)         (Extended)
        ↓                ↓                ↓
        └────────────────┴────────────────┘

              ┌──────────┴──────────┐
              ↓                     ↓
         Enrichment            Prevalence
         (GeoIP, ASN)         (Rarity Tracking)
              ↓                     ↓
         enriched_*            hash_prevalence
                              domain_prevalence

Field Categories

Metadata Fields

What they are: Core system fields that exist on every log event, regardless of source.

Examples:

  • timestamp - Event timestamp (indexed, partitioned)
  • id - Unique event identifier (UUID)
  • source_type - Log source (e.g., "sysmon", "apache", "cloudtrail")
  • ingest_time - When the event was ingested
  • raw_content - Original unparsed log message
  • metadata - System metadata (JSON)

Search Performance: ⚡⚡⚡ Fastest

Metadata fields are heavily optimized:

  • timestamp is the primary partition key - filtering by time is extremely fast
  • source_type uses LowCardinality encoding for minimal storage and fast filtering
  • id has bloom filter indexes for exact lookups
  • Queries that filter by timestamp first benefit from partition pruning
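
The effect of partition pruning can be sketched in Python. This is a toy model, not nano's storage engine: it assumes daily partitions keyed by the date portion of `timestamp`, with invented sizes and names.

```python
from datetime import date, timedelta

# Hypothetical sketch of partition pruning (not nano internals):
# events live in per-day partitions keyed by the date of `timestamp`.
partitions = {
    date(2024, 1, 1) + timedelta(days=d): [
        {"source_type": "sysmon"} for _ in range(100)
    ]
    for d in range(30)
}

def query(partitions, since=None):
    """Return (partitions scanned, matching events)."""
    scanned, hits = 0, []
    for day, events in partitions.items():
        if since is not None and day < since:
            continue  # pruned: this partition is never read
        scanned += 1
        hits += [e for e in events if e["source_type"] == "sysmon"]
    return scanned, hits

scanned_all, _ = query(partitions)                        # no time filter
scanned_week, _ = query(partitions, since=date(2024, 1, 24))
print(scanned_all, scanned_week)  # 30 7
```

The time-filtered query touches 7 of 30 partitions; the unfiltered one reads everything, which is why the time predicate belongs first.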

Best Practices:

# FAST - Uses partition pruning
timestamp > now() - INTERVAL 1 HOUR source_type=sysmon

# SLOWER - No time filter means scanning all partitions
source_type=sysmon

Normalized Fields

What they are: Common fields extracted from logs and mapped to a standard schema. These are the "core" fields that most log sources populate.

Examples:

  • Network: src_ip, dest_ip, src_port, dest_port, protocol
  • Users: user, src_user, dest_user, user_domain
  • Processes: process_name, process_id, process_path, process_hash, command_line
  • Files: file_path, file_name, file_hash, file_size
  • Web: url, url_domain, http_method, http_user_agent
  • DNS: query, query_type, answer, record_type
  • Email: sender, recipient, subject, message_id
  • Security: action, status, severity, category, signature

Search Performance: ⚡⚡ Fast

Normalized fields have targeted optimizations:

  • High-cardinality fields (IPs, hashes, GUIDs) use bloom filter indexes
  • Low-cardinality fields (actions, statuses) use set indexes or LowCardinality encoding
  • Text fields (command_line, file_path) use token bloom filters for substring matching
  • Ordered by (timestamp, src_ip, dest_ip) for efficient network queries

Index Types:

  • Bloom Filter (src_ip, dest_ip, user, process_hash, file_hash) - Fast exact matching
  • Token Bloom Filter (command_line, file_path, http_user_agent) - Fast substring/token matching
  • Set Index (action, status, http_method) - Fast IN() queries for low-cardinality fields
  • LowCardinality (source_type, severity, category) - Compressed storage, fast filtering
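
A minimal Bloom filter sketch shows why these indexes accelerate exact matches but not leading wildcards. This is a simplified stand-in for the per-block indexes above, not nano's actual implementation.

```python
import hashlib

# Toy Bloom filter (assumption: a simplified stand-in for the per-block
# bloom filter indexes described above, not nano's implementation).
class BloomFilter:
    def __init__(self, size=1024, num_hashes=3):
        self.size, self.num_hashes, self.bits = size, num_hashes, 0

    def _positions(self, value):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        # False means "definitely not present" -> the block can be skipped;
        # True means "maybe present" -> the block must be read.
        return all(self.bits >> pos & 1 for pos in self._positions(value))

bf = BloomFilter()
bf.add("192.168.1.100")
print(bf.might_contain("192.168.1.100"))  # True: read this block
# A leading wildcard like user=*admin has no fixed value to hash, so the
# filter can never rule a block out and every block must be scanned.
```

Token bloom filters apply the same trick per token, which is why `command_line CONTAINS "powershell"` can skip blocks while `user=*admin` cannot.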

Best Practices:

# FAST - Indexed fields with exact match
src_ip=192.168.1.100
process_hash=5d41402abc4b2a76b9719d911017c592

# FAST - Token bloom filter for substring
command_line CONTAINS "powershell"

# FAST - Set index for IN queries
action IN ("login", "logout", "failed_login")

# SLOWER - Wildcard at start defeats indexes
user=*admin

UDM Fields (Unified Data Model)

What they are: Extended fields from industry-standard data models and security frameworks. These provide comprehensive coverage for specialized use cases.

Total Fields: 525+ fields covering:

  • Network Traffic (85 fields) - VLANs, NAT, load balancers, network performance
  • Authentication (45 fields) - SSO, MFA, privilege escalation, session management
  • Web Traffic (38 fields) - HTTP headers, cookies, referrers, response codes
  • Endpoint (72 fields) - Services, drivers, registry, scheduled tasks
  • Database (41 fields) - Queries, transactions, performance metrics
  • Email (28 fields) - Attachments, routing, delivery status
  • Certificate/SSL (35 fields) - Certificate chains, validation, expiration
  • Malware (22 fields) - Signatures, families, actions
  • Vulnerability (18 fields) - CVEs, CVSS scores, patches
  • Cloud/Platform (55 fields) - AWS, Azure, GCP audit logs, cloud resource tracking
  • Performance (38 fields) - CPU, memory, disk, JVM metrics
  • Custom (58 fields) - nano-specific extensions

Search Performance: ⚡ Good

UDM fields are added dynamically and have basic optimizations:

  • String fields use default String type (some use LowCardinality for common values)
  • Integer/Long fields use appropriate numeric types (UInt32, UInt64, Int64)
  • Float fields use Float64 for precision
  • Boolean fields use UInt8 (0/1)
  • Commonly queried fields have bloom filter indexes

Performance Characteristics:

  • Fields are stored but not all are indexed
  • Exact matches are reasonably fast
  • Substring searches are slower than normalized fields
  • Best used when you know the specific field you need

Best Practices:

# GOOD - Specific field with exact match
ssl_issuer_common_name="Let's Encrypt"
cvss > 7.0

# ACCEPTABLE - Indexed UDM fields
dest_user="admin"
signature="Malicious Activity"

# SLOWER - Non-indexed UDM fields with wildcards
ssl_subject LIKE "%example%"

# BETTER - Combine with time filter
timestamp > now() - INTERVAL 1 DAY ssl_subject LIKE "%example%"

Enriched Fields

What they are: Fields automatically populated by enrichment processes, typically from external data sources or dictionaries.

Examples:

  • enriched_src_country - Source IP country (from GeoIP)
  • enriched_src_country_code - Source IP country code
  • enriched_src_continent - Source IP continent
  • enriched_src_asn - Source IP ASN number
  • enriched_src_as_name - Source IP AS organization name
  • enriched_dest_country - Destination IP country
  • enriched_dest_country_code - Destination IP country code
  • enriched_dest_continent - Destination IP continent
  • enriched_dest_asn - Destination IP ASN number
  • enriched_dest_as_name - Destination IP AS organization name

Search Performance: ⚡⚡ Fast

Enriched fields are materialized (pre-computed) at ingestion time:

  • Values are looked up from dictionaries and stored directly in the table
  • No runtime dictionary lookups during queries
  • Use LowCardinality encoding for efficient storage
  • Filter performance is similar to normalized fields

How Enrichment Works:

  1. Log arrives with src_ip=8.8.8.8
  2. Dictionary lookup: ip_enrichment_dict → country="United States", asn="AS15169"
  3. Fields populated: enriched_src_country="United States", enriched_src_asn="AS15169"
  4. Stored in table for instant querying
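
The steps above can be sketched as a small enrichment function. The dictionary contents here are illustrative, not the real ip_enrichment_dict:

```python
# Sketch of ingest-time enrichment (the dictionary contents here are
# invented for illustration, not the real ip_enrichment_dict):
IP_DICT = {
    "8.8.8.8": {"country": "United States", "asn": "AS15169"},
}

def enrich(event):
    info = IP_DICT.get(event.get("src_ip"), {})
    event["enriched_src_country"] = info.get("country", "")
    event["enriched_src_asn"] = info.get("asn", "")
    return event  # stored with these values; no lookup at query time

event = enrich({"src_ip": "8.8.8.8", "dest_port": 53})
print(event["enriched_src_country"], event["enriched_src_asn"])
# United States AS15169
```

Because the lookup happens once at ingestion, queries read plain columns and never pay the dictionary cost.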

Best Practices:

# FAST - Enriched fields are pre-computed
enriched_src_country="China"
enriched_dest_asn="AS15169"

# FAST - Combine with other filters
timestamp > now() - INTERVAL 1 HOUR 
  enriched_src_country NOT IN ("United States", "Canada")

# EFFICIENT - Group by enriched fields
* | stats count() by enriched_src_country, enriched_dest_country

Prevalence Fields

What they are: Automatically computed fields that track how rare or common artifacts are across your environment.

Examples:

  • hash_prevalence - Number of hosts that have seen this file hash
  • domain_prevalence - Number of hosts that have queried this domain
  • ip_prevalence - Number of hosts that have connected to this IP
  • hash_first_seen - When this hash was first observed
  • domain_first_seen - When this domain was first observed
  • prevalence_score - Computed rarity score (0-100, 0 = never seen, 100 = everywhere)
  • is_rare - Boolean flag for rare artifacts

Search Performance: ⚡ Good (with caveats)

Prevalence data is computed by materialized views:

  • Aggregated hourly in background tables
  • Lookups require JOIN operations with aggregation tables
  • Best used with the prevalence command which optimizes the query
  • Direct field access is slower than using the prevalence command

How Prevalence Works:

  1. Materialized views track artifacts (hashes, domains, IPs) per hour
  2. Count unique hosts that have seen each artifact
  3. Track first_seen and last_seen timestamps
  4. Prevalence command queries these aggregations efficiently
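
A simplified in-memory model of this aggregation (the real materialized views do the equivalent per hour in background tables):

```python
from collections import defaultdict

# Simplified in-memory model of the prevalence aggregation (assumption:
# the real materialized views do this per hour in background tables).
class PrevalenceTracker:
    def __init__(self):
        self.hosts_by_hash = defaultdict(set)
        self.first_seen = {}

    def observe(self, file_hash, host, hour):
        self.hosts_by_hash[file_hash].add(host)    # unique hosts only
        self.first_seen.setdefault(file_hash, hour)

    def prevalence(self, file_hash):
        # hash_prevalence: number of distinct hosts that saw this hash
        return len(self.hosts_by_hash[file_hash])

t = PrevalenceTracker()
h = "5d41402abc4b2a76b9719d911017c592"
t.observe(h, "host-a", "2024-01-01T10")
t.observe(h, "host-a", "2024-01-01T11")  # same host, not double-counted
t.observe(h, "host-b", "2024-01-01T11")
print(t.prevalence(h), t.first_seen[h])  # 2 2024-01-01T10
```

The prevalence command effectively queries these pre-built counts instead of re-deriving them from raw events, which is why it outperforms a manual JOIN.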

Best Practices:

# BEST - Use prevalence command (optimized)
EventID=1 | prevalence hash_prevalence < 5 window=24h

# GOOD - Filter rare processes
EventID=1 | prevalence hash_first_seen > now() - INTERVAL 1 DAY

# ACCEPTABLE - Enrich with prevalence data
* | prevalence enrich=true window=24h

# AVOID - Direct field access (not materialized in main table)
# prevalence_score < 0.1  # This field doesn't exist in logs table

Extension Fields

What they are: Additional UDM fields for parser-specific or custom data that doesn't fit the explicitly indexed columns. These are stored in a dynamic JSON column internally, but you search them by name just like any other field — no special prefix required.

Examples:

  • sysmon_RuleName - Sysmon-specific rule name
  • aws_eventName - AWS CloudTrail event name
  • aws_userIdentity - AWS user identity object
  • custom_threat_score - Custom threat scoring

Search Performance: Slower than indexed fields

Extension fields are stored in a JSON column under the hood:

  • The query engine automatically maps field names to the JSON storage
  • No special ext. prefix required — just use the field name directly
  • Slower than explicitly indexed columns (no bloom filters or partition pruning)
  • Best to combine with indexed filters (time range, source_type) to narrow the scan first
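
The cost model can be sketched as follows. This assumes a dynamic JSON column named `extensions` purely for illustration; the actual internal column name may differ.

```python
import json

# Sketch of why extension-field lookups cost more (assumption: a dynamic
# JSON column here called `extensions`; the real column name may differ).
rows = [
    {"source_type": "cloudtrail",
     "extensions": json.dumps({"aws_eventName": "DeleteBucket"})},
    {"source_type": "sysmon",
     "extensions": json.dumps({"sysmon_RuleName": "SuspiciousLoad"})},
]

def search(rows, field, value, source_type=None):
    hits = []
    for row in rows:
        # Indexed filters run first and narrow the scan cheaply...
        if source_type and row["source_type"] != source_type:
            continue
        # ...then every surviving row's JSON is parsed: the slow part.
        if json.loads(row["extensions"]).get(field) == value:
            hits.append(row)
    return hits

hits = search(rows, "aws_eventName", "DeleteBucket", source_type="cloudtrail")
print(len(hits))  # 1
```

Every row that survives the indexed filters still pays a JSON parse, so the fewer rows reach that stage, the faster the query.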

Best Practices:

# GOOD - Search by field name directly (no ext. prefix needed)
aws_eventName="DeleteBucket"

# BETTER - Combine with indexed fields to narrow the scan
source_type=cloudtrail aws_eventName="DeleteBucket"

# GOOD - Use in table output
* | table timestamp, user, aws_eventName, aws_sourceIPAddress

Performance Comparison

Query Speed by Field Type

Field Type  | Exact Match | Substring | Wildcard | Aggregation | Notes
------------|-------------|-----------|----------|-------------|-------------------------------
Metadata    | ⚡⚡⚡      | ⚡⚡      | ⚡⚡     | ⚡⚡⚡      | Partition pruning on timestamp
Normalized  | ⚡⚡        | ⚡⚡      | ⚡       | ⚡⚡        | Bloom filter + token indexes
UDM         | ⚡          | ⚡        | ⚡       | ⚡          | Basic indexes, many fields
Enriched    | ⚡⚡        | ⚡⚡      | ⚡       | ⚡⚡        | Pre-computed, LowCardinality
Prevalence  | ⚡          | N/A       | N/A      | ⚡          | Requires JOIN, use command
Extension   | ⚡          | 🐌        | 🐌       | 🐌          | Dynamic JSON, no indexes

Storage Efficiency

Field Type  | Storage  | Compression | Cardinality
------------|----------|-------------|-------------
Metadata    | Minimal  | Excellent   | Low-Medium
Normalized  | Low      | Very Good   | Medium-High
UDM         | Medium   | Good        | Varies
Enriched    | Low      | Excellent   | Low
Prevalence  | Separate | Excellent   | Medium
Extension   | Medium   | Good        | Varies

Query Optimization Tips

1. Always Filter by Time First

# FAST - Partition pruning
timestamp > now() - INTERVAL 1 HOUR user=admin

# SLOW - Scans all partitions
user=admin

2. Use Indexed Fields

# FAST - Bloom filter index
src_ip=192.168.1.100

# FAST - Token bloom filter
command_line CONTAINS "powershell"

# SLOWER - Extension field (no index, but works)
custom_field="value"

3. Prefer Exact Matches

# FAST - Exact match
user="admin"

# SLOWER - Wildcard
user LIKE "%admin%"

# SLOWEST - Leading wildcard
user LIKE "%admin"

4. Use Low-Cardinality Fields for Grouping

# FAST - Low cardinality
* | stats count() by source_type, action, severity

# SLOWER - High cardinality
* | stats count() by src_ip, dest_ip, command_line

5. Leverage Enriched Fields

# FAST - Pre-computed enrichment
enriched_src_country="China"

# SLOWER - Runtime enrichment
* | lookup geoip src_ip OUTPUT country | where country="China"

6. Use Prevalence Command

# FAST - Optimized prevalence query
EventID=1 | prevalence hash_prevalence < 5 window=24h

# SLOWER - Manual JOIN
EventID=1 | join file_hash [
  SELECT file_hash, uniqMerge(host_count) as hosts 
  FROM hash_prevalence_agg 
  GROUP BY file_hash
] | where hosts < 5

7. Limit Result Sets

# GOOD - Limit early
timestamp > now() - INTERVAL 1 HOUR 
  source_type=sysmon 
  | head 1000

# BETTER - Aggregate instead of raw results
timestamp > now() - INTERVAL 1 HOUR 
  source_type=sysmon 
  | stats count() by process_name

Field Discovery

List Available Fields

Use the Fields Panel in the Search UI to:

  • Browse all available fields
  • See field types and cardinality
  • View sample values
  • Add fields to your query

Check Field Population

# See which fields are populated
* | stats count() by source_type 
  | table source_type, count

# Check field coverage
source_type=sysmon 
  | stats 
      count() as total,
      count(process_hash) as has_hash,
      count(command_line) as has_cmdline

Find UDM Fields

# Search UDM field documentation
# Visit: Settings → Documentation → UDM Fields

# Or query the schema
SHOW COLUMNS FROM logs

Browse All UDM Fields: See the UDM Fields Table for a complete, searchable list of all 525+ UDM fields with types and categories.

Real-World Examples

Example 1: Threat Hunting - Rare Process Execution

Goal: Find rare processes executed in the last 24 hours from foreign countries

# Combines multiple field types for optimal performance
timestamp > now() - INTERVAL 24 HOURS              # Metadata (partition pruning)
  source_type=sysmon                               # Metadata (indexed)
  EventID=1                                        # Normalized (indexed)
  enriched_src_country NOT IN ("United States")   # Enriched (fast filter)
  | prevalence hash_prevalence < 5 window=24h     # Prevalence (optimized)
  | table timestamp, src_host, process_name, 
          process_hash, command_line,
          enriched_src_country, hash_prevalence

Performance: ⚡⚡⚡ Fast

  • Time filter enables partition pruning
  • All filters use indexed fields
  • Prevalence command is optimized
  • Result set is limited by rarity

Example 2: Security Monitoring - Suspicious PowerShell

Goal: Detect encoded PowerShell commands with high entropy

# Leverages normalized fields and eval functions
timestamp > now() - INTERVAL 1 HOUR                # Metadata
  process_name=powershell.exe                      # Normalized (indexed)
  command_line CONTAINS "-enc"                     # Normalized (token index)
  | eval cmd_entropy = entropy(command_line)       # Calculated field
  | eval decoded = base64_decode(
      extract(command_line, "-enc(?:odedCommand)? ([A-Za-z0-9+/=]+)")
    )
  | where cmd_entropy > 4.5                        # High entropy = suspicious
  | table timestamp, user, src_host,
          command_line, cmd_entropy, decoded

Performance: ⚡⚡ Fast

  • Time filter for partition pruning
  • Indexed process_name for fast filtering
  • Token bloom filter for substring match
  • Eval functions computed only on matching results
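
The two eval steps can be sketched in Python. Assumptions: Shannon entropy measured in bits per character, and PowerShell `-enc` payloads being base64-encoded UTF-16LE text; `entropy` and `decode_enc` here are illustrative stand-ins for nano's built-in eval functions.

```python
import base64
import math
import re
from collections import Counter

# Sketch of the two eval steps (assumptions: Shannon entropy in bits per
# character; PowerShell -enc payloads are base64 of UTF-16LE text).
def entropy(text):
    counts = Counter(text)
    return -sum((n / len(text)) * math.log2(n / len(text))
                for n in counts.values())

def decode_enc(command_line):
    match = re.search(r"-enc(?:odedCommand)?\s+([A-Za-z0-9+/=]+)",
                      command_line, re.IGNORECASE)
    if not match:
        return None
    return base64.b64decode(match.group(1)).decode("utf-16-le")

payload = base64.b64encode("whoami".encode("utf-16-le")).decode()
cmd = f"powershell.exe -enc {payload}"
print(decode_enc(cmd))  # whoami
print(entropy("ab"))    # 1.0
```

Base64 payloads use the character set roughly uniformly, which pushes their entropy well above ordinary command lines and makes the 4.5-bit threshold a useful discriminator.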

Example 3: Network Analysis - Geographic Traffic Patterns

Goal: Analyze outbound traffic by destination country

# Uses enriched fields for pre-computed geographic data
timestamp > now() - INTERVAL 6 HOURS               # Metadata
  dest_port IN (80, 443)                           # Normalized
  enriched_dest_country != ""                      # Enriched (non-empty)
  | stats 
      sum(bytes_out) as total_bytes,               # Normalized
      dc(src_ip) as unique_sources,                # Normalized
      dc(dest_ip) as unique_destinations,          # Normalized
      count() as connections
    by enriched_dest_country,                      # Enriched (low cardinality)
       enriched_dest_continent                     # Enriched (low cardinality)
  | eval total_mb = round(total_bytes / 1048576, 2)
  | sort -total_mb
  | head 20

Performance: ⚡⚡⚡ Fast

  • Enriched fields are pre-computed (no runtime lookups)
  • Aggregation on low-cardinality fields
  • Efficient grouping and sorting

Example 4: Compliance - Failed Authentication Tracking

Goal: Track failed login attempts with user and location details

# Combines normalized and UDM fields
timestamp > now() - INTERVAL 24 HOURS              # Metadata
  action=login                                     # Normalized (set index)
  status=failure                                   # Normalized (set index)
  | eval hour = hour(timestamp)                    # Time function
  | stats 
      count() as attempts,
      dc(src_ip) as unique_ips,
      values(enriched_src_country) as countries,   # Enriched
      values(authentication_method) as auth_methods # UDM
    by user, hour
  | where attempts > 5                             # Threshold
  | sort -attempts

Performance: ⚡⚡ Fast

  • Set indexes on action/status
  • Low-cardinality grouping (user, hour)
  • Enriched fields are pre-computed

Example 5: Incident Response - Lateral Movement Detection

Goal: Detect potential lateral movement using multiple authentication sources

# Uses UDM fields for detailed authentication tracking
timestamp > now() - INTERVAL 1 HOUR                # Metadata
  source_type IN ("windows_security", "sysmon")    # Metadata
  action=login                                     # Normalized
  auth_type=network                                # Normalized
  | bin span=5m                                    # Time bucketing
  | stats 
      dc(dest_host) as unique_targets,             # Normalized
      dc(src_ip) as unique_sources,                # Normalized
      values(authentication_method) as methods,    # UDM
      values(user_type) as user_types,             # UDM
      count() as login_count
    by time_bucket, user, src_host
  | where unique_targets > 5                       # Lateral movement threshold
  | sort -unique_targets

Performance: ⚡⚡ Fast

  • Time-based bucketing for aggregation
  • Indexed normalized fields
  • UDM fields for additional context
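
The `bin span=5m` bucketing above amounts to flooring each timestamp to its 5-minute boundary; a sketch (assuming floor semantics, which is the common convention):

```python
from datetime import datetime

# Sketch of `bin span=5m` (assumption: buckets are floored to the span).
def time_bucket(ts, span_minutes=5):
    return ts.replace(minute=ts.minute - ts.minute % span_minutes,
                      second=0, microsecond=0)

print(time_bucket(datetime(2024, 1, 1, 10, 7, 42)))  # 2024-01-01 10:05:00
```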

Example 6: Performance Troubleshooting - Slow Database Queries

Goal: Find slow database queries with execution details

# Leverages UDM database fields
timestamp > now() - INTERVAL 2 HOURS               # Metadata
  source_type=database                             # Metadata
  query_time > 5000                                # UDM (milliseconds)
  | eval query_seconds = query_time / 1000
  | stats 
      avg(query_seconds) as avg_time,
      max(query_seconds) as max_time,
      count() as query_count,
      values(instance_name) as instances,          # UDM
      values(instance_type) as db_types            # UDM
    by user, query
  | where query_count > 3                          # Repeated slow queries
  | sort -avg_time
  | head 10

Performance: ⚡ Good

  • UDM fields provide detailed database context
  • Numeric comparison on query_time
  • Aggregation reduces result set

Performance Anti-Patterns

❌ Don't: Skip Time Filters

# BAD - Scans all partitions
user=admin
# GOOD - Uses partition pruning
timestamp > now() - INTERVAL 1 DAY user=admin

❌ Don't: Use Leading Wildcards

# BAD - Can't use indexes
user LIKE "%admin"
# GOOD - Uses bloom filter index
user LIKE "admin%"

❌ Don't: Query Extension Fields Without Time/Source Filters

# BAD - Scans all data for an unindexed field
aws_eventName="DeleteBucket"
# GOOD - Narrow with indexed fields first
source_type=cloudtrail aws_eventName="DeleteBucket"

❌ Don't: Aggregate High-Cardinality Fields

# BAD - Too many groups
* | stats count() by command_line
# GOOD - Aggregate on low-cardinality
* | stats count() by process_name, action

❌ Don't: Manual Prevalence JOINs

# BAD - Complex manual JOIN
EventID=1 | join file_hash [...]
# GOOD - Use prevalence command
EventID=1 | prevalence hash_prevalence < 5 window=24h
