nano SIEM
User Guide

Field Types & Search Performance

nano organizes log data into different field categories, each optimized for specific use cases and search patterns. Understanding these field types helps you write faster, more efficient queries.

Overview

Raw Log → Parser → Field Extraction → Storage

        ┌────────────────┼────────────────┐
        ↓                ↓                ↓
   Metadata         Normalized          UDM
   (Always)         (Common)         (Extended)
        ↓                ↓                ↓
        └────────────────┴────────────────┘

              ┌──────────┴──────────┐
              ↓                     ↓
         Enrichment            Prevalence
         (GeoIP, ASN)         (Rarity Tracking)
              ↓                     ↓
         enriched_*            hash_prevalence
                              domain_prevalence

Field Categories

Metadata Fields

What they are: Core system fields that exist on every log event, regardless of source.

Examples:

  • timestamp - Event timestamp (indexed, partitioned)
  • id - Unique event identifier (UUID)
  • source_type - Log source (e.g., "sysmon", "apache", "cloudtrail")
  • ingest_time - When the event was ingested
  • raw_content - Original unparsed log message
  • metadata - System metadata (JSON)

Search Performance: ⚡⚡⚡ Fastest

Metadata fields are heavily optimized:

  • timestamp is the primary partition key - filtering by time is extremely fast
  • source_type uses LowCardinality encoding for minimal storage and fast filtering
  • id has bloom filter indexes for exact lookups
  • Queries that filter by timestamp first benefit from partition pruning
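
The effect of partition pruning can be sketched in Python. This is a toy model, not nano's storage engine: it assumes daily partitions keyed by the date portion of `timestamp`, with invented sizes and names.

```python
from datetime import date, timedelta

# Hypothetical sketch of partition pruning (not nano internals):
# events live in per-day partitions keyed by the date of `timestamp`.
partitions = {
    date(2024, 1, 1) + timedelta(days=d): [
        {"source_type": "sysmon"} for _ in range(100)
    ]
    for d in range(30)
}

def query(partitions, since=None):
    """Return (partitions scanned, matching events)."""
    scanned, hits = 0, []
    for day, events in partitions.items():
        if since is not None and day < since:
            continue  # pruned: this partition is never read
        scanned += 1
        hits += [e for e in events if e["source_type"] == "sysmon"]
    return scanned, hits

scanned_all, _ = query(partitions)                        # no time filter
scanned_week, _ = query(partitions, since=date(2024, 1, 24))
print(scanned_all, scanned_week)  # 30 7
```

The time-filtered query touches 7 of 30 partitions; the unfiltered one reads everything, which is why the time predicate belongs first.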

Best Practices:

# FAST - Uses partition pruning
timestamp > now() - INTERVAL 1 HOUR source_type=sysmon

# SLOWER - No time filter means scanning all partitions
source_type=sysmon

Normalized Fields

What they are: Common fields extracted from logs and mapped to a standard schema. These are the "core" fields that most log sources populate.

Examples:

  • Network: src_ip, dest_ip, src_port, dest_port, protocol
  • Users: user, src_user, dest_user, user_domain
  • Processes: process_name, process_id, process_path, process_hash, command_line
  • Files: file_path, file_name, file_hash, file_size
  • Web: url, url_domain, http_method, http_user_agent
  • DNS: query, query_type, answer, record_type
  • Email: sender, recipient, subject, message_id
  • Security: action, status, severity, category, signature

Search Performance: ⚡⚡ Fast

Normalized fields have targeted optimizations:

  • High-cardinality fields (IPs, hashes, GUIDs) use bloom filter indexes
  • Low-cardinality fields (actions, statuses) use set indexes or LowCardinality encoding
  • Text fields (command_line, file_path) use token bloom filters for substring matching
  • Ordered by (timestamp, src_ip, dest_ip) for efficient network queries

Index Types:

  • Bloom Filter (src_ip, dest_ip, user, process_hash, file_hash) - Fast exact matching
  • Token Bloom Filter (command_line, file_path, http_user_agent) - Fast substring/token matching
  • Set Index (action, status, http_method) - Fast IN() queries for low-cardinality fields
  • LowCardinality (source_type, severity, category) - Compressed storage, fast filtering
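
A minimal Bloom filter sketch shows why these indexes accelerate exact matches but not leading wildcards. This is a simplified stand-in for the per-block indexes above, not nano's actual implementation.

```python
import hashlib

# Toy Bloom filter (assumption: a simplified stand-in for the per-block
# bloom filter indexes described above, not nano's implementation).
class BloomFilter:
    def __init__(self, size=1024, num_hashes=3):
        self.size, self.num_hashes, self.bits = size, num_hashes, 0

    def _positions(self, value):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        # False means "definitely not present" -> the block can be skipped;
        # True means "maybe present" -> the block must be read.
        return all(self.bits >> pos & 1 for pos in self._positions(value))

bf = BloomFilter()
bf.add("192.168.1.100")
print(bf.might_contain("192.168.1.100"))  # True: read this block
# A leading wildcard like user=*admin has no fixed value to hash, so the
# filter can never rule a block out and every block must be scanned.
```

Token bloom filters apply the same trick per token, which is why `command_line CONTAINS "powershell"` can skip blocks while `user=*admin` cannot.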

Best Practices:

# FAST - Indexed fields with exact match
src_ip=192.168.1.100
process_hash=5d41402abc4b2a76b9719d911017c592

# FAST - Token bloom filter for substring
command_line CONTAINS "powershell"

# FAST - Set index for IN queries
action IN ("login", "logout", "failed_login")

# SLOWER - Wildcard at start defeats indexes
user=*admin

UDM Fields (Unified Data Model)

What they are: Extended fields from industry-standard data models and security frameworks. These provide comprehensive coverage for specialized use cases.

Total Fields: 525+ fields covering:

  • Network Traffic (85 fields) - VLANs, NAT, load balancers, network performance
  • Authentication (45 fields) - SSO, MFA, privilege escalation, session management
  • Web Traffic (38 fields) - HTTP headers, cookies, referrers, response codes
  • Endpoint (72 fields) - Services, drivers, registry, scheduled tasks
  • Database (41 fields) - Queries, transactions, performance metrics
  • Email (28 fields) - Attachments, routing, delivery status
  • Certificate/SSL (35 fields) - Certificate chains, validation, expiration
  • Malware (22 fields) - Signatures, families, actions
  • Vulnerability (18 fields) - CVEs, CVSS scores, patches
  • Cloud/Platform (55 fields) - AWS, Azure, GCP audit logs, cloud resource tracking
  • Performance (38 fields) - CPU, memory, disk, JVM metrics
  • Custom (58 fields) - nano-specific extensions

Search Performance: ⚡ Good

UDM fields are added dynamically and have basic optimizations:

  • String fields use default String type (some use LowCardinality for common values)
  • Integer/Long fields use appropriate numeric types (UInt32, UInt64, Int64)
  • Float fields use Float64 for precision
  • Boolean fields use UInt8 (0/1)
  • Commonly queried fields have bloom filter indexes

Performance Characteristics:

  • Fields are stored but not all are indexed
  • Exact matches are reasonably fast
  • Substring searches are slower than normalized fields
  • Best used when you know the specific field you need

Best Practices:

# GOOD - Specific field with exact match
ssl_issuer_common_name="Let's Encrypt"
cvss > 7.0

# ACCEPTABLE - Indexed UDM fields
dest_user="admin"
signature="Malicious Activity"

# SLOWER - Non-indexed UDM fields with wildcards
ssl_subject LIKE "%example%"

# BETTER - Combine with time filter
timestamp > now() - INTERVAL 1 DAY ssl_subject LIKE "%example%"

Enriched Fields

What they are: Fields automatically populated by enrichment processes, typically from external data sources or dictionaries.

Examples:

  • enriched_src_country - Source IP country (from GeoIP)
  • enriched_src_country_code - Source IP country code
  • enriched_src_continent - Source IP continent
  • enriched_src_asn - Source IP ASN number
  • enriched_src_as_name - Source IP AS organization name
  • enriched_dest_country - Destination IP country
  • enriched_dest_country_code - Destination IP country code
  • enriched_dest_continent - Destination IP continent
  • enriched_dest_asn - Destination IP ASN number
  • enriched_dest_as_name - Destination IP AS organization name

Search Performance: ⚡⚡ Fast

Enriched fields are materialized (pre-computed) at ingestion time:

  • Values are looked up from dictionaries and stored directly in the table
  • No runtime dictionary lookups during queries
  • Use LowCardinality encoding for efficient storage
  • Filter performance is similar to normalized fields

How Enrichment Works:

  1. Log arrives with src_ip=8.8.8.8
  2. Dictionary lookup: ip_enrichment_dict → country="United States", asn="AS15169"
  3. Fields populated: enriched_src_country="United States", enriched_src_asn="AS15169"
  4. Stored in table for instant querying
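
The steps above can be sketched as a small enrichment function. The dictionary contents here are illustrative, not the real ip_enrichment_dict:

```python
# Sketch of ingest-time enrichment (the dictionary contents here are
# invented for illustration, not the real ip_enrichment_dict):
IP_DICT = {
    "8.8.8.8": {"country": "United States", "asn": "AS15169"},
}

def enrich(event):
    info = IP_DICT.get(event.get("src_ip"), {})
    event["enriched_src_country"] = info.get("country", "")
    event["enriched_src_asn"] = info.get("asn", "")
    return event  # stored with these values; no lookup at query time

event = enrich({"src_ip": "8.8.8.8", "dest_port": 53})
print(event["enriched_src_country"], event["enriched_src_asn"])
# United States AS15169
```

Because the lookup happens once at ingestion, queries read plain columns and never pay the dictionary cost.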

Best Practices:

# FAST - Enriched fields are pre-computed
enriched_src_country="China"
enriched_dest_asn="AS15169"

# FAST - Combine with other filters
timestamp > now() - INTERVAL 1 HOUR 
  enriched_src_country NOT IN ("United States", "Canada")

# EFFICIENT - Group by enriched fields
* | stats count() by enriched_src_country, enriched_dest_country

Prevalence Fields

What they are: Automatically computed fields that track how rare or common artifacts are across your environment.

Examples:

  • hash_prevalence - Number of hosts that have seen this file hash
  • domain_prevalence - Number of hosts that have queried this domain
  • ip_prevalence - Number of hosts that have connected to this IP
  • hash_first_seen - When this hash was first observed
  • domain_first_seen - When this domain was first observed
  • prevalence_score - Computed rarity score (0-100, 0 = never seen, 100 = everywhere)
  • is_rare - Boolean flag for rare artifacts

Search Performance: ⚡ Good (with caveats)

Prevalence data is computed by materialized views:

  • Aggregated hourly in background tables
  • Lookups require JOIN operations with aggregation tables
  • Best used with the prevalence command which optimizes the query
  • Direct field access is slower than using the prevalence command

How Prevalence Works:

  1. Materialized views track artifacts (hashes, domains, IPs) per hour
  2. Count unique hosts that have seen each artifact
  3. Track first_seen and last_seen timestamps
  4. Prevalence command queries these aggregations efficiently
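
A simplified in-memory model of this aggregation (the real materialized views do the equivalent per hour in background tables):

```python
from collections import defaultdict

# Simplified in-memory model of the prevalence aggregation (assumption:
# the real materialized views do this per hour in background tables).
class PrevalenceTracker:
    def __init__(self):
        self.hosts_by_hash = defaultdict(set)
        self.first_seen = {}

    def observe(self, file_hash, host, hour):
        self.hosts_by_hash[file_hash].add(host)    # unique hosts only
        self.first_seen.setdefault(file_hash, hour)

    def prevalence(self, file_hash):
        # hash_prevalence: number of distinct hosts that saw this hash
        return len(self.hosts_by_hash[file_hash])

t = PrevalenceTracker()
h = "5d41402abc4b2a76b9719d911017c592"
t.observe(h, "host-a", "2024-01-01T10")
t.observe(h, "host-a", "2024-01-01T11")  # same host, not double-counted
t.observe(h, "host-b", "2024-01-01T11")
print(t.prevalence(h), t.first_seen[h])  # 2 2024-01-01T10
```

The prevalence command effectively queries these pre-built counts instead of re-deriving them from raw events, which is why it outperforms a manual JOIN.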

Best Practices:

# BEST - Use prevalence command (optimized)
EventID=1 | prevalence hash_prevalence < 5 window=24h

# GOOD - Filter rare processes
EventID=1 | prevalence hash_first_seen > now() - INTERVAL 1 DAY

# ACCEPTABLE - Enrich with prevalence data
* | prevalence enrich=true window=24h

# AVOID - Direct field access (not materialized in main table)
# prevalence_score < 0.1  # This field doesn't exist in logs table

Extension Fields

What they are: Additional UDM fields for parser-specific or custom data that doesn't fit the explicitly indexed columns. These are stored in a dynamic JSON column internally, but you search them by name just like any other field — no special prefix required.

Examples:

  • sysmon_RuleName - Sysmon-specific rule name
  • aws_eventName - AWS CloudTrail event name
  • aws_userIdentity - AWS user identity object
  • custom_threat_score - Custom threat scoring

Search Performance: Slower than indexed fields

Extension fields are stored in a JSON column under the hood:

  • The query engine automatically maps field names to the JSON storage
  • No special ext. prefix required — just use the field name directly
  • Slower than explicitly indexed columns (no bloom filters or partition pruning)
  • Best to combine with indexed filters (time range, source_type) to narrow the scan first
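
The cost model can be sketched as follows. This assumes a dynamic JSON column named `extensions` purely for illustration; the actual internal column name may differ.

```python
import json

# Sketch of why extension-field lookups cost more (assumption: a dynamic
# JSON column here called `extensions`; the real column name may differ).
rows = [
    {"source_type": "cloudtrail",
     "extensions": json.dumps({"aws_eventName": "DeleteBucket"})},
    {"source_type": "sysmon",
     "extensions": json.dumps({"sysmon_RuleName": "SuspiciousLoad"})},
]

def search(rows, field, value, source_type=None):
    hits = []
    for row in rows:
        # Indexed filters run first and narrow the scan cheaply...
        if source_type and row["source_type"] != source_type:
            continue
        # ...then every surviving row's JSON is parsed: the slow part.
        if json.loads(row["extensions"]).get(field) == value:
            hits.append(row)
    return hits

hits = search(rows, "aws_eventName", "DeleteBucket", source_type="cloudtrail")
print(len(hits))  # 1
```

Every row that survives the indexed filters still pays a JSON parse, so the fewer rows reach that stage, the faster the query.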

Best Practices:

# GOOD - Search by field name directly (no ext. prefix needed)
aws_eventName="DeleteBucket"

# BETTER - Combine with indexed fields to narrow the scan
source_type=cloudtrail aws_eventName="DeleteBucket"

# GOOD - Use in table output
* | table timestamp, user, aws_eventName, aws_sourceIPAddress

Performance Comparison

Query Speed by Field Type

Field Type  | Exact Match | Substring | Wildcard | Aggregation | Notes
------------|-------------|-----------|----------|-------------|-------------------------------
Metadata    | ⚡⚡⚡      | ⚡⚡      | ⚡⚡     | ⚡⚡⚡      | Partition pruning on timestamp
Normalized  | ⚡⚡        | ⚡⚡      | ⚡       | ⚡⚡        | Bloom filter + token indexes
UDM         | ⚡          | ⚡        | ⚡       | ⚡          | Basic indexes, many fields
Enriched    | ⚡⚡        | ⚡⚡      | ⚡       | ⚡⚡        | Pre-computed, LowCardinality
Prevalence  | ⚡          | N/A       | N/A      | ⚡          | Requires JOIN, use command
Extension   | ⚡          | 🐌        | 🐌       | 🐌          | Dynamic JSON, no indexes

Storage Efficiency

Field Type  | Storage  | Compression | Cardinality
------------|----------|-------------|-------------
Metadata    | Minimal  | Excellent   | Low-Medium
Normalized  | Low      | Very Good   | Medium-High
UDM         | Medium   | Good        | Varies
Enriched    | Low      | Excellent   | Low
Prevalence  | Separate | Excellent   | Medium
Extension   | Medium   | Good        | Varies

Query Optimization Tips

1. Always Filter by Time First

# FAST - Partition pruning
timestamp > now() - INTERVAL 1 HOUR user=admin

# SLOW - Scans all partitions
user=admin

2. Use Indexed Fields

# FAST - Bloom filter index
src_ip=192.168.1.100

# FAST - Token bloom filter
command_line CONTAINS "powershell"

# SLOWER - Extension field (no index, but works)
custom_field="value"

3. Prefer Exact Matches

# FAST - Exact match
user="admin"

# SLOWER - Wildcard
user LIKE "%admin%"

# SLOWEST - Leading wildcard
user LIKE "%admin"

4. Use Low-Cardinality Fields for Grouping

# FAST - Low cardinality
* | stats count() by source_type, action, severity

# SLOWER - High cardinality
* | stats count() by src_ip, dest_ip, command_line

5. Leverage Enriched Fields

# FAST - Pre-computed enrichment
enriched_src_country="China"

# SLOWER - Runtime enrichment
* | lookup geoip src_ip OUTPUT country | where country="China"

6. Use Prevalence Command

# FAST - Optimized prevalence query
EventID=1 | prevalence hash_prevalence < 5 window=24h

# SLOWER - Manual JOIN
EventID=1 | join file_hash [
  SELECT file_hash, uniqMerge(host_count) as hosts 
  FROM hash_prevalence_agg 
  GROUP BY file_hash
] | where hosts < 5

7. Limit Result Sets

# GOOD - Limit early
timestamp > now() - INTERVAL 1 HOUR 
  source_type=sysmon 
  | head 1000

# BETTER - Aggregate instead of raw results
timestamp > now() - INTERVAL 1 HOUR 
  source_type=sysmon 
  | stats count() by process_name

Field Discovery

List Available Fields

Use the Fields Panel in the Search UI to:

  • Browse all available fields
  • See field types and cardinality
  • View sample values
  • Add fields to your query

Check Field Population

# See which fields are populated
* | stats count() by source_type 
  | table source_type, count

# Check field coverage
source_type=sysmon 
  | stats 
      count() as total,
      count(process_hash) as has_hash,
      count(command_line) as has_cmdline

Find UDM Fields

# Search UDM field documentation
# Visit: Settings → Documentation → UDM Fields

# Or query the schema
SHOW COLUMNS FROM logs

Browse All UDM Fields: See the UDM Fields Table for a complete, searchable list of all 525+ UDM fields with types and categories.

Real-World Examples

Example 1: Threat Hunting - Rare Process Execution

Goal: Find rare processes executed in the last 24 hours from foreign countries

# Combines multiple field types for optimal performance
timestamp > now() - INTERVAL 24 HOURS              # Metadata (partition pruning)
  source_type=sysmon                               # Metadata (indexed)
  EventID=1                                        # Normalized (indexed)
  enriched_src_country NOT IN ("United States")   # Enriched (fast filter)
  | prevalence hash_prevalence < 5 window=24h     # Prevalence (optimized)
  | table timestamp, src_host, process_name, 
          process_hash, command_line,
          enriched_src_country, hash_prevalence

Performance: ⚡⚡⚡ Fast

  • Time filter enables partition pruning
  • All filters use indexed fields
  • Prevalence command is optimized
  • Result set is limited by rarity

Example 2: Security Monitoring - Suspicious PowerShell

Goal: Detect encoded PowerShell commands with high entropy

# Leverages normalized fields and eval functions
timestamp > now() - INTERVAL 1 HOUR                # Metadata
  process_name=powershell.exe                      # Normalized (indexed)
  command_line CONTAINS "-enc"                     # Normalized (token index)
  | eval cmd_entropy = entropy(command_line)       # Calculated field
  | eval decoded = base64_decode(
      extract(command_line, "-enc(?:odedCommand)? ([A-Za-z0-9+/=]+)")
    )
  | where cmd_entropy > 4.5                        # High entropy = suspicious
  | table timestamp, user, src_host,
          command_line, cmd_entropy, decoded

Performance: ⚡⚡ Fast

  • Time filter for partition pruning
  • Indexed process_name for fast filtering
  • Token bloom filter for substring match
  • Eval functions computed only on matching results
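
The two eval steps can be sketched in Python. Assumptions: Shannon entropy measured in bits per character, and PowerShell `-enc` payloads being base64-encoded UTF-16LE text; `entropy` and `decode_enc` here are illustrative stand-ins for nano's built-in eval functions.

```python
import base64
import math
import re
from collections import Counter

# Sketch of the two eval steps (assumptions: Shannon entropy in bits per
# character; PowerShell -enc payloads are base64 of UTF-16LE text).
def entropy(text):
    counts = Counter(text)
    return -sum((n / len(text)) * math.log2(n / len(text))
                for n in counts.values())

def decode_enc(command_line):
    match = re.search(r"-enc(?:odedCommand)?\s+([A-Za-z0-9+/=]+)",
                      command_line, re.IGNORECASE)
    if not match:
        return None
    return base64.b64decode(match.group(1)).decode("utf-16-le")

payload = base64.b64encode("whoami".encode("utf-16-le")).decode()
cmd = f"powershell.exe -enc {payload}"
print(decode_enc(cmd))  # whoami
print(entropy("ab"))    # 1.0
```

Base64 payloads use the character set roughly uniformly, which pushes their entropy well above ordinary command lines and makes the 4.5-bit threshold a useful discriminator.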

Example 3: Network Analysis - Geographic Traffic Patterns

Goal: Analyze outbound traffic by destination country

# Uses enriched fields for pre-computed geographic data
timestamp > now() - INTERVAL 6 HOURS               # Metadata
  dest_port IN (80, 443)                           # Normalized
  enriched_dest_country != ""                      # Enriched (non-empty)
  | stats 
      sum(bytes_out) as total_bytes,               # Normalized
      dc(src_ip) as unique_sources,                # Normalized
      dc(dest_ip) as unique_destinations,          # Normalized
      count() as connections
    by enriched_dest_country,                      # Enriched (low cardinality)
       enriched_dest_continent                     # Enriched (low cardinality)
  | eval total_mb = round(total_bytes / 1048576, 2)
  | sort -total_mb
  | head 20

Performance: ⚡⚡⚡ Fast

  • Enriched fields are pre-computed (no runtime lookups)
  • Aggregation on low-cardinality fields
  • Efficient grouping and sorting

Example 4: Compliance - Failed Authentication Tracking

Goal: Track failed login attempts with user and location details

# Combines normalized and UDM fields
timestamp > now() - INTERVAL 24 HOURS              # Metadata
  action=login                                     # Normalized (set index)
  status=failure                                   # Normalized (set index)
  | eval hour = hour(timestamp)                    # Time function
  | stats 
      count() as attempts,
      dc(src_ip) as unique_ips,
      values(enriched_src_country) as countries,   # Enriched
      values(authentication_method) as auth_methods # UDM
    by user, hour
  | where attempts > 5                             # Threshold
  | sort -attempts

Performance: ⚡⚡ Fast

  • Set indexes on action/status
  • Low-cardinality grouping (user, hour)
  • Enriched fields are pre-computed

Example 5: Incident Response - Lateral Movement Detection

Goal: Detect potential lateral movement using multiple authentication sources

# Uses UDM fields for detailed authentication tracking
timestamp > now() - INTERVAL 1 HOUR                # Metadata
  source_type IN ("windows_security", "sysmon")    # Metadata
  action=login                                     # Normalized
  auth_type=network                                # Normalized
  | bin span=5m                                    # Time bucketing
  | stats 
      dc(dest_host) as unique_targets,             # Normalized
      dc(src_ip) as unique_sources,                # Normalized
      values(authentication_method) as methods,    # UDM
      values(user_type) as user_types,             # UDM
      count() as login_count
    by time_bucket, user, src_host
  | where unique_targets > 5                       # Lateral movement threshold
  | sort -unique_targets

Performance: ⚡⚡ Fast

  • Time-based bucketing for aggregation
  • Indexed normalized fields
  • UDM fields for additional context
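
The `bin span=5m` bucketing above amounts to flooring each timestamp to its 5-minute boundary; a sketch (assuming floor semantics, which is the common convention):

```python
from datetime import datetime

# Sketch of `bin span=5m` (assumption: buckets are floored to the span).
def time_bucket(ts, span_minutes=5):
    return ts.replace(minute=ts.minute - ts.minute % span_minutes,
                      second=0, microsecond=0)

print(time_bucket(datetime(2024, 1, 1, 10, 7, 42)))  # 2024-01-01 10:05:00
```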

Example 6: Performance Troubleshooting - Slow Database Queries

Goal: Find slow database queries with execution details

# Leverages UDM database fields
timestamp > now() - INTERVAL 2 HOURS               # Metadata
  source_type=database                             # Metadata
  query_time > 5000                                # UDM (milliseconds)
  | eval query_seconds = query_time / 1000
  | stats 
      avg(query_seconds) as avg_time,
      max(query_seconds) as max_time,
      count() as query_count,
      values(instance_name) as instances,          # UDM
      values(instance_type) as db_types            # UDM
    by user, query
  | where query_count > 3                          # Repeated slow queries
  | sort -avg_time
  | head 10

Performance: ⚡ Good

  • UDM fields provide detailed database context
  • Numeric comparison on query_time
  • Aggregation reduces result set

Performance Anti-Patterns

❌ Don't: Skip Time Filters

# BAD - Scans all partitions
user=admin
# GOOD - Uses partition pruning
timestamp > now() - INTERVAL 1 DAY user=admin

❌ Don't: Use Leading Wildcards

# BAD - Can't use indexes
user LIKE "%admin"
# GOOD - Uses bloom filter index
user LIKE "admin%"

❌ Don't: Query Extension Fields Without Time/Source Filters

# BAD - Scans all data for an unindexed field
aws_eventName="DeleteBucket"
# GOOD - Narrow with indexed fields first
source_type=cloudtrail aws_eventName="DeleteBucket"

❌ Don't: Aggregate High-Cardinality Fields

# BAD - Too many groups
* | stats count() by command_line
# GOOD - Aggregate on low-cardinality
* | stats count() by process_name, action

❌ Don't: Manual Prevalence JOINs

# BAD - Complex manual JOIN
EventID=1 | join file_hash [...]
# GOOD - Use prevalence command
EventID=1 | prevalence hash_prevalence < 5 window=24h
