Field Types & Search Performance
nano organizes log data into different field categories, each optimized for specific use cases and search patterns. Understanding these field types helps you write faster, more efficient queries.
Overview
Raw Log → Parser → Field Extraction → Storage
                         │
        ┌────────────────┼────────────────┐
        ↓                ↓                ↓
    Metadata         Normalized         UDM
    (Always)          (Common)       (Extended)
        │                │                │
        └────────────────┴────────────────┘
                         │
              ┌──────────┴──────────┐
              ↓                     ↓
         Enrichment            Prevalence
        (GeoIP, ASN)       (Rarity Tracking)
              ↓                     ↓
         enriched_*         hash_prevalence
                            domain_prevalence
Field Categories
Metadata Fields
What they are: Core system fields that exist on every log event, regardless of source.
Examples:
- `timestamp` - Event timestamp (indexed, partitioned)
- `id` - Unique event identifier (UUID)
- `source_type` - Log source (e.g., "sysmon", "apache", "cloudtrail")
- `ingest_time` - When the event was ingested
- `raw_content` - Original unparsed log message
- `metadata` - System metadata (JSON)
Search Performance: ⚡⚡⚡ Fastest
Metadata fields are heavily optimized:
- `timestamp` is the primary partition key - filtering by time is extremely fast
- `source_type` uses LowCardinality encoding for minimal storage and fast filtering
- `id` has bloom filter indexes for exact lookups
- Queries that filter by `timestamp` first benefit from partition pruning
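The payoff of partition pruning can be sketched in a few lines of Python. This is a toy model with assumed names, not the actual storage engine: events are bucketed by hour, and a time filter lets the engine discard whole buckets before examining a single row.

```python
from datetime import datetime, timedelta

# Toy model of partition pruning (illustrative only): events are bucketed
# by hour, so a time filter skips whole partitions instead of scanning rows.
partitions = {}  # hour bucket -> list of events

def ingest(event):
    bucket = event["timestamp"].replace(minute=0, second=0, microsecond=0)
    partitions.setdefault(bucket, []).append(event)

def search(since, source_type):
    # Prune: only partitions whose hour bucket can overlap the window are opened.
    floor = since.replace(minute=0, second=0, microsecond=0)
    candidates = [b for b in partitions if b >= floor]
    return [e for b in candidates for e in partitions[b]
            if e["timestamp"] >= since and e["source_type"] == source_type]

now = datetime(2024, 1, 1, 12, 30)
ingest({"timestamp": now - timedelta(hours=5), "source_type": "sysmon"})
ingest({"timestamp": now, "source_type": "sysmon"})
hits = search(now - timedelta(hours=1), "sysmon")  # opens 1 of 2 partitions
```

Without the `since` filter, every partition would be a candidate, which is exactly why a bare `source_type=sysmon` query scans everything.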
Best Practices:
# FAST - Uses partition pruning
timestamp > now() - INTERVAL 1 HOUR source_type=sysmon
# SLOWER - No time filter means scanning all partitions
source_type=sysmon
Normalized Fields
What they are: Common fields extracted from logs and mapped to a standard schema. These are the "core" fields that most log sources populate.
Examples:
- Network: `src_ip`, `dest_ip`, `src_port`, `dest_port`, `protocol`
- Users: `user`, `src_user`, `dest_user`, `user_domain`
- Processes: `process_name`, `process_id`, `process_path`, `process_hash`, `command_line`
- Files: `file_path`, `file_name`, `file_hash`, `file_size`
- Web: `url`, `url_domain`, `http_method`, `http_user_agent`
- DNS: `query`, `query_type`, `answer`, `record_type`
- Email: `sender`, `recipient`, `subject`, `message_id`
- Security: `action`, `status`, `severity`, `category`, `signature`
Search Performance: ⚡⚡ Fast
Normalized fields have targeted optimizations:
- High-cardinality fields (IPs, hashes, GUIDs) use bloom filter indexes
- Low-cardinality fields (actions, statuses) use set indexes or LowCardinality encoding
- Text fields (command_line, file_path) use token bloom filters for substring matching
- Ordered by `(timestamp, src_ip, dest_ip)` for efficient network queries
Index Types:
- Bloom Filter (`src_ip`, `dest_ip`, `user`, `process_hash`, `file_hash`) - Fast exact matching
- Token Bloom Filter (`command_line`, `file_path`, `http_user_agent`) - Fast substring/token matching
- Set Index (`action`, `status`, `http_method`) - Fast IN() queries for low-cardinality fields
- LowCardinality (`source_type`, `severity`, `category`) - Compressed storage, fast filtering
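To see why bloom-filter indexes make both exact matches and `CONTAINS` fast, here is a minimal sketch (not the engine's actual implementation): each block of rows keeps a small bit array, and a lookup can skip blocks whose filter says "definitely not present". False positives are possible; false negatives are not. A token bloom filter applies the same idea to each token of a text field.

```python
import hashlib

# Minimal bloom filter sketch (illustrative, not the real index format).
class BloomFilter:
    def __init__(self, size=4096, hashes=3):
        self.size, self.hashes = size, hashes
        self.bits = [False] * size

    def _positions(self, value):
        # Derive several deterministic bit positions per value.
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{value}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, value):
        for pos in self._positions(value):
            self.bits[pos] = True

    def might_contain(self, value):
        # False => value is definitely absent, so the block can be skipped.
        return all(self.bits[pos] for pos in self._positions(value))

# Token bloom filter: index each token of a text field, which is why
# `command_line CONTAINS "powershell"` can also skip blocks.
granule = BloomFilter()
for token in "powershell.exe -enc SGVsbG8=".lower().split():
    granule.add(token)
```

This is also why a leading wildcard like `user=*admin` defeats the index: the filter only answers membership for whole values or tokens, not arbitrary suffix patterns.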
Best Practices:
# FAST - Indexed fields with exact match
src_ip=192.168.1.100
process_hash=5d41402abc4b2a76b9719d911017c592
# FAST - Token bloom filter for substring
command_line CONTAINS "powershell"
# FAST - Set index for IN queries
action IN ("login", "logout", "failed_login")
# SLOWER - Wildcard at start defeats indexes
user=*admin
UDM Fields (Unified Data Model)
What they are: Extended fields from industry-standard data models and security frameworks. These provide comprehensive coverage for specialized use cases.
Total Fields: 525+ fields covering:
- Network Traffic (85 fields) - VLANs, NAT, load balancers, network performance
- Authentication (45 fields) - SSO, MFA, privilege escalation, session management
- Web Traffic (38 fields) - HTTP headers, cookies, referrers, response codes
- Endpoint (72 fields) - Services, drivers, registry, scheduled tasks
- Database (41 fields) - Queries, transactions, performance metrics
- Email (28 fields) - Attachments, routing, delivery status
- Certificate/SSL (35 fields) - Certificate chains, validation, expiration
- Malware (22 fields) - Signatures, families, actions
- Vulnerability (18 fields) - CVEs, CVSS scores, patches
- Cloud/Platform (55 fields) - AWS, Azure, GCP audit logs, cloud resource tracking
- Performance (38 fields) - CPU, memory, disk, JVM metrics
- Custom (58 fields) - nano-specific extensions
Search Performance: ⚡ Good
UDM fields are added dynamically and have basic optimizations:
- String fields use default String type (some use LowCardinality for common values)
- Integer/Long fields use appropriate numeric types (UInt32, UInt64, Int64)
- Float fields use Float64 for precision
- Boolean fields use UInt8 (0/1)
- Commonly queried fields have bloom filter indexes
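The type mapping above can be summarized as a small dispatch function. This is a simplified sketch of the idea, with assumed names; the real schema manager is internal and also chooses narrower types like UInt32 where values fit.

```python
# Illustrative mapping from a sample value to a stored column type
# (assumed behavior; the real engine may pick narrower types).
def udm_column_type(value):
    if isinstance(value, bool):
        return "UInt8"    # booleans stored as 0/1
    if isinstance(value, int):
        return "Int64" if value < 0 else "UInt64"
    if isinstance(value, float):
        return "Float64"  # floats keep full precision
    return "String"       # default for everything else
```

Note the boolean check must come first: in Python, `True` is also an `int`, and the same ordering concern applies in any schema-inference code.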
Performance Characteristics:
- Fields are stored but not all are indexed
- Exact matches are reasonably fast
- Substring searches are slower than normalized fields
- Best used when you know the specific field you need
Best Practices:
# GOOD - Specific field with exact match
ssl_issuer_common_name="Let's Encrypt"
cvss > 7.0
# ACCEPTABLE - Indexed UDM fields
dest_user="admin"
signature="Malicious Activity"
# SLOWER - Non-indexed UDM fields with wildcards
ssl_subject LIKE "%example%"
# BETTER - Combine with time filter
timestamp > now() - INTERVAL 1 DAY ssl_subject LIKE "%example%"
Enriched Fields
What they are: Fields automatically populated by enrichment processes, typically from external data sources or dictionaries.
Examples:
- `enriched_src_country` - Source IP country (from GeoIP)
- `enriched_src_country_code` - Source IP country code
- `enriched_src_continent` - Source IP continent
- `enriched_src_asn` - Source IP ASN number
- `enriched_src_as_name` - Source IP AS organization name
- `enriched_dest_country` - Destination IP country
- `enriched_dest_country_code` - Destination IP country code
- `enriched_dest_continent` - Destination IP continent
- `enriched_dest_asn` - Destination IP ASN number
- `enriched_dest_as_name` - Destination IP AS organization name
Search Performance: ⚡⚡ Fast
Enriched fields are materialized (pre-computed) at ingestion time:
- Values are looked up from dictionaries and stored directly in the table
- No runtime dictionary lookups during queries
- Use LowCardinality encoding for efficient storage
- Filter performance is similar to normalized fields
How Enrichment Works:
1. Log arrives with `src_ip=8.8.8.8`
2. Dictionary lookup: `ip_enrichment_dict` → country="United States", asn="AS15169"
3. Fields populated: `enriched_src_country="United States"`, `enriched_src_asn="AS15169"`
4. Stored in the table for instant querying
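The steps above amount to a dictionary lookup performed once, at ingest, with the results written into the row itself. A minimal sketch (field names follow the examples above; the dictionary contents here are made up):

```python
# Sketch of materialized enrichment at ingest time. The lookup happens
# once per event; queries later read plain columns, never the dictionary.
ip_enrichment_dict = {
    "8.8.8.8": {"country": "United States", "asn": "AS15169", "as_name": "GOOGLE"},
}

def enrich(event):
    info = ip_enrichment_dict.get(event.get("src_ip"), {})
    # Values are stored directly in the row - no runtime lookup at query time.
    event["enriched_src_country"] = info.get("country", "")
    event["enriched_src_asn"] = info.get("asn", "")
    return event

row = enrich({"src_ip": "8.8.8.8", "dest_ip": "10.0.0.5"})
```

Because the values land in ordinary LowCardinality columns, filtering on `enriched_src_country` costs the same as filtering on any other normalized field.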
Best Practices:
# FAST - Enriched fields are pre-computed
enriched_src_country="China"
enriched_dest_asn="AS15169"
# FAST - Combine with other filters
timestamp > now() - INTERVAL 1 HOUR
enriched_src_country NOT IN ("United States", "Canada")
# EFFICIENT - Group by enriched fields
* | stats count() by enriched_src_country, enriched_dest_country
Prevalence Fields
What they are: Automatically computed fields that track how rare or common artifacts are across your environment.
Examples:
- `hash_prevalence` - Number of hosts that have seen this file hash
- `domain_prevalence` - Number of hosts that have queried this domain
- `ip_prevalence` - Number of hosts that have connected to this IP
- `hash_first_seen` - When this hash was first observed
- `domain_first_seen` - When this domain was first observed
- `prevalence_score` - Computed rarity score (0-100, 0 = never seen, 100 = everywhere)
- `is_rare` - Boolean flag for rare artifacts
Search Performance: ⚡ Good (with caveats)
Prevalence data is computed by materialized views:
- Aggregated hourly in background tables
- Lookups require JOIN operations with aggregation tables
- Best used with the `prevalence` command, which optimizes the query
- Direct field access is slower than using the prevalence command
How Prevalence Works:
1. Materialized views track artifacts (hashes, domains, IPs) per hour
2. Unique hosts that have seen each artifact are counted
3. first_seen and last_seen timestamps are tracked
4. The prevalence command queries these aggregations efficiently
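The aggregation described above can be sketched as a per-(hour, artifact) table of host sets. This is a toy model with assumed names; real materialized views would use cardinality sketches rather than raw sets, but the shape of the computation is the same.

```python
from collections import defaultdict

# Sketch of the hourly prevalence aggregation: per (hour, hash), track the
# set of hosts plus first/last observation times.
agg = defaultdict(lambda: {"hosts": set(), "first_seen": None, "last_seen": None})

def observe(hour, file_hash, host, ts):
    entry = agg[(hour, file_hash)]
    entry["hosts"].add(host)
    entry["first_seen"] = ts if entry["first_seen"] is None else min(entry["first_seen"], ts)
    entry["last_seen"] = ts if entry["last_seen"] is None else max(entry["last_seen"], ts)

def hash_prevalence(file_hash):
    # Merge across hourly buckets: unique hosts that ever saw this hash.
    hosts = set()
    for (hour, h), entry in agg.items():
        if h == file_hash:
            hosts |= entry["hosts"]
    return len(hosts)

observe(0, "abc123", "host-1", 100)
observe(1, "abc123", "host-2", 200)
observe(1, "abc123", "host-1", 250)
```

The merge step is what the `prevalence` command does for you; reimplementing it as a manual JOIN in each query is both slower and easier to get wrong.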
Best Practices:
# BEST - Use prevalence command (optimized)
EventID=1 | prevalence hash_prevalence < 5 window=24h
# GOOD - Filter rare processes
EventID=1 | prevalence hash_first_seen > now() - INTERVAL 1 DAY
# ACCEPTABLE - Enrich with prevalence data
* | prevalence enrich=true window=24h
# AVOID - Direct field access (not materialized in main table)
# prevalence_score < 0.1 # This field doesn't exist in logs table
Extension Fields
What they are: Additional UDM fields for parser-specific or custom data that doesn't fit the explicitly indexed columns. These are stored in a dynamic JSON column internally, but you search them by name just like any other field — no special prefix required.
Examples:
- `sysmon_RuleName` - Sysmon-specific rule name
- `aws_eventName` - AWS CloudTrail event name
- `aws_userIdentity` - AWS user identity object
- `custom_threat_score` - Custom threat scoring
Search Performance: Slower than indexed fields
Extension fields are stored in a JSON column under the hood:
- The query engine automatically maps field names to the JSON storage
- No special `ext.` prefix is required - just use the field name directly
- Slower than explicitly indexed columns (no bloom filters or partition pruning)
- Best combined with indexed filters (time range, source_type) to narrow the scan first
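Conceptually, resolving an extension field is a fallback: a known typed column is read directly, and anything else is looked up in the dynamic JSON column. A small sketch (the `_ext_json` column name is illustrative, not the real internal name):

```python
import json

# Sketch of extension-field resolution: known columns are read directly;
# unknown names fall back to the dynamic JSON column, which is why no
# `ext.` prefix is needed. (`_ext_json` is an assumed internal name.)
def get_field(row, name):
    if name in row:                       # explicitly indexed/typed column
        return row[name]
    extras = json.loads(row.get("_ext_json", "{}"))
    return extras.get(name)               # dynamic extension field

row = {
    "timestamp": "2024-01-01T00:00:00Z",
    "source_type": "cloudtrail",
    "_ext_json": json.dumps({"aws_eventName": "DeleteBucket"}),
}
```

The fallback path has to parse JSON per row, which is why extension-field filters should be paired with indexed filters that shrink the scanned set first.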
Best Practices:
# GOOD - Search by field name directly (no ext. prefix needed)
aws_eventName="DeleteBucket"
# BETTER - Combine with indexed fields to narrow the scan
source_type=cloudtrail aws_eventName="DeleteBucket"
# GOOD - Use in table output
* | table timestamp, user, aws_eventName, aws_sourceIPAddress
Performance Comparison
Query Speed by Field Type
| Field Type | Exact Match | Substring | Wildcard | Aggregation | Notes |
|---|---|---|---|---|---|
| Metadata | ⚡⚡⚡ | ⚡⚡ | ⚡⚡ | ⚡⚡⚡ | Partition pruning on timestamp |
| Normalized | ⚡⚡⚡ | ⚡⚡ | ⚡ | ⚡⚡ | Bloom filter + token indexes |
| UDM | ⚡⚡ | ⚡ | ⚡ | ⚡⚡ | Basic indexes, many fields |
| Enriched | ⚡⚡⚡ | ⚡⚡ | ⚡ | ⚡⚡ | Pre-computed, LowCardinality |
| Prevalence | ⚡ | N/A | N/A | ⚡ | Requires JOIN, use command |
| Extension | ⚡ | 🐌 | 🐌 | 🐌 | Dynamic JSON, no indexes |
Storage Efficiency
| Field Type | Storage | Compression | Cardinality |
|---|---|---|---|
| Metadata | Minimal | Excellent | Low-Medium |
| Normalized | Low | Very Good | Medium-High |
| UDM | Medium | Good | Varies |
| Enriched | Low | Excellent | Low |
| Prevalence | Separate | Excellent | Medium |
| Extension | Medium | Good | Varies |
Query Optimization Tips
1. Always Filter by Time First
# FAST - Partition pruning
timestamp > now() - INTERVAL 1 HOUR user=admin
# SLOW - Scans all partitions
user=admin
2. Use Indexed Fields
# FAST - Bloom filter index
src_ip=192.168.1.100
# FAST - Token bloom filter
command_line CONTAINS "powershell"
# SLOWER - Extension field (no index, but works)
custom_field="value"
3. Prefer Exact Matches
# FAST - Exact match
user="admin"
# SLOWER - Wildcard
user LIKE "%admin%"
# SLOWEST - Leading wildcard
user LIKE "%admin"
4. Use Low-Cardinality Fields for Grouping
# FAST - Low cardinality
* | stats count() by source_type, action, severity
# SLOWER - High cardinality
* | stats count() by src_ip, dest_ip, command_line
5. Leverage Enriched Fields
# FAST - Pre-computed enrichment
enriched_src_country="China"
# SLOWER - Runtime enrichment
* | lookup geoip src_ip OUTPUT country | where country="China"
6. Use Prevalence Command
# FAST - Optimized prevalence query
EventID=1 | prevalence hash_prevalence < 5 window=24h
# SLOWER - Manual JOIN
EventID=1 | join file_hash [
SELECT file_hash, uniqMerge(host_count) as hosts
FROM hash_prevalence_agg
GROUP BY file_hash
] | where hosts < 5
7. Limit Result Sets
# GOOD - Limit early
timestamp > now() - INTERVAL 1 HOUR
source_type=sysmon
| head 1000
# BETTER - Aggregate instead of raw results
timestamp > now() - INTERVAL 1 HOUR
source_type=sysmon
| stats count() by process_name
Field Discovery
List Available Fields
Use the Fields Panel in the Search UI to:
- Browse all available fields
- See field types and cardinality
- View sample values
- Add fields to your query
Check Field Population
# See which fields are populated
* | stats count() by source_type
| table source_type, count
# Check field coverage
source_type=sysmon
| stats
count() as total,
count(process_hash) as has_hash,
count(command_line) as has_cmdline
Find UDM Fields
# Search UDM field documentation
# Visit: Settings → Documentation → UDM Fields
# Or query the schema
SHOW COLUMNS FROM logs
Browse All UDM Fields: See the UDM Fields Table for a complete, searchable list of all 525+ UDM fields with types and categories.
Real-World Examples
Example 1: Threat Hunting - Rare Process Execution
Goal: Find rare processes executed in the last 24 hours from foreign countries
# Combines multiple field types for optimal performance
timestamp > now() - INTERVAL 24 HOURS # Metadata (partition pruning)
source_type=sysmon # Metadata (indexed)
EventID=1 # Normalized (indexed)
enriched_src_country NOT IN ("United States") # Enriched (fast filter)
| prevalence hash_prevalence < 5 window=24h # Prevalence (optimized)
| table timestamp, src_host, process_name,
process_hash, command_line,
enriched_src_country, hash_prevalence
Performance: ⚡⚡⚡ Fast
- Time filter enables partition pruning
- All filters use indexed fields
- Prevalence command is optimized
- Result set is limited by rarity
Example 2: Security Monitoring - Suspicious PowerShell
Goal: Detect encoded PowerShell commands with high entropy
# Leverages normalized fields and eval functions
timestamp > now() - INTERVAL 1 HOUR # Metadata
process_name=powershell.exe # Normalized (indexed)
command_line CONTAINS "-enc" # Normalized (token index)
| eval cmd_entropy = entropy(command_line) # Calculated field
| eval decoded = base64_decode(
    extract(command_line, "-enc(?:odedCommand)? ([A-Za-z0-9+/=]+)")
)
| where cmd_entropy > 4.5 # High entropy = suspicious
| table timestamp, user, src_host,
command_line, cmd_entropy, decoded
Performance: ⚡⚡ Fast
- Time filter for partition pruning
- Indexed process_name for fast filtering
- Token bloom filter for substring match
- Eval functions computed only on matching results
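The entropy threshold in this example works because base64-encoded payloads spread their character distribution much more evenly than ordinary command lines. A minimal Shannon-entropy sketch matching the intent of the `entropy()` eval (the product's exact implementation may differ):

```python
import math
from collections import Counter

# Shannon entropy in bits per character. Encoded/compressed payloads score
# high (base64 can approach ~6 bits); repetitive plain text scores low.
def entropy(s):
    counts = Counter(s)
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in counts.values()) if n else 0.0
```

A cutoff around 4.5 bits, as used above, sits between typical English-like command lines and dense base64 blobs; tune it against your own data before alerting on it.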
Example 3: Network Analysis - Geographic Traffic Patterns
Goal: Analyze outbound traffic by destination country
# Uses enriched fields for pre-computed geographic data
timestamp > now() - INTERVAL 6 HOURS # Metadata
dest_port IN (80, 443) # Normalized
enriched_dest_country != "" # Enriched (non-empty)
| stats
sum(bytes_out) as total_bytes, # Normalized
dc(src_ip) as unique_sources, # Normalized
dc(dest_ip) as unique_destinations, # Normalized
count() as connections
by enriched_dest_country, # Enriched (low cardinality)
enriched_dest_continent # Enriched (low cardinality)
| eval total_mb = round(total_bytes / 1048576, 2)
| sort -total_mb
| head 20
Performance: ⚡⚡⚡ Fast
- Enriched fields are pre-computed (no runtime lookups)
- Aggregation on low-cardinality fields
- Efficient grouping and sorting
Example 4: Compliance - Failed Authentication Tracking
Goal: Track failed login attempts with user and location details
# Combines normalized and UDM fields
timestamp > now() - INTERVAL 24 HOURS # Metadata
action=login # Normalized (set index)
status=failure # Normalized (set index)
| eval hour = hour(timestamp) # Time function
| stats
count() as attempts,
dc(src_ip) as unique_ips,
values(enriched_src_country) as countries, # Enriched
values(authentication_method) as auth_methods # UDM
by user, hour
| where attempts > 5 # Threshold
| sort -attempts
Performance: ⚡⚡ Fast
- Set indexes on action/status
- Low-cardinality grouping (user, hour)
- Enriched fields are pre-computed
Example 5: Incident Response - Lateral Movement Detection
Goal: Detect potential lateral movement using multiple authentication sources
# Uses UDM fields for detailed authentication tracking
timestamp > now() - INTERVAL 1 HOUR # Metadata
source_type IN ("windows_security", "sysmon") # Metadata
action=login # Normalized
auth_type=network # Normalized
| bin span=5m # Time bucketing
| stats
dc(dest_host) as unique_targets, # Normalized
dc(src_ip) as unique_sources, # Normalized
values(authentication_method) as methods, # UDM
values(user_type) as user_types, # UDM
count() as login_count
by time_bucket, user, src_host
| where unique_targets > 5 # Lateral movement threshold
| sort -unique_targets
Performance: ⚡⚡ Fast
- Time-based bucketing for aggregation
- Indexed normalized fields
- UDM fields for additional context
Example 6: Performance Troubleshooting - Slow Database Queries
Goal: Find slow database queries with execution details
# Leverages UDM database fields
timestamp > now() - INTERVAL 2 HOURS # Metadata
source_type=database # Metadata
query_time > 5000 # UDM (milliseconds)
| eval query_seconds = query_time / 1000
| stats
avg(query_seconds) as avg_time,
max(query_seconds) as max_time,
count() as query_count,
values(instance_name) as instances, # UDM
values(instance_type) as db_types # UDM
by user, query
| where query_count > 3 # Repeated slow queries
| sort -avg_time
| head 10
Performance: ⚡ Good
- UDM fields provide detailed database context
- Numeric comparison on query_time
- Aggregation reduces result set
Performance Anti-Patterns
❌ Don't: Skip Time Filters
# BAD - Scans all partitions
user=admin
# GOOD - Uses partition pruning
timestamp > now() - INTERVAL 1 DAY user=admin
❌ Don't: Use Leading Wildcards
# BAD - Can't use indexes
user LIKE "%admin"
# GOOD - Uses bloom filter index
user LIKE "admin%"
❌ Don't: Query Extension Fields Without Time/Source Filters
# BAD - Scans all data for an unindexed field
aws_eventName="DeleteBucket"
# GOOD - Narrow with indexed fields first
source_type=cloudtrail aws_eventName="DeleteBucket"
❌ Don't: Aggregate High-Cardinality Fields
# BAD - Too many groups
* | stats count() by command_line
# GOOD - Aggregate on low-cardinality fields
* | stats count() by process_name, action
❌ Don't: Manual Prevalence JOINs
# BAD - Complex manual JOIN
EventID=1 | join file_hash [...]
# GOOD - Use prevalence command
EventID=1 | prevalence hash_prevalence < 5 window=24h
Next Steps
- Search & Query Language - Learn the query syntax
- UDM Fields - Complete field listing
- Detection Rules - Create threat detections
- Dashboards - Visualize your data
- Enrichments - Add context to your logs