Enrichment Architecture
This document explains how nano's enrichment system works internally, including data flow, performance optimizations, and extensibility.
System Overview
The enrichment system is designed for high-performance, zero-downtime operation with automatic data updates and an extensible architecture supporting geolocation, threat intelligence, anonymizer detection, and custom enrichment sources.
Data Flow
Enrichment Data Sync
The sync process ensures enrichment data stays current while maintaining system availability:
Log Enrichment Flow
How individual logs get enriched during ingestion:
Custom Enrichment Data Flow
Custom enrichments follow a distinct path through the Deno sandbox:
- Code Execution: TypeScript runs in a secure Deno sandbox
- Data Storage: Records are stored in the ClickHouse custom_enrichment_results table
- Dictionary Refresh: ClickHouse dictionaries auto-refresh every 1-5 minutes
- Log Enrichment: New logs are automatically enriched with matching data
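In this path, a newly written record only affects logs once the dictionary has refreshed. The sketch below models that behaviour in memory. RefreshingDictionary, enrich_log and the "<field>_enrichment" output key are hypothetical names, and a single fixed lifetime stands in for the 1-5 minute refresh window described above.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory analogue of a ClickHouse dictionary built from
/// custom_enrichment_results: a key -> value snapshot that is reloaded once
/// its lifetime has elapsed. Time is passed in as plain seconds so the
/// behaviour is deterministic.
struct RefreshingDictionary<F>
where
    F: FnMut() -> HashMap<String, String>,
{
    loader: F,
    snapshot: HashMap<String, String>,
    loaded_at: u64,
    lifetime_secs: u64,
}

impl<F> RefreshingDictionary<F>
where
    F: FnMut() -> HashMap<String, String>,
{
    fn new(mut loader: F, now: u64, lifetime_secs: u64) -> Self {
        let snapshot = loader();
        Self { loader, snapshot, loaded_at: now, lifetime_secs }
    }

    /// Returns the value for `key`, reloading the snapshot first if it is stale.
    fn get(&mut self, key: &str, now: u64) -> Option<String> {
        if now.saturating_sub(self.loaded_at) >= self.lifetime_secs {
            self.snapshot = (self.loader)();
            self.loaded_at = now;
        }
        self.snapshot.get(key).cloned()
    }
}

/// Adds "<field>_enrichment" to a log when the value of `field` has a match.
fn enrich_log<F>(
    log: &mut HashMap<String, String>,
    field: &str,
    dict: &mut RefreshingDictionary<F>,
    now: u64,
) where
    F: FnMut() -> HashMap<String, String>,
{
    let Some(key) = log.get(field).cloned() else { return };
    if let Some(value) = dict.get(&key, now) {
        log.insert(format!("{field}_enrichment"), value);
    }
}
```

In this sketch, logs that arrive between a record being written and the next reload pass through unenriched; nothing back-fills them.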
Performance Architecture
Zero-Downtime Updates
The staging table approach ensures continuous availability:
The update steps:
- Download — New data is downloaded in the background
- Stage — Data is loaded into a staging table
- Validate — Data integrity is verified
- Swap — Production table is atomically updated
- Cleanup — Old data is removed
This ensures that:
- Log ingestion never stops
- Lookups always return results
- Updates are atomic and consistent
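The download, stage, validate, swap and cleanup steps can be sketched in Rust with an in-memory stand-in for the production table. This is an illustration under stated assumptions, not nano's code: the real swap happens in the database, and ProductionTable, Record and the validation rules (a minimum record count and two-letter country codes) are hypothetical.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical enrichment record: a network and its country code.
#[derive(Clone, Debug)]
struct Record {
    network: String,
    country_code: String,
}

/// Production table handle. Readers clone the inner Arc, so a lookup keeps a
/// consistent snapshot even while a swap happens.
struct ProductionTable {
    current: RwLock<Arc<Vec<Record>>>,
}

impl ProductionTable {
    fn new(initial: Vec<Record>) -> Self {
        Self { current: RwLock::new(Arc::new(initial)) }
    }

    /// The read lock is held only long enough to clone the Arc.
    fn snapshot(&self) -> Arc<Vec<Record>> {
        Arc::clone(&self.current.read().unwrap())
    }

    /// Validate the staged data, then swap it in. On failure, production is untouched.
    fn apply_update(&self, staged: Vec<Record>, min_records: usize) -> Result<usize, String> {
        if staged.len() < min_records {
            return Err(format!("staged {} records, expected at least {}", staged.len(), min_records));
        }
        if staged.iter().any(|r| r.country_code.len() != 2) {
            return Err("invalid country code in staged data".to_string());
        }
        let count = staged.len();
        // Swap: one pointer replacement under the write lock.
        let old = std::mem::replace(&mut *self.current.write().unwrap(), Arc::new(staged));
        // Cleanup: the old data is freed once the last reader drops its snapshot.
        drop(old);
        Ok(count)
    }
}
```

Because readers hold an Arc snapshot, a lookup that started before the swap finishes against the old data, which is the in-memory counterpart of "lookups always return results".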
Bulk Lookup Optimization
Bulk IP lookups run as a single PostgreSQL query that combines array unnesting, a lateral join, and network-address operators:
-- Optimized bulk lookup query
SELECT
ip_addr,
ie.country,
ie.country_code,
ie.asn,
ie.as_name
FROM UNNEST($1::text[]) AS ip_addr
LEFT JOIN LATERAL (
SELECT country, country_code, asn, as_name
FROM ip_enrichments ie
JOIN enrichment_sources es ON ie.source_id = es.id
WHERE ip_addr::inet <<= ie.network
AND es.enabled = true
ORDER BY masklen(ie.network) DESC
LIMIT 1
) ie ON true
Key optimizations:
- UNNEST: Processes multiple IPs in a single query
- LATERAL JOIN: Runs an efficient lookup per IP
- Network containment: The <<= operator performs CIDR matching
- Longest prefix match: ORDER BY masklen() DESC with LIMIT 1 keeps the most specific network
- Index optimization: GiST indexes on network ranges
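The same longest-prefix semantics can be written out in plain Rust, which makes the roles of <<= and ORDER BY masklen() DESC LIMIT 1 explicit. The sketch handles IPv4 only and uses a linear scan where PostgreSQL would use the GiST index; the Network type, parse_cidr and lookup_bulk are hypothetical names.

```rust
use std::collections::HashMap;
use std::net::Ipv4Addr;

/// A parsed IPv4 CIDR network and the attribute it carries (here a country code).
struct Network {
    base: u32,
    prefix_len: u32,
    country_code: String,
}

fn parse_cidr(cidr: &str, country_code: &str) -> Option<Network> {
    let (addr, len) = cidr.split_once('/')?;
    let base: Ipv4Addr = addr.parse().ok()?;
    let prefix_len: u32 = len.parse().ok()?;
    if prefix_len > 32 {
        return None;
    }
    Some(Network { base: u32::from(base), prefix_len, country_code: country_code.to_string() })
}

/// Netmask for a prefix length; /0 is special-cased because shifting by 32 overflows.
fn mask(prefix_len: u32) -> u32 {
    if prefix_len == 0 { 0 } else { u32::MAX << (32 - prefix_len) }
}

/// Equivalent of `ip <<= network`: the address lies inside the network.
fn contains(net: &Network, ip: u32) -> bool {
    let m = mask(net.prefix_len);
    (ip & m) == (net.base & m)
}

/// For each IP keep the most specific match, mirroring
/// ORDER BY masklen(network) DESC LIMIT 1.
fn lookup_bulk(networks: &[Network], ips: &[&str]) -> HashMap<String, Option<String>> {
    ips.iter()
        .map(|&ip| {
            let hit = ip.parse::<Ipv4Addr>().ok().and_then(|addr| {
                let addr_bits = u32::from(addr);
                networks
                    .iter()
                    .filter(|n| contains(n, addr_bits))
                    .max_by_key(|n| n.prefix_len)
            });
            (ip.to_string(), hit.map(|n| n.country_code.clone()))
        })
        .collect()
}
```

Unmatched or unparsable IPs map to None, which corresponds to the LEFT JOIN returning NULL enrichment columns.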
Caching Strategy
For high-volume environments, the system uses:
- Batch processing: Multiple IPs are looked up in a single query
- Connection pooling: Database connections are reused across lookups
- Caching: Frequently accessed data is cached in memory
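A minimal sketch of an in-memory cache in front of a batched lookup, assuming a backend that resolves a slice of IPs in one call (for example by running the UNNEST query). CachedLookup, the backend closure and the clear-when-full eviction policy are hypothetical; a production cache would more likely use LRU eviction and would need invalidating after each sync.

```rust
use std::collections::HashMap;

/// Hypothetical cache in front of a batched lookup backend. The backend takes
/// a slice of IPs and returns the results it found, one call per batch.
struct CachedLookup<B>
where
    B: FnMut(&[&str]) -> HashMap<String, Option<String>>,
{
    backend: B,
    cache: HashMap<String, Option<String>>,
    max_entries: usize,
}

impl<B> CachedLookup<B>
where
    B: FnMut(&[&str]) -> HashMap<String, Option<String>>,
{
    fn new(backend: B, max_entries: usize) -> Self {
        Self { backend, cache: HashMap::new(), max_entries }
    }

    /// Resolves every IP, calling the backend at most once for all cache misses.
    fn lookup_many(&mut self, ips: &[&str]) -> HashMap<String, Option<String>> {
        let mut out = HashMap::new();
        let mut misses: Vec<&str> = Vec::new();
        for &ip in ips {
            match self.cache.get(ip) {
                Some(hit) => {
                    out.insert(ip.to_string(), hit.clone());
                }
                None => misses.push(ip),
            }
        }
        // Query each distinct missing IP only once.
        misses.sort_unstable();
        misses.dedup();
        if !misses.is_empty() {
            let fetched = (self.backend)(misses.as_slice());
            for ip in misses {
                // Absent from the response means "no enrichment data"; cache that too.
                let value = fetched.get(ip).cloned().flatten();
                if self.cache.len() >= self.max_entries {
                    // Crude memory bound: drop everything when full.
                    self.cache.clear();
                }
                self.cache.insert(ip.to_string(), value.clone());
                out.insert(ip.to_string(), value);
            }
        }
        out
    }
}
```

Negative results are cached as well, so repeated lookups of an IP with no enrichment data do not go back to the database.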
Database Schema
Core Tables
enrichment_sources — Configuration and metadata
CREATE TABLE enrichment_sources (
id VARCHAR PRIMARY KEY,
name VARCHAR NOT NULL,
source_type VARCHAR NOT NULL,
description TEXT,
download_url TEXT,
last_sync_at TIMESTAMPTZ,
last_sync_status VARCHAR,
record_count BIGINT DEFAULT 0,
config JSONB DEFAULT '{}',
enabled BOOLEAN DEFAULT true,
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW()
);
ip_enrichments — Production enrichment data
CREATE TABLE ip_enrichments (
id BIGSERIAL PRIMARY KEY,
source_id VARCHAR REFERENCES enrichment_sources(id),
network CIDR NOT NULL,
country VARCHAR,
country_code VARCHAR(2),
continent VARCHAR,
continent_code VARCHAR(2),
asn VARCHAR,
as_name VARCHAR,
as_domain VARCHAR,
created_at TIMESTAMPTZ DEFAULT NOW(),
UNIQUE(source_id, network)
);
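-- Critical index for fast lookups. inet_ops must be named explicitly:
-- it is not the default GiST operator class for inet/cidr columns.
CREATE INDEX idx_ip_enrichments_network_gist
ON ip_enrichments USING GIST (network inet_ops);
Lookup Function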
Optimized function for single IP lookups:
CREATE OR REPLACE FUNCTION lookup_ip_enrichment(ip_addr TEXT)
RETURNS TABLE(
country TEXT,
country_code TEXT,
continent TEXT,
continent_code TEXT,
asn TEXT,
as_name TEXT,
as_domain TEXT
) AS $$
BEGIN
RETURN QUERY
SELECT
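-- RETURN QUERY requires exact column types, so cast the VARCHAR columns to TEXT
ie.country::text,
ie.country_code::text,
ie.continent::text,
ie.continent_code::text,
ie.asn::text,
ie.as_name::text,
ie.as_domain::text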
FROM ip_enrichments ie
JOIN enrichment_sources es ON ie.source_id = es.id
WHERE ip_addr::inet <<= ie.network
AND es.enabled = true
ORDER BY masklen(ie.network) DESC
LIMIT 1;
END;
$$ LANGUAGE plpgsql STABLE;
Extensibility Framework
Source Plugin Architecture
New enrichment sources implement standard interfaces:
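use std::collections::HashMap;

// Result<T> is a crate-level alias (for example anyhow::Result<T>);
// SourceConfig, EnrichmentRecord, TableSchema and EnrichmentResult are crate types.

/// Fetches and parses raw data for one kind of enrichment source.
pub trait EnrichmentSource {
    /// Identifier for the kind of source.
    fn source_type(&self) -> &str;
    /// Downloads the raw data described by `config`.
    fn download(&self, config: &SourceConfig) -> Result<Vec<u8>>;
    /// Turns downloaded bytes into enrichment records.
    fn parse(&self, data: &[u8]) -> Result<Vec<EnrichmentRecord>>;
    /// Describes the table layout the records are loaded into.
    fn schema(&self) -> TableSchema;
}

/// Answers single and bulk lookups against loaded enrichment data.
pub trait EnrichmentLookup {
    fn lookup_single(&self, key: &str) -> Result<Option<EnrichmentResult>>;
    fn lookup_bulk(&self, keys: &[&str]) -> Result<HashMap<String, EnrichmentResult>>;
}
Current Implementations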
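To show how the two traits fit together, the sketch below implements them for a toy source. It is an illustration only, not one of nano's implementations: to stay self-contained it redefines the traits against hypothetical stand-in types (a String-error Result alias and minimal SourceConfig, EnrichmentRecord, TableSchema and EnrichmentResult structs), and StaticCsvSource and RecordIndex are invented names.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins so the sketch compiles on its own.
type Result<T> = std::result::Result<T, String>;

struct SourceConfig {
    download_url: String,
}

#[derive(Clone, Debug)]
struct EnrichmentRecord {
    network: String,
    country_code: String,
}

struct TableSchema {
    columns: Vec<&'static str>,
}

#[derive(Clone, Debug, PartialEq)]
struct EnrichmentResult {
    country_code: String,
}

trait EnrichmentSource {
    fn source_type(&self) -> &str;
    fn download(&self, config: &SourceConfig) -> Result<Vec<u8>>;
    fn parse(&self, data: &[u8]) -> Result<Vec<EnrichmentRecord>>;
    fn schema(&self) -> TableSchema;
}

trait EnrichmentLookup {
    fn lookup_single(&self, key: &str) -> Result<Option<EnrichmentResult>>;
    fn lookup_bulk(&self, keys: &[&str]) -> Result<HashMap<String, EnrichmentResult>>;
}

/// Toy source whose "download" returns embedded "network,country_code" lines.
struct StaticCsvSource {
    body: &'static str,
}

impl EnrichmentSource for StaticCsvSource {
    fn source_type(&self) -> &str {
        "static_csv"
    }

    fn download(&self, _config: &SourceConfig) -> Result<Vec<u8>> {
        // A real source would fetch config.download_url over HTTPS here.
        Ok(self.body.as_bytes().to_vec())
    }

    fn parse(&self, data: &[u8]) -> Result<Vec<EnrichmentRecord>> {
        let text = std::str::from_utf8(data).map_err(|e| e.to_string())?;
        text.lines()
            .filter(|line| !line.trim().is_empty())
            .map(|line| -> Result<EnrichmentRecord> {
                let (network, code) =
                    line.split_once(',').ok_or_else(|| format!("malformed line: {line}"))?;
                Ok(EnrichmentRecord {
                    network: network.trim().to_string(),
                    country_code: code.trim().to_string(),
                })
            })
            .collect()
    }

    fn schema(&self) -> TableSchema {
        TableSchema { columns: vec!["network", "country_code"] }
    }
}

/// Exact-match lookup keyed by network string; an IP lookup would use
/// containment matching instead.
struct RecordIndex {
    by_key: HashMap<String, EnrichmentResult>,
}

impl EnrichmentLookup for RecordIndex {
    fn lookup_single(&self, key: &str) -> Result<Option<EnrichmentResult>> {
        Ok(self.by_key.get(key).cloned())
    }

    fn lookup_bulk(&self, keys: &[&str]) -> Result<HashMap<String, EnrichmentResult>> {
        Ok(keys
            .iter()
            .filter_map(|&k| self.by_key.get(k).map(|r| (k.to_string(), r.clone())))
            .collect())
    }
}
```

Splitting download from parse keeps network I/O separate from format handling, so parse can be exercised on fixed bytes as above.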
Monitoring & Observability
Metrics Collection
Key metrics tracked by the enrichment system:
Health Checks
Automated monitoring ensures system reliability:
- Sync Health
  - Last successful sync timestamp
  - Sync failure detection and alerting
  - Data freshness monitoring
- Lookup Performance
  - Query latency percentiles
  - Cache effectiveness
  - Database connection health
- Data Quality
  - Record count validation
  - Data integrity checks
  - Coverage analysis
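The freshness and record-count checks can be expressed as a small pure function over the last_sync_at and record_count values kept in enrichment_sources. The thresholds in the hypothetical HealthPolicy (maximum sync age and minimum ratio of new to previous record count) are assumptions; the document does not state the values nano uses.

```rust
/// Hypothetical health verdict for one enrichment source.
#[derive(Debug, PartialEq)]
enum SyncHealth {
    Healthy,
    NeverSynced,
    Stale { age_secs: u64 },
    SuspiciousRecordCount { previous: u64, current: u64 },
}

/// Hypothetical thresholds.
struct HealthPolicy {
    /// Oldest acceptable successful sync, in seconds.
    max_age_secs: u64,
    /// Flag a sync whose record count falls below this fraction of the previous count.
    min_count_ratio: f64,
}

fn check_source(
    policy: &HealthPolicy,
    now_secs: u64,
    last_sync_at_secs: Option<u64>,
    previous_count: u64,
    current_count: u64,
) -> SyncHealth {
    let Some(last) = last_sync_at_secs else {
        return SyncHealth::NeverSynced;
    };
    // Data freshness: how long since the last successful sync.
    let age = now_secs.saturating_sub(last);
    if age > policy.max_age_secs {
        return SyncHealth::Stale { age_secs: age };
    }
    // Record count validation: a sharp drop usually means a truncated download.
    if previous_count > 0
        && (current_count as f64) < (previous_count as f64) * policy.min_count_ratio
    {
        return SyncHealth::SuspiciousRecordCount { previous: previous_count, current: current_count };
    }
    SyncHealth::Healthy
}
```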
Security Considerations
Data Protection
- URL Security: Enrichment URLs contain tokens, so they are stored securely
- Access Control: API endpoints require appropriate permissions
- Data Validation: All input data validated before storage
- Audit Logging: All configuration changes logged
Network Security
- HTTPS Only: All external downloads use encrypted connections
- Timeout Protection: Download timeouts prevent resource exhaustion
- Rate Limiting: Prevents abuse of external APIs
- Firewall Rules: Restrict outbound connections as needed
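As an illustration of the HTTPS-only rule combined with a host allowlist, here is a hypothetical pre-download check. It parses the URL by hand to stay dependency-free, which is enough for a sketch but no substitute for a real URL parser (IPv6 literal hosts, for instance, are not handled). It rejects userinfo in the authority so that a URL such as https://allowed.example@other.example cannot smuggle in a different host.

```rust
/// Hypothetical pre-download check: HTTPS only, and the host must be allowlisted.
/// Error messages name only the host, never the full URL, because enrichment
/// URLs can carry tokens.
fn check_download_url(url: &str, allowed_hosts: &[&str]) -> Result<(), String> {
    let rest = url
        .strip_prefix("https://")
        .ok_or("only https:// URLs are allowed")?;
    // Authority: everything before the first path, query or fragment delimiter.
    let authority = rest
        .split(|c: char| c == '/' || c == '?' || c == '#')
        .next()
        .unwrap_or("");
    // Reject userinfo ("user@host"), which can disguise the real host.
    if authority.contains('@') {
        return Err("credentials in the URL authority are not allowed".to_string());
    }
    // Drop an explicit port and compare hosts case-insensitively.
    let host = authority.split(':').next().unwrap_or("").to_ascii_lowercase();
    if allowed_hosts.iter().any(|allowed| allowed.eq_ignore_ascii_case(&host)) {
        Ok(())
    } else {
        Err(format!("host {host} is not on the allowlist"))
    }
}
```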
Custom Enrichment Sandbox
Custom code runs in a restricted Deno sandbox:
- Network Restrictions: Only allowed domains can be accessed
- No File System Access: Cannot read/write local files
- Memory Limits: Prevents resource exhaustion
- Execution Timeout: 60-second maximum runtime
- No Shell Access: Cannot execute system commands
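A host-side execution timeout can be sketched with a worker thread and a channel receive with a deadline. This is a hypothetical illustration of the 60-second limit, not how nano's Deno sandbox enforces it: std threads cannot be forcibly stopped, so here the worker keeps running after a timeout, whereas a real sandbox would terminate the script's runtime.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

#[derive(Debug, PartialEq)]
enum RunError {
    TimedOut,
    Crashed,
}

/// Runs `job` on a worker thread and waits at most `limit` for its result.
fn run_with_timeout<T, F>(job: F, limit: Duration) -> Result<T, RunError>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // Ignore send errors: the receiver is gone if we already timed out.
        let _ = tx.send(job());
    });
    match rx.recv_timeout(limit) {
        Ok(value) => Ok(value),
        Err(mpsc::RecvTimeoutError::Timeout) => Err(RunError::TimedOut),
        // The sender was dropped without a value, e.g. the job panicked.
        Err(mpsc::RecvTimeoutError::Disconnected) => Err(RunError::Crashed),
    }
}
```

With limit set to Duration::from_secs(60), the caller gets RunError::TimedOut for any job that has not produced a result within a minute.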