Enrichment Architecture

This document explains how nano's enrichment system works internally, including data flow, performance optimizations, and extensibility.

System Overview

The enrichment system is designed for high-performance, zero-downtime operation with automatic data updates and an extensible architecture supporting geolocation, threat intelligence, anonymizer detection, and custom enrichment sources.

Data Flow

Enrichment Data Sync

The sync process ensures enrichment data stays current while maintaining system availability:

Log Enrichment Flow

How individual logs get enriched during ingestion:

Custom Enrichment Data Flow

Custom enrichments follow a distinct path through the Deno sandbox:

  1. Code Execution: TypeScript runs in a secure Deno sandbox
  2. Data Storage: Records are stored in the ClickHouse custom_enrichment_results table
  3. Dictionary Refresh: ClickHouse dictionaries auto-refresh every 1-5 minutes
  4. Log Enrichment: Once a dictionary has refreshed, new logs are automatically enriched with matching data
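Steps 3 and 4 imply that a record written in step 2 becomes visible to log enrichment only after the next dictionary refresh. The Rust sketch below models that behaviour with a hypothetical `RefreshingDictionary` that reloads its snapshot from a loader closure once a fixed lifetime has elapsed. It is a stand-in for a ClickHouse dictionary over custom_enrichment_results, not nano's code, and it uses a fixed lifetime for simplicity.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a ClickHouse dictionary: lookups are served
/// from an in-memory snapshot that is reloaded once `lifetime` has elapsed.
pub struct RefreshingDictionary<F>
where
    F: Fn() -> HashMap<String, String>,
{
    loader: F,
    lifetime: Duration,
    snapshot: HashMap<String, String>,
    loaded_at: Instant,
}

impl<F> RefreshingDictionary<F>
where
    F: Fn() -> HashMap<String, String>,
{
    pub fn new(loader: F, lifetime: Duration) -> Self {
        let snapshot = loader();
        Self { loader, lifetime, snapshot, loaded_at: Instant::now() }
    }

    /// Reload the snapshot if it is older than `lifetime`, then look up `key`.
    /// Records stored after the last reload stay invisible until then.
    pub fn get(&mut self, key: &str) -> Option<&String> {
        if self.loaded_at.elapsed() >= self.lifetime {
            self.snapshot = (self.loader)();
            self.loaded_at = Instant::now();
        }
        self.snapshot.get(key)
    }
}
```

The trade-off is the same as with real dictionaries: lookups never wait on the backing table, at the cost of a bounded delay before new custom enrichment records take effect.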

Performance Architecture

Zero-Downtime Updates

The staging-table approach keeps enrichment data continuously available. Each update proceeds in five steps:

  1. Download — New data is downloaded in the background
  2. Stage — Data is loaded into a staging table
  3. Validate — Data integrity is verified
  4. Swap — Production table is atomically updated
  5. Cleanup — Old data is removed

This ensures that:

  • Log ingestion never stops
  • Lookups are always answered from a complete dataset, never a partially loaded one
  • Updates are atomic and consistent
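The stage, validate, swap, and cleanup cycle can be illustrated in-process: readers hold an `Arc` to the current table, the new table is built and validated off to the side, and the swap replaces the pointer under a brief write lock. This is a hypothetical Rust analogue of the staging-table approach (nano performs the swap in the database); the `ProdTable` type, the single-writer assumption, and the 50% record-count threshold are illustrative choices.

```rust
use std::sync::{Arc, RwLock};

/// One enrichment row: (network in CIDR notation, country code). Hypothetical.
pub type Record = (String, String);

/// The "production" table. Readers clone the Arc and never block on a sync.
pub struct ProdTable {
    current: RwLock<Arc<Vec<Record>>>,
}

impl ProdTable {
    pub fn new(initial: Vec<Record>) -> Self {
        Self { current: RwLock::new(Arc::new(initial)) }
    }

    /// Cheap snapshot for readers; a later swap cannot invalidate it.
    pub fn snapshot(&self) -> Arc<Vec<Record>> {
        Arc::clone(&self.current.read().unwrap())
    }

    /// Validate the staged data, then swap it in. On failure the old data stays live.
    /// Assumes a single writer, so the check and the swap cannot interleave with another sync.
    pub fn sync(&self, staged: Vec<Record>) -> Result<usize, String> {
        // Validate: reject empty loads and loads that shrink by more than half (assumed threshold).
        let old_len = self.snapshot().len();
        if staged.is_empty() || staged.len() * 2 < old_len {
            return Err(format!("staged {} records, previously {}", staged.len(), old_len));
        }
        let new_len = staged.len();
        // Swap: replace the pointer while holding the write lock only briefly.
        let old = std::mem::replace(&mut *self.current.write().unwrap(), Arc::new(staged));
        // Cleanup: the old table is freed once the last reader drops its snapshot.
        drop(old);
        Ok(new_len)
    }
}
```

A failed validation leaves the previous table serving lookups, which is what keeps ingestion and enrichment running through a bad download.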

Bulk Lookup Optimization

IP lookups are batched into a single PostgreSQL query that resolves many addresses at once:

-- Optimized bulk lookup query
SELECT
    ip_addr,
    ie.country,
    ie.country_code,
    ie.asn,
    ie.as_name
FROM UNNEST($1::text[]) AS ip_addr
LEFT JOIN LATERAL (
    SELECT country, country_code, asn, as_name
    FROM ip_enrichments ie
    JOIN enrichment_sources es ON ie.source_id = es.id
    WHERE ip_addr::inet <<= ie.network
      AND es.enabled = true
    ORDER BY masklen(ie.network) DESC
    LIMIT 1
) ie ON true

Key optimizations:

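  • UNNEST: Expands the input array so many IPs are resolved in a single query
  • LATERAL JOIN: Runs the lookup subquery once per input IP; LEFT JOIN ... ON true keeps IPs with no match (their columns are NULL)
  • Network containment: The <<= operator matches an address against every CIDR block that contains it
  • Longest prefix match: ORDER BY masklen() DESC LIMIT 1 keeps only the most specific matching network
  • Index optimization: A GiST index on network lets PostgreSQL find containing ranges without a full table scan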
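To make the SQL semantics concrete, this std-only Rust sketch performs the same longest-prefix match for IPv4: a network contains an address when their top `prefix_len` bits agree (the `<<=` test), and among all containing networks the one with the largest mask length wins (`ORDER BY masklen(...) DESC LIMIT 1`). It scans linearly for clarity, where PostgreSQL would use the GiST index. The `Network` type and both functions are illustrative, not nano's API.

```rust
use std::collections::HashMap;
use std::net::Ipv4Addr;

/// An IPv4 CIDR block with one attached attribute (e.g. country code).
pub struct Network {
    pub addr: Ipv4Addr,
    pub prefix_len: u8, // expected range: 0..=32
    pub country_code: &'static str,
}

impl Network {
    /// Equivalent of `ip <<= network`: the first `prefix_len` bits match.
    pub fn contains(&self, ip: Ipv4Addr) -> bool {
        let mask: u32 = if self.prefix_len == 0 { 0 } else { u32::MAX << (32 - self.prefix_len) };
        (u32::from(ip) & mask) == (u32::from(self.addr) & mask)
    }
}

/// Longest-prefix match for one address (linear scan in place of an index).
pub fn lookup_single(networks: &[Network], ip: Ipv4Addr) -> Option<&Network> {
    networks.iter().filter(|n| n.contains(ip)).max_by_key(|n| n.prefix_len)
}

/// Bulk variant mirroring UNNEST + LEFT JOIN LATERAL: every input gets an entry,
/// with None where nothing matches (the LEFT JOIN's NULL row). Unlike the SQL
/// `::inet` cast, which raises an error, unparseable input also maps to None here.
pub fn lookup_bulk<'a>(networks: &'a [Network], ips: &[&str]) -> HashMap<String, Option<&'a str>> {
    ips.iter()
        .map(|raw| {
            let hit = raw
                .parse::<Ipv4Addr>()
                .ok()
                .and_then(|ip| lookup_single(networks, ip))
                .map(|n| n.country_code);
            (raw.to_string(), hit)
        })
        .collect()
}
```

With 10.0.0.0/8 and 10.1.0.0/16 both loaded, 10.1.2.3 resolves to the /16 entry and 10.9.9.9 falls back to the /8, which is exactly the behaviour the ORDER BY clause provides.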

Caching Strategy

For high-volume environments, the system uses:

  • Batch processing — Multiple IPs looked up in single queries
  • Connection pooling — Efficient database connections
  • Caching — In-memory caching for frequently accessed data
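Batching and caching combine naturally: check each IP against a local cache first, deduplicate the misses, and resolve all of them with one bulk query. The sketch below shows that pattern with a hypothetical `BatchCache` wrapping any bulk-lookup closure (standing in for the UNNEST query). Its cache is unbounded for brevity; a production cache would likely need eviction and invalidation when enrichment data is re-synced.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical cache in front of a bulk lookup function.
pub struct BatchCache<F>
where
    F: FnMut(&[String]) -> HashMap<String, String>,
{
    bulk_lookup: F,
    cache: HashMap<String, Option<String>>, // None caches "no match" as well
    pub backend_calls: usize,
}

impl<F> BatchCache<F>
where
    F: FnMut(&[String]) -> HashMap<String, String>,
{
    pub fn new(bulk_lookup: F) -> Self {
        Self { bulk_lookup, cache: HashMap::new(), backend_calls: 0 }
    }

    /// Enrich a batch of IPs, issuing at most one backend query for all cache misses.
    pub fn enrich(&mut self, ips: &[&str]) -> Vec<Option<String>> {
        // Collect distinct misses so a repeated IP is only queried once.
        let mut seen = HashSet::new();
        let misses: Vec<String> = ips
            .iter()
            .filter(|ip| !self.cache.contains_key(**ip) && seen.insert(**ip))
            .map(|ip| ip.to_string())
            .collect();
        if !misses.is_empty() {
            self.backend_calls += 1;
            let mut found = (self.bulk_lookup)(misses.as_slice());
            for ip in misses {
                let hit = found.remove(&ip);
                self.cache.insert(ip, hit);
            }
        }
        // Every input is now cached, so indexing cannot panic.
        ips.iter().map(|ip| self.cache[*ip].clone()).collect()
    }
}
```

Caching negative results matters for log streams, where the same unknown address can appear thousands of times per batch.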

Database Schema

Core Tables

enrichment_sources — Configuration and metadata

CREATE TABLE enrichment_sources (
    id VARCHAR PRIMARY KEY,
    name VARCHAR NOT NULL,
    source_type VARCHAR NOT NULL,
    description TEXT,
    download_url TEXT,
    last_sync_at TIMESTAMPTZ,
    last_sync_status VARCHAR,
    record_count BIGINT DEFAULT 0,
    config JSONB DEFAULT '{}',
    enabled BOOLEAN DEFAULT true,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW()
);

ip_enrichments — Production enrichment data

CREATE TABLE ip_enrichments (
    id BIGSERIAL PRIMARY KEY,
    source_id VARCHAR REFERENCES enrichment_sources(id),
    network CIDR NOT NULL,
    country VARCHAR,
    country_code VARCHAR(2),
    continent VARCHAR,
    continent_code VARCHAR(2),
    asn VARCHAR,
    as_name VARCHAR,
    as_domain VARCHAR,
    created_at TIMESTAMPTZ DEFAULT NOW(),

    UNIQUE(source_id, network)
);

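-- Critical index for fast lookups. GiST indexing of inet/cidr uses the
-- inet_ops operator class, which is not the default and must be named.
CREATE INDEX idx_ip_enrichments_network_gist
ON ip_enrichments USING GIST (network inet_ops);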
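PostgreSQL's cidr type rejects values with bits set to the right of the mask (for example 10.1.2.3/8), so one bad row can fail a whole staging insert. A loader might therefore check rows during the Validate step. The sketch below is a hypothetical pre-insert check for IPv4; rejecting such rows rather than normalizing them to the network address is an assumption, and it also demands an explicit prefix length, which is stricter than PostgreSQL (PostgreSQL can infer one).

```rust
use std::net::Ipv4Addr;

/// Parse "a.b.c.d/len" and apply the cidr rule: no host bits may be set
/// to the right of the mask. Returns the network address and prefix length.
pub fn parse_cidr_v4(s: &str) -> Result<(Ipv4Addr, u8), String> {
    let (addr, len) = s.split_once('/').ok_or_else(|| format!("{s}: missing prefix length"))?;
    let addr: Ipv4Addr = addr.parse().map_err(|e| format!("{s}: {e}"))?;
    let len: u8 = len.parse().map_err(|e| format!("{s}: {e}"))?;
    if len > 32 {
        return Err(format!("{s}: prefix length {len} exceeds 32"));
    }
    // Bits below the prefix; a shift by 32 would overflow, so /32 is special-cased.
    let host_mask: u32 = if len == 32 { 0 } else { u32::MAX >> len };
    if (u32::from(addr) & host_mask) != 0 {
        return Err(format!("{s}: host bits set to the right of the mask"));
    }
    Ok((addr, len))
}
```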

Lookup Function

Optimized function for single IP lookups:

CREATE OR REPLACE FUNCTION lookup_ip_enrichment(ip_addr TEXT)
RETURNS TABLE(
    country TEXT,
    country_code TEXT,
    continent TEXT,
    continent_code TEXT,
    asn TEXT,
    as_name TEXT,
    as_domain TEXT
) AS $$
BEGIN
    RETURN QUERY
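    -- The columns are VARCHAR; RETURN QUERY requires exact type matches
    -- with the TEXT result columns, so each one is cast explicitly.
    SELECT
        ie.country::text,
        ie.country_code::text,
        ie.continent::text,
        ie.continent_code::text,
        ie.asn::text,
        ie.as_name::text,
        ie.as_domain::text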
    FROM ip_enrichments ie
    JOIN enrichment_sources es ON ie.source_id = es.id
    WHERE ip_addr::inet <<= ie.network
      AND es.enabled = true
    ORDER BY masklen(ie.network) DESC
    LIMIT 1;
END;
$$ LANGUAGE plpgsql STABLE;

Extensibility Framework

Source Plugin Architecture

New enrichment sources implement two traits: EnrichmentSource for fetching and parsing data, and EnrichmentLookup for queries:

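use std::collections::HashMap;

// Result, SourceConfig, EnrichmentRecord, TableSchema and EnrichmentResult are
// defined elsewhere in the crate; Result<T> is an alias with a fixed error type.
pub trait EnrichmentSource {
    /// Identifier for this kind of source (cf. enrichment_sources.source_type)
    fn source_type(&self) -> &str;
    /// Fetch the raw dataset
    fn download(&self, config: &SourceConfig) -> Result<Vec<u8>>;
    /// Turn the raw bytes into records ready for staging
    fn parse(&self, data: &[u8]) -> Result<Vec<EnrichmentRecord>>;
    /// Describe the table the records are loaded into
    fn schema(&self) -> TableSchema;
}

pub trait EnrichmentLookup {
    /// Look up one key (e.g. an IP address); Ok(None) when nothing matches
    fn lookup_single(&self, key: &str) -> Result<Option<EnrichmentResult>>;
    /// Look up many keys in one call, keyed by input
    fn lookup_bulk(&self, keys: &[&str]) -> Result<HashMap<String, EnrichmentResult>>;
}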
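As an illustration of the plugin contract, here is a self-contained sketch of a CSV-backed source. Because the real SourceConfig and EnrichmentRecord types are not shown in this document, the sketch defines minimal hypothetical stand-ins and a trimmed copy of the trait (schema() is omitted), and download() returns an inline fixture instead of fetching a URL. The CSV layout (network,country_code,asn with a header row) is likewise an assumption.

```rust
/// Hypothetical stand-ins for the crate's types.
pub type Result<T> = std::result::Result<T, String>;

pub struct SourceConfig {
    pub download_url: String,
}

#[derive(Debug, PartialEq)]
pub struct EnrichmentRecord {
    pub network: String,
    pub country_code: String,
    pub asn: String,
}

/// Trimmed copy of the EnrichmentSource trait (schema() omitted).
pub trait EnrichmentSource {
    fn source_type(&self) -> &str;
    fn download(&self, config: &SourceConfig) -> Result<Vec<u8>>;
    fn parse(&self, data: &[u8]) -> Result<Vec<EnrichmentRecord>>;
}

/// Hypothetical source for "network,country_code,asn" CSV files.
pub struct CsvSource {
    pub fixture: &'static str,
}

impl EnrichmentSource for CsvSource {
    fn source_type(&self) -> &str {
        "csv"
    }

    fn download(&self, _config: &SourceConfig) -> Result<Vec<u8>> {
        // A real source would fetch config.download_url over HTTPS; this returns a fixture.
        Ok(self.fixture.as_bytes().to_vec())
    }

    fn parse(&self, data: &[u8]) -> Result<Vec<EnrichmentRecord>> {
        let text = std::str::from_utf8(data).map_err(|e| e.to_string())?;
        text.lines()
            .enumerate()
            .skip(1) // header row
            .filter(|(_, line)| !line.trim().is_empty())
            .map(|(i, line)| {
                let cols: Vec<&str> = line.split(',').map(str::trim).collect();
                match cols.as_slice() {
                    [network, cc, asn] => Ok(EnrichmentRecord {
                        network: network.to_string(),
                        country_code: cc.to_string(),
                        asn: asn.to_string(),
                    }),
                    // Fail the whole parse so a malformed file never reaches staging.
                    _ => Err(format!("line {}: expected 3 columns, got {}", i + 1, cols.len())),
                }
            })
            .collect()
    }
}
```

Collecting into a Result makes parse all-or-nothing, which fits the validate-before-swap update model.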

Current Implementations

Monitoring & Observability

Metrics Collection

Key metrics tracked by the enrichment system:

Health Checks

Automated monitoring ensures system reliability:

  1. Sync Health

    • Last successful sync timestamp
    • Sync failure detection and alerting
    • Data freshness monitoring
  2. Lookup Performance

    • Query latency percentiles
    • Cache effectiveness
    • Database connection health
  3. Data Quality

    • Record count validation
    • Data integrity checks
    • Coverage analysis
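Several of these checks reduce to comparisons over what enrichment_sources already records (last_sync_at, last_sync_status, record_count). The sketch below evaluates them for one source. The `HealthIssue` enum, the "success" status string, the previous record count captured before the sync, and both thresholds are hypothetical choices, not nano's defaults.

```rust
use std::time::Duration;

#[derive(Debug, PartialEq)]
pub enum HealthIssue {
    NeverSynced,
    LastSyncFailed(String),
    Stale { age_secs: u64 },
    RecordCountDrop { previous: u64, current: u64 },
}

/// Inputs derived from one enrichment_sources row. The age is passed in so the check stays pure.
pub struct SourceStatus {
    pub last_sync_age: Option<Duration>, // None if last_sync_at is NULL
    pub last_sync_status: Option<String>,
    pub previous_count: u64, // assumed to be captured before the latest sync
    pub record_count: u64,
}

/// `max_drop_pct` must be in 0..=100.
pub fn check_source(s: &SourceStatus, max_age: Duration, max_drop_pct: u64) -> Vec<HealthIssue> {
    let mut issues = Vec::new();
    // Sync health and data freshness.
    match s.last_sync_age {
        None => issues.push(HealthIssue::NeverSynced),
        Some(age) if age > max_age => issues.push(HealthIssue::Stale { age_secs: age.as_secs() }),
        Some(_) => {}
    }
    if let Some(status) = &s.last_sync_status {
        if status != "success" {
            issues.push(HealthIssue::LastSyncFailed(status.clone()));
        }
    }
    // Record count validation: flag a shrink of more than max_drop_pct percent.
    if s.previous_count > 0 && s.record_count * 100 < s.previous_count * (100 - max_drop_pct) {
        issues.push(HealthIssue::RecordCountDrop {
            previous: s.previous_count,
            current: s.record_count,
        });
    }
    issues
}
```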

Security Considerations

Data Protection

  • URL Security: Enrichment download URLs can contain access tokens, so they are stored securely
  • Access Control: API endpoints require appropriate permissions
  • Data Validation: All input data validated before storage
  • Audit Logging: All configuration changes logged

Network Security

  • HTTPS Only: All external downloads use encrypted connections
  • Timeout Protection: Download timeouts prevent resource exhaustion
  • Rate Limiting: Prevents abuse of external APIs
  • Firewall Rules: Restrict outbound connections as needed
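Two of these rules are easy to enforce at the edge of the download code: refuse any URL that is not https://, and never write the token-bearing parts of a URL to logs. The sketch assumes tokens travel in the query string or in user:password@ credentials; a source that embeds a token in the path would need its own redaction. Both function names are hypothetical.

```rust
/// Reject anything that is not an HTTPS URL (the scheme check is case-insensitive).
pub fn require_https(url: &str) -> Result<(), String> {
    let scheme_ok = url.get(..8).map_or(false, |s| s.eq_ignore_ascii_case("https://"));
    if scheme_ok {
        Ok(())
    } else {
        Err("only https:// download URLs are allowed".to_string())
    }
}

/// Loggable form of a download URL: drops userinfo, query string, and fragment.
pub fn redact_url(url: &str) -> String {
    let (scheme, rest) = match url.split_once("://") {
        Some(parts) => parts,
        None => return "<invalid url>".to_string(),
    };
    // Cut at the first '?' or '#': everything after it may carry secrets.
    let end = rest.find(|c: char| c == '?' || c == '#').unwrap_or(rest.len());
    let without_query = &rest[..end];
    // Split the authority from the path, then drop any "user:pass@" prefix.
    let (authority, path) = match without_query.find('/') {
        Some(i) => without_query.split_at(i),
        None => (without_query, ""),
    };
    let host = authority.rsplit_once('@').map_or(authority, |(_, h)| h);
    let marker = if end < rest.len() { "?<redacted>" } else { "" };
    format!("{scheme}://{host}{path}{marker}")
}
```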

Custom Enrichment Sandbox

Custom code runs in a restricted Deno sandbox:

  • Network Restrictions: Only allowed domains can be accessed
  • No File System Access: Cannot read/write local files
  • Memory Limits: Prevents resource exhaustion
  • Execution Timeout: 60-second maximum runtime
  • No Shell Access: Cannot execute system commands
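Deno denies file, network, environment, and subprocess access unless permission flags grant it, so restrictions like these can be expressed largely by which flags the host passes. The sketch below shows one possible launcher, not nano's actual one: only --allow-net with an explicit host list is granted, --no-prompt makes missing permissions fail immediately, the V8 heap is capped with --v8-flags=--max-old-space-size, and the 60-second limit is enforced by the parent killing the child. The function names and the heap value used in practice are assumptions.

```rust
use std::process::{Child, Command, ExitStatus, Stdio};
use std::thread;
use std::time::{Duration, Instant};

/// Build a `deno run` invocation that grants network access only to `allowed_hosts`.
/// No --allow-read, --allow-write, or --allow-run flag is passed, so Deno denies
/// file-system access and subprocess execution.
pub fn sandbox_command(script: &str, allowed_hosts: &[&str], heap_mb: u32) -> Command {
    let mut cmd = Command::new("deno");
    cmd.arg("run").arg("--no-prompt");
    if !allowed_hosts.is_empty() {
        cmd.arg(format!("--allow-net={}", allowed_hosts.join(",")));
    }
    cmd.arg(format!("--v8-flags=--max-old-space-size={heap_mb}"))
        .arg(script)
        .stdin(Stdio::null());
    cmd
}

/// Wait for the child, killing it once it runs past `limit` (e.g. 60 seconds).
/// Returns Ok(None) when the child was killed for exceeding the limit.
pub fn wait_with_timeout(child: &mut Child, limit: Duration) -> std::io::Result<Option<ExitStatus>> {
    let start = Instant::now();
    loop {
        if let Some(status) = child.try_wait()? {
            return Ok(Some(status));
        }
        if start.elapsed() >= limit {
            child.kill()?;
            child.wait()?; // reap the killed process
            return Ok(None);
        }
        thread::sleep(Duration::from_millis(50));
    }
}
```

Usage would be along the lines of spawning `sandbox_command("enrich.ts", &["api.example.com"], 128)` and passing the child to `wait_with_timeout` with `Duration::from_secs(60)`.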