nano SIEM
User Guide

Data Management

nano provides comprehensive data management capabilities for uploading files, creating lookup tables, and automating data collection through scheduled jobs. These features enable you to enrich your security data with external context and automate routine data ingestion tasks.

Overview

The data management system consists of three main components:

  • File Upload: Direct upload of CSV, JSON, and NDJSON files for log ingestion or lookup table creation
  • Lookup Tables: Reference data tables for enriching search results with contextual information
  • Scheduled Jobs: Automated data fetching from remote URLs on configurable schedules

All components support the same file formats and parsing configurations, providing a consistent experience across different data ingestion methods.

File Upload

Supported Formats

nano supports three primary file formats for data upload:

CSV (Comma-Separated Values)

  • Extensions: .csv, .tsv
  • Delimiters: Comma (,), Tab (\t), Pipe (|), Semicolon (;)
  • Headers: Optional first-row headers or custom column names
  • Encoding: UTF-8, UTF-16 (LE/BE), ISO-8859-1
  • Features: Automatic type inference, quoted field support, flexible column handling

JSON (JavaScript Object Notation)

  • Extensions: .json
  • Structure: Array of objects [{"field": "value"}, ...]
  • Encoding: UTF-8, UTF-16 (LE/BE)
  • Features: Nested object support, automatic flattening option

NDJSON (Newline-Delimited JSON)

  • Extensions: .ndjson, .jsonl
  • Structure: One JSON object per line
  • Encoding: UTF-8, UTF-16 (LE/BE)
  • Features: Streaming-friendly format, efficient for large datasets
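To illustrate how the three formats represent the same data, here is a quick sketch using Python's standard csv and json modules (the sample records are hypothetical):

```python
import csv
import io
import json

records = [
    {"indicator": "malicious.com", "type": "domain"},
    {"indicator": "203.0.113.7", "type": "ip"},
]

# CSV: a header row, then one delimited line per record
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["indicator", "type"])
writer.writeheader()
writer.writerows(records)
csv_text = buf.getvalue()

# JSON: a single array of objects
json_text = json.dumps(records)

# NDJSON: one JSON object per line, streaming-friendly
ndjson_text = "\n".join(json.dumps(r) for r in records)

print(csv_text)
print(json_text)
print(ndjson_text)
```

NDJSON's line-per-object layout is what makes it efficient for large datasets: a parser can process each line independently without loading the whole file.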

Size Limitations

Limit Type          Value       Notes
Maximum File Size   100 MB      Per upload operation
Maximum Records     Unlimited   Limited by file size constraint
Preview Records     10          For file preview functionality
Lookup Table Rows   1,000,000   Per lookup table

Upload Process

  1. File Selection: Drag-and-drop or browse to select files
  2. Format Detection: Automatic detection based on file extension and content
  3. Configuration: Set parsing options (delimiter, headers, encoding)
  4. Preview: Review first 10 records and detected column types
  5. Destination: Choose between log ingestion or lookup table creation
  6. Processing: Parse and ingest data with progress tracking
  7. Validation: Error reporting for failed records
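The detected column types shown in step 4 come from automatic type inference. nano's exact rules aren't specified here, but a rough sketch of how per-column inference over string samples might work:

```python
def infer_type(values):
    """Guess a column type from sample string values.
    (Hypothetical rules; nano's actual inference may differ.)"""
    def all_match(pred):
        return all(pred(v) for v in values if v != "")

    if all_match(lambda v: v.lower() in ("true", "false")):
        return "boolean"
    if all_match(lambda v: v.lstrip("-").isdigit()):
        return "integer"

    def is_float(v):
        try:
            float(v)
            return True
        except ValueError:
            return False

    if all_match(is_float):
        return "float"
    return "text"

print(infer_type(["80", "443", "8080"]))       # integer
print(infer_type(["0.95", "0.87"]))            # float
print(infer_type(["true", "false"]))           # boolean
print(infer_type(["server-01", "server-02"]))  # text
```

This is why the preview step matters: reviewing the detected types on the first 10 records lets you override a wrong guess before ingesting the full file.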

Configuration Options

CSV Configuration

format: csv
csv_delimiter: ","           # Delimiter character
csv_has_headers: true        # First row contains headers
custom_headers: []           # Override column names
encoding: "utf-8"           # File encoding
max_records: null           # Limit processing (null = unlimited)
skip_invalid: true          # Skip malformed records

JSON/NDJSON Configuration

format: json                # or "ndjson"
encoding: "utf-8"          # File encoding
max_records: null          # Limit processing
skip_invalid: true         # Skip invalid objects
flatten_json: false        # Flatten nested objects
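The flatten_json option collapses nested objects into top-level fields. A minimal sketch of the idea (the dotted-key separator is an assumption, not documented nano behavior):

```python
def flatten(obj, prefix=""):
    """Flatten nested objects into dotted keys,
    e.g. {"a": {"b": 1}} -> {"a.b": 1}. Illustrative only."""
    out = {}
    for key, value in obj.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            out.update(flatten(value, prefix=f"{path}."))
        else:
            out[path] = value
    return out

event = {"src": {"ip": "10.0.0.5", "port": 443}, "action": "allow"}
print(flatten(event))  # {'src.ip': '10.0.0.5', 'src.port': 443, 'action': 'allow'}
```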

Lookup Tables

Lookup tables provide reference data for enriching search results with contextual information such as asset inventories, user directories, threat intelligence, and geographic data.

Table Structure

Each lookup table consists of:

  • Columns: Typed fields with names and data types
  • Primary Key: Optional column for efficient lookups
  • Indexes: Automatic indexing for query performance
  • Metadata: Description, creation date, and statistics

Supported Data Types

Type        Description         PostgreSQL Type     Examples
text        String values       TEXT                Names, descriptions, IDs
integer     Whole numbers       BIGINT              Counts, IDs, ports
float       Decimal numbers     DOUBLE PRECISION    Scores, percentages
boolean     True/false values   BOOLEAN             Flags, status indicators
timestamp   Date/time values    TIMESTAMPTZ         Creation dates, expiry
json        Complex objects     JSONB               Nested data, arrays

Creating Lookup Tables

Via Web Interface

  1. Navigate to Lookup Tables page
  2. Click New Lookup Table
  3. Upload data file (CSV, JSON, NDJSON)
  4. Configure table settings:
    • Name: Unique identifier for the table
    • Description: Optional documentation
    • Primary Key: Column for efficient lookups
    • Update Mode: Replace or append data
  5. Review column types and override if needed
  6. Create table and ingest data

Via API

# Create lookup table with file upload
curl -X POST "/api/lookup/tables" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@threat_indicators.csv" \
  -F "config={
    \"name\": \"threat_indicators\",
    \"description\": \"Known malicious indicators\",
    \"primary_key\": \"indicator\",
    \"format\": \"csv\",
    \"mode\": \"replace\"
  }"

Querying Lookup Tables

Single Lookup

# Look up a single value
curl -X GET "/api/lookup/threat_indicators/lookup?key=indicator&value=malicious.com"

Batch Lookup

# Look up multiple values efficiently
curl -X POST "/api/lookup/threat_indicators/batch" \
  -H "Content-Type: application/json" \
  -d '{
    "key_field": "indicator",
    "key_values": ["malicious.com", "bad-ip.net", "evil.org"],
    "output_fields": ["indicator", "type", "severity"]
  }'
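Conceptually, a batch lookup keeps the rows whose key_field value appears in key_values and projects only the requested output_fields. A local sketch of that behavior over hypothetical rows:

```python
# In-memory sketch of batch lookup semantics (rows are hypothetical)
table = [
    {"indicator": "malicious.com", "type": "domain", "severity": "high"},
    {"indicator": "bad-ip.net", "type": "domain", "severity": "medium"},
    {"indicator": "benign.org", "type": "domain", "severity": "low"},
]

def batch_lookup(rows, key_field, key_values, output_fields):
    wanted = set(key_values)
    return [
        {f: row[f] for f in output_fields}
        for row in rows
        if row[key_field] in wanted
    ]

matches = batch_lookup(
    table,
    key_field="indicator",
    key_values=["malicious.com", "bad-ip.net", "evil.org"],
    output_fields=["indicator", "severity"],
)
print(matches)
```

Note that keys with no matching row ("evil.org" above) simply produce no result, so callers should not assume one output row per input key.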

In Search Queries

Use the lookup command to enrich search results:

-- Basic lookup
event_type="dns_query" 
| lookup threat_indicators query_name OUTPUT type, severity

-- Conditional lookup with filtering
event_type="network_connection"
| lookup asset_inventory src_ip OUTPUT asset_type, owner
| where asset_type="server"

Update Modes

Replace Mode

  • Behavior: Truncates existing data and inserts new records
  • Use Case: Complete data refresh from authoritative source
  • Performance: Faster for complete dataset replacement

Append Mode

  • Behavior: Adds new records to existing data
  • Use Case: Incremental updates and data accumulation
  • Performance: Efficient for adding new records

Management Operations

List Tables

curl -X GET "/api/lookup/tables"

Get Table Details

curl -X GET "/api/lookup/tables/threat_indicators"

Delete Table

curl -X DELETE "/api/lookup/tables/threat_indicators"

Scheduled Jobs

Scheduled jobs automate data fetching from remote URLs on configurable schedules, enabling continuous updates of lookup tables and log ingestion from external sources.

Job Configuration

Each scheduled job requires:

  • Name: Unique identifier for the job
  • URL: Remote endpoint to fetch data from
  • Schedule: Cron expression defining execution frequency
  • Destination: Target for ingested data (logs or lookup table)
  • Parser Config: File format and parsing options
  • Retry Policy: Error handling and retry behavior

Cron Expressions

Jobs use standard cron syntax for scheduling:

┌───────────── minute (0 - 59)
│ ┌─────────── hour (0 - 23)
│ │ ┌───────── day of month (1 - 31)
│ │ │ ┌─────── month (1 - 12)
│ │ │ │ ┌───── day of week (0 - 6) (Sunday to Saturday)
│ │ │ │ │
* * * * *

Common Patterns

Pattern             Description                  Cron Expression
Every hour          Top of each hour             0 * * * *
Every 6 hours       00:00, 06:00, 12:00, 18:00   0 */6 * * *
Daily at midnight   Every day at 00:00           0 0 * * *
Daily at 6 AM       Every day at 06:00           0 6 * * *
Weekly on Sunday    Every Sunday at 00:00        0 0 * * 0
Monthly on 1st      1st of each month at 00:00   0 0 1 * *
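To illustrate how the five fields combine, here is a minimal matcher supporting `*`, exact values, and `*/n` steps (illustrative only, not nano's scheduler implementation):

```python
def cron_field_matches(field, value):
    """Match one cron field; supports '*', 'n', and '*/n'."""
    if field == "*":
        return True
    if field.startswith("*/"):
        return value % int(field[2:]) == 0
    return value == int(field)

def cron_matches(expr, minute, hour, dom, month, dow):
    """True if the given time components satisfy a 5-field cron expression."""
    fields = expr.split()
    return all(
        cron_field_matches(f, v)
        for f, v in zip(fields, (minute, hour, dom, month, dow))
    )

# "0 */6 * * *" fires at 00:00, 06:00, 12:00, 18:00
print(cron_matches("0 */6 * * *", minute=0, hour=6, dom=15, month=3, dow=4))  # True
print(cron_matches("0 0 * * 0", minute=0, hour=0, dom=15, month=3, dow=1))    # False (not Sunday)
```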

Creating Scheduled Jobs

Via Web Interface

  1. Navigate to Scheduled Jobs page
  2. Click New Job
  3. Configure job settings:
    • Name: Descriptive job identifier
    • Description: Optional documentation
    • URL: Remote data source endpoint
    • Schedule: Cron expression or preset
    • Destination: Logs or lookup table
    • Format: CSV, JSON, or NDJSON
  4. Enable job and save configuration

Via API

# Create scheduled job
curl -X POST "/api/scheduler/jobs" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Daily Threat Feed Update",
    "description": "Fetch latest threat indicators",
    "cron_expression": "0 6 * * *",
    "url": "https://feeds.example.com/threats.csv",
    "destination": {
      "type": "lookup",
      "table_name": "threat_indicators",
      "mode": "replace"
    },
    "parser_config": {
      "format": "csv",
      "csv_has_headers": true
    },
    "enabled": true
  }'

Authentication

Jobs support HTTP authentication through custom headers:

{
  "auth_headers": {
    "Authorization": "Bearer your-api-token",
    "X-API-Key": "your-api-key"
  }
}

Retry Policy

Configure automatic retry behavior for failed jobs:

{
  "retry_policy": {
    "max_retries": 3,
    "retry_delay_secs": 60
  }
}
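The retry policy can be pictured as a loop that re-runs the fetch after retry_delay_secs until max_retries is exhausted. A minimal sketch (the fixed-delay behavior is an assumption; nano may back off differently):

```python
import time

def run_with_retries(fetch, max_retries=3, retry_delay_secs=60):
    """Run fetch(), retrying up to max_retries times on failure.
    Mirrors the retry_policy fields above; fixed delay is an assumption."""
    attempts = 0
    while True:
        attempts += 1
        try:
            return fetch()
        except Exception:
            if attempts > max_retries:
                raise
            time.sleep(retry_delay_secs)

# A flaky fetch that fails twice, then succeeds
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("temporary failure")
    return "ok"

print(run_with_retries(flaky, max_retries=3, retry_delay_secs=0))  # ok
```

With max_retries set to 3, a job gets at most four attempts in total (the initial run plus three retries) before it is marked failed.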

Job Management

List Jobs

curl -X GET "/api/scheduler/jobs"

Trigger Manual Execution

curl -X POST "/api/scheduler/jobs/{job_id}/trigger"

Enable/Disable Job

curl -X PATCH "/api/scheduler/jobs/{job_id}" \
  -H "Content-Type: application/json" \
  -d '{"enabled": false}'

View Execution History

curl -X GET "/api/scheduler/jobs/{job_id}/executions"

Query Integration

Lookup Command

The lookup command enriches search results with data from lookup tables:

Basic Syntax

| lookup <table_name> <input_field> [OUTPUT <output_fields>]

Examples

-- Enrich with all fields from lookup table
event_type="authentication" 
| lookup user_directory username

-- Enrich with specific fields only
event_type="network_connection"
| lookup asset_inventory src_ip OUTPUT asset_name, department, owner

-- Multiple field matching
event_type="dns_query"
| lookup threat_domains domain_name OUTPUT threat_type, severity, first_seen
| where threat_type="malware"

Advanced Lookup Patterns

Conditional Enrichment

-- Only lookup external IPs
event_type="network_connection"
| where NOT match(dest_ip, "^(10\.|192\.168\.|172\.(1[6-9]|2[0-9]|3[01])\.)")
| lookup ip_reputation dest_ip OUTPUT reputation, country, asn
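The regular expression above excludes the RFC 1918 private ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16). The same pattern can be checked with Python's re module:

```python
import re

# Same RFC 1918 prefix pattern used in the query above
PRIVATE = re.compile(r"^(10\.|192\.168\.|172\.(1[6-9]|2[0-9]|3[01])\.)")

for ip in ["10.1.2.3", "172.16.0.1", "172.32.0.1", "8.8.8.8"]:
    print(ip, bool(PRIVATE.match(ip)))
```

Note the 172 branch only matches 172.16 through 172.31; 172.32.0.1 is a public address and is correctly passed through to the lookup.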

Nested Lookups

-- Chain multiple lookups
event_type="authentication" 
| lookup user_directory username OUTPUT department, manager
| lookup department_info department OUTPUT budget_code, location

Lookup with Calculations

-- Use lookup results in calculations
event_type="file_access"
| lookup file_classification file_path OUTPUT sensitivity_level
| eval risk_score = case(
    sensitivity_level="public", 1,
    sensitivity_level="internal", 5,
    sensitivity_level="confidential", 10,
    sensitivity_level="secret", 20,
    1
)
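The case() expression maps each sensitivity level to a score, with the trailing 1 acting as the default. The equivalent logic as a plain dictionary lookup:

```python
# Same mapping as the case() expression above; 1 is the default
RISK = {"public": 1, "internal": 5, "confidential": 10, "secret": 20}

def risk_score(sensitivity_level):
    return RISK.get(sensitivity_level, 1)

print(risk_score("confidential"))  # 10
print(risk_score("unknown"))       # 1
```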

Best Practices

File Upload

  1. Validate Data Quality: Review preview before uploading large files
  2. Use Appropriate Formats: CSV for tabular data, NDJSON for streaming
  3. Set Proper Encoding: Ensure UTF-8 encoding for international characters
  4. Handle Errors Gracefully: Enable skip_invalid for malformed records
  5. Monitor Upload Progress: Use progress tracking for large files

Lookup Tables

  1. Choose Meaningful Names: Use descriptive, consistent naming conventions
  2. Set Primary Keys: Define primary keys for efficient lookups
  3. Optimize Data Types: Use appropriate types to minimize storage
  4. Regular Updates: Keep reference data current and accurate
  5. Monitor Performance: Track query performance and table sizes

Scheduled Jobs

  1. Stagger Schedules: Avoid concurrent jobs competing for resources
  2. Handle Failures: Configure appropriate retry policies
  3. Monitor Execution: Review job logs and execution history
  4. Validate Data Sources: Ensure remote URLs remain accessible
  5. Test Thoroughly: Validate job configuration before enabling

Query Optimization

  1. Selective Output: Specify only needed fields in OUTPUT clause
  2. Filter Early: Apply WHERE clauses before lookup operations
  3. Index Usage: Leverage primary keys for efficient lookups
  4. Batch Operations: Use batch lookups for multiple values
  5. Cache Awareness: Understand lookup caching behavior

Troubleshooting

Common Upload Issues

File too large

  • Solution: Split large files or compress data
  • Limit: 100 MB maximum file size

Encoding errors

  • Solution: Convert to UTF-8 or specify correct encoding
  • Supported: UTF-8, UTF-16, ISO-8859-1

Parse failures

  • Solution: Validate file format and structure
  • Enable: skip_invalid option for partial processing

Lookup Table Issues

Slow queries

  • Solution: Define primary keys and optimize data types
  • Monitor: Query performance metrics

Memory usage

  • Solution: Limit table size and optimize column types
  • Limit: 1,000,000 rows per table

Update conflicts

  • Solution: Use appropriate update mode (replace vs append)
  • Consider: Data consistency requirements

Scheduled Job Issues

Authentication failures

  • Solution: Verify API keys and authentication headers
  • Check: Token expiration and permissions

Network timeouts

  • Solution: Increase retry delays and check connectivity
  • Monitor: Remote endpoint availability

Cron expression errors

  • Solution: Validate cron syntax and test expressions
  • Use: Preset patterns for common schedules

The data management system provides powerful capabilities for enriching your security data with external context and automating routine data collection tasks. By following these guidelines and best practices, you can build robust data pipelines that enhance your security analysis capabilities.
