nano SIEM
User Guide

Data Management

nano provides comprehensive data management capabilities for uploading files, creating lookup tables, and automating data collection through scheduled jobs. These features enable you to enrich your security data with external context and automate routine data ingestion tasks.

Overview

The data management system consists of three main components:

  • File Upload: Direct upload of CSV, JSON, and NDJSON files for log ingestion or lookup table creation
  • Lookup Tables: Reference data tables for enriching search results with contextual information
  • Scheduled Jobs: Automated data fetching from remote URLs on configurable schedules

All components support the same file formats and parsing configurations, providing a consistent experience across different data ingestion methods.

File Upload

Supported Formats

nano supports three primary file formats for data upload:

CSV (Comma-Separated Values)

  • Extensions: .csv, .tsv
  • Delimiters: Comma (,), Tab (\t), Pipe (|), Semicolon (;)
  • Headers: Optional first-row headers or custom column names
  • Encoding: UTF-8, UTF-16 (LE/BE), ISO-8859-1
  • Features: Automatic type inference, quoted field support, flexible column handling

JSON (JavaScript Object Notation)

  • Extensions: .json
  • Structure: Array of objects [{"field": "value"}, ...]
  • Encoding: UTF-8, UTF-16 (LE/BE)
  • Features: Nested object support, automatic flattening option

NDJSON (Newline-Delimited JSON)

  • Extensions: .ndjson, .jsonl
  • Structure: One JSON object per line
  • Encoding: UTF-8, UTF-16 (LE/BE)
  • Features: Streaming-friendly format, efficient for large datasets
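To illustrate how the three formats represent the same data, here is a quick sketch using Python's standard csv and json modules (the sample records are hypothetical):

```python
import csv
import io
import json

records = [
    {"indicator": "malicious.com", "type": "domain"},
    {"indicator": "203.0.113.7", "type": "ip"},
]

# CSV: a header row, then one delimited line per record
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["indicator", "type"])
writer.writeheader()
writer.writerows(records)
csv_text = buf.getvalue()

# JSON: a single array of objects
json_text = json.dumps(records)

# NDJSON: one JSON object per line, streaming-friendly
ndjson_text = "\n".join(json.dumps(r) for r in records)

print(csv_text)
print(json_text)
print(ndjson_text)
```

NDJSON's line-per-object layout is what makes it efficient for large datasets: a parser can process each line independently without loading the whole file.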

Size Limitations

Limit Type          Value       Notes
Maximum File Size   100 MB      Per upload operation
Maximum Records     Unlimited   Limited by file size constraint
Preview Records     10          For file preview functionality
Lookup Table Rows   1,000,000   Per lookup table

Upload Process

  1. File Selection: Drag-and-drop or browse to select files
  2. Format Detection: Automatic detection based on file extension and content
  3. Configuration: Set parsing options (delimiter, headers, encoding)
  4. Preview: Review first 10 records and detected column types
  5. Destination: Choose between log ingestion or lookup table creation
  6. Processing: Parse and ingest data with progress tracking
  7. Validation: Error reporting for failed records
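The detected column types shown in step 4 come from automatic type inference. nano's exact rules aren't specified here, but a rough sketch of how per-column inference over string samples might work:

```python
def infer_type(values):
    """Guess a column type from sample string values.
    (Hypothetical rules; nano's actual inference may differ.)"""
    def all_match(pred):
        return all(pred(v) for v in values if v != "")

    if all_match(lambda v: v.lower() in ("true", "false")):
        return "boolean"
    if all_match(lambda v: v.lstrip("-").isdigit()):
        return "integer"

    def is_float(v):
        try:
            float(v)
            return True
        except ValueError:
            return False

    if all_match(is_float):
        return "float"
    return "text"

print(infer_type(["80", "443", "8080"]))       # integer
print(infer_type(["0.95", "0.87"]))            # float
print(infer_type(["true", "false"]))           # boolean
print(infer_type(["server-01", "server-02"]))  # text
```

This is why the preview step matters: reviewing the detected types on the first 10 records lets you override a wrong guess before ingesting the full file.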

Configuration Options

CSV Configuration

format: csv
csv_delimiter: ","           # Delimiter character
csv_has_headers: true        # First row contains headers
custom_headers: []           # Override column names
encoding: "utf-8"           # File encoding
max_records: null           # Limit processing (null = unlimited)
skip_invalid: true          # Skip malformed records

JSON/NDJSON Configuration

format: json                # or "ndjson"
encoding: "utf-8"          # File encoding
max_records: null          # Limit processing
skip_invalid: true         # Skip invalid objects
flatten_json: false        # Flatten nested objects
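The flatten_json option collapses nested objects into top-level fields. A minimal sketch of the idea (the dotted-key separator is an assumption, not documented nano behavior):

```python
def flatten(obj, prefix=""):
    """Flatten nested objects into dotted keys,
    e.g. {"a": {"b": 1}} -> {"a.b": 1}. Illustrative only."""
    out = {}
    for key, value in obj.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            out.update(flatten(value, prefix=f"{path}."))
        else:
            out[path] = value
    return out

event = {"src": {"ip": "10.0.0.5", "port": 443}, "action": "allow"}
print(flatten(event))  # {'src.ip': '10.0.0.5', 'src.port': 443, 'action': 'allow'}
```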

Lookup Tables

Lookup tables provide reference data for enriching search results with contextual information such as asset inventories, user directories, threat intelligence, and geographic data.

Table Structure

Each lookup table consists of:

  • Columns: Typed fields with names and data types
  • Primary Key: Optional column for efficient lookups
  • Indexes: Automatic indexing for query performance
  • Metadata: Description, creation date, and statistics

Supported Data Types

Type        Description         PostgreSQL Type     Examples
text        String values       TEXT                Names, descriptions, IDs
integer     Whole numbers       BIGINT              Counts, IDs, ports
float       Decimal numbers     DOUBLE PRECISION    Scores, percentages
boolean     True/false values   BOOLEAN             Flags, status indicators
timestamp   Date/time values    TIMESTAMPTZ         Creation dates, expiry
json        Complex objects     JSONB               Nested data, arrays

Creating Lookup Tables

Via Web Interface

  1. Navigate to Lookup Tables page
  2. Click New Lookup Table
  3. Upload data file (CSV, JSON, NDJSON)
  4. Configure table settings:
    • Name: Unique identifier for the table
    • Description: Optional documentation
    • Primary Key: Column for efficient lookups
    • Update Mode: Replace or append data
  5. Review column types and override if needed
  6. Create table and ingest data

Via API

# Create lookup table with file upload
curl -X POST "/api/lookup/tables" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@threat_indicators.csv" \
  -F "config={
    \"name\": \"threat_indicators\",
    \"description\": \"Known malicious indicators\",
    \"primary_key\": \"indicator\",
    \"format\": \"csv\",
    \"mode\": \"replace\"
  }"

Querying Lookup Tables

Single Lookup

# Look up a single value
curl -X GET "/api/lookup/threat_indicators/lookup?key=indicator&value=malicious.com"

Batch Lookup

# Look up multiple values efficiently
curl -X POST "/api/lookup/threat_indicators/batch" \
  -H "Content-Type: application/json" \
  -d '{
    "key_field": "indicator",
    "key_values": ["malicious.com", "bad-ip.net", "evil.org"],
    "output_fields": ["indicator", "type", "severity"]
  }'
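Conceptually, a batch lookup keeps the rows whose key_field value appears in key_values and projects only the requested output_fields. A local sketch of that behavior over hypothetical rows:

```python
# In-memory sketch of batch lookup semantics (rows are hypothetical)
table = [
    {"indicator": "malicious.com", "type": "domain", "severity": "high"},
    {"indicator": "bad-ip.net", "type": "domain", "severity": "medium"},
    {"indicator": "benign.org", "type": "domain", "severity": "low"},
]

def batch_lookup(rows, key_field, key_values, output_fields):
    wanted = set(key_values)
    return [
        {f: row[f] for f in output_fields}
        for row in rows
        if row[key_field] in wanted
    ]

matches = batch_lookup(
    table,
    key_field="indicator",
    key_values=["malicious.com", "bad-ip.net", "evil.org"],
    output_fields=["indicator", "severity"],
)
print(matches)
```

Note that keys with no matching row ("evil.org" above) simply produce no result, so callers should not assume one output row per input key.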

In Search Queries

Use the lookup command to enrich search results:

-- Basic lookup
event_type="dns_query" 
| lookup threat_indicators query_name OUTPUT type, severity

-- Conditional lookup with filtering
event_type="network_connection"
| lookup asset_inventory src_ip OUTPUT asset_type, owner
| where asset_type="server"

Update Modes

Replace Mode

  • Behavior: Truncates existing data and inserts new records
  • Use Case: Complete data refresh from authoritative source
  • Performance: Faster for complete dataset replacement

Append Mode

  • Behavior: Adds new records to existing data
  • Use Case: Incremental updates and data accumulation
  • Performance: Efficient for adding new records

Management Operations

List Tables

curl -X GET "/api/lookup/tables"

Get Table Details

curl -X GET "/api/lookup/tables/threat_indicators"

Delete Table

curl -X DELETE "/api/lookup/tables/threat_indicators"

Scheduled Jobs

Scheduled jobs automate data fetching from remote URLs on configurable schedules, enabling continuous updates of lookup tables and log ingestion from external sources.

Job Configuration

Each scheduled job requires:

  • Name: Unique identifier for the job
  • URL: Remote endpoint to fetch data from
  • Schedule: Cron expression defining execution frequency
  • Destination: Target for ingested data (logs or lookup table)
  • Parser Config: File format and parsing options
  • Retry Policy: Error handling and retry behavior

Cron Expressions

Jobs use standard cron syntax for scheduling:

┌───────────── minute (0 - 59)
│ ┌─────────── hour (0 - 23)
│ │ ┌───────── day of month (1 - 31)
│ │ │ ┌─────── month (1 - 12)
│ │ │ │ ┌───── day of week (0 - 6) (Sunday to Saturday)
│ │ │ │ │
* * * * *

Common Patterns

Pattern             Description                  Cron Expression
Every hour          Top of each hour             0 * * * *
Every 6 hours       00:00, 06:00, 12:00, 18:00   0 */6 * * *
Daily at midnight   Every day at 00:00           0 0 * * *
Daily at 6 AM       Every day at 06:00           0 6 * * *
Weekly on Sunday    Every Sunday at 00:00        0 0 * * 0
Monthly on 1st      1st of each month at 00:00   0 0 1 * *
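To illustrate how the five fields combine, here is a minimal matcher supporting `*`, exact values, and `*/n` steps (illustrative only, not nano's scheduler implementation):

```python
def cron_field_matches(field, value):
    """Match one cron field; supports '*', 'n', and '*/n'."""
    if field == "*":
        return True
    if field.startswith("*/"):
        return value % int(field[2:]) == 0
    return value == int(field)

def cron_matches(expr, minute, hour, dom, month, dow):
    """True if the given time components satisfy a 5-field cron expression."""
    fields = expr.split()
    return all(
        cron_field_matches(f, v)
        for f, v in zip(fields, (minute, hour, dom, month, dow))
    )

# "0 */6 * * *" fires at 00:00, 06:00, 12:00, 18:00
print(cron_matches("0 */6 * * *", minute=0, hour=6, dom=15, month=3, dow=4))  # True
print(cron_matches("0 0 * * 0", minute=0, hour=0, dom=15, month=3, dow=1))    # False (not Sunday)
```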

Creating Scheduled Jobs

Via Web Interface

  1. Navigate to Scheduled Jobs page
  2. Click New Job
  3. Configure job settings:
    • Name: Descriptive job identifier
    • Description: Optional documentation
    • URL: Remote data source endpoint
    • Schedule: Cron expression or preset
    • Destination: Logs or lookup table
    • Format: CSV, JSON, or NDJSON
  4. Enable job and save configuration

Via API

# Create scheduled job
curl -X POST "/api/scheduler/jobs" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Daily Threat Feed Update",
    "description": "Fetch latest threat indicators",
    "cron_expression": "0 6 * * *",
    "url": "https://feeds.example.com/threats.csv",
    "destination": {
      "type": "lookup",
      "table_name": "threat_indicators",
      "mode": "replace"
    },
    "parser_config": {
      "format": "csv",
      "csv_has_headers": true
    },
    "enabled": true
  }'

Authentication

Jobs support HTTP authentication through custom headers:

{
  "auth_headers": {
    "Authorization": "Bearer your-api-token",
    "X-API-Key": "your-api-key"
  }
}

Retry Policy

Configure automatic retry behavior for failed jobs:

{
  "retry_policy": {
    "max_retries": 3,
    "retry_delay_secs": 60
  }
}
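The retry policy can be pictured as a loop that re-runs the fetch after retry_delay_secs until max_retries is exhausted. A minimal sketch (the fixed-delay behavior is an assumption; nano may back off differently):

```python
import time

def run_with_retries(fetch, max_retries=3, retry_delay_secs=60):
    """Run fetch(), retrying up to max_retries times on failure.
    Mirrors the retry_policy fields above; fixed delay is an assumption."""
    attempts = 0
    while True:
        attempts += 1
        try:
            return fetch()
        except Exception:
            if attempts > max_retries:
                raise
            time.sleep(retry_delay_secs)

# A flaky fetch that fails twice, then succeeds
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("temporary failure")
    return "ok"

print(run_with_retries(flaky, max_retries=3, retry_delay_secs=0))  # ok
```

With max_retries set to 3, a job gets at most four attempts in total (the initial run plus three retries) before it is marked failed.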

Job Management

List Jobs

curl -X GET "/api/scheduler/jobs"

Trigger Manual Execution

curl -X POST "/api/scheduler/jobs/{job_id}/trigger"

Enable/Disable Job

curl -X PATCH "/api/scheduler/jobs/{job_id}" \
  -H "Content-Type: application/json" \
  -d '{"enabled": false}'

View Execution History

curl -X GET "/api/scheduler/jobs/{job_id}/executions"

Query Integration

Lookup Command

The lookup command enriches search results with data from lookup tables:

Basic Syntax

| lookup <table_name> <input_field> [OUTPUT <output_fields>]

Examples

-- Enrich with all fields from lookup table
event_type="authentication" 
| lookup user_directory username

-- Enrich with specific fields only
event_type="network_connection"
| lookup asset_inventory src_ip OUTPUT asset_name, department, owner

-- Multiple field matching
event_type="dns_query"
| lookup threat_domains domain_name OUTPUT threat_type, severity, first_seen
| where threat_type="malware"

Advanced Lookup Patterns

Conditional Enrichment

-- Only lookup external IPs
event_type="network_connection"
| where NOT match(dest_ip, "^(10\.|192\.168\.|172\.(1[6-9]|2[0-9]|3[01])\.)")
| lookup ip_reputation dest_ip OUTPUT reputation, country, asn
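The regular expression above excludes the RFC 1918 private ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16). The same pattern can be checked with Python's re module:

```python
import re

# Same RFC 1918 prefix pattern used in the query above
PRIVATE = re.compile(r"^(10\.|192\.168\.|172\.(1[6-9]|2[0-9]|3[01])\.)")

for ip in ["10.1.2.3", "172.16.0.1", "172.32.0.1", "8.8.8.8"]:
    print(ip, bool(PRIVATE.match(ip)))
```

Note the 172 branch only matches 172.16 through 172.31; 172.32.0.1 is a public address and is correctly passed through to the lookup.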

Nested Lookups

-- Chain multiple lookups
event_type="authentication" 
| lookup user_directory username OUTPUT department, manager
| lookup department_info department OUTPUT budget_code, location

Lookup with Calculations

-- Use lookup results in calculations
event_type="file_access"
| lookup file_classification file_path OUTPUT sensitivity_level
| eval risk_score = case(
    sensitivity_level="public", 1,
    sensitivity_level="internal", 5,
    sensitivity_level="confidential", 10,
    sensitivity_level="secret", 20,
    1
)
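The case() expression maps each sensitivity level to a score, with the trailing 1 acting as the default. The equivalent logic as a plain dictionary lookup:

```python
# Same mapping as the case() expression above; 1 is the default
RISK = {"public": 1, "internal": 5, "confidential": 10, "secret": 20}

def risk_score(sensitivity_level):
    return RISK.get(sensitivity_level, 1)

print(risk_score("confidential"))  # 10
print(risk_score("unknown"))       # 1
```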

Best Practices

File Upload

  1. Validate Data Quality: Review preview before uploading large files
  2. Use Appropriate Formats: CSV for tabular data, NDJSON for streaming
  3. Set Proper Encoding: Ensure UTF-8 encoding for international characters
  4. Handle Errors Gracefully: Enable skip_invalid for malformed records
  5. Monitor Upload Progress: Use progress tracking for large files

Lookup Tables

  1. Choose Meaningful Names: Use descriptive, consistent naming conventions
  2. Set Primary Keys: Define primary keys for efficient lookups
  3. Optimize Data Types: Use appropriate types to minimize storage
  4. Regular Updates: Keep reference data current and accurate
  5. Monitor Performance: Track query performance and table sizes

Scheduled Jobs

  1. Stagger Schedules: Avoid concurrent jobs competing for resources
  2. Handle Failures: Configure appropriate retry policies
  3. Monitor Execution: Review job logs and execution history
  4. Validate Data Sources: Ensure remote URLs remain accessible
  5. Test Thoroughly: Validate job configuration before enabling

Query Optimization

  1. Selective Output: Specify only needed fields in OUTPUT clause
  2. Filter Early: Apply WHERE clauses before lookup operations
  3. Index Usage: Leverage primary keys for efficient lookups
  4. Batch Operations: Use batch lookups for multiple values
  5. Cache Awareness: Understand lookup caching behavior

Troubleshooting

Common Upload Issues

File too large

  • Solution: Split large files or compress data
  • Limit: 100 MB maximum file size

Encoding errors

  • Solution: Convert to UTF-8 or specify correct encoding
  • Supported: UTF-8, UTF-16, ISO-8859-1

Parse failures

  • Solution: Validate file format and structure
  • Enable: skip_invalid option for partial processing

Lookup Table Issues

Slow queries

  • Solution: Define primary keys and optimize data types
  • Monitor: Query performance metrics

Memory usage

  • Solution: Limit table size and optimize column types
  • Limit: 1,000,000 rows per table

Update conflicts

  • Solution: Use appropriate update mode (replace vs append)
  • Consider: Data consistency requirements

Scheduled Job Issues

Authentication failures

  • Solution: Verify API keys and authentication headers
  • Check: Token expiration and permissions

Network timeouts

  • Solution: Increase retry delays and check connectivity
  • Monitor: Remote endpoint availability

Cron expression errors

  • Solution: Validate cron syntax and test expressions
  • Use: Preset patterns for common schedules

The data management system provides powerful capabilities for enriching your security data with external context and automating routine data collection tasks. By following these guidelines and best practices, you can build robust data pipelines that enhance your security analysis capabilities.
