Data Management
nano provides comprehensive data management capabilities for uploading files, creating lookup tables, and automating data collection through scheduled jobs. These features enable you to enrich your security data with external context and automate routine data ingestion tasks.
Overview
The data management system consists of three main components:
- File Upload: Direct upload of CSV, JSON, and NDJSON files for log ingestion or lookup table creation
- Lookup Tables: Reference data tables for enriching search results with contextual information
- Scheduled Jobs: Automated data fetching from remote URLs on configurable schedules
All components support the same file formats and parsing configurations, providing a consistent experience across different data ingestion methods.
File Upload
Supported Formats
nano supports three primary file formats for data upload:
CSV (Comma-Separated Values)
- Extensions: .csv, .tsv
- Delimiters: Comma (,), Tab (\t), Pipe (|), Semicolon (;)
- Headers: Optional first-row headers or custom column names
- Encoding: UTF-8, UTF-16 (LE/BE), ISO-8859-1
- Features: Automatic type inference, quoted field support, flexible column handling
JSON (JavaScript Object Notation)
- Extensions: .json
- Structure: Array of objects: [{"field": "value"}, ...]
- Encoding: UTF-8, UTF-16 (LE/BE)
- Features: Nested object support, automatic flattening option
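The automatic flattening option turns nested objects into dotted top-level keys so they behave like flat columns. A minimal sketch of that behavior (the function name and separator are illustrative, not necessarily what nano uses internally):

```python
def flatten_json(obj: dict, parent: str = "", sep: str = ".") -> dict:
    """Recursively flatten nested objects into dotted keys,
    e.g. {"user": {"name": "alice"}} -> {"user.name": "alice"}."""
    flat = {}
    for key, value in obj.items():
        path = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten_json(value, path, sep))
        else:
            flat[path] = value
    return flat
```

With flattening enabled, `{"user": {"geo": {"cc": "US"}}}` would ingest as a single `user.geo.cc` field rather than a nested object.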
NDJSON (Newline-Delimited JSON)
- Extensions: .ndjson, .jsonl
- Structure: One JSON object per line
- Encoding: UTF-8, UTF-16 (LE/BE)
- Features: Streaming-friendly format, efficient for large datasets
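Because NDJSON is one object per line, a parser can process the file as a stream and report bad lines individually, which is what the skip_invalid option relies on. A sketch of that parsing model (illustrative only):

```python
import json

def parse_ndjson(text: str, skip_invalid: bool = True):
    """Parse newline-delimited JSON: one object per line.
    Malformed lines are collected as errors when skip_invalid is set,
    otherwise the first bad line aborts the whole parse."""
    records, errors = [], []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as exc:
            if not skip_invalid:
                raise
            errors.append((lineno, str(exc)))
    return records, errors
```

This per-line error isolation is why NDJSON is the most forgiving format for large, partially dirty datasets.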
Size Limitations
| Limit Type | Value | Notes |
|---|---|---|
| Maximum File Size | 100 MB | Per upload operation |
| Maximum Records | Unlimited | Limited by file size constraint |
| Preview Records | 10 | For file preview functionality |
| Lookup Table Rows | 1,000,000 | Per lookup table |
Upload Process
- File Selection: Drag-and-drop or browse to select files
- Format Detection: Automatic detection based on file extension and content
- Configuration: Set parsing options (delimiter, headers, encoding)
- Preview: Review first 10 records and detected column types
- Destination: Choose between log ingestion or lookup table creation
- Processing: Parse and ingest data with progress tracking
- Validation: Error reporting for failed records
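Step 2 (format detection) combines the file extension with content sniffing. The exact heuristics are an implementation detail; one plausible sketch, assuming a JSON array signals json and a parseable first line signals ndjson:

```python
import json

def detect_format(filename: str, sample: str) -> str:
    """Guess the upload format from the extension, falling back to
    content sniffing when the extension is missing or unknown."""
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    if ext in ("csv", "tsv"):
        return "csv"
    if ext == "json":
        return "json"
    if ext in ("ndjson", "jsonl"):
        return "ndjson"
    # Content sniffing: an array -> json, one object per line -> ndjson
    stripped = sample.lstrip()
    if stripped.startswith("["):
        return "json"
    first_line = stripped.splitlines()[0] if stripped else ""
    try:
        json.loads(first_line)
        return "ndjson"
    except json.JSONDecodeError:
        return "csv"
```

The preview step (first 10 records) exists precisely so you can correct a wrong guess before ingestion.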
Configuration Options
CSV Configuration
format: csv
csv_delimiter: "," # Delimiter character
csv_has_headers: true # First row contains headers
custom_headers: [] # Override column names
encoding: "utf-8" # File encoding
max_records: null # Limit processing (null = unlimited)
skip_invalid: true # Skip malformed records
JSON/NDJSON Configuration
format: json # or "ndjson"
encoding: "utf-8" # File encoding
max_records: null # Limit processing
skip_invalid: true # Skip invalid objects
flatten_json: false # Flatten nested objects
Lookup Tables
Lookup tables provide reference data for enriching search results with contextual information such as asset inventories, user directories, threat intelligence, and geographic data.
Table Structure
Each lookup table consists of:
- Columns: Typed fields with names and data types
- Primary Key: Optional column for efficient lookups
- Indexes: Automatic indexing for query performance
- Metadata: Description, creation date, and statistics
Supported Data Types
| Type | Description | PostgreSQL Type | Examples |
|---|---|---|---|
| text | String values | TEXT | Names, descriptions, IDs |
| integer | Whole numbers | BIGINT | Counts, IDs, ports |
| float | Decimal numbers | DOUBLE PRECISION | Scores, percentages |
| boolean | True/false values | BOOLEAN | Flags, status indicators |
| timestamp | Date/time values | TIMESTAMPTZ | Creation dates, expiry |
| json | Complex objects | JSONB | Nested data, arrays |
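The automatic type inference mentioned under CSV features maps sample values onto these types. A common approach is to pick the most specific type that fits every sample, falling back to text; this sketch illustrates the idea and is not nano's exact heuristic:

```python
from datetime import datetime

def infer_type(samples: list[str]) -> str:
    """Return the most specific type that every sample value fits,
    checked in order: boolean, integer, float, timestamp, text."""
    def fits(kind: str, value: str) -> bool:
        try:
            if kind == "boolean":
                return value.lower() in ("true", "false")
            if kind == "integer":
                int(value)
            elif kind == "float":
                float(value)
            elif kind == "timestamp":
                datetime.fromisoformat(value)
            return True
        except ValueError:
            return False

    for kind in ("boolean", "integer", "float", "timestamp"):
        if samples and all(fits(kind, v) for v in samples):
            return kind
    return "text"
```

This is also why the web interface lets you override detected column types: a column of zip codes, for example, would otherwise be inferred as integer and lose leading zeros.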
Creating Lookup Tables
Via Web Interface
- Navigate to Lookup Tables page
- Click New Lookup Table
- Upload data file (CSV, JSON, NDJSON)
- Configure table settings:
- Name: Unique identifier for the table
- Description: Optional documentation
- Primary Key: Column for efficient lookups
- Update Mode: Replace or append data
- Review column types and override if needed
- Create table and ingest data
Via API
# Create lookup table with file upload
curl -X POST "/api/lookup/tables" \
-H "Content-Type: multipart/form-data" \
-F "file=@threat_indicators.csv" \
-F "config={
\"name\": \"threat_indicators\",
\"description\": \"Known malicious indicators\",
\"primary_key\": \"indicator\",
\"format\": \"csv\",
\"mode\": \"replace\"
}"
Querying Lookup Tables
Single Lookup
# Look up a single value
curl -X GET "/api/lookup/threat_indicators/lookup?key=indicator&value=malicious.com"
Batch Lookup
# Look up multiple values efficiently
curl -X POST "/api/lookup/threat_indicators/batch" \
-H "Content-Type: application/json" \
-d '{
"key_field": "indicator",
"key_values": ["malicious.com", "bad-ip.net", "evil.org"],
"output_fields": ["indicator", "type", "severity"]
}'
In Search Queries
Use the lookup command to enrich search results:
-- Basic lookup
event_type="dns_query"
| lookup threat_indicators query_name OUTPUT type, severity
-- Conditional lookup with filtering
event_type="network_connection"
| lookup asset_inventory src_ip OUTPUT asset_type, owner
| where asset_type="server"
Update Modes
Replace Mode
- Behavior: Truncates existing data and inserts new records
- Use Case: Complete data refresh from authoritative source
- Performance: Faster for complete dataset replacement
Append Mode
- Behavior: Adds new records to existing data
- Use Case: Incremental updates and data accumulation
- Performance: Efficient for adding new records
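The two modes differ only in whether existing rows survive the update. A minimal sketch of the semantics on an in-memory row list (illustrative, not the server's actual implementation):

```python
def apply_update(existing: list[dict], incoming: list[dict], mode: str) -> list[dict]:
    """Apply an update batch to a lookup table's rows.
    'replace' truncates existing data first; 'append' accumulates."""
    if mode == "replace":
        return list(incoming)
    if mode == "append":
        return existing + list(incoming)
    raise ValueError(f"unsupported mode: {mode!r}")
```

Note that append mode counts toward the 1,000,000-row limit, so accumulating feeds should be pruned or periodically replaced.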
Management Operations
List Tables
curl -X GET "/api/lookup/tables"
Get Table Details
curl -X GET "/api/lookup/tables/threat_indicators"
Delete Table
curl -X DELETE "/api/lookup/tables/threat_indicators"
Scheduled Jobs
Scheduled jobs automate data fetching from remote URLs on configurable schedules, enabling continuous updates of lookup tables and log ingestion from external sources.
Job Configuration
Each scheduled job requires:
- Name: Unique identifier for the job
- URL: Remote endpoint to fetch data from
- Schedule: Cron expression defining execution frequency
- Destination: Target for ingested data (logs or lookup table)
- Parser Config: File format and parsing options
- Retry Policy: Error handling and retry behavior
Cron Expressions
Jobs use standard cron syntax for scheduling:
┌───────────── minute (0 - 59)
│ ┌─────────── hour (0 - 23)
│ │ ┌───────── day of month (1 - 31)
│ │ │ ┌─────── month (1 - 12)
│ │ │ │ ┌───── day of week (0 - 6) (Sunday to Saturday)
│ │ │ │ │
* * * * *
Common Patterns
| Pattern | Description | Cron Expression |
|---|---|---|
| Every hour | Top of each hour | 0 * * * * |
| Every 6 hours | 00:00, 06:00, 12:00, 18:00 | 0 */6 * * * |
| Daily at midnight | Every day at 00:00 | 0 0 * * * |
| Daily at 6 AM | Every day at 06:00 | 0 6 * * * |
| Weekly on Sunday | Every Sunday at 00:00 | 0 0 * * 0 |
| Monthly on 1st | 1st of each month at 00:00 | 0 0 1 * * |
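The presets above use only a small subset of cron syntax. A toy matcher covering just that subset ('*', '*/n' steps, and plain numbers) shows how the five fields combine; real cron has additional rules (notably OR-ing day-of-month with day-of-week when both are restricted) that this sketch ignores:

```python
from datetime import datetime

def field_matches(spec: str, value: int) -> bool:
    """Match one cron field: '*', '*/n' steps, or a comma list of numbers."""
    if spec == "*":
        return True
    if spec.startswith("*/"):
        return value % int(spec[2:]) == 0
    return any(int(part) == value for part in spec.split(","))

def cron_matches(expr: str, at: datetime) -> bool:
    """True if the five-field cron expression fires at the given time.
    Cron numbers Sunday as 0; Python's isoweekday() gives Sunday as 7."""
    minute, hour, dom, month, dow = expr.split()
    return (field_matches(minute, at.minute)
            and field_matches(hour, at.hour)
            and field_matches(dom, at.day)
            and field_matches(month, at.month)
            and field_matches(dow, at.isoweekday() % 7))
```

For anything beyond the preset patterns, validate the expression with a proper cron library before enabling the job.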
Creating Scheduled Jobs
Via Web Interface
- Navigate to Scheduled Jobs page
- Click New Job
- Configure job settings:
- Name: Descriptive job identifier
- Description: Optional documentation
- URL: Remote data source endpoint
- Schedule: Cron expression or preset
- Destination: Logs or lookup table
- Format: CSV, JSON, or NDJSON
- Enable job and save configuration
Via API
# Create scheduled job
curl -X POST "/api/scheduler/jobs" \
-H "Content-Type: application/json" \
-d '{
"name": "Daily Threat Feed Update",
"description": "Fetch latest threat indicators",
"cron_expression": "0 6 * * *",
"url": "https://feeds.example.com/threats.csv",
"destination": {
"type": "lookup",
"table_name": "threat_indicators",
"mode": "replace"
},
"parser_config": {
"format": "csv",
"csv_has_headers": true
},
"enabled": true
}'
Authentication
Jobs support HTTP authentication through custom headers:
{
"auth_headers": {
"Authorization": "Bearer your-api-token",
"X-API-Key": "your-api-key"
}
}
Retry Policy
Configure automatic retry behavior for failed jobs:
{
"retry_policy": {
"max_retries": 3,
"retry_delay_secs": 60
}
}
Job Management
List Jobs
curl -X GET "/api/scheduler/jobs"
Trigger Manual Execution
curl -X POST "/api/scheduler/jobs/{job_id}/trigger"
Enable/Disable Job
curl -X PATCH "/api/scheduler/jobs/{job_id}" \
-H "Content-Type: application/json" \
-d '{"enabled": false}'
View Execution History
curl -X GET "/api/scheduler/jobs/{job_id}/executions"
Query Integration
Lookup Command
The lookup command enriches search results with data from lookup tables:
Basic Syntax
| lookup <table_name> <input_field> [OUTPUT <output_fields>]
Examples
-- Enrich with all fields from lookup table
event_type="authentication"
| lookup user_directory username
-- Enrich with specific fields only
event_type="network_connection"
| lookup asset_inventory src_ip OUTPUT asset_name, department, owner
-- Multiple field matching
event_type="dns_query"
| lookup threat_domains domain_name OUTPUT threat_type, severity, first_seen
| where threat_type="malware"
Advanced Lookup Patterns
Conditional Enrichment
-- Only lookup external IPs
event_type="network_connection"
| where NOT match(dest_ip, "^(10\.|192\.168\.|172\.(1[6-9]|2[0-9]|3[01])\.)")
| lookup ip_reputation dest_ip OUTPUT reputation, country, asn
Nested Lookups
-- Chain multiple lookups
event_type="authentication"
| lookup user_directory username OUTPUT department, manager
| lookup department_info department OUTPUT budget_code, location
Lookup with Calculations
-- Use lookup results in calculations
event_type="file_access"
| lookup file_classification file_path OUTPUT sensitivity_level
| eval risk_score = case(
sensitivity_level="public", 1,
sensitivity_level="internal", 5,
sensitivity_level="confidential", 10,
sensitivity_level="secret", 20,
1
)
Best Practices
File Upload
- Validate Data Quality: Review preview before uploading large files
- Use Appropriate Formats: CSV for tabular data, NDJSON for streaming
- Set Proper Encoding: Ensure UTF-8 encoding for international characters
- Handle Errors Gracefully: Enable skip_invalid for malformed records
- Monitor Upload Progress: Use progress tracking for large files
Lookup Tables
- Choose Meaningful Names: Use descriptive, consistent naming conventions
- Set Primary Keys: Define primary keys for efficient lookups
- Optimize Data Types: Use appropriate types to minimize storage
- Regular Updates: Keep reference data current and accurate
- Monitor Performance: Track query performance and table sizes
Scheduled Jobs
- Stagger Schedules: Avoid concurrent jobs competing for resources
- Handle Failures: Configure appropriate retry policies
- Monitor Execution: Review job logs and execution history
- Validate Data Sources: Ensure remote URLs remain accessible
- Test Thoroughly: Validate job configuration before enabling
Query Optimization
- Selective Output: Specify only needed fields in OUTPUT clause
- Filter Early: Apply WHERE clauses before lookup operations
- Index Usage: Leverage primary keys for efficient lookups
- Batch Operations: Use batch lookups for multiple values
- Cache Awareness: Understand lookup caching behavior
Troubleshooting
Common Upload Issues
File too large
- Solution: Split large files or compress data
- Limit: 100 MB maximum file size
Encoding errors
- Solution: Convert to UTF-8 or specify correct encoding
- Supported: UTF-8, UTF-16, ISO-8859-1
Parse failures
- Solution: Validate file format and structure
- Enable: skip_invalid option for partial processing
Lookup Table Issues
Slow queries
- Solution: Define primary keys and optimize data types
- Monitor: Query performance metrics
Memory usage
- Solution: Limit table size and optimize column types
- Limit: 1,000,000 rows per table
Update conflicts
- Solution: Use appropriate update mode (replace vs append)
- Consider: Data consistency requirements
Scheduled Job Issues
Authentication failures
- Solution: Verify API keys and authentication headers
- Check: Token expiration and permissions
Network timeouts
- Solution: Increase retry delays and check connectivity
- Monitor: Remote endpoint availability
Cron expression errors
- Solution: Validate cron syntax and test expressions
- Use: Preset patterns for common schedules
The data management system provides powerful capabilities for enriching your security data with external context and automating routine data collection tasks. By following these guidelines and best practices, you can build robust data pipelines that enhance your security analysis capabilities.