Supported Data Sources
nano uses a source_type field to route incoming logs to the appropriate parser. Every log event must have a source_type defined for the system to process it correctly. This page explains which ingestion methods are supported and how to handle sources that don't natively support source type identification.
Source Type Requirement
When logs arrive at nano, the system needs to know what type of log it is (e.g., aws_cloudtrail, palo_alto, okta) to route it to the correct parser. Without a source type, logs cannot be parsed or stored.
Source types can be defined in several ways:
- HTTP header: `X-Source-Type: aws_cloudtrail`
- Event field: `.source_type = "palo_alto"` (for Vector-to-Vector forwarding)
- Feed configuration: one feed per log type (for cloud pull sources)
Directly Supported Sources
These ingestion methods support source type identification natively:
| Source | How source_type is defined | Best for |
|---|---|---|
| HTTP | X-Source-Type header | Applications, webhooks, log shippers (most common) |
| Vector | .source_type field in event | On-prem aggregators forwarding to cloud |
| AWS S3 | Feed configuration (one feed per log type) | CloudTrail, VPC Flow Logs, ALB logs (setup guide) |
| GCP Pub/Sub | Feed configuration (one feed per log type) | Cloud Audit Logs, Security Command Center (setup guide) |
| Kafka | Feed configuration or topic routing | High-volume streaming pipelines (setup guide) |
HTTP Ingestion
HTTP is the most common and recommended method. Send logs via HTTP POST with the X-Source-Type header:
```bash
curl -X POST https://your-nanosiem.com:8080/ \
  -H "Authorization: Bearer $TOKEN" \
  -H "X-Source-Type: my_app" \
  -H "Content-Type: application/json" \
  -d '{"timestamp": "2024-01-01T12:00:00Z", "message": "User login"}'
```

This works with any HTTP-capable log shipper:
- Fluentd/Fluent Bit (HTTP output)
- Filebeat (HTTP output)
- Cribl Stream
- Custom applications
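As an illustration, a Vector agent can also ship over HTTP. A minimal sketch, assuming a file source; the endpoint URL is a placeholder, `my_app` is a hypothetical source type, and `TOKEN` is expected in the environment:

```toml
[sources.app_logs]
type = "file"
include = ["/var/log/my_app/*.log"]

[sinks.nano_http]
type = "http"
inputs = ["app_logs"]
uri = "https://your-nanosiem.com:8080/"
encoding.codec = "json"

# Static headers sent with every request
[sinks.nano_http.request.headers]
Authorization = "Bearer ${TOKEN}"
X-Source-Type = "my_app"
```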
Vector-to-Vector Forwarding
For on-premises deployments, use a local Vector instance as an aggregator that forwards to nano. The aggregator sets the source_type field before forwarding:
```toml
# On-premises Vector aggregator
[sources.firewall_syslog]
type = "syslog"
address = "0.0.0.0:514"

[transforms.tag_source]
type = "remap"
inputs = ["firewall_syslog"]
source = '.source_type = "palo_alto"'

[sinks.cloud_siem]
type = "vector"
inputs = ["tag_source"]
address = "your-nanosiem.com:6000"
```

nano listens on port 6000 for Vector-to-Vector traffic and processes events with the source_type field already set.
Cloud Pull Sources (S3, Pub/Sub, Kafka)
For cloud-based log sources, the source type is defined at the feed level. Create one feed per log type:
- AWS CloudTrail feed → source_type: `aws_cloudtrail`
- AWS VPC Flow Logs feed → source_type: `aws_vpc_flow`
- GCP Audit Logs feed → source_type: `gcp_audit`
Each feed pulls from a specific queue/topic and knows what log type to expect.
Sources Requiring Vector Aggregator
The following protocols do not support source type identification natively. To ingest data from these sources, you must deploy a Vector aggregator on-premises that receives raw logs, tags them with the appropriate source_type, and forwards them to nano via the Vector protocol. See the Vector Aggregator guide for complete setup instructions.
Syslog
Raw syslog (RFC 3164/5424) has no concept of source type. Different devices send different log formats to the same syslog port.
Solution: Deploy a Vector aggregator that routes based on hostname, appname, or content:
```toml
# Vector aggregator for syslog
[sources.syslog_all]
type = "syslog"
address = "0.0.0.0:514"

[transforms.route_by_host]
type = "remap"
inputs = ["syslog_all"]
source = '''
# Route based on hostname or application name
hostname = to_string(.hostname) ?? ""
appname = to_string(.appname) ?? ""
if starts_with(hostname, "fw-") {
  .source_type = "palo_alto"
} else if starts_with(hostname, "sw-") {
  .source_type = "cisco_switch"
} else if appname == "sshd" {
  .source_type = "linux_auth"
} else if appname == "nginx" {
  .source_type = "nginx"
} else {
  .source_type = "generic_syslog"
}
'''

[sinks.cloud_siem]
type = "vector"
inputs = ["route_by_host"]
address = "your-nanosiem.com:6000"
```

OpenTelemetry (OTLP)
OpenTelemetry Protocol doesn't include a source type concept. OTLP is designed for observability data (traces, metrics, logs) but doesn't categorize logs by security source type.
Solution: Use a Vector aggregator to receive OTLP and tag with source type:
```toml
[sources.otlp]
type = "opentelemetry"
grpc.address = "0.0.0.0:4317"
http.address = "0.0.0.0:4318"

[transforms.tag_otlp]
type = "remap"
inputs = ["otlp"]
source = '''
# Tag based on resource attributes; the path segment is quoted
# because resource attribute keys contain dots
service = to_string(.resources."service.name") ?? "unknown"
if service == "api-gateway" {
  .source_type = "api_gateway"
} else if service == "auth-service" {
  .source_type = "auth_service"
} else {
  .source_type = "otlp_" + service
}
'''

[sinks.cloud_siem]
type = "vector"
inputs = ["tag_otlp"]
address = "your-nanosiem.com:6000"
```

Fluent Protocol (Fluentd/Fluent Bit)
The Fluent protocol uses tags, but they don't map directly to security source types.
Solution: Use a Vector aggregator to receive Fluent protocol and map tags to source types:
```toml
[sources.fluent]
type = "fluent"
address = "0.0.0.0:24224"

[transforms.tag_fluent]
type = "remap"
inputs = ["fluent"]
source = '''
# Map Fluent tags to source types
tag = to_string(.tag) ?? ""
if starts_with(tag, "kube.") {
  .source_type = "kubernetes"
} else if starts_with(tag, "docker.") {
  .source_type = "docker"
} else if starts_with(tag, "app.auth") {
  .source_type = "auth_service"
} else {
  .source_type = "fluent_" + replace(tag, ".", "_")
}
'''

[sinks.cloud_siem]
type = "vector"
inputs = ["tag_fluent"]
address = "your-nanosiem.com:6000"
```

File-Based Ingestion
Reading from files requires knowing what type of log each file contains.
Solution: Use a Vector aggregator with file sources configured per log type:
```toml
# Each file source knows its type
[sources.nginx_access]
type = "file"
include = ["/var/log/nginx/access.log"]

[sources.auth_log]
type = "file"
include = ["/var/log/auth.log"]

[transforms.tag_nginx]
type = "remap"
inputs = ["nginx_access"]
source = '.source_type = "nginx"'

[transforms.tag_auth]
type = "remap"
inputs = ["auth_log"]
source = '.source_type = "linux_auth"'

[sinks.cloud_siem]
type = "vector"
inputs = ["tag_nginx", "tag_auth"]
address = "your-nanosiem.com:6000"
```

Aggregator Deployment Pattern
For on-premises environments, the recommended architecture is:
```
┌─────────────────────────────────────────────────────────────┐
│                    On-Premises Network                      │
│                                                             │
│  ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐         │
│  │Firewall │  │ Servers │  │  Apps   │  │ Network │         │
│  │ Syslog  │  │ Syslog  │  │  OTLP   │  │ Devices │         │
│  └────┬────┘  └────┬────┘  └────┬────┘  └────┬────┘         │
│       │            │            │            │              │
│       └────────────┴─────┬──────┴────────────┘              │
│                          │                                  │
│                   ┌──────▼──────┐                           │
│                   │   Vector    │                           │
│                   │ Aggregator  │                           │
│                   │             │                           │
│                   │ - Receives  │                           │
│                   │ - Tags      │                           │
│                   │ - Forwards  │                           │
│                   └──────┬──────┘                           │
│                          │                                  │
└──────────────────────────┼──────────────────────────────────┘
                           │ Vector Protocol (port 6000)
                           │ TLS encrypted
                           ▼
                   ┌──────────────┐
                   │     nano     │
                   │   (Cloud)    │
                   └──────────────┘
```

Benefits of this pattern:
- Centralized routing: All source type logic in one place
- Bandwidth optimization: Aggregator can batch and compress
- Security: Only one outbound connection to cloud
- Flexibility: Add new log sources without cloud changes
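The "TLS encrypted" link in the diagram is enabled on the aggregator's sink. A minimal sketch, assuming the nano endpoint presents a certificate the aggregator trusts (the `tag_source` input matches the earlier Vector-to-Vector example):

```toml
[sinks.cloud_siem]
type = "vector"
inputs = ["tag_source"]
address = "your-nanosiem.com:6000"

# Encrypt the outbound connection to the cloud
[sinks.cloud_siem.tls]
enabled = true
```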
Source Type Aliases
nano automatically normalizes common source type variations:
| Input | Normalized to |
|---|---|
| winlog, windows, winevt | windows_event |
| apache_access, httpd | apache |
| cloudtrail, aws_ct | aws_cloudtrail |
| rsyslog, syslog-ng | syslog |
| pan, panos | palo_alto |
| asa, cisco_firewall | cisco_asa |
| fortigate, fgt | fortinet |
Summary
| Want to ingest... | Use this method |
|---|---|
| Application logs | HTTP with X-Source-Type header |
| AWS CloudTrail/VPC Flow Logs | AWS S3 feed (one feed per type) |
| GCP Cloud logs | GCP Pub/Sub feed (one feed per type) |
| Kafka streams | Kafka feed (one feed per type) |
| Network device syslog | Vector Aggregator on-prem |
| Server syslog | Vector Aggregator on-prem |
| OpenTelemetry logs | Vector Aggregator on-prem |
| Fluentd/Fluent Bit | Vector Aggregator on-prem |
| Log files | Vector Aggregator on-prem |
For any source that doesn't support native source type identification, deploy a Vector aggregator to handle tagging and forwarding.