Supported Data Sources
nano uses a source_type field to route incoming logs to the appropriate parser. Every log event must have a source_type defined for the system to process it correctly. This page explains which ingestion methods are supported and how to handle sources that don't natively support source type identification.
Source Type Requirement
When logs arrive at nano, the system needs to know what type of log it is (e.g., aws_cloudtrail, palo_alto, okta) to route it to the correct parser. Without a source type, logs cannot be parsed or stored.
Source types can be defined in several ways:
- HTTP header: `X-Source-Type: aws_cloudtrail`
- Event field: `.source_type = "palo_alto"` (for Vector-to-Vector forwarding)
- Feed configuration: one feed per log type (for cloud pull sources)
Directly Supported Sources
These ingestion methods support source type identification natively:
| Source | How source_type is defined | Best for |
|---|---|---|
| HTTP | X-Source-Type header | Applications, webhooks, log shippers (most common) |
| Vector | .source_type field in event | On-prem aggregators forwarding to cloud |
| AWS S3 | Feed configuration (one feed per log type) | CloudTrail, VPC Flow Logs, ALB logs (setup guide) |
| GCP Pub/Sub | Feed configuration (one feed per log type) | Cloud Audit Logs, Security Command Center (setup guide) |
| Kafka | Feed configuration or topic routing | High-volume streaming pipelines (setup guide) |
HTTP Ingestion
HTTP is the most common and recommended method. Send logs via HTTP POST with the X-Source-Type header:
```bash
curl -X POST https://your-nanosiem.com:8080/ \
  -H "Authorization: Bearer $TOKEN" \
  -H "X-Source-Type: my_app" \
  -H "Content-Type: application/json" \
  -d '{"timestamp": "2024-01-01T12:00:00Z", "message": "User login"}'
```

This works with any HTTP-capable log shipper:
- Fluentd/Fluent Bit (HTTP output)
- Filebeat (HTTP output)
- Cribl Stream
- Custom applications
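As an illustration, a Vector agent can also ship over HTTP. A minimal sketch, assuming a file source; the endpoint URL is a placeholder, `my_app` is a hypothetical source type, and `TOKEN` is expected in the environment:

```toml
[sources.app_logs]
type = "file"
include = ["/var/log/my_app/*.log"]

[sinks.nano_http]
type = "http"
inputs = ["app_logs"]
uri = "https://your-nanosiem.com:8080/"
encoding.codec = "json"

# Static headers sent with every request
[sinks.nano_http.request.headers]
Authorization = "Bearer ${TOKEN}"
X-Source-Type = "my_app"
```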
Vector-to-Vector Forwarding
For on-premises deployments, use a local Vector instance as an aggregator that forwards to nano. The aggregator sets the source_type field before forwarding:
```toml
# On-premises Vector aggregator
[sources.firewall_syslog]
type = "syslog"
address = "0.0.0.0:514"

[transforms.tag_source]
type = "remap"
inputs = ["firewall_syslog"]
source = '.source_type = "palo_alto"'

[sinks.cloud_siem]
type = "vector"
inputs = ["tag_source"]
address = "your-nanosiem.com:6000"
```

nano listens on port 6000 for Vector-to-Vector traffic and processes events with the source_type field already set.
Cloud Pull Sources (S3, Pub/Sub, Kafka)
For cloud-based log sources, the source type is defined at the feed level. Create one feed per log type:
- AWS CloudTrail feed → source_type: `aws_cloudtrail`
- AWS VPC Flow Logs feed → source_type: `aws_vpc_flow`
- GCP Audit Logs feed → source_type: `gcp_audit`
Each feed pulls from a specific queue/topic and knows what log type to expect.
Sources Requiring Vector Aggregator
The following protocols do not support source type identification natively. To ingest data from these sources, you must deploy a Vector aggregator on-premises that receives raw logs, tags them with the appropriate source_type, and forwards them to nano via the Vector protocol. See the Vector Aggregator guide for complete setup instructions.
Syslog
Raw syslog (RFC 3164/5424) has no concept of source type. Different devices send different log formats to the same syslog port.
Solution: Deploy a Vector aggregator that routes based on hostname, appname, or content:
```toml
# Vector aggregator for syslog
[sources.syslog_all]
type = "syslog"
address = "0.0.0.0:514"

[transforms.route_by_host]
type = "remap"
inputs = ["syslog_all"]
source = '''
# Route based on hostname or application name
hostname = to_string(.hostname) ?? ""
appname = to_string(.appname) ?? ""
if starts_with(hostname, "fw-") {
  .source_type = "palo_alto"
} else if starts_with(hostname, "sw-") {
  .source_type = "cisco_switch"
} else if appname == "sshd" {
  .source_type = "linux_auth"
} else if appname == "nginx" {
  .source_type = "nginx"
} else {
  .source_type = "generic_syslog"
}
'''

[sinks.cloud_siem]
type = "vector"
inputs = ["route_by_host"]
address = "your-nanosiem.com:6000"
```

OpenTelemetry (OTLP)
OpenTelemetry Protocol doesn't include a source type concept. OTLP is designed for observability data (traces, metrics, logs) but doesn't categorize logs by security source type.
Solution: Use a Vector aggregator to receive OTLP and tag with source type:
```toml
[sources.otlp]
type = "opentelemetry"
grpc.address = "0.0.0.0:4317"
http.address = "0.0.0.0:4318"

[transforms.tag_otlp]
type = "remap"
inputs = ["otlp"]
source = '''
# Tag based on resource attributes; the path segment is quoted
# because resource attribute keys contain dots
service = to_string(.resources."service.name") ?? "unknown"
if service == "api-gateway" {
  .source_type = "api_gateway"
} else if service == "auth-service" {
  .source_type = "auth_service"
} else {
  .source_type = "otlp_" + service
}
'''

[sinks.cloud_siem]
type = "vector"
inputs = ["tag_otlp"]
address = "your-nanosiem.com:6000"
```

Fluent Protocol (Fluentd/Fluent Bit)
The Fluent protocol uses tags, but they don't map directly to security source types.
Solution: Use a Vector aggregator to receive Fluent protocol and map tags to source types:
```toml
[sources.fluent]
type = "fluent"
address = "0.0.0.0:24224"

[transforms.tag_fluent]
type = "remap"
inputs = ["fluent"]
source = '''
# Map Fluent tags to source types
tag = to_string(.tag) ?? ""
if starts_with(tag, "kube.") {
  .source_type = "kubernetes"
} else if starts_with(tag, "docker.") {
  .source_type = "docker"
} else if starts_with(tag, "app.auth") {
  .source_type = "auth_service"
} else {
  .source_type = "fluent_" + replace(tag, ".", "_")
}
'''

[sinks.cloud_siem]
type = "vector"
inputs = ["tag_fluent"]
address = "your-nanosiem.com:6000"
```

File-Based Ingestion
Reading from files requires knowing what type of log each file contains.
Solution: Use a Vector aggregator with file sources configured per log type:
```toml
# Each file source knows its type
[sources.nginx_access]
type = "file"
include = ["/var/log/nginx/access.log"]

[sources.auth_log]
type = "file"
include = ["/var/log/auth.log"]

[transforms.tag_nginx]
type = "remap"
inputs = ["nginx_access"]
source = '.source_type = "nginx"'

[transforms.tag_auth]
type = "remap"
inputs = ["auth_log"]
source = '.source_type = "linux_auth"'

[sinks.cloud_siem]
type = "vector"
inputs = ["tag_nginx", "tag_auth"]
address = "your-nanosiem.com:6000"
```

Aggregator Deployment Pattern
For on-premises environments, the recommended architecture is:
```
┌─────────────────────────────────────────────────────────────┐
│                    On-Premises Network                      │
│                                                             │
│  ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐         │
│  │Firewall │  │ Servers │  │  Apps   │  │ Network │         │
│  │ Syslog  │  │ Syslog  │  │  OTLP   │  │ Devices │         │
│  └────┬────┘  └────┬────┘  └────┬────┘  └────┬────┘         │
│       │            │            │            │              │
│       └────────────┴─────┬──────┴────────────┘              │
│                          │                                  │
│                   ┌──────▼──────┐                           │
│                   │   Vector    │                           │
│                   │ Aggregator  │                           │
│                   │             │                           │
│                   │ - Receives  │                           │
│                   │ - Tags      │                           │
│                   │ - Forwards  │                           │
│                   └──────┬──────┘                           │
│                          │                                  │
└──────────────────────────┼──────────────────────────────────┘
                           │ Vector Protocol (port 6000)
                           │ TLS encrypted
                           ▼
                   ┌──────────────┐
                   │     nano     │
                   │   (Cloud)    │
                   └──────────────┘
```

Benefits of this pattern:
- Centralized routing: All source type logic in one place
- Bandwidth optimization: Aggregator can batch and compress
- Security: Only one outbound connection to cloud
- Flexibility: Add new log sources without cloud changes
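The "TLS encrypted" link in the diagram is enabled on the aggregator's sink. A minimal sketch, assuming the nano endpoint presents a certificate the aggregator trusts (the `tag_source` input matches the earlier Vector-to-Vector example):

```toml
[sinks.cloud_siem]
type = "vector"
inputs = ["tag_source"]
address = "your-nanosiem.com:6000"

# Encrypt the outbound connection to the cloud
[sinks.cloud_siem.tls]
enabled = true
```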
Source Type Aliases
nano automatically normalizes common source type variations:
| Input | Normalized to |
|---|---|
| winlog, windows, winevt | windows_event |
| apache_access, httpd | apache |
| cloudtrail, aws_ct | aws_cloudtrail |
| rsyslog, syslog-ng | syslog |
| pan, panos | palo_alto |
| asa, cisco_firewall | cisco_asa |
| fortigate, fgt | fortinet |
Summary
| Want to ingest... | Use this method |
|---|---|
| Application logs | HTTP with X-Source-Type header |
| AWS CloudTrail/VPC Flow Logs | AWS S3 feed (one feed per type) |
| GCP Cloud logs | GCP Pub/Sub feed (one feed per type) |
| Kafka streams | Kafka feed (one feed per type) |
| Network device syslog | Vector Aggregator on-prem |
| Server syslog | Vector Aggregator on-prem |
| OpenTelemetry logs | Vector Aggregator on-prem |
| Fluentd/Fluent Bit | Vector Aggregator on-prem |
| Log files | Vector Aggregator on-prem |
For any source that doesn't support native source type identification, deploy a Vector aggregator to handle tagging and forwarding.