Kafka

End-to-end guide for ingesting logs from Apache Kafka topics


This guide walks through ingesting logs from Apache Kafka into nano. Kafka is ideal for high-volume streaming pipelines where logs are already flowing through a Kafka cluster — application logs, security events from SIEM forwarders, change data capture streams, or any structured data published to Kafka topics.

nano's Vector pipeline acts as a Kafka consumer, pulling messages from one or more topics in a consumer group.
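nano generates and deploys the Vector configuration for you, but it can help to know what the consumer looks like under the hood. A hand-written Vector kafka source would look roughly like this sketch (broker addresses, group ID, and topic name are placeholders):

```toml
# Sketch of a Vector kafka source; nano manages the real config for you.
[sources.kafka_logs]
type = "kafka"
bootstrap_servers = "kafka-1:9092,kafka-2:9092"
group_id = "nanosiem"
topics = ["security-events"]
auto_offset_reset = "latest"
decoding.codec = "json"
```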

Prerequisites

  • A running Kafka cluster (self-managed, Confluent Cloud, Amazon MSK, Redpanda, etc.)
  • Network connectivity from nano to the Kafka bootstrap servers
  • A running nano instance

Step 1: Prepare Your Kafka Topic

You likely already have topics with log data. If you're setting up a new topic for nano:

# Create a topic (adjust replication and partitions for your cluster)
kafka-topics.sh --create \
  --bootstrap-server kafka-1:9092 \
  --topic security-events \
  --partitions 6 \
  --replication-factor 3 \
  --config retention.ms=86400000
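As a sanity check, the retention.ms value above works out to one day:

```shell
# Convert retention.ms=86400000 from milliseconds to hours
echo $((86400000 / 1000 / 60 / 60))   # prints 24
```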

Topic Design Considerations

| Approach | Example | Pros | Cons |
| --- | --- | --- | --- |
| One topic per log type | cloudtrail-events, okta-events | Simple routing, independent retention | More topics to manage |
| Shared topic with headers/keys | security-events with key = source type | Fewer topics | Requires routing rules in nano |

nano supports both approaches. With separate topics, each nano log source points at its own topic. With a shared topic, you use routing rules to direct messages to the right parser based on the topic name or message content.

Verify Your Topic Has Data

# Check topic exists and has partitions
kafka-topics.sh --describe \
  --bootstrap-server kafka-1:9092 \
  --topic security-events

# Peek at recent messages
kafka-console-consumer.sh \
  --bootstrap-server kafka-1:9092 \
  --topic security-events \
  --from-beginning --max-messages 3

Step 2: Create Kafka Credentials (If Required)

If your Kafka cluster uses SASL authentication, create a dedicated user for nano. If your cluster is unauthenticated (common in development or VPC-internal setups), skip to Step 3.

Confluent Cloud

# Create a service account
confluent iam service-account create nanosiem-reader \
  --description "nano log consumer"

# Create an API key for the service account
confluent api-key create \
  --service-account sa-xxxxx \
  --resource lkc-xxxxx

Grant the service account read access to your topics:

# Grant consumer group and topic read ACLs
confluent kafka acl create --allow \
  --service-account sa-xxxxx \
  --operations read \
  --topic security-events

confluent kafka acl create --allow \
  --service-account sa-xxxxx \
  --operations read \
  --consumer-group nanosiem

Amazon MSK

If your MSK cluster uses SASL/SCRAM authentication, store the credentials in AWS Secrets Manager:

# Create a secret for SCRAM credentials
aws secretsmanager create-secret \
  --name AmazonMSK_nanosiem \
  --secret-string '{"username": "nanosiem-reader", "password": "YOUR_SECURE_PASSWORD"}'

# Associate the secret with your MSK cluster
aws kafka batch-associate-scram-secret \
  --cluster-arn arn:aws:kafka:us-east-1:ACCOUNT_ID:cluster/my-cluster/abc123 \
  --secret-arn-list arn:aws:secretsmanager:us-east-1:ACCOUNT_ID:secret:AmazonMSK_nanosiem-xxxxxx

Then create Kafka ACLs for the user:

kafka-acls.sh --bootstrap-server your-msk-broker:9096 \
  --command-config client.properties \
  --add --allow-principal User:nanosiem-reader \
  --operation Read --topic security-events

kafka-acls.sh --bootstrap-server your-msk-broker:9096 \
  --command-config client.properties \
  --add --allow-principal User:nanosiem-reader \
  --operation Read --group nanosiem

Self-Managed Kafka

Create a SCRAM user:

kafka-configs.sh --bootstrap-server kafka-1:9092 \
  --alter --add-config 'SCRAM-SHA-256=[password=YOUR_SECURE_PASSWORD]' \
  --entity-type users --entity-name nanosiem-reader

Grant read ACLs:

kafka-acls.sh --bootstrap-server kafka-1:9092 \
  --add --allow-principal User:nanosiem-reader \
  --operation Read --topic security-events \
  --group nanosiem

Required ACLs Summary

nano needs minimal read-only access:

| Resource | Operation | Purpose |
| --- | --- | --- |
| Topic | Read | Consume messages |
| Consumer Group | Read | Join consumer group, commit offsets |

nano does not need Write, Create, Delete, or Alter permissions.

Step 3: Store Credentials in nano

  1. Navigate to Settings → Cloud Credentials
  2. Click Add Credential
  3. Fill in the form:

| Field | Value |
| --- | --- |
| Provider | Kafka |
| Name | A descriptive name, e.g. Confluent Cloud Production |

  4. Configure authentication:

For SASL Authentication

| Field | Value |
| --- | --- |
| SASL Mechanism | PLAIN, SCRAM-SHA-256, or SCRAM-SHA-512 |
| Username | Your Kafka username or API key |
| Password | Your Kafka password or API secret |
| Enable TLS/SSL | Checked (required for most managed Kafka services) |

For Unauthenticated Kafka

| Field | Value |
| --- | --- |
| SASL Mechanism | None |
| Enable TLS/SSL | Unchecked (unless your cluster requires TLS without SASL) |

  5. Click Save

Common SASL Mechanisms by Provider

| Kafka Provider | SASL Mechanism | TLS |
| --- | --- | --- |
| Confluent Cloud | PLAIN | Yes |
| Amazon MSK (SCRAM) | SCRAM-SHA-512 | Yes |
| Redpanda Cloud | SCRAM-SHA-256 | Yes |
| Aiven | SCRAM-SHA-256 | Yes |
| Self-managed (internal) | Varies or none | Depends |
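Before configuring nano, you can verify SASL connectivity from a workstation with the standard Kafka CLI tools. A client.properties along these lines works for SCRAM over TLS (the mechanism and credentials below are placeholders; adjust to match the table above):

```properties
# client.properties — placeholder credentials; adjust mechanism per provider
security.protocol=SASL_SSL
sasl.mechanism=SCRAM-SHA-256
sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required \
  username="nanosiem-reader" \
  password="YOUR_SECURE_PASSWORD";
```

Pass it to any Kafka CLI tool with --command-config client.properties, as in the MSK ACL examples above.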

Step 4: Create a Log Source

  1. Navigate to Feeds → New Feed (or use the Log Source Wizard)
  2. Select "I have sample logs" and paste a representative message from your topic (see examples below)
  3. The AI will detect the format and generate a VRL parser
  4. Configure the source connection:

| Field | Value |
| --- | --- |
| Source Type | Kafka |
| Bootstrap Servers | Comma-separated broker addresses, e.g. kafka-1:9092,kafka-2:9092 |
| Topics | One or more topic names, e.g. security-events |
| Consumer Group ID | A group ID for nano, e.g. nanosiem |
| Auto Offset Reset | latest (start from new messages) or earliest (consume all existing messages) |
| Credential | Select the credential from Step 3, or "None" for unauthenticated clusters |

  5. Set the feed metadata (name, category, vendor, product)
  6. Publish the parser to create a version and deploy to Vector

Bootstrap Server Formats

| Kafka Provider | Bootstrap Server Format |
| --- | --- |
| Confluent Cloud | pkc-xxxxx.us-east-1.aws.confluent.cloud:9092 |
| Amazon MSK | b-1.mycluster.abc123.c2.kafka.us-east-1.amazonaws.com:9096 |
| Redpanda Cloud | seed-xxxxx.us-east-1.aws.redpanda.com:9092 |
| Self-managed | kafka-1.internal:9092,kafka-2.internal:9092 |

Sample Messages

JSON Application Logs

{
  "timestamp": "2025-01-15T14:23:45.678Z",
  "level": "ERROR",
  "service": "auth-service",
  "message": "Failed login attempt",
  "user": "admin@example.com",
  "source_ip": "203.0.113.50",
  "error_code": "INVALID_CREDENTIALS",
  "attempt_count": 5
}
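For reference, a VRL parser for a JSON message like the one above might look roughly like the following sketch. The generated parser will differ; the field names here simply follow the sample:

```vrl
# Sketch only — nano's AI generates the actual parser for your feed.
. = parse_json!(string!(.message))
.timestamp = parse_timestamp!(.timestamp, format: "%+")
.severity = .level
del(.level)
```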

Structured Security Events

{
  "event_type": "network_connection",
  "timestamp": "2025-01-15T14:23:45Z",
  "src_ip": "10.0.1.5",
  "dst_ip": "198.51.100.10",
  "dst_port": 443,
  "protocol": "TCP",
  "bytes_sent": 1240,
  "bytes_recv": 5600,
  "command_line": "/usr/bin/curl",
  "hostname": "web-server-01"
}

CEF (Common Event Format)

CEF:0|SecurityVendor|SecurityProduct|1.0|100|Suspicious Activity|7|src=10.0.1.5 dst=203.0.113.50 dpt=22 act=blocked msg=Brute force SSH attempt detected
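A CEF message is a pipe-delimited header (version, vendor, product, device version, event ID, name, severity) followed by key=value extensions. A quick shell check of the header fields in the sample above:

```shell
# Split the CEF header on '|' to inspect individual fields
msg='CEF:0|SecurityVendor|SecurityProduct|1.0|100|Suspicious Activity|7|src=10.0.1.5 dst=203.0.113.50 dpt=22 act=blocked msg=Brute force SSH attempt detected'
echo "$msg" | cut -d'|' -f2   # vendor: SecurityVendor
echo "$msg" | cut -d'|' -f6   # event name: Suspicious Activity
echo "$msg" | cut -d'|' -f7   # severity: 7
```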

Step 5: Verify Ingestion

After publishing, allow a minute for Vector to join the consumer group and start consuming.

Check Feed Health

  1. Go to Feeds → select your new log source
  2. On the Overview tab, check:
    • Status: Should show "Healthy"
    • Event Volume chart: Should show events arriving
    • Last Event: Should show a recent timestamp

Search Your Data

Navigate to Search and query for your source type:

source_type="kafka_security_events"

Check Consumer Group Status

Verify nano is consuming from the Kafka side:

kafka-consumer-groups.sh --describe \
  --bootstrap-server kafka-1:9092 \
  --group nanosiem

You should see:

  • CURRENT-OFFSET advancing as messages are consumed
  • LAG near zero (or decreasing if catching up)
  • STATE as Stable
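Healthy output looks roughly like this (the offsets, consumer IDs, and hosts below are illustrative, and the column layout varies slightly between Kafka versions):

```
GROUP     TOPIC            PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  CONSUMER-ID      HOST        CLIENT-ID
nanosiem  security-events  0          152340          152341          1    vector-abc123-0  /10.0.2.15  vector
nanosiem  security-events  1          149982          149982          0    vector-abc123-0  /10.0.2.15  vector
```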

Check for Errors

If no data appears:

  1. Check network connectivity — Can nano reach the bootstrap servers?

    # From the nano host
    nc -zv kafka-1 9092
  2. Check Vector logs for connection errors:

    docker logs nanosiem-vector 2>&1 | grep -i "kafka\|error\|sasl\|tls"
  3. Check ingestion errors in nano at System → Ingestion Errors

Multiple Topics

One Log Source per Topic

The simplest approach — create a separate nano log source for each Kafka topic, each with its own parser:

  • cloudtrail-events topic → aws_cloudtrail log source
  • okta-events topic → okta_sso log source
  • app-logs topic → my_app log source

All can share the same credential and consumer group.

One Source Configuration with Routing Rules

For a shared topic or when you want centralized management, use a Source Configuration with routing rules:

  1. Go to Settings → Source Configurations
  2. Create a Kafka source configuration with your broker details
  3. Add routing rules that match on topic name:
| Match Field | Match Type | Match Value | Target Source Type |
| --- | --- | --- | --- |
| topic | exact | cloudtrail-events | aws_cloudtrail |
| topic | exact | okta-events | okta_sso |
| topic | prefix | app- | application_logs |
| topic | default | (none) | generic_kafka |
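The routing rules above evaluate as a first-match lookup on the topic name. As an illustrative sketch only (this is not nano's implementation):

```shell
# Illustrative only — nano evaluates routing rules internally.
route_topic() {
  case "$1" in
    cloudtrail-events) echo "aws_cloudtrail" ;;
    okta-events)       echo "okta_sso" ;;
    app-*)             echo "application_logs" ;;   # prefix match
    *)                 echo "generic_kafka" ;;      # default
  esac
}
route_topic "app-payments"   # prints application_logs
```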

Troubleshooting

"Connection refused" or timeout

  • Verify the bootstrap server addresses and ports are correct
  • Check firewall rules / security groups allow traffic from nano to Kafka
  • For managed services, ensure the cluster allows connections from nano's IP range
  • MSK: Check that public access is enabled if connecting from outside the VPC

"SASL authentication failed"

  • Verify the SASL mechanism matches what your Kafka cluster expects
  • Check username and password are correct
  • Confluent Cloud: Ensure you're using an API key/secret, not a cloud login
  • MSK SCRAM: Verify the secret is associated with the cluster

"SSL handshake failed"

  • Ensure TLS is enabled in the nano credential
  • If using a self-signed CA, provide the CA certificate via the API:
    curl -X POST http://localhost:3000/api/credentials \
      -H "Authorization: Bearer $TOKEN" \
      -H "Content-Type: application/json" \
      -d '{
        "name": "Kafka Production",
        "provider": "kafka",
        "credentials": {
          "sasl_mechanism": "SCRAM-SHA-256",
          "sasl_username": "nanosiem-reader",
          "sasl_password": "your-password",
          "tls_enabled": true,
          "tls_ca_cert": "-----BEGIN CERTIFICATE-----\n...\n-----END CERTIFICATE-----"
        }
      }'

Consumer group is not consuming (LAG increasing)

  • Check that the consumer group ID in nano doesn't conflict with another consumer
  • Verify the topic name is spelled correctly (Kafka topic names are case-sensitive)
  • Check Vector resource allocation — high-volume topics may need more CPU/memory
  • Check for parse errors in System → Ingestion Errors
  • Verify the parser handles your message format — test with sample data in the parser editor
  • Ensure the log source is published and deployed

Performance Tuning

Partitions and Parallelism

Vector creates one consumer per log source. For high-throughput topics, increase the number of partitions so Kafka can distribute load (note that Kafka only allows increasing a topic's partition count, never decreasing it):

kafka-topics.sh --alter \
  --bootstrap-server kafka-1:9092 \
  --topic security-events \
  --partitions 12

Consumer Group Coordination

If you run multiple nano Vector instances (e.g., in a Kubernetes deployment), they can share the same consumer group ID. Kafka will distribute partitions across the instances automatically, giving you horizontal scalability.

Offset Management

nano commits offsets automatically. If you need to reset offsets (e.g., to reprocess data):

# Reset to earliest (reprocess all data)
kafka-consumer-groups.sh --reset-offsets \
  --bootstrap-server kafka-1:9092 \
  --group nanosiem \
  --topic security-events \
  --to-earliest \
  --execute
