
Keploy — How eBPF Intercepts Traffic to Auto-Generate API Tests

A CNCF project that creates tests without code changes using Go + eBPF + transparent proxy

Keploy is written in Go and has 17k+ stars on GitHub. Run keploy record once and it captures every API call, DB query, and external request your app makes, then writes them out as YAML test cases plus mocks.

The key is eBPF. No SDK to install, no code to modify. Every language eventually calls kernel syscalls (socket, connect, bind, sendmsg), so hooking there makes it language-agnostic.

Three-Layer Architecture

Agent layer — eBPF hooks + transparent proxy. Attaches to the kernel to intercept traffic.

Service layer — record, replay, mock, coverage orchestration. Core logic for capture/playback/comparison.

Platform layer — YAML storage, per-language coverage collection, Docker/Telemetry.

How eBPF Captures Traffic

Pre-compiled eBPF objects (bpf_x86_bpfel.o, bpf_arm64_bpfel.o) live in pkg/agent/hooks/linux/. Loaded via the cilium/ebpf Go library. No runtime compilation needed.
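Because the objects are pre-compiled per architecture, the loader only has to pick the right file for the host. The sketch below shows that choice. bpfObjectFor is a hypothetical helper, not Keploy's code. The file names match the bpf2go naming convention (<stem>_<arch>_bpfel.o), so the real project may instead use bpf2go-generated loaders. The cilium/ebpf calls are shown only as comments because they need Linux and root or CAP_BPF.

```go
package main

import (
	"fmt"
	"runtime"
)

// bpfObjectFor maps a GOARCH value to one of the pre-compiled little-endian
// object files named in the text. Hypothetical helper: Keploy's own selection
// code (or bpf2go-generated loaders) may do this differently.
func bpfObjectFor(goarch string) (string, error) {
	switch goarch {
	case "amd64":
		return "bpf_x86_bpfel.o", nil
	case "arm64":
		return "bpf_arm64_bpfel.o", nil
	default:
		// Nothing is compiled at runtime, so an unknown arch is simply unsupported.
		return "", fmt.Errorf("no pre-compiled eBPF object for GOARCH=%s", goarch)
	}
}

func main() {
	obj, err := bpfObjectFor(runtime.GOARCH)
	if err != nil {
		fmt.Println(err)
		return
	}
	// The generic cilium/ebpf loading path would continue roughly like this
	// (needs Linux plus root or CAP_BPF, so it is left as a comment):
	//   spec, err := ebpf.LoadCollectionSpec(obj)
	//   coll, err := ebpf.NewCollection(spec)
	//   defer coll.Close()
	fmt.Println("would load", obj)
}
```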

Key hooks:

sys_enter_socket tracepoint — intercepts socket creation syscalls to track new connections.

SockOps — attached to the cgroup v2 path. The core hook: it monitors socket operations (connect, accept, etc.) for every process inside that cgroup.

connect4/connect6 — intercepts IPv4/IPv6 outbound connections.

cgBind4/cgBind6 — monitors bind() calls to detect which ports the app listens on.

BindEvents ring buffer — eBPF writes events to a ring buffer; Go userspace reads via ringbuf.NewReader.
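On the Go side, each ring-buffer record arrives as raw bytes (Record.RawSample in cilium/ebpf's ringbuf package) that must be decoded into a struct matching the C layout. Here is a minimal sketch, assuming a hypothetical event layout of PID (u32) plus port (u16) plus padding. Keploy's actual struct is not described in the text and may differ.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// BindEvent is a HYPOTHETICAL layout for one BindEvents record: the PID that
// called bind() and the port it bound. Keploy's real C struct may differ.
type BindEvent struct {
	PID  uint32
	Port uint16
	_    uint16 // padding so the Go layout matches an 8-byte C struct
}

// decodeBindEvent parses one raw ring-buffer sample. The objects are built
// for little-endian targets (bpfel), so the bytes are read as little-endian.
func decodeBindEvent(raw []byte) (BindEvent, error) {
	var ev BindEvent
	if err := binary.Read(bytes.NewReader(raw), binary.LittleEndian, &ev); err != nil {
		return BindEvent{}, fmt.Errorf("short or malformed bind event: %w", err)
	}
	return ev, nil
}

func main() {
	// In a real agent the bytes would come from cilium/ebpf, roughly:
	//   rd, _ := ringbuf.NewReader(bindEventsMap)
	//   rec, _ := rd.Read() // blocks until the kernel submits a sample
	//   ev, _ := decodeBindEvent(rec.RawSample)
	raw := []byte{0x39, 0x30, 0x00, 0x00, 0xB8, 0x0B, 0x00, 0x00} // pid 12345, port 3000
	ev, err := decodeBindEvent(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("pid %d is listening on port %d\n", ev.PID, ev.Port)
}
```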

The Core Trick: Destination Redirect

This is the clever part. Three shared eBPF maps are key:

clientRegistrationMap — registers the target app with eBPF.
agentRegistrationMap — registers the Keploy agent.
redirectProxyMap — maps source port → original destination (DestInfo struct: IPv4/IPv6 + port).

The eBPF program rewrites outgoing connection destinations to Keploy's local proxy (127.0.0.1:proxyPort). The real destination is stored in redirectProxyMap. The proxy calls GetDestinationInfo(srcPort) to find where traffic was originally headed.

From the app's perspective, it's connecting to the DB as usual. In reality, it's going through Keploy's proxy. The app code knows nothing.
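The redirect bookkeeping can be modeled in user space to make the data flow concrete. This sketch substitutes a mutex-guarded Go map for the kernel's redirectProxyMap. onConnect stands in for the connect4/connect6 hook and is hypothetical. GetDestinationInfo borrows the name from the text, but its signature and the DestInfo fields are assumptions. So is the proxy address. In Keploy the map lives in the kernel and the proxy reads it through cilium/ebpf.

```go
package main

import (
	"fmt"
	"net/netip"
	"sync"
)

const proxyAddr = "127.0.0.1:16789" // example proxy listen address (assumption)

// DestInfo holds the original destination (IPv4 or IPv6 address plus port).
// Field names are illustrative, not Keploy's actual struct.
type DestInfo struct {
	IP   netip.Addr
	Port uint16
}

// redirectTable is a user-space stand-in for redirectProxyMap:
// source port -> original destination.
type redirectTable struct {
	mu sync.Mutex
	m  map[uint16]DestInfo
}

func newRedirectTable() *redirectTable {
	return &redirectTable{m: make(map[uint16]DestInfo)}
}

// onConnect plays the connect hook's role: remember where the app wanted to
// go, then return the proxy address its socket will actually reach.
func (t *redirectTable) onConnect(srcPort uint16, dest string) (string, error) {
	ap, err := netip.ParseAddrPort(dest)
	if err != nil {
		return "", fmt.Errorf("bad destination %q: %w", dest, err)
	}
	t.mu.Lock()
	t.m[srcPort] = DestInfo{IP: ap.Addr(), Port: ap.Port()}
	t.mu.Unlock()
	return proxyAddr, nil
}

// GetDestinationInfo is what the proxy consults after accepting a connection:
// the peer's source port is the lookup key.
func (t *redirectTable) GetDestinationInfo(srcPort uint16) (DestInfo, bool) {
	t.mu.Lock()
	defer t.mu.Unlock()
	d, ok := t.m[srcPort]
	return d, ok
}

func main() {
	tbl := newRedirectTable()
	// The app believes it is dialing Postgres (assume "postgres" resolved to 10.0.0.5).
	actual, err := tbl.onConnect(54321, "10.0.0.5:5432")
	if err != nil {
		panic(err)
	}
	fmt.Println("app's socket really connects to", actual)

	// The proxy accepts that connection and looks up the peer's source port.
	if d, ok := tbl.GetDestinationInfo(54321); ok {
		fmt.Println("proxy forwards upstream to", netip.AddrPortFrom(d.IP, d.Port))
	}
}
```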

Transparent Proxy Protocol Parsing

The transparent proxy at pkg/agent/proxy/ handles redirected traffic:

Accepts eBPF-redirected connections
Looks up original destination from eBPF maps
Identifies protocol from first bytes (registered parsers call MatchType())
Routes to the appropriate protocol handler

Supported protocols: HTTP, HTTP2, gRPC, MySQL (full wire protocol — connection phase, queries, prepared statements), PostgreSQL, MongoDB, Redis, Kafka. Generic binary capture as fallback.
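First-bytes detection works because many protocols open with a recognizable prefix. The sketch below models only MatchType with a trimmed, hypothetical Parser interface and a generic fallback. The HTTP check looks for a method name. The PostgreSQL check relies on the documented startup format: an int32 length followed by either protocol version 3.0 (196608) or the SSLRequest code (80877103). Keploy's actual matchers are not shown in the text.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// Parser is a hypothetical, trimmed-down version of the parser abstraction;
// only MatchType is modeled here.
type Parser interface {
	Name() string
	MatchType(firstBytes []byte) bool
}

type httpParser struct{}

func (httpParser) Name() string { return "http" }

// MatchType reports whether the bytes start with a common HTTP/1.x method.
func (httpParser) MatchType(b []byte) bool {
	for _, m := range []string{"GET ", "POST ", "PUT ", "PATCH ", "DELETE ", "HEAD ", "OPTIONS "} {
		if bytes.HasPrefix(b, []byte(m)) {
			return true
		}
	}
	return false
}

type postgresParser struct{}

func (postgresParser) Name() string { return "postgres" }

// MatchType checks the second big-endian int32 of the client's first message:
// 196608 is protocol 3.0 (StartupMessage), 80877103 is SSLRequest.
func (postgresParser) MatchType(b []byte) bool {
	if len(b) < 8 {
		return false
	}
	code := binary.BigEndian.Uint32(b[4:8])
	return code == 196608 || code == 80877103
}

// pickParser returns the first registered parser that claims the bytes,
// or "generic" to mirror the raw binary-capture fallback.
func pickParser(parsers []Parser, first []byte) string {
	for _, p := range parsers {
		if p.MatchType(first) {
			return p.Name()
		}
	}
	return "generic"
}

func main() {
	parsers := []Parser{httpParser{}, postgresParser{}}
	sslRequest := []byte{0, 0, 0, 8, 0x04, 0xD2, 0x16, 0x2F}
	fmt.Println(pickParser(parsers, []byte("GET /users HTTP/1.1\r\n")))
	fmt.Println(pickParser(parsers, sslRequest))
	fmt.Println(pickParser(parsers, []byte{0xde, 0xad}))
}
```

Client-side sniffing like this cannot identify server-first protocols such as MySQL, where the server sends its handshake before the client writes anything. A real proxy presumably needs another signal there, such as the original destination port.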

TLS handled transparently via auto-generated CA certificates in pkg/agent/proxy/tls/.
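Transparent TLS interception generally works by having the proxy present a certificate for the requested host, signed on the fly by a CA the app trusts. The following sketch uses only Go's standard crypto/x509 package to show that generic technique. It is not the code in pkg/agent/proxy/tls/. newCA, leafFor and verifyLeaf are hypothetical helpers. A real proxy would typically issue leaves from tls.Config.GetCertificate based on the SNI name.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// newCA creates a self-signed CA certificate and its private key.
func newCA() (*x509.Certificate, *ecdsa.PrivateKey, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: "example interception CA"},
		NotBefore:             time.Now().Add(-time.Hour),
		NotAfter:              time.Now().Add(24 * time.Hour),
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign | x509.KeyUsageDigitalSignature,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, nil, err
	}
	cert, err := x509.ParseCertificate(der)
	return cert, key, err
}

// leafFor issues a short-lived server certificate for host, signed by the CA.
func leafFor(host string, ca *x509.Certificate, caKey *ecdsa.PrivateKey) (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	serial, err := rand.Int(rand.Reader, new(big.Int).Lsh(big.NewInt(1), 62))
	if err != nil {
		return nil, err
	}
	serial.Add(serial, big.NewInt(1)) // keep the serial strictly positive
	tmpl := &x509.Certificate{
		SerialNumber: serial,
		Subject:      pkix.Name{CommonName: host},
		DNSNames:     []string{host},
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(24 * time.Hour),
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, ca, &key.PublicKey, caKey)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

// verifyLeaf checks the leaf the way a client that trusts the CA would.
func verifyLeaf(leaf, ca *x509.Certificate, host string) error {
	pool := x509.NewCertPool()
	pool.AddCert(ca)
	_, err := leaf.Verify(x509.VerifyOptions{DNSName: host, Roots: pool})
	return err
}

func main() {
	ca, caKey, err := newCA()
	if err != nil {
		panic(err)
	}
	leaf, err := leafFor("api.example.com", ca, caKey)
	if err != nil {
		panic(err)
	}
	fmt.Println("leaf verifies against CA:", verifyLeaf(leaf, ca, "api.example.com") == nil)
}
```

This only works if the app trusts the generated CA, which is why such tools install or inject their CA certificate into the app's trust store.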

Record Flow

pkg/service/record/record.go orchestrates:

1. Instrumentation.Setup() — loads eBPF hooks, starts proxy
2. Instrumentation.Run() — launches the target app
3. Three concurrent channels open:
   GetIncoming() — inbound API requests/responses → TestCase objects
   GetOutgoing() — outbound calls (DB, external APIs) → Mock objects
   GetMappings() — links which mocks belong to which test case
4. Persisted as YAML. Test cases in testdb/, mocks in mockdb/.

Go's goroutines and channels are a perfect fit for this concurrent pipeline.
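The three-channel collection maps onto a small fan-in in Go. In this sketch, one goroutine drains each channel into its own slice and a sync.WaitGroup waits until all three channels are closed. TestCase, Mock, Mapping and collect are simplified, hypothetical types, not Keploy's models. YAML marshaling is left as a comment because it needs a third-party library.

```go
package main

import (
	"fmt"
	"sync"
)

// Hypothetical, trimmed-down record types.
type TestCase struct{ Name, Request, Response string }
type Mock struct{ Name, Kind string }
type Mapping struct {
	TestCase string
	Mocks    []string
}

type recording struct {
	tests    []TestCase
	mocks    []Mock
	mappings []Mapping
}

// collect drains the three channels concurrently, one goroutine per channel,
// and returns once every channel has been closed by its producer.
func collect(incoming <-chan TestCase, outgoing <-chan Mock, mappings <-chan Mapping) recording {
	var rec recording
	var wg sync.WaitGroup
	wg.Add(3)
	// Each goroutine writes a different field, so no mutex is needed;
	// wg.Wait() makes all writes visible before rec is returned.
	go func() {
		defer wg.Done()
		for tc := range incoming {
			rec.tests = append(rec.tests, tc)
		}
	}()
	go func() {
		defer wg.Done()
		for m := range outgoing {
			rec.mocks = append(rec.mocks, m)
		}
	}()
	go func() {
		defer wg.Done()
		for mp := range mappings {
			rec.mappings = append(rec.mappings, mp)
		}
	}()
	wg.Wait()
	return rec
}

func main() {
	in, out, maps := make(chan TestCase), make(chan Mock), make(chan Mapping)
	// Producers stand in for the proxy emitting captured traffic.
	go func() { defer close(in); in <- TestCase{Name: "test-1", Request: "GET /users/1", Response: "200"} }()
	go func() { defer close(out); out <- Mock{Name: "mock-1", Kind: "Postgres"} }()
	go func() { defer close(maps); maps <- Mapping{TestCase: "test-1", Mocks: []string{"mock-1"}} }()

	rec := collect(in, out, maps)
	// A real recorder would now write rec.tests and rec.mocks out as YAML.
	fmt.Println(len(rec.tests), len(rec.mocks), len(rec.mappings))
}
```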

Replay Flow

pkg/service/replay/ executes tests:

Loads test cases + mocks from YAML
Loads mocks into proxy's in-memory DB (MockMemDb)
Sends recorded HTTP/gRPC requests to the app
App's outbound calls hit proxy → serves recorded mock responses
Compares actual vs expected responses
Denoising — auto-identifies noise fields (timestamps, random IDs) to exclude from comparison
Time freezing — freezes system time during replay
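The text does not say how Keploy detects noise fields. One common heuristic is to capture the same request twice and treat any field whose value differs between the two captures as noise. The sketch below implements that heuristic in place of Keploy's method, and only for top-level JSON keys. detectNoise and equalIgnoringNoise are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// detectNoise returns the top-level JSON keys whose values differ between two
// responses to the same request; those keys are treated as noise.
func detectNoise(a, b []byte) (map[string]bool, error) {
	var ma, mb map[string]any
	if err := json.Unmarshal(a, &ma); err != nil {
		return nil, err
	}
	if err := json.Unmarshal(b, &mb); err != nil {
		return nil, err
	}
	noise := map[string]bool{}
	for k, va := range ma {
		if vb, ok := mb[k]; ok && !reflect.DeepEqual(va, vb) {
			noise[k] = true
		}
	}
	return noise, nil
}

// equalIgnoringNoise compares expected and actual bodies after dropping noisy keys.
func equalIgnoringNoise(expected, actual []byte, noise map[string]bool) (bool, error) {
	var me, ma map[string]any
	if err := json.Unmarshal(expected, &me); err != nil {
		return false, err
	}
	if err := json.Unmarshal(actual, &ma); err != nil {
		return false, err
	}
	for k := range noise {
		delete(me, k)
		delete(ma, k)
	}
	return reflect.DeepEqual(me, ma), nil
}

func main() {
	rec1 := []byte(`{"id":1,"name":"alice","created_at":"2024-01-01T00:00:00Z"}`)
	rec2 := []byte(`{"id":1,"name":"alice","created_at":"2024-01-01T00:00:05Z"}`)
	noise, err := detectNoise(rec1, rec2)
	if err != nil {
		panic(err)
	}
	replayed := []byte(`{"id":1,"name":"alice","created_at":"2025-06-01T12:00:00Z"}`)
	ok, err := equalIgnoringNoise(rec1, replayed, noise)
	if err != nil {
		panic(err)
	}
	fmt.Println("noise:", noise, "match:", ok)
}
```

The heuristic misses any dynamic field that happens to be identical in both captures, which is one way flaky tests slip through.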

Coverage Calculation

pkg/platform/coverage/ has per-language implementations: Go, Java, Python, JavaScript, C#. Each hooks into native coverage tools (Go cover, Istanbul, coverage.py, JaCoCo, etc.).

AI Integration (utgen)

pkg/service/utgen/ provides LLM-based test expansion. Reads existing recordings + Swagger/OpenAPI schemas to generate missing field tests, boundary value tests, and type error tests.
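utgen delegates generation to an LLM, but the categories it targets are easy to show deterministically. This sketch derives boundary-value, type-error and missing-field cases from a single integer property with minimum/maximum constraints. IntField, Case and generateCases are hypothetical simplifications of an OpenAPI schema. The sketch illustrates the kinds of tests, not utgen's approach.

```go
package main

import (
	"fmt"
	"math"
)

// IntField is a simplified view of an OpenAPI integer property.
type IntField struct {
	Name     string
	Min, Max int64
	Required bool
}

// Case is one generated request body and whether the schema should accept it.
type Case struct {
	Body  string
	Valid bool
}

// generateCases emits: both limits (valid), one step past each limit
// (invalid, skipped on int64 overflow), a type error, and a missing field.
func generateCases(f IntField) []Case {
	body := func(v int64) string { return fmt.Sprintf(`{%q: %d}`, f.Name, v) }
	cases := []Case{{body(f.Min), true}, {body(f.Max), true}}
	if f.Min > math.MinInt64 {
		cases = append(cases, Case{body(f.Min - 1), false})
	}
	if f.Max < math.MaxInt64 {
		cases = append(cases, Case{body(f.Max + 1), false})
	}
	cases = append(cases, Case{fmt.Sprintf(`{%q: "not-a-number"}`, f.Name), false})
	cases = append(cases, Case{`{}`, !f.Required})
	return cases
}

func main() {
	for _, c := range generateCases(IntField{Name: "age", Min: 18, Max: 120, Required: true}) {
		fmt.Printf("%-25s valid=%v\n", c.Body, c.Valid)
	}
}
```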

What Happens When You Wrap Rails

keploy record -c "bundle exec rails s" — everything Rails does over the network gets recorded.

Captured:

Incoming HTTP requests/responses (API endpoints) → become test cases
PostgreSQL/MySQL queries → become mocks. Every SQL statement ActiveRecord sends is captured at the wire protocol level
Redis calls (cache, Sidekiq) → mocks
External API calls (Faraday, Net::HTTP, etc.) → mocks
Kafka/RabbitMQ messages → mocks

Not captured:

SQLite — doesn't use network sockets. It's file I/O, so eBPF network hooks don't catch it
File reads/writes
In-memory operations
Local inter-process communication that doesn't go over TCP/UDP, such as pipes or shared memory (Unix domain sockets are only partially supported)

Key point: eBPF only catches network socket syscalls. Anything over TCP/UDP gets captured; everything else doesn't. If Rails uses PostgreSQL, every SELECT/INSERT/UPDATE from ActiveRecord gets captured via PostgreSQL wire protocol, and during replay, mocks respond without a real DB.
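Serving mocks without a real DB comes down to matching each incoming request against the recordings. This simplified sketch matches SQL text instead of wire-protocol messages, normalizes whitespace, and consumes mocks in recorded order so that repeated identical queries get their original answers in sequence. memMockDB and PGMock are hypothetical names, not Keploy's MockMemDb API. A real matcher would also compare bind parameters.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// PGMock is a hypothetical recorded exchange: the SQL the app sent and the
// rows the real database returned.
type PGMock struct {
	Query string
	Rows  [][]string
	used  bool
}

// memMockDB holds recorded mocks in memory for replay.
type memMockDB struct {
	mu    sync.Mutex
	mocks []*PGMock
}

// normalize collapses whitespace so cosmetic differences don't break matching.
func normalize(q string) string { return strings.Join(strings.Fields(q), " ") }

// Match returns the rows of the first unused mock whose query matches and
// marks it used, so identical queries are answered in recorded order.
func (db *memMockDB) Match(query string) ([][]string, bool) {
	db.mu.Lock()
	defer db.mu.Unlock()
	q := normalize(query)
	for _, m := range db.mocks {
		if !m.used && normalize(m.Query) == q {
			m.used = true
			return m.Rows, true
		}
	}
	return nil, false
}

func main() {
	db := &memMockDB{mocks: []*PGMock{
		{Query: "SELECT name FROM users WHERE id = $1", Rows: [][]string{{"alice"}}},
	}}
	rows, ok := db.Match("SELECT name\n  FROM users WHERE id = $1")
	fmt.Println(rows, ok)
}
```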

Linux Only — The Reality for macOS/Windows Developers

eBPF is a Linux kernel feature. Not available on macOS. Not on Windows either (Keploy has experimental Windows hooks but they're not production-ready).

On macOS, Docker is mandatory:

keploy record -c "docker compose up" --containerName "app"

eBPF runs inside Docker Desktop's Linux VM. This adds overhead if you're used to native local development. Running in CI/CD on Linux containers works fine.

eBPF Traffic Capture → Test Generation Flow

# 1. eBPF hooks attach to kernel
sys_enter_socket, SockOps(cgroup), connect4/6, cgBind4/6

# 2. Outbound connection destination redirected to proxy
App → connect(postgres:5432)
  ↓ eBPF overwrites destination to 127.0.0.1:proxyPort
  ↓ Original destination (postgres:5432) stored in redirectProxyMap
App → connect(127.0.0.1:proxyPort) [app is unaware]

# 3. Transparent proxy parses protocol
First bytes → MatchType() → PostgreSQL parser selected
Request/response pairs captured as Mock objects

# 4. Concurrent collection (3 goroutine channels)
GetIncoming() ──→ TestCase (API request/response)
GetOutgoing() ──→ Mock (DB/external API)
GetMappings() ──→ TestCase ↔ Mock linkage

# 5. YAML persistence
testdb/test-1.yaml + mockdb/mock-1.yaml

Source Code Structure

Directory	Role
pkg/agent/hooks/linux/	eBPF programs, maps, kernel hooks
pkg/agent/proxy/	Transparent proxy + DNS server
pkg/agent/proxy/integrations/	Protocol parsers (HTTP, MySQL, Postgres, Mongo, Redis, Kafka, gRPC)
pkg/service/record/	Recording orchestration
pkg/service/replay/	Test execution + response comparison
pkg/service/utgen/	AI-powered test generation
pkg/matcher/	Response comparison (HTTP, gRPC, schema)
pkg/platform/coverage/	Per-language coverage (Go/Java/Python/JS/C#)

eBPF Shared Maps — The Redirect Core

Map Name	Key	Value	Purpose
clientRegistrationMap	app PID	registration info	Identify target app
agentRegistrationMap	agent PID	proxy port	Identify Keploy agent
redirectProxyMap	source port	DestInfo (IP+port)	Store original destination

Design Points from a Go Perspective

Uses cilium/ebpf library to load/manage eBPF programs directly from Go. 3 goroutines collecting inbound/outbound/mappings concurrently via channels — textbook Go concurrency pattern. Protocol parsers abstracted via interfaces, so adding a new protocol only requires implementing RecordOutgoing() + MockOutgoing() + MatchType().
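The three-method contract can be sketched as a Go interface. The method names come from the text, but the signatures are assumptions, and Integration, Mock and respParser are hypothetical. The toy parser handles Redis RESP, where clients normally send commands as arrays that begin with '*'. It records an exchange against a fake upstream and then replays it.

```go
package main

import (
	"bytes"
	"fmt"
)

// Mock is one recorded request/response exchange.
type Mock struct{ Request, Response []byte }

// Integration is a HYPOTHETICAL shape for a protocol parser: the method names
// are from the text, the signatures are assumptions.
type Integration interface {
	// MatchType reports whether the first bytes belong to this protocol.
	MatchType(firstBytes []byte) bool
	// RecordOutgoing forwards req to the real destination and keeps the exchange.
	RecordOutgoing(req []byte, upstream func([]byte) []byte) Mock
	// MockOutgoing answers req from recordings instead of the real destination.
	MockOutgoing(req []byte, mocks []Mock) ([]byte, bool)
}

type respParser struct{}

// RESP client commands are normally arrays, which start with '*'.
func (respParser) MatchType(b []byte) bool { return len(b) > 0 && b[0] == '*' }

func (respParser) RecordOutgoing(req []byte, upstream func([]byte) []byte) Mock {
	resp := upstream(req)
	return Mock{Request: append([]byte(nil), req...), Response: resp}
}

func (respParser) MockOutgoing(req []byte, mocks []Mock) ([]byte, bool) {
	for _, m := range mocks {
		if bytes.Equal(m.Request, req) {
			return m.Response, true
		}
	}
	return nil, false
}

func main() {
	integrations := map[string]Integration{"redis": respParser{}}
	ping := []byte("*1\r\n$4\r\nPING\r\n")
	for name, p := range integrations {
		if !p.MatchType(ping) {
			continue
		}
		// Record mode: talk to the real server (faked here) and keep the exchange.
		m := p.RecordOutgoing(ping, func([]byte) []byte { return []byte("+PONG\r\n") })
		// Replay mode: answer the same request from the recording.
		resp, ok := p.MockOutgoing(ping, []Mock{m})
		fmt.Printf("%s: replayed %q (found=%v)\n", name, resp, ok)
	}
}
```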

Ruby to Go

eBPF intercepts kernel syscalls (socket/connect/bind) and redirects app outbound connections to Keploy proxy

Transparent proxy identifies protocol from first bytes → routes to HTTP/MySQL/Postgres/MongoDB/Redis/Kafka parsers

Inbound (TestCase) + outbound (Mock) collected via 3 goroutine channels concurrently → saved as YAML

During replay, mocks loaded into in-memory DB, recorded requests sent to app, denoising removes timestamps etc. before comparison

Pros

✓ Zero code changes — eBPF operates at kernel level, no SDK or library needed
✓ Language agnostic — works with any language that makes syscalls: Go, Java, Python, Node, Rust etc.
✓ Wide protocol coverage — HTTP/gRPC/MySQL/Postgres/MongoDB/Redis/Kafka + transparent TLS

Cons

✗ Linux only — Keploy's eBPF hooks require kernel 5.15+. macOS/Windows must run inside Docker, breaking native local dev workflows
✗ Embedded DBs like SQLite not captured — eBPF only catches network socket traffic, so only networked databases (PostgreSQL, MySQL, MongoDB, etc.) are recorded
✗ Recording quality depends on actual traffic — paths not covered by traffic have no tests. Error cases and edge cases must be written manually
✗ Denoising isn't perfect — if dynamic field detection fails on timestamps, random IDs, etc., tests become flaky
✗ Only generates integration tests — doesn't create unit tests for business logic. "Does this method return the right value" still needs manual testing
✗ DB schema changes invalidate recordings — adding/removing columns makes mock SQL responses diverge from reality. Re-recording required

Use Cases

When you need to quickly build regression tests from real traffic for an app with no existing tests
When you want to isolate external dependencies (DB, APIs, message queues) as mocks for offline testing