Keploy: How eBPF Intercepts Traffic to Auto-Generate API Tests
A CNCF project that creates tests without code changes using Go + eBPF + transparent proxy
Written in Go, with 17k+ stars. Run keploy record once and it captures the app's API calls, DB queries, and external requests, then writes them out as YAML-based test cases + mocks.
The key is eBPF. No SDK to install, no code to modify. Every language eventually calls kernel syscalls (socket, connect, bind, sendmsg), so hooking there makes it language-agnostic.
Three-Layer Architecture
Agent layer: eBPF hooks + transparent proxy. Attaches to the kernel to intercept traffic.
Service layer: record, replay, mock, coverage orchestration. Core logic for capture/playback/comparison.
Platform layer: YAML storage, per-language coverage collection, Docker/Telemetry.
How eBPF Captures Traffic
Pre-compiled eBPF objects (bpf_x86_bpfel.o, bpf_arm64_bpfel.o) live in pkg/agent/hooks/linux/. Loaded via the cilium/ebpf Go library. No runtime compilation needed.
Key hooks:
sys_enter_socket tracepoint: intercepts socket creation syscalls to track new connections.
SockOps (cgroup-attached): attached to the cgroupv2 path. The core hook that monitors all socket operations (connect, accept, etc.) within the cgroup.
connect4/connect6: intercepts IPv4/IPv6 outbound connections.
cgBind4/cgBind6: monitors bind() calls to detect which ports the app listens on.
BindEvents ring buffer: eBPF writes events to a ring buffer; Go userspace reads via ringbuf.NewReader.
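On the userspace side, each ring-buffer record arrives as raw bytes that have to be decoded into a Go struct. The sketch below shows only that decoding step, using the standard library. The bindEvent layout (PID, port, address family) is a hypothetical stand-in, not Keploy's actual event struct, and the sample bytes are hand-built. In a real reader they would come from the RawSample field of a record returned by the cilium/ebpf ringbuf reader.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// bindEvent is a HYPOTHETICAL record layout used for illustration only.
// The real layout is defined by the eBPF C source and may differ.
type bindEvent struct {
	PID    uint32 // process that called bind()
	Port   uint16 // port being bound (host byte order assumed here)
	Family uint16 // 2 = AF_INET, 10 = AF_INET6 on Linux
}

// decodeBindEvent turns one raw ring-buffer sample into a bindEvent.
// The "bpfel" objects target little-endian machines, so we decode as such.
func decodeBindEvent(raw []byte) (bindEvent, error) {
	var ev bindEvent
	err := binary.Read(bytes.NewReader(raw), binary.LittleEndian, &ev)
	return ev, err
}

func main() {
	// Hand-built sample: PID 1234, port 8080, AF_INET.
	raw := []byte{0xD2, 0x04, 0x00, 0x00, 0x90, 0x1F, 0x02, 0x00}
	ev, err := decodeBindEvent(raw)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Printf("pid=%d port=%d family=%d\n", ev.PID, ev.Port, ev.Family)
}
```

Keeping the Go struct field-for-field identical to the C struct (including padding) is what makes this kind of decoding safe, so a fixed-size struct with explicit integer widths is the usual choice.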
The Core Trick: Destination Redirect
This is the clever part. Three shared eBPF maps are key:
clientRegistrationMap: registers the target app with eBPF.
agentRegistrationMap: registers the Keploy agent.
redirectProxyMap: maps source port → original destination (DestInfo struct: IPv4/IPv6 + port).
The eBPF program rewrites outgoing connection destinations to Keploy's local proxy (127.0.0.1:proxyPort). The real destination is stored in redirectProxyMap. The proxy calls GetDestinationInfo(srcPort) to find where traffic was originally headed.
From the app's perspective, it's connecting to the DB as usual. In reality, it's going through Keploy's proxy. The app code knows nothing.
Transparent Proxy Protocol Parsing
The transparent proxy at pkg/agent/proxy/ handles redirected traffic:
- Accepts eBPF-redirected connections
- Looks up original destination from eBPF maps
- Identifies protocol from first bytes (registered parsers call MatchType())
- Routes to the appropriate protocol handler
Supported protocols: HTTP, HTTP2, gRPC, MySQL (full wire protocol: connection phase, queries, prepared statements), PostgreSQL, MongoDB, Redis, Kafka. Generic binary capture as fallback.
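First-bytes detection with a generic fallback can be sketched as a loop over registered parsers. Only the MatchType name comes from the text above. The parser interface shape, the Name method, the detect helper, and the three toy matchers are assumptions made for illustration, and they cover only protocols where the client speaks first.

```go
package main

import (
	"bytes"
	"fmt"
)

// parser is a cut-down, hypothetical interface. Keploy's real one is larger.
type parser interface {
	Name() string
	MatchType(first []byte) bool
}

type httpParser struct{}

func (httpParser) Name() string { return "http" }

// HTTP/1.x requests start with a method token followed by a space.
func (httpParser) MatchType(b []byte) bool {
	for _, m := range []string{"GET ", "POST ", "PUT ", "DELETE ", "PATCH ", "HEAD ", "OPTIONS "} {
		if bytes.HasPrefix(b, []byte(m)) {
			return true
		}
	}
	return false
}

type http2Parser struct{}

func (http2Parser) Name() string { return "http2" }

// HTTP/2 connections open with a fixed client preface.
func (http2Parser) MatchType(b []byte) bool {
	return bytes.HasPrefix(b, []byte("PRI * HTTP/2.0"))
}

type redisParser struct{}

func (redisParser) Name() string { return "redis" }

// RESP messages start with one of a small set of type bytes.
func (redisParser) MatchType(b []byte) bool {
	return len(b) > 0 && bytes.IndexByte([]byte("+-:$*"), b[0]) >= 0
}

// detect returns the first parser that claims the bytes, or "generic".
func detect(first []byte, parsers []parser) string {
	for _, p := range parsers {
		if p.MatchType(first) {
			return p.Name()
		}
	}
	return "generic"
}

func main() {
	parsers := []parser{httpParser{}, http2Parser{}, redisParser{}}
	samples := [][]byte{
		[]byte("GET /users HTTP/1.1\r\n"),
		[]byte("*1\r\n$4\r\nPING\r\n"),
		{0x00, 0x01, 0x02},
	}
	for _, s := range samples {
		fmt.Printf("%q -> %s\n", s, detect(s, parsers))
	}
}
```

Registration order matters in a scheme like this: a loose matcher placed early can shadow a stricter one, so the most specific checks should run first.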
TLS is handled transparently via auto-generated CA certificates in pkg/agent/proxy/tls/.
Record Flow
pkg/service/record/record.go orchestrates:
- Instrumentation.Setup(): loads eBPF hooks, starts proxy
- Instrumentation.Run(): launches the target app
- Three concurrent channels open:
  - GetIncoming(): inbound API requests/responses → TestCase objects
  - GetOutgoing(): outbound calls (DB, external APIs) → Mock objects
  - GetMappings(): links which mocks belong to which test case
- Persisted as YAML. Test cases in testdb/, mocks in mockdb/.
Go's goroutines and channels are a perfect fit for this concurrent pipeline.
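The three-channel pipeline can be sketched with one goroutine per stream. The TestCase, Mock, and Mapping types here are simplified placeholders with invented fields, and collect is a hypothetical helper. The real record service persists items as they arrive rather than gathering them in memory.

```go
package main

import (
	"fmt"
	"sync"
)

// Simplified placeholder types; the real ones carry far more detail.
type TestCase struct{ Name, Request string }
type Mock struct{ Name, Call string }
type Mapping struct {
	TestCase string
	Mocks    []string
}

type recording struct {
	Tests    []TestCase
	Mocks    []Mock
	Mappings []Mapping
}

// collect drains the three streams concurrently, one goroutine each,
// and returns once all three channels have been closed.
func collect(in <-chan TestCase, out <-chan Mock, maps <-chan Mapping) recording {
	var rec recording
	var wg sync.WaitGroup
	wg.Add(3)
	go func() {
		defer wg.Done()
		for tc := range in {
			rec.Tests = append(rec.Tests, tc)
		}
	}()
	go func() {
		defer wg.Done()
		for m := range out {
			rec.Mocks = append(rec.Mocks, m)
		}
	}()
	go func() {
		defer wg.Done()
		for mp := range maps {
			rec.Mappings = append(rec.Mappings, mp)
		}
	}()
	wg.Wait() // each goroutine writes a different field, so no lock is needed
	return rec
}

func main() {
	in := make(chan TestCase, 1)
	out := make(chan Mock, 2)
	maps := make(chan Mapping, 1)

	in <- TestCase{Name: "test-1", Request: "GET /users/7"}
	out <- Mock{Name: "mock-1", Call: "SELECT * FROM users WHERE id = 7"}
	out <- Mock{Name: "mock-2", Call: "GET session:7"}
	maps <- Mapping{TestCase: "test-1", Mocks: []string{"mock-1", "mock-2"}}
	close(in)
	close(out)
	close(maps)

	rec := collect(in, out, maps)
	fmt.Printf("tests=%d mocks=%d mappings=%d\n", len(rec.Tests), len(rec.Mocks), len(rec.Mappings))
}
```

Separate goroutines keep a burst of database traffic on the outgoing stream from stalling the capture of inbound requests, which is the practical reason for not reading the three channels in a single sequential loop.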
Replay Flow
pkg/service/replay/ executes tests:
- Loads test cases + mocks from YAML
- Loads mocks into proxy's in-memory DB (MockMemDb)
- Sends recorded HTTP/gRPC requests to the app
- The app's outbound calls hit the proxy, which serves the recorded mock responses
- Compares actual vs expected responses
- Denoising: auto-identifies noise fields (timestamps, random IDs) to exclude from comparison
- Time freezing: freezes system time during replay
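The comparison step with denoising can be illustrated on JSON bodies. The text does not specify how noise fields are detected, so this sketch assumes one simple heuristic: capture the same response twice and treat any top-level field whose value differs as noise. All function names are hypothetical, and nested fields are ignored for brevity.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// mustParse decodes a JSON object or panics; fine for a demo.
func mustParse(s string) map[string]any {
	var m map[string]any
	if err := json.Unmarshal([]byte(s), &m); err != nil {
		panic(err)
	}
	return m
}

// detectNoise marks top-level keys whose values differ between two
// captures of the same request (ASSUMED heuristic, not Keploy's).
func detectNoise(a, b map[string]any) map[string]bool {
	noise := map[string]bool{}
	for k, v := range a {
		if w, ok := b[k]; ok && !reflect.DeepEqual(v, w) {
			noise[k] = true
		}
	}
	return noise
}

// strip returns a copy of m without the noise keys.
func strip(m map[string]any, noise map[string]bool) map[string]any {
	out := make(map[string]any, len(m))
	for k, v := range m {
		if !noise[k] {
			out[k] = v
		}
	}
	return out
}

// equalIgnoringNoise compares two bodies after dropping noise keys.
func equalIgnoringNoise(expected, actual map[string]any, noise map[string]bool) bool {
	return reflect.DeepEqual(strip(expected, noise), strip(actual, noise))
}

func main() {
	first := mustParse(`{"id":7,"name":"ada","ts":"2024-01-01T00:00:00Z"}`)
	second := mustParse(`{"id":7,"name":"ada","ts":"2024-01-01T00:00:05Z"}`)
	noise := detectNoise(first, second)

	replayed := mustParse(`{"id":7,"name":"ada","ts":"2030-06-01T12:00:00Z"}`)
	fmt.Println("noise fields:", noise)
	fmt.Println("replay matches:", equalIgnoringNoise(first, replayed, noise))
}
```

A field that happens to hold the same value in both captures, such as a timestamp with coarse resolution, would slip past this heuristic, which is one way replayed tests end up flaky.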
Coverage Calculation
pkg/platform/coverage/ has per-language implementations: Go, Java, Python, JavaScript, C#. Each hooks into native coverage tools (Go cover, Istanbul, coverage.py, JaCoCo, etc.).
AI Integration (utgen)
pkg/service/utgen/ provides LLM-based test expansion. Reads existing recordings + Swagger/OpenAPI schemas to generate missing field tests, boundary value tests, and type error tests.
What Happens When You Wrap Rails
keploy record -c "bundle exec rails s": everything Rails does over the network gets recorded.
Captured:
Incoming HTTP requests/responses (API endpoints) → become test cases
PostgreSQL/MySQL queries → become mocks. Every SQL statement that ActiveRecord sends is captured at the wire protocol level
Redis calls (cache, Sidekiq) → mocks
External API calls (Faraday, Net::HTTP, etc.) → mocks
Kafka/RabbitMQ messages → mocks
Not captured:
SQLite: doesn't use network sockets. It's file I/O, so eBPF network hooks don't catch it
File reads/writes
In-memory operations
Inter-process communication on localhost (Unix domain sockets partially supported)
Key point: Keploy's eBPF hooks only watch network socket syscalls. Anything over TCP/UDP gets captured; everything else doesn't. If Rails uses PostgreSQL, every SELECT/INSERT/UPDATE from ActiveRecord gets captured via the PostgreSQL wire protocol, and during replay, mocks respond without a real DB.
Linux Only: The Reality for macOS/Windows Developers
eBPF is a Linux kernel feature. Not available on macOS. Not on Windows either (Keploy has experimental Windows hooks but they're not production-ready).
On macOS, Docker is mandatory:
keploy record -c "docker compose up" --containerName "app"
eBPF runs inside Docker Desktop's Linux VM. This adds overhead if you're used to native local development. Running in CI/CD on Linux containers works fine.
eBPF Traffic Capture → Test Generation Flow
Source Code Structure
| Directory | Role |
|---|---|
| pkg/agent/hooks/linux/ | eBPF programs, maps, kernel hooks |
| pkg/agent/proxy/ | Transparent proxy + DNS server |
| pkg/agent/proxy/integrations/ | Protocol parsers (HTTP, MySQL, Postgres, Mongo, Redis, Kafka, gRPC) |
| pkg/service/record/ | Recording orchestration |
| pkg/service/replay/ | Test execution + response comparison |
| pkg/service/utgen/ | AI-powered test generation |
| pkg/matcher/ | Response comparison (HTTP, gRPC, schema) |
| pkg/platform/coverage/ | Per-language coverage (Go/Java/Python/JS/C#) |
eBPF Shared Maps: The Redirect Core
| Map Name | Key | Value | Purpose |
|---|---|---|---|
| clientRegistrationMap | app PID | registration info | Identify target app |
| agentRegistrationMap | agent PID | proxy port | Identify Keploy agent |
| redirectProxyMap | source port | DestInfo (IP+port) | Store original destination |
Design Points from a Go Perspective
Uses the cilium/ebpf library to load and manage eBPF programs directly from Go. Three goroutines collect inbound, outbound, and mapping data concurrently via channels, a textbook Go concurrency pattern. Protocol parsers are abstracted behind interfaces, so adding a new protocol only requires implementing RecordOutgoing() + MockOutgoing() + MatchType().
Ruby to Go
eBPF intercepts kernel syscalls (socket/connect/bind) and redirects the app's outbound connections to the Keploy proxy
Transparent proxy identifies protocol from first bytes → routes to HTTP/MySQL/Postgres/MongoDB/Redis/Kafka parsers
Inbound (TestCase) + outbound (Mock) collected via 3 goroutine channels concurrently → saved as YAML
During replay, mocks are loaded into the in-memory DB, recorded requests are sent to the app, and denoising removes timestamps etc. before comparison
Pros
- ✓ Zero code changes: eBPF operates at kernel level, no SDK or library needed
- ✓ Language agnostic: works with any language that makes syscalls: Go, Java, Python, Node, Rust etc.
- ✓ Wide protocol coverage: HTTP/gRPC/MySQL/Postgres/MongoDB/Redis/Kafka + transparent TLS
Cons
- ✗ Linux only: Keploy's eBPF hooks require kernel 5.15+. macOS/Windows must run inside Docker, breaking native local dev workflows
- ✗ Embedded DBs like SQLite not captured: the hooks only see network socket traffic, so only databases reached over the network (PostgreSQL, MySQL, MongoDB, Redis) are recorded
- ✗ Recording quality depends on actual traffic: paths not covered by traffic have no tests. Error cases and edge cases must be written manually
- ✗ Denoising isn't perfect: if dynamic field detection fails on timestamps, random IDs, etc., tests become flaky
- ✗ Only generates integration tests: doesn't create unit tests for business logic. "Does this method return the right value" still needs manual testing
- ✗ DB schema changes invalidate recordings: adding/removing columns makes mock SQL responses diverge from reality. Re-recording required