
Building a Go SDK for Langfuse


Langfuse is an open-source observability platform for LLM applications. It captures traces, scores model outputs, and helps you understand how your AI features behave in production. The official SDKs cover Python and TypeScript, but there wasn’t a Go option. I built langfuse-go to fill that gap.

Why Go needs this

Go is increasingly used for AI-adjacent infrastructure: API gateways that route to model providers, orchestration services that chain LLM calls, RAG pipelines that retrieve and augment context. These services need observability, and manually calling the Langfuse REST API from Go is tedious and error-prone. A proper SDK with type safety, automatic batching, and idiomatic patterns makes the integration straightforward.

Hierarchical tracing

Langfuse organizes observability data as a hierarchy: traces contain observations (spans), which contain generations (individual LLM calls). The SDK models this with a composition-based client architecture:

client := langfuse.NewClient(
    langfuse.WithPublicKey(os.Getenv("LANGFUSE_PUBLIC_KEY")),
    langfuse.WithSecretKey(os.Getenv("LANGFUSE_SECRET_KEY")),
)
defer client.Flush()

trace := client.Trace(langfuse.WithTraceName("chat-completion"))

span := trace.Span(langfuse.WithSpanName("retrieve-context"))
// ... do retrieval work
span.End()

generation := trace.Generation(
    langfuse.WithGenerationName("gpt-4-call"),
    langfuse.WithGenerationModel("gpt-4"),
    langfuse.WithGenerationInput(prompt),
    langfuse.WithGenerationOutput(response),
)
generation.End()

Each trace, span, and generation is linked by parent IDs. The SDK handles the ID management so you don’t need to manually thread trace IDs through your call stack.
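The idea behind that automatic linking is simple. Here is a stripped-down sketch of how parent IDs can be threaded through a hierarchy without the caller ever touching them — this illustrates the concept only, and the type and method names here are illustrative, not the SDK's internals:

```go
package main

import "fmt"

// Trace is the root of the hierarchy; Span carries the IDs that
// link it back to its trace and its immediate parent.
type Trace struct{ ID string }

type Span struct {
	ID       string
	TraceID  string
	ParentID string
}

var nextID int

// newID stands in for the UUIDs a real SDK would generate.
func newID() string {
	nextID++
	return fmt.Sprintf("id-%d", nextID)
}

func NewTrace() *Trace { return &Trace{ID: newID()} }

// Span creates a child observation directly under the trace.
func (t *Trace) Span() *Span {
	return &Span{ID: newID(), TraceID: t.ID, ParentID: t.ID}
}

// Span nested under another span inherits the trace ID and
// points at its parent span.
func (s *Span) Span() *Span {
	return &Span{ID: newID(), TraceID: s.TraceID, ParentID: s.ID}
}

func main() {
	trace := NewTrace()
	retrieval := trace.Span()
	rerank := retrieval.Span()
	// Both IDs resolve without the caller passing anything around.
	fmt.Println(rerank.TraceID == trace.ID, rerank.ParentID == retrieval.ID)
}
```

Because each constructor copies the trace ID forward, arbitrarily deep nesting works without threading IDs through function signatures.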

The batch processor

The most interesting engineering challenge was the batch processor. You don’t want to make an HTTP call to Langfuse for every single event. That would kill latency in your hot path. Instead, the SDK queues events in a channel and flushes them in batches.

The flushing uses a dual-trigger strategy:

  1. Size trigger - flush when the batch reaches a configurable number of events
  2. Time trigger - flush after a configurable interval, even if the batch isn’t full

This means high-throughput services get efficient large batches, while low-throughput services don’t have events sitting in memory for minutes. The processor runs in its own goroutine and is safe for concurrent use.

client := langfuse.NewClient(
    langfuse.WithBatchSize(100),
    langfuse.WithFlushInterval(5 * time.Second),
)

The Flush() call on shutdown drains the queue synchronously, so you don’t lose events when your process exits.

API design with functional options

Go doesn’t have optional parameters or method overloading, which makes API design for SDKs tricky. I used the functional options pattern throughout:

trace := client.Trace(
    langfuse.WithTraceName("my-trace"),
    langfuse.WithTraceUserId("user-123"),
    langfuse.WithTraceMetadata(map[string]any{"env": "prod"}),
)

This keeps the API clean. You only specify what you need, and new options can be added without breaking existing code. Each With* function returns an option that modifies the configuration internally.
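The pattern itself is only a handful of lines. A generic sketch — the names here are illustrative, not the SDK’s actual types:

```go
package main

import "fmt"

type traceConfig struct {
	name   string
	userID string
}

// TraceOption mutates the config; each With* constructor
// below returns one.
type TraceOption func(*traceConfig)

func WithName(name string) TraceOption {
	return func(c *traceConfig) { c.name = name }
}

func WithUserID(id string) TraceOption {
	return func(c *traceConfig) { c.userID = id }
}

// NewTrace applies defaults first, then each option in order,
// so callers only specify what they need.
func NewTrace(opts ...TraceOption) traceConfig {
	cfg := traceConfig{name: "unnamed"} // defaults
	for _, opt := range opts {
		opt(&cfg)
	}
	return cfg
}

func main() {
	cfg := NewTrace(WithName("chat-completion"), WithUserID("user-123"))
	fmt.Println(cfg.name, cfg.userID) // chat-completion user-123
}
```

Because options are just functions over an unexported config struct, adding a new With* option later is backward-compatible: existing call sites compile unchanged.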

Scoring and evaluation

Beyond tracing, the SDK supports Langfuse’s scoring system. You can attach numeric or categorical scores to traces and generations:

client.Score(
    langfuse.WithScoreTraceId(trace.TraceId()),
    langfuse.WithScoreName("relevance"),
    langfuse.WithScoreValue(0.92),
)

This is useful for automated evaluation pipelines. Run your LLM output through a scoring model and push the results to Langfuse for analysis. The platform then lets you slice and filter by score to find quality regressions.

Keeping dependencies minimal

The SDK has a single external dependency: go-resty for HTTP. Everything else uses the standard library. This was a deliberate choice. SDK dependencies become your users’ dependencies, and Go developers are rightly opinionated about keeping their dependency trees small.

What’s included

Beyond the core tracing and scoring, the SDK covers:

  • Dataset management - create and manage evaluation datasets
  • Annotation queues - support for human-in-the-loop review workflows
  • Media handling - attach files and media to traces
  • Prompt management - fetch and cache prompts from Langfuse

The full source and documentation are on GitHub.