The system that turns "how many ergonomic chairs are in the West-region warehouse" into a BigQuery SQL plan, executes it, and streams a grounded answer back. Click any node to see what it does, how it fails, and where the latency lives.
Type a question, such as "Go production gateway" or "big data spark".
Every project description, role, and skill on this site is indexed below with TF-IDF over normalised tokens, scored by cosine similarity at keystroke time. Honest about what it is — no LLM in the loop, no API call.
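As a rough illustration of that scoring, here is a minimal sketch in Go (chosen to match the gateway code discussed on this site, not necessarily the language the search box runs in). The `Index` type, the tokenizer rules, and the smoothed IDF formula are all assumptions made for the example; the site's own weighting may differ.

```go
package main

import (
	"math"
	"sort"
	"strings"
	"unicode"
)

// tokenize lowercases the text and splits on anything that is not a letter or digit.
func tokenize(s string) []string {
	return strings.FieldsFunc(strings.ToLower(s), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// Index holds the IDF table and one L2-normalised TF-IDF vector per document.
type Index struct {
	idf  map[string]float64
	docs []map[string]float64
}

// NewIndex builds the index once, up front, so a query only costs dot products.
func NewIndex(texts []string) *Index {
	tokenized := make([][]string, len(texts))
	df := make(map[string]int) // number of documents containing each term
	for i, text := range texts {
		tokenized[i] = tokenize(text)
		seen := make(map[string]bool)
		for _, t := range tokenized[i] {
			if !seen[t] {
				seen[t] = true
				df[t]++
			}
		}
	}
	ix := &Index{idf: make(map[string]float64, len(df))}
	n := float64(len(texts))
	for t, d := range df {
		// Smoothed IDF (an assumption): terms present everywhere keep a positive weight.
		ix.idf[t] = math.Log((1+n)/(1+float64(d))) + 1
	}
	for _, toks := range tokenized {
		ix.docs = append(ix.docs, ix.vectorize(toks))
	}
	return ix
}

// vectorize returns an L2-normalised TF-IDF vector; terms unknown to the index are dropped.
func (ix *Index) vectorize(tokens []string) map[string]float64 {
	vec := make(map[string]float64)
	for _, t := range tokens {
		if w, ok := ix.idf[t]; ok {
			vec[t] += w // adding idf once per occurrence yields tf * idf
		}
	}
	var norm float64
	for _, v := range vec {
		norm += v * v
	}
	if norm == 0 {
		return vec
	}
	norm = math.Sqrt(norm)
	for t := range vec {
		vec[t] /= norm
	}
	return vec
}

// Search returns the indices of documents with a non-zero cosine score, best first.
func (ix *Index) Search(query string) []int {
	q := ix.vectorize(tokenize(query))
	scores := make([]float64, len(ix.docs))
	var hits []int
	for i, d := range ix.docs {
		// Both vectors are unit length, so the dot product is the cosine similarity.
		for t, w := range q {
			scores[i] += w * d[t]
		}
		if scores[i] > 0 {
			hits = append(hits, i)
		}
	}
	sort.SliceStable(hits, func(a, b int) bool { return scores[hits[a]] > scores[hits[b]] })
	return hits
}
```

Because every vector is normalised when the index is built, each keystroke only needs one sparse dot product per document, which is cheap enough to run on every input event for a corpus the size of a portfolio.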
Every team that puts an LLM in front of users eventually rebuilds the same five things in some form: auth, rate limiting, cost tracking, model failover, and structured logging. The fifth one is the only one anyone enjoys.
go-llm-gateway is the version of those five things I keep coming back to. It is intentionally small (under 800 lines of Go, with no dependencies outside the standard library) because the alternative is to ship a 40MB binary that re-implements bad versions of http.ServeMux. The whole point is that an LLM gateway is boring HTTP with a few decisions that have to be right.
Every request gets a single struct shape. The gateway accepts OpenAI-compatible POST /v1/chat/completions and rewrites model internally — so a downstream switch from gpt-4o to gemini-1.5-pro is a config change, not a client change.
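A minimal sketch of that rewrite step, assuming a trimmed-down request struct and a config-driven routing table. `ChatRequest`, `RewriteModel`, and the `routes` map are hypothetical names for illustration, not the repository's actual types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Message and ChatRequest mirror only the subset of the OpenAI-compatible
// chat-completions body that this sketch needs.
type Message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type ChatRequest struct {
	Model    string    `json:"model"`
	Messages []Message `json:"messages"`
}

// RewriteModel decodes the client body, swaps the model name according to the
// routing table, and re-encodes it. Fields not declared on ChatRequest are
// dropped here; a real gateway would need to carry them through.
func RewriteModel(body []byte, routes map[string]string) ([]byte, error) {
	var req ChatRequest
	if err := json.Unmarshal(body, &req); err != nil {
		return nil, fmt.Errorf("decode request: %w", err)
	}
	if target, ok := routes[req.Model]; ok {
		req.Model = target // the client keeps sending the old name
	}
	return json.Marshal(req)
}
```

With a table such as `map[string]string{"gpt-4o": "gemini-1.5-pro"}` loaded from config, clients keep sending `gpt-4o` and never learn that the upstream changed. Translating the payload into a provider's native format is a separate step and is out of scope for this sketch.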
A sync.Mutex-protected map of client IP → *bucket is enough for a single instance handling thousands of RPS. (net.IP is a byte slice and cannot be a Go map key, so key on its string form or on netip.Addr.) Each bucket refills at a configured rate and every request spends one token; when the bucket empties, requests return 429. No Redis dependency for v1 and, crucially, no distributed rate limit lying to you about its accuracy.
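One way this limiter could look, as a sketch under stated assumptions: it is a token bucket keyed by the client IP as a string, refilled lazily on each call. `Limiter`, `Allow`, and `Middleware` are illustrative names, and the sketch never evicts idle buckets, which a long-running instance would need to do.

```go
package main

import (
	"net"
	"net/http"
	"sync"
	"time"
)

type bucket struct {
	tokens float64
	last   time.Time // last time this bucket was refilled
}

// Limiter is a per-client token bucket guarded by a single mutex.
type Limiter struct {
	mu      sync.Mutex
	rate    float64            // tokens added per second
	burst   float64            // bucket capacity
	buckets map[string]*bucket // keyed by client IP in string form
}

func NewLimiter(rate, burst float64) *Limiter {
	return &Limiter{rate: rate, burst: burst, buckets: make(map[string]*bucket)}
}

// Allow spends one token for the client and reports whether the request may proceed.
func (l *Limiter) Allow(ip string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	now := time.Now()
	b, ok := l.buckets[ip]
	if !ok {
		b = &bucket{tokens: l.burst, last: now}
		l.buckets[ip] = b
	}
	// Refill for the time elapsed since the last call, capped at the burst size.
	b.tokens += now.Sub(b.last).Seconds() * l.rate
	if b.tokens > l.burst {
		b.tokens = l.burst
	}
	b.last = now
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}

// Middleware rejects over-limit clients with 429 before they reach next.
func (l *Limiter) Middleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		ip, _, err := net.SplitHostPort(r.RemoteAddr)
		if err != nil {
			ip = r.RemoteAddr // fall back to the raw address
		}
		if !l.Allow(ip) {
			http.Error(w, "rate limit exceeded", http.StatusTooManyRequests)
			return
		}
		next.ServeHTTP(w, r)
	})
}
```

The critical section is a map lookup and a few float operations, which is why a single mutex holds up on one instance. Behind a proxy, `r.RemoteAddr` is the proxy's address, so the key would have to come from a trusted forwarding header instead.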
Two atomic.Int64 counters per model — one for prompt tokens, one for completion tokens. Per-model cost is config; the gateway exposes GET /metrics with the running total in USD. No locks on the hot path. A background goroutine flushes to a logger every 10s.
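A sketch of the counter layout, assuming Go 1.19 or later for `atomic.Int64`. `CostTracker`, `Price`, and the per-million-token pricing unit are assumptions for illustration; the map is populated once at startup and never written to afterwards, which is what keeps the hot path free of locks.

```go
package main

import "sync/atomic"

// Price is USD per one million tokens, loaded from config.
type Price struct {
	PromptPerM     float64
	CompletionPerM float64
}

// modelUsage holds the two counters kept for each model.
type modelUsage struct {
	prompt     atomic.Int64
	completion atomic.Int64
}

// CostTracker maps model names to counters. The maps are read-only after
// NewCostTracker returns, so concurrent readers need no lock.
type CostTracker struct {
	prices map[string]Price
	usage  map[string]*modelUsage
}

func NewCostTracker(prices map[string]Price) *CostTracker {
	ct := &CostTracker{prices: prices, usage: make(map[string]*modelUsage, len(prices))}
	for model := range prices {
		ct.usage[model] = &modelUsage{}
	}
	return ct
}

// Record adds token counts for one completed request.
// Models missing from the price table are ignored in this sketch.
func (ct *CostTracker) Record(model string, prompt, completion int64) {
	if u, ok := ct.usage[model]; ok {
		u.prompt.Add(prompt)
		u.completion.Add(completion)
	}
}

// TotalUSD computes the running total across all models from the counters.
func (ct *CostTracker) TotalUSD() float64 {
	var total float64
	for model, u := range ct.usage {
		p := ct.prices[model]
		total += float64(u.prompt.Load()) / 1e6 * p.PromptPerM
		total += float64(u.completion.Load()) / 1e6 * p.CompletionPerM
	}
	return total
}
```

Storing raw token counts and converting to USD only when the total is read keeps the hot path to two atomic adds. A metrics handler or the periodic flush goroutine can then call `TotalUSD` at whatever interval suits it.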
A request lists models in priority order: [gemini-1.5-pro, gpt-4o, claude-3-sonnet]. On any 5xx or context-deadline error from the primary, the gateway retries against the next model with the same payload, decrementing a budget. Once the budget hits zero, it returns the last upstream error verbatim. Failures should be loud, not hidden.
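The failover loop might be sketched as follows. `CallFunc`, `UpstreamError`, and `fakeUpstream` are hypothetical stand-ins for the provider clients, and the sketch assumes the budget counts retries (the first attempt is free) and that each call applies its own per-attempt timeout.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// UpstreamError carries the HTTP status and body returned by a provider.
type UpstreamError struct {
	Status int
	Body   string
}

func (e *UpstreamError) Error() string {
	return fmt.Sprintf("upstream status %d: %s", e.Status, e.Body)
}

// CallFunc sends the payload to one model and returns the raw response.
type CallFunc func(ctx context.Context, model string, payload []byte) ([]byte, error)

// retryable reports whether err is a 5xx response or a context deadline.
func retryable(err error) bool {
	if errors.Is(err, context.DeadlineExceeded) {
		return true
	}
	var ue *UpstreamError
	return errors.As(err, &ue) && ue.Status >= 500
}

// Failover tries models in priority order with the same payload. Each move to
// the next model spends one unit of budget; when the budget is gone, or the
// error is not retryable, the last upstream error is returned unchanged.
func Failover(ctx context.Context, models []string, budget int, payload []byte, call CallFunc) ([]byte, error) {
	lastErr := errors.New("no models configured")
	for _, model := range models {
		resp, err := call(ctx, model, payload)
		if err == nil {
			return resp, nil
		}
		lastErr = err
		if !retryable(err) || budget <= 0 {
			break
		}
		budget-- // spend one retry to move on to the next model
	}
	return nil, lastErr
}

// fakeUpstream is a test double: listed models fail with the given error,
// all others echo their own name back as the response.
func fakeUpstream(failures map[string]error) CallFunc {
	return func(_ context.Context, model string, _ []byte) ([]byte, error) {
		if err, ok := failures[model]; ok {
			return nil, err
		}
		return []byte(model), nil
	}
}
```

A 4xx from the primary is not retried here, on the assumption that a payload one provider rejects as malformed will be rejected by the next one too. Returning `lastErr` untouched keeps the provider's own status and message visible to the caller.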
Every request emits one JSON line at the end of its lifecycle — request_id, model, route, latency, prompt_tokens, completion_tokens, cost_usd, status. That's the contract any observability tool can read. No log levels, no debug spew. One line in, one line out.
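A sketch of what that one line could look like in code. The JSON keys follow the list above; the Go field names, the millisecond unit for latency, and the `Emit` helper are assumptions for illustration.

```go
package main

import (
	"encoding/json"
	"io"
)

// LogLine is the single record written at the end of each request.
type LogLine struct {
	RequestID        string  `json:"request_id"`
	Model            string  `json:"model"`
	Route            string  `json:"route"`
	LatencyMS        int64   `json:"latency"` // unit assumed to be milliseconds
	PromptTokens     int64   `json:"prompt_tokens"`
	CompletionTokens int64   `json:"completion_tokens"`
	CostUSD          float64 `json:"cost_usd"`
	Status           int     `json:"status"`
}

// Marshal encodes the record as one newline-terminated JSON object.
func (l LogLine) Marshal() ([]byte, error) {
	b, err := json.Marshal(l)
	if err != nil {
		return nil, err
	}
	return append(b, '\n'), nil
}

// Emit writes the record with a single Write call so that lines from
// concurrent requests are not interleaved on writers that serialise writes.
func Emit(w io.Writer, l LogLine) error {
	b, err := l.Marshal()
	if err != nil {
		return err
	}
	_, err = w.Write(b)
	return err
}
```

Calling `Emit(os.Stdout, line)` from a deferred function in the request handler is one way to make sure the line is written on every exit path, including errors.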
(tenant_id, model) → counter.
The gateway has been the load-bearing piece in three different projects I've shipped. The code is public: github.com/shubhambakre/go-llm-gateway. If you find a sharp edge, file an issue or, better, send the patch.
A fully functional, retro synthwave 8-bit platformer game where you play as an Engineering Wizard, dodging bugs, collecting skillset items, and deploying to production before the sprint deadline. Built entirely on HTML5 Canvas using Web Audio API for custom audio synthesis.