# Show HN: AgentLens -- Open-source observability for AI agents that traces decisions, not just API calls
**Repo:** https://gitea.repi.fun/repi/agentlens
**Live demo:** https://agentlens.vectry.tech
**PyPI:** https://pypi.org/project/vectry-agentlens/
---
I've been building AI agents for a while and kept running into the same problem: when an agent does something unexpected, the existing observability tools (LangSmith, Helicone, etc.) show me the LLM API calls that happened, but not *why* the agent chose a particular path.
Knowing that GPT-4 was called with a certain number of tokens and returned in 1.2s doesn't help me understand why my agent picked tool A over tool B, or why it decided to escalate instead of retry.
So I built AgentLens. It traces agent *decisions* -- tool selection, routing, planning, retries, escalation, memory retrieval -- and captures the reasoning, alternatives considered, and confidence scores at each decision point.
Quick setup:
```bash
pip install vectry-agentlens
```
```python
import agentlens
from agentlens.integrations.openai import wrap_openai
from openai import OpenAI

# Point the SDK at your AgentLens collector (here, a local instance)
agentlens.init(api_key="your-key", endpoint="http://localhost:4200")

# Wrapping the client auto-instruments every OpenAI call it makes
client = OpenAI()
wrap_openai(client)
```
The `wrap_openai` call auto-instruments your OpenAI client. From there you can log decisions with `agentlens.log_decision()` specifying the type (TOOL_SELECTION, ROUTING, PLANNING, RETRY, ESCALATION, MEMORY_RETRIEVAL, or CUSTOM), what was chosen, what the alternatives were, and why.
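To make that concrete, here's a self-contained sketch of the kind of record a `log_decision()` call captures. This is plain Python, not the actual SDK; the field names are assumptions inferred from the description above, so check the repo for the real signature.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the record agentlens.log_decision() captures.
# Field names are inferred from the post, not taken from the real SDK.
@dataclass
class DecisionRecord:
    decision_type: str       # e.g. "TOOL_SELECTION", "ROUTING", "RETRY", ...
    chosen: str              # what the agent actually did
    alternatives: list[str]  # options it considered but rejected
    reasoning: str           # why this option won
    confidence: float        # confidence score at the decision point

record = DecisionRecord(
    decision_type="TOOL_SELECTION",
    chosen="web_search",
    alternatives=["calculator", "code_interpreter"],
    reasoning="Query concerns a recent event outside the model's training data.",
    confidence=0.82,
)
print(record.decision_type, record.chosen)
```

The point is that each decision point stores the road not taken alongside the one that was, which is what the API-call-level tools above don't give you.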
The dashboard is a Next.js app that shows decision flows and timelines and lets you drill into individual agent runs. You can filter by decision type, search by outcome, and see where agents are spending their "thinking" time.
Stack: Python SDK + Next.js 15 dashboard + PostgreSQL + Redis. Self-hostable via Docker Compose. MIT licensed.
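For a rough picture, a self-hosted deployment might look something like the Compose sketch below. Service names, images, build paths, and the dashboard port are illustrative guesses (only the 4200 collector endpoint appears in the setup snippet above); the real `docker-compose.yml` lives in the repo.

```yaml
# Illustrative sketch only -- see the repo for the actual compose file.
services:
  postgres:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: agentlens   # placeholder credential
  redis:
    image: redis:7
  collector:                # hypothetical name for the trace-ingest service
    build: .
    ports:
      - "4200:4200"         # the endpoint passed to agentlens.init() above
    depends_on: [postgres, redis]
  dashboard:                # the Next.js app
    build: ./dashboard      # hypothetical path
    ports:
      - "3000:3000"
```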
Honest caveats: this is v0.1.0. I built it solo in about two weeks. The decision model works but the taxonomy is still evolving. There are rough edges. The SDK currently supports Python only. I haven't done serious load testing yet.
Would love feedback on the decision model and what decision types you'd want to see. If you're building agents and have opinions on what "observability" should actually mean for autonomous systems, I'd really like to hear them.