
how to actually build agents that work

my first post ever. let's see how this goes.

everyone's talking about agents.

but how the hell do you actually build a good one?

here's what i've learned building them in production:

pick the right model

claude opus 4.5 is my go-to for complex work. strong reasoning, great at multi-step tasks.

for speed-sensitive stuff or when you're cost-conscious, glm-4.7 on cerebras is insane - we're talking 1000+ tokens/sec. basically instant responses.

match the model to the task. don't use a sledgehammer to hang a picture frame.
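a minimal sketch of what "match the model to the task" can look like in code. the model ids, the `Task` fields, and `pick_model` are all made up for illustration; swap in whatever your provider actually exposes.

```python
from dataclasses import dataclass

# hypothetical model ids, not official API names
COMPLEX_MODEL = "claude-opus-4.5"
FAST_MODEL = "glm-4.7"


@dataclass
class Task:
    prompt: str
    needs_multistep_reasoning: bool = False
    latency_sensitive: bool = False


def pick_model(task: Task) -> str:
    """Send heavy reasoning to the big model unless the caller needs speed."""
    if task.needs_multistep_reasoning and not task.latency_sensitive:
        return COMPLEX_MODEL
    # everything else (quick, cheap, or latency-bound work) goes to the fast model
    return FAST_MODEL
```

the point is that the routing rule lives in one place, so you can change it once real usage data tells you where the split should be.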

get your tool calls right

this is where most agents break. messy tools = messy agent.

keep them explicit, typed, single responsibility:

// bad
doStuff(input) → magic happens

// good  
searchDocuments(query, limit) → documents[]
createTask(title, assignee, due) → task
sendSlack(channel, message) → confirmation

each tool does one thing. name it obviously. the model should know exactly what it does from the name alone.
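here's roughly how one of the "good" tools could look as real code. this is a sketch, not any framework's API: `Document`, `CORPUS`, `TOOLS`, and `call_tool` are assumptions i made up so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Document:
    title: str
    text: str


# tiny in-memory corpus so the sketch is self-contained (hypothetical data)
CORPUS = [
    Document("onboarding", "how to onboard a new user"),
    Document("billing", "how billing and invoices work"),
    Document("agents", "how to build agents that work"),
]


def search_documents(query: str, limit: int = 5) -> list[Document]:
    """Return up to `limit` documents whose text contains `query`."""
    q = query.lower()
    hits = [d for d in CORPUS if q in d.text.lower()]
    return hits[:limit]


# one name -> one function; the name alone should say what the tool does
TOOLS: dict[str, Callable[..., object]] = {
    "search_documents": search_documents,
}


def call_tool(name: str, **kwargs) -> object:
    """Dispatch a model-issued tool call, failing loudly on unknown names."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**kwargs)
```

typed arguments and a typed return value mean a bad call fails at the boundary instead of somewhere deep in the agent loop.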

choose your framework

building a UI? → vercel AI SDK

great DX, streaming components that feel smooth, not locked into vercel despite the name. has everything: sub-agents, routers, MCP support.

streamUI → user sees thinking... → streams in the actual component

building backend agents? → two options i like:

agno - lightweight, gets out of your way

claude code SDK - becoming my favorite. you're not even locked into anthropic anymore - openrouter has anthropic-style endpoints now, so you can swap models easily. but if you do stick with anthropic's models, you get a lot of stuff already done.

you get: MCP tools without context bloat, sandboxed execution, memory built in. basically claude code as a library.

query(prompt, tools=['Read', 'Bash', 'Write'])
→ agent does the thing

we're in labor budgets now, not just tool budgets.

the boring stuff that matters

monitoring - langfuse for traces. berlin-based team, really good product.

session replays - posthog recordings + gemini flash reviewing them automatically. see where users get stuck without watching hours of video.

memory - start simple. explicit "remember this" tool. get fancy with mem0 later if you need automatic entity extraction.
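the "start simple" version of memory can be this small. a sketch only: `MemoryStore` and its method names are made up here, and a real setup would persist the facts somewhere instead of keeping them in a list.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryStore:
    """Plain list of facts the user explicitly asked the agent to keep."""
    facts: list[str] = field(default_factory=list)

    def remember(self, fact: str) -> str:
        """Tool the model calls when the user says "remember this"."""
        fact = fact.strip()
        # skip empty strings and exact duplicates
        if fact and fact not in self.facts:
            self.facts.append(fact)
        return f"remembered: {fact}"

    def as_prompt_block(self) -> str:
        """Render stored facts as text to prepend to the system prompt."""
        if not self.facts:
            return ""
        lines = "\n".join(f"- {f}" for f in self.facts)
        return "known facts about the user:\n" + lines
```

expose `remember` as a tool, paste `as_prompt_block()` into the system prompt on each turn, and you have memory without any extraction pipeline.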

small things, big impact

let users adjust the agent's tone/style. you won't guess right on day one.

voice input for busy people. they babble, agent distills. convenience > tokens.

ask yourself: "would linear build it this way?" native feel, no 10-step onboarding.
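for the tone/style setting mentioned above, one low-effort way to wire it up is a per-user preset appended to the system prompt. the preset names, `BASE_PROMPT`, and `build_system_prompt` are assumptions for illustration, not something from a specific SDK.

```python
BASE_PROMPT = "you are a helpful assistant for our product."

# hypothetical presets; in practice these would come from user settings
TONE_PRESETS = {
    "concise": "keep answers short and skip pleasantries.",
    "friendly": "be warm and conversational.",
    "formal": "use a professional, formal register.",
}


def build_system_prompt(tone: str = "concise", custom_note: str = "") -> str:
    """Combine the base prompt, the chosen tone, and an optional free-text note."""
    # unknown tone names fall back to the "concise" preset
    parts = [BASE_PROMPT, TONE_PRESETS.get(tone, TONE_PRESETS["concise"])]
    if custom_note.strip():
        parts.append(f"user style note: {custom_note.strip()}")
    return "\n".join(parts)
```

the free-text note matters as much as the presets: it lets users fix the tone themselves when none of your guesses fit.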


start simple. ship something. iterate with real data.

first post done. more to come.

@slobkebap