how to actually build agents that work
my first post ever. let's see how this goes.
everyone's talking about agents.
but how the hell do you actually build a good one?
here's what i've learned building them in production:
pick the right model
claude opus 4.5 is my go-to for complex work. strong reasoning, great at multi-step tasks.
for speed-sensitive stuff or when you're cost-conscious, glm-4.7 on cerebras is insane - we're talking 1000+ tokens/sec. basically instant responses.
match the model to the task. don't use a sledgehammer to hang a picture frame.
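a rough sketch of what "match the model to the task" can look like in code. everything here is made up for illustration: the `Task` shape, the `pick_model` helper, and the model id strings are placeholders, not real API identifiers.

```python
from dataclasses import dataclass

# placeholder ids for illustration only; check your provider for the real ones
HEAVY_MODEL = "claude-opus-4.5"
FAST_MODEL = "glm-4.7"


@dataclass(frozen=True)
class Task:
    prompt: str
    is_complex: bool = False  # multi-step reasoning, planning, long refactors


def pick_model(task: Task) -> str:
    """Return the model id to use for this task."""
    # complex work goes to the stronger (slower, pricier) model
    if task.is_complex:
        return HEAVY_MODEL
    # everything else defaults to the fast, cheap one
    return FAST_MODEL
```

in practice the routing signal is probably something you already know at the call site (which feature triggered the request), not something you have to infer.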
get your tool calls right
this is where most agents break. messy tools = messy agent.
keep them explicit, typed, and single-responsibility:
// bad
doStuff(input) → magic happens
// good
searchDocuments(query, limit) → documents[]
createTask(title, assignee, due) → task
sendSlack(channel, message) → confirmation
each tool does one thing. name it obviously. the model should know exactly what it does from the name alone.
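here's one way the "good" version could look as real code. a minimal sketch, not tied to any framework: the in-memory `_CORPUS`, the `TOOLS` registry, and `call_tool` are hypothetical stand-ins for your real search backend and whatever dispatch your SDK gives you.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Document:
    id: str
    text: str


# tiny in-memory corpus standing in for a real search backend
_CORPUS = [
    Document("1", "quarterly roadmap for the agents team"),
    Document("2", "onboarding checklist"),
    Document("3", "agents incident postmortem"),
]


def search_documents(query: str, limit: int = 5) -> list[Document]:
    """Return up to `limit` documents whose text contains `query`."""
    if limit < 1:
        raise ValueError("limit must be >= 1")
    needle = query.lower()
    hits = [doc for doc in _CORPUS if needle in doc.text.lower()]
    return hits[:limit]


# registry: one obvious name per tool, one job per tool
TOOLS: dict[str, Callable[..., object]] = {"search_documents": search_documents}


def call_tool(name: str, **kwargs: object) -> object:
    """Dispatch a model-issued tool call; unknown names fail loudly."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**kwargs)
```

typed arguments and a loud failure on unknown names mean a bad tool call shows up as an error you can trace, not as silent weirdness three steps later.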
choose your framework
building a UI? → vercel AI SDK
great DX, streaming components that feel smooth, not locked into vercel despite the name. has everything: sub-agents, routers, MCP support.
streamUI → user sees thinking... → streams in the actual component
building backend agents? → two options i like:
agno - lightweight, gets out of your way
claude code SDK (since renamed the claude agent SDK) - becoming my favorite. you're not even locked into anthropic anymore - openrouter has anthropic-style endpoints now, so you can swap models easily. but if you do stick with anthropic's models, a lot of the work is already done for you.
you get: MCP tools without context bloat, sandboxed execution, memory built in. basically claude code as a library.
query(prompt, tools=['Read', 'Bash', 'Write'])
→ agent does the thing
we're in labor budgets now, not just tool budgets: you're paying for work the agent gets done, not just for software.
the boring stuff that matters
monitoring - langfuse for traces. berlin-based team, really good product.
session replays - posthog recordings + gemini flash reviewing them automatically. see where users get stuck without watching hours of video.
memory - start simple. explicit "remember this" tool. get fancy with mem0 later if you need automatic entity extraction.
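the explicit "remember this" tool from the memory point can be tiny. a minimal sketch, assuming a single user and a json file on disk; `MemoryStore`, `remember`, and `as_prompt` are names i made up for this example, not part of mem0 or any SDK.

```python
import json
from pathlib import Path


class MemoryStore:
    """Explicit memory: nothing is saved unless the agent calls remember()."""

    def __init__(self, path: Path) -> None:
        self.path = path
        # load previously saved facts if the file exists
        self.facts: list[str] = json.loads(path.read_text()) if path.exists() else []

    def remember(self, fact: str) -> str:
        """Tool the model calls when the user asks it to remember something."""
        fact = fact.strip()
        if not fact:
            return "nothing to remember"
        if fact not in self.facts:  # skip exact duplicates
            self.facts.append(fact)
            self.path.write_text(json.dumps(self.facts))
        return f"remembered: {fact}"

    def as_prompt(self) -> str:
        """Render stored facts as a bullet list to prepend to the system prompt."""
        return "\n".join(f"- {fact}" for fact in self.facts)
```

register `remember` as a tool, prepend `as_prompt()` to the system prompt on each session, and you have memory you can read and debug by opening a file.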
small things, big impact
let users adjust the agent's tone/style. you won't guess right on day one.
voice input for busy people. they babble, agent distills. convenience > tokens.
ask yourself: "would linear build it this way?" native feel, no 10-step onboarding.
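on the tone/style point: this can start as two user settings folded into the system prompt. a minimal sketch; `StylePrefs`, `BASE_PROMPT`, and `build_system_prompt` are hypothetical names, and the option values are just examples.

```python
from dataclasses import dataclass

BASE_PROMPT = "you are a helpful assistant."


@dataclass
class StylePrefs:
    tone: str = "neutral"     # e.g. "casual", "formal"
    verbosity: str = "short"  # e.g. "short", "detailed"


def build_system_prompt(prefs: StylePrefs) -> str:
    """Append the user's style settings to the base system prompt."""
    return (
        f"{BASE_PROMPT}\n"
        f"write in a {prefs.tone} tone. keep answers {prefs.verbosity}."
    )
```

store the prefs per user and rebuild the prompt on every request, so a settings change takes effect on the next message.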
start simple. ship something. iterate with real data.
first post done. more to come.