Standards for building agents, better
Updated May 11, 2026 · TypeScript
Agentic testing for agentic codebases
The definitive benchmark for AI agents on OpenClaw. 45 tasks across 4 tiers. Powered by MyClaw.ai
Ship agents you can audit.
Agent Verifier is a coding agent skill that verifies code against organizational policies, code quality patterns, security requirements, and framework best practices — before code ships. Works with Claude Code, Cursor, Windsurf, and 30+ agents.
The open-source MultiAgentOps evaluation and verification harness for business workflows in any industry.
Infrastructure chaos for testing the resilience of AI agents.
Turn failed AI agent runs into replayable regression tests. Catch regressions before you ship.
Typed Kotlin DSL framework for AI agent systems.
GitHub template for agent-testable SaaS apps. Next.js 16 + shadcn/ui + Neon Postgres + agent-browser e2e testing via accessibility tree.
Deterministic runtime for agent evaluation
pytest plugin for deterministic testing of AI agents. Assert agent actions, not vibes.
A living world where agents exist as participants alongside NPCs, internal actors, real service APIs, budgets, policies, and consequences.
Generate deterministic pytest tests for your AI agents from one real run, then replay them in CI for $0. Fast, flake-free agent testing.
Token-efficient stochastic testing for AI agents. 5-20x cost reduction. 10 framework adapters. Paper: arXiv:2603.02601
Behavior-regression testing for LLM agents. 4-class attribution, 6-field FAIL schema, $-cost gating, flaky detection. Bash + jq. Works with opencode today, runner-pluggable.
Diagnose your AI agents in production. Extract policies from prompts, evaluate traces, generate diagnostic reports.
Playwright for AI Agents. Test what your agent DOES, not what it SAYS. YAML-first behavioral testing. Catch PII leaks, tool abuse, step explosions. 3200+ tests.
Evaluation and competition arena for testing agents, systems, or workflows in structured local-first scenarios.
Intent-first unit testing framework for AI agents in Node.js and TypeScript.