Question 1

What is ARA Eval?

Accepted Answer

ARA Eval (Agent Risk Assessment) is an open-source framework for evaluating whether AI agents can safely act without human approval. It decomposes risk into 7 dimensions — Decision Reversibility, Failure Blast Radius, Regulatory Exposure, Decision Time Pressure, Data Confidence, Accountability Chain, and Graceful Degradation — producing a risk fingerprint rather than a single score.

Question 2

What are the 7 risk dimensions in ARA Eval?

Accepted Answer

The 7 dimensions are: (1) Decision Reversibility — can you undo it? (2) Failure Blast Radius — how many people, systems, or dollars are affected? (3) Regulatory Exposure — does it touch compliance? (4) Decision Time Pressure — how long before action is required? (5) Data Confidence — does the agent have enough signal? (6) Accountability Chain — who is responsible and can you audit? (7) Graceful Degradation — does it fail safely or cascade?

Question 3

How do hard gates work in ARA Eval?

Accepted Answer

Hard gates are deterministic rules enforced by code, not the LLM. If Regulatory Exposure or Failure Blast Radius scores an "A" (highest risk), autonomy is not permitted regardless of other dimension scores. The LLM classifies the dimensions, but the gating logic is deterministic — you can swap models or change prompts without affecting the safety policy.

Question 4

Who is ARA Eval for?

Accepted Answer

ARA Eval is built for three audiences: enterprises evaluating which workflows can safely use autonomous agents, universities teaching AI governance (includes a 5-week MBA capstone syllabus), and regulators stress-testing policy frameworks against real-world complexity like Hong Kong's HKMA GenAI Sandbox, PIPL, and PDPO overlap.

Dimension	What it measures	Gate
Decision Reversibility	Can you undo it?	Soft
Failure Blast Radius	How many people/systems/dollars?	Hard
Regulatory Exposure	Does it touch compliance?	Hard
Decision Time Pressure	How long before you must act?	Soft
Data Confidence	Does the agent have enough signal?	Soft
Accountability Chain	Who’s responsible? Can you audit?	Soft
Graceful Degradation	Does it fail safely or cascade?	Soft

ARA Eval

Risk Is a Fingerprint, Not a Score

The 7 Dimensions

Hard Gates: The Aviation Principle

LLM-as-Judge Results

Who It's For

What's Included

Built By