Security model

SecureBench evaluates agents in adversarial workspace settings. Both active families, repo_patch and terminal_task, involve candidate-controlled files, commands, or repository state, so scoring must rely only on trusted evaluator inputs.

Controls

Control	Default	Purpose	Security property
Agentic-only families	`repo_patch`, `terminal_task`	Focus on repository and terminal workflows	Keeps contracts aligned with workspace-producing agents
Visibility lanes	Agent receives `public` only	Rows can contain checker files, test patches, policies, and hidden analysis data	Prevents evaluator-data exposure through task JSON or agent materialization
Two-phase execution	Produce candidate, then verify	Candidate production is untrusted; scoring is trusted	Keeps verifier inputs and checker code out of the agent view
Fresh verifier sandbox	New sandbox per verification	Agent workspaces can contain modified files, wrappers, and state	Reduces carry-over from candidate production into scoring
Docker hardening	No network, dropped capabilities, `no-new-privileges`, memory/PID limits, read-only root by default	Verifier logic should not need broad egress or host-like privilege	Limits exfiltration, privilege expansion, and resource abuse
Repo patch containment	Canonical diffs, base-commit checks, path policy, stdin patch application	Patches can alter tests instead of solving tasks	Blocks common test-infrastructure tampering before and after apply
Terminal checker isolation	Trusted checker mounts plus isolated `pytest` and `script` runners	Terminal workspaces can contain malicious Python wrappers and startup files	Keeps checker code outside the candidate workspace
Redacted results	Non-public resource values redacted	JSONL output must not become an answer key	Reduces hidden-data leakage through result artifacts

These controls reduce risk. They do not make Docker, Git, shell, Python, package managers, or provider CLIs complete security boundaries.
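As a rough illustration of the Docker hardening row above, the sketch below builds a `docker run` argument list with the listed defaults. The function name `hardened_docker_argv` and the memory and PID limit values are assumptions made for this example, not SecureBench's actual implementation or settings.

```python
def hardened_docker_argv(image, command, memory="2g", pids_limit=256):
    """Build a `docker run` argv mirroring the hardening defaults in the table.

    The memory and pids_limit defaults are illustrative assumptions.
    """
    return [
        "docker", "run", "--rm",
        "--network", "none",                    # no network
        "--cap-drop", "ALL",                    # dropped capabilities
        "--security-opt", "no-new-privileges",  # block privilege escalation
        "--memory", memory,                     # memory limit
        "--pids-limit", str(pids_limit),        # PID limit
        "--read-only",                          # read-only root filesystem
        image,
        *command,                               # argv array, never a shell string
    ]


if __name__ == "__main__":
    print(hardened_docker_argv("bench-image:latest", ["pytest", "-q"]))
```

Passing the command as an argv array rather than a shell string avoids one layer of shell interpretation.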

Threat model

SecureBench assumes the evaluated agent, or code it produced, may try to:

read hidden tests, checker files, gold patches, or evaluator policy
modify test files, runners, dependency files, CI, or shell wrappers
skip, replace, or weaken verifier logic
probe provider credential handling and network egress from the harness container
poison resumed result records when --resume is used

Trusted	Untrusted
SecureBench framework code	Candidate patches and workspace files
Benchmark eval asset root and hidden resources	Agent-produced stdout/stderr
Verifier sandbox setup	Prior `candidates.jsonl` rows when using `--resume`
Tester YAML command harness config	Provider CLI tool execution inside the agent container

Visibility lanes

Lane	Purpose
`public`	Instructions and public assets visible to the agent
`evaluation_inputs`	Tests, checkers, dangerous-command declarations, and candidate policy used during verification
`hidden`	Gold patches or expected state retained for trusted analysis and reporting

The compiler maps each supported family field into one lane. Harnesses write only task.agent_payload() to the agent workspace. Result summaries redact non-public values.
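The lane rule can be pictured with a small sketch. The class `SketchTask`, the `(lane, value)` resource layout, and the `[redacted]` placeholder are hypothetical and chosen only for illustration; the real `task.agent_payload()` may be structured differently.

```python
from dataclasses import dataclass, field

PUBLIC = "public"
REDACTED = "[redacted]"  # hypothetical placeholder value


@dataclass
class SketchTask:
    # Maps resource key -> (lane, value). This layout is an assumption.
    resources: dict = field(default_factory=dict)

    def agent_payload(self):
        """Return only resources in the public lane."""
        return {
            key: value
            for key, (lane, value) in self.resources.items()
            if lane == PUBLIC
        }


def redact_for_results(task):
    """Keep every key, but replace non-public values with a placeholder."""
    return {
        key: (value if lane == PUBLIC else REDACTED)
        for key, (lane, value) in task.resources.items()
    }


if __name__ == "__main__":
    task = SketchTask({
        "instructions": ("public", "Fix the failing build."),
        "checker": ("evaluation_inputs", "test_check.py"),
        "gold_patch": ("hidden", "diff --git a/x b/x"),
    })
    print(task.agent_payload())
    print(redact_for_results(task))
```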

Filesystem controls

Materialized paths reject absolute paths, .., backslashes, reserved roots, and parent/child mount collisions.
File-backed assets must resolve under pack asset roots and must not be symlinks.
Public assets default to read-only bind mounts.
Evaluation checker files for terminal tasks are mounted read-only under /opt/securebench/evaluator.
Host workspace staging uses sandbox path resolution to block traversal and symlink escapes.
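A minimal sketch of the path and symlink checks listed above might look like the following. The function names and the `RESERVED_ROOTS` entry are hypothetical, and mount-collision detection is not modeled here.

```python
from pathlib import Path, PurePosixPath

# Hypothetical reserved root, for illustration only.
RESERVED_ROOTS = ("opt/securebench",)


def validate_materialized_path(raw):
    """Return a normalized relative POSIX path or raise ValueError."""
    if "\\" in raw:
        raise ValueError("backslashes are not allowed")
    path = PurePosixPath(raw)
    if path.is_absolute():
        raise ValueError("absolute paths are not allowed")
    if not path.parts:
        raise ValueError("empty path")
    if ".." in path.parts:
        raise ValueError("parent traversal is not allowed")
    normalized = str(path)
    for root in RESERVED_ROOTS:
        if normalized == root or normalized.startswith(root + "/"):
            raise ValueError("path falls under a reserved root")
    return normalized


def resolve_asset(asset_root, relative):
    """Resolve a file-backed asset, rejecting symlinks and root escapes."""
    root = Path(asset_root).resolve()
    candidate = root / validate_materialized_path(relative)
    if candidate.is_symlink():
        raise ValueError("asset must not be a symlink")
    resolved = candidate.resolve()
    # Catches escapes through symlinked parent directories (Python 3.9+).
    if not resolved.is_relative_to(root):
        raise ValueError("asset resolves outside the asset root")
    return resolved
```

Checking the resolved path against the root, in addition to the lexical checks, is what covers symlinked intermediate directories.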

Repo patch verification

Repo-patch candidates are collected as canonical git diffs:


git diff HEAD --binary --full-index --no-ext-diff --no-textconv --

Verifier controls:

Reject noncanonical candidate patches, meaning patches that lack unambiguous diff --git sections.
Check benchmark image HEAD against input.base_commit.
Apply setup, candidate, and hidden test patches through git apply --binary over stdin.
Commit setup state as a verifier-only baseline with hooks and signing disabled.
Enforce candidate path policy both before and after apply, using NUL-delimited Git paths.
By default, deny changes to common test, CI, dependency, and build-config files, lockfiles, shell scripts, Python startup customization files, and SecureBench-owned paths, as well as absolute and traversal paths.
Support eval.candidate_policy.allow_paths, allow_sensitive_paths, and patch_preserved_paths.
Run the declared argv-array command in Docker with network disabled by default.
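The path-policy step can be sketched as follows, assuming the changed paths arrive as NUL-delimited bytes (for example from a Git command run with `-z`). The `DENY_PATTERNS` list is a hypothetical stand-in for the default policy, and the sketch does not model allow_paths, allow_sensitive_paths, or patch_preserved_paths.

```python
from fnmatch import fnmatchcase

# Hypothetical deny patterns; the real default policy is likely broader.
DENY_PATTERNS = (
    "tests/*", "test_*.py", "conftest.py",
    ".github/*", "*.lock", "*.sh",
    "sitecustomize.py", "usercustomize.py",
)


def parse_nul_paths(raw):
    """Split NUL-delimited bytes into path strings, skipping empty fields."""
    return [
        part.decode("utf-8", "surrogateescape")
        for part in raw.split(b"\0")
        if part
    ]


def denied_paths(paths):
    """Return the paths that violate the sketch policy."""
    bad = []
    for path in paths:
        parts = path.split("/")
        if path.startswith("/") or ".." in parts:
            bad.append(path)  # absolute or traversal path
            continue
        name = parts[-1]
        # Match against the full path and the basename.
        if any(fnmatchcase(path, pat) or fnmatchcase(name, pat)
               for pat in DENY_PATTERNS):
            bad.append(path)
    return bad


if __name__ == "__main__":
    raw = b"src/app.py\0tests/test_app.py\0poetry.lock\0"
    print(denied_paths(parse_nul_paths(raw)))
```

NUL delimiters matter because Git paths can legally contain newlines and quotes, which makes line-based parsing ambiguous.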

Known gap: the verifier still runs benchmark-authored checks in the candidate-mutated repository. Result metadata reports this as containment_profile: "shared_runtime" and hidden_test_runtime_secrecy: false. Benchmark authors remain in the trusted computing base through commands, allowlists, setup patches, and hidden tests.

Terminal task verification

Terminal-task candidates are final workspace directories. Trusted checker files come from benchmark eval assets, not from the candidate workspace.

Verifier controls:

Materialize eval inputs at verification time.
Mount checker files read-only under /opt/securebench/evaluator.
Run with network disabled by default.
Support structured pytest and script checker sources.
Deny dangerous verifier commands by default.

In pytest mode, the verifier removes candidate workspace import paths and invokes pytest with isolated Python. In script mode, it starts Bash with profiles disabled, removes Python startup-path environment overrides, disables user site loading and unsafe Python paths, and exports python and python3 wrappers that reject candidate-owned interpreters.
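The two runner modes could be approximated with standard interpreter and shell options, as in the sketch below. The helper names and the exact set of scrubbed variables are assumptions for this example; the interpreter-rejecting python and python3 wrappers are not modeled.

```python
# Environment variables that can redirect Python imports or startup.
PYTHON_ENV_OVERRIDES = (
    "PYTHONPATH", "PYTHONSTARTUP", "PYTHONHOME", "PYTHONUSERBASE",
)


def scrubbed_env(env):
    """Copy env without Python path overrides and with safe defaults set."""
    clean = {k: v for k, v in env.items() if k not in PYTHON_ENV_OVERRIDES}
    clean["PYTHONNOUSERSITE"] = "1"  # disable user site-packages
    clean["PYTHONSAFEPATH"] = "1"    # Python 3.11+: no unsafe sys.path entry
    return clean


def isolated_pytest_argv(python, test_path):
    """-I runs Python in isolated mode: PYTHON* variables and user site
    are ignored, and the current directory is not prepended to sys.path."""
    return [python, "-I", "-m", "pytest", test_path]


def profile_free_bash_argv(script_path):
    """Start Bash without reading profile or rc files."""
    return ["bash", "--noprofile", "--norc", script_path]


if __name__ == "__main__":
    import os
    print(sorted(scrubbed_env(dict(os.environ)))[:5])
    print(isolated_pytest_argv("/usr/bin/python3",
                               "/opt/securebench/evaluator/test_check.py"))
```

The test file name in the usage example is made up; only the /opt/securebench/evaluator mount point comes from the text above.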

Known gap: checkers may intentionally execute agent-produced binaries or inspect agent-produced data. Checker authors must treat every workspace file as hostile input.

Harness security

Command harness

The command harness runs tester-provided commands in Docker and materializes only agent-visible resources into the agent workspace. Treat command harness config as trusted tester configuration.

Provider CLI harnesses

Codex and Claude Code harnesses:

run inside the benchmark image with a read-only tooling overlay and separate writable tool home
materialize only public task data
use egress allowlisting through the framework proxy
invoke provider CLIs with automation-oriented approval and sandbox bypass flags
inject dummy API keys into the agent container and replace them through the framework-owned relay

The egress proxy validates hostnames, resolves them inside the proxy, rejects DNS answers containing non-public or non-unicast addresses, and connects only to validated numeric addresses on allowed ports.
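The proxy's address validation can be sketched with the standard `ipaddress` and `socket` modules. The allowlist contents, the port set, and the function names below are hypothetical. The injectable `resolve` parameter exists only so the logic can be exercised without real DNS.

```python
import ipaddress
import socket

ALLOWED_HOSTS = {"api.example.com"}  # hypothetical allowlist entry
ALLOWED_PORTS = {443}                # hypothetical allowed port set


def is_public_unicast(addr):
    """True only for globally routable, non-multicast addresses."""
    ip = ipaddress.ip_address(addr)
    return ip.is_global and not ip.is_multicast


def validated_addresses(hostname, port, resolve=socket.getaddrinfo):
    """Resolve hostname and return numeric addresses, or raise ValueError.

    The whole answer is rejected if any address is non-public, so a mixed
    DNS response cannot steer the connection to an internal target.
    """
    if hostname not in ALLOWED_HOSTS:
        raise ValueError("hostname is not allowlisted")
    if port not in ALLOWED_PORTS:
        raise ValueError("port is not allowed")
    infos = resolve(hostname, port, type=socket.SOCK_STREAM)
    addrs = [info[4][0] for info in infos]
    if not addrs or not all(is_public_unicast(a) for a in addrs):
        raise ValueError("DNS answer contains a non-public address")
    return addrs
```

A caller would then connect to one of the returned numeric addresses directly rather than resolving the hostname a second time, which avoids a resolve-then-rebind gap.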

Audit commands

securebench audit and securebench audit-self exercise built-in checks for the current agentic surface:

agent payloads contain only public resource keys
agent materialization excludes non-public resources
result records redact non-public values
repo-patch default policy denies common test-infrastructure tampering
repo-patch tasks warn when eval.candidate_policy.allow_paths is absent
dynamic smoke probes cover repo-patch and terminal-task tampering patterns

Passing the built-in audit suite means the current implementation resisted those probes. It does not prove that every benchmark-authored checker is secure or meaningful.

Known gaps

Result records are not signed.
--resume trusts existing candidates.jsonl records for completed task IDs.
Result records do not include full benchmark pack digests or verifier code version identifiers.
Producer and verifier stdout/stderr can contain sensitive information if a harness or checker prints it.