Benchmark schema reference

SecureBench accepts two executable benchmark families: `repo_patch` and `terminal_task`.

Manifest

Field	Type	Required	Description
`id`	string	yes	Benchmark pack identifier
`version`	integer	yes	Pack version
`defaults.family`	string	no	Default row family: `repo_patch` or `terminal_task`
`defaults.environment`	object	no	Merged into each row
`asset_roots.public`	string	no	Default `assets/`
`asset_roots.eval`	string	no	Default `hidden/`
`asset_defaults.read_only`	boolean	no	Default `true`

Asset root paths must be relative POSIX paths that contain no `..` segments and no backslashes.
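The asset root rule above can be sketched as a small check. This is a minimal illustration, not SecureBench's own validator: `check_asset_root` and `AssetRootError` are hypothetical names, and this reference does not say which error type the real check raises.

```python
from pathlib import PurePosixPath


class AssetRootError(ValueError):
    """Hypothetical error type used only in this sketch."""


def check_asset_root(value: str) -> str:
    """Return value unchanged if it is a relative POSIX path with no '..' or backslash."""
    if not value:
        raise AssetRootError("asset root must not be empty")
    if "\\" in value:
        raise AssetRootError(f"backslash in asset root: {value!r}")
    path = PurePosixPath(value)
    if path.is_absolute():
        raise AssetRootError(f"asset root must be relative: {value!r}")
    if ".." in path.parts:
        raise AssetRootError(f"'..' segment in asset root: {value!r}")
    return value
```

Under these assumptions the documented defaults `assets/` and `hidden/` pass, while values such as `../x`, `/abs`, or `a\b` are rejected.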

JSONL row

Field	Type	Required	Description
`id`	string	yes	Task ID
`family`	string	conditional	Required if not set in manifest defaults
`input`	object	family-specific	Public fields
`eval`	object	family-specific	Verifier fields
`assets`	object[]	no	Public file assets
`environment`	object	no	Overrides manifest environment
`metadata`	object	no	Opaque metadata

Unknown row fields and unknown family names raise `ConfigError`.
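The row rules above (known fields only, and a family taken from the row or from the manifest defaults) might look like the following sketch. `resolve_family` is a hypothetical helper, the `ConfigError` class here is a local stand-in because this reference does not give the real import path, and the assumption that a row's own `family` wins over the manifest default is not stated explicitly above.

```python
ROW_FIELDS = {"id", "family", "input", "eval", "assets", "environment", "metadata"}
FAMILIES = {"repo_patch", "terminal_task"}


class ConfigError(ValueError):
    """Local stand-in for SecureBench's ConfigError."""


def resolve_family(row: dict, manifest: dict) -> str:
    """Reject unknown row fields, then pick the family for this row."""
    unknown = set(row) - ROW_FIELDS
    if unknown:
        raise ConfigError(f"unknown row fields: {sorted(unknown)}")
    if "id" not in row:
        raise ConfigError("row is missing required field 'id'")
    # Assumption: a family set on the row takes precedence over the manifest default.
    family = row.get("family") or manifest.get("defaults", {}).get("family")
    if family is None:
        raise ConfigError("family is required when the manifest sets no default")
    if family not in FAMILIES:
        raise ConfigError(f"unknown family: {family!r}")
    return family
```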

Asset object

Field	Type	Description
`path`	string	Relative to `asset_roots.public` or to the eval root (`asset_roots.eval`)
`mount`	string	Workspace destination path. Defaults to `path` when omitted.
`read_only`	boolean	Override default read-only behavior
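The two defaults in the asset table (`mount` falls back to `path`, and `read_only` falls back to `asset_defaults.read_only`, itself `true` by default) can be illustrated as follows. `resolve_asset` is a hypothetical helper written for this example only.

```python
def resolve_asset(asset: dict, manifest: dict) -> dict:
    """Fill in the documented defaults for one asset object."""
    # asset_defaults.read_only defaults to true when the manifest omits it.
    default_read_only = manifest.get("asset_defaults", {}).get("read_only", True)
    return {
        "path": asset["path"],
        # mount defaults to path when omitted.
        "mount": asset.get("mount", asset["path"]),
        "read_only": asset.get("read_only", default_read_only),
    }
```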

Environment object

Field	Type	Families	Description
`image`	string	both	Docker image reference
`workdir`	string	both	Container working directory; required for `repo_patch` verification
`timeout_seconds`	number	both	Positive timeout
`materialize_workdir_from_image`	boolean	`terminal_task`	Copy image workdir to host workspace before harness
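How `defaults.environment` and a row's `environment` combine can be sketched as a key-by-key merge in which the row wins. Whether the real merge is shallow or deep is not specified here, so the shallow merge below is an assumption, and `effective_environment` is a hypothetical name.

```python
def effective_environment(manifest: dict, row: dict) -> dict:
    """Merge manifest defaults with row overrides and check the timeout."""
    defaults = manifest.get("defaults", {}).get("environment", {})
    overrides = row.get("environment", {})
    # Assumption: shallow merge, row keys replace manifest keys.
    merged = {**defaults, **overrides}
    timeout = merged.get("timeout_seconds")
    if timeout is not None:
        is_number = isinstance(timeout, (int, float)) and not isinstance(timeout, bool)
        if not is_number or timeout <= 0:
            raise ValueError("timeout_seconds must be a positive number")
    return merged
```

For example, a manifest that sets `image` and `timeout_seconds` and a row that sets only `timeout_seconds` would yield the manifest's image with the row's timeout.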

`repo_patch`

Required input fields: `repo`, `base_commit`, `instructions`.

Optional input fields: `hints`.

Required eval fields: `tests`.

Optional eval fields: `candidate_policy`, `gold_patch`.

`tests`

Field	Type	Required	Description
`source`	`"command"`	yes	Command-test verifier mode
`command`	string[]	yes	Non-empty argv array
`workdir`	string	no	Overrides environment workdir
`timeout_seconds`	number	no	Positive timeout
`setup_patch`	string or inline patch object	no	Verifier-only setup patch
`test_patch`	string or inline patch object	no	Hidden/eval test patch

Patch object (the inline form accepted by `setup_patch` and `test_patch`):

{ "source": "inline", "patch": "..." }
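As an illustration of the `tests` rules, the sketch below checks a `tests` object against the table above and applies it to a placeholder `repo_patch` row. `validate_tests` is a hypothetical helper, the row values are placeholders rather than real data, and the use of `ValueError` is an assumption (this reference lists `ConfigError` for parsing failures but not its import path).

```python
def _is_positive_number(value) -> bool:
    return isinstance(value, (int, float)) and not isinstance(value, bool) and value > 0


def _is_patch(value) -> bool:
    # A patch is either a string or an inline patch object.
    if isinstance(value, str):
        return True
    return (isinstance(value, dict)
            and value.get("source") == "inline"
            and isinstance(value.get("patch"), str))


def validate_tests(tests: dict) -> None:
    """Raise ValueError if the tests object breaks a documented rule."""
    if tests.get("source") != "command":
        raise ValueError('tests.source must be "command"')
    command = tests.get("command")
    if (not isinstance(command, list) or not command
            or not all(isinstance(arg, str) for arg in command)):
        raise ValueError("tests.command must be a non-empty argv array of strings")
    if "timeout_seconds" in tests and not _is_positive_number(tests["timeout_seconds"]):
        raise ValueError("tests.timeout_seconds must be positive")
    for key in ("setup_patch", "test_patch"):
        if key in tests and not _is_patch(tests[key]):
            raise ValueError(f"tests.{key} must be a string or an inline patch object")


# Placeholder row; "..." values stand in for real content.
example_row = {
    "id": "example-task",
    "family": "repo_patch",
    "input": {"repo": "...", "base_commit": "...", "instructions": "..."},
    "eval": {
        "tests": {
            "source": "command",
            "command": ["pytest", "-q"],
            "test_patch": {"source": "inline", "patch": "..."},
        }
    },
}
validate_tests(example_row["eval"]["tests"])
```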

`candidate_policy`

Field	Type	Description
`allow_paths`	string[]	Intended implementation edit surface
`allow_sensitive_paths`	string[]	Sensitive paths explicitly allowed for this task
`patch_preserved_paths`	string[]	Visible verifier-supporting files whose candidate edits are stripped

`terminal_task`

Required input fields: `instructions`.

Optional input fields: `context`.

Required eval fields: `checker`.

Optional eval fields: `run_tests`, `test_files`, `expected_state`, `needed_commands`.

`checker`

Field	Type	Required	Description
`source`	`"pytest"` or `"script"`	yes	Checker execution mode
`path`	string	yes	Safe relative path under `asset_roots.eval`
`timeout_seconds`	number	no	Positive timeout

`needed_commands` is an array of known dangerous command names. The only name currently supported is `chroot`.
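The `checker` and `needed_commands` rules can be sketched together. `validate_terminal_eval` is a hypothetical helper, raising `ValueError` is an assumption, and the sketch does not attempt the "safe relative path" check on `checker.path`.

```python
CHECKER_SOURCES = {"pytest", "script"}
SUPPORTED_NEEDED_COMMANDS = {"chroot"}


def validate_terminal_eval(eval_obj: dict) -> None:
    """Check the checker object and needed_commands of a terminal_task row."""
    checker = eval_obj.get("checker")
    if not isinstance(checker, dict):
        raise ValueError("eval.checker is required")
    if checker.get("source") not in CHECKER_SOURCES:
        raise ValueError('checker.source must be "pytest" or "script"')
    path = checker.get("path")
    if not isinstance(path, str) or not path:
        raise ValueError("checker.path is required")
    timeout = checker.get("timeout_seconds")
    if timeout is not None:
        is_number = isinstance(timeout, (int, float)) and not isinstance(timeout, bool)
        if not is_number or timeout <= 0:
            raise ValueError("checker.timeout_seconds must be positive")
    for name in eval_obj.get("needed_commands", []):
        if name not in SUPPORTED_NEEDED_COMMANDS:
            raise ValueError(f"unsupported needed command: {name!r}")
```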

Compiled resource visibility

Family	Field	Visibility
`repo_patch`	`tests`, `candidate_policy`	`evaluation_inputs`
`repo_patch`	`gold_patch`	`hidden`
`terminal_task`	`checker`, `needed_commands`, `run_tests`, `test_files`	`evaluation_inputs`
`terminal_task`	`expected_state`	`hidden`
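The visibility table can be read as a lookup from family and eval field to a visibility label. The sketch below encodes the table as data and groups a row's eval fields accordingly; `VISIBILITY` and `split_eval_fields` are hypothetical names, not part of SecureBench.

```python
VISIBILITY = {
    "repo_patch": {
        "tests": "evaluation_inputs",
        "candidate_policy": "evaluation_inputs",
        "gold_patch": "hidden",
    },
    "terminal_task": {
        "checker": "evaluation_inputs",
        "needed_commands": "evaluation_inputs",
        "run_tests": "evaluation_inputs",
        "test_files": "evaluation_inputs",
        "expected_state": "hidden",
    },
}


def split_eval_fields(family: str, eval_obj: dict) -> dict:
    """Group eval fields by visibility; unknown families or fields raise KeyError."""
    grouped = {"evaluation_inputs": {}, "hidden": {}}
    for field, value in eval_obj.items():
        grouped[VISIBILITY[family][field]][field] = value
    return grouped
```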

Internal task spec

Tests can build compiled tasks through `task_from_spec()`, which takes a spec of this shape:

{
  "id": "...",
  "benchmark_id": "...",
  "task_type": "terminal_task",
  "metadata": {},
  "resources": {
    "instructions": {"value": "...", "visibility": "public"}
  }
}

Error types

Error	When
`ConfigError`	Manifest, row, or tester parsing fails
`MaterializationError`	A materialization plan is unsafe
`PathPolicyError`	A path violates the path policy