Python API reference

Public package exports from `securebench/__init__.py`:


from securebench import (
    TesterRunSummary,
    VerificationResult,
    Verifier,
    run_tester_config,
    verifier_for_task_type,
)

Runner

`run_tester_config(config, *, limit=None, progress=None, resume=False) -> TesterRunSummary`

Main orchestration entry point. Implemented in `securebench/tester_run.py`.

`TesterRunSummary`

| Field / property | Type | Description |
| --- | --- | --- |
| `run_id` | str | Run identifier from the tester YAML |
| `output_dir` | str | Resolved output path |
| `output_path` | str | Path to `candidates.jsonl` |
| `total` | int | Record count |
| `verified` | int | Tasks for which a verifier ran |
| `passed` | int | Tasks where `passed is True` |
| `score_sum` | float | Sum of scores |
| `verification_status` | str | `complete`, `partial`, or `pending` |
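
The counters above are enough to derive aggregate metrics. The following is a minimal sketch, not part of securebench: `SummaryStandIn` is a hypothetical stand-in that mirrors only the documented field names so the example runs without the package installed, and `pass_rate` and `mean_score` are illustrative helpers. It assumes `passed` and `score_sum` are accumulated over verified tasks.

```python
from dataclasses import dataclass


@dataclass
class SummaryStandIn:
    """Hypothetical stand-in mirroring the documented TesterRunSummary fields."""
    run_id: str
    output_dir: str
    output_path: str
    total: int
    verified: int
    passed: int
    score_sum: float
    verification_status: str


def pass_rate(summary: SummaryStandIn) -> float:
    """Fraction of verified tasks that passed; 0.0 when nothing was verified."""
    return summary.passed / summary.verified if summary.verified else 0.0


def mean_score(summary: SummaryStandIn) -> float:
    """Average score per verified task; 0.0 when nothing was verified."""
    return summary.score_sum / summary.verified if summary.verified else 0.0


# Illustrative values only.
example = SummaryStandIn(
    run_id="demo",
    output_dir="out/demo",
    output_path="out/demo/candidates.jsonl",
    total=10,
    verified=8,
    passed=6,
    score_sum=6.5,
    verification_status="partial",
)
print(f"pass rate: {pass_rate(example):.2f}, mean score: {mean_score(example):.4f}")
```

Dividing by `verified` rather than `total` keeps unverified records from counting as failures; use `total` instead if that is the behaviour you want.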

`with_tester_overrides(config, *, output_dir=None) -> TesterConfig`

Return a `TesterConfig` with CLI-style overrides applied, without modifying any files on disk.

`verify_candidate(task, candidate, *, verification_policy=None) -> VerificationResult | None`

Verify one artifact outside the full runner loop.

`candidate_record(run_id, task, candidate, verification=None) -> dict`

Serialize one JSONL record. Used internally and useful in tests.
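
Because the output file is `candidates.jsonl`, each record occupies one line as a JSON object. A minimal sketch of writing and reading such a file with the standard library follows. The record keys (`run_id`, `task_id`, `passed`) and the helper names are assumptions for illustration, not the schema that `candidate_record` produces.

```python
import json
import tempfile
from pathlib import Path


def write_jsonl(path: Path, records: list[dict]) -> None:
    """Write one JSON object per line."""
    with path.open("w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record, sort_keys=True) + "\n")


def read_jsonl(path: Path) -> list[dict]:
    """Read records back, skipping blank lines."""
    with path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


# Hypothetical record shape; the real keys may differ.
records = [
    {"run_id": "demo", "task_id": "t1", "passed": True},
    {"run_id": "demo", "task_id": "t2", "passed": None},  # not verified yet
]

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "candidates.jsonl"
    write_jsonl(target, records)
    loaded = read_jsonl(target)
```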

Configuration

`load_tester_config(path) -> TesterConfig`

Load and parse the tester YAML file at `path`.

`parse_tester_config(data, *, base_dir=None) -> TesterConfig`

Parse a tester config from an already-loaded dict (the result of reading the YAML). Used in tests.

`TesterConfig`

| Field | Type |
| --- | --- |
| `schema_version` | str |
| `run` | `TesterRunSection` |
| `benchmark` | `TesterBenchmarkSection` |
| `harness` | `TesterHarnessSection` |
| `verification` | `VerificationPolicy` |
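
When building a config dict in memory for tests, it can help to check which sections are present before parsing. The sketch below assumes the top-level dict keys match the `TesterConfig` field names, which this reference does not confirm. `missing_sections` is a hypothetical helper, and the nested keys and values are placeholders.

```python
# Field names taken from the TesterConfig table; treating them as dict keys
# is an assumption.
DOCUMENTED_SECTIONS = ("schema_version", "run", "benchmark", "harness", "verification")


def missing_sections(data: dict) -> list[str]:
    """Return documented TesterConfig field names absent from a config dict."""
    return [name for name in DOCUMENTED_SECTIONS if name not in data]


draft = {
    "schema_version": "1",      # placeholder value
    "run": {"run_id": "demo"},  # nested keys are illustrative
    "benchmark": {},
    "harness": {},
}
print(missing_sections(draft))
```

Whether every section is required is not stated here, so treat the result as a hint rather than a validation error.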

Benchmark packs

`load_benchmark_pack(manifest_path, tasks_path) -> BenchmarkPack`

`load_benchmark_manifest(path) -> BenchmarkPackManifest`

`compile_benchmark_pack(pack, *, limit=None) -> Iterator[SecureBenchTask]`

`compile_benchmark_row(row, *, manifest) -> SecureBenchTask`

Tasks and resources

`SecureBenchTask`

Key methods: `agent_payload()`, `public_payload()`, `evaluation_payload()`, `hidden_payload()`, `view_for(component)`, `resource_summary()`.

`task_from_spec(spec) -> SecureBenchTask`

Build a task from a normalized spec dict.

Resource helpers

`resource_value`, `resource_text`

`ResourceBundle`, `Resource`, `ComponentView`

See `securebench/resources.py`. Helpers: `public()`, `evaluation_input()`, `hidden()`.

Candidates

`CandidateArtifact`

Fields: `patch`, `workspace`, `stdout`, `stderr`, `metadata`.

Method: `for_task(task) -> str`

`CandidateProducer`

Abstract method: `produce(task, **context) -> CandidateArtifact`
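
A producer is any object that implements `produce`. The sketch below shows the shape of a subclass using hypothetical stand-ins (`ArtifactStandIn`, `ProducerStandIn`) instead of the real classes, so it runs without securebench. The field types and defaults are assumptions, since only the field names are documented.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ArtifactStandIn:
    """Hypothetical stand-in with the documented CandidateArtifact field names."""
    patch: str = ""
    workspace: Optional[str] = None
    stdout: str = ""
    stderr: str = ""
    metadata: dict = field(default_factory=dict)


class ProducerStandIn(ABC):
    """Hypothetical stand-in for the CandidateProducer interface."""

    @abstractmethod
    def produce(self, task: Any, **context: Any) -> ArtifactStandIn:
        ...


class FixedPatchProducer(ProducerStandIn):
    """Returns the same patch for every task; handy as a test double."""

    def __init__(self, patch: str) -> None:
        self._patch = patch

    def produce(self, task: Any, **context: Any) -> ArtifactStandIn:
        # Record which context keys were passed so tests can inspect them.
        return ArtifactStandIn(
            patch=self._patch,
            metadata={"context_keys": sorted(context)},
        )


artifact = FixedPatchProducer("--- a/x\n+++ b/x\n").produce({"id": "t1"}, attempt=1)
```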

Extraction

`default_extraction_spec`, `extract_candidate`, `extraction_instructions`, `CandidateProductionTimeout`

Verifiers

`Verifier`

Abstract method: `verify(task, candidate, **context) -> VerificationResult`

`VerificationResult`

Fields: `task_id`, `status`, `passed`, `score`, `stdout`, `stderr`, `metadata`.
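
A custom verifier implements `verify` and returns a result carrying the fields listed above. The following sketch uses hypothetical stand-ins (`ResultStandIn`, `VerifierStandIn`) and plain dicts for the task and candidate. The field types, the `"complete"` status string, and the scoring rule are assumptions chosen for illustration.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ResultStandIn:
    """Hypothetical stand-in with the documented VerificationResult field names."""
    task_id: str
    status: str
    passed: Optional[bool]
    score: float
    stdout: str = ""
    stderr: str = ""
    metadata: dict = field(default_factory=dict)


class VerifierStandIn(ABC):
    """Hypothetical stand-in for the Verifier interface."""

    @abstractmethod
    def verify(self, task: Any, candidate: Any, **context: Any) -> ResultStandIn:
        ...


class NonEmptyPatchVerifier(VerifierStandIn):
    """Passes when the candidate carries a non-empty patch string."""

    def verify(self, task: Any, candidate: Any, **context: Any) -> ResultStandIn:
        ok = bool(candidate.get("patch", "").strip())
        return ResultStandIn(
            task_id=task["id"],
            status="complete",  # placeholder status value
            passed=ok,
            score=1.0 if ok else 0.0,
        )


good = NonEmptyPatchVerifier().verify({"id": "t1"}, {"patch": "diff --git a/x b/x"})
bad = NonEmptyPatchVerifier().verify({"id": "t2"}, {"patch": "   "})
```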

`verifier_for_task_type(task_type) -> Verifier | None`

Implemented task types: `repo_patch`, `terminal_task`.
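
Since the lookup can return `None`, callers need to handle task types without a verifier. A minimal sketch of that pattern follows. `lookup_verifier`, `EchoVerifier`, and `plan` are hypothetical stand-ins, and `web_exploit` is a made-up task type used only to show the fallback path.

```python
from typing import Optional


class EchoVerifier:
    """Hypothetical stand-in; real verifiers implement Verifier.verify."""

    def __init__(self, task_type: str) -> None:
        self.task_type = task_type


# The two task types documented as implemented.
_IMPLEMENTED = {"repo_patch", "terminal_task"}


def lookup_verifier(task_type: str) -> Optional[EchoVerifier]:
    """Mimic the documented return shape: a verifier, or None if unsupported."""
    return EchoVerifier(task_type) if task_type in _IMPLEMENTED else None


def plan(task_types: list[str]) -> dict[str, str]:
    """Decide per task type whether verification can run."""
    decisions = {}
    for task_type in task_types:
        verifier = lookup_verifier(task_type)
        decisions[task_type] = "verify" if verifier is not None else "skip"
    return decisions


decisions = plan(["repo_patch", "terminal_task", "web_exploit"])
```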

Harnesses

`build_harness_producer(harness, *, workspace_root=None) -> CandidateProducer`

Not exported from the top-level `__init__`, but stable for integrations:


from securebench.harnesses import build_harness_producer

Sandboxes


from securebench.sandboxes import DockerSandbox, HostSandbox, PolicySandbox, CommandPolicy

`Sandbox`

Methods: `run`, `write_file`, `read_file`, `extract_file`.

`CommandResult`

Fields: `command`, `exit_code`, `stdout`, `stderr`, `timed_out`, `timeout_seconds`.

Timeout exit code constant: `TIMEOUT_EXIT_CODE = 124`.
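
The value 124 matches the exit status that GNU `timeout` reports when a command times out. As a sketch of how a host-side runner might fill these fields, assuming the standard library `subprocess` module rather than the actual sandbox implementation: `CommandResultStandIn` and `run_command` are hypothetical names, and the field types are assumptions.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Optional

TIMEOUT_EXIT_CODE = 124  # same value as the documented constant


@dataclass
class CommandResultStandIn:
    """Hypothetical stand-in with the documented CommandResult field names."""
    command: list
    exit_code: int
    stdout: str
    stderr: str
    timed_out: bool
    timeout_seconds: Optional[float]


def _text(value) -> str:
    """Normalise captured output, which may be None, bytes, or str."""
    if value is None:
        return ""
    return value.decode(errors="replace") if isinstance(value, bytes) else value


def run_command(command: list, timeout_seconds: Optional[float] = None) -> CommandResultStandIn:
    """Run a command and map a timeout onto TIMEOUT_EXIT_CODE."""
    try:
        proc = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout_seconds
        )
    except subprocess.TimeoutExpired as exc:
        return CommandResultStandIn(
            command, TIMEOUT_EXIT_CODE, _text(exc.stdout), _text(exc.stderr),
            True, timeout_seconds,
        )
    return CommandResultStandIn(
        command, proc.returncode, proc.stdout, proc.stderr, False, timeout_seconds
    )


fast = run_command([sys.executable, "-c", "print('ok')"], timeout_seconds=30)
slow = run_command([sys.executable, "-c", "import time; time.sleep(5)"], timeout_seconds=0.5)
```

Checking `timed_out` is more robust than comparing `exit_code` to 124, because a command can also exit with 124 on its own.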

Families


from securebench.families import (
    family_contract_for,
    validate_benchmark_row_family,
    known_family_contracts,
)

Workspaces


from securebench.workspaces.materialization import VisibilityAwareMaterializer
from securebench.workspaces.path_policy import validate_workspace_mount_for_component

Errors


from securebench.errors import ConfigError

Environment


from securebench.env import load_env_file

Progress


from securebench.progress import (
    NullProgressReporter,
    StreamProgressReporter,
    emit_progress,
    progress_context,
)

See Progress events.
