Python API Reference

Public package exports from securebench/__init__.py:

from securebench import (
    TesterRunSummary,
    VerificationResult,
    Verifier,
    run_tester_config,
    verifier_for_task_type,
)

Runner

run_tester_config(config, *, limit=None, progress=None, resume=False) → TesterRunSummary

Main orchestration entry point. See securebench/tester_run.py.

TesterRunSummary

| Field / property | Type | Description |
| --- | --- | --- |
| run_id | str | From tester YAML |
| output_dir | str | Resolved output path |
| output_path | str | Path to candidates.jsonl |
| total | int | Record count |
| verified | int | Tasks with verifiers run |
| passed | int | Tasks where passed is True |
| score_sum | float | Sum of scores |
| verification_status | str | complete, partial, or pending |
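
The counters above lend themselves to simple derived metrics. The following is a minimal sketch, not part of securebench: `SummaryStandIn` is a hypothetical stand-in that mirrors only the documented fields, and the helpers assume that `score_sum` is accumulated over verified tasks, which the source does not confirm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SummaryStandIn:
    """Hypothetical stand-in mirroring the documented TesterRunSummary fields."""
    run_id: str
    output_dir: str
    output_path: str
    total: int
    verified: int
    passed: int
    score_sum: float
    verification_status: str


def pass_rate(summary) -> float:
    """Fraction of verified tasks that passed; 0.0 when nothing was verified."""
    return summary.passed / summary.verified if summary.verified else 0.0


def mean_score(summary) -> float:
    """Average score per verified task (assumes score_sum covers verified tasks)."""
    return summary.score_sum / summary.verified if summary.verified else 0.0


# Illustrative values only; not taken from a real run.
example = SummaryStandIn(
    run_id="demo",
    output_dir="out/demo",
    output_path="out/demo/candidates.jsonl",
    total=10,
    verified=8,
    passed=6,
    score_sum=6.5,
    verification_status="partial",
)
```

Because the helpers only read attributes, they should work unchanged on any object exposing `verified`, `passed`, and `score_sum`.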

with_tester_overrides(config, *, output_dir=None) → TesterConfig

Apply CLI-style overrides without mutating files.

verify_candidate(task, candidate, *, verification_policy=None) → VerificationResult | None

Verify one artifact outside the full runner loop.

candidate_record(run_id, task, candidate, verification=None) → dict

Serialize one JSONL record (used internally; useful for tests).
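
Since each record is one line of JSONL, run output can be read back with the standard library alone. A minimal sketch follows; the `run_id` and `task_id` keys in the sample data are hypothetical, and the real schema produced by candidate_record may differ.

```python
import json
import tempfile
from pathlib import Path


def read_jsonl(path):
    """Return one dict per non-empty line of a JSONL file."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


# Hypothetical records; the keys are illustrative, not the documented schema.
sample_records = [
    {"run_id": "demo", "task_id": "t1"},
    {"run_id": "demo", "task_id": "t2"},
]

# Write the samples to a temporary candidates.jsonl and read them back.
with tempfile.TemporaryDirectory() as tmp:
    jsonl_path = Path(tmp) / "candidates.jsonl"
    jsonl_path.write_text(
        "".join(json.dumps(record) + "\n" for record in sample_records),
        encoding="utf-8",
    )
    loaded = read_jsonl(jsonl_path)
```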

Configuration

load_tester_config(path) → TesterConfig

Parse tester YAML from path.

parse_tester_config(data, *, base_dir=None) → TesterConfig

Parse tester config from a dict (useful in tests).

TesterConfig dataclass

| Field | Type |
| --- | --- |
| schema_version | str |
| run | TesterRunSection |
| benchmark | TesterBenchmarkSection |
| harness | TesterHarnessSection |
| verification | VerificationPolicy |

Benchmark packs

load_benchmark_pack(manifest_path, tasks_path) → BenchmarkPack

load_benchmark_manifest(path) → BenchmarkPackManifest

compile_benchmark_pack(pack, *, limit=None) → Iterator[SecureBenchTask]

compile_benchmark_row(row, *, manifest) → SecureBenchTask

Tasks and resources

SecureBenchTask

Key methods: agent_payload(), public_payload(), evaluation_payload(), hidden_payload(), view_for(component), resource_summary().

task_from_spec(spec) → SecureBenchTask

Build task from normalized spec dict.

Resource helpers

resource_value, resource_text, optional_resource_text, resource_tuple, resource_text_mapping, resource_test_groups

ResourceBundle, Resource, ComponentView

See securebench/resources.py. Helpers: public(), evaluation_input(), hidden().

Candidates

CandidateArtifact

Fields: text, patch, workspace, stdout, stderr, metadata.

Method: for_task(task) → str

CandidateProducer

Abstract: produce(task, **context) → CandidateArtifact
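
To illustrate the shape of the produce contract, here is a minimal sketch built on stand-ins rather than the real classes. `ArtifactStandIn`, `ProducerStandIn`, and `CannedProducer` are hypothetical names, the field types are guesses, and tasks are assumed to be plain dicts with an "id" key purely for illustration.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ArtifactStandIn:
    """Hypothetical stand-in mirroring the documented CandidateArtifact fields."""
    text: Optional[str] = None
    patch: Optional[str] = None
    workspace: Optional[str] = None
    stdout: str = ""
    stderr: str = ""
    metadata: dict = field(default_factory=dict)


class ProducerStandIn(ABC):
    """Hypothetical stand-in for the abstract producer interface."""

    @abstractmethod
    def produce(self, task, **context):
        """Return an artifact for the given task."""


class CannedProducer(ProducerStandIn):
    """Returns pre-recorded answers; handy for exercising a pipeline in tests."""

    def __init__(self, answers):
        self.answers = answers

    def produce(self, task, **context):
        task_id = task["id"]  # assumption: tasks are dicts with an "id" key
        return ArtifactStandIn(
            text=self.answers.get(task_id, ""),
            metadata={"producer": "canned"},
        )
```

A real integration would subclass CandidateProducer and return a CandidateArtifact instead of these stand-ins.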

Extraction

default_extraction_spec, extract_candidate, extraction_instructions, CandidateProductionTimeout

Verifiers

Verifier

Abstract: verify(task, candidate, **context) → VerificationResult

VerificationResult

Fields: task_id, status, passed, score, stdout, stderr, metadata

verifier_for_task_type(task_type) → Verifier | None

Implemented types: multiple_choice, short_answer, free_response, code_completion, repo_patch, terminal_task.
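
A custom verifier follows the same verify contract. The sketch below uses stand-ins, not the real securebench classes: `ResultStandIn` mirrors the documented VerificationResult fields, while `ExactMatchVerifier`, the "verified" status string, and the dict-shaped task and candidate are all assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class ResultStandIn:
    """Hypothetical stand-in mirroring the documented VerificationResult fields."""
    task_id: str
    status: str
    passed: bool
    score: float
    stdout: str = ""
    stderr: str = ""
    metadata: dict = field(default_factory=dict)


class VerifierStandIn(ABC):
    """Hypothetical stand-in for the abstract verifier interface."""

    @abstractmethod
    def verify(self, task, candidate, **context):
        """Return a result describing whether the candidate solves the task."""


class ExactMatchVerifier(VerifierStandIn):
    """Passes when the candidate text equals the expected answer, ignoring outer whitespace."""

    def __init__(self, expected_by_task):
        self.expected_by_task = expected_by_task

    def verify(self, task, candidate, **context):
        task_id = task["id"]  # assumption: tasks are dicts with an "id" key
        expected = self.expected_by_task[task_id]
        passed = candidate["text"].strip() == expected
        return ResultStandIn(
            task_id=task_id,
            status="verified",  # hypothetical status value
            passed=passed,
            score=1.0 if passed else 0.0,
            metadata={"expected": expected},
        )
```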

Harnesses

build_harness_producer(harness, *, workspace_root=None) → CandidateProducer

Not exported from the top-level __init__, but stable for integrations:

from securebench.harnesses import build_harness_producer

Sandboxes

from securebench.sandboxes import DockerSandbox, HostSandbox, PolicySandbox, CommandPolicy

Sandbox methods

run, write_file, read_file, extract_file

CommandResult

Fields: command, exit_code, stdout, stderr, timed_out, timeout_seconds

Timeout exit code constant: TIMEOUT_EXIT_CODE = 124
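
The relationship between timed_out and the timeout exit code can be sketched with the standard library. This is not the securebench sandbox implementation: `CommandResultStandIn` and `run_with_timeout` are hypothetical, and the field types are assumptions. Only the field names and the value 124 come from the reference above.

```python
import subprocess
import sys
from dataclasses import dataclass

TIMEOUT_EXIT_CODE = 124  # value documented above


@dataclass
class CommandResultStandIn:
    """Hypothetical stand-in mirroring the documented CommandResult fields."""
    command: list
    exit_code: int
    stdout: str
    stderr: str
    timed_out: bool
    timeout_seconds: float


def _as_text(value):
    """Normalize partial output from TimeoutExpired, which may be None or bytes."""
    if value is None:
        return ""
    if isinstance(value, bytes):
        return value.decode("utf-8", errors="replace")
    return value


def run_with_timeout(command, timeout_seconds):
    """Run a command and report a timeout as TIMEOUT_EXIT_CODE."""
    try:
        completed = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout_seconds
        )
    except subprocess.TimeoutExpired as exc:
        return CommandResultStandIn(
            command=command,
            exit_code=TIMEOUT_EXIT_CODE,
            stdout=_as_text(exc.stdout),
            stderr=_as_text(exc.stderr),
            timed_out=True,
            timeout_seconds=timeout_seconds,
        )
    return CommandResultStandIn(
        command=command,
        exit_code=completed.returncode,
        stdout=completed.stdout,
        stderr=completed.stderr,
        timed_out=False,
        timeout_seconds=timeout_seconds,
    )
```

Callers can then branch on timed_out rather than comparing exit codes, since a command could in principle exit with 124 on its own.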

Families

from securebench.families import (
    family_contract_for,
    validate_benchmark_row_family,
    known_family_contracts,
)

Workspaces

from securebench.workspaces.materialization import VisibilityAwareMaterializer
from securebench.workspaces.path_policy import validate_workspace_mount_for_component

Errors

from securebench.errors import ConfigError

Environment

from securebench.env import load_env_file

Progress

from securebench.progress import (
    NullProgressReporter,
    StreamProgressReporter,
    emit_progress,
    progress_context,
)

See Progress Events.

Legacy code

legacy/securebench/ and legacy/securebench_agent/ contain archived adapters and the agent runtime. Neither is imported by the active CLI.
