Python API Reference
Public package exports from securebench/__init__.py:
from securebench import (
    TesterRunSummary,
    VerificationResult,
    Verifier,
    run_tester_config,
    verifier_for_task_type,
)
Runner
run_tester_config(config, *, limit=None, progress=None, resume=False) → TesterRunSummary
Main orchestration entry point. See securebench/tester_run.py.
TesterRunSummary
| Field / property | Type | Description |
|---|---|---|
| run_id | str | From tester YAML |
| output_dir | str | Resolved output path |
| output_path | str | Path to candidates.jsonl |
| total | int | Record count |
| verified | int | Tasks with verifiers run |
| passed | int | Tasks where passed is True |
| score_sum | float | Sum of scores |
| verification_status | str | complete, partial, or pending |
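
The counters in this table are enough to derive a pass rate and a mean score. The following is a minimal sketch, not the library's own code: `SummaryStandIn`, `pass_rate`, and `mean_score` are hypothetical names introduced here, the stand-in mirrors only some of the documented fields, and it assumes that `score_sum` is accumulated over verified tasks only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SummaryStandIn:
    """Hypothetical stand-in mirroring some documented TesterRunSummary fields."""
    total: int
    verified: int
    passed: int
    score_sum: float
    verification_status: str


def pass_rate(summary) -> float:
    """Fraction of verified tasks that passed; 0.0 when nothing was verified."""
    return summary.passed / summary.verified if summary.verified else 0.0


def mean_score(summary) -> float:
    """Average score per verified task.

    Assumption: score_sum covers verified tasks only.
    """
    return summary.score_sum / summary.verified if summary.verified else 0.0


summary = SummaryStandIn(
    total=10, verified=8, passed=6, score_sum=6.5, verification_status="partial"
)
print("pass rate:", pass_rate(summary))
print("mean score:", mean_score(summary))
```

Because the helpers only read attributes, they should work unchanged on a real summary object, provided its fields behave as the table describes.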
with_tester_overrides(config, *, output_dir=None) → TesterConfig
Apply CLI-style overrides without mutating files.
verify_candidate(task, candidate, *, verification_policy=None) → VerificationResult | None
Verify one artifact outside the full runner loop.
candidate_record(run_id, task, candidate, verification=None) → dict
Serialize one JSONL record (used internally; useful for tests).
Configuration
load_tester_config(path) → TesterConfig
Parse tester YAML from path.
parse_tester_config(data, *, base_dir=None) → TesterConfig
Parse a tester config from a dict (useful in tests).
TesterConfig dataclass
| Field | Type |
|---|---|
| schema_version | str |
| run | TesterRunSection |
| benchmark | TesterBenchmarkSection |
| harness | TesterHarnessSection |
| verification | VerificationPolicy |
Benchmark packs
load_benchmark_pack(manifest_path, tasks_path) → BenchmarkPack
load_benchmark_manifest(path) → BenchmarkPackManifest
compile_benchmark_pack(pack, *, limit=None) → Iterator[SecureBenchTask]
compile_benchmark_row(row, *, manifest) → SecureBenchTask
Tasks and resources
SecureBenchTask
Key methods: agent_payload(), public_payload(), evaluation_payload(), hidden_payload(), view_for(component), resource_summary().
task_from_spec(spec) → SecureBenchTask
Build task from normalized spec dict.
Resource helpers
resource_value, resource_text, optional_resource_text, resource_tuple, resource_text_mapping, resource_test_groups
ResourceBundle, Resource, ComponentView
See securebench/resources.py. Helpers: public(), evaluation_input(), hidden().
Candidates
CandidateArtifact
Fields: text, patch, workspace, stdout, stderr, metadata.
Method: for_task(task) → str
CandidateProducer
Abstract: produce(task, **context) → CandidateArtifact
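To illustrate the shape of the `produce(task, **context)` contract, here is a minimal sketch built on stand-ins rather than the real classes. `ArtifactStandIn`, `ProducerStandIn`, and `FixedAnswerProducer` are hypothetical names; the field types and defaults are assumptions, since only the field names are documented here, and the task is modeled as a plain dict purely for illustration.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ArtifactStandIn:
    """Hypothetical stand-in with the documented CandidateArtifact field names.

    Types and defaults are assumptions, not taken from the library.
    """
    text: Optional[str] = None
    patch: Optional[str] = None
    workspace: Optional[str] = None
    stdout: str = ""
    stderr: str = ""
    metadata: dict = field(default_factory=dict)


class ProducerStandIn(ABC):
    """Hypothetical stand-in for the abstract CandidateProducer interface."""

    @abstractmethod
    def produce(self, task: Any, **context: Any) -> ArtifactStandIn:
        ...


class FixedAnswerProducer(ProducerStandIn):
    """Returns the same answer for every task; a simple baseline for tests."""

    def __init__(self, answer: str) -> None:
        self.answer = answer

    def produce(self, task: Any, **context: Any) -> ArtifactStandIn:
        # The task is ignored on purpose: this producer is a constant baseline.
        return ArtifactStandIn(text=self.answer, metadata={"producer": "fixed"})


artifact = FixedAnswerProducer("B").produce(task={"id": "demo-1"})
print(artifact.text, artifact.metadata)
```

A constant producer like this can be handy for exercising a verifier without invoking a model or harness.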
Extraction
default_extraction_spec, extract_candidate, extraction_instructions, CandidateProductionTimeout
Verifiers
Verifier
Abstract: verify(task, candidate, **context) → VerificationResult
VerificationResult
Fields: task_id, status, passed, score, stdout, stderr, metadata
verifier_for_task_type(task_type) → Verifier | None
Implemented types: multiple_choice, short_answer, free_response, code_completion, repo_patch, terminal_task.
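Since `verifier_for_task_type` may return `None`, callers need to handle task types that have no verifier. The sketch below shows one way to do that with stand-ins only. `ResultStandIn`, `VerifierStandIn`, `ExactMatchVerifier`, `lookup_verifier`, and `verify_or_skip` are hypothetical names, tasks and candidates are modeled as plain dicts, and the `status` value `"verified"` is a placeholder because the allowed values are not listed here.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ResultStandIn:
    """Hypothetical stand-in with the documented VerificationResult field names."""
    task_id: str
    status: str
    passed: bool
    score: float
    stdout: str = ""
    stderr: str = ""
    metadata: dict = field(default_factory=dict)


class VerifierStandIn(ABC):
    """Hypothetical stand-in for the abstract Verifier interface."""

    @abstractmethod
    def verify(self, task: Any, candidate: Any, **context: Any) -> ResultStandIn:
        ...


class ExactMatchVerifier(VerifierStandIn):
    """Passes when the candidate text equals the expected answer after trimming."""

    def verify(self, task: Any, candidate: Any, **context: Any) -> ResultStandIn:
        ok = candidate["text"].strip() == task["expected"].strip()
        return ResultStandIn(
            task_id=task["id"],
            status="verified",  # placeholder value (assumption)
            passed=ok,
            score=1.0 if ok else 0.0,
        )


# Hypothetical registry standing in for verifier_for_task_type(task_type).
_REGISTRY = {"multiple_choice": ExactMatchVerifier()}


def lookup_verifier(task_type: str) -> Optional[VerifierStandIn]:
    """Return the verifier for a task type, or None if there is none."""
    return _REGISTRY.get(task_type)


def verify_or_skip(task: dict, candidate: dict) -> Optional[ResultStandIn]:
    """Verify when a verifier exists; otherwise return None instead of raising."""
    verifier = lookup_verifier(task["type"])
    if verifier is None:
        return None
    return verifier.verify(task, candidate)


task = {"id": "t1", "type": "multiple_choice", "expected": "B"}
print(verify_or_skip(task, {"text": " B\n"}))
print(verify_or_skip({"id": "t2", "type": "unknown_type", "expected": ""}, {"text": ""}))
```

Returning `None` for an unknown type mirrors the `Verifier | None` return annotation shown above; whether the real runner skips or flags such tasks is not specified in this reference.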
Harnesses
build_harness_producer(harness, *, workspace_root=None) → CandidateProducer
Not exported from top-level __init__ but stable for integrations:
from securebench.harnesses import build_harness_producer
Sandboxes
from securebench.sandboxes import DockerSandbox, HostSandbox, PolicySandbox, CommandPolicy
Sandbox methods
run, write_file, read_file, extract_file
CommandResult
Fields: command, exit_code, stdout, stderr, timed_out, timeout_seconds
Timeout exit code constant: TIMEOUT_EXIT_CODE = 124
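The relationship between `timed_out`, `exit_code`, and `TIMEOUT_EXIT_CODE` can be sketched with the standard library alone. This is an illustration under stated assumptions, not the sandbox implementation: `CommandResultStandIn`, `run_with_timeout`, and `_as_text` are hypothetical names, the field types are guesses, and it assumes a timed-out command reports `exit_code == TIMEOUT_EXIT_CODE`. The value 124 is also the exit status GNU `timeout` uses when a command times out.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Optional, Sequence, Union

TIMEOUT_EXIT_CODE = 124  # value documented above


@dataclass(frozen=True)
class CommandResultStandIn:
    """Hypothetical stand-in with the documented CommandResult field names."""
    command: tuple
    exit_code: int
    stdout: str
    stderr: str
    timed_out: bool
    timeout_seconds: Optional[float]


def _as_text(data: Union[str, bytes, None]) -> str:
    """Normalize partial output attached to TimeoutExpired (may be bytes or None)."""
    if data is None:
        return ""
    return data.decode(errors="replace") if isinstance(data, bytes) else data


def run_with_timeout(command: Sequence[str], timeout_seconds: float) -> CommandResultStandIn:
    """Run a command on the host and map a timeout to TIMEOUT_EXIT_CODE."""
    try:
        proc = subprocess.run(
            list(command), capture_output=True, text=True, timeout=timeout_seconds
        )
    except subprocess.TimeoutExpired as exc:
        return CommandResultStandIn(
            tuple(command), TIMEOUT_EXIT_CODE,
            _as_text(exc.stdout), _as_text(exc.stderr), True, timeout_seconds,
        )
    return CommandResultStandIn(
        tuple(command), proc.returncode, proc.stdout, proc.stderr, False, timeout_seconds
    )


slow = run_with_timeout([sys.executable, "-c", "import time; time.sleep(5)"], 0.5)
fast = run_with_timeout([sys.executable, "-c", "print('ok')"], 30)
print(slow.timed_out, slow.exit_code)
print(fast.timed_out, fast.exit_code, fast.stdout.strip())
```

Checking `timed_out` rather than comparing `exit_code` to 124 is the safer test, since a command could in principle exit with 124 on its own.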
Families
from securebench.families import (
    family_contract_for,
    validate_benchmark_row_family,
    known_family_contracts,
)
Workspaces
from securebench.workspaces.materialization import VisibilityAwareMaterializer
from securebench.workspaces.path_policy import validate_workspace_mount_for_component
Errors
from securebench.errors import ConfigError
Environment
from securebench.env import load_env_file
Progress
from securebench.progress import (
    NullProgressReporter,
    StreamProgressReporter,
    emit_progress,
    progress_context,
)
See Progress Events.
Legacy code
legacy/securebench/ and legacy/securebench_agent/ contain archived adapters and the agent runtime. Neither is imported by the active CLI.