Jury Run · fake

Disputatio Fake E2E Fixture 20260601T201438Z

Debate · 2f30e730-c722-43fd-bb58-4b123c6ee704

Status: completed
Models: 3/3 · 0 failed/partial
Average score: 7.06 (completed judgements)
Tokens: 3805 (recorded usage)
Cost: $0.000000 (estimated/actual ledger)
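The headline figures can be re-derived from the three model cards: the average score equals the arithmetic mean of the three total scores (7.03, 6.33, 7.83), and the token count equals their sum (1240 + 1310 + 1255). Below is a minimal sketch, assuming the average is taken over completed judgements only. The `Judgement` record, its field names, and `run_summary` are hypothetical and not part of the tool.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Judgement:
    # Hypothetical record mirroring the fields shown on each model card.
    model: str
    status: str
    total_score: float
    tokens: int


def run_summary(judgements):
    """Return (completed, failed_or_partial, average_score, total_tokens)."""
    completed = [j for j in judgements if j.status == "completed"]
    failed = len(judgements) - len(completed)
    # Assumption: the average covers completed judgements only.
    average = round(mean(j.total_score for j in completed), 2) if completed else None
    # Assumption: recorded token usage is summed across every judgement.
    tokens = sum(j.tokens for j in judgements)
    return len(completed), failed, average, tokens


judgements = [
    Judgement("model_a_budget", "completed", 7.03, 1240),
    Judgement("model_b_reasoning", "completed", 6.33, 1310),
    Judgement("model_c_contrast", "completed", 7.83, 1255),
]
print(run_summary(judgements))  # (3, 0, 7.06, 3805)
```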

Aggregation

Consensus and disagreement should stay visible.
Consensus: 7.03 (aggregate score)
Divergence: 1.50 (model disagreement)

Deterministic fake aggregation completed: the consensus is the median of the model total scores, reported together with the divergence between models.
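With three judges, the median is simply the middle total score, 7.03. The page does not define divergence. The max-minus-min spread of the totals (7.83 - 6.33) reproduces the 1.50 shown, so the sketch below uses that spread as an assumption. `aggregate` is a hypothetical helper, not the tool's own function.

```python
from statistics import median


def aggregate(total_scores):
    """Median consensus plus a spread-based divergence measure.

    Assumption: divergence is the range (max - min) of the model totals;
    the run page does not define it, but the range reproduces the 1.50 shown.
    """
    if not total_scores:
        raise ValueError("no completed judgements to aggregate")
    consensus = median(total_scores)
    divergence = max(total_scores) - min(total_scores)
    return round(consensus, 2), round(divergence, 2)


print(aggregate([7.03, 6.33, 7.83]))  # (7.03, 1.5)
```

The median keeps a single outlying judge from pulling the consensus toward it. Reporting a spread alongside it keeps the disagreement visible instead of averaging it away.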

Model Judgements

Free models may fail to return compliant JSON; partial completion is expected.
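The separate JSON and Schema flags on each card suggest two checks: the response must parse as JSON, and the parsed object must then match the rubric. The sketch below assumes both checks. The response schema (a `dimensions` object holding `score`, `confidence` and `reason`) and the 0-10 and 0-1 ranges are hypothetical. The nine dimension names come from the Score Dimensions table.

```python
import json

# The nine rubric dimensions listed in the Score Dimensions table.
DIMENSIONS = {
    "clarity", "context_fidelity", "counterarguments", "evidence",
    "factual_grounding", "fairness", "logic", "relevance",
    "rhetorical_manipulation",
}


def _valid_dimension(entry):
    # Assumption: scores lie on a 0-10 scale and confidences in [0, 1].
    return (
        isinstance(entry, dict)
        and isinstance(entry.get("score"), (int, float))
        and isinstance(entry.get("confidence"), (int, float))
        and 0 <= entry["score"] <= 10
        and 0 <= entry["confidence"] <= 1
    )


def check_judgement(raw_text):
    """Return (json_ok, schema_ok, parsed) for one raw model response.

    Hypothetical schema:
    {"dimensions": {name: {"score": ..., "confidence": ..., "reason": ...}}}
    """
    try:
        parsed = json.loads(raw_text)
    except json.JSONDecodeError:
        # Unparseable output (e.g. prose around the JSON, or a truncated reply).
        return False, False, None
    dims = parsed.get("dimensions") if isinstance(parsed, dict) else None
    schema_ok = (
        isinstance(dims, dict)
        and set(dims) == DIMENSIONS
        and all(_valid_dimension(entry) for entry in dims.values())
    )
    # Only schema-valid judgements are passed on to scoring and aggregation.
    return True, schema_ok, (parsed if schema_ok else None)


good = json.dumps({"dimensions": {
    name: {"score": 7.0, "confidence": 0.9, "reason": "ok"} for name in DIMENSIONS
}})
print(check_judgement(good)[:2])                  # (True, True)
print(check_judgement('{"dimensions": {}}')[:2])  # (True, False)
print(check_judgement("not json"))                # (False, False, None)
```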

model_a_budget

Status: completed · Total score: 7.03
JSON: True · Schema: True
Latency: 210 ms · Tokens: 1240 · Cost: $0
Dimensions: 9 · Provider: fake · Finish: n/a
Reasoning: 0 tokens · Attempts: 1

Fake Model A produced a deterministic fake judgement for deployment validation.

model_b_reasoning

Status: completed · Total score: 6.33
JSON: True · Schema: True
Latency: 310 ms · Tokens: 1310 · Cost: $0
Dimensions: 9 · Provider: fake · Finish: n/a
Reasoning: 0 tokens · Attempts: 1

Fake Model B produced a deterministic fake judgement for deployment validation.

model_c_contrast

Status: completed · Total score: 7.83
JSON: True · Schema: True
Latency: 260 ms · Tokens: 1255 · Cost: $0
Dimensions: 9 · Provider: fake · Finish: n/a
Reasoning: 0 tokens · Attempts: 1

Fake Model C produced a deterministic fake judgement for deployment validation.

Score Dimensions

Only schema-valid model judgements are shown here. Invalid JSON responses stay visible in the model cards and raw artifacts.

Model | Dimension | Score | Confidence | Reason
model_a_budget | clarity | 7.40 | 0.9000 | Deterministic fake score for clarity.
model_a_budget | context_fidelity | 7.20 | 0.9000 | Deterministic fake score for context_fidelity.
model_a_budget | counterarguments | 7.10 | 0.9000 | Deterministic fake score for counterarguments.
model_a_budget | evidence | 6.70 | 0.9000 | Deterministic fake score for evidence.
model_a_budget | factual_grounding | 6.80 | 0.9000 | Deterministic fake score for factual_grounding.
model_a_budget | fairness | 7.00 | 0.9000 | Deterministic fake score for fairness.
model_a_budget | logic | 7.20 | 0.9000 | Deterministic fake score for logic.
model_a_budget | relevance | 7.30 | 0.9000 | Deterministic fake score for relevance.
model_a_budget | rhetorical_manipulation | 6.60 | 0.9000 | Deterministic fake score for rhetorical_manipulation.
model_b_reasoning | clarity | 6.70 | 0.9000 | Deterministic fake score for clarity.
model_b_reasoning | context_fidelity | 6.50 | 0.9000 | Deterministic fake score for context_fidelity.
model_b_reasoning | counterarguments | 6.40 | 0.9000 | Deterministic fake score for counterarguments.
model_b_reasoning | evidence | 6.00 | 0.9000 | Deterministic fake score for evidence.
model_b_reasoning | factual_grounding | 6.10 | 0.9000 | Deterministic fake score for factual_grounding.
model_b_reasoning | fairness | 6.30 | 0.9000 | Deterministic fake score for fairness.
model_b_reasoning | logic | 6.50 | 0.9000 | Deterministic fake score for logic.
model_b_reasoning | relevance | 6.60 | 0.9000 | Deterministic fake score for relevance.
model_b_reasoning | rhetorical_manipulation | 5.90 | 0.9000 | Deterministic fake score for rhetorical_manipulation.
model_c_contrast | clarity | 8.20 | 0.9000 | Deterministic fake score for clarity.
model_c_contrast | context_fidelity | 8.00 | 0.9000 | Deterministic fake score for context_fidelity.
model_c_contrast | counterarguments | 7.90 | 0.9000 | Deterministic fake score for counterarguments.
model_c_contrast | evidence | 7.50 | 0.9000 | Deterministic fake score for evidence.
model_c_contrast | factual_grounding | 7.60 | 0.9000 | Deterministic fake score for factual_grounding.
model_c_contrast | fairness | 7.80 | 0.9000 | Deterministic fake score for fairness.
model_c_contrast | logic | 8.00 | 0.9000 | Deterministic fake score for logic.
model_c_contrast | relevance | 8.10 | 0.9000 | Deterministic fake score for relevance.
model_c_contrast | rhetorical_manipulation | 7.40 | 0.9000 | Deterministic fake score for rhetorical_manipulation.
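Each model's total score on its card matches the unweighted mean of its nine dimension scores. For example, model_a_budget's scores sum to 63.3, and 63.3 / 9 rounds to 7.03. The page does not state the weighting, so the sketch treats equal weights as an assumption and leaves room for custom weights. `total_score` and `SCORES` are illustrative names.

```python
# Dimension scores copied from the table above, in the table's dimension order:
# clarity, context_fidelity, counterarguments, evidence, factual_grounding,
# fairness, logic, relevance, rhetorical_manipulation.
SCORES = {
    "model_a_budget":    [7.40, 7.20, 7.10, 6.70, 6.80, 7.00, 7.20, 7.30, 6.60],
    "model_b_reasoning": [6.70, 6.50, 6.40, 6.00, 6.10, 6.30, 6.50, 6.60, 5.90],
    "model_c_contrast":  [8.20, 8.00, 7.90, 7.50, 7.60, 7.80, 8.00, 8.10, 7.40],
}


def total_score(dimension_scores, weights=None):
    """Weighted mean of dimension scores; equal weights when none are given.

    Assumption: the run page does not state the weighting. Equal weights
    reproduce the totals shown on the model cards.
    """
    if weights is None:
        weights = [1.0] * len(dimension_scores)
    weighted = sum(s * w for s, w in zip(dimension_scores, weights))
    return round(weighted / sum(weights), 2)


for model, scores in SCORES.items():
    print(model, total_score(scores))
# model_a_budget 7.03
# model_b_reasoning 6.33
# model_c_contrast 7.83
```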

Artifacts

Kind | Path | Size and checksum
report_md | reports/fake_jury_2f30e730-c722-43fd-bb58-4b123c6ee704.md | 669eddd5bff7a5963ef
report_html | reports/fake_jury_2f30e730-c722-43fd-bb58-4b123c6ee704.html | 4398676be3ed1c0171e

Manifest

Software: 0.4.0
Prompt Hash: 5358e5ba89ce6b5e2b
Rubric Hash: c9144dd5d4c5fcd823
Input Hash: 1e566c63812e138299
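The manifest pins the software version and fingerprints the prompt, rubric and input. Two runs can therefore be compared for whether they judged the same material. The sketch below makes several assumptions: SHA-256, canonical JSON for structured payloads, and an 18-character hex prefix that matches the manifest's display length. The real hashing scheme is not shown, so these digests will not reproduce the values above. `content_hash`, `manifest_matches`, and the example payloads are hypothetical.

```python
import hashlib
import json


def content_hash(value, length=18):
    """Truncated SHA-256 fingerprint of a prompt, rubric, or input payload.

    Assumptions (not confirmed by the run page): SHA-256, canonical JSON for
    non-string payloads, and an 18-character hex prefix.
    """
    if isinstance(value, str):
        data = value.encode("utf-8")
    else:
        # Sorted keys and fixed separators make the byte encoding deterministic.
        data = json.dumps(value, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(data).hexdigest()[:length]


def manifest_matches(recorded, prompt, rubric, inputs):
    """True when every recorded hash matches one recomputed from current inputs."""
    current = {
        "prompt_hash": content_hash(prompt),
        "rubric_hash": content_hash(rubric),
        "input_hash": content_hash(inputs),
    }
    return current == recorded


# Hypothetical prompt, rubric, and input payloads, for illustration only.
prompt = "Score the debate on each rubric dimension."
rubric = {"dimensions": ["clarity", "logic", "relevance"]}
inputs = {"debate_id": "2f30e730-c722-43fd-bb58-4b123c6ee704"}
recorded = {
    "prompt_hash": content_hash(prompt),
    "rubric_hash": content_hash(rubric),
    "input_hash": content_hash(inputs),
}
print(manifest_matches(recorded, prompt, rubric, inputs))        # True
print(manifest_matches(recorded, prompt + "!", rubric, inputs))  # False
```

Under these assumptions, any edit to the prompt, rubric or input changes its fingerprint. Matching hashes together with a matching software version then indicate that two runs judged identical material.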