Forge Reliability Scoreboard

Models Tested: 52

Signed Runs: 118

Scenario Categories

Signature Algorithm: Ed25519

90–100%: Excellent (strong adversarial reliability)

75–89%: Good (reliable with minor gaps)

<75%: Needs work (significant weaknesses)
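The bands above can be read as a simple threshold lookup. The sketch below is illustrative only: the function name `reliability_band` is hypothetical, and it assumes that scores between 89% and 90% (which the published bands do not explicitly cover) fall into "Good". The three sample scores are average scores taken from the rankings table.

```python
def reliability_band(score_percent: float) -> str:
    """Map a percentage score to a scoreboard band label.

    Assumption: the boundaries are inclusive lower bounds at 90 and 75,
    so a score such as 89.5 is treated as "Good".
    """
    if not 0.0 <= score_percent <= 100.0:
        raise ValueError(f"score out of range: {score_percent}")
    if score_percent >= 90.0:
        return "Excellent"
    if score_percent >= 75.0:
        return "Good"
    return "Needs work"


# Average scores copied from the rankings table (ranks 1, 2 and 17).
samples = {
    "Qwen/Qwen2.5-Coder-14B-Instruct-AWQ": 93.2,
    "TitanML/Llama3-OpenBioLLM-70B-AWQ-4bit": 84.2,
    "casperhansen/mixtral-instruct-awq": 74.2,
}

bands = {model: reliability_band(score) for model, score in samples.items()}
for model, band in bands.items():
    print(f"{model}: {band}")
```

Under these assumed boundaries, only the top-ranked model lands in "Excellent", and the cut between "Good" and "Needs work" falls between ranks 16 and 17.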


Reliability Rankings

#	Model	Avg Score	Best Score	Runs	Scenarios	Latest Report
1	Qwen/Qwen2.5-Coder-14B-Instruct-AWQ	93.2%	93.2%	2	74	62fe4842fcae…
2	TitanML/Llama3-OpenBioLLM-70B-AWQ-4bit	84.2%	84.5%	2	161	cbaa89e50c9b…
3	stelterlab/phi-4-AWQ	83.8%	83.8%	2	74	2bdcd83755bb…
4	hugging-quants/Meta-Llama-3.1-405B-Instruct-AWQ-INT4	83.5%	83.9%	2	161	9e3d2b71dec3…
5	hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4	81.4%	81.4%	4	161	b48017adfee9…
6	solidrust/gemma-2-9b-it-AWQ	81.1%	81.1%	2	74	2a3af65746d5…
7	casperhansen/llama-3-8b-instruct-awq	81.1%	81.1%	2	74	8d6277f5c429…
8	Qwen/Qwen2.5-Coder-7B-Instruct-AWQ	81.1%	81.1%	2	74	a3732226e714…
9	Qwen/Qwen2.5-72B-Instruct-AWQ	80.8%	80.8%	2	161	e649c6682656…
10	casperhansen/llama-3-70b-instruct-awq	80.1%	80.1%	4	161	1b3d5f6bf755…
11	gghfez/Mistral-Small-3.2-24B-Instruct-hf-AWQ	77.8%	81.1%	4	74	315fb5a5696a…
12	stelterlab/NVIDIA-Nemotron-3-Nano-30B-A3B-AWQ	77.7%	78.4%	2	74	7e5d464be7bb…
13	Qwen/Qwen2.5-Coder-32B-Instruct-AWQ	76.4%	77.0%	2	74	20e7e330151e…
14	Qwen/Qwen2.5-32B-Instruct-AWQ	75.9%	77.6%	6	161	38e675ffc366…
15	Valdemardi/DeepSeek-R1-Distill-Llama-70B-AWQ	75.8%	75.8%	2	161	ee1d4f1f1c68…
16	TheBloke/Llama-2-7B-Chat-AWQ	75.7%	75.7%	2	74	c9b6869ff2b9…
17	casperhansen/mixtral-instruct-awq	74.2%	74.5%	2	161	e05253676c39…
18	Qwen/Qwen3-14B-AWQ	73.0%	73.0%	2	74	6a0320159671…
19	Qwen/Qwen2.5-Coder-3B-Instruct-AWQ	72.3%	73.0%	2	74	8f2a5b01ab61…
20	Qwen/Qwen2.5-1.5B-Instruct-AWQ	72.3%	73.0%	2	74	c4ad2b39236f…
21	casperhansen/llama-3.3-70b-instruct-awq	72.1%	72.1%	6	161	25f823d8bcb6…
22	cyankiwi/Nemotron-Orchestrator-8B-AWQ-4bit	71.6%	71.6%	2	74	6538e56364de…
23	hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4	71.6%	71.6%	2	74	27b3f09cf1d6…
24	Qwen/Qwen2-7B-Instruct-AWQ	70.9%	71.6%	2	74	adc456efa45f…
25	LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct-AWQ	70.3%	70.3%	2	74	083a5844b10f…
26	Qwen/Qwen3-8B-AWQ	70.3%	70.3%	2	74	3a46b1c666d1…
27	Qwen/Qwen3-4B-AWQ	70.3%	70.3%	2	74	238fb7a8f2eb…
28	Qwen/Qwen2.5-7B-Instruct-AWQ	69.6%	70.3%	2	74	073523aaaa74…
29	Valdemardi/DeepSeek-R1-Distill-Qwen-32B-AWQ	68.9%	68.9%	2	161	1fdcf63a9180…
30	cyankiwi/GLM-4.7-Flash-AWQ-4bit	68.9%	68.9%	2	74	c50af6ebd091…
31	Qwen/Qwen3-32B-AWQ	68.3%	68.3%	2	161	787244443b55…
32	Qwen/QwQ-32B-AWQ	68.3%	68.3%	2	161	1be0eeeb2095…
33	Qwen/Qwen2.5-14B-Instruct-AWQ	67.6%	67.6%	2	74	15ace62cc75e…
34	Qwen/Qwen2.5-3B-Instruct	66.2%	66.2%	2	74	bc9373c5618f…
35	solidrust/Phi-3-mini-4k-instruct-AWQ	66.2%	66.2%	2	74	c243aa340f7e…
36	Qwen/Qwen2.5-3B-Instruct-AWQ	64.9%	64.9%	2	74	4d5623aa43fb…
37	stelterlab/Mistral-Small-24B-Instruct-2501-AWQ	62.2%	62.2%	2	74	ffa9976e7260…
38	Qwen/Qwen2-1.5B-Instruct-AWQ	62.2%	62.2%	2	74	5a142d54943e…
39	casperhansen/mistral-nemo-instruct-2407-awq	60.1%	60.8%	2	74	7c066bc021eb…
40	Qwen/Qwen2.5-Coder-1.5B-Instruct-AWQ	60.1%	60.8%	2	74	48436f56ed74…
41	speakleash/Bielik-11B-v3.0-Instruct-awq	55.4%	56.8%	2	74	99fb78105fe6…
42	TheBloke/Mistral-7B-Instruct-v0.2-AWQ	49.3%	50.0%	2	74	1a6d918ccae4…
43	Qwen/Qwen2.5-0.5B-Instruct-AWQ	45.3%	48.7%	2	74	f61cfc37ae37…
44	TechxGenus/DeepSeek-Coder-V2-Lite-Instruct-AWQ	43.9%	46.0%	2	74	6b509ced9ec4…
45	solidrust/Mistral-7B-Instruct-v0.3-AWQ	43.9%	44.6%	2	74	cc47ffbd3592…
46	TheBloke/OpenHermes-2.5-Mistral-7B-AWQ	42.6%	43.2%	2	74	48f33ceacc52…
47	LGAI-EXAONE/EXAONE-4.0-32B-AWQ	41.6%	41.6%	2	161	12da2b539376…
48	casperhansen/mistral-7b-instruct-v0.1-awq	39.9%	40.5%	2	74	f0a91339b1a0…
49	TheBloke/zephyr-7B-beta-AWQ	39.9%	40.5%	2	74	46f3d3843ee0…
50	LGAI-EXAONE/EXAONE-4.0-1.2B-AWQ	39.2%	40.5%	2	74	509ac848736d…
51	TheBloke/Synthia-v3.0-11B-AWQ	33.8%	33.8%	2	74	bb875e4a813b…
52	TheBloke/TinyLlama-1.1B-Chat-v1.0-AWQ	29.1%	29.7%	2	74	0f647df1ba47…

Avg Score is the mean over all signed runs for that model. Each run's signature is verified against the submitter's Ed25519 machine key.
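As a worked example, the two casperhansen/mixtral-instruct-awq runs listed under Recent Reports (73.9% and 74.5%) average to the 74.2% shown in the rankings, and the two TitanML/Llama3-OpenBioLLM-70B-AWQ-4bit runs (84.5% and 83.9%) average to 84.2%. The sketch below reproduces that arithmetic. The helper name `summarize_runs` is hypothetical, and rounding to one decimal place is an assumption based on how scores are displayed. Whether the server averages rounded or unrounded per-run scores is not stated.

```python
from statistics import fmean


def summarize_runs(scores: list[float]) -> dict:
    """Summarize one model's signed runs as average, best score, and run count.

    Assumption: scores are percentages and the displayed average is
    rounded to one decimal place.
    """
    if not scores:
        raise ValueError("at least one run is required")
    return {
        "avg": round(fmean(scores), 1),
        "best": max(scores),
        "runs": len(scores),
    }


# Per-run scores copied from the Recent Reports table.
mixtral = summarize_runs([73.9, 74.5])
openbio = summarize_runs([84.5, 83.9])

print(mixtral)  # {'avg': 74.2, 'best': 74.5, 'runs': 2}
print(openbio)  # {'avg': 84.2, 'best': 84.5, 'runs': 2}
```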

Recent Reports

Run ID	Model	Score	Submitted
25f823d8bcb66207	casperhansen/llama-3.3-70b-instruct-awq	72.1%	2026-06-08 11:34
f239306710a02d36	casperhansen/llama-3.3-70b-instruct-awq	72.1%	2026-06-08 10:32
cbaa89e50c9b6cef	TitanML/Llama3-OpenBioLLM-70B-AWQ-4bit	84.5%	2026-06-08 09:28
1abb10dd187f78b8	TitanML/Llama3-OpenBioLLM-70B-AWQ-4bit	83.9%	2026-06-08 08:53
ee1d4f1f1c6853df	Valdemardi/DeepSeek-R1-Distill-Llama-70B-AWQ	75.8%	2026-06-08 08:13
6144551c0ac0d07f	Valdemardi/DeepSeek-R1-Distill-Llama-70B-AWQ	75.8%	2026-06-08 04:00
315fb5a5696a96bf	gghfez/Mistral-Small-3.2-24B-Instruct-hf-AWQ	75.2%	2026-06-08 00:50
1fdcf63a9180984c	Valdemardi/DeepSeek-R1-Distill-Qwen-32B-AWQ	68.9%	2026-06-08 00:38
38e675ffc366b2ba	Qwen/Qwen2.5-32B-Instruct-AWQ	77.6%	2026-06-08 00:10
c6972fe358fa4e78	gghfez/Mistral-Small-3.2-24B-Instruct-hf-AWQ	75.2%	2026-06-07 23:45
1b3d5f6bf755659e	casperhansen/llama-3-70b-instruct-awq	80.1%	2026-06-07 23:35
8faaddc770367976	Qwen/Qwen2.5-32B-Instruct-AWQ	77.0%	2026-06-07 23:23
18c41471d2144c7b	casperhansen/llama-3-70b-instruct-awq	80.1%	2026-06-07 22:43
2cefe70d6ef44ad6	casperhansen/llama-3.3-70b-instruct-awq	72.1%	2026-06-07 21:47
9584df43e0e9ceac	Valdemardi/DeepSeek-R1-Distill-Qwen-32B-AWQ	68.9%	2026-06-07 21:26
a59e39db5aeed1f9	casperhansen/llama-3.3-70b-instruct-awq	72.1%	2026-06-07 20:45
b48017adfee9239e	hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4	81.4%	2026-06-07 19:39
e05253676c397fdc	casperhansen/mixtral-instruct-awq	73.9%	2026-06-07 19:21
039d3e4015f59317	hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4	81.4%	2026-06-07 18:52
44588fefd43976ce	casperhansen/mixtral-instruct-awq	74.5%	2026-06-07 18:43

Add your model to the scoreboard

forge break --model qwen3:14b --share

Sharing results requires a Forge BPoS passport. Results are signed with your machine's Ed25519 key and verified server-side before they appear on the scoreboard. Forge is model-agnostic: local models, OpenAI, Anthropic, and any Ollama-compatible backend are supported.
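To illustrate the general sign-then-verify flow described above, here is a minimal sketch using the third-party Python `cryptography` package. It is not Forge's implementation. The report fields, the canonical-JSON serialization, and the helper names (`canonical_bytes`, `sign_report`, `verify_report`) are all assumptions made for the example, and the score shown is made up.

```python
import json

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
    Ed25519PublicKey,
)


def canonical_bytes(report: dict) -> bytes:
    # Assumption: signer and verifier agree on a deterministic serialization.
    # Sorted keys and compact separators give the same bytes on both sides.
    return json.dumps(report, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_report(private_key: Ed25519PrivateKey, report: dict) -> bytes:
    """Client side: sign the serialized report with the machine key."""
    return private_key.sign(canonical_bytes(report))


def verify_report(public_key: Ed25519PublicKey, report: dict, signature: bytes) -> bool:
    """Server side: accept the report only if the signature matches."""
    try:
        public_key.verify(signature, canonical_bytes(report))
        return True
    except InvalidSignature:
        return False


# Hypothetical machine key and report; none of these values come from Forge.
machine_key = Ed25519PrivateKey.generate()
report = {"model": "example/model", "score": 80.0, "scenarios": 74}

signature = sign_report(machine_key, report)
ok = verify_report(machine_key.public_key(), report, signature)

# Changing any field after signing invalidates the signature.
tampered = dict(report, score=99.0)
tampered_ok = verify_report(machine_key.public_key(), tampered, signature)

print(ok, tampered_ok)
```

The point of the design is that the server needs only the submitter's public key: a score edited after signing fails verification, so it never reaches the scoreboard.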

GitHub · JSON API