Forge Reliability Scoreboard

Community-generated AI reliability benchmarks — cryptographically signed, tamper-evident.

Not a vendor benchmark. Not self-reported. Real inference. Real signatures.

Models Tested: 52
Signed Runs: 118
Scenario Categories: 8
Signature Algorithm: Ed25519

90–100%: Excellent (strong adversarial reliability)
75–89%: Good (reliable with minor gaps)
<75%: Needs work (significant weaknesses)
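
The three score bands can be read as a simple threshold lookup. The sketch below is illustrative only: the function name `reliability_band` is hypothetical, and placing the cut-offs at exactly 90 and 75 is an assumption, since the page does not say how a score between 89% and 90% is classified.

```python
def reliability_band(score_pct: float) -> str:
    """Map a percentage score to a scoreboard band label.

    Assumption: a score of 90 or more is "Excellent" and a score of
    75 or more is "Good"; the page does not define the exact boundaries.
    """
    if not 0.0 <= score_pct <= 100.0:
        raise ValueError(f"score must be within 0-100, got {score_pct}")
    if score_pct >= 90.0:
        return "Excellent"
    if score_pct >= 75.0:
        return "Good"
    return "Needs work"


if __name__ == "__main__":
    # Average scores taken from the rankings table.
    for model, avg in [
        ("Qwen/Qwen2.5-Coder-14B-Instruct-AWQ", 93.2),
        ("TheBloke/Llama-2-7B-Chat-AWQ", 75.7),
        ("casperhansen/mixtral-instruct-awq", 74.2),
    ]:
        print(f"{model}: {avg}% -> {reliability_band(avg)}")
```

Under these assumed thresholds, only the top-ranked model lands in the Excellent band, and the Good band ends at rank 16.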

Reliability Rankings

# Model Avg Score Best Score Runs Scenarios Latest Report
1 Qwen/Qwen2.5-Coder-14B-Instruct-AWQ 93.2% 93.2% 2 74 62fe4842fcae…
2 TitanML/Llama3-OpenBioLLM-70B-AWQ-4bit 84.2% 84.5% 2 161 cbaa89e50c9b…
3 stelterlab/phi-4-AWQ 83.8% 83.8% 2 74 2bdcd83755bb…
4 hugging-quants/Meta-Llama-3.1-405B-Instruct-AWQ-INT4 83.5% 83.9% 2 161 9e3d2b71dec3…
5 hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4 81.4% 81.4% 4 161 b48017adfee9…
6 solidrust/gemma-2-9b-it-AWQ 81.1% 81.1% 2 74 2a3af65746d5…
7 casperhansen/llama-3-8b-instruct-awq 81.1% 81.1% 2 74 8d6277f5c429…
8 Qwen/Qwen2.5-Coder-7B-Instruct-AWQ 81.1% 81.1% 2 74 a3732226e714…
9 Qwen/Qwen2.5-72B-Instruct-AWQ 80.8% 80.8% 2 161 e649c6682656…
10 casperhansen/llama-3-70b-instruct-awq 80.1% 80.1% 4 161 1b3d5f6bf755…
11 gghfez/Mistral-Small-3.2-24B-Instruct-hf-AWQ 77.8% 81.1% 4 74 315fb5a5696a…
12 stelterlab/NVIDIA-Nemotron-3-Nano-30B-A3B-AWQ 77.7% 78.4% 2 74 7e5d464be7bb…
13 Qwen/Qwen2.5-Coder-32B-Instruct-AWQ 76.4% 77% 2 74 20e7e330151e…
14 Qwen/Qwen2.5-32B-Instruct-AWQ 75.9% 77.6% 6 161 38e675ffc366…
15 Valdemardi/DeepSeek-R1-Distill-Llama-70B-AWQ 75.8% 75.8% 2 161 ee1d4f1f1c68…
16 TheBloke/Llama-2-7B-Chat-AWQ 75.7% 75.7% 2 74 c9b6869ff2b9…
17 casperhansen/mixtral-instruct-awq 74.2% 74.5% 2 161 e05253676c39…
18 Qwen/Qwen3-14B-AWQ 73% 73% 2 74 6a0320159671…
19 Qwen/Qwen2.5-Coder-3B-Instruct-AWQ 72.3% 73% 2 74 8f2a5b01ab61…
20 Qwen/Qwen2.5-1.5B-Instruct-AWQ 72.3% 73% 2 74 c4ad2b39236f…
21 casperhansen/llama-3.3-70b-instruct-awq 72.1% 72.1% 6 161 25f823d8bcb6…
22 cyankiwi/Nemotron-Orchestrator-8B-AWQ-4bit 71.6% 71.6% 2 74 6538e56364de…
23 hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4 71.6% 71.6% 2 74 27b3f09cf1d6…
24 Qwen/Qwen2-7B-Instruct-AWQ 70.9% 71.6% 2 74 adc456efa45f…
25 LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct-AWQ 70.3% 70.3% 2 74 083a5844b10f…
26 Qwen/Qwen3-8B-AWQ 70.3% 70.3% 2 74 3a46b1c666d1…
27 Qwen/Qwen3-4B-AWQ 70.3% 70.3% 2 74 238fb7a8f2eb…
28 Qwen/Qwen2.5-7B-Instruct-AWQ 69.6% 70.3% 2 74 073523aaaa74…
29 Valdemardi/DeepSeek-R1-Distill-Qwen-32B-AWQ 68.9% 68.9% 2 161 1fdcf63a9180…
30 cyankiwi/GLM-4.7-Flash-AWQ-4bit 68.9% 68.9% 2 74 c50af6ebd091…
31 Qwen/Qwen3-32B-AWQ 68.3% 68.3% 2 161 787244443b55…
32 Qwen/QwQ-32B-AWQ 68.3% 68.3% 2 161 1be0eeeb2095…
33 Qwen/Qwen2.5-14B-Instruct-AWQ 67.6% 67.6% 2 74 15ace62cc75e…
34 Qwen/Qwen2.5-3B-Instruct 66.2% 66.2% 2 74 bc9373c5618f…
35 solidrust/Phi-3-mini-4k-instruct-AWQ 66.2% 66.2% 2 74 c243aa340f7e…
36 Qwen/Qwen2.5-3B-Instruct-AWQ 64.9% 64.9% 2 74 4d5623aa43fb…
37 stelterlab/Mistral-Small-24B-Instruct-2501-AWQ 62.2% 62.2% 2 74 ffa9976e7260…
38 Qwen/Qwen2-1.5B-Instruct-AWQ 62.2% 62.2% 2 74 5a142d54943e…
39 casperhansen/mistral-nemo-instruct-2407-awq 60.1% 60.8% 2 74 7c066bc021eb…
40 Qwen/Qwen2.5-Coder-1.5B-Instruct-AWQ 60.1% 60.8% 2 74 48436f56ed74…
41 speakleash/Bielik-11B-v3.0-Instruct-awq 55.4% 56.8% 2 74 99fb78105fe6…
42 TheBloke/Mistral-7B-Instruct-v0.2-AWQ 49.3% 50% 2 74 1a6d918ccae4…
43 Qwen/Qwen2.5-0.5B-Instruct-AWQ 45.3% 48.7% 2 74 f61cfc37ae37…
44 TechxGenus/DeepSeek-Coder-V2-Lite-Instruct-AWQ 43.9% 46% 2 74 6b509ced9ec4…
45 solidrust/Mistral-7B-Instruct-v0.3-AWQ 43.9% 44.6% 2 74 cc47ffbd3592…
46 TheBloke/OpenHermes-2.5-Mistral-7B-AWQ 42.6% 43.2% 2 74 48f33ceacc52…
47 LGAI-EXAONE/EXAONE-4.0-32B-AWQ 41.6% 41.6% 2 161 12da2b539376…
48 casperhansen/mistral-7b-instruct-v0.1-awq 39.9% 40.5% 2 74 f0a91339b1a0…
49 TheBloke/zephyr-7B-beta-AWQ 39.9% 40.5% 2 74 46f3d3843ee0…
50 LGAI-EXAONE/EXAONE-4.0-1.2B-AWQ 39.2% 40.5% 2 74 509ac848736d…
51 TheBloke/Synthia-v3.0-11B-AWQ 33.8% 33.8% 2 74 bb875e4a813b…
52 TheBloke/TinyLlama-1.1B-Chat-v1.0-AWQ 29.1% 29.7% 2 74 0f647df1ba47…

Average score is the mean across all signed runs for that model. Each run's signature is verified against the submitter's Ed25519 machine key.
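
As a rough illustration of the averaging, the sketch below groups runs by model and derives the average score, best score, and run count. The function name `summarize_runs` and the rounding to one decimal place are assumptions, not the scoreboard's actual implementation; the sample scores come from the Recent Reports list.

```python
from collections import defaultdict


def summarize_runs(runs):
    """Group (model, score_pct) pairs and return {model: (avg, best, count)}.

    Assumption: the average is the arithmetic mean of the run scores,
    rounded to one decimal place for display.
    """
    scores_by_model = defaultdict(list)
    for model, score in runs:
        scores_by_model[model].append(score)
    return {
        model: (round(sum(scores) / len(scores), 1), max(scores), len(scores))
        for model, scores in scores_by_model.items()
    }


if __name__ == "__main__":
    runs = [
        ("casperhansen/mixtral-instruct-awq", 73.9),
        ("casperhansen/mixtral-instruct-awq", 74.5),
        ("TitanML/Llama3-OpenBioLLM-70B-AWQ-4bit", 84.5),
        ("TitanML/Llama3-OpenBioLLM-70B-AWQ-4bit", 83.9),
    ]
    summary = summarize_runs(runs)
    # Rank by average score, highest first, as in the rankings table.
    for model, (avg, best, count) in sorted(
        summary.items(), key=lambda item: item[1][0], reverse=True
    ):
        print(f"{model}: avg {avg}%, best {best}%, runs {count}")
```

For these two models the results match the rankings table: 84.2% average and 84.5% best for OpenBioLLM, 74.2% average and 74.5% best for Mixtral.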

Recent Reports

Run ID Model Score Submitted
25f823d8bcb66207 casperhansen/llama-3.3-70b-instruct-awq 72.1% 2026-06-08 11:34
f239306710a02d36 casperhansen/llama-3.3-70b-instruct-awq 72.1% 2026-06-08 10:32
cbaa89e50c9b6cef TitanML/Llama3-OpenBioLLM-70B-AWQ-4bit 84.5% 2026-06-08 09:28
1abb10dd187f78b8 TitanML/Llama3-OpenBioLLM-70B-AWQ-4bit 83.9% 2026-06-08 08:53
ee1d4f1f1c6853df Valdemardi/DeepSeek-R1-Distill-Llama-70B-AWQ 75.8% 2026-06-08 08:13
6144551c0ac0d07f Valdemardi/DeepSeek-R1-Distill-Llama-70B-AWQ 75.8% 2026-06-08 04:00
315fb5a5696a96bf gghfez/Mistral-Small-3.2-24B-Instruct-hf-AWQ 75.2% 2026-06-08 00:50
1fdcf63a9180984c Valdemardi/DeepSeek-R1-Distill-Qwen-32B-AWQ 68.9% 2026-06-08 00:38
38e675ffc366b2ba Qwen/Qwen2.5-32B-Instruct-AWQ 77.6% 2026-06-08 00:10
c6972fe358fa4e78 gghfez/Mistral-Small-3.2-24B-Instruct-hf-AWQ 75.2% 2026-06-07 23:45
1b3d5f6bf755659e casperhansen/llama-3-70b-instruct-awq 80.1% 2026-06-07 23:35
8faaddc770367976 Qwen/Qwen2.5-32B-Instruct-AWQ 77% 2026-06-07 23:23
18c41471d2144c7b casperhansen/llama-3-70b-instruct-awq 80.1% 2026-06-07 22:43
2cefe70d6ef44ad6 casperhansen/llama-3.3-70b-instruct-awq 72.1% 2026-06-07 21:47
9584df43e0e9ceac Valdemardi/DeepSeek-R1-Distill-Qwen-32B-AWQ 68.9% 2026-06-07 21:26
a59e39db5aeed1f9 casperhansen/llama-3.3-70b-instruct-awq 72.1% 2026-06-07 20:45
b48017adfee9239e hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4 81.4% 2026-06-07 19:39
e05253676c397fdc casperhansen/mixtral-instruct-awq 73.9% 2026-06-07 19:21
039d3e4015f59317 hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4 81.4% 2026-06-07 18:52
44588fefd43976ce casperhansen/mixtral-instruct-awq 74.5% 2026-06-07 18:43

Add your model to the scoreboard

forge break --model qwen3:14b --share

Requires a Forge BPoS passport. Results are signed with your machine's Ed25519 key and verified server-side before appearing. Forge is model-agnostic — local models, OpenAI, Anthropic, and any Ollama-compatible backend are all supported.
