Category

llama.cpp

Maurizio Fonte - IT Consultant - Software Engineer and Freelance Cyber Security Specialist

Benchmarking Qwen3.6-35B-A3B on a 16GB RTX 5060 Ti: A Full Engineering Teardown

16/06/2026

The engineering companion to the strategic piece on local inference, and a deliberately exhaustive one. It covers llama.cpp build flags for Blackwell, VRAM accounting to the MiB, context ceilings per quantization, prefill and decode throughput with and without MTP, a roofline analysis of why speculative decoding helps this MoE, a 200-call agentic tool-calling harness, and an autopsy, CUDA stack trace included, of a KV-cache compression technique that crashed. Every figure was measured on one fixed rig.

Last modified: Tuesday, 16 June 2026, at 14:04
