Category

Maurizio Fonte - IT Consultant - Software Engineer and Freelance Cyber Security Specialist

Benchmarking Qwen3.6-35B-A3B on a 16GB RTX 5060 Ti: A Full Engineering Teardown

The engineering companion to the strategic piece on local inference, deliberately exhaustive. It covers llama.cpp build flags for Blackwell, VRAM accounting to the MiB, context ceilings per quantization, prefill and decode throughput with and without MTP, a roofline analysis of why speculative decoding helps this MoE, a 200-call agentic tool-calling harness, and an autopsy of a KV-cache compression technique that crashed, complete with its CUDA stack trace. Every figure was measured on one fixed rig.
Last modified: