Hi everyone,
I’m using a model quantized with Brevitas. What I can’t figure out is:
How do Concrete-ML/FINN decide the accumulator bit-width for MAC operations?
In other words:
- Does Brevitas export something explicit about the accumulator precision (via the quantizer metadata / ONNX / QONNX) that Concrete-ML / FINN simply read?
- Or is the accumulator bit-width computed later, during conversion/graph analysis?
- If it's computed internally, where in the Concrete-ML codebase does this happen, and what rule is used? (Even just a pointer to the relevant function/file would be great.)
Accumulator bit-widths are not chosen explicitly in Concrete-ML. Instead, we statically derive the exact accumulator range at compilation time from the integer MAC operations, and allocate the minimal safe bit-width to avoid any overflow.
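To make the "minimal safe bit-width" idea concrete, here is a back-of-the-envelope sketch. This is not Concrete-ML code: the compiler derives tighter bounds from the actual integer ranges observed on the calibration data, but the worst-case reasoning looks like this.

```python
import math

def worst_case_accumulator_bits(n_terms: int, weight_bits: int, act_bits: int,
                                signed_weights: bool = True,
                                signed_acts: bool = False) -> int:
    """Upper bound on the accumulator bit-width for a dot product of
    `n_terms` products of `weight_bits`-bit weights and `act_bits`-bit
    activations. Illustrative only: Concrete-ML computes the exact range
    of each accumulator from the compiled integer graph, which is usually
    much tighter than this worst case."""
    # Largest magnitude each operand can take.
    max_w = 2 ** (weight_bits - 1) if signed_weights else 2 ** weight_bits - 1
    max_a = 2 ** (act_bits - 1) if signed_acts else 2 ** act_bits - 1
    # Worst-case accumulated magnitude, plus one bit for the sign.
    max_acc = n_terms * max_w * max_a
    return math.ceil(math.log2(max_acc + 1)) + 1

# Example: 512 MACs with 4-bit signed weights and 4-bit unsigned activations.
print(worst_case_accumulator_bits(512, 4, 4))  # -> 17
```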
That said, Concrete-ML provides the rounding_threshold_bits parameter, which rounds every accumulator down to the given bit-width, discarding the least-significant bits after accumulation (see Advanced features | Concrete ML). This effectively brings the accumulator down to a smaller, fixed precision. We typically set it to 6, as this offers a good trade-off between FHE performance and model accuracy.
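For reference, here is roughly where that parameter is passed when compiling a Brevitas model. This is a minimal sketch: torch_model and calib_data are placeholders for your own Brevitas-quantized module and a representative calibration set.

```python
from concrete.ml.torch.compile import compile_brevitas_qat_model

quantized_module = compile_brevitas_qat_model(
    torch_model,                 # Brevitas-quantized torch.nn.Module (placeholder)
    calib_data,                  # representative inputs used to calibrate integer ranges
    rounding_threshold_bits=6,   # round accumulators down to 6 bits of precision
)
```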
Thanks a lot for the clarification, that makes sense.
Just to double-check my understanding and make it explicit: am I correct in saying that Brevitas does not export any explicit accumulator bit-width (e.g. via ONNX metadata), but only the quantization parameters for weights and activations (bit-width, scale, signedness, etc.)?
And that, as a consequence, Concrete-ML / FINN always recompute the accumulator range internally from the integer MAC graph, rather than reading a predefined accumulator precision from Brevitas?
I just want to be sure I’m not missing some Brevitas-side hint that is consumed during conversion.
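In case it helps, here is roughly how I'm inspecting what the export contains. This is a sketch assuming Brevitas's export_qonnx helper and the standard QONNX Quant node layout; torch_model and example_input are placeholders for my own module and a sample input.

```python
import onnx
from brevitas.export import export_qonnx

# Export the Brevitas model to QONNX, then look at what the Quant nodes carry.
export_qonnx(torch_model, args=example_input, export_path="model_qonnx.onnx")

model = onnx.load("model_qonnx.onnx")
for node in model.graph.node:
    if node.op_type == "Quant":
        # Quant nodes describe weight/activation quantization (scale, zero_point,
        # bit_width, signed); I'm checking whether anything accumulator-related
        # appears alongside them.
        print(node.name, [attr.name for attr in node.attribute])
```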