Python 3.12.3
concrete-ml: 1.9.0
poetry: 1.8.4
I copied the docs example for compiling an LLM for FHE inference exactly, and I get the following error:
Warning: You are sending unauthenticated requests to the HF Hub. Please set a HF_TOKEN to enable higher rate limits and faster downloads.
Loading weights: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 148/148 [00:00<00:00, 2330.19it/s, Materializing param=transformer.wte.weight]
GPT2LMHeadModel LOAD REPORT from: gpt2
Key | Status | Details
---------------------+------------+--------
h.{0...11}.attn.bias | UNEXPECTED |
Notes:
- UNEXPECTED :can be ignored when loading from different task/architecture; not ok if you expect identical arch.
Compiling FHE layers: 0%| | 0/49 [00:01<?, ?it/s]
Traceback (most recent call last):
File "/home/aclifton/crypto-ai-irad/scratch/check_base_model.py", line 49, in <module>
hybrid_model.compile_model(input_tensor, n_bits=8, use_dynamic_quantization=True)
File "/home/aclifton/crypto-ai-irad/.venv/lib/python3.12/site-packages/concrete/ml/torch/hybrid_model.py", line 670, in compile_model
self.private_q_modules[name] = compile_torch_model(
^^^^^^^^^^^^^^^^^^^^
File "/home/aclifton/crypto-ai-irad/.venv/lib/python3.12/site-packages/concrete/ml/torch/compile.py", line 347, in compile_torch_model
return _compile_torch_or_onnx_model(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/aclifton/crypto-ai-irad/.venv/lib/python3.12/site-packages/concrete/ml/torch/compile.py", line 263, in _compile_torch_or_onnx_model
quantized_module.compile(
File "/home/aclifton/crypto-ai-irad/.venv/lib/python3.12/site-packages/concrete/ml/quantization/quantized_module.py", line 932, in compile
self.fhe_circuit = compiler.compile(
^^^^^^^^^^^^^^^^^
File "/home/aclifton/crypto-ai-irad/.venv/lib/python3.12/site-packages/concrete/fhe/compilation/compiler.py", line 203, in compile
fhe_module = self._module_compiler.compile(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/aclifton/crypto-ai-irad/.venv/lib/python3.12/site-packages/concrete/fhe/compilation/module_compiler.py", line 423, in compile
).convert_many(graphs, mlir_context)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/aclifton/crypto-ai-irad/.venv/lib/python3.12/site-packages/concrete/fhe/mlir/converter.py", line 106, in convert_many
@func.FuncOp.from_py_func(*input_types, name=name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/aclifton/crypto-ai-irad/.venv/lib/python3.12/site-packages/mlir/dialects/_func_ops_ext.py", line 187, in decorator
return_values = f(*func_args, **func_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/aclifton/crypto-ai-irad/.venv/lib/python3.12/site-packages/concrete/fhe/mlir/converter.py", line 129, in main
self.node(ctx, node, preds)
File "/home/aclifton/crypto-ai-irad/.venv/lib/python3.12/site-packages/concrete/fhe/mlir/converter.py", line 310, in node
conversion = converter(ctx, node, preds)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/aclifton/crypto-ai-irad/.venv/lib/python3.12/site-packages/concrete/fhe/mlir/converter.py", line 803, in tlu
ctx.error(highlights)
File "/home/aclifton/crypto-ai-irad/.venv/lib/python3.12/site-packages/concrete/fhe/mlir/context.py", line 272, in error
GraphProcessor.error(self.graph, highlights)
File "/home/aclifton/crypto-ai-irad/.venv/lib/python3.12/site-packages/concrete/fhe/representation/graph.py", line 1040, in error
raise RuntimeError(message)
RuntimeError: Function you are trying to compile cannot be compiled
%0 = _x # EncryptedTensor<int8, shape=(1, 32, 768)> ∈ [-128, 127]
%1 = reshape(%0, newshape=[ -1 768]) # EncryptedTensor<int8, shape=(32, 768)> ∈ [-128, 127]
%2 = [[-21 -12 ... 0 1]] # ClearTensor<int8, shape=(768, 2304)> ∈ [-127, 125] @ /Gemm.matmul
%3 = matmul(%1, %2) # EncryptedTensor<int17, shape=(32, 2304)> ∈ [-65407, 60730] @ /Gemm.matmul
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ this 17-bit value is used as an input to a table lookup
/home/aclifton/crypto-ai-irad/.venv/lib/python3.12/site-packages/concrete/ml/quantization/quantized_ops.py:385
%4 = subgraph(%3) # EncryptedTensor<uint8, shape=(32, 2304)> ∈ [0, 255]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ but only up to 16-bit table lookups are supported
/home/aclifton/crypto-ai-irad/.venv/lib/python3.12/site-packages/concrete/ml/quantization/quantizers.py:717
%5 = reshape(%4, newshape=[ 1 32 2304]) # EncryptedTensor<uint8, shape=(1, 32, 2304)> ∈ [0, 255]
return %5
Subgraphs:
%4 = subgraph(%3):
%0 = input # EncryptedTensor<uint10, shape=(32, 2304)> @ /Gemm.matmul
%1 = astype(%0, dtype=float64) # EncryptedTensor<float64, shape=(32, 2304)> @ /Gemm.matmul_rounding
%2 = 0 # ClearScalar<uint1>
%3 = add(%1, %2) # EncryptedTensor<float64, shape=(32, 2304)>
%4 = [[-14190 ... 8 -924]] # ClearTensor<int17, shape=(1, 2304)>
%5 = subtract(%3, %4) # EncryptedTensor<float64, shape=(32, 2304)>
%6 = 0.00011426456113290066 # ClearScalar<float64>
%7 = multiply(%6, %5) # EncryptedTensor<float64, shape=(32, 2304)>
%8 = [ 0.480339 ... .00324764] # ClearTensor<float32, shape=(2304,)>
%9 = add(%7, %8) # EncryptedTensor<float64, shape=(32, 2304)>
%10 = 0.05182091052524997 # ClearScalar<float64>
%11 = divide(%9, %10) # EncryptedTensor<float64, shape=(32, 2304)>
%12 = 125 # ClearScalar<uint7>
%13 = add(%11, %12) # EncryptedTensor<float64, shape=(32, 2304)>
%14 = rint(%13) # EncryptedTensor<float64, shape=(32, 2304)>
%15 = 0 # ClearScalar<uint1>
%16 = 255 # ClearScalar<uint8>
%17 = clip(%14, %15, %16) # EncryptedTensor<float64, shape=(32, 2304)>
%18 = astype(%17, dtype=int_) # EncryptedTensor<uint1, shape=(32, 2304)>
return %18
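If I'm reading the failing node right, the issue is that the int8 × int8 matmul over a 768-deep reduction produces accumulator values needing 17 signed bits, one more than the table-lookup limit. A quick stand-alone check of the range reported for %3 (plain Python, nothing Concrete-specific):

```python
def signed_bit_width(lo: int, hi: int) -> int:
    """Smallest two's-complement width that holds every integer in [lo, hi]."""
    bits = 1
    while lo < -(2 ** (bits - 1)) or hi > 2 ** (bits - 1) - 1:
        bits += 1
    return bits

# Range reported for the matmul accumulator (%3) in the error above.
print(signed_bit_width(-65407, 60730))  # -> 17, just over the 16-bit TLU limit
print(signed_bit_width(-128, 127))      # -> 8, the quantized inputs themselves
```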
Any ideas on how to correct this?
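Would forcing the accumulator down to 16 bits before the lookup be the right approach? A sketch of what I have in mind, based on the call in my traceback (I'm assuming `HybridFHEModel.compile_model` accepts `rounding_threshold_bits` the way `compile_torch_model` does; please correct me if it doesn't):

```python
# hybrid_model and input_tensor are the same objects as in the traceback above.
# Assumption: compile_model forwards rounding_threshold_bits to the compiler
# like compile_torch_model does.
hybrid_model.compile_model(
    input_tensor,
    n_bits=8,
    rounding_threshold_bits=16,  # round accumulators down to the 16-bit TLU limit
    use_dynamic_quantization=True,
)
```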