Python 3.12.3
concrete-ml: 1.9.0
poetry: 1.8.4
I copied the docs example for compiling an LLM for FHE inference exactly, and I get the following error:
Warning: You are sending unauthenticated requests to the HF Hub. Please set a HF_TOKEN to enable higher rate limits and faster downloads.
Loading weights: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 148/148 [00:00<00:00, 2330.19it/s, Materializing param=transformer.wte.weight]
GPT2LMHeadModel LOAD REPORT from: gpt2
Key | Status | Details
---------------------+------------+--------
h.{0...11}.attn.bias | UNEXPECTED |
Notes:
- UNEXPECTED :can be ignored when loading from different task/architecture; not ok if you expect identical arch.
Compiling FHE layers: 0%| | 0/49 [00:01<?, ?it/s]
Traceback (most recent call last):
File "/home/aclifton/crypto-ai-irad/scratch/check_base_model.py", line 49, in <module>
hybrid_model.compile_model(input_tensor, n_bits=8, use_dynamic_quantization=True)
File "/home/aclifton/crypto-ai-irad/.venv/lib/python3.12/site-packages/concrete/ml/torch/hybrid_model.py", line 670, in compile_model
self.private_q_modules[name] = compile_torch_model(
^^^^^^^^^^^^^^^^^^^^
File "/home/aclifton/crypto-ai-irad/.venv/lib/python3.12/site-packages/concrete/ml/torch/compile.py", line 347, in compile_torch_model
return _compile_torch_or_onnx_model(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/aclifton/crypto-ai-irad/.venv/lib/python3.12/site-packages/concrete/ml/torch/compile.py", line 263, in _compile_torch_or_onnx_model
quantized_module.compile(
File "/home/aclifton/crypto-ai-irad/.venv/lib/python3.12/site-packages/concrete/ml/quantization/quantized_module.py", line 932, in compile
self.fhe_circuit = compiler.compile(
^^^^^^^^^^^^^^^^^
File "/home/aclifton/crypto-ai-irad/.venv/lib/python3.12/site-packages/concrete/fhe/compilation/compiler.py", line 203, in compile
fhe_module = self._module_compiler.compile(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/aclifton/crypto-ai-irad/.venv/lib/python3.12/site-packages/concrete/fhe/compilation/module_compiler.py", line 423, in compile
).convert_many(graphs, mlir_context)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/aclifton/crypto-ai-irad/.venv/lib/python3.12/site-packages/concrete/fhe/mlir/converter.py", line 106, in convert_many
@func.FuncOp.from_py_func(*input_types, name=name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/aclifton/crypto-ai-irad/.venv/lib/python3.12/site-packages/mlir/dialects/_func_ops_ext.py", line 187, in decorator
return_values = f(*func_args, **func_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/aclifton/crypto-ai-irad/.venv/lib/python3.12/site-packages/concrete/fhe/mlir/converter.py", line 129, in main
self.node(ctx, node, preds)
File "/home/aclifton/crypto-ai-irad/.venv/lib/python3.12/site-packages/concrete/fhe/mlir/converter.py", line 310, in node
conversion = converter(ctx, node, preds)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/aclifton/crypto-ai-irad/.venv/lib/python3.12/site-packages/concrete/fhe/mlir/converter.py", line 803, in tlu
ctx.error(highlights)
File "/home/aclifton/crypto-ai-irad/.venv/lib/python3.12/site-packages/concrete/fhe/mlir/context.py", line 272, in error
GraphProcessor.error(self.graph, highlights)
File "/home/aclifton/crypto-ai-irad/.venv/lib/python3.12/site-packages/concrete/fhe/representation/graph.py", line 1040, in error
raise RuntimeError(message)
RuntimeError: Function you are trying to compile cannot be compiled
%0 = _x # EncryptedTensor<int8, shape=(1, 32, 768)> ∈ [-128, 127]
%1 = reshape(%0, newshape=[ -1 768]) # EncryptedTensor<int8, shape=(32, 768)> ∈ [-128, 127]
%2 = [[-21 -12 ... 0 1]] # ClearTensor<int8, shape=(768, 2304)> ∈ [-127, 125] @ /Gemm.matmul
%3 = matmul(%1, %2) # EncryptedTensor<int17, shape=(32, 2304)> ∈ [-65407, 60730] @ /Gemm.matmul
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ this 17-bit value is used as an input to a table lookup
/home/aclifton/crypto-ai-irad/.venv/lib/python3.12/site-packages/concrete/ml/quantization/quantized_ops.py:385
%4 = subgraph(%3) # EncryptedTensor<uint8, shape=(32, 2304)> ∈ [0, 255]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ but only up to 16-bit table lookups are supported
/home/aclifton/crypto-ai-irad/.venv/lib/python3.12/site-packages/concrete/ml/quantization/quantizers.py:717
%5 = reshape(%4, newshape=[ 1 32 2304]) # EncryptedTensor<uint8, shape=(1, 32, 2304)> ∈ [0, 255]
return %5
Subgraphs:
%4 = subgraph(%3):
%0 = input # EncryptedTensor<uint10, shape=(32, 2304)> @ /Gemm.matmul
%1 = astype(%0, dtype=float64) # EncryptedTensor<float64, shape=(32, 2304)> @ /Gemm.matmul_rounding
%2 = 0 # ClearScalar<uint1>
%3 = add(%1, %2) # EncryptedTensor<float64, shape=(32, 2304)>
%4 = [[-14190 ... 8 -924]] # ClearTensor<int17, shape=(1, 2304)>
%5 = subtract(%3, %4) # EncryptedTensor<float64, shape=(32, 2304)>
%6 = 0.00011426456113290066 # ClearScalar<float64>
%7 = multiply(%6, %5) # EncryptedTensor<float64, shape=(32, 2304)>
%8 = [ 0.480339 ... .00324764] # ClearTensor<float32, shape=(2304,)>
%9 = add(%7, %8) # EncryptedTensor<float64, shape=(32, 2304)>
%10 = 0.05182091052524997 # ClearScalar<float64>
%11 = divide(%9, %10) # EncryptedTensor<float64, shape=(32, 2304)>
%12 = 125 # ClearScalar<uint7>
%13 = add(%11, %12) # EncryptedTensor<float64, shape=(32, 2304)>
%14 = rint(%13) # EncryptedTensor<float64, shape=(32, 2304)>
%15 = 0 # ClearScalar<uint1>
%16 = 255 # ClearScalar<uint8>
%17 = clip(%14, %15, %16) # EncryptedTensor<float64, shape=(32, 2304)>
%18 = astype(%17, dtype=int_) # EncryptedTensor<uint1, shape=(32, 2304)>
return %18
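If I'm reading the failing node right, the issue is that the int8 × int8 matmul over a 768-deep reduction produces accumulator values needing 17 signed bits, one more than the table-lookup limit. A quick stand-alone check of the range reported for %3 (plain Python, nothing Concrete-specific):

```python
def signed_bit_width(lo: int, hi: int) -> int:
    """Smallest two's-complement width that holds every integer in [lo, hi]."""
    bits = 1
    while lo < -(2 ** (bits - 1)) or hi > 2 ** (bits - 1) - 1:
        bits += 1
    return bits

# Range reported for the matmul accumulator (%3) in the error above.
print(signed_bit_width(-65407, 60730))  # -> 17, just over the 16-bit TLU limit
print(signed_bit_width(-128, 127))      # -> 8, the quantized inputs themselves
```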
Any ideas on how to correct this?
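Would forcing the accumulator down to 16 bits before the lookup be the right approach? A sketch of what I have in mind, based on the call in my traceback (I'm assuming `HybridFHEModel.compile_model` accepts `rounding_threshold_bits` the way `compile_torch_model` does; please correct me if it doesn't):

```python
# hybrid_model and input_tensor are the same objects as in the traceback above.
# Assumption: compile_model forwards rounding_threshold_bits to the compiler
# like compile_torch_model does.
hybrid_model.compile_model(
    input_tensor,
    n_bits=8,
    rounding_threshold_bits=16,  # round accumulators down to the 16-bit TLU limit
    use_dynamic_quantization=True,
)
```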