Using Concrete ML with vLLM

concrete-ml: 1.9.0
python: 3.12.3

Is it possible to use Concrete ML with a model deployed in vLLM? I searched this forum and the GitHub issues and didn't see any mention of vLLM. I have a Llama variant (mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated) deployed on some hardware with vLLM, but I'm unsure whether Concrete ML can work with it.

If there is a way, I'd appreciate some guidance on how to implement it. If vLLM isn't supported, what would be the best alternative approach for using Concrete ML with this model on GPUs?

Thanks in advance for your help!

Any thoughts on this post?

Hi @outofstep58 ,

FHE is not ready to run LLMs today. With any decent context size, generating a single token would easily take hours.

@jfrery Thank you for your feedback.

Would Concrete ML work with a GPT-2 or Llama 3 model deployed in vLLM?