concrete-ml: 1.9.0
python: 3.12.3
Is it possible to use Concrete ML with a model deployed in vLLM? I searched this forum and the issues on GitHub and didn't see any mention of vLLM. I have a Llama variant (mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated) deployed on some hardware using vLLM, but I'm unsure whether Concrete ML can work with it.
If there is a way, I'd appreciate some guidance on how to implement it. If vLLM can't be used with Concrete ML, what would be the best alternative approach for using Concrete ML with this model on GPUs?
Thanks in advance for your help!