Enable vllm multimodal minicpm-v-2-6 (#12074)

* enable minicpm-v-2-6
* add image_url readme
2024-09-13 13:28:35 +08:00
commit d703e4f127
parent a767438546
2 changed files with 35 additions and 0 deletions
--- a/python/llm/example/GPU/vLLM-Serving/README.md
+++ b/python/llm/example/GPU/vLLM-Serving/README.md
@@ -128,6 +128,35 @@ curl http://localhost:8000/v1/completions \
 }' &
 ```
 ##### Image input
 Image input is currently supported only for [MiniCPM-V-2_6](https://huggingface.co/openbmb/MiniCPM-V-2_6).
 ```bash
 curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "MiniCPM-V-2_6",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
             "text": "What is in the image?"
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "http://farm6.staticflickr.com/5268/5602445367_3504763978_z.jpg"
            }
          }
        ]
      }
    ],
    "max_tokens": 128
  }'
 ```
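 The same request can also be sent from Python. The following is a minimal sketch using only the standard library, not part of this commit. The helper names `build_image_chat_request` and `send_chat_request` are hypothetical, and the sketch assumes the server is reachable at `http://localhost:8000` and returns an OpenAI-style response body.

```python
import json
import urllib.request


def build_image_chat_request(image_url, prompt,
                             model="MiniCPM-V-2_6", max_tokens=128):
    """Build the JSON body for a chat completion with one text part and one image.

    Hypothetical helper: it mirrors the payload of the curl example above.
    """
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        "max_tokens": max_tokens,
    }


def send_chat_request(payload, base_url="http://localhost:8000"):
    """POST the payload to /v1/chat/completions and return the decoded JSON reply.

    Assumption: the server is running locally on port 8000, as in the curl example.
    """
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Example usage (requires a running server, so it is left commented out):
# payload = build_image_chat_request(
#     "http://farm6.staticflickr.com/5268/5602445367_3504763978_z.jpg",
#     "What is in the image?",
# )
# reply = send_chat_request(payload)
# print(reply["choices"][0]["message"]["content"])  # assumes an OpenAI-style reply
```

 Building the payload separately from sending it keeps the request body easy to inspect or log before it reaches the server.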
 #### Tensor parallel
 > Note: We recommend using Docker for tensor parallel deployment.
--- a/python/llm/src/ipex_llm/vllm/xpu/model_convert.py
+++ b/python/llm/src/ipex_llm/vllm/xpu/model_convert.py
@@ -102,6 +102,12 @@ def get_load_function(low_bit):
                modules = ["35.mlp", "36.mlp", "37.mlp", "38.mlp", "39.mlp"]
            else:
                modules = None
            if "minicpm" in self.model_config.model.lower():
                modules = ["vpm", "resampler"]
            # only for minicpm_2_6
            if "minicpm-v" in self.model_config.model.lower():
                from ipex_llm.transformers.models.minicpmv import merge_qkv
                self.model.vpm.apply(merge_qkv)
            optimize_model(self.model, low_bit=low_bit, torch_dtype=self.model_config.dtype,
                           modules_to_not_convert=modules)
            self.model = self.model.to(device=self.device_config.device,