Commit graph

3492 commits

Author SHA1 Message Date
Ch1y0q
17c23cd759
add llama3.2 GPU example (#12137)
* add llama3.2 GPU example

* change prompt format reference url

* update

* add Meta-Llama-3.2-1B-Instruct sample output

* update wording
2024-09-29 14:41:54 +08:00
Yuwen Hu
f71b38a994
Update MiniCPM_V_26 GPU example with save & load (#12127) 2024-09-26 17:40:22 +08:00
Yishuo Wang
669ff1a97b
fix sd1.5 (#12129) 2024-09-26 17:15:16 +08:00
Yishuo Wang
a266528719
optimize llama 3.2 rope (#12128) 2024-09-26 16:08:10 +08:00
Yishuo Wang
584c3489e7
add basic support for llama3.2 (#12125) 2024-09-26 15:46:19 +08:00
Yishuo Wang
66f419f8b7
fix qwen2 vl (#12126) 2024-09-26 15:44:02 +08:00
Ch1y0q
2ea13d502f
Add minicpm3 gpu example (#12114)
* add minicpm3 gpu example

* update GPU example

* update

---------

Co-authored-by: Huang, Xinshengzi <xinshengzi.huang@intel.com>
2024-09-26 13:51:37 +08:00
Yishuo Wang
77af9bc5fa
support passing None to low_bit in optimize_model (#12121) 2024-09-26 11:09:35 +08:00
Yishuo Wang
47e0b83cbf
optimize sd 1.5 (#12119) 2024-09-25 15:45:13 +08:00
Jin, Qiao
2bedb17be7
Add Qwen2.5 NPU Example (#12110)
* Add Qwen2.5 NPU Example

* fix

* Merge qwen2.py and qwen2.5.py into qwen.py

* Fix description
2024-09-25 15:20:03 +08:00
Shaojun Liu
657889e3e4
use English prompt by default (#12115) 2024-09-24 17:40:50 +08:00
Yishuo Wang
5d63aef60b
optimize qwen2 vl again (#12109) 2024-09-23 13:22:01 +08:00
Ruonan Wang
03bd01c99c
optimize npu qwen2 (#12107) 2024-09-20 19:46:16 +08:00
Jinhe
02399021d6
add npu load_low_bit api in all-in-one benchmark (#12103) 2024-09-20 17:56:08 +08:00
Yuwen Hu
47a9597f24
Add missing link for Qwen2.5 to CN-ZH readme (#12106) 2024-09-20 17:30:30 +08:00
Yishuo Wang
9239fd4f12
add basic support and optimization for qwen2-vl (#12104) 2024-09-20 17:23:06 +08:00
Yuwen Hu
828fa01ad3
[NPU] Add mixed_precision for Qwen2 7B (#12098)
* Add mixed_precision argument to control whether to use INT8 lm_head for Qwen2-7B-Instruct

* Small fix

* Fixed on load low bit with mixed precision

* Small fix

* Update example accordingly

* Update for default prompt

* Update base on comments

* Final fix
2024-09-20 16:36:21 +08:00
Ch1y0q
2269768e71
add internvl2 example (#12102)
* add internvl2 example

* add to README.md

* update

* add link to zh-CN readme
2024-09-20 16:31:54 +08:00
joan726
ad1fe77fe6
Add language switching (#12096) 2024-09-20 16:05:20 +08:00
Ruonan Wang
09b8c80d9d
update code for NPU qwen2 (#12094)
* update code

* fix
2024-09-20 15:58:32 +08:00
Jin, Qiao
db7500bfd4
Add Qwen2.5 GPU example (#12101)
* Add Qwen2.5 GPU example

* fix end line

* fix description
2024-09-20 15:55:57 +08:00
Guancheng Fu
b36359e2ab
Fix xpu serving image oneccl (#12100) 2024-09-20 15:25:41 +08:00
Yishuo Wang
54b973c744
fix ipex_llm import in transformers 4.45 (#12099) 2024-09-20 15:24:59 +08:00
Guancheng Fu
a6cbc01911
Use new oneccl for ipex-llm serving image (#12097) 2024-09-20 14:52:49 +08:00
Shaojun Liu
1295898830
update vllm_online_benchmark script to support long input (#12095)
* update vllm_online_benchmark script to support long input

* update guide
2024-09-20 14:18:30 +08:00
Ch1y0q
9650bf616a
add transpose_value_cache for NPU benchmark (#12092)
* add `transpose_value_cache`

* update

* update
2024-09-19 18:45:05 +08:00
Yuwen Hu
f7fb3c896c
Update lm_head optimization for Qwen2 7B (#12090) 2024-09-18 17:02:02 +08:00
Xu, Shuo
ee33b93464
Longbench: NV code to ipex-llm (#11662)
* add nv longbench

* LongBench: NV code to ipex-llm

* amend

* add more models support

* amend

* optimize LongBench's user experience

* amend

* amend

* fix typo

* amend

* remove cuda related information & add a readme

* add license to python scripts & polish the readme

* amend

* amend

---------

Co-authored-by: cyita <yitastudy@gmail.com>
Co-authored-by: ATMxsp01 <shou.xu@intel.com>
Co-authored-by: leonardozcm <leonardo1997zcm@gmail.com>
2024-09-18 15:55:14 +08:00
Wang, Jian4
40e463c66b
Enable vllm load gptq model (#12083)
* enable vllm load gptq model

* update

* update

* update

* update style
2024-09-18 14:41:00 +08:00
Xiangyu Tian
c2774e1a43
Update oneccl to 0.0.3 in serving-xpu image (#12088) 2024-09-18 14:29:17 +08:00
Ruonan Wang
081af41def
[NPU] Optimize Qwen2 lm_head to use INT4 (#12072)
* temp save

* update

* fix

* fix

* Split lm_head into 7 parts & remove int8 for lm_head when sym_int4

* Simlify and add condition to code

* Small fix

* refactor some code

* fix style

* fix style

* fix style

* fix

* fix

* temp save

* refactor

* fix style

* further refactor

* simplify code

* meet code review

* fix style

---------

Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
2024-09-14 15:26:46 +08:00
joan726
18714ceac7
Update README.md (#12084)
Modify vLLM related links
2024-09-14 15:24:08 +08:00
Ch1y0q
b4b8c3e495
add lowbit_path for generate.py, fix npu_model (#12077)
* add `lowbit_path` for `generate.py`, fix `npu_model`

* update `README.md`
2024-09-13 17:28:05 +08:00
Wang, Jian4
d703e4f127
Enable vllm multimodal minicpm-v-2-6 (#12074)
* enable minicpm-v-2-6

* add image_url readme
2024-09-13 13:28:35 +08:00
Ruonan Wang
a767438546
fix typo (#12076)
* fix typo

* fix
2024-09-13 11:44:42 +08:00
Ruonan Wang
3f0b24ae2b
update cpp quickstart (#12075)
* update cpp quickstart

* fix style
2024-09-13 11:35:32 +08:00
Shaojun Liu
9b4fee8b5b
disable nightly release for finetune images (#12070) 2024-09-12 15:10:50 +08:00
Shaojun Liu
beb876665d
pin gradio version to fix connection error (#12069) 2024-09-12 14:36:09 +08:00
Ruonan Wang
48d9092b5a
upgrade OneAPI version for cpp Windows (#12063)
* update version

* update quickstart
2024-09-12 11:12:12 +08:00
Jinhe
e78e45ee01
update NPU readme: run conhost as administrator (#12066) 2024-09-11 17:54:04 +08:00
Jinhe
4ca330da15
Fix NPU load error message and add minicpm npu lowbit feat (#12064)
* fix npu_model raise sym_int4 error

* add load_lowbit

* remove print&perf
2024-09-11 16:56:35 +08:00
Jinhe
32e8362da7
added minicpm cpu examples (#12027)
* minicpm cpu examples

* add link for minicpm-2
2024-09-11 15:51:21 +08:00
Ruonan Wang
a0c73c26d8
clean NPU code (#12060)
* clean code

* remove time.perf_counter()
2024-09-11 15:10:35 +08:00
Wang, Jian4
c75f3dd874
vllm no padding glm4 to avoid nan error (#12062)
* no padding glm4

* add codegeex
2024-09-11 13:44:40 +08:00
Chu,Youcheng
649390c464
fix: textual and env variable adjustment (#12038) 2024-09-11 13:38:01 +08:00
Yuwen Hu
c94032f97e
Try to fix llamaindex ut again (#12061) 2024-09-11 12:11:04 +08:00
Shaojun Liu
7e1e51d91a
Update vllm setting (#12059)
* revert

* update

* update

* update
2024-09-11 11:45:08 +08:00
Wang, Jian4
30a8680645
Update for vllm one card padding (#12058) 2024-09-11 10:52:55 +08:00
Zijie Li
c5fdfde1bd
fix npu-model prompt (#12057) 2024-09-11 10:06:45 +08:00
Yuwen Hu
94dade9aca
Fix UT of ipex_llm.llamaindex (#12055) 2024-09-11 09:58:43 +08:00