Zijie Li
3d044dbf53
add llama3.2-vision Pytorch example ( #12165 )
2024-10-09 09:20:42 +08:00
Shaojun Liu
e2ef9e938e
Delete deprecated docs/readthedocs directory ( #12164 )
2024-10-08 14:48:02 +08:00
Yishuo Wang
644af2a76e
add basic llama 3.2 vision support ( #12163 )
2024-10-08 10:46:48 +08:00
Ch1y0q
9b75806d14
Update Windows GPU quickstart regarding demo ( #12124 )
...
* use Qwen2-1.5B-Instruct in demo
* update
* add reference link
* update
* update
2024-09-29 18:08:49 +08:00
Ch1y0q
17c23cd759
add llama3.2 GPU example ( #12137 )
...
* add llama3.2 GPU example
* change prompt format reference url
* update
* add Meta-Llama-3.2-1B-Instruct sample output
* update wording
2024-09-29 14:41:54 +08:00
Yuwen Hu
f71b38a994
Update MiniCPM_V_26 GPU example with save & load ( #12127 )
2024-09-26 17:40:22 +08:00
Yishuo Wang
669ff1a97b
fix sd1.5 ( #12129 )
2024-09-26 17:15:16 +08:00
Yishuo Wang
a266528719
optimize llama 3.2 rope ( #12128 )
2024-09-26 16:08:10 +08:00
Yishuo Wang
584c3489e7
add basic support for llama3.2 ( #12125 )
2024-09-26 15:46:19 +08:00
Yishuo Wang
66f419f8b7
fix qwen2 vl ( #12126 )
2024-09-26 15:44:02 +08:00
Ch1y0q
2ea13d502f
Add minicpm3 gpu example ( #12114 )
...
* add minicpm3 gpu example
* update GPU example
* update
---------
Co-authored-by: Huang, Xinshengzi <xinshengzi.huang@intel.com>
2024-09-26 13:51:37 +08:00
Yishuo Wang
77af9bc5fa
support passing None to low_bit in optimize_model ( #12121 )
2024-09-26 11:09:35 +08:00
Yishuo Wang
47e0b83cbf
optimize sd 1.5 ( #12119 )
2024-09-25 15:45:13 +08:00
Jin, Qiao
2bedb17be7
Add Qwen2.5 NPU Example ( #12110 )
...
* Add Qwen2.5 NPU Example
* fix
* Merge qwen2.py and qwen2.5.py into qwen.py
* Fix description
2024-09-25 15:20:03 +08:00
Shaojun Liu
657889e3e4
use english prompt by default ( #12115 )
2024-09-24 17:40:50 +08:00
Yishuo Wang
5d63aef60b
optimize qwen2 vl again ( #12109 )
2024-09-23 13:22:01 +08:00
Ruonan Wang
03bd01c99c
optimize npu qwen2 ( #12107 )
2024-09-20 19:46:16 +08:00
Jinhe
02399021d6
add npu load_low_bit api in all-in-one benchmark ( #12103 )
2024-09-20 17:56:08 +08:00
Yuwen Hu
47a9597f24
Add missing link for Qwen2.5 to CN-ZH readme ( #12106 )
2024-09-20 17:30:30 +08:00
Yishuo Wang
9239fd4f12
add basic support and optimization for qwen2-vl ( #12104 )
2024-09-20 17:23:06 +08:00
Yuwen Hu
828fa01ad3
[NPU] Add mixed_precision for Qwen2 7B ( #12098 )
...
* Add mix_precision argument to control whether to use INT8 lm_head for Qwen2-7B-Instruct
* Small fix
* Fixed on load low bit with mixed precision
* Small fix
* Update example accordingly
* Update for default prompt
* Update based on comments
* Final fix
2024-09-20 16:36:21 +08:00
Ch1y0q
2269768e71
add internvl2 example ( #12102 )
...
* add internvl2 example
* add to README.md
* update
* add link to zh-CN readme
2024-09-20 16:31:54 +08:00
joan726
ad1fe77fe6
Add language switching ( #12096 )
2024-09-20 16:05:20 +08:00
Ruonan Wang
09b8c80d9d
update code for NPU qwen2 ( #12094 )
...
* update code
* fix
2024-09-20 15:58:32 +08:00
Jin, Qiao
db7500bfd4
Add Qwen2.5 GPU example ( #12101 )
...
* Add Qwen2.5 GPU example
* fix end line
* fix description
2024-09-20 15:55:57 +08:00
Guancheng Fu
b36359e2ab
Fix xpu serving image oneccl ( #12100 )
2024-09-20 15:25:41 +08:00
Yishuo Wang
54b973c744
fix ipex_llm import in transformers 4.45 ( #12099 )
2024-09-20 15:24:59 +08:00
Guancheng Fu
a6cbc01911
Use new oneccl for ipex-llm serving image ( #12097 )
2024-09-20 14:52:49 +08:00
Shaojun Liu
1295898830
update vllm_online_benchmark script to support long input ( #12095 )
...
* update vllm_online_benchmark script to support long input
* update guide
2024-09-20 14:18:30 +08:00
Ch1y0q
9650bf616a
add transpose_value_cache for NPU benchmark ( #12092 )
...
* add `transpose_value_cache`
* update
* update
2024-09-19 18:45:05 +08:00
Yuwen Hu
f7fb3c896c
Update lm_head optimization for Qwen2 7B ( #12090 )
2024-09-18 17:02:02 +08:00
Xu, Shuo
ee33b93464
Longbench: NV code to ipex-llm ( #11662 )
...
* add nv longbench
* LongBench: NV code to ipex-llm
* amend
* add more models support
* amend
* optimize LongBench's user experience
* amend
* amend
* fix typo
* amend
* remove cuda related information & add a readme
* add license to python scripts & polish the readme
* amend
* amend
---------
Co-authored-by: cyita <yitastudy@gmail.com>
Co-authored-by: ATMxsp01 <shou.xu@intel.com>
Co-authored-by: leonardozcm <leonardo1997zcm@gmail.com>
2024-09-18 15:55:14 +08:00
Wang, Jian4
40e463c66b
Enable vllm load gptq model ( #12083 )
...
* enable vllm load gptq model
* update
* update
* update
* update style
2024-09-18 14:41:00 +08:00
Xiangyu Tian
c2774e1a43
Update oneccl to 0.0.3 in serving-xpu image ( #12088 )
2024-09-18 14:29:17 +08:00
Ruonan Wang
081af41def
[NPU] Optimize Qwen2 lm_head to use INT4 ( #12072 )
...
* temp save
* update
* fix
* fix
* Split lm_head into 7 parts & remove int8 for lm_head when sym_int4
* Simplify and add condition to code
* Small fix
* refactor some code
* fix style
* fix style
* fix style
* fix
* fix
* temp save
* refactor
* fix style
* further refactor
* simplify code
* meet code review
* fix style
---------
Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
2024-09-14 15:26:46 +08:00
joan726
18714ceac7
Update README.md ( #12084 )
...
Modify vLLM related links
2024-09-14 15:24:08 +08:00
Ch1y0q
b4b8c3e495
add lowbit_path for generate.py, fix npu_model ( #12077 )
...
* add `lowbit_path` for `generate.py`, fix `npu_model`
* update `README.md`
2024-09-13 17:28:05 +08:00
Wang, Jian4
d703e4f127
Enable vllm multimodal minicpm-v-2-6 ( #12074 )
...
* enable minicpm-v-2-6
* add image_url readme
2024-09-13 13:28:35 +08:00
Ruonan Wang
a767438546
fix typo ( #12076 )
...
* fix typo
* fix
2024-09-13 11:44:42 +08:00
Ruonan Wang
3f0b24ae2b
update cpp quickstart ( #12075 )
...
* update cpp quickstart
* fix style
2024-09-13 11:35:32 +08:00
Shaojun Liu
9b4fee8b5b
disable nightly release for finetune images ( #12070 )
2024-09-12 15:10:50 +08:00
Shaojun Liu
beb876665d
pin gradio version to fix connection error ( #12069 )
2024-09-12 14:36:09 +08:00
Ruonan Wang
48d9092b5a
upgrade OneAPI version for cpp Windows ( #12063 )
...
* update version
* update quickstart
2024-09-12 11:12:12 +08:00
Jinhe
e78e45ee01
update NPU readme: run conhost as administrator ( #12066 )
2024-09-11 17:54:04 +08:00
Jinhe
4ca330da15
Fix NPU load error message and add minicpm npu lowbit feat ( #12064 )
...
* fix npu_model raise sym_int4 error
* add load_lowbit
* remove print&perf
2024-09-11 16:56:35 +08:00
Jinhe
32e8362da7
added minicpm cpu examples ( #12027 )
...
* minicpm cpu examples
* add link for minicpm-2
2024-09-11 15:51:21 +08:00
Ruonan Wang
a0c73c26d8
clean NPU code ( #12060 )
...
* clean code
* remove time.perf_counter()
2024-09-11 15:10:35 +08:00
Wang, Jian4
c75f3dd874
vllm no padding glm4 to avoid nan error ( #12062 )
...
* no padding glm4
* add codegeex
2024-09-11 13:44:40 +08:00
Chu,Youcheng
649390c464
fix: textual and env variable adjustment ( #12038 )
2024-09-11 13:38:01 +08:00
Yuwen Hu
c94032f97e
Try to fix llamaindex ut again ( #12061 )
2024-09-11 12:11:04 +08:00