Yishuo Wang
9ea694484d
refactor ot remove old rope usage ( #12224 )
2024-10-17 17:06:09 +08:00
Jiao Wang
667f0db466
Update Eagle example to Eagle2+ipex-llm integration ( #11717 )
...
* update to e2 example
* update
* update
2024-10-16 23:16:14 -07:00
Jinhe
f983f1a8f4
Add Qwen2-VL gpu example ( #12135 )
...
* qwen2-vl readme
* add qwen2-vl example
* fix
* fix
* fix
* add link
* Update regarding modules_to_not_convert and readme
* Further fix
* Small fix
---------
Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
2024-10-11 18:25:23 +08:00
Ruonan Wang
4d93bb81fe
Initial support of NPU level0 Model ( #12177 )
...
* first commit to support load dll and init llm pipeline
* add init generate
* fix style
* small updates
* fix style and check tokens number
2024-10-11 09:45:53 +08:00
Zijie Li
3d044dbf53
add llama3.2-vision Pytorch example ( #12165 )
2024-10-09 09:20:42 +08:00
Ch1y0q
17c23cd759
add llama3.2 GPU example ( #12137 )
...
* add llama3.2 GPU example
* change prompt format reference url
* update
* add Meta-Llama-3.2-1B-Instruct sample output
* update wording
2024-09-29 14:41:54 +08:00
Yuwen Hu
f71b38a994
Update MiniCPM_V_26 GPU example with save & load ( #12127 )
2024-09-26 17:40:22 +08:00
Ch1y0q
2ea13d502f
Add minicpm3 gpu example ( #12114 )
...
* add minicpm3 gpu example
* update GPU example
* update
---------
Co-authored-by: Huang, Xinshengzi <xinshengzi.huang@intel.com>
2024-09-26 13:51:37 +08:00
Jin, Qiao
2bedb17be7
Add Qwen2.5 NPU Example ( #12110 )
...
* Add Qwen2.5 NPU Example
* fix
* Merge qwen2.py and qwen2.5.py into qwen.py
* Fix description
2024-09-25 15:20:03 +08:00
Yuwen Hu
828fa01ad3
[NPU] Add mixed_precision for Qwen2 7B ( #12098 )
...
* Add mix_precision argument to control whether use INT8 lm_head for Qwen2-7B-Instruct
* Small fix
* Fixed on load low bit with mixed precision
* Small fix
* Update example accordingly
* Update for default prompt
* Update base on comments
* Final fix
2024-09-20 16:36:21 +08:00
Ch1y0q
2269768e71
add internvl2 example ( #12102 )
...
* add internvl2 example
* add to README.md
* update
* add link to zh-CN readme
2024-09-20 16:31:54 +08:00
Jin, Qiao
db7500bfd4
Add Qwen2.5 GPU example ( #12101 )
...
* Add Qwen2.5 GPU example
* fix end line
* fix description
2024-09-20 15:55:57 +08:00
Ch1y0q
b4b8c3e495
add lowbit_path for generate.py, fix npu_model ( #12077 )
...
* add `lowbit_path` for `generate.py`, fix `npu_model`
* update `README.md`
2024-09-13 17:28:05 +08:00
Wang, Jian4
d703e4f127
Enable vllm multimodal minicpm-v-2-6 ( #12074 )
...
* enable minicpm-v-2-6
* add image_url readme
2024-09-13 13:28:35 +08:00
Jinhe
e78e45ee01
update NPU readme: run conhost as administrator ( #12066 )
2024-09-11 17:54:04 +08:00
Jinhe
4ca330da15
Fix NPU load error message and add minicpm npu lowbit feat ( #12064 )
...
* fix npu_model raise sym_int4 error
* add load_lowbit
* remove print&perf
2024-09-11 16:56:35 +08:00
Jinhe
32e8362da7
added minicpm cpu examples ( #12027 )
...
* minicpm cpu examples
* add link for minicpm-2
2024-09-11 15:51:21 +08:00
Zijie Li
c5fdfde1bd
fix npu-model prompt ( #12057 )
2024-09-11 10:06:45 +08:00
Ch1y0q
73a4360f3f
update lowbit path for baichuan2, qwen2, generate.py ( #12051 )
...
* update lowbit path for baichuan2, qwen2, `generate.py`
* update readme
2024-09-10 15:35:24 +08:00
Yuwen Hu
f61b1785fb
Small update to NPU example readme ( #12034 )
...
* Small update to NPU example readme
* Small fix
2024-09-06 15:54:23 +08:00
Ruonan Wang
0d04531ae0
update NPU readme of Qwen2 ( #12032 )
...
* update readme
* update broadcast
2024-09-06 15:02:39 +08:00
binbin Deng
5b18bb3c4a
Add recommend version for mtl npu ( #12024 )
2024-09-05 16:28:53 +08:00
Ch1y0q
820f8a4554
add --lowbit-path option for NPU llama example ( #12020 )
...
* add option" `--lowbit-path`
* add descriptions in `README.md` and formatting
* Update llama.py
2024-09-05 15:31:01 +08:00
Wang, Jian4
b3b2cd64b4
Support lightweight-serving glm-4v-9b ( #11994 )
...
* enable glm-4v-9b serving
* update readme
* update for no image input
2024-09-05 09:25:08 +08:00
Jinhe
164f47adbd
MiniCPM-V-2 & MiniCPM-Llama3-V-2_5 example updates ( #11988 )
...
* minicpm example updates
* --stream
2024-09-03 17:02:06 +08:00
Jin, Qiao
2e54f4402b
Rename MiniCPM-V-2_6 CPU example ( #11998 )
2024-09-03 16:50:42 +08:00
Jin, Qiao
65e281bb29
Add MiniCPM-V cpu example ( #11975 )
...
* Add MiniCPM-V cpu example
* fix
* fix
* fix
* fix
2024-09-02 10:17:57 +08:00
Ruonan Wang
79978e6f36
update npu multimodal readme ( #11979 )
...
* update npu readme of multimodal
* small fix
* meet comment
2024-08-30 19:02:06 +08:00
Ruonan Wang
4811a490ef
small fix ( #11978 )
...
* fix
* meet comment
2024-08-30 17:55:15 +08:00
Ruonan Wang
573c20bae6
fix npu lm_head cpu condition ( #11976 )
...
* fix
* fix
* fix
* fix stype
* fix style
* fix style
2024-08-30 17:11:26 +08:00
Ruonan Wang
60aa1a2c0f
Initial NPU support for MiniCPM-V-2_6 ( #11966 )
...
* initial pr
* update npu model
* fix
* fix kv cache type
* fix
* small fix
* fix style
* fix model id
* change inter_pp=4
* address comment
* fix
* fix style
* fix
* rebase
2024-08-30 16:34:35 +08:00
SONG Ge
158289d205
[NPU] Add initial support for minicpm-llama-v2.5 ( #11962 )
...
* add initial support for minicpm-llama-v2.5
* update impl
* add minicpm-llama3-v2.5 example
2024-08-30 16:00:33 +08:00
binbin Deng
cd077881f1
Disable lm head ( #11972 )
2024-08-30 11:05:18 +08:00
Yuwen Hu
2e49e1f8e9
Further fix for MiniCPM-V-2_6 example ( #11965 )
2024-08-29 19:14:13 +08:00
Jason Dai
431affd0a0
Update README.md ( #11964 )
2024-08-29 18:56:35 +08:00
binbin Deng
14b2c8dc32
Update qwen2-7b example script ( #11961 )
2024-08-29 18:25:17 +08:00
Yuwen Hu
7abe17d6f7
Update MiniCPM-V-2_6 Example ( #11958 )
...
* Update example scripts regarding warmup, stream generate, moudles to not convert, etc.
* Update readme accordingly
* Fix based on comments
* Small fix
* Remove n_predict
2024-08-29 18:23:48 +08:00
Yina Chen
5f7ff76ea5
update troubleshooting ( #11960 )
2024-08-29 17:44:22 +08:00
Yina Chen
882f4a5ff7
Add lnl npu driver recommend version and enable cpu_lm_head on llama3 ( #11952 )
...
* update lnl npu driver version and enable cpu_lm_head on llama3
* update
* fix style
* typo
* address comments
* update
* add qwen2-7b
2024-08-29 15:01:18 +08:00
binbin Deng
71f03dcc39
Support qwen2-7b with fused decoderlayer optimization on NPU ( #11912 )
2024-08-29 13:34:20 +08:00
SONG Ge
5ca7390082
[NPU] Add minicpm-2b support for npu multi-processing ( #11949 )
...
* add minicpm-2b support
* update example for minicpm-2b
* add LNL NPU driver requirement in readme
2024-08-28 18:08:49 +08:00
hxsz1997
e23549f63f
Update llamaindex examples ( #11940 )
...
* modify rag.py
* update readme of gpu example
* update llamaindex cpu example and readme
* add llamaindex doc
* update note style
* import before instancing IpexLLMEmbedding
* update index in readme
* update links
* update link
* update related links
2024-08-28 14:03:44 +08:00
Zijie Li
90f692937d
Update npu baichuan2 ( #11939 )
2024-08-27 16:56:26 +08:00
Jiao Wang
b4b6ddf73c
NPU Baichuan2 Multi- Process example ( #11928 )
2024-08-27 15:25:49 +08:00
SONG Ge
a81a329a5f
[NPU] Add example for NPU multi-processing minicpm-1b model ( #11935 )
...
* add minicpm example
2024-08-27 14:57:46 +08:00
Ch1y0q
730d9ec811
Add Qwen2-audio example ( #11835 )
...
* add draft for qwen2-audio
* update example for `Qwen2-Audio`
* update
* update
* add warmup
2024-08-27 13:35:24 +08:00
Yina Chen
e246f1e258
update llama3 npu example ( #11933 )
2024-08-27 13:03:18 +08:00
binbin Deng
14dddfc0d6
Update NPU example readme ( #11931 )
2024-08-27 12:44:58 +08:00
Zijie Li
6c3eb1e1e8
refactor from_pretrained API for NPU ( #11927 )
2024-08-27 09:50:30 +08:00
binbin Deng
dd303776cf
Add troubleshooting about transpose value setting
2024-08-26 16:06:32 +08:00