Commit graph

3838 commits

Author SHA1 Message Date
Xu, Shuo
62318964fa
Update llama example information (#12640)
Co-authored-by: ATMxsp01 <shou.xu@intel.com>
2025-01-02 13:48:39 +08:00
Yishuo Wang
81211fd010
remove unused code (#12635) 2025-01-02 13:31:09 +08:00
binbin Deng
534566e290
[NPU] Support minicpm-v with python cpp backend (#12637) 2025-01-02 11:13:15 +08:00
Yishuo Wang
f289f68d57
small fix (#12634) 2024-12-30 17:14:25 +08:00
Yishuo Wang
2d08155513
remove bmm, which is only required in ipex 2.0 (#12630) 2024-12-27 17:28:57 +08:00
binbin Deng
f17ccfa61a
[NPU] Fix save-load usage of minicpm models (#12628) 2024-12-27 15:56:46 +08:00
Yishuo Wang
c72a5db757
remove unused code again (#12624) 2024-12-27 14:17:11 +08:00
binbin Deng
46eeab4479
[NPU] Fix regression caused by layer_norm change (#12627) 2024-12-27 14:08:49 +08:00
Ruonan Wang
90f6709486
[remove pipeline examples (#12626) 2024-12-27 13:42:28 +08:00
Zijie Li
5f04ed7254
NPU] Update prompt format for baichuan2-pipeline (#12625) 2024-12-27 11:30:54 +08:00
Yishuo Wang
34dbdb8ee3
small fix (#12623) 2024-12-27 10:19:27 +08:00
Xu, Shuo
55ce091242
Add GLM4-Edge-V GPU example (#12596)
* Add GLM4-Edge-V examples

* polish readme

* revert wrong changes

* polish readme

* polish readme

* little polish in reference info and indent

* Small fix and sample output updates

* Update main readme

---------

Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
2024-12-27 09:40:29 +08:00
binbin Deng
796ee571a5
[NPU doc] Update verified platforms (#12621) 2024-12-26 17:39:13 +08:00
Ruonan Wang
bbdbbb0d88
[NPU] Compatible with other third-party models like auto-round (#12620)
* support third party model

* simplify code

* fix sty;e

* fix sym int4 GW

* code refactor

* fix
2024-12-26 17:25:18 +08:00
Yishuo Wang
a9abde0b5d
support passing attn_scale to sdpa (#12619) 2024-12-26 16:58:09 +08:00
Shaojun Liu
40a7d2b4f0
Consolidated C-Eval Benchmark Guide for Single-GPU and Multi-GPU Environments (#12618)
* run c-eval on multi-GPUs

* Update README.md
2024-12-26 15:23:32 +08:00
Zijie Li
ccc4055058
[NPU] Update prompt format for baichuan2 (#12615)
* Update baichuan2.py

* style fix
2024-12-26 11:41:37 +08:00
Yishuo Wang
1604b4ead8
small fix (#12616) 2024-12-26 11:35:12 +08:00
Ruonan Wang
d841e1dc0d
[NPU] update convert script based on latest usage (#12617) 2024-12-26 11:23:04 +08:00
Xu, Shuo
ef585d3360
Polish Readme for ModelScope-related examples (#12603) 2024-12-26 10:52:47 +08:00
Shaojun Liu
28737c250c
Update Dockerfile (#12585) 2024-12-26 10:20:52 +08:00
Yishuo Wang
a596f1ae5f
remove bigdl-llm test to fix langchain UT (#12613) 2024-12-26 10:17:25 +08:00
Ruonan Wang
9e895f04ec
[NPU] fix npu save (#12614)
* fix npu save

* update
2024-12-26 09:21:16 +08:00
Mingqi Hu
0477fe6480
[docs] Update doc for latest open webui: 0.4.8 (#12591)
* Update open webui doc

* Resolve comments
2024-12-26 09:18:20 +08:00
Yishuo Wang
6249c1e373
rewrite llama optimization (#12609) 2024-12-25 17:04:32 +08:00
Yishuo Wang
5f5ac8a856
fix llama related import (#12611) 2024-12-25 16:23:52 +08:00
Jason Dai
54b1d7d333
Update README.zh-CN.md (#12610) 2024-12-25 15:38:59 +08:00
Yishuo Wang
4e6b9d804f
add compresskv back for mistral (#12607)
* add compresskv back for mistral

* fix

* fix
2024-12-25 11:06:08 +08:00
joan726
9c9800be31
Update README.zh-CN.md (#12570) 2024-12-24 20:32:36 +08:00
Yishuo Wang
4135b895b3
refactor chatglm2, internlm, stablelm and qwen (#12604) 2024-12-24 18:18:00 +08:00
Yishuo Wang
073f936c37
refactor mistral and phi3 (#12605) 2024-12-24 17:52:32 +08:00
binbin Deng
45f8f72a28
[NPU] Fix minicpm on MTL (#12599) 2024-12-24 15:37:56 +08:00
Yishuo Wang
ad2dc965c5
refactor mllama, gpt2 and internvl (#12602) 2024-12-24 14:18:31 +08:00
Yishuo Wang
7aaf02f602
refactor baichuan, glm4 and minicpm3 (#12600) 2024-12-24 14:16:30 +08:00
Zijie Li
c410d9cf73
[NPU] support asym_int4 for baichuan (#12576)
* add npu support for baichuan

* Update baichuan_mp.py

* Update baichuan_mp.py
2024-12-24 09:17:50 +08:00
Yishuo Wang
098eb335b2
refactor sd 1.5 and qwen2-vl and fix (#12590) 2024-12-20 17:34:55 +08:00
Yishuo Wang
b050368efc
refactor yuan2 and starcoder2 and fix (#12589) 2024-12-20 16:41:50 +08:00
Yishuo Wang
6ea8033635
refactor glm edge (#12588) 2024-12-20 15:36:57 +08:00
Xu, Shuo
b0338c5529
Add --modelscope option for glm-v4 MiniCPM-V-2_6 glm-edge and internvl2 (#12583)
* Add --modelscope option for glm-v4 and MiniCPM-V-2_6

* glm-edge

* minicpm-v-2_6:don't use model_hub=modelscope when use lowbit; internvl2

---------

Co-authored-by: ATMxsp01 <shou.xu@intel.com>
2024-12-20 13:54:17 +08:00
Yishuo Wang
f3b5fad3be
refactor qwen2 and llama3 (#12587) 2024-12-20 13:25:25 +08:00
Shaojun Liu
51ff9ebd8a
Upgrade oneccl version to 0.0.6.3 (#12560)
* Update Dockerfile

* Update Dockerfile

* Update start-vllm-service.sh
2024-12-20 09:29:16 +08:00
Xu, Shuo
47da3c999f
Add --modelscope in GPU examples for minicpm, minicpm3, baichuan2 (#12564)
* Add --modelscope for more models

* minicpm

---------

Co-authored-by: ATMxsp01 <shou.xu@intel.com>
2024-12-19 17:25:46 +08:00
Yishuo Wang
3eeb02f1be
support Megrez-3B-Omni (#12582) 2024-12-19 17:23:01 +08:00
binbin Deng
4e7e988f70
[NPU] Fix MTL and ARL support (#12580) 2024-12-19 16:55:30 +08:00
Yishuo Wang
80f2fdc37b
optimize new minicpm model (#12579) 2024-12-19 14:22:47 +08:00
Yishuo Wang
4540424271
optimize siglip attention again (#12578) 2024-12-19 13:40:48 +08:00
Yishuo Wang
e0921f80c1
padding mask on torch side (#12577) 2024-12-19 10:53:02 +08:00
Xu, Shuo
47e90a362f
Add --modelscope in GPU examples for glm4, codegeex2, qwen2 and qwen2.5 (#12561)
* Add --modelscope for more models

* imporve readme

---------

Co-authored-by: ATMxsp01 <shou.xu@intel.com>
2024-12-19 10:00:39 +08:00
SONG Ge
28e81fda8e
Replace runner doc in ollama quickstart (#12575) 2024-12-18 19:05:28 +08:00
SONG Ge
f7a2bd21cf
Update ollama and llama.cpp readme (#12574) 2024-12-18 17:33:20 +08:00