Yishuo Wang
ea65e4fecc
remove falcon support and related UT ( #12656 )
2025-01-07 09:26:00 +08:00
Yina Chen
fae73eee79
[NPU] Support save npu quantized model without npu dependency ( #12647 )
...
* support save awq
* load quantized model & save npu compiled model
* fix style
* update
* fix dll load issue
* update error message
* fix style
2025-01-06 18:06:22 +08:00
Yishuo Wang
502461d836
remove unnecessary ipex kernel usage ( #12649 )
2025-01-03 16:45:24 +08:00
Yishuo Wang
9f8b134889
add ipex-llm custom kernel registration ( #12648 )
2025-01-03 16:45:04 +08:00
binbin Deng
0b377100c5
Add guide for save-load usage ( #12498 )
2025-01-03 16:30:15 +08:00
Wang, Jian4
6711a48a36
Enable internvl2-8b on vllm ( #12645 )
2025-01-03 14:49:36 +08:00
Zijie Li
8fd2dcba86
Add benchmark_util for transformers >= 4.47.0 ( #12644 )
2025-01-03 10:48:29 +08:00
SONG Ge
550fa01649
[Doc] Update ipex-llm ollama troubleshooting for v0.4.6 ( #12642 )
...
* update ollama v0.4.6 troubleshooting
* update chinese ollama-doc
2025-01-02 17:28:54 +08:00
Yina Chen
8e5328e9b4
add disable opts for awq ( #12641 )
2025-01-02 15:45:22 +08:00
Xu, Shuo
62318964fa
Update llama example information ( #12640 )
...
Co-authored-by: ATMxsp01 <shou.xu@intel.com>
2025-01-02 13:48:39 +08:00
Yishuo Wang
81211fd010
remove unused code ( #12635 )
2025-01-02 13:31:09 +08:00
binbin Deng
534566e290
[NPU] Support minicpm-v with python cpp backend ( #12637 )
2025-01-02 11:13:15 +08:00
Yishuo Wang
f289f68d57
small fix ( #12634 )
2024-12-30 17:14:25 +08:00
Yishuo Wang
2d08155513
remove bmm, which is only required in ipex 2.0 ( #12630 )
2024-12-27 17:28:57 +08:00
binbin Deng
f17ccfa61a
[NPU] Fix save-load usage of minicpm models ( #12628 )
2024-12-27 15:56:46 +08:00
Yishuo Wang
c72a5db757
remove unused code again ( #12624 )
2024-12-27 14:17:11 +08:00
binbin Deng
46eeab4479
[NPU] Fix regression caused by layer_norm change ( #12627 )
2024-12-27 14:08:49 +08:00
Ruonan Wang
90f6709486
remove pipeline examples ( #12626 )
2024-12-27 13:42:28 +08:00
Zijie Li
5f04ed7254
[NPU] Update prompt format for baichuan2-pipeline ( #12625 )
2024-12-27 11:30:54 +08:00
Yishuo Wang
34dbdb8ee3
small fix ( #12623 )
2024-12-27 10:19:27 +08:00
Xu, Shuo
55ce091242
Add GLM4-Edge-V GPU example ( #12596 )
...
* Add GLM4-Edge-V examples
* polish readme
* revert wrong changes
* polish readme
* polish readme
* little polish in reference info and indent
* Small fix and sample output updates
* Update main readme
---------
Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
2024-12-27 09:40:29 +08:00
binbin Deng
796ee571a5
[NPU doc] Update verified platforms ( #12621 )
2024-12-26 17:39:13 +08:00
Ruonan Wang
bbdbbb0d88
[NPU] Compatible with other third-party models like auto-round ( #12620 )
...
* support third party model
* simplify code
* fix style
* fix sym int4 GW
* code refactor
* fix
2024-12-26 17:25:18 +08:00
Yishuo Wang
a9abde0b5d
support passing attn_scale to sdpa ( #12619 )
2024-12-26 16:58:09 +08:00
Shaojun Liu
40a7d2b4f0
Consolidated C-Eval Benchmark Guide for Single-GPU and Multi-GPU Environments ( #12618 )
...
* run c-eval on multi-GPUs
* Update README.md
2024-12-26 15:23:32 +08:00
Zijie Li
ccc4055058
[NPU] Update prompt format for baichuan2 ( #12615 )
...
* Update baichuan2.py
* style fix
2024-12-26 11:41:37 +08:00
Yishuo Wang
1604b4ead8
small fix ( #12616 )
2024-12-26 11:35:12 +08:00
Ruonan Wang
d841e1dc0d
[NPU] update convert script based on latest usage ( #12617 )
2024-12-26 11:23:04 +08:00
Xu, Shuo
ef585d3360
Polish Readme for ModelScope-related examples ( #12603 )
2024-12-26 10:52:47 +08:00
Shaojun Liu
28737c250c
Update Dockerfile ( #12585 )
2024-12-26 10:20:52 +08:00
Yishuo Wang
a596f1ae5f
remove bigdl-llm test to fix langchain UT ( #12613 )
2024-12-26 10:17:25 +08:00
Ruonan Wang
9e895f04ec
[NPU] fix npu save ( #12614 )
...
* fix npu save
* update
2024-12-26 09:21:16 +08:00
Mingqi Hu
0477fe6480
[docs] Update doc for latest open webui: 0.4.8 ( #12591 )
...
* Update open webui doc
* Resolve comments
2024-12-26 09:18:20 +08:00
Yishuo Wang
6249c1e373
rewrite llama optimization ( #12609 )
2024-12-25 17:04:32 +08:00
Yishuo Wang
5f5ac8a856
fix llama related import ( #12611 )
2024-12-25 16:23:52 +08:00
Jason Dai
54b1d7d333
Update README.zh-CN.md ( #12610 )
2024-12-25 15:38:59 +08:00
Yishuo Wang
4e6b9d804f
add compresskv back for mistral ( #12607 )
...
* add compresskv back for mistral
* fix
* fix
2024-12-25 11:06:08 +08:00
joan726
9c9800be31
Update README.zh-CN.md ( #12570 )
2024-12-24 20:32:36 +08:00
Yishuo Wang
4135b895b3
refactor chatglm2, internlm, stablelm and qwen ( #12604 )
2024-12-24 18:18:00 +08:00
Yishuo Wang
073f936c37
refactor mistral and phi3 ( #12605 )
2024-12-24 17:52:32 +08:00
binbin Deng
45f8f72a28
[NPU] Fix minicpm on MTL ( #12599 )
2024-12-24 15:37:56 +08:00
Yishuo Wang
ad2dc965c5
refactor mllama, gpt2 and internvl ( #12602 )
2024-12-24 14:18:31 +08:00
Yishuo Wang
7aaf02f602
refactor baichuan, glm4 and minicpm3 ( #12600 )
2024-12-24 14:16:30 +08:00
Zijie Li
c410d9cf73
[NPU] support asym_int4 for baichuan ( #12576 )
...
* add npu support for baichuan
* Update baichuan_mp.py
* Update baichuan_mp.py
2024-12-24 09:17:50 +08:00
Yishuo Wang
098eb335b2
refactor sd 1.5 and qwen2-vl and fix ( #12590 )
2024-12-20 17:34:55 +08:00
Yishuo Wang
b050368efc
refactor yuan2 and starcoder2 and fix ( #12589 )
2024-12-20 16:41:50 +08:00
Yishuo Wang
6ea8033635
refactor glm edge ( #12588 )
2024-12-20 15:36:57 +08:00
Xu, Shuo
b0338c5529
Add --modelscope option for glm-v4 MiniCPM-V-2_6 glm-edge and internvl2 ( #12583 )
...
* Add --modelscope option for glm-v4 and MiniCPM-V-2_6
* glm-edge
* minicpm-v-2_6: don't use model_hub=modelscope when using lowbit; internvl2
---------
Co-authored-by: ATMxsp01 <shou.xu@intel.com>
2024-12-20 13:54:17 +08:00
Yishuo Wang
f3b5fad3be
refactor qwen2 and llama3 ( #12587 )
2024-12-20 13:25:25 +08:00
Shaojun Liu
51ff9ebd8a
Upgrade oneccl version to 0.0.6.3 ( #12560 )
...
* Update Dockerfile
* Update Dockerfile
* Update start-vllm-service.sh
2024-12-20 09:29:16 +08:00