-
659d15defc
Fix wrong attention mask and garbage output for inputs_embeds inputs during lookup generation (#11989)
Yuwen Hu
2024-09-02 19:09:12 +0800
-
2f3d1bd0ec
hotfix qwen2-7b weight setting (#11991)
binbin Deng
2024-09-02 18:11:08 +0800
-
a40ea7038d
Fix AttributeError of qwen2-1.5B (#11990)
binbin Deng
2024-09-02 17:55:10 +0800
-
c48817bd43
Support Qwen2-7b MLP in int4 and transpose_value_cache=True (#11968)
Yang Wang
2024-09-01 23:37:44 -0700
-
65e281bb29
Add MiniCPM-V cpu example (#11975)
Jin, Qiao
2024-09-02 10:17:57 +0800
-
79978e6f36
update npu multimodal readme (#11979)
Ruonan Wang
2024-08-30 04:02:06 -0700
-
4811a490ef
small fix (#11978)
Ruonan Wang
2024-08-30 02:55:15 -0700
-
573c20bae6
fix npu lm_head cpu condition (#11976)
Ruonan Wang
2024-08-30 02:11:26 -0700
-
60aa1a2c0f
Initial NPU support for MiniCPM-V-2_6 (#11966)
Ruonan Wang
2024-08-30 01:34:35 -0700
-
158289d205
[NPU] Add initial support for minicpm-llama-v2.5 (#11962)
SONG Ge
2024-08-30 16:00:33 +0800
-
ae7302a654
add gptq option for ppl test (#11921)
Chu,Youcheng
2024-08-30 13:43:48 +0800
-
1e8c87050f
fix model path (#11973)
Shaojun Liu
2024-08-30 13:28:28 +0800
-
e895e1b4c5
modification on llamacpp readme after Ipex-llm latest update (#11971)
Jinhe
2024-08-30 11:36:45 +0800
-
cd077881f1
Disable lm head (#11972)
binbin Deng
2024-08-30 11:05:18 +0800
-
7d103417b8
Fix glm4-9b-chat nan error on vllm 0.3.3 (#11970)
Wang, Jian4
2024-08-30 09:50:18 +0800
-
77b04efcc5
add notes for SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS (#11936)
Ch1y0q
2024-08-30 09:26:47 +0800
-
fbf088f61e
remove obsolete npu code (#11967)
Yang Wang
2024-08-29 14:16:44 -0700
-
a9e485eb1b
Support MiniCPM-V-2_6 multi-modal benchmarking with latency text streamer (#11963)
Yuwen Hu
2024-08-29 19:22:09 +0800
-
2e49e1f8e9
Further fix for MiniCPM-V-2_6 example (#11965)
Yuwen Hu
2024-08-29 19:14:13 +0800
-
431affd0a0
Update README.md (#11964)
Jason Dai
2024-08-29 18:56:35 +0800
-
14b2c8dc32
Update qwen2-7b example script (#11961)
binbin Deng
2024-08-29 18:25:17 +0800
-
7abe17d6f7
Update MiniCPM-V-2_6 Example (#11958)
Yuwen Hu
2024-08-29 18:23:48 +0800
-
6fc9340d53
restore ollama webui quickstart (#11955)
Jinhe
2024-08-29 17:53:19 +0800
-
5f7ff76ea5
update troubleshooting (#11960)
Yina Chen
2024-08-29 12:44:22 +0300
-
882f4a5ff7
Add lnl npu driver recommend version and enable cpu_lm_head on llama3 (#11952)
Yina Chen
2024-08-29 10:01:18 +0300
-
71f03dcc39
Support qwen2-7b with fused decoderlayer optimization on NPU (#11912)
binbin Deng
2024-08-29 13:34:20 +0800
-
63ac5f64bb
Refactor NPU baichuan multiple-process (#11945)
Jiao Wang
2024-08-28 11:33:40 -0700
-
5ca7390082
[NPU] Add minicpm-2b support for npu multi-processing (#11949)
SONG Ge
2024-08-28 18:08:49 +0800
-
0fbb10259a
use sdp_causal to reduce internvl2-4b memory usage if set environment variable (#11953)
Yishuo Wang
2024-08-28 17:35:05 +0800
-
0a7bd274e2
Add vllm awq loading logic (#11950)
Guancheng Fu
2024-08-28 16:46:18 +0800
-
b38fb67bec
[NPU] lm head to cpu (#11943)
Yina Chen
2024-08-28 11:34:07 +0300
-
ec67ee7177
added accelerate version specification in open webui quickstart (#11948)
Jinhe
2024-08-28 15:02:39 +0800
-
e23549f63f
Update llamaindex examples (#11940)
hxsz1997
2024-08-28 09:03:44 +0300
-
23f51f87f0
update tag to 2.2.0-SNAPSHOT (#11947)
Shaojun Liu
2024-08-28 09:20:32 +0800
-
460bc96d32
update version of llama.cpp / ollama (#11930)
Ruonan Wang
2024-08-27 06:21:44 -0700
-
bec00e2015
Improve baichuan2 NPU performance (#11942)
binbin Deng
2024-08-27 18:37:08 +0800
-
90f692937d
Update npu baichuan2 (#11939)
Zijie Li
2024-08-27 16:56:26 +0800
-
7f7f6c89f5
Quick fix benchmark script (#11938)
binbin Deng
2024-08-27 15:29:27 +0800
-
b4b6ddf73c
NPU Baichuan2 Multi-Process example (#11928)
Jiao Wang
2024-08-27 00:25:49 -0700
-
e211a5b076
update minicpm to meet latest refactor (#11937)
SONG Ge
2024-08-27 15:08:01 +0800
-
a81a329a5f
[NPU] Add example for NPU multi-processing minicpm-1b model (#11935)
SONG Ge
2024-08-27 14:57:46 +0800
-
7c8c9a0670
Update benchmark script for NPU (#11932)
binbin Deng
2024-08-27 14:41:14 +0800
-
730d9ec811
Add Qwen2-audio example (#11835)
Ch1y0q
2024-08-27 13:35:24 +0800
-
b11b28e9a9
update CORE_XE_VERSION to 2.6.0 (#11929)
Shaojun Liu
2024-08-27 13:10:13 +0800
-
e246f1e258
update llama3 npu example (#11933)
Yina Chen
2024-08-27 08:03:18 +0300
-
14dddfc0d6
Update NPU example readme (#11931)
binbin Deng
2024-08-27 12:44:58 +0800
-
6c3eb1e1e8
refactor from_pretrained API for NPU (#11927)
Zijie Li
2024-08-27 09:50:30 +0800
-
7ca557aada
LLM: Fix vLLM CPU convert error (#11926)
Xiangyu Tian
2024-08-27 09:22:19 +0800
-
5a8fc1baa2
update troubleshooting for llama.cpp and ollama (#11890)
Ch1y0q
2024-08-26 20:55:23 +0800
-
c1d07bc626
Support streaming for lookup generation (#11922)
Yuwen Hu
2024-08-26 19:33:31 +0800
-
a0bbd8e28d
All-in-one benchmark update regarding performance mode for input length threshold (#11920)
Yuwen Hu
2024-08-26 18:52:13 +0800
-
019f725d4d
[NPU] Add support for running mp minicpm model on npu (#11909)
SONG Ge
2024-08-26 17:52:55 +0800
-
dd303776cf
Add troubleshooting about transpose value setting
binbin Deng
2024-08-26 16:06:32 +0800
-
e5dc4e9123
disable outdated scheduled workflow (#11915)
Shaojun Liu
2024-08-24 07:17:42 +0800
-
24c279e0ae
Update IPEX_LLM_PERFORMANCE_MODE with input length threshold (#11908)
Yuwen Hu
2024-08-23 20:49:15 +0800
-
303a090a6b
Add lm_head optimization on NPU (#11903)
binbin Deng
2024-08-23 15:51:07 +0800
-
23631cd357
disable lm_head opt for baichuan2-13b (#11905)
Yina Chen
2024-08-23 10:39:47 +0300
-
4cf640c548
update docker image tag to 2.2.0-SNAPSHOT (#11904)
Shaojun Liu
2024-08-23 13:57:41 +0800
-
650e6e6ce4
Merge pull request #11891 from hxsz1997/baichuan2-compresskv
hxsz1997
2024-08-23 06:09:58 +0300
-
4a61f7d20d
update mlp of llama (#11897)
Ruonan Wang
2024-08-22 05:34:53 -0700
-
420ce7d164
Fix non-stop at eos token problem for lookup generation (#11896)
Yuwen Hu
2024-08-22 18:55:59 +0800
-
4cf03d6212
update baichuan-7b
Huang, Xinshengzi
2024-08-22 18:16:33 +0800
-
794abe2ce8
update npu-readme (#11900)
Zijie Li
2024-08-22 17:49:35 +0800
-
278b191dc1
Fix optimize lm head error (#11899)
Guancheng Fu
2024-08-22 17:45:26 +0800
-
c5b51d41fb
Update pypi tag to 2.2.0.dev0 (#11895)
Shaojun Liu
2024-08-22 16:48:09 +0800
-
18662dca1c
change 5 pytorch/huggingface models to fp16 (#11894)
Jinhe
2024-08-22 16:12:09 +0800
-
5c4ed00593
Add lightweight-serving whisper asr example (#11847)
Wang, Jian4
2024-08-22 15:46:28 +0800
-
eb1e65f8a9
add comment
Huang, Xinshengzi
2024-08-22 15:14:47 +0800
-
a2be3d7501
add comment of compress kv in attention forward
Huang, Xinshengzi
2024-08-22 15:11:55 +0800
-
a8e2573421
added tokenization file for codegeex2-6b in pytorch-models(#11875)
Jinhe
2024-08-22 14:37:56 +0800
-
ce7de77085
add comment of change in model forward
Huang, Xinshengzi
2024-08-22 14:29:27 +0800
-
42398a0045
add comment
Huang, Xinshengzi
2024-08-22 13:17:13 +0800
-
48a827aa07
fix typos
Huang, Xinshengzi
2024-08-22 11:35:47 +0800
-
8a5df93de2
fix typos
Huang, Xinshengzi
2024-08-22 11:33:07 +0800
-
01ed397e7a
fix typos
Huang, Xinshengzi
2024-08-22 11:31:25 +0800
-
c6ed1c412d
fix typos
Huang, Xinshengzi
2024-08-22 11:26:49 +0800
-
2a0aa9271b
fix typos
Huang, Xinshengzi
2024-08-22 11:23:22 +0800
-
4adadddbbc
fix typos
Huang, Xinshengzi
2024-08-22 11:12:23 +0800
-
bac98baab9
Make performance test install specific ipex-llm version from pypi (#11892)
Yuwen Hu
2024-08-22 11:10:12 +0800
-
6a5ca17afc
fix typos
Huang, Xinshengzi
2024-08-22 11:09:58 +0800
-
72a7bf624b
Support qwen2-1.5b with fused decoderlayer optimization on NPU (#11888)
binbin Deng
2024-08-22 11:09:12 +0800
-
6bb9035788
fix typos
Huang, Xinshengzi
2024-08-22 11:08:48 +0800
-
86248b0505
add compress_kv for baichuan2
Huang, Xinshengzi
2024-08-22 10:59:08 +0800
-
bdbe995b01
Update README.md (#11889)
Zijie Li
2024-08-22 09:40:16 +0800
-
cc27321441
support chatglm4 in lookup (#11855)
Yina Chen
2024-08-21 10:53:17 +0300
-
0236de3ac2
set IPEX_LLM_LAST_LM_HEAD=1 as default (#11885)
Yina Chen
2024-08-21 10:06:12 +0300
-
8c5c7f32dd
Update doc for running npu generate example with ipex-llm[npu] (#11876)
SONG Ge
2024-08-21 13:45:29 +0800
-
209d42ab79
Refactor npu mp to make it easier to integrate new models (#11873)
Yang Wang
2024-08-20 20:58:47 -0700
-
537c0d2767
fix vllm qwen2 models (#11879)
Guancheng Fu
2024-08-21 11:05:24 +0800
-
bd1e490d62
fix phi3 (#11878)
Yishuo Wang
2024-08-21 10:31:41 +0800
-
eab6f6dde4
Spr perf small fix (#11874)
Yuwen Hu
2024-08-21 09:35:26 +0800
-
37106a877c
igpu performance test small fix (#11872)
Yuwen Hu
2024-08-21 03:09:14 +0800
-
bdaeee1d63
Fix run_decoders bug (#11871)
Yang Wang
2024-08-20 12:04:59 -0700
-
32f0a77846
feat: update readme for ppl test (#11865)
Chu,Youcheng
2024-08-20 20:13:54 +0800
-
5df00869de
Update local import for ppl (#11866)
RyuKosei
2024-08-20 18:50:00 +0800
-
c3c058373f
Update compresskv model forward type logic (#11868)
Yina Chen
2024-08-20 13:11:37 +0300
-
3ee194d983
Pytorch models transformers version update (#11860)
Jinhe
2024-08-20 18:01:42 +0800
-
0d58c2fdf9
Update performance test regarding updated default transformers==4.37.0 (#11869)
Yuwen Hu
2024-08-20 17:59:28 +0800
-
5e8286f72d
Update ipex-llm default transformers version to 4.37.0 (#11859)
Yuwen Hu
2024-08-20 17:37:58 +0800
-
d4ee0a89f3
optimize phi3 memory usage (#11867)
Yishuo Wang
2024-08-20 17:32:51 +0800