Jin, Qiao
82a61b5cf3
Limit trl version in example ( #12332 )
...
* Limit trl version in example
* Limit trl version in example
2024-11-05 14:50:10 +08:00
Yuwen Hu
923d696854
Small fix to LNL performance tests ( #12333 )
2024-11-05 13:24:58 +08:00
Zijie Li
45b0d371aa
update benchmark readme ( #12323 )
...
* update benchmark readme
update new comment with memory usage included
* Update README.md
2024-11-05 08:19:08 +08:00
Yuwen Hu
e2adc974fd
Small fix to LNL performance tests ( #12331 )
2024-11-04 19:22:41 +08:00
Yuwen Hu
522cdf8e9d
Add initial support for LNL nightly performance tests ( #12326 )
...
* Add initial support for LNL nightly performance tests
* Small fix
2024-11-04 18:53:51 +08:00
Zhao Changmin
1b637e4477
Add chatglm2&3 fuse mlp ( #12328 )
...
* add chatglm fuse mlp
2024-11-04 18:04:41 +08:00
Yina Chen
94c4ce389f
[NPU] Add env to disable compile opt ( #12330 )
...
* add env to disable compile opt
* fix style
* fix style
2024-11-04 17:46:17 +08:00
Ch1y0q
e54af44ed6
Add transformers_int4_npu_pipeline_win in all-in-one benchmark ( #12325 )
...
* add transformers_int4_npu_pipeline_win
* bugfix
* bugfix: wrong actual_output_len
* fix format
* bugfix & update `README.md`
2024-11-04 16:00:20 +08:00
binbin Deng
5ee6f97d6f
[NPU L0] Add layernorm weight as const / input setting ( #12322 )
2024-11-04 15:46:24 +08:00
Chu,Youcheng
a01371f90b
Doc: update harness readme ( #12324 )
2024-11-04 14:58:54 +08:00
Yuwen Hu
4644cb640c
Perf test further fix regarding trl version ( #12321 )
2024-11-04 11:01:25 +08:00
Ruonan Wang
8fe01c9e4d
[NPU pipeline] update cmake usage of pipeline ( #12320 )
2024-11-04 10:30:03 +08:00
Kai Huang
c8679ad592
Qwen layernorm as input ( #12309 )
...
* qwen layernorm as input
* add group size
2024-11-04 09:51:15 +08:00
Yuwen Hu
94ce447794
Fix performance tests regarding trl version ( #12319 )
...
* Fix performance tests regarding trl version
* Small fix
2024-11-04 09:42:18 +08:00
Yuwen Hu
20755e8077
Small fix to all-in-one benchmark scripts ( #12317 )
2024-11-01 19:16:25 +08:00
Ch1y0q
48123af463
add npu_group_size for transformers_int4_npu_win in all-in-one benchmark api ( #12316 )
...
* add `npu_group_size` for `transformers_int4_npu_win`
small bugfix
* update
2024-11-01 18:44:27 +08:00
Zijie Li
cd5e22cee5
Update Llava GPU Example ( #12311 )
...
* update-llava-example
* add warmup
* small fix on llava example
* remove space & extra print prompt
* renew example
* small fix
---------
Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>
2024-11-01 17:06:00 +08:00
binbin Deng
f53bb4ea0b
[NPU L0] Update 1st token generation ( #12314 )
2024-11-01 17:02:07 +08:00
binbin Deng
d409d9d0eb
[NPU L0] Update streaming mode of example ( #12312 )
2024-11-01 15:38:10 +08:00
Jin, Qiao
126f95be80
Fix DPO finetuning example ( #12313 )
2024-11-01 13:29:44 +08:00
Yina Chen
05c5d0267a
[NPU] Llama2 prefill use ov sdp ( #12310 )
...
* prefill use sdp
* add param
* update
* fix style
* fix style
* meet comments
2024-11-01 11:05:20 +08:00
binbin Deng
eda764909c
Add minicpm-2b in L0 pipeline ( #12308 )
2024-11-01 09:30:01 +08:00
Yishuo Wang
b9853f98b3
fix qwen2 attention_mask slice ( #12307 )
2024-10-31 17:00:05 +08:00
Jin, Qiao
3df6195cb0
Fix application quickstart ( #12305 )
...
* fix graphrag quickstart
* fix axolotl quickstart
* fix ragflow quickstart
* fix ragflow quickstart
* fix graphrag toc
* fix comments
* fix comment
* fix comments
2024-10-31 16:57:35 +08:00
binbin Deng
4892df61c9
Add qwen2-1.5b in l0 pipeline example ( #12306 )
2024-10-31 16:44:25 +08:00
Jinhe
30f668c206
updated transformers & accelerate requirements ( #12301 )
2024-10-31 15:59:40 +08:00
Xin Qiu
97a0f7fd35
Codegeex support ( #12303 )
...
* new codegeex attn
* use kv cache
* add compress/quantize kv
* remove compress/quantize kv
* fix style check
* fix style
* fix codegeex
2024-10-31 15:28:56 +08:00
Yishuo Wang
72605c7016
fix llama3.1/3.2 quantize kv check ( #12302 )
2024-10-31 11:55:07 +08:00
Kai Huang
416c19165c
Add Qwen pipeline and example ( #12292 )
...
* support qwen pipeline
* update error msg
* style
* meet review
* minor
2024-10-31 11:25:25 +08:00
Rahul Nair
4cf1ccc43a
Update DPO README.md ( #12162 )
...
bitsandbytes multi-backend is now available and is required; otherwise it errors out saying that no CUDA is available
2024-10-31 10:56:46 +08:00
Chu,Youcheng
29400e2e75
feat: change oneccl to internal ( #12296 )
...
* feat: change oneccl
* fix: restore llama-70b
* fix: remove tab
* fix: remove extra blank
* small fix
* add comments
* fix: add a blank space
2024-10-31 09:51:43 +08:00
Zijie Li
6f22133efc
Update AWQ and GPTQ GPU example ( #12300 )
2024-10-31 09:35:31 +08:00
Yina Chen
0763268e4c
[NPU]Qwen2 groupwise performance opt ( #12299 )
...
* qwen2 gw performance opt
* remove debug
2024-10-30 17:40:21 +08:00
binbin Deng
41b8064554
Support minicpm-1B in level0 pipeline ( #12297 )
2024-10-30 17:21:47 +08:00
Jinhe
46d8300f6b
bugfix for qlora finetuning on GPU ( #12298 )
...
* bugfix for qlora 100 step error
* indent fix
* annotation fix
2024-10-30 16:54:10 +08:00
Yina Chen
70037ad55f
Groupwise prefill optimization ( #12291 )
...
* except lm_head
* remove
* support gw lm_head
* update
* fix
* remove run.bat
* fix style
* support llama3
* slice -> split
* remove debug
* fix style
* add dpu
2024-10-30 14:59:45 +08:00
Yishuo Wang
540eaeb12c
refactor attention_softmax ( #12295 )
2024-10-30 13:20:50 +08:00
Ruonan Wang
2b2cb9c693
[NPU pipeline] Support save & load and update examples ( #12293 )
...
* support save & load, update llama examples
* update baichuan2 example
* update readme
2024-10-30 10:02:00 +08:00
Yuwen Hu
5a15098835
Initial support for quantized forward on CPU when quantization_group_size=0 ( #12282 )
...
* Initial support for quantized forward on CPU when quantization_group_size=0
* Style fix
* Style fix
* Small fix
* Small fix
2024-10-29 19:40:17 +08:00
binbin Deng
3feb58d1e4
Support baichuan2 for level0 pipeline ( #12289 )
2024-10-29 19:24:16 +08:00
Zhao Changmin
546f455e8e
Patch sdpa check function in specific module attributes table ( #12285 )
2024-10-29 18:41:09 +08:00
Jun Wang
3700e81977
[fix] vllm-online-benchmark first token latency error ( #12271 )
2024-10-29 17:54:36 +08:00
joan726
0bbc04b5ec
Add ollama_quickstart.zh-CN.md ( #12284 )
...
* Add ollama_quickstart.zh-CN.md
Add ollama_quickstart.zh-CN.md
* Update ollama_quickstart.zh-CN.md
Add Chinese and English switching
* Update ollama_quickstart.md
Add Chinese and English switching
* Update README.zh-CN.md
Modify the related link to ollama_quickstart.zh-CN.md
* Update ollama_quickstart.zh-CN.md
Modified based on comments.
* Update ollama_quickstart.zh-CN.md
Modified based on comments
2024-10-29 15:12:44 +08:00
Ruonan Wang
821b0033ed
[NPU L0] update layernorm & code refactor ( #12287 )
...
* update layernorm & code refactor
* fix style
* add common utils
* change to Pool()
* remove print
2024-10-29 15:01:45 +08:00
Yina Chen
4467645088
[NPU] Support l0 Llama groupwise ( #12276 )
...
* except lm_head
* remove
* support gw lm_head
* update
* fix
* remove run.bat
* fix style
* support llama3
2024-10-28 17:06:55 +08:00
Jason Dai
1cef0c4948
Update README.md ( #12286 )
2024-10-28 17:06:16 +08:00
Guancheng Fu
67014cb29f
Add benchmark_latency.py to docker serving image ( #12283 )
2024-10-28 16:19:59 +08:00
Ruonan Wang
3fe2ea3081
[NPU] Reuse prefill of acc lib for pipeline ( #12279 )
...
* first commit
* update example
* fix style
* update example
* embedding as const
* fix generate
* code refactor
* meet code review
* fix style
* change max_output_len to max_context_len
* fix all-in-one
* fix example
* add check for new tokens
2024-10-28 16:05:49 +08:00
Yuwen Hu
42a528ded9
Small update to MTL iGPU Linux Prerequisites installation guide ( #12281 )
...
* Small update MTL iGPU Linux Prerequisites installation guide
* Small fix
2024-10-28 14:12:07 +08:00
Yuwen Hu
16074ae2a4
Update Linux prerequisites installation guide for MTL iGPU ( #12263 )
...
* Update Linux prerequisites installation guide for MTL iGPU
* Further link update
* Small fixes
* Small fix
* Update based on comments
* Small fix
* Make oneAPI installation a shared section for both MTL iGPU and other GPU
* Small fix
* Small fix
* Clarify description
2024-10-28 09:27:14 +08:00