Yishuo Wang
e54c428d30
add bf16/fp16 fuse mlp support ( #9726 )
2023-12-20 10:40:45 +08:00
Heyang Sun
612651cb5d
fix typo ( #9723 )
2023-12-20 09:41:59 +08:00
WeiguangHan
3aa8b66bc3
LLM: remove starcoder-15.5b model temporarily ( #9720 )
2023-12-19 20:14:46 +08:00
Yishuo Wang
522cf5ed82
[LLM] Improve chatglm2/3 rest token performance with long context ( #9716 )
2023-12-19 17:29:38 +08:00
Yishuo Wang
f2e6abb563
fix mlp batch size check ( #9718 )
2023-12-19 14:22:22 +08:00
Heyang Sun
1fa7793fc0
Load Mixtral GGUF Model ( #9690 )
...
* Load Mixtral GGUF Model
* refactor
* fix empty tensor when to cpu
* update gpu and cpu readmes
* add dtype when set tensor into module
2023-12-19 13:54:38 +08:00
Qiyuan Gong
d0a3095b97
[LLM] IPEX auto importer ( #9706 )
...
* IPEX auto importer and get_ipex_version.
* Add BIGDL_IMPORT_IPEX to control auto import, default is false.
2023-12-19 13:39:38 +08:00
Yang Wang
f4fb58d99c
fusing qkv project and rope ( #9612 )
...
* Try fusing qkv project and rope
* add fused mlp
* fuse append cache
* fix style and clean up code
* clean up
2023-12-18 16:45:00 -08:00
Kai Huang
4c112ee70c
Rename qwen in model name for arc perf test ( #9712 )
2023-12-18 20:34:31 +08:00
Cengguang Zhang
4d22add4af
LLM: fix qwen efficiency issue in perf-test.
2023-12-18 18:32:54 +08:00
Ruonan Wang
8ed89557e5
LLM: add mlp optimization of mixtral ( #9709 )
2023-12-18 16:59:52 +08:00
Chen, Zhentao
b3647507c0
Fix harness workflow ( #9704 )
...
* error when larger than 0.001
* fix env setup
* fix typo
* fix typo
2023-12-18 15:42:10 +08:00
binbin Deng
12df70953e
LLM: add resume_from_checkpoint related section ( #9705 )
2023-12-18 12:27:02 +08:00
Xin Qiu
320110d158
handle empty fused norm result ( #9688 )
...
* handle empty fused norm result
* remove fast_rms_norm
* fix style
2023-12-18 09:56:11 +08:00
Ziteng Zhang
67cc155771
[LLM] Correct chat format of llama and add llama_stream_chat in chat.py
...
* correct chat format of llama
* add llama_stream_chat
2023-12-15 16:36:46 +08:00
Ziteng Zhang
0d41b7ba7b
[LLM] Correct chat format & add stop words for chatglm3 in chat.py
...
* correct chat format of chatglm3
* correct stop words of chatglm3
2023-12-15 16:35:17 +08:00
Ziteng Zhang
d57efd8eb9
[LM] Add stop_word for Qwen model and correct qwen chat format in chat.py ( #9642 )
...
* add stop words list for qwen
* change qwen chat format
2023-12-15 14:53:58 +08:00
SONG Ge
d5b81af7bd
Support mixtral attention optimization on transformers-v4.36.0 ( #9674 )
...
* add example code to support mistral/mixtral attention on transformers v4.36.0
* update
* style fix
* add update for seen-tokens
* support mixtral
* rm mistral change
* small fix
* add more comments and remove use_cache part
---------
Co-authored-by: plusbang <binbin1.deng@intel.com>
2023-12-15 14:30:23 +08:00
Cengguang Zhang
adbef56001
LLM: update qwen attention forward. ( #9695 )
...
* feat: update qwen attention forward.
* fix: style.
2023-12-15 14:06:15 +08:00
Wang, Jian4
b8437a1c1e
LLM: Add gguf mistral model support ( #9691 )
...
* add mistral support
* need to upgrade transformers version
* update
2023-12-15 13:37:39 +08:00
Wang, Jian4
496bb2e845
LLM: Support load BaiChuan model family gguf model ( #9685 )
...
* support baichuan model family gguf model
* update gguf generate.py
* add verify models
* add support model_family
* update
* update style
* update type
* update readme
* update
* remove support model_family
2023-12-15 13:34:33 +08:00
Lilac09
3afed99216
fix path issue ( #9696 )
2023-12-15 11:21:49 +08:00
Jason Dai
37f509bb95
Update readme ( #9692 )
2023-12-14 19:50:21 +08:00
WeiguangHan
1f0245039d
LLM: check the final csv results for arc perf test ( #9684 )
...
* LLM: check the final csv results for arc perf test
* delete useless python script
* change threshold
* revert the llm_performance_tests.yml
2023-12-14 19:46:08 +08:00
Yishuo Wang
9a330bfc2b
fix fuse mlp when using q5_0 or fp8 ( #9689 )
2023-12-14 16:16:05 +08:00
Yuwen Hu
82ac2dbf55
[LLM] Small fixes for win igpu test for ipex 2.1 ( #9686 )
...
* Fixes to install for igpu performance tests
* Small update for core performance tests model lists
2023-12-14 15:39:51 +08:00
WeiguangHan
3e8d198b57
LLM: add eval func ( #9662 )
...
* Add eval func
* add left eval
2023-12-14 14:59:02 +08:00
Ziteng Zhang
21c7503a42
[LLM] Correct prompt format of Qwen in generate.py ( #9678 )
...
* Change qwen prompt format to chatml
2023-12-14 14:01:30 +08:00
Qiyuan Gong
223c9622f7
[LLM] Mixtral CPU examples ( #9673 )
...
* Mixtral CPU PyTorch and hugging face examples, based on #9661 and #9671
2023-12-14 10:35:11 +08:00
Xin Qiu
5e46e0e5af
fix baichuan2-7b 1st token performance regression on xpu ( #9683 )
...
* fix baichuan2-7b 1st token performance regression
* add comments
* fix style
2023-12-14 09:58:32 +08:00
ZehuaCao
877229f3be
[LLM]Add Yi-34B-AWQ to verified AWQ model. ( #9676 )
...
* verfiy Yi-34B-AWQ
* update
2023-12-14 09:55:47 +08:00
binbin Deng
68a4be762f
remove disco mixtral, update oneapi version ( #9671 )
2023-12-13 23:24:59 +08:00
Ruonan Wang
1456d30765
LLM: add dot to option name in setup ( #9682 )
2023-12-13 20:57:27 +08:00
Yuwen Hu
cbdd49f229
[LLM] win igpu performance for ipex 2.1 and oneapi 2024.0 ( #9679 )
...
* Change igpu win tests for ipex 2.1 and oneapi 2024.0
* Qwen model repo id updates; updates model list for 512-64
* Add .eval for win igpu all-in-one benchmark for best performance
2023-12-13 18:52:29 +08:00
Mingyu Wei
16febc949c
[LLM] Add exclude option in all-in-one performance test ( #9632 )
...
* add exclude option in all-in-one perf test
* update arc-perf-test.yaml
* Exclude in_out_pairs in main function
* fix some bugs
* address Kai's comments
* define excludes at the beginning
* add bloomz:2048 to exclude
2023-12-13 18:13:06 +08:00
Ruonan Wang
9b9cd51de1
LLM: update setup to provide new install option to support ipex 2.1 & oneapi 2024 ( #9647 )
...
* update setup
* default to 2.0 now
* meet code review
2023-12-13 17:31:56 +08:00
Yishuo Wang
09ca540f9b
use fuse mlp in qwen ( #9672 )
2023-12-13 17:20:08 +08:00
Ruonan Wang
c7741c4e84
LLM: update moe block convert to optimize rest token latency of Mixtral ( #9669 )
...
* update moe block convert
* further accelerate final_hidden_states
* fix style
* fix style
2023-12-13 16:17:06 +08:00
ZehuaCao
503880809c
verfiy codeLlama ( #9668 )
2023-12-13 15:39:31 +08:00
Xiangyu Tian
1c6499e880
[LLM] vLLM: Support Mixtral Model ( #9670 )
...
Add Mixtral support for BigDL vLLM.
2023-12-13 14:44:47 +08:00
Ruonan Wang
dc5b1d7e9d
LLM: integrate sdp kernel for FP16 rest token inference on GPU [DG2/ATSM] ( #9633 )
...
* integrate sdp
* update api
* fix style
* meet code review
* fix
* distinguish mtl from arc
* small fix
2023-12-13 11:29:57 +08:00
Qiyuan Gong
5b0e7e308c
[LLM] Add support for empty activation ( #9664 )
...
* Add support for empty activation, e.g., [0, 4096]. Empty activation is allowed by PyTorch.
* Add comments.
2023-12-13 11:07:45 +08:00
SONG Ge
284e7697b1
[LLM] Optimize ChatGLM2 kv_cache to support beam_search on ARC ( #9579 )
...
* optimize kv_cache to support beam_search on Arc
* correctness test update
* fix query_length issue
* simplify implementation
* only enable the optimization on gpu device
* limit the beam_search support only enabled with gpu device and batch_size > 1
* add comments for beam_search case and revert ut change
* meet comments
* add more comments to describe the differece between multi-cases
2023-12-13 11:02:14 +08:00
Heyang Sun
c64e2248ef
fix str returned by get_int_from_str rather than expected int ( #9667 )
2023-12-13 11:01:21 +08:00
binbin Deng
bf1bcf4a14
add official Mixtral model support ( #9663 )
2023-12-12 22:27:07 +08:00
Ziteng Zhang
8931f2eb62
[LLM] Fix transformer qwen size mismatch and rename causal_mask ( #9655 )
...
* Fix size mismatching caused by context_layer
* Change registered_causal_mask to causal_mask
2023-12-12 20:57:40 +08:00
binbin Deng
2fe38b4b9b
LLM: add mixtral GPU examples ( #9661 )
2023-12-12 20:26:36 +08:00
Yuwen Hu
968d99e6f5
Remove empty cache between each iteration of generation ( #9660 )
2023-12-12 17:24:06 +08:00
Xin Qiu
0e639b920f
disable test_optimized_model.py temporarily due to out of memory on A730M(pr validation machine) ( #9658 )
...
* disable test_optimized_model.py
* disable seq2seq
2023-12-12 17:13:52 +08:00
binbin Deng
59ce86d292
LLM: support optimize_model=True for Mixtral model ( #9657 )
2023-12-12 16:41:26 +08:00