Commit graph

196 commits

Author SHA1 Message Date
Yang Wang
118249b011 support transformers 4.34+ for llama (#9229) 2023-10-19 22:36:30 -07:00
Chen, Zhentao
5850241423 correct Readme GPU example and API docstring (#9225)
* update readme to correct GPU usage

* update from_pretrained supported low bit options

* fix style check
2023-10-19 16:08:47 +08:00
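For context on the docstring this commit corrects, here is a minimal sketch of the transformers-style bigdl-llm loading API on GPU; the model path is a placeholder and the exact set of low-bit option strings is per the updated docstring:

```python
import intel_extension_for_pytorch as ipex  # registers the "xpu" device
from bigdl.llm.transformers import AutoModelForCausalLM

# Placeholder model path; load_in_low_bit takes strings such as
# "sym_int4", "asym_int4" or "sym_int8".
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",
    load_in_low_bit="sym_int4",
    trust_remote_code=True,
)
model = model.to("xpu")  # the corrected README's GPU usage
```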
Yang Wang
b0ddde0410 Fix removing convert dtype bug (#9216)
* Fix removing convert dtype bug

* fix style
2023-10-18 11:24:22 -07:00
Ruonan Wang
942d6418e7 LLM: fix chatglm kv cache (#9215) 2023-10-18 19:09:53 +08:00
SONG Ge
0765f94770 [LLM] Optimize kv_cache for mistral model family (#9189)
* add kv_cache optimization for mistral model

* kv_cache optimize for mistral

* update style

* update
2023-10-18 15:13:37 +08:00
Ruonan Wang
3555ebc148 LLM: fix wrong length in gptj kv_cache optimization (#9210)
* fix wrong length in gptj kv cache

* update
2023-10-18 14:59:02 +08:00
Shengsheng Huang
6dad8d16df optimize NormHead for Baichuan2 (#9205)
* optimize NormHead for Baichuan2

* fix ut and change name

* rename functions
2023-10-18 14:05:07 +08:00
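Background for the commit above: Baichuan2's NormHead L2-normalizes the lm_head weight on every forward pass; with frozen inference weights the normalization can be computed once and cached. A hedged sketch of that idea (the class name is illustrative, not the actual patch):

```python
import torch
import torch.nn.functional as F

class CachedNormHead(torch.nn.Module):
    """Illustrative only: pre-normalize the frozen head weight once."""
    def __init__(self, weight: torch.Tensor):
        super().__init__()
        # F.normalize defaults to an L2 norm over dim=1, i.e. the
        # hidden dimension of a [vocab, hidden] head weight
        self.weight = torch.nn.Parameter(F.normalize(weight),
                                         requires_grad=False)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        return F.linear(hidden_states, self.weight)
```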
Ruonan Wang
09815f7064 LLM: fix RMSNorm optimization of Baichuan2-13B/Baichuan-13B (#9204)
* fix rmsnorm of baichuan2-13B

* update baichuan1-13B too

* fix style
2023-10-17 18:40:34 +08:00
Ruonan Wang
c0497ab41b LLM: support kv_cache optimization for Qwen-VL-Chat (#9193)
* support qwen_vl_chat

* fix style
2023-10-17 13:33:56 +08:00
binbin Deng
1cd9ab15b8 LLM: fix ChatGLMConfig check (#9191) 2023-10-17 11:52:56 +08:00
Yang Wang
7160afd4d1 Support XPU DDP training and autocast for LowBitMatmul (#9167)
* support autocast in low bit matmul

* Support XPU DDP training

* fix amp
2023-10-16 20:47:19 -07:00
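A rough sketch of the training setup the commit above enables, assuming the oneCCL bindings for the distributed backend; `my_model` and `my_batch` are placeholders, and the exact autocast API varies across torch/IPEX versions:

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def train_step(my_model, my_batch):
    # "ccl" assumes oneccl_bindings_for_pytorch is installed and imported
    if not dist.is_initialized():
        dist.init_process_group(backend="ccl")
    model = DDP(my_model.to("xpu"))
    # bf16 autocast over the low-bit matmuls this PR adds support for
    with torch.autocast(device_type="xpu", dtype=torch.bfloat16):
        loss = model(**my_batch).loss  # assumes an HF-style model output
    loss.backward()
    return loss
```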
Ruonan Wang
77afb8796b LLM: fix convert of chatglm (#9190) 2023-10-17 10:48:13 +08:00
dingbaorong
af3b575c7e expose modules_to_not_convert in optimize_model (#9180)
* expose modules_to_not_convert in optimize_model

* some fixes
2023-10-17 09:50:26 +08:00
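The newly exposed knob in a few lines, via bigdl-llm's PyTorch-API entry point; the model name and excluded module are examples only:

```python
from transformers import AutoModelForCausalLM
from bigdl.llm import optimize_model

model = AutoModelForCausalLM.from_pretrained(
    "baichuan-inc/Baichuan2-7B-Chat", trust_remote_code=True
)
# Keep selected submodules in full precision instead of low-bit,
# e.g. an output head that is sensitive to quantization.
model = optimize_model(model, low_bit="sym_int4",
                       modules_to_not_convert=["lm_head"])
```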
Cengguang Zhang
5ca8a851e9 LLM: add fuse optimization for Mistral. (#9184)
* add fuse optimization for mistral.

* fix.

* fix

* fix style.

* fix.

* fix error.

* fix style.

* fix style.
2023-10-16 16:50:31 +08:00
Jiao Wang
49e1381c7f update rope (#9155) 2023-10-15 21:51:45 -07:00
binbin Deng
a164c24746 LLM: add kv_cache optimization for chatglm2-6b-32k (#9165) 2023-10-16 10:43:15 +08:00
Yang Wang
7a2de00b48 Fixes for xpu Bf16 training (#9156)
* Support bf16 training

* Use a stable transformer version

* remove env

* fix style
2023-10-14 21:28:59 -07:00
Cengguang Zhang
51a133de56 LLM: add fuse rope and norm optimization for Baichuan. (#9166)
* add fuse rope optimization.

* add rms norm optimization.
2023-10-13 17:36:52 +08:00
Cengguang Zhang
433f408081 LLM: Add fuse rope and norm optimization for Aquila. (#9161)
* add fuse norm optimization.

* add fuse rope optimization
2023-10-13 14:18:37 +08:00
SONG Ge
e7aa67e141 [LLM] Add rope optimization for internlm (#9159)
* add rope and norm optimization for internlm and gptneox

* revert gptneox back and split with pr #9155

* add norm_forward

* style fix

* update

* update
2023-10-13 14:18:28 +08:00
Ruonan Wang
b8aee7bb1b LLM: Fix Qwen kv_cache optimization (#9148)
* first commit

* ut pass

* accelerate rotate half by using common util function

* fix style
2023-10-12 15:49:42 +08:00
binbin Deng
69942d3826 LLM: fix model check before attention optimization (#9149) 2023-10-12 15:21:51 +08:00
binbin Deng
eb3fb18eb4 LLM: improve PyTorch API doc (#9128) 2023-10-11 15:03:39 +08:00
Zhao Changmin
1709beba5b LLM: Explicitly close pickle file pointer before removing temporary directory (#9120)
* fp close
2023-10-10 14:57:23 +08:00
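The pattern behind the fix above, as a generic sketch: an open file handle can make removing the temporary directory fail (notably on Windows), so the pointer is closed before cleanup.

```python
import os
import pickle
import shutil
import tempfile

tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "state.pkl")
with open(path, "wb") as fp:           # context manager closes fp deterministically
    pickle.dump({"w": [1, 2, 3]}, fp)
shutil.rmtree(tmpdir)                  # safe: no dangling file pointer
```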
binbin Deng
e4d1457a70 LLM: improve transformers style API doc (#9113) 2023-10-10 09:31:00 +08:00
Zhao Changmin
edccfb2ed3 LLM: Check model device type (#9092)
* check model device
2023-10-09 15:49:15 +08:00
Yina Chen
4c4f8d1663 [LLM]Fix Arc falcon abnormal output issue (#9096)
* update

* update

* fix error & style

* fix style

* update train

* to input_seq_size
2023-10-09 15:09:37 +08:00
Zhao Changmin
548e4dd5fe LLM: Adapt transformers models for optimize model SL (#9022)
* LLM: Adapt transformers model for SL
2023-10-09 11:13:44 +08:00
Ruonan Wang
f64257a093 LLM: basic api support for esimd fp16 (#9067)
* basic api support for fp16

* fix style

* fix

* fix error and style

* fix style

* meet code review

* update based on comments
2023-10-09 11:05:17 +08:00
Xin Qiu
b3e94a32d4 change log4error import (#9098) 2023-10-08 09:23:28 +08:00
Kai Huang
78ea7ddb1c Combine apply_rotary_pos_emb for gpt-neox (#9074) 2023-10-07 16:27:46 +08:00
Yang Wang
36dd4afd61 Fix llama when rope scaling is not None (#9086)
* Fix llama when rope scaling is not None

* fix style

* fix style
2023-10-06 13:27:37 -07:00
Yang Wang
fcb1c618a0 using bigdl-llm fused rope for llama (#9066)
* optimize llama xpu rope

* fix bug

* fix style

* refine append cache

* remove check

* do not cache cos sin

* remove unnecessary changes

* clean up

* fix style

* check for training
2023-10-06 09:57:29 -07:00
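For reference, the eager rotary embedding that a fused rope kernel replaces, in the pre-4.34 transformers style; fusing saves the several separate ops this launches per attention layer:

```python
import torch

def rotate_half(x):
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def apply_rotary_pos_emb(q, k, cos, sin, position_ids):
    # cos/sin: [max_seq, head_dim]; index by position, broadcast over heads
    cos = cos[position_ids].unsqueeze(1)  # [batch, 1, seq, head_dim]
    sin = sin[position_ids].unsqueeze(1)
    q_rot = q * cos + rotate_half(q) * sin
    k_rot = k * cos + rotate_half(k) * sin
    return q_rot, k_rot
```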
Jiao Wang
aefa5a5bfe Qwen kv cache (#9079)
* qwen and aquila

* update

* update

* style
2023-10-05 11:59:17 -07:00
Jiao Wang
d5ca1f32b6 Aquila KV cache optimization (#9080)
* update

* update

* style
2023-10-05 11:10:57 -07:00
Yang Wang
88565c76f6 add export merged model example (#9018)
* add export merged model example

* add sources

* add script

* fix style
2023-10-04 21:18:52 -07:00
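A hedged sketch of what "export merged model" means here: fold trained LoRA/QLoRA adapters back into the base weights so the result loads as a plain HF checkpoint. Paths are placeholders.

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
model = PeftModel.from_pretrained(base, "outputs/lora-adapter")
merged = model.merge_and_unload()      # folds adapters into the base weights
merged.save_pretrained("outputs/merged-model")
```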
Yang Wang
0cd8f1c79c Use ipex fused rms norm for llama (#9081)
* also apply rmsnorm

* fix cpu
2023-10-04 21:04:55 -07:00
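The reference math a fused RMSNorm kernel computes, shown eagerly for clarity (llama convention: accumulate in fp32, cast back):

```python
import torch

def rms_norm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6):
    h = x.to(torch.float32)
    variance = h.pow(2).mean(-1, keepdim=True)
    return (weight * (h * torch.rsqrt(variance + eps))).to(x.dtype)
```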
Cengguang Zhang
fb883100e7 LLM: support chatglm-18b convert attention forward in benchmark scripts. (#9072)
* add chatglm-18b convert.

* fix if statement.

* fix
2023-09-28 14:04:52 +08:00
Yishuo Wang
6de2189e90 [LLM] fix chatglm main choice (#9073) 2023-09-28 11:23:37 +08:00
Cengguang Zhang
b4a1266ef0 [WIP] LLM: add kv cache support for internlm. (#9036)
* LLM: add kv cache support for internlm

* add internlm apply_rotary_pos_emb

* fix.

* fix style.
2023-09-25 14:16:59 +08:00
Ruonan Wang
975da86e00 LLM: fix gptneox kv cache (#9044) 2023-09-25 13:03:57 +08:00
Jiao Wang
028a6d9383 MPT model optimize for long sequence (#9020)
* mpt_long_seq

* update

* update

* update

* style

* style2

* update
2023-09-21 21:27:23 -07:00
Ruonan Wang
b943d73844 LLM: refactor kv cache (#9030)
* refactor utils

* meet code review; update all models

* small fix
2023-09-21 21:28:03 +08:00
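The common thread in the kv_cache commits above and below: pre-allocate the cache with headroom and write new keys/values in place, instead of running `torch.cat` on every generated token. A minimal sketch with hypothetical names:

```python
import torch

def append_kv(cache_k, cache_v, new_k, new_v, cur_len):
    """cache_*: [batch, heads, max_len, head_dim], valid up to cur_len."""
    seq = new_k.size(2)
    cache_k[:, :, cur_len:cur_len + seq] = new_k
    cache_v[:, :, cur_len:cur_len + seq] = new_v
    cur_len += seq
    # return views; no reallocation or copy of the existing prefix
    return cache_k[:, :, :cur_len], cache_v[:, :, :cur_len]
```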
Cengguang Zhang
868511cf02 LLM: fix kv cache issue of bloom and falcon. (#9029) 2023-09-21 18:12:20 +08:00
Ruonan Wang
bf51ec40b2 LLM: Fix empty cache (#9024)
* fix

* fix

* update example
2023-09-21 17:16:07 +08:00
Yina Chen
714884414e fix error (#9025) 2023-09-21 16:42:11 +08:00
SONG Ge
fa47967583 [LLM] Optimize kv_cache for gptj model family (#9010)
* optimize gptj model family attention

* add license and comment for dolly-model

* remove xpu mentioned

* remove useless info

* code style

* style fix

* code style in gptj fix

* remove gptj arch

* move apply_rotary_pos_emb into utils

* kv_seq_length update

* use hidden_states instead of query layer to get the batch size
2023-09-21 10:42:08 +08:00
Cengguang Zhang
b3cad7de57 LLM: add bloom kv cache support (#9012)
* LLM: add bloom kv cache support

* fix style.
2023-09-20 21:10:53 +08:00
Kai Huang
156af15d1e Add NF3 (#9008)
* add nf3

* grammar
2023-09-20 20:03:07 +08:00
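Presumably usable like the other low-bit options; note the option string "nf3" is an assumption based on this commit, not confirmed by it:

```python
from bigdl.llm.transformers import AutoModelForCausalLM

# "nf3" as a load_in_low_bit value is an assumption; "nf4" follows the
# same pattern. The model path is a placeholder.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    load_in_low_bit="nf3",
)
```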
Kai Huang
6981745fe4 Optimize kv_cache for gpt-neox model family (#9015)
* override gptneox

* style

* move to utils

* revert
2023-09-20 19:59:19 +08:00