Commit graph

1563 commits

dingbaorong
5a2ce421af add cpu and gpu examples of flan-t5 (#9171)
* add cpu and gpu examples of flan-t5

* address yuwen's comments
* Add explanation of why we add modules to not convert
* Refine prompt and add a translation example
* Add an empty line at the end of files

* add examples of flan-t5 using optimize_model api

* address bin's comments

* address binbin's comments

* add flan-t5 in readme
2023-10-24 15:24:01 +08:00
Yining Wang
4a19f50d16 phi-1_5 CPU and GPU examples (#9173)
* eee

* add examples on CPU and GPU

* fix

* fix

* optimize model examples

* have updated

* Warmup and configs added

* Update two tables
2023-10-24 15:08:04 +08:00
Ziteng Zhang
ca2965fb9f hosted k8s.png on readthedocs (#9258) 2023-10-24 15:07:16 +08:00
SONG Ge
bfc1e2d733 add fused rms optimization for chatglm model (#9256) 2023-10-24 14:40:58 +08:00
Ruonan Wang
b15656229e LLM: fix benchmark issue (#9255) 2023-10-24 14:15:05 +08:00
Guancheng Fu
f37547249d Refine README/CICD (#9253) 2023-10-24 12:56:03 +08:00
Guancheng Fu
9faa2f1eef Fix bigdl-llm-serving-tdx image (#9251) 2023-10-24 10:49:35 +08:00
binbin Deng
db37edae8a LLM: update langchain api document page (#9222) 2023-10-24 10:13:41 +08:00
Xin Qiu
0c5055d38c add position_ids and fuse embedding for falcon (#9242)
* add position_ids for falcon

* add cpu

* add cpu

* add license
2023-10-24 09:58:20 +08:00
Guancheng Fu
7f66bc5c14 Fix bigdl-llm-serving-cpu Dockerfile (#9247) 2023-10-23 16:51:30 +08:00
Guancheng Fu
6cb884d82d Fix missing manually_build_for_testing entry (#9245) 2023-10-23 16:35:09 +08:00
Guancheng Fu
2ead3f7d54 add manually build (#9244) 2023-10-23 15:53:30 +08:00
Wang, Jian4
c14a61681b Add load low-bit in model-serving to reduce EPC (#9239)
* init load low-bit

* fix

* fix
2023-10-23 11:28:20 +08:00
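The low-bit load flow this commit adds to serving can be sketched as follows. This is a minimal illustration, assuming bigdl-llm is installed and using an illustrative model ID and output path; the actual serving integration in #9239 wires this into the serving stack.

```python
from bigdl.llm.transformers import AutoModelForCausalLM

# One-time conversion: load in 4-bit and persist the converted weights.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf",
                                             load_in_4bit=True)
model.save_low_bit("./llama-2-7b-low-bit")

# Serving processes then load the pre-converted low-bit weights directly,
# skipping the full-precision load and the extra memory (EPC) it consumes.
model = AutoModelForCausalLM.load_low_bit("./llama-2-7b-low-bit")
```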
Yina Chen
0383306688 Add arc fp8 support (#9232)
* add fp8 support

* add log

* fix style
2023-10-20 17:15:07 +08:00
Jason Dai
26850ebd36 Update readme (#9237) 2023-10-20 16:13:25 +08:00
Yang Wang
118249b011 support transformers 4.34+ for llama (#9229) 2023-10-19 22:36:30 -07:00
binbin Deng
7e96d3e79a LLM: improve gpu supports key feature doc page (#9212) 2023-10-19 18:40:48 +08:00
Shaojun Liu
9dc76f19c0 fix hadolint error (#9223) 2023-10-19 16:22:32 +08:00
Chen, Zhentao
5850241423 correct Readme GPU example and API docstring (#9225)
* update readme to correct GPU usage

* update from_pretrained supported low bit options

* fix style check
2023-10-19 16:08:47 +08:00
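For reference, the `from_pretrained` low-bit options documented in #9225 look roughly like this. A minimal sketch, assuming bigdl-llm is installed; the model ID is illustrative.

```python
from bigdl.llm.transformers import AutoModelForCausalLM

# load_in_4bit=True is shorthand for the default sym_int4 quantization;
# load_in_low_bit selects other precisions, e.g. "sym_int5" or "sym_int8".
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    load_in_low_bit="sym_int4",
    trust_remote_code=True,
)
```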
WeiguangHan
f87f67ee1c LLM: arc perf test for some popular models (#9188) 2023-10-19 15:56:15 +08:00
Ziteng Zhang
0d62bd4adb Added Docker installation guide and modified link in Dockerfile (#9224)
* changed '/ppml' into '/bigdl' and modified llama-7b

* Added the contents of finetuning in README

* Modified link of qlora_finetuning.py in Dockerfile
2023-10-19 15:28:05 +08:00
Lilac09
160c543a26 README for BigDL-LLM on docker (#9197)
* add instruction for MacOS/Linux

* modify html label of gif images

* organize structure of README

* change title name

* add inference-xpu, serving-cpu and serving-xpu parts

* revise README

* revise README

* revise README
2023-10-19 13:48:06 +08:00
Yang Wang
b0ddde0410 Fix removing convert dtype bug (#9216)
* Fix removing convert dtype bug

* fix style
2023-10-18 11:24:22 -07:00
Ruonan Wang
942d6418e7 LLM: fix chatglm kv cache (#9215) 2023-10-18 19:09:53 +08:00
SONG Ge
0765f94770 [LLM] Optimize kv_cache for mistral model family (#9189)
* add kv_cache optimization for mistral model

* kv_cache optimize for mistral

* update style

* update
2023-10-18 15:13:37 +08:00
Ruonan Wang
3555ebc148 LLM: fix wrong length in gptj kv_cache optimization (#9210)
* fix wrong length in gptj kv cache

* update
2023-10-18 14:59:02 +08:00
Shengsheng Huang
6dad8d16df optimize NormHead for Baichuan2 (#9205)
* optimize NormHead for Baichuan2

* fix ut and change name

* rename functions
2023-10-18 14:05:07 +08:00
Jin Qiao
a3b664ed03 LLM: add GPU More-Data-Types and Save/Load example (#9199) 2023-10-18 13:13:45 +08:00
Ziteng Zhang
2f14f53b1c changed '/ppml' into '/bigdl' and modified llama-7b (#9209) 2023-10-18 10:25:12 +08:00
WeiguangHan
b9194c5786 LLM: skip some model tests using certain api (#9163)
* LLM: Skip some model tests using certain api

* initialize variable named result
2023-10-18 09:39:27 +08:00
Ruonan Wang
09815f7064 LLM: fix RMSNorm optimization of Baichuan2-13B/Baichuan-13B (#9204)
* fix rmsnorm of baichuan2-13B

* update baichuan1-13B too

* fix style
2023-10-17 18:40:34 +08:00
binbin Deng
efcda3892f LLM: add save & load usage in PyTorch API key feature page (#9183) 2023-10-17 17:24:51 +08:00
Jin Qiao
d7ce78edf0 LLM: fix portable zip README image link (#9201)
* LLM: fix portable zip readme img link

* LLM: make README first image center align
2023-10-17 16:38:22 +08:00
binbin Deng
330e67e2c0 LLM: update example doc page (#9186) 2023-10-17 16:26:11 +08:00
Cheen Hau, 俊豪
66c2e45634 Add unit tests for optimized model correctness (#9151)
* Add test to check correctness of optimized model

* Refactor optimized model test

* Use models in llm-unit-test

* Use AutoTokenizer for bloom

* Print out each passed test

* Remove unused tokenizer from import
2023-10-17 14:46:41 +08:00
Jin Qiao
d946bd7c55 LLM: add CPU More-Data-Types and Save-Load examples (#9179) 2023-10-17 14:38:52 +08:00
Ruonan Wang
c0497ab41b LLM: support kv_cache optimization for Qwen-VL-Chat (#9193)
* support qwen_vl_chat

* fix style
2023-10-17 13:33:56 +08:00
binbin Deng
1cd9ab15b8 LLM: fix ChatGLMConfig check (#9191) 2023-10-17 11:52:56 +08:00
Yang Wang
7160afd4d1 Support XPU DDP training and autocast for LowBitMatmul (#9167)
* support autocast in low bit matmul

* Support XPU DDP training

* fix amp
2023-10-16 20:47:19 -07:00
Ruonan Wang
77afb8796b LLM: fix convert of chatglm (#9190) 2023-10-17 10:48:13 +08:00
dingbaorong
af3b575c7e expose modules_to_not_convert in optimize_model (#9180)
* expose modules_to_not_convert in optimize_model

* some fixes
2023-10-17 09:50:26 +08:00
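Usage of the newly exposed `modules_to_not_convert` parameter might look like the sketch below. This assumes bigdl-llm is installed; the model ID and the choice of `lm_head` are illustrative, not prescribed by the commit.

```python
import torch
from transformers import AutoModelForSeq2SeqLM
from bigdl.llm import optimize_model

model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-large",
                                              torch_dtype=torch.float32)
# Keep the listed modules in full precision during low-bit conversion;
# quantizing e.g. the output head can degrade generation quality for
# some models, which is the motivation for exposing this knob.
model = optimize_model(model, modules_to_not_convert=["lm_head"])
```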
Cengguang Zhang
5ca8a851e9 LLM: add fuse optimization for Mistral. (#9184)
* add fuse optimization for mistral.

* fix.

* fix

* fix style.

* fix.

* fix error.

* fix style.

* fix style.
2023-10-16 16:50:31 +08:00
Jiao Wang
49e1381c7f update rope (#9155) 2023-10-15 21:51:45 -07:00
Jason Dai
b192a8032c Update llm-readme (#9176) 2023-10-16 10:54:52 +08:00
binbin Deng
a164c24746 LLM: add kv_cache optimization for chatglm2-6b-32k (#9165) 2023-10-16 10:43:15 +08:00
Lilac09
326ef7f491 add README for llm-inference-cpu (#9147)
* add README for llm-inference-cpu

* modify README

* add README for llm-inference-cpu on Windows
2023-10-16 10:27:44 +08:00
Yang Wang
7a2de00b48 Fixes for xpu Bf16 training (#9156)
* Support bf16 training

* Use a stable transformer version

* remove env

* fix style
2023-10-14 21:28:59 -07:00
Cengguang Zhang
51a133de56 LLM: add fuse rope and norm optimization for Baichuan. (#9166)
* add fuse rope optimization.

* add rms norm optimization.
2023-10-13 17:36:52 +08:00
Jin Qiao
db7f938fdc LLM: add replit and starcoder to gpu pytorch model example (#9154) 2023-10-13 15:44:17 +08:00
Jin Qiao
797b156a0d LLM: add dolly-v1 and dolly-v2 to gpu pytorch model example (#9153) 2023-10-13 15:43:35 +08:00