Commit graph

1499 commits

Author SHA1 Message Date
binbin Deng
995b0f119f LLM: update some gpu examples (#9136) 2023-10-11 14:23:56 +08:00
Ruonan Wang
1c8d5da362 LLM: fix llama tokenizer for all-in-one benchmark (#9129)
* fix tokenizer for gpu benchmark

* fix ipex fp16

* meet code review

* fix
2023-10-11 13:39:39 +08:00
binbin Deng
2ad67a18b1 LLM: add mistral examples (#9121) 2023-10-11 13:38:15 +08:00
Ruonan Wang
1363e666fc LLM: update benchmark_util.py for beam search (#9126)
* update reorder_cache

* fix
2023-10-11 09:41:53 +08:00
Guoqiong Song
e8c5645067 add LLM example of aquila on GPU (#9056)
* aquila, dolly-v1, dolly-v2, vicuna
2023-10-10 17:01:35 -07:00
Yuwen Hu
dc70fc7b00 Update performance tests for dependency of bigdl-core-xe-esimd (#9124) 2023-10-10 19:32:17 +08:00
Lilac09
30e3c196f3 Merge pull request #9108 from Zhengjin-Wang/main
Add instruction for chat.py in bigdl-llm-cpu
2023-10-10 16:40:52 +08:00
Lilac09
1e78b0ac40 Optimize LoRA Docker by Shrinking Image Size (#9110)
* modify dockerfile

* modify dockerfile
2023-10-10 15:53:17 +08:00
Ruonan Wang
388f688ef3 LLM: update setup.py to add bigdl-core-xe package (#9122) 2023-10-10 15:02:48 +08:00
Zhao Changmin
1709beba5b LLM: Explicitly close pickle file pointer before removing temporary directory (#9120)
* fp close
2023-10-10 14:57:23 +08:00
Yuwen Hu
0e09dd926b [LLM] Fix example test (#9118)
* Update llm example test link due to example layout change

* Add better change detect
2023-10-10 13:24:18 +08:00
Ruonan Wang
ad7d9231f5 LLM: add benchmark script for Max gpu and ipex fp16 gpu (#9112)
* add pvc bash

* meet code review

* rename to run-max-gpu.sh
2023-10-10 10:18:41 +08:00
Lilac09
6264381f2e Merge pull request #9117 from Zhengjin-Wang/manually_build
add llm-serving-xpu on github action
2023-10-10 10:09:06 +08:00
Zhengjin Wang
0dbb3a283e amend manually_build 2023-10-10 10:03:23 +08:00
Zhengjin Wang
bb3bb46400 add llm-serving-xpu on github action 2023-10-10 09:48:58 +08:00
binbin Deng
e4d1457a70 LLM: improve transformers style API doc (#9113) 2023-10-10 09:31:00 +08:00
Yuwen Hu
65212451cc [LLM] Small update to performance tests (#9106)
* small updates to llm performance tests regarding model handling

* Small fix
2023-10-09 16:55:25 +08:00
Zhao Changmin
edccfb2ed3 LLM: Check model device type (#9092)
* check model device
2023-10-09 15:49:15 +08:00
Heyang Sun
2c0c9fecd0 refine LLM containers (#9109) 2023-10-09 15:45:30 +08:00
binbin Deng
5e9962b60e LLM: update example layout (#9046) 2023-10-09 15:36:39 +08:00
Yina Chen
4c4f8d1663 [LLM]Fix Arc falcon abnormal output issue (#9096)
* update

* update

* fix error & style

* fix style

* update train

* to input_seq_size
2023-10-09 15:09:37 +08:00
Wang
a1aefdb8f4 modify README 2023-10-09 13:36:29 +08:00
Wang
3814abf95a add instruction for chat.py 2023-10-09 12:57:28 +08:00
Zhao Changmin
548e4dd5fe LLM: Adapt transformers models for optimize model SL (#9022)
* LLM: Adapt transformers model for SL
2023-10-09 11:13:44 +08:00
Ruonan Wang
f64257a093 LLM: basic api support for esimd fp16 (#9067)
* basic api support for fp16

* fix style

* fix

* fix error and style

* fix style

* meet code review

* update based on comments
2023-10-09 11:05:17 +08:00
Wang
a42c25436e Merge remote-tracking branch 'upstream/main' 2023-10-09 10:55:18 +08:00
JIN Qiao
65373d2a8b LLM: adjust portable zip content (#9054)
* LLM: adjust portable zip content

* LLM: adjust portable zip README
2023-10-09 10:51:19 +08:00
Guancheng Fu
df8df751c4 Modify readme for bigdl-llm-serving-cpu (#9105) 2023-10-09 09:56:09 +08:00
Heyang Sun
2756f9c20d XPU QLoRA Container (#9082)
* XPU QLoRA Container

* fix apt issue

* refine
2023-10-08 11:04:20 +08:00
ZehuaCao
aad68100ae Add trusted-bigdl-llm-serving-tdx image. (#9093)
* add entrypoint in cpu serving

* kubernetes support for fastchat cpu serving

* Update Readme

* add image to manually_build action

* update manually_build.yml

* update README.md

* update manually_build.yaml

* update attestation_cli.py

* update manually_build.yml

* update Dockerfile

* rename

* update trusted-bigdl-llm-serving-tdx Dockerfile
2023-10-08 10:13:51 +08:00
Xin Qiu
b3e94a32d4 change log4error import (#9098) 2023-10-08 09:23:28 +08:00
Kai Huang
78ea7ddb1c Combine apply_rotary_pos_emb for gpt-neox (#9074) 2023-10-07 16:27:46 +08:00
Heyang Sun
0b40ef8261 separate trusted and native llm cpu finetune from lora (#9050)
* separate trusted-llm and bigdl from lora finetuning

* add k8s for trusted llm finetune

* refine

* refine

* rename cpu to tdx in trusted llm

* solve conflict

* fix typo

* resolving conflict

* Delete docker/llm/finetune/lora/README.md

* fix

---------

Co-authored-by: Uxito-Ada <seusunheyang@foxmail.com>
Co-authored-by: leonardozcm <leonardo1997zcm@gmail.com>
2023-10-07 15:26:59 +08:00
Wang
4aee952b10 Merge remote-tracking branch 'upstream/main' 2023-10-07 09:53:52 +08:00
ZehuaCao
b773d67dd4 Add Kubernetes support for BigDL-LLM-serving CPU. (#9071) 2023-10-07 09:37:48 +08:00
Yang Wang
36dd4afd61 Fix llama when rope scaling is not None (#9086)
* Fix llama when rope scaling is not None

* fix style

* fix style
2023-10-06 13:27:37 -07:00
Yang Wang
fcb1c618a0 using bigdl-llm fused rope for llama (#9066)
* optimize llama xpu rope

* fix bug

* fix style

* refine append cache

* remove check

* do not cache cos sin

* remove unnecessary changes

* clean up

* fix style

* check for training
2023-10-06 09:57:29 -07:00
Jason Dai
50044640c0 Update README.md (#9085) 2023-10-06 21:54:18 +08:00
Jiao Wang
aefa5a5bfe Qwen kv cache (#9079)
* qwen and aquila

* update

* update

* style
2023-10-05 11:59:17 -07:00
Jiao Wang
d5ca1f32b6 Aquila KV cache optimization (#9080)
* update

* update

* style
2023-10-05 11:10:57 -07:00
Jason Dai
7506100bd5 Update readme (#9084) 2023-10-05 16:54:09 +08:00
Yang Wang
88565c76f6 add export merged model example (#9018)
* add export merged model example

* add sources

* add script

* fix style
2023-10-04 21:18:52 -07:00
Yang Wang
0cd8f1c79c Use ipex fused rms norm for llama (#9081)
* also apply rmsnorm

* fix cpu
2023-10-04 21:04:55 -07:00
Cengguang Zhang
fb883100e7 LLM: support chatglm-18b convert attention forward in benchmark scripts. (#9072)
* add chatglm-18b convert.

* fix if statement.

* fix
2023-09-28 14:04:52 +08:00
Yishuo Wang
6de2189e90 [LLM] fix chatglm main choice (#9073) 2023-09-28 11:23:37 +08:00
binbin Deng
760183bac6 LLM: update key feature and installation page of document (#9068) 2023-09-27 15:44:34 +08:00
Lilac09
c91b2bd574 fix:modify indentation (#9070)
* modify Dockerfile

* add README.md

* add README.md

* Modify Dockerfile

* Add bigdl inference cpu image build

* Add bigdl llm cpu image build

* Add bigdl llm cpu image build

* Add bigdl llm cpu image build

* Modify Dockerfile

* Add bigdl inference cpu image build

* Add bigdl inference cpu image build

* Add bigdl llm xpu image build

* manually build

* recover file

* manually build

* recover file

* modify indentation
2023-09-27 14:53:52 +08:00
Wang
ddcd9e7d0a modify indentation 2023-09-27 14:49:58 +08:00
Wang
9935772f24 recover file 2023-09-26 15:50:51 +08:00
Wang
efc2158215 manually build 2023-09-26 15:47:47 +08:00