Xin Qiu
49a39452c6
update benchmark ( #8899 )
2023-09-06 15:11:43 +08:00
Kai Huang
4a9ff050a1
Add qlora nf4 ( #8782 )
* add nf4
* dequant nf4
* style
2023-09-06 09:39:22 +08:00
xingyuan li
704a896e90
[LLM] Add perf test on xpu for bigdl-llm ( #8866 )
* add xpu latency job
* update install way
* remove duplicated workflow
* add perf upload
2023-09-05 17:36:24 +09:00
Zhao Changmin
95271f10e0
LLM: Rename low bit layer ( #8875 )
* rename lowbit
---------
Co-authored-by: leonardozcm <leonardozcm@gmail.com>
2023-09-05 13:21:12 +08:00
Yina Chen
74a2c2ddf5
Update optimize_model=True in llama2 chatglm2 arc examples ( #8878 )
* add optimize_model=True in llama2 chatglm2 examples
* add ipex optimize in gpt-j example
2023-09-05 10:35:37 +08:00
Jason Dai
5e58f698cd
Update readthedocs ( #8882 )
2023-09-04 15:42:16 +08:00
Song Jiaming
7b3ac66e17
[LLM] auto performance test fix specific settings to template ( #8876 )
2023-09-01 15:49:04 +08:00
Yang Wang
242c9d6036
Fix chatglm2 multi-turn streamchat ( #8867 )
2023-08-31 22:13:49 -07:00
Song Jiaming
c06f1ca93e
[LLM] auto perf test to output to csv ( #8846 )
2023-09-01 10:48:00 +08:00
Zhao Changmin
9c652fbe95
LLM: Whisper long segment recognize example ( #8826 )
* LLM: Long segment recognize example
2023-08-31 16:41:25 +08:00
Yishuo Wang
a232c5aa21
[LLM] add protobuf in bigdl-llm dependency ( #8861 )
2023-08-31 15:23:31 +08:00
xingyuan li
de6c6bb17f
[LLM] Downgrade amx build gcc version and remove avx flag display ( #8856 )
* downgrade to gcc 11
* remove avx display
2023-08-31 14:08:13 +09:00
Yang Wang
3b4f4e1c3d
Fix llama attention optimization for XPU ( #8855 )
* Fix llama attention optimization for XPU
* fix chatglm2
* fix typo
2023-08-30 21:30:49 -07:00
Shengsheng Huang
7b566bf686
[LLM] add new API for optimize any pytorch models ( #8827 )
* add new API for optimize any pytorch models
* change test util name
* revise API and update UT
* fix python style
* update ut config, change default value
* change defaults, disable ut transcribe
2023-08-30 19:41:53 +08:00
Xin Qiu
8eca982301
windows add env ( #8852 )
2023-08-30 15:54:52 +08:00
Zhao Changmin
731916c639
LLM: Enable attempting loading method automatically ( #8841 )
* enable auto load method
* warning error
* logger info
---------
Co-authored-by: leonardozcm <leonardozcm@gmail.com>
2023-08-30 15:41:55 +08:00
Yishuo Wang
bba73ec9d2
[LLM] change chatglm native int4 checkpoint name ( #8851 )
2023-08-30 15:05:19 +08:00
Yina Chen
55e705a84c
[LLM] Support the rest of AutoXXX classes in Transformers API ( #8815 )
* add transformers auto models
* fix
2023-08-30 11:16:14 +08:00
Zhao Changmin
887018b0f2
Update ut save&load ( #8847 )
Co-authored-by: leonardozcm <leonardozcm@gmail.com>
2023-08-30 10:32:57 +08:00
Yina Chen
3462fd5c96
Add arc gpt-j example ( #8840 )
2023-08-30 10:31:24 +08:00
Ruonan Wang
f42c0bad1b
LLM: update GPU doc ( #8845 )
2023-08-30 09:24:19 +08:00
Jason Dai
aab7deab1f
Reorganize GPU examples ( #8844 )
2023-08-30 08:32:08 +08:00
Yang Wang
a386ad984e
Add Data Center GPU Flex Series to Readme ( #8835 )
* Add Data Center GPU Flex Series to Readme
* remove
* update starcoder
2023-08-29 11:19:09 -07:00
Yishuo Wang
7429ea0606
[LLM] support transformer int4 + amx int4 ( #8838 )
2023-08-29 17:27:18 +08:00
Ruonan Wang
ddff7a6f05
Update readme of GPU to specify oneapi version ( #8820 )
2023-08-29 13:14:22 +08:00
Zhao Changmin
bb31d4fe80
LLM: Implement hf low_cpu_mem_usage with 1xbinary file peak memory on transformer int4 ( #8731 )
* 1x peak memory
2023-08-29 09:33:17 +08:00
Yina Chen
35fdf94031
[LLM] Arc starcoder example ( #8814 )
* arc starcoder example init
* add log
* meet comments
2023-08-28 16:48:00 +08:00
xingyuan li
6a902b892e
[LLM] Add amx build step ( #8822 )
* add amx build step
2023-08-28 17:41:18 +09:00
Ruonan Wang
eae92bc7da
llm: quick fix path ( #8810 )
2023-08-25 16:02:31 +08:00
Ruonan Wang
0186f3ab2f
llm: update all ARC int4 examples ( #8809 )
* update GPU examples
* update other examples
* fix
* update based on comment
2023-08-25 15:26:10 +08:00
Song Jiaming
b8b1b6888b
[LLM] Performance test ( #8796 )
2023-08-25 14:31:45 +08:00
Yang Wang
9d0f6a8cce
rename math.py in example to avoid conflict ( #8805 )
2023-08-24 21:06:31 -07:00
SONG Ge
d2926c7672
[LLM] Unify Langchain Native and Transformers LLM API ( #8752 )
* deprecate BigDLNativeTransformers and add specific LMEmbedding method
* deprecate and add LM methods for langchain llms
* add native params to native langchain
* new implementation for embedding
* move ut from bigdlnative to causal llm
* rename embeddings api and update examples to align with usage changes
* docqa example hot-fix
* add more api docs
* add langchain ut for starcoder
* support model_kwargs for transformer methods when calling causalLM and add ut
* ut fix for transformers embedding
* update for langchain causal supporting transformers
* remove model_family in readme doc
* add model_families params to support more models
* update api docs and remove chatglm embeddings for now
* remove chatglm embeddings in examples
* new refactor for ut to add bloom and transformers llama ut
* disable llama transformers embedding ut
2023-08-25 11:14:21 +08:00
binbin Deng
5582872744
LLM: update chatglm example to be more friendly for beginners ( #8795 )
2023-08-25 10:55:01 +08:00
Yina Chen
7c37424a63
Fix voice assistant example input error on Linux ( #8799 )
* fix linux error
* update
* remove alsa log
2023-08-25 10:47:27 +08:00
Yang Wang
bf3591e2ff
Optimize chatglm2 for bf16 ( #8725 )
* make chatglm work with bf16
* fix style
* support chatglm v1
* fix style
* fix style
* add chatglm2 file
2023-08-24 10:04:25 -07:00
xingyuan li
c94bdd3791
[LLM] Merge windows & linux nightly test ( #8756 )
* fix download statement
* add check before build wheel
* use curl to upload files
* windows unittest won't upload converted model
* split llm-cli test into windows & linux versions
* update tempdir create way
* fix nightly converted model name
* windows llm-cli starcoder test temporarily disabled
* remove taskset dependency
* rename llm_unit_tests_linux to llm_unit_tests
2023-08-23 12:48:41 +09:00
Jason Dai
dcadd09154
Update llm document ( #8784 )
2023-08-21 22:34:44 +08:00
Yishuo Wang
611c1fb628
[LLM] change default n_threads of native int4 langchain API ( #8779 )
2023-08-21 13:30:12 +08:00
Yishuo Wang
3d1f2b44f8
LLM: change default n_threads of native int4 models ( #8776 )
2023-08-18 15:46:19 +08:00
Yishuo Wang
2ba2133613
fix starcoder chinese output ( #8773 )
2023-08-18 13:37:02 +08:00
binbin Deng
548f7a6cf7
LLM: update convert of llama family to support llama2-70B ( #8747 )
2023-08-18 09:30:35 +08:00
Yina Chen
4afea496ab
support q8_0 ( #8765 )
2023-08-17 15:06:36 +08:00
Ruonan Wang
e9aa2bd890
LLM: reduce GPU 1st token latency and update example ( #8763 )
* reduce 1st token latency
* update example
* fix
* fix style
* update readme of gpu benchmark
2023-08-16 18:01:23 +08:00
binbin Deng
06609d9260
LLM: add qwen example on arc ( #8757 )
2023-08-16 17:11:08 +08:00
SONG Ge
f4164e4492
[BigDL LLM] Update readme for unifying transformers API ( #8737 )
* update readme doc
* fix readthedocs error
* update comment
* update exception error info
* invalidInputError instead
* fix readme typo error and remove import error
* fix more typo
2023-08-16 14:22:32 +08:00
Song Jiaming
c1f9af6d97
[LLM] chatglm example and transformers low-bit examples ( #8751 )
2023-08-16 11:41:44 +08:00
Ruonan Wang
8805186f2f
LLM: add benchmark tool for gpu ( #8760 )
* add benchmark tool for gpu
* update
2023-08-16 11:22:10 +08:00
binbin Deng
97283c033c
LLM: add falcon example on arc ( #8742 )
2023-08-15 17:38:38 +08:00
binbin Deng
8c55911308
LLM: add baichuan-13B on arc example ( #8755 )
2023-08-15 15:07:04 +08:00