26 commits

Author SHA1 Message Date
Wang, Jian4
1de878bee1 LLM: Fix speculative llama3 long input error (#10934) 2024-05-07 09:25:20 +08:00
ZehuaCao
36eb8b2e96 Add llama3 speculative example (#10856)
* Initial llama3 speculative example

* update README

* update README

* update README
2024-04-23 17:03:54 +08:00
ZehuaCao
92ea54b512 Fix speculative decoding bug (#10855) 2024-04-23 14:28:31 +08:00
Wang, Jian4
18c032652d LLM: Add mixtral speculative CPU example (#10830)
* init mixtral sp example

* use different prompt_format

* update output

* update
2024-04-23 10:05:51 +08:00
Wang, Jian4
23c6a52fb0 LLM: Fix ipex torchscript=True error (#10832)
* remove

* update

* remove torchscript
2024-04-22 15:53:09 +08:00
ZehuaCao
0646e2c062 Fix short prompt for IPEX_CPU speculative decoding cause no_attr error (#10783) 2024-04-17 16:19:57 +08:00
Xiangyu Tian
301504aa8d Fix transformers version warning (#10732) 2024-04-11 13:12:49 +08:00
Shaojun Liu
f37a1f2a81 Upgrade to python 3.11 (#10711)
* create conda env with python 3.11

* recommend to use Python 3.11

* update
2024-04-09 17:41:17 +08:00
Wang, Jian4
16b2ef49c6 Update_document by heyang (#30) 2024-03-25 10:06:02 +08:00
Wang, Jian4
9df70d95eb Refactor bigdl.llm to ipex_llm (#24)
* Rename bigdl/llm to ipex_llm

* rm python/llm/src/bigdl

* from bigdl.llm to from ipex_llm
2024-03-22 15:41:21 +08:00
Heyang Sun
36a9e88104 Speculative Starcoder on CPU (#10138)
* Speculative Starcoder on CPU

* enable kv-cache pre-allocation

* refine codes

* refine

* fix style

* fix style

* fix style

* refine

* refine

* Update speculative.py

* Update gptbigcode.py

* fix style

* Update speculative.py

* enable mixed-datatype layernorm on top of torch API

* adaptive dtype

* Update README.md
2024-02-27 09:57:29 +08:00
Wang, Jian4
6c74b99a28 LLM: Update qwen readme (#10245) 2024-02-26 17:03:09 +08:00
Wang, Jian4
f9b75f900b LLM: Enable qwen target_model ipex (#10232)
* change order

* enable qwen ipex

* update qwen example

* update

* fix style

* update
2024-02-26 16:41:12 +08:00
Ziteng Zhang
ea23afc8ec [LLM]update ipex part in mistral example readme (#10239)
* update ipex part in mistral example readme
2024-02-26 14:35:20 +08:00
Xiangyu Tian
85a99e13e8 LLM: Fix ChatGLM3 Speculative Example (#10236)
Fix ChatGLM3 Speculative Example.
2024-02-26 10:57:28 +08:00
Xiangyu Tian
f445217d02 LLM: Update IPEX to 2.2.0+cpu and Refactor for _ipex_optimize (#10189)
Update IPEX to 2.2.0+cpu and refactor for _ipex_optimize.
2024-02-22 16:01:11 +08:00
Ziteng Zhang
276ef0e885 Speculative Ziya on CPU (#10160)
* Speculative Ziya on CPU

* Without part of Accelerate with BIGDL_OPT_IPEX
2024-02-21 10:30:39 +08:00
Wang, Jian4
d3591383d5 LLM : Add CPU chatglm3 speculative example (#10004)
* init chatglm

* update

* update
2024-02-19 13:38:52 +08:00
Heyang Sun
177273c1a4 IPEX Speculative Support for Baichuan2 7B (#10112)
* IPEX Speculative Support for Baichuan2 7B

* fix license problems

* refine
2024-02-19 09:12:57 +08:00
Heyang Sun
601024f418 Mistral CPU example of speculative decoding (#10024)
* Mistral CPU example of speculative decoding

* update transformers version

* update example

* Update README.md
2024-02-01 10:52:32 +08:00
Heyang Sun
7284edd9b7 Vicuna CPU example of speculative decoding (#10018)
* Vicuna CPU example of speculative decoding

* Update speculative.py

* Update README.md

* add requirements for ipex

* Update README.md

* Update speculative.py

* Update speculative.py
2024-01-31 11:23:50 +08:00
Wang, Jian4
fb53b994f8 LLM : Add llama ipex optimized (#10046)
* init ipex

* remove padding
2024-01-31 10:38:46 +08:00
Heyang Sun
b1ff28ceb6 LLama2 CPU example of speculative decoding (#9962)
* LLama2 example of speculative decoding

* add docs

* Update speculative.py

* Update README.md

* Update README.md

* Update speculative.py

* remove autocast
2024-01-31 09:45:20 +08:00
Xiangyu Tian
9978089796 [LLM] Enable BIGDL_OPT_IPEX in speculative baichuan2 13b example (#10028)
Enable BIGDL_OPT_IPEX in speculative baichuan2 13b example
2024-01-30 17:11:37 +08:00
Heyang Sun
cc3f122f6a Baichuan2 CPU example of speculative decoding (#10003)
* Baichuan2 CPU example of speculative decoding

* Update generate.py

* Update README.md

* Update generate.py

* Update generate.py

* Update generate.py

* fix default model

* fix wrong chinese coding

* Update generate.py

* update prompt

* update sample outputs

* baichuan 7b needs transformers==4.31.0

* rename example file's name
2024-01-29 14:21:09 +08:00
Wang, Jian4
093e6f8f73 LLM: Add qwen CPU speculative example (#9985)
* init from gpu

* update for cpu

* update

* update

* fix xpu readme

* update

* update example prompt

* update prompt and add 72b

* update

* update
2024-01-25 17:01:34 +08:00