Commit graph

7 commits

Author SHA1 Message Date
Jason Dai
3bc3d0bbcd Update self-speculative readme (#9986) 2024-01-24 22:37:32 +08:00
Ruonan Wang
d4f65a6033 LLM: add mistral speculative example (#9976)
* add mistral example

* update
2024-01-24 17:35:15 +08:00
Yina Chen
b176cad75a LLM: Add baichuan2 gpu spec example (#9973)
* add baichuan2 gpu spec example

* update readme & example

* remove print

* fix typo

* meet comments

* revert

* update
2024-01-24 16:40:16 +08:00
Yina Chen
5aa4b32c1b LLM: Add qwen spec gpu example (#9965)
* add qwen spec gpu example

* update readme

---------

Co-authored-by: rnwang04 <ruonan1.wang@intel.com>
2024-01-23 15:59:43 +08:00
Ruonan Wang
60b35db1f1 LLM: add chatglm3 speculative decoding example (#9966)
* add chatglm3 example

* update

* fix
2024-01-23 15:54:12 +08:00
Ruonan Wang
27b19106f3 LLM: add readme for speculative decoding gpu examples (#9961)
* add readme

* add readme

* meet code review
2024-01-23 12:54:19 +08:00
Ruonan Wang
3e601f9a5d LLM: Support speculative decoding in bigdl-llm (#9951)
* first commit

* fix error, add llama example

* hidden print

* update api usage

* change to api v3

* update

* meet code review

* meet code review, fix style

* add reference, fix style

* fix style

* fix first token time
2024-01-22 19:14:56 +08:00