Wang, Jian4 | 9df70d95eb | Refactor bigdl.llm to ipex_llm (#24) | 2024-03-22 15:41:21 +08:00
* Rename bigdl/llm to ipex_llm
* rm python/llm/src/bigdl
* from bigdl.llm to from ipex_llm

binbin Deng | db8e90796a | LLM: add avg token latency information and benchmark guide of autotp (#9940) | 2024-01-19 15:09:57 +08:00

Xin Qiu | ea0853c0b5 | update benchmark_utils readme (#8925) | 2023-09-08 10:30:26 +08:00
* update readme
* meet code review

Ruonan Wang | e9aa2bd890 | LLM: reduce GPU 1st token latency and update example (#8763) | 2023-08-16 18:01:23 +08:00
* reduce 1st token latency
* update example
* fix
* fix style
* update readme of gpu benchmark

Ruonan Wang | 8805186f2f | LLM: add benchmark tool for gpu (#8760) | 2023-08-16 11:22:10 +08:00
* add benchmark tool for gpu
* update

Ruonan Wang | 64b38e1dc8 | llm: benchmark tool for transformers int4 (separate 1st token and rest) (#8460) | 2023-07-06 09:49:52 +08:00
* add benchmark utils
* fix
* fix bug and add readme
* hidden latency data