Xiangyu Tian
b26409d53f
R1 Hybrid: Add Benchmark for DeepSeek R1 transformers example ( #12854 )
* init
* fix
* update
* update
* fix
* fix
2025-02-19 18:33:21 +08:00
Zijie Li
8fd2dcba86
Add benchmark_util for transformers >= 4.47.0 ( #12644 )
2025-01-03 10:48:29 +08:00
Zijie Li
7d80db710e
Add benchmark_util for transformers >= 4.44.0 ( #12171 )
* Create benchmark_util_4_45.py
* Update __init__.py
* Update lint-python
* Update benchmark_util_4_45.py
* Update benchmark_util_4_45.py
* Create benchmark_util_4_44.py
2024-10-14 15:40:12 +08:00
Guancheng Fu
69c8d36f16
Switching from vLLM v0.3.3 to vLLM 0.5.4 ( #12042 )
* Enable single card sync engine
* enable ipex-llm optimizations for vllm
* enable optimizations for lm_head
* Fix chatglm multi-reference problem
* Remove duplicate layer
* LLM: Update vLLM to v0.5.4 (#11746 )
* Enable single card sync engine
* enable ipex-llm optimizations for vllm
* enable optimizations for lm_head
* Fix chatglm multi-reference problem
* update 0.5.4 api_server
* add dockerfile
* fix
* fix
* refine
* fix
---------
Co-authored-by: gc-fu <guancheng.fu@intel.com>
* Add vllm-0.5.4 Dockerfile (#11838 )
* Update BIGDL_LLM_SDP_IGNORE_MASK in start-vllm-service.sh (#11957 )
* Fix vLLM not convert issues (#11817 ) (#11918 )
* Fix not convert issues
* refine
Co-authored-by: Guancheng Fu <110874468+gc-fu@users.noreply.github.com>
* Fix glm4-9b-chat nan error on vllm 0.5.4 (#11969 )
* init
* update mlp forward
* fix minicpm error in vllm 0.5.4
* fix dependabot alerts (#12008 )
* Update 0.5.4 dockerfile (#12021 )
* Add vllm awq loading logic (#11987 )
* [ADD] Add vllm awq loading logic
* [FIX] fix the module.linear_method path
* [FIX] fix quant_config path error
* Enable Qwen padding mlp to 256 to support batch_forward (#12030 )
* Enable padding mlp
* padding to 256
* update style
* Install 27191 runtime in 0.5.4 docker image (#12040 )
* fix rebase error
* fix rebase error
* vLLM: format for 0.5.4 rebase (#12043 )
* format
* Update model_convert.py
* Fix serving docker related modifications (#12046 )
* Fix undesired modifications (#12048 )
* fix
* Refine offline_inference arguments
---------
Co-authored-by: Xiangyu Tian <109123695+xiangyuT@users.noreply.github.com>
Co-authored-by: Jun Wang <thoughts.times@gmail.com>
Co-authored-by: Wang, Jian4 <61138589+hzjane@users.noreply.github.com>
Co-authored-by: liu-shaojun <johnssalyn@outlook.com>
Co-authored-by: Shaojun Liu <61072813+liu-shaojun@users.noreply.github.com>
2024-09-10 15:37:43 +08:00
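One change in the entry above pads the Qwen MLP intermediate dimension to a multiple of 256 so batch_forward can be used. A minimal sketch of that padding idea, assuming a plain linear weight matrix; the helper below is illustrative only and not the actual ipex-llm implementation.

```python
import torch

def pad_mlp_weight(weight: torch.Tensor, multiple: int = 256) -> torch.Tensor:
    """Zero-pad the intermediate (output) dimension up to the next multiple.

    Illustrative only: the real padding logic would also adjust the paired
    down-projection and slice the activation back after the matmul.
    """
    out_features, in_features = weight.shape
    padded_out = ((out_features + multiple - 1) // multiple) * multiple
    if padded_out == out_features:
        return weight  # already aligned, nothing to do
    padded = torch.zeros(padded_out, in_features,
                         dtype=weight.dtype, device=weight.device)
    padded[:out_features] = weight
    return padded
```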
Zijie Li
e7f7141781
Add benchmark util for transformers 4.42 ( #11725 )
* add new benchmark_util.py
Add new benchmark_util.py for transformers>=4.43.1. The old one is renamed to benchmark_util_prev.py (a version-dispatch sketch follows this entry).
* Small fix to import code
* Update __init__.py
* fix file names
* Update lint-python
Update lint-python to exclude benchmark_util_4_29.py and benchmark_util_4_43.py
* Update benchmark_util_4_43.py
* add benchmark_util for transformers 4.42
2024-08-07 08:48:07 +08:00
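The benchmark_util commits above keep one module per supported transformers release (benchmark_util_4_29.py, benchmark_util_4_43.py, and later 4_44/4_45 variants). A minimal sketch of how version dispatch could work in an __init__.py; the module names follow the files listed above, but the selection function and exact version thresholds are assumptions, not the actual ipex-llm code.

```python
# Hypothetical dispatch over the benchmark_util_* files named above;
# the real __init__.py may differ.
from packaging.version import Version
import transformers

def pick_benchmark_util() -> str:
    """Return the benchmark_util module name matching the installed transformers."""
    v = Version(transformers.__version__)
    if v >= Version("4.45.0"):
        return "benchmark_util_4_45"
    if v >= Version("4.44.0"):
        return "benchmark_util_4_44"
    if v >= Version("4.43.1"):
        return "benchmark_util_4_43"
    return "benchmark_util_4_29"

print(pick_benchmark_util())
```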
Zijie Li
8fb36b9f4a
add new benchmark_util.py ( #11713 )
* add new benchmark_util.py
2024-08-05 16:18:48 +08:00
Yishuo Wang
7f88ce23cd
add more gemma2 optimization ( #11673 )
2024-07-29 11:13:00 +08:00
Yishuo Wang
3e8819734b
add basic gemma2 optimization ( #11672 )
2024-07-29 10:46:51 +08:00
Wang, Jian4
74950a152a
Fix incorrect tgi_api_server file name ( #11075 )
2024-05-20 16:48:40 +08:00
Wang, Jian4
d9f71f1f53
Update benchmark util for example usage ( #11027 )
* mv benchmark_util.py to utils/
* remove
* update
2024-05-15 14:16:35 +08:00
Shaojun Liu
a10f5a1b8d
add python style check ( #10620 )
* add python style check
* fix style checks
* update runner
* add ipex-llm-finetune-qlora-cpu-k8s to manually_build workflow
* update tag to 2.1.0-SNAPSHOT
2024-04-02 16:17:56 +08:00
Wang, Jian4
0193f29411
LLM: Enable gguf float16 and Yuan2 model ( #10372 )
* enable float16
* add Yuan files
* enable Yuan
* enable setting low_bit on Yuan2
* update
* update license
* update generate
* update readme
* update python style
* update
2024-03-13 10:19:18 +08:00
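The entry above enables setting low_bit on Yuan2 in the examples. A hedged sketch of low-bit loading through the bigdl-llm/ipex-llm from_pretrained API; the model id is a placeholder and the low_bit values supported for Yuan2 are an assumption.

```python
# Low-bit loading sketch; at the time of this commit the package was
# bigdl.llm, later renamed ipex_llm. The model id below is a placeholder.
from ipex_llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model_path = "IEITYuan/Yuan2-2B-hf"  # placeholder
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    load_in_low_bit="sym_int4",   # chosen low-bit format; supported values vary by model
    trust_remote_code=True,       # Yuan2 ships custom modeling code
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
```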
Guancheng Fu
bf579507c2
Integrate vllm ( #9310 )
* done
* Rename structure
* add models
* Add structure/sampling_params, sequence
* add input_metadata
* add outputs
* Add policy,logger
* add and update
* add parallelconfig back
* core/scheduler.py
* Add llm_engine.py
* Add async_llm_engine.py
* Add tested entrypoint
* fix minor error
* Fix everything
* fix kv cache view
* fix
* fix
* fix
* format&refine
* remove logger from repo
* try to add token latency
* remove logger
* Refine config.py
* finish worker.py
* delete utils.py
* add license
* refine
* refine sequence.py
* remove sampling_params.py
* finish
* add license
* format
* add license
* refine
* refine
* Refine line too long
* remove exception
* so dumb style-check
* refine
* refine
* refine
* refine
* refine
* refine
* add README
* refine README
* add warning instead of error
* fix padding
* add license
* format
* format
* format fix
* Refine vllm dependency (#1 )
clean up the vllm dependency
* fix license
* fix format
* fix format
* fix
* adapt LLM engine
* fix
* add license
* fix format
* fix
* Moving README.md to the correct position
* Fix readme.md
* done
* guide for adding models
* fix
* Fix README.md
* Add new model readme
* remove ray-logic
* refactor arg_utils.py
* remove distributed_init_method logic
* refactor entrypoints
* refactor input_metadata
* refactor model_loader
* refactor utils.py
* refactor models
* fix api server
* remove vllm.structure
* revert by txy 1120
* remove utils
* format
* fix license
* add bigdl model
* Refer to a specific commit
* Change code base
* add comments
* add async_llm_engine comment
* refine
* formatted
* add worker comments
* add comments
* add comments
* fix style
* add changes
---------
Co-authored-by: xiangyuT <xiangyu.tian@intel.com>
Co-authored-by: Xiangyu Tian <109123695+xiangyuT@users.noreply.github.com>
Co-authored-by: leonardozcm <leonardo1997zcm@gmail.com>
2023-11-23 16:46:45 +08:00
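The engine, scheduler, and entrypoint files listed in the entry above follow vLLM's offline-inference flow. A minimal usage sketch in upstream-vLLM style; the import path of the integrated engine may differ from upstream, and the model id is a placeholder.

```python
# Upstream-vLLM-style offline inference; the integrated engine described in
# this commit exposes a similar entrypoint, but import paths may differ.
from vllm import LLM, SamplingParams

prompts = ["What is the capital of France?"]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

llm = LLM(model="meta-llama/Llama-2-7b-hf")  # placeholder model path
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.outputs[0].text)
```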
Yuwen Hu
0e09dd926b
[LLM] Fix example test ( #9118 )
* Update llm example test link due to example layout change
* Add better change detection
2023-10-10 13:24:18 +08:00
Song Jiaming
c1f9af6d97
[LLM] chatglm example and transformers low-bit examples ( #8751 )
2023-08-16 11:41:44 +08:00
Song Jiaming
e717e304a6
LLM first example test and template ( #8658 )
2023-08-10 10:03:11 +08:00
Shengsheng Huang
02c583144c
[LLM] langchain integrations and examples ( #8256 )
* langchain integrations and examples
* add licences and rename
* add licences
* fix license issues and change backbone to model_family
* update examples to use model_family param
* fix linting
* fix code style
* exclude langchain integration from stylecheck
* update langchain examples and update integrations based on latest changes
* update simple llama-cpp-python style API example
* remove bloom in README
* change default n_threads to 2 and remove redundant code
---------
Co-authored-by: leonardozcm <changmin.zhao@intel.com>
2023-06-12 19:22:07 +08:00
binbin Deng
8421af51ae
LLM: support converting to ggml format ( #8235 )
* add convert
* fix
* fix
* fix
* try
* test
* update check
* fix
* fix
2023-05-31 15:20:06 +08:00
Pingchuan Ma (Henry)
1f913a6941
[LLM] Add LLM pep8 coding style checking ( #8233 )
* add LLM pep8 coding checking
* resolve bugs in testing scripts and revise code style
2023-05-30 15:58:14 +08:00
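The check added in the entry above runs pep8-style linting over the Python sources. A small sketch of an equivalent check using the pycodestyle package; the path and line-length limit are assumptions rather than the project's actual lint-python settings.

```python
# Minimal pep8-style check, similar in spirit to the lint-python script
# referenced in later commits; the path and limit here are illustrative.
import pycodestyle

style = pycodestyle.StyleGuide(max_line_length=100)
report = style.check_files(["python/llm/src"])  # placeholder path
if report.total_errors:
    raise SystemExit(f"{report.total_errors} style error(s) found")
```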