| Author | Commit | Message | Date |
| --- | --- | --- | --- |
| Zhao Changmin | 6a0134a9b2 | support q4_0_rtn (#11477)<br>* q4_0_rtn | 2024-07-02 16:57:02 +08:00 |
| Jun Wang | 6352c718f3 | [update] merge manually build for testing function to manually build (#11491) | 2024-07-02 16:28:15 +08:00 |
| Yishuo Wang | 5e967205ac | remove the code that converts input to fp16 before calling batch forward kernel (#11489) | 2024-07-02 16:23:53 +08:00 |
| Yuwen Hu | 1638573f56 | Update llama cpp quickstart regarding windows prerequisites to avoid misleading users (#11490) | 2024-07-02 16:15:47 +08:00 |
| Yuwen Hu | 986b10e397 | Further fix for performance tests triggered by pr (#11488) | 2024-07-02 15:29:42 +08:00 |
| Yuwen Hu | bb6953c19e | Support pr validate perf test (#11486)<br>* Support triggering performance tests through commits<br>* Small fix<br>* Small fix<br>* Small fixes | 2024-07-02 15:20:42 +08:00 |
| Wang, Jian4 | 4390e7dc49 | Fix codegeex2 transformers version (#11487) | 2024-07-02 15:09:28 +08:00 |
| Guancheng Fu | 4fbb0d33ae | Pin compute runtime version for xpu images (#11479)<br>* pin compute runtime version<br>* fix done | 2024-07-01 21:41:02 +08:00 |
| Shaojun Liu | a1164e45b6 | Enable Release Pypi workflow to be called in another repo (#11483) | 2024-07-01 19:48:21 +08:00 |
| Yuwen Hu | fb4774b076 | Update pull request template for manually-triggered Unit tests (#11482) | 2024-07-01 19:06:29 +08:00 |
| Yuwen Hu | ca24794dd0 | Fixes for performance test triggering (#11481) | 2024-07-01 18:39:54 +08:00 |
| Yuwen Hu | 6bdc562f4c | Enable triggering nightly tests/performance tests from another repo (#11480)<br>* Enable triggering from another workflow for nightly tests and example tests<br>* Enable triggering from another workflow for nightly performance tests | 2024-07-01 17:45:42 +08:00 |
| Yishuo Wang | ec3a912ab6 | optimize npu llama long context performance (#11478) | 2024-07-01 16:49:23 +08:00 |
| Heyang Sun | 913e750b01 | fix non-string deepspeed config path bug (#11476)<br>* fix non-string deepspeed config path bug<br>* Update lora_finetune_chatglm.py | 2024-07-01 15:53:50 +08:00 |
| binbin Deng | 48ad482d3d | Fix import error caused by pydantic on cpu (#11474) | 2024-07-01 15:49:49 +08:00 |
| Yuwen Hu | dbba51f455 | Enable LLM UT workflow to be called in another repo (#11475)<br>* Enable LLM UT workflow to be called in another repo<br>* Small fixes<br>* Small fix | 2024-07-01 15:26:17 +08:00 |
| Yishuo Wang | 39bcb33a67 | add sdp support for stablelm 3b (#11473) | 2024-07-01 14:56:15 +08:00 |
| Zhao Changmin | cf8eb7b128 | Init NPU quantize method and support q8_0_rtn (#11452)<br>* q8_0_rtn<br>* fix floating point | 2024-07-01 13:45:07 +08:00 |
| Yishuo Wang | 319a3b36b2 | fix npu llama2 (#11471) | 2024-07-01 10:14:11 +08:00 |
| Heyang Sun | 07362ffffc | ChatGLM3-6B LoRA Fine-tuning Demo (#11450)<br>* ChatGLM3-6B LoRA Fine-tuning Demo<br>* refine<br>* refine<br>* add 2-card deepspeed<br>* refine format<br>* add mpi4py and deepspeed install | 2024-07-01 09:18:39 +08:00 |
| Wang, Jian4 | e000ac90c4 | Add pp_serving example to serving image (#11433)<br>* init pp<br>* update<br>* update<br>* no clone ipex-llm again | 2024-06-28 16:45:25 +08:00 |
| Xiangyu Tian | fd933c92d8 | Fix: Correct num_requests in benchmark for Pipeline Parallel Serving (#11462) | 2024-06-28 16:10:51 +08:00 |
| Wang, Jian4 | b7bc1023fb | Add vllm_online_benchmark.py (#11458)<br>* init<br>* update and add<br>* update | 2024-06-28 14:59:06 +08:00 |
| SichengStevenLi | 86b81c09d9 | Table of Contents in Quickstart Files (#11437)<br>* fixed a minor grammar mistake<br>* added table of contents<br>* added table of contents<br>* changed table of contents indexing<br>* added table of contents<br>* added table of contents, changed grammar<br>* added table of contents<br>* added table of contents<br>* added table of contents<br>* added table of contents<br>* added table of contents<br>* added table of contents, modified chapter numbering<br>* fixed troubleshooting section redirection path<br>* added table of contents<br>* added table of contents, modified section numbering<br>* added table of contents, modified section numbering<br>* added table of contents<br>* added table of contents, changed title size, modified numbering<br>* added table of contents, changed section title size and capitalization<br>* added table of contents, modified section numbering<br>* changed table of contents syntax<br>* changed table of contents syntax<br>* changed table of contents syntax<br>* changed table of contents syntax<br>* changed table of contents syntax<br>* changed table of contents syntax<br>* changed table of contents syntax<br>* changed table of contents syntax<br>* changed table of contents syntax<br>* changed table of contents syntax<br>* changed table of contents syntax<br>* changed table of contents syntax<br>* changed table of contents syntax<br>* changed table of contents syntax<br>* changed table of contents syntax<br>* changed table of contents syntax<br>* changed table of contents syntax<br>* changed table of contents syntax<br>* changed table of contents capitalization issue<br>* changed table of contents capitalization issue<br>* changed table of contents location<br>* changed table of contents<br>* changed table of contents<br>* changed section capitalization<br>* removed comments<br>* removed comments<br>* removed comments | 2024-06-28 10:41:00 +08:00 |
| SONG Ge | a414e3ff8a | add pipeline parallel support with load_low_bit (#11414) | 2024-06-28 10:17:56 +08:00 |
| Cengguang Zhang | d0b801d7bc | LLM: change write mode in all-in-one benchmark. (#11444)<br>* LLM: change write mode in all-in-one benchmark.<br>* update output style. | 2024-06-27 19:36:38 +08:00 |
| binbin Deng | 987017ef47 | Update pipeline parallel serving for more model support (#11428) | 2024-06-27 18:21:01 +08:00 |
| Yishuo Wang | 029ff15d28 | optimize npu llama2 first token performance (#11451) | 2024-06-27 17:37:33 +08:00 |
| Qiyuan Gong | 4e4ecd5095 | Control sys.modules ipex duplicate check with BIGDL_CHECK_DUPLICATE_IMPORT (#11453)<br>* Control sys.modules ipex duplicate check with BIGDL_CHECK_DUPLICATE_IMPORT. | 2024-06-27 17:21:45 +08:00 |
| Yishuo Wang | c6e5ad668d | fix internlm xcomposer meta-instruction typo (#11448) | 2024-06-27 15:29:43 +08:00 |
| Yishuo Wang | f89ca23748 | optimize npu llama2 perf again (#11445) | 2024-06-27 15:13:42 +08:00 |
| Shaojun Liu | 13f59ae6b4 | Fix llm binary build linux-build-avxvnni failure (#11447)<br>* skip gpg check failure<br>* skip gpg check | 2024-06-27 14:12:14 +08:00 |
| Yishuo Wang | cf0f5c4322 | change npu document (#11446) | 2024-06-27 13:59:59 +08:00 |
| binbin Deng | 508c364a79 | Add precision option in PP inference examples (#11440) | 2024-06-27 09:24:27 +08:00 |
| Jason Dai | e9e8f9b4d4 | Update Readme (#11441) | 2024-06-26 19:48:07 +08:00 |
| Jason Dai | 2939f1ac60 | Update README.md (#11439) | 2024-06-26 19:25:58 +08:00 |
| Yishuo Wang | 2a0f8087e3 | optimize qwen2 gpu memory usage again (#11435) | 2024-06-26 16:52:29 +08:00 |
| Shaojun Liu | ab9f7f3ac5 | FIX: Qwen1.5-GPTQ-Int4 inference error (#11432)<br>* merge_qkv if quant_method is 'gptq'<br>* fix python style checks<br>* refactor<br>* update GPU example | 2024-06-26 15:36:22 +08:00 |
| Guancheng Fu | 99cd16ef9f | Fix error while using pipeline parallelism (#11434) | 2024-06-26 15:33:47 +08:00 |
| Yuwen Hu | a45ceac4e4 | Update main readme for missing quickstarts (#11427)<br>* Update main readme to add missing quickstarts<br>* Update quickstart index page<br>* Small fixes<br>* Small fix | 2024-06-26 13:51:42 +08:00 |
| Jiao Wang | 40fa23560e | Fix LLAVA example on CPU (#11271)<br>* update<br>* update<br>* update<br>* update | 2024-06-25 20:04:59 -07:00 |
| Yishuo Wang | ca0e69c3a7 | optimize npu llama perf again (#11431) | 2024-06-26 10:52:54 +08:00 |
| Yishuo Wang | 9f6e5b4fba | optimize llama npu perf (#11426) | 2024-06-25 17:43:20 +08:00 |
| binbin Deng | e473b8d946 | Add more qwen1.5 and qwen2 support for pipeline parallel inference (#11423) | 2024-06-25 15:49:32 +08:00 |
| binbin Deng | aacc1fd8c0 | Fix shape error when running qwen1.5-14b using deepspeed autotp (#11420) | 2024-06-25 13:48:37 +08:00 |
| Yishuo Wang | 3b23de684a | update npu examples (#11422) | 2024-06-25 13:32:53 +08:00 |
| Xiangyu Tian | 8ddae22cfb | LLM: Refactor Pipeline-Parallel-FastAPI example (#11319)<br>Initial refactor of the Pipeline-Parallel-FastAPI example | 2024-06-25 13:30:36 +08:00 |
| SONG Ge | 34c15d3a10 | update pp document (#11421) | 2024-06-25 10:17:20 +08:00 |
| Xin Qiu | 9e4ee61737 | rename BIGDL_OPTIMIZE_LM_HEAD to IPEX_LLM_LAST_LM_HEAD and add qwen2 (#11418) | 2024-06-24 18:42:37 +08:00 |
| Yuwen Hu | 75f836f288 | Add extra warmup for THUDM/glm-4-9b-chat in igpu-performance test (#11417) | 2024-06-24 18:08:05 +08:00 |