Yishuo Wang | 6bcdc6cc8f | fix qwen2 cpu (#11663) | 2024-07-26 13:41:51 +08:00
Yishuo Wang | 019da6c0ab | use mlp silu_mul fusion in qwen2 to optimize memory usage (#11574) | 2024-07-13 16:32:54 +08:00
Yishuo Wang | 99b2802d3b | optimize qewn2 memory (#11535) | 2024-07-09 17:14:01 +08:00
Yishuo Wang | 7cb09a8eac | optimize qwen2 memory usage again (#11520) | 2024-07-05 17:32:34 +08:00
Yishuo Wang | 2a0f8087e3 | optimize qwen2 gpu memory usage again (#11435) | 2024-06-26 16:52:29 +08:00
Shaojun Liu | ab9f7f3ac5 | FIX: Qwen1.5-GPTQ-Int4 inference error (#11432) | 2024-06-26 15:36:22 +08:00
    * merge_qkv if quant_method is 'gptq'
    * fix python style checks
    * refactor
    * update GPU example
binbin Deng | aacc1fd8c0 | Fix shape error when run qwen1.5-14b using deepspeed autotp (#11420) | 2024-06-25 13:48:37 +08:00
Yishuo Wang | abe53eaa4f | optimize qwen1.5/2 memory usage when running long input with fp16 (#11403) | 2024-06-24 13:43:04 +08:00
Yishuo Wang | f0fdfa081b | Optimize qwen 1.5 14B batch performance (#11370) | 2024-06-20 17:23:39 +08:00
binbin Deng | 60cb1dac7c | Support PP for qwen1.5 (#11300) | 2024-06-13 17:35:24 +08:00
Yishuo Wang | 2623944604 | qwen2 sdpa small fix (#11261) | 2024-06-07 14:42:18 +08:00
Yishuo Wang | 2e4ccd541c | fix qwen2 cpu (#11240) | 2024-06-06 16:24:19 +08:00
Yishuo Wang | e738ec38f4 | disable quantize kv in specific qwen model (#11238) | 2024-06-06 14:08:39 +08:00
Yishuo Wang | d307622797 | fix first token sdp with batch (#11153) | 2024-05-28 15:03:06 +08:00
Yina Chen | b6b70d1ba0 | Divide core-xe packages (#11131) | 2024-05-28 12:00:18 +08:00
    * temp
    * add batch
    * fix style
    * update package name
    * fix style
    * add workflow
    * use temp version to run uts
    * trigger performance test
    * trigger win igpu perf
    * revert workflow & setup
Yishuo Wang | f00625f9a4 | refactor qwen2 (#11087) | 2024-05-21 16:53:42 +08:00
Yishuo Wang | 170e3d65e0 | use new sdp and fp32 sdp (#11007) | 2024-05-14 14:29:18 +08:00
Cengguang Zhang | cfed76b2ed | LLM: add long-context support for Qwen1.5-7B/Baichuan2-7B/Mistral-7B. (#10937) | 2024-05-10 16:40:15 +08:00
    * LLM: add split tensor support for baichuan2-7b and qwen1.5-7b.
    * fix style.
    * fix style.
    * fix style.
    * add support for mistral and fix condition threshold.
    * fix style.
    * fix comments.
Yishuo Wang | d884c62dc4 | remove new_layout parameter (#10906) | 2024-04-29 10:31:50 +08:00
Xin Qiu | e764f9b1b1 | Disable fast fused rope on UHD (#10780) | 2024-04-18 10:03:53 +08:00
    * use decoding fast path
    * update
    * update
    * cleanup
Cengguang Zhang | 3e2662c87e | LLM: fix get env KV_CACHE_ALLOC_BLOCK_LENGTH type. (#10771) | 2024-04-16 09:32:30 +08:00
Keyan (Kyrie) Zhang | 585c174e92 | Read the value of KV_CACHE_ALLOC_BLOCK_LENGTH from the environment variables (#10707) | 2024-04-10 10:48:46 +08:00
    * Read the value of KV_CACHE_ALLOC_BLOCK_LENGTH from the environment variables.
    * Fix style
Jiao Wang | 69bdbf5806 | Fix vllm print error message issue (#10664) | 2024-04-05 15:08:13 -07:00
    * update chatglm readme
    * Add condition to invalidInputError
    * update
    * update
    * style
Xin Qiu | 1dd40b429c | enable fp4 fused mlp and qkv (#10531) | 2024-03-26 08:34:00 +08:00
    * enable fp4 fused mlp and qkv
    * update qwen
    * update qwen2
Wang, Jian4 | 9df70d95eb | Refactor bigdl.llm to ipex_llm (#24) | 2024-03-22 15:41:21 +08:00
    * Rename bigdl/llm to ipex_llm
    * rm python/llm/src/bigdl
    * from bigdl.llm to from ipex_llm