Commit graph

25 commits

Author SHA1 Message Date
Yishuo Wang
6bcdc6cc8f
fix qwen2 cpu (#11663) 2024-07-26 13:41:51 +08:00
Yishuo Wang
019da6c0ab
use mlp silu_mul fusion in qwen2 to optimize memory usage (#11574) 2024-07-13 16:32:54 +08:00
Yishuo Wang
99b2802d3b
optimize qewn2 memory (#11535) 2024-07-09 17:14:01 +08:00
Yishuo Wang
7cb09a8eac
optimize qwen2 memory usage again (#11520) 2024-07-05 17:32:34 +08:00
Yishuo Wang
2a0f8087e3
optimize qwen2 gpu memory usage again (#11435) 2024-06-26 16:52:29 +08:00
Shaojun Liu
ab9f7f3ac5
FIX: Qwen1.5-GPTQ-Int4 inference error (#11432)
* merge_qkv if quant_method is 'gptq'

* fix python style checks

* refactor

* update GPU example
2024-06-26 15:36:22 +08:00
binbin Deng
aacc1fd8c0
Fix shape error when run qwen1.5-14b using deepspeed autotp (#11420) 2024-06-25 13:48:37 +08:00
Yishuo Wang
abe53eaa4f
optimize qwen1.5/2 memory usage when running long input with fp16 (#11403) 2024-06-24 13:43:04 +08:00
Yishuo Wang
f0fdfa081b
Optimize qwen 1.5 14B batch performance (#11370) 2024-06-20 17:23:39 +08:00
binbin Deng
60cb1dac7c
Support PP for qwen1.5 (#11300) 2024-06-13 17:35:24 +08:00
Yishuo Wang
2623944604
qwen2 sdpa small fix (#11261) 2024-06-07 14:42:18 +08:00
Yishuo Wang
2e4ccd541c
fix qwen2 cpu (#11240) 2024-06-06 16:24:19 +08:00
Yishuo Wang
e738ec38f4
disable quantize kv in specific qwen model (#11238) 2024-06-06 14:08:39 +08:00
Yishuo Wang
d307622797
fix first token sdp with batch (#11153) 2024-05-28 15:03:06 +08:00
Yina Chen
b6b70d1ba0
Divide core-xe packages (#11131)
* temp

* add batch

* fix style

* update package name

* fix style

* add workflow

* use temp version to run uts

* trigger performance test

* trigger win igpu perf

* revert workflow & setup
2024-05-28 12:00:18 +08:00
Yishuo Wang
f00625f9a4
refactor qwen2 (#11087) 2024-05-21 16:53:42 +08:00
Yishuo Wang
170e3d65e0
use new sdp and fp32 sdp (#11007) 2024-05-14 14:29:18 +08:00
Cengguang Zhang
cfed76b2ed
LLM: add long-context support for Qwen1.5-7B/Baichuan2-7B/Mistral-7B. (#10937)
* LLM: add split tensor support for baichuan2-7b and qwen1.5-7b.

* fix style.

* fix style.

* fix style.

* add support for mistral and fix condition threshold.

* fix  style.

* fix comments.
2024-05-10 16:40:15 +08:00
Yishuo Wang
d884c62dc4
remove new_layout parameter (#10906) 2024-04-29 10:31:50 +08:00
Xin Qiu
e764f9b1b1
Disable fast fused rope on UHD (#10780)
* use decoding fast path

* update

* update

* cleanup
2024-04-18 10:03:53 +08:00
Cengguang Zhang
3e2662c87e
LLM: fix get env KV_CACHE_ALLOC_BLOCK_LENGTH type. (#10771) 2024-04-16 09:32:30 +08:00
Keyan (Kyrie) Zhang
585c174e92
Read the value of KV_CACHE_ALLOC_BLOCK_LENGTH from the environment variables (#10707)
* Read the value of KV_CACHE_ALLOC_BLOCK_LENGTH from the environment variables.

* Fix style
2024-04-10 10:48:46 +08:00
Jiao Wang
69bdbf5806
Fix vllm print error message issue (#10664)
* update chatglm readme

* Add condition to invalidInputError

* update

* update

* style
2024-04-05 15:08:13 -07:00
Xin Qiu
1dd40b429c
enable fp4 fused mlp and qkv (#10531)
* enable fp4 fused mlp and qkv

* update qwen

* update qwen2
2024-03-26 08:34:00 +08:00
Wang, Jian4
9df70d95eb
Refactor bigdl.llm to ipex_llm (#24)
* Rename bigdl/llm to ipex_llm

* rm python/llm/src/bigdl

* from bigdl.llm to from ipex_llm
2024-03-22 15:41:21 +08:00
Renamed from python/llm/src/bigdl/llm/transformers/models/qwen2.py (Browse further)