Commit graph

1339 commits

Author SHA1 Message Date
Qiyuan Gong
492ed3fd41
Add verified models to GPU finetune README (#11088)
* Add verified models to GPU finetune README
2024-05-21 15:49:15 +08:00
Qiyuan Gong
1210491748
ChatGLM3, Baichuan2 and Qwen1.5 QLoRA example (#11078)
* Add chatglm3, qwen15-7b and baichuan-7b QLoRA alpaca example
* Remove unnecessary tokenization setting.
2024-05-21 15:29:43 +08:00
ZehuaCao
842d6dfc2d
Further Modify CPU example (#11081)
* modify CPU example
* update
2024-05-21 13:55:47 +08:00
Yishuo Wang
d830a63bb7
refactor qwen (#11074) 2024-05-20 18:08:37 +08:00
Wang, Jian4
74950a152a
Fix tgi_api_server error file name (#11075) 2024-05-20 16:48:40 +08:00
Yishuo Wang
4e97047d70
fix baichuan2 13b fp16 (#11071) 2024-05-20 11:21:20 +08:00
binbin Deng
7170dd9192
Update guide for running qwen with AutoTP (#11065) 2024-05-20 10:53:17 +08:00
Wang, Jian4
a2e1578fd9
Merge tgi_api_server to main (#11036)
* init
* fix style
* speculative cannot use benchmark
* add tgi server readme
2024-05-20 09:15:03 +08:00
Yishuo Wang
31ce3e0c13
refactor baichuan2-13b (#11064) 2024-05-17 16:25:30 +08:00
ZehuaCao
56cb992497
LLM: Modify CPU Installation Command for most examples (#11049)
* init
* refine
* refine
* refine
* modify hf-agent example
* modify all CPU model example
* remove readthedoc modify
* replace powershell with cmd
* fix repo
* fix repo
* update
* remove comment on windows code block
* update
* update
* update
* update
---------
Co-authored-by: xiangyuT <xiangyu.tian@intel.com>
2024-05-17 15:52:20 +08:00
Ruonan Wang
f1156e6b20
support gguf_q4k_m / gguf_q4k_s (#10887)
* initial commit
* UPDATE
* fix style
* fix style
* add gguf_q4k_s
* update comment
* fix
2024-05-17 14:30:09 +08:00
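
For context, a minimal sketch of how the new formats would be requested, assuming `gguf_q4k_m` / `gguf_q4k_s` plug into the existing `load_in_low_bit` argument like the other low-bit formats (the model id is illustrative):

```python
from ipex_llm.transformers import AutoModelForCausalLM

# Assumption: the new GGUF-style Q4_K formats are selected through the
# existing low-bit loading path. Mirroring llama.cpp's variants, the "_s"
# format is smaller while "_m" keeps a bit more accuracy.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",  # illustrative model id
    load_in_low_bit="gguf_q4k_m",     # or "gguf_q4k_s"
    trust_remote_code=True,
)
```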
Yishuo Wang
981d668be6
refactor baichuan2-7b (#11062) 2024-05-17 13:01:34 +08:00
Xiangyu Tian
d963e95363
LLM: Modify CPU Installation Command for documentation (#11042)
* init
* refine
* refine
* refine
* refine comments
2024-05-17 10:14:00 +08:00
Ruonan Wang
3a72e5df8c
disable mlp fusion of fp6 on mtl (#11059) 2024-05-17 10:10:16 +08:00
SONG Ge
192ae35012
Add support for llama2 quantize_kv with transformers 4.38.0 (#11054)
* add support for llama2 quantize_kv with transformers 4.38.0
* fix code style
* fix code style
2024-05-16 22:23:39 +08:00
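
A hedged sketch of enabling the quantized kv-cache path for llama2 under transformers 4.38.0; the environment-variable name below is a guess from the project's naming convention, not confirmed by this commit, and should be checked against the source:

```python
import os

# Assumption: kv-cache quantization is toggled via an environment variable
# read at model load time; the exact name is unverified.
os.environ["IPEX_LLM_QUANTIZE_KV_CACHE"] = "1"

from ipex_llm.transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",  # llama2, with transformers==4.38.0 installed
    load_in_4bit=True,
    trust_remote_code=True,
)
```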
SONG Ge
16b2a418be
hotfix native_sdp ut (#11046)
* hotfix native_sdp
* update
2024-05-16 17:15:37 +08:00
Xin Qiu
6be70283b7
fix chatglm run error (#11045)
* fix chatglm
* update
* fix style
2024-05-16 15:39:18 +08:00
Yishuo Wang
8cae897643
use new rope in phi3 (#11047) 2024-05-16 15:12:35 +08:00
Jin Qiao
9a96af4232
Remove oneAPI pip install command in related examples (#11030)
* Remove pip install command in windows installation guide
* fix chatglm3 installation guide
* Fix gemma cpu example
* Apply on other examples
* fix
2024-05-16 10:46:29 +08:00
Xiangyu Tian
612a365479
LLM: Install CPU version torch with extras [all] (#10868)
Modify setup.py to install CPU version torch with extras [all]
2024-05-16 10:39:55 +08:00
Yishuo Wang
59df750326
Use new sdp again (#11025) 2024-05-16 09:33:34 +08:00
SONG Ge
9942a4ba69
[WIP] Support llama2 with transformers==4.38.0 (#11024)
* support llama2 with transformers==4.38.0
* add support for quantize_qkv
* add original support for 4.38.0 now
* code style fix
2024-05-15 18:07:00 +08:00
Yina Chen
686f6038a8
Support fp6 save & load (#11034) 2024-05-15 17:52:02 +08:00
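
A minimal sketch of the save & load flow this commit enables, assuming fp6 slots into the existing low-bit API (`load_in_low_bit`, `save_low_bit`, `load_low_bit`) the same way other formats do; paths and model id are illustrative:

```python
from ipex_llm.transformers import AutoModelForCausalLM

# Quantize to fp6 at load time, then persist the low-bit weights so later
# runs can skip re-quantization.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",  # illustrative model id
    load_in_low_bit="fp6",
    trust_remote_code=True,
)
model.save_low_bit("./llama2-7b-fp6")

# Later: reload straight from the saved fp6 checkpoint.
model = AutoModelForCausalLM.load_low_bit("./llama2-7b-fp6", trust_remote_code=True)
```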
Ruonan Wang
ac384e0f45
add fp6 mlp fusion (#11032)
* add fp6 fusion
* add qkv fusion for fp6
* remove qkv first
2024-05-15 17:42:50 +08:00
Wang, Jian4
2084ebe4ee
Enable fastchat benchmark latency (#11017)
* enable fastchat benchmark
* add readme
* update readme
* update
2024-05-15 14:52:09 +08:00
hxsz1997
93d40ab127
Update lookahead strategy (#11021)
* update lookahead strategy
* remove lines
* fix python style check
2024-05-15 14:48:05 +08:00
Wang, Jian4
d9f71f1f53
Update benchmark util for example usage (#11027)
* mv benchmark_util.py to utils/
* remove
* update
2024-05-15 14:16:35 +08:00
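
A short sketch of how an example would use the relocated utility, assuming the moved `benchmark_util.py` stays on `sys.path` and keeps its `BenchmarkWrapper` class with the `do_print` flag used by the existing benchmark scripts (the import path is an assumption):

```python
from benchmark_util import BenchmarkWrapper  # utils/benchmark_util.py on sys.path
from ipex_llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",  # illustrative model id
    load_in_4bit=True,
    trust_remote_code=True,
)
# Wrap the model so generate() reports first- and next-token latency.
model = BenchmarkWrapper(model, do_print=True)

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
inputs = tokenizer("What is AI?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
```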
binbin Deng
4053a6ef94
Update environment variable setting in AutoTP with arc (#11018) 2024-05-15 10:23:58 +08:00
Yishuo Wang
fad1dbaf60
use sdp fp8 causal kernel (#11023) 2024-05-15 10:22:35 +08:00
Yishuo Wang
ee325e9cc9
fix phi3 (#11022) 2024-05-15 09:32:12 +08:00
Ziteng Zhang
7d3791c819
[LLM] Add llama3 alpaca qlora example (#11011)
* Add llama3 finetune example based on alpaca qlora example
2024-05-15 09:17:32 +08:00
Zhao Changmin
0a732bebe7
Add phi3 cached RotaryEmbedding (#11013)
* phi3cachedrotaryembed
* pep8
2024-05-15 08:16:43 +08:00
Yina Chen
893197434d
Add fp6 support on gpu (#11008)
* add fp6 support
* fix style
2024-05-14 16:31:44 +08:00
Zhao Changmin
b03c859278
Add phi3RMS (#10988)
* phi3RMS
2024-05-14 15:16:27 +08:00
Yishuo Wang
170e3d65e0
use new sdp and fp32 sdp (#11007) 2024-05-14 14:29:18 +08:00
Qiyuan Gong
c957ea3831
Add axolotl main support and axolotl Llama-3-8B QLoRA example (#10984)
* Support axolotl main (796a085).
* Add axolotl Llama-3-8B QLoRA example.
* Change `sequence_len` to 256 for alpaca, and revert `lora_r` value.
* Add example to quick_start.
2024-05-14 13:43:59 +08:00
Yuwen Hu
fb656fbf74
Add requirements for oneAPI pypi packages for windows Intel GPU users (#11009) 2024-05-14 13:40:54 +08:00
Shaojun Liu
7f8c5b410b
Quickstart: Run PyTorch Inference on Intel GPU using Docker (on Linux or WSL) (#10970)
* add entrypoint.sh
* add quickstart
* remove entrypoint
* update
* Install related library of benchmarking
* update
* print out results
* update docs
* minor update
* update
* update quickstart
* update
* update
* update
* update
* update
* update
* add chat & example section
* add more details
* minor update
* rename quickstart
* update
* minor update
* update
* update config.yaml
* update readme
* use --gpu
* add tips
* minor update
* update
2024-05-14 12:58:31 +08:00
Guancheng Fu
a465111cf4
Update README.md (#11003) 2024-05-13 16:44:48 +08:00
Guancheng Fu
74997a3ed1
Adding load_low_bit interface for ipex_llm_worker (#11000)
* initial implementation, need tests
* fix
* fix baichuan issue
* fix typo
2024-05-13 15:30:19 +08:00
Yishuo Wang
1b3c7a6928
remove phi3 empty cache (#10997) 2024-05-13 14:09:55 +08:00
ZehuaCao
99255fe36e
fix ppl (#10996) 2024-05-13 13:57:19 +08:00
Kai Huang
f8dd2e52ad
Fix Langchain upstream ut (#10985)
* Fix Langchain upstream ut
* Small fix
* Install bigdl-llm
* Update run-langchain-upstream-tests.sh
* Update run-langchain-upstream-tests.sh
* Update llm_unit_tests.yml
* Update run-langchain-upstream-tests.sh
* Update llm_unit_tests.yml
* Update run-langchain-upstream-tests.sh
* fix git checkout
* fix
---------
Co-authored-by: Zhangky11 <2321096202@qq.com>
Co-authored-by: Keyan (Kyrie) Zhang <79576162+Zhangky11@users.noreply.github.com>
2024-05-11 14:40:37 +08:00
Yuwen Hu
9f6358e4c2
Deprecate support for pytorch 2.0 on Linux for ipex-llm >= 2.1.0b20240511 (#10986)
* Remove xpu_2.0 option in setup.py
* Disable xpu_2.0 test in UT and nightly
* Update docs for deprecated pytorch 2.0
* Small doc update
2024-05-11 12:33:35 +08:00
Yishuo Wang
ad96f32ce0
optimize phi3 1st token performance (#10981) 2024-05-10 17:33:46 +08:00
Cengguang Zhang
cfed76b2ed
LLM: add long-context support for Qwen1.5-7B/Baichuan2-7B/Mistral-7B. (#10937)
* LLM: add split tensor support for baichuan2-7b and qwen1.5-7b.
* fix style.
* fix style.
* fix style.
* add support for mistral and fix condition threshold.
* fix style.
* fix comments.
2024-05-10 16:40:15 +08:00
binbin Deng
f9615f12d1
Add driver related packages version check in env script (#10977) 2024-05-10 15:02:58 +08:00
Kai Huang
a6342cc068
Empty cache after phi first attention to support 4k input (#10972)
* empty cache
* fix style
2024-05-09 19:50:04 +08:00
Yishuo Wang
e753125880
use fp16_sdp when head_dim=96 (#10976) 2024-05-09 17:02:59 +08:00
Yishuo Wang
697ca79eca
use quantize kv and sdp in phi3-mini (#10973) 2024-05-09 15:16:18 +08:00