ipex-llm

Author	SHA1	Message	Date
Yishuo Wang	1a8bab172e	add minicpm 1B/2B npu support (#11507 )	2024-07-04 16:31:04 +08:00
Yishuo Wang	bb0a84044b	add qwen2 npu support (#11504 )	2024-07-04 11:01:25 +08:00
Shaojun Liu	932ef78131	Update Workflow Inputs, Runner, and PR Validation Process (#11501 ) * update check-artifact runner label to Shire * update github.event.inputs to inputs * update PR template	2024-07-03 16:49:54 +08:00
Xin Qiu	f84ca99b9f	optimize gemma2 rmsnorm (#11500 )	2024-07-03 15:21:03 +08:00
Wang, Jian4	61c36ba085	Add pp_serving verified models (#11498 ) * add verified models * update * verify large model * update commend	2024-07-03 14:57:09 +08:00
binbin Deng	9274282ef7	Support pipeline parallel for glm-4-9b-chat (#11463 )	2024-07-03 14:25:28 +08:00
Shaojun Liu	e7ab93b55c	Update pull_request_template.md (#11484 ) * Update pull_request_template.md * refine	2024-07-03 11:13:16 +08:00
Yishuo Wang	d97c2664ce	use new fuse rope in stablelm family (#11497 )	2024-07-03 11:08:26 +08:00
Jun Wang	18c973dc3e	Wang jun/ipex llm workflow (#11499 ) * [update] merge manually build for testing function to manually build * [FIX] change public type to string * [FIX] change public type to string * [FIX] remove github.event prefix for inputs	2024-07-03 10:13:42 +08:00
Yuwen Hu	e53bd4401c	Small typo fixes in binary build workflow (#11494 )	2024-07-02 19:11:43 +08:00
Yuwen Hu	4e32c92979	Further fix for triggering perf test from commit (#11493 ) * Further fix for triggering perf test from commit * Small fix	2024-07-02 18:56:53 +08:00
Xu, Shuo	52519e07df	remove models we no longer need in benchmark. (#11492 ) Co-authored-by: ATMxsp01 <shou.xu@intel.com>	2024-07-02 17:20:48 +08:00
Zhao Changmin	6a0134a9b2	support q4_0_rtn (#11477 ) * q4_0_rtn	2024-07-02 16:57:02 +08:00
Jun Wang	6352c718f3	[update] merge manually build for testing function to manually build (#11491 )	2024-07-02 16:28:15 +08:00
Yishuo Wang	5e967205ac	remove the code converts input to fp16 before calling batch forward kernel (#11489 )	2024-07-02 16:23:53 +08:00
Yuwen Hu	1638573f56	Update llama cpp quickstart regarding windows prerequisites to avoid misleading (#11490 )	2024-07-02 16:15:47 +08:00
Yuwen Hu	986b10e397	Further fix for performance tests triggered by pr (#11488 )	2024-07-02 15:29:42 +08:00
Yuwen Hu	bb6953c19e	Support pr validate perf test (#11486 ) * Support triggering performance tests through commits * Small fix * Small fix * Small fixes	2024-07-02 15:20:42 +08:00
Wang, Jian4	4390e7dc49	Fix codegeex2 transformers version (#11487 )	2024-07-02 15:09:28 +08:00
Guancheng Fu	4fbb0d33ae	Pin compute runtime version for xpu images (#11479 ) * pin compute runtime version * fix done	2024-07-01 21:41:02 +08:00
Shaojun Liu	a1164e45b6	Enable Release Pypi workflow to be called in another repo (#11483 )	2024-07-01 19:48:21 +08:00
Yuwen Hu	fb4774b076	Update pull request template for manually-triggered Unit tests (#11482 )	2024-07-01 19:06:29 +08:00
Yuwen Hu	ca24794dd0	Fixes for performance test triggering (#11481 )	2024-07-01 18:39:54 +08:00
Yuwen Hu	6bdc562f4c	Enable triggering nightly tests/performance tests from another repo (#11480 ) * Enable triggering from another workflow for nightly tests and example tests * Enable triggering from another workflow for nightly performance tests	2024-07-01 17:45:42 +08:00
Yishuo Wang	ec3a912ab6	optimize npu llama long context performance (#11478 )	2024-07-01 16:49:23 +08:00
Heyang Sun	913e750b01	fix non-string deepspeed config path bug (#11476 ) * fix non-string deepspeed config path bug * Update lora_finetune_chatglm.py	2024-07-01 15:53:50 +08:00
binbin Deng	48ad482d3d	Fix import error caused by pydantic on cpu (#11474 )	2024-07-01 15:49:49 +08:00
Yuwen Hu	dbba51f455	Enable LLM UT workflow to be called in another repo (#11475 ) * Enable LLM UT workflow to be called in another repo * Small fixes * Small fix	2024-07-01 15:26:17 +08:00
Yishuo Wang	39bcb33a67	add sdp support for stablelm 3b (#11473 )	2024-07-01 14:56:15 +08:00
Zhao Changmin	cf8eb7b128	Init NPU quantize method and support q8_0_rtn (#11452 ) * q8_0_rtn * fix float point	2024-07-01 13:45:07 +08:00
Yishuo Wang	319a3b36b2	fix npu llama2 (#11471 )	2024-07-01 10:14:11 +08:00
Heyang Sun	07362ffffc	ChatGLM3-6B LoRA Fine-tuning Demo (#11450 ) * ChatGLM3-6B LoRA Fine-tuning Demo * refine * refine * add 2-card deepspeed * refine format * add mpi4py and deepspeed install	2024-07-01 09:18:39 +08:00
Wang, Jian4	e000ac90c4	Add pp_serving example to serving image (#11433 ) * init pp * update * update * no clone ipex-llm again	2024-06-28 16:45:25 +08:00
Xiangyu Tian	fd933c92d8	Fix: Correct num_requests in benchmark for Pipeline Parallel Serving (#11462 )	2024-06-28 16:10:51 +08:00
Wang, Jian4	b7bc1023fb	Add vllm_online_benchmark.py (#11458 ) * init * update and add * update	2024-06-28 14:59:06 +08:00
SichengStevenLi	86b81c09d9	Table of Contents in Quickstart Files (#11437 ) * fixed a minor grammar mistake * added table of contents * changed table of contents indexing * added table of contents, changed grammar * added table of contents, modified chapter numbering * fixed troubleshooting section redirection path * added table of contents, modified section numbering * added table of contents, changed title size, modified numbering * added table of contents, changed section title size and capitalization * changed table of contents syntax * changed table of contents capitalization issue * changed table of contents location * changed table of contents * changed section capitalization * removed comments	2024-06-28 10:41:00 +08:00
SONG Ge	a414e3ff8a	add pipeline parallel support with load_low_bit (#11414 )	2024-06-28 10:17:56 +08:00
Cengguang Zhang	d0b801d7bc	LLM: change write mode in all-in-one benchmark. (#11444 ) * LLM: change write mode in all-in-one benchmark. * update output style.	2024-06-27 19:36:38 +08:00
binbin Deng	987017ef47	Update pipeline parallel serving for more model support (#11428 )	2024-06-27 18:21:01 +08:00
Yishuo Wang	029ff15d28	optimize npu llama2 first token performance (#11451 )	2024-06-27 17:37:33 +08:00
Qiyuan Gong	4e4ecd5095	Control sys.modules ipex duplicate check with BIGDL_CHECK_DUPLICATE_IMPORT (#11453 ) * Control sys.modules ipex duplicate check with BIGDL_CHECK_DUPLICATE_IMPORT.	2024-06-27 17:21:45 +08:00
Yishuo Wang	c6e5ad668d	fix internlm xcomposer meta-instruction typo (#11448 )	2024-06-27 15:29:43 +08:00
Yishuo Wang	f89ca23748	optimize npu llama2 perf again (#11445 )	2024-06-27 15:13:42 +08:00
Shaojun Liu	13f59ae6b4	Fix llm binary build linux-build-avxvnni failure (#11447 ) * skip gpg check failure * skip gpg check	2024-06-27 14:12:14 +08:00
Yishuo Wang	cf0f5c4322	change npu document (#11446 )	2024-06-27 13:59:59 +08:00
binbin Deng	508c364a79	Add precision option in PP inference examples (#11440 )	2024-06-27 09:24:27 +08:00
Jason Dai	e9e8f9b4d4	Update Readme (#11441 )	2024-06-26 19:48:07 +08:00
Jason Dai	2939f1ac60	Update README.md (#11439 )	2024-06-26 19:25:58 +08:00
Yishuo Wang	2a0f8087e3	optimize qwen2 gpu memory usage again (#11435 )	2024-06-26 16:52:29 +08:00
Shaojun Liu	ab9f7f3ac5	FIX: Qwen1.5-GPTQ-Int4 inference error (#11432 ) * merge_qkv if quant_method is 'gptq' * fix python style checks * refactor * update GPU example	2024-06-26 15:36:22 +08:00


3104 commits