ipex-llm

Author	SHA1	Message	Date
Jun Wang	0b953e61ef	[REFINE] graphmode code (#12540 )	2024-12-16 09:17:01 +08:00
binbin Deng	caf15cc5ef	[NPU] Add `IPEX_LLM_NPU_MTL` to enable support on mtl (#12543 )	2024-12-13 17:01:13 +08:00
Yishuo Wang	c090d167dc	remove old rope usage (#12544 )	2024-12-13 16:54:58 +08:00
SONG Ge	5402fc65c8	[Ollama] Update ipex-llm ollama readme to v0.4.6 (#12542 ) * Update ipex-llm ollama readme to v0.4.6	2024-12-13 16:26:12 +08:00
binbin Deng	d20a968ce2	[NPU] Fix generate example (#12541 )	2024-12-13 14:07:24 +08:00
Yishuo Wang	15219944b8	optimize glm edge again (#12539 )	2024-12-13 13:52:39 +08:00
binbin Deng	6596c18489	[NPU] Modify IPEX_LLM_NPU_DISABLE_COMPILE_OPT setting for long input (#12537 )	2024-12-13 13:49:56 +08:00
Ruonan Wang	7cc01fdc86	[NPU] further fix of `new_value_states` (#12538 )	2024-12-13 13:42:00 +08:00
Heyang Sun	fa261b8af1	torch 2.3 inference docker (#12517 ) * torch 2.3 inference docker * Update README.md * add convert code * rename image * remove 2.1 and add graph example * Update README.md	2024-12-13 10:47:04 +08:00
Yuwen Hu	b747f3f6b8	Small fix to GPU installation guide (#12536 )	2024-12-13 10:02:47 +08:00
binbin Deng	f36c23664f	[NPU] Fix abnormal output with latest driver (#12530 )	2024-12-12 17:56:30 +08:00
Yishuo Wang	ffce86d69f	add basic glm-edge-v support (#12533 )	2024-12-12 17:25:48 +08:00
Yishuo Wang	3e0823d2ae	add basic glm-edge support (#12531 )	2024-12-12 16:02:22 +08:00
Yuwen Hu	dbaf4abcb3	[NPU] Update C++ example with repetition_penalty & update Python code accordingly (#12528 ) * Update c++ npu examples with repetition penalty * Fit python with updated C++ API * Style fix * Small fix * Small fix	2024-12-12 13:42:55 +08:00
Shaojun Liu	2cce89691a	Enable `use_batch_forward` Optimization on Battlemage GPU (#12516 ) * Update get_xpu_device_type() to support bmg * enable use_batch_forward for bmg * Update low_bit_linear.py * Update utils.py * use batch kernel for fp8e5	2024-12-12 12:44:36 +08:00
binbin Deng	6fc27da9c1	[NPU] Update glm-edge support in docs (#12529 )	2024-12-12 11:14:09 +08:00
binbin Deng	509bdb4661	[NPU] Fix minicpm-2B error (#12527 )	2024-12-11 16:49:32 +08:00
Xu, Shuo	fd9cf767ed	All-in-one Benchmark run.py: Ignore error if import BenchmarkWrapper failed. (#12526 )	2024-12-11 16:20:55 +08:00
Ruonan Wang	41ef4974ab	[NPU] fix `transpose_value = False` for NPU `optimize_model=True` (#12525 )	2024-12-11 15:51:39 +08:00
Ruonan Wang	588bfa24dc	support hqq (#12518 ) * support * fix	2024-12-11 15:43:02 +08:00
Yuwen Hu	68f2873bd3	[NPU] Support repetition penalty for simple generate, Python (cpp backend) (#12522 ) * Initial support of repetition penalty on NPU (cpp backend) for simple generate * Bug fix for generation config and others * Remove unnecessary print and style fix * Remove unnecessary print * Fix based on comments	2024-12-11 14:55:25 +08:00
Yishuo Wang	77404d2a63	support new model (#12523 )	2024-12-11 13:41:15 +08:00
Wang, Jian4	922958c018	vllm oneccl upgrade to b9 (#12520 )	2024-12-10 15:02:56 +08:00
binbin Deng	ea55235cbd	[NPU] Support glm-edge models (#12511 )	2024-12-09 14:06:27 +08:00
binbin Deng	12c78978dd	[NPU C++] Update example with conversation mode support (#12510 )	2024-12-06 12:46:37 +08:00
Yuwen Hu	0918d3baca	[NPU] Fix hf generate with save/load generation config for Python (cpp backend) (#12509 ) * Fix hf generate with save/load generation config * Small fix * Fix based on comments	2024-12-05 19:19:58 +08:00
Ruonan Wang	49ab8974fa	[NPU] initial support of `asym_int4_rtn` (#12484 ) * initiail support of q4_1 * fix * fix * update * update min to Z1 * update * fix * update * fix style * fix * support qwen2 optimize_model=True mp version * temp save * fix * fix style * replace min with zero * support split linear for q4_1 * fix lm_head with mixed_precision=True * fix style * revert test code * add down proj back for q4_0 * remove print	2024-12-05 17:40:36 +08:00
Yuwen Hu	60bafab855	Small fixes to main readme (#12508 )	2024-12-05 16:08:43 +08:00
Jason Dai	0a3eda06d0	Update README.md (#12507 )	2024-12-05 15:46:53 +08:00
Jinhe	5e1416c9aa	fix readme for npu cpp examples and llama.cpp (#12505 ) * fix cpp readme * fix cpp readme * fix cpp readme	2024-12-05 12:32:42 +08:00
Yuwen Hu	727f29968c	Add NPU demo gif to main readme (#12503 ) * Add NPU demo gif to main readme * Small fix * Update based on comments * Test on style fix	2024-12-05 12:24:27 +08:00
binbin Deng	f56a111aa2	[NPU] Fix load-low-bit benchmark script (#12502 )	2024-12-05 10:01:32 +08:00
Yuwen Hu	84f1c4ad57	Small fix for NPU Python cpp simple generate regarding eos tokens (#12501 )	2024-12-04 18:54:06 +08:00
Kai Huang	d8b14a6305	Update save/load comments (#12500 )	2024-12-04 18:51:38 +08:00
Kai Huang	b89ea1b0cf	Support save/load model for hf generate (#12499 ) * change dummy model * style * meet review	2024-12-04 18:26:39 +08:00
Kai Huang	7d27f134dd	Fix hf generate for llama3.2 (#12497 ) * fix kv condition] * meet review	2024-12-04 17:54:40 +08:00
Chu,Youcheng	ffa9a9e1b3	Update streaming in npu examples (#12495 ) * feat: add streaming * Update readme accordingly --------- Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>	2024-12-04 17:51:10 +08:00
Yishuo Wang	a9e3f7f14c	optimize minicpm (#12496 )	2024-12-04 17:14:16 +08:00
joan726	ae9c2154f4	Added cross-links (#12494 ) * Update install_linux_gpu.zh-CN.md Add the link for guide of windows installation. * Update install_windows_gpu.zh-CN.md Add the link for guide of linux installation. * Update install_windows_gpu.md Add the link for guide of Linux installation. * Update install_linux_gpu.md Add the link for guide of Windows installation. * Update install_linux_gpu.md Modify based on comments. * Update install_windows_gpu.md Modify based on comments	2024-12-04 16:53:13 +08:00
Yishuo Wang	e0bf0054e1	small fix (#12493 )	2024-12-04 16:37:39 +08:00
Kai Huang	7ff4533b39	Support hf generate (#12477 ) * generate * style * update * remove timing * style * style * combine generate api * simple in kwargs	2024-12-04 16:31:09 +08:00
Yuwen Hu	ef4028ac2d	[NPU] Support split `lm_head` for Qwen2 with CPP (#12491 ) * Use split for Qwen2 lm_head instead of slice in optimize_pre * Support split lm_head in Qwen2 python cpp backend * Fit with Python acc lib pipeline * Removed default mixed_precision=True in all-in-one and related examples * Small fix * Style fix * Fix based on comments * Fix based on comments * Stype fix	2024-12-04 14:41:08 +08:00
Yishuo Wang	5629fdd518	optimize qwen2_vl multiple image input or video input (#12487 )	2024-12-04 09:24:38 +08:00
binbin Deng	c59284418c	Hotfix of BCE-Emdedding model (#12490 )	2024-12-03 18:16:04 +08:00
Jason Dai	80f15e41f5	Update README.md (#12489 )	2024-12-03 18:02:28 +08:00
Yuwen Hu	4ac66db034	[NPU] Support streaming in Python (cpp backend) (#12488 ) * Support streaming in NPU Python (cpp backend) * Small fix	2024-12-03 17:17:26 +08:00
Jin, Qiao	7082844f3f	Fix NPU LLM example save/load tokenizer (#12485 )	2024-12-03 16:30:55 +08:00
Jin, Qiao	5fe766788e	Fix MiniCPM-V-2_6 running on NPU (#12486 )	2024-12-03 16:16:29 +08:00
Ruonan Wang	598603bea6	small fix of imatrix (#12480 )	2024-12-03 10:46:36 +08:00
binbin Deng	ab01753b1c	[NPU] update save-load API usage (#12473 )	2024-12-03 09:46:15 +08:00

3774 commits