ipex-llm

Author	SHA1	Message	Date
Xu, Shuo	b0338c5529	Add --modelscope option for glm-v4 MiniCPM-V-2_6 glm-edge and internvl2 (#12583 ) * Add --modelscope option for glm-v4 and MiniCPM-V-2_6 * glm-edge * minicpm-v-2_6:don't use model_hub=modelscope when use lowbit; internvl2 --------- Co-authored-by: ATMxsp01 <shou.xu@intel.com>	2024-12-20 13:54:17 +08:00
Yishuo Wang	f3b5fad3be	refactor qwen2 and llama3 (#12587 )	2024-12-20 13:25:25 +08:00
Shaojun Liu	51ff9ebd8a	Upgrade oneccl version to 0.0.6.3 (#12560 ) * Update Dockerfile * Update Dockerfile * Update start-vllm-service.sh	2024-12-20 09:29:16 +08:00
Xu, Shuo	47da3c999f	Add `--modelscope` in GPU examples for minicpm, minicpm3, baichuan2 (#12564 ) * Add --modelscope for more models * minicpm --------- Co-authored-by: ATMxsp01 <shou.xu@intel.com>	2024-12-19 17:25:46 +08:00
Yishuo Wang	3eeb02f1be	support Megrez-3B-Omni (#12582 )	2024-12-19 17:23:01 +08:00
binbin Deng	4e7e988f70	[NPU] Fix MTL and ARL support (#12580 )	2024-12-19 16:55:30 +08:00
Yishuo Wang	80f2fdc37b	optimize new minicpm model (#12579 )	2024-12-19 14:22:47 +08:00
Yishuo Wang	4540424271	optimize siglip attention again (#12578 )	2024-12-19 13:40:48 +08:00
Yishuo Wang	e0921f80c1	padding mask on torch side (#12577 )	2024-12-19 10:53:02 +08:00
Xu, Shuo	47e90a362f	Add `--modelscope` in GPU examples for glm4, codegeex2, qwen2 and qwen2.5 (#12561 ) * Add --modelscope for more models * imporve readme --------- Co-authored-by: ATMxsp01 <shou.xu@intel.com>	2024-12-19 10:00:39 +08:00
SONG Ge	28e81fda8e	Replace runner doc in ollama quickstart (#12575 )	2024-12-18 19:05:28 +08:00
SONG Ge	f7a2bd21cf	Update ollama and llama.cpp readme (#12574 )	2024-12-18 17:33:20 +08:00
Yishuo Wang	e2ae42929a	small fix (#12573 )	2024-12-18 15:48:22 +08:00
Yishuo Wang	a4eb561f36	optimize siglip attention on arc (#12569 )	2024-12-18 14:19:43 +08:00
Zijie Li	1a2ab12876	[NPU] support asym_int4 for minicpm (#12567 )	2024-12-18 10:55:35 +08:00
Jason Dai	6e801bc4e1	Update readme (#12565 )	2024-12-18 09:33:16 +08:00
Yuwen Hu	6278cafc25	Add `setuptools` as a basic dependency (#12563 ) * Add setuptools as a basic dependency * Remove unnecessary requirements of setuptools in example/unit/nightly tests	2024-12-17 16:56:41 +08:00
binbin Deng	694d14b2b4	[NPU doc] Add ARL runtime configuration (#12562 )	2024-12-17 16:08:42 +08:00
Shaojun Liu	429bf1ffeb	Change: Use cn mirror for PyTorch extension installation to resolve network issues (#12559 ) * Update Dockerfile * Update Dockerfile * Update Dockerfile	2024-12-17 14:22:50 +08:00
Zijie Li	fcb474820d	[NPU] support asym_int4 for llama (#12556 ) * add llama-imatrix * fix bugs in llama.py * style fix	2024-12-17 14:01:17 +08:00
Yuwen Hu	d127a8654c	Small typo fixes (#12558 )	2024-12-17 13:54:13 +08:00
Yishuo Wang	a608f26cc8	use new fused layer norm (#12553 )	2024-12-17 13:52:35 +08:00
binbin Deng	680ea7e4a8	[NPU doc] Update configuration for different platforms (#12554 )	2024-12-17 10:15:09 +08:00
Xu, Shuo	ccc18eefb5	Add Modelscope option for chatglm3 on GPU (#12545 ) * Add Modelscope option for GPU model chatglm3 * Update readme * Update readme * Update readme * Update readme * format update --------- Co-authored-by: ATMxsp01 <shou.xu@intel.com>	2024-12-16 20:00:37 +08:00
Yishuo Wang	5ae0006103	remove old rope usage (#12552 )	2024-12-16 15:59:36 +08:00
Chu,Youcheng	a86487c539	Add GLM-Edge GPU example (#12483 ) * feat: initial commit * generate.py and README updates * Update link for main readme * Update based on comments * Small fix --------- Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>	2024-12-16 14:39:19 +08:00
Jun Wang	0b953e61ef	[REFINE] graphmode code (#12540 )	2024-12-16 09:17:01 +08:00
binbin Deng	caf15cc5ef	[NPU] Add `IPEX_LLM_NPU_MTL` to enable support on mtl (#12543 )	2024-12-13 17:01:13 +08:00
Yishuo Wang	c090d167dc	remove old rope usage (#12544 )	2024-12-13 16:54:58 +08:00
SONG Ge	5402fc65c8	[Ollama] Update ipex-llm ollama readme to v0.4.6 (#12542 ) * Update ipex-llm ollama readme to v0.4.6	2024-12-13 16:26:12 +08:00
binbin Deng	d20a968ce2	[NPU] Fix generate example (#12541 )	2024-12-13 14:07:24 +08:00
Yishuo Wang	15219944b8	optimize glm edge again (#12539 )	2024-12-13 13:52:39 +08:00
binbin Deng	6596c18489	[NPU] Modify IPEX_LLM_NPU_DISABLE_COMPILE_OPT setting for long input (#12537 )	2024-12-13 13:49:56 +08:00
Ruonan Wang	7cc01fdc86	[NPU] further fix of `new_value_states` (#12538 )	2024-12-13 13:42:00 +08:00
Heyang Sun	fa261b8af1	torch 2.3 inference docker (#12517 ) * torch 2.3 inference docker * Update README.md * add convert code * rename image * remove 2.1 and add graph example * Update README.md	2024-12-13 10:47:04 +08:00
Yuwen Hu	b747f3f6b8	Small fix to GPU installation guide (#12536 )	2024-12-13 10:02:47 +08:00
binbin Deng	f36c23664f	[NPU] Fix abnormal output with latest driver (#12530 )	2024-12-12 17:56:30 +08:00
Yishuo Wang	ffce86d69f	add basic glm-edge-v support (#12533 )	2024-12-12 17:25:48 +08:00
Yishuo Wang	3e0823d2ae	add basic glm-edge support (#12531 )	2024-12-12 16:02:22 +08:00
Yuwen Hu	dbaf4abcb3	[NPU] Update C++ example with repetition_penalty & update Python code accordingly (#12528 ) * Update c++ npu examples with repetition penalty * Fit python with updated C++ API * Style fix * Small fix * Small fix	2024-12-12 13:42:55 +08:00
Shaojun Liu	2cce89691a	Enable `use_batch_forward` Optimization on Battlemage GPU (#12516 ) * Update get_xpu_device_type() to support bmg * enable use_batch_forward for bmg * Update low_bit_linear.py * Update utils.py * use batch kernel for fp8e5	2024-12-12 12:44:36 +08:00
binbin Deng	6fc27da9c1	[NPU] Update glm-edge support in docs (#12529 )	2024-12-12 11:14:09 +08:00
binbin Deng	509bdb4661	[NPU] Fix minicpm-2B error (#12527 )	2024-12-11 16:49:32 +08:00
Xu, Shuo	fd9cf767ed	All-in-one Benchmark run.py: Ignore error if import BenchmarkWrapper failed. (#12526 )	2024-12-11 16:20:55 +08:00
Ruonan Wang	41ef4974ab	[NPU] fix `transpose_value = False` for NPU `optimize_model=True` (#12525 )	2024-12-11 15:51:39 +08:00
Ruonan Wang	588bfa24dc	support hqq (#12518 ) * support * fix	2024-12-11 15:43:02 +08:00
Yuwen Hu	68f2873bd3	[NPU] Support repetition penalty for simple generate, Python (cpp backend) (#12522 ) * Initial support of repetition penalty on NPU (cpp backend) for simple generate * Bug fix for generation config and others * Remove unnecessary print and style fix * Remove unnecessary print * Fix based on comments	2024-12-11 14:55:25 +08:00
Yishuo Wang	77404d2a63	support new model (#12523 )	2024-12-11 13:41:15 +08:00
Wang, Jian4	922958c018	vllm oneccl upgrade to b9 (#12520 )	2024-12-10 15:02:56 +08:00
binbin Deng	ea55235cbd	[NPU] Support glm-edge models (#12511 )	2024-12-09 14:06:27 +08:00

3800 commits