ipex-llm

Author	SHA1	Message	Date
Yuwen Hu	d11f257ee7	Add GPU example for MiniCPM-o-2_6 (#12735 ) * Add init example for omni mode * Small fix * Small fix * Add chat example * Remove lagecy link * Further update link * Add readme * Small fix * Update main readme link * Update based on comments * Small fix * Small fix * Small fix	2025-01-23 16:10:19 +08:00
Yuwen Hu	dcca522618	Remove sdpa available patch (#12734 )	2025-01-22 17:22:28 +08:00
Xiangyu Tian	c9b6c94a59	vLLM: Update vLLM-cpu to v0.6.6-post1 (#12728 ) Update vLLM-cpu to v0.6.6-post1	2025-01-22 15:03:01 +08:00
Ruonan Wang	78cca0a68c	[NPU] update llm-npu-cli example (#12729 ) * update cli example * add license * rename * update readme sample output	2025-01-22 09:59:27 +08:00
Yishuo Wang	6789e5d92f	small fix (#12727 )	2025-01-21 17:27:18 +08:00
Yishuo Wang	085974e307	fix nf4 to cpu (#12722 )	2025-01-21 09:23:22 +08:00
Yuwen Hu	9aa4be8ced	Update runtime configuration on MTL (#12720 )	2025-01-20 11:06:37 +08:00
Yishuo Wang	bda87c21eb	add support and optimization for minicpmo audio part (#12716 )	2025-01-16 16:39:00 +08:00
Yuwen Hu	534e0e6774	Update dependency for PyTorch 2.6 RC support for woq int4 (#12714 )	2025-01-16 15:51:57 +08:00
Zhao Changmin	54d6328b3c	woq int4 fwd (#12711 )	2025-01-16 15:48:05 +08:00
Yishuo Wang	b62734748f	add support and optimization for minicpmo vision part (#12713 )	2025-01-16 14:51:00 +08:00
Yuwen Hu	c52bdff76b	Update Deepseek coder GPU example (#12712 ) * Update Deepseek coder GPU example * Fix based on comment	2025-01-16 14:05:31 +08:00
Yuwen Hu	9d65dcd7ef	Fix deepseek coder with linear rope type support on GPU (#12709 ) * Fix deepseek coder with linear rope type * Style fix * Move to optimize_pre * Small fix * Small fix * Small fix to not affect other cases * Style fixes * Update function name * Small fix * Small fix * Small fix * Fix for low transformers version first * Style fix * Small fix	2025-01-15 21:12:34 +08:00
Cengguang Zhang	9930351112	LLM: add new qtype woq_int4 to support gemm int4 temporary. (#12706 ) This PR add temporary qtype woq_int4 to avoid affecting other qtype and models. Co-authored-by: leonardozcm <leonardo1997zcm@gmail.com>	2025-01-15 14:41:33 +08:00
Xu, Shuo	350fae285d	Add Qwen2-VL HF GPU example with ModelScope Support (#12606 ) * Add qwen2-vl example * complete generate.py & readme * improve lint style * update 1-6 * update main readme * Format and other small fixes --------- Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>	2025-01-13 15:42:04 +08:00
Yuwen Hu	a1da7908b9	Fix name device is not found bug (#12703 )	2025-01-13 10:11:02 +08:00
Yishuo Wang	db9db51e2c	fix lnl perf (#12700 )	2025-01-10 18:00:58 +08:00
binbin Deng	da8bcb7db1	[NPU ] fix load logic of glm-edge models (#12698 )	2025-01-10 16:08:37 +08:00
Yishuo Wang	f8dc408888	fix user issue (#12692 )	2025-01-10 10:18:47 +08:00
Yishuo Wang	68857494a5	refactor to simplify following upgrade 2 (#12685 )	2025-01-10 09:29:03 +08:00
Yishuo Wang	7234c9b27b	update quantize kv cache condition (#12681 )	2025-01-09 15:23:04 +08:00
Yuwen Hu	5d8081afbc	Remove dummy model from performance tests (#12682 )	2025-01-09 14:50:17 +08:00
Yishuo Wang	1ec40cd09e	refactor to simplify following upgrade (#12680 )	2025-01-09 13:34:30 +08:00
Yishuo Wang	5c24276fc4	fix custom kernel registration (#12674 )	2025-01-08 17:39:17 +08:00
Yishuo Wang	a22a8c21bb	small fix and remove unused code about ipex (#12671 )	2025-01-08 17:39:04 +08:00
Yishuo Wang	c11f5f0fcd	also convert SdpaAttention in optimize_model (#12673 )	2025-01-08 16:48:03 +08:00
Yishuo Wang	7dd156d292	small fix and add comment (#12670 )	2025-01-08 10:56:50 +08:00
Yishuo Wang	ccf618ff4a	Remove all ipex usage (#12666 )	2025-01-08 10:31:18 +08:00
Yuwen Hu	5db6f9dcde	Add option with PyTorch 2.6 RC version for testing purposes (#12668 ) * Add option with PyTorch 2.6 RC version for testing purposes * Small update	2025-01-07 18:28:55 +08:00
Yishuo Wang	f9ee7898c8	fix onednn dependency bug (#12665 )	2025-01-07 16:26:56 +08:00
Yishuo Wang	29ad5c449e	refactor codegeex to remove ipex kernel usage (#12664 )	2025-01-07 16:17:40 +08:00
Yuwen Hu	525b0ee991	[NPU] Tiny fixes on examples (#12661 )	2025-01-07 14:30:38 +08:00
Yuwen Hu	ebdf19fa7e	[NPU] Further fix saving of generation config (#12657 ) * Further fix saving of generation config * Fix based on comments * Small fix	2025-01-07 13:53:54 +08:00
Yuwen Hu	381d448ee2	[NPU] Example & Quickstart updates (#12650 ) * Remove model with optimize_model=False in NPU verified models tables, and remove related example * Remove experimental in run optimized model section title * Unify model table order & example cmd * Move embedding example to separate folder & update quickstart example link * Add Quickstart reference in main NPU readme * Small fix * Small fix * Move save/load examples under NPU/HF-Transformers-AutoModels * Add low-bit and polish arguments for LLM Python examples * Small fix * Add low-bit and polish arguments for Multi-Model examples * Polish argument for Embedding models * Polish argument for LLM CPP examples * Add low-bit and polish argument for Save-Load examples * Add accuracy tuning tips for examples * Update NPU qucikstart accuracy tuning with low-bit optimizations * Add save/load section to qucikstart * Update CPP example sample output to EN * Add installation regarding cmake for CPP examples * Small fix * Small fix * Small fix * Small fix * Small fix * Small fix * Unify max prompt length to 512 * Change recommended low-bit for Qwen2.5-3B-Instruct to asym_int4 * Update based on comments * Small fix	2025-01-07 13:52:41 +08:00
Yishuo Wang	ddc0ef3993	refactor device check and remove cohere/mixtral support (#12659 )	2025-01-07 11:15:51 +08:00
Yishuo Wang	ea65e4fecc	remove falcon support and related UT (#12656 )	2025-01-07 09:26:00 +08:00
Yina Chen	fae73eee79	[NPU] Support save npu quantized model without npu dependency (#12647 ) * support save awq * load quantized model & save npu compiled model * fix style * update * fix dll load issue * update error message * fix style	2025-01-06 18:06:22 +08:00
Yishuo Wang	502461d836	remove unnecessary ipex kernel usage (#12649 )	2025-01-03 16:45:24 +08:00
Yishuo Wang	9f8b134889	add ipex-llm custom kernel registration (#12648 )	2025-01-03 16:45:04 +08:00
binbin Deng	0b377100c5	Add guide for save-load usage (#12498 )	2025-01-03 16:30:15 +08:00
Wang, Jian4	6711a48a36	Enable internvl2-8b on vllm(#12645 )	2025-01-03 14:49:36 +08:00
Zijie Li	8fd2dcba86	Add benchmark_util for `transformers >= 4.47.0` (#12644 )	2025-01-03 10:48:29 +08:00
Yina Chen	8e5328e9b4	add disable opts for awq (#12641 )	2025-01-02 15:45:22 +08:00
Xu, Shuo	62318964fa	Update llama example information (#12640 ) Co-authored-by: ATMxsp01 <shou.xu@intel.com>	2025-01-02 13:48:39 +08:00
Yishuo Wang	81211fd010	remove unused code (#12635 )	2025-01-02 13:31:09 +08:00
binbin Deng	534566e290	[NPU] Support minicpm-v with python cpp backend (#12637 )	2025-01-02 11:13:15 +08:00
Yishuo Wang	f289f68d57	small fix (#12634 )	2024-12-30 17:14:25 +08:00
Yishuo Wang	2d08155513	remove bmm, which is only required in ipex 2.0 (#12630 )	2024-12-27 17:28:57 +08:00
binbin Deng	f17ccfa61a	[NPU] Fix save-load usage of minicpm models (#12628 )	2024-12-27 15:56:46 +08:00
Yishuo Wang	c72a5db757	remove unused code again (#12624 )	2024-12-27 14:17:11 +08:00
binbin Deng	46eeab4479	[NPU] Fix regression caused by layer_norm change (#12627 )	2024-12-27 14:08:49 +08:00
Ruonan Wang	90f6709486	remove pipeline examples (#12626 )	2024-12-27 13:42:28 +08:00
Zijie Li	5f04ed7254	[NPU] Update prompt format for baichuan2-pipeline (#12625 )	2024-12-27 11:30:54 +08:00
Yishuo Wang	34dbdb8ee3	small fix (#12623 )	2024-12-27 10:19:27 +08:00
Xu, Shuo	55ce091242	Add GLM4-Edge-V GPU example (#12596 ) * Add GLM4-Edge-V examples * polish readme * revert wrong changes * polish readme * polish readme * little polish in reference info and indent * Small fix and sample output updates * Update main readme --------- Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>	2024-12-27 09:40:29 +08:00
binbin Deng	796ee571a5	[NPU doc] Update verified platforms (#12621 )	2024-12-26 17:39:13 +08:00
Ruonan Wang	bbdbbb0d88	[NPU] Compatible with other third-party models like auto-round (#12620 ) * support third party model * simplify code * fix sty;e * fix sym int4 GW * code refactor * fix	2024-12-26 17:25:18 +08:00
Yishuo Wang	a9abde0b5d	support passing attn_scale to sdpa (#12619 )	2024-12-26 16:58:09 +08:00
Shaojun Liu	40a7d2b4f0	Consolidated C-Eval Benchmark Guide for Single-GPU and Multi-GPU Environments (#12618 ) * run c-eval on multi-GPUs * Update README.md	2024-12-26 15:23:32 +08:00
Zijie Li	ccc4055058	[NPU] Update prompt format for baichuan2 (#12615 ) * Update baichuan2.py * style fix	2024-12-26 11:41:37 +08:00
Yishuo Wang	1604b4ead8	small fix (#12616 )	2024-12-26 11:35:12 +08:00
Ruonan Wang	d841e1dc0d	[NPU] update convert script based on latest usage (#12617 )	2024-12-26 11:23:04 +08:00
Xu, Shuo	ef585d3360	Polish Readme for ModelScope-related examples (#12603 )	2024-12-26 10:52:47 +08:00
Yishuo Wang	a596f1ae5f	remove bigdl-llm test to fix langchain UT (#12613 )	2024-12-26 10:17:25 +08:00
Ruonan Wang	9e895f04ec	[NPU] fix npu save (#12614 ) * fix npu save * update	2024-12-26 09:21:16 +08:00
Yishuo Wang	6249c1e373	rewrite llama optimization (#12609 )	2024-12-25 17:04:32 +08:00
Yishuo Wang	5f5ac8a856	fix llama related import (#12611 )	2024-12-25 16:23:52 +08:00
Yishuo Wang	4e6b9d804f	add compresskv back for mistral (#12607 ) * add compresskv back for mistral * fix * fix	2024-12-25 11:06:08 +08:00
Yishuo Wang	4135b895b3	refactor chatglm2, internlm, stablelm and qwen (#12604 )	2024-12-24 18:18:00 +08:00
Yishuo Wang	073f936c37	refactor mistral and phi3 (#12605 )	2024-12-24 17:52:32 +08:00
binbin Deng	45f8f72a28	[NPU] Fix minicpm on MTL (#12599 )	2024-12-24 15:37:56 +08:00
Yishuo Wang	ad2dc965c5	refactor mllama, gpt2 and internvl (#12602 )	2024-12-24 14:18:31 +08:00
Yishuo Wang	7aaf02f602	refactor baichuan, glm4 and minicpm3 (#12600 )	2024-12-24 14:16:30 +08:00
Zijie Li	c410d9cf73	[NPU] support asym_int4 for baichuan (#12576 ) * add npu support for baichuan * Update baichuan_mp.py * Update baichuan_mp.py	2024-12-24 09:17:50 +08:00
Yishuo Wang	098eb335b2	refactor sd 1.5 and qwen2-vl and fix (#12590 )	2024-12-20 17:34:55 +08:00
Yishuo Wang	b050368efc	refactor yuan2 and starcoder2 and fix (#12589 )	2024-12-20 16:41:50 +08:00
Yishuo Wang	6ea8033635	refactor glm edge (#12588 )	2024-12-20 15:36:57 +08:00
Xu, Shuo	b0338c5529	Add --modelscope option for glm-v4 MiniCPM-V-2_6 glm-edge and internvl2 (#12583 ) * Add --modelscope option for glm-v4 and MiniCPM-V-2_6 * glm-edge * minicpm-v-2_6:don't use model_hub=modelscope when use lowbit; internvl2 --------- Co-authored-by: ATMxsp01 <shou.xu@intel.com>	2024-12-20 13:54:17 +08:00
Yishuo Wang	f3b5fad3be	refactor qwen2 and llama3 (#12587 )	2024-12-20 13:25:25 +08:00
Xu, Shuo	47da3c999f	Add `--modelscope` in GPU examples for minicpm, minicpm3, baichuan2 (#12564 ) * Add --modelscope for more models * minicpm --------- Co-authored-by: ATMxsp01 <shou.xu@intel.com>	2024-12-19 17:25:46 +08:00
Yishuo Wang	3eeb02f1be	support Megrez-3B-Omni (#12582 )	2024-12-19 17:23:01 +08:00
binbin Deng	4e7e988f70	[NPU] Fix MTL and ARL support (#12580 )	2024-12-19 16:55:30 +08:00
Yishuo Wang	80f2fdc37b	optimize new minicpm model (#12579 )	2024-12-19 14:22:47 +08:00
Yishuo Wang	4540424271	optimize siglip attention again (#12578 )	2024-12-19 13:40:48 +08:00
Yishuo Wang	e0921f80c1	padding mask on torch side (#12577 )	2024-12-19 10:53:02 +08:00
Xu, Shuo	47e90a362f	Add `--modelscope` in GPU examples for glm4, codegeex2, qwen2 and qwen2.5 (#12561 ) * Add --modelscope for more models * imporve readme --------- Co-authored-by: ATMxsp01 <shou.xu@intel.com>	2024-12-19 10:00:39 +08:00
Yishuo Wang	e2ae42929a	small fix (#12573 )	2024-12-18 15:48:22 +08:00
Yishuo Wang	a4eb561f36	optimize siglip attention on arc (#12569 )	2024-12-18 14:19:43 +08:00
Zijie Li	1a2ab12876	[NPU] support asym_int4 for minicpm (#12567 )	2024-12-18 10:55:35 +08:00
Yuwen Hu	6278cafc25	Add `setuptools` as a basic dependency (#12563 ) * Add setuptools as a basic dependency * Remove unnecessary requirements of setuptools in example/unit/nightly tests	2024-12-17 16:56:41 +08:00
Zijie Li	fcb474820d	[NPU] support asym_int4 for llama (#12556 ) * add llama-imatrix * fix bugs in llama.py * style fix	2024-12-17 14:01:17 +08:00
Yishuo Wang	a608f26cc8	use new fused layer norm (#12553 )	2024-12-17 13:52:35 +08:00
binbin Deng	680ea7e4a8	[NPU doc] Update configuration for different platforms (#12554 )	2024-12-17 10:15:09 +08:00
Xu, Shuo	ccc18eefb5	Add Modelscope option for chatglm3 on GPU (#12545 ) * Add Modelscope option for GPU model chatglm3 * Update readme * Update readme * Update readme * Update readme * format update --------- Co-authored-by: ATMxsp01 <shou.xu@intel.com>	2024-12-16 20:00:37 +08:00
Yishuo Wang	5ae0006103	remove old rope usage (#12552 )	2024-12-16 15:59:36 +08:00
Chu,Youcheng	a86487c539	Add GLM-Edge GPU example (#12483 ) * feat: initial commit * generate.py and README updates * Update link for main readme * Update based on comments * Small fix --------- Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>	2024-12-16 14:39:19 +08:00
Jun Wang	0b953e61ef	[REFINE] graphmode code (#12540 )	2024-12-16 09:17:01 +08:00
binbin Deng	caf15cc5ef	[NPU] Add `IPEX_LLM_NPU_MTL` to enable support on mtl (#12543 )	2024-12-13 17:01:13 +08:00
Yishuo Wang	c090d167dc	remove old rope usage (#12544 )	2024-12-13 16:54:58 +08:00
binbin Deng	d20a968ce2	[NPU] Fix generate example (#12541 )	2024-12-13 14:07:24 +08:00
Yishuo Wang	15219944b8	optimize glm edge again (#12539 )	2024-12-13 13:52:39 +08:00
binbin Deng	6596c18489	[NPU] Modify IPEX_LLM_NPU_DISABLE_COMPILE_OPT setting for long input (#12537 )	2024-12-13 13:49:56 +08:00
Ruonan Wang	7cc01fdc86	[NPU] further fix of `new_value_states` (#12538 )	2024-12-13 13:42:00 +08:00
Heyang Sun	fa261b8af1	torch 2.3 inference docker (#12517 ) * torch 2.3 inference docker * Update README.md * add convert code * rename image * remove 2.1 and add graph example * Update README.md	2024-12-13 10:47:04 +08:00
binbin Deng	f36c23664f	[NPU] Fix abnormal output with latest driver (#12530 )	2024-12-12 17:56:30 +08:00
Yishuo Wang	ffce86d69f	add basic glm-edge-v support (#12533 )	2024-12-12 17:25:48 +08:00
Yishuo Wang	3e0823d2ae	add basic glm-edge support (#12531 )	2024-12-12 16:02:22 +08:00
Yuwen Hu	dbaf4abcb3	[NPU] Update C++ example with repetition_penalty & update Python code accordingly (#12528 ) * Update c++ npu examples with repetition penalty * Fit python with updated C++ API * Style fix * Small fix * Small fix	2024-12-12 13:42:55 +08:00
Shaojun Liu	2cce89691a	Enable `use_batch_forward` Optimization on Battlemage GPU (#12516 ) * Update get_xpu_device_type() to support bmg * enable use_batch_forward for bmg * Update low_bit_linear.py * Update utils.py * use batch kernel for fp8e5	2024-12-12 12:44:36 +08:00
binbin Deng	6fc27da9c1	[NPU] Update glm-edge support in docs (#12529 )	2024-12-12 11:14:09 +08:00
binbin Deng	509bdb4661	[NPU] Fix minicpm-2B error (#12527 )	2024-12-11 16:49:32 +08:00
Xu, Shuo	fd9cf767ed	All-in-one Benchmark run.py: Ignore error if import BenchmarkWrapper failed. (#12526 )	2024-12-11 16:20:55 +08:00
Ruonan Wang	41ef4974ab	[NPU] fix `transpose_value = False` for NPU `optimize_model=True` (#12525 )	2024-12-11 15:51:39 +08:00
Ruonan Wang	588bfa24dc	support hqq (#12518 ) * support * fix	2024-12-11 15:43:02 +08:00
Yuwen Hu	68f2873bd3	[NPU] Support repetition penalty for simple generate, Python (cpp backend) (#12522 ) * Initial support of repetition penalty on NPU (cpp backend) for simple generate * Bug fix for generation config and others * Remove unnecessary print and style fix * Remove unnecessary print * Fix based on comments	2024-12-11 14:55:25 +08:00
Yishuo Wang	77404d2a63	support new model (#12523 )	2024-12-11 13:41:15 +08:00
binbin Deng	ea55235cbd	[NPU] Support glm-edge models (#12511 )	2024-12-09 14:06:27 +08:00
binbin Deng	12c78978dd	[NPU C++] Update example with conversation mode support (#12510 )	2024-12-06 12:46:37 +08:00
Yuwen Hu	0918d3baca	[NPU] Fix hf generate with save/load generation config for Python (cpp backend) (#12509 ) * Fix hf generate with save/load generation config * Small fix * Fix based on comments	2024-12-05 19:19:58 +08:00
Ruonan Wang	49ab8974fa	[NPU] initial support of `asym_int4_rtn` (#12484 ) * initiail support of q4_1 * fix * fix * update * update min to Z1 * update * fix * update * fix style * fix * support qwen2 optimize_model=True mp version * temp save * fix * fix style * replace min with zero * support split linear for q4_1 * fix lm_head with mixed_precision=True * fix style * revert test code * add down proj back for q4_0 * remove print	2024-12-05 17:40:36 +08:00
Jinhe	5e1416c9aa	fix readme for npu cpp examples and llama.cpp (#12505 ) * fix cpp readme * fix cpp readme * fix cpp readme	2024-12-05 12:32:42 +08:00
binbin Deng	f56a111aa2	[NPU] Fix load-low-bit benchmark script (#12502 )	2024-12-05 10:01:32 +08:00
Yuwen Hu	84f1c4ad57	Small fix for NPU Python cpp simple generate regarding eos tokens (#12501 )	2024-12-04 18:54:06 +08:00
Kai Huang	d8b14a6305	Update save/load comments (#12500 )	2024-12-04 18:51:38 +08:00
Kai Huang	b89ea1b0cf	Support save/load model for hf generate (#12499 ) * change dummy model * style * meet review	2024-12-04 18:26:39 +08:00
Kai Huang	7d27f134dd	Fix hf generate for llama3.2 (#12497 ) * fix kv condition] * meet review	2024-12-04 17:54:40 +08:00
Chu,Youcheng	ffa9a9e1b3	Update streaming in npu examples (#12495 ) * feat: add streaming * Update readme accordingly --------- Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>	2024-12-04 17:51:10 +08:00
Yishuo Wang	a9e3f7f14c	optimize minicpm (#12496 )	2024-12-04 17:14:16 +08:00
Yishuo Wang	e0bf0054e1	small fix (#12493 )	2024-12-04 16:37:39 +08:00
Kai Huang	7ff4533b39	Support hf generate (#12477 ) * generate * style * update * remove timing * style * style * combine generate api * simple in kwargs	2024-12-04 16:31:09 +08:00
Yuwen Hu	ef4028ac2d	[NPU] Support split `lm_head` for Qwen2 with CPP (#12491 ) * Use split for Qwen2 lm_head instead of slice in optimize_pre * Support split lm_head in Qwen2 python cpp backend * Fit with Python acc lib pipeline * Removed default mixed_precision=True in all-in-one and related examples * Small fix * Style fix * Fix based on comments * Fix based on comments * Stype fix	2024-12-04 14:41:08 +08:00
Yishuo Wang	5629fdd518	optimize qwen2_vl multiple image input or video input (#12487 )	2024-12-04 09:24:38 +08:00
binbin Deng	c59284418c	Hotfix of BCE-Embedding model (#12490 )	2024-12-03 18:16:04 +08:00
Yuwen Hu	4ac66db034	[NPU] Support streaming in Python (cpp backend) (#12488 ) * Support streaming in NPU Python (cpp backend) * Small fix	2024-12-03 17:17:26 +08:00
Jin, Qiao	7082844f3f	Fix NPU LLM example save/load tokenizer (#12485 )	2024-12-03 16:30:55 +08:00
Jin, Qiao	5fe766788e	Fix MiniCPM-V-2_6 running on NPU (#12486 )	2024-12-03 16:16:29 +08:00
Ruonan Wang	598603bea6	small fix of imatrix (#12480 )	2024-12-03 10:46:36 +08:00
binbin Deng	ab01753b1c	[NPU] update save-load API usage (#12473 )	2024-12-03 09:46:15 +08:00
Yuwen Hu	26adb82ee3	[NPU] Remove hard code (#12479 )	2024-12-02 18:26:07 +08:00
Yuwen Hu	b2e56a2e03	Add release support for option `xpu_arc` (#12422 ) * Add release support for xpu-arc * Dependency update	2024-12-02 17:16:04 +08:00
Yuwen Hu	aee9acb303	Add NPU QuickStart & update example links (#12470 ) * Add initial NPU quickstart (c++ part unfinished) * Small update * Update based on comments * Update main readme * Remove LLaMA description * Small fix * Small fix * Remove subsection link in main README * Small fix * Update based on comments * Small fix * TOC update and other small fixes * Update for Chinese main readme * Update based on comments and other small fixes * Change order	2024-12-02 17:03:10 +08:00
Jin, Qiao	31c69a8d31	Fix MiniCPM-V models running on NPU (#12478 )	2024-12-02 16:29:46 +08:00
binbin Deng	54d9a590d4	[NPU]Fix eos_token setting (#12475 )	2024-12-02 14:18:22 +08:00
Guancheng Fu	59bd4a214f	add vLLM glm4 fix (#12474 )	2024-12-02 14:05:16 +08:00
Ruonan Wang	4b6c3160be	Support imatrix-guided quantization for NPU CW (#12468 ) * init commit * remove print * add interface * fix * fix * fix style	2024-12-02 11:31:26 +08:00
binbin Deng	f99f188023	Hotfix of benchmark script (#12467 )	2024-11-29 14:00:59 +08:00
binbin Deng	c911026f03	[NPU C++] Update model support & examples & benchmark (#12466 )	2024-11-29 13:35:58 +08:00
binbin Deng	14d8d3d8af	Integrate NPU C++ imple into ipex-llm (#12461 )	2024-11-29 09:25:37 +08:00
Ruonan Wang	490bb0ca53	[NPU] update fused layers for GW (#12459 ) * update fused layers for GW * fix * fix llama condition for glm model * update	2024-11-28 17:14:30 +08:00
Yina Chen	1b533a105c	[NPU] Add env to enable scale search (#12462 ) * add env enable scale search * address comment * move logic	2024-11-28 17:06:00 +08:00

2276 commits