ipex-llm

Author	SHA1	Message	Date
Ruonan Wang	78cca0a68c	[NPU] update llm-npu-cli example (#12729 ) * update cli example * add license * rename * update readme sample output	2025-01-22 09:59:27 +08:00
Jason Dai	7e29edcc4b	Update Readme (#12730 )	2025-01-22 08:43:32 +08:00
Yishuo Wang	6789e5d92f	small fix (#12727 )	2025-01-21 17:27:18 +08:00
Jason Dai	412bfd6644	Update readme (#12724 )	2025-01-21 10:59:14 +08:00
Wang, Jian4	716d4fe563	Add vllm 0.6.2 vision offline example (#12721 ) * add vision offline example * add to docker	2025-01-21 09:58:01 +08:00
Yishuo Wang	085974e307	fix nf4 to cpu (#12722 )	2025-01-21 09:23:22 +08:00
Yuwen Hu	9aa4be8ced	Update runtime configuration on MTL (#12720 )	2025-01-20 11:06:37 +08:00
Yishuo Wang	bda87c21eb	add support and optimization for minicpmo audio part (#12716 )	2025-01-16 16:39:00 +08:00
Shaojun Liu	53aae24616	Add note about enabling Resizable BAR in BIOS for GPU setup (#12715 )	2025-01-16 16:22:35 +08:00
Yuwen Hu	534e0e6774	Update dependency for PyTorch 2.6 RC support for woq int4 (#12714 )	2025-01-16 15:51:57 +08:00
Zhao Changmin	54d6328b3c	woq int4 fwd (#12711 )	2025-01-16 15:48:05 +08:00
Yishuo Wang	b62734748f	add support and optimization for minicpmo vision part (#12713 )	2025-01-16 14:51:00 +08:00
Yuwen Hu	c52bdff76b	Update Deepseek coder GPU example (#12712 ) * Update Deepseek coder GPU example * Fix based on comment	2025-01-16 14:05:31 +08:00
Yuwen Hu	9d65dcd7ef	Fix deepseek coder with linear rope type support on GPU (#12709 ) * Fix deepseek coder with linear rope type * Style fix * Move to optimize_pre * Small fix * Small fix * Small fix to not affect other cases * Style fixes * Update function name * Small fix * Small fix * Small fix * Fix for low transformers version first * Style fix * Small fix	2025-01-15 21:12:34 +08:00
binbin Deng	36bf3d8e29	[NPU doc] Update ARL product in QuickStart (#12708 )	2025-01-15 15:57:06 +08:00
Cengguang Zhang	9930351112	LLM: add new qtype woq_int4 to support gemm int4 temporarily. (#12706 ) This PR adds a temporary qtype woq_int4 to avoid affecting other qtypes and models. Co-authored-by: leonardozcm <leonardo1997zcm@gmail.com>	2025-01-15 14:41:33 +08:00
Yuwen Hu	6d03d06ebb	Change runtime configurations for perf test on Windows (#12705 ) * Change runtime configurations for perf test on Windows * Small fix	2025-01-14 17:54:57 +08:00
Xu, Shuo	350fae285d	Add Qwen2-VL HF GPU example with ModelScope Support (#12606 ) * Add qwen2-vl example * complete generate.py & readme * improve lint style * update 1-6 * update main readme * Format and other small fixes --------- Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>	2025-01-13 15:42:04 +08:00
Yuwen Hu	a1da7908b9	Fix name device is not found bug (#12703 )	2025-01-13 10:11:02 +08:00
SONG Ge	e2d58f733e	Update ollama v0.5.1 document (#12699 ) * Update ollama document version and known issue	2025-01-10 18:04:49 +08:00
Yishuo Wang	db9db51e2c	fix lnl perf (#12700 )	2025-01-10 18:00:58 +08:00
Yuwen Hu	4bf93c66e8	Support install from source for PyTorch 2.6 RC in UT (#12697 ) * Support install from source for PyTorch 2.6 RC in UT * Remove expecttest	2025-01-10 16:44:18 +08:00
binbin Deng	da8bcb7db1	[NPU] fix load logic of glm-edge models (#12698 )	2025-01-10 16:08:37 +08:00
joan726	584c1c5373	Update B580 CN doc (#12695 )	2025-01-10 11:20:47 +08:00
Jason Dai	cbb8e2a2d5	Update documents (#12693 )	2025-01-10 10:47:11 +08:00
Yishuo Wang	f8dc408888	fix user issue (#12692 )	2025-01-10 10:18:47 +08:00
Yishuo Wang	68857494a5	refactor to simplify following upgrade 2 (#12685 )	2025-01-10 09:29:03 +08:00
Shaojun Liu	2673792de6	Update Dockerfile (#12688 )	2025-01-10 09:01:29 +08:00
Jason Dai	f9b29a4f56	Update B580 doc (#12691 )	2025-01-10 08:59:35 +08:00
joan726	66d4385cc9	Update B580 CN Doc (#12686 )	2025-01-09 19:10:57 +08:00
Yuwen Hu	c24741584d	Support PyTorch 2.6 RC perf test on Windows (#12683 )	2025-01-09 18:17:23 +08:00
Yishuo Wang	7234c9b27b	update quantize kv cache condition (#12681 )	2025-01-09 15:23:04 +08:00
Yuwen Hu	5d8081afbc	Remove dummy model from performance tests (#12682 )	2025-01-09 14:50:17 +08:00
Yishuo Wang	1ec40cd09e	refactor to simplify following upgrade (#12680 )	2025-01-09 13:34:30 +08:00
Jason Dai	aa9e70a347	Update B580 Doc (#12678 )	2025-01-08 22:36:48 +08:00
Jason Dai	c6f57ad6ed	Update README.md (#12677 )	2025-01-08 21:55:52 +08:00
Jason Dai	2321e8d60c	Update README.md (#12676 )	2025-01-08 21:54:31 +08:00
Yishuo Wang	5c24276fc4	fix custom kernel registration (#12674 )	2025-01-08 17:39:17 +08:00
Yishuo Wang	a22a8c21bb	small fix and remove unused code about ipex (#12671 )	2025-01-08 17:39:04 +08:00
Yishuo Wang	c11f5f0fcd	also convert SdpaAttention in optimize_model (#12673 )	2025-01-08 16:48:03 +08:00
Shaojun Liu	2c23ce2553	Create a BattleMage QuickStart (#12663 ) * Create bmg_quickstart.md * Update bmg_quickstart.md * Clarify IPEX-LLM package installation based on use case * Update bmg_quickstart.md * Update bmg_quickstart.md	2025-01-08 14:58:37 +08:00
Yishuo Wang	7dd156d292	small fix and add comment (#12670 )	2025-01-08 10:56:50 +08:00
Yishuo Wang	ccf618ff4a	Remove all ipex usage (#12666 )	2025-01-08 10:31:18 +08:00
logicat	0534d7254f	Update docker_cpp_xpu_quickstart.md (#12667 )	2025-01-08 09:56:56 +08:00
Yuwen Hu	5db6f9dcde	Add option with PyTorch 2.6 RC version for testing purposes (#12668 ) * Add option with PyTorch 2.6 RC version for testing purposes * Small update	2025-01-07 18:28:55 +08:00
Yishuo Wang	f9ee7898c8	fix onednn dependency bug (#12665 )	2025-01-07 16:26:56 +08:00
Yishuo Wang	29ad5c449e	refactor codegeex to remove ipex kernel usage (#12664 )	2025-01-07 16:17:40 +08:00
Yuwen Hu	525b0ee991	[NPU] Tiny fixes on examples (#12661 )	2025-01-07 14:30:38 +08:00
Yuwen Hu	ebdf19fa7e	[NPU] Further fix saving of generation config (#12657 ) * Further fix saving of generation config * Fix based on comments * Small fix	2025-01-07 13:53:54 +08:00
Yuwen Hu	381d448ee2	[NPU] Example & Quickstart updates (#12650 ) * Remove model with optimize_model=False in NPU verified models tables, and remove related example * Remove experimental in run optimized model section title * Unify model table order & example cmd * Move embedding example to separate folder & update quickstart example link * Add Quickstart reference in main NPU readme * Small fix * Small fix * Move save/load examples under NPU/HF-Transformers-AutoModels * Add low-bit and polish arguments for LLM Python examples * Small fix * Add low-bit and polish arguments for Multi-Model examples * Polish argument for Embedding models * Polish argument for LLM CPP examples * Add low-bit and polish argument for Save-Load examples * Add accuracy tuning tips for examples * Update NPU quickstart accuracy tuning with low-bit optimizations * Add save/load section to quickstart * Update CPP example sample output to EN * Add installation regarding cmake for CPP examples * Small fix * Small fix * Small fix * Small fix * Small fix * Small fix * Unify max prompt length to 512 * Change recommended low-bit for Qwen2.5-3B-Instruct to asym_int4 * Update based on comments * Small fix	2025-01-07 13:52:41 +08:00

3898 commits