ipex-llm

Author	SHA1	Message	Date
Yishuo Wang	abe53eaa4f	optimize qwen1.5/2 memory usage when running long input with fp16 (#11403 )	2024-06-24 13:43:04 +08:00
Guoqiong Song	7507000ef2	Fix 1383 Llama model on transformers=4.41[WIP] (#11280 )	2024-06-21 11:24:10 -07:00
Shengsheng Huang	475b0213d2	README update (API doc and FAQ and minor fixes) (#11397 ) * add faq and API doc link in README.md * add missing quickstart link * update links in FAQ * update links in FAQ * update faq * update faq text	2024-06-21 19:46:32 +08:00
SONG Ge	0c67639539	Add more examples for pipeline parallel inference (#11372 ) * add more model examples for pipeline parallel inference * add mixtral and vicuna models * add yi model and past_kv support for chatglm family * add docs * doc update * add license * update	2024-06-21 17:55:16 +08:00
Yuwen Hu	2004fe1a43	Small fix (#11395 )	2024-06-21 17:45:10 +08:00
Yuwen Hu	4cb9a4728e	Add index page for API doc & links update in mddocs (#11393 ) * Small fixes * Add initial api doc index * Change index.md -> README.md * Fix on API links	2024-06-21 17:34:34 +08:00
Xu, Shuo	b200e11e21	Add initial python api doc in mddoc (2/2) (#11388 ) * add PyTorch-API.md * small change * small change --------- Co-authored-by: ATMxsp01 <shou.xu@intel.com>	2024-06-21 17:15:05 +08:00
Yuwen Hu	aafd6d55cd	Add initial python api doc in mddoc (1/2) (#11389 ) * Add initial python api mddoc * Fix based on comments	2024-06-21 17:14:42 +08:00
Yuwen Hu	a027121530	Small mddoc fixed based on review (#11391 ) * Fix based on review * Further fix * Small fix * Small fix	2024-06-21 17:09:30 +08:00
Shengsheng Huang	072ce7e66d	update README links to mddocs (#11387 ) * update links to mddocs * update links * update links in texts * update table html links	2024-06-21 13:59:27 +08:00
Yuwen Hu	54f9d07d8f	Further mddocs fixes (#11386 ) * Update mddocs for ragflow quickstart * Fixes for docker guides mddocs * Further fixes	2024-06-21 13:27:43 +08:00
Xiangyu Tian	b30bf7648e	Fix vLLM CPU api_server params (#11384 )	2024-06-21 13:00:06 +08:00
ivy-lv11	21fc781fce	Add GLM-4V example (#11343 ) * add example * modify * modify * add line * add * add link and replace with phi-3-vision template * fix generate options * fix * fix --------- Co-authored-by: jinbridge <2635480475@qq.com>	2024-06-21 12:54:31 +08:00
Yuwen Hu	9b475c07db	Add missing ragflow quickstart in mddocs and update legacy contents (#11385 )	2024-06-21 12:28:26 +08:00
Xu, Shuo	fed79f106b	Update mddocs for DockerGuides (#11380 ) * transfer files in DockerGuides from rst to md * add some dividing lines * adjust the title hierarchy in docker_cpp_xpu_quickstart.md * restore * switch to the correct branch * small change --------- Co-authored-by: ATMxsp01 <shou.xu@intel.com>	2024-06-21 12:10:35 +08:00
SichengStevenLi	1a1a97c9e4	Update mddocs for part of Overview (2/2) and Inference (#11377 ) * updated link * converted to md format, need to be reviewed * converted to md format, need to be reviewed * converted to md format, need to be reviewed * converted to md format, need to be reviewed * converted to md format, need to be reviewed * converted to md format, need to be reviewed * converted to md format, need to be reviewed * converted to md format, need to be reviewed * converted to md format, need to be reviewed * converted to md format, need to be reviewed, deleted some leftover texts * converted to md file type, need to be reviewed * converted to md file type, need to be reviewed * testing Github Tags * testing Github Tags * added Github Tags * added Github Tags * added Github Tags * Small fix * Small fix * Small fix * Small fix * Small fix * Further fix * Fix index * Small fix * Fix --------- Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>	2024-06-21 12:07:50 +08:00
Zijie Li	33b9a9c4c9	Update part of Overview guide in mddocs (1/2) (#11378 ) * Create install.md * Update install_cpu.md * Delete original docs/mddocs/Overview/install_cpu.md * Update install_cpu.md * Update install_gpu.md * update llm.md and install.md * Update docs in KeyFeatures * Review and fix typos * Fix on folded NOTE * Small fix * Small fix * Remove empty known_issue.md * Small fix * Small fix * Further fix * Fixes * Fix --------- Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>	2024-06-21 10:45:17 +08:00
binbin Deng	4ba82191f2	Support PP inference for chatglm3 (#11375 )	2024-06-21 09:59:01 +08:00
Jin Qiao	9a3a21e4fc	Update part of Quickstart guide in mddocs (2/2) (#11376 ) * axolotl_quickstart.md * benchmark_quickstart.md * bigdl_llm_migration.md * chatchat_quickstart.md * continue_quickstart.md * deepspeed_autotp_fastapi_quickstart.md * dify_quickstart.md * fastchat_quickstart.md * adjust tab style * fix link * fix link * add video preview * Small fixes * Small fix --------- Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>	2024-06-20 19:03:06 +08:00
Yuwen Hu	8c9f877171	Update part of Quickstart guide in mddocs (1/2) * Quickstart index.rst -> index.md * Update for Linux Install Quickstart * Update md docs for Windows Install QuickStart * Small fix * Add blank lines * Update mddocs for llama cpp quickstart * Update mddocs for llama3 llama-cpp and ollama quickstart * Update mddocs for ollama quickstart * Update mddocs for openwebui quickstart * Update mddocs for privateGPT quickstart * Update mddocs for vllm quickstart * Small fix * Update mddocs for text-generation-webui quickstart * Update for video links	2024-06-20 18:43:23 +08:00
Yishuo Wang	f0fdfa081b	Optimize qwen 1.5 14B batch performance (#11370 )	2024-06-20 17:23:39 +08:00
Shaojun Liu	5aa3e427a9	Fix docker images (#11362 ) * Fix docker images * add-apt-repository requires gnupg, gpg-agent, software-properties-common * update * avoid importing ipex again	2024-06-20 15:44:55 +08:00
Yuwen Hu	d9dd1b70bd	Remove example page in mddocs (#11373 )	2024-06-20 14:23:43 +08:00
Wenjing Margaret Mao	c0e86c523a	Add qwen-moe batch1 to nightly perf (#11369 ) * add moe * reduce 437 models * rename * fix syntax * add moe check result * add 430 + 437 * all modes * 4-37-4 exclude * revert & comment --------- Co-authored-by: Yishuo Wang <yishuo.wang@intel.com>	2024-06-20 14:17:41 +08:00
Yuwen Hu	769728c1eb	Add initial md docs (#11371 )	2024-06-20 13:47:49 +08:00
Shengsheng Huang	9601fae5d5	fix system note (#11368 )	2024-06-20 11:09:53 +08:00
Yishuo Wang	a5e7d93242	Add initial save/load low bit support for NPU(now only fp16 is supported) (#11359 )	2024-06-20 10:49:39 +08:00
Shengsheng Huang	ed4c439497	small fix (#11366 )	2024-06-20 10:38:20 +08:00
RyuKosei	05a8d051f6	Fix run.py run_ipex_fp16_gpu (#11361 ) * fix a bug on run.py * Update run.py fixed the format problem --------- Co-authored-by: sgwhat <ge.song@intel.com>	2024-06-20 10:29:32 +08:00
Wenjing Margaret Mao	b2f62a8561	Add batch 4 perf test (#11355 ) * copy files to this branch * add tasks * comment one model * change the model to test the 4.36 * only test batch-4 * typo * typo * typo * typo * typo * typo * add 4.37-batch4 * change the file name * revert yaml file * no print * add batch4 task * revert --------- Co-authored-by: Yishuo Wang <yishuo.wang@intel.com>	2024-06-20 09:48:52 +08:00
Shengsheng Huang	a721c1ae43	minor fix of ragflow_quickstart.md (#11364 )	2024-06-19 22:30:33 +08:00
Shengsheng Huang	13727635e8	revise ragflow quickstart (#11363 ) * revise ragflow quickstart * update titles and split the quickstart into sections * update	2024-06-19 22:24:31 +08:00
Zijie Li	5283df0078	LLM: Add RAGFlow with Ollama Example QuickStart (#11338 ) * Create ragflow.md * Update ragflow.md * Update ragflow_quickstart * Update ragflow_quickstart.md * Upload RAGFlow quickstart without images * Update ragflow_quickstart.md * Update ragflow_quickstart.md * Update ragflow_quickstart.md * Update ragflow_quickstart.md * fix typos in readme * Fix typos in quickstart readme	2024-06-19 20:00:50 +08:00
Zijie Li	ae452688c2	Add NPU HF example (#11358 )	2024-06-19 18:07:28 +08:00
Qiyuan Gong	1eb884a249	IPEX Duplicate importer V2 (#11310 ) * Add gguf support. * Avoid error when import ipex-llm for multiple times. * Add check to avoid duplicate replace and revert. * Add calling from check to avoid raising exceptions in the submodule. * Add BIGDL_CHECK_DUPLICATE_IMPORT for controlling duplicate checker. Default is true.	2024-06-19 16:29:19 +08:00
Jason Dai	271d82a4fc	Update readme (#11357 )	2024-06-19 10:05:42 +08:00
Yishuo Wang	ae7b662ed2	add fp16 NPU Linear support and fix intel_npu_acceleration_library version 1.0 support (#11352 )	2024-06-19 09:14:59 +08:00
Guoqiong Song	c44b1942ed	fix mistral for transformers>=4.39 (#11191 ) * fix mistral for transformers>=4.39	2024-06-18 13:39:35 -07:00
Heyang Sun	67a1e05876	Remove zero3 context manager from LoRA (#11346 )	2024-06-18 17:24:43 +08:00
Xiangyu Tian	f6cd628cd8	Fix script usage in vLLM CPU Quickstart (#11353 )	2024-06-18 16:50:48 +08:00
Xiangyu Tian	ef9f740801	Docs: Fix CPU Serving Docker README (#11351 ) Fix CPU Serving Docker README	2024-06-18 16:27:51 +08:00
Guancheng Fu	c9b4cadd81	fix vLLM/docker issues (#11348 ) * fix * fix * fix	2024-06-18 16:23:53 +08:00
Yishuo Wang	83082e5cc7	add initial support for intel npu acceleration library (#11347 )	2024-06-18 16:07:16 +08:00
Shaojun Liu	694912698e	Upgrade scikit-learn to 1.5.0 to fix dependabot issue (#11349 )	2024-06-18 15:47:25 +08:00
hxsz1997	44f22cba70	add config and default value (#11344 ) * add config and default value * add config in yaml * remove lookahead and max_matching_ngram_size in config * remove streaming and use_fp16_torch_dtype in test yaml * update task in readme * update commit of task	2024-06-18 15:28:57 +08:00
Shengsheng Huang	1f39bb84c7	update readthedocs perf data (#11345 )	2024-06-18 13:23:47 +08:00
Heyang Sun	00f322d8ee	Finetune ChatGLM with Deepspeed Zero3 LoRA (#11314 ) * Finetune ChatGLM with Deepspeed Zero3 LoRA * add deepspeed zero3 config * rename config * remove offload_param * add save_checkpoint parameter * Update lora_deepspeed_zero3_finetune_chatglm3_6b_arc_2_card.sh * refine	2024-06-18 12:31:26 +08:00
Yina Chen	5dad33e5af	Support fp8_e4m3 scale search (#11339 ) * fp8e4m3 switch off * fix style	2024-06-18 11:47:43 +08:00
binbin Deng	e50c890e1f	Support finishing PP inference once `eos_token_id` is found (#11336 )	2024-06-18 09:55:40 +08:00
Qiyuan Gong	de4bb97b4f	Remove accelerate 0.23.0 install command in readme and docker (#11333 ) * ipex-llm's accelerate has been upgraded to 0.23.0. Remove accelerate 0.23.0 install command in README and docker.	2024-06-17 17:52:12 +08:00

3138 commits