Yishuo Wang
abe53eaa4f
optimize qwen1.5/2 memory usage when running long input with fp16 ( #11403 )
2024-06-24 13:43:04 +08:00
Guoqiong Song
7507000ef2
Fix 1383 Llama model on transformers=4.41[WIP] ( #11280 )
2024-06-21 11:24:10 -07:00
Shengsheng Huang
475b0213d2
README update (API doc and FAQ and minor fixes) ( #11397 )
...
* add faq and API doc link in README.md
* add missing quickstart link
* update links in FAQ
* update links in FAQ
* update faq
* update faq text
2024-06-21 19:46:32 +08:00
SONG Ge
0c67639539
Add more examples for pipeline parallel inference ( #11372 )
...
* add more model examples for pipeline parallel inference
* add mixtral and vicuna models
* add yi model and past_kv support for chatglm family
* add docs
* doc update
* add license
* update
2024-06-21 17:55:16 +08:00
Yuwen Hu
2004fe1a43
Small fix ( #11395 )
2024-06-21 17:45:10 +08:00
Yuwen Hu
4cb9a4728e
Add index page for API doc & links update in mddocs ( #11393 )
...
* Small fixes
* Add initial api doc index
* Change index.md -> README.md
* Fix on API links
2024-06-21 17:34:34 +08:00
Xu, Shuo
b200e11e21
Add initial python api doc in mddoc (2/2) ( #11388 )
...
* add PyTorch-API.md
* small change
* small change
---------
Co-authored-by: ATMxsp01 <shou.xu@intel.com>
2024-06-21 17:15:05 +08:00
Yuwen Hu
aafd6d55cd
Add initial python api doc in mddoc (1/2) ( #11389 )
...
* Add initial python api mddoc
* Fix based on comments
2024-06-21 17:14:42 +08:00
Yuwen Hu
a027121530
Small mddoc fixes based on review ( #11391 )
...
* Fix based on review
* Further fix
* Small fix
* Small fix
2024-06-21 17:09:30 +08:00
Shengsheng Huang
072ce7e66d
update README links to mddocs ( #11387 )
...
* update links to mddocs
* update links
* update links in texts
* update table html links
2024-06-21 13:59:27 +08:00
Yuwen Hu
54f9d07d8f
Further mddocs fixes ( #11386 )
...
* Update mddocs for ragflow quickstart
* Fixes for docker guides mddocs
* Further fixes
2024-06-21 13:27:43 +08:00
Xiangyu Tian
b30bf7648e
Fix vLLM CPU api_server params ( #11384 )
2024-06-21 13:00:06 +08:00
ivy-lv11
21fc781fce
Add GLM-4V example ( #11343 )
...
* add example
* modify
* modify
* add line
* add
* add link and replace with phi-3-vision template
* fix generate options
* fix
* fix
---------
Co-authored-by: jinbridge <2635480475@qq.com>
2024-06-21 12:54:31 +08:00
Yuwen Hu
9b475c07db
Add missing ragflow quickstart in mddocs and update legacy contents ( #11385 )
2024-06-21 12:28:26 +08:00
Xu, Shuo
fed79f106b
Update mddocs for DockerGuides ( #11380 )
...
* transfer files in DockerGuides from rst to md
* add some dividing lines
* adjust the title hierarchy in docker_cpp_xpu_quickstart.md
* restore
* switch to the correct branch
* small change
---------
Co-authored-by: ATMxsp01 <shou.xu@intel.com>
2024-06-21 12:10:35 +08:00
SichengStevenLi
1a1a97c9e4
Update mddocs for part of Overview (2/2) and Inference ( #11377 )
...
* updated link
* converted to md format, need to be reviewed
* converted to md format, need to be reviewed
* converted to md format, need to be reviewed
* converted to md format, need to be reviewed
* converted to md format, need to be reviewed
* converted to md format, need to be reviewed
* converted to md format, need to be reviewed
* converted to md format, need to be reviewed
* converted to md format, need to be reviewed
* converted to md format, need to be reviewed, deleted some leftover texts
* converted to md file type, need to be reviewed
* converted to md file type, need to be reviewed
* testing Github Tags
* testing Github Tags
* added Github Tags
* added Github Tags
* added Github Tags
* Small fix
* Small fix
* Small fix
* Small fix
* Small fix
* Further fix
* Fix index
* Small fix
* Fix
---------
Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
2024-06-21 12:07:50 +08:00
Zijie Li
33b9a9c4c9
Update part of Overview guide in mddocs (1/2) ( #11378 )
...
* Create install.md
* Update install_cpu.md
* Delete original docs/mddocs/Overview/install_cpu.md
* Update install_cpu.md
* Update install_gpu.md
* update llm.md and install.md
* Update docs in KeyFeatures
* Review and fix typos
* Fix on folded NOTE
* Small fix
* Small fix
* Remove empty known_issue.md
* Small fix
* Small fix
* Further fix
* Fixes
* Fix
---------
Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
2024-06-21 10:45:17 +08:00
binbin Deng
4ba82191f2
Support PP inference for chatglm3 ( #11375 )
2024-06-21 09:59:01 +08:00
Jin Qiao
9a3a21e4fc
Update part of Quickstart guide in mddocs (2/2) ( #11376 )
...
* axolotl_quickstart.md
* benchmark_quickstart.md
* bigdl_llm_migration.md
* chatchat_quickstart.md
* continue_quickstart.md
* deepspeed_autotp_fastapi_quickstart.md
* dify_quickstart.md
* fastchat_quickstart.md
* adjust tab style
* fix link
* fix link
* add video preview
* Small fixes
* Small fix
---------
Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
2024-06-20 19:03:06 +08:00
Yuwen Hu
8c9f877171
Update part of Quickstart guide in mddocs (1/2)
...
* Quickstart index.rst -> index.md
* Update for Linux Install Quickstart
* Update md docs for Windows Install QuickStart
* Small fix
* Add blank lines
* Update mddocs for llama cpp quickstart
* Update mddocs for llama3 llama-cpp and ollama quickstart
* Update mddocs for ollama quickstart
* Update mddocs for openwebui quickstart
* Update mddocs for privateGPT quickstart
* Update mddocs for vllm quickstart
* Small fix
* Update mddocs for text-generation-webui quickstart
* Update for video links
2024-06-20 18:43:23 +08:00
Yishuo Wang
f0fdfa081b
Optimize qwen 1.5 14B batch performance ( #11370 )
2024-06-20 17:23:39 +08:00
Shaojun Liu
5aa3e427a9
Fix docker images ( #11362 )
...
* Fix docker images
* add-apt-repository requires gnupg, gpg-agent, software-properties-common
* update
* avoid importing ipex again
2024-06-20 15:44:55 +08:00
Yuwen Hu
d9dd1b70bd
Remove example page in mddocs ( #11373 )
2024-06-20 14:23:43 +08:00
Wenjing Margaret Mao
c0e86c523a
Add qwen-moe batch1 to nightly perf ( #11369 )
...
* add moe
* reduce 437 models
* rename
* fix syntax
* add moe check result
* add 430 + 437
* all modes
* 4-37-4 exclude
* revert & comment
---------
Co-authored-by: Yishuo Wang <yishuo.wang@intel.com>
2024-06-20 14:17:41 +08:00
Yuwen Hu
769728c1eb
Add initial md docs ( #11371 )
2024-06-20 13:47:49 +08:00
Shengsheng Huang
9601fae5d5
fix system note ( #11368 )
2024-06-20 11:09:53 +08:00
Yishuo Wang
a5e7d93242
Add initial save/load low bit support for NPU (now only fp16 is supported) ( #11359 )
2024-06-20 10:49:39 +08:00
Shengsheng Huang
ed4c439497
small fix ( #11366 )
2024-06-20 10:38:20 +08:00
RyuKosei
05a8d051f6
Fix run.py run_ipex_fp16_gpu ( #11361 )
...
* fix a bug on run.py
* Update run.py
fixed the format problem
---------
Co-authored-by: sgwhat <ge.song@intel.com>
2024-06-20 10:29:32 +08:00
Wenjing Margaret Mao
b2f62a8561
Add batch 4 perf test ( #11355 )
...
* copy files to this branch
* add tasks
* comment one model
* change the model to test the 4.36
* only test batch-4
* typo
* typo
* typo
* typo
* typo
* typo
* add 4.37-batch4
* change the file name
* revert yaml file
* no print
* add batch4 task
* revert
---------
Co-authored-by: Yishuo Wang <yishuo.wang@intel.com>
2024-06-20 09:48:52 +08:00
Shengsheng Huang
a721c1ae43
minor fix of ragflow_quickstart.md ( #11364 )
2024-06-19 22:30:33 +08:00
Shengsheng Huang
13727635e8
revise ragflow quickstart ( #11363 )
...
* revise ragflow quickstart
* update titles and split the quickstart into sections
* update
2024-06-19 22:24:31 +08:00
Zijie Li
5283df0078
LLM: Add RAGFlow with Ollama Example QuickStart ( #11338 )
...
* Create ragflow.md
* Update ragflow.md
* Update ragflow_quickstart
* Update ragflow_quickstart.md
* Upload RAGFlow quickstart without images
* Update ragflow_quickstart.md
* Update ragflow_quickstart.md
* Update ragflow_quickstart.md
* Update ragflow_quickstart.md
* fix typos in readme
* Fix typos in quickstart readme
2024-06-19 20:00:50 +08:00
Zijie Li
ae452688c2
Add NPU HF example ( #11358 )
2024-06-19 18:07:28 +08:00
Qiyuan Gong
1eb884a249
IPEX Duplicate importer V2 ( #11310 )
...
* Add gguf support.
* Avoid error when importing ipex-llm multiple times.
* Add check to avoid duplicate replace and revert.
* Add calling from check to avoid raising exceptions in the submodule.
* Add BIGDL_CHECK_DUPLICATE_IMPORT for controlling duplicate checker. Default is true.
2024-06-19 16:29:19 +08:00
Jason Dai
271d82a4fc
Update readme ( #11357 )
2024-06-19 10:05:42 +08:00
Yishuo Wang
ae7b662ed2
add fp16 NPU Linear support and fix intel_npu_acceleration_library version 1.0 support ( #11352 )
2024-06-19 09:14:59 +08:00
Guoqiong Song
c44b1942ed
fix mistral for transformers>=4.39 ( #11191 )
...
* fix mistral for transformers>=4.39
2024-06-18 13:39:35 -07:00
Heyang Sun
67a1e05876
Remove zero3 context manager from LoRA ( #11346 )
2024-06-18 17:24:43 +08:00
Xiangyu Tian
f6cd628cd8
Fix script usage in vLLM CPU Quickstart ( #11353 )
2024-06-18 16:50:48 +08:00
Xiangyu Tian
ef9f740801
Docs: Fix CPU Serving Docker README ( #11351 )
...
Fix CPU Serving Docker README
2024-06-18 16:27:51 +08:00
Guancheng Fu
c9b4cadd81
fix vLLM/docker issues ( #11348 )
...
* fix
* fix
* fix
2024-06-18 16:23:53 +08:00
Yishuo Wang
83082e5cc7
add initial support for intel npu acceleration library ( #11347 )
2024-06-18 16:07:16 +08:00
Shaojun Liu
694912698e
Upgrade scikit-learn to 1.5.0 to fix dependabot issue ( #11349 )
2024-06-18 15:47:25 +08:00
hxsz1997
44f22cba70
add config and default value ( #11344 )
...
* add config and default value
* add config in yaml
* remove lookahead and max_matching_ngram_size in config
* remove streaming and use_fp16_torch_dtype in test yaml
* update task in readme
* update commit of task
2024-06-18 15:28:57 +08:00
Shengsheng Huang
1f39bb84c7
update readthedocs perf data ( #11345 )
2024-06-18 13:23:47 +08:00
Heyang Sun
00f322d8ee
Finetune ChatGLM with Deepspeed Zero3 LoRA ( #11314 )
...
* Finetune ChatGLM with Deepspeed Zero3 LoRA
* add deepspeed zero3 config
* rename config
* remove offload_param
* add save_checkpoint parameter
* Update lora_deepspeed_zero3_finetune_chatglm3_6b_arc_2_card.sh
* refine
2024-06-18 12:31:26 +08:00
Yina Chen
5dad33e5af
Support fp8_e4m3 scale search ( #11339 )
...
* fp8e4m3 switch off
* fix style
2024-06-18 11:47:43 +08:00
binbin Deng
e50c890e1f
Support finishing PP inference once eos_token_id is found ( #11336 )
2024-06-18 09:55:40 +08:00
Qiyuan Gong
de4bb97b4f
Remove accelerate 0.23.0 install command in readme and docker ( #11333 )
...
* ipex-llm's accelerate has been upgraded to 0.23.0. Remove accelerate 0.23.0 install command in README and docker.
2024-06-17 17:52:12 +08:00