ipex-llm

Author	SHA1	Message	Date
Chen, Zhentao	f36d7b2d59	Fix harness stuck (#9435 ) * remove env to avoid being stuck * use small model for test	2023-11-13 15:29:53 +08:00
Yuwen Hu	4faf5af8f1	[LLM] Add perf test for core on Windows (#9397 ) * temporary stop other perf test * Add framework for core performance test with one test model * Small fix and add platform control * Comment out lp for now * Add missing ymal file * Small fix * Fix sed contents * Small fix * Small path fixes * Small fix * Add update to ftp * Small upload fix * add chatglm3-6b * LLM: add model names * Keep repo id same as ftp and temporary make baichuan2 first priority * change order * Remove temp if false and separate pr and nightly results * Small fix --------- Co-authored-by: jinbridge <2635480475@qq.com>	2023-11-13 13:58:40 +08:00
Lilac09	5d4ec44488	Add all-in-one benchmark into inference-cpu docker image (#9433 ) * add all-in-one into inference-cpu image * manually_build * revise files	2023-11-13 13:07:56 +08:00
Zheng, Yi	9b5d0e9c75	Add examples for Yi-6B (#9421 )	2023-11-13 10:53:15 +08:00
SONG Ge	2888818b3a	[LLM] Support mixed_fp8 on Arc (#9415 ) * ut gpu allocation memory fix * support mix_8bit on arc * rename mixed_4bit to mixed_fp4 and mixed_8bit to mixed_fp8 * revert unexpected changes * revert unexpected changes * unify common logits * rename in llm xmx_checker * fix typo error and re-unify	2023-11-13 09:26:30 +08:00
Wang, Jian4	ac7fbe77e2	Update qlora readme (#9416 )	2023-11-12 19:29:29 +08:00
Yining Wang	d7334513e1	codeshell: fix wrong links (#9417 )	2023-11-12 19:22:33 +08:00
WeiguangHan	2cfef5ef1e	LLM: store the nightly test and pr results separately (#9404 ) * LLM: store the csv results separately * modify the trigger files of LLM Performance Test	2023-11-11 06:35:27 +08:00
Zheng, Yi	0674146cfb	Add cpu and gpu examples of distil-whisper (#9374 ) * Add distil-whisper examples * Fixes based on comments * Minor fixes --------- Co-authored-by: Ariadne330 <wyn2000330@126.com>	2023-11-10 16:09:55 +08:00
Ziteng Zhang	ad81b5d838	Update qlora README.md (#9422 )	2023-11-10 15:19:25 +08:00
Heyang Sun	b23b91407c	fix llm-init on deepspeed missing lib (#9419 )	2023-11-10 13:51:24 +08:00
SONG Ge	dfb00e37e9	[LLM] Add model correctness test on ARC for llama and falcon (#9347 ) * add correctness test on arc for llama model * modify layer name * add falcon ut * refactor and add ut for falcon model * modify lambda positions and update docs * replace loading pre input with last decodelayer output * switch lower bound to single model instead of using the common one * make the code implementation simple * fix gpu action allocation memory issue	2023-11-10 13:48:57 +08:00
Yuwen Hu	3d107f6d25	[LLM] Separate windows build UT and build runner (#9403 ) * Separate windows build UT and build runner * Small fix	2023-11-09 18:47:38 +08:00
dingbaorong	36fbe2144d	Add CPU examples of fuyu (#9393 ) * add fuyu cpu examples * add gpu example * add comments * add license * remove gpu example * fix inference time	2023-11-09 15:29:19 +08:00
Heyang Sun	df8e4d7889	[LLM] apply allreduce and bias to training in LowBitLinear (#9395 )	2023-11-09 14:35:54 +08:00
Wang, Jian4	40cead6b5b	LLM: Fix CPU qlora dtype convert issue (#9394 )	2023-11-09 14:34:01 +08:00
WeiguangHan	34449cb4bb	LLM: add remaining models to the arc perf test (#9384 ) * add remaining models * modify the filepath which stores the test result on ftp server * resolve some comments	2023-11-09 14:28:42 +08:00
Yuwen Hu	d4b248fcd4	Add windows binary build label AVX_VNNI (#9387 )	2023-11-08 18:13:35 +08:00
Ruonan Wang	bfca76dfa7	LLM: optimize QLoRA by updating lora convert logic (#9372 ) * update convert logic of qlora * update * refactor and further improve performance * fix style * meet code review	2023-11-08 17:46:49 +08:00
binbin Deng	54d95e4907	LLM: add alpaca qlora finetuning example (#9276 )	2023-11-08 16:25:17 +08:00
binbin Deng	97316bbb66	LLM: highlight transformers version requirement in mistral examples (#9380 )	2023-11-08 16:05:03 +08:00
Ruonan Wang	7e8fb29b7c	LLM: optimize QLoRA by reducing convert time (#9370 )	2023-11-08 13:14:34 +08:00
Chen, Zhentao	298b64217e	add auto triggered acc test (#9364 ) * add auto triggered acc test * use llama 7b instead * fix env * debug download * fix download prefix * add cut dirs * fix env of model path * fix dataset download * full job * source xpu env vars * use matrix to trigger model run * reset batch=1 * remove redirect * remove some trigger * add task matrix * add precision list * test llama-7b-chat * use /mnt/disk1 to store model and datasets * remove installation test * correct downloading path * fix HF vars * add bigdl-llm env vars * rename file * fix hf_home * fix script path * rename as harness evalution * rerun	2023-11-08 10:22:27 +08:00
Yishuo Wang	bfd9f88f0d	[LLM] Use fp32 as dtype when batch_size <=8 and qtype is q4_0/q8_0/fp8 (#9365 )	2023-11-08 09:54:53 +08:00
WeiguangHan	84ab614aab	LLM: add more models and skip runtime error (#9349 ) * add more models and skip runtime error * upgrade transformers * temporarily removed Mistral-7B-v0.1 * temporarily disable the upload of arc perf result	2023-11-08 09:45:53 +08:00
Heyang Sun	fae6db3ddc	[LLM] refactor cpu low-bit forward logic (#9366 ) * [LLM] refactor cpu low-bit forward logic * fix style * Update low_bit_linear.py * Update low_bit_linear.py * refine	2023-11-07 15:09:16 +08:00
Heyang Sun	af94058203	[LLM] Support CPU deepspeed distributed inference (#9259 ) * [LLM] Support CPU Deepspeed distributed inference * Update run_deepspeed.py * Rename * fix style * add new codes * refine * remove annotated codes * refine * Update README.md * refine doc and example code	2023-11-06 17:56:42 +08:00
Jin Qiao	f9bf5382ff	Fix: add aquila2 in README (#9362 )	2023-11-06 16:37:57 +08:00
Jin Qiao	e6b6afa316	LLM: add aquila2 model example (#9356 )	2023-11-06 15:47:39 +08:00
Xin Qiu	1420e45cc0	Chatglm2 rope optimization on xpu (#9350 )	2023-11-06 13:56:34 +08:00
Shaojun Liu	833e4dbc8d	fix llm-performance-test-on-arc bug (#9357 )	2023-11-06 10:00:25 +08:00
Yining Wang	9377b9c5d7	add CodeShell CPU example (#9345 ) * add CodeShell CPU example * fix some problems	2023-11-03 13:15:54 +08:00
Jason Dai	11a05641a4	Update readme (#9348 )	2023-11-03 11:27:07 +08:00
ZehuaCao	ef83c3302e	Use to test llm-performance on spr-perf (#9316 ) * Update llm_performance_tests.yml * Update llm_performance_tests.yml * Update action.yml * Create cpu-perf-test.yaml * Update action.yml * Update action.yml * Update llm_performance_tests.yml * Update llm_performance_tests.yml * Update llm_performance_tests.yml * Update llm_performance_tests.yml * Update llm_performance_tests.yml * Update llm_performance_tests.yml * Update llm_performance_tests.yml * Update llm_performance_tests.yml * Update llm_performance_tests.yml * Update llm_performance_tests.yml * Update llm_performance_tests.yml	2023-11-03 11:17:16 +08:00
Yuwen Hu	a0150bb205	[LLM] Move embedding layer to CPU for iGPU inference (#9343 ) * Move embedding layer to CPU for iGPU llm inference * Empty cache after to cpu * Remove empty cache as it seems to have some negative effect to first token	2023-11-03 11:13:45 +08:00
Cheen Hau, 俊豪	8f23fb04dc	Add inference test for Whisper model on Arc (#9330 ) * Add inference test for Whisper model * Remove unnecessary inference time measurement	2023-11-03 10:15:52 +08:00
Zheng, Yi	63411dff75	Add cpu examples of WizardCoder (#9344 ) * Add wizardcoder example * Minor fixes	2023-11-02 20:22:43 +08:00
Lilac09	74a8ad32dc	Add entry point to llm-serving-xpu (#9339 ) * add entry point to llm-serving-xpu * manually build * manually build * add entry point to llm-serving-xpu * manually build * add entry point to llm-serving-xpu * add entry point to llm-serving-xpu * add entry point to llm-serving-xpu	2023-11-02 16:31:07 +08:00
Ziteng Zhang	4df66f5cbc	Update llm-finetune-lora-cpu dockerfile and readme * Update README.md * Update Dockerfile	2023-11-02 16:26:24 +08:00
dingbaorong	2e3bfbfe1f	Add internlm_xcomposer cpu examples (#9337 ) * add internlm-xcomposer cpu examples * use chat * some fixes * add license * address shengsheng's comments * use demo.jpg	2023-11-02 15:50:02 +08:00
Jin Qiao	97a38958bd	LLM: add CodeLlama CPU and GPU examples (#9338 ) * LLM: add codellama CPU pytorch examples * LLM: add codellama CPU transformers examples * LLM: add codellama GPU transformers examples * LLM: add codellama GPU pytorch examples * LLM: add codellama in readme * LLM: add LLaVA link	2023-11-02 15:34:25 +08:00
Chen, Zhentao	d4dffbdb62	Merge harness (#9319 ) * add harness patch and llb script * add readme * add license * use patch instead * update readme * rename tests to evaluation * fix typo * remove nano dependency * add original harness link * rename title of usage * rename BigDLGPULM as BigDLLM * empty commit to rerun job	2023-11-02 15:14:19 +08:00
Zheng, Yi	63b2556ce2	Add cpu examples of skywork (#9340 )	2023-11-02 15:10:45 +08:00
dingbaorong	f855a864ef	add llava gpu example (#9324 ) * add llava gpu example * use 7b model * fix typo * add in README	2023-11-02 14:48:29 +08:00
Ziteng Zhang	dd3cf2f153	LLM: Add python 3.10 & 3.11 UT	2023-11-02 14:09:29 +08:00
Wang, Jian4	149146004f	LLM: Add qlora finetunning CPU example (#9275 ) * add qlora finetunning example * update readme * update example * remove merge.py and update readme	2023-11-02 09:45:42 +08:00
Jasonzzt	d1bdc0ef72	spr & arc ut with python 3.9 & 3.10 & 3.11	2023-11-01 22:57:48 +08:00
Jasonzzt	687da21467	test 3.11	2023-11-01 19:14:53 +08:00
WeiguangHan	9722e811be	LLM: add more models to the arc perf test (#9297 ) * LLM: add more models to the arc perf test * remove some old models * install some dependencies	2023-11-01 16:56:32 +08:00
Jasonzzt	3c3329010d	add conda update -n base conda	2023-11-01 16:36:35 +08:00

1672 commits