ipex-llm

Author	SHA1	Message	Date
Ruonan Wang	fea6f16057	LLM: add mlp fusion for fp8e5 and update related check (#9860 ) * update mlp fusion * fix style * update	2024-01-09 09:56:32 +08:00
binbin Deng	294fd32787	LLM: update DeepSpeed AutoTP example with GPU memory optimization (#9823 )	2024-01-09 09:22:49 +08:00
Yuwen Hu	5ba1dc38d4	[LLM] Change default Linux GPU install option to PyTorch 2.1 (#9858 ) * Update default xpu to ipex 2.1 * Update related install ut support correspondingly * Add arc ut tests for both ipex 2.0 and 2.1 * Small fix * Disable ipex 2.1 test for now as oneapi 2024.0 has not been installed on the test machine * Update document for default PyTorch 2.1 * Small fix * Small fix * Small doc fixes * Small fixes	2024-01-08 17:16:17 +08:00
Mingyu Wei	ed81baa35e	LLM: Use default typing-extension in LangChain examples (#9857 ) * remove typing extension downgrade in readme; minor fixes of code * fix typos in README * change default question of docqa.py	2024-01-08 16:50:55 +08:00
Jiao Wang	3b6372ab12	Fix Llama transformers 4.36 support (#9852 ) * support 4.36 * style * update * update * update * fix merge * update	2024-01-08 00:32:23 -08:00
Chen, Zhentao	1b585b0d40	set fp8 default as e5m2 (#9859 )	2024-01-08 15:53:57 +08:00
Ruonan Wang	dc995006cc	LLM: add flash attention for mistral / mixtral (#9846 ) * add flash attention for mistral * update * add flash attn for mixtral * fix style	2024-01-08 09:51:34 +08:00
Yishuo Wang	afaa871144	[LLM] support quantize kv cache to fp8 (#9812 )	2024-01-08 09:28:20 +08:00
Jiao Wang	248ae7fad2	LLama optimize_model to support transformers 4.36 (#9818 ) * support 4.36 * style * update * update * update	2024-01-05 11:30:18 -08:00
Ruonan Wang	a60bda3324	LLM: update check for deepspeed (#9838 )	2024-01-05 16:44:10 +08:00
Ruonan Wang	16433dd959	LLM: fix first token judgement of flash attention (#9841 ) * fix flash attention * meet code review * fix	2024-01-05 13:49:37 +08:00
Yina Chen	f919f5792a	fix kv cache out of bound (#9827 )	2024-01-05 12:38:57 +08:00
Ruonan Wang	5df31db773	LLM: fix accuracy issue of chatglm3 (#9830 ) * add attn mask for first token * fix * fix * change attn calculation * fix * fix * fix style * fix style	2024-01-05 10:52:05 +08:00
Jinyi Wan	3147ebe63d	Add cpu and gpu examples for SOLAR-10.7B (#9821 )	2024-01-05 09:50:28 +08:00
WeiguangHan	ad6b182916	LLM: change the color of peak diff (#9836 )	2024-01-04 19:30:32 +08:00
Xiangyu Tian	38c05be1c0	[LLM] Fix dtype mismatch in Baichuan2-13b (#9834 )	2024-01-04 15:34:42 +08:00
Ruonan Wang	8504a2bbca	LLM: update qlora alpaca example to change lora usage (#9835 ) * update example * fix style	2024-01-04 15:22:20 +08:00
Ziteng Zhang	05b681fa85	[LLM] IPEX auto importer set on by default (#9832 ) * Set BIGDL_IMPORT_IPEX default to True * Remove import intel_extension_for_pytorch as ipex from GPU example	2024-01-04 13:33:29 +08:00
Wang, Jian4	4ceefc9b18	LLM: Support bitsandbytes config on qlora finetune (#9715 ) * test support bitsandbytesconfig * update style * update cpu example * update example * update readme * update unit test * use bfloat16 * update logic * use int4 * set default bnb_4bit_use_double_quant * update * update example * update model.py * update * support lora example	2024-01-04 11:23:16 +08:00
WeiguangHan	9a14465560	LLM: add peak diff (#9789 ) * add peak diff * small fix * revert yml file	2024-01-03 18:18:19 +08:00
Mingyu Wei	f4eb5da42d	disable arc ut (#9825 )	2024-01-03 18:10:34 +08:00
Ruonan Wang	20e9742fa0	LLM: fix chatglm3 issue (#9820 ) * fix chatglm3 issue * small update	2024-01-03 16:15:55 +08:00
Wang, Jian4	a54cd767b1	LLM: Add gguf falcon (#9801 ) * init falcon * update convert.py * update style	2024-01-03 14:49:02 +08:00
Yuwen Hu	668c2095b1	Remove unnecessary warning when installing llm (#9815 )	2024-01-03 10:30:05 +08:00
dingbaorong	f5752ead36	Add whisper test (#9808 ) * add whisper benchmark code * add librispeech_asr.py * add bigdl license	2024-01-02 16:36:05 +08:00
binbin Deng	6584539c91	LLM: fix installation of codellama (#9813 )	2024-01-02 14:32:50 +08:00
Kai Huang	4d01069302	Temp remove baichuan2-13b 1k from arc perf test (#9810 )	2023-12-29 12:54:13 +08:00
dingbaorong	a2e668a61d	fix arc ut test (#9736 )	2023-12-28 16:55:34 +08:00
Qiyuan Gong	f0f9d45eac	[LLM] IPEX import support bigdl-core-xe-21 (#9769 ) Add support for bigdl-core-xe-21.	2023-12-28 15:23:58 +08:00
dingbaorong	a8baf68865	fix csv_to_html (#9802 )	2023-12-28 14:58:51 +08:00
Guancheng Fu	5857a38321	[vLLM] Add option to adjust KV_CACHE_ALLOC_BLOCK_LENGTH (#9782 ) * add option kv_cache_block * change var name	2023-12-28 14:41:47 +08:00
Ruonan Wang	99bddd3ab4	LLM: better FP16 support for Intel GPUs (#9791 ) * initial support * fix * fix style * fix * limit esimd usage condition * refactor code * fix style * small fix * meet code review * small fix	2023-12-28 13:30:13 +08:00
Yishuo Wang	7d9f6c6efc	fix cpuinfo error (#9793 )	2023-12-28 09:23:44 +08:00
Wang, Jian4	7ed9538b9f	LLM: support gguf mpt (#9773 ) * add gguf mpt * update	2023-12-28 09:22:39 +08:00
Cengguang Zhang	d299f108d0	update falcon attention forward. (#9796 )	2023-12-28 09:11:59 +08:00
Shaojun Liu	a5e5c3daec	set warm_up: 3 num_trials: 50 for cpu stress test (#9799 )	2023-12-28 08:55:43 +08:00
dingbaorong	f6bb4ab313	Arc stress test (#9795 ) * add arc stress test * trigger ci * trigger CI * trigger ci * disable ci	2023-12-27 21:02:41 +08:00
Kai Huang	40eaf76ae3	Add baichuan2-13b to Arc perf (#9794 ) * add baichuan2-13b * fix indent * revert	2023-12-27 19:38:53 +08:00
Shaojun Liu	6c75c689ea	bigdl-llm stress test for stable version (#9781 ) * 1k-512 2k-512 baseline * add cpu stress test * update yaml name * update * update * clean up * test * update * update * update * test * update	2023-12-27 15:40:53 +08:00
dingbaorong	5cfb4c4f5b	Arc stable version performance regression test (#9785 ) * add arc stable version regression test * empty gpu mem between different models * trigger ci * comment spr test * trigger ci * address kai's comments and disable ci * merge fp8 and int4 * disable ci	2023-12-27 11:01:56 +08:00
binbin Deng	40edb7b5d7	LLM: fix get environment variables setting (#9787 )	2023-12-27 09:11:37 +08:00
Kai Huang	689889482c	Reduce max_cache_pos to reduce Baichuan2-13B memory (#9694 ) * optimize baichuan2 memory * fix * style * fp16 mask * disable fp16 * fix style * empty cache * revert empty cache	2023-12-26 19:51:25 +08:00
Jason Dai	361781bcd0	Update readme (#9788 )	2023-12-26 19:46:11 +08:00
Yuwen Hu	c38e18f2ff	[LLM] Migrate iGPU perf tests to new machine (#9784 ) * Move 1024 test just after 32-32 test; and enable all model for 1024-128 * Make sure python output encoding in utf-8 so that redirect to txt can always be success * Upload results to ftp * Small fix	2023-12-26 19:15:57 +08:00
WeiguangHan	c05d7e1532	LLM: add star_corder_15.5b model (#9772 ) * LLM: add star_corder_15.5b model * revert llm_performance_tests.yml	2023-12-26 18:55:56 +08:00
Ziteng Zhang	44b4a0c9c5	[LLM] Correct prompt format of Yi, Llama2 and Qwen in generate.py (#9786 ) * correct prompt format of Yi * correct prompt format of llama2 in cpu generate.py * correct prompt format of Qwen in GPU example	2023-12-26 16:57:55 +08:00
Xiangyu Tian	0ea842231e	[LLM] vLLM: Add api_server entrypoint (#9783 ) Add vllm.entrypoints.api_server for benchmark_serving.py in vllm.	2023-12-26 16:03:57 +08:00
dingbaorong	64d05e581c	add peak gpu mem stats in transformer_int4_gpu (#9766 ) * add peak gpu mem stats in transformer_int4_gpu * address weiguang's comments	2023-12-26 15:38:28 +08:00
Ziteng Zhang	87b4100054	[LLM] Support Yi model in chat.py (#9778 ) * Support Yi model * code style & add reference link	2023-12-26 10:03:39 +08:00
Ruonan Wang	11d883301b	LLM: fix wrong batch output caused by flash attention (#9780 ) * fix * meet code review * move batch size check to the beginning * move qlen check inside function * meet code review	2023-12-26 09:41:27 +08:00
Heyang Sun	66e286a73d	Support for Mixtral AWQ (#9775 ) * Support for Mixtral AWQ * Update README.md * Update README.md * Update awq_config.py * Update README.md * Update README.md	2023-12-25 16:08:09 +08:00
Ruonan Wang	1917bbe626	LLM: fix `BF16Linear` related training & inference issue (#9755 ) * fix bf16 related issue * fix * update based on comment & add arc lora script * update readme * update based on comment * update based on comment * update * force to bf16 * fix style * move check input dtype into function * update convert * meet code review * meet code review * update merged model to support new training_mode api * fix typo	2023-12-25 14:49:30 +08:00
Xiangyu Tian	30dab36f76	[LLM] vLLM: Fix kv cache init (#9771 ) Fix kv cache init	2023-12-25 14:17:06 +08:00
Yina Chen	449b387125	Support relora in bigdl-llm (#9687 ) * init * fix style * update * support resume & update readme * update * update * remove important * add training mode * meet comments	2023-12-25 14:04:28 +08:00
Shaojun Liu	b6222404b8	bigdl-llm stable version: let the perf test fail if the difference between perf and baseline is greater than 5% (#9750 ) * test * test * test * update * revert	2023-12-25 13:47:11 +08:00
Ziteng Zhang	986f65cea9	[LLM] Add trust_remote_code for local renamed model in bigdl_llm_model.py (#9762 )	2023-12-25 11:31:14 +08:00
Yishuo Wang	be13b162fe	add codeshell example (#9743 )	2023-12-25 10:54:01 +08:00
Guancheng Fu	daf536fb2d	vLLM: Apply attention optimizations for selective batching (#9758 ) * fuse_rope for prefill * apply kv_cache optimizations * apply fast_decoding_path * Re-enable kv_cache optimizations for prefill * reduce KV_CACHE_ALLOC_BLOCK for selective_batching	2023-12-25 10:29:31 +08:00
binbin Deng	ed8ed76d4f	LLM: update deepspeed autotp usage (#9733 )	2023-12-25 09:41:14 +08:00
Yuwen Hu	02436c6cce	[LLM] Enable more long context in-out pairs for iGPU perf tests (#9765 ) * Add test for 1024-128 and enable more tests for 512-64 * Fix date in results csv name to the time when the performance is triggered * Small fix * Small fix * further fixes	2023-12-22 18:18:23 +08:00
Chen, Zhentao	7fd7c37e1b	Enable fp8e5 harness (#9761 ) * fix precision format like fp8e5 * match fp8_e5m2	2023-12-22 16:59:48 +08:00
Qiyuan Gong	4c487313f2	Revert "[LLM] IPEX auto importer turn on by default for XPU (#9730 )" (#9759 ) This reverts commit `0284801fbd`.	2023-12-22 16:38:24 +08:00
Qiyuan Gong	0284801fbd	[LLM] IPEX auto importer turn on by default for XPU (#9730 ) * Set BIGDL_IMPORT_IPEX default to true, i.e., auto import IPEX for XPU. * Remove import intel_extension_for_pytorch as ipex from GPU example. * Add support for bigdl-core-xe-21.	2023-12-22 16:20:32 +08:00
Chen, Zhentao	86a69e289c	fix harness runner label of manual trigger (#9754 ) * fix runner * update golden	2023-12-22 15:09:22 +08:00
Guancheng Fu	fdf93c9267	Implement selective batching for vLLM (#9659 ) * add control to load hf model * finish initial version of selective_batching * temp * finish * Remove print statement * fix error * Apply yang's optimization * a version that works * We need to check kv_cache passed in, this could be an error. TODO: add fast decoding path * format * temp solution: not batching prefill requests * a version that works for prefill batching * format * a solid version: works normally * a temp version * Solid version: remove redundant functions * fix format * format * solid: add option to enable selective_batching * remove logic for using transformer models * format * format * solid: enable argument VLLM_ENABLE_SELECTIVE_BATCHING * format * finish * format	2023-12-22 13:45:46 +08:00
Ruonan Wang	2f36769208	LLM: bigdl-llm lora support & lora example (#9740 ) * lora support and single card example * support multi-card, refactor code * fix model id and style * remove torch patch, add two new class for bf16, update example * fix style * change to training_mode * small fix * add more info in help * fixstyle, update readme * fix ut * fix ut * Handling compatibility issues with default LoraConfig	2023-12-22 11:05:39 +08:00
SONG Ge	ba0b939579	[LLM] Support transformers-v4.36.0 on mistral model (#9744 ) * add support transformers-v4.36.0 on mistral model * python/llm/src/bigdl/llm/transformers/models/mistral.py * make the redundant implementation as utils * fix code style * fix * fix style * update with utils enough_kv_room	2023-12-22 09:59:27 +08:00
Xin Qiu	e36111e713	mixtral fused qkv and rope (#9724 ) * mixtral fused qkv and rope * fix and clean * fix style * update * update * fix * update * fix	2023-12-22 09:26:35 +08:00
Jiao Wang	e4f6e43675	safetenor to false (#9728 )	2023-12-21 14:41:51 -08:00
Shaojun Liu	bb52239e0a	bigdl-llm stable version release & test (#9732 ) * stable version test * trigger spr test * update * trigger * test * test * test * test * test * refine * release linux first	2023-12-21 22:55:33 +08:00
WeiguangHan	d4d2ccdd9d	LLM: remove startcorder-15.5b (#9748 )	2023-12-21 18:52:52 +08:00
WeiguangHan	474c099559	LLM: using separate threads to do inference (#9727 ) * using separate threads to do inference * resolve some comments * resolve some comments * revert llm_performance_tests.yml file	2023-12-21 17:56:43 +08:00
Yishuo Wang	426660b88e	simplify qwen attention (#9747 )	2023-12-21 17:53:29 +08:00
Wang, Jian4	984697afe2	LLM: Add bloom gguf support (#9734 ) * init * update bloom add merges * update * update readme * update for llama error * update	2023-12-21 14:06:25 +08:00
Heyang Sun	df775cf316	fix python style (#9742 ) * fix python style * fix * fix	2023-12-21 11:25:05 +08:00
Chen, Zhentao	b06a3146c8	Fix 70b oom (#9738 ) * add default value to bigdl llm * fix model oom	2023-12-21 10:40:52 +08:00
Xin Qiu	6c3e698bf1	mistral decoding_fast_path and fused mlp (#9714 ) * mistral decoding_fast_path and fused mlp * meet code review	2023-12-21 10:11:37 +08:00
Heyang Sun	d157f623b6	Load Mixtral gguf in a block-wise way (#9725 ) * Load Mixtral gguf in a block-wise way * refine	2023-12-21 10:03:23 +08:00
WeiguangHan	34bb804189	LLM: check csv and its corresponding yaml file (#9702 ) * LLM: check csv and its corresponding yaml file * run PR arc perf test * modify the name of some variables * execute the check results script in right place * use cp to replace mv command * resolve some comments * resolve more comments * revert the llm_performance_test.yaml file	2023-12-21 09:54:33 +08:00
Zhao Changmin	4bda975a3e	LLM: Align lowbit model config (#9735 ) * align lowbit model config	2023-12-21 09:48:58 +08:00
Wang, Jian4	e1e921f425	LLM: gguf other model using dtype (#9729 )	2023-12-21 09:33:40 +08:00
Yishuo Wang	13ea6330bd	optimize qwen rope (#9737 )	2023-12-20 17:34:34 +08:00
Ziteng Zhang	4c032a433e	[LLM] Add glibc checker (#9624 ) * Add glibc checker * Add env BIGDL_GLIBC_CHECK to control glibc checker. The default is false, i.e., don't check.	2023-12-20 16:52:43 +08:00
Yina Chen	cd652a1710	Support fp8 e5m2 on arc (#9711 ) * init * fix style * update * fix style * update	2023-12-20 16:26:17 +08:00
Yishuo Wang	e54c428d30	add bf16/fp16 fuse mlp support (#9726 )	2023-12-20 10:40:45 +08:00
Heyang Sun	612651cb5d	fix typo (#9723 )	2023-12-20 09:41:59 +08:00
WeiguangHan	3aa8b66bc3	LLM: remove starcoder-15.5b model temporarily (#9720 )	2023-12-19 20:14:46 +08:00
Yishuo Wang	522cf5ed82	[LLM] Improve chatglm2/3 rest token performance with long context (#9716 )	2023-12-19 17:29:38 +08:00
Yishuo Wang	f2e6abb563	fix mlp batch size check (#9718 )	2023-12-19 14:22:22 +08:00
Heyang Sun	1fa7793fc0	Load Mixtral GGUF Model (#9690 ) * Load Mixtral GGUF Model * refactor * fix empty tensor when to cpu * update gpu and cpu readmes * add dtype when set tensor into module	2023-12-19 13:54:38 +08:00
Qiyuan Gong	d0a3095b97	[LLM] IPEX auto importer (#9706 ) * IPEX auto importer and get_ipex_version. * Add BIGDL_IMPORT_IPEX to control auto import, default is false.	2023-12-19 13:39:38 +08:00
Yang Wang	f4fb58d99c	fusing qkv project and rope (#9612 ) * Try fusing qkv project and rope * add fused mlp * fuse append cache * fix style and clean up code * clean up	2023-12-18 16:45:00 -08:00
Kai Huang	4c112ee70c	Rename qwen in model name for arc perf test (#9712 )	2023-12-18 20:34:31 +08:00
Cengguang Zhang	4d22add4af	LLM: fix qwen efficiency issue in perf-test.	2023-12-18 18:32:54 +08:00
Ruonan Wang	8ed89557e5	LLM: add mlp optimization of mixtral (#9709 )	2023-12-18 16:59:52 +08:00
Chen, Zhentao	b3647507c0	Fix harness workflow (#9704 ) * error when larger than 0.001 * fix env setup * fix typo * fix typo	2023-12-18 15:42:10 +08:00
binbin Deng	12df70953e	LLM: add resume_from_checkpoint related section (#9705 )	2023-12-18 12:27:02 +08:00
Xin Qiu	320110d158	handle empty fused norm result (#9688 ) * handle empty fused norm result * remove fast_rms_norm * fix style	2023-12-18 09:56:11 +08:00
Ziteng Zhang	67cc155771	[LLM] Correct chat format of llama and add llama_stream_chat in chat.py * correct chat format of llama * add llama_stream_chat	2023-12-15 16:36:46 +08:00
Ziteng Zhang	0d41b7ba7b	[LLM] Correct chat format & add stop words for chatglm3 in chat.py * correct chat format of chatglm3 * correct stop words of chatglm3	2023-12-15 16:35:17 +08:00
Ziteng Zhang	d57efd8eb9	[LLM] Add stop_word for Qwen model and correct qwen chat format in chat.py (#9642 ) * add stop words list for qwen * change qwen chat format	2023-12-15 14:53:58 +08:00
SONG Ge	d5b81af7bd	Support mixtral attention optimization on transformers-v4.36.0 (#9674 ) * add example code to support mistral/mixtral attention on transformers v4.36.0 * update * style fix * add update for seen-tokens * support mixtral * rm mistral change * small fix * add more comments and remove use_cache part --------- Co-authored-by: plusbang <binbin1.deng@intel.com>	2023-12-15 14:30:23 +08:00
Cengguang Zhang	adbef56001	LLM: update qwen attention forward. (#9695 ) * feat: update qwen attention forward. * fix: style.	2023-12-15 14:06:15 +08:00
Wang, Jian4	b8437a1c1e	LLM: Add gguf mistral model support (#9691 ) * add mistral support * need to upgrade transformers version * update	2023-12-15 13:37:39 +08:00
Wang, Jian4	496bb2e845	LLM: Support load BaiChuan model family gguf model (#9685 ) * support baichuan model family gguf model * update gguf generate.py * add verify models * add support model_family * update * update style * update type * update readme * update * remove support model_family	2023-12-15 13:34:33 +08:00
Lilac09	3afed99216	fix path issue (#9696 )	2023-12-15 11:21:49 +08:00
Jason Dai	37f509bb95	Update readme (#9692 )	2023-12-14 19:50:21 +08:00
WeiguangHan	1f0245039d	LLM: check the final csv results for arc perf test (#9684 ) * LLM: check the final csv results for arc perf test * delete useless python script * change threshold * revert the llm_performance_tests.yml	2023-12-14 19:46:08 +08:00
Yishuo Wang	9a330bfc2b	fix fuse mlp when using q5_0 or fp8 (#9689 )	2023-12-14 16:16:05 +08:00
Yuwen Hu	82ac2dbf55	[LLM] Small fixes for win igpu test for ipex 2.1 (#9686 ) * Fixes to install for igpu performance tests * Small update for core performance tests model lists	2023-12-14 15:39:51 +08:00
WeiguangHan	3e8d198b57	LLM: add eval func (#9662 ) * Add eval func * add left eval	2023-12-14 14:59:02 +08:00
Ziteng Zhang	21c7503a42	[LLM] Correct prompt format of Qwen in generate.py (#9678 ) * Change qwen prompt format to chatml	2023-12-14 14:01:30 +08:00
Qiyuan Gong	223c9622f7	[LLM] Mixtral CPU examples (#9673 ) * Mixtral CPU PyTorch and hugging face examples, based on #9661 and #9671	2023-12-14 10:35:11 +08:00
Xin Qiu	5e46e0e5af	fix baichuan2-7b 1st token performance regression on xpu (#9683 ) * fix baichuan2-7b 1st token performance regression * add comments * fix style	2023-12-14 09:58:32 +08:00
ZehuaCao	877229f3be	[LLM] Add Yi-34B-AWQ to verified AWQ model. (#9676 ) * verify Yi-34B-AWQ * update	2023-12-14 09:55:47 +08:00
binbin Deng	68a4be762f	remove disco mixtral, update oneapi version (#9671 )	2023-12-13 23:24:59 +08:00
Ruonan Wang	1456d30765	LLM: add dot to option name in setup (#9682 )	2023-12-13 20:57:27 +08:00
Yuwen Hu	cbdd49f229	[LLM] win igpu performance for ipex 2.1 and oneapi 2024.0 (#9679 ) * Change igpu win tests for ipex 2.1 and oneapi 2024.0 * Qwen model repo id updates; updates model list for 512-64 * Add .eval for win igpu all-in-one benchmark for best performance	2023-12-13 18:52:29 +08:00
Mingyu Wei	16febc949c	[LLM] Add exclude option in all-in-one performance test (#9632 ) * add exclude option in all-in-one perf test * update arc-perf-test.yaml * Exclude in_out_pairs in main function * fix some bugs * address Kai's comments * define excludes at the beginning * add bloomz:2048 to exclude	2023-12-13 18:13:06 +08:00
Ruonan Wang	9b9cd51de1	LLM: update setup to provide new install option to support ipex 2.1 & oneapi 2024 (#9647 ) * update setup * default to 2.0 now * meet code review	2023-12-13 17:31:56 +08:00
Yishuo Wang	09ca540f9b	use fuse mlp in qwen (#9672 )	2023-12-13 17:20:08 +08:00
Ruonan Wang	c7741c4e84	LLM: update moe block convert to optimize rest token latency of Mixtral (#9669 ) * update moe block convert * further accelerate final_hidden_states * fix style * fix style	2023-12-13 16:17:06 +08:00
ZehuaCao	503880809c	verify codeLlama (#9668 )	2023-12-13 15:39:31 +08:00
Xiangyu Tian	1c6499e880	[LLM] vLLM: Support Mixtral Model (#9670 ) Add Mixtral support for BigDL vLLM.	2023-12-13 14:44:47 +08:00
Ruonan Wang	dc5b1d7e9d	LLM: integrate sdp kernel for FP16 rest token inference on GPU [DG2/ATSM] (#9633 ) * integrate sdp * update api * fix style * meet code review * fix * distinguish mtl from arc * small fix	2023-12-13 11:29:57 +08:00
Qiyuan Gong	5b0e7e308c	[LLM] Add support for empty activation (#9664 ) * Add support for empty activation, e.g., [0, 4096]. Empty activation is allowed by PyTorch. * Add comments.	2023-12-13 11:07:45 +08:00
SONG Ge	284e7697b1	[LLM] Optimize ChatGLM2 kv_cache to support beam_search on ARC (#9579 ) * optimize kv_cache to support beam_search on Arc * correctness test update * fix query_length issue * simplify implementation * only enable the optimization on gpu device * limit the beam_search support only enabled with gpu device and batch_size > 1 * add comments for beam_search case and revert ut change * meet comments * add more comments to describe the difference between multi-cases	2023-12-13 11:02:14 +08:00
Heyang Sun	c64e2248ef	fix str returned by get_int_from_str rather than expected int (#9667 )	2023-12-13 11:01:21 +08:00
binbin Deng	bf1bcf4a14	add official Mixtral model support (#9663 )	2023-12-12 22:27:07 +08:00
Ziteng Zhang	8931f2eb62	[LLM] Fix transformer qwen size mismatch and rename causal_mask (#9655 ) * Fix size mismatching caused by context_layer * Change registered_causal_mask to causal_mask	2023-12-12 20:57:40 +08:00
binbin Deng	2fe38b4b9b	LLM: add mixtral GPU examples (#9661 )	2023-12-12 20:26:36 +08:00
Yuwen Hu	968d99e6f5	Remove empty cache between each iteration of generation (#9660 )	2023-12-12 17:24:06 +08:00
Xin Qiu	0e639b920f	disable test_optimized_model.py temporarily due to out of memory on A730M(pr validation machine) (#9658 ) * disable test_optimized_model.py * disable seq2seq	2023-12-12 17:13:52 +08:00
binbin Deng	59ce86d292	LLM: support `optimize_model=True` for Mixtral model (#9657 )	2023-12-12 16:41:26 +08:00
Yuwen Hu	d272b6dc47	[LLM] Enable generation of html again for win igpu tests (#9652 ) * Enable generation of html again and comment out rwkv for 32-512 as it is not very stable * Small fix	2023-12-11 19:15:17 +08:00
WeiguangHan	afa895877c	LLM: fix the issue that may generate blank html (#9650 ) * LLM: fix the issue that may generate blank html * resolve some comments	2023-12-11 19:14:57 +08:00
ZehuaCao	45721f3473	verify llava (#9649 )	2023-12-11 14:26:05 +08:00
Heyang Sun	9f02f96160	[LLM] support for Yi AWQ model (#9648 )	2023-12-11 14:07:34 +08:00
Xin Qiu	82255f9726	Enable fused layernorm (#9614 ) * bloom layernorm * fix * layernorm * fix * fix * fix * style fix * fix * replace nn.LayerNorm	2023-12-11 09:26:13 +08:00
Yuwen Hu	894d0aaf5e	[LLM] iGPU win perf test reorg based on in-out pairs (#9645 ) * trigger pr temporarily * Separate benchmark run for win igpu based on in-out pairs * Rename fix * Test workflow * Small fix * Skip generation of html for now * Change back to nightly triggered	2023-12-08 20:46:40 +08:00
Chen, Zhentao	972cdb9992	gsm8k OOM workaround (#9597 ) * update bigdl_llm.py * update the installation of harness * fix partial function * import ipex * force seq len in decrease order * put func outside class * move comments * default 'trust_remote_code' as True * Update llm-harness-evaluation.yml	2023-12-08 18:47:25 +08:00
WeiguangHan	1ff4bc43a6	degrade pandas version (#9643 )	2023-12-08 17:44:51 +08:00
Yina Chen	70f5e7bf0d	Support peft LoraConfig (#9636 ) * support peft loraconfig * use testcase to test * fix style * meet comments	2023-12-08 16:13:03 +08:00
Xin Qiu	0b6f29a7fc	add fused rms norm for Yi and Qwen (#9640 )	2023-12-08 16:04:38 +08:00
Xin Qiu	5636b0ba80	set new linear status (#9639 )	2023-12-08 11:02:49 +08:00
binbin Deng	499100daf1	LLM: Add solution to fix `oneccl` related error (#9630 )	2023-12-08 10:51:55 +08:00
ZehuaCao	6eca8a8bb5	update transformer version (#9631 )	2023-12-08 09:36:00 +08:00
WeiguangHan	e9299adb3b	LLM: Highlight some values in the html (#9635 ) * highlight some values in the html * revert the llm_performance_tests.yml	2023-12-07 19:02:41 +08:00
Yuwen Hu	6f34978b94	[LLM] Add more performance tests for win iGPU (more in-out pairs, RWKV model) (#9626 ) * Add supports for loading rwkv models using from_pretrained api * Temporarily enable pr tests * Add RWKV in tests and more in-out pairs * Add rwkv for 512 tests * Make iterations smaller * Change back to nightly trigger	2023-12-07 18:55:16 +08:00
Ruonan Wang	d9b0c01de3	LLM: fix unlora module in qlora finetune (#9621 ) * fix unlora module * split train and inference	2023-12-07 16:32:02 +08:00
Heyang Sun	3811cf43c9	[LLM] update AWQ documents (#9623 ) * [LLM] update AWQ and verified models' documents * refine * refine links * refine	2023-12-07 16:02:20 +08:00
Yishuo Wang	7319f2c227	use fused mlp in baichuan2 (#9620 )	2023-12-07 15:50:57 +08:00
Xiangyu Tian	deee65785c	[LLM] vLLM: Delete last_kv_cache before prefilling (#9619 ) Remove last_kv_cache before prefilling to reduce peak memory usage.	2023-12-07 11:32:33 +08:00
Yuwen Hu	48b85593b3	Update all-in-one benchmark readme (#9618 )	2023-12-07 10:32:09 +08:00
Xiangyu Tian	0327169b50	[LLM] vLLM: fix memory leak in prepare_kv_cache (#9616 ) Revert modification in prepare_kv_cache to fix memory leak.	2023-12-07 10:08:18 +08:00
Xin Qiu	13d47955a8	use fused rms norm in chatglm2 and baichuan (#9613 ) * use fused rms norm in chatglm2 and baichuan * style fix	2023-12-07 09:21:41 +08:00
Jason Dai	51b668f229	Update GGUF readme (#9611 )	2023-12-06 18:21:54 +08:00
dingbaorong	a7bc89b3a1	remove q4_1 in gguf example (#9610 ) * remove q4_1 * fixes	2023-12-06 16:00:05 +08:00
Yina Chen	404e101ded	QALora example (#9551 ) * Support qa-lora * init * update * update * update * update * update * update merge * update * fix style & update scripts * update * address comments * fix typo * fix typo --------- Co-authored-by: Yang Wang <yang3.wang@intel.com>	2023-12-06 15:36:21 +08:00
Guancheng Fu	6978b2c316	[VLLM] Change padding patterns for vLLM & clean code (#9609 ) * optimize * fix minor error * optimizations * fix style	2023-12-06 15:27:26 +08:00
dingbaorong	89069d6173	Add gpu gguf example (#9603 ) * add gpu gguf example * some fixes * address kai's comments * address json's comments	2023-12-06 15:17:54 +08:00
Yuwen Hu	0e8f4020e5	Add traceback error output for win igpu test api in benchmark (#9607 )	2023-12-06 14:35:16 +08:00
Ziteng Zhang	aeb77b2ab1	Add minimum Qwen model version (#9606 )	2023-12-06 11:49:14 +08:00
Yuwen Hu	c998f5f2ba	[LLM] iGPU long context tests (#9598 ) * Temp enable PR * Enable tests for 256-64 * Try again 128-64 * Empty cache after each iteration for igpu benchmark scripts * Try tests for 512 * change order for 512 * Skip chatglm3 and llama2 for now * Separate tests for 512-64 * Small fix * Further fixes * Change back to nightly again	2023-12-06 10:19:20 +08:00
Heyang Sun	4e70e33934	[LLM] code and document for distributed qlora (#9585 ) * [LLM] code and document for distributed qlora * doc * refine for gradient checkpoint * refine * Update alpaca_qlora_finetuning_cpu.py * Update alpaca_qlora_finetuning_cpu.py * Update alpaca_qlora_finetuning_cpu.py * add link in doc	2023-12-06 09:23:17 +08:00
Zheng, Yi	d154b38bf9	Add llama2 gpu low memory example (#9514 ) * Add low memory example * Minor fixes * Update readme.md	2023-12-05 17:29:48 +08:00
Jason Dai	06febb5fa7	Update readme for FP8/FP4 inference examples (#9601 )	2023-12-05 15:59:03 +08:00
dingbaorong	a66fbedd7e	add gpu more data types example (#9592 ) * add gpu more data types example * add int8	2023-12-05 15:45:38 +08:00
Ziteng Zhang	65934c9f4f	[LLM] Fix Qwen causal_mask and attention_mask size mismatching (#9600 ) * Fix #9582, caused by Qwen modified modeling_qwen.py `7f62181c94`	2023-12-05 15:15:54 +08:00
Jinyi Wan	b721138132	Add cpu and gpu examples for BlueLM (#9589 ) * Add cpu int4 example for BlueLM * add example optimize_model cpu for bluelm * add example gpu int4 blueLM * add example optimize_model GPU for bluelm * Fixing naming issues and BigDL package version. * Fixing naming issues... * Add BlueLM in README.md "Verified Models"	2023-12-05 13:59:02 +08:00
Guancheng Fu	8b00653039	fix doc (#9599 )	2023-12-05 13:49:31 +08:00
Qiyuan Gong	f211f136b6	Configurable TORCH_LINEAR_THRESHOLD from env (#9588 ) * Add TORCH_LINEAR_THRESHOLD from env (BIGDL_LLM_LINEAR_THRESHOLD) * Change default to 512	2023-12-05 13:19:47 +08:00
Yuwen Hu	1012507a40	[LLM] Fix performance tests (#9596 ) * Fix missing key for cpu_embedding * Remove 512 as it stuck for now * Small fix	2023-12-05 10:59:28 +08:00
Chen, Zhentao	8c8a27ded7	Add harness summary job (#9457 ) * format yml * add make_table_results * add summary job * add a job to print single result * upload full directory	2023-12-05 10:04:10 +08:00
Yuwen Hu	3f4ad97929	[LLM] Add performance tests for windows iGPU (#9584 ) * Add support for win gpu benchmark with peak gpu memory monitoring * Add win igpu tests * Small fix * Forward outputs * Small fix * Test and small fixes * Small fix * Small fix and test * Small fixes * Add tests for 512-64 and change back to nightly tests * Small fix	2023-12-04 20:50:02 +08:00
Chen, Zhentao	9557aa9c21	Fix harness nightly (#9586 ) * update golden * loosen the restriction of diff * only compare results when scheduled	2023-12-04 11:45:00 +08:00
Xiangyu Tian	5c03651309	[LLM] vLLM: Add Preempt for scheduler (#9568 ) Implement Preempt_by_recompute method for vllm.	2023-12-03 20:16:25 +08:00
Chen, Zhentao	cb228c70ea	Add harness nightly (#9552 ) * modify output_path as a directory * schedule nightly at 21 on Friday * add tasks and models for nightly * add accuracy regression * comment out if to test * mixed fp4 * for test * add missing delimiter * remove comma * fixed golden results * add mixed 4 golden result * add more options * add mistral results * get golden result of stable lm * move nightly scripts and results to test folder * add license * add fp8 stable lm golden * run on all available devices * trigger only when ready for review * fix new line * update golden * add mistral	2023-12-01 14:16:35 +08:00
Chen, Zhentao	4d7d5d4c59	Add 3 leaderboard tasks (#9566 ) * update leaderboard map * download model and dataset without overwritten * fix task drop * run on all available devices	2023-12-01 14:01:14 +08:00
Wang, Jian4	ed0dc57c6e	LLM: Add cpu qlora support other models guide (#9567 ) * use bf16 flag * add using baichuan model * update merge * remove * update	2023-12-01 11:18:04 +08:00
Jason Dai	bda404fc8f	Update readme (#9575 )	2023-11-30 22:45:52 +08:00
Xin Qiu	69c49d21f5	use fused rms norm (#9572 ) * use fused rms norm * meet code review	2023-11-30 21:47:41 +08:00
Yishuo Wang	66f5b45f57	[LLM] add a llama2 gguf example (#9553 )	2023-11-30 16:37:17 +08:00
Yishuo Wang	7f6465518a	support loading llama tokenizer from gguf model (#9565 )	2023-11-30 14:56:12 +08:00
Wang, Jian4	a0a80d232e	LLM: Add qlora cpu distributed readme (#9561 ) * init readme * add distributed guide * update	2023-11-30 13:42:30 +08:00
Chen, Zhentao	c8e0c2ed48	Fixed dumped logs in harness (#9549 ) * install transformers==4.34.0 * modify output_path as a directory * add device and task to output dir parents	2023-11-30 12:47:56 +08:00
Qiyuan Gong	d85a430a8c	Using bigdl-llm-init instead of bigdl-nano-init (#9558 ) * Replace `bigdl-nano-init` with `bigdl-llm-init`. * Install `bigdl-llm` instead of `bigdl-nano`. * Remove nano in README.	2023-11-30 10:10:29 +08:00
Yuwen Hu	34503efa6a	Fix cpu pinned embedding (#9556 )	2023-11-29 18:27:56 +08:00
binbin Deng	4ff2ca9d0d	LLM: fix loss error on Arc (#9550 )	2023-11-29 15:16:18 +08:00
Yishuo Wang	65121c7997	support loading q4_1/q5_0/q5_1/q8_0 gguf model (#9546 )	2023-11-29 14:40:37 +08:00
Wang, Jian4	b824754256	LLM: Update for cpu qlora mpirun (#9548 )	2023-11-29 10:56:17 +08:00
Yuwen Hu	5f5ca38b74	[LLM Doc] Fix api doc rendering error (#9542 ) * Fix api rendering error * Fix python style	2023-11-29 09:17:09 +08:00
Yishuo Wang	a86c6e0b56	[LLM] support loading gguf model (#9544 )	2023-11-28 15:51:15 +08:00
Xiangyu Tian	916c338772	fix bugs in vllm length check (#9543 )	2023-11-28 11:09:54 +08:00
WeiguangHan	5098bc3544	LLM: enable previous models (#9505 ) * enable previous models * test mistral model * for test * run models separately * test all models * for test * revert the llm_performance_test.yaml	2023-11-28 10:21:07 +08:00
Zhao Changmin	e7e0cd3b5e	CPU Pinned embedding Layer (#9538 ) * CPU Pinned embedding	2023-11-28 09:46:31 +08:00
Guancheng Fu	963a5c8d79	Add vLLM-XPU version's README/examples (#9536 ) * test * test * fix last kv cache * add xpu readme * remove numactl for xpu example * fix link error * update max_num_batched_tokens logic * add explanation * add xpu environment version requirement * refine gpu memory * fix * fix style	2023-11-28 09:44:03 +08:00
Guancheng Fu	b6c3520748	Remove xformers from vLLM-CPU (#9535 )	2023-11-27 11:21:25 +08:00
binbin Deng	2b9c7d2a59	LLM: quick fix alpaca qlora finetuning script (#9534 )	2023-11-27 11:04:27 +08:00
Yuwen Hu	11fa3de290	Add setup support of win gpu for bigdl-llm (#9512 )	2023-11-24 17:49:21 +08:00
Chen, Zhentao	45820cf3b9	add optimize model option (#9530 )	2023-11-24 17:10:49 +08:00
binbin Deng	6bec0faea5	LLM: support Mistral AWQ models (#9520 )	2023-11-24 16:20:22 +08:00
Ruonan Wang	914a5a5a27	LLM: fix abnormal Mistral GPU accuracy by updating rms_norm (#9529 )	2023-11-24 15:37:50 +08:00
SONG Ge	3d24823cda	hot-fix mistral kv_cache (#9528 )	2023-11-24 14:33:04 +08:00
Zhao Changmin	42b7a16bc5	Replace torch.bmm with safe_bmm (#9519 ) * replace bmm with safe one * rename args and deprecated warning	2023-11-24 12:16:48 +08:00
Jason Dai	b3178d449f	Update README.md (#9525 )	2023-11-23 21:45:20 +08:00
Jason Dai	82898a4203	Update GPU example README (#9524 )	2023-11-23 21:20:26 +08:00
Jason Dai	064848028f	Update README.md (#9523 )	2023-11-23 21:16:21 +08:00
Ruonan Wang	b63aae8a8e	LLM: add flash attention support for llama (#9518 ) * add initial flash attention for llama * accelerate fp32 first token by changing to fp16 in advance * support fp32	2023-11-23 18:40:18 +08:00
Guancheng Fu	bf579507c2	Integrate vllm (#9310 ) * done * Rename structure * add models * Add structure/sampling_params,sequence * add input_metadata * add outputs * Add policy,logger * add and update * add parallelconfig back * core/scheduler.py * Add llm_engine.py * Add async_llm_engine.py * Add tested entrypoint * fix minor error * Fix everything * fix kv cache view * fix * fix * fix * format&refine * remove logger from repo * try to add token latency * remove logger * Refine config.py * finish worker.py * delete utils.py * add license * refine * refine sequence.py * remove sampling_params.py * finish * add license * format * add license * refine * refine * Refine line too long * remove exception * so dumb style-check * refine * refine * refine * refine * refine * refine * add README * refine README * add warning instead error * fix padding * add license * format * format * format fix * Refine vllm dependency (#1) vllm dependency clear * fix licence * fix format * fix format * fix * adapt LLM engine * fix * add license * fix format * fix * Moving README.md to the correct position * Fix readme.md * done * guide for adding models * fix * Fix README.md * Add new model readme * remove ray-logic * refactor arg_utils.py * remove distributed_init_method logic * refactor entrypoints * refactor input_metadata * refactor model_loader * refactor utils.py * refactor models * fix api server * remove vllm.stucture * revert by txy 1120 * remove utils * format * fix license * add bigdl model * Refer to a specfic commit * Change code base * add comments * add async_llm_engine comment * refine * formatted * add worker comments * add comments * add comments * fix style * add changes --------- Co-authored-by: xiangyuT <xiangyu.tian@intel.com> Co-authored-by: Xiangyu Tian <109123695+xiangyuT@users.noreply.github.com> Co-authored-by: leonardozcm <leonardo1997zcm@gmail.com>	2023-11-23 16:46:45 +08:00
Heyang Sun	48fbb1eb94	support ccl (MPI) distributed mode in alpaca_qlora_finetuning_cpu (#9507 )	2023-11-23 10:58:09 +08:00
Qiyuan Gong	0f0c6bb631	[LLM] Fix Qwen registered_causal_mask is None (#9513 ) * Add registered_causal_mask init based on `2abd8e5777`.	2023-11-23 09:28:04 +08:00
Heyang Sun	11fa5a8a0e	Fix QLoRA CPU dispatch_model issue about accelerate (#9506 )	2023-11-23 08:41:25 +08:00
Heyang Sun	1453046938	install bigdl-llm in deepspeed cpu inference example (#9508 )	2023-11-23 08:39:21 +08:00
binbin Deng	86743fb57b	LLM: fix transformers version in CPU finetuning example (#9511 )	2023-11-22 15:53:07 +08:00
binbin Deng	1a2129221d	LLM: support resume from checkpoint in Alpaca QLoRA (#9502 )	2023-11-22 13:49:14 +08:00
Ruonan Wang	139e98aa18	LLM: quick fix benchmark (#9509 )	2023-11-22 10:19:57 +08:00
WeiguangHan	c2aeb4d1e8	del model after test (#9504 )	2023-11-21 18:41:50 +08:00
Ruonan Wang	076d106ef5	LLM: GPU QLoRA update to bf16 to accelerate gradient checkpointing (#9499 ) * update to bf16 to accelerate gradient checkpoint * add utils and fix ut	2023-11-21 17:08:36 +08:00
Cheen Hau, 俊豪	3e39828420	Update all in one benchmark readme (#9496 ) * Add gperftools install to all in one benchmark readme * Update readme	2023-11-21 14:57:16 +08:00
binbin Deng	b7ae572ac3	LLM: update Alpaca QLoRA finetuning example on GPU (#9492 )	2023-11-21 14:22:19 +08:00
Wang, Jian4	c5cb3ab82e	LLM : Add CPU alpaca qlora example (#9469 ) * init * update xpu to cpu * update * update readme * update example * update * add refer * add guide to train different datasets * update readme * update	2023-11-21 09:19:58 +08:00
binbin Deng	96fd26759c	LLM: fix QLoRA finetuning example on CPU (#9489 )	2023-11-20 14:31:24 +08:00
Xin Qiu	50b01058f1	enable new q4_1 (#9479 )	2023-11-17 14:58:57 +08:00
binbin Deng	3dac21ac7b	LLM: add more example usages about alpaca qlora on different hardware (#9458 )	2023-11-17 09:56:43 +08:00
Heyang Sun	921b263d6a	update deepspeed install and run guide in README (#9441 )	2023-11-17 09:11:39 +08:00
Zhao Changmin	30abd304a7	LLM: Fix baichuan pre-normalize model tensor assigning issue when loading (#9481 ) * No need to normalized when loading	2023-11-16 21:57:28 +08:00
WeiguangHan	bc06bec90e	LLM: modify the script to generate html results more accurately (#9445 ) * modify the script to generate html results more accurately * resolve some comments * revert some codes	2023-11-16 19:50:23 +08:00
Ruonan Wang	c0ef70df02	llm: quick fix of fast_rms_norm (#9480 )	2023-11-16 14:42:16 +08:00
Yina Chen	d5263e6681	Add awq load support (#9453 ) * Support directly loading GPTQ models from huggingface * fix style * fix tests * change example structure * address comments * fix style * init * address comments * add examples * fix style * fix style * fix style * fix style * update * remove * meet comments * fix style --------- Co-authored-by: Yang Wang <yang3.wang@intel.com>	2023-11-16 14:06:25 +08:00
Ruonan Wang	d2c064124a	LLM: update rms related usage to support ipex 2.1 new api (#9466 ) * update rms related usage * fix style	2023-11-16 11:21:50 +08:00
Yuwen Hu	731b0aaade	Empty cache after embedding to cpu (#9477 )	2023-11-16 10:52:30 +08:00
WeiguangHan	c487b53f21	LLM: only run arc perf test nightly (#9448 ) * LLM: only run arc perf test nightly * deleted unused python scripts * rebase main	2023-11-15 19:38:14 +08:00
WeiguangHan	0d55bbd9f1	LLM: adjust the order of some models (#9470 )	2023-11-15 17:04:59 +08:00
Xin Qiu	170e0072af	chatglm2 correctness test (#9450 ) * chatglm2 ut * some update * chatglm2 path * fix * add print	2023-11-15 15:44:56 +08:00
Ruonan Wang	0f82b8c3a0	LLM: update qlora example (#9454 ) * update qlora example * fix loss=0	2023-11-15 09:24:15 +08:00
Chen, Zhentao	dbbdb53a18	fix multiple gpu usage (#9459 )	2023-11-14 17:06:27 +08:00
Chen, Zhentao	d19ca21957	patch bigdl-llm model to harness by binding instead of patch file (#9420 ) * add run_llb.py * fix args interpret * modify outputs * update workflow * add license * test mixed 4 bit * update readme * use autotokenizer * add timeout * refactor workflow file * fix working directory * fix env * throw exception if some jobs failed * improve terminal outputs * Disable var which cause the run stuck * fix unknown precision * fix key error * directly output config instead * rm harness submodule	2023-11-14 12:51:39 +08:00
Yang Wang	51d07a9fd8	Support directly loading gptq models from huggingface (#9391 ) * Support directly loading GPTQ models from huggingface * fix style * fix tests * change example structure * address comments * fix style * address comments	2023-11-13 20:48:12 -08:00
WeiguangHan	d109275333	temporarily disable the test of some models (#9434 )	2023-11-13 18:50:53 +08:00
Chen, Zhentao	0ecb9efb05	use AutoTokenizer to enable more models (#9446 )	2023-11-13 17:47:43 +08:00
Cengguang Zhang	ece5805572	LLM: add chatglm3-6b to latency benchmark test. (#9442 )	2023-11-13 17:24:37 +08:00
Chen, Zhentao	5747e2fe69	fix multiple gpu usage of harness (#9444 )	2023-11-13 16:53:23 +08:00
Heyang Sun	da6bbc8c11	fix deepspeed dependencies to install (#9400 ) * remove redundant parameter from deepspeed install * Update install.sh * Update install.sh	2023-11-13 16:42:50 +08:00
Yuwen Hu	4faf5af8f1	[LLM] Add perf test for core on Windows (#9397 ) * temporary stop other perf test * Add framework for core performance test with one test model * Small fix and add platform control * Comment out lp for now * Add missing ymal file * Small fix * Fix sed contents * Small fix * Small path fixes * Small fix * Add update to ftp * Small upload fix * add chatglm3-6b * LLM: add model names * Keep repo id same as ftp and temporary make baichuan2 first priority * change order * Remove temp if false and separate pr and nightly results * Small fix --------- Co-authored-by: jinbridge <2635480475@qq.com>	2023-11-13 13:58:40 +08:00
Zheng, Yi	9b5d0e9c75	Add examples for Yi-6B (#9421 )	2023-11-13 10:53:15 +08:00
SONG Ge	2888818b3a	[LLM] Support mixed_fp8 on Arc (#9415 ) * ut gpu allocation memory fix * support mix_8bit on arc * rename mixed_4bit to mixed_fp4 and mixed_8bit to mixed_fp8 * revert unexpected changes * revert unexpected changes * unify common logits * rename in llm xmx_checker * fix typo error and re-unify	2023-11-13 09:26:30 +08:00
Wang, Jian4	ac7fbe77e2	Update qlora readme (#9416 )	2023-11-12 19:29:29 +08:00
Yining Wang	d7334513e1	codeshell: fix wrong links (#9417 )	2023-11-12 19:22:33 +08:00
Zheng, Yi	0674146cfb	Add cpu and gpu examples of distil-whisper (#9374 ) * Add distil-whisper examples * Fixes based on comments * Minor fixes --------- Co-authored-by: Ariadne330 <wyn2000330@126.com>	2023-11-10 16:09:55 +08:00
Ziteng Zhang	ad81b5d838	Update qlora README.md (#9422 )	2023-11-10 15:19:25 +08:00
Heyang Sun	b23b91407c	fix llm-init on deepspeed missing lib (#9419 )	2023-11-10 13:51:24 +08:00
SONG Ge	dfb00e37e9	[LLM] Add model correctness test on ARC for llama and falcon (#9347 ) * add correctness test on arc for llama model * modify layer name * add falcon ut * refactor and add ut for falcon model * modify lambda positions and update docs * replace loading pre input with last decodelayer output * switch lower bound to single model instead of using the common one * make the code implementation simple * fix gpu action allocation memory issue	2023-11-10 13:48:57 +08:00
dingbaorong	36fbe2144d	Add CPU examples of fuyu (#9393 ) * add fuyu cpu examples * add gpu example * add comments * add license * remove gpu example * fix inference time	2023-11-09 15:29:19 +08:00
Heyang Sun	df8e4d7889	[LLM] apply allreduce and bias to training in LowBitLinear (#9395 )	2023-11-09 14:35:54 +08:00
Wang, Jian4	40cead6b5b	LLM: Fix CPU qlora dtype convert issue (#9394 )	2023-11-09 14:34:01 +08:00
WeiguangHan	34449cb4bb	LLM: add remaining models to the arc perf test (#9384 ) * add remaining models * modify the filepath which stores the test result on ftp server * resolve some comments	2023-11-09 14:28:42 +08:00
Ruonan Wang	bfca76dfa7	LLM: optimize QLoRA by updating lora convert logic (#9372 ) * update convert logic of qlora * update * refactor and further improve performance * fix style * meet code review	2023-11-08 17:46:49 +08:00
binbin Deng	54d95e4907	LLM: add alpaca qlora finetuning example (#9276 )	2023-11-08 16:25:17 +08:00
binbin Deng	97316bbb66	LLM: highlight transformers version requirement in mistral examples (#9380 )	2023-11-08 16:05:03 +08:00
Ruonan Wang	7e8fb29b7c	LLM: optimize QLoRA by reducing convert time (#9370 )	2023-11-08 13:14:34 +08:00
Chen, Zhentao	298b64217e	add auto triggered acc test (#9364 ) * add auto triggered acc test * use llama 7b instead * fix env * debug download * fix download prefix * add cut dirs * fix env of model path * fix dataset download * full job * source xpu env vars * use matrix to trigger model run * reset batch=1 * remove redirect * remove some trigger * add task matrix * add precision list * test llama-7b-chat * use /mnt/disk1 to store model and datasets * remove installation test * correct downloading path * fix HF vars * add bigdl-llm env vars * rename file * fix hf_home * fix script path * rename as harness evalution * rerun	2023-11-08 10:22:27 +08:00
Yishuo Wang	bfd9f88f0d	[LLM] Use fp32 as dtype when batch_size <=8 and qtype is q4_0/q8_0/fp8 (#9365 )	2023-11-08 09:54:53 +08:00
WeiguangHan	84ab614aab	LLM: add more models and skip runtime error (#9349 ) * add more models and skip runtime error * upgrade transformers * temporarily removed Mistral-7B-v0.1 * temporarily disable the upload of arc perf result	2023-11-08 09:45:53 +08:00
Heyang Sun	fae6db3ddc	[LLM] refactor cpu low-bit forward logic (#9366 ) * [LLM] refactor cpu low-bit forward logic * fix style * Update low_bit_linear.py * Update low_bit_linear.py * refine	2023-11-07 15:09:16 +08:00
Heyang Sun	af94058203	[LLM] Support CPU deepspeed distributed inference (#9259 ) * [LLM] Support CPU Deepspeed distributed inference * Update run_deepspeed.py * Rename * fix style * add new codes * refine * remove annotated codes * refine * Update README.md * refine doc and example code	2023-11-06 17:56:42 +08:00
Jin Qiao	f9bf5382ff	Fix: add aquila2 in README (#9362 )	2023-11-06 16:37:57 +08:00
Jin Qiao	e6b6afa316	LLM: add aquila2 model example (#9356 )	2023-11-06 15:47:39 +08:00
Xin Qiu	1420e45cc0	Chatglm2 rope optimization on xpu (#9350 )	2023-11-06 13:56:34 +08:00
Yining Wang	9377b9c5d7	add CodeShell CPU example (#9345 ) * add CodeShell CPU example * fix some problems	2023-11-03 13:15:54 +08:00
ZehuaCao	ef83c3302e	Use to test llm-performance on spr-perf (#9316 ) * Update llm_performance_tests.yml * Update llm_performance_tests.yml * Update action.yml * Create cpu-perf-test.yaml * Update action.yml * Update action.yml * Update llm_performance_tests.yml * Update llm_performance_tests.yml * Update llm_performance_tests.yml * Update llm_performance_tests.yml * Update llm_performance_tests.yml * Update llm_performance_tests.yml * Update llm_performance_tests.yml * Update llm_performance_tests.yml * Update llm_performance_tests.yml * Update llm_performance_tests.yml * Update llm_performance_tests.yml	2023-11-03 11:17:16 +08:00
Yuwen Hu	a0150bb205	[LLM] Move embedding layer to CPU for iGPU inference (#9343 ) * Move embedding layer to CPU for iGPU llm inference * Empty cache after to cpu * Remove empty cache as it seems to have some negative effect to first token	2023-11-03 11:13:45 +08:00
Cheen Hau, 俊豪	8f23fb04dc	Add inference test for Whisper model on Arc (#9330 ) * Add inference test for Whisper model * Remove unnecessary inference time measurement	2023-11-03 10:15:52 +08:00
Zheng, Yi	63411dff75	Add cpu examples of WizardCoder (#9344 ) * Add wizardcoder example * Minor fixes	2023-11-02 20:22:43 +08:00
dingbaorong	2e3bfbfe1f	Add internlm_xcomposer cpu examples (#9337 ) * add internlm-xcomposer cpu examples * use chat * some fixes * add license * address shengsheng's comments * use demo.jpg	2023-11-02 15:50:02 +08:00
Jin Qiao	97a38958bd	LLM: add CodeLlama CPU and GPU examples (#9338 ) * LLM: add codellama CPU pytorch examples * LLM: add codellama CPU transformers examples * LLM: add codellama GPU transformers examples * LLM: add codellama GPU pytorch examples * LLM: add codellama in readme * LLM: add LLaVA link	2023-11-02 15:34:25 +08:00
Chen, Zhentao	d4dffbdb62	Merge harness (#9319 ) * add harness patch and llb script * add readme * add license * use patch instead * update readme * rename tests to evaluation * fix typo * remove nano dependency * add original harness link * rename title of usage * rename BigDLGPULM as BigDLLM * empty commit to rerun job	2023-11-02 15:14:19 +08:00
Zheng, Yi	63b2556ce2	Add cpu examples of skywork (#9340 )	2023-11-02 15:10:45 +08:00
dingbaorong	f855a864ef	add llava gpu example (#9324 ) * add llava gpu example * use 7b model * fix typo * add in README	2023-11-02 14:48:29 +08:00
Ziteng Zhang	dd3cf2f153	LLM: Add python 3.10 & 3.11 UT	2023-11-02 14:09:29 +08:00
Wang, Jian4	149146004f	LLM: Add qlora finetuning CPU example (#9275 ) * add qlora finetuning example * update readme * update example * remove merge.py and update readme	2023-11-02 09:45:42 +08:00
WeiguangHan	9722e811be	LLM: add more models to the arc perf test (#9297 ) * LLM: add more models to the arc perf test * remove some old models * install some dependencies	2023-11-01 16:56:32 +08:00
Jin Qiao	6a128aee32	LLM: add ui for portable-zip (#9262 )	2023-11-01 15:36:59 +08:00
Jasonzzt	cb7ef38e86	rerun	2023-11-01 15:30:34 +08:00
Jasonzzt	ba148ff3ff	test py311	2023-11-01 14:08:49 +08:00
Yishuo Wang	726203d778	[LLM] Replace Embedding layer to fix it on CPU (#9254 )	2023-11-01 13:58:10 +08:00
Jasonzzt	7c7a7f2ec1	spr & arc ut with python 3.9 & 3.10 & 3.11	2023-11-01 13:17:13 +08:00
Yang Wang	e1bc18f8eb	fix import ipex problem (#9323 ) * fix import ipex problem * fix style	2023-10-31 20:31:34 -07:00
Cengguang Zhang	9f3d4676c6	LLM: Add qwen-vl gpu example (#9290 ) * create qwen-vl gpu example. * add readme. * fix. * change input figure and update outputs. * add qwen-vl pytorch model gpu example. * fix. * add readme.	2023-11-01 11:01:39 +08:00
Ruonan Wang	7e73c354a6	LLM: decoupling bigdl-llm and bigdl-nano (#9306 )	2023-11-01 11:00:54 +08:00
Yina Chen	2262ae4d13	Support MoFQ4 on arc (#9301 ) * init * update * fix style * fix style * fix style * meet comments	2023-11-01 10:59:46 +08:00
binbin Deng	8ef8e25178	LLM: improve response speed in multi-turn chat (#9299 ) * update * fix stop word and add chatglm2 support * remove system prompt	2023-11-01 10:30:44 +08:00
Cengguang Zhang	d4ab5904ef	LLM: Add python 3.10 llm UT (#9302 ) * add py310 test for llm-unit-test. * add py310 llm-unit-tests * add llm-cpp-build-py310 * test * test * test. * test * test * fix deactivate. * fix * fix. * fix * test * test * test * add build chatglm for win. * test. * fix	2023-11-01 10:15:32 +08:00
WeiguangHan	03aa368776	LLM: add the comparison between latest arc perf test and last one (#9296 ) * add the comparison between latest test and last one to html * resolve some comments * modify some code logics	2023-11-01 09:53:02 +08:00
Jin Qiao	96f8158fe2	LLM: adjust dolly v2 GPU example README (#9318 )	2023-11-01 09:50:22 +08:00
Jin Qiao	c44c6dc43a	LLM: add chatglm3 examples (#9305 )	2023-11-01 09:50:05 +08:00
Xin Qiu	06447a3ef6	add malloc and intel openmp to llm deps (#9322 )	2023-11-01 09:47:45 +08:00
Cheen Hau, 俊豪	d638b93dfe	Add test script and workflow for qlora fine-tuning (#9295 ) * Add test script and workflow for qlora fine-tuning * Test fix export model * Download dataset * Fix export model issue * Reduce number of training steps * Rename script * Correction	2023-11-01 09:39:53 +08:00
Ruonan Wang	d383ee8efb	LLM: update QLoRA example about accelerate version(#9314 )	2023-10-31 13:54:38 +08:00
Cheen Hau, 俊豪	cee9eaf542	[LLM] Fix llm arc ut oom (#9300 ) * Move model to cpu after testing so that gpu memory is deallocated * Add code comment --------- Co-authored-by: sgwhat <ge.song@intel.com>	2023-10-30 14:38:34 +08:00

962 commits