Commit graph

1189 commits

Author SHA1 Message Date
Cengguang Zhang
3e2662c87e
LLM: fix get env KV_CACHE_ALLOC_BLOCK_LENGTH type. (#10771) 2024-04-16 09:32:30 +08:00
Jin Qiao
73a67804a4
GPU configuration update for examples (windows pip installer, etc.) (#10762)
* renew chatglm3-6b gpu example readme

fix

fix

fix

* fix for comments

* fix

* fix

* fix

* fix

* fix

* apply on HF-Transformers-AutoModels

* apply on PyTorch-Models

* fix

* fix
2024-04-15 17:42:52 +08:00
yb-peng
b5209d3ec1
Update example/GPU/PyTorch-Models/Model/llava/README.md (#10757)
* Update example/GPU/PyTorch-Models/Model/llava/README.md

* Update README.md

fix path in windows installation
2024-04-15 13:01:37 +08:00
binbin Deng
3d561b60ac
LLM: add enable_xetla parameter for optimize_model API (#10753) 2024-04-15 12:18:25 +08:00
Jiao Wang
a9a6b6b7af
Fix baichuan-13b issue on portable zip under transformers 4.36 (#10746)
* fix baichuan-13b issue

* update

* update
2024-04-12 16:27:01 -07:00
Jiao Wang
9e668a5bf0
fix_internlm-chat-7b-8k repo name in examples (#10747) 2024-04-12 10:15:48 -07:00
binbin Deng
c3fc8f4b90
LLM: add bs limitation for llama softmax upcast to fp32 (#10752) 2024-04-12 15:40:25 +08:00
hxsz1997
0d518aab8d
Merge pull request #10697 from MargarettMao/ceval
combine english and chinese, remove nan
2024-04-12 14:37:47 +08:00
jenniew
dd0d2df5af
Change fp16.csv mistral-7b-v0.1 into Mistral-7B-v0.1 2024-04-12 14:28:46 +08:00
jenniew
7309f1ddf9
Mofidy Typos 2024-04-12 14:23:13 +08:00
jenniew
cb594e1fc5
Mofidy Typos 2024-04-12 14:22:09 +08:00
jenniew
382c18e600
Mofidy Typos 2024-04-12 14:15:48 +08:00
jenniew
1a360823ce
Mofidy Typos 2024-04-12 14:13:21 +08:00
jenniew
cdbb1de972
Mark Color Modification 2024-04-12 14:00:50 +08:00
jenniew
9bbfcaf736
Mark Color Modification 2024-04-12 13:30:16 +08:00
jenniew
bb34c6e325
Mark Color Modification 2024-04-12 13:26:36 +08:00
Yishuo Wang
8086554d33
use new fp16 sdp in llama and mistral (#10734) 2024-04-12 10:49:02 +08:00
Yang Wang
019293e1b9
Fuse MOE indexes computation (#10716)
* try moe

* use c++ cpu to compute indexes

* fix style
2024-04-11 10:12:55 -07:00
jenniew
b151a9b672
edit csv_to_html to combine en & zh 2024-04-11 17:35:36 +08:00
binbin Deng
70ed9397f9
LLM: fix AttributeError of FP16Linear (#10740) 2024-04-11 17:03:56 +08:00
Keyan (Kyrie) Zhang
1256a2cc4e
Add chatglm3 long input example (#10739)
* Add long context input example for chatglm3

* Small fix

* Small fix

* Small fix
2024-04-11 16:33:43 +08:00
hxsz1997
fd473ddb1b
Merge pull request #10730 from MargarettMao/MargarettMao-parent_folder
Edit ppl update_HTML_parent_folder
2024-04-11 15:45:24 +08:00
Qiyuan Gong
2d64630757
Remove transformers version in axolotl example (#10736)
* Remove transformers version in axolotl requirements.txt
2024-04-11 14:02:31 +08:00
yb-peng
2685c41318
Modify all-in-one benchmark (#10726)
* Update 8192 prompt in all-in-one

* Add cpu_embedding param for linux api

* Update run.py

* Update README.md
2024-04-11 13:38:50 +08:00
Xiangyu Tian
301504aa8d
Fix transformers version warning (#10732) 2024-04-11 13:12:49 +08:00
Wenjing Margaret Mao
9bec233e4d
Delete python/llm/test/benchmark/perplexity/update_html_in_parent_folder.py
Delete due to repetition
2024-04-11 07:21:12 +08:00
Cengguang Zhang
4b024b7aac
LLM: optimize chatglm2 8k input. (#10723)
* LLM: optimize chatglm2 8k input.

* rename.
2024-04-10 16:59:06 +08:00
Yuxuan Xia
cd22cb8257
Update Env check Script (#10709)
* Update env check bash file

* Update env-check
2024-04-10 15:06:00 +08:00
Shaojun Liu
29bf28bd6f
Upgrade python to 3.11 in Docker Image (#10718)
* install python 3.11 for cpu-inference docker image

* update xpu-inference dockerfile

* update cpu-serving image

* update qlora image

* update lora image

* update document
2024-04-10 14:41:27 +08:00
Qiyuan Gong
b727767f00
Add axolotl v0.3.0 with ipex-llm on Intel GPU (#10717)
* Add axolotl v0.3.0 support on Intel GPU.
* Add finetune example on llama-2-7B with Alpaca dataset.
2024-04-10 14:38:29 +08:00
Wang, Jian4
c9e6d42ad1
LLM: Fix chatglm3-6b-32k error (#10719)
* fix chatglm3-6b-32k

* update style
2024-04-10 11:24:06 +08:00
Keyan (Kyrie) Zhang
585c174e92
Read the value of KV_CACHE_ALLOC_BLOCK_LENGTH from the environment variables (#10707)
* Read the value of KV_CACHE_ALLOC_BLOCK_LENGTH from the environment variables.

* Fix style
2024-04-10 10:48:46 +08:00
Jiao Wang
d1eaea509f
update chatglm readme (#10659) 2024-04-09 14:24:46 -07:00
Jiao Wang
878a97077b
Fix llava example to support transformerds 4.36 (#10614)
* fix llava example

* update
2024-04-09 13:47:07 -07:00
Jiao Wang
1e817926ba
Fix low memory generation example issue in transformers 4.36 (#10702)
* update cache in low memory generate

* update
2024-04-09 09:56:52 -07:00
Yuwen Hu
97db2492c8
Update setup.py for bigdl-core-xe-esimd-21 on Windows (#10705)
* Support bigdl-core-xe-esimd-21 for windows in setup.py

* Update setup-llm-env accordingly
2024-04-09 18:21:21 +08:00
Zhicun
b4147a97bb
Fix dtype mismatch error (#10609)
* fix llama

* fix

* fix code style

* add torch type in model.py

---------

Co-authored-by: arda <arda@arda-arc19.sh.intel.com>
2024-04-09 17:50:33 +08:00
Shaojun Liu
f37a1f2a81
Upgrade to python 3.11 (#10711)
* create conda env with python 3.11

* recommend to use Python 3.11

* update
2024-04-09 17:41:17 +08:00
Yishuo Wang
8f45e22072
fix llama2 (#10710) 2024-04-09 17:28:37 +08:00
Yishuo Wang
e438f941f2
disable rwkv5 fp16 (#10699) 2024-04-09 16:42:11 +08:00
Cengguang Zhang
6a32216269
LLM: add llama2 8k input example. (#10696)
* LLM: add llama2-32K example.

* refactor name.

* fix comments.

* add IPEX_LLM_LOW_MEM notes and update sample output.
2024-04-09 16:02:37 +08:00
Wenjing Margaret Mao
289cc99cd6
Update README.md (#10700)
Edit "summarize the results"
2024-04-09 16:01:12 +08:00
Wenjing Margaret Mao
d3116de0db
Update README.md (#10701)
edit "summarize the results"
2024-04-09 15:50:25 +08:00
Chen, Zhentao
d59e0cce5c
Migrate harness to ipexllm (#10703)
* migrate to ipexlm

* fix workflow

* fix run_multi

* fix precision map

* rename ipexlm to ipexllm

* rename bigdl to ipex in comments
2024-04-09 15:48:53 +08:00
Keyan (Kyrie) Zhang
1e27e08322
Modify example from fp32 to fp16 (#10528)
* Modify example from fp32 to fp16

* Remove Falcon from fp16 example for now

* Remove MPT from fp16 example
2024-04-09 15:45:49 +08:00
binbin Deng
44922bb5c2
LLM: support baichuan2-13b using AutoTP (#10691) 2024-04-09 14:06:01 +08:00
Yina Chen
c7422712fc
mistral 4.36 use fp16 sdp (#10704) 2024-04-09 13:50:33 +08:00
Ovo233
dcb2038aad
Enable optimization for sentence_transformers (#10679)
* enable optimization for sentence_transformers

* fix python style check failure
2024-04-09 12:33:46 +08:00
Yang Wang
5a1f446d3c
support fp8 in xetla (#10555)
* support fp8 in xetla

* change name

* adjust model file

* support convert back to cpu

* factor

* fix bug

* fix style
2024-04-08 13:22:09 -07:00
jenniew
591bae092c
combine english and chinese, remove nan 2024-04-08 19:37:51 +08:00