ipex-llm/python/llm
Cengguang Zhang c0cd238e40
LLM: support llama2 8k input with w4a16. (#10677)
* LLM: support llama2 8k input with w4a16.
* fix comment and style.
* fix style.
* fix comments and split tensor to quantized attention forward.
* fix style.
* refactor name.
* fix style.
* fix style.
* fix style.
* refactor checker name.
* refactor native SDP split-QKV tensor name.
* fix style.
* fix comment; rename variables.
* fix co-existence of intermediate results.
2024-04-08 11:43:15 +08:00
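The commit above mentions splitting tensors before the quantized attention forward to handle long (8k-token) inputs. As background, the general technique of splitting the query tensor along the sequence axis for scaled-dot-product attention (SDP) can be sketched as below. This is an illustrative NumPy sketch only, not ipex-llm's actual implementation: the function names, shapes, and chunk size are assumptions, and the causal mask used in real llama2 attention is omitted for brevity.

```python
import numpy as np

def sdp_attention(q, k, v):
    # Standard scaled dot-product attention: softmax(q k^T / sqrt(d)) v.
    # Materializes a full (q_len x kv_len) score matrix.
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

def chunked_sdp_attention(q, k, v, chunk=128):
    # Split the query tensor along the sequence axis so each call only
    # materializes a (chunk x kv_len) score block, bounding peak memory.
    # Each query row attends independently, so results are unchanged.
    parts = [sdp_attention(q[i:i + chunk], k, v)
             for i in range(0, q.shape[0], chunk)]
    return np.concatenate(parts, axis=0)

rng = np.random.default_rng(0)
seq_len, head_dim = 512, 64  # hypothetical sizes for the demo
q = rng.standard_normal((seq_len, head_dim))
k = rng.standard_normal((seq_len, head_dim))
v = rng.standard_normal((seq_len, head_dim))

full = sdp_attention(q, k, v)
split = chunked_sdp_attention(q, k, v)
print(np.allclose(full, split))  # → True
```

Because attention is computed independently per query row, chunking the queries trades one large score matrix for several small ones without changing the output, which is why it helps fit long prompts in limited device memory.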
Name           Last commit                                                            Date
dev            add test api transformer_int4_fp16_gpu (#10627)                        2024-04-07 15:47:17 +08:00
example        LLM: upgrade deepspeed in AutoTP on GPU (#10647)                       2024-04-07 14:05:19 +08:00
portable-zip   Migrate portable zip to ipex-llm (#10617)                              2024-04-07 13:58:58 +08:00
scripts        LLM: check user env (#10580)                                           2024-03-29 17:19:34 +08:00
src/ipex_llm   LLM: support llama2 8k input with w4a16. (#10677)                      2024-04-08 11:43:15 +08:00
test           Fix llamaindex ut (#10673)                                             2024-04-08 09:47:51 +08:00
.gitignore     [LLM] add chatglm pybinding binary file release (#8677)                2023-08-04 11:45:27 +08:00
setup.py       Update pip install to use --extra-index-url for ipex package (#10557)  2024-03-28 09:56:23 +08:00
version.txt    Update setup.py and add new actions and add compatible mode (#25)      2024-03-22 15:44:59 +08:00