ipex-llm/python/llm
Yina Chen e37f951cce
[NPU] Groupwise (#12241)
* dq divide

* fix

* support attn divide

* update qwen2 7b

* divide down_proj & other linear

* use concat & reduce sum

* support scale after

* support qwen2

* w/ mm

* update reshape

* spda

* split

* split 2+

* update

* lm head-> 28

* no scale

* update

* update

* update

* fix style

* fix style

* to split linear

* update

* update code

* address comments

* fix style & remove redundant code & revert benchmark scripts

* fix style & remove code

* update save & load

---------

Co-authored-by: Yang Wang <yang3.wang@intel.com>
2024-10-23 14:10:58 +08:00

| Name | Last commit | Date |
| --- | --- | --- |
| dev | [NPU] Groupwise (#12241) | 2024-10-23 14:10:58 +08:00 |
| example | Remove Qwen2-7b from NPU example for "Run Optimized Models (Experimental)" (#12245) | 2024-10-22 17:07:51 +08:00 |
| portable-zip | Fix null pointer dereferences error. (#11125) | 2024-05-30 16:16:10 +08:00 |
| scripts | fix typo in python/llm/scripts/README.md (#11536) | 2024-07-09 09:53:14 +08:00 |
| src/ipex_llm | [NPU] Groupwise (#12241) | 2024-10-23 14:10:58 +08:00 |
| test | Further update windows gpu perf test regarding results integrity check (#12232) | 2024-10-18 18:15:13 +08:00 |
| tpp | OSPDT: add tpp licenses (#11165) | 2024-06-06 10:59:06 +08:00 |
| .gitignore | [LLM] add chatglm pybinding binary file release (#8677) | 2023-08-04 11:45:27 +08:00 |
| setup.py | Support cpp release for ARL on Windows (#12189) | 2024-10-14 17:20:31 +08:00 |
| version.txt | Update pypi tag to 2.2.0.dev0 (#11895) | 2024-08-22 16:48:09 +08:00 |