6 commits

Author SHA1 Message Date
Yang Wang
feb3af0567 Optimize transformer int4 memory footprint (#8579) 2023-07-20 20:22:13 -07:00
Zhao Changmin
23f6a4c21f LLM: Optimize transformer int4 loading (#8499)
* LLM: Optimize transformer int4 loading
2023-07-12 15:25:42 +08:00
Xin Qiu
cd7a980ec4 Transformer int4 add qtype, support q4_1 q5_0 q5_1 q8_0 (#8481)
* quant in Q4 5 8

* meet code review

* update readme

* style

* update

* fix error

* fix error

* update

* fix style

* update

* Update README.md

* Add load_in_low_bit
2023-07-12 08:23:08 +08:00
Zhao Changmin
81d655cda9 LLM: transformer int4 save and load (#8462)
* LLM: transformer int4 save and load
2023-07-10 16:34:41 +08:00
Yang Wang
449aea7ffc Optimize transformer int4 loading memory (#8400)
* Optimize transformer int4 loading memory

* move cast to convert

* default setting low_cpu_mem_usage
2023-06-30 20:12:12 -07:00
Yang Wang
ce6d06eb0a Support directly quantizing huggingface transformers into 4bit format (#8371)
* Support directly quantizing huggingface transformers into 4bit format

* refine example

* license

* fix bias

* address comments

* move to ggml transformers

* fix example

* fix style

* fix style

* address comments

* rename

* change API

* fix style

* add lm head to conversion

* address comments
2023-06-25 16:35:06 +08:00