Yang Wang | feb3af0567 | Optimize transformer int4 memory footprint (#8579) | 2023-07-20 20:22:13 -07:00

Zhao Changmin | 23f6a4c21f | LLM: Optimize transformer int4 loading (#8499) | 2023-07-12 15:25:42 +08:00

Xin Qiu | cd7a980ec4 | Transformer int4 add qtype, support q4_1 q5_0 q5_1 q8_0 (#8481) | 2023-07-12 08:23:08 +08:00
* quant in Q4 5 8
* meet code review
* update readme
* style
* update
* fix error
* fix error
* update
* fix style
* update
* Update README.md
* Add load_in_low_bit
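
The `load_in_low_bit` option added in #8481 is the user-facing hook for the new qtypes. Below is a minimal usage sketch under the BigDL-LLM transformers-style API this series of commits builds out; the model path is a placeholder, and the exact qtype strings accepted by `load_in_low_bit` (ggml-style names like `q4_1` versus names like `asym_int4`) vary by library version, so treat the value as an assumption.

```python
# Sketch only: loading with an explicit low-bit qtype via load_in_low_bit.
# The model path and the qtype string are placeholders/assumptions.
from bigdl.llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model_path = "path/to/llama-7b-hf"  # placeholder checkpoint path

# Pick one of the qtypes referenced in the commit (q4_1 / q5_0 / q5_1 / q8_0);
# the accepted spelling of this string may differ between library versions.
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_low_bit="q4_1")
tokenizer = AutoTokenizer.from_pretrained(model_path)
```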

Zhao Changmin | 81d655cda9 | LLM: transformer int4 save and load (#8462) | 2023-07-10 16:34:41 +08:00
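
For the save/load support in #8462, the round trip looks roughly like the sketch below: quantize once from the original checkpoint, persist the low-bit weights, and reload them directly on later runs. The `save_low_bit` / `load_low_bit` names reflect the API this line of work settled on in BigDL-LLM, but the paths are placeholders and the exact method signatures should be checked against the installed version.

```python
# Sketch only: persist an int4-quantized model and reload it without
# re-quantizing. Paths are placeholders.
from bigdl.llm.transformers import AutoModelForCausalLM

# First run: quantize from the original checkpoint, then save the low-bit weights.
model = AutoModelForCausalLM.from_pretrained("path/to/original-model",
                                             load_in_4bit=True)
model.save_low_bit("path/to/model-int4")

# Later runs: load the saved low-bit weights directly.
model = AutoModelForCausalLM.load_low_bit("path/to/model-int4")
```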

Yang Wang | 449aea7ffc | Optimize transformer int4 loading memory (#8400) | 2023-06-30 20:12:12 -07:00
* move cast to convert
* default setting low_cpu_mem_usage
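
The `low_cpu_mem_usage` flag named in the #8400 commit body is the standard Hugging Face `from_pretrained` option that builds the model with empty (meta) weights and fills them in shard by shard, avoiding a second full copy of the state dict in RAM; per the commit, the int4 loading path enables it by default. A plain-transformers illustration of the flag itself:

```python
# Illustration of the upstream Hugging Face flag; not BigDL-LLM-specific code.
# The model path is a placeholder.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "path/to/model",
    low_cpu_mem_usage=True,  # stream weights in instead of duplicating the state dict
)
```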

Yang Wang | ce6d06eb0a | Support directly quantizing huggingface transformers into 4bit format (#8371) | 2023-06-25 16:35:06 +08:00
* refine example
* license
* fix bias
* address comments
* move to ggml transformers
* fix example
* fix style
* fix style
* address comments
* rename
* change API
* fix style
* add lm head to conversion
* address comments
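
Commit ce6d06eb0a is where direct 4-bit quantization of a Hugging Face checkpoint at load time lands. A minimal end-to-end sketch under that API is below; the checkpoint path, prompt, and generation settings are placeholders, not values taken from the commit.

```python
# Sketch only: quantize a Hugging Face checkpoint to 4 bit directly at load
# time and run a short generation. Checkpoint path and prompt are placeholders.
import torch
from bigdl.llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model_path = "path/to/llama-7b-hf"  # placeholder checkpoint path

model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True)
tokenizer = AutoTokenizer.from_pretrained(model_path)

inputs = tokenizer("Once upon a time", return_tensors="pt")
with torch.inference_mode():
    output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```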