Chu,Youcheng
ffa9a9e1b3
Update streaming in npu examples ( #12495 )
...
* feat: add streaming
* Update readme accordingly
---------
Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
2024-12-04 17:51:10 +08:00
Yuwen Hu
ef4028ac2d
[NPU] Support split lm_head for Qwen2 with CPP ( #12491 )
...
* Use split for Qwen2 lm_head instead of slice in optimize_pre
* Support split lm_head in Qwen2 python cpp backend
* Fit with Python acc lib pipeline
* Removed default mixed_precision=True in all-in-one and related examples
* Small fix
* Style fix
* Fix based on comments
* Fix based on comments
* Stype fix
2024-12-04 14:41:08 +08:00
Jin, Qiao
7082844f3f
Fix NPU LLM example save/load tokenizer ( #12485 )
2024-12-03 16:30:55 +08:00
binbin Deng
ab01753b1c
[NPU] update save-load API usage ( #12473 )
2024-12-03 09:46:15 +08:00
binbin Deng
c911026f03
[NPU C++] Update model support & examples & benchmark ( #12466 )
2024-11-29 13:35:58 +08:00
binbin Deng
14d8d3d8af
Integrate NPU C++ imple into ipex-llm ( #12461 )
2024-11-29 09:25:37 +08:00
Ruonan Wang
b9abb8a285
Support qwen2.5 3B for NPU & update related examples ( #12438 )
...
* update qwen2.5-3B
* update convert
* small fix
* replace load_in_low_bit with low_bit
* small fix
2024-11-25 16:38:31 +08:00
Ruonan Wang
3fe2ea3081
[NPU] Reuse prefill of acc lib for pipeline ( #12279 )
...
* first commit
* update example
* fix style
* update example
* embedding as const
* fix generate
* code refactor
* meet code review
* fix style
* change max_output_len to max_context_len
* fix all-in-one
* fix example
* add check for new tokens
2024-10-28 16:05:49 +08:00
Jin, Qiao
8fa98e2742
Remove Qwen2-7b from NPU example for "Run Optimized Models (Experimental)" ( #12245 )
...
* Remove qwen2-7b from npu example readme
* fix
2024-10-22 17:07:51 +08:00
Jin, Qiao
2bedb17be7
Add Qwen2.5 NPU Example ( #12110 )
...
* Add Qwen2.5 NPU Example
* fix
* Merge qwen2.py and qwen2.5.py into qwen.py
* Fix description
2024-09-25 15:20:03 +08:00