* add more model exampels for pipelien parallel inference
* add mixtral and vicuna models
* add yi model and past_kv supprot for chatglm family
* add docs
* doc update
* add license
* update
* copy files to this branch
* add tasks
* comment one model
* change the model to test the 4.36
* only test batch-4
* typo
* typo
* typo
* typo
* typo
* typo
* add 4.37-batch4
* change the file name
* revet yaml file
* no print
* add batch4 task
* revert
---------
Co-authored-by: Yishuo Wang <yishuo.wang@intel.com>
* Add gguf support.
* Avoid error when import ipex-llm for multiple times.
* Add check to avoid duplicate replace and revert.
* Add calling from check to avoid raising exceptions in the submodule.
* Add BIGDL_CHECK_DUPLICATE_IMPORT for controlling duplicate checker. Default is true.
* add config and default value
* add config in taml
* remove lookahead and max_matching_ngram_size in config
* remove streaming and use_fp16_torch_dtype in test yaml
* update task in readme
* update commit of task
* add lookahead in test_api:transformer_int4_fp16_gpu
* change the short prompt of summarize
* change short prompt to cnn_64
* change short prompt of summarize
* change api
* move to pr mode and remove the build
* add batch4 yaml and remove the bigcode
* remove batch4
* revert the starcode
* remove the exclude
* revert
---------
Co-authored-by: Yishuo Wang <yishuo.wang@intel.com>
* add update_parent_folder and change the workflow file
* add update_parent_folder and change the workflow file
* move to pr mode and comment the test
* use one model per comfig
* revert
---------
Co-authored-by: Yishuo Wang <yishuo.wang@intel.com>
* Add GPU example for Qwen2
* Update comments in README
* Update README for Qwen2 GPU example
* Add CPU example for Qwen2
Sample Output under README pending
* Update generate.py and README for CPU Qwen2
* Update GPU example for Qwen2
* Small update
* Small fix
* Add Qwen2 table
* Update README for Qwen2 CPU and GPU
Update sample output under README
---------
Co-authored-by: Zijie Li <michael20001122@gmail.com>
* Change installation address
Change former address: "https://docs.conda.io/en/latest/miniconda.html#" to new address: "https://conda-forge.org/download/" for 63 occurrences under python\llm\example
* Change Prompt
Change "Anaconda Prompt" to "Miniforge Prompt" for 1 occurrence
* Create and update model minicpm
* Update model minicpm
Update model minicpm under GPU/PyTorch-Models
* Update readme and generate.py
change "prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=False)" and delete "pip install transformers==4.37.0
"
* Update comments for minicpm GPU
Update comments for generate.py at minicpm GPU
* Add CPU example for MiniCPM
* Update minicpm README for CPU
* Update README for MiniCPM and Llama3
* Update Readme for Llama3 CPU Pytorch
* Update and fix comments for MiniCPM
* Update installation guide for pipeline parallel inference
* Small fix
* further fix
* Small fix
* Small fix
* Update based on comments
* Small fix
* Small fix
* Small fix
* add batch 2&4 and exclude to perf_test
* modify the perf-test&437 yaml
* modify llm_performance_test.yml
* remove batch 4
* modify check_results.py to support batch 2&4
* change the batch_size format
* remove genxir
* add str(batch_size)
* change actual_test_casese in check_results file to support batch_size
* change html highlight
* less models to test html and html_path
* delete the moe model
* split batch html
* split
* use installing from pypi
* use installing from pypi - batch2
* revert cpp
* revert cpp
* merge two jobs into one, test batch_size in one job
* merge two jobs into one, test batch_size in one job
* change file directory in workflow
* try catch deal with odd file without batch_size
* modify pandas version
* change the dir
* organize the code
* organize the code
* remove Qwen-MOE
* modify based on feedback
* modify based on feedback
* modify based on second round of feedback
* modify based on second round of feedback + change run-arc.sh mode
* modify based on second round of feedback + revert config
* modify based on second round of feedback + revert config
* modify based on second round of feedback + remove comments
* modify based on second round of feedback + remove comments
* modify based on second round of feedback + revert arc-perf-test
* modify based on third round of feedback
* change error type
* change error type
* modify check_results.html
* split batch into two folders
* add all models
* move csv_name
* revert pr test
* revert pr test
---------
Co-authored-by: Yishuo Wang <yishuo.wang@intel.com>
* Fix ipex auto importer with Python builtins.
* Raise errors if the user imports ipex manually before importing ipex_llm. Do nothing if they import ipex after importing ipex_llm.
* Remove import ipex in examples.