* [LLM] Multi-process and distributed QLoRA on CPU platform
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* enable llm-init and bind to socket
* refine
* Update Dockerfile
* add all files of qlora cpu example to /bigdl
* fix
* fix k8s
* Update bigdl-qlora-finetuing-entrypoint.sh
* Update bigdl-qlora-finetuing-entrypoint.sh
* Update bigdl-qlora-finetuning-job.yaml
* fix train sync and performance issues
* add node affinity
* disable user to tune cpu per pod
* Update bigdl-qlora-finetuning-job.yaml
imageName: intelanalytics/bigdl-llm-finetune-qlora-cpu:2.5.0-SNAPSHOT
trainerNum: 2
microBatchSize: 8
nfsServerIp: your_nfs_server_ip
nfsPath: a_nfs_shared_folder_path_on_the_server
dataSubPath: alpaca_data_cleaned_archive.json # a subpath of the data file under nfs directory
modelSubPath: Llama-2-7b-chat-hf # a subpath of the model file (dir) under nfs directory
httpProxy: "your_http_proxy_like_http://xxx:xxxx_if_needed_else_empty"
httpsProxy: "your_https_proxy_like_http://xxx:xxxx_if_needed_else_empty"
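Since several values above (e.g. `nfsServerIp`, `nfsPath`) are placeholders to be edited before submitting the finetuning job, a small pre-flight check can catch fields left unedited. The sketch below is an assumption, not part of BigDL: the helper names `load_flat_values` and `find_problems`, and the convention that placeholders start with `your_` or `a_`, are illustrative. It parses only simple flat `key: value` lines like those shown, not full YAML.

```python
# Pre-flight check for a flat values file like the one above (sketch;
# handles only simple "key: value" lines with optional '#' comments).

REQUIRED_KEYS = {
    "imageName", "trainerNum", "microBatchSize",
    "nfsServerIp", "nfsPath", "dataSubPath", "modelSubPath",
}

def load_flat_values(text: str) -> dict:
    """Parse simple 'key: value' lines, dropping '#' comments."""
    values = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # strip trailing comment
        if not line:
            continue
        key, _, val = line.partition(":")  # split at the first colon only
        values[key.strip()] = val.strip().strip('"')
    return values

def find_problems(values: dict) -> list:
    """Flag missing required keys and placeholder values left unedited."""
    problems = [f"missing key: {k}" for k in sorted(REQUIRED_KEYS - values.keys())]
    for key, val in values.items():
        if val.startswith(("your_", "a_")):  # assumed placeholder convention
            problems.append(f"placeholder not replaced: {key}")
    return problems
```

Run against the file above, this would report `nfsServerIp`, `nfsPath`, and the proxy fields until they are filled in with real values.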