add pv for llm-serving k8s deployment (#9906)

This commit is contained in:
ZehuaCao 2024-01-16 11:32:54 +08:00 committed by GitHub
parent 8d7326ae03
commit 05ea0ecd70
3 changed files with 61 additions and 4 deletions

View file

@@ -35,6 +35,27 @@ The entrypoint of the image will try to set `OMP_NUM_THREADS` to the correct num
If you want to use the vLLM `AsyncLLMEngine` for serving, you should set the args `-w vllm_worker` in the worker part of `deployment.yaml`.
### PersistentVolume
We use the following YAML file for the PersistentVolume deployment:
```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: models-pv
  labels:
    app: models
spec:
  capacity:
    storage: 10Gi # Modify according to model size
  accessModes:
    - ReadWriteMany
  storageClassName: models
  nfs:
    path: YOUR_NFS_PATH
    server: YOUR_NFS_SERVER
```
Then you should upload the model to `YOUR_NFS_PATH`.
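As a sketch (the manifest file name and the model directory are assumptions, not values from this repo), the PV can be created with `kubectl` and the model copied onto the NFS export:

```shell
# Create the PersistentVolume defined above (file name is an assumption)
kubectl apply -f models-pv.yaml

# Check that the PV exists and is Available
kubectl get pv models-pv

# Copy a downloaded model onto the NFS export so pods can read it
# (both paths below are placeholders)
cp -r ./vicuna-7b-v1.5 YOUR_NFS_PATH/
```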
### Controller
@@ -154,8 +175,8 @@ spec:
  restartPolicy: "Always"
  volumes:
  - name: llm-models
    persistentVolumeClaim:
      claimName: models-pvc
```
You may want to change the `MODEL_PATH` variable in the yaml. Also, please remember to change the volume path accordingly.
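For illustration, a hypothetical excerpt of the worker container spec — the mount path and model directory here are assumptions, not values from this repo — showing how `MODEL_PATH` relates to the `llm-models` volume:

```yaml
# Hypothetical worker container excerpt: MODEL_PATH must point inside
# the directory where the llm-models volume is mounted
env:
- name: MODEL_PATH
  value: /llm/models/vicuna-7b-v1.5 # placeholder model directory
volumeMounts:
- name: llm-models
  mountPath: /llm/models # placeholder mount path
```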
@@ -200,6 +221,12 @@ print(completion.choices[0].message.content)
#### cURL
cURL is another good tool for observing the output of the API.
Before using cURL, you should set your `http_proxy` and `https_proxy` to empty:
```bash
export http_proxy=
export https_proxy=
```
For the following examples, you may also change the service deployment address.
List Models:
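A minimal sketch of the list-models request (the host and port are assumptions; substitute your service deployment address):

```shell
# Query the OpenAI-compatible endpoint for the available models
# (localhost:8000 is an assumption; use your service address)
curl http://localhost:8000/v1/models
```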

View file

@@ -1,4 +1,19 @@
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: models-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: models
  resources:
    requests:
      storage: 10Gi
  # selector: # Optional
  #   matchLabels:
  #     app: kubecon-models
---
apiVersion: v1
kind: Pod
metadata:
  name: bigdl-fschat-a1234bd-controller
@@ -105,5 +120,5 @@ spec:
  restartPolicy: "Always"
  volumes:
  - name: llm-models
    persistentVolumeClaim:
      claimName: models-pvc

View file

@@ -0,0 +1,15 @@
apiVersion: v1
kind: PersistentVolume
metadata:
  name: models-pv
  labels:
    app: models
spec:
  capacity:
    storage: 10Gi # Modify according to model size
  accessModes:
    - ReadWriteMany
  storageClassName: models
  nfs:
    path: YOUR_NFS_PATH
    server: YOUR_NFS_SERVER