CLI (Command Line Interface) Tool
=================================
.. note::

   Currently the ``bigdl-llm`` CLI supports the *LLaMA* (e.g., vicuna), *GPT-NeoX* (e.g., redpajama), *BLOOM* (e.g., phoenix) and *GPT2* (e.g., starcoder) model architectures; for other models, you may use the ``transformers``-style or LangChain APIs.
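
For such other models, a minimal sketch of the ``transformers``-style API might look as follows (the model path is a placeholder for a local Hugging Face checkpoint; ``load_in_4bit=True`` applies the INT4 optimizations on load):

.. code-block:: python

   from bigdl.llm.transformers import AutoModelForCausalLM
   from transformers import AutoTokenizer

   model_path = "/path/to/model/"  # placeholder: local Hugging Face checkpoint

   # load the model with BigDL-LLM INT4 optimizations applied
   model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True)
   tokenizer = AutoTokenizer.from_pretrained(model_path)

   inputs = tokenizer("Once upon a time,", return_tensors="pt")
   output = model.generate(**inputs, max_new_tokens=32)
   print(tokenizer.decode(output[0], skip_special_tokens=True))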
Convert Model
-------------
You may convert the downloaded model into native INT4 format using ``llm-convert``:

.. code-block:: bash

   # convert PyTorch (fp16 or fp32) model;
   # llama/bloom/gptneox/starcoder model family is currently supported
   llm-convert "/path/to/model/" --model-format pth --model-family "bloom" --outfile "/path/to/output/"

   # convert GPTQ-4bit model;
   # only llama model family is currently supported
   llm-convert "/path/to/model/" --model-format gptq --model-family "llama" --outfile "/path/to/output/"
Run Model
---------
You may run the converted model using ``llm-cli`` or ``llm-chat`` (built on top of ``main.cpp`` in ``llama.cpp``):

.. code-block:: bash

   # help;
   # llama/bloom/gptneox/starcoder model family is currently supported
   llm-cli -x gptneox -h

   # text completion;
   # llama/bloom/gptneox/starcoder model family is currently supported
   llm-cli -t 16 -x gptneox -m "/path/to/output/model.bin" -p 'Once upon a time,'

   # chat mode;
   # llama/gptneox model family is currently supported
   llm-chat -m "/path/to/output/model.bin" -x llama
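
If you need to invoke the tool from a script, the following is a hedged sketch using only the Python standard library (the binary name and flags follow the CLI usage above; the model path is a placeholder):

.. code-block:: python

   import subprocess

   # run text completion through the llm-cli binary
   result = subprocess.run(
       ["llm-cli", "-t", "16", "-x", "gptneox",
        "-m", "/path/to/output/model.bin",  # placeholder: converted INT4 model
        "-p", "Once upon a time,"],
       capture_output=True, text=True, check=True,
   )
   print(result.stdout)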