## BigDL-Nano Features
| Feature | Meaning | 
|---|---|
| Intel-openmp | Use the Intel OpenMP library to improve the performance of multithreaded programs | 
| Jemalloc | Use jemalloc as the memory allocator | 
| Tcmalloc | Use tcmalloc as the memory allocator | 
| Neural-Compressor | Neural-Compressor int8 quantization | 
| OpenVINO | OpenVINO fp32/bf16/fp16/int8 acceleration on CPU/GPU/VPU | 
| ONNXRuntime | ONNXRuntime fp32/int8 acceleration | 
| CUDA patch | Run CUDA code even without a GPU | 
| JIT | PyTorch JIT optimization | 
| Channel last | Channel last memory format | 
| BF16 | BFloat16 mixed precision training and inference | 
| IPEX | Intel-extension-for-pytorch optimization | 
| Multi-instance | Multi-process training and inference | 
| ray | Use ray as the multi-process backend | 
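
On the PyTorch side, most of the training-oriented features above map to constructor flags on BigDL-Nano's `Trainer`, a drop-in replacement for `pytorch_lightning.Trainer`. The sketch below is a minimal illustration; the flag names (`use_ipex`, `channels_last`, `num_processes`, `distributed_backend`) follow the BigDL-Nano documentation we are aware of and should be treated as assumptions that may differ between releases.

```python
# Minimal sketch: enabling IPEX, channels-last, BF16 and multi-instance training
# through the BigDL-Nano PyTorch Trainer. Flag names are assumptions based on the
# BigDL-Nano docs and may vary across versions.
from bigdl.nano.pytorch import Trainer

trainer = Trainer(
    max_epochs=5,
    use_ipex=True,                     # IPEX: intel-extension-for-pytorch optimization
    channels_last=True,                # channels-last memory format
    precision="bf16",                  # BF16 mixed precision (needs a CPU with BF16 support)
    num_processes=4,                   # multi-instance (multi-process) training
    distributed_backend="subprocess",  # or "ray" to use ray as the backend
)
# trainer.fit(my_lightning_module, my_train_dataloader)  # hypothetical module/dataloader
```
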
### Common Feature Support (can be used in both PyTorch and TensorFlow)
| Feature | Ubuntu (20.04/22.04) | CentOS 7 | macOS (Intel chip) | macOS (M-series chip) | Windows | 
|---|---|---|---|---|---|
| Intel-openmp | ✅ | ✅ | ✅ | ② | ✅ | 
| Jemalloc | ✅ | ✅ | ✅ | ❌ | ❌ | 
| Tcmalloc | ✅ | ❌ | ❌ | ❌ | ❌ | 
| Neural-Compressor | ✅ | ✅ | ❌ | ❌ | ? | 
| OpenVINO | ✅ | ① | ❌ | ❌ | ④ | 
| ONNXRuntime | ✅ | ① | ✅ | ❌ | ✅ | 
| ray | ✅ | ? | ? | ? | ④ | 
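
The Neural-Compressor, ONNXRuntime and OpenVINO features are typically driven through `InferenceOptimizer` in BigDL-Nano's PyTorch package. Below is a minimal sketch of INT8 post-training quantization accelerated with ONNXRuntime; the `accelerator` and `calib_data` argument names are assumptions based on recent BigDL-Nano docs (older releases used `calib_dataloader`), so check them against your installed version.

```python
# Minimal sketch: Neural-Compressor INT8 quantization with ONNXRuntime acceleration.
# Argument names (accelerator, calib_data) are assumptions; verify against your version.
import torch
from torch.utils.data import DataLoader, TensorDataset
from bigdl.nano.pytorch import InferenceOptimizer

model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
calib_data = DataLoader(
    TensorDataset(torch.rand(16, 3, 32, 32), torch.randint(0, 10, (16,))),
    batch_size=4,
)

q_model = InferenceOptimizer.quantize(
    model,
    accelerator="onnxruntime",  # drop this argument to quantize for plain PyTorch
    calib_data=calib_data,      # representative data used for calibration
)

with torch.no_grad():
    out = q_model(torch.rand(1, 3, 32, 32))
```
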
### PyTorch Feature Support
| Feature | Ubuntu (20.04/22.04) | CentOS 7 | macOS (Intel chip) | macOS (M-series chip) | Windows | 
|---|---|---|---|---|---|
| CUDA patch | ✅ | ✅ | ✅ | ? | ✅ | 
| JIT | ✅ | ✅ | ✅ | ? | ✅ | 
| Channel last | ✅ | ✅ | ✅ | ? | ✅ | 
| BF16 | ✅ | ✅ | ⭕ | ⭕ | ✅ | 
| IPEX | ✅ | ✅ | ❌ | ❌ | ❌ | 
| Multi-instance | ✅ | ✅ | ② | ② | ② | 
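
For inference, the JIT, IPEX and channels-last features above can also be combined through `InferenceOptimizer.trace`. The sketch below assumes the `accelerator="jit"`, `use_ipex` and `channels_last` keywords as in the BigDL-Nano docs we know of; verify them against your installed version.

```python
# Minimal sketch: TorchScript JIT + IPEX + channels-last for CPU inference.
# Keyword names are assumptions based on the BigDL-Nano docs.
import torch
from bigdl.nano.pytorch import InferenceOptimizer

model = torch.nn.Sequential(torch.nn.Conv2d(3, 16, 3), torch.nn.ReLU())
sample = torch.rand(1, 3, 224, 224)

opt_model = InferenceOptimizer.trace(
    model,
    accelerator="jit",    # PyTorch JIT optimization ("openvino"/"onnxruntime" where supported)
    use_ipex=True,        # IPEX optimization
    channels_last=True,   # channels-last memory format
    input_sample=sample,
)

with torch.no_grad():
    out = opt_model(sample)
```
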
### TensorFlow Feature Support
| Feature | Ubuntu (20.04/22.04) | CentOS 7 | macOS (Intel chip) | macOS (M-series chip) | Windows | 
|---|---|---|---|---|---|
| BF16 | ✅ | ✅ | ⭕ | ⭕ | ✅ | 
| Multi-instance | ③ | ③ | ②③ | ②③ | ❌ | 
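
For TensorFlow, multi-instance training is exposed through BigDL-Nano's Keras wrappers. The sketch below assumes a `num_processes` argument on `fit()`, as described in the BigDL-Nano docs we are aware of; treat it as an assumption for your installed version.

```python
# Minimal sketch: multi-instance (multi-process) training with BigDL-Nano's Keras
# wrapper. The num_processes argument to fit() is an assumption and may differ
# across versions.
import tensorflow as tf
from bigdl.nano.tf.keras import Sequential

model = Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

x = tf.random.normal((256, 20))
y = tf.random.normal((256, 1))

model.fit(x, y, epochs=2, batch_size=32, num_processes=2)  # train with 2 processes
```
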
### Symbol Meaning
| Symbol | Meaning | 
|---|---|
| ✅ | Supported | 
| ❌ | Not supported | 
| ⭕ | Mac machines (both Intel and M-series chips) do not support the BF16 instruction set, so this feature provides no benefit | 
| ① | This feature is only supported when used together with jemalloc | 
| ② | This feature is supported but without any performance guarantee | 
| ③ | Only Multi-instance training is supported for now | 
| ④ | This feature is only supported when using PyTorch | 
| ? | Not tested |