Is there clear data on how resource requirements during deployment (such as peak GPU memory usage, number of CPU cores needed, and memory I/O frequency) differ across AI model types, for example deep learning versus traditional machine learning models, or image-based versus NLP models? Specifically:

1. What are the concrete ratios of resource "surplus" to "shortage"? For instance, is there detailed data showing that average GPU utilisation for one class of models is only around 30% (surplus), while another class sees deployment latency exceed expectations by 200% because of insufficient GPUs (shortage)?

2. Do these patterns fluctuate over time? Is it the case that resource contention is much more severe during peak periods (such as when multiple models are deployed simultaneously), while resources sit largely idle during off-peak periods?
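For context, this is roughly the kind of measurement I have in mind. Below is a minimal sketch of how such utilisation data could be collected with pynvml (NVIDIA's Python bindings); the device index, sampling interval, and sampling duration are illustrative assumptions, not a definitive benchmarking setup.

    # Minimal sketch: sample GPU utilisation and memory usage over time with pynvml.
    # Assumptions (illustrative only): one NVIDIA GPU at index 0, pynvml installed,
    # a 1-second sampling interval, and a ~1-minute sampling window.
    import time
    import pynvml

    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)

    samples = []
    for _ in range(60):  # sample for roughly one minute
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)  # .gpu is a percentage
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)          # .used / .total in bytes
        samples.append((util.gpu, mem.used))
        time.sleep(1.0)

    pynvml.nvmlShutdown()

    avg_util = sum(u for u, _ in samples) / len(samples)
    peak_mem_gib = max(m for _, m in samples) / 2**30
    print(f"average GPU utilisation: {avg_util:.1f}%  peak memory: {peak_mem_gib:.2f} GiB")

Running this alongside a deployed model would give per-model averages and peaks of the kind referred to above (e.g. the "30% average utilisation" figure); comparing the traces of several models sharing the same GPUs would show whether contention concentrates in peak periods.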
