Confusion matrices and classification reports provide insight in terms of accuracy. How should classification models be evaluated on domain-specific problems, e.g. mechanical systems engineering problems, where we mostly deal with time-series data or its frequency-domain features and want to use classification models for tasks such as anomaly detection and fault classification? Which types of visualization would explain my research more accurately?