Most of the papers I have found compare different hyperparameters of a CNN (number of layers, weight initializers, activation functions, and optimizers) for network intrusion detection. This is useful in practice, but from a scientific point of view I have not found any proof or theoretical justification for the approach.
Network data is not image data by nature; it is converted to an image during preprocessing.
How can we show scientifically that a CNN not only analyzes images properly (data with multiple dimensions such as pixel, domain, phase, and color), but also performs well on network data (which has just one dimension: the features) once it has been converted to an image (with the features depicted as pixels)?
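
To make the conversion concrete, here is a minimal sketch of the kind of mapping I mean. It is only an illustration under stated assumptions: the function name `features_to_image`, the per-record min-max scaling, the zero-padding to a square grid, and the 41-feature example are all hypothetical and not taken from any specific paper.

```python
import math

import numpy as np


def features_to_image(features, eps=1e-12):
    """Map a 1-D feature vector to a square 2-D grayscale "image".

    Assumptions (illustrative only): values are min-max scaled within the
    record, laid out in row-major order, and unused cells are zero-padded.
    """
    x = np.asarray(features, dtype=np.float64).ravel()
    if x.size == 0:
        raise ValueError("feature vector must not be empty")

    # Scale the record to [0, 1]; eps guards against a constant vector.
    lo, hi = x.min(), x.max()
    scaled = (x - lo) / max(hi - lo, eps)

    # Smallest square grid that can hold all features: ceil(sqrt(n)).
    side = math.isqrt(x.size - 1) + 1

    # Zero-pad the tail, then reshape so each feature becomes one pixel.
    padded = np.zeros(side * side, dtype=np.float64)
    padded[: x.size] = scaled
    return padded.reshape(side, side)


# Hypothetical record with 41 numeric features -> 7x7 single-channel image.
record = np.arange(41)
image = features_to_image(record)
print(image.shape)
```

In this sketch, which features end up as neighboring pixels depends only on the column order of the record, which is part of why I am asking for a justification.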