I have WorldView-2 and WorldView-3 imagery (including SWIR bands) of dense urban areas. I want to extract building footprints (2D and 3D) of very complex buildings. For 3D, I also have a DSM.
A multilayer perceptron (MLP) is a class of feedforward artificial neural network consisting of at least three layers of nodes. As input, instead of a single pixel's value, I suggest using an aggregated value computed over a patch, for example the mean of a 100x100-pixel patch. That way you take adjacent pixels, shape, and so on into account. The patch dimension is a function of the image size and of what you want to classify.
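A minimal sketch of the idea, using NumPy to compute per-band patch means on a synthetic multispectral array and scikit-learn's `MLPClassifier` as the network (the image shape, patch size, labels, and `patch_features` helper are all illustrative assumptions, not a definitive pipeline):

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for an 8-band WorldView-style image (H x W x bands)
rng = np.random.default_rng(0)
image = rng.random((200, 200, 8))

def patch_features(img, size=100):
    """Aggregate each non-overlapping size x size patch to its per-band mean."""
    h, w, _ = img.shape
    feats = []
    for i in range(0, h - size + 1, size):
        for j in range(0, w - size + 1, size):
            patch = img[i:i + size, j:j + size, :]
            feats.append(patch.mean(axis=(0, 1)))  # one mean per band
    return np.array(feats)

X = patch_features(image)       # 4 patches, 8 band-mean features each
y = np.array([0, 1, 0, 1])      # dummy labels: building / non-building
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)
clf.fit(X, y)
pred = clf.predict(X)
```

In practice you would replace the mean with richer patch statistics (variance, texture measures, DSM-derived height stats for the 3D case) and label patches from training polygons; the patch size then trades off spatial detail against context, as noted above.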