Some pre-trained CNN models are built to accept an input of shape n*m*3. To match that input shape, a binary image has to be duplicated across the three channels (R, G, B), even though it is just black and white.
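As a minimal sketch of that duplication, assuming the binary image is held in a 2-D NumPy array, the single channel can be stacked three times along a new last axis. The helper name `to_three_channels` and the sample array are hypothetical, chosen only for illustration.

```python
import numpy as np


def to_three_channels(binary_image: np.ndarray) -> np.ndarray:
    """Copy a 2-D (n, m) image into an (n, m, 3) array with identical channels."""
    if binary_image.ndim != 2:
        raise ValueError("expected a 2-D single-channel image")
    # Stacking along axis=-1 puts the three copies in the channel dimension.
    return np.stack([binary_image] * 3, axis=-1)


# Hypothetical 2x2 black-and-white image (0 = black, 255 = white).
binary = np.array([[0, 255], [255, 0]], dtype=np.uint8)
rgb_like = to_three_channels(binary)
print(rgb_like.shape)  # (2, 2, 3)
```

Every channel of the result holds the same values, so the image still looks black and white; only its shape changes to fit the model.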
Image binarization can make the image crisper by separating its light areas from its dark areas. As you might know, every colour can be represented as a combination of the three primary colours red, green and blue (RGB). In a typical 8-bit image, each of the R, G and B values ranges from 0 to 255 for each pixel.
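One common way to binarize, shown here only as a sketch, is to convert each RGB pixel to a single brightness value and compare it with a fixed threshold. The function name `binarize`, the threshold of 128, and the sample pixels are assumptions made for illustration; the weights are the standard ITU-R BT.601 luma coefficients, and other methods (for example an adaptive threshold) may suit a given image better.

```python
import numpy as np


def binarize(rgb: np.ndarray, threshold: int = 128) -> np.ndarray:
    """Turn an (n, m, 3) 8-bit RGB image into an (n, m) image of 0s and 255s."""
    # Weighted sum of R, G and B gives a per-pixel brightness (BT.601 luma).
    weights = np.array([0.299, 0.587, 0.114])
    gray = rgb[..., :3].astype(np.float64) @ weights
    # Pixels at or above the threshold become white, the rest black.
    return np.where(gray >= threshold, 255, 0).astype(np.uint8)


# Hypothetical 2x2 RGB image: white, black, dark red, bright green.
pixels = np.array(
    [[[255, 255, 255], [0, 0, 0]],
     [[200, 30, 30], [40, 220, 40]]],
    dtype=np.uint8,
)
mask = binarize(pixels)
print(mask.tolist())  # [[255, 0], [0, 255]]
```

The dark red pixel falls below the threshold (brightness about 81) and the bright green one above it (about 146), which is how light and dark areas end up on opposite sides.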