I attended a deep learning course yesterday (I'm new to ML and am relatively new to programming in general) and the instructor mentioned that we need to normalise data for input into the deep learning network.

Unfortunately, he didn't go into any real detail about how to do this.

I have Googled and figured out how to normalise numeric data (e.g. sensor readings), but I can't find any solutions for dealing with categorical data.

The instructor gave an example of a NN trying to predict house prices. One feature in such a dataset might be outer wall colour, which obviously isn't a number (to begin with), e.g. it might be brown/grey/cream/red brick/etc.

How do I normalise this sort of data?

Is it simply a case of assigning each category a number, and then normalising that number dataset?

e.g. brown = 1, grey = 2, cream = 3, red brick = 4, and so on?

...and then normalising that 1-4 dataset?
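
To make the idea concrete, here is a minimal sketch of what I have in mind. It assumes that "normalising" means min-max scaling to the range 0 to 1 (the course didn't specify a method). The mapping `colour_to_int`, the helper `min_max_normalise` and the sample `walls` list are all made up for illustration.

```python
# Hypothetical category-to-number mapping, taken from the example above.
colour_to_int = {"brown": 1, "grey": 2, "cream": 3, "red brick": 4}


def min_max_normalise(values):
    """Rescale a list of numbers linearly so min -> 0.0 and max -> 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: avoid division by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Made-up sample column of outer wall colours.
walls = ["brown", "cream", "red brick", "grey", "brown"]

# Step 1: replace each category with its assigned number.
codes = [colour_to_int[w] for w in walls]

# Step 2: normalise the resulting 1-4 values.
normalised = min_max_normalise(codes)

print(codes)
print(normalised)
```

With this scheme brown ends up at 0 and red brick at 1, with grey and cream spaced evenly in between.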

Is it as simple as that or is there more to it?

Thanks in advance for any thoughts.

(I don't specifically need this knowledge for a project right now; it's more a matter of interest.)
