02 February 2021

I am conducting exploratory research on users of the Ethereum blockchain (I obtain the data from BigQuery), and I would like to cluster the users, mostly by transactional features, for persona/archetype development.

However, the data is not normally distributed: many of the variables follow a power-law distribution, and some have no clear distribution pattern. I will very likely include more than five variables.

Besides the question of which algorithm fits best: is it reasonable to transform all variables toward a more normal distribution and then apply a z-transformation?
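
A minimal sketch of the preprocessing being asked about, assuming a log transform is used to pull in the heavy right tail before z-standardizing. The function name `log_then_zscore` and the `tx_counts` values are made up for illustration and are not taken from the actual dataset; whether a log transform is the right choice for every variable is exactly the open question.

```python
import math
import statistics


def log_then_zscore(values):
    """Compress a heavy right tail with log1p, then z-standardize.

    Assumes all values are non-negative (e.g. counts or amounts).
    """
    # log1p(v) = log(1 + v), so zero values are handled without error
    logged = [math.log1p(v) for v in values]
    mean = statistics.fmean(logged)
    std = statistics.pstdev(logged)  # population standard deviation
    if std == 0:
        # A constant feature carries no information for clustering
        return [0.0 for _ in logged]
    return [(x - mean) / std for x in logged]


# Hypothetical per-user transaction counts with a heavy right tail
tx_counts = [1, 2, 2, 3, 5, 8, 40, 1200]
scaled = log_then_zscore(tx_counts)
```

After this step the feature has mean 0 and standard deviation 1 on the log scale, so the single very active user no longer dominates distance calculations to the same degree as in the raw counts.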
