I need to find out whether there is a correlation between the age of buildings and their subsidence. My first idea was to cluster them based on a scatter plot with age on the x-axis and subsidence on the y-axis.

After that, I thought of splitting the buildings into 9 age groups, each containing the individual subsidence values of its buildings.

The problem is that building age does not follow a normal distribution, and my groups are of very different sizes (e.g. 16, 2090, 81, 537, etc.).
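Because age is not normally distributed, one option I am considering before any grouping is a rank-based (Spearman) correlation, which does not assume normality. Below is a minimal sketch of what I mean, using only the Python standard library. The function names and the sample values are made up for illustration and are not my real data.

```python
from statistics import mean

def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical example values (age in years, subsidence in mm/year).
ages = [3, 8, 15, 40, 60, 75, 90, 120]
subsidence = [-0.5, -0.9, -0.8, -1.1, -1.3, -1.2, -1.6, -2.0]
rho = spearman(ages, subsidence)
print(round(rho, 3))
```

A value of rho close to -1 or +1 would suggest a monotonic relationship between age and subsidence; this sketch does not compute a p-value.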

1) Can you recommend a specific clustering method, or any other method, to group the buildings correctly?

2) If I cannot generate groups of similar size, how can I conduct an analysis that shows the average subsidence of each group, so that I can compare it with the other age groups?

The age groups would be in the form of:

"0-10 year old buildings", with an average subsidence of -0.8 mm/year and 81 buildings;

"10-75 year old buildings", with an average subsidence of -1.2 mm/year and 2090 buildings; etc.

Other groups have 200 or 500 buildings, and I expect these different group sizes to affect how reliable each group's average subsidence is.
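To make the comparison I have in mind concrete, here is a rough sketch that bins buildings by age and reports the count, mean subsidence, and standard error of the mean for each bin, so that small groups show up as less certain. The bin edges, the function name `summarize_by_age_group`, and the values are assumptions for illustration only.

```python
from math import sqrt
from statistics import mean, stdev

def summarize_by_age_group(ages, subsidence, edges):
    """Bin buildings into [edges[i], edges[i+1]) and summarize each bin."""
    groups = {}
    for age, s in zip(ages, subsidence):
        for lo, hi in zip(edges[:-1], edges[1:]):
            if lo <= age < hi:
                groups.setdefault((lo, hi), []).append(s)
                break
    summary = {}
    for key in sorted(groups):
        vals = groups[key]
        n = len(vals)
        # The standard error needs at least two values in the bin.
        se = stdev(vals) / sqrt(n) if n > 1 else float("nan")
        summary[key] = {"n": n, "mean": mean(vals), "se": se}
    return summary

# Hypothetical example values (age in years, subsidence in mm/year).
ages = [2, 5, 9, 12, 30, 50, 70, 80, 100]
subsidence = [-0.6, -0.8, -1.0, -1.0, -1.2, -1.4, -1.2, -1.5, -1.9]
edges = [0, 10, 75, 150]  # assumed bin edges, not my real 9 groups

summary = summarize_by_age_group(ages, subsidence, edges)
for (lo, hi), stats in summary.items():
    print(f"{lo}-{hi} years: n={stats['n']}, "
          f"mean={stats['mean']:.2f} mm/year, se={stats['se']:.3f}")
```

Reporting the standard error next to each mean is only one possible way to account for the unequal group sizes; I am not sure it is the most appropriate one.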

Please help
