Hi,
I’m studying the effect of joint (crack) set spacing and persistence on blasted rock size. I have two independent categorical variables (labelled SP for spacing and PER for persistence), each with 5 levels defined by measurement ranges. The dependent variable is the blasted rock size (Xc), i.e. I want to know how the spacing and persistence of the existing joints on a rock face affect the size of the blasted rocks. The measurement levels for both spacing and persistence are listed below.
Spacing levels:
SP1: less than 60 mm; SP2: 60-200 mm; SP3: 200-600 mm; SP4: 0.6-2 m; SP5: more than 2 m
Persistence levels:
PER1: more than 20 m; PER2: 10-20 m; PER3: 3-10 m; PER4: 1-3 m; PER5: less than 1 m
Spacing and persistence were recorded as ranges because they were estimated rather than measured individually; measuring each joint would take too much time. One joint set may contain at least 10 joints, and some contain 50 or more, and the measurements are not exactly the same between joints belonging to the same set. Measurement was done manually on site.
Initially, I ran the regression with these two variables as categorical variables, but the problem is that the levels are not mutually exclusive. One rock slope can consist of 2 or more crack sets, so more than one level of spacing and persistence can be observed on the same slope. As an example, rock face A consists of 3 crack sets:
Set 1 (quantity: 25) SP3 PER5; Set 2 (quantity: 30) SP4 PER6; Set 3 (quantity: 56) SP2 PER3
As can be seen, 1 rock face contains 3 different levels of SP and PER.
Technically, these are ordinal variables and, as explained above, if I treat them as categorical I face the problem of non-mutually exclusive levels. Recently, I found out that an ordinal variable can be treated as continuous, which seems to solve that problem. My main interest is in each variable as a whole, not in its individual levels, so this might be what I need.
My question is: is it correct to assign numerical values to the levels like this in order to treat the variables as continuous? The values run from 1 to 5, from lowest to highest.
Spacing: 1: less than 60 mm; 2: 60-200 mm; 3: 200-600 mm; 4: 0.6-2 m; 5: more than 2 m
Persistence: 1: less than 1 m; 2: 1-3 m; 3: 3-10 m; 4: 10-20 m; 5: more than 20 m
Would I then run the regression as I would with ordinary continuous variables? Also, for prediction, once I have the equation, do I insert the values 1-5 as X? I am still confused about the prediction step, since even if I treat the variables as continuous I would still have the problem of several levels of SP and PER being present on one rock face. Or is there another way around this problem?
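To make the proposed coding concrete, here is a minimal Python sketch of what I mean. The names SPACING_CODE, PERSISTENCE_CODE and encode are just ones I made up for illustration, and I am taking "more than 20m" as code 5. Note that persistence ends up reversed relative to the PER labels (PER1 is the longest range, so it gets code 5). This only shows the recoding step, not the regression itself.

```python
# Hypothetical lookup tables for the 1-5 coding described above.
# Spacing: SP1 (less than 60 mm) is the lowest range, so it maps to 1.
SPACING_CODE = {"SP1": 1, "SP2": 2, "SP3": 3, "SP4": 4, "SP5": 5}
# Persistence: PER1 (more than 20 m) is the highest range, so it maps to 5.
PERSISTENCE_CODE = {"PER1": 5, "PER2": 4, "PER3": 3, "PER4": 2, "PER5": 1}


def encode(sp_label, per_label):
    """Return the (spacing, persistence) codes for one crack set.

    A label outside the five defined levels comes back as None
    so it can be checked by hand.
    """
    return SPACING_CODE.get(sp_label), PERSISTENCE_CODE.get(per_label)


print(encode("SP3", "PER5"))  # (3, 1)
print(encode("SP2", "PER3"))  # (2, 3)
```

The resulting codes are what I would then enter as X for each crack set.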
My second question is: for the rock face A data shown above, is it correct to repeat each data row according to the quantity? That is, I would enter the set 1 data 25 times, the set 2 data 30 times and the set 3 data 56 times.
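In case it helps to see exactly what I mean by repeating the rows, here is a small Python sketch of that data entry. The names ROCK_FACE_A and expand_by_quantity are hypothetical, and the labels are copied as written in my example. I am not claiming this is the right approach; it is only what I am currently doing.

```python
# Crack sets of rock face A, copied from the example above:
# (quantity, spacing label, persistence label)
ROCK_FACE_A = [
    (25, "SP3", "PER5"),
    (30, "SP4", "PER6"),
    (56, "SP2", "PER3"),
]


def expand_by_quantity(sets):
    """Repeat each crack set's (SP, PER) pair once per joint in the set."""
    rows = []
    for quantity, sp_label, per_label in sets:
        rows.extend([(sp_label, per_label)] * quantity)
    return rows


rows = expand_by_quantity(ROCK_FACE_A)
print(len(rows))  # 111 = 25 + 30 + 56
```

So rock face A alone would contribute 111 rows to the regression data set.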
I am very new to statistics and am learning it on my own, so I might be wrong about some things in this field. Any answers, suggestions and advice are very much appreciated. Thank you in advance!