Hello,
I have an R regression problem that has been confusing me for the past two years, and I was hoping to reach out to this community for some guidance. It seems like it should be a simple regression model, but the considerations of the study design have pushed me in many different directions, and it appears to be much more complex than I was initially anticipating.
Here is a summary of the study:
I want to find the effect of ceiling insulation on indoor room temperature. For my experiment, I looked at four different rooms in the same building. Two rooms had ceiling insulation, and two rooms did not have ceiling insulation. I placed a single temperature sensor in each room that took a temperature reading in Fahrenheit once a day. The temperature sensors were not synchronized at the same minute, but they were all initiated on the same day. My hypothesis is that during the summer, having ceiling insulation in a room will lead to cooler overall indoor room temperature than if the room had no ceiling insulation. My research question is thus: Did the two rooms with ceiling insulation have an overall cooler indoor temperature than the two rooms without ceiling insulation? Or more broadly: from this experiment, can we conclude that ceiling insulation causes indoor temperatures to be cooler during the summer?
In my attached example data csv file: Date is the date of each temperature reading. Room is one of the four rooms I am looking at, where IA: Insulation A (room with insulation A), IB: Insulation B (room with insulation B), NIA: No Insulation A (room with no insulation A), NIB: No Insulation B (room with no insulation B). Insulation: a binary where 1 = yes insulation and 2 = no insulation. Temperature.F: temperature reading in Fahrenheit at the corresponding date. (Please note that this is just an example file to show the structure of the data I am working with. The temperature values here are random, so you would of course not find any trends or actual meaningful results.)
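In case the attachment does not come through, here is a minimal sketch that builds a data frame with the same four columns. The date range, the 30-day length, and the mean and spread of the random temperatures are placeholders I made up for illustration; only the column names and the room and insulation coding follow my real file.

```r
# Hypothetical stand-in for the attached CSV (values are random on purpose).
set.seed(1)

dates <- seq(as.Date("2021-06-01"), by = "day", length.out = 30)  # assumed dates
rooms <- c("IA", "IB", "NIA", "NIB")

my_data <- data.frame(
  Date = rep(dates, times = length(rooms)),
  Room = factor(rep(rooms, each = length(dates)), levels = rooms)
)

# Coding as described above: 1 = insulation, 2 = no insulation
my_data$Insulation <- factor(ifelse(my_data$Room %in% c("IA", "IB"), 1, 2))

# Random temperatures in Fahrenheit, so no real trend should appear
my_data$Temperature.F <- round(rnorm(nrow(my_data), mean = 75, sd = 3), 1)

str(my_data)
```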
I have really been struggling with how to construct a linear model of this in R, primarily with how to include the appropriate random effect. I feel that I need a random effect in the model to acknowledge that the four rooms in the studied building are different, each supporting a unique purpose, so each room would have different wall construction, air flow qualities, windows, appliances, and occupancy densities, thus contributing to indoor temperature. Of course, my data does not have all this detail. As displayed in my attached example data, all I have available to work with is a daily temperature reading for each room. For this reason, I plan on adding a random effect to the “room” parameter to account for the unique room features and characteristics I do not have data for.
This is what I have so far:
lmer(Temperature.F ~ Room*Insulation + Room*Date + (1 + Insulation | Room), correlation = CorAR1(form = ~date | room), data = my_data)
However, when I run this model, I get the following error message:
"Error: Dropping columns failed to produce full column rank design matrix"
I don’t really understand what this means, but one problem I suspect is with the data I have, where the “Room” parameter and “Insulation” parameter are in a way really the same thing, since in my data, the only thing that defines each room is the presence of insulation or not. With that being the case, there would be no difference between IA and IB or between NIA and NIB. So it is almost as if IA and IB are one room and NIA and NIB are one room. It was suggested to me that, for this reason, I should just get rid of the “Insulation” binary column and change the room column so that I would only have two rooms: I (insulation) and NI (no insulation). This would be saying that all of the temperature readings in each group occurred in the same room, which is not true! I feel I need to acknowledge that the four rooms I used for the study were in fact unique, and that is why I felt I should use a random effect for the “Room” parameter in my model, to account for their uniqueness. I am not sure if the R code I tried out is on the right track or not.
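To illustrate the overlap I am describing, here is a small check on a toy data frame with the same room and insulation coding as my file (the five rows per room are just a placeholder). It compares the number of fixed-effect columns that Room*Insulation asks for with the number that are linearly independent, which I assume is related to the rank error.

```r
# Toy data with the same Room/Insulation coding as the real file
rooms <- c("IA", "IB", "NIA", "NIB")
toy <- data.frame(Room = factor(rep(rooms, each = 5), levels = rooms))
toy$Insulation <- factor(ifelse(toy$Room %in% c("IA", "IB"), 1, 2))

# Every room sits in exactly one insulation group,
# so Insulation is fully determined by Room
with(toy, table(Room, Insulation))

# Design matrix for the Room * Insulation part of the fixed effects
X <- model.matrix(~ Room * Insulation, data = toy)

n_columns <- ncol(X)     # 8 columns requested
n_rank    <- qr(X)$rank  # only 4 are linearly independent
c(columns = n_columns, rank = n_rank)
```

Since knowing the room already tells you the insulation status, the Insulation and interaction columns add nothing beyond the four room means.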
I feel that I need an interaction between Room and Insulation, because in the building, the Room is connected to the ceiling of course! And then I want to add a random effect for room with unique slopes for the insulation. Does this make sense? Also, how would I best incorporate the Date parameter? I think I should account for autocorrelation since, to my understanding, I am working with time series data here. From what I found, I should be using the corAR1 correlation structure for this, but I am not sure.
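For comparison, here is one alternative I have been sketching. As far as I can tell, corAR1 comes from the nlme package rather than lme4, so this version uses nlme::lme with a random intercept per room and an AR(1) correlation within each room. It is only a sketch under assumptions: the data are simulated, Day is a hypothetical integer day index that I would derive from Date, and the room offsets and AR coefficient are invented. I dropped the random slope for Insulation here because insulation never changes within a room, and I realize four rooms is very few for estimating a room-level variance.

```r
library(nlme)  # provides lme() and corAR1()

set.seed(42)
n_days <- 60
rooms <- c("IA", "IB", "NIA", "NIB")

sim <- data.frame(
  Day  = rep(seq_len(n_days), times = length(rooms)),  # integer time index
  Room = factor(rep(rooms, each = n_days), levels = rooms)
)
sim$Insulation <- factor(ifelse(sim$Room %in% c("IA", "IB"), "yes", "no"))

# Invented room baselines plus AR(1) noise within each room
room_mean <- c(IA = 74, IB = 75, NIA = 78, NIB = 79)
ar_noise <- unlist(lapply(rooms, function(r) {
  as.numeric(stats::filter(rnorm(n_days), 0.5, method = "recursive"))
}))
sim$Temperature.F <- room_mean[as.character(sim$Room)] + ar_noise

# Fixed effects: insulation and a linear day trend
# Random effect: intercept per room
# Correlation: AR(1) over days within each room
fit <- lme(
  Temperature.F ~ Insulation + Day,
  random = ~ 1 | Room,
  correlation = corAR1(form = ~ Day | Room),
  data = sim
)

summary(fit)
```

I do not know if this is the right structure for my research question, so I would welcome corrections.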
I would really appreciate any input on this challenge and I was hoping I might be able to receive some R code ideas that might better suit the research objective I am going for here. I know this seems like a simple data set, but in a way, it is this simplicity that has led to all the confusion for me.
Thank you so much!