I'm trying to build a multiple regression that explains the final price of electric energy in Brasil. Any suggestions about which independent variables to use?
First, I think you need to distinguish economic end-use sectors, at least by residential and nonresidential end-users; perhaps residential, industrial and 'other.'
I worked with official statistics at the US Energy Information Administration (EIA), but not really from the economic point of view in my work, only from the standpoint of estimating the current revenues, sales, and thus prices, for the current finite population, based on sampled data, and related data such as a previous census. If you want to specifically account for the supply and demand influences which determine a given price, that sounds more complex, and data may be harder to obtain.
Price is the ratio of revenue to sales volume. Might you need to estimate each, with covariances? If you are looking at sampled data from a finite population for a period, and estimating the growth rate from a previous period, then revenue and sales can each have their own separate simple ratio estimate, though covariance then becomes a consideration when you look at the variance of price. Because the impacts of rainfall on hydroelectric plants, fuel costs on other plants, etc., are 'built in' to the sample data collected, they should also be 'built in' to the predicted data (here "prediction" means estimating the unsampled part of the current finite population from simple regressions, not a forecast). With the sampled and predicted data for the current finite population, you can estimate total revenues and total sales volume, and divide.
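To illustrate why the covariance matters for the variance of price, here is a minimal sketch using the standard first-order (delta-method) approximation for the variance of a ratio. The function name and all numbers are hypothetical, chosen only for illustration; this is not the estimator actually used at the EIA.

```python
def price_and_variance(revenue, sales, var_revenue, var_sales, cov_rev_sales):
    """Estimate price = revenue / sales and its approximate variance.

    First-order (delta-method) approximation for a ratio R/S:
        Var(R/S) ~ [Var(R) - 2*P*Cov(R, S) + P^2 * Var(S)] / S^2,  P = R/S
    Inputs are estimated totals and their (co)variances.
    """
    price = revenue / sales
    var_price = (var_revenue
                 - 2.0 * price * cov_rev_sales
                 + price ** 2 * var_sales) / sales ** 2
    return price, var_price


# Hypothetical estimated totals and (co)variances, illustration only.
price, var_with_cov = price_and_variance(1000.0, 50.0, 400.0, 1.0, 15.0)
_, var_no_cov = price_and_variance(1000.0, 50.0, 400.0, 1.0, 0.0)
print(price, var_with_cov, var_no_cov)
```

Because revenue and sales are usually positively correlated, ignoring the covariance term would typically overstate the variance of the price estimate.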
If you are looking at a time series, then seasonality is important. For that, rainfall (impacting hydroelectric generation) and fuel cost fluctuations seem largely 'built in' again, but a forecast can be perturbed by unusual changes in current fuel prices, rainfall, etc.
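As a rough illustration of the seasonality point, the sketch below removes a simple additive seasonal pattern from a price series, so that whatever remains can be inspected for unusual shocks. It assumes a fixed seasonal period and an additive effect, which may well be too simple for real tariff data; the function names and the quarterly prices are hypothetical.

```python
def seasonal_effects(values, period=12):
    """Mean deviation of each season from the overall mean (additive model).

    Assumes len(values) >= period so every season has at least one value.
    """
    overall = sum(values) / len(values)
    effects = []
    for season in range(period):
        season_values = values[season::period]  # every period-th observation
        effects.append(sum(season_values) / len(season_values) - overall)
    return effects


def deseasonalize(values, period=12):
    """Subtract the estimated seasonal effect from each observation."""
    effects = seasonal_effects(values, period)
    return [v - effects[i % period] for i, v in enumerate(values)]


# Hypothetical quarterly prices (two years), illustration only.
prices = [10.0, 12.0, 14.0, 12.0, 10.0, 12.0, 14.0, 12.0]
print(seasonal_effects(prices, period=4))
print(deseasonalize(prices, period=4))
```

A regression with monthly dummy variables would do essentially the same job inside a multiple regression.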
Since prices are impacted by both supply and demand, it seems you have to consider each for your question. It seems to me that if you have a time series, the supply variables would probably be more volatile than the demand variables, and could cause breaks in the series, as I would imagine demand to persist rather than change abruptly.
Though I worked with electric power data at the EIA, I think your question may be rather far outside of my main expertise. The above is just my 'guesswork' here. However, the EIA website may give you some help, and they do take questions, or did when I worked there, so you may want to see what you can find out there, or they may be able to refer you to one of the US Department of Energy laboratories, such as Oak Ridge. Unfortunately the current administration - if I can call it that - may already be making it hard for the dedicated US civil servants, who have gladly provided information to everyone worldwide in the past, to provide information now. Best if you ask soon. The situation may get worse ... before it hopefully gets better, post this administration.
To create a multiple regression that explains the final price of electric energy in Brasil, you could use either the 'input factor costs' or the 'demand-supply effects' to model the price. But input factors will differ across types of energy production, and will differ even within the same type of production, as input factors are source specific. To put it simply, input factor costs for thermal, hydroelectric, wind, nuclear, etc. power will differ, and even for the same type of energy, e.g. thermal, the cost of fuel may differ from plant to plant depending on the availability of coal and the distance from coal mining sites. All this suggests that a regression model based on input costs would apply to a specific type of energy produced at a specific plant. To generalise the price model for the country, you may have to opt for the demand-supply relationship for energy in Brasil. Anyway, a nice topic chosen. Best wishes for your research.
Before you run a regression, you should know that there is a theory of the equilibrium of demand and supply. So you should carry out the regression in 3 steps: price vs. demand, price vs. supply, and finally the equilibrium between the predicted future demand and the predicted future supply.
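The three steps above can be sketched roughly as follows: fit a straight line for demand, fit one for supply, then solve for the price where the two predicted quantities are equal. This is only a toy illustration with made-up numbers. It assumes separate demand and supply schedules are available and linear; with real market data, observed price-quantity pairs are themselves equilibrium points, so fitting the two curves separately by ordinary least squares is generally biased and some identification strategy would be needed.

```python
def fit_line(x, y):
    """Simple least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def equilibrium(demand_fit, supply_fit):
    """Price and quantity where the fitted demand and supply lines cross."""
    a_d, b_d = demand_fit
    a_s, b_s = supply_fit
    price = (a_d - a_s) / (b_s - b_d)
    quantity = a_d + b_d * price
    return price, quantity


# Hypothetical schedules: quantity demanded / supplied at each price.
prices = [1.0, 2.0, 3.0, 4.0]
demand_qty = [90.0, 80.0, 70.0, 60.0]   # falls as price rises
supply_qty = [30.0, 50.0, 70.0, 90.0]   # rises as price rises

demand_fit = fit_line(prices, demand_qty)   # step 1: price vs demand
supply_fit = fit_line(prices, supply_qty)   # step 2: price vs supply
print(equilibrium(demand_fit, supply_fit))  # step 3: equilibrium
```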
I would find the consumption of fuel oil used to generate electricity. Then I would find the prices of that fuel oil. Finally, I would find the kWh produced by the generators. All of that for the same period of time: daily, weekly, monthly or yearly. That's all.
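Put together, those three quantities give a fuel cost per kWh for each period, which could then serve as one candidate explanatory variable. A minimal sketch, with a hypothetical function name and made-up monthly figures:

```python
def fuel_cost_per_kwh(fuel_litres, price_per_litre, kwh_generated):
    """Fuel cost per kWh for one period.

    All three inputs must refer to the same period (day, week, month or year).
    """
    if kwh_generated <= 0:
        raise ValueError("kwh_generated must be positive")
    return fuel_litres * price_per_litre / kwh_generated


# Hypothetical monthly records: (litres burned, price per litre, kWh generated).
months = [
    (1000.0, 0.80, 4000.0),
    (1200.0, 0.90, 4500.0),
]
costs = [fuel_cost_per_kwh(*m) for m in months]
print(costs)
```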
Using a multivariate analysis, I would try to classify the variables into several groups. These could be the structure of the energy matrix (e.g. the proportion of renewable and conventional sources, technology alternatives or substitutes), policy design (several instruments such as subsidies, tax incentives, tariffs, environmental goals and so on) and market policy drivers (net energy imports, available resources, etc.).
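As a rough sketch of how one variable from each group could enter a multiple regression, the example below fits ordinary least squares by solving the normal equations in plain Python. The three predictors (renewable share, a subsidy dummy, net imports), the "true" coefficients, and the data are all invented for illustration; in practice you would more likely use a statistics package, and the real variables would come from Brazilian data.

```python
def ols(X, y):
    """Ordinary least squares with an intercept, via the normal equations.

    X is a list of rows of predictors; returns [intercept, b1, b2, ...].
    """
    rows = [[1.0] + list(r) for r in X]          # prepend intercept column
    k = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    # Gaussian elimination with partial pivoting on A * beta = b.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, k):
            factor = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    beta = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(A[i][j] * beta[j] for j in range(i + 1, k))
        beta[i] = (b[i] - tail) / A[i][i]
    return beta


# Hypothetical columns: renewable share, subsidy dummy, net energy imports.
X = [[0.8, 1, 5], [0.6, 0, 10], [0.7, 1, 2],
     [0.5, 0, 8], [0.9, 1, 12], [0.4, 0, 3]]
# Synthetic prices generated from made-up coefficients (no noise).
true_beta = [100.0, -50.0, -5.0, 2.0]
y = [true_beta[0] + sum(b * x for b, x in zip(true_beta[1:], row)) for row in X]

print(ols(X, y))  # should recover approximately the made-up coefficients
```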
Currently, I am dealing with a similar research question regarding offshore wind (the dependent variable is different from the one in your question), and I am trying to capture these different dimensions.