I'm currently writing my Master's thesis, and after an extensive literature study I've finally found my topic. The wind power company I'm doing my thesis at has accumulated a large amount of data over the years they've had their turbines up and running. Almost 600 parameters are logged every 5 minutes and stored in a database that I have access to.

Their initial wish was for me to find a way to optimise their turbines using this data, but I realised pretty quickly that real-time data and extensive on-site measurements are needed for a conventional optimisation. On top of that, it didn't seem like a very academic study, but more a way to avoid hiring a consultant.

At first I thought of a lot of different ways to give my study a new take on the matter, but the more articles I read, the more I realised that there have been a lot of studies in this field, and my ideas had already been researched. While reading all those articles, I noticed that all groups (as far as I could see) used only the mean values of the parameters in their models.

Most of the groups used the ANN only to find a better fit of the wind turbine's power curve, which is defined as the power output as a function of wind speed. But since the power output depends on many more variables (albeit none as much as wind speed), the power curve isn't as smooth as turbine makers show in their specifications, but rather highly variable. Another contributing factor is that averaged values are used (usually over 5 or 10 minutes). A lot of the groups put much effort into trimming outliers from the data, finding a narrower fit to the curve.
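
For reference, the physics behind the curve is the standard aerodynamic relation

P = (1/2) * rho * A * Cp(lambda, beta) * v^3

where rho is the air density, A the rotor swept area, Cp the power coefficient (itself a function of the tip-speed ratio lambda and pitch angle beta), and v the wind speed. The cubic dependence on v is why wind speed dominates, while temperature (through rho), pitch and rotor speed enter through the other factors.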

My take is that if I use the min, max and standard deviation values found in the data, in addition to the averages, I will get a better fit and be able to better predict power output. What I've found so far is that my model can predict the samples that other groups discarded as outliers, which means I only have to discard points where production is zero. I will also use more parameters that are coupled to the production, and my hope is to end up with a very sensitive model that can find deviations from normal production.

Now that you've got a picture of the project I have in mind, hopefully I can get some help on the architecture of the neural network, since I'm new to the concept.

I've got pretty good Matlab skills, so that's where I'll be doing my work. So far I've only used the standard nftool GUI to construct my ANNs, and have tried to figure out how best to choose the number of hidden neurons. To begin with I have a rather small dataset of about 800 samples, to keep the computation time down as long as I'm bound to a slow computer.
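
In case it helps anyone answering: as I understand it, everything nftool does can also be scripted with fitnet/train from the Neural Network Toolbox, which would make my experiments easier to repeat. A minimal sketch, where X (a 17 x N input matrix, one column per 5-min sample) and T (a 1 x N vector of average power) are placeholders for my data:

% X: 17 x N input matrix, T: 1 x N target vector
net = fitnet(10);              % one hidden layer with 10 neurons
net.trainFcn = 'trainlm';      % Levenberg-Marquardt ('trainbr' for Bayesian Regularisation)
[net, tr] = train(net, X, T);  % trains with the default random 70/15/15 split
Y = net(X);
testMSE = perform(net, T(tr.testInd), Y(tr.testInd));  % error on the held-out test set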

My thought is to use the first year of data from a turbine to train my model; this is to ensure I have data points from the whole operating range and also get the seasonal changes in there. I've tried to pick inputs that I know affect the power output and are not directly coupled to other inputs. I've used average values for parameters that do not fluctuate greatly over 5 minutes, and min/max/avg/std for parameters with large fluctuations.
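
Since the samples form a time series, my thinking is that the train/validation/test split should respect time order rather than be random, so that testing happens on "future" data. With the net from the sketch above, that would look something like this (the ratios are just an example, and the columns of X are assumed to be in chronological order):

net.divideFcn = 'divideblock';       % contiguous blocks instead of random picks
net.divideParam.trainRatio = 0.70;   % earliest samples for training
net.divideParam.valRatio   = 0.15;
net.divideParam.testRatio  = 0.15;   % latest samples held out for testing
[net, tr] = train(net, X, T);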

My inputs are (see the matrix-assembly sketch after the list):

Grid phase angle (average) - generator

Grid frequency (average) - electrodynamical torque

Pitch angle blade 1 (average) - Coefficient of power 

Pitch angle blade 2 (average)

Pitch angle blade 3 (average) 

Rotor RPM (average) - Coefficient of power, inertia

Generator torque (average) - mechanical torque

Generator torque (min) 

Generator torque (max) 

Generator torque (standard deviation)

Nacelle direction (average) - performance when compared to wind direction

Wind direction (average) - different wind shear and turbulence patterns

Outdoor temp (average) - air density -> power output

Wind speed (average) - power output

Wind speed (min)

Wind speed (max)

Wind speed (Standard deviation)

Output:

Power output (average)
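
For concreteness, this is roughly how I pull the 17 inputs and the target out of a database export; the file and column names below are made-up placeholders for whatever the actual export uses:

D = readtable('turbine_year1.csv');   % 5-min SCADA statistics (placeholder file name)
X = [D.PhaseAngleAvg, D.GridFreqAvg, ...
     D.Pitch1Avg, D.Pitch2Avg, D.Pitch3Avg, D.RotorRpmAvg, ...
     D.GenTorqueAvg, D.GenTorqueMin, D.GenTorqueMax, D.GenTorqueStd, ...
     D.NacelleDirAvg, D.WindDirAvg, D.OutdoorTempAvg, ...
     D.WindSpeedAvg, D.WindSpeedMin, D.WindSpeedMax, D.WindSpeedStd]';  % 17 x N
T = D.PowerAvg';                      % 1 x N average power output
keep = T > 0;                         % discard zero-production points, as described above
X = X(:, keep);
T = T(keep);

One thing I'm unsure about: nacelle and wind direction are circular quantities (they wrap at 360 degrees), so perhaps encoding them as sin/cos pairs would be kinder to the network.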

So, by using this ANN model, I hope to detect under-production from the turbine. As an extension, I'd like to use ANN pattern recognition to classify certain common faults, but that's a whole other story.
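
My rough idea for the detection step is to monitor the residual between measured and predicted power and flag large negative deviations; a sketch, where the 3-sigma threshold is just a placeholder I'd need to tune:

Ypred = net(Xnew);                                      % predictions for new 5-min samples
res   = Tnew - Ypred;                                   % negative = producing less than the model expects
sigma = std(T(tr.trainInd) - net(X(:, tr.trainInd)));   % residual spread on the training data
alarm = res < -3*sigma;                                 % candidate under-production samples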

Questions and thoughts:

Do you think that this model would benefit from more than one hidden layer? 

Is nftool the best tool to use here? (in Matlab)

Any thoughts on the number of hidden neurons?

I've tried both L-M and Bayesian Regularisation as training algorithms; is there a better one for this kind of problem?

I did a test trying 1 to 20 hidden neurons with both L-M and BR and picked out the ones I thought were best (computation time not considered), but I'm not quite sure what to look for in the performance plot. Is the behaviour the plots show what I should look for, i.e. the training and test curves staying close to each other? (file attached)
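
If I move away from the GUI, I imagine the sweep could be scripted like this (building on the X, T and divideblock setup above); I'd be glad to hear whether comparing held-out test MSE like this is the right criterion:

algs = {'trainlm', 'trainbr'};
mseTest = zeros(20, numel(algs));
for a = 1:numel(algs)
    for h = 1:20
        net = fitnet(h, algs{a});
        net.divideFcn = 'divideblock';
        net.trainParam.showWindow = false;    % no GUI pop-up during the sweep
        [net, tr] = train(net, X, T);
        Y = net(X);
        mseTest(h, a) = perform(net, T(tr.testInd), Y(tr.testInd));
    end
end
plot(1:20, mseTest); legend(algs);
xlabel('hidden neurons'); ylabel('test MSE');

(As I understand it, trainbr disables validation stopping by default, but the test block is still held out. Results also vary with the random initial weights, so each configuration should probably be trained a few times and averaged.)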

Sorry for the long post, but I'm short on feedback and would greatly appreciate any kind of help!

Best regards,

Daniel
