I have a number of parameters in my proposed method in the field of cascade classification. With different parameter settings I obtained different accuracies. I want to find the best parameter setting with genetic algorithms. Does anyone know how I can do that?
Any other suggestions other than using genetic algorithms?
Looking forward to your generous help.
You may use statistical methods like the orthogonal array method for selecting the most appropriate combination.
Please follow the link.
http://www.sciencedirect.com/science/article/pii/S1568494612001846
Thanks
Your problem falls into the category of optimisation problems with a single objective function. In your case the objective function is the quality of the cascade classification process, and your parameters represent a multidimensional search space. There are a number of heuristics that you could employ. Have a look here for some pointers: http://en.wikipedia.org/wiki/Mathematical_optimization
Genetic algorithms are one of the methods you could use. You need to encode your parameters as a genome. You would use each individual as input (a point in parameter space) to your classifier and calculate its fitness.
Hope that helps,
Mariusz
Mariusz is right. You need to encode your parameters as chromosomes and use the classification result as the fitness function (the higher the accuracy, the fitter the individual, i.e. the better the combined parameter set).
But before you use a GA to find an optimised set of combined parameters, you need to think about the GA's own parameter settings to be used to find the optimal solution to your problem...
You don't necessarily need a GA for your problem. Try Weka; it may give you some ideas.
Thanks a ton for your instant and valuable answers, guys.
@Dong Tong: How can I use Weka for this purpose?
Dear Mahnaz, you should use an Evolution Strategy for your problem.
In the chromosome you should double the length: one position for each of your parameters and one position for the weight of that parameter.
My understanding of genetic algorithms is basic, so here is a basic answer.
- Code your parameters into a vector (genome).
- Initialize a number of vectors with random or adequate parameters (genes), these can and should in most cases be allowed to have varying lengths.
- Create rules for mutations (where, how and how frequently) and for combination. These depend greatly on how you encode your parameters into "genes". Since you are using cascade classifiers, a gene can be a global parameter, a local parameter or a full parameter set for a classifier.
- Set your objective function as the fitness function.
- Run the optimizer, allowing the replication of vectors with high fitness and killing those with low fitness (similar to particle filter resampling), then apply mutations to your vectors (the population) and perform permutations of parts of the genetic code between genomes (this needs some rules to keep the results valid...).
This is my dumbed-down understanding of genetic algorithms. I am certainly wrong about some of those steps, but I hope it helps.
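For what it's worth, here is a rough Python sketch of what such an encoding and the mutation/crossover rules might look like. The parameter names, ranges and cascade structure are invented purely for illustration:

```python
import random

# Hypothetical search space: each stage of the cascade has two local parameters,
# plus one global parameter shared by all stages. Ranges are made up for illustration.
GLOBAL_RANGE = (0.0, 1.0)              # e.g. an overall rejection threshold
STAGE_RANGES = [(1, 10), (0.0, 1.0)]   # e.g. (num_features, stage_threshold)

def random_genome(n_stages):
    """A genome = [global_param, [stage params], [stage params], ...]; length may vary."""
    genome = [random.uniform(*GLOBAL_RANGE)]
    for _ in range(n_stages):
        genome.append([random.randint(*STAGE_RANGES[0]),
                       random.uniform(*STAGE_RANGES[1])])
    return genome

def mutate(genome, rate=0.2):
    """Perturb each gene with probability `rate`, respecting its allowed range."""
    g = [genome[0]] + [stage[:] for stage in genome[1:]]
    if random.random() < rate:
        g[0] = min(max(g[0] + random.gauss(0, 0.1), GLOBAL_RANGE[0]), GLOBAL_RANGE[1])
    for stage in g[1:]:
        if random.random() < rate:
            stage[0] = random.randint(*STAGE_RANGES[0])
        if random.random() < rate:
            stage[1] = min(max(stage[1] + random.gauss(0, 0.1), 0.0), 1.0)
    return g

def crossover(a, b):
    """Swap whole stages between two genomes (keeps every child valid)."""
    cut = random.randint(1, min(len(a), len(b)) - 1)
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

# Example usage with genomes of different lengths
parent_a, parent_b = random_genome(3), random_genome(5)
child_a, child_b = crossover(mutate(parent_a), mutate(parent_b))
```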
Alternatives to GAs are more recent population-based algorithms like PSO, DE and ACO;
see e.g. the review in "Comparison of Evolutionary and Swarm Intelligence Methods for History Matching and Uncertainty Quantification in Petroleum Reservoir Models"
Yasin Hajizadeh, Vasily Demyanov, Linah Mohamed, Mike Christie
07/2011; DOI:10.1007/978-3-642-21705-0_8
https://www.researchgate.net/publication/225936905_Comparison_of_Evolutionary_and_Swarm_Intelligence_Methods_for_History_Matching_and_Uncertainty_Quantification_in_Petroleum_Reservoir_Models?ev=prf_pub
Various codes are available for any of these.
@Those who agree on using GA: there remain many ambiguities in my mind. Let me ask my question with the help of some examples.
Suppose that one of my parameter settings and the obtained accuracy is as follows:
par1 = 3
par2 = 0.9
par3 = 0.005
par4 = 0.8
par5 = 0.5
Accuracy = 95%
Assume that I have 15 additional results as mentioned above, with some variations.
Up to now my question is: does it make sense to run the GA with a population of size 15?
In the next step, if I decide to treat the parameters as genomes and build chromosomes, should I first multiply them by 1000 and then convert them to binary form? Then, for the fitness of a chromosome, how can I relate the parameters to the accuracy? This is the main problem I face. Please let me know if I missed the point you mentioned or if there is any misunderstanding.
Thanks.
Hi Mahnaz,
maybe I have misinterpreted your question.
I thought you wanted to know which combination of your parameter settings is the best. From the example you gave above, you have already tested 15 combinations of parameter settings and obtained the accuracy for each set, so to me you already have the answer to your problem: the set with the highest accuracy is the best combined set of settings!
If you want to test more possible combined sets, then you can use PCA in Weka to plot your correlation matrix and find which combined set is the best. The easiest way I can think of is:
first, put your data values in binary form (each field represents a different value of a parameter setting; 1 means you used that setting, 0 means it was off)
sample para1-0.25 para1-0.5 para2-0.1 para2-0.2 para3-1.0 para3-2.0
case1 1 0 1 0 0 1
case2 1 0 1 0 0 1
case3 0 1 1 0 0 1
case4 0 1 1 0 1 0
case5 0 1 1 0 1 0
case6 0 1 1 0 1 0
case7 0 1 1 0 1 0
case8 0 1 0 1 0 0
case9 0 1 0 1 1 0
case10 0 1 0 1 0 0
then, run WEKA --> select attribute --> Principal Components
finally, get correlation matrix.
hope that helps.
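For anyone who prefers scripting over the Weka GUI, a rough Python equivalent of the binary encoding plus PCA/correlation step might look like this (scikit-learn is assumed, and the encoded rows are a made-up excerpt of the table above):

```python
import numpy as np
from sklearn.decomposition import PCA

# Binary-encoded parameter settings (same idea as the table above): one column per
# possible parameter value, 1 = that value was used in the run, 0 = it was not.
X = np.array([
    [1, 0, 1, 0, 0, 1],   # case1
    [0, 1, 1, 0, 0, 1],   # case3
    [0, 1, 1, 0, 1, 0],   # case4
    [0, 1, 0, 1, 0, 0],   # case8
])

# Correlation matrix between the encoded parameter columns
corr = np.corrcoef(X, rowvar=False)

# Principal components of the settings (the same idea as Weka's
# "Select attributes -> Principal Components")
pca = PCA(n_components=2)
scores = pca.fit_transform(X)
print(corr)
print(pca.explained_variance_ratio_)
```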
@Dong Tong: Absolutely, you are right. I have the best accuracy among the 15 experimental results. But I think there exist better parameter settings which I didn't test, am I right? I'm looking for those settings. Now suppose that each parameter setting is used for 16 cases, I mean that we apply each parameter setting to 16 data sets simultaneously. So we have 16 outputs (accuracies) from each experiment. In this case we again have 15 distinct parameter settings, each giving 16 outputs. Will this condition change our strategy for finding the best setting? It's worth noting that there is an acceptable balance between the accuracy variations across each setting's outputs.
1) You can combine genetic programming with GA.
2) GA can do simultaneous feature selection and classification/clustering.
Mahnaz,
Methods such as GA, PSO, DE, etc. will not be useful in your case. These are optimization methods that require prior knowledge of an objective function (OF). Then, for any given combination of the parameters, they would use the OF to CALCULATE how good or bad the combination is.
My understanding is that, in your case, you are NOT able to provide an OF. You have some experimental results and you want to know what would be a good combination of parameters that would be worth testing experimentally and that would lead to an improvement.
If my assumption is correct then your problem rather belongs to the category of DOE problems (or Design of Experiments).
I suggest you do a Google search (with keywords such as DOE methods, Design of Experiments methods) to get basic information on this subject.
Hope this helps.
If you have already found sets of parameters with their accuracies, and you assume that they are already good solutions and that the optimal solution somehow lies among them, you can cast your problem like this: each of your parameter sets is a point in a 5D space with an accuracy value. Together they describe a subspace (manifold) on which your optimal solution lies. Searching for the optimal solution is then equivalent to searching within this manifold for a point that maximizes the accuracy. You can then use any of the available algorithms to perform the search on the manifold. This is basically a constrained optimization problem, with the solution constrained to be on the manifold. You can use GA, PSO, or any similar method, or formulate the problem using graph-based methods and solve for the optimal point. The assumption in my suggestion is that the manifold is smooth and interpolation allows one to find the optimal solution.
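As an illustration only (not a definitive implementation of this idea), here is a sketch of such a smooth surrogate built with SciPy's RBFInterpolator (SciPy >= 1.7 assumed), with random placeholder data standing in for the real 15 results:

```python
import numpy as np
from scipy.interpolate import RBFInterpolator
from scipy.optimize import minimize

# Placeholder data: the 15 tested parameter settings (5 parameters each) and
# their measured accuracies; these random values stand in for the real table.
rng = np.random.default_rng(0)
X = rng.uniform([1, 0.5, 0.001, 0.5, 0.1], [5, 1.0, 0.01, 1.0, 1.0], size=(15, 5))
acc = rng.uniform(0.85, 0.95, size=15)

# Smooth surrogate of accuracy over the parameter space (the smoothness
# assumption discussed above).
surrogate = RBFInterpolator(X, acc, kernel="thin_plate_spline")

# Search the surrogate for a promising new setting, constrained to the tested region.
bounds = list(zip(X.min(axis=0), X.max(axis=0)))
best = max(
    (minimize(lambda x: -surrogate(x[None, :])[0], x0, bounds=bounds) for x0 in X),
    key=lambda r: -r.fun,
)
print("candidate setting to test next:", best.x, "predicted accuracy:", -best.fun)
```

The candidate it prints is only a prediction of the surrogate; as discussed in the replies below, it still has to be verified experimentally.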
Alan, your assumption of smooth manifold will indeed open the way to using "surface response" optimization techniques. However, this assumption is not always valid.
Of course, Mahnaz could still (1) assume smoothness just to make a prediction, then (2) test whether these predictions actually yield an improvement. Nonetheless, if the assumption of smoothness is not applicable, then these predictions will not be very useful.
The DoE methods I suggested in my previous post will also make some assumptions but they progressively correct their predictions with new input of data. Hence, Mahnaz could use the technique you suggest but, perhaps, in an iterative way, by injecting progressively in the computations newly acquired experimental results.
Hi Mahnaz,
I'm not sure if I interpreted your explanation correctly. I assumed that you have 15 distinct parameter settings (para1, para2, para3, ..., para15).
I assumed that you want to know which combined set of settings is the best for your problem; for example, the combined set (para1, para3, para15, para16) is the best solution compared to any other combination of parameters.
You DON'T want to know which combination of parameter values, for example the combined set (para1-value3, para3-value0.001, para15-value0.7, para16-value1), is the best solution. (Am I right?)
If you are only interested in the combination of parameters (not the parameter values), the easiest way I can think of is to just try all parameter combinations (it is only 15 parameters) in the cascade classifier (just like what you did for the previous 16 accuracy outputs). If you want to make your life easier, you can fix (to a constant value) the number of combined parameters. Otherwise, you can refer back to my previous comment about plotting the correlation analysis.
If you want to know which parameter values give the best performance, then refer to my previous comment: plot PCA or any correlation analysis to find your best combined set (at this point you only identify a potential solution) and then test whether this set gives you good classification results (now you validate your finding and confirm that it is better, or at least feasible, for solving the problem).
I doubt that you can find the solution to your problem using a GA or supervised methods, because those methods require a target class for the samples. You should look at unsupervised methods for your problem.
All the best.
H.E. Lehtihet, I agree with what you said, and I would suggest to iteratively check/refine the solution. It shouldn't be too hard to come up with some scheme to do that effectively.
Assuming that your work lies in the field of feature selection, i.e. that you want to know the best subset of parameters (not the best values for the 15 parameters), you can apply techniques such as separability measures, neural networks for feature selection (which do not need an objective function), or take advantage of the Bayesian information criterion.
You can simply use a GA. Just make a chromosome of 5 cells, each dedicated to one parameter. The values of the cells should be restricted to those allowed for the corresponding parameter. The fitness function will be the accuracy of your classification method. Just initialize a population of, for example, 100 chromosomes, each encoding a parameter setting, and evaluate your population by executing the 100 classifiers and computing their accuracies. Then continue to the next generations through selection and the genetic operators to reach the optimal classification parameter setting.
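A minimal, self-contained sketch of that recipe in Python. The parameter bounds are placeholders, and `run_classifier` is a dummy function standing in for the actual cascade classifier (replace it with a call that trains/tests the classifier and returns its accuracy):

```python
import random

# Allowed ranges for the five parameters (placeholders; substitute the real ranges).
BOUNDS = [(1, 5), (0.5, 1.0), (0.001, 0.01), (0.5, 1.0), (0.1, 1.0)]

def run_classifier(params):
    """Replace with: run the cascade classifier with `params` and return its accuracy.
    A dummy quadratic stands in here so the sketch runs end to end."""
    target = [3, 0.9, 0.005, 0.8, 0.5]
    return 1.0 - sum((p - t) ** 2 for p, t in zip(params, target))

def random_individual():
    return [random.uniform(lo, hi) for lo, hi in BOUNDS]

def crossover(a, b):
    cut = random.randint(1, len(a) - 1)
    return a[:cut] + b[cut:]

def mutate(ind, rate=0.2):
    return [random.uniform(lo, hi) if random.random() < rate else v
            for v, (lo, hi) in zip(ind, BOUNDS)]

def tournament(pop, fits, k=3):
    picks = random.sample(range(len(pop)), k)
    return pop[max(picks, key=lambda i: fits[i])]

population = [random_individual() for _ in range(100)]
for generation in range(30):
    fitness = [run_classifier(ind) for ind in population]   # accuracy = fitness
    population = [mutate(crossover(tournament(population, fitness),
                                   tournament(population, fitness)))
                  for _ in range(len(population))]
best = max(population, key=run_classifier)
print("best setting found:", best)
```

Note that every new offspring (after crossover and mutation) is simply re-evaluated by running the classifier again; that is how the accuracy gets "linked" to each chromosome.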
Dear Sahand,
How can I make a relationship between accuracy as fitness and the corresponding chromosome? For example when a crossover happens, how can I introduce the fitness of the new offspring? And similarly in the case of mutation.
Dear Dong Tong,
To give a better picture of the problem, I attached a table. As we can see in the table, right now I have 15 parameter settings (named PS in the table). Each of these parameter settings was tested on 16 test cases (named TC). I want to find other, better parameter settings which I don't have in hand at this time.
I have a question with using PCA via Weka. I'm not a professional in Weka, but as far as I know, when we want to prepare an arff file for Weka, we have to define a particular column as class label for it, am I right? If yes, in my case what will be the class label?
Thanks for the question. However, to date I have not used genetic algorithms. For this, please ask Indrajit Saha (search "dblp Indrajit Saha" for details).
Sensitivity analysis has to be done to get optimum sets of parameters. A genetic algorithm mostly involves four important parameters, namely the population size, the number of generations (number of iterations), which may be used as the termination condition, the crossover rate (the probability of applying the crossover operator) and the mutation rate (the probability of applying the mutation operator). These depend on various other factors like the genotype representation. Sensitivity analysis involves keeping all parameters constant while changing the value of one parameter, and repeating this for all the parameters. By far this has been one of the methods used in the EA literature. Please do try.
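A tiny sketch of such a one-at-a-time analysis of the GA's own parameters; the baseline values, candidate lists and `run_ga` are placeholders:

```python
# One-at-a-time sensitivity analysis of the GA's own parameters (sketch).
baseline = {"pop_size": 100, "generations": 30, "crossover_rate": 0.9, "mutation_rate": 0.1}
candidate_values = {
    "pop_size": [20, 50, 100, 200],
    "generations": [10, 30, 50],
    "crossover_rate": [0.6, 0.8, 0.9, 1.0],
    "mutation_rate": [0.01, 0.05, 0.1, 0.2],
}

def run_ga(settings):
    # Placeholder: run the GA/tuner with these settings and return the best accuracy found.
    return 0.0

for name, values in candidate_values.items():
    for v in values:
        settings = dict(baseline, **{name: v})   # vary one parameter, hold the rest fixed
        print(name, v, run_ga(settings))
```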
---------------------------------------------------------------------------------------------
-Dear Sahand,
-How can I make a relationship between accuracy as fitness and the corresponding chromosome? For example when a crossover happens, how can I introduce the fitness of the new offspring? And similarly in the case of mutation.
---------------------------------------------------------------------------------------------
You have to re-run your algorithm. The GA/EA or any optimization method that you use will have to be "linked" to the algorithm being tuned. For every chromosome you generate, you run your algorithm with those settings, and its output accuracy is the chromosome's fitness value.
Also look into other methods for automatic tuning, such as ParamILS and I/F-Race.
Forgot to mention one important thing: you should consider carefully before using (meta)heuristics for tuning your algorithm.
Consider the number of runs you will need to test your algorithm (or the overall runtime) and then decide whether it's better for you to invest in Design of Experiments or Automatic Tuning. Usually, if what you want is to understand the parameters you should invest in the former, and use the latter only if you're interested in finding a good parameter setting.
Also, remember that GAs and any heuristic search method have their own parameter settings, the difference being that you don't care so much about having the best tuned tuner as much as you want a properly tuned algorithm.
Different tuners will likely produce different parameter settings, and hence you should choose the one that gives you a high-performing setting at a reasonable computation cost. In particular, I use I/F-Race (Iterated F-Race) and have been getting very good results so far with its default settings.
Sujatha, sensitivity analysis is indeed fundamental in this problem. That is why, in my first post, I have suggested the use of DOE methods, which are based precisely on sensitivity analysis and which are extremely useful especially in the case where the cost of evaluating the fitness of a given solution is too high (time consuming, resource consuming, etc.). In some problems, evaluating the fitness of a single solution requires a whole experimental setup and many hours of work (or many hours of computations if the evaluation is accessible numerically). Clearly, the higher the cost of evaluation of a single solution, the more powerful DOE methods will become because their main purpose is to reduce as much as possible the number of trials.
Leonardo Bezerra, thank you for your complete answer. So you think that there is no offline solution for this problem, don't you? You are absolutely right, the question I asked Sahand was about the case of having the accuracies and parameter settings in offline form. I am looking for that because of the computational overhead, since I run my algorithm 100 times for each parameter setting and then calculate the mean accuracy.
Well, you probably don't need to run 100 times for each parameter setting as long as you have enough test cases. In the end what you need is a representative number of samples (test-cases x runs). In fact using more test-cases than runs is good for increasing variance and making your tuning more robust.
The solutions I proposed are for offline tuning. You select a tuning set of test cases and tune your algorithm for it. Remember to make it representative and not to overtune. Then you'd have a proper parameter setting for similar test situations.
The problem of overtuning is actually worse for runtimes. If you consider one specific runtime, it may happen that your parameter setting won't be as efficient for a longer/shorter runtime (this happened to me once). Since that depends a lot on your application, I don't have a definite solution, but tuning algorithms for anytime optimization is a good way out (see http://iridia.ulb.ac.be/IridiaTrSeries/rev/IridiaTr2012-012r002.pdf).
The idea of using online tuning is also very interesting, but it's very algorithm-specific. I'd have to know better about your approach to suggest anything.
Sorry for the misspelling: blame Apple for changing my words and destroying the meaning of my sentences :P
Did you test Cohen's kappa index for measuring the classification performance in the cost function of the GA?
I would say that genetic algorithms are a little slow and unstable for this kind of task. The new trend is to use surrogate learning. For example, you can model the uncertainty you have about your parameter space using a Gaussian process. Then you can ask this model which point is the most likely to improve your current performance. This will recommend the next hyperparameter configuration to explore and will bring information back to your Gaussian process, which will again suggest another parameter configuration.
This project on GitHub offers a complete solution for performing such optimization.
https://github.com/JasperSnoek/spearmint
Good luck.
Article: Practical Bayesian Optimization of Machine Learning Algorithms
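To illustrate the surrogate idea (without claiming anything about spearmint's own API), here is a rough sketch using scikit-optimize's gp_minimize, which applies the same Gaussian-process principle; the parameter ranges and the objective body are placeholders:

```python
from skopt import gp_minimize
from skopt.space import Integer, Real

# Search space for the five parameters (ranges are placeholders).
space = [Integer(1, 5),       # par1
         Real(0.5, 1.0),      # par2
         Real(0.001, 0.01),   # par3
         Real(0.5, 1.0),      # par4
         Real(0.1, 1.0)]      # par5

def objective(params):
    # Placeholder: run the cascade classifier with `params` and return the error
    # (gp_minimize minimises, so use 1 - accuracy).
    accuracy = 0.9
    return 1.0 - accuracy

result = gp_minimize(objective, space, n_calls=30, random_state=0)
print("best parameters:", result.x, "best accuracy:", 1.0 - result.fun)
```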
On the issue of run-time tuning, there are two methods particularly developed and broadly tested for this: ParamILS (see http://www.cs.ubc.ca/labs/beta/Projects/ParamILS/) and SMAC (http://www.cs.ubc.ca/labs/beta/Projects/SMAC/). They both have mechanisms to take care of the problem that poor configurations tend to require lots of running time. You can find some generic advice on how many training instances (inputs) and what cutoff times per algorithm run these methods require at http://www.prog-by-opt.net in the FAQs (Which design optimiser (algorithm configurator) should I use?). There is also some more information about algorithm configuration in this IJCAI 2013 tutorial: http://www.cs.ubc.ca/labs/beta/Projects/PbO_Tutorial/.
I'd be happy to provide further information if you need it.
Best,
Holger
BTW, in case anyone is curious: In our 2009 paper on ParamILS, we have used an algorithm configurator to configure itself (ParamILS) - it's expensive, but it works.
And just one final comment: I agree with Alexandre's comment that surrogate (or sequential model-based optimisation) methods are promising in the context of algorithm configuration, parameter tuning and hyperparameter optimisation (in machine learning) - SMAC is in fact such a method, as is spearmint. I/F Race actually uses a somewhat similar idea, in that it iteratively learns and exploits a simple probabilistic model of what good settings look like. Having said all this, personally, I would probably use ParamILS for the setting Mahnaz describes, given all the data I've seen for the various methods. (I have to admit, though, I am not entirely free from bias here, since ParamILS has been developed and extensively used by our group.)
To my understanding, two slightly different topics can be investigated here:
1- How to study the influence of a set of parameters to the performance of Mahnaz’s method?
2- How to find the best parameter settings to reach the best possible performance from Mahnaz’s method?
The simplest tool that can be employed to study the effect of a set of parameters is linear sensitivity analysis. To apply this, you simply fix the values of all parameters but one, to study the contribution of that one parameter to the quality of the final result. Then you choose another parameter to study (changing its value one step at a time) while fixing the values of the other parameters. This simple procedure is useful when the parameters are independent of each other, or at least not significantly conflicting. For example, the size of the neighbourhood (i.e. K) and the distance metric (e.g. Euclidean distance) in K-NN can be studied independently (even though the best value for K may be different for different distance metrics). However, in case of strong dependency or interaction (e.g. population size and maximum number of iterations of EAs when the computational budget is fixed), more sophisticated methods are of interest. In these cases, grid search, orthogonal experiment design and uniform experiment design methods can be adopted.
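A small sketch of the grid-search option in Python; the candidate value lists and `evaluate` are placeholders, and with many parameters the grid explodes combinatorially, which is why the configurators mentioned elsewhere in this thread exist:

```python
from itertools import product

# Exhaustive grid search over a few candidate values per parameter (sketch).
grid = {
    "par1": [1, 3, 5],
    "par2": [0.7, 0.9],
    "par3": [0.001, 0.005, 0.01],
    "par4": [0.6, 0.8],
    "par5": [0.3, 0.5],
}

def evaluate(setting):
    # Placeholder: run the cascade classifier with this setting, return mean accuracy.
    return 0.0

names = list(grid)
best_setting, best_acc = None, -1.0
for values in product(*(grid[n] for n in names)):
    setting = dict(zip(names, values))
    acc = evaluate(setting)
    if acc > best_acc:
        best_setting, best_acc = setting, acc
print(best_setting, best_acc)
```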
To answer the second question (i.e., finding optimum parameter values) one can employ an EA or any other optimization method. Since EAs are stochastic and parameter sensitive, you have to run them several times and with different parameter settings. Powerful EA tools, such as DE, SPSO and CMA-ES, can help you find the best parameter settings for your method. To do that, each chromosome can be seen as a vector whose length equals the number of your parameters, x_i = < p1_i, p2_i, …, pn_i >, and the fitness value associated with this individual can be computed by applying the parameter setting x_i to your method and calculating the accuracy/recall/precision of your classifier on a training/validation/test dataset. For example, assume we need to find the best values for the soft margin parameter (C) and the Gaussian kernel parameter (gamma) of an SVM model. In this case, x_i is defined as < C_i , gamma_i > and the fitness value of x_i can be evaluated by applying C_i and gamma_i to the SVM and computing the accuracy on the test dataset, for example.
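A minimal sketch of that SVM (C, gamma) example, using SciPy's differential evolution as the EA and cross-validated accuracy as the fitness; scikit-learn and the iris dataset are assumptions made purely for illustration:

```python
from scipy.optimize import differential_evolution
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

def negative_accuracy(params):
    C, gamma = params
    model = SVC(C=C, gamma=gamma)
    # differential_evolution minimises, so return minus the mean CV accuracy
    return -cross_val_score(model, X, y, cv=5).mean()

bounds = [(0.01, 100.0), (1e-4, 10.0)]   # search ranges for C and gamma (placeholders)
result = differential_evolution(negative_accuracy, bounds, maxiter=20, seed=0)
print("best C, gamma:", result.x, "CV accuracy:", -result.fun)
```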
As mentioned, adopting EAs to solve the second question is very simple; however, some problems may emerge:
1- Defining the fitness value as a function of the classifier output is computationally expensive.
2- The best values found on a limited set of datasets can hardly be generalised to all kinds of datasets (because the optimum values of classifier parameters are mostly problem specific).
I hope you find this comment helpful.
We did something like that to evaluate the parameters of our ant colony algorithm for Word Sense Disambiguation. It was not a genetic algorithm but simulated annealing. You can read the article: "Parameter estimation under uncertainty with Simulated Annealing applied to an ant colony based probabilistic WSD algorithm" by Andon Tchechmedjiev, Didier Schwab, Jérôme Goulian and Gilles Sérasset.
@Borhan Kazimipour: That was really helpful, thanks a lot. Yes, as you said, finding the best parameter setting is very problem specific and it also varies from one data set to another. I have to include a part in my code (prior to the testing phase) in order to find the best parameter setting, don't I?
I think this is best done with an external tool, such as the ones mentioned by several folks earlier. This essentially calls your code (on a set of inputs you provide) repeatedly, trying to optimise its performance.
In terms of sensitivity: it is my experience that in most algorithm optimisation settings, parameters do strongly interact. Grid search, orthogonal experiment design and uniform experiment design methods, as mentioned by Borhan, are all valid approaches, but don't work well when the number of parameters is high - that's when you want to use more sophisticated algorithm configuration techniques, such as iRace, ParamILS, SMAC, spearmint, ...
For sensitivity/importance analysis, there are also some pretty cool new methods (some of them are discussed in this talk: http://www.cs.ubc.ca/~hoos/Talks/vienna-13-slides.pdf - again, a biased sample, but these are ones I know particularly well).
As Holger mentioned, the optimizer (say a GA, for example) calls your code many times to find the parameter values that enhance the performance in terms of the criterion you have chosen (e.g., recall, accuracy, AUC, ...). These methods are generally called wrapper methods.
If you want to report the best parameter settings for your method, you can measure the performance on train or test or both datasets. For better generalization, you can categorize your datasets into disjoint categories (e.g., small/large datasets, noisy/clean datasets, linearly separable/non-separable datasets …) and report different optimal parameter settings for each category.
If you want to compare your method with others', however, you should only optimize the parameters on the training (or validation) dataset (do not include the test dataset). Otherwise, it is unfair to incorporate knowledge from the test dataset (which is supposed to be unseen) to optimize the parameters while other methods have no such opportunity. Note that using a validation set (a portion of the training dataset which you keep unseen from the classifier, but not from the optimizer) is highly recommended, because minimizing the training error may result in overfitting. To use a validation set, you train your classifier on the training dataset (with the validation subset kept away) and calculate the performance of your classifier on the validation subset. This way, you can find the optimum values for your parameters that increase the performance of the classifier in the test phase (since the classifier cannot see the validation set, it works like a test dataset for the classifier).
Actually, you have another option to compare your method with others’ methods: You can find the best parameter setting (using both train and test data) on some datasets. Then, use those values to compare your method with others’ while classifying different datasets. For example, you can find the optimum parameter setting using GA and iris dataset and then use that parameter values for classification of glass dataset (other methods must also be applied on glass dataset). Obviously, if both datasets are chosen from the same family, your method might achieve better results.
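A short sketch of the validation-set approach described above, assuming scikit-learn; the grid of candidate values stands in for whatever optimiser you actually use, and the dataset and classifier are placeholders:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
X_fit, X_val, y_fit, y_val = train_test_split(X_train, y_train, test_size=0.3, random_state=0)

best_score, best_params = -1.0, None
for C in [0.1, 1.0, 10.0]:                 # stand-in for the optimiser's candidates
    for gamma in [0.01, 0.1, 1.0]:
        model = SVC(C=C, gamma=gamma).fit(X_fit, y_fit)
        score = model.score(X_val, y_val)  # fitness measured on the validation set only
        if score > best_score:
            best_score, best_params = score, (C, gamma)

# The test set is touched only once, with the chosen parameters.
final = SVC(C=best_params[0], gamma=best_params[1]).fit(X_train, y_train)
print("chosen (C, gamma):", best_params, "test accuracy:", final.score(X_test, y_test))
```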
I think the method you have to choose will certainly depend on how sensitive the accuracy of your problem is to the choice of the parameters. That is, if the accuracy is highly sensitive to the parameters, then your problem will be a multimodal optimization problem in a space whose dimension depends on the number of parameters. If that is the case, then you have to be careful about the choice of method; a GA could be a good choice, but you have to tune it carefully to solve multimodal optimization problems (see MATLAB's tools for GA optimization); other options are simulated annealing or Particle Swarm Optimization methods.
In each case, the "accuracy" will be the function that you want to optimize (i.e., the score), and the set of parameters will be the solution of the problem, which is easy to set up.
In my experience, Evolutionary Strategies (ES) usually outperform GA, simulated annealing, PSO, etc in the case of parameter optimization. Because of the required coding, GA has rather many "tuning screws" that have a strong influence on its performance. Some of my old optimizer implementations are available on http://alphard.ethz.ch/Hafner/Opt/opt.htm.
The idea is to design an objective function that reflects the way in which the problem must be solved. Then the GA evaluates solutions until it finds a suitable one.