If you are coding this problem yourself then here is one way of encoding.
Say you have 5 variables v1, v2, v3, v4, v5 and you want to maximise some function f(v1,v2,v3,v4,v5) For each of the variables v1 to v5 you need to set an upper limit and a lower limit.
You could set the problem in terms of chromosomes with 5 bases which take on real values between your limits which you have set. Then you implement arithmetic crossover. see http://www.ijarcsse.com/docs/papers/Volume_4/3_March2014/V4I3-0330.pdf.
For mutation you add a small random number to randomly selected bases in each chromosome. see http://www.obitko.com/tutorials/genetic-algorithms/crossover-mutation.php. Then you do a correction to ensure that you have not breeched the limits you have set.
So, what you need is real value encoding with arithmetic crossover and real value mutation, with correction applied to ensure that the limits which you set are not breeched. see http://www.geatbx.com/docu/algindex-04.html
If you use real coded genetic algorithm, then you do not have to encode or decode it into binary. For example you have just two variables, say v1 ranges from 0.128 to 0.354 and v2 ranges from 0.325 to 0.539. Then you can encode this as a single string with six digits. For example, a solution (0.136,0.428) will be encoded as 136428. Of course, you have to take care that the numbers do not go outside your range and other thing is that the accuracy of the solution will be 3 digits after the decimal point.
Now suppose you want to apply binary coded GA. First fix the accuracy. Let us say that the accuracy required be of 3 decimal places. Then the total possibilities for v1 are 0.128, 0.129, 0.130, ....,0.354. Total possibilities are 227. Now 2^8=256. Thus eight bits are required to encode this variable. Similarly for the other variable. Then the two binary strings are concatenated. It will become your chromosome.
Thanks at Rajesh Sanghvi . Please, going by the second approach, how do I decode the binary solutions to the range of the real-values for fitness evaluation after applying my GA operators.
Suppose the binary string is 1101101. Then the last 1 will be multiplied by 2^0 (=1), then the second last 0 will be multiplied by 2^1, the next 1 will be multiplied by 2^2, etc. So, the binary digits of the string will be multiplied by increasing powers of 2 from right to left. The final decoding will be as follows:
Thanks at Rajesh Sanghvi... I got that big idea but please, with this conversion to decimal I can never get a fractional result. Assuming my result is to be in the continuous interval of [-5,1]. what should I do? I
Let us say that you want the result with three decimal places. So, we have to encode the numbers from -5.000 to 1.000. There are total 6001 numbers in this range, -5.000, -4.999, -4.998, ...,1.000. Now 2^12=4096 and 2^13=8192. So, we have to take 13 digits to represent the numbers. The assignment scheme is as follows:
-5.000 = 0000000000000 -4.999 = 0000000000001
-4.998 = 0000000000010 -4.997 = 0000000000011 etc.
Now suppose your decoded number is 2157. Then the actual number in the range [-5,1] is
-5 + 0.001*2157 = -2.843.
If your decoded number is 5998. Then the actual number is
-5 + 0.001*5998 = 0.998.
We multiply it by 0.001 because the accuracy is three decimal. If the accuracy is only two decimal, then multiply by 0.01 only. Of course, in that case the total numbers are 601. So, you have to take 10 bit string as 2^9=512 and 2^10=1024. There are some numbers which are outside range. For that in your program you have to write some code so that the number comes inside the boundary. When such a situation arises, you may treat that number as the maximum number 1.000.
For example suppose your decoded value is 7156 with three decimal accuracy, then the actual number will be
-5 + 0.001*7156 = 2.156 which is outside the range. This happens because with the help of 13 bits you can encode 8192 numbers but you have only 6001 numbers to encode. I hope now you completely understand the scheme.
I think you can also do this by using a binary code starting with two digits and then using a learning function to update your priors using experimental data of sample observations and updating on the digits. You can take a look at my paper on Genetic Algorithms using String Theory on my RG page if you wish to. It is based on String theory, Kinematic theory of Systems and Fractal Geometry and economizes on energy. Earl Prof. (University Institute Chair) Dr. (D.Phil.) SKM QC EPS Fellow (Indirect) MES MRES MAICTE
I agree with Prof. Rajesh Sanghavi. I would like to add to what Prof. Rajesh Sanghavi suggested about "some numbers which are outside range": (i) you should carefully initialize the values of chromosomes (individuals) so that decoding of these chromosomes do not results into the value outside the range. For this you can design your own initialization algorithm (ii) you should also pay attention to design of crossover and mutation operators so that these operators do not produce offspring whose decoding values fall outside the range.
Maybe using some other techniques, of real-valued origin, would be worth trying? I really like Differential Evolution (), it is very easy to understand and apply (no need to encode into binary etc.) and well developed at the same time.
You can always skip the conversion from real-coded or floating-point to binary values. That GAs still work (GAs convergence) for floating-point values has been shown by Z. Michalewicz (1991):
Best explanation by Sir Rajesh Sanghvi , Sir I am new in this field I want to know the fitness evaluation in GA. if I have 20 feature(fraction values) of 200 sample and want to reduce the feature using GA, how do I develop fitness function. Please help me out. Thanks
Sorry that I am late in answering your question. I was busy in some examination work. Tomorrow I will send you some material as well as some code for your problem.
Here you can use Binary Valued GA. Your chromosome will be a binary string with 12 bits. A 1 at a particular position indicates the selection of that particular feature. For example, the string 100100001011 indicates that the features 1, 4, 9, 11 and 12 are selected. Now apply any classification algorithm and calculate classification accuracy. Your fitness function will be the classification accuracy and we would like to maximize it. Of course, you may include a constraint on the number of features selected. Naturally we would like to minimize the number of features if the classification accuracy is not highly affected. The constraint can be incorporated in the algorithm using penalty function approach. In that case, the fitness function may be written as
f(chromosome)=classification accuracy - penalty parameter*number of features.
The value of penalty parameter is to be set experimentally.
The Matlab codes for Feature selection using GA are available on net.
Thank you Mr. Rajesh Sanghvi for your great explanations.
I'm also new in this field and I have a question concerning mixed problems.
How to code a string when I must have binary variables for representing an assignement problem (a machine to do an operation), but I also need to use continuous variables to represent the flow line position of machines/operations ?
Should I do a single string with different types of variables or I must have many strings simultaneously ?
A single string is preferable. For example, suppose there are three variables of which the second is discrete. Further suppose that the first variable varies from 0.0 to 6.3, the second takes integer values from 0 to 7, and range of the third is from 0.0 to 1.27. Then the number of bits required are 6, 3 and 7 respectively. A typical chromosome may be 1010010110110101. In this chromosome, the first six bits 101001 (=32+8+1) represents 4.0 as the value of the first variable; 011 represents the value 3 for the second one and 0110101 (32+16+4+1=53) represents the value 0.52 for the third variable. If the ranges of the variables do not start from 0, then appropriate adjustments are to be done.
I could not understand your question. The bit string depends on the range of the variable to encode and how much spacing is applied. As an example, let the range of a variable is from 0 to 1 (inclusive) and we need the answer which is accurate up to one decimal. So, I have to encode 0, 0.1, 0.2, ...,0.9, 1. There are total 11 numbers. With the help of 2 bits, I can encode only 4 numbers. Similarly, if 3 bits are used, I have bit strings 000, 001, 010, 011,....,111. So, I can encode only 8 numbers. So, to encode 11 numbers, I have to use a bit string of 4 digits. The string will be of the form 0000, 0001, 0010, .... In this case, we have total 16 bit strings. Since we have only 11 numbers to encode, some of the bit strings will remain unused.
Now suppose we need the answer which is accurate up to 2 decimals. In that case, I want to encode 0.00, 0.01, 0.02, ....,0.99, 1.00. There are total 101 numbers to encode. Since 2^6=64 and 2^7=128, we will have to use 7 bits. For example, 0000000 will correspond to 0, 0000001 will correspond to 0.01, 0000010 will correspond to 0.02, etc. If the string is 1100111, then because 1000111=71, this string represents 0.71. The explanation is slightly lengthy, but I hope, now you will have understood the procedure.
if i have 3 variables say x,y,z. x ranges from[0,15] , y ranges from [0.001,0.005] , z ranges from [10^3 , 10^4] .What will be the minimum string length coded in binary string to achieve two significant digits accuracy in the solution?
Thanks Rajesh Sanghvi . I preferer the real coded genetic algorithm, but I don't know how to make sure the mutation operator will not lead to violating the boundaries. Should I just simply substitute the violated numbers with the upper or lower number of boundaries. I afraid it removes the diversity after a couple of generation! Could you please help me out.
Maedeh Ta if you always replace the values violating boundaries by the boundary values, you risk considerably reducing your diversity, and your algorithm may be stuck in bad individuals. I would suggest using a mutation operator that replaces a certain gene value by a random value between boundaries (0, 1).
Thanks Rachel Campos Sabioni and Rajesh Sanghvi , could you please make an example what do you mean by replacing a certain gene value. (e.g., the gene value is 151 and the maximum of the boundary is 99.)