Friday, February 26, 2010

Speciation and the evolution simulation...

I've been recoding the evolution simulation which Bahar and Nate developed in Matlab.  I have never programmed something so complicated.  I have also never really TRULY programmed anything in C++, i.e., used classes, the standard library, and other pre-made classes effectively.  It has taken me several weeks to get to the point of piecing my various classes together and writing out a speciation algorithm.  I have also had a tough time dealing with the standard library's vector class.  I think I have gotten to the point where I can use it efficiently.  Here's the breakdown of what I need this program to do:
I need to initialize some parameters in order to create the simulation.  Those parameters first help build a fitness landscape, which represents locations where an environment benefits some phenotypes more than others.  The landscape axes represent a continuous scale of phenotypes; the area they span is the phenospace, and the landscape assigns a fitness to each point in it.  Think of a flat map as being the phenospace; your latitude and longitude determine your location in that phenospace.  Mountainous regions represent areas where the environment would be most beneficial, and the ocean trenches would be regions of poor fitness.
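To make the map analogy concrete, here is a minimal sketch of a landscape stored as a square grid of fitness values and looked up by phenotype coordinates.  The grid representation, the `Landscape` name, and its members are assumptions for illustration only; the post does not say how the actual landscape is built or stored.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical grid-based fitness landscape over a square 2D phenospace.
// Both phenotype axes run from 0 up to `size`; each grid cell holds one fitness value.
struct Landscape {
    std::size_t cells;            // number of cells along each axis
    double size;                  // extent of each phenotype axis
    std::vector<double> fitness;  // row-major storage, cells * cells entries

    Landscape(std::size_t cells_, double size_, double baseFitness)
        : cells(cells_), size(size_), fitness(cells_ * cells_, baseFitness) {}

    // Map a continuous phenotype coordinate to a cell index, clamped to the grid.
    std::size_t cellIndex(double coord) const {
        if (coord <= 0.0) return 0;
        std::size_t i = static_cast<std::size_t>(coord / size * static_cast<double>(cells));
        return std::min(i, cells - 1);
    }

    // Fitness at phenotype (x, y): the "elevation" of the map at that location.
    double& at(double x, double y) { return fitness[cellIndex(y) * cells + cellIndex(x)]; }
    double at(double x, double y) const { return fitness[cellIndex(y) * cells + cellIndex(x)]; }
};
```

With something like this, a "mountain" is just a cell (or patch of cells) given a high value, e.g. `land.at(55.0, 25.0) = 4.0;`, and any indiv whose phenotype falls in that cell reads the same fitness back.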
The next part of the simulation is to build an initial population.  The population is a set of indivs which are initialized with some default data characteristics.  Now the main part of the program is ready.  
The primary section of the program is a for loop which iterates over the generations.  Within each generation, several events occur.  The first is to find and record the identities of the two nearest indivs in phenospace for each indiv in the population.  The next step is to determine, based on each indiv's neighbors, which indivs are in the same species.  The speciation algorithm is based partly on a reproductive isolation model which states (in real life) that if no offspring can be conceived by two possible varieties of an organism, then those two varieties are actually different species.  However, our model extends this to include the second nearest neighbor as well - an assumption "dreamt" up by Nate to avoid numerous species made up of only two indivs.  This also indicates that different organisms don't necessarily have to be "close" to each other phenotypically.  Other speciation algorithms are certainly possible to implement; it would just require my time to write out such algorithms.
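One plausible reading of the neighbor and speciation steps can be sketched as follows: find each indiv's two nearest neighbors by brute force, then treat any indivs connected through a chain of such neighbor links as one species.  The `Indiv` struct, the function names, and the use of a union-find structure are all assumptions made for this sketch, not the actual classes from the simulation.

```cpp
#include <array>
#include <cstddef>
#include <limits>
#include <numeric>
#include <vector>

struct Indiv { double x, y; };  // hypothetical: a position in a 2D phenospace

// For each indiv, record the indices of its nearest and second nearest neighbors.
// Brute force, O(n^2). With fewer than three indivs, missing slots point back at the indiv itself.
std::vector<std::array<std::size_t, 2>> twoNearest(const std::vector<Indiv>& pop) {
    const double inf = std::numeric_limits<double>::infinity();
    std::vector<std::array<std::size_t, 2>> result(pop.size());
    for (std::size_t i = 0; i < pop.size(); ++i) {
        double best = inf, second = inf;
        std::size_t bestIdx = i, secondIdx = i;
        for (std::size_t j = 0; j < pop.size(); ++j) {
            if (j == i) continue;
            const double dx = pop[i].x - pop[j].x, dy = pop[i].y - pop[j].y;
            const double d2 = dx * dx + dy * dy;  // squared distance is enough for ranking
            if (d2 < best) {
                second = best; secondIdx = bestIdx;
                best = d2; bestIdx = j;
            } else if (d2 < second) {
                second = d2; secondIdx = j;
            }
        }
        result[i] = {bestIdx, secondIdx};
    }
    return result;
}

// Union-find helper: follow parent links to the representative, shortening the path as we go.
std::size_t findRoot(std::vector<std::size_t>& parent, std::size_t i) {
    while (parent[i] != i) {
        parent[i] = parent[parent[i]];
        i = parent[i];
    }
    return i;
}

// Give every indiv a species label; indivs linked through neighbor relationships share a label.
std::vector<std::size_t> assignSpecies(const std::vector<Indiv>& pop) {
    const auto neighbors = twoNearest(pop);
    std::vector<std::size_t> parent(pop.size());
    std::iota(parent.begin(), parent.end(), std::size_t{0});
    for (std::size_t i = 0; i < pop.size(); ++i) {
        for (std::size_t n : neighbors[i]) {
            const std::size_t a = findRoot(parent, i);
            const std::size_t b = findRoot(parent, n);
            parent[a] = b;  // merge the two groups
        }
    }
    std::vector<std::size_t> species(pop.size());
    for (std::size_t i = 0; i < pop.size(); ++i) species[i] = findRoot(parent, i);
    return species;
}
```

Because every indiv links to two neighbors rather than one, isolated pairs that only point at each other become much rarer, which matches the motivation for including the second nearest neighbor.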
The next section of the program then records the information about the species and the indivs.  One important aspect of the indivs is their mutation rate.  The mutation rate determines how much variability their children may have about some range determined by the distance between the two mates.  This parameter is the primary value which we tweak or distribute differently among the indivs.  As each indiv mates with its nearest neighbor, the "choosing" indiv then passes on its mutation rate to the babies it makes.  The number of babies depends on the fitness of the "choosing" indiv parent.
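A rough sketch of the mating step might look like the code below.  It rests on assumptions the post does not confirm: that babies are scattered uniformly inside the box spanned by the two parents and widened on every side by the chooser's mutation rate, and that the number of babies is the chooser's fitness rounded to the nearest whole number.  The `Indiv` struct and `mate` function are hypothetical names.

```cpp
#include <algorithm>
#include <cmath>
#include <random>
#include <vector>

struct Indiv { double x, y, mutationRate; };  // hypothetical indiv record

// Produce the babies of one mating. Assumptions (not from the original model):
//  - babies land uniformly in the parents' bounding box, widened by the chooser's mutation rate
//  - the baby count is the chooser's fitness rounded to the nearest integer (never negative)
std::vector<Indiv> mate(const Indiv& chooser, const Indiv& partner,
                        double chooserFitness, std::mt19937& rng) {
    const double mu = chooser.mutationRate;
    std::uniform_real_distribution<double> xDist(std::min(chooser.x, partner.x) - mu,
                                                 std::max(chooser.x, partner.x) + mu);
    std::uniform_real_distribution<double> yDist(std::min(chooser.y, partner.y) - mu,
                                                 std::max(chooser.y, partner.y) + mu);
    const long count = std::max(0L, std::lround(chooserFitness));

    std::vector<Indiv> babies;
    babies.reserve(static_cast<std::size_t>(count));
    for (long i = 0; i < count; ++i) {
        // Each baby inherits the chooser's mutation rate unchanged.
        babies.push_back(Indiv{xDist(rng), yDist(rng), mu});
    }
    return babies;
}
```

A larger mutation rate widens the box, so babies can land further from either parent; a fitter chooser simply contributes more babies to the next generation.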
The mating portion takes place after recording the population and species information.  The baby indivs make up a new population for the next generation iteration.
Once the baby population is made, the parents are no longer needed, so their population is deleted.  Now the baby indiv population is weeded out by several death functions.  There are three ways in which the new indivs are killed off.  One method is by an overpopulation limit.  This models an organism's niche in its environment.  Particular organisms occupy important niches and tend not to share a niche with other organisms.  Therefore no indivs may exist within a certain distance of each other in the phenospace.  We refer to this distance as the overpopulation limit.  The next killing method is random death.  This simply kills off indivs at random by determining what percentage (up to a small percentage) of the population will "win" a lottery.  The final killing that takes place is of those indivs which have sprung up outside of the phenospace limits.  This simply models a limit to our phenospace.
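The three death functions could be sketched roughly as below.  Several details are assumptions: the function names, the square phenospace running from 0 to `size` on both axes, and the rule that when two indivs are too close, the one encountered later in the vector is the one removed (the post does not say which of a crowded pair dies).

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

struct Indiv { double x, y; };  // hypothetical: a position in a 2D phenospace

// Overpopulation: keep an indiv only if it is at least `limit` away from every indiv kept so far.
// Assumption: the later indiv of a too-close pair is the one that dies.
void cullOvercrowded(std::vector<Indiv>& pop, double limit) {
    std::vector<Indiv> kept;
    for (const Indiv& candidate : pop) {
        const bool crowded = std::any_of(kept.begin(), kept.end(), [&](const Indiv& k) {
            return std::hypot(candidate.x - k.x, candidate.y - k.y) < limit;
        });
        if (!crowded) kept.push_back(candidate);
    }
    pop.swap(kept);
}

// Random death: draw a death fraction between 0 and maxFraction, then remove that share at random.
// Note that this shuffles the order of the survivors.
void cullRandom(std::vector<Indiv>& pop, double maxFraction, std::mt19937& rng) {
    std::uniform_real_distribution<double> fraction(0.0, maxFraction);
    const std::size_t deaths =
        static_cast<std::size_t>(fraction(rng) * static_cast<double>(pop.size()));
    std::shuffle(pop.begin(), pop.end(), rng);
    pop.resize(pop.size() - deaths);
}

// Boundary death: remove indivs that landed outside the phenospace limits.
void cullOutOfBounds(std::vector<Indiv>& pop, double size) {
    pop.erase(std::remove_if(pop.begin(), pop.end(), [size](const Indiv& i) {
                  return i.x < 0.0 || i.x >= size || i.y < 0.0 || i.y >= size;
              }),
              pop.end());
}
```

The overpopulation check here is quadratic in the population size, which is fine for a sketch but would likely want a spatial grid or similar structure for large populations.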
After the destruction of the selected indivs, the resulting population then cycles back to the beginning of the generation loop.  And the circle of life continues until either the population dies out or the program reaches the maximum number of generations that we have defined at the beginning in the parameters.
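The two stopping conditions of the generation loop can be shown with a small skeleton.  Everything inside one generation (neighbor search, speciation, recording, mating, deaths) is folded into a caller-supplied `nextGeneration` function here; that function, the `Indiv` struct, and `runGenerations` are hypothetical names for this sketch.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

struct Indiv { double x, y; };  // hypothetical: a position in a 2D phenospace

using Population = std::vector<Indiv>;

// Run the generation loop until the population dies out or maxGenerations is reached.
// Returns the number of generations that were actually simulated.
std::size_t runGenerations(Population& pop, std::size_t maxGenerations,
                           const std::function<Population(const Population&)>& nextGeneration) {
    std::size_t generation = 0;
    for (; generation < maxGenerations && !pop.empty(); ++generation) {
        // The babies (after the death functions) replace the parent population.
        pop = nextGeneration(pop);
    }
    return generation;
}
```

Returning the generation count makes it easy to tell afterwards whether a run ended by extinction (fewer generations than the maximum) or by hitting the limit set in the parameters.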
Adam D Scott