Tuesday, January 19, 2010

Luck

Where are these spikes coming from?

A quick review: I am simulating the evolution of strategies for playing Prisoner's Dilemma. I have a parameterized version of the classic strategy TIT FOR TAT. At each generation, I have 100 strategies, i.e. 100 different combinations of values of the four parameters that define a strategy. Each strategy is played against every other strategy for 100 or 300 iterations. Each strategy accumulates a total score across all these iterations and opponents. After all the play of one generation, the lowest scoring half the the strategies are eliminated. The remaining strategies, that scored in the top half, are then combined in a randomized way to generate enough new strategies to bring the total back to 100, and a new round of play is begun.

The graph above is a plot of the numbers of defect-defect, defect-cooperate, and cooperate-cooperate moves for each generation. For this experiment I let each pair of strategies play each other for a run of 300 iterations, instead of 100. The frequency of spikes is thereby reduced.

The most interesting parameter seems to be - given that the opponent has been defecting recently, with what probability should the current strategy cooperate? This probability appears to settle slowly down to around 0.05, then jump rapidly at irregular intervals to around 0.2.

I ran a strategy with this "unwarranted cooperation" parameter set to 0.1, against the classical TIT FOR TAT, for 100 runs, each of which was 100 iterations long. TIT FOR TAT never won a run of 100 iterations - at best it tied. The 20th through 80th percentile runs had the 0.1 strategy winning by 5 points. The best the 0.1 strategy did against TIT FOR TAT was to win by 15 points.

I then played a 0.2 strategy against the 0.1 strategy. the 0.1 strategy generally won - the best it did was to win by 110 points. The median margin was 45 points. 80 percent of the time it won by at least 25 point. But the 0.2 strategy did win at least once, by 25 points.

I think this explains the spikes, or at least starts to. The results of the playing at each generation are not well determined - the distribution of results has some significant range. There is a general drift toward defection, but now and again there is enough lucky cooperation that the cooperating strategies can survive and reproduce.

Another source of randomness is the creation of the new strategies to be added to play in the next generation. Each new strategy is constructed from two randomly chosen surviving strategies from the current generation. If by chance two relatively cooperative strategies are chosen, their child is sure also to be cooperative. The more strategies there are that are cooperative, the more they have the opportunity to increase their scores through mutual cooperation.



Monday, January 18, 2010

backwards

Oops! Reviewing my code... I see that I had the logic backwards. The probability that keeps spiking is actually that of an unwarranted cooperation! Here is a plot with the total numbers of defect-defect, defect-cooperate, and cooperate-cooperate rounds of the game. There are 100 strategies, so that is 4950 pairs, and each pair is run for 100 iterations. Thus the total across all three categories should be 495, 000. Whew! Looks like a match!

Sunday, January 17, 2010

Elimination Rate


Each generation, I am saving the best strategies to run again in the next generation. There is no limit to how long a strategy can be kept - as long as it doesn't get eliminated, it will be kept.

However many strategies are eliminated, I will create the same number of new strategies, so that in each round of play there are the same number of strategies, 100, competing. Each new strategy is created by combining two existing strategies. These two strategies are picked at random from the strategies that scored well enough to be kept for the next generation.

Now I am sorting the scores on each round of play, so I can control the fraction of strategies I eliminate. Here is the graph of evolution of the probability of unprovoked defection, when I eliminate the bottom scoring half of the strategies in each generation:
I have added a couple new lines to this graph - now I am plotting the smallest and the largest values across the 100 strategies, along with the 20th and 80th percentiles, and the median.

If instead I only eliminate the bottom scoring 20% of the strategies, here is how things evolve:

Details, Details

Here's the evolution of the probability of unprovoked defection, when I reduce the level of mutation allowed to 3% instead of 5%. The spikes are certainly much less prominent and their onset is less abrupt.

There is another factor that I am not controlling very carefully - the fraction of strategies that is eliminated in each generation. What I am currently doing is just taking the midpoint between the highest score and the lowest score. The strategies that score below the midpoint are eliminated. Sometimes that might eliminated 40% of the strategies, sometimes 60%. I need to control that fraction more carefully, and to investigate how the pattern of evolution varies and the fraction changes.

Saturday, January 16, 2010

The Literature

OK, I do need to see what others have discovered - this game is quite classical by now, so there must be a considerable literature.

Robert Hoffmann's paper "The Ecology of Cooperation" looks really good for a starting point:

http://www.nottingham.ac.uk/~lizrh2/papers.html

First Plot

I ran the simulation for 4000 generations. This is a plot of the probability that evolves, the probability of defecting when the opponent has been cooperating recently. I am actually using a moving average of recent opponent behavior, rather than just the most recent move. My population size is 100. I have plotted here the 20th percentile, median, and 80th percentile values for the probability of an unprovoked defection.

This sure looks more like some kind of chaos, rather than anything periodic! The relaxation back to polite behavior looks nicely exponential. But that indulgence in nastiness seems to come out of nowhere.

Right now I am allowing a child's parameter to run beyond the range of its parents' values by 5 percent. It must be the case that these nasty suprises come out of these mutations. For a next experiment, I will reduce the allowable range of mutation. This should reduce the frequency of nasty surprises - so I hypothesize!

Prisoner's Dilemma

I am having great fun reading William Poundstone's book on John Von Neumann, Game Theory, the RAND Corporation, etc. So much so, I have to try some of this myself! So I bring up a fresh project space in Linux KDE, and away I go.

How about a parameterized TIT FOR TAT. For example, there can be a couple of probabilities - depending on what the opponent did on the last move, cooperate or defect, this new strategy can use one probability or the other to decide whether to cooperate or defect this turn.

Next, suppose we have a whole population of TIT FOR TAT players, each with its own parameter values. We can play every one against the other and accumulate scores. Then we can pick the parameter values with the best scores, maybe the top half or whatever fraction we like.

The idea is to use this as a model for evolution. I will have to read Axelrod's The Evolution of Cooperation, but sometimes it's just more fun to jump in and try it before getting too bogged down reading about all the approaches everybody else has tried!

Now that I have selected the fittest strategies, I will reproduce them with a bit of variation to create a new generation of strategies. My current idea is to mimic biology: I will create a new strategy from a pair of existing strategies. For each parameter, the pair of existing strategies defines a range of values. I will pick the corresponding parameter value for the new strategy by picking randomly from that range, plus allowing a little extra mutation, a little running outside that range.

So, select the fittest in each generation, then reproduce with variation to create each next generation, repeat for many generations... what will happen?

My first experiment gives a curious result: there seems to be a kind of cycle, where gradually the strategies evolve into a polite TIT FOR TAT with little cheating, then there is a sudden burst of cheating, which again settles slowly back into politeness.

Is this a real property of this kind of system or just a bug? More investigation is needed! What fun!