Hi, all!

This post is about an experiment I performed during Fall 2015 semester under the supervision of Prof. Stephanie Forrest as part of her course on complex adaptive systems here at UNM. The aim was to evolve a population of *seemingly* random binary sequences into sequences that are provably random with respect to a given set of tests of randomness and independence. My motivation for this experiment was to explore the possibility of using a genetic algorithm as a wrapper on the RNG (random number generator) code. This may be useful in situations when we require the user to not be able to regenerate the sequence(s) given initial RNG parameters.

For the experiment, the initial population of sequences was generated using the Mersenne twister inside Python 2.7.10’s random.randint(0,1) function and then evolved using a genetic algorithm (GA). The idea was to weigh these sequences with respect to the p-values of five statistical tests of randomness and independence performed on each of them and apply some selection, crossover, and mutation to evolve this population into one that has a majority of high fitness individuals.

The sequences in the initial population were generated using different seeds. This defined the original aim of the experiment: *to evolve to a set of sequences that are *random enough* and show negligible dependence on the seed used for the random number generator (RNG) that created them.* Thus, the GA was halted as soon as the population evolved to contain at least some given fraction of high fitness individuals.

An interesting observation was made when the results were obtained. The algorithm showed a strong aversion to mutation (when one would expect that mutation would actually help). Even when setting the mutation probability to something as low as , the population did not *seem to* converge to contain high fitness individuals up to the fraction we desired. This suggested that the space of all *random* sequences (as generated by this RNG) contained very small (maybe even singleton sets of) neutral regions with respect to the fitness function used and that there was perhaps a high correlation between the bits in the sequence and their position in it. The plot below shows the results obtained, where the X-axis represents number of generations and the Y-axis represents fitness.

The plot below shows the results obtained, where the X-axis represents number of generations and the Y-axis represents fitness. Mutation probability was set to per bit and the crossover probability was kept very close to . As can be seen, the maximum fitness in the population decreases with successive generations.

Even more interesting is the fact that a high probability single point crossover operation supported evolution in our favor, and produced a population of distinct sequences having high fitness values. So, if it was indeed the case that the neutral regions are small, one would expect the crossover to not do so well either. So, to verify this, I ran some simulations with low crossover and mutation rates and observed that the population hardly evolved. This behavior has made me sleepless since.

Some questions and extensions I am considering for this project are as follows:

- Can -values be replaced by some other measure of goodness on the tests of randomness/independence?
- Does the order of applying the tests matter? In other words, given a sequence of tests , when does there exist (or not) a sequence such that is
**not**random with respect to the set for all , but is random with respect to the whole set ? - How about another design of the fitness function, other than just the product of the values?
- Does the nature of crossover matter in this setting?
- Is there an analytical explanation to the small sized neutral regions?
- Define a measure for goodness of evolution and prepare plots for this goodness against crossover rates and mutation rates.

I plan to update this post with the actual algorithm I used and plots for the results, along with a more analytical explanation of the situation and possibly the questions above, so that some of you can suggest edits and help me solve the mystery. Stay tuned 🙂

**References :**

Could you provide more detail about the fitness function? I would like to know exactly how the fitness value is obtained from an individual in the population.

LikeLike