This article explores a simple genetic algorithm I wrote in Objective-C. The purpose of this article is to introduce the basics of genetic algorithms to someone new to the topic, as well as show a fully functional example of such an algorithm.
I am by no means an expert in the field of artificial intelligence. If you are looking for an authoritative voice on cutting-edge research in genetic algorithms, and this article’s title didn’t tip you off, please look elsewhere. If you’re looking for the “Hello, World!” of genetic algorithms written by a professional software developer, not an academic researcher, read on.
The demo program reviewed in this article is available on github.
The demo program
In this article we will review a genetic algorithm whose purpose is to construct a piece of text (i.e. a string) that you specify at run-time. Sounds simple, right? The tricky part is that the algorithm must start with nothing but randomly generated strings and somehow “evolve” them until one of the strings exactly matches the string you specified. The process of evolving the strings is where things get interesting. Here is a screenshot of the Cocoa desktop demo program I wrote to test out the algorithm.
This was my first time writing a Cocoa desktop program, so I’m sure I didn’t follow any Cocoa best practices. My bad!
Below is an abridged version of the output logged by the application while it evolved strings to match the input value of “Once upon a time there was a genetic algorithm”. Each line represents a generation, whose number is listed first, followed by the string in that generation closest to matching the input value.
#00: Rdne$cnfg"2L:ThDBA`awN!d`Z$f"xfsdgVo$Zx^V&oslq #10: Rpdf!xppl!a tjle!sibs`"vbv#b!hgcbvje aimlumthl #20: Onee!unoo"a"vhjd!tfese!var `"ffodtid"dlgrrishn #30: Nnce!upon `"time#theqg wds a gfnduid#alfnritjm #40: Once!upon a time!theqe!xas b henetid"algnqithm #50: Once!upon a time!theqe was a genetid"algnrithm #60: Once upoo a time shese was a genetic amgorjshm #70: Once upoo a time theqe was a genetic!algorithm #80: Once upoo a time these was a genetic algorithm #86: Once upon a time there was a genetic algorithm
I designed and wrote this genetic algorithm, but only after reading many blog posts showing how other people attacked the same problem. My approach involves a couple of novel twists, such as the “population shuffling” that we’ll examine later on, but I’m definitely standing on the shoulders of giants here.
On the origin of genetic algorithms
The heady world of artificial intelligence and machine learning is a fascinating place. Many of the concepts are imported from other fields of research. For example, neural networks are programs that mimic aspects of the brain in order to get some tricky job done, such as recognizing letters and numbers in your handwriting.
Genetic algorithms are based on concepts from the field of biology. They take conceptual root in Darwinian evolution by natural selection. The name genetic algorithm makes sense because these algorithms emulate the mechanisms by which heritable traits are transmitted from parent to offspring, namely the genes found in DNA.
Don’t take the “evolution by natural selection” analogy too seriously. Your code can do things that don’t happen in nature, without causing your algorithm to somehow be wrong or invalid. Genetic algorithms were inspired by what is observed in the natural world, but they do not need to precisely imitate it. If one took the analogy too seriously, and learned how these algorithms actually work, one would have to conclude that the biological world is a huge, non-stop, rampant orgy of incestual mutants. That’s not an accurate description of the world we live in (I hope!).
Components of a genetic algorithm
For our purposes in this article the following quick summary of how genetic algorithms work will suffice. In the next section I provide some other sites to check out if you want a more comprehensive conceptual background.
- Gene – A unit of information that encodes part of a potential solution to the problem being solved. In the demo program a gene is stored as a character in a string.
- Chromosome – A sequence of genes that might be a good solution to the problem being solved. Some people refer to this as an “individual” instead of “chromosome.” In the demo program a chromosome is represented as a JASChromosome object, which uses an NSMutableString to store its gene sequence.
- Population – All of the chromosomes processed by the genetic algorithm. The algorithm mates chromosomes from the population to produce a new generation of chromosomes, which are added to the population.
- Fitness function – Some logic that evaluates a chromosome and returns a numeric value indicating the “fitness” of the chromosome’s genes for solving whatever problem is being solved.
- Selection method – Some logic that figures out which chromosomes should mate in order to create the next generation of chromosomes. In the demo program the chromosomes are rather lazy and will mate with whoever happens to live next door.
- Crossover method – Some logic that determines how to combine the genes from each chromosome while they are mating. In the demo program the fitness of the genes at the same gene sequence index from both parents are compared, and the fitter gene is given to the child. This is an example of my algorithm performing an optimization that doesn’t happen in nature, but who cares?
- Mutation method – Some procedure that randomly changes a chromosome’s gene sequence once in a while. In the demo program, mutations occur for about 20% of the chromosomes immediately after they’re born.
The hard part is figuring out a good fitness function, selection method, crossover method, and mutation method for the problem your algorithm is supposed to solve. The rest is just scaffolding and housekeeping.
Learn (a lot) more
If you want to learn more about the theory of genetic algorithms, you’re in luck. There’s a ton of great material on the Web. It seems like standard operating procedure these days to check what Wikipedia has to say on the topic, but I found this blog post much more digestible and useful.
If you prefer to see a C# implementation be sure to check out this blog post, which is part of an excellent series about genetic algorithms.
Another approachable, and rather hilarious, introduction to the topic with an example in Python can be found here.
How my algorithm works
I split the algorithm into two classes: JASGeneticAlgo and JASChromosome. The JASGeneticAlgo class creates, breeds, and analyzes a collection of JASChromosome objects. Each chromosome is capable of mating with other chromosomes, and determining how its fitness compares to another chromosome’s fitness. Here is the interface from JASChromosome.h:
Now let’s turn our attention back to the JASGeneticAlgo class. After you instantiate that class, and provide it with the target sequence (the string typed in by the user), you send it the execute message to begin running the algorithm. The execute method from JASGeneticAlgo.m, and its two direct dependencies, are shown below.
The populate method adds a bunch of chromosomes to the population array. Each chromosome has the same number of genes as the target sequence. In other words, each JASChromosome object has a string of random characters the same length as the user-defined input value. Here is the code from JASChromosome.m that shows how a chromosome is initialized.
As seen above, the run method of JASGeneticAlgo has been decomposed into three methods: breedNextGeneration, shufflePopulation, and analyzePopulation. Those three methods from JASGeneticAlgo.m are shown below.
Note: In the full demo program source code, the analyzePopulation method logs information about the fittest chromosome from the current generation, which is why that method doesn’t just search for a chromosome whose gene sequence exactly matches the target sequence. I removed that logging code before taking a screenshot of the method, so that the code would visually fit on my blog.
My algorithm’s “selection method” can be found in the loop of the breedNextGeneration method and also in the shufflePopulation method. The selection method used here is to mate each chromosome with its neighbor, and then replace the less fit parent with the child chromosome. To avoid some of the horrific side effects of rampant incest, the shufflePopulation method moves the last chromosome in the array to the beginning of the array. This allows each chromosome to mate with both of its neighbors on an alternating basis. What a life…
The algorithm’s fitness function is exposed by JASChromosome as the isFitterThanChromosome:forTargetSequence: method. That logic is shown below. The basic idea is that each character in the string is compared to the corresponding character in the target string. The numeric difference between the two characters is normalized to a negative number, where zero represents an exact match (perfect fitness). The fitness of each gene is aggregated into an overall fitness for the chromosome. Those overall fitness values can then be compared to each other, where a higher number is considered to be a chromosome with a better fitness. In this context, a fitness value of zero indicates that a chromosome’s gene sequence is an exact match of the target sequence.
Next up is the mating ritual. This method in JASChromosome is responsible for creating a new chromosome based on two parent chromosomes. It contains the “crossover method” component of the genetic algorithm, which determines what genes to give to an offspring.
The crossover method makes an optimization that is specific to the particular problem being solved. It compares the parents’ genes and gives their child the better gene of the two. In more complicated problem spaces, where the genes themselves do not directly correspond to a potential solution but are only used to encode a more complicated solution, this optimization would usually not make any sense.
Lastly, we need to see how random mutations occur. As seen toward the end of the mateWithChromosome: method above, once in a while a new chromosome’s gene sequence is mutated. This adds more variety into the gene pool, which is usually necessary to arrive at a correct or optimal solution. Here is how mutations occur in a JASChromosome object.
If you are interested in diving into this code deeper, feel free to grab a copy off github here.
Genetic algorithms are an interesting and powerful way to find the solution to certain types of problems. They are great for finding something that you know how to look for, which is what the fitness function is all about. If a fitness function cannot be created to evaluate potential solutions to a problem, genetic algorithms won’t be much help in solving that problem.
The algorithm reviewed in this article has a very simple problem to solve. The genes in a chromosome directly map to a solution to the problem (i.e. characters of a string). When applied to more complicated problems, the genes in a chromosome become decoupled from the problem space and are used purely as a means of encoding information. To learn more about this fascinating step up in complexity, visit my next blog post about solving the 8 Queens puzzle.
Thanks for stopping by!