Do you have a problem that could be solved by iterative optimization? With this excerpt from Programming Collective Intelligence, Toby Segaran explores the use of genetic algorithms to solve such problems using the Python.
Another set of techniques for optimization, also inspired by nature, is called genetic algorithms. These work by initially creating a set of random solutions known as the population. At each step of the optimization, the cost function for the entire population is calculated to get a ranked list of solutions. An example is shown in Table 5.1.
Table 5.1. Ranked list of solutions and costs
[7, 5, 2, 3, 1, 6, 1, 6, 7, 1, 0, 3]
[7, 2, 2, 2, 3, 3, 2, 3, 5, 2, 0, 8]
[0, 4, 0, 3, 8, 8, 4, 4, 8, 5, 6, 1]
[5, 8, 0, 2, 8, 8, 8, 2, 1, 6, 6, 8]
After the solutions are ranked, a new population—known as the next generation—is created. First, the top solutions in the current population are added to the new population as they are. This process is called elitism. The rest of the new population consists of completely new solutions that are created by modifying the best solutions.
There are two ways that solutions can be modified. The simpler of these is called mutation, which is usually a small, simple, random change to an existing solution. In this case, a mutation can be done simply by picking one of the numbers in the solution and increasing or decreasing it. A couple of examples are shown in Figure 5.3.
The other way to modify solutions is called crossover or breeding. This method involves taking two of the best solutions and combining them in some way. In this case, a simple way to do crossover is to take a random number of elements from one solution and the rest of the elements from another solution, as illustrated in Figure 5.4.
A new population, usually the same size as the old one, is created by randomly mutating and breeding the best solutions. Then the process repeats—the new population is ranked and another population is created. This continues either for a fixed number of iterations or until there has been no improvement over several generations.
def geneticoptimize(domain,costf,popsize=50,step=1, mutprob=0.2,elite=0.2,maxiter=100): # Mutation Operation def mutate(vec): i=random.randint(0,len(domain)-1) if random.random( )<0.5 and vec[i]>domain[i]: return vec[0:i]+[vec[i]-step]+vec[i+1:] elif vec[i]
This function takes several optional parameters:
Try optimizing the travel plans using the genetic algorithm in your Python session:>>>
s=optimization.geneticoptimize(domain,optimization.schedulecost)3532 3503 ... 2591 2591 2591 >>>
optimization.printschedule(s)Seymour BOS 12:34-15:02 $109 10:33-12:03 $ 74 Franny DAL 10:30-14:57 $290 10:51-14:16 $256 Zooey CAK 10:53-13:36 $189 10:32-13:16 $139 Walt MIA 11:28-14:40 $248 12:37-15:05 $170 Buddy ORD 12:44-14:17 $134 10:33-13:11 $132 Les OMA 11:08-13:07 $175 11:07-13:24 $171
The computer scientist John Holland is widely considered to be the father of genetic algorithms because of his 1975 book, Adaptation in Natural and Artificial Systems (University of Michigan Press). Yet the work goes back to biologists in the 1950s who were attempting to model evolution on computers. Since then, genetic algorithms and other optimization methods have been used for a huge variety of problems, including:
Finding which concert hall shape gives the best acoustics
Designing an optimal wing for a supersonic aircraft
Suggesting the best library of chemicals to research as potential drugs
Automatically designing a chip for voice recognition
Potential solutions to these problems can be turned into lists of numbers. This makes it easy to apply genetic algorithms or simulated annealing.
Whether a particular optimization method will work depends very much on the problem. Simulated annealing, genetic optimization, and most other optimization methods rely on the fact that, in most problems, the best solution is close to other good solutions. To see a case where optimization might not work, look at Figure 5.5.
The cost is actually lowest at a very steep point on the far right of the figure. Any solution that is close by would probably be dismissed from consideration because of its high cost, and you would never find your way to the global minimum. Most algorithms would settle in one of the local minima on the left side of the figure.
The flight scheduling example works because moving a person from the second to the third flight of the day would probably change the overall cost by a smaller amount than moving that person to the eighth flight of the day would. If the flights were in random order, the optimization methods would work no better than a random search—in fact, there's no optimization method that will consistently work better than a random search in that case.
Learn more about this topic from Programming Collective Intelligence.
This fascinating book demonstrates how you can build web applications to mine the enormous amount of data created by people on the Internet. With the sophisticated algorithms in this book, you can write smart programs to access interesting datasets from other web sites, collect data from users of your own applications, and analyze and understand the data once you've found it.