Genetic Algorithms Digest    Monday, August 5 1991    Volume 5 : Issue 20

 - Send submissions to GA-List@AIC.NRL.NAVY.MIL
 - Send administrative requests to GA-List-Request@AIC.NRL.NAVY.MIL

Today's Topics:
 - Re: Building Block Hypothesis Considered Harmful (5 messages)

**********************************************************************

CALENDAR OF GA-RELATED ACTIVITIES: (with GA-List issue reference)

IJCAI 91, International Joint Conference on AI, Sydney, AU   Aug 25-30, 1991
First European Conference on Artificial Life (v5n10)         Dec 11-13, 1991
ECAI 92, 10th European Conference on AI (v5n13)              Aug 3-7, 1992
Parallel Problem Solving from Nature, Brussels (CFP 4/15)    Sep 28-30, 1992
10th National Conference on AI, San Jose (CFP 1/15)          Jul 12-17, 1992
Canadian AI Conference, Vancouver (CFP 1/7)                  May 11-15, 1992

(Send announcements of other activities to GA-List@aic.nrl.navy.mil)

**********************************************************************

----------------------------------------------------------------------

From: John.Henry@ub.cc.umich.edu
Date: Thu, 1 Aug 91 12:17:12 EDT
Subject: Re: Building Block Hypothesis Considered Harmful

Just a note on one (fairly crucial) error in John G.'s discussion. He
says "... it [the GA] can no longer estimate the TRUE AVERAGE [my
emphasis] payoff of the schemas." Note that an expectation ("average")
is always based on a distribution -- there is no such thing as a TRUE
average. I assume John means the average associated with a uniform
distribution (but, of course, not even the initial population need be
assigned according to a uniform distribution). The schema theorem is
EXPLICITLY distribution dependent (as is clearly evidenced by the form
written in terms of probabilities instead of population numbers; see,
for example, Theorem 6.2.3 of Chapter 6 in ADAPTATION IN NATURAL AND
ARTIFICIAL SYSTEMS). In fact, it shows how the DISTRIBUTION changes
from one generation to the next (which, in turn, changes both the
theoretical expectations associated with the schemata and the observed
averages). Others have made this mistake, and it has also led to a
misinterpretation of the relation between the n-armed bandit solutions
and the changing role of related schemata (assuming that the bandits
somehow remain constant over time in some TRUE configuration).

John (Holland)

------------------------------

From: gref@AIC.NRL.Navy.Mil
Date: Thu, 1 Aug 91 16:02:09 EDT
Subject: Re: Building Block Hypothesis Considered Harmful

To clarify my terms: wherever I used the term "true average" I was
referring to the actual mean value of all points in the schema
according to the payoff function. This is independent of whatever
points happen to be in the population at any time. I don't see any
disagreement between John Holland's observations and my own. In
particular, I know that John does not subscribe to the Strong Building
Block Hypothesis himself, and, as I said, one arrives at the SBBH by
misreading the careful statement of the Schema Theorem and his
discussion of the n-armed bandit problem in ANAS.
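To make the distinction concrete, here is a minimal sketch contrasting
the two averages (the payoff function f, the schema H, and the
population size are placeholder assumptions, not drawn from any
particular study):

    import itertools
    import random

    L = 6  # string length

    def f(x):
        # toy payoff: number of ones; any function of the string would do
        return sum(x)

    H = (1, None, None, None, None, None)   # the schema 1#####

    def matches(x, schema):
        return all(s is None or s == b for s, b in zip(schema, x))

    # "True" average: mean payoff over ALL points in the schema,
    # independent of any population.
    members = [x for x in itertools.product((0, 1), repeat=L)
               if matches(x, H)]
    true_avg = sum(f(x) for x in members) / len(members)

    # Observed average: mean payoff over the schema's instances in one
    # finite population (random here; biased after convergence).
    pop = [tuple(random.randint(0, 1) for _ in range(L)) for _ in range(20)]
    sample = [f(x) for x in pop if matches(x, H)]
    obs_avg = sum(sample) / len(sample) if sample else float("nan")

    print(true_avg, obs_avg)   # the two will generally differ

The first quantity is fixed by the payoff function alone; the second is
the only thing the GA ever sees, and it drifts as the population does.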
- John G.

------------------------------

From: Stewart Wilson
Date: Thu, 01 Aug 91 16:27:57 EDT
Subject: Re: Building Block Hypothesis Considered Harmful

A terminological aside: I used "GA-easy" to describe the condition Gref
refers to, but only because it was a more or less available label. I
did not like the term's built-in implication that such problems were
necessarily easy for a genetic algorithm to solve, and in fact I never
stated that they were, necessarily. The closest I came to this was in
the sentence "The tentative implication of the results of this and the
previous section is that within the class of functions that are
GA-easy, and therefore presumably *actually* rather easy for a GA,
there exists a quite large subclass of problems that are not easily
solvable by "steepest-ascent", a basic hill-climbing technique."

Because "GA-easy" carries this value implication, I recently suggested
that it not be used for the technical condition (the fittest schema in
each partition containing the optimum, etc.) and that another word be
adopted for the condition. That word is "veridical" (Latin for
"true-saying"; used in phrases such as "veridical perception", meaning
perception that does not deceive you). Thus: a veridical schema is one
that is the fittest in its partition and contains the global optimum; a
veridical partition is one containing a veridical schema; a veridical
problem (including encoding) is one for which every partition over a
uniform random (infinite?) population is veridical. If you reserve
"GA-easy" for problems that are in fact easy for a GA, then Gref's
question becomes whether veridical problems are GA-easy (and deceptive
ones, GA-hard).

I want to think further about his examples, etc., but will note that I
also provided another category, "obscure", for problems that, while
technically *either* veridical or deceptive, give the GA very little
reliable information to go on -- because of statistical fluctuations,
extreme "needliness" (needle-in-a-haystack structure), and the like --
so the result could go either way. It seems to me Gref's second and
third factors, limited population size and large variance within
schemas, are varieties of obscurity. His first factor, biased sampling
due to previous convergence, seems to me the essential focus of the
debate.

-Stewart

------------------------------

From: wnm@horus.cs.Virginia.EDU
Date: Fri, 2 Aug 91 09:00:56 EDT
Subject: RE: Building Block Hypothesis ...

John,

You will be glad to hear that in the Theory Workshop Ken DeJong
attempted to make the same point, reminding people that when discussing
search processes it is not possible to speak of "hard" or "easy" or
"deceptive" solution spaces (or criterion functions for optimization)
without reference to the particulars of the assumed search process. The
SBBH states the crucial particular of the assumed GA search process.
Ken's point is fundamental to all algorithmic analysis: even the
discussion of "undecidable problems" is conditioned on a particular
conceptualization of computation, albeit an extremely general one.

W.N. Martin

------------------------------

From: David Chalmers
Date: Fri, 2 Aug 91 01:11:50 EST
Subject: Re: Building Block Hypothesis Considered Harmful

Hear, hear, hear. I've had precisely the same problem with GA theory
for some time. After being particularly frustrated at the FOGA workshop
last year, I was driven to write up the following notes to myself to
try to express it -- some very similar points to the ones [John] makes,
particularly in sections 2, 3, 4 below. You might be interested to take
a look, though it's very rough. (NB: I don't work in GA theory; I'm
just an ex-mathematician turned AI/ALife researcher who uses GAs
occasionally.)

--Dave Chalmers.

Putting Schemas in Context

The Schema Thm says: the number of instances of a schema increases in
proportion to the average fitness of those instances in the current
population.
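(In code, the one-generation bound looks roughly like this -- my
notation, a sketch of the standard statement: m_H instances of schema H
in the current population with observed average fitness f_H; population
average f_bar; defining length delta; order o; crossover and mutation
rates pc and pm; string length l:

    def expected_instances(m_H, f_H, f_bar, delta, o, pc, pm, l):
        growth = m_H * (f_H / f_bar)                # proportional selection
        disruption = pc * delta / (l - 1) + o * pm  # worst-case loss bound
        return growth * (1.0 - disruption)          # lower bound on E[m(H)]

Note that f_H is the *observed* average over the current population,
which is exactly where the trouble below starts.)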
Why this isn't all there is to say:

(1) Schemas present in optimal (or near-optimal) solutions may not have
above-average fitness in *any* context; i.e., population-wide schema
fitnesses may not reflect properties of the optimum. (Standard
"deceptiveness".) This doesn't necessarily spell doom for GAs, because
of (2).

(2) Current schema fitness, over a given population, doesn't reflect
overall average schema fitness either. Instead, it reflects the fitness
of the schema *in the context of* the current population (i.e., in the
context of the other schemas present). In particular, fitness is
measured in a context of other pretty-fit schemas, not in a random
context. (E.g., the fitness of genes for an opposable thumb is relative
to the fitness of genes for an arm; the "overall fitness" of those
genes played no role in selection.)

Morals: (a) "Overall" schema fitnesses may be completely irrelevant,
except for the first one or two generations of a run. Very soon,
populations become patterned. What determines the reproductive success
of a schema is its average fitness in the context of such a patterned
population. (b) This perhaps makes (1) less of a worry: even if optimal
strings don't contain schemas with high overall average fitness, they
*might* still contain schemas that have high fitness in such patterned
(i.e., quite fit) contexts.

*So* -- there is almost no use for the "overall" fitness of schemas at
all. The notion should be discarded, and definitions of deceptiveness,
epistasis, etc. that use such population averages should be given the
boot. What we've established, then, is that the *context-relative*
fitnesses of schemas are what matter. The question now is whether
schemas that are fit *in the context of a partly-evolved population*
are the schemas present in the optimum. We hope so.

(3) Another problem with blind use of the Schema Thm: reproduction is
*not* proportional to mean value in the current population. Or rather,
it is, but only for one generation, and that may not be interesting.
How about looking at the effect over 4-5 generations? The trouble is
that after a generation has passed, the distribution of the schema has
changed, and reproduction is now relative to the average fitness of the
schema in the *new* population, which will be higher because the
instances of the schema that reproduced were the *good* ones -- say
(oversimplifying) the best 50%. So reproduction in the next generation
is proportional to the mean of these, i.e., to about the 75th
percentile of the schema's original distribution. If we look at numbers
over 4-5 generations, schema reproduction will be proportional not to
the mean but to (say) the 90th percentile of the schema's instances in
the current population: reproduction favours the good ones, and they
reproduce further. This corresponds to Grefenstette/Baker's point that
the GA 2-armed bandit doesn't just go to the highest mean, but to the
highest variance as well -- roughly, to the best "90th percentile".
After 4-5 generations the character of the population will have changed
enough that this kind of argument stops being enlightening: too many
new contexts will have arisen.

(4) So the major question isn't how the fitness of small schemas
combines to determine the fitness of big ones, but whether the fitness
of ever-changing schemas can create a sequence of dynamically evolving
*contexts* C1, C2, ..., Cn (say, 5-10 generations apart) such that fit
schemas in context C1 produce context C2, fit schemas in context C2
produce context C3, ..., and fit schemas in context C(n-1) produce
context Cn, which contains an optimal string -- i.e., a much more
dynamic approach to schema selection. (It happens in nature, after
all.) So what we need to do is move away from the single-recurrence
mode of thought suggested by a simplistic reading of the Schema Thm,
and instead analyze GAs as a *coupled* system -- schema frequencies (S)
and contexts (C) evolving in tandem. S(n+1) is a function of both S(n)
and C(n), and the new context C(n+1) is likewise a function of S(n) and
C(n). The dynamics of this coupled system -- how schema frequencies and
contexts coevolve -- should yield more complex but more realistic
behaviour. Things are simplified by noting that S and C aren't two
separate entities but two sides of the same coin: the context is just
the table of schema frequencies over all schemas. So maybe we can get
back to a single recurrence, but of a much more sophisticated kind. In
the old system, the value of S(n+1)(K) [where K is a given schema] is
treated as a function solely of S(n)(K); in the new system, S(n+1)(K)
also depends on S(n)(L) for every other schema L in the population
(because the context C is made up of all such values S(n)(L)). So the
"vector" S(n) of schema frequencies evolves holistically, rather than
each schema frequency evolving separately on its own thread. Now we
just have to set up the math properly...
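One minimal way to set it up (a sketch of my own, selection only, in
the infinite-population limit; the payoff function is a placeholder):
represent the context as the full distribution p over strings, so that
every schema frequency S(K) is a marginal of p. One generation of
proportional selection then updates every S(K) at once -- the
recurrence is over the whole vector, not over each schema separately.

    import itertools

    L = 4

    def f(x):                      # placeholder payoff
        return 1 + sum(x)

    def step(p):
        # proportional selection applied to the whole distribution
        # (i.e., to the context itself)
        f_bar = sum(q * f(x) for x, q in p.items())
        return {x: q * f(x) / f_bar for x, q in p.items()}

    def schema_freq(p, schema):
        # S(K), read off as a marginal of the context
        return sum(q for x, q in p.items()
                   if all(s is None or s == b for s, b in zip(schema, x)))

    # start from the uniform context; watch one schema's frequency evolve
    p = {x: 1 / 2**L for x in itertools.product((0, 1), repeat=L)}
    K = (1, None, None, None)      # the schema 1###
    for n in range(5):
        print(n, round(schema_freq(p, K), 4))
        p = step(p)

Because step acts on p itself, S(n+1)(K) automatically depends on every
other S(n)(L); adding crossover and mutation to step would complicate
the update but not this basic structure.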
This of course isn't incompatible with anything the Schema Thm says.
It's just that in practice people using the Schema Thm seem to assume
uniform, unbiased contexts -- which just don't exist after a couple of
generations -- and this assumption leads to the simplistic picture of
GA dynamics given by the simple recurrence.

(5) Now, the results of GA runs seem to be very non-deterministic:
final products vary greatly from run to run. If GAs were a direct
instantiation of "neat schema build-up" we wouldn't really expect this.
So what "non-neat" property is causing it? My guess: "low-order"
schemata aren't really selected at all. On many interesting problems,
low-order things have pretty uniform fitness (intuition; check it out).
Instead, what is selected early are one or two useful *high-order*
schemata that happen to have decent fitness -- selected because they
happen to be present in the population, and because their fitness is
robust under crossover with different schemata. This is somewhat
contrary to the accepted wisdom, which has high-order structure being
constructed out of low-order structure. But that was really only
"wisdom" because the idea was: how else would you get good high-order
stuff? Surely not by chance. Answer: yes, by chance. There are quite a
lot of good high-order schemata around; in a reasonable population,
within a few generations you're sure to come across something decent.
You may not come across the *best*, but who said you had to? Of course,
chance dictates that from run to run it will be *different* high-order
schemata that get selected early. The somewhat chancy character of the
early high-order schemata very likely determines the direction of the
entire run, and this is why different runs produce very different
results on interesting problems. You get context-dependent optima.

------------------------------

End of Genetic Algorithms Digest
******************************