Genetic Algorithms Digest    Monday, August 5 1991    Volume 5 : Issue 20

 - Send submissions to GA-List@AIC.NRL.NAVY.MIL
 - Send administrative requests to GA-List-Request@AIC.NRL.NAVY.MIL

Today's Topics:
 - Re: Building Block Hypothesis Considered Harmful (5 messages)

**********************************************************************

CALENDAR OF GA-RELATED ACTIVITIES: (with GA-List issue reference)

IJCAI 91, International Joint Conference on AI, Sydney, AU   Aug 25-30, 1991
First European Conference on Artificial Life (v5n10)         Dec 11-13, 1991
ECAI 92, 10th European Conference on AI (v5n13)              Aug 3-7, 1992
Parallel Problem Solving from Nature, Brussels (CFP 4/15)    Sep 28-30, 1992
10th National Conference on AI, San Jose (CFP 1/15)          Jul 12-17, 1992
Canadian AI Conference, Vancouver (CFP 1/7)                  May 11-15, 1992

(Send announcements of other activities to GA-List@aic.nrl.navy.mil)

**********************************************************************

----------------------------------------------------------------------

From: John.Henry@ub.cc.umich.edu
Date: Thu, 1 Aug 91 12:17:12 EDT
Subject: Re: Building Block Hypothesis Considered Harmful

Just a note on one (fairly crucial) error in John G.'s discussion. He
says "... it [the GA] can no longer estimate the TRUE AVERAGE [my
emphasis] payoff of the schemas." Note that an expectation ("average")
is always based on a distribution -- there is no such thing as a TRUE
average. I assume John means the average associated with a uniform
distribution (but, of course, not even the initial population need be
assigned according to a uniform distribution). The schema theorem is
EXPLICITLY distribution dependent (as is clearly evidenced by the form
written in terms of probabilities instead of population numbers; see,
for example, Theorem 6.2.3 of Chapter 6 in ADAPTATION IN NATURAL AND
ARTIFICIAL SYSTEMS). In fact, it shows how the DISTRIBUTION changes
from one generation to the next (which, in turn, changes both the
theoretical expectations associated with the schemata and the observed
averages). Others have made this mistake, and it has also led to a
misinterpretation of the relation between the n-armed bandit solutions
and the changing role of related schemata (assuming that the bandits
somehow remain constant over time in some TRUE configuration).

John (Holland)

------------------------------

From: gref@AIC.NRL.Navy.Mil
Date: Thu, 1 Aug 91 16:02:09 EDT
Subject: Re: Building Block Hypothesis Considered Harmful

To clarify my terms: wherever I used the term "true average" I was
referring to the actual mean value of all points in the schema
according to the payoff function. This is independent of whatever
points happen to be in the population at any time. I don't see any
disagreement between John Holland's observations and my own. In
particular, I know that John does not subscribe to the Strong Building
Block Hypothesis himself, and, as I said, one arrives at the SBBH by
misreading the careful statement of the Schema Theorem and his
discussion of the n-armed bandit problem in ANAS.
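To make the distinction concrete, here is a minimal sketch contrasting
the two averages (the payoff function f, the schema H, and the
population size are placeholder assumptions, not drawn from any
particular study):

    import itertools
    import random

    L = 6  # string length

    def f(x):
        # toy payoff: number of ones; any function of the string would do
        return sum(x)

    H = (1, None, None, None, None, None)   # the schema 1#####

    def matches(x, schema):
        return all(s is None or s == b for s, b in zip(schema, x))

    # "True" average: mean payoff over ALL points in the schema,
    # independent of any population.
    members = [x for x in itertools.product((0, 1), repeat=L)
               if matches(x, H)]
    true_avg = sum(f(x) for x in members) / len(members)

    # Observed average: mean payoff over the schema's instances in one
    # finite population (random here; biased after convergence).
    pop = [tuple(random.randint(0, 1) for _ in range(L)) for _ in range(20)]
    sample = [f(x) for x in pop if matches(x, H)]
    obs_avg = sum(sample) / len(sample) if sample else float("nan")

    print(true_avg, obs_avg)   # the two will generally differ

The first quantity is fixed by the payoff function alone; the second is
the only thing the GA ever sees, and it drifts as the population does.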
- John G.

------------------------------

From: Stewart Wilson
Date: Thu, 01 Aug 91 16:27:57 EDT
Subject: Re: Building Block Hypothesis Considered Harmful

A terminological aside: I used "GA-easy" to describe the condition Gref
refers to, but only because it was a more or less available label. I
did not like the term's built-in implication that such problems were
necessarily easy for a genetic algorithm to solve, and in fact I never
stated that they were, necessarily. The closest I came to this was in
the sentence "The tentative implication of the results of this and the
previous section is that within the class of functions that are
GA-easy, and therefore presumably *actually* rather easy for a GA,
there exists a quite large subclass of problems that are not easily
solvable by "steepest-ascent", a basic hill-climbing technique."

Because "GA-easy" carries this value implication, I recently suggested
that it not be used for the technical condition (the fittest schema in
each partition containing the optimum, etc.) and that another word be
adopted for the condition. That word is "veridical" (Latin for
"true-saying"; used in phrases such as "veridical perception", meaning
perception that does not deceive you). Thus: a veridical schema is one
that is the fittest in its partition and contains the global optimum; a
veridical partition is one containing a veridical schema; a veridical
problem (including encoding) is one for which every partition over a
uniform random (infinite?) population is veridical. If you reserve
"GA-easy" for problems that are in fact easy for a GA, then Gref's
question becomes whether veridical problems are GA-easy (and deceptive
ones, GA-hard).

I want to think further about his examples, etc., but will note that I
also provided another category, "obscure", for problems that, while
technically *either* veridical or deceptive, give the GA very little
reliable information to go on -- because of statistical fluctuations,
extreme "needliness" (needle-in-a-haystack structure), and the like --
so the result could go either way. It seems to me Gref's second and
third factors, limited population size and large variance within
schemas, are varieties of obscurity. His first factor, biased sampling
due to previous convergence, seems to me the essential focus of the
debate.

-Stewart

------------------------------

From: wnm@horus.cs.Virginia.EDU
Date: Fri, 2 Aug 91 09:00:56 EDT
Subject: RE: Building Block Hypothesis ...

John,

You will be glad to hear that in the Theory Workshop Ken DeJong
attempted to make the same point, reminding people that when discussing
search processes it is not possible to speak of "hard" or "easy" or
"deceptive" solution spaces (or criterion functions for optimization)
without reference to the particulars of the assumed search process. The
SBBH states the crucial particular of the assumed GA search process.
Ken's point is fundamental to all algorithmic analysis: even the
discussion of "undecidable problems" is conditioned on a particular
conceptualization of computation, albeit an extremely general one.

W.N. Martin

------------------------------

From: David Chalmers
Date: Fri, 2 Aug 91 01:11:50 EST
Subject: Re: Building Block Hypothesis Considered Harmful

Hear, hear, hear. I've had precisely the same problem with GA theory
for some time. After being particularly frustrated at the FOGA workshop
last year, I was driven to write up the following notes to myself to
try to express it -- some very similar points to the ones [John] makes,
particularly in sections 2, 3, 4 below. You might be interested to take
a look, though it's very rough. (NB: I don't work in GA theory; I'm
just an ex-mathematician turned AI/ALife researcher who uses GAs
occasionally.)

--Dave Chalmers.

Putting Schemas in Context

The Schema Thm says: the number of instances of a schema increases in
proportion to the average fitness of those instances in the current
population.
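(In code, the one-generation bound looks roughly like this -- my
notation, a sketch of the standard statement: m_H instances of schema H
in the current population with observed average fitness f_H; population
average f_bar; defining length delta; order o; crossover and mutation
rates pc and pm; string length l:

    def expected_instances(m_H, f_H, f_bar, delta, o, pc, pm, l):
        growth = m_H * (f_H / f_bar)                # proportional selection
        disruption = pc * delta / (l - 1) + o * pm  # worst-case loss bound
        return growth * (1.0 - disruption)          # lower bound on E[m(H)]

Note that f_H is the *observed* average over the current population,
which is exactly where the trouble below starts.)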
Why this isn't all there is to say:

(1) Schemas present in optimal (or near-optimal) solutions may not have
above-average fitness in *any* context; i.e., population-wide schema
fitnesses may not reflect properties of the optimum. (Standard
"deceptiveness".) This doesn't necessarily spell doom for GAs, because
of (2).

(2) Current schema fitness, over a given population, doesn't reflect
overall average schema fitness either. Instead, it reflects the fitness
of the schema *in the context of* the current population (i.e., in the
context of the other schemas present). In particular, fitness is
measured in a context of other pretty-fit schemas, not in a random
context. (E.g., the fitness of genes for an opposable thumb is relative
to the fitness of genes for an arm; the "overall fitness" of those
genes played no role in selection.)

Morals: (a) "Overall" schema fitnesses may be completely irrelevant,
except for the first one or two generations of a run. Very soon,
populations become patterned. What determines the reproductive success
of a schema is its average fitness in the context of such a patterned
population. (b) This perhaps makes (1) less of a worry: even if optimal
strings don't contain schemas with high overall average fitness, they
*might* still contain schemas that have high fitness in such patterned
(i.e., quite fit) contexts.

*So* -- there is almost no use for the "overall" fitness of schemas at
all. The notion should be discarded, and definitions of deceptiveness,
epistasis, etc. that use such population averages should be given the
boot. What we've established, then, is that the *context-relative*
fitnesses of schemas are what matter. The question now is whether
schemas that are fit *in the context of a partly-evolved population*
are the schemas present in the optimum. We hope so.

(3) Another problem with blind use of the Schema Thm: reproduction is
*not* proportional to mean value in the current population. Or rather,
it is, but only for one generation, and that may not be interesting.
How about looking at the effect over 4-5 generations? The trouble is
that after a generation has passed, the distribution of the schema has
changed, and reproduction is now relative to the average fitness of the
schema in the *new* population, which will be higher because the
instances of the schema that reproduced were the *good* ones -- say
(oversimplifying) the best 50%. So reproduction in the next generation
is proportional to the mean of these, i.e., to about the 75th
percentile of the schema's original distribution. If we look at numbers
over 4-5 generations, schema reproduction will be proportional not to
the mean but to (say) the 90th percentile of the schema's instances in
the current population: reproduction favours the good ones, and they
reproduce further. This corresponds to Grefenstette/Baker's point that
the GA 2-armed bandit doesn't just go to the highest mean, but to the
highest variance as well -- roughly, to the best "90th percentile".
After 4-5 generations the character of the population will have changed
enough that this kind of argument stops being enlightening: too many
new contexts will have arisen.

(4) So the major question isn't how the fitness of small schemas
combines to determine the fitness of big ones, but whether the fitness
of ever-changing schemas can create a sequence of dynamically evolving
*contexts* C1, C2, ..., Cn (say, 5-10 generations apart) such that fit
schemas in context C1 produce context C2, fit schemas in context C2
produce context C3, ..., and fit schemas in context C(n-1) produce
context Cn, which contains an optimal string -- i.e., a much more
dynamic approach to schema selection. (It happens in nature, after
all.) So what we need to do is move away from the single-recurrence
mode of thought suggested by a simplistic reading of the Schema Thm,
and instead analyze GAs as a *coupled* system -- schema frequencies (S)
and contexts (C) evolving in tandem. S(n+1) is a function of both S(n)
and C(n), and the new context C(n+1) is likewise a function of S(n) and
C(n). The dynamics of this coupled system -- how schema frequencies and
contexts coevolve -- should yield more complex but more realistic
behaviour. Things are simplified by noting that S and C aren't two
separate entities but two sides of the same coin: the context is just
the table of schema frequencies over all schemas. So maybe we can get
back to a single recurrence, but of a much more sophisticated kind. In
the old system, the value of S(n+1)(K) [where K is a given schema] is
treated as a function solely of S(n)(K); in the new system, S(n+1)(K)
also depends on S(n)(L) for every other schema L in the population
(because the context C is made up of all such values S(n)(L)). So the
"vector" S(n) of schema frequencies evolves holistically, rather than
each schema frequency evolving separately on its own thread. Now we
just have to set up the math properly...
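One minimal way to set it up (a sketch of my own, selection only, in
the infinite-population limit; the payoff function is a placeholder):
represent the context as the full distribution p over strings, so that
every schema frequency S(K) is a marginal of p. One generation of
proportional selection then updates every S(K) at once -- the
recurrence is over the whole vector, not over each schema separately.

    import itertools

    L = 4

    def f(x):                      # placeholder payoff
        return 1 + sum(x)

    def step(p):
        # proportional selection applied to the whole distribution
        # (i.e., to the context itself)
        f_bar = sum(q * f(x) for x, q in p.items())
        return {x: q * f(x) / f_bar for x, q in p.items()}

    def schema_freq(p, schema):
        # S(K), read off as a marginal of the context
        return sum(q for x, q in p.items()
                   if all(s is None or s == b for s, b in zip(schema, x)))

    # start from the uniform context; watch one schema's frequency evolve
    p = {x: 1 / 2**L for x in itertools.product((0, 1), repeat=L)}
    K = (1, None, None, None)      # the schema 1###
    for n in range(5):
        print(n, round(schema_freq(p, K), 4))
        p = step(p)

Because step acts on p itself, S(n+1)(K) automatically depends on every
other S(n)(L); adding crossover and mutation to step would complicate
the update but not this basic structure.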
This of course isn't incompatible with anything the Schema Thm says.
It's just that in practice people using the Schema Thm seem to assume
uniform, unbiased contexts -- which just don't exist after a couple of
generations -- and this assumption leads to the simplistic picture of
GA dynamics given by the simple recurrence.

(5) Now, the results of GA runs seem to be very non-deterministic:
final products vary greatly from run to run. If GAs were a direct
instantiation of "neat schema build-up" we wouldn't really expect this.
So what "non-neat" property is causing it? My guess: "low-order"
schemata aren't really selected at all. On many interesting problems,
low-order things have pretty uniform fitness (intuition; check it out).
Instead, what is selected early are one or two useful *high-order*
schemata that happen to have decent fitness -- selected because they
happen to be present in the population, and because their fitness is
robust under crossover with different schemata. This is somewhat
contrary to the accepted wisdom, which has high-order structure being
constructed out of low-order structure. But that was really only
"wisdom" because the idea was: how else would you get good high-order
stuff? Surely not by chance. Answer: yes, by chance. There are quite a
lot of good high-order schemata around; in a reasonable population,
within a few generations you're sure to come across something decent.
You may not come across the *best*, but who said you had to? Of course,
chance dictates that from run to run it will be *different* high-order
schemata that get selected early. The somewhat chancy character of the
early high-order schemata very likely determines the direction of the
entire run, and this is why different runs produce very different
results on interesting problems. You get context-dependent optima.

------------------------------

End of Genetic Algorithms Digest
******************************