Genetic Algorithms Digest      Tuesday, April 4, 1995      Volume 9 : Issue 19

- Do NOT send email or reply to gadistr@aic.nrl.navy.mil (GA List Moderator)
- Send submissions (articles) to GA-List@AIC.NRL.NAVY.MIL
- Send administrative (subscribe, unsubscribe, change of address, etc.)
  requests to GA-List-Request@AIC.NRL.NAVY.MIL
******************************************************************************
- You can access back issues, GA code, conference announcements, etc., either
  through the WWW at URL http://www.aic.nrl.navy.mil/galist/ or through
  anonymous ftp at ftp.aic.nrl.navy.mil [192.26.18.68] in /pub/galist.
******************************************************************************

Today's Topics:

- NFL Reply
- Reply to question on face generation (Re: v9n18)
- Poor reviews of GA papers (Re: v9n17)

----------------------------------------------------------------------

*******************************************************************************
CALENDAR OF GA-RELATED ACTIVITIES: (with GA-List issue reference)

ICANNGA95 Intl Conf on Artificial NNs and GAs, France (v8n10)    Apr 18-21, 95
IAS-95 Intelligent Adaptive Systems, Melbourne Beach FL (v8n32)  Apr 26, 95
CAD95 Computer Aided Design / "GA Day", Yalta City, CIS (v8n46)  May 4-13, 95
CSCS10 10th Intl Conf on CONTROL SYSTEMS & CS, Romania (v8n37)   May 24-26, 95
ECAL95 3rd European Conf on Artificial Life, Granada, Spain (v8n5) Jun 4-6, 95
GWIC95 Golden West Intl Conf on Intell Systems, San Fran (v8n49) Jun 12-14, 95
ML95 GP Workshop: Theory to real-world applications, CA (v9n8)   Jul 9, 95
AAICG95 Symposium on Appl of AI in Geophysics, Boulder, CO (v8n39) Jul 12, 95
ICGA-95 Sixth Intl Conference on GAs, Pittsburgh PA (v8n32)      Jul 15-20, 95
PASE95 Wrkshp Parallel Appl in Stat and Econ, Trier (v8n39)  Aug 29-Sep 2, 95
EA95 Evolution Artificielle, Telecom Bretagne, Brest, Fr (v9n1)  Sep 4-8, 95
GALESIA'95 GAs in Eng Systems, Univ of Sheffield, UK (v8n45)     Sep 12-14, 95
AIPetro95 AI in the Petroleum Ind, Lillehammer, Norway (v8n48)   Sep 13-15, 95
ASI-AA-95 Practice and Future of Autonomous Agents (v8n19)   Sep 23-Oct 1, 95
SOCO95 Soft Computing Fuzzy Logic, NNs and GAs, Rochester (v9n5) Oct 24-27, 95
Genetic Methods for Routing & Scheduling, New Orleans (v8n50) Oct 29-Nov 1, 95
AAAI GP Symposium (Fall Series), Cambridge, MA (v8n43)           Nov 10-12, 95
IEEE Wrkshp on Natural Algorithms in Signal Proc, Essex (v9n7)   Nov 12-14, 95
WWW'95 on Fuzzy Logic and NNs/Evol Comp, Nagoya, Japan (v9n6)    Nov 14-15, 95
ICEC'95 IEEE Intl Conf on Evol. Computing, Perth (v8n36)      Nov 29-Dec 1, 95
EUROGEN95 GAs and ESs in Computational Sci & Eng, Spain (v9n15)  Dec 4-8, 95
EP96 5th Conf on Evol Programming, San Diego, CA (v9n18)     Feb 29-Mar 3, 96
ICEC'96 IEEE Intl Conf on Evol Comp, Nagoya, Japan (v9n18)       May 20-22, 96
GP96 Genetic Programming Conference, Stanford, CA (v9n9)         Jul 28-31, 96
PPSN 96 Parallel Problem Solving from Nature, Berlin (v9n17)     Sep 22-27, 96

(Send announcements of other activities to GA-List@aic.nrl.navy.mil)
*******************************************************************************

------------------------------

From: wgm@santafe.edu (Bill Macready)
Date: Fri, 24 Mar 95 14:19:15 MST
Subject: NFL Reply

Just when you hoped the NFL discussion was dying down...

We fail to understand why Mr. Park regards our use of a uniform P(f) (the
distribution over fitness functions) in some of the results of our paper as
somehow invalidating the whole body of work.
First, as we have stated several times now, many of our results are
completely independent of P(f). This is the case, for example, for our
benchmarks of a search algorithm's performance, for the information-theoretic
and geometric calculations we perform, and for our proof that for *any* fixed
f, you cannot use previous performance of an algorithm on that f to infer
anything concerning future performance.

As for the NFL theorems themselves, we are again confused by Mr. Park. Did he
think that we were somehow claiming that P(f) is uniform in the real world?
As we have stated time and again, we use uniform P(f) in the NFL theorems
merely as a tool, to make certain points. One of these is that you must make
an assumption, implicitly or otherwise, if you want to justify your search
algorithm. Whether or not uniform P(f) is dominated by "random" f's is
irrelevant as far as this conclusion is concerned.

In fact, the burden is not on us to "justify" our choice of P(f). Rather, the
burden is on the user of a search algorithm to justify the (almost always
implicit) choice of P(f) (s)he is making. In fact, THAT IS THE WHOLE POINT.
(And of course assumptions for such priors are often wrong - we again direct
interested readers to the exciting new results in supervised learning where
the cautions of NFL-type theorems are being borne out in real-world
scenarios.)

For the record though, it turns out that one can make P(f) significantly
non-uniform, in any of a number of reasonable ways, and the NFL results still
hold. We thank Tal Grossman and Raja Das for pointing this out to us.

It is unfortunate that this particular discussion has focused on issues of
Kolmogorov complexity, which are at best of peripheral concern. We address
Mr. Park's concerns below, but this digression merely obscures the important
issues concerning our paper. The NFL-overloaded reader has been warned!
(Might we also suggest, Mr. Park, that if you would like to continue this
discussion, we do it in private and not bother the ga-list readers.)

====================== MR. PARK'S CONCERNS ============================

We are well aware of Kolmogorov (aka Solomonoff, aka Chaitin, aka algorithmic
information) complexity. It has reared its fascinating - but ultimately
pointless - head in two other areas where we have done research (supervised
learning and the thermodynamics of computation).

Our first point, though, is that even if Kolmogorov complexity *were*
relevant in the sense that Mr. Park thinks, his conclusions would still be
quite questionable. To see this, let's simplify things out of the Kolmogorov
context. Say that the fitness function is being implemented in < 5000 lines
of C++ code. (I.e., C++ is our "compression" scheme.) Often the fitness
function is actually provided by the real world, so this assumption is wrong
- see below. In addition, the use of C++ is clearly extremely arbitrary,
perhaps crucially so (again, see below). But w/o this assumption or something
similar, we can't understand Mr. Park's concerns. I.e., his concerns seem to
demand this kind of an assumption. So let's make it, and see where we go.

Now for a big search space, the percentage of all fitness functions that can
be implemented in < 5000 lines of C++ code is infinitesimal. One might infer
from this that results based on averaging over all fitness functions cannot
possibly have relevance to a scenario in which you must use < 5000 lines of
C++ code. As we understand it, this is Mr. Park's true concern. Please, Mr.
Park, correct us if we are misinterpreting you.
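(To put rough numbers on "infinitesimal" - the specific figures below, 80
characters per line, a 128-character alphabet, and a binary 300-variable
domain, are our own back-of-the-envelope assumptions, purely for
illustration:)

\[
\#\{\text{programs of} < 5000 \text{ lines}\}
  \;\le\; \sum_{k=0}^{400000} 128^{k}
  \;<\; 128^{400001}
  \;\approx\; 2^{2.8 \times 10^{6}},
\]
\[
\#\{\text{functions } f : A \to B\} \;=\; |B|^{|A|} \;=\; 2^{2^{300}}
  \quad\text{for } |A| = 2^{300},\; B = \{0,1\},
\]

so the fraction of all such fitness functions that fit in < 5000 lines is at
most about 2^(2.8 x 10^6 - 2^300), a number indistinguishable from zero.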
However, note that the very reason people use C++ (or any language) is that
it is so flexible and powerful. In a very real sense, for any actual fitness
function there exists a < 5000-line C++ fitness-function program that
"approximates" it quite well, especially for searches that are not extremely
long. In essence, we can think of no reason to believe that the statistics
over the space of all functions won't be an exceedingly good approximation to
the statistics over the space of < 5000-lines-of-code functions.

To give one simple example brought up before: a pseudo-random number
generator can be implemented in a tiny number of lines of code. But obviously
how well the standard search algorithms will do on a fitness function
generated by such a pseudo-random number generator will barely differ at all
from how they would do on fitness functions generated by true random number
generators. (In fact, that's the whole basis for using pseudo-random number
generators: that they well approximate "true" randomness.) We repeat
ourselves: does Mr. Park believe that a GA will perform significantly
differently on a pseudo-random function than on a truly random one?

Unless you somehow build an algorithm that explicitly tries to search for
small (in C++) programs that could have generated the fitness function, it
seems absurd to us to believe that restricting things to < 5000-line fitness
functions will result in different average performance than allowing all
fitness functions.

On to Kolmogorov complexity, and its (current) irrelevance... In the current
context, its pointlessness is a direct consequence of the fact that it is
fundamentally predicated upon an infinite system (namely a Turing machine),
whereas the systems we are considering are finite. (As an aside, the same
problem afflicts arguments that try to address the NFL theorems based on
Lebesgue measures and the rationals.)

To be more precise: What Mr. Park fails to mention in his precis of
Kolmogorov complexity is that the innocuous overall "additive constant" in
Kolmogorov complexity is in fact the whole ballgame if you're encoding finite
sets. Think about it; it's really quite trivial: I have a finite set, and can
use any encoding scheme I desire. *Obviously* there is no general (!!) sense
in which I can say that one element of the set is any harder or easier to
encode than some other element. The encoding is simply a mapping from the
elements of the set to a bit pattern. That mapping can be anything; no one
element of the set is more or less intrinsically compressible than any other.
Mr. Park is simply wrong to think that there is some such intrinsic sense in
which certain cost functions are more or less compressible than others.
(Note that we are here operating in Mr. Park's context, where nothing is
specified about the encoding scheme - obviously, if there are restrictions on
the encoding scheme, things change. Just as obviously, it would then be up to
Mr. Park to justify those restrictions.)

For finite sets, Kolmogorov complexity cannot be used to distinguish "random"
from "non-random" elements of the set. Change the universal Turing machine
reading the tape, and you can get any element you want to be "random" or
"incompressible". When Mr. Park makes his conclusion at the end of the
following, he is simply wrong. Full stop. (Or, to be more precise: all x have
the same min_{reference frame} K(x).)
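(A minimal sketch of this reference-frame freedom, in C++ - the one-bit
scheme and the choice of favored description f0 below are purely
illustrative, not anything from our paper. Any fixed element of a finite set
of descriptions can be made maximally compressible by a suitable invertible
scheme:)

    #include <string>

    // Illustrative encoding of finite bit strings.  Pick one "favored"
    // description f0 (standing in for some fixed fitness function), give
    // it the one-bit code "0", and prefix every other string with "1".
    // The map is invertible, and under THIS scheme f0 has a 1-bit
    // description - whichever f0 we happened to pick.
    std::string encode(const std::string& bits, const std::string& f0) {
        if (bits == f0) return "0";   // the favored element costs 1 bit
        return "1" + bits;            // every other element costs length+1
    }

    std::string decode(const std::string& code, const std::string& f0) {
        if (code == "0") return f0;   // recover the favored element
        return code.substr(1);        // strip the "1" marker
    }

Swap in a different f0 and a different element of the set becomes the
"compressible" one; nothing about the elements themselves has changed.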
Mr. Park writes:

>>>
> It turns out that up to an additive constant which is independent
> of the particular object at hand, a nonrandom (compressible) object
> relative to one universal TM will remain compressible relative to any
> other. Analogously for random objects. That is, K(x) is an inherent
> property of x, independent of the reference frame.
>>>

>>>
> Some functions (e.g., constant functions) obviously
> have short descriptions ( << |A| log |B|).
>>>

ALL FUNCTIONS HAVE SHORT DESCRIPTIONS. More precisely, given any pre-fixed
function, if I am free to choose the description - as I am in Mr. Park's
scenario - I can always choose the description scheme so that that function
is encoded in the shortest possible number of bits. (If Mr. Park restricts
himself to Turing machines with a fixed number of internal states, or some
such, then, again, things change. But i) he would have to justify that, and
ii) that ain't Kolmogorov complexity.) In general, for any fixed description
scheme, most functions will be "random". But so what? Which functions are
random changes completely based on the description scheme.

====================================================================

Back to more general issues... Mr. Park writes:

>>>
> This NFL statement depends crucially on the uniform distribution
> assumption. Although W & M claim in their commentary (GA-List v9n12) that
> they have generalized the result to work for any distribution,
>
> > It should also be noted that some of these "subsequent
> > investigations" hold for any and all distributions P(f), in which case
> > the whole issue of whether one should take to heart implications of
> > the uniform P(f) case is rendered moot. For example, we show that for
> > any f, there is no formal justification for choosing between two
> > search algorithms based solely (!) on which has performed best so
> > far. (Formally, if one averages over all pairs of algorithms, any
> > procedure for choosing between two algorithms based on performance so
> > far is just as good as any other, regardless of f.)
>
> it is trivial to construct counter-examples of probability distributions
> for which the statement is false, hence the issue is not "rendered moot."
> In the qualifying statement "For example, ..." they go on to describe a
> result which, to me, seems to be a different statement from the original
> NFL statement. In the absence of details, I will refrain from speculating
> what exactly is being proven.
>>>

The details are not absent, THEY ARE IN THE PAPER. Moreover, there is no
counter-example to our claim - we suggest Mr. Park show us one such example.
More generally, is he somehow confusing our "For example" with a restatement
of the NFL theorems? Obviously we are not claiming NFL holds for any P(f).

>>>
> 1. Claims of expected behavior being equal across all algorithms (i.e.,
> search algorithms as W & M formulate them) crucially depend on the
> uniformity assumption over F
>>>

Nope. To give just one counter-example to Mr. Park's statement: The NFL
theorem holds for any P(f) of the form P(f) = prod_{x \in X} g(f(x)), for
any function g(.) that is non-zero and normalized. Uniform g(.) gives the
uniform P(f) case; non-uniform g(.) does not. It also probably holds (we
haven't formally checked yet) for P(f)'s where the y values at different x
values are tightly correlated. As an example, take the case where all of the
y's at different x's are distinct, then sum over all possible permutations
of the y values.
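(To make the product form concrete, a small C++ sketch of such a P(f) as a
sampling procedure - the particular X, Y, and the skewed g(.) below are our
own illustrative choices:)

    #include <cstdlib>
    #include <vector>

    // Draw one fitness function f from P(f) = prod_{x in X} g(f(x)):
    // each value f(x) is sampled independently from the same single-value
    // distribution g.  Here X = {0,...,n-1}, Y = {0,1,2}, and g is a
    // deliberately non-uniform choice, g = (0.7, 0.2, 0.1).
    std::vector<int> sample_f(int n) {
        const double g[3] = {0.7, 0.2, 0.1};       // non-uniform g(.) on Y
        std::vector<int> f(n);
        for (int x = 0; x < n; ++x) {
            double u = std::rand() / (RAND_MAX + 1.0);  // u in [0,1)
            int y = 0;
            double cum = g[0];
            while (u >= cum && y < 2) cum += g[++y];    // inverse-CDF draw
            f[x] = y;
        }
        return f;   // one f: X -> Y drawn from the factorized P(f)
    }

Since each f(x) is drawn i.i.d. from g, setting g uniform recovers the
uniform-P(f) case, while a skewed g like the one above gives a markedly
non-uniform P(f) for which, per the above, the NFL theorem still holds.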
>>>
> Two, typical fitness functions (i.e., fitness functions which cannot be
> compressed below |A| log |B|) will never be encountered in practice, nor
> when artificially concocted. Consider a 300-variable (for simplicity, make
> each variable binary) fitness function. A "typical" 300-variable fitness
> function relative to the uniform distribution needs at least 2^300 bits to
> be represented, a super-astronomical number. Provably there is no way to
> compress it. This is where we use Kolmogorov complexity. It has nothing to
> do with search itself.
>>>

This is specious. How in the world can Mr. Park presume to speak for Nature
as to what can be encountered in practice? Why in the world should Nature
care about Turing machines? In fact, his presumption is wrong. To give one
simple example, consider the time series of the stock market. It is very
likely governed by a process with far more than 10^23 (Avogadro's number)
degrees of freedom. "Algorithmically simple" it is not. The same presumably
applies to the search process of the "original GA", namely natural selection
in the full Terran biosphere.

More generally (and to repeat ourselves): give me your "typical" 300-variable
function. Then I can create an encoding in which I encode that function in 1
bit. Alternatively, give me an "atypical" function that you think can be
encoded in 1 bit. I can find a different encoding (in fact, a huge number of
encodings) in which it takes 2^300 bits. And to which the appropriate reply
is "so what"? Since when do we know that nature does or doesn't create
"typical" problems?

To return to a previous theme, Mr. Park might phrase his comments with a
particular encoding in mind (say the C++ language), with which we can
certainly say that some cost functions require long C++ routines while others
are short. It's true that most of the functions we sum over will require
large C++ routines. However, we can easily imagine other encodings where the
functions that were lengthy in C++ are now short (and vice versa). Consider
the exciting new application of optimization to the design of molecules with
specific functionality. In this case the encoding is protein chemistry;
certainly difficult to code up in C++!

>>>
> It is very possible that someone
> may prove that for some probability distribution that does
> concentrate most of its measure on compressible fitness functions
> simulated annealing outperforms GAs (or vice versa) on all but such
> and such functions with high probability.
>>>

No. Again, "compressibility" without restrictions on the encoding scheme is a
meaningless notion in the current context. You might very well come up with a
particular encoding scheme, and then prove your putative result. But there is
no way you can prove it w/o specification of the encoding scheme. (You can't
even formally state it in that sense.) In fact, in our section on the
geometry of search, we analyze the very point that for a pre-fixed encoding
scheme - i.e., for a particular P(f) - not all algorithms do the same.

***

As a final comment, if Mr. Park wished to use some other measure of
randomness - e.g., degrees of correlation - besides Kolmogorov complexity,
most of our commentary above wouldn't apply. But of course, we also have
replies to arguments based on those kinds of randomness. :-)

David and Bill

------------------------------

From: djang@wsmr-emh91.army.mil (Philipp A. Djang)
Date: Fri, 31 Mar 1995 08:41:40 -0700 (MST)
Subject: Reply to question on face generation (Re: v9n18)

>From: Peter Hancock