Genetic Algorithms Digest      Tuesday, April 4, 1995      Volume 9 : Issue 19

- Do NOT send email or reply to gadistr@aic.nrl.navy.mil (GA List Moderator)
- Send submissions (articles) to GA-List@AIC.NRL.NAVY.MIL
- Send administrative (subscribe, unsubscribe, change of address, etc.)
  requests to GA-List-Request@AIC.NRL.NAVY.MIL
******************************************************************************
- You can access back issues, GA code, conference announcements, etc., either
  through the WWW at URL http://www.aic.nrl.navy.mil/galist/ or through
  anonymous ftp at ftp.aic.nrl.navy.mil [192.26.18.68] in /pub/galist.
******************************************************************************

Today's Topics:

- NFL Reply
- Reply to question on face generation (Re: v9n18)
- Poor reviews of GA papers (Re: v9n17)

----------------------------------------------------------------------

*******************************************************************************
CALENDAR OF GA-RELATED ACTIVITIES: (with GA-List issue reference)

ICANNGA95 Intl Conf on Artificial NNs and GAs, France (v8n10)    Apr 18-21, 95
IAS-95 Intelligent Adaptive Systems, Melbourne Beach FL (v8n32)  Apr 26, 95
CAD95 Computer Aided Design / "GA Day", Yalta City, CIS (v8n46)  May 4-13, 95
CSCS10 10th Intl Conf on CONTROL SYSTEMS & CS, Romania (v8n37)   May 24-26, 95
ECAL95 3rd European Conf on Artificial Life, Granada, Spain (v8n5) Jun 4-6, 95
GWIC95 Golden West Intl Conf on Intell Systems, San Fran (v8n49) Jun 12-14, 95
ML95 GP Workshop: Theory to real-world applications, CA (v9n8)   Jul 9, 95
AAICG95 Symposium on Appl of AI in Geophysics, Boulder, CO (v8n39) Jul 12, 95
ICGA-95 Sixth Intl Conference on GAs, Pittsburgh PA (v8n32)      Jul 15-20, 95
PASE95 Wrkshp Parallel Appl in Stat and Econ, Trier (v8n39)  Aug 29-Sep 2, 95
EA95 Evolution Artificielle, Telecom Bretagne, Brest, Fr (v9n1)  Sep 4-8, 95
GALESIA'95 GAs in Eng Systems, Univ of Sheffield, UK (v8n45)     Sep 12-14, 95
AIPetro95 AI in the Petroleum Ind, Lillehammer, Norway (v8n48)   Sep 13-15, 95
ASI-AA-95 Practice and Future of Autonomous Agents (v8n19)   Sep 23-Oct 1, 95
SOCO95 Soft Computing Fuzzy Logic, NNs and GAs, Rochester (v9n5) Oct 24-27, 95
Genetic Methods for Routing & Scheduling, New Orleans (v8n50) Oct 29-Nov 1, 95
AAAI GP Symposium (Fall Series), Cambridge, MA (v8n43)           Nov 10-12, 95
IEEE Wrkshp on Natural Algorithms in Signal Proc, Essex (v9n7)   Nov 12-14, 95
WWW'95 on Fuzzy Logic and NNs/Evol Comp, Nagoya, Japan (v9n6)    Nov 14-15, 95
ICEC'95 IEEE Intl Conf on Evol. Computing, Perth (v8n36)      Nov 29-Dec 1, 95
EUROGEN95 GAs and ESs in Computational Sci & Eng, Spain (v9n15)  Dec 4-8, 95
EP96 5th Conf on Evol Programming, San Diego, CA (v9n18)     Feb 29-Mar 3, 96
ICEC'96 IEEE Intl Conf on Evol Comp, Nagoya, Japan (v9n18)       May 20-22, 96
GP96 Genetic Programming Conference, Stanford, CA (v9n9)         Jul 28-31, 96
PPSN 96 Parallel Problem Solving from Nature, Berlin (v9n17)     Sep 22-27, 96

(Send announcements of other activities to GA-List@aic.nrl.navy.mil)
*******************************************************************************

------------------------------

From: wgm@santafe.edu (Bill Macready)
Date: Fri, 24 Mar 95 14:19:15 MST
Subject: NFL Reply

Just when you hoped the NFL discussion was dying down...

We fail to understand why Mr. Park regards our use of a uniform P(f) (the
distribution over fitness functions) in some of the results of our paper as
somehow invalidating the whole body of work.
First, as we have stated several times now, many of our results are
completely independent of P(f). This is the case, for example, for our
benchmarks of a search algorithm's performance, for the information-theoretic
and geometric calculations we perform, and for our proof that for *any* fixed
f, you cannot use previous performance of an algorithm on that f to infer
anything concerning future performance.

As for the NFL theorems themselves, we are again confused by Mr. Park. Did he
think that we were somehow claiming that P(f) is uniform in the real world?
As we have stated time and again, we use uniform P(f) in the NFL theorems
merely as a tool, to make certain points. One of these is that you must make
an assumption, implicitly or otherwise, if you want to justify your search
algorithm. Whether or not uniform P(f) is dominated by "random" f's is
irrelevant as far as this conclusion is concerned.

In fact, the burden is not on us to "justify" our choice of P(f). Rather, the
burden is on the user of a search algorithm to justify the (almost always
implicit) choice of P(f) (s)he is making. In fact, THAT IS THE WHOLE POINT.
(And of course assumptions for such priors are often wrong - we again direct
interested readers to the exciting new results in supervised learning where
the cautions of NFL-type theorems are being borne out in real-world
scenarios.)

For the record though, it turns out that one can make P(f) significantly
non-uniform, in any of a number of reasonable ways, and the NFL results still
hold. We thank Tal Grossman and Raja Das for pointing this out to us.

It is unfortunate that this particular discussion has focused on issues of
Kolmogorov complexity, which are at best of peripheral concern. We address
Mr. Park's concerns below, but this digression merely obscures the important
issues concerning our paper. The NFL-overloaded reader has been warned!
(Might we also suggest, Mr. Park, that if you would like to continue this
discussion, we do it in private and not bother the ga-list readers.)

====================== MR. PARK'S CONCERNS ============================

We are well aware of Kolmogorov (aka Solomonoff, aka Chaitin, aka algorithmic
information) complexity. It has reared its fascinating - but ultimately
pointless - head in two other areas where we have done research (supervised
learning and the thermodynamics of computation).

Our first point, though, is that even if Kolmogorov complexity *were*
relevant in the sense that Mr. Park thinks, his conclusions would still be
quite questionable. To see this, let's simplify things out of the Kolmogorov
context. Say that the fitness function is being implemented in < 5000 lines
of C++ code. (I.e., C++ is our "compression" scheme.) Often the fitness
function is actually provided by the real world, so this assumption is wrong
- see below. In addition, the use of C++ is clearly extremely arbitrary,
perhaps crucially so (again, see below). But w/o this assumption or something
similar, we can't understand Mr. Park's concerns. I.e., his concerns seem to
demand this kind of an assumption. So let's make it, and see where we go.

Now for a big search space, the percentage of all fitness functions that can
be implemented in < 5000 lines of C++ code is infinitesimal. One might infer
from this that results based on averaging over all fitness functions cannot
possibly have relevance to a scenario in which you must use < 5000 lines of
C++ code. As we understand it, this is Mr. Park's true concern. Please, Mr.
Park, correct us if we are misinterpreting you.
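(To put rough numbers on "infinitesimal" - the specific figures below, 80
characters per line, a 128-character alphabet, and a binary 300-variable
domain, are our own back-of-the-envelope assumptions, purely for
illustration:)

\[
\#\{\text{programs of} < 5000 \text{ lines}\}
  \;\le\; \sum_{k=0}^{400000} 128^{k}
  \;<\; 128^{400001}
  \;\approx\; 2^{2.8 \times 10^{6}},
\]
\[
\#\{\text{functions } f : A \to B\} \;=\; |B|^{|A|} \;=\; 2^{2^{300}}
  \quad\text{for } |A| = 2^{300},\; B = \{0,1\},
\]

so the fraction of all such fitness functions that fit in < 5000 lines is at
most about 2^(2.8 x 10^6 - 2^300), a number indistinguishable from zero.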
However, note that the very reason people use C++ (or any language) is that
it is so flexible and powerful. In a very real sense, for any actual fitness
function there exists a < 5000-line C++ fitness-function program that
"approximates" it quite well, especially for searches that are not extremely
long. In essence, we can think of no reason to believe that the statistics
over the space of all functions won't be an exceedingly good approximation to
the statistics over the space of < 5000-lines-of-code functions.

To give one simple example brought up before: a pseudo-random number
generator can be implemented in a tiny number of lines of code. But obviously
how well the standard search algorithms will do on a fitness function
generated by such a pseudo-random number generator will barely differ at all
from how they would do on fitness functions generated by true random number
generators. (In fact, that's the whole basis for using pseudo-random number
generators: that they well approximate "true" randomness.) We repeat
ourselves: does Mr. Park believe that a GA will perform significantly
differently on a pseudo-random function than on a truly random one?

Unless you somehow build an algorithm that explicitly tries to search for
small (in C++) programs that could have generated the fitness function, it
seems absurd to us to believe that restricting things to < 5000-line fitness
functions will result in different average performance than allowing all
fitness functions.

On to Kolmogorov complexity, and its (current) irrelevance... In the current
context, its pointlessness is a direct consequence of the fact that it is
fundamentally predicated upon an infinite system (namely a Turing machine),
whereas the systems we are considering are finite. (As an aside, the same
problem afflicts arguments that try to address the NFL theorems based on
Lebesgue measures and the rationals.)

To be more precise: What Mr. Park fails to mention in his precis of
Kolmogorov complexity is that the innocuous overall "additive constant" in
Kolmogorov complexity is in fact the whole ballgame if you're encoding finite
sets. Think about it; it's really quite trivial: I have a finite set, and can
use any encoding scheme I desire. *Obviously* there is no general (!!) sense
in which I can say that one element of the set is any harder or easier to
encode than some other element. The encoding is simply a mapping from the
elements of the set to a bit pattern. That mapping can be anything; no one
element of the set is more or less intrinsically compressible than any other.
Mr. Park is simply wrong to think that there is some such intrinsic sense in
which certain cost functions are more or less compressible than others.
(Note that we are here operating in Mr. Park's context, where nothing is
specified about the encoding scheme - obviously, if there are restrictions on
the encoding scheme, things change. Just as obviously, it would then be up to
Mr. Park to justify those restrictions.)

For finite sets, Kolmogorov complexity cannot be used to distinguish "random"
from "non-random" elements of the set. Change the universal Turing machine
reading the tape, and you can get any element you want to be "random" or
"incompressible". When Mr. Park makes his conclusion at the end of the
following, he is simply wrong. Full stop. (Or, to be more precise: all x have
the same min_{reference frame} K(x).)
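(A minimal sketch of this reference-frame freedom, in C++ - the one-bit
scheme and the choice of favored description f0 below are purely
illustrative, not anything from our paper. Any fixed element of a finite set
of descriptions can be made maximally compressible by a suitable invertible
scheme:)

    #include <string>

    // Illustrative encoding of finite bit strings.  Pick one "favored"
    // description f0 (standing in for some fixed fitness function), give
    // it the one-bit code "0", and prefix every other string with "1".
    // The map is invertible, and under THIS scheme f0 has a 1-bit
    // description - whichever f0 we happened to pick.
    std::string encode(const std::string& bits, const std::string& f0) {
        if (bits == f0) return "0";   // the favored element costs 1 bit
        return "1" + bits;            // every other element costs length+1
    }

    std::string decode(const std::string& code, const std::string& f0) {
        if (code == "0") return f0;   // recover the favored element
        return code.substr(1);        // strip the "1" marker
    }

Swap in a different f0 and a different element of the set becomes the
"compressible" one; nothing about the elements themselves has changed.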
Mr. Park writes:

>>>
> It turns out that up to an additive constant which is independent
> of the particular object at hand, a nonrandom (compressible) object
> relative to one universal TM will remain compressible relative to any
> other. Analogously for random objects. That is, K(x) is an inherent
> property of x, independent of the reference frame.
>>>

>>>
> Some functions (e.g., constant functions) obviously
> have short descriptions ( << |A| log |B|).
>>>

ALL FUNCTIONS HAVE SHORT DESCRIPTIONS. More precisely, given any pre-fixed
function, if I am free to choose the description - as I am in Mr. Park's
scenario - I can always choose the description scheme so that that function
is encoded in the shortest possible number of bits. (If Mr. Park restricts
himself to Turing machines with a fixed number of internal states, or some
such, then, again, things change. But i) he would have to justify that, and
ii) that ain't Kolmogorov complexity.) In general, for any fixed description
scheme, most functions will be "random". But so what? Which functions are
random changes completely based on the description scheme.

====================================================================

Back to more general issues... Mr. Park writes:

>>>
> This NFL statement depends crucially on the uniform distribution
> assumption. Although W & M claim in their commentary (GA-List v9n12) that
> they have generalized the result to work for any distribution,
>
> > It should also be noted that some of these "subsequent
> > investigations" hold for any and all distributions P(f), in which case
> > the whole issue of whether one should take to heart implications of
> > the uniform P(f) case is rendered moot. For example, we show that for
> > any f, there is no formal justification for choosing between two
> > search algorithms based solely (!) on which has performed best so
> > far. (Formally, if one averages over all pairs of algorithms, any
> > procedure for choosing between two algorithms based on performance so
> > far is just as good as any other, regardless of f.)
>
> it is trivial to construct counter-examples of probability distributions
> for which the statement is false, hence the issue is not "rendered moot."
> In the qualifying statement "For example, ..." they go on to describe a
> result which, to me, seems to be a different statement from the original
> NFL statement. In the absence of details, I will refrain from speculating
> what exactly is being proven.
>>>

The details are not absent, THEY ARE IN THE PAPER. Moreover, there is no
counter-example to our claim - we suggest Mr. Park show us one such example.
More generally, is he somehow confusing our "For example" with a restatement
of the NFL theorems? Obviously we are not claiming NFL holds for any P(f).

>>>
> 1. Claims of expected behavior being equal across all algorithms (i.e.,
> search algorithms as W & M formulate them) crucially depend on the
> uniformity assumption over F
>>>

Nope. To give just one counter-example to Mr. Park's statement: The NFL
theorem holds for any P(f) of the form P(f) = prod_{x \in X} g(f(x)), for
any function g(.) that is non-zero and normalized. Uniform g(.) gives the
uniform P(f) case; non-uniform g(.) does not. It also probably holds (we
haven't formally checked yet) for P(f)'s where the y values at different x
values are tightly correlated. As an example, take the case where all of the
y's at different x's are distinct, then sum over all possible permutations
of the y values.
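(To make the product form concrete, a small C++ sketch of such a P(f) as a
sampling procedure - the particular X, Y, and the skewed g(.) below are our
own illustrative choices:)

    #include <cstdlib>
    #include <vector>

    // Draw one fitness function f from P(f) = prod_{x in X} g(f(x)):
    // each value f(x) is sampled independently from the same single-value
    // distribution g.  Here X = {0,...,n-1}, Y = {0,1,2}, and g is a
    // deliberately non-uniform choice, g = (0.7, 0.2, 0.1).
    std::vector<int> sample_f(int n) {
        const double g[3] = {0.7, 0.2, 0.1};       // non-uniform g(.) on Y
        std::vector<int> f(n);
        for (int x = 0; x < n; ++x) {
            double u = std::rand() / (RAND_MAX + 1.0);  // u in [0,1)
            int y = 0;
            double cum = g[0];
            while (u >= cum && y < 2) cum += g[++y];    // inverse-CDF draw
            f[x] = y;
        }
        return f;   // one f: X -> Y drawn from the factorized P(f)
    }

Since each f(x) is drawn i.i.d. from g, setting g uniform recovers the
uniform-P(f) case, while a skewed g like the one above gives a markedly
non-uniform P(f) for which, per the above, the NFL theorem still holds.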
>>>
> Two, typical fitness functions (i.e., fitness functions which cannot be
> compressed below |A| log |B|) will never be encountered in practice, nor
> when artificially concocted. Consider a 300-variable (for simplicity, make
> each variable binary) fitness function. A "typical" 300-variable fitness
> function relative to the uniform distribution needs at least 2^300 bits to
> be represented, a super-astronomical number. Provably there is no way to
> compress it. This is where we use Kolmogorov complexity. It has nothing to
> do with search itself.
>>>

This is specious. How in the world can Mr. Park presume to speak for Nature
as to what can be encountered in practice? Why in the world should Nature
care about Turing machines? In fact, his presumption is wrong. To give one
simple example, consider the time series of the stock market. It is very
likely governed by a process with far more than 10^23 (Avogadro's number)
degrees of freedom. "Algorithmically simple" it is not. The same presumably
applies to the search process of the "original GA", namely natural selection
in the full Terran biosphere.

More generally (and to repeat ourselves): give me your "typical" 300-variable
function. Then I can create an encoding in which I encode that function in 1
bit. Alternatively, give me an "atypical" function that you think can be
encoded in 1 bit. I can find a different encoding (in fact, a huge number of
encodings) in which it takes 2^300 bits. And to which the appropriate reply
is "so what"? Since when do we know that nature does or doesn't create
"typical" problems?

To return to a previous theme, Mr. Park might phrase his comments with a
particular encoding in mind (say the C++ language), with which we can
certainly say that some cost functions require long C++ routines while others
are short. It's true that most of the functions we sum over will require
large C++ routines. However, we can easily imagine other encodings where the
functions that were lengthy in C++ are now short (and vice versa). Consider
the exciting new application of optimization to the design of molecules with
specific functionality. In this case the encoding is protein chemistry;
certainly difficult to code up in C++!

>>>
> It is very possible that someone
> may prove that for some probability distribution that does
> concentrate most of its measure on compressible fitness functions
> simulated annealing outperforms GAs (or vice versa) on all but such
> and such functions with high probability.
>>>

No. Again, "compressibility" without restrictions on the encoding scheme is a
meaningless notion in the current context. You might very well come up with a
particular encoding scheme, and then prove your putative result. But there is
no way you can prove it w/o specification of the encoding scheme. (You can't
even formally state it in that sense.) In fact, in our section on the
geometry of search, we analyze the very point that for a pre-fixed encoding
scheme - i.e., for a particular P(f) - not all algorithms do the same.

***

As a final comment, if Mr. Park wished to use some other measure of
randomness - e.g., degrees of correlation - besides Kolmogorov complexity,
most of our commentary above wouldn't apply. But of course, we also have
replies to arguments based on those kinds of randomness. :-)

David and Bill

------------------------------

From: djang@wsmr-emh91.army.mil (Philipp A. Djang)
Date: Fri, 31 Mar 1995 08:41:40 -0700 (MST)
Subject: Reply to question on face generation (Re: v9n18)

>From: Peter Hancock