Genetic Algorithms Digest    Wednesday, June 22, 1994    Volume 8 : Issue 21

 - Send submissions to GA-List@AIC.NRL.NAVY.MIL
 - Send administrative requests to GA-List-Request@AIC.NRL.NAVY.MIL
 - anonymous ftp archive: FTP.AIC.NRL.NAVY.MIL (Info in /pub/galist/FTP)

Today's Topics:
 - One offspring or two. (Re: v8n18/19/20)
 - Convergence of Canonical GA (Re: v8n20)
 - Quadruplets better than twins? (Re: v8n18/19/20)
 - Feature Selection using Genetic Algorithms
 - GA in MATLAB
 - GA accelerator hardware
 - CFP: GA's in Telecommunication Systems
 - Papers available (2 messages)

----------------------------------------------------------------------

****************************************************************************
CALENDAR OF GA-RELATED ACTIVITIES: (with GA-List issue reference)

The IEEE Conference on Evolutionary Computation, Orlando (v7n26)  Jun 26-30, 94
FOGA94 Foundations of GAs Wkshop, Estes Park, Colorado (v7n26) Jul 30-Aug 3, 94
SAB94 3rd Intl Conf on Sim of Adaptive Behavior, Brighton (v7n11)  Aug 8-12, 94
ECAI-94, 11th European Conference on AI, Amsterdam (v7n23)         Aug 8-12, 94
ECAI-94 Wkshp on Applied Genetic & Other Evol Algs, Amsterdam (v8n5)  Aug 9, 94
IEEE/Nagoya Univ WW Wkshp on Fuzzy Logic & NNs/GAs, Japan (v7n33)  Aug 9-10, 94
ISRAM94 Special Session on Robotics & GAs, Maui, Hawaii (v7n22)   Aug 14-17, 94
TECOM AI Conference (v8n19)                                       Sep 13-16, 94
Evolution Artificielle 94, Toulouse, France (v8n10)               Sep 19-23, 94
COMPLEX94 2nd Australian National Conference, Australia (v7n34)   Sep 26-28, 94
PPSN-94 Parallel Problem Solving from Nature, Israel (v7n32)       Oct 9-14, 94
GAs in Image Processing and Vision Colloquium, Savoy Place (v8n16)   Oct 20, 94
AI'94 Workshop on Evol Comp, Armidale, NSW, Australia (v8n15)        Nov 22, 94
ACM SAC '95 Symposium on Applied Computing, Tennessee (v8n20)     Feb 26-28, 95
EP95 4th Ann Conf on Evolutionary Programming, San Diego, CA (v8n6)  Mar 1-4, 95
ICTS95 3rd Int'l Conf.
on Telecommunications, Tennessee (v8n21)                          Mar 16-19, 95
ICANNGA95 Intl Conf on Artificial NNs and GAs, France (v8n10)     Apr 18-21, 95
ECAL95 3rd European Conf on Artificial Life, Granada, Spain (v8n5)  Jun 4-6, 95
ASI-AA-95 Practice and Future of Autonomous Agents (v8n19)    Sep 23 - Oct 1, 95

(Send announcements of other activities to GA-List@aic.nrl.navy.mil)
****************************************************************************

------------------------------

From: terry@santafe.edu
Date: Wed, 15 Jun 94 03:33:33 MDT
Subject: Re: One offspring or two.

Although Nick Radcliffe's question wasn't specifically about diversity,
diversity was the difference that people seemed to think would be affected.
It may seem a trivial point to some, but you can't answer that question
until you say what diversity is.  There are many ways to do that.  Some
measures will tell you one form of crossover produces more diversity than
the other, while other measures will tell you they produce the same amount.

Suppose we have a population of size N (where N is even), binary strings of
length L, and two methods of producing a new population:

  M1  One-point crossover: 2 parents (chosen randomly, with replacement)
      produce 2 children, and one of these is chosen (with equal
      probability) to enter the next population.  This is done N times to
      fill the next population.

  M2  One-point crossover: 2 parents (chosen randomly, with replacement)
      produce 2 (complementary) children, which both go into the next
      population.  This is done N/2 times to fill the next population.

Suppose we have a diversity measure d() : P -> Reals.  The function d gives
us a measure of diversity.  If we take a population P and produce P1 from P
using M1 and P2 from P using M2, we are interested in the relationship
between d(P1) and d(P2).  There are diversity measures that would consider
these methods equal in terms of their diversity production, e.g. d(X) = 0
for all X.
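For concreteness, the two production methods M1 and M2 can be sketched in
Python (a sketch only; the function names are mine, and, as in the
definitions above, parents are chosen uniformly at random with replacement
and no fitness is involved):

```python
import random

def one_point_cross(p1, p2):
    """One-point crossover of two equal-length binary strings (as lists).
    Returns the two complementary children."""
    point = random.randrange(1, len(p1))  # cut point in 1 .. L-1
    c1 = p1[:point] + p2[point:]
    c2 = p2[:point] + p1[point:]
    return c1, c2

def method_m1(pop):
    """M1: N times, pick 2 parents with replacement, cross them,
    and keep ONE of the two children, chosen with equal probability."""
    return [random.choice(one_point_cross(random.choice(pop),
                                          random.choice(pop)))
            for _ in range(len(pop))]

def method_m2(pop):
    """M2: N/2 times, pick 2 parents with replacement, cross them,
    and keep BOTH complementary children."""
    new_pop = []
    for _ in range(len(pop) // 2):
        c1, c2 = one_point_cross(random.choice(pop), random.choice(pop))
        new_pop.extend([c1, c2])
    return new_pop
```

Both methods fill a new population of size N; the only difference is
whether one or both of the complementary children survive.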
Another measure, d(X) = the number of strings in X whose complement is not
also in X, would usually result in d(P1) > d(P2).  Both of these are
trivial, but they at least illustrate that how you define diversity can
change the answer to the question.

There are many other, more realistic diversity measures.  For instance:

 - Mean, minimum, or maximum Hamming distance.
 - Range or distribution of fitnesses in the population.
 - Some measure of allele distribution at each locus.
 - Number or distribution of schemata in the population.
 - How much you can infer about the parent population's individuals.

This is just the beginning; there are at least 30 measures of complexity
that could arguably be used to measure diversity.

As a concrete example, the following is an additive measure based on the
variation found at each locus:

              L
             ---
    d(P)  =  \    min(F_i, 1 - F_i)
             /
             ---
             i=1

where
                   N
              1   ---
    F_i   =   -   \    P_i(j)
              N   /
                  ---
                  j=1

and

    P_i(j) = 1 if the jth member of P has a 1 at locus i, 0 otherwise.

[This is all merely a long way of saying that we compute a value for each
locus and add them to get the overall diversity.  The value computed for a
locus is 0.5 when the locus is one-half 0's and one-half 1's, and falls to
0.0 when the locus is fully converged.  Thus 0 <= d(P) <= L/2, and a higher
d(P) value is taken as indicating greater diversity.]

For this (perfectly reasonable) measure of diversity, I claim without proof
that E(d(P1)) = E(d(P2)).  That is, the expected value of d() for the
resultant populations is the same.  I have most of a proof of this
(Mathematica has the rest) if anyone is interested.

If we divide the interesting (whatever that is) diversity measures into 3
classes:

  0 = the measure indicates the methods produce equal diversity.
  1 = the measure indicates M1 produces more diversity.
  2 = the measure indicates M2 produces more diversity.

and use #n to denote the number of measures in class n, then my feeling is
that #0 > #2 and #1 > #2.  This seems to run counter to the responses to
Nick's posting.
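The locus-wise measure above translates directly into code.  A minimal
sketch (the function name is mine), assuming the population is a list of
equal-length 0/1 lists:

```python
def locus_diversity(pop):
    """Additive locus-wise diversity d(P): for each locus i, F_i is the
    fraction of individuals carrying a 1 there, and the locus contributes
    min(F_i, 1 - F_i).  Result ranges from 0 (fully converged) to L/2."""
    n = len(pop)
    length = len(pop[0])
    total = 0.0
    for i in range(length):
        f_i = sum(ind[i] for ind in pop) / n  # fraction of 1s at locus i
        total += min(f_i, 1.0 - f_i)
    return total
```

A fully converged population scores 0, and a population that is exactly
half 1s at every locus scores L/2, matching the bounds stated above.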
I'm not sure why people think that M2 would generate more diversity (but
I'm also not sure what they mean by "diversity").

Terry Jones.

[ WMS: To be provocative, let me ask whether an expected value analysis is
sufficient.  Isn't the full distribution of importance? ]

------------------------------

From: Guenter Rudolph
Date: Sat, 18 Jun 94 19:16:52 +0200
Subject: Re: Convergence of Canonical GA

Aaron Konstam cited my paper "Convergence Analysis of Canonical Genetic
Algorithms", IEEE Trans. on Neural Networks 5(1), Jan 1994, pp. 96-101,
and asked for explanations.  Here they come:

The 'Canonical Genetic Algorithm' (CGA) is the standard GA using bit
strings, mutation probabilities > 0, arbitrary crossover operators, and
proportionate reproduction (scaling is not considered, but it does not
affect the results).

First, one has to define what is meant by convergence.  Let Z_t denote the
best fitness value in the population at generation t.  Then the CGA is
said to converge to the global optimum if Prob(Z_t = f*) -> 1 as
t -> infinity, where f* is the value of the global optimum.  Clearly, Z_t
is a random variable.  There are many different modes of stochastic
convergence of random variables (E. Lukacs, Stochastic Convergence, 2nd
ed., Academic Press: New York, 1975, offers about a dozen different
concepts).  The mode of convergence used above is termed 'convergence in
probability'.  This kind of convergence is weak enough to be a minimum
requirement for stochastic algorithms.

What I have shown via Markov chain analysis is that Prob(Z_t = f*) < 1 as
t -> inf. for the CGA unless the fitness function is constant.  Therefore,
the CGA is not globally convergent.

Now consider another sequence of random variables: Let

    R_t = max{ Z_i : i = 0, 1, ..., t }

be the 'record sequence' of the sequence Z_0, Z_1, ...  In other words,
R_t is the fitness value of the best solution found up to generation t.
Via Markov chains I have shown that Prob(R_t = f*) -> 1 as t -> inf. for
the CGA.
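The distinction between Z_t and R_t can be illustrated with a toy
simulation (this is an illustration, not Rudolph's Markov-chain argument):
a minimal Python sketch of proportionate reproduction with per-bit
mutation and no elitism, using a OneMax fitness and omitting crossover for
brevity; all names and parameter values are mine.  Z_t may fall after the
optimum has been seen, while the record sequence R_t, being a running
maximum, never decreases:

```python
import random

def toy_cga(n=20, length=10, p_mut=0.05, generations=200, seed=1):
    """Track Z_t (best fitness in generation t) and R_t (best found so
    far) for a toy GA: fitness-proportionate selection plus bitwise
    mutation, no elitism.  Fitness is OneMax (the number of 1-bits)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(n)]
    z, r = [], []
    for _ in range(generations):
        fits = [sum(ind) for ind in pop]
        z.append(max(fits))
        r.append(max(r[-1], z[-1]) if r else z[-1])
        # proportionate reproduction; +1 keeps all-zero strings selectable
        parents = rng.choices(pop, weights=[f + 1 for f in fits], k=n)
        # flip each bit independently with probability p_mut
        pop = [[(1 - b) if rng.random() < p_mut else b for b in ind]
               for ind in parents]
    return z, r
```

Because there is no elitism, an optimal string that appears in some
generation can be lost again, so Z_t oscillates; R_t is monotone, which is
the sense in which the best-so-far solution converges.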
That means that the CGA will generate the optimal solution at some time.
Why does this happen?  Since the mutation probability is larger than zero,
any point x of the search space can be generated via mutation in one step
from an arbitrary individual, with a probability bounded away from zero.
This probability is very small, but larger than zero.  Moreover, the
expected time until the occurrence of this event is finite (the event
occurs with probability one), but the time to wait until it occurs may be
transcomputationally long!

The result for the record sequence R_t is also contained in the paper of
Joe Suzuki in the 5th ICGA proceedings (1993).  He used a Markov model
(obviously independently) introduced by M. Vose and T. E. Davis, but the
result is the same.

But why does Z_t not converge?  When using proportionate reproduction, an
optimal solution present at time t is lost with probability one at some
later moment > t, even if there are n optimal individuals (n = popsize).
But mutation guarantees that the optimal solution will be found again.
Then it will be lost, found, lost, ...  In short: the sequence Z_t
oscillates and cannot converge.

Conclusion:

 o One can use a CGA for static optimization if one keeps track of the
   best solution.
 o We must be careful with the usage of terms like 'convergence': a
   population does NOT converge in a mathematical sense!  Also, 'premature
   convergence' of GAs does not express 'convergence' in a mathematical
   sense: here, the GA is attracted by a locally optimal solution with a
   wide basin, and it may take a long, long time until this basin is left
   (but it will happen!).

Hope this helps!

Guenter Rudolph
Informatik Centrum Dortmund e.V.
Joseph-von-Fraunhofer-Str. 20
D-44227 Dortmund (Germany)

------------------------------

From: Gary Roberts
Date: Wed, 15 Jun 94 16:00:10 PDT
Subject: Quadruplets better than twins?

What do we mean by "children"?  I think a look back at mammalian biology
might be of interest.
Please realize that I am not a biologist and no doubt have an
oversimplified view.  When I use "recombination" here, I mean it in the
genetic algorithm sense.  "Recombination" is not analogous to the union of
egg and sperm.  The production of germ cells (egg or sperm) is where the
analogue for "recombination" might be derived.  In the biological process,
one cell with diploid chromosomes produces four haploid germ cells.  No
genes are lost in this process.  Of course, it is unlikely that any of the
germ cells will actually reproduce, let alone all four.  Still, this seems
to follow the multiple-children approach, at least so far as germ cell
fitness is concerned.  I guess what we have here is partial genetic
information from two pairs of grandparents, recombined into four germ
cells.  The "fitness" function is nontrivial.

------------------------------

From: oyman@ds5500.cc.boun.edu.tr (A.Irfan Oyman-BogaziciUni-CmpEngDept)
Date: Wed, 15 Jun 94 10:09:26 -0400
Subject: Feature Selection using Genetic Algorithms

Dear researchers,

I am a research assistant at Bogazici (Bosphorus) University, Turkey.  My
research area is feature selection using Genetic Algorithms.  This is also
the topic of my Master of Science thesis.  Can you recommend any papers in
the area of 'feature selection'?

(thank you in advance, irfan)

------------------------------

From: Sourav Kundu
Date: Thu, 16 Jun 1994 13:17:58 +0900 (JST)
Subject: GA in MATLAB

Hello GA-list Subscribers,

I would like to know if anybody has used or written a GA engine in MATLAB.
Is there any public domain MATLAB code for GAs available?  I would like to
link a SIMULINK model with a GA engine.  I know I could do it via a link
to C with MEX files, but I would prefer to do it directly in MATLAB.  If
any code is available for use, it would save me the time of writing one.
Any pointers would be appreciated.

Rgds to all,
-kundu

*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*
SOURAV KUNDU                      ~ Internet:
Control Engg.
Laboratory                        ~ sourav@control.prec.metro-u.ac.jp
Department of Precision Engg.     ~ sourav@yamabuki.comp.metro-u.ac.jp
Faculty of Technology,            ~ sourav@archsci.arch.su.EDU.AU
Tokyo Metropolitan University     ~ Voice : +81(426)-77-1111 EXTN.4251
1-1 Minami-Ohsawa, Hachioji-shi   ~ Fax   : +81(426)-77-2717
Tokyo 192-03, JAPAN               ~ Time  : GMT +9 Hours

"In nature, there are neither rewards nor punishments -- there are
consequences."

------------------------------

From: "Charles C. Palmer"
Date: Fri, 17 Jun 94 14:28:38 -0500
Subject: CFP: GA's in Telecommunication Systems

I'm putting together a session at this conference on GA's in
Telecommunications.  The conference is kept small so as to encourage
interaction between the participants and to maintain the quality of the
presentations.  The previous conferences have been superb and well worth
the attendees' while.  The conference itself is VERY well run and has been
quite interesting in the past.  The talks will be 20 minutes each, given
to an audience of design and optimization folks.

If you are interested, please send your paper (preferred), or titles and
abstracts of potential presentations to be considered for the conference,
via email (for speed) either to me at cpalmer@watson.ibm.com or directly
to the conference chair, Bezalel Gavish (address below), by August 1,
1994.  Sending more than one abstract is encouraged!

Charles C. Palmer  ( cpalmer@swamp.watson.ibm.com )
Network Design Tools
Tie 863-7411    914/784-7411
IBM T. J.
Watson Research Center
P.O. Box 704
Yorktown Heights, NY 10598

Bezalel Gavish
Owen Graduate School of Management
Vanderbilt University
Nashville, TN 37203
Bitnet: GAVISHB@VUCTRVAX
Tel: (615) 322-3659
FAX: (615) 343-7177

[ WMS: The complete Call for Papers is on our FTP site under
pub/galist/info/conferences/ICTS95. ]

------------------------------

From: Kazuhiro M. Saito
Date: Mon, 20 Jun 94 18:29:15 EDT
Subject: A paper available

Hi.  The following paper is available via anonymous ftp:

    Meshing of Engineering Domains by Meitotic Cell Division

    Kazuhiro Saitou
    Mark J. Jakiela
    Computer-Aided Design Laboratory
    Massachusetts Institute of Technology

    To appear in Proceedings of ALife IV, July 6-8 1994, MIT,
    Cambridge, MA

Abstract

A new technique for triangular meshing of a given two-dimensional domain
is developed, where meshing is done by a process of simulated meitotic
cell division.  Dead cells become the elements of the discretized domain,
and the domain is filled by a process of cell growth.  A classifier
system evolves the rules for developmental cell growth in the domain.  A
given domain is meshed repeatedly using the growth rules, and the quality
of each mesh is checked with a fitness function.  The preliminary results
show that an evolved set of classifiers for a particular test domain
could also be effectively used for geometrically and topologically
similar domains.

To obtain an electronic copy of this paper (6 pages):

    ftp ftp.cognet.ucla.edu
    Name: anonymous
    Password:
    cd /pub/alife/public
    binary
    get kazu.ps.gz
    quit

Then at your system:

    gunzip kazu.ps.gz
    lpr kazu.ps

If you cannot obtain an electronic copy, please send a request for a hard
copy to kazu@mit.edu

Kazuhiro Saitou
Graduate Student Research Assistant
Computer-Aided Design Laboratory
Massachusetts Institute of Technology
kazu@mit.edu

------------------------------

From: "Laurence D.
Merkle"
Date: Fri, 17 Jun 1994 08:22:21 -0400
Subject: Availability of Paper

    Application of the Parallel Fast Messy Genetic Algorithm
    to the Protein Folding Problem

    Laurence D. Merkle, George H. Gates, Jr., Gary B. Lamont
    Department of Electrical and Computer Engineering
    Air Force Institute of Technology Graduate School of Engineering
    Wright-Patterson AFB, OH 45433
    {lmerkle, ggates, lamont}@afit.af.mil

    Ruth Pachter
    Wright Laboratory
    Wright-Patterson AFB, OH 45433
    pachterr@ml.wpafb.af.mil

The capability to predict a polypeptide's tertiary structure given its
amino acid sequence is important to numerous scientific, medical, and
engineering applications, including understanding the mechanism by which
specific enzymes function and/or designing new polypeptides with specific
biochemical, mechanical, and/or optical properties.  A common theme in
many approaches to the protein folding problem is the minimization of an
energy function in conformational space.

We discuss a parallel genetic algorithm development effort which focuses
on the global minimization of a semi-empirical energy model.  In
particular, a parallel fast messy GA (PFMGA) is designed using the island
model and implemented on the Intel Paragon.  Each processor independently
performs probabilistically complete initialization and the primordial
phase.  The subpopulations are then recombined into a single population,
and the juxtapositional phase is carried out serially on a single
processor.  We discuss the relationship between expected execution time
and processor count for this design.

Each individual in the PFMGA population, once overlaid on the competitive
template, is a binary string representation of a subset of the internal
coordinates required to fully specify a conformation.  The values of
additional coordinates sufficient to fully specify the molecule's
conformation, as well as empirical physical constants used by the energy
model, are specified during initialization.
Thus, the fitness of an individual is determined by overlaying it on the
competitive template, `decoding' each of the independently variable
coordinates, calculating the interatomic distances, and then calculating
the energy associated with the conformation.

We describe experiments designed to assess the scalability of the PFMGA
design, in particular for the application to the energy minimization of
[Met]-Enkephalin.  The PFMGA obtains sub-linear speedup through 32
processors, at which point execution time begins to increase.  The
sublinear nature of the speedup is due primarily to the sequential nature
of the juxtapositional phase, and secondarily to the increased
communication required for recombination of the population between the
primordial and juxtapositional phases.  The energies obtained by the
PFMGA are of the same order of magnitude as the lowest known for the
model.  Also, there is no statistically significant difference between
the energies obtained at different processor counts.

Capt Laurence D. Merkle (Larry)
PhD Student
Air Force Institute of Technology Graduate School of Engineering

------------------------------

End of Genetic Algorithms Digest
******************************