OakGP

Introduction to Genetic Programming

This page provides a general high-level introduction to genetic programming.
For an introduction on how to specifically use OakGP to perform genetic programming please read the getting started with OakGP guide.
Aim
Structure
Representation
Initialisation
Iteration
Genetic Operators
Termination
Example
   Aim
   Fitness Function
   Termination Condition
   Primitive Set
   Step 1: Create Initial Population
   Step 2: Rank Initial Population
   Step 3: Evolve New Generation
   Step 4: Rank New Generation
   Step 5: Terminate
Challenges
Alternative Approaches
Successes

Aim

The aim of genetic programming is to automatically generate computer programs.

Structure

A popular approach to genetic programming is to represent programs using tree structures which are evaluated in a recursive manner. The tree structures consist of:

The combined set of allowed functions and terminals is called the primitive set.

The image below contains a tree-structure representing the arithmetic expression: (3 * x) + (y - 1). The structure has seven nodes which consists of three functions (+, * and -), two constants (3 and 1) and two variables (x and y) When x is assigned to 7 and y is assigned to 5 then the result of evaluating this expression will be 25. When x is assigned to 2 and y is assigned to 9 then the result of evaluating this expression will be 14.

Example of representing a program as a tree structure
Example of representing a program as a tree structure.

Representation

It is common for tree structures to be represented textual using s-expressions and prefix notation. Using this approach a function call is represented as a sequence of space separated elements enclosed in brackets. The first element represents the operator (i.e. function) and the subsequent elements represent the operands (i.e. arguments). For example, the expression: (3 * x) + (y - 1) can be represented as: (+ (* 3 x) (- y - 1)).

The use of s-expressions and prefix notation has the benefits of being:

Initialisation

The genetic programming process starts with an initial population. The initial population is a collection of of programs, with each individual program representing a possible solution to the problem being researched. It is common for the initial population to be generated randomly.

Iteration

Genetic programming uses an iterative process with the input of each iteration being the previous collection of programs (which for the first iteration will be the initial population) and the output of each iteration being a new collection of programs. The collection of programs output for a particular iteration is called a generation. Each iteration uses two components in order to output a new generation:

The fitness function is used by genetic programming to mimic the evolutionary mechanism of natural selection. The fitness function is applied to each candidate of the existing generation. The better the fitness (i.e. suitability) of a candidate the more likely it is to be selected for use by the genetic operators in order to create the new generation. This means the best candidates of the existing generation have the best chance of contributing to the contents of the new generation.

Genetic Operators

The techniques used to evolve new programs include two main sub-categories:

Example of how subtree crossover can be used to combine two tree structures to create a new tree structure

Example of how subtree crossover can be used to combine two tree structures to create a new tree structure.

Termination

The iterative process of producing new generations continues until a termination condition is met. Possible termination conditions include:

Example

Aim

Below is a table containing mappings from a single input to a single output. The aim of this simple example is to demonstrate how genetic programming can be used to produce an arithmetic function that performs this mapping.

InputOutput
1029
42125
38

Fitness Function

The fitness value for a candidate will be the sum of the differences between the expected outputs (listed in the table above) and the actual outputs when the candidate is evaluated which each of the three input values. The lower the fitness value the better the candidate.

Termination Condition

The process will continue until a candidate is found that has a fitness value of zero.

Primitive Set

The primitive set used by our example genetic programming process will consist of:

  • A function set containing the addition (+), subtraction (-) and multiplication (*) arithmetic operators.
  • A variable set containing a single variable named x.
  • A constant set containing the integer values 0, 1, 2, 3, 4 and 5.

Step 1: Create Initial Population

The initial population consists of the following 2 candidates that were randomly generated:

  • (+ (+ 3 2) (* 1 1))
  • (+ (- x x) (+ x x))

Step 2: Rank Initial Population

Candidate 1: (+ (+ 3 2) (* 1 1))
InputExpectedActualDifference
1029623
421256119
3862
Fitness = 23 + 119 + 2 = 144
Candidate 2: (+ (- x x) (+ x x))
InputExpectedActualDifference
1029209
421258441
3862
Fitness = 9 + 41 + 2 = 52
RankCandidateFitness
1st(+ (- x x) (+ x x))52
2nd(+ (+ 3 2) (* 1 1))144

Step 3: Evolve New Generation

We can now apply genetic operators to candidates from the initial generation in order to generate a new generation:

  • (+ (- x x) (+ x x)) mutates to (+ (* x x) (+ x x))
  • (+ (- x x) (+ x x)) and (+ (+ 3 2) (* 1 1)) crossover to (+ (- x (* 1 1)) (+ x x))

Step 4: Rank New Generation

Candidate 1: (+ (* x x) (+ x x))
InputExpectedActualDifference
102912091
4212518481723
38157
Fitness = 91 + 1723 + 7 = 1821
Candidate 2: (+ (- x (* 1 1)) (+ x x))
InputExpectedActualDifference
1029290
421251250
3880
Fitness = 0 + 0 + 0 = 0
RankCandidateFitness
1st(+ (- x (* 1 1)) (+ x x))0
2nd(+ (* x x) (+ x x))1821

Step 5: Terminate

A candidate with a fitness value of zero has now been found. This means that the process had come to a successful conclusion. If a candidate had not yet been found that satisfied the termination condition then steps 3 and 4 would of been repeated.

Challenges

Alternative Approaches

Alternatives to a tree-based approach to genetic programming include:

Successes

Genetic programming has been applied to a wide range of problem domains including financial trading, industrial process control and biomedical data mining. See The Genetic Programming Bibliography for a bibliography of papers published on Genetic Programming.

Home | Getting Started | Documentation | FAQs