OakGP

Symbolic Regression

This page provides an example of using the OakGP genetic programming Java framework to perform symbolic regression.
For an overview of OakGP, please read the "Getting Started with OakGP" guide.
Problem Description
Approach
Configuration
   Return Type
   Variable Set
   Constant Set
   Function Set
Java Source Code
Output

Problem Description

The aim of this example is to demonstrate how genetic programming can be used to evolve a program that best fits a given dataset. The process of generating a computer program to fit numerical data is called symbolic regression. In this example the dataset contains inputs/outputs for the expression x² + x + 1.

This is the same problem as described in "A Field Guide to Genetic Programming" (R. Poli, W. B. Langdon, and N. F. McPhee, with contributions by J. R. Koza, 2008).

Approach

There is no need to implement any specialised functions, types or fitness functions for this problem. The function set consists of functions provided by the org.oakgp.function.math.IntegerUtils class. The org.oakgp.rank.fitness.TestDataFitnessFunction class provides a suitable fitness function.
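As a rough illustration of how a test-data fitness function can score a candidate, the sketch below sums the absolute difference between the expected and actual output over every test case, so a fitness of 0 means an exact fit. It is self-contained Java rather than OakGP code: the class FitnessSketch, its method names and the summed-absolute-error scoring are assumptions made for illustration, and TestDataFitnessFunction may differ in detail.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntUnaryOperator;

/** Hypothetical sketch (not OakGP code): scores a candidate by its total absolute error over a data set. */
final class FitnessSketch {
   /** Maps each input in the range [-10,+10] to the expected output of x*x + x + 1. */
   static Map<Integer, Integer> createDataSet() {
      Map<Integer, Integer> tests = new LinkedHashMap<>();
      for (int x = -10; x <= 10; x++) {
         tests.put(x, (x * x) + x + 1);
      }
      return tests;
   }

   /** Sums |expected - actual| over every test case; 0 means the candidate fits the data exactly. */
   static long fitness(IntUnaryOperator candidate, Map<Integer, Integer> tests) {
      long totalError = 0;
      for (Map.Entry<Integer, Integer> test : tests.entrySet()) {
         int expected = test.getValue();
         int actual = candidate.applyAsInt(test.getKey());
         totalError += Math.abs((long) expected - actual);
      }
      return totalError;
   }

   public static void main(String[] args) {
      Map<Integer, Integer> tests = createDataSet();
      System.out.println(fitness(x -> (x * x) + x + 1, tests)); // exact fit: prints 0
      System.out.println(fitness(x -> x * x, tests)); // misses the "+ x + 1" terms: prints 111
   }
}
```

Under this scoring lower is better, which is consistent with the target fitness of 0 used in the source code on this page.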

The genetic programming run is configured in SymbolicRegressionExample using an org.oakgp.util.RunBuilder.

Configuration

Return Type

Type: integer
Description: The output of applying the input value to the arithmetic expression represented by the generated candidate.

Variable Set

ID: v0
Type: integer
Description: The input to the generated candidate.

Constant Set

Type: integer
Values: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10

Function Set

Class: org.oakgp.function.math.Add
Symbol: +
Return Type: integer
Arguments: integer, integer

Class: org.oakgp.function.math.Multiply
Symbol: *
Return Type: integer
Arguments: integer, integer

Class: org.oakgp.function.math.Subtract
Symbol: -
Return Type: integer
Arguments: integer, integer
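
To make the configuration above more concrete, the following sketch shows one way a program built from the single variable v0, integer constants and the three arithmetic functions could be represented and evaluated as a tree. It does not use OakGP's own node classes: ExpressionTreeSketch and every name inside it are hypothetical and chosen only for illustration.

```java
/** Hypothetical sketch (not OakGP's node classes): a tiny integer expression tree over one variable. */
final class ExpressionTreeSketch {
   /** A node evaluates to an integer, given the value assigned to the single variable v0. */
   interface Node {
      int evaluate(int v0);
   }

   /** A leaf that always evaluates to the same integer constant. */
   static Node constant(int value) {
      return v0 -> value;
   }

   /** A leaf that evaluates to the value of the variable v0. */
   static Node variable() {
      return v0 -> v0;
   }

   /** The "+" function: two integer arguments, integer result. */
   static Node add(Node a, Node b) {
      return v0 -> a.evaluate(v0) + b.evaluate(v0);
   }

   /** The "-" function: two integer arguments, integer result. */
   static Node subtract(Node a, Node b) {
      return v0 -> a.evaluate(v0) - b.evaluate(v0);
   }

   /** The "*" function: two integer arguments, integer result. */
   static Node multiply(Node a, Node b) {
      return v0 -> a.evaluate(v0) * b.evaluate(v0);
   }

   public static void main(String[] args) {
      // the tree for (* (+ v0 1) (- v0 2))
      Node tree = multiply(add(variable(), constant(1)), subtract(variable(), constant(2)));
      System.out.println(tree.evaluate(5)); // (5 + 1) * (5 - 2): prints 18
   }
}
```

Because every function takes two integers and returns an integer, any node can be used as an argument of any function, so any tree assembled from these parts is a valid candidate.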

Java Source Code

org/oakgp/examples/simple/SymbolicRegressionExample.java
package org.oakgp.examples.simple;

import java.util.HashMap;
import java.util.Map;

import org.oakgp.Assignments;
import org.oakgp.Type;
import org.oakgp.function.Function;
import org.oakgp.function.math.IntegerUtils;
import org.oakgp.node.ConstantNode;
import org.oakgp.node.Node;
import org.oakgp.rank.RankedCandidates;
import org.oakgp.rank.fitness.FitnessFunction;
import org.oakgp.rank.fitness.TestDataFitnessFunction;
import org.oakgp.util.RunBuilder;
import org.oakgp.util.Utils;

/** An example of using symbolic regression to evolve a program that best fits a given data set for the function {@code x² + x + 1}. */
public class SymbolicRegressionExample {
   private static final int TARGET_FITNESS = 0;
   private static final int INITIAL_POPULATION_SIZE = 50;
   private static final int INITIAL_POPULATION_MAX_DEPTH = 4;

   public static void main(String[] args) {
      // the function set will be the addition, subtraction and multiplication arithmetic operators
      Function[] functions = { IntegerUtils.INTEGER_UTILS.getAdd(), IntegerUtils.INTEGER_UTILS.getSubtract(), IntegerUtils.INTEGER_UTILS.getMultiply() };
      // the constant set will contain the integers in the range 0-10 inclusive
      ConstantNode[] constants = Utils.createIntegerConstants(0, 10);
      // the variable set will contain a single variable - representing the integer value input to the function
      Type[] variableTypes = { Type.integerType() };
      // the fitness function will compare candidates against a data set which maps inputs to their expected outputs
      FitnessFunction fitnessFunction = TestDataFitnessFunction.createIntegerTestDataFitnessFunction(createDataSet());

      RankedCandidates output = new RunBuilder().setReturnType(Type.integerType()).setConstants(constants).setVariables(variableTypes).setFunctions(functions)
            .setFitnessFunction(fitnessFunction).setInitialPopulationSize(INITIAL_POPULATION_SIZE).setTreeDepth(INITIAL_POPULATION_MAX_DEPTH)
            .setTargetFitness(TARGET_FITNESS).process();
      Node best = output.best().getNode();
      System.out.println(best);
   }

   /**
    * Returns the data set used to assess the fitness of candidates.
    * <p>
    * Creates a map of input values in the range [-10,+10] to the corresponding expected output value.
    */
   private static Map<Assignments, Integer> createDataSet() {
      Map<Assignments, Integer> tests = new HashMap<>();
      for (int i = -10; i < 11; i++) {
         Assignments assignments = Assignments.createAssignments(i);
         tests.put(assignments, getExpectedOutput(i));
      }
      return tests;
   }

   private static int getExpectedOutput(int x) {
      return (x * x) + x + 1;
   }
}

Output

Here is the solution generated by the SymbolicRegressionExample:

(+ (+ 1 v0) (* v0 v0))

Success!
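
The reported solution, (+ (+ 1 v0) (* v0 v0)), can be checked by hand against the target expression. The sketch below does this in plain Java by rewriting the prefix expression in infix form and comparing it with x² + x + 1 for every input in the range [-10,+10]. The class SolutionCheck and its method names are hypothetical and are not part of OakGP.

```java
/** Hypothetical sketch (not OakGP code): checks the evolved expression against the target on the data set's inputs. */
final class SolutionCheck {
   /** The evolved program (+ (+ 1 v0) (* v0 v0)) written as plain Java. */
   static int evolved(int v0) {
      return (1 + v0) + (v0 * v0);
   }

   /** The target expression x*x + x + 1 that the data set was generated from. */
   static int target(int x) {
      return (x * x) + x + 1;
   }

   /** Counts the inputs in [from, to] for which the evolved program and the target disagree. */
   static int countMismatches(int from, int to) {
      int mismatches = 0;
      for (int x = from; x <= to; x++) {
         if (evolved(x) != target(x)) {
            mismatches++;
         }
      }
      return mismatches;
   }

   public static void main(String[] args) {
      System.out.println(countMismatches(-10, 10)); // prints 0
   }
}
```

The two expressions differ only in the order of their terms, so they agree on every input and not just on the 21 values in the data set.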