In this document we will walk through the construction of the regression example, available in examples/reg_exp.clj. Some familiarity with Clojure (or any other Lisp) is assumed. You should also read the Overview article if you have not done so yet.
Create a .clj file with an appropriate name, like reg_exp.clj. Alternatively, open examples/reg_exp.clj as included in the ClojureGP package and follow along there.
It’s good practice to keep an experiment in its own namespace, but this is not a technical requirement.
(ns reg-exp
  "Example 02: evolving a solution to a simple regression problem."
  (:require [cljgp.core :as core]
            [cljgp.config :as config :refer [prim]]
            [cljgp.random :as random]
            [cljgp.util :as util]
            [cljgp.tools.logging :as log]))
We will be using a few functions provided by ClojureGP from the listed
namespaces. Besides the core function that will generate our GP run
(core/generate-run), we use config/make-func-template and prim to help
construct our experiment configuration. As our evaluation will use random
values, we need random/gp-rand. Then there’s util/tree-depth, which we will
use in a tree validator function that will limit the size of the evolved
expression trees. Lastly, two logging functions that will generate some useful
output: log/print-stats and log/reduce-to-summary.
We will need to define an evaluation function that compares the results of an evolved function to the target results. In the context of the regression problem, this involves computing the expected result of the target equation for given values of the variables involved. This target result is then compared to the result of applying the evolved function to the same values.
For this example we will use a simple target equation:
z = x^2 * y^2 + x^4 + x * y
The following function evaluates a given evolved function on a single random x,
y instance.
(defn test-reg-once
  "Evaluate the given function on a single result, returning the abs error."
  [func]
  (let [x (random/gp-rand)
        y (random/gp-rand)
        target (+
                (+ (* (* x x) (* y y)) (* (* x x) (* x x)))
                (* x y)) ; z = x^2 * y^2 + x^4 + xy
        result (func x y)]
    (Math/abs (float (- target result)))))
First we use random/gp-rand to generate our random x and y. It is important to
use random/gp-rand, as it will be bound at runtime to a function that uses the
correct, thread-local, seeded RNG. This allows us to reproduce results and avoid
bottlenecking multi-threaded performance on a single synchronised RNG.
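To see why a seeded RNG makes results reproducible, here is a minimal sketch that uses plain java.util.Random as a stand-in. It does not show how random/gp-rand is implemented, and the helper name seeded-doubles is made up for this illustration.

```clojure
;; Stand-in for a seeded RNG; NOT ClojureGP's gp-rand implementation.
(defn seeded-doubles
  "Return a vector of n doubles drawn from a fresh RNG with the given seed."
  [seed n]
  (let [rng (java.util.Random. (long seed))]
    (vec (repeatedly n #(.nextDouble rng)))))

;; The same seed always yields the same sequence of values,
;; so a run that draws its x and y from it can be repeated exactly.
(= (seeded-doubles 9234 3) (seeded-doubles 9234 3))
```

Giving each thread its own generator, as the tutorial describes, also means no thread has to wait on a shared, synchronised one.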
Then we compute the target value given those x and y, and the result the
evolved function produces for the same values. Because the given evolved func
is a proper Clojure function, we can simply apply it with x and y as its
arguments. We will set up our experiment configuration for this later on in this
tutorial.
Lastly, the absolute error is returned, as a convenient measure of the quality of the result.
Evaluating on a single random instance is a bit unreliable, so we will instead sum the result of a number of such tests, for our final evaluation function:
(defn evaluate-reg
  "Evaluate the given function on multiple results and return a fitness."
  [func]
  (reduce + (take 10 (repeatedly #(test-reg-once func)))))
We simply repeatedly apply test-reg-once to the evolved function and sum the
errors. If all results are identical to the target, the total error will be 0,
which equates to a perfect fitness. Larger errors mean higher and therefore
worse fitness.
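The relation between summed error and fitness can be tried out without ClojureGP. The sketch below is an illustration only: it uses a fixed list of sample points instead of random/gp-rand, and the names target-fn, abs-error, total-error and sample-points are assumptions made for this example.

```clojure
;; The target equation: z = x^2 * y^2 + x^4 + x * y
(defn target-fn [x y]
  (+ (* x x y y) (* x x x x) (* x y)))

(defn abs-error
  "Absolute difference between the target and the candidate at one point."
  [func x y]
  (Math/abs (double (- (target-fn x y) (func x y)))))

(defn total-error
  "Sum of absolute errors over a collection of [x y] points."
  [func points]
  (reduce + (map (fn [[x y]] (abs-error func x y)) points)))

;; Fixed points chosen arbitrarily for the illustration.
(def sample-points [[0.0 0.0] [0.5 0.25] [1.0 2.0]])

;; A candidate identical to the target scores a total error of zero.
(total-error target-fn sample-points)

;; A candidate that only gets the x * y term right scores worse (higher).
(total-error (fn [x y] (* x y)) sample-points)
```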
At this point we would normally consider what types of values are playing a role
in the experiment, and use Clojure’s ad hoc hierarchies with derive to create
an appropriate type hierarchy.
For this experiment however, we are evolving an equation, which only involves a bunch of numbers. All function and terminal nodes will be of the same type, which makes the experiment as a whole essentially untyped.
With some creativity we can still take a look at deriving type hierarchies:
(derive ::number ::any)
(derive ::string ::any)
(derive Number ::number)
We pick ::any as our root type here. This is arbitrary, as Clojure hierarchies
are very free-form, and let you use any keyword you want. We derive the
::number keyword from ::any. This means we can define a function in our
experiment configuration’s function set that requires an argument of type
::any, meaning the GP process can use any other node that has either the type
::any, or the type ::number, because (isa? ::number ::any) is true.
Then we also derive ::string from ::any, meaning (isa? ::string ::any)
will be true, and for the hypothetical function mentioned just now any node of
type ::any, ::number or ::string could be used.
Lastly we derive Number, that is the class java.lang.Number, from
::number. Java classes can (only) form the leaves in a Clojure hierarchy, and can
be used as types in ClojureGP experiments. Any Java subclasses will behave
properly, i.e. (isa? Float Number) is true, which is something you can also
employ in defining your types. In practice however, it’s often nice to just
define a custom keyword hierarchy, as it is much easier to reorganise.
In this experiment, we will just define every type as being Number.
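The isa? relationships described above can be checked directly at the REPL. This sketch repeats the three derive calls and then queries the global hierarchy; it uses only clojure.core.

```clojure
(derive ::number ::any)
(derive ::string ::any)
(derive Number ::number)

;; Direct keyword relationships.
(isa? ::number ::any)    ; true
(isa? ::string ::any)    ; true
(isa? ::string ::number) ; false, siblings are unrelated

;; A class derived from a keyword inherits that keyword's ancestors.
(isa? Number ::any)      ; true

;; Java inheritance is respected, also in combination with derive.
(isa? Float Number)      ; true
(isa? Float ::number)    ; true
```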
The experiment configuration is a hashmap containing all the information the GP process needs to run. From the function and terminal sets it should use, to the evaluation function to use, to how many threads it should run in.
Starting off with the function and terminal sets:
(def config-reg
  {
   ;; Some mathematical operators
   :function-set [(prim `- {:gp-type Number
                            :gp-arg-types [Number Number]})
                  (prim `+ {:gp-type Number
                            :gp-arg-types [Number Number]})
                  (prim `* {:gp-type Number
                            :gp-arg-types [Number Number]})]
   ;; The two variables in the equation
   :terminal-set [(prim 'x {:gp-type Number})
                  (prim 'y {:gp-type Number})]
   ;; More to come...
   ...
   })
We define two keys in the map: :function-set and :terminal-set. As the Configuration Reference will tell you, both take a sequence of symbols. These symbols will be available for use by the GP process in generating and breeding expression trees. Besides the symbols themselves, we need to specify their type information. For both functions and terminals, this includes what type of value they return. For functions we also need to define a sequence of types that defines what type each of their arguments should be.
This type information is stored in the symbol’s metadata. ClojureGP provides the
helper primitive, which takes a symbol and a type info map, performs some
error checking, and stores the map as the symbol’s metadata. In the above
example the abbreviation prim is used, because the type definitions can get
crowded.
Let’s take a closer look at the + function symbol definition. The types are
straightforward: applying + will result in a number, so :gp-type should be
Number. It takes two arguments, the first of which should be a Number, and
the second of which should be a Number. Hence :gp-arg-types is the vector
[Number Number]. This data is stored in a map, which is attached to this +
symbol as metadata.
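To make the idea of type information as symbol metadata concrete, here is a rough sketch using only clojure.core. The function tag-symbol is a hypothetical stand-in, not the real prim, which also performs error checking.

```clojure
;; Hypothetical stand-in for prim: attach a type map as symbol metadata.
(defn tag-symbol [sym type-info]
  (with-meta sym type-info))

(def plus-sym
  (tag-symbol `+ {:gp-type Number
                  :gp-arg-types [Number Number]}))

;; The type information can be read back with meta.
(:gp-type (meta plus-sym))
(:gp-arg-types (meta plus-sym))

;; Metadata does not affect equality: the tagged symbol is still
;; the same symbol as far as evaluation is concerned.
(= plus-sym 'clojure.core/+)
```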
Besides the type information, the symbol itself has an important property. We
will be using Clojure’s built-in +, that is, clojure.core/+. Because we
don’t want to make any assumptions about the context in which the tree will be
evaluated (where '+ might evaluate to my.crazy/+, or more realistically
where a custom function will simply not be defined), we prefix it with a
syntax-quote, or backquote. This tells Clojure’s reader to resolve the symbol to
its namespace right here. For `+, this results in clojure.core/+, which is
exactly what we want.
You should resolve all symbols that will not be passed in as arguments by
backquoting them. This simple guideline will prevent all issues with symbols
resolving to the wrong thing. Or more likely, not resolving to anything: if we
define a function foo in our experiment namespace, and add it to our function
set as 'foo, eval will not be able to resolve the symbol. As far as it knows
foo does not exist. Using `foo resolves it to my.ns/foo, which
tells eval where to look and prevents the issue.
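The difference between a plain quote and a syntax-quote is easy to inspect at the REPL. The following sketch uses only clojure.core.

```clojure
;; A plain quote leaves the symbol unqualified.
(namespace '+)            ; nil

;; Syntax-quote resolves the symbol to the namespace it is defined in.
(namespace `+)            ; "clojure.core"
(= `+ 'clojure.core/+)    ; true

;; Evaluating the qualified symbol finds the function no matter which
;; namespace happens to be current at evaluation time.
(= (eval `+) clojure.core/+)
```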
The other two elements in the function set are similar, so we will move on to the terminal set.
Much is the same there. We use prim to get a symbol with the right type data
attached. Both terminals, x and y, are of type Number. They are terminal
nodes of the tree, also known as “leaves” or a number of other terms. They will
not have arguments (child nodes in the tree), and we do not need to define types
for those.
We will pass x and y to our evolved functions as arguments when we evaluate
them, as we can see when we look back to our evaluation function. As per the
guideline given earlier, we should not resolve the symbols here, but leave them
as “clean” symbols, with a normal quote: 'x and 'y. In a moment we will
define our evolved functions’ argument list to include these two symbols.
With the function and terminal sets defined, most of our work is done. We only need to define a few more things, and can let the rest fall back to defaults.
To start off with something easy: the :root-type
key should be set to the type that the root of the tree should satisfy. In this
case, that is simply Number.
As mentioned earlier, we need to specify that we want the argument list of
evolved functions to be [x y]. We can do this using the function template
functionality, by setting the :func-template-fn
key with some help from cljgp.config/make-func-template.
Next up, we specify the evaluate-reg function we defined earlier as the value
of :evaluation-fn, so that it will be
used to generate fitness values for all individuals during the evaluation phase.
(def config-reg
  {
   ... ; snip
   ;; Evolved functions must return a number
   :root-type Number
   ;; Basic template for a fn with our arguments
   :func-template-fn (config/make-func-template '[x y])
   :evaluation-fn evaluate-reg
   ... ; snip
   })
Fairly straightforward. Take a look at the Configuration Reference entries for these keys if you’re interested in the details.
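Conceptually, an evolved tree becomes a callable function by being wrapped in a fn form with the chosen argument list and then evaluated. The sketch below illustrates that idea with plain Clojure; wrap-tree is a hypothetical helper and is not how make-func-template is actually implemented.

```clojure
;; Hypothetical helper, NOT ClojureGP's make-func-template.
(defn wrap-tree
  "Wrap an expression tree in a fn form with the given argument vector."
  [arg-list tree]
  (list 'fn arg-list tree))

;; A small tree: operators are namespace-qualified via syntax-quote,
;; terminals are plain symbols that match the argument names.
(def tree (list `* 'x (list `+ 'x 'y)))

;; Evaluating the wrapped form yields an ordinary Clojure function.
(def evolved (eval (wrap-tree '[x y] tree)))

(evolved 2 3) ; 2 * (2 + 3) = 10
```

This is also why the terminals x and y stay unqualified: they have to refer to the function's own arguments rather than to vars in some namespace.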
The last few bits we will discuss here concern tree validation, the size of the population, the number of threads and the RNG seeds.
First up is :validate-tree-fn. The
function specified for that key will be used to check every tree that is
generated. If it returns false, the tree is scrapped and a new one will be
generated. This can be useful to prevent excessively large trees, and that is
what we will use it for here. We simply define an anonymous function on the
spot: #(< (util/tree-depth %) 10). In other words: the depth of the given tree
must be smaller than 10 levels. The util/tree-depth function comes from cljgp.util,
which also has a util/tree-size function for example.
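For intuition, a depth-limiting validator over nested lists could look like the sketch below. The depth function here is a hypothetical illustration that counts a lone terminal as depth 1; util/tree-depth may use a different convention.

```clojure
;; Hypothetical depth function, NOT ClojureGP's util/tree-depth.
(defn depth
  "Depth of an expression tree: a terminal counts as 1, a function node
  as 1 plus the depth of its deepest argument."
  [tree]
  (if (and (coll? tree) (seq tree))
    (inc (apply max 0 (map depth (rest tree))))
    1))

;; A validator in the same shape as the one used in this tutorial.
(def shallow-enough? #(< (depth %) 10))

(depth 'x)                      ; 1
(depth '(+ x y))                ; 2
(depth '(* x (+ y x)))          ; 3
(shallow-enough? '(* x (+ y x))) ; true
```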
Next is the :population-size. This value determines the number of individuals in the population, as you would expect.
The value for :threads determines the number of threads used by ClojureGP to split up the work over multiple cores. Check the Configuration Reference for more detail.
Lastly, :rand-seeds should be a seq
containing the seeds that will be used to initialise the thread-local RNGs.
Hence, there should be at least as many seeds as there are threads. In
cljgp.config a function is provided that creates a lazy seq of seeds generated
from the time, but here we will simply set two arbitrary integers.
(def config-reg
  {
   ... ; snip
   ;; Keep tree size sane
   :validate-tree-fn #(< (util/tree-depth %) 10)
   :population-size 128
   :threads 2
   :rand-seeds [9234 5327]
   })
This hard work results in the following configuration:
(def config-reg
  {
   ;; Some mathematical operators
   :function-set [(prim `- {:gp-type Number
                            :gp-arg-types [Number Number]})
                  (prim `+ {:gp-type Number
                            :gp-arg-types [Number Number]})
                  (prim `* {:gp-type Number
                            :gp-arg-types [Number Number]})]
   ;; The two variables in the equation
   :terminal-set [(prim 'x {:gp-type Number})
                  (prim 'y {:gp-type Number})]
   ;; Evolved functions must return a number
   :root-type Number
   ;; Basic template for a fn with our arguments
   :func-template-fn (config/make-func-template '[x y])
   :evaluation-fn evaluate-reg
   ;; Keep tree size sane
   :validate-tree-fn #(< (util/tree-depth %) 10)
   :population-size 128
   :threads 2
   :rand-seeds [9234 5327]
   })
Many settings were left on their defaults, such as the functions that will be
used to breed new individuals. When you generate a run from this configuration
using core/generate-run, it will tell you about the keys that fell back to
defaults, as we will see in a moment.
To finish off this tutorial we will define a run function so we can
conveniently run the experiment from the REPL when we want to, and we’ll take a
look at the output it generates.
If you have read the Overview, you may recall that performing a
GP run in ClojureGP entails constructing a lazy seq of successive generations
using core/generate-run, and then consuming it until the last generation is
reached, that is, the first generation that satisfies the end
condition.[1]
For this tutorial we are not really interested in advanced usage, so we will use some of the functions provided by ClojureGP to handle this for us and give some interesting information on the way:
(defn run
  "Run experiment and print summary when done."
  ([]
   (run :basic))
  ([print-type]
   (log/reduce-to-summary
    (map #(log/print-stats print-type %)
         (core/generate-run config-reg)))))
Here log/reduce-to-summary consumes the sequence, tracking some statistics
about the run as a whole, and printing the best individual at the end along
with the stats. We also map log/print-stats over the sequence of generations,
so that we get fitness statistics of each generation immediately after it has
been generated. We leave out stuff like logging to a file for this example.
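The pattern of mapping a reporting function over a lazy seq and then consuming it can be mimicked with a toy example. Everything in this sketch is a stand-in made up for illustration (next-gen, done?, toy-run, report); it says nothing about how core/generate-run or the logging functions work internally.

```clojure
;; Toy stand-ins: a "generation" is a map, and each step halves the best error.
(defn next-gen [g]
  {:gen (inc (:gen g)) :best (/ (:best g) 2.0)})

(defn done? [g]
  (or (>= (:gen g) 100) (< (:best g) 0.01)))

(defn toy-run
  "Lazy seq of generations up to and including the first that satisfies done?."
  [init]
  (let [[running finished] (split-with (complement done?)
                                       (iterate next-gen init))]
    (concat running (take 1 finished))))

(defn report
  "Print a line for the generation and pass it through unchanged."
  [g]
  (println (format "Gen %03d: Best: %.4f" (:gen g) (:best g)))
  g)

;; Nothing is computed or printed until the seq is consumed, here by last.
(def final-gen (last (map report (toy-run {:gen 0 :best 8.0}))))
```

Because the reporting step returns each generation unchanged, whatever consumes the seq afterwards still sees the full data.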
At this point we are done defining things, and we can try running the experiment.
Personally, I find it convenient to simply run experiments in a REPL. It is
usually possible to use clojure.core/load to quickly and easily reload the
experiment after fixing some mistakes (or you could do it directly in the REPL,
just don’t forget to apply the changes in the file as well if they work out). As
an example of performing a run, we will look at running the reg-exp example
experiment, which is virtually identical to the experiment we defined in this
tutorial.
From the repository root, we run:
clojure -M -i examples/reg_exp.clj --repl
This should give us a REPL prompt with the reg_exp.clj file already loaded. We can then simply run the experiment as follows:
user=> (reg-exp/run)
Run configuration preprocessing report:
NOTE: The following keys were missing, using default values:
(:rand-fn-maker :pop-generation-fn :end-condition-fn :breeding-retries :selection-fn :breeders)
Gen 000: Best: 1.39 -- Worst: 58.17 -- Avg: 8.69
Gen 001: Best: 0.92 -- Worst: 15.10 -- Avg: 3.52
Gen 002: Best: 0.86 -- Worst: 14.50 -- Avg: 3.13
Gen 003: Best: 0.96 -- Worst: 6.99 -- Avg: 2.80
Gen 004: Best: 0.65 -- Worst: 5.92 -- Avg: 2.57
Gen 005: Best: 0.77 -- Worst: 13.61 -- Avg: 2.89
Gen 006: Best: 0.78 -- Worst: 10.03 -- Avg: 2.92
Gen 007: Best: 0.57 -- Worst: 9.57 -- Avg: 2.64
Gen 008: Best: 0.49 -- Worst: 8.34 -- Avg: 2.68
Gen 009: Best: 0.27 -- Worst: 16.35 -- Avg: 2.55
Gen 010: Best: 0.45 -- Worst: 8.90 -- Avg: 2.43
Gen 011: Best: 0.19 -- Worst: 16.71 -- Avg: 2.55
Gen 012: Best: 0.22 -- Worst: 15.57 -- Avg: 2.38
Gen 013: Best: 0.27 -- Worst: 8.46 -- Avg: 2.19
Gen 014: Best: 0.28 -- Worst: 7.33 -- Avg: 1.69
Gen 015: Best: 0.18 -- Worst: 13.83 -- Avg: 1.99
Gen 016: Best: 0.00 -- Worst: 9.96 -- Avg: 1.59
#################
Run complete
#################
Best individual of entire run:
Data of individual:
Generation: 16
Fitness: 3.1918912E-16
Function:
(fn [x y] (* x (+ (+ (* x (* x x)) (* y (* y x))) y)))
Run statistics:
Total time: 1954.50 msecs
Inds. evaluated: 2176
#################
nil
user=>
With our basic reporting setup, we only get some fitness data for each generation. At the end of the run, the summary tells us a few things about the best individual, and reports the time it took.
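The best individual in this run is in fact algebraically identical to the target: x * (x^3 + y^2 * x + y) expands to x^4 + x^2 * y^2 + x * y. A quick check on integer inputs, where arithmetic is exact, confirms this. The names target and evolved are chosen for this illustration, and the evolved function is copied from the output above.

```clojure
;; The target equation: z = x^2 * y^2 + x^4 + x * y
(defn target [x y]
  (+ (* x x y y) (* x x x x) (* x y)))

;; The best individual, copied from the run output.
(def evolved
  (fn [x y] (* x (+ (+ (* x (* x x)) (* y (* y x))) y))))

;; Compare both on a grid of integer points from -3 to 3.
(every? (fn [[x y]] (= (target x y) (evolved x y)))
        (for [x (range -3 4) y (range -3 4)] [x y]))
```

The tiny non-zero fitness reported in the summary is presumably floating-point rounding, since the two expressions group their multiplications differently.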
In this tutorial we have taken all the basic steps that one needs to take in order to run a ClojureGP experiment, discussing a lot of different points of interest along the way.
A next step in learning about ClojureGP would be to look at the other included examples, looking up any configuration keys that are unfamiliar in the Reference. Of course, the rest of the documentation should also be of interest, if you have not seen it yet. Lastly, you may be tempted to take a look at the ClojureGP source to gain a deeper understanding. All functions are documented in a (hopefully) useful manner.
[1] We left the :end-condition-fn at the default, which at the time of writing is a maximum of 100 generations or an individual with a fitness extremely close to 0.