Grammatical evolution (GE) is an evolutionary search algorithm, similar to genetic programming (GP). It is typically used to generate programs with syntax defined through a grammar. http://www.grammatical-evolution.org/ by GE’s inventer, Michael O’Neill, is a good resource for a formal introduction to this technique
This document serves as a quick and informal tutorial on GE, with examples implemented using gramEvol package in R.
The goal of using GE is to automatically generate a program that minimises a cost function:
Notice that by a program, we refer to any sequence of instructions that perform a specific task. This ranges from a single expression (e.g., sin(x)
), to several statements with function declarations, assignments, and control flow.
The rest of this section will describe each component in more details.
A grammar is a set of rules that describe the syntax of sentences and expressions in a language. While grammars were originally invented for studying natural languages, they are extensively used in computer science for describing programming languages.
GE uses a context-free grammar to describe the syntax of programs.
A grammar in which the rules are not sensitive to the sentence’s context is called a context-free grammar (CFG), and is defined using a collection of terminal symbols, non-terminal symbols, production rules, and a start symbol:
<subject>
, a <verb>,
or an <object>
.<sentence> ::= <subject> <verb> <object>. | <subject> <verb>. (1.a), (1.b)
<subject> ::= I | You | They (2.a), (2.b), (2.c)
<verb> ::= read | write | check (3.a), (3.b), (3.c)
<object> ::= books | stories | academic papers (4.a), (4.b), (4.c)
In each rule, the “|” symbol separates different replacement possibilities; such as <subject>
, that can be replaced with “I”, “You” or “They”. One must note that a non-terminal symbol can be replaced with other non-terminals as well as terminal symbols, such as in the example’s <sentence>
.
This style of notation, including the use of angle brackets (< and >) is known as the Backus-Naur Form (BNF).
In BNF, a start symbol determines a non-terminal where the generation of the expression starts. For example:
<sentence>
Informally, only the start symbol and the production rules are required to define a grammar.
In formal language theory, a context-free grammar is a formal grammar where every production rule, formalized by the pair \((n, V)\), is in form of \(n \rightarrow V\). The CFG is defined by the 4-tuple \((\mathcal{T}, \mathcal{N}, \mathcal{R}, \mathcal{S})\), where \(\mathcal{T}\) is the finite set of terminal symbols, \(\mathcal{N}\) is the finite set of non-terminal symbols, \(\mathcal{R}\) is the production rule set, \(\mathcal{S} \in \mathcal{N}\) is the start symbol.
A production rule \(n \rightarrow V\) is realized by replacing the non-terminal symbol \(n \in \mathcal{N}\) with the symbol \(v \in V\), where \(V \in (\mathcal{T} \cup \mathcal{N})^*\) is a sequence of terminal and/or non-terminal symbols.
For more details on CFGs, their relation to context-free languages, parsing, compilers and other related topics refer to (Aho, Sethi, and Ullman 1986) or Wikipedia.
Notice that each rule in the example grammar were numbered. Using these numbers, one can precisely refer to a certain expression. This is performed by replacing the first non-terminal symbol with the \(n\)th rule of that non-terminal, starting with the start symbol.
For example, the sequence [2, 3, 1]
selects rules (1.b), (2.c) and (3.a) in the following four-step sequence:
Step | Sequence | Rule | Current state |
---|---|---|---|
0 | Start | <sentence> . |
|
1 | 2 | (1.b) | <subject> <verb> . |
2 | 3 | (2.c) | They <verb >. |
3 | 1 | (3.a) | They read. |
Evolutionary optimisation algorithms are a class of optimisation techniques inspired by natural evolution. They are used in cases where:
It most be noted that the stochastic nature of evolutionary algorithms does not guarantee the optimal solution, since most practical problems involve very large search spaces, and it is often not computationally feasible to search the whole space.
The oldest and simplest of these algorithms is the genetic algorithm (GA), which optimises a vector of binary variables. In this vignette, when referring to GA, we refer to an extended GA which handles integers numbers.
For an in depth introduction, readers are referred to Wikipedia
GA only optimises numeric arrays. By mapping an integer array to a program using a grammar, GA can be readily applied to evolve programs:
Any application which needs a program definable by grammar, is creatable in GE. Using a grammar allows integration of domain knowledge and a custom program syntax, which adds flexibility and precision to GE compared to other techniques such as GP.
Applications of GE include computational finance, music, and robotic control, among others. See http://www.grammatical-evolution.org/pubs.html for a collection of publications in this area.
The package gramEvol simplifies defining a grammar and offers a GA implementation. gramEvol hides many details, including the grammar mapping and GA parameters, and the only things the user has to do is to:
CreateGrammar
.expression
(s) and return a numeric value.GrammaticalEvolution
.In this section, examples are used to demonstrate its usage.
Symbolic regression is the process of discovering a function, in symbolic form, which fits a given set of data. Evolutionary algorithms such as GP and GE are commonly used to solve Symbolic Regression problems. For more information, visit http://www.symbolicregression.com/.
Rediscovery of Kepler’s law has been used as a benchmark for symbolic regression (Koza 1992; Ferreira 2006; Langley, Simon, and Bradshaw 1987). Here, the goal is to find a relationship between orbital periods and distances of solar system planets from the sun. The distance and period data, normalised to Earth, are as follows:
Planet | Distance | Period |
---|---|---|
Venus | 0.72 | 0.61 |
Earth | 1.00 | 1.00 |
Mars | 1.52 | 1.84 |
Jupiter | 5.20 | 11.90 |
Saturn | 9.53 | 29.40 |
Uranus | 19.10 | 83.50 |
Kepler’s third law states: \(period^2 = constant \times distance^3\)
To use grammatical evolution to find this relationship from the data, we define the following context-free grammar. Here \(\mathcal{S}\) denotes the starting symbol and \(\mathcal{R}\) is the collection of production rules:
\(\mathcal{S}\) = <expr>
Production rules : \(\mathcal{R} =\)
<expr> ::= <expr><op><expr> | <sub-expr> (1.a), (1.b)
<sub-expr> ::= <func>(<var>) | <var> | <var>^<n> (2.a), (2.b), (2.c)
<func> ::= log | sqrt | sin | cos (3.a), (3.b), (3.c), (3.d)
<op> ::= + | - | x (4.a), (4.b), (4.c)
<var> ::= distance | distance^<n> | <n> (5.a), (5.b), (5.c)
<n> ::= 1 | 2 | 3 | 4 (6.a), (6.b), (6.c), (6.d)
This is a general purpose grammar, and it can create different expressions corresponding to different formulas which can explain and model the data.
The first step for using gramEvol is loading the grammar:
library("gramEvol")
ruleDef <- list(expr = grule(op(expr, expr), func(expr), var),
func = grule(sin, cos, log, sqrt),
op = grule(`+`, `-`, `*`),
var = grule(distance, distance^n, n),
n = grule(1, 2, 3, 4))
grammarDef <- CreateGrammar(ruleDef)
Here, the BNF notation is implemented in R:
list
.non.terminal.name = grule(replacement1, replacement2, ...)
format.CreateGrammar
is used to load the list and create the grammar objectThe print function reproduces the grammar in BNF format:
print(grammarDef)
## <expr> ::= <op>(<expr>, <expr>) | <func>(<expr>) | <var>
## <func> ::= `sin` | `cos` | `log` | `sqrt`
## <op> ::= `+` | `-` | `*`
## <var> ::= distance | distance^<n> | <n>
## <n> ::= 1 | 2 | 3 | 4
Note that `+`
and op(expr, expr)
are used in the code above because grule
expects R expressions, and expr op expr
is not valid in R. As it is tedious to convert between the functional form and the operator form, the package also provides gsrule
(or grammar string rule), which accepts strings with <>
:
ruleDef <- list(expr = gsrule("<expr><op><expr>", "<func>(<expr>)", "<var>"),
func = gsrule("sin", "cos", "log", "sqrt"),
op = gsrule("+", "-", "*"),
var = grule(distance, distance^n, n),
n = grule(1, 2, 3, 4))
CreateGrammar(ruleDef)
## <expr> ::= <expr><op><expr> | <func>(<expr>) | <var>
## <func> ::= sin | cos | log | sqrt
## <op> ::= + | - | *
## <var> ::= distance | distance^<n> | <n>
## <n> ::= 1 | 2 | 3 | 4
Note that gsrule
and grule
can be mixed, as in the example above.
We use the following equation to normalise the error, adjusting its impact on small values (e.g., Venus) versus large values (e.g., Uranus):
\(e = \frac{1}{N} \sum{ log(1 + |p-\hat{p}|)}\)
where \(e\) is the normalised error, \(N\) is the number of samples, \(p\) is the orbital period and \(\hat{p}\) is the result of symbolical regression.
We implement this as the fitness function SymRegFitFunc
:
planets <- c("Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus")
distance <- c(0.72, 1.00, 1.52, 5.20, 9.53, 19.10)
period <- c(0.61, 1.00, 1.84, 11.90, 29.40, 83.50)
SymRegFitFunc <- function(expr) {
result <- eval(expr)
if (any(is.nan(result)))
return(Inf)
return (mean(log(1 + abs(period - result))))
}
Here, the SymRegFitFunc
receives an R expression
and evaluates it. It is assumed that the expression
uses distance
to estimate the period
. Invalid expressions are handled by returning a very high cost (infinite error). Valid results are compared with the actual period according to the error function to compute the expression’s fitness.
GrammaticalEvolution
can now be run. All of the parameters are determined automatically. To avoid wasting time, and as the best possible outcome and its error are known (because we know the answer), a terminationCost
is computed and set to terminate GE when the Kepler’s equation is found.
ge <- GrammaticalEvolution(grammarDef, SymRegFitFunc,
terminationCost = 0.021)
ge
## Grammatical Evolution Search Results:
## No. Generations: 11
## Best Expression: sqrt(distance^3)
## Best Cost: 0.0201895728693592
Now that the result is found, it can be used in production. Here we only use it in a simple comparison:
best.expression <- ge$best$expression
data.frame(distance, period, Kepler = sqrt(distance^3),
GE = eval(best.expression))
## distance period Kepler GE
## 1 0.72 0.61 0.6109403 0.6109403
## 2 1.00 1.00 1.0000000 1.0000000
## 3 1.52 1.84 1.8739819 1.8739819
## 4 5.20 11.90 11.8578244 11.8578244
## 5 9.53 29.40 29.4197753 29.4197753
## 6 19.10 83.50 83.4737743 83.4737743
As a real-world optimisation may take a long time, a feedback of the state of optimisation is desirable. GrammaticalEvolution
allows monitoring this status using a callback function. This function, if provided to the parameter monitorFunc
, receives an object similar to the return value of GrammaticalEvolution
. For example, the following function prints the current generation, the best individual’s expression and its error:
customMonitorFunc <- function(results){
cat("-------------------\n")
print(results)
}
ge <- GrammaticalEvolution(grammarDef, SymRegFitFunc,
terminationCost = 0.021,
monitorFunc = customMonitorFunc)
or even using the print
function directly:
ge <- GrammaticalEvolution(grammarDef, SymRegFitFunc,
terminationCost = 0.021,
monitorFunc = print)
which prints:
## Grammatical Evolution Search Results:
## No. Generations: 1
## Best Expression: distance
## Best Cost: 1.60700784338907
## Grammatical Evolution Search Results:
## No. Generations: 2
## Best Expression: distance
## Best Cost: 1.60700784338907
… until:
## Grammatical Evolution Search Results:
## No. Generations: 9
## Best Expression: distance + distance
## Best Cost: 1.54428158317392
## Grammatical Evolution Search Results:
## No. Generations: 10
## Best Expression: 1 - distance + (cos(distance) - 1) * sin(distance^2) + distance + (log(distance) + distance + (cos(distance) - 1) * sin(distance^2) + distance)
## Best Cost: 1.4186428597461
## Grammatical Evolution Search Results:
## No. Generations: 11
## Best Expression: sqrt(distance^3)
## Best Cost: 0.0201895728693592
A regular expressions (RE) is a string that determines a character pattern. REs are more expressive and precise in determining sub-string matches compared to wildcards, and are widely used in many string pattern matching tasks, such as searching through log files or parsing a program’s output.
Creating a regular expressions requires careful assembly of symbols and operators to match the desired pattern. While this is usually performed by an expert programmer, it is possible to use evolutionary optimisation techniques to infer a RE from examples (Bartoli et al. 2012).
In this example, we demonstrate how gramEvol can be used to learn REs.
In formal language theory, a regular expression is a sequence of symbols and operators that describes a character pattern. REs are translated by RE processors into a non-deterministic finite automaton (NFA) and subsequently into a deterministic finite automaton (DFA). The DFA can then be executed on any character string to recognize sub-strings that match the regular expression. For a theoretical introduction to REs, including their relationship with context-free grammars, readers are referred to (Aho, Sethi, and Ullman 1986).
R supports standard regular expression with both the POSIX and the Perl syntax. In addition, the rex Package offers a functional interface for creating REs in R.
Consider matching a decimal real number in the form of \([\pm]nnn[.nnn]\), where [ ]
means optional and \(nnn\) denotes one or more digits. The following table compares this notation with the syntax of Perl, POSIX, and rex:
Rule | Notation | Perl | POSIX | rex |
---|---|---|---|---|
One digit | \(n\) | \d |
[[:digit:]] |
number |
One or more digits | \(nnn\) | \d+ |
[[:digit:]]+ |
numbers |
Optional presence of X | [X] | X? |
X? |
maybe(X) |
alternate presence of X or Y | X|Y | X|Y |
X|Y |
or(X, Y) |
Plus sign | + | \+ |
\+ |
"+" |
Minus sign | - | - |
- |
"-" |
Dot (decimal point) | . | \. |
\. |
"." |
Using the above table, \([\pm]nnn[.nnn]\) is translated to:
(\+|-)?\d+(\.\d+)?
(\+|-)?[[:digit:]]+(\.[[:digit:]]+)?
maybe(or("+", "-")), numbers, maybe(".", numbers)
To use a RE, the expression has to be wrapped in a start and stop symbol (\^...\$
in POSIX and Perl, and rex(start, ..., end)
for rex):
re <- "^(\\+|-)?[[:digit:]]+(\\.[[:digit:]]+)?$"
grepl
can be used to check if a string matches the RE pattern or not:
grepl(re, "+1.1")
## [1] TRUE
grepl(re, "1+1")
## [1] FALSE
Some matching and non-matching examples are listed below:
matching <- c("1", "11.1", "1.11", "+11", "-11", "-11.1")
non.matching <- c("a", "1.", "1..1", "-.1", "-", "1-", "1.-1",
".-1", "1.-", "1.1.1", "", ".", "1.1-", "11-11")
In this section, we use gramEvol to learn a RE that matches a decimal real number, as explained in the previous section.
The objective is to infer a RE that matches the decimal numbers in the vector matching
, but not in the non.matching
. Consequently, the score of any RE is determined by counting the number of matches and non-matches:
re.score <- function(re) {
score <- sum(sapply(matching, function(x) grepl(re, x))) +
sum(sapply(non.matching, function(x) !grepl(re, x)))
return (length(matching) + length(non.matching) - score)
}
The fitness function in gramEvol receives an R expression
, which has to be evaluated before being passed to re.score
:
fitfunc <- function(expr) re.score(eval(expr))
We use rex RE functions to create a grammar. The grammar only includes the functions explored in Section , and is designed such that the search space is reduced:
library("rex")
library("gramEvol")
grammarDef <- CreateGrammar(list(
re = grule(rex(start, rules, end)),
rules = grule(rule, .(rule, rules)),
rule = grule(numbers, ".", or("+", "-"), maybe(rules))))
grammarDef
## <re> ::= rex(start, <rules>, end)
## <rules> ::= <rule> | <rule>, <rules>
## <rule> ::= numbers | "." | or("+", "-") | maybe(<rules>)
<re>
, creates a valid rex command that uses <rules>
for pattern matching.<rules>
, is recursive and can create a collection of rules by repeating itself, e.g., <rule>
, <rule>
, <rule>
. The .()
allows using a comma inside a grule
definition, where otherwise it would have been interpreted as another replacement rule in the list.<rule>
, expands to a RE function or character pattern. These include numbers
and maybe
from rex, a decimal point, and + or –.The last step is to perform a search for a regular expression that minimises the score function. Here the minimum terminationCost
is known (i.e., zero error), and max.depth
is increased to allow for more expansion of the recursive <rules>
. We use GrammaticalExhaustiveSearch
to exhaustively search for the answer among all possible combinations of the grammar: % this takes some time to run, use “monitorFunc = print” for more convenience
GrammaticalExhaustiveSearch(grammarDef, fitfunc, max.depth = 7, terminationCost = 0)
## GE Search Results:
## Expressions Tested: 6577
## Best Chromosome: 0 1 3 0 2 1 3 1 0 0 1 1 0 0 3 0 0
## Best Expression: rex(start, maybe(or("+", "-")), maybe(numbers, "."), numbers, maybe(numbers), end)
## Best Cost: 0
The result, while correct, is different from what we expected: \([\pm][nnn.]nnn[nnn]\), which is true for any real number. Furthermore, the search takes a considerable amount of time:
system.time(GrammaticalExhaustiveSearch(grammarDef, fitfunc,
max.depth = 7, terminationCost = 0))
## user system elapsed
## 380.469 17.022 392.637
which was measured a 3.40 GHz Intel Core i7-2600 CPU.
In conclusion, one might find it easier to design REs by hand in real-world scenarios, rather than using evolutionary optimisation techniques.
In this section, some of the other functionalities of the gramEvol are introduced. Here, all of the examples are demonstrated using the following grammar:
grammarDef <- CreateGrammar(list(
expr = gsrule("(<expr>)<op>(<expr>)", "<coef>*<var>"),
op = gsrule("+", "-", "*", "/"),
coef = gsrule("c1", "c2"),
var = gsrule("v1", "v2")))
grammarDef
## <expr> ::= (<expr>)<op>(<expr>) | <coef>*<var>
## <op> ::= + | - | * | /
## <coef> ::= c1 | c2
## <var> ::= v1 | v2
To map a numeric sequence to an expression manually, use GrammarMap
:
GrammarMap(c(0, 1, 0, 0, 1, 1, 0, 0), grammarDef)
## (c1 * v1) - (c1 * v1)
The sequence is zero-indexed (the first rule is zero). To see the step by step mapping, use the verbose
parameter option:
GrammarMap(c(0, 1, 0, 0, 1, 1, 0, 0), grammarDef, verbose = TRUE)
## Step Codon Symbol Rule Result
## 0 starting: <expr>
## 1 0 <expr> (<expr>)<op>(<expr>) (<expr>)<op>(<expr>)
## 2 1 <expr> <coef>*<var> (<coef>*<var>)<op>(<expr>)
## 3 0 <coef> c1 (c1*<var>)<op>(<expr>)
## 4 0 <var> v1 (c1*v1)<op>(<expr>)
## 5 1 <op> - (c1*v1)-(<expr>)
## 6 1 <expr> <coef>*<var> (c1*v1)-(<coef>*<var>)
## 7 0 <coef> c1 (c1*v1)-(c1*<var>)
## 8 0 <var> v1 (c1*v1)-(c1*v1)
## Valid Expression Found
## (c1 * v1) - (c1 * v1)
If the length of a sequence is insufficient for the mapping process, such that a few non-terminal elements still remain in the resulting expression, a wrapping of up to wrappings
is performed. For example:
GrammarMap(c(0, 1, 0, 0, 1, 1), grammarDef, verbose = TRUE)
## Step Codon Symbol Rule Result
## 0 starting: <expr>
## 1 0 <expr> (<expr>)<op>(<expr>) (<expr>)<op>(<expr>)
## 2 1 <expr> <coef>*<var> (<coef>*<var>)<op>(<expr>)
## 3 0 <coef> c1 (c1*<var>)<op>(<expr>)
## 4 0 <var> v1 (c1*v1)<op>(<expr>)
## 5 1 <op> - (c1*v1)-(<expr>)
## 6 1 <expr> <coef>*<var> (c1*v1)-(<coef>*<var>)
## Non-terminal expression
## Wrapping string to position 0
## Step Codon Symbol Rule Result
## 7 0 <coef> c1 (c1*v1)-(c1*<var>)
## 8 1 <var> v2 (c1*v1)-(c1*v2)
## 9 0 <var> v2 (c1*v1)-(c1*v2)
## Valid Expression Found
## (c1 * v1) - (c1 * v2)
gramEvol offers several functions to examine grammar definitions.
summary
reports a summary of what grammar presents:
summary(grammarDef)
## Start Symbol: <expr>
## Is Recursive: TRUE
## Tree Depth: Limited to 4
## Maximum Rule Choices: 4
## Maximum Sequence Length: 18
## Maximum Sequence Variation: 2 2 2 2 4 4 2 2 2 4 2 2 2 2 4 2 2 2
## No. of Unique Expressions: 18500
Many of these properties are available through individual functions:
GetGrammarDepth
computes the depth of grammar tree. The parameter max.depth
is used to limit recursion in cyclic grammars. For example, this grammar is cyclic because of rule <expr> ::= <expr><op><expr>
, i.e., replacing a <expr>
with other <expr>
s. By default GetGrammarDepth
limits recursion to the number of symbols defined in the grammar:
GetGrammarDepth(grammarDef)
## [1] 4
GetGrammarDepth(grammarDef, max.depth = 10)
## [1] 10
For grammars without recursion, the value returned by GetGrammarDepth
is the actual depth of the tree:
grammarDef2 <- CreateGrammar(list(
expr = gsrule("(<subexpr>)<op>(<subexpr>)"),
subexpr = gsrule("<coef>*<var>"),
op = gsrule("+", "-", "*", "/"),
coef = gsrule("c1", "c2"),
var = gsrule("v1", "v2")))
GetGrammarDepth(grammarDef2)
## [1] 3
GetGrammarDepth
also supports computing the depth from any symbol:
GetGrammarDepth(grammarDef2, startSymb = "<subexpr>")
## [1] 2
GetGrammarDepth(grammarDef2, startSymb = "<coef>")
## [1] 1
GetGrammarMaxRuleSize
returns the maximum number of production rules per symbol. Here, <op>
has the highest number of production rules:
GetGrammarMaxRuleSize(grammarDef)
## [1] 4
GetGrammarNumOfExpressions
returns the number of possible expressions existing in the grammar space. This function also uses the optional argument max.depth
to limit the number of recursions and startSymb
to set the starting symbol:
GetGrammarNumOfExpressions(grammarDef)
## [1] 18500
GetGrammarNumOfExpressions(grammarDef, max.depth = 2)
## [1] 4
GetGrammarNumOfExpressions(grammarDef, startSymb = "<coef>")
## [1] 2
Here, the only expressions with depth of 2 or less are constructed if rule (\(\times\)) is applied first, creating 4 expressions (i.e., \(c_1 \times v_1\), \(c_1 \times v_2\), \(c_2 \times v_1\) and \(c_2 \times v_2\)). Also if is chosen as the starting symbol, the expressions are limited to \(c_1\) and \(c_2\).
GetGrammarMaxSequenceLen
computes the length of integer sequence required for iterating through the grammar space without wrapping. As with the previous functions, max.depth
is set to the number of symbols defined in the grammar.
GetGrammarMaxSequenceLen(grammarDef)
## [1] 18
GetGrammarMaxSequenceLen(grammarDef, max.depth = 3)
## [1] 8
GetGrammarMaxSequenceLen(grammarDef2, startSymb = "<subexpr>")
## [1] 3
GrammaticalEvolution
is defined as follows:
GrammaticalEvolution(grammarDef, evalFunc,
numExpr = 1,
max.depth = GrammarGetDepth(grammarDef),
startSymb = GrammarStartSymbol(grammarDef),
seqLen = GrammarMaxSequenceLen(grammarDef, max.depth, startSymb),
wrappings = 3,
suggestions = NULL,
optimizer = c("auto", "es", "ga"),
popSize = 8, newPerGen = "auto", elitism = 2,
mutationChance = NA,
iterations = 1000, terminationCost = NA,
monitorFunc = NULL,
plapply = lapply, ...)
max.depth
and startSymb
determine recursive grammar limitations, similar to what was explained in the previous section.
The rest of the parameters are the evolutionary optimisation options:
GrammaticalEvolution
evolves a population of popSize
chromosomes for a number of iterations
.optimizer
is set to “auto”, using the information obtained about the grammar (e.g., number of possibles expressions and maximum sequence length), GrammaticalEvolution
uses a heuristic algorithm based on the work by Deb and Agrawal (1999) to automatically determine a suitable value for popSize
(i.e., the population size) iterations
(i.e., the number of iterations) parameters.GrammaticalEvolution
automatically changes cross-over parameters depending on the grammar to improve optimisation results. A user can turn this off by manually setting the optimizer
.suggestions
in form of integer chromosomes, and randomly generated individuals.evalFunc
.elitism
are directly added to the next generation’s population. The rest of the population is created using cross-over of chromosomes selected with roulette selection operator.mutationChance
.iterations
or the desired terminationCost
, the algorithm stops and returns the best expression found so far.GrammaticalEvolution
supports multi-gene operations, generating more than one expression per chromosome using the numExpr
parameter.seqLen
times numExpr
(i.e., the sequence length per expression, times the number of expressions).monitorFunc
is then called with information and statistics about the current status of the population.plapply
is used for parallel processing.GrammaticalEvolution
automatically filters non-terminal expressions (i.e., expressions that don’t yield a terminal expression even after times of wrappings
). Therefore the end-user does not need to worry about them while using gramEvol.Processing expressions and computing their fitness is often computationally expensive. The gramEvol package can utilise parallel processing facilities in R to improve its performance. This is done through the plapply
argument of GrammaticalEvolution
function. By default, lapply
function is used to evaluate all individuals in the population.
Multi-core systems simply benefit from using mclapply
from package parallel
, which is a drop-in replacement for lapply
on POSIX compatible systems. The following code optimises evalFunc
on 4 cores:
library("parallel")
options(mc.cores = 4)
ge <- GrammaticalEvolution(grammarDef, evalFunc,
plapply = mclapply)
To run gramEvol on a cluster, clusterapply
functions can be used instead. The gramEvol package must be first installed on all machines and the fitness function and its data dependencies exported before GE is called. The following example demonstrates a four-process cluster running on the local machine:
library("parallel")
cl <- makeCluster(type = "PSOCK", c("127.0.0.1",
"127.0.0.1",
"127.0.0.1",
"127.0.0.1"))
clusterEvalQ(cl, library("gramEvol"))
clusterExport(cl, c("evalFunc"))
ge <- GrammaticalEvolution(grammarDef, evalFunc,
plapply = function(...) parLapply(cl, ...))
stopCluster(cl)
It must be noticed that in any problem, the speed-up achieved depends on the overhead of communication compared with the fitness functions’ computational complexity.
gramEvol supports generation and evaluation of multiple expressions:
numExpr
in GrammaticalEvolution
is used to pass a list of more than one R expression
EvalExpressions
offers a simpler interface for evaluating multiple expressions.The following example show cases EvalExpressions
: It uses a dataset for variables defined in the grammar, and evaluates a GE expression object along with a string:
df <- data.frame(c1 = c(1, 2),
c2 = c(2, 3),
v1 = c(3, 4),
v2 = c(4, 5))
quad.expr <- expression(c1 * v1, c1 * v2, c2 * v1, c2 * v2)
EvalExpressions(quad.expr, envir = df)
## expr1 expr2 expr3 expr4
## 1 3 4 6 8
## 2 8 10 12 15
This is useful in applications when more than one expression is required, or the collective power of several simple expressions outperform a single complex program. For example de Silva et al. (2013) (PDF) used GE for electricity load forecasting; instead of using a complex machine learning algorithm, pools of string expressions were generated in a guided manner and were used as features in a simpler machine learning algorithm to obtain better results.
The idea of generating using GE is further explored in gramEvol’s paper in the Journal of Statistical Software (2016).
gramEvol also provides a random search and an exhaustive search. Their syntax is similar to the GrammaticalEvolution
:
result1 <- GrammaticalExhaustiveSearch(grammarDef, evalFunc)
result2 <- GrammaticalRandomSearch(grammarDef, evalFunc)
gvrule
allows members of a vector to be used as individual rules. For example,
gvrule(1:5)
## Rule 0: 1L
## Rule 1: 2L
## Rule 2: 3L
## Rule 3: 4L
## Rule 4: 5L
which is equal to
grule(1,2,3,4,5)
## Rule 0: 1
## Rule 1: 2
## Rule 2: 3
## Rule 3: 4
## Rule 4: 5
Without gvrule
, 1:5
would have been interpreted as a single rule:
grule(1:5)
## Rule 0: 1:5
There are two ways to use commas and assignments in gramEvol rules:
gsrule
..()
and defined using grule
.For example, consider the following rules:
<assignment> ::= A = B | A = C
<comma> ::= A, B | B, C
Their definition using gramEvol is as follows:
CreateGrammar(list(assignment = gsrule("A = B", "A = C"),
comma = gsrule("A, B", "B, C")))
## <assignment> ::= A = B | A = C
## <comma> ::= A, B | B, C
or
CreateGrammar(list(assignment = grule(.(A = B), .(A = C)),
comma = grule(.(A, B), .(B, C))))
## <assignment> ::= A = B | A = C
## <comma> ::= A, B | B, C
GE offers a flexible yet powerful framework for automatic program generation. The syntax and the structure of the programs are described using a context-free grammar, and their objective is determined by a cost function. An evolutionary search is performed on the grammar to find the program that minimises the cost function.
gramEvol implements GE in R. It allows a grammar to be defined using R expressions, as well as in custom string formats. A GE program generator using gramEvol only requires a grammar definition and a cost function, and other steps including the evolutionary search and selecting its optimal parameters are handled automatically by the package. It also supports parallel computing, and includes facilities for exhaustive and random search.
In this vignette, some of the functionalities of gramEvol were explored. Furthermore, two examples were used to demonstrate the flexibility of GE and gramEvol.
Aho, Alfred V., Ravi Sethi, and Jeffrey D. Ullman. 1986. Compilers: Principles, Techniques, and Tools. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc.
Bartoli, Alberto, Giorgio Davanzo, Andrea De Lorenzo, Marco Mauri, Eric Medvet, and Enrico Sorio. 2012. “Automatic Generation of Regular Expressions from Examples with Genetic Programming.” In Proceedings of the Fourteenth International Conference on Genetic and Evolutionary Computation Conference Companion, 1477–8. ACM. https://www.lri.fr/~hansen/proceedings/2012/GECCO/companion/p1477.pdf.
de Silva, Anthony Mihirana, Farzad Noorian, Richard I. A. Davis, and Philip H. W. Leong. 2013. “A Hybrid Feature Selection and Generation Algorithm for Electricity Load Prediction using Grammatical Evolution.” http://www.ee.usyd.edu.au/people/philip.leong/UserFiles/File/papers/fs\_icmla13.pdf.
Deb, Kalyanmoy, and Samir Agrawal. 1999. “Understanding Interactions among Genetic Algorithm Parameters.” Foundations of Genetic Algorithms. Morgan Kaufmann Publishers, 265–86.
Ferreira, Candida. 2006. “Kepler’s Third Law.” In Gene Expression Programming: Mathematical Modeling by an Artificial Intelligence, 2nd ed., 253–57. Springer.
Koza, John R. 1992. “Symbolic Regression - Error-Driven Evolution.” In Genetic Programming: On the Programming of Computers by Means of Natural Selection, 1:237–88. MIT press.
Langley, Pat, Herbert A Simon, and Gary L Bradshaw. 1987. “Heuristics for Empirical Discovery.” In Computational Models of Learning, 21–54. Springer-Verlag Berlin.
Noorian, Farzad, Anthony M. de Silva, and Philip H. W. Leong. 2016. “gramEvol: Grammatical Evolution in R.” Journal of Statistical Software 71 (1): 1–26. doi:10.18637/jss.v071.i01.
O’Neill, Michael, Conor Ryan, Maarten Keijzer, and Mike Cattolico. 2003. “Crossover in Grammatical Evolution.” Genetic Programming and Evolvable Machines 4 (1). Springer-Verlag: 67–93.