EvoGFuzz stands for evolutionary grammar-based fuzzing. This approach leverages evolutionary optimization techniques to systematically explore the space of a program’s potential inputs, with a particular emphasis on identifying inputs that could lead to exceptional behavior. With a user-defined objective, EvoGFuzz can adapt and refine the input generation strategy over time, making it a powerful tool for uncovering software defects and vulnerabilities.
Efficient detection of defects and vulnerabilities hinges on the ability to automatically generate program inputs that are both valid and diverse. One common strategy is to use grammars, which provide structured and syntactically correct inputs. This approach leads to the concept of grammar-based fuzzing, where fuzzing strategies are guided by the rules defined within the grammar.
A further enhancement to this concept is probabilistic grammar-based fuzzing, where competing grammar rules are associated with probabilities that guide their application. By carefully assigning and optimizing these probabilities, we gain considerable control over the nature of the generated inputs. This enables us to direct the fuzzing process towards specific areas of interest—for example, those functions that are deemed critical, have a higher propensity for failures, or have undergone recent modifications.
In essence, EvoGFuzz represents a potent blend of evolutionary optimization and probabilistic grammar-based fuzzing, poised to reveal hidden defects and vulnerabilities in a targeted and efficient manner.
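To make the idea of a probabilistic grammar concrete, here is a minimal, self-contained sketch. It is not EvoGFuzz's implementation: the toy grammar, the weights, and the helper name expand are assumptions chosen purely for illustration. Each alternative carries a probability, and raising the weight of sqrt shifts the generated inputs towards that function.

```python
import random

# Hypothetical toy grammar: every alternative is paired with a probability.
# The weights are illustrative only and are not taken from EvoGFuzz.
PROB_GRAMMAR = {
    "<start>": [("<function>(<number>)", 1.0)],
    "<function>": [("sqrt", 0.7), ("sin", 0.1), ("cos", 0.1), ("tan", 0.1)],
    "<number>": [("-<digit>", 0.5), ("<digit>", 0.5)],
    "<digit>": [(str(d), 1 / 9) for d in range(1, 10)],
}


def expand(symbol: str, rng: random.Random) -> str:
    """Expand a nonterminal by sampling one alternative according to its weight."""
    alternatives, weights = zip(*PROB_GRAMMAR[symbol])
    chosen = rng.choices(alternatives, weights=weights, k=1)[0]
    # Walk through the chosen alternative and expand nested nonterminals.
    result = ""
    i = 0
    while i < len(chosen):
        if chosen[i] == "<":
            end = chosen.index(">", i) + 1
            result += expand(chosen[i:end], rng)
            i = end
        else:
            result += chosen[i]
            i += 1
    return result


rng = random.Random(42)
samples = [expand("<start>", rng) for _ in range(1000)]
sqrt_share = sum(s.startswith("sqrt") for s in samples) / len(samples)
print(samples[:5])
print(f"share of sqrt inputs: {sqrt_share:.2f}")
```

With these weights, roughly 70% of the samples should call sqrt; changing the numbers in "<function>" redirects the generator without touching the grammar's structure.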
Our program under investigation is The Calculator. This program acts as a typical calculator, capable of evaluating not just arithmetic expressions but also trigonometric functions, such as sine, cosine, and tangent. It also supports the calculation of the square root of a given number.
import math

def calculator(inp: str) -> float:
    """
    A simple calculator function that can evaluate arithmetic expressions
    and perform basic trigonometric functions and square root calculations.
    """
    return eval(
        str(inp), {"sqrt": math.sqrt, "sin": math.sin, "cos": math.cos, "tan": math.tan}
    )
Side Note: In the calculator, we use Python’s eval function, which takes a string and evaluates it as a Python expression. We provide a dictionary as the second argument to eval, mapping names to corresponding mathematical functions. This enables us to use the function names directly within the input string.
# Evaluating the cosine of 6π (with π approximated as 3.141)
print(calculator('cos(6*3.141)'))
Output:
0.999993677717667
# Calculating the square root of 36
print(calculator('sqrt(6*6)'))
Output:
6.0
Each of these calls to the calculator will evaluate the provided string as a mathematical expression, and print the result.
Now, to find new defects, we need to introduce an oracle that tells us whether a triggered error is one we expect or a new, unknown defect. OracleResult is an enum with two possible values, NO_BUG and BUG. NO_BUG denotes a passing test case and BUG a failing one.
We import the OracleResult enumerated type from the evogfuzz library. This is used in the oracle function to indicate the outcome of executing the ‘calculator’ function with a given input.
from evogfuzz.oracle import OracleResult
This is a function called oracle, which acts as an intermediary to handle and classify exceptions produced by the calculator function when given a certain input.
# Make sure you use the OracleResult from the evogfuzz library
from evogfuzz.oracle import OracleResult
def oracle(inp: str) -> OracleResult:
    """
    This function serves as an oracle or intermediary that catches and handles exceptions
    generated by the 'calculator' function. The oracle function is used in the context of fuzz testing.
    It aims to determine whether an input triggers a bug in the 'calculator' function.

    Args:
        inp (str): The input string to be passed to the 'calculator' function.

    Returns:
        OracleResult: An enumerated type 'OracleResult' indicating the outcome of the function execution.
            - OracleResult.NO_BUG: Returned if the calculator function executes without raising a ValueError.
            - OracleResult.BUG: Returned if the calculator function raises a ValueError exception, indicating a potential bug.
    """
    try:
        calculator(inp)
    except ValueError:
        return OracleResult.BUG
    return OracleResult.NO_BUG
This oracle function is used in the context of fuzzing to determine the impact of various inputs on the program under test (in our case the calculator). When the calculator function behaves as expected (i.e., it does not raise a ValueError), the oracle function returns OracleResult.NO_BUG. However, when the calculator function raises a ValueError, the oracle interprets this as a potential bug in the calculator and returns OracleResult.BUG.
We can see this in action by testing a few initial inputs:
initial_inputs = ['sqrt(1)', 'cos(912)', 'tan(4)']
for inp in initial_inputs:
    print(inp.ljust(20), oracle(inp))
Output:
sqrt(1) NO_BUG
cos(912) NO_BUG
tan(4) NO_BUG
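All three initial inputs pass. To see the failing case as well, the following self-contained sketch repeats the calculator and the oracle and adds an input with a negative square-root argument. The OracleResult class here is a local stand-in (an assumption made so the snippet runs without the evogfuzz package); in the real setup you would import it from evogfuzz.oracle as shown above.

```python
import math
from enum import Enum


class OracleResult(Enum):
    """Local stand-in for evogfuzz.oracle.OracleResult, used only in this sketch."""
    NO_BUG = "NO_BUG"
    BUG = "BUG"


def calculator(inp: str) -> float:
    """Same calculator as above, repeated so the snippet runs on its own."""
    return eval(
        str(inp), {"sqrt": math.sqrt, "sin": math.sin, "cos": math.cos, "tan": math.tan}
    )


def oracle(inp: str) -> OracleResult:
    """Classify an input: a ValueError counts as a bug, anything else passes."""
    try:
        calculator(inp)
    except ValueError:
        return OracleResult.BUG
    return OracleResult.NO_BUG


results = {inp: oracle(inp) for inp in ["sqrt(1)", "sqrt(-1)", "cos(-1)"]}
for inp, result in results.items():
    print(inp.ljust(20), result.name)
```

Only sqrt(-1) should be classified as BUG, because math.sqrt raises a ValueError for negative arguments, while cos accepts them.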
The following code represents a simple context-free grammar for our calculator function. This grammar encompasses all the potential valid inputs to the calculator, which include mathematical expressions involving square roots, trigonometric functions, and integer and decimal numbers:
from fuzzingbook.Grammars import Grammar, is_valid_grammar

CALCGRAMMAR: Grammar = {
    "<start>":
        ["<function>(<term>)"],
    "<function>":
        ["sqrt", "tan", "cos", "sin"],
    "<term>": ["-<value>", "<value>"],
    "<value>":
        ["<integer>.<integer>",
         "<integer>"],
    "<integer>":
        ["<digit><integer>", "<digit>"],
    "<digit>":
        ["1", "2", "3", "4", "5", "6", "7", "8", "9"]
}

assert is_valid_grammar(CALCGRAMMAR)
The defined grammar CALCGRAMMAR provides a structured blueprint for creating various inputs for our fuzz testing. Each rule in this grammar reflects a possible valid input that our calculator function can handle. By fuzzing based on this grammar, we can systematically explore the space of valid inputs to the calculator function.
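As a rough illustration of what fuzzing based on this grammar means, the sketch below derives random inputs by repeatedly replacing the leftmost nonterminal with a randomly chosen alternative. It is a simplified stand-in written for this post, not the generator that EvoGFuzz or The Fuzzingbook uses; the helper name generate is an assumption, and the grammar is repeated as a plain dictionary so the snippet runs without extra packages.

```python
import random
import re

# Same rules as CALCGRAMMAR above, repeated as a plain dict.
CALCGRAMMAR = {
    "<start>": ["<function>(<term>)"],
    "<function>": ["sqrt", "tan", "cos", "sin"],
    "<term>": ["-<value>", "<value>"],
    "<value>": ["<integer>.<integer>", "<integer>"],
    "<integer>": ["<digit><integer>", "<digit>"],
    "<digit>": ["1", "2", "3", "4", "5", "6", "7", "8", "9"],
}

NONTERMINAL = re.compile(r"<[^<> ]+>")


def generate(grammar: dict, rng: random.Random, start: str = "<start>") -> str:
    """Replace the leftmost nonterminal with a random alternative until none is left."""
    derivation = start
    while True:
        match = NONTERMINAL.search(derivation)
        if match is None:
            return derivation
        expansion = rng.choice(grammar[match.group(0)])
        derivation = derivation[:match.start()] + expansion + derivation[match.end():]


rng = random.Random(1)
generated = [generate(CALCGRAMMAR, rng) for _ in range(10)]
for inp in generated:
    print(inp)
```

Every alternative is chosen with equal probability here; the probabilistic variant used by EvoGFuzz differs precisely in that these choices are weighted and the weights are adapted over time.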
We apply our EvoGFuzz class to carry out fuzz testing using evolutionary grammar-based fuzzing. This is aimed at uncovering potential defects in our ‘calculator’ function.
To initialize our EvoGFuzz instance, we pass a grammar (in our case, CALCGRAMMAR), an oracle function, an initial set of inputs, and the number of iterations to be performed in the fuzzing process. A custom fitness function can be passed as well; we come back to this later.
from evogfuzz.evogfuzz_class import EvoGFuzz

epp = EvoGFuzz(
    grammar=CALCGRAMMAR,
    oracle=oracle,
    inputs=initial_inputs,
    iterations=20
)
Upon creating the EvoGFuzz instance, we can execute the fuzzing process. The fuzz() method runs the fuzzing iterations, evolving the inputs based on our fitness function, and returns a collection of inputs that lead to exceptions in the ‘calculator’ function.
found_exception_inputs = epp.fuzz()
print(f"EvoGFuzz found {len(found_exception_inputs)} bug-triggering inputs!")
Output:
EvoGFuzz found 646 bug-triggering inputs!
Lastly, we can examine the inputs that resulted in exceptions. This output can provide valuable insight into potential weaknesses in the ‘calculator’ function that need to be addressed.
# print only the first 20 bug-triggering inputs
for inp in list(found_exception_inputs)[:20]:
    print(str(inp))
Output:
sqrt(-444744.5717)
sqrt(-41.4)
sqrt(-29.43)
sqrt(-1187573157.4)
sqrt(-9399.563215131992)
sqrt(-52353.2227175)
sqrt(-836.5)
sqrt(-6.41)
sqrt(-1.3)
sqrt(-6535.55956592274)
sqrt(-583.14)
sqrt(-6.72)
sqrt(-355571.1)
sqrt(-34.94)
sqrt(-337.634295292245)
sqrt(-28452153347992435335.8432233)
sqrt(-1512)
sqrt(-342.52649343252952536428655)
sqrt(-1.43)
sqrt(-5624.886)
This process illustrates the power of evolutionary grammar-based fuzzing in identifying new defects within our system. By applying evolutionary algorithms to our fuzzing strategy, we can guide the search towards more defect-prone regions of the input space.
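The feedback loop behind this can be sketched in a few lines. The following is a deliberately simplified illustration and not EvoGFuzz's actual algorithm: it only adapts the weights of the four function names, the helper names triggers_bug and generate are made up for this example, and the "learning" step is a plain counter. Still, it shows how failing inputs from one iteration bias the generation in the next.

```python
import math
import random

FUNCTIONS = ["sqrt", "sin", "cos", "tan"]


def triggers_bug(inp: str) -> bool:
    """Stand-in oracle: a ValueError raised by the calculator counts as a bug."""
    try:
        eval(inp, {"sqrt": math.sqrt, "sin": math.sin, "cos": math.cos, "tan": math.tan})
    except ValueError:
        return True
    return False


def generate(weights: dict, rng: random.Random) -> tuple[str, str]:
    """Sample a function according to the current weights and attach a random argument."""
    function = rng.choices(FUNCTIONS, weights=[weights[f] for f in FUNCTIONS], k=1)[0]
    sign = rng.choice(["-", ""])
    return function, f"{function}({sign}{rng.randint(1, 999)})"


rng = random.Random(0)
weights = {f: 1.0 for f in FUNCTIONS}  # start from a uniform distribution

for iteration in range(5):
    population = [generate(weights, rng) for _ in range(50)]
    # Selection: keep the functions of all inputs the oracle classifies as failing.
    failing = [function for function, inp in population if triggers_bug(inp)]
    # Learning: raise the weight of every function seen in a failing input.
    for function in failing:
        weights[function] += 1.0
    print(iteration, weights)
```

Since only sqrt with a negative argument raises a ValueError, its weight grows from iteration to iteration while the others stay at their initial value, so later populations contain more and more sqrt calls.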
After the fuzzing process, you may want to examine all the generated inputs. These can be accessed using the get_all_inputs() method. Additionally, we can sort these inputs based on their fitness scores to gain insights into which inputs performed best according to our fitness function.
all_generated_inputs = epp.get_all_inputs()
all_generated_inputs_sorted = sorted(all_generated_inputs, key=lambda inp: inp.fitness, reverse=True)
Now, let’s print out these sorted inputs along with their respective fitness scores. Inputs with higher fitness scores will be displayed first, as these are the ones our evolutionary process deemed more likely to uncover potential defects.
# print only the 20 generated inputs with the highest fitness
for inp in all_generated_inputs_sorted[:20]:
    print(f"{str(inp).ljust(40)} fitness: {inp.fitness}")
Output:
sqrt(-1187573157.4) fitness: 1
sqrt(-836.5) fitness: 1
sqrt(-583.14) fitness: 1
sqrt(-6.72) fitness: 1
sqrt(-337.634295292245) fitness: 1
sqrt(-1512) fitness: 1
sqrt(-5624.886) fitness: 1
sqrt(-4626744362893882.184) fitness: 1
sqrt(-5858254.1) fitness: 1
sqrt(-7717.84) fitness: 1
sqrt(-5.3) fitness: 1
sqrt(-716.5354) fitness: 1
sqrt(-553.2) fitness: 1
sqrt(-5.21654) fitness: 1
sqrt(-157817.181) fitness: 1
sqrt(-3.4) fitness: 1
sqrt(-54533524.255469459349) fitness: 1
sqrt(-2.7411313) fitness: 1
sqrt(-15815.85) fitness: 1
sqrt(-356.23) fitness: 1
This output provides an overview of the evolved inputs and their effectiveness in revealing potential defects, as gauged by our fitness function. It is a valuable resource for understanding the behavior of our program under various inputs and the effectiveness of our evolutionary grammar-based fuzzing approach.
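A long list of inputs is easier to interpret once it is condensed. As a small sketch (plain Python on strings copied from the output above, independent of the EvoGFuzz API), we can count which function the top-ranked inputs call and check whether their arguments are all negative:

```python
from collections import Counter

# A few of the top-ranked inputs printed above, copied as plain strings.
top_inputs = [
    "sqrt(-1187573157.4)",
    "sqrt(-836.5)",
    "sqrt(-583.14)",
    "sqrt(-6.72)",
    "sqrt(-1512)",
]

# Count the function name in front of the opening parenthesis.
by_function = Counter(inp.split("(", 1)[0] for inp in top_inputs)
# Check whether every argument starts with a minus sign.
all_negative = all(inp.split("(", 1)[1].startswith("-") for inp in top_inputs)

print(by_function)
print("all arguments negative:", all_negative)
```

For this sample, every input calls sqrt with a negative argument, which already hints at the cause of the failure.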
The fitness function plays a crucial role in guiding the evolution process of our fuzzing inputs. A well-crafted fitness function can effectively direct the search towards the most promising regions of the input space.
To create your own fitness function, define a function that takes an Input instance and returns a float value. The return value represents the ‘fitness’ of the given input, with higher values indicating better fitness. Here is a simple template:
from evogfuzz.input import Input
def fitness_function_XYZ(inp: Input) -> float:
    # Implement your fitness function here.
    return 0.0
For instance, suppose we’re interested in inputs that invoke the cosine function in our calculator. We could define a fitness function fitness_function_cos that assigns a high fitness value to inputs containing ‘cos’. (Note that this might not be the best fitness function to find new exceptions.)
from evogfuzz.input import Input
def fitness_function_cos(inp: Input) -> float:
    if 'cos' in str(inp):
        return 1.0
    else:
        return 0.0
Once your fitness function is defined, you can incorporate it into the EvoGFuzz instance by passing it as the fitness_function argument.
epp = EvoGFuzz(
    grammar=CALCGRAMMAR,
    oracle=oracle,
    inputs=initial_inputs,
    fitness_function=fitness_function_cos,
    iterations=10
)
found_exception_inputs = epp.fuzz()
print(f"EvoGFuzz found {len(found_exception_inputs)} bug-triggering inputs!")
for inp in found_exception_inputs:
    print(str(inp))
Output:
EvoGFuzz found 29 bug-triggering inputs!
sqrt(-3)
sqrt(-11925)
sqrt(-552233.2921365)
sqrt(-866.7)
sqrt(-93)
sqrt(-522699.8391119)
sqrt(-534)
sqrt(-96)
sqrt(-66)
sqrt(-225465565.523)
sqrt(-827856969)
sqrt(-46)
sqrt(-657353533.115)
sqrt(-871768.3533648883225735)
sqrt(-9)
sqrt(-2349949)
sqrt(-69911)
sqrt(-5)
sqrt(-55.5)
sqrt(-1)
sqrt(-22349)
sqrt(-19125.232638656531)
sqrt(-26773556948)
sqrt(-25625618283655882813531383868136)
sqrt(-797375)
sqrt(-2)
sqrt(-4)
sqrt(-394765)
sqrt(-119151)
This way, the evolutionary grammar-based fuzzing process is now guided by your custom fitness function, focusing more on the areas you deem critical.
When utilizing a custom fitness function, such as fitness_function_cos in our case, we expect inputs containing ‘cos’ to achieve the highest fitness scores. This is because our fitness function assigns a score of 1.0 to any input that includes ‘cos’.
To confirm this behavior, we retrieve all inputs generated during the fuzzing process using the get_all_inputs() method and sort these inputs based on their fitness scores.
all_generated_inputs = epp.get_all_inputs()
all_generated_inputs_sorted = sorted(all_generated_inputs, key=lambda inp: inp.fitness, reverse=True)
Let’s display these sorted inputs along with their fitness scores. The inputs that contain ‘cos’ should appear first, demonstrating their high fitness value.
# print only the 20 generated inputs with the highest fitness
for inp in all_generated_inputs_sorted[:20]:
    print(f"{str(inp).ljust(40)} fitness: {inp.fitness}")
Output:
cos(5.353) fitness: 1.0
cos(-16.6) fitness: 1.0
cos(51) fitness: 1.0
cos(37576.94335339286) fitness: 1.0
cos(-85334667) fitness: 1.0
cos(41) fitness: 1.0
cos(-79329194) fitness: 1.0
cos(-765) fitness: 1.0
cos(-53) fitness: 1.0
cos(3122) fitness: 1.0
cos(118688157765.6338936363624) fitness: 1.0
cos(98.5) fitness: 1.0
cos(-95655295.5767) fitness: 1.0
cos(-1) fitness: 1.0
cos(-412) fitness: 1.0
cos(1111) fitness: 1.0
cos(5.47947) fitness: 1.0
cos(-114) fitness: 1.0
cos(-69) fitness: 1.0
cos(-24945) fitness: 1.0
The resulting output validates the effectiveness of our custom fitness function. It shows how we can guide the evolutionary grammar-based fuzzing process towards specific regions of the input space, thereby facilitating targeted exploration and bug discovery.
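A fitness function does not have to be binary. The following sketch shows a graded variant that rewards sqrt calls and, on top of that, negative arguments. The function name and the scoring scheme are hypothetical and chosen only for illustration; the function relies solely on str(inp), so it is demonstrated here on plain strings rather than on evogfuzz Input objects.

```python
def fitness_function_sqrt_negative(inp) -> float:
    """Hypothetical graded fitness: 0.5 for a sqrt call plus 0.5 for a negative argument."""
    text = str(inp)
    score = 0.0
    if text.startswith("sqrt("):
        score += 0.5
    if "(-" in text:
        score += 0.5
    return score


for candidate in ["sqrt(-4)", "sqrt(4)", "cos(-4)", "cos(4)"]:
    print(candidate.ljust(12), fitness_function_sqrt_negative(candidate))
```

Intermediate scores give the search a gradient to follow: an input such as cos(-4) or sqrt(4) is already "halfway there" and ranks above cos(4), even though none of them fails.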
However, your tool is most likely dependent on other software and resources that need to be installed. Unfortunately, since you are sharing these machines with all other computer science students (and employees), root privileges (to install your dependencies) are only given to a handful of individuals. But your due date is tomorrow, and you really need to run and finish these last experiments.
So what can you do?
Help is here, at least if you depend on different python versions. In this quick tutorial, I show you how you can install, update, and quickly change between your favorite python versions.
[Note] Connecting to the HU Compute Cluster requires an active Humboldt-University account or an account from the computer science department. If you don’t have an account, you might still find the pyenv instructions helpful; thus, you might want to skip the next section.
With your local Linux or macOS machine, you can access the server via SSH. Open your terminal and run the following command:
ssh -l <hu_cs_account> <server>.informatik.hu-berlin.de
where:
<hu_cs_account> is the name of your Computer Science account
<server> is the server you want to connect to (Find an overview of all possible servers here)
Alternatively, you can connect with your general HU account <hu_account> by using its email address as the login name:
ssh -l <hu_account>@hu-berlin.de <server>.informatik.hu-berlin.de
You can check the current workload here: Overview.
All we need to do is install pyenv, a simple python version manager that allows you to easily switch between multiple versions of python. You can even set a local (per-project) or a global default python version for your account.
Install pyenv:
curl https://pyenv.run | bash
This should automatically install everything along with all dependencies.
Upgrade note: The startup logic and instructions have been updated for simplicity in 2.3.0. The previous, more complicated configuration scheme for 2.0.0-2.2.5 still works.
Setting up your shell environment for Pyenv involves the following steps:
Set the environment variable PYENV_ROOT to point to the path where Pyenv will store its data. $HOME/.pyenv is the default. If you installed Pyenv via Git checkout, we recommend setting it to the same location where you cloned it.
Add the pyenv executable to your PATH if it’s not already there.
Run eval "$(pyenv init -)" to install pyenv into your shell as a shell function and to enable shims and autocompletion.
Alternatively, run eval "$(pyenv init --path)" instead to just enable shims, without shell integration.
The below setup should work for the vast majority of users for common use cases. See Advanced configuration for details and more configuration options.
For bash:
Stock Bash startup files vary widely between distributions in which of them source which, under what circumstances, in what order and what additional configuration they perform. As such, the most reliable way to get Pyenv in all environments is to append Pyenv configuration commands to both .bashrc (for interactive shells) and the profile file that Bash would use (for login shells).
First, add the commands to ~/.bashrc by running the following in your terminal:
echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.bashrc
echo 'command -v pyenv >/dev/null || export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.bashrc
echo 'eval "$(pyenv init -)"' >> ~/.bashrc
Then, if you have ~/.profile, ~/.bash_profile or ~/.bash_login, add the commands there as well. If you have none of these, add them to ~/.profile.
To add them to ~/.profile:
echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.profile
echo 'command -v pyenv >/dev/null || export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.profile
echo 'eval "$(pyenv init -)"' >> ~/.profile
To add them to ~/.bash_profile:
echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.bash_profile
echo 'command -v pyenv >/dev/null || export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.bash_profile
echo 'eval "$(pyenv init -)"' >> ~/.bash_profile
For Zsh:
echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.zshrc
echo 'command -v pyenv >/dev/null || export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.zshrc
echo 'eval "$(pyenv init -)"' >> ~/.zshrc
If you wish to get Pyenv in noninteractive login shells as well, also add the commands to ~/.zprofile or ~/.zlogin.
Restart your shell so that the changes take effect:
exec "$SHELL"
Then check that pyenv is available:
pyenv --version
Now that we have installed the latest version of pyenv, we can finally install a specific python version with the install command:
pyenv install 3.10.9
To list all already installed versions of python on your system:
pyenv versions
Use the global command to set a specific python version as the global default for your user account. No root privileges are required, and other users are not affected.
pyenv global 3.10.9
# (Note that you have to install the desired version first with the `install` command)
And to set a specific python version locally (project-based), you can use the local command.
pyenv local 3.10.9
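To double-check that the version you selected is really the one being executed, a tiny Python script can report the running interpreter. This is a minimal sketch; the function name interpreter_summary is made up for this example. If pyenv is active, the printed path should normally point somewhere inside your PYENV_ROOT (by default $HOME/.pyenv) rather than to the system-wide installation.

```python
import sys


def interpreter_summary() -> str:
    """Return the version and the file system path of the running interpreter."""
    version = ".".join(str(part) for part in sys.version_info[:3])
    return f"Python {version} at {sys.executable}"


print(interpreter_summary())
```

Run it with the plain python command from the directory in which you called pyenv local, so that the locally selected version is used.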
Congratulations, you did it!
Now you have everything you need! So buckle up, pull that all-nighter, and finish your experiments; it’s about time!
Parts of this notebook were a joint work with my colleague Hoang Lam Nguyen from Humboldt-Universität zu Berlin.
💡 Coming Soon! This Site is currently under construction and is only updated periodically (i.e. when I have time :) )
💡 [Info]: We use the functionality provided by The Fuzzingbook. For a more detailed description of grammars, have a look at the chapter Fuzzing with Grammars.
To illustrate Alhazen’s use case and necessity, we start with a quick motivating example.
First, let me introduce our program under test: The Calculator. This infamous program is similar to the one described in the original paper; however, we introduced a synthetic BUG that you have to explain utilizing the machine learning models learned by Alhazen.
💡[Note] We altered the calculator’s behavior to make the bug explanation more challenging. The introduced bug by Kampmann et al. is different from ours.
Our program under test is a typical calculator that accepts arithmetic equations and trigonometric functions and allows us to calculate the square root. To help us determine faulty behavior, i.e., a crash, we implemented an evaluation function that takes an input file and returns whether a bug occurred during the evaluation of the mathematical equations (BUG, NO_BUG).
As the calculator only accepts a subset of all mathematical equations, input files must conform to a syntactical input format that is specified by a grammar.
So let us have a look at the grammar definition:
# Load Grammar Definition
from fuzzingbook.Grammars import Grammar

# Custom Calculator Grammar from Kampmann et al. (see paper; without regex)
CALCULATOR_GRAMMAR: Grammar = {
    "<start>":
        ["<function>(<term>)"],
    "<function>":
        ["sqrt", "tan", "cos", "sin"],
    "<term>": ["-<value>", "<value>"],
    "<value>":
        ["<integer>.<integer>",
         "<integer>"],
    "<integer>":
        ["<digit><integer>", "<digit>"],
    "<digit>":
        ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"]
}
START_SYMBOL = "<start>"
The CALCULATOR_GRAMMAR consists of several production rules, non-terminals, and terminals. The calculator only accepts inputs that conform to this grammar definition.
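Because this particular grammar is small and has no nesting, conformance can be checked with a single regular expression. The sketch below is an illustration only and not part of Alhazen (which works on the grammar itself); the names CALCULATOR_INPUT and conforms are assumptions introduced for this example.

```python
import re

# Regular expression equivalent to CALCULATOR_GRAMMAR:
# a function name, an optional minus sign, digits, and an optional fractional part.
CALCULATOR_INPUT = re.compile(r"(sqrt|tan|cos|sin)\(-?[0-9]+(\.[0-9]+)?\)")


def conforms(inp: str) -> bool:
    """Return True if the whole input can be derived from the calculator grammar."""
    return CALCULATOR_INPUT.fullmatch(inp) is not None


for candidate in ["sqrt(-16)", "tan(4)", "cos(3.14)", "sqrt(1+2)", "log(3)"]:
    print(candidate.ljust(12), conforms(candidate))
```

The last two candidates are rejected: the grammar contains neither the + operator nor a log function.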
Now, let us load two mathematical equations and test our calculator:
from alhazen.helper import read_files
# Load initial input files
sample_list = read_files(['src/samples/calculator.1.expr', 'src/samples/calculator.2.expr'])
if display_output:
    display(sample_list)
Output:
sqrt(-16)
tan(4)
Let’s execute our two input samples and observe the calculator’s behavior. To do this, we load the function evaluate_samples. We can call the function with a list of input samples, and it returns the corresponding execution outcome (label/oracle). The output is a set of tuples (input: str, oracle: OracleResult).
from typing import Set, Tuple
# Load function execute samples
from alhazenML.calculator import evaluate_samples
from alhazenML.oracle import OracleResult
# evaluate_samples(List[str])
oracle: Set[Tuple[str, OracleResult]] = evaluate_samples(sample_list)
if display_output:
    display(oracle)
This gives us the following output:
{
    ('sqrt(-16)', OracleResult.BUG),
    ('tan(4)', OracleResult.NO_BUG)
}
We observe that the sample sqrt(-16) triggers a bug in the calculator, whereas the sample tan(4) does not show unusual behavior. Of course, we want to know why the input sample fails the program. In a typical use case, the developers of the calculator program would now try other input samples and evaluate if similar inputs also trigger the program’s failure. Let’s try some more input samples; maybe we can refine our understanding of why the calculator crashes:
# Our guesses (maybe the failure is also in the cos or tan function?)
guess_samples = ['cos(-16)', 'tan(-16)', 'sqrt(-100)', 'sqrt(-20.23412431234123)']
# let's obtain the execution outcome for each of our guesses
guess_oracle = evaluate_samples(guess_samples)
# let's show the results
if display_output:
    display(guess_oracle)
Output:
{
    ('cos(-16)', OracleResult.NO_BUG),
    ('tan(-16)', OracleResult.NO_BUG),
    ('sqrt(-100)', OracleResult.NO_BUG),
    ('sqrt(-20.23412431234123)', OracleResult.BUG)
}
We observe that the failure only seems to occur in the sqrt(x) function, and only for specific values of x. We could try other values for x and repeat the process. However, this would be highly time-consuming and not an efficient debugging technique for a larger and more complex test subject.
Wouldn’t it be great if there was a tool that automatically does this for us? And this is precisely what Alhazen is used for. It helps us explain under which circumstances input files fail a program.
💡 [Info]: Alhazen is a tool that automatically learns the circumstances of program failure by associating syntactical features of sample inputs with the execution outcome. The produced explanations (in the form of a decision tree) help developers focus on the input space’s relevant aspects.
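To give a first intuition for what "syntactical features" could look like, the following sketch turns a calculator input into a small feature dictionary and pairs it with the labels observed above. The three features and the function extract_features are hypothetical and chosen for illustration; Alhazen derives its own feature set from the grammar, which is not reproduced here.

```python
import re

# Pattern for inputs of the calculator grammar: function, optional sign, number.
PATTERN = re.compile(r"(sqrt|tan|cos|sin)\((-?)([0-9]+(?:\.[0-9]+)?)\)")


def extract_features(inp: str) -> dict:
    """Map an input to a few hand-picked (hypothetical) syntactical features."""
    match = PATTERN.fullmatch(inp)
    if match is None:
        raise ValueError(f"input does not conform to the grammar: {inp!r}")
    function, sign, number = match.groups()
    return {
        "function": function,
        "is_negative": sign == "-",
        "magnitude": float(number),
    }


# Labels copied from the outputs shown above.
labeled_samples = [
    ("sqrt(-16)", "BUG"),
    ("tan(4)", "NO_BUG"),
    ("cos(-16)", "NO_BUG"),
    ("sqrt(-100)", "NO_BUG"),
]

for inp, label in labeled_samples:
    print(extract_features(inp), label)
```

A table like this, with one row per input and the execution outcome as the label, is the kind of data a decision tree learner can be trained on.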
Stay Tuned! More coming soon!