Evolutionary Grammar-Based Fuzzing

11 minute read

Published:

EvoGFuzz: An Evolutionary Approach to Grammar-Based Fuzzing

EvoGFuzz stands for evolutionary grammar-based fuzzing. This approach leverages evolutionary optimization techniques to systematically explore the space of a program’s potential inputs, with a particular emphasis on identifying inputs that could lead to exceptional behavior. With a user-defined objective, EvoGFuzz can adapt and refine the input generation strategy over time, making it a powerful tool for uncovering software defects and vulnerabilities.

Efficient detection of defects and vulnerabilities hinges on the ability to automatically generate program inputs that are both valid and diverse. One common strategy is to use grammars, which provide structured and syntactically correct inputs. This approach leads to the concept of grammar-based fuzzing, where fuzzing strategies are guided by the rules defined within the grammar.

A further enhancement to this concept is probabilistic grammar-based fuzzing, where competing grammar rules are associated with probabilities that guide their application. By carefully assigning and optimizing these probabilities, we gain considerable control over the nature of the generated inputs. This enables us to direct the fuzzing process towards specific areas of interest—for example, those functions that are deemed critical, have a higher propensity for failures, or have undergone recent modifications.

In essence, EvoGFuzz represents a potent blend of evolutionary optimization and probabilistic grammar-based fuzzing, poised to reveal hidden defects and vulnerabilities in a targeted and efficient manner.

Fuzzing a Program

Our program under investigation is The Calculator. This program acts as a typical calculator, capable of evaluating not just arithmetic expressions but also trigonometric functions, such as sine, cosine, and tangent. Furthermore, it also supports the calculation of the square root of a given number.

import math

def calculator(inp: str) -> float:
    """
        A simple calculator function that can evaluate arithmetic expressions 
        and perform basic trigonometric functions and square root calculations.
    """
    return eval(
        str(inp), {"sqrt": math.sqrt, "sin": math.sin, "cos": math.cos, "tan": math.tan}
    )

Side Note: In the calculator, we use Python’s eval function, which takes a string and evaluates it as a Python expression. We provide a dictionary as the second argument to eval, mapping names to corresponding mathematical functions. This enables us to use the function names directly within the input string.

# Evaluating the cosine of 2π
print(calculator('cos(6*3.141)'))

Output:

0.999993677717667
# Calculating the square root of 36
print(calculator('sqrt(6*6)'))

Output:

6.0

Each of these calls to the calculator will evaluate the provided string as a mathematical expression, and print the result.

Now, to find new defects, we need to introduce an oracle that tells us if the error that is triggered is something we expect or a new/unkonwn defect. The OracleResult is an enum with two possible values, NO_BUG and BUG. NO_BUG donates a passing test case and BUG a failing one.

We import the OracleResult enumerated type from the evogfuzz library. This is used in the oracle function to indicate the outcome of executing the ‘calculator’ function with a given input.

from evogfuzz.oracle import OracleResult

This is a function called oracle, which acts as an intermediary to handle and classify exceptions produced by the calculator function when given a certain input.

# Make sure you use the OracleResult from the evogfuzz library
from evogfuzz.oracle import OracleResult

def oracle(inp: str):
    """
    This function serves as an oracle or intermediary that catches and handles exceptions 
    generated by the 'calculator' function. The oracle function is used in the context of fuzz testing.
    It aims to determine whether an input triggers a bug in the 'calculator' function.

    Args:
        inp (str): The input string to be passed to the 'calculator' function.

    Returns:
        OracleResult: An enumerated type 'OracleResult' indicating the outcome of the function execution.
            - OracleResult.NO_BUG: Returned if the calculator function executes without any exception or only with CalculatorSyntaxError
            - OracleResult.BUG: Returned if the calculator function raises a ValueError exception, indicating a potential bug.
    """
    try:
        calculator(inp)
    except ValueError as e:
        return OracleResult.BUG
    
    return OracleResult.NO_BUG

This oracle function is used in the context of fuzzing to determine the impact of various inputs on the program under test (in our case the calculator). When the calculator function behaves as expected (i.e., no exceptions occur), the oracle function returns OracleResult.NO_BUG. However, when the calculator function raises an unexpected exception, the oracle interprets this as a potential bug in the calculator and returns OracleResult.BUG.

We can see this in action by testing a few initial inputs:

initial_inputs = ['sqrt(1)', 'cos(912)', 'tan(4)']

for inp in initial_inputs:
    print(inp.ljust(20), oracle(inp))

Output:

sqrt(1)              NO_BUG
cos(912)             NO_BUG
tan(4)               NO_BUG

The following code represents a simple context-free grammar for our calculator function. This grammar encompasses all the potential valid inputs to the calculator, which include mathematical expressions involving square roots, trigonometric functions, and integer and decimal numbers:

from fuzzingbook.Grammars import Grammar, is_valid_grammar

CALCGRAMMAR: Grammar = {
    "<start>":
        ["<function>(<term>)"],

    "<function>":
        ["sqrt", "tan", "cos", "sin"],
    
    "<term>": ["-<value>", "<value>"], 
    
    "<value>":
        ["<integer>.<integer>",
         "<integer>"],

    "<integer>":
        ["<digit><integer>", "<digit>"],

    "<digit>":
        ["1", "2", "3", "4", "5", "6", "7", "8", "9"]
}
    
assert is_valid_grammar(CALCGRAMMAR)

The defined grammar CALCGRAMMAR provides a structured blueprint for creating various inputs for our fuzz testing. Each rule in this grammar reflects a possible valid input that our calculator function can handle. By fuzzing based on this grammar, we can systematically explore the space of valid inputs to the calculator function.

Leveraging EvoGFuzz to Unearth New Defects

We apply our EvoGFuzz class to carry out fuzz testing using evolutionary grammar-based fuzzing. This is aimed at uncovering potential defects in our ‘calculator’ function.

To initialize our EvoGFuzz instance, we require a grammar (in our case, CALCGRAMMAR), an oracle function, an initial set of inputs, a fitness function, and the number of iterations to be performed in the fuzzing process.

Upon creating the EvoGFuzz instance, we can execute the fuzzing process. The fuzz() method runs the fuzzing iterations, evolving the inputs based on our fitness function, and returns a collection of inputs that lead to exceptions in the ‘calculator’ function.

from evogfuzz.evogfuzz_class import EvoGFuzz

epp = EvoGFuzz(
    grammar=CALCGRAMMAR,
    oracle=oracle,
    inputs=initial_inputs,
    iterations=20
)

Upon creating the EvoGFuzz instance, we can execute the fuzzing process. The .fuzz() method runs the fuzzing iterations, evolving the inputs based on our fitness function, and returns a collection of inputs that lead to exceptions in the ‘calculator’ function.

found_exception_inputs = epp.fuzz()
print(f"EvoGFuzz found {len(found_exception_inputs)} bug-triggering inputs!")
EvoGFuzz found 646 bug-triggering inputs!

Lastly, we can examine the inputs that resulted in exceptions. This output can provide valuable insight into potential weaknesses in the ‘calculator’ function that need to be addressed.

# print only the first 20 bug-triggering inputs
for inp in list(found_exception_inputs)[:20]:
    print(str(inp))

Output:

sqrt(-444744.5717)
sqrt(-41.4)
sqrt(-29.43)
sqrt(-1187573157.4)
sqrt(-9399.563215131992)
sqrt(-52353.2227175)
sqrt(-836.5)
sqrt(-6.41)
sqrt(-1.3)
sqrt(-6535.55956592274)
sqrt(-583.14)
sqrt(-6.72)
sqrt(-355571.1)
sqrt(-34.94)
sqrt(-337.634295292245)
sqrt(-28452153347992435335.8432233)
sqrt(-1512)
sqrt(-342.52649343252952536428655)
sqrt(-1.43)
sqrt(-5624.886)

This process illustrates the power of evolutionary grammar-based fuzzing in identifying new defects within our system. By applying evolutionary algorithms to our fuzzing strategy, we can guide the search towards more defect-prone regions of the input space.

Analyzing and Sorting All Generated Inputs by Fitness

After the fuzzing process, you may want to examine all the generated inputs. These can be accessed using the get_all_inputs() method. Additionally, we can sort these inputs based on their fitness scores to gain insights into which inputs performed best according to our fitness function.

all_generated_inputs = epp.get_all_inputs()
all_generated_inputs_sorted = sorted(all_generated_inputs, key=lambda inp: inp.fitness, reverse=True)

Now, let’s print out these sorted inputs along with their respective fitness scores. Inputs with higher fitness scores will be displayed first, as these are the ones our evolutionary process deemed more likely to uncover potential defects.

# investigate only the first 20 bug-triggering inputs
for inp in all_generated_inputs_sorted[:20]:
    print(f"{str(inp).ljust(40)} fitness: {inp.fitness}")

Output:

sqrt(-1187573157.4)                      fitness: 1
sqrt(-836.5)                             fitness: 1
sqrt(-583.14)                            fitness: 1
sqrt(-6.72)                              fitness: 1
sqrt(-337.634295292245)                  fitness: 1
sqrt(-1512)                              fitness: 1
sqrt(-5624.886)                          fitness: 1
sqrt(-4626744362893882.184)              fitness: 1
sqrt(-5858254.1)                         fitness: 1
sqrt(-7717.84)                           fitness: 1
sqrt(-5.3)                               fitness: 1
sqrt(-716.5354)                          fitness: 1
sqrt(-553.2)                             fitness: 1
sqrt(-5.21654)                           fitness: 1
sqrt(-157817.181)                        fitness: 1
sqrt(-3.4)                               fitness: 1
sqrt(-54533524.255469459349)             fitness: 1
sqrt(-2.7411313)                         fitness: 1
sqrt(-15815.85)                          fitness: 1
sqrt(-356.23)                            fitness: 1

This output provides an overview of the evolved inputs and their effectiveness in revealing potential defects, as gauged by our fitness function. It is a valuable resource for understanding the behavior of our program under various inputs and the effectiveness of our evolutionary grammar-based fuzzing approach.

Incorporating Custom Fitness Functions

The fitness function plays a crucial role in guiding the evolution process of our fuzzing inputs. A well-crafted fitness function can effectively direct the search towards the most promising regions of the input space.

To create your own fitness function, define a function that takes an Input instance and returns a float value. The return value represents the ‘fitness’ of the given input, with higher values indicating better fitness. Here is a simple template:

from evogfuzz.input import Input

def fitness_function_XYZ(inp: Input) -> float:
    # Implement your fitness function here.
    return 0.0

For instance, suppose we’re interested in inputs that invoke the cosine function in our calculator. We could define a fitness function fitness_function_cos that assigns a high fitness value to inputs containing ‘cos’. (Note that this might not be the best fitness function to find new expcetions.)

from evogfuzz.input import Input

def fitness_function_cos(inp: Input) -> float:
    if 'cos' in str(inp):
        return 1.0
    else:
        return 0.0

Once your fitness function is defined, you can incorporate it into the EvoGFuzz instance by passing it as the fitness_function argument.

epp = EvoGFuzz(
    grammar=CALCGRAMMAR,
    oracle=oracle,
    inputs=initial_inputs,
    fitness_function=fitness_function_cos,
    iterations=10
)

found_exception_inputs = epp.fuzz()

print(f"EvoGFuzz found {len(found_exception_inputs)} bug-triggering inputs!")

for inp in found_exception_inputs:
    print(str(inp))

Output:

EvoGFuzz found 29 bug-triggering inputs!
sqrt(-3)
sqrt(-11925)
sqrt(-552233.2921365)
sqrt(-866.7)
sqrt(-93)
sqrt(-522699.8391119)
sqrt(-534)
sqrt(-96)
sqrt(-66)
sqrt(-225465565.523)
sqrt(-827856969)
sqrt(-46)
sqrt(-657353533.115)
sqrt(-871768.3533648883225735)
sqrt(-9)
sqrt(-2349949)
sqrt(-69911)
sqrt(-5)
sqrt(-55.5)
sqrt(-1)
sqrt(-22349)
sqrt(-19125.232638656531)
sqrt(-26773556948)
sqrt(-25625618283655882813531383868136)
sqrt(-797375)
sqrt(-2)
sqrt(-4)
sqrt(-394765)
sqrt(-119151)

This way, the evolutionary grammar-based fuzzing process is now guided by your custom fitness function, focusing more on the areas you deem critical.

Evaluating Inputs Based on Custom Fitness Function

When utilizing a custom fitness function, such as fitness_function_cos in our case, we expect inputs containing ‘cos’ to achieve the highest fitness scores. This is because our fitness function assigns a score of 1.0 to any input that includes ‘cos’.

To confirm this behavior, we retrieve all inputs generated during the fuzzing process using the get_all_inputs() method and sort these inputs based on their fitness scores.

all_generated_inputs = epp.get_all_inputs()
all_generated_inputs_sorted = sorted(all_generated_inputs, key=lambda inp: inp.fitness, reverse=True)

Let’s display these sorted inputs along with their fitness scores. The inputs that contain ‘cos’ should appear first, demonstrating their high fitness value.

# investigate only the first 20 bug-triggering inputs
for inp in all_generated_inputs_sorted[:20]:
    print(f"{str(inp).ljust(40)} fitness: {inp.fitness}")

Output:

cos(5.353)                               fitness: 1.0
cos(-16.6)                               fitness: 1.0
cos(51)                                  fitness: 1.0
cos(37576.94335339286)                   fitness: 1.0
cos(-85334667)                           fitness: 1.0
cos(41)                                  fitness: 1.0
cos(-79329194)                           fitness: 1.0
cos(-765)                                fitness: 1.0
cos(-53)                                 fitness: 1.0
cos(3122)                                fitness: 1.0
cos(118688157765.6338936363624)          fitness: 1.0
cos(98.5)                                fitness: 1.0
cos(-95655295.5767)                      fitness: 1.0
cos(-1)                                  fitness: 1.0
cos(-412)                                fitness: 1.0
cos(1111)                                fitness: 1.0
cos(5.47947)                             fitness: 1.0
cos(-114)                                fitness: 1.0
cos(-69)                                 fitness: 1.0
cos(-24945)                              fitness: 1.0

The resulting output validates the effectiveness of our custom fitness function. It shows how we can guide the evolutionary grammar-based fuzzing process towards specific regions of the input space, thereby facilitating targeted exploration and bug discovery.