Hawkeye: A Directed Grey-box Fuzzing Technique

Hawkeye: Towards a Desired Directed Grey-box Fuzzer


Remarks


Abstract

Grey-box fuzzing is a practically effective approach to test real-world programs. However, most existing grey-box fuzzers lack directedness, i.e. the capability of executing towards user-specified target sites in the program. To emphasize existing challenges in directed fuzzing, we propose Hawkeye to feature four desired properties of directed grey-box fuzzers. Owing to a novel static analysis on the program under test and the target sites, Hawkeye precisely collects the information such as the call graph, function and basic block level distances to the targets. During fuzzing, Hawkeye evaluates exercised seeds based on both static information and the execution traces to generate the dynamic metrics, which are then used for seed prioritization, power scheduling and adaptive mutating. These strategies help Hawkeye to achieve better directedness and gravitate towards the target sites. We implemented Hawkeye as a fuzzing framework and evaluated it on various real-world programs under different scenarios. The experimental results showed that Hawkeye can reach the target sites and reproduce the crashes much faster than state-of-the-art grey-box fuzzers such as AFL and AFLGo. Specially, Hawkeye can reduce the time to exposure for certain vulnerabilities from about 3.5 hours to 0.5 hour. By now, Hawkeye has detected more than 41 previously unknown crashes in projects such as Oniguruma, MJS with the target sites provided by vulnerability prediction tools; all these crashes are confirmed and 15 of them have been assigned CVE IDs.


Summary

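Hawkeye is a directed fuzzing technique. The basic idea of directed fuzzing is to statically analyze the call graph and control flow graph and compute the function-level and basic-block-level distances and the target function trace closure. During fuzzing, these are used for seed prioritization, power scheduling, and adaptive mutation, so that testing is steered towards the target. The paper proposes four desired properties of a directed fuzzer and improves on each: consider all paths that reach the target sites, whether short or long; balance the overhead and the utility of static analysis; allocate energy sensibly; and use an adaptive mutation strategy. In experiments against AFL and AFLGo, Hawkeye shortens both the time to reach the target sites and the time to expose crashes; for certain vulnerabilities the time to exposure drops from about 3.5 hours to 0.5 hours. It found more than 41 previously unknown crashes, 15 of which were assigned CVE IDs.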


Introduction

Differences between directed fuzzing and general fuzzing:

  • General Fuzzing: Cover more paths and induce more bugs (if any).
  • Directed Fuzzing: Given a target site (e.g., file & line number), test this site intensively, and induce more relevant bugs.

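Application scenarios for directed fuzzing:

(1) Patch testing

(2) Confirming suspected vulnerabilities

(3) Reproducing a crash from a vulnerability description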


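Main contributions of the paper:

  • Summarizes four properties that a directed fuzzer should have
  • Proposes measures for the power function
  • Speed-up strategies: power scheduling, adaptive mutation, and seed prioritization
  • A fuzzing framework, evaluated on crash reproduction and target site coverage
  • Experiments show that it exposes vulnerabilities faster than AFL and AFLGo
  • 15 new CVE IDs were assigned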


Desired Properties

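Properties that an ideal directed fuzzer should have:

  • P1. It should have a robust distance-based mechanism that guides the fuzzer along all paths that reach the target sites, rather than favoring particular paths.
  • P2. When it uses static analysis, it should balance overhead against utility and collect only the necessary information.
  • P3. It should prioritize and schedule seeds so that the target sites are reached quickly, allocating energy sensibly according to distance.
  • P4. It should adopt an adaptive mutation strategy when seeds cover different program states.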


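How AFLGo handles these properties:

  • For P1, AFLGo only favors the shortest path, yet the shortest path may be unable to trigger a given vulnerability.
  • For P2, calls made through function pointers are treated as unreachable; the distance to the target considers only the single shortest path and ignores longer ones, and every edge weight is always 1.
  • For P3, power scheduling is based on simulated annealing, but new seeds are not prioritized, so new seeds with short distances are not mutated in time.
  • For P4, it lacks any adaptive adjustment of the mutation strategy.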


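Suggested improvements:

  • Define distance accurately;
  • Take indirect calls and different call paths into account;
  • Revise the energy computation and prioritize seeds by distance;
  • Use an adaptive mutation strategy.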


Methodology

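During fuzzing, the fuzzer selects a seed from the prioritized seed queue and applies power scheduling to it, so that seeds considered "closer" to the target sites receive more mutation chances, i.e., more energy. Concretely, this is done through a power function that combines the covered function similarity and the basic block trace distance. For every new seed generated during mutation, the fuzzer captures its execution trace and then computes these two metrics from the information produced by static analysis. For each execution trace, the basic block trace distance is the accumulated basic-block-level distance divided by the total number of executed basic blocks; the covered function similarity is computed from the overlap between the functions executed by the trace and the target function trace closure, together with the function-level distances.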
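The basic block trace distance described above can be sketched in a few lines. This is a minimal illustration, not Hawkeye's implementation: the function name, the trace representation (a list of block labels), and the assumption that every executed block has a precomputed distance are all hypothetical.

```python
from typing import Dict, List, Optional


def basic_block_trace_distance(
    trace: List[str], bb_distance: Dict[str, float]
) -> Optional[float]:
    """Average basic-block-level distance over one execution trace.

    Assumption: `bb_distance` holds a statically computed distance for
    every block that appears in `trace`.
    """
    if not trace:
        return None  # an empty trace has no defined distance
    accumulated = sum(bb_distance[bb] for bb in trace)
    return accumulated / len(trace)


# Hypothetical block labels and distances, for illustration only.
distances = {"entry": 4.0, "parse": 2.0, "target": 0.0}
print(basic_block_trace_distance(["entry", "parse", "target"], distances))
```

A smaller value means the trace spent more of its execution in blocks near the target, which is what the power function rewards.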


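Once the energy is determined, the fuzzer adaptively allocates mutation budgets to two types of mutators according to their granularity on the seed: coarse-grained and fine-grained. The fuzzer then evaluates the newly generated seeds and prioritizes those that have more energy or have reached the target functions.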


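Static analysis:

Takes the source code and the target sites as input and outputs an instrumented binary (carrying basic-block-level distance information).

  • Build the CG/CFG
  • Compute the function-level distance, used for the covered function similarity
  • Compute the basic-block-level distance, used for the basic block trace distance
  • Compute the target function trace closure, used for the covered function similarity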

Fuzzing loop:

  • Seed selection
  • Power scheduling
  • Compute the covered function similarity
  • Compute the basic block trace distance
  • Adaptive adjustment of the mutation strategy
  • Prioritization of new seeds


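To balance energy allocation between short and long paths, seeds that cover more functions in the expected set (the set of all functions that can reach the target sites) are mutated first; the longer a path and the larger its overlap with this set, the higher its score.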
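One way to picture this scoring is as an overlap measure between the functions a seed executed and the expected set, with closer functions counting more. The sketch below is an illustration under stated assumptions: the weight `1 / (1 + distance)` and the normalization by the size of the union are choices made here for simplicity, not formulas taken from the paper, and all names are hypothetical.

```python
from typing import Dict, Set


def covered_function_similarity(
    executed: Set[str], closure: Set[str], fn_distance: Dict[str, float]
) -> float:
    """Distance-weighted overlap between executed functions and the closure.

    Assumptions (illustrative only): each overlapping function contributes
    1 / (1 + distance), and the sum is normalized by |executed ∪ closure|.
    """
    union = executed | closure
    if not union:
        return 0.0
    overlap = executed & closure
    weighted = sum(1.0 / (1.0 + fn_distance[fn]) for fn in overlap)
    return weighted / len(union)


closure = {"main", "parse", "decode", "vuln"}
fn_distance = {"main": 3.0, "parse": 2.0, "decode": 1.0, "vuln": 0.0}

short_path = {"main", "vuln"}                    # reaches the target directly
long_path = {"main", "parse", "decode", "vuln"}  # overlaps more of the closure

print(covered_function_similarity(short_path, closure, fn_distance))
print(covered_function_similarity(long_path, closure, fn_distance))
```

With this scoring, the seed on the longer path is not penalized for its length; it scores higher because it overlaps more of the expected set.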


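Prioritization of new seeds:

Seeds are stored in three tiers. A new seed goes into tier 1 if it covers new edges, has relatively high energy, or reaches the target; otherwise it goes into tier 2. A seed that is not new goes into tier 3.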
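The three-tier rule can be sketched as a small queue. This is a minimal illustration that reads the three tier-1 conditions as alternatives (any one suffices); the `Seed` fields, the fixed `energy_threshold`, and the class names are hypothetical and not taken from the paper.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, Optional


@dataclass
class Seed:
    name: str
    is_new: bool
    covers_new_edges: bool = False
    energy: float = 0.0
    reaches_target: bool = False


def tier_for(seed: Seed, energy_threshold: float) -> int:
    """Return 1, 2, or 3 following the three-tier rule described above."""
    if not seed.is_new:
        return 3
    if seed.covers_new_edges or seed.energy > energy_threshold or seed.reaches_target:
        return 1
    return 2


class TieredSeedQueue:
    def __init__(self, energy_threshold: float) -> None:
        self.energy_threshold = energy_threshold
        self.tiers: Dict[int, Deque[Seed]] = {1: deque(), 2: deque(), 3: deque()}

    def push(self, seed: Seed) -> None:
        self.tiers[tier_for(seed, self.energy_threshold)].append(seed)

    def pop(self) -> Optional[Seed]:
        # Always drain the highest-priority non-empty tier first.
        for tier in (1, 2, 3):
            if self.tiers[tier]:
                return self.tiers[tier].popleft()
        return None


queue = TieredSeedQueue(energy_threshold=0.5)
queue.push(Seed("old", is_new=False))
queue.push(Seed("plain", is_new=True, energy=0.1))
queue.push(Seed("near", is_new=True, reaches_target=True))
print([queue.pop().name for _ in range(3)])
```

The seed that reaches the target is picked first even though it was pushed last, which is the behavior the text says AFLGo's scheduling lacks.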


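Adaptive mutation strategy (see the algorithm in the paper for details):

If a seed reaches the target, the share of fine-grained mutation is increased and the share of coarse-grained mutation is decreased.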
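The shift between the two mutator classes can be sketched as a change in selection probability. The numbers below (`base_coarse`, `step`) and the function names are hypothetical placeholders chosen for illustration; the paper's algorithm defines the actual adjustment.

```python
import random
from typing import Dict


def mutation_probabilities(
    reaches_target: bool, base_coarse: float = 0.5, step: float = 0.3
) -> Dict[str, float]:
    """Probability of choosing a coarse- or fine-grained mutator.

    Assumption: when the seed reaches the target, the coarse share drops
    by `step` and the fine share grows by the same amount.
    """
    coarse = base_coarse - step if reaches_target else base_coarse
    coarse = min(max(coarse, 0.0), 1.0)  # keep it a valid probability
    return {"coarse": coarse, "fine": 1.0 - coarse}


def pick_mutator(reaches_target: bool, rng: random.Random) -> str:
    probs = mutation_probabilities(reaches_target)
    return "coarse" if rng.random() < probs["coarse"] else "fine"


rng = random.Random(0)  # fixed seed so repeated runs behave the same
print(mutation_probabilities(reaches_target=False))
print(mutation_probabilities(reaches_target=True))
print(pick_mutator(True, rng))
```

The intuition is that a seed already at the target needs small, local changes to trigger the bug, while a seed far from it benefits from larger structural changes.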


Evaluation

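Hawkeye is compared against AFL and AFLGo. It shortens both the time to reach the target sites and the time to expose crashes; for certain vulnerabilities the time to exposure drops from about 3.5 hours to 0.5 hours. It found more than 41 previously unknown crashes, 15 of which were assigned CVE IDs.

The experiments are designed around four research questions: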

  • RQ1 Is the static analysis really worth the effort?
  • RQ2 How good is Hawkeye’s performance in terms of reproducing the target crashes?
  • RQ3 How effective are the dynamic strategies in Hawkeye?
  • RQ4 How good is the ability of Hawkeye for reaching the specific target sites?


Future Work

Implement binary fuzzing, with target identification based on binary code matching, static analysis based on IDA, and instrumentation based on Intel Pin.

Edited on 04-15