Tevfik Bultan (University of California, Santa Barbara, USA) · Pavol Cerny (Vienna University of Technology, Austria) · Krishna S. (Indian Institute of Technology Bombay, India) · Sriram Sankaranarayanan (University of Colorado at Boulder, USA) · Natarajan Shankar (SRI International, USA)

AI Verification: Tool for Deep Neural Networks and Learning-Enabled Cyber-Physical Systems

## 1 Introduction

NNV has been successfully applied to safety verification and robustness analysis of various real-world DNNs, mainly feedforward neural networks (FFNNs) and convolutional neural networks (CNNs), as well as learning-enabled CPS. Two case studies are presented: the first compares methods for safety verification of the ACAS Xu networks [21], and the second presents safety verification of a learning-based adaptive cruise control (ACC) system.

## 2 Overview and Features

When using an exact (sound and complete) reachability solver, such as the star-based solver, the safety checker can return either "safe" or "unsafe" along with a set of counterexamples. When using an over-approximate (sound) reachability solver, such as the zonotope-based scheme or the approximate star-based solvers, the safety checker may return either "safe" or "uncertain" (unknown).
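This decision logic can be sketched in a few lines. In the sketch below, 1-D intervals stand in for reachable output sets, and `check_safety`, `intersects`, and the argument names are illustrative, not NNV's actual API.

```python
# Sketch of a reachability-based safety checker's verdict logic.
# output_sets: reachable output sets computed by some solver
# unsafe: the unsafe region; intersects: user-supplied overlap test
# exact: True if the solver is sound and complete
def check_safety(output_sets, unsafe, intersects, exact):
    violating = [s for s in output_sets if intersects(s, unsafe)]
    if not violating:
        return 'safe', []
    if exact:
        # exact solver: overlaps are real violations and serve as
        # counterexample witnesses
        return 'unsafe', violating
    # over-approximate solver: the overlap may be spurious
    return 'uncertain', []

# usage with 1-D intervals
overlap = lambda s, u: s[0] < u[1] and u[0] < s[1]
```

The key asymmetry is in the last branch: an over-approximate solver can certify safety (no overlap anywhere) but never unsafety.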

## 3 Set Representations and Reachability Algorithms

- Polyhedron [40]
- Star Set [38,41] (code)
- Zonotope [32] (code)
- Abstract Domain [33]
- ImageStar Set [37] (code)

Then the exact reachable set of the layer, R_L, is constructed by performing a sequence of stepReLU operations, that is, R_L = stepReLU_n(stepReLU_{n−1}(· · · (stepReLU_1(Ī)))). Similar to the over-approximation algorithm that uses star sets, the zonotope algorithm computes an over-approximation of the exact reachable set of an FFNN.
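The stepReLU composition can be illustrated with axis-aligned boxes standing in for star sets (an assumption made for brevity; star sets additionally carry a linear predicate). Each stepReLU_i either keeps a set, projects it onto x_i = 0, or splits it into these two cases.

```python
# Exact ReLU-layer reachability as a chain of stepReLU operations,
# illustrated on boxes (lo, hi) instead of star sets.
def step_relu(boxes, i):
    out = []
    for lo, hi in boxes:
        if lo[i] >= 0:                       # neuron already non-negative
            out.append((lo, hi))
        elif hi[i] <= 0:                     # non-positive: maps to x_i = 0
            lo2, hi2 = list(lo), list(hi)
            lo2[i] = hi2[i] = 0.0
            out.append((tuple(lo2), tuple(hi2)))
        else:                                # straddles 0: split in two
            lo_pos = list(lo)
            lo_pos[i] = 0.0
            out.append((tuple(lo_pos), hi))
            lo_neg, hi_neg = list(lo), list(hi)
            lo_neg[i] = hi_neg[i] = 0.0
            out.append((tuple(lo_neg), tuple(hi_neg)))
    return out

def relu_layer(box, n):
    # R_L = stepReLU_n(...(stepReLU_1(I))), applied neuron by neuron
    sets = [box]
    for i in range(n):
        sets = step_relu(sets, i)
    return sets
```

Splitting is what makes the method exact but worst-case exponential: a 2-D box straddling zero in both dimensions yields four sets.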

## 4 Evaluation

### Safety Verification of ACAS Xu Networks

For property φ3, NNV's exact-star method is about 20.7× faster than Reluplex, 14.2× faster than Marabou, and 81.6× faster than Marabou-DnC (i.e., the divide-and-conquer method). The approximate-star method is 547× faster than Reluplex, 374× faster than Marabou, 2151× faster than Marabou-DnC, and 8× faster than ReluVal.

### Safety Verification of Adaptive Cruise Control System

When the initial velocity of the main vehicle is less than 27 m/s, the ACC system with the discrete plant model is unsafe. As shown in Table 3, verification of the linear model using the over-approximate method is on average 22.7× faster than that of the nonlinear model.

## 5 Related Work

Because there are partitions in the reachable sets of the neural network controller, the number of star sets in the reachable set of the plant increases rapidly over time [38]. In contrast, the over-approximated method computes the interval hull of all reachable sets at each time step, maintaining a single reachable set of the plant throughout the computation.
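The interval-hull operation used by the over-approximate method can be sketched concretely. The version below works on axis-aligned boxes (an illustrative simplification; the tool computes hulls of star sets):

```python
# Interval hull: the smallest box enclosing a collection of boxes.
# Replacing many reachable sets by their hull keeps the plant's
# reachable set a single set per time step, at the cost of precision.
def interval_hull(boxes):
    # boxes: list of (lo, hi) pairs, each a tuple per dimension
    dim = len(boxes[0][0])
    lo = tuple(min(b[0][i] for b in boxes) for i in range(dim))
    hi = tuple(max(b[1][i] for b in boxes) for i in range(dim))
    return lo, hi
```

The trade-off named in the text is visible here: the hull of two distant boxes covers the gap between them, so soundness is kept while precision drops.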

## 6 Conclusions

In [54], a new simulation-guided approach has been developed to significantly reduce the computational cost for verification of NNCS. Finally, there is significantly related work in secure reinforcement learning, and it would be interesting to explore combining guarantees from NNV with those provided in these methods.

- Gehr, T., Mirman, M., Drachsler-Cohen, D., Tsankov, P., Chaudhuri, S., Vechev, M.: AI2: safety and robustness certification of neural networks with abstract interpretation.
- Xiang, W., Tran, H.-D., Johnson, T.T.: Reachable set computation and safety verification for neural networks with ReLU activations (2017).

## Verification of Deep Convolutional Neural Networks Using ImageStars

These reachable sets are then used to reason about the overall robustness of the network. Notably, we prove the robustness of the network on images attacked by perturbations bounded by arbitrary linear constraints.

## 2 Problem Formulation

First is the ImageStar, an efficient set representation for reachability analysis of CNNs. Second are exact and over-approximate reachability algorithms to construct reachable sets and verify the robustness of CNNs.

## 3 ImageStar

## 4 Reachability of CNN Using ImageStars

- Reachability of a Convolutional Layer
- Reachability of an Average Pooling Layer
- Reachability of a Fully Connected Layer
- Reachability of a Batch Normalization Layer
- Reachability of a Max Pooling Layer
- Reachability of a ReLU Layer
- Reachability Algorithm and Parallelization

The worst-case number of ImageStars in the exact reachable set of the max pooling layer is given in Lemma 5: it is O((p1 × p2)^(h×w×nc)), where [h, w, nc] is the size of the ImageStar output sets and [p1, p2] is the size of the max pooling kernel.
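The bound follows from a counting argument: each of the h·w·nc output positions can, in the worst case, take its maximum from any of the p1·p2 pooled input positions, and the choices multiply. A one-line sketch:

```python
# Worst-case number of ImageStars produced by exact max-pool
# reachability: (p1*p2) choices per output position, h*w*nc positions.
def max_imagestars(p1, p2, h, w, nc):
    return (p1 * p2) ** (h * w * nc)
```

Even tiny layers explode: a 2×2 pool over a 2×2×1 output already admits 256 sets, which is why the over-approximate algorithm matters in practice.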

## 5 Evaluation

### Robustness Verification of MNIST Classification Networks

We compare our approach with the zonotope and polytope methods.

We evaluate the robustness of the network under the well-known brightening attack used in [8]. However, the zonotope method produces very large output ranges that cannot be used to prove the robustness of the network.

### Robustness Verification of VGG16 and VGG19

### Exact Analysis vs. Approximate Analysis

However, the over-approximate analysis cannot prove that VGG19 is not robust, since the obtained set is an over-approximation of the exact one. Figure 11 shows the total reachability times of the convolutional layers, fully connected layers, max pooling layers, and ReLU layers in VGG19 under a 50% attack with 10^−7 disturbance.

## 6 Discussion

In addition, the conservativeness of an over-approximate reachable set is a crucial factor in the evaluation of an over-approximation approach. Therefore, exact analysis still plays a significant role in the analysis of neural networks, as it helps to evaluate the conservativeness of the over-approximation methods.

## 7 Conclusion

- Katz, G., Barrett, C., Dill, D.L., Julian, K., Kochenderfer, M.J.: Reluplex: an efficient SMT solver for verifying deep neural networks.
- Ruan, W., Wu, M., Sun, Y., Huang, X., Kroening, D., Kwiatkowska, M.: Global robustness evaluation of deep neural networks with provable guarantees for the L0 norm (2018).

## 2 Background

### Neural Networks

In Section 4, we discuss how to implement these abstraction and refinement steps as part of a CEGAR procedure, followed by an evaluation in Section 5. As part of our abstraction framework, we will sometimes need to consider a suffix of a DNN, in which the first layers of the DNN have been removed.

### Neural Network Verification

We will sometimes use the notation w(v_{i,j}, v_{i+1,k}) to refer to the entry of W_{i+1} representing the weight of the edge between neuron j of layer i and neuron k of layer i+1. The number of neurons to be added is typically negligible compared to the size of the DNN.

## 3 Network Abstraction and Refinement

### Abstraction

A neuron is positive if all the weights on its outgoing edges are positive, and negative if all these weights are negative. Both v+_{i,j} and v−_{i,j} retain a copy of all incoming edges of the original v_{i,j}; however, v+_{i,j} preserves only the outgoing edges with positive weights, and v−_{i,j} preserves only those with negative weights.
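The splitting step can be sketched directly on weight dictionaries. The representation below (neurons as `{neighbor: weight}` maps) is an illustrative simplification of the paper's construction:

```python
# Split a neuron with mixed-sign outgoing weights into v_plus
# (keeps the positive outgoing edges) and v_minus (keeps the
# negative ones); both keep copies of all incoming edges.
def split_neuron(incoming, outgoing):
    # incoming: {predecessor: weight}, outgoing: {successor: weight}
    v_plus = (dict(incoming),
              {s: w for s, w in outgoing.items() if w > 0})
    v_minus = (dict(incoming),
               {s: w for s, w in outgoing.items() if w < 0})
    return v_plus, v_minus
```

After splitting, every neuron is either positive or negative, which is the precondition for the merge-based abstraction described next.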

### Refinement

The outgoing edge from the abstract node to y has a weight of 8, which is the sum of the weights of the edges from v1 and v2 to y. The first part of the inequality, N̄′(x) ≥ N̄(x), follows from the fact that N̄′(x) can be obtained from N̄(x) with a single application of the abstraction step.

## 4 A CEGAR-Based Approach

### Generating an Initial Abstraction

Another point addressed by Algorithm 2, in addition to how many rounds of abstraction to perform, is which pair of neurons to merge in each abstraction step. Since any pair of neurons we select will result in the same reduction in network size, our strategy is to prefer pairs that will result in a more accurate approximation.
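One way to realize such a preference is to score each candidate pair by how far the merged network drifts from the original on sample inputs, and pick the pair with the smallest drift. This is a heuristic sketch, not the paper's exact scoring function; `evaluate` is a hypothetical callback.

```python
# Pick the merge pair whose abstraction stays closest to the
# original network on a set of sample inputs.
# evaluate(pair, x) -> (original_output, merged_output), both floats.
def pick_pair(candidates, evaluate, samples):
    def loss(pair):
        # worst-case deviation of the merged network over the samples
        return max(abs(o - m) for x in samples
                   for o, m in [evaluate(pair, x)])
    return min(candidates, key=loss)
```

Because every pair shrinks the network by the same amount, accuracy of the approximation is the only criterion left to optimize.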

### Performing the Refinement Step

Still, if this step happens to be a bottleneck, it is possible to adjust the algorithm to heuristically sample only some of the pairs and choose the best pair among those considered, without harming the robustness of the algorithm. The same adjustment applies to node selection: as with Algorithm 2, if considering all possible nodes proves too expensive, the algorithm can explore only some of them and choose the best among those considered.

## 5 Implementation and Evaluation

Fig. 6 (from [20]) illustrates the sensor readings passed as input to the ACAS Xu DNNs. Figure 7 depicts a comparison of the two approaches for generating initial abstractions: the abstraction-after-saturation scheme (x-axis), and the pointer-guided abstraction scheme described in Algorithm 2 (y-axis).

## 6 Related Work

- Neural Networks and Verification
- Basic Geometric Path Enumeration Algorithm
- Spatial Data Structures
- ACAS Xu Benchmarks

The verification problem in this method is stated in terms of the input/output properties of the neural network. The method works by taking an input set of states and executing the neural network on that set.

## 3 Improvements

- Local Search Type (DFS vs. BFS)
- Bounds for Splitting
- Fewer LPs with Concrete Simulations
- Zonotope Prefilter
- Eager Bounds Computation
- Zonotope Contraction

During the affine transformation of θ (line 12 in Algorithm 2), the same transformation is also applied to the zonotope. The rationale for zonotope contraction: domain contraction procedures reduce the size of the zonotope z while maintaining an over-approximation of the star set θ.
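The value of keeping such a zonotope alongside the star set is that its box bounds are cheap. A prefilter sketch (illustrative representation: center vector `c`, generator matrix `G` with one row per dimension): only neurons whose zonotope interval straddles zero need an exact LP over the star set.

```python
# Zonotope prefilter: cheap interval bounds of {c + G@a : a in [-1,1]^m}
# decide most neuron signs without solving an LP over the star set.
def zonotope_box(c, G):
    radius = [sum(abs(G[i][j]) for j in range(len(G[0])))
              for i in range(len(c))]
    return [(c[i] - radius[i], c[i] + radius[i]) for i in range(len(c))]

def needs_lp(c, G, i):
    lo, hi = zonotope_box(c, G)[i]
    return lo < 0 < hi   # only sign-ambiguous neurons need the exact LP
```

Since the zonotope over-approximates the star set, a sign fixed by the box bounds is also fixed for the star set, so skipping the LP is sound.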

## 4 Evaluation with Other Tools

Path enumeration methods, on the other hand, explore all paths regardless of their distance to the unsafe set. Here we focused on the input/output properties of the neural network, given as linear constraints.

## A Box Bounds Algorithm for Box-Halfspace Intersection

## B Parallelization

Fig. 8. Doubling the number of cores roughly halves the computation time, down to the physical core count on each platform; the linear trend on the log-log plot shows continued improvement as more cores are added, up to the physical-core limit.

## C Full Optimization Data

Figure 8 shows an evaluation in which we varied the number of cores available to the computation on each of the two platforms. The AWS server platform was faster than the laptop configuration and, with all cores in use, could enumerate the same 484,555 paths in about 15 seconds.

## D Full Tool Comparison Data

- Mohapatra, J., Chen, P.-Y., Liu, S., Daniel, L., et al.: Towards verifying the robustness of neural networks against semantic perturbations.
- Xiang, W., Tran, H.-D., Johnson, T.T.: Reachable set computation and safety verification for neural networks with ReLU activations.

## Systematic Generation of Diverse Benchmarks for DNN Verification

## 1 Motivation

To this end, GDVB is parameterized by: (1) a set of factors known to affect the performance of DNN verifiers; (2) coverage measures that determine the combinations of factors that should be reflected in the benchmark; and (3) a seed verification problem from which a set of variant problems is generated. Finally, GDVB is designed to support the rapidly developing field of DNN verifiers by allowing the generation of benchmarks, e.g., from new seeds, as the verifiers improve, as new performance factors are identified, and to target challenge problems in different DNN domains, such as regression models for autonomous UAV navigation [39,53].

## 2 Background and Related Work

As described in Sect. 4, GDVB starts with seed problems that are challenging for current verifiers and "scales them down", but it can also be used to start with easier seed problems and "scale them up", as is more typical of previous work on scalable benchmarking. For DNN construction, we exploit a newer approach, R4V [47], that, given an original DNN and an architectural specification, automates the transformation of the DNN and uses distillation [28] to train it to match the test accuracy of the original DNN.

## 3 Identifying Factors that Influence Verifier Performance

### Potential Factors

### Exploratory Factor Study

Figure 1(d) plots the activation function against the number of problems for which accurate results are produced using ERAN_DP. Figure 1(e) plots the input dimension against the number of problems for which accurate results are produced using BaB; verification accuracy increases with input dimension.

## 4 The GDVB Approach

- Factor Diverse Benchmarks
- From Factor Covering Arrays to Verification Problems
- Generating Benchmarks
- An Instantiation of GDVB

Figure 1(i) shows the number of properties for which exact results could be produced using Planet. The accuracy of the verification may vary depending on the learned weights of the network. The set of all possible combinations of factor levels is Π_{f∈F} L_f, i.e., the product of the levels of all factors.
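The gap between the full factor-level product and a covering design can be made concrete. The greedy construction below is a sketch of pairwise coverage (GDVB uses proper covering-array generation; the factor names and levels are illustrative): every pair of factor-level choices appears in some selected problem, using fewer problems than the full product.

```python
from itertools import product

# Greedy pairwise covering set over factor levels: cover every pair of
# (factor, level) choices with far fewer combos than the full product.
def pairwise_cover(levels):
    keys = list(levels)

    def pairs_of(combo):
        return {((keys[a], combo[a]), (keys[b], combo[b]))
                for a in range(len(keys)) for b in range(a + 1, len(keys))}

    uncovered = set()
    for combo in product(*levels.values()):
        uncovered |= pairs_of(combo)

    chosen = []
    for combo in product(*levels.values()):
        new = pairs_of(combo) & uncovered
        if new:
            chosen.append(combo)     # combo covers at least one new pair
            uncovered -= new
        if not uncovered:
            break
    return chosen
```

For three factors with 2, 3, and 2 levels, the full product has 12 problems while the greedy pairwise set is smaller, and the savings grow quickly with more factors.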

## 5 GDVB in Use

### Setup

Our instantiation of GDVB supports the following factors: the total number of neurons in the DNN (neu), the number of fully connected layers (fc), the number of convolutional layers (conv), the dimension of the DNN input (idm), the size of each DNN input dimension (ids), the scale of the property (scl), and the translation of the property (trn). The fourth constraint ensures that the input dimension reduction results in a meaningful network; without it, the dimensionality reduction achieved by sets of convolutional layers produces an invalid network, i.e. the input to a layer is smaller than the kernel size.
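The fourth constraint can be checked mechanically by propagating the spatial size through the convolutional layers. The sketch below assumes stride 1 and no padding, with illustrative kernel sizes:

```python
# Validity check for the input-dimension constraint: every layer's
# input must be at least as large as its kernel (stride 1, no padding).
def conv_out(size, kernel):
    return size - kernel + 1

def valid_architecture(input_size, kernels):
    size = input_size
    for k in kernels:
        if size < k:
            return False          # input smaller than kernel: invalid
        size = conv_out(size, k)
    return True
```

This is exactly the failure mode the constraint rules out: shrinking the input dimension too far makes some downstream layer's input smaller than its kernel.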

### Comparing Veriﬁers Across a Range of Challenges

For example, the Reluplex plot indicates that it can do well in verifying networks with multiple fully connected (fc) layers, but is challenged by larger networks (neu) and those with convolutional layers (conv). We note that a more restrictive measure biased toward fewer fully connected layers may not reveal such differences.

### GDVB and Benchmark Requirements R1–R3

We note that excluding factors or levels may yield a systematically generated benchmark that is unable to characterize differences between verifiers or, worse, misleads such a characterization by emphasizing certain factors while overlooking others. In applying GDVB, we propose to select as many factors as are known to matter, start from a challenging seed problem, and incrementally refine the levels as needed to focus benchmark results on differentiating verifier performance.

## 6 Conclusion

This system uses a neural network to estimate the aircraft's position from a camera image; a controller then steers the aircraft to follow the centerline of the runway. A new version of the system is designed by retraining the neural network based on the results of falsification and debugging.

## Blockchain and Security

However, we are hopeful that the Move Prover will eventually be used by the majority of Move programmers. We demonstrate that the Move Prover can verify important aspects of the Libra core modules (Sec. 6).

## 2 Background: The Move Language

Finally, many common errors are prevented by the Move bytecode verifier (not to be confused with the Move Prover), a static analyzer that checks each bytecode program before executing it (similar to the JVM [26] or CLR [31] bytecode verifier). In the remainder of this section, we present an example that explains the memory-related invariants enforced by the Move bytecode verifier (6–8 above).

## 3 Tool Overview

In addition, the bytecode veriﬁer guarantees that no mutable reference can be an ancestor or descendant of another (mutable or immutable) reference in the same local or global data tree. It ensures that mutable procedure formals (mutable references or owned values) point to disjoint memory locations.

## 4 Boogie Model

Finally, each bytecode instruction is modeled as a procedure that updates local or global state in Boogie. As with most verification approaches based on the generation of verification conditions, verification of recursive procedures or loops in Boogie requires writing loop invariants, which can be difficult and can also introduce quantifiers, making the problem harder for the underlying SMT solver.

## 5 Specifications

Without the assertion marked "new", the specification does not account for the case where the payee and the sender are the same, as explained in Section 6. Specifications must also handle illegal transactions, such as an attempt to perform an operation that is not authorized by the sender of the transaction. When aborts_if P appears in the specification of a function, the Move Prover requires the function to abort if and only if P holds.

## 6 Evaluation

In the example of Figure 4, the first postcondition states that the payee's account exists after a payment transaction (the payee's account may not exist before the payment, in which case it will be created). The modules with their specifications are available in the Move Prover's source tree. The Libra and LibraAccount modules contain almost 1300 lines (including specifications).

## 7 Related Work

## 8 Conclusion

However, some properties of the Libra blockchain are inherently global in nature, such as the fact that the total amount of currency must remain constant.

### Our Refinement-Based Verification Approach

Note that this abstract model can also be derived automatically by instantiating the language semantics with the program in question, if a formal semantics of the language is available (e.g., in the K framework). Note that no loop invariant is required in this bytecode verification, since each availability requirement involves only a limited number of execution steps; specifically, the second requirement involves only a single iteration of the loop.

## 2 Formal Verification of the Deposit Contract

### Incremental Merkle Tree Algorithm

The incremental Merkle tree algorithm takes as input a partial Merkle tree up to m and a new data hash, and inserts the new data hash into the (m+1)-th leaf, resulting in a partial Merkle tree up to m+1. Figure 1 illustrates the algorithm, showing how a given partial Merkle tree up to 3 (shown on the left) is updated to the resulting partial Merkle tree up to 4 (right) when a new data hash is inserted into the 4th leaf node.
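The algorithm can be rendered compactly in Python. This sketch follows the publicly documented deposit-contract scheme (a `branch` array holding one unpaired node per tree level, sha256 hashing), with the depth reduced to 2 for illustration; the real contract uses depth 32 and requires the leaf count to stay below 2^32.

```python
import hashlib

DEPTH = 2                      # toy depth; the deposit contract uses 32

# precomputed hashes of all-zero subtrees, one per level
ZERO = [b'\x00' * 32]
for _ in range(DEPTH):
    ZERO.append(hashlib.sha256(ZERO[-1] + ZERO[-1]).digest())

class IncrementalMerkle:
    def __init__(self):
        self.branch = [b'\x00' * 32] * DEPTH
        self.count = 0

    def insert(self, leaf):
        # O(depth) update: hash up until an unpaired position is found
        node, size = leaf, self.count + 1
        self.count = size
        for h in range(DEPTH):
            if size % 2 == 1:
                self.branch[h] = node      # store the unpaired node
                return
            node = hashlib.sha256(self.branch[h] + node).digest()
            size //= 2

    def root(self):
        # combine stored branch nodes with zero-subtree hashes
        node, size = ZERO[0], self.count
        for h in range(DEPTH):
            if size % 2 == 1:
                node = hashlib.sha256(self.branch[h] + node).digest()
            else:
                node = hashlib.sha256(node + ZERO[h]).digest()
            size //= 2
        return node
```

Each insertion touches at most `DEPTH` nodes, which is what makes the on-chain cost of a deposit logarithmic in the tree size rather than linear.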

### Bytecode Verification of the Deposit Contract

In addition to performing the incremental Merkle tree algorithm, most functions perform certain additional low-level tasks, and we verified that such tasks are performed correctly. The leaves of the Merkle tree only store the calculated hashes instead of the original stake data.

## 3 Findings and Lessons Learned

- Maximum Number of Deposits
- ABI Standard Conformance of the get_deposit_count Function (in the previous version, get_deposit_count does not conform to the ABI standard)
- Checking Well-Formedness of Calldata
- Liveness
- Discussion

The calldata decoding process in the previous version of the compiled bytecode has insufficient runtime checks for calldata well-formedness. We note that this issue was discovered when we were verifying the negative behavior of the deposit contract.

## 4 Related Work

- Park, D.: Ethereum 2.0 deposit contract issue 1341: not ABI-standard return value of get_deposit_count of deposit contract. https://github.com/ethereum/eth2.
- Park, D.: Ethereum 2.0 deposit contract issue 38: a refactoring proposal for the deposit() loop. https://github.com/ethereum/deposit contract/issues/38

## Stratified Abstraction of Access Control Policies

We present an empirical evaluation of our method over a large set of real IAM policies. We show that IAM Access Analyzer generates a robust, accurate, and compact set of findings for complex policies, taking less than one second per policy.

## 2 Overview

### Approach

In particular, we must ensure that each access allowed by the policy is represented by some finding. The findings are correct because in each case there are requests that match the conditions under which access is granted. Finally, the findings compactly summarize the policy in three positive statements that state who has access.

### Solution: Computing Findings via Stratified Abstraction

Findings can be computed by enumerating all the cubes generated by the above predicates and querying Zelkova to determine whether the policy allows access to the requests described by each cube. Next to each cube, we show the result of the Zelkova query: ✓ if the policy allows access to the requests described by the cube, and ✗ otherwise.
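The coarse-to-fine enumeration can be sketched end to end on a toy policy. Everything below is illustrative: the two request fields, the mini-policy in `allows`, and the function names all stand in for the real system, which queries Zelkova instead of evaluating the policy directly.

```python
from itertools import product

# Toy stratified abstraction: fully allowed cubes become findings,
# fully denied cubes are dropped, mixed cubes are refined.
VALUES = {'SrcVpc': ['vpc-a', 'vpc-b'], 'OrgId': ['o-1', 'o-2']}

def allows(req):
    # stand-in policy: allow iff from vpc-a or from organization o-1
    return req['SrcVpc'] == 'vpc-a' or req['OrgId'] == 'o-1'

def requests(cube):
    # all concrete requests matching a cube (None = wildcard)
    keys = list(VALUES)
    pools = [[cube[k]] if cube[k] is not None else VALUES[k] for k in keys]
    return [dict(zip(keys, vals)) for vals in product(*pools)]

def summarize(cube=None):
    cube = cube or {k: None for k in VALUES}
    granted = [allows(r) for r in requests(cube)]
    if not any(granted):
        return []                 # fully denied: no finding
    if all(granted):
        return [cube]             # fully allowed: the cube is a finding
    k = next(k for k in VALUES if cube[k] is None)   # mixed: refine
    return [f for v in VALUES[k] for f in summarize({**cube, k: v})]
```

On this toy policy the summary is two findings ("SrcVpc is vpc-a" and "SrcVpc is vpc-b and OrgId is o-1"), covering every allowed request without enumerating all concrete cubes up front. The sketch omits the minimization pass the text describes, so its output need not be minimal.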

## 3 Algorithm

### Policies and Findings

Next, we refine the above cube, considering two sub-cubes pa ∧ ⋆ and pb ∧ ⋆, which respectively represent requests where SrcVpc is vpc-a or vpc-b (and OrgId can be any value), and two sub-cubes ⋆ ∧ q1 and ⋆ ∧ q2. Now we further refine the rejected cubes, but we can discard the cubes pa ∧ q1, pa ∧ q2, and pb ∧ q2, since each of them is covered, i.e., contained, by one of the two accepted cubes.

### Properties

Minimality. The set of findings Σ is minimal if the denotation of every proper subset Σ′ ⊂ Σ is strictly contained in the denotation of Σ.

### Algorithm

The Refine procedure takes as input a finding σ and returns the set of findings obtainable by individually refining a value of σ. Access summary. Procedure AccessSummary (Fig. 4) takes as input a policy p and returns a minimal set of non-refinable findings covering p.

## 4 Implementation and Evaluation

The algorithm maintains three loop invariants: (1) wkl ∪ res covers p; (2) each finding in res is non-refinable; (3) res is minimal. The algorithm explores the entire search space for only 15% of the policies, with a median ratio of 0.22.

## Synthesis of Super-Optimized Smart Contracts Using Max-SMT

The EVM specification [25] provides the gas model, i.e., an exact determination of the gas consumed by each EVM bytecode instruction. Compilers for Solidity also try to optimize the bytecode to minimize its gas consumption.

## 2 Overview: Optimal Bytecode as a Synthesis Problem

- Extracting Stack Functional Specifications from EVM Bytecode
- The Synthesis Problem
- Characteristics of Our SMT Encoding of the Synthesis Problem
- Optimal Synthesis Using Max-SMT

Our method takes as input the set of blocks that make up the control-flow graph, and our approach to super-optimizing blocks is based on restricting the problem.

The output stack is obtained by symbolically executing the bytecodes operating on the stack, as formalized in Section 3. The next example shows that the optimal code is obtained when the subterms of the exponential are computed in the following order (compared to the previous example).
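The synthesis problem itself can be illustrated with a brute-force stand-in for the Max-SMT search: given a functional specification of a block's output stack, find the shortest instruction sequence that realizes it. The tiny op set, the sampling-based equivalence check, and the size-only cost model below are all simplifying assumptions; the real approach encodes the search as a Max-SMT problem over the EVM gas model.

```python
from itertools import product

# Minimal stack machine with a few EVM-flavored ops.
OPS = ['DUP1', 'ADD', 'MUL', 'PUSH_2']

def run(ops, stack):
    st = list(stack)
    for op in ops:
        if op == 'DUP1':
            st.append(st[-1])
        elif op == 'ADD':
            b, a = st.pop(), st.pop()
            st.append(a + b)
        elif op == 'MUL':
            b, a = st.pop(), st.pop()
            st.append(a * b)
        elif op == 'PUSH_2':
            st.append(2)
    return st

def synthesize(spec, samples, max_len):
    # enumerate by length, so the first match is (size-)optimal
    for n in range(max_len + 1):
        for ops in product(OPS, repeat=n):
            try:
                if all(run(ops, s) == spec(s) for s in samples):
                    return list(ops)
            except IndexError:     # stack underflow: invalid sequence
                continue
    return None
```

For the specification "double the top of the stack", the search finds a two-instruction realization (e.g., `DUP1; ADD`), mirroring how the Max-SMT encoding prefers cheaper instruction sequences that compute the same output stack.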