Tevﬁk Bultan University of California, Santa Barbara, USA Pavol Cerny Vienna University of Technology, Austria. Krishna S India Institute of Technology, Bombay, India Sriram Sankaranarayanan University of Colorado at Boulder, USA Natarajan Shankar SRI International, USA.

AI Verification

Tool for Deep Neural Networks and Learning-Enabled Cyber-Physical

Systems

## 1 Introduction

NNV has been successfully applied to security verification and robustness analysis of several real-world DNNs, primarily feedforward neural networks (FFNNs) and convolutional neural networks (CNNs), as well as learning-enabled CPS. The first compares methods for safety verification of the ACAS Xu networks [21] and the second presents safety verification of a learning-based adaptive cruise control (ACC) system.

## 2 Overview and Features

When an exact (sound and complete) accessibility solver is used, such as the star-based solver, the safety checker can return either "safe" or "unsafe" along with a set of counterexamples. When an over-approximated (sound) accessibility solver is used, such as the zonotope-based scheme or the approximate star-based solvers, the safety checker may return either "safe" or "unsafe" (unknown).

## 3 Set Representations and Reachability Algorithms

*Polyhedron [40]**Star Set [38,41] (code)**Zonotope [32] (code)**Abstract Domain [33]**ImageStar Set [37] (code)*

Next, the exact reachable set of RL layers is constructed by performing a sequence of stepReLU operations, i.e. RL=stepReLUn(stepReLUn−1(· · ·(stepReLU1( ¯I)))). Similar to the over-approximation algorithm that uses star masses, the zonotope algorithm computes an over-approximation of the exact attainable FFNN mass.

## 4 Evaluation

### Safety Verification of ACAS Xu Networks

For propertyφ3, the exact star method of NNV is about 20.7× faster than Reluplex, 14.2×faster than Marabou, 81.6×faster than Marabou-DnC (i.e. divide and conquer method). The estimated star method is 547× faster than Reluplex, 374× faster than Marabou, 2151× faster than Marabou-DnC and 8× faster than ReluVal.

### Safety Verification of Adaptive Cruise Control System

When the initial speed of the lead vehicle is less than 27 (m/s), the ACC system with the discrete device model is unsafe. As shown in Table 3, verification for the linear model using the over-approximate method is on average 22.7 times faster than for the nonlinear model.

## 5 Related Work

Because there are partitions in the reachable sets of the neural network controller, the number of star sets in the reachable set of the plant increases rapidly over time [38]. In contrast, the over-approximated method computes the interval hull of all reachable sets at each time step, maintaining a single reachable set of the plant throughout the computation.

## 6 Conclusions

In [54], a new simulation-guided approach was developed to significantly reduce the computational cost for NNCS verification. Finally, there is a lot of related work in the field of secure reinforcement learning, and combining NNV guarantees with those provided in these methods would be interesting to explore.

Gehr, T., Mirman, M., Drachsler-Cohen, D., Tsankov, P., Chaudhuri, S., Vechev, M.: Ai 2: Certifying security and robustness of neural networks through abstract interpretation. Xiang, W., Tran, HD, Johnson, T.T.: Reachable set computation and security checking for neural networks with real activations (2017).

## Networks Using ImageStars

These reachable sets are then used to reason about the overall robustness of the network. Instead, we prove the robustness of the network on images attacked by perturbations bounded by arbitrary linear constraints.

## 2 Problem Formulation

First, there is the ImageStar set representation, which is an efficient representation for CNN reachability analysis. Second, there are exact and over-estimated reachability algorithms for constructing reachable sets and verifying the robustness of CNNs.

3 ImageStar

## 4 Reachability of CNN Using ImageStars

*Reachability of a Convolutional Layer**Reachability of an Average Pooling Layer**Reachability of a Fully Connected Layer**Reachability of a Batch Normalization Layer**Reachability of a Max Pooling Layer**Reachability of a ReLU Layer**Reachabilty Algorithm and Parallelization*

The worst case complexity of the number of ImageStars in the exact reachable set of the max-pooling layer is given in Lemma 5. The worst-case complexity of the number of ImageStars in the exact reachable set of the max-pooling layer is O (( (p1×p2)h×w)nc) where [h, w, nc] is the size of the ImageStar output sets and [p1, p2] is the size of the max-pooling layer.

## 5 Evaluation

### Robustness Veriﬁcation of MNIST Classiﬁcation Networks We compare our approach with the zonotope and polytope methods in two

We evaluate the robustness of the network under the well-known enlightening attack used in [8]. However, the zonotope method produces very large output ranges that cannot be used to prove the robustness of the network.

Robustness Veriﬁcation of VGG16 and VGG19

### Exact Analysis vs. Approximate Analysis

However, the too approximate analysis cannot prove that VGG19 is not robust, since its obtained reachable set is too close to the exact one. Figure 11 describes the total reach times of convolution layers, fully connected layers, max-pooling layers, and ReLU layers in VGG19 with 50% attack and 10-7 interference.

## 6 Discussion

Furthermore, the conservativeness of the overapproximate achievable set is a key factor in evaluating the overapproximation approach. Therefore, exact analysis still plays an essential role in neural network analysis, helping to assess the conservativeness of over-approximation approaches.

## 7 Conclusion

Katz, G., Barrett, C., Dill, D.L., Julian, K., Kochenderfer, M.J.: Reluplex: an efficient SMT solver for verifying deep neural networks. Ruan, W., Wu, M., Sun, Y., Huang, X., Kroening, D., Kwiatkowska, M.: Global robustness assessment of deep neural networks with provable guarantees for the0 norm (2018).

## 2 Background

### Neural Networks

In Section 4, we discuss how to apply these abstraction and refinement steps as part of a CEGAR procedure, followed by an evaluation in Section 5. As part of our abstraction framework, we will sometimes need to consider a DNN suffix, in which the first layers of the DNN are omitted.

### Neural Network Verification

We will sometimes use the notation w(vi,j, vi+1,k) to refer to the entry of Wi+1 that represents the weight of the edge between neuron j of layer i and neuron kof layer i+ 1. neurons to be added are typically negligible compared to the size of the DNN.

## 3 Network Abstraction and Refinement

### Abstraction

A neuron is positional if all the weights on its outgoing edges are positive, and is negative if all those weights are negative. Both v+i,j and v−i,j retain a copy of all incoming edges from the original vi,j; However, v+i,j keeps only the outgoing edges with positive weights, and v−i,j only keeps those with negative weights.

### Refinement

The outgoing edge of the abstract node toy has a weight of 8, which is the sum of the weights of the edges of v1 and v2 toys. The first part of the inequality, ¯N(x)≥ N¯(x), follows from the fact that ¯N(x) can be obtained from ¯N(x) by a single application of abstraction.

## 4 A CEGAR-Based Approach

### Generating an Initial Abstraction

Another point addressed by Algorithm2, besides how many rounds of abstraction to perform, is which pair of neurons to merge in any application of abstract. Since any pair of neurons we choose will result in the same reduction in network size, our strategy is to prefer neurons that will result in a more accurate approximation.

### Performing the Refinement Step

However, if this step happens to become a bottleneck, it is possible to adjust the algorithm to heuristically sample only some of the pairs and select the best pair among those considered - without compromising the reliability of the algorithm. As was the case with Algorithm 2, if it turns out to be too expensive to consider all possible nodes, the algorithm can be adjusted to explore only some nodes and select the best among the nodes considered – without compromising the robustness of the algorithm.

## 5 Implementation and Evaluation

6.(Out [20]) An illustration of the sensor readings passed as inputs to the ACAS Xu DNNs. Figure 7 shows a comparison of the two approaches for generating initial abstractions: the abstraction-to-saturation scheme (x-axis) and the indicator-driven abstraction scheme described in Algorithm 2 (y-axis).

## 6 Related Work

*Neural Networks and Verification**Basic Geometric Path Enumeration Algorithm**Spatial Data Structures**ACAS Xu Benchmarks*

The verification problem in this method is presented in terms of the input/output properties of the neural network. The method works by taking an input set of states and performing a neural network implementation based on the set.

## 3 Improvements

*Local Search Type (DFS vs BFS)**Bounds for Splitting**Fewer LPs with Concrete Simulations**Zonotope Prefilter**Eager Bounds Computation**Zonotope Contraction*

During the affine transformation of θ (line 12 in Algorithm 2), the same transformation is also applied to the zonotope. Rationale for regularity: Domain shrinking procedures reduce the size of the zonotope z while maintaining an overestimation of the star abundance θ.

## 4 Evaluation with Other Tools

Path counting methods, on the other hand, explore all paths, regardless of distance in the uncertain set. Here, we focused on the input/output properties of the neural network, given as linear constraints.

A Box Bounds Algorithm for Box-Halfspace Intersection

## B Parallelization

8. Doubling the number of cores roughly halves the computation time up to the physical number of cores on each platform. The linear trend on the log-log graph shows continuous improvement as more cores are added, up to the physical core limit on each platform.

## C Full Optimization Data

An evaluation where we adjusted the number of available cores for the computation process for each of the two platforms is shown in Fig.8. The AWS Server platform was faster than the laptop setup, and with all the cores used, it could enumerate the same 484555 paths in about 15 s.

## D Full Tool Comparison Data

Mohapatra, J., Chen, P.-Y., Liu, S., Daniel, L., et al.: Towards verifying the robustness of neural networks against semantic perturbations. Xiang, W., Tran, H.-D., Johnson, T.T.: Reachable set computation and safety verification for neural networks with ReLU activations.

Benchmarks for DNN Verification

## 1 Motivation

To this end, GDV is parameterized by: (1) a range of factors known to influence the performance of DNN verifiers; (2) a coverage target that determines the combination of factors to be reflected in the benchmark; and (3) a seed verification problem from which a set of variant problems is generated. Finally, CSFP is designed to support the rapidly evolving field of DNN verifiers by enabling benchmark generation, for example, based on new seeds as verifiers improve, as new performance factors are identified, and to address challenge issues in various DNN domains. such as regression. models for autonomous UAV navigation [39,53].

## 2 Background and Related Wok

As described in Section 4, GDVB starts with seed problems that are challenging for current verifiers and "scales them down", but it can also be applied to start with easier seed problems and "scale them up" as more typical of previous work. on scaled comparison. For DNN construction, we use a recent approach, R4V [47], that given an original DNN and an architectural specification automates the DNN transformation and uses distillation [28] to train it to closely match the accuracy of proof of the original. DNN.

## 3 Identifying Factors that Influence Verifier Performance

Potential Factors

### Exploratory Factor Study

Figure 1(d) shows the activation function versus the number of features for which accurate results are obtained using ERANDP. Figure 1(e) shows the input dimension versus the number of features for which accurate results are generated with BaB. The accuracy of the check can be increased with a larger input dimension.

## 4 The GDVB Approach

*Factor Diverse Benchmarks**From Factor Covering Arrays to Veriﬁcation Problems**Generating Benchmarks**An Instantiation of GDVB*

Figure 1(i) shows the number of exact properties for which results could be produced using Planet. The accuracy of the verification may vary depending on the learned weights of the network. The set of all possible combinations of factor levels is Πf∈FLf, i.e. the product of all factor levels.

## 5 GDVB in Use

### Setup

Our instantiation of GDVB supports the following factors: the total number of neurons in the DNN (neu), the number of fully connected layers (fc), the number of convolutional layers (conv), the dimension of the DNN input (idm), the size of each DNN input dimension (ids), the scale of the property (scl), and the translation of the property (trn). The fourth constraint ensures that the input dimension reduction results in a meaningful network; without it, the dimensionality reduction achieved by sets of convolutional layers produces an invalid network, i.e. the input to a layer is smaller than the kernel size.

### Comparing Veriﬁers Across a Range of Challenges

For example, the ReLuplex plot indicates that it can verify networks with multiple fully connected (fc) layers well, but is challenged by larger networks (neu) and those with convolutional layers (conv). We note that a more restrictive benchmark oriented towards fewer fully connected layers may not reveal such differences.

### GDVB and Benchmark Requirements R1–R3

We note that excluding factors or levels may yield a systematically generated benchmark that is unable to characterize differences between veriﬁers, or worse, misleads such characterization by emphasizing certain factors while overlooking others. When applying GDVB, we suggest choosing as many factors as we know may matter, starting from a challenging seed problem and incrementally refining the levels as needed to focus benchmark results to differentiate veriﬁers’ performance.

## 6 Conclusion

This system uses a neural network to estimate the position of the aircraft based on the camera image; the controller then directs the aircraft to follow the centerline of the runway. Designing a new version of the system by retraining the neural network based on the results of falsification and error correction.

## Blockchain and Security

However, we are hopeful that the Move Prover will be used by most Move programmers. Demonstrate that the Move Prover can verify important aspects of the Libra core modules (Section 6).

## 2 Background: The Move Language

Finally, many common errors are prevented by the Move bytecode verifier (not to be confused with the Move Prover), a static analyzer that checks each bytecode program before execution (similar to JVM [26] or CLR [31] bytecode veriﬁers). In the remainder of this section, we present an example that explains the memory-related invariants enforced by the Move bytecode veriﬁer (6–8 above).

## 3 Tool Overview

In addition, the bytecode verifier guarantees that no mutable reference can be an ancestor or descendant of another (mutable or immutable) reference in the same local or global data tree. It causes mutable procedure formals (mutable references or property values) to refer to disjointed memory locations.

## 4 Boogie Model

Finally, each bytecode instruction is modeled as a procedure that modifies local or global state in Boogie. 2 As with most verification approaches based on the generation of verification conditions, the verification of recursive procedures or loops in Boogie requires invariant writing of the loop, which can be difficult and can also be quantitative, making the most difficult problem for the basic SMT solver.

## 5 Specifications

Without the assertion marked as 'new', the specification does not stand if the payee and sender are the same, as explained in section 6. to handle illegal transactions, such as attempting to perform an operation that is not authorized by the sender of the transaction. When aborts_ifP appears in a function's specification, the Move Prover requires the function to abort if and only ifP is valid.

## 6 Evaluation

In the example of Fig.4, the first post-condition asserts that the payee account exists after a payment transaction (the payee account may not exist before the payment, in which case it is created). The modules with their specifications are available in the Move Prover source tree. 4 The Scale and Scale Account modules consist of almost 1300 lines (specifications included).

7 Related Work

## 8 Conclusion

However, some properties of the Libra blockchain are inherently global in nature, such as the fact that the total amount of currency must remain constant. Proceedings of the 6th ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools and Engineering, pp. eds.).

### Our Refinement-Based Verification Approach

5 Note that this abstract model can also be derived automatically by instantiating the language semantics with the specific program, if formal language semantics are available (in the K framework). Note that no loop invariant is needed in this bytecode verification, as each reachability claim involves only a limited number of execution steps - in particular, the second claim involves only a single iteration of the loop.

## 2 Formal Verification of the Deposit Contract

### Incremental Merkle Tree Algorithm

The incremental Merkel tree algorithm takes as input a partial Merkel tree up to m and a new hash value of the data, and inserts the new hash value of the data into (m+ 1)list, resulting in a partial Merkel tree do-tom+ 1. Figure 1 illustrates the algorithm, which shows, how a given partial Merkel tree up to 3 (shown on the left) is updated to the resulting partial Merkel tree up to 4 (right) when a new data hash is inserted into the 4th leaf node.

### Bytecode Verification of the Deposit Contract

In addition to running the incremental Merkle tree algorithm, most functions perform certain additional low-level tasks, and we have verified that such tasks are performed correctly. The leaves of the Merkle tree only store the computed hashes instead of the original deposit data.

## 3 Findings and Lessons Learned

*Maximum Number of Deposits**ABI Standard Conformance of get deposit count Function In the previous version, the get deposit count function does not conform to the**Checking Well-Formedness of Calldata**Liveness**Discussion*

The call data decoding process in the previous version of the composite bytecode lacks sufficient runtime checks for the well-formedness of call data. We note that this problem was found when we verified the negative behavior of the deposit contract.

## 4 Related Work

Park, D.: Ethereum escrow contract Issue 2.0 1341: Non-standard ABI return value of escrow contract number of deposits received.https://github.com/ethereum/eth2. Park, D.: Ethereum 2.0 deposit contract issue 38: a refactoring suggestion for the deposit() loop.https://github.com/ethereum/deposit contract/issues/38 41.

## Control Policies

We present an empirical evaluation of our method on a large set of real-world IAM rules. We show that IAM Access Analyzer produces a good, accurate, and compact set of findings for complex policies, taking less than a second per finding.

## 2 Overview

### Approach

In particular, we must ensure that each access permitted by the policy is represented by some finding. The findings are precise as in each case there are requests that match the conditions that access is granted.1 Finally, the findings compactly summarize the policy in three positive statements that state who has access.

### Solution: Computing Findings via Stratified Abstraction

We can compute the findings by enumerating all the cubes created by the predicates above and querying Zelkov to determine if the policy allows access to the requests described by the cube. Next to each cube, we show the result of Zelkova's search to determine whether the policy allows access to the requirements described by the cube: ✓(answer.

## 3 Algorithm

### Policies and Findings

Next, arrange the above cube point by point, considering two sub-cubes∧ and pb∧ which respectively represent requests where SrcVpcis or vpc-aorvpc-b(andOrgId can be any value), and the two sub-cubes ∧q1. Now we further refine the rejected cubes, but we can consider the pa∧q1, pa∧q2 and pb∧q2 cubes in the unshaded boxes, since each of them is covered or contained by one of the two accepted cubes.

### Properties

Minimalism. A set of findings Σ is minimal if the denotation of each Σ ⊂Σ is strictly contained in the denotation of Σ.

### Algorithm

The Refine procedure takes as input a finding σ and returns the set of findings that can be obtained by individually refining one value of σ. A summary of Access. The AccessSummary procedure (Fig. 4) takes as input a policy and returns a minimal set of irreducible findings that p.

## 4 Implementation and Evaluation

The algorithm maintains three loop invariants: (1) wkl∪res covers p; (2) Each finding in fact is unmitigated; (3) It is really minimal. The algorithm explores the entire search space for only 15% of the policies, with a median ratio of 0.22.

## Contracts Using Max-SMT

The EVM specification [25] provides a gas model, i.e. precise gas consumption definition for each EVM bytecode instruction. Solidity compilers also try to optimize bytecode to reduce gas consumption (eg the optimize flag for .

## 2 Overview: Optimal Bytecode as a Synthesis Problem

*Extracting Stack Functional Specifications from EVM Bytecode Our method takes as input the set of blocks that make up the control ﬂow**The Synthesis Problem**Characteristics of Our SMT Encoding of the Synthesis Problem Our approach to super-optimize blocks is based on restricting the problem in**Optimal Synthesis Using Max-SMT*

The output stack is obtained by symbolic execution of the bytecodes operating on the stack, as will be formalized in Sect.3. The next example shows that the optimal code is obtained when the subterms of the exponential are calculated in the second order (compared to the previous example).