Novelty-Based Methods for Randomized Test Selection. Innovation in Verification
The effectiveness of randomized testing at improving coverage declines as total coverage increases. Attacks on stubborn coverage holes could be augmented by adding learned novelty-based guidance to random test selection. Paul Cunningham (GM, Verification at Cadence), Raúl Camposano (Silicon Catalyst, entrepreneur, former Synopsys CTO and now Silvaco CTO) and I continue our series on research ideas. As always, feedback welcome.
The Innovation
This month’s pick is Using Neural Networks for Novelty-based Test Selection to Accelerate Functional Coverage Closure. The paper was published at the 2023 IEEE AITest conference. The authors are from the University of Bristol (UK) and SiFive.
Randomized tests already benefit from ML methods to increase coverage in lightly covered regions of the state space. However, they struggle with coverage holes for which there are no or few representative tests from which to learn. This paper suggests learned methods to select novel tests from a candidate pool, scoring each test by its dissimilarity from tests already simulated.
Paul’s view
AI again this month, this time AI to guide randomized simulation rather than to root-cause bugs. In commercial EDA, AI-driven random simulation is hot and beginning to deploy at scale.
This paper focuses on an automotive RADAR signal processing unit (SPU) from Infineon. The SPU has 265 config registers and an 8,400-event test plan. Infineon tried 2 million random assignments of values to the config registers to cover their test plan.
The authors propose using an NN to guide config register values to close coverage faster. Simulations are run in batches of 1,000. After each batch the NN is re-trained and used to select, from a test pool of 85k configs, the next batch of 1,000 configs that the NN scores highest. Configs that are more different (“novel”) from previously simulated configs score higher. The authors try 3 NN scoring methods:
- Autoencoder: the NN determines only the novelty of the config. The NN is a lossy compressor/decompressor for config register values. The 265 values for a config are compressed down to 64, then expanded back to 265, with both directions trained on the configs simulated so far. The bigger the error in decompression, the more “novel” that config is (see the sketch after this list).
- Density: the NN predicts coverage from config register values. The novelty of a new config is determined by inspecting hidden nodes in the NN and comparing them to the values of these nodes for previously simulated configs. The bigger the differences, the more novel that config is.
- Coverage: NN predicts coverage from config register values. A final layer is added to the NN with only one neuron, trained to compute a novelty score as a weighted sum of predicted coverage over 82,000 cover events. The weight of each event is based on its rarity – events rarely hit by configs simulated so far are weighted higher.
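To make the autoencoder method concrete, here is a minimal PyTorch sketch. This is my own illustration, not the authors’ code: the paper’s networks are deeper (5 layers, per Raúl’s notes below), and the random tensors are placeholders for real config data. Only the 265-to-64 compression and the ranking by reconstruction error follow the description above.

```python
# Hedged sketch of autoencoder-based novelty scoring (illustrative only).
import torch
import torch.nn as nn

N_REGS, LATENT = 265, 64  # 265 config registers, compressed to 64 values

class ConfigAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(N_REGS, LATENT), nn.ReLU())
        self.decoder = nn.Linear(LATENT, N_REGS)

    def forward(self, x):
        return self.decoder(self.encoder(x))

def novelty_scores(model, configs):
    """Per-config reconstruction error: larger error = more novel."""
    with torch.no_grad():
        recon = model(configs)
    return ((recon - configs) ** 2).mean(dim=1)

model = ConfigAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

simulated = torch.rand(1000, N_REGS)  # placeholder: configs simulated so far
pool = torch.rand(85000, N_REGS)      # placeholder: the 85k candidate pool

for _ in range(100):                  # train on what has been simulated so far
    opt.zero_grad()
    loss = loss_fn(model(simulated), simulated)
    loss.backward()
    opt.step()

# Pick the next batch: the 1,000 configs the autoencoder reconstructs worst.
next_batch = pool[novelty_scores(model, pool).argsort(descending=True)[:1000]]
```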
Results are intriguing: the coverage-NN achieves the biggest improvement, around a 2.13x reduction in the simulations needed to hit 99% and 99.5% coverage. However, it’s quite noisy, and repeating the experiment 10 times reduces the average gain to 1.75x. The autoencoder-NN is much more stable, achieving 1.87x best case and a matching 1.75x on average – even though it doesn’t consider coverage at all! The density-NN is just bad all over.
Great paper, well written, would welcome follow-on research.
Raúl’s view
This one is about neural networks to increase functional coverage by finding “coverage holes”. In previous blogs we reviewed the use of ML for fault localization (May 2024), simulating transient faults (March 2024), verifying SW for cyber-physical systems (November 2023), generating Verilog assertions (September 2023), code review (July 2023), detecting and fixing bugs in Java (May 2023), and improving random instruction generators (February 2023) – a wide range of functional verification topics tackled by ML!
The goal is to choose tests generated by a constrained random test generator, favoring “novel” tests on the assumption that novel tests are more likely to hit different functional coverage events. This has been done before with good results, as explained in section II. The authors build a platform called Neural Network based Novel Test Selector (NNNTS). NNNTS picks tests in a loop, retraining three different NNs for three different similarity criteria. These NNs have 5 layers with 1-512 neurons in each layer. The three criteria are:
- Calculates the probability of a coverage event being hit by the input test
- Reduces an input test to lower dimensions and then rebuilds the test from the compressed representation. The mean squared difference expressing the reconstruction error is taken as the novelty score.
- Assumes that for a simulated test, if a coverage event hit by the test is also often hit by other simulated tests, then the test is very similar to the other tests in that coverage-event dimension. The overall difference of a simulated test in the coverage space is the sum of the differences in each coverage-event dimension (a rough sketch of this idea follows the list).
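Here is my rough numpy reading of that third criterion – an assumption about the exact metric, which the paper defines precisely: represent each simulated test by the cover events it hit, and score a test by its summed distance from the average hit rate in each coverage-event dimension.

```python
# Hedged sketch of a coverage-space novelty score (my interpretation, not
# the authors' code): tests that hit events rarely hit by other tests are
# farther from the crowd in those event dimensions, so they score higher.
import numpy as np

def coverage_space_novelty(hits):
    """hits: (n_tests, n_events) 0/1 matrix of which events each test hit."""
    hit_freq = hits.mean(axis=0)              # per-event hit rate so far
    # Per-test difference from the crowd in each event dimension, summed:
    return np.abs(hits - hit_freq).sum(axis=1)

hits = np.random.randint(0, 2, size=(200, 50))  # toy: 200 tests, 50 events
scores = coverage_space_novelty(hits)
print(scores[:5])  # higher = more novel in coverage space
```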
They test against a signal processing unit of an ADAS system. The production project consumed 6 months of simulation for ~2 million constrained random tests, using almost 1,000 machines and EDA licenses. Each test costs 2 hours of simulation on average; with some manual intervention, 85,411 tests were generated in the end.
In the experiment, 100 of the generated tests are randomly picked to train NNNTS, and then 1,000 tests are picked at a time, retraining between batches, until reaching coverage of 99% and 99.5% (a toy version of this loop is sketched below). This is repeated many times to gather statistics. Density does the worst, saving on average 22% over random test selection to achieve 99% coverage and 14% to achieve 99.5%. Autoencoder and Coverage perform similarly, saving on average about 45% to reach 99% and 40% to reach 99.5% coverage.
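To tie the experiment together, below is a self-contained toy version of the selection loop in Python. Everything here except the 100-test seed and the 1,000-test batches is illustrative: the synthetic pool and the simple rarity-weighted score stand in for real tests and the trained NN, and this is not the authors’ NNNTS implementation.

```python
# Toy NNNTS-style loop: seed randomly, then repeatedly rescore the pool and
# add the highest-novelty batch until 99% of cover events are hit.
import random

random.seed(0)
N_EVENTS = 500
# Toy pool: each "test" is modeled only by the set of cover events it hits.
pool = [frozenset(random.sample(range(N_EVENTS), 5)) for _ in range(5000)]

def coverage_of(tests):
    hit = set().union(*tests)
    return len(hit) / N_EVENTS

def novelty(test, counts, n_sim):
    # Rarity-weighted stand-in score: events rarely hit so far count more.
    return sum(1.0 / (counts.get(e, 0) / n_sim + 1e-6) for e in test)

selected = random.sample(pool, 100)          # seed with 100 random tests
while coverage_of(selected) < 0.99:          # run until 99% coverage
    counts = {}                              # per-event hit counts so far
    for t in selected:
        for e in t:
            counts[e] = counts.get(e, 0) + 1
    chosen = set(selected)
    remaining = [t for t in pool if t not in chosen]
    if not remaining:
        break                                # pool exhausted
    remaining.sort(key=lambda t: novelty(t, counts, len(selected)),
                   reverse=True)
    selected += remaining[:1000]             # next 1,000 highest-novelty tests

print(f"99% coverage after {len(selected)} tests")
```

Comparing the final test count against a purely random baseline (replace the sort with a shuffle) is the toy analog of the savings the paper reports.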
This work is impressive in that it can reduce the time and cost of functional verification by 40% – in the example, 6 months, 1,000 machines, EDA licenses and people – though the paper does not specify the cost of running NNNTS itself. The paper reviewed in February 2023 achieved a 50% improvement on a simpler test case with a different method (DNNs were used to approximate the output of the simulator). I think enhancing/speeding up coverage in functional verification is one of the more promising areas for the application of ML, as shown in this paper.
Also Read:
Using LLMs for Fault Localization. Innovation in Verification
A Recipe for Performance Optimization in Arm-Based Systems
Anirudh Fireside Chats with Jensen and Cristiano