ON THE ORIGIN OF BUGS
Or… Understanding Hardware Bugs And How To Avoid Them

BRYAN DICKMAN
DVCLUB: NOVEMBER 26TH 2019

Where do bugs come from? What are the common ways that bugs are introduced into designs? What can design engineers and verification engineers jointly do to avoid them?

ON THE ORIGIN OF BUGS - T&VS · Continuous Integration Trunk based development Code Refactoring (build in >20%) Integrate Performance Testing Test-Driven Development Pair Programming





BRYAN DICKMAN: VALYTIC CONSULTING LIMITED

• A senior technology manager and people leader
• Recognised industry expert with 35 years of experience in the semiconductor industry
• 22 years of leading engineering teams at Arm
• IP Design-Verification delivery over an era when many new methodologies were introduced into engineering workflows
• Development of Design and Verification best practices
• Engineering data strategies that exploit modern data science practices to drive rich engineering insights and process/workflow improvements
• Senior Director within the Technology Services Group
• Experienced developer of people
• T&VS Associate
• Acuerdo Limited Associate (Joe Convey)

https://www.linkedin.com/in/bryan-dickman-74b1a914/?originalSubdomain=uk


INTRODUCTION AND DISCLAIMERS

• The following is a personal perspective
• None of the data shown is real data – it is hypothetical
• It’s all established thinking
  • but some of it is more established in software development today
• I have a whitepaper on the subject to follow


WHY DO WE STILL HAVE BUGS?

…and jobs as DV engineers?


BUG FREE UTOPIA?

• ASSERTION: All complex designs contain bugs
  • No design can ever be 100% bug-free, no matter how hard you try!
• Verification is a time- and resource-limited quest to find as many bugs as possible before shipping
• Verification completeness is not generally achievable
  • Test planning can never be 100% complete
  • Coverage models can never be 100% complete
  • Infinite verification cycles are not possible
  • …and so bugs will be missed
• Verification should employ strategies that increase the chances of finding all bugs
• Designers should employ strategies that minimize bug risks
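
To put a rough number on why exhaustive (let alone infinite) verification cycles are not possible, here is a back-of-envelope sketch. The block being verified (a stateless unit with two 64-bit operands) and the throughput of one billion vectors per second are illustrative assumptions, not figures from the talk.

```python
# Back-of-envelope: how long would exhaustive stimulus take?
# ASSUMPTIONS (not from the talk): a stateless block with two 64-bit
# operands (128 input bits) and an optimistic 1e9 vectors per second.

SECONDS_PER_YEAR = 365 * 24 * 3600


def exhaustive_years(input_bits: int, vectors_per_second: float) -> float:
    """Years needed to apply every possible input vector exactly once."""
    total_vectors = 2 ** input_bits
    return total_vectors / vectors_per_second / SECONDS_PER_YEAR


years = exhaustive_years(input_bits=128, vectors_per_second=1e9)
print(f"{years:.2e} years")
```

Under these assumptions the answer is on the order of 10^22 years, and that is before any internal state is considered. Hence the need for strategies that raise the chance of hitting bugs, not brute force.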


TWO DISCIPLINES
…With A Lot In Common

[Diagram: the overlapping skill sets of DESIGN and VERIFICATION]

DESIGN: Logic Design, Behavioral modelling, RTL coding, Synthesis, DFT, Timing Analysis, Power Analysis, Implementation

VERIFICATION: 'ABC’ Testbench Design, Coverage, Test Planning, UVM, Software test development, Hardware Acceleration, Deep Formal

In common: Specification, Architecture, Micro-Architecture design, System Architecture, Benchmarking, Simulation, Waveform Analysis, Software understanding, Verification techniques, Assertions, Formal (for assertions), Scripting/building workflows, Data Analysis/Data Science


WHEN IS A BUG A BUG?
…Only When You See It?

• Observable
  • An error is eventually seen (preferably detected by verification)
  • Lockups and denial of service – easy to detect – a reset might work around them
  • Data corruption – if silent/undetected, consequences might be severe
  • How ‘rare’ is it?
  • What takes days and weeks to detect in the verification env might manifest as a debilitating failure rate in silicon
• Non-Observable
  • Might be ‘spotted’ by chance, or found with formal?
  • Might be an error in coding that is masked or unreachable
  • Fix it or leave it? Weigh up the risks!
• Vulnerabilities
  • Reachable by malicious code, a risk for security!
• Non-operational functions
  • Debug or event counters – software developers impacted
• Safety-critical and Reliability
  • DRAM and logic sensitivity to SEUs – e.g. ECC functions for SECDED – error rates in the range 10^-10 to 10^-17 errors/bit-h
• Performance and Power
  • Non-functional – but entitled performance is lost through coding error, or too much power is consumed by the device
• Clocking and Reset
  • Asynchronous events can lead to metastability
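
The point about ‘rare’ bugs becomes concrete with some simple arithmetic. The sketch below assumes an RTL simulation speed of about 1,000 design cycles per second and a 2 GHz silicon clock. Both numbers are illustrative assumptions, not data from the talk.

```python
# How "rare" is a bug that takes a week of simulation to hit?
# ASSUMPTIONS (illustrative only): RTL simulation runs at roughly
# 1,000 design cycles per second; the silicon is clocked at 2 GHz.

SIM_CYCLES_PER_SECOND = 1_000
SILICON_HZ = 2_000_000_000
SECONDS_PER_WEEK = 7 * 24 * 3600


def silicon_seconds_to_hit(sim_seconds: float) -> float:
    """Wall-clock time for silicon to run the same number of cycles."""
    cycles = sim_seconds * SIM_CYCLES_PER_SECOND
    return cycles / SILICON_HZ


t = silicon_seconds_to_hit(SECONDS_PER_WEEK)
print(f"One simulation-week of cycles takes {t:.4f} s in silicon")
```

With these assumed speeds, a full week of simulated cycles passes in about a third of a second on the device. An event that looks vanishingly rare in the verification environment could recur many times per hour in the field.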


WHEN ARE BUGS FOUND?
…The Sooner The Cheaper!

• OPINION: most bugs should be flushed out in the early stages (say with 30% of the work); these are the easy finds.
• Verification work (consumption of resources and human effort to find, debug and fix) is disproportionately high for the remaining 10% of bugs
  • But these are often the critical ones
• Focus workflow and methodology improvements on this later stage for the biggest ROI


COMMON ROOT CAUSES

[Diagram: root causes grouped under Creating and Changing: Specifications, Interfaces, Typos, Copy-Paste, Missing Code, Verification, Performance Tuning, Bug Fixing, Refactoring]

• Bugs occur while creating code
  • Spec errors/ambiguities
  • Interface misunderstandings/spec
  • Typos and Copy-Paste errors
  • Incorrect verification env assumptions
  • Missing code
• Or while changing code (code churn)
  • Adding features
  • Fixing bugs
  • Performance tuning
• Or as a consequence of COMPLEXITY


A WORD ON COMPLEXITY

• It’s complicated because…
  • The architecture specification is complex to meet function and performance targets
  • The architecture-implementation (or micro-architecture) uses a catalog of complex ‘clever tricks’ to meet performance targets (the Art of Design)
• It’s complicated due to…
  • Behavior is no longer fully understood
  • Design partitioning is sub-optimal
  • Code style is more implied-gates and less behavioral
  • Comments are missing or, worse, incorrect
  • Code health has deteriorated – accumulation of technical (code) debt
• Is it measurable?
  • Engineering experience and gut feel!
  • LOCs
  • McCabe Cyclomatic Complexity
  • Code indentation complexity
  • Other metrics from Verilog compilers and linting tools
    • E.g. #registers, #wires, gated clocks, redundant code, logic depth
  • A search will reveal a limited number of tools to measure RTL complexity – most use McCabe
• If I can measure it, I can visualize it and then decide how to act upon it.
  • E.g. refactor code, or intensify verification
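
As a minimal sketch of how three of these measures (LOCs, indentation complexity and a McCabe-style count) might be approximated for RTL text, consider the following. It is not a real Verilog parser. The function name, the keyword list and the 4-spaces-per-level convention are all assumptions made for illustration, and the "McCabe proxy" simply counts decision keywords plus one.

```python
import re

# Rough complexity proxies for a block of RTL source text.
# ASSUMPTIONS: 4 spaces (or one tab) = one indentation level; the
# decision-keyword list is a simplification, not a full Verilog parser.
DECISION_RE = re.compile(r"\b(if|for|while|case|casez|casex)\b|\?")


def complexity_metrics(source: str, spaces_per_level: int = 4) -> dict:
    loc = 0
    indent_total = 0
    decisions = 0
    for raw in source.splitlines():
        line = raw.expandtabs(spaces_per_level)
        code = line.split("//", 1)[0]      # drop line comments
        if not code.strip():
            continue                       # skip blank/comment-only lines
        loc += 1
        leading = len(code) - len(code.lstrip(" "))
        indent_total += leading // spaces_per_level
        decisions += len(DECISION_RE.findall(code))
    return {
        "loc": loc,
        "indent_complexity": indent_total,  # sum of indentation levels
        "mccabe_proxy": decisions + 1,      # decision points + 1
    }


SAMPLE_RTL = """\
always @(posedge clk) begin
    if (!rst_n) begin
        count <= 0;
    end else if (enable) begin
        count <= wrap ? 0 : count + 1;  // ternary counts as a decision
    end
end
"""

m = complexity_metrics(SAMPLE_RTL)
print(m)  # {'loc': 7, 'indent_complexity': 7, 'mccabe_proxy': 4}
```

Even crude proxies like these can be tracked per file over time, which is enough to visualize trends and pick candidates for refactoring or extra verification.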


ESTABLISHED BUG AVOIDANCE

• Coding rules and static linting
• Design reviewing/code scrubs
• Designer assertions
• Formal correct-by-construction
• If all else fails…
  • Implement feature toggle bits


SOFTWARE DEVELOPMENT IDEAS

• In the last decade we have seen the emergence of DevOps for the software development world (rooted in Lean and then Agile)
• Most mainstream software platforms are developed and operated using the DevOps model.
  • Enables 10s, 100s, 1000s of software deployments per day, while achieving stability, reliability, high availability and security.
• What DevOps principles might apply to hardware development?
  • Automate Testing: Continuous Integration
  • Trunk based development
  • Code Refactoring (build in >20%)
  • Integrate Performance Testing
  • Test-Driven Development
  • Pair Programming (automate with Gerrit)
  • Blameless Post-Mortems (Retrospectives)
  • Swarm on Defects
  • Telemetry: Continuous analysis of metrics


HOW CAN DATA AND ANALYTICS HELP?
…assuming I am managing my data!


COVERAGE ANALYTICS

• Not wishing to state the obvious…
  • Well established practices of tracking all available coverage metrics to achieve 100%, or as close as possible with analysis
• Remembering…

Covered != Verified
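
A tiny illustration of why covered is not the same as verified. The saturating adder, its deliberate bug, and the helper names are all hypothetical. The stimulus executes every line and branch of the model, yet reports nothing unless a checker compares the results against a reference.

```python
# A deliberately buggy model: 8-bit saturating add.
# (Hypothetical example; the bug is saturating at 254 instead of 255.)
def sat_add8(a: int, b: int) -> int:
    total = a + b
    if total > 255:
        return 254  # BUG: should be 255
    return total


def reference(a: int, b: int) -> int:
    """Independent reference model used as the checker."""
    return min(a + b, 255)


def run_stimulus(checker=None) -> int:
    """Drive both branches of sat_add8; optionally check each result."""
    failures = 0
    for a, b in [(1, 2), (200, 100)]:  # exercises both branches
        result = sat_add8(a, b)
        if checker is not None and result != checker(a, b):
            failures += 1
    return failures


print(run_stimulus())           # 0: full coverage, but nothing was checked
print(run_stimulus(reference))  # 1: same stimulus, bug caught by the checker
```

Both runs reach identical code coverage. Only the second one verifies anything.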


BUG ANALYTICS

• Collecting and tracking bug data from your bugs database is a great way to visualize where things are at
• We eventually expect a plateau
  • But the plateau might just be a very shallow curve
  • And there may be many false summits en route
  • Time to review and change something?
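
The plateau and false-summit idea can be sketched with a few lines of analysis over weekly bug counts. The numbers below are invented for illustration (in keeping with the disclaimer that no data here is real), and the window and threshold values are assumptions to tune per project.

```python
from itertools import accumulate

# HYPOTHETICAL weekly counts of new bugs found (not real project data).
weekly_bugs = [12, 15, 9, 4, 1, 0, 0, 6, 3, 1, 1, 0, 0, 0]

# Cumulative bug curve, i.e. what the classic bug-tracking chart plots.
cumulative = list(accumulate(weekly_bugs))


def plateau_weeks(counts, window=3, threshold=1):
    """Return week indices where the trailing `window` weeks together
    found at most `threshold` new bugs (a candidate plateau)."""
    return [
        i for i in range(window - 1, len(counts))
        if sum(counts[i - window + 1 : i + 1]) <= threshold
    ]


print(cumulative)
print(plateau_weeks(weekly_bugs))  # [6, 12, 13]
```

Week 6 looks like a plateau but is a false summit: six new bugs turn up in week 7, perhaps after a change of stimulus or methodology. Only weeks 12 and 13 hold.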


CORRELATING DATA

• Better insights are gained from correlating bug data with other data such as the verification effort (machine/human hours).
  • “I’m running good cycles, but bugs are no longer being found”
• And/or commit data…
  • “not only are no bugs being found, the design and verification codebases are stable”
• From that you judge when to stop!
  • Or migrate to the next platform, e.g. Emulation, FPGA?
• And what if a late bug is then found?
  • How does that impact my sign-off verification target?
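
One way to make the "good cycles, no bugs, stable codebase" judgment explicit is to normalise bug finds by effort and check commit churn alongside. This is a minimal sketch: the weekly records are hypothetical, and the thresholds in `looks_stable` are assumptions that a real team would set from its own history.

```python
# HYPOTHETICAL weekly records: (new bugs, machine hours, commits).
weeks = [
    (14, 2_000, 85),
    (9, 5_000, 60),
    (3, 9_000, 31),
    (1, 12_000, 12),
    (0, 12_000, 4),
    (0, 12_500, 2),
]


def bugs_per_khour(bugs: int, machine_hours: float) -> float:
    """Bug discovery rate normalised by verification effort."""
    return 1000 * bugs / machine_hours


def looks_stable(records, last_n=2, max_rate=0.1, max_commits=5):
    """True if the last `last_n` weeks show a low bug rate AND low churn.
    The thresholds are illustrative assumptions, to be set per project."""
    recent = records[-last_n:]
    return all(
        bugs_per_khour(b, h) <= max_rate and c <= max_commits
        for b, h, c in recent
    )


for bugs, hours, commits in weeks:
    print(f"{bugs_per_khour(bugs, hours):5.2f} bugs/khour, {commits} commits")
print("candidate stopping point:", looks_stable(weeks))
```

A late bug would push the rate back above the threshold and reset the judgment, which is exactly the sign-off question raised above.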


BUG PREDICTION
…How Great Would That Be?

• It might be possible, and it may be a worthy endeavor to try
• It has been done before (see DVClub 2011 – Greg Smith*)
• Searching finds several papers on doing this for software
• Some use Machine Learning techniques
  • So long as you have good datasets and have collected relevant design metrics for bugs
  • Experiment with different training approaches, e.g. Decision Trees, Naïve Bayes, Artificial Neural Networks (ANNs), to find the best prediction models
  • Be aware of social factors and differences between teams when looking at historical datasets

* https://www.testandverification.com/DVClub/24_Jan_2011/Greg_Smith.pdf
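
To give a flavour of one of the training approaches named above, here is a from-scratch Gaussian Naïve Bayes classifier in plain Python. It is a toy sketch, not the method from the referenced work. The two features (a complexity score and a commit count per module) and all the training rows are hypothetical. A real experiment would need far more data and would more likely use an established ML library.

```python
import math
from collections import defaultdict

# HYPOTHETICAL training data: (complexity, commits) per module, with a
# label saying whether a late bug was later found in that module.
TRAIN = [
    ((5, 3), False), ((8, 5), False), ((6, 2), False), ((9, 6), False),
    ((30, 40), True), ((25, 35), True), ((40, 28), True), ((35, 45), True),
]


def fit(samples):
    """Estimate a prior plus per-feature mean and variance for each class."""
    by_class = defaultdict(list)
    for features, label in samples:
        by_class[label].append(features)
    model = {}
    for label, rows in by_class.items():
        stats = []
        for column in zip(*rows):
            mean = sum(column) / len(column)
            var = sum((x - mean) ** 2 for x in column) / len(column)
            stats.append((mean, var + 1e-9))  # guard against zero variance
        model[label] = (len(rows) / len(samples), stats)
    return model


def predict(model, features):
    """Return the label with the highest Gaussian log-posterior."""
    def log_posterior(label):
        prior, stats = model[label]
        score = math.log(prior)
        for x, (mean, var) in zip(features, stats):
            score += -0.5 * math.log(2 * math.pi * var)
            score -= (x - mean) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)


model = fit(TRAIN)
print(predict(model, (7, 4)))    # small, quiet module -> False
print(predict(model, (33, 38)))  # complex, high-churn module -> True
```

The classes here are trivially separable by design. With real historical data the interesting work is in choosing the metrics and in checking that differences between teams do not dominate the signal.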


CODEBASE ANALYTICS
…Exploiting Version Control Data

• Another book recommendation
• Another idea from the software world
• Ideas on how to extract insights from Git
  • Hotspots indicate complex code with a high commit rate – what’s going on there? Complexity tracking over development time. Refactoring indicators.
  • Correlate this with defects
  • Unexpected couplings between modules that frequently get committed together
  • Social aspects of code development – how many different editors, code that is now ‘abandoned’
  • Architecture and Project Management insights

Screenshot of hotspot visualization taken from CodeScene: https://codescene.io/projects/171/jobs/15343/results/code/hotspots/system-map
(Permission kindly granted by Adam Tornhill)
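
The hotspot and coupling ideas can be prototyped with very little code. The sketch below is not how CodeScene works internally. It assumes a log roughly in the shape printed by `git log --name-only --pretty=format:"--%h"`, and scores a hotspot as revisions multiplied by a complexity number supplied by the caller. The commits, file names and complexity values are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Text roughly in the shape produced by:
#   git log --name-only --pretty=format:"--%h"
# (one "--<hash>" line per commit, followed by the files it touched).
# The commits and file names below are HYPOTHETICAL.
SAMPLE_LOG = """\
--a1b2c3d
rtl/lsu.v
rtl/cache_ctrl.v

--b2c3d4e
rtl/lsu.v
tb/lsu_test.sv

--c3d4e5f
rtl/lsu.v
rtl/cache_ctrl.v

--d4e5f6a
rtl/decode.v
"""


def parse_commits(log_text):
    """Split the log into one set of file paths per commit."""
    commits, current = [], None
    for line in log_text.splitlines():
        line = line.strip()
        if line.startswith("--"):
            current = set()
            commits.append(current)
        elif line and current is not None:
            current.add(line)
    return commits


def revisions(commits):
    """Number of commits touching each file (the commit rate)."""
    return Counter(path for files in commits for path in files)


def couplings(commits):
    """How often each pair of files changed in the same commit."""
    pairs = Counter()
    for files in commits:
        pairs.update(combinations(sorted(files), 2))
    return pairs


def hotspots(commits, complexity):
    """Rank files by revisions x complexity, highest score first."""
    revs = revisions(commits)
    scores = {path: revs[path] * complexity.get(path, 0) for path in revs}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


commits = parse_commits(SAMPLE_LOG)
# HYPOTHETICAL per-file complexity scores, e.g. from an indentation metric.
complexity = {"rtl/lsu.v": 120, "rtl/cache_ctrl.v": 80, "rtl/decode.v": 40}
print(hotspots(commits, complexity)[0])    # ('rtl/lsu.v', 360)
print(couplings(commits).most_common(1))
```

In this toy history `rtl/lsu.v` is the hotspot, and it changes together with `rtl/cache_ctrl.v` in two of four commits. That is the kind of coupling worth correlating with the defect database.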


THANK-YOU

• Hardware and software developers both develop code
• It’s a different mindset – but there are lessons from software that can be reused for hardware, such as CI, pair programming/Gerrit, refactoring, complexity analysis, bug prediction and Git analytics. Take some time to look at DevOps!
• Critical hardware bugs can be very costly (lithography masks, packaging, end-products), but so can software bugs in modern business-critical, high-availability and high-security platforms
• Successful teams will use data analytics to gain insights and apply improvements to reduce cost, shorten schedule and improve quality (fewer bugs!).
