
Page 1: Software Engineering for Learning-Enabled Systems

Software Engineering for

Learning-Enabled Systems

Michigan State University

College of Engineering

CSE435: Software Engineering

Guest Lecturer

Michael Austin Langford

November 10, 2021

Page 2: Software Engineering for Learning-Enabled Systems

Introduction

▪ Teaching Assistant, Ph.D. student (5th year)
▪ Systems for Trusted AI / Robustness & Resiliency

▪ M.S./B.S. Computer Science
▪ Computer Vision & Graphics

▪ Prior experience
▪ Private Industry, Military

▪ Areas of interest
▪ Software Engineering for AI

▪ AI for Software Engineering

▪ Evolutionary Computation

▪ Autonomous Systems with Deep Learning

Michael Austin Langford

Page 3: Software Engineering for Learning-Enabled Systems

Introduction

Motivation

▪ Artificial Intelligence is central to autonomy.

▪ Autonomous systems must predict, plan, and categorize their surroundings

▪ Almost one-third of IT professionals report that their businesses are using AI [Bishop 2021]

▪ Autonomous solutions spurred by global lockdown restrictions [Geylani 2020]

▪ Consumer anxiety over widely publicized failures of AI services [Slingerland 2021]

Gender and Race Bias in Commercial AI Services

[Hardestry 2018] [Dastin 2018]

False Accusations of Fraud By AI Auditors

[Charette 2018]

Only 14% of Drivers Trust Autonomous Vehicles

[Edmonds 2021]

Page 4: Software Engineering for Learning-Enabled Systems

Introduction

Overview

Traditional Software Systems

▪ Well-defined sequence of instructions

▪ Common design patterns exist

▪ Logic can be verified by formal methods

Learning-Enabled Systems (LESs)

▪ Optimized/refined to fit given data

▪ Enables solutions to complex tasks

▪ Decision logic not easy to interpret

What software challenges are unique to Learning-Enabled Systems?

Page 5: Software Engineering for Learning-Enabled Systems

Introduction

Objectives

▪ Introduce concepts of machine learning and deep learning

▪ Introduce the limitations of deep learning and data-driven systems

▪ Introduce software engineering challenges for LESs

Page 6: Software Engineering for Learning-Enabled Systems

Learning-Enabled Systems (LESs)

Learning Methods

▪ Types of Learning

▪ Supervised Learning
• Training examples are labeled with expected outcomes

▪ Unsupervised Learning
• Data is automatically organized by visible characteristics

▪ Modes of Learning

▪ Offline Learning
• System is trained at design time and fixed at run time

▪ Online Learning
• System learns and can change behavior at run time

Netflix Recommendation Engine (Unsupervised Learning) [Chong 2020]: clusters users together based on preferences.

Zillow Zestimates (Supervised Learning) [Zillow 2021]: real estate estimates by analyzing past listings.

Page 7: Software Engineering for Learning-Enabled Systems

Learning-Enabled Systems (LESs)

Supervised Learning

▪ Labeled training data

▪ Collection of data pairs consisting of example inputs (𝒙) and expected outputs (𝒕)

▪ Examples:

Learning Task: learn relationship between input and output (𝒇(𝒙) = 𝒕)

Handwritten Numbers [Deng 2012]
• Input (𝒙): image of a handwritten digit
• Output (𝒕): digit label
• (𝑓: ℝ^1024 → ℝ^10)
• Example pairs: (𝒙 = ⟨digit image⟩, 𝒕 = “2”), (𝒙 = ⟨digit image⟩, 𝒕 = “4”), (𝒙 = ⟨digit image⟩, 𝒕 = “0”), (𝒙 = ⟨digit image⟩, 𝒕 = “3”), …

Real Estate Estimates
• Input (𝒙): [zip, acreage, floorspace, etc.]
• Output (𝒕): dollar value
• (𝑓: ℝ^n → ℝ^1)
• Example pairs: (𝒙 = [48910, 0.25, 1025, …], 𝒕 = $95,000.00), (𝒙 = [48912, 0.33, 1530, …], 𝒕 = $210,000.00), (𝒙 = [48910, 0.20, 950, …], 𝒕 = $75,000.00)

Page 8: Software Engineering for Learning-Enabled Systems

Learning-Enabled Systems (LESs)

Data Considerations

▪ Generalization Assumption

▪ Assumes that a general trend can be derived from a finite set of examples

▪ “Garbage In, Garbage Out”

▪ Data-driven systems are highly dependent on quality of input & training data

▪ Cognitive Bias [Calikli 2010]

▪ Collected data can be misrepresentative of reality
• Availability Bias: emphasis given to most readily available data
• Confirmation Bias: emphasis given to data with positive results
• Selection Bias: emphasis given to an incorrect data distribution

Collecting labeled data is time-consuming.

Collecting fair and balanced data is difficult.

Page 9: Software Engineering for Learning-Enabled Systems

Learning-Enabled Systems (LESs)

Learning Model Considerations

▪ Occam’s Razor (Law of Parsimony)

▪ Favor solutions with the fewest assumptions

▪ Simpler solutions are typically easier to test and less sensitive to data bias

▪ Model Complexity

▪ How complex should our model be?

▪ Underfitting data leads to poor predictions

▪ Overfitting data leads to poor generalizations

[Figure: underfitting vs. overfitting]

How can we determine if a model generalizes well?
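To make the underfitting/overfitting trade-off concrete, the following sketch (not from the original slides; the data and polynomial degrees are illustrative) fits polynomials of increasing degree to noisy samples and compares training error against held-out error:

import numpy as np

rng = np.random.default_rng(0)

# Noisy samples of an underlying sine curve (illustrative data).
x_train = rng.uniform(0, 1, 15)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, 15)
x_test = rng.uniform(0, 1, 100)
y_test = np.sin(2 * np.pi * x_test) + rng.normal(0, 0.2, 100)

for degree in (1, 3, 9):
    # Fit a polynomial of the given degree to the training data.
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    # Degree 1 underfits (both errors high); degree 9 overfits
    # (training error near zero, held-out error grows).
    print(f"degree={degree}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")

A model that generalizes well keeps the two errors close; a widening gap between training and held-out error signals overfitting.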

Page 10: Software Engineering for Learning-Enabled Systems

Learning-Enabled Systems (LESs)

Machine Learning Algorithms [Bishop 2006]

▪ Algorithms that can refine system behavior in response to data

▪ Linear Regression
• Linear relationship between input variables and observed output values

▪ Decision Trees
• Tree of decisions represented as nodes with branching outcomes

▪ Support Vector Machines (SVMs)
• Hyperplane that can separate data according to relevant data features

▪ Artificial Neural Networks (ANNs) [Haykin 1998]
• Multi-layered perceptron inspired by neuron activity (linear transformations + activations)

[Figures: linear regression, decision tree, support vector machine, deep neural network]
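As a hedged illustration (not part of the slides), scikit-learn exposes each of these model families behind a common fit/predict interface. The synthetic dataset below is made up, and a logistic (linear) classifier stands in for linear regression since the toy task is classification:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

# Synthetic 2-class dataset (illustrative only).
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

models = {
    "linear model": LogisticRegression(),
    "decision tree": DecisionTreeClassifier(max_depth=3),
    "SVM": SVC(kernel="rbf"),
    "neural network": MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000),
}

for name, model in models.items():
    model.fit(X, y)                  # refine behavior in response to data
    print(name, model.score(X, y))   # training accuracy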

Page 11: Software Engineering for Learning-Enabled Systems

Artificial Neural Networks (ANNs) [Haykin 1998]

Brief History

▪ Single Layer Perceptron (1958)
▪ Frank Rosenblatt, Cornell Aeronautical Laboratory
▪ Method to classify linearly separable data

▪ Multilayer Perceptron (1960s-1980s)
▪ Classifies non-linearly separable data by linking perceptrons

▪ Dark period (1990s)
▪ Methods to train ANNs are difficult and computationally expensive
▪ SVMs gain popularity

▪ Deep Neural Networks (1998 to Present)
▪ More efficient methods to train deep networks (supported by GPUs)
▪ Godfathers of Deep Learning: Geoffrey Hinton, Yann LeCun, and Yoshua Bengio
▪ Dozens of layers, millions of trainable parameters

[Figures: neuron, perceptron, multilayer perceptron, deep neural network (AlexNet)]

Page 12: Software Engineering for Learning-Enabled Systems

Artificial Neural Networks (ANNs) [Haykin 1998]

Linearly Separable Data (Single Layer Perceptron)

▪ Data is easiest to classify when linearly separable

1. Define linear transformation (standard form):
f(x1, x2) = x1 − 2x2 + 2

2. Define non-linear activation function:
σ(f(x1, x2)) = { f(x1, x2) ≥ 0 ? 0 : 1 }

3. Combine to create single layer perceptron:
y = σ(f(x1, x2)) = { x1 − 2x2 + 2 ≥ 0 ? 0 : 1 }

4. Classify based on final output:
y = 0 ? ⟨class A⟩ : ⟨class B⟩

[Figure: perceptron diagram — inputs x1 and x2 are scaled by weights w1 and w2, summed with bias b, and passed through activation σ to produce output y]
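A minimal Python sketch of this exact perceptron, assuming the slide's weights w1 = 1, w2 = -2, and bias b = 2:

def perceptron(x1, x2, w1=1.0, w2=-2.0, b=2.0):
    """Single layer perceptron from the slide: y = sigma(w1*x1 + w2*x2 + b)."""
    f = w1 * x1 + w2 * x2 + b    # linear transformation (standard form)
    return 0 if f >= 0 else 1    # activation: { f >= 0 ? 0 : 1 }

# Points on either side of the line x1 - 2*x2 + 2 = 0 get different labels.
for point in [(0.0, 0.0), (0.0, 2.0), (-3.0, 0.0), (4.0, 1.0)]:
    print(point, "->", perceptron(*point))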

Page 13: Software Engineering for Learning-Enabled Systems

Artificial Neural Networks (ANNs) [Haykin 1998]

Non-linearly Separable Data

▪ Classic XOR problem [Karajgi 2020]

x1   x2   x1 ⊕ x2
0    0    0
0    1    1
1    0    1
1    1    0

▪ Output of XOR function is not linearly separable.

▪ No single line exists that can separate the X’s and O’s.

[Figure: the four points (0,0), (0,1), (1,0), (1,1) plotted in the x1–x2 plane]

▪ Can be separated by chaining together multiple linear separators.

Page 14: Software Engineering for Learning-Enabled Systems

Artificial Neural Networks (ANNs) [Haykin 1998]

Non-linearly Separable Data (Multilayer Perceptron)

▪ Classic XOR problem

[Figure: the four XOR points in the x1–x2 plane, separated by two lines]

▪ Multiple Layers of Perceptrons

▪ First layer: determines which side of each line the point resides on (both vertically and horizontally)

▪ Second layer: determines which corner the point belongs to

Multilayer Perceptron — chain of matrix computations:

y = σ(W2 σ(W1 x))

First layer (f1a and f1b), mapping input (x1, x2) to hidden outputs h1 and h2:
h1 = σ(w1a1·x1 + w1a2·x2 + b1a)
h2 = σ(w1b1·x1 + w1b2·x2 + b1b)

Second layer (f2), mapping (h1, h2) to the output:
y = σ(w21·h1 + w22·h2 + b2)

W1 =
[ -1   0   0.5 ]
[  0  -1   0.5 ]
[  0   0   1   ]

W2 = [ -1  0  0.5 ]

Weight constants determine where the separating lines are drawn.
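To see the chaining concretely, here is a small illustrative sketch of a two-layer perceptron that computes XOR. The weights are not the slide's exact matrices: they implement an OR unit and an AND unit whose outputs combine into "OR and not AND," and the step activation here returns 1 for positive inputs (the opposite sign convention from the slide's σ):

import numpy as np

def step(z):
    # Step activation: 1 when the linear combination is positive, else 0.
    return (z > 0).astype(int)

# Illustrative weights (not the slide's exact values):
# hidden unit 1 acts as OR, hidden unit 2 acts as AND, and the output
# fires when OR is true but AND is not (i.e., XOR).
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])
b1 = np.array([-0.5, -1.5])
W2 = np.array([1.0, -1.0])
b2 = -0.5

def xor_mlp(x):
    h = step(W1 @ x + b1)     # first layer: which side of each line?
    return step(W2 @ h + b2)  # second layer: combine the two separators

for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, "->", int(xor_mlp(np.array(x))))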

Page 15: Software Engineering for Learning-Enabled Systems

Artificial Neural Networks (ANNs) [Haykin 1998]

Deep Learning [Goodfellow 2016]

▪ More complex datasets and problem spaces require many layers

▪ Each consecutive layer discerns higher order relationships

▪ Deep Neural Networks (DNNs)

▪ Not unusual to see DNNs with dozens of layers and millions of weight values

How do we figure out the proper number of layers?

What about the size of each layer?

What about network topology?

How do we figure out the proper weight values?

There are numerous design considerations & configuration variables (hyperparameters).

Page 16: Software Engineering for Learning-Enabled Systems

Deep Neural Networks (DNNs) [Goodfellow 2016]

Training Process

▪ Analogy: network of branching pipes with valves

▪ Each valve can open/close to adjust flow. How do we find the best valve settings?

▪ Weight optimization (gradient descent)

▪ “Training” phase tweaks weight values to determine settings with minimal error

▪ Gradient Descent with Back Propagation

▪ Start with random weight settings (weight initialization)

▪ Feed in some inputs with known expected outputs (labeled training data)

▪ Compare actual output of network to expected output (objective loss function)

▪ Compute the gradient to determine the direction that minimizes error

▪ Repeat at each layer in reverse to adjust weights (backpropagation)

Page 17: Software Engineering for Learning-Enabled Systems

Deep Neural Networks (DNNs) [Goodfellow 2016]

Training Algorithm

▪ Train by comparing error for each example in the dataset over multiple epochs

train_dnn(dnn, dataset, criterion, n_epochs):
    dnn = init_weights(dnn)            # Initialize with random weight values.
    optimizer = init_optimizer(dnn)
    for i in range(n_epochs):          # Each epoch is an iteration over the entire dataset.
        for x, t in dataset:
            y = dnn(x)
            error = criterion(y, t)    # Compare DNN output to ground truth for each given dataset entry.
            dnn = backpropagation(dnn, optimizer, error)  # Compute gradient of error w.r.t. weights and update weights.
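For a concrete version of this loop, here is a minimal PyTorch sketch; the model shape, data, and hyperparameters are all illustrative stand-ins:

import torch
from torch import nn

# Illustrative model and data; any dataset of (x, t) pairs would do.
dnn = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1))
dataset = [(torch.randn(4), torch.randn(1)) for _ in range(100)]

criterion = nn.MSELoss()                                # objective loss function
optimizer = torch.optim.SGD(dnn.parameters(), lr=0.01)  # gradient descent

for epoch in range(20):              # each epoch iterates over the whole dataset
    for x, t in dataset:
        y = dnn(x)                   # forward pass
        error = criterion(y, t)      # compare actual vs. expected output
        optimizer.zero_grad()
        error.backward()             # backpropagation: gradient of error w.r.t. weights
        optimizer.step()             # adjust weights to reduce error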

Page 18: Software Engineering for Learning-Enabled Systems

Deep Neural Networks (DNNs) [Goodfellow 2016]

Training Optimization

▪ Gradient Descent Optimization

▪ Assumes a smooth, convex (bowl-shaped) loss landscape leading to a global minimum

▪ Millions of weights are involved (numerous dimensions)

▪ Landscape is extremely complex

▪ The Exploration vs. Exploitation dilemma

▪ Stochastic Gradient Descent (SGD)

▪ Introduce randomness to weight adjustments

▪ Adaptive Gradient Descent Methods (Adam, Adadelta, RMSprop)

▪ Changing magnitude of weight adjustments over time

[Figures: a simple convex landscape (3D) vs. a complex non-convex landscape]
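A toy illustration (not from the slides) of the underlying update rule on a single weight; real training applies the same rule simultaneously across millions of weights:

# Minimize f(w) = (w - 3)^2 with gradient descent; f'(w) = 2*(w - 3).
w = 0.0      # arbitrary starting weight
lr = 0.1     # learning rate (magnitude of each adjustment)
for step in range(50):
    grad = 2 * (w - 3)   # gradient points uphill...
    w -= lr * grad       # ...so step in the opposite direction
print(w)                 # converges near the minimum at w = 3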

Page 19: Software Engineering for Learning-Enabled Systems

Deep Neural Networks (DNNs) [Goodfellow 2016]

Training Progress

▪ Weights for the DNN continue to be adjusted over a number of epochs

▪ Plot of loss over the number of epochs follows a decaying curve

▪ Training terminates when objective loss (error) converges to a minimum

[Plot: objective loss (error) vs. epoch — loss is high with random weights; the curve decays until training has converged and returns diminish]

When the training phase completes, the DNN is good at handling training examples.

But what about overfitting?

How do we determine if the DNN is any good at non-training data?

Page 20: Software Engineering for Learning-Enabled Systems

Deep Neural Networks (DNNs) [Goodfellow 2016]

Validation

▪ Test data can be used to estimate generalizability

▪ Data must be independent and identically distributed (i.i.d.)

▪ Must be careful about data contamination

▪ Validation data may be used to optimize hyperparameters

▪ Typically taken from a subset of training data (k-fold cross validation)

▪ Dataset partitioning

▪ Divide dataset into training, validation, and test data (80 / 20 split)

[Figure: dataset partitioned into training (80%) and validation/test (20%) splits]

Assumes that test data is representative of real data. Availability Bias!
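One common way to implement this partitioning (a sketch: the 80/20 ratio follows the slide, while load_dataset is a hypothetical loader and the fold count is illustrative):

from sklearn.model_selection import train_test_split, KFold

X, y = load_dataset()  # hypothetical loader returning input and label arrays

# Hold out 20% as test data; keep it untouched to avoid contamination.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# k-fold cross validation carves validation folds out of the training set
# (here k=5), e.g., for hyperparameter optimization.
for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X_train):
    X_tr, X_val = X_train[train_idx], X_train[val_idx]
    y_tr, y_val = y_train[train_idx], y_train[val_idx]
    # ...train on (X_tr, y_tr), tune hyperparameters against (X_val, y_val)...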

Page 21: Software Engineering for Learning-Enabled Systems

Deep Neural Networks (DNNs) [Goodfellow 2016]

Network Architectures

▪ Numerous architectures (network topologies) have been devised

▪ Feedforward Neural Networks

▪ Most common, data follows a direct path from input to output layer

▪ Convolutional Neural Networks (CNNs)

▪ Tailored for processing image data more efficiently

▪ LeNet, AlexNet, VGG, ResNet, Inception, etc.

▪ Recurrent Neural Networks (RNNs)

▪ Tailored for processing sequences of data (time series)

▪ Loops output of layer back into input (state memory)

▪ LSTMs, GRUs, MGUs, etc.

[Figures: feedforward, CNN, and RNN topologies]

Page 22: Software Engineering for Learning-Enabled Systems

Deep Neural Networks (DNNs) [Goodfellow 2016]

Convolutional Neural Networks (CNNs)

▪ Normal DNNs are fully-connected

▪ Perceptrons are a linear combination of all input variables

▪ Images contain many pixels (32 x 32 image = 1024 input variables)

▪ CNNs are only locally-connected

▪ Performs a linear combination of neighboring pixels (for each pixel)

▪ Output from each unit in layer is a feature map of image

▪ Image Convolution

▪ Common image processing filtering operation

▪ Applies a moving window kernel to each pixel

▪ Convolutions can be used to enhance/isolate features of image

[Figures: image → feature map → activation map]
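To illustrate the moving-window kernel (an illustrative sketch, not from the slides), here is a plain-Python 2-D convolution that applies a simple edge-detection kernel to a tiny image:

import numpy as np

def convolve2d(image, kernel):
    """Apply a moving-window kernel to each pixel (valid region only)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Linear combination of neighboring pixels under the window.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.zeros((8, 8))
image[:, 4:] = 1.0                   # left half dark, right half bright
edge_kernel = np.array([[-1, 0, 1],  # horizontal-gradient (edge) kernel
                        [-1, 0, 1],
                        [-1, 0, 1]])
feature_map = convolve2d(image, edge_kernel)  # strong response at the edge
print(feature_map)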

Page 23: Software Engineering for Learning-Enabled Systems

Convolutional Neural Networks (CNNs)

Common Tasks

▪ Image Recognition [Krizhevsky 2009]

▪ Images are classified into categories based on imaged object

▪ Predictions are given as a probability distribution over possible categories

▪ Performance is measured by the accuracy (% correct) of predictions

[Figure: image → classification]
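A brief illustrative sketch of "predictions as a probability distribution" and accuracy scoring (the logits and labels below are made up):

import numpy as np

def softmax(logits):
    # Convert raw network outputs into a probability distribution.
    z = np.exp(logits - logits.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)

# Hypothetical outputs for 3 images over 4 categories, plus true labels.
logits = np.array([[2.0, 0.1, 0.3, -1.0],
                   [0.2, 3.1, 0.0,  0.5],
                   [1.0, 0.9, 0.8,  0.7]])
labels = np.array([0, 1, 2])

probs = softmax(logits)
preds = probs.argmax(axis=1)         # most probable category per image
accuracy = (preds == labels).mean()  # % of correct predictions
print(preds, f"accuracy={accuracy:.0%}")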

Page 24: Software Engineering for Learning-Enabled Systems

Convolutional Neural Networks (CNNs)

Demonstration

▪ ConvNetJS [Karpathy 2014]

▪ Deep learning in your browser

▪ Visualizes each layer of a CNN in real time

▪ Completely implemented in JavaScript

http://cs.stanford.edu/people/karpathy/convnetjs/

Useful tool for understanding CNNs.

Page 25: Software Engineering for Learning-Enabled Systems

Convolutional Neural Networks (CNNs)

Common Tasks

▪ Object Detection [Lin 2017] [Waymo 2020]

▪ Multiple objects are located within an image with bounding boxes

▪ Performance is measured by the precision and recall of detections

[Figure: bounding box + classification]
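For context, detections are typically matched to ground truth by intersection-over-union (IoU) before computing precision and recall; the sketch below uses made-up boxes and a 0.5 IoU threshold:

def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# One ground-truth object, two predicted boxes (illustrative values).
truth = [(10, 10, 50, 50)]
preds = [(12, 12, 48, 52), (70, 70, 90, 90)]

tp = sum(any(iou(p, t) >= 0.5 for t in truth) for p in preds)  # true positives
fp = len(preds) - tp                                           # false positives
fn = len(truth) - tp                                           # missed objects
print(f"precision={tp / (tp + fp):.2f}  recall={tp / (tp + fn):.2f}")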

Page 26: Software Engineering for Learning-Enabled Systems

Convolutional Neural Networks (CNNs)

Example Applications

▪ Automatic Machine Translation (Google Translate) [Good 2015]

▪ Text is located, translated, and rendered in target language

[Figure: source image (Swedish) → isolated words → language translation → rendered translation]

Page 27: Software Engineering for Learning-Enabled Systems

Convolutional Neural Networks (CNNs)

Example Applications

▪ Image-to-Image Translation (Pix2Pix) [Isola 2016] (github)

▪ Transforms the appearance of an image from one form to another

[Figures: B&W to color; labels to facade; day to night; edges to photo]

These techniques require many images in both the source and target domains.

Be cautious of hype. Often examples are “cherry picked.”

Page 28: Software Engineering for Learning-Enabled Systems

Convolutional Neural Networks (CNNs)

Example Applications

▪ Image-to-Image Translation (Pix2Pix) [Isola 2016] (github)

▪ Transforms the appearance of an image from one form to another

[Figures: ideal example — sketch to cat; uncanny valley — sketch to human?; novel example — sketch to beholder]

These techniques require many images in both the source and target domains.

Be cautious of hype. Often examples are “cherry picked.”

Page 29: Software Engineering for Learning-Enabled Systems

Convolutional Neural Networks (CNNs)

Example Applications

▪ StyleGAN (Nvidia) [Karras 2019] (github)

▪ Generates photorealistic synthetic faces in high resolution

▪ DeepFakes [Toews 2020]

▪ Realistic media generated from inauthentic sources

[Figures: synthetic StyleGAN faces (“These are not real people.”); DeepFake example (“Anyone can be Obama.”) [Choi 2017]]

DeepFakes have the potential to devastate society.

Page 30: Software Engineering for Learning-Enabled Systems

Software Engineering Challenges

Maturity of the Field

▪ Research has made impressive leaps with DNN applications

▪ Rate of ML papers published surpasses Moore’s Law [Dean 2019] (arxiv)

▪ Provides interesting solutions to hard problems

▪ From Theory to Practice

▪ Research typically targets toy problems and simple benchmarks

▪ Issues with reproducibility [Ivie 2018]

▪ Only 40% of IEEE publications have practices to ensure reproducibility [Goodrich 2021]

Be careful about overpromising results.

[Chart: ML papers published per year — 2015: 1097; 2016: 1518; 2017: 2314; 2018: 4704; 2019: 8784; 2020: 11060]

Page 31: Software Engineering for Learning-Enabled Systems

Software Engineering Challenges

Design Challenges

▪ Hyperparameters – there are numerous design variables

▪ Dataset variables: data format, dataset distribution, etc.

▪ Network variables: # of layers, size of layers, activation types, etc.

▪ Training variables: # of epochs, loss functions, optimization methods, etc.

▪ Traditional software has the benefit of decades of experience

▪ Design patterns and best practices have been established

We are still establishing design patterns and best practices for Learning-Enabled Systems.

Page 32: Software Engineering for Learning-Enabled Systems

Software Engineering Challenges

Validation Challenges

▪ Validation against an i.i.d. dataset is not enough

▪ Assumes that all relevant run-time scenarios have been captured

▪ Should only be treated as one piece of a more complete test strategy

▪ AI systems must be able to handle deviations from known data

▪ Beyond accuracy, precision, and recall, we need to know the robustness of DNNs to new contexts

▪ Users need more information to trust AI

▪ For real applications, we need more than a simple prediction

▪ We need to know the margin of error (confidence intervals)

▪ We need to understand how the final output was derived (interpretability)

Page 33: Software Engineering for Learning-Enabled Systems

Software Engineering Challenges

Trusted AI

▪ Trust has become a significant challenge for safety-critical applications

▪ Industry has responded by establishing Trusted AI guidelines:

Aerospace Corp. Trusted AI Framework
• Stability
• Confidence/Uncertainty
• Adversarial Robustness
• Interpretability
• Familiarity
• Fairness

Deloitte Trustworthy AI
• Privacy
• Responsible/Accountable
• Safe/Secure
• Robust/Reliable
• Transparent/Explainable
• Fair/Impartial

IBM Trustworthy AI
• Privacy
• Robustness
• Transparency
• Explainability
• Fairness

Microsoft Responsible AI
• Privacy & Security
• Accountability
• Reliability & Safety
• Transparency
• Fairness
• Inclusiveness

[Slingerland 2021] [Deloitte 2021] [IBM 2021] [Microsoft 2021]

Looking beyond the hype, we need more rigorous standards.

Page 34: Software Engineering for Learning-Enabled Systems

Software Engineering Challenges

Misunderstandings

▪ DNNs can easily latch onto superficial details

▪ Typical DNNs do not derive any semantic model of the problem space

▪ DNNs latch onto surface statistical regularities (patterns that appear often)

▪ Example Scenario

▪ All training images of cats contain a specific cloud pattern in the background

▪ DNN associates “cloud pattern” with “cats”

▪ DNNs simply isolate the features most useful for separating classifications

▪ No sophisticated understanding of what a cat is beyond pattern matching

It is easy for the average user to project more intelligence onto DNNs than they actually possess.

Page 35: Software Engineering for Learning-Enabled Systems

Software Engineering Challenges

Robustness

▪ Uncertainty remains about how robust DNNs are to new data

▪ Robustness: how well DNN performs as inputs deviate from training/validation

▪ Not safe to assume that a small deviation in the input will produce a similar result

▪ Adversarial Examples [Szegedy 2013]

▪ Inputs specifically tailored to exploit and confuse a DNN

▪ Can be generated with humanly imperceptible deviations

[Figure: “Bus” + imperceptible noise → classified as “Ostrich”]

DNNs can latch onto patterns beyond our senses.

Robust DNNs are less sensitive to these sorts of insignificant disturbances.

Page 36: Software Engineering for Learning-Enabled Systems

Software Engineering Challenges

Understanding Adversarial Examples

▪ Adversarial examples are created in a similar manner to training DNNs

▪ Training DNN: weights adjusted to minimize error for a fixed input

▪ Adversarial Examples: weights are fixed and the input is adjusted to maximize error, with a constraint to keep changes to the input minimal (sketched below)

▪ Defenses against adversarial examples

▪ Adversarial training: add adversarial examples to training set

• Whack-a-mole approach, not very effective

▪ Training Noise: methods to add noise to training to make DNN less sensitive

▪ Input Denoising: methods to remove noise from inputs at run time

Adversarial attacks remain a real threat to any outward-facing DNN.
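A minimal sketch of one such attack, the fast gradient sign method, which follows the fixed-weights/adjust-input recipe described above; the model, data, and epsilon are placeholders:

import torch
from torch import nn

# Illustrative stand-ins for a trained DNN and a labeled input.
model = nn.Sequential(nn.Linear(10, 10))
model.eval()
criterion = nn.CrossEntropyLoss()

x = torch.randn(1, 10, requires_grad=True)  # input (the weights stay fixed)
t = torch.tensor([3])                       # true label
epsilon = 0.05                              # constraint: keep the change minimal

loss = criterion(model(x), t)               # error for the clean input
loss.backward()                             # gradient of error w.r.t. the INPUT
x_adv = (x + epsilon * x.grad.sign()).detach()  # step in the error-increasing direction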

Page 37: Software Engineering for Learning-Enabled Systems

Software Engineering Challenges

Environmental Uncertainty

▪ Beyond adversarial attacks, there remains uncertainty about how DNNs respond to unseen environmental phenomena [Langford 2021]

When a CNN has only been trained/validated with images in clear conditions, how will it react to a raindrop on the camera lens?

It is not possible to include every real world corner case into a dataset!

Page 38: Software Engineering for Learning-Enabled Systems

Software Engineering Challenges

Interpretability

▪ DNNs contain many hidden layers that are trained as one

▪ Weights for all layers are adjusted end-to-end to minimize training error

▪ The output of a single layer in isolation makes no sense without the other layers

▪ CNN layers will not isolate semantically relevant features like “eyes” or “mouth”

▪ Grad-CAM [Selvaraju 2016]

▪ Highlights the region of the image that had the most significant impact on the final output

How do we know what a DNN is looking at when it makes its decisions?

[Figure: source image; region resulting in “dog” decision; region resulting in “cat” decision]

Users need these sorts of tools to trust DNNs.
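A hedged sketch of the Grad-CAM idea in PyTorch: pool the gradients of the class score over each feature map of a chosen convolutional layer, then take a weighted, rectified sum of those maps. Here model and target_layer are placeholders for a trained CNN and one of its convolutional layers:

import torch
import torch.nn.functional as F

def grad_cam(model, target_layer, image, class_idx):
    acts, grads = {}, {}
    # Capture the layer's feature maps and their gradients via hooks.
    h1 = target_layer.register_forward_hook(
        lambda m, i, o: acts.update(a=o))
    h2 = target_layer.register_full_backward_hook(
        lambda m, gi, go: grads.update(g=go[0]))

    score = model(image)[0, class_idx]  # score for the class of interest
    model.zero_grad()
    score.backward()                    # gradients w.r.t. the feature maps
    h1.remove(); h2.remove()

    weights = grads["g"].mean(dim=(2, 3), keepdim=True)  # pool gradients per map
    cam = F.relu((weights * acts["a"]).sum(dim=1))       # weighted sum of maps
    return cam / cam.max()              # normalized heatmap over image regions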

Page 39: Software Engineering for Learning-Enabled Systems

Software Engineering Challenges

Monitor and Control

▪ Run-time environments will present new and unexpected conditions

▪ Systems must continuously monitor capability of learning components

▪ Systems must assess the trustworthiness of learning components

▪ Systems must implement failsafes for untrusted states

How do we assess trustworthiness at run time?

[Figures: Tesla accident [NTSB 2016]; Uber accident [NTSB 2019]]

How can we avoid these types of incidents?

Page 40: Software Engineering for Learning-Enabled Systems

Software Engineering Challenges

Monitor and Control

▪ Uber Accident Timeline Review [NTSB 2019]

Time (s)        Classification     Action
-5.6 to -2.7    [Vehicle/Other]    Tracking history is reset with each change in classification
-2.6 to -1.5    [Bicycle]          Classification stabilizes; path updated to traveling left through lane
-1.5            [Other]            Classification changes; path updated to static
-1.2            [Bicycle]          Classification changes; path updated to static but on SUV path
-1.2 to -0.2    [Bicycle]          ADS cannot steer around, begins slowing down, notifies driver
-0.02                              Driver takes control of steering, disengages ADS

**IMPACT**

+0.7                               Driver brakes.

Tracking history reset each time classification changed.

Learning-Enabled Systems must include online risk and trust assessment.

Page 41: Software Engineering for Learning-Enabled Systems

Software Engineering Challenges

Monitor and Control

▪ Behavior Oracles to assess impact of known unknowns [Langford 2021]

▪ Known unknowns – phenomena whose impact is unknown but which can be simulated

▪ Simulation to introduce the phenomena into existing datasets

▪ Evolutionary Computation to generate relevant contexts of phenomenon

▪ Machine Learning to create a generalized predictive model

[Figure: behavior oracle example — how will a DNN respond to dust clouds?
Automated assessment (simulation + evolution) constructs a behavior oracle; at run time, each image input yields a perceived context and an inferred behavior:
• Perceived context: dust density 0.60, dust intensity 0.59 → inferred behavior: “compromised”
• Perceived context: dust density 0.22, dust intensity 0.19 → inferred behavior: “little impact”]

Page 42: Software Engineering for Learning-Enabled Systems

Software Engineering Challenges

Monitor and Control

▪ Behavior oracle in action [Langford 2021]

▪ Enables assessment of run-time context

▪ Outputs the perceived context to help locate the source of interference

▪ Outputs inferred behavior to determine capability of learning component

Behavior oracles can help determine the applicability of learning components.

Page 43: Software Engineering for Learning-Enabled Systems

Conclusion

Summary

▪ Advances in Deep Learning have enormous promise.

▪ We live in exciting times, but be careful about hype and overpromising

▪ A huge leap from research on toy problems to real-world applications

▪ Software engineering is difficult for learning-enabled systems!

▪ Emphasis is shifting to bi-directional trust for AI systems.

▪ Tools/techniques to enable users to understand AI decision-making

▪ AI must understand limitations and when human expertise is required

State-of-the-art systems are good at what they know but are bad at knowing what they know.

Page 44: Software Engineering for Learning-Enabled Systems

References

▪ [Bishop 2006] C. Bishop, Pattern Recognition and Machine Learning, Springer, 2006.

▪ [Bishop 2021] Z. Bishop, “Global Data from IBM Points to AI Growth as Businesses Strive for Resilience,” IBM, 2021.

▪ [Calikli 2010] G. Calikli and A. Bener, “Empirical Analyses of the Factors Affecting Confirmation Bias and the Effects of Confirmation Bias on Software Developer/Tester Performance,” Proc of the 6th Int Conf on Predictive Models in Software Engineering (PROMISE ‘10), ACM, 2010.

▪ [Charette 2018] R. N. Charette, “Michigan’s MiDAS Unemployment System: Algorithm Alchemy Created Lead, Not Gold,” IEEE Spectrum, IEEE, 2018.

▪ [Choi 2017] C. Q. Choi, “AI Creates Fake Obama,” IEEE Spectrum, IEEE, 2017.

▪ [Chong 2020] D. Chong, “Deep Dive into Netflix’s Recommender System,” Towards Data Science, 2020.

▪ [Dastin 2018] J. Dastin, “Amazon Scraps Secret AI Recruiting Tool That Showed Bias Against Women,” Reuters, 2018.

▪ [Dean 2019] J. Dean, "The Deep Learning Revolution and Its Implications for Computer Architecture and Chip Design," Proc of the IEEE Int. Solid-State Circuits Conf (ISSCC ‘20), IEEE, 2020.

▪ [Deloitte 2021] “Deloitte Introduces Trustworthy AI Framework to Guide Organizations in Ethical Application of Technology in the Age of With,” Deloitte, 2021.

▪ [Deng 2012] L. Deng, “The MNIST Database of Handwritten Digit Images For Machine Learning Research,” IEEE Signal Processing Magazine, 29(6), 2012.

▪ [Edmonds 2021] E. Edmonds, “AAA: Today’s Vehicle Technology Must Walk So Self-Driving Cars Can Run,” AAA, 2021.

▪ [Geylani 2020] V. S. Geylani, “After Lockdown, the Future of Autonomy,” Insurance Reinsurance, 2020.

Page 45: Software Engineering for Learning-Enabled Systems

References

▪ [Good 2015] O. Good, “How Google Translate squeezes deep learning onto a phone,” Google AI Blog, Google, 2015.

▪ [Goodfellow 2016] I. Goodfellow, Y. Bengio and A. Courville, Deep Learning, MIT Press, 2016.

▪ [Goodrich 2021] J. Goodrich, “Study Shows Ensuring Reproducibility in Research Is Needed,” IEEE Spectrum, IEEE, 2021.

▪ [Hardestry 2018] L. Hardestry, “Study Finds Gender and Skin-Type Bias in Commercial Artificial-Intelligence Systems,” MIT News, MIT, 2018.

▪ [Haykin 1998] S. Haykin, Neural Networks: A Comprehensive Foundation, Prentice Hall, 1998.

▪ [Karpathy 2014] A. Karpathy, ConvNetJS: Deep Learning In Your Browser, 2014.

▪ [Krizhevsky 2009] A. Krizhevsky, “Learning Multiple Layers of Features from Tiny Images,” 2009.

▪ [Langford 2021] M. A. Langford and B. H. C. Cheng, “‘Know What You Know’: Predicting Behavior for Learning-Enabled Systems When Facing Uncertainty,” Proc of the Int Symposium on Software Engineering for Adaptive and Self-Managing Systems (SEAMS ‘21), ACM, 2021.

▪ [Lin 2017] T. Lin, P. Goyal, R. Girshick, K. He and P. Dollár, “Focal Loss for Dense Object Detection,” IEEE Trans. Pattern Anal. Mach. Intell., 42 #2, 2020.

▪ [IBM 2021] Trustworthy AI, IBM, 2021.

▪ [Isola 2016] (github) P. Isola, J. Y. Zhu, T. Zhou, A. A. Efros, “Image-to-Image Translation with Conditional Adversarial Networks,” Proc of the IEEE/CVF Conf on Computer Vision and Pattern Recognition (CVPR ‘17), IEEE, 2017.

▪ [Ivie 2018] P. Ivie and D. Thain, “Reproducibility in Scientific Computing,” ACM Comput. Surv., 56(3) #63, ACM, 2018.

Page 46: Software Engineering for Learning-Enabled Systems

References

▪ [Karajgi 2020] A. Karajgi, “How Neural Networks Solve the XOR Problem,” Towards Data Science, 2020.

▪ [Karras 2019] T. Karras, S. Laine, M. Aittala, J. Hellsten, J. Lehtinen, and T. Aila, “Analyzing and Improving the Image Quality of StyleGAN,” 2019.

▪ [Microsoft 2021] Responsible AI, Microsoft 2021.

▪ [NTSB 2016] “Collision Between a Car Operating With Automated Vehicle Control Systems and a Tractor-Semitrailer Truck Near Williston, Florida,” Accident Report, NTSB/HAR-17/02, US NTSB, 2016.

▪ [NTSB 2019] “Collision Between Vehicle Controlled by Developmental Automated Driving System and Pedestrian Tempe, Arizona,” Accident Report, NTSB/HAR-19/03, US NTSB, 2021.

▪ [Selvaraju 2016] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh and D. Batra, "Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization," Proc of the IEEE Int Conf on Computer Vision (ICCV ‘16), 2017.

▪ [Slingerland 2021] P. Slingerland and L. Perry, “A Framework for Developing Trust in Artificial Intelligence,” Aerospace, 2021.

▪ [Szegedy 2013] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus, “Intriguing Properties of Neural Networks,” 2013.

▪ [Toews 2020] R. Toews, “Deepfakes Are Going To Wreak Havoc On Society. We Are Not Prepared,” Forbes, 2020.

▪ [Waymo 2020] P. Sun et al., “Scalability in Perception for Autonomous Driving: Waymo Open Dataset”, Proc of the IEEE/CVF Conf on Computer Vision and Pattern Recognition (CVPR ’20), IEEE, 2020.

▪ [Zillow 2021] “Zillow Launches New Neural Zestimate, Yielding Major Accuracy Gains,” Zillow, 2021.