
Relationship Prediction for Scene Graph Generation

Uzair Navid Iftikhar (unavid), Aamnah Khalid (aamnah)

CS229 Stanford University

Introduction

Scene graphs offer powerful representations of images, aimed at preserving information relating to semantic and spatial relationships, as well as salient attributes of objects. We explore relationship prediction on the GQA dataset.

Figure 2: Example of an image and scene graph [1]

Methodology

§ Used a subset of 12k images out of 113k.
§ Scene graph predicates follow a characteristic long-tail distribution [Figure 1].
§ 75-12.5-12.5 train-val-test split; a 99-1 split does not capture the variance of rare predicates.
§ Pre-trained YOLOv3 object detector predicts the nodes in the graph.
§ Maximum likelihood estimator serves as the baseline.
§ Custom neural network for relationship classification, with the architecture in [Figure 3].
§ DenseNet relationship prediction model: accepts cropped images containing the source, the target, and the union of the bounding boxes, and outputs a relationship class [Figure 4]. A sketch of this input construction follows below.

Figure 1: Long-tail predicate distribution of relationships
Figure 3: Custom FC-ReLU model
Figure 4: DenseNet architecture [3]
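The poster includes no code, so the following is a minimal sketch of how the three crops fed to the relationship model could be constructed. The PIL-style (x1, y1, x2, y2) boxes, the 224x224 input size, and all function names are assumptions for illustration, not the authors' implementation.

from PIL import Image  # assumption: PIL images with (x1, y1, x2, y2) boxes

def union_box(box_a, box_b):
    """Smallest axis-aligned box enclosing both input boxes."""
    return (min(box_a[0], box_b[0]), min(box_a[1], box_b[1]),
            max(box_a[2], box_b[2]), max(box_a[3], box_b[3]))

def relationship_crops(image, src_box, tgt_box, size=(224, 224)):
    """Crop the source, target, and union regions fed to the DenseNet
    relationship model [Figure 4]; size is an assumed network input size."""
    boxes = [src_box, tgt_box, union_box(src_box, tgt_box)]
    return [image.crop(box).resize(size) for box in boxes]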

Methodology (cntd.)

§ Hyperparameter tuning on the validation set, first with a random search and then an organized search over the hyperparameter space.

§ Cross-entropy loss over the $C$ predicate classes, averaged over $m$ examples:

$J = -\frac{1}{m} \sum_{i=1}^{m} \sum_{c=1}^{C} y_c^{(i)} \log \hat{y}_c^{(i)}$

§ Evaluation metric for each image was accuracy:

$\text{accuracy} = \frac{\#\text{ relationships predicted correctly}}{\#\text{ ground-truth relationships in the image}}$

§ Maximum likelihood estimator: Laplace smoothing with $\lambda = 1$. Given the source and target object names, we estimated the conditional distribution of the relationship label. Achieved 37% accuracy, establishing the baseline; a sketch follows after Figure 5. Accuracy on the top-20 predicates is shown below:

Figure 5: MLE accuracy on the top-20 predicates
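As a sketch of the baseline described above: a conditional MLE over relationship labels given the (source, target) object-name pair with Laplace smoothing (lambda = 1), plus the per-image accuracy metric. Class and function names are hypothetical; this is a reconstruction from the poster's description, not the authors' code.

from collections import Counter, defaultdict

class MLEBaseline:
    """P(predicate | source name, target name) with Laplace smoothing."""

    def __init__(self, predicates, lam=1.0):
        self.predicates = list(predicates)  # all predicate classes
        self.lam = lam                      # Laplace smoothing constant
        self.counts = defaultdict(Counter)  # (src, tgt) -> predicate counts

    def fit(self, triples):
        """triples: iterable of (source_name, predicate, target_name)."""
        for src, pred, tgt in triples:
            self.counts[(src, tgt)][pred] += 1

    def predict(self, src, tgt):
        """Argmax of the Laplace-smoothed conditional distribution."""
        c = self.counts[(src, tgt)]
        total = sum(c.values()) + self.lam * len(self.predicates)
        return max(self.predicates, key=lambda p: (c[p] + self.lam) / total)

def image_accuracy(predicted, ground_truth):
    """Per-image accuracy: fraction of relationships predicted correctly."""
    correct = sum(p == g for p, g in zip(predicted, ground_truth))
    return correct / len(ground_truth)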

Results

Model              Accuracy
MLE                37%
FC-ReLU 4-Layer    4.2%
DenseNet           38.5%

Figure 6: Accuracy comparison for the different models on the validation set

Figure 7: Validation set accuracy over epochs during training of FC-ReLU 4-Layer network

Figure 8: Accuracy over epochs during training of the DenseNet model (blue: training, red: validation)

§ DenseNet performed best: 41% training accuracy and 38.5% validation accuracy, beating the baseline.

§ The poor performance of the FC-ReLU network suggests that bounding-box shapes alone do not give enough relationship context. DenseNets extract more semantic information while reducing the learning of redundant feature maps [3]; a minimal dense layer is sketched below.
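To illustrate the dense-connectivity point: each layer in a dense block concatenates its new feature maps onto everything it received, so later layers can reuse earlier features instead of relearning them [3]. A minimal PyTorch sketch under that assumption, not the authors' implementation:

import torch
import torch.nn as nn

class DenseLayer(nn.Module):
    """One layer of a dense block: BN -> ReLU -> 3x3 conv, then concatenate
    the new feature maps onto the input (dense connectivity, Huang et al. [3])."""

    def __init__(self, in_channels, growth_rate):
        super().__init__()
        self.bn = nn.BatchNorm2d(in_channels)
        self.conv = nn.Conv2d(in_channels, growth_rate,
                              kernel_size=3, padding=1, bias=False)

    def forward(self, x):
        out = self.conv(torch.relu(self.bn(x)))
        # Later layers see all earlier feature maps, reducing redundant ones.
        return torch.cat([x, out], dim=1)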

Future Work

§ Training with more images and epochs.
§ Incorporating the Q/A data that is part of the GQA dataset.
§ Comparing loss functions: new losses recently proposed by Zhang et al. [4] reduce the penalty for negative relationship direction and better handle multiple instances of the same object class in an image.

Figure 9: Architecture for the DenseNet-based model [2]
Figure 10: Relationship prediction pipeline

REFERENCES

[1] Hudson, D. A., and Manning, C. D. (2019). "GQA: A New Dataset for Compositional Question Answering over Real-World Images." arXiv preprint arXiv:1902.09506.
[2] Dornadula et al. "Graphr: Scene Graph Generation Using Deep Variation-structured Reinforcement Learning." CS234 Project.
[3] Huang, Gao, et al. "Densely Connected Convolutional Networks." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.
[4] Zhang, Ji, et al. "Graphical Contrastive Losses for Scene Graph Parsing." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.
