Fashionpedia: Ontology, Segmentation, and an Attribute Localization Dataset

Menglin Jia*1, Mengyun Shi*1,4, Mikhail Sirotenko*3, Yin Cui*3, Claire Cardie1, Bharath Hariharan1, Hartwig Adam3, Serge Belongie1,2

1 Cornell University  2 Cornell Tech  3 Google Research  4 Hearst Magazines

Abstract. In this work we explore the task of instance segmentation with attribute localization, which unifies instance segmentation (detect and segment each object instance) and fine-grained visual attribute categorization (recognize one or multiple attributes). The proposed task requires both localizing an object and describing its properties. To illustrate the various aspects of this task, we focus on the domain of fashion and introduce Fashionpedia as a step toward mapping out the visual aspects of the fashion world. Fashionpedia consists of two parts: (1) an ontology built by fashion experts containing 27 main apparel categories, 19 apparel parts, 294 fine-grained attributes and their relationships; (2) a dataset with everyday and celebrity event fashion images annotated with segmentation masks and their associated per-mask fine-grained attributes, built upon the Fashionpedia ontology. In order to solve this challenging task, we propose a novel Attribute-Mask R-CNN model to jointly perform instance segmentation and localized attribute recognition, and provide a novel evaluation metric for the task. We also demonstrate that instance segmentation models pre-trained on Fashionpedia achieve better transfer learning performance on other fashion datasets than ImageNet pre-training. Fashionpedia is available at: https://fashionpedia.github.io/home/index.html.
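The per-mask annotation scheme described in the abstract (a segmentation mask per garment instance, each carrying its own fine-grained attributes) can be sketched as follows. This is a minimal illustration assuming a COCO-style record layout; the field names and toy coordinates are hypothetical, not the dataset's actual schema.

```python
# Hypothetical per-mask annotation records: each garment instance gets a
# category, a polygon mask, and its own localized fine-grained attributes.
jacket_ann = {
    "category": "jacket",  # one of the 27 main apparel categories
    "segmentation": [[10, 20, 60, 20, 60, 90, 10, 90]],  # toy polygon coords
    "attributes": ["single-breasted", "symmetrical", "plain (pattern)"],
}

pants_ann = {
    "category": "pants",
    "segmentation": [[15, 95, 55, 95, 55, 180, 15, 180]],
    "attributes": ["ankle (length)", "slim (fit)", "washed", "distressed"],
}

def attributes_by_category(annotations):
    """Group the localized attributes by the garment category of each mask."""
    grouped = {}
    for ann in annotations:
        grouped.setdefault(ann["category"], set()).update(ann["attributes"])
    return grouped

grouped = attributes_by_category([jacket_ann, pants_ann])
print(sorted(grouped["pants"]))
# ['ankle (length)', 'distressed', 'slim (fit)', 'washed']
```

The point of the sketch is that attributes attach to individual masks rather than to the whole image, which is what distinguishes this task from image-level attribute classification.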

Keywords: Dataset, Ontology, Instance Segmentation, Fine-Grained, Attribute, Fashion

1 Introduction

Recent progress in the field of computer vision has advanced machines' ability to recognize and understand our visual world, with significant impact in fields including autonomous driving [52], cancer detection [29], and product recognition [33,14]. These real-world applications are fueled by various visual understanding tasks with the goals of naming, describing (attribute recognition), or localizing objects within an image.

Naming and localizing objects is formulated as an object detection task (Figure 1(a-c)). As a hallmark of computer recognition, this task is to identify and

* equal contribution.

arXiv:2004.12276v1 [cs.CV] 26 Apr 2020


[Figure 1 (a)-(d): an example image annotated with both instance segmentation masks and localized attributes. Apparel categories and parts shown include jacket, tops, pants, shoe, glasses, bag, neckline, collar, sleeve, and pocket; each mask is linked to its fine-grained attributes via HAS_ATTRIBUTE relations (e.g., jacket: symmetrical, above-the-hip length, single-breasted, dropped shoulder, regular (fit), plain; pants: ankle length, fly (opening), slim (fit), washed, distressed, normal waist), and parts relate to garments via Part-of relationships. Attribute supercategories include Textile Finishing, Textile Pattern, Silhouette, Opening Type, Length, Nickname, and Waistline.]

Asymm-etrical

Symm-etrical

Peplum

Circle

Flare

Fit and Flare

Trumpet

Merm-aid

Balloon /Bubble

Bell

Bell Bottom

Bootcut /Boot cut

Peg

Pencil

Straight

A-Line / ALine

Tent /Trapeze

Baggy

Wide Leg

High Low

Curved (fit)

Tight (fit) /Slim (fit) /Skinny (fit)

Regul-ar (fit)

Loose (fit)

Oversized /Oversize

EmpireWaistline

DroppedWaistline /

High waist /

NormalWaist /

Normal-w…

Low waist /Low Rise

Basque(wasitline)

No waistline

Above-the-…

Hip (length)

Micro (length)

Mini(length) /Mid-thigh(length)

Above-the-k…

Knee(Length) /Knee High

(length)

Below TheKnee

(length) /Below-th…

Midi /Mid-calf /mid calf /

Tea-length /tea len…

Maxi(length) /

Ankle(length)

Floor(length) / Full

(length)

Sleeveless

Short (length)

Elbow-length

Threequarter

(length) /

Wristlength

Singlebreasted

Doublebreasted

Lace Up /Lace-up

Wrapping /Wrap

Zip-up

Fly (Opening)

Chained(Opening)

Buckled(Opening)

Toggled(Opening)

no opening

Plastic

Rubber

Metal

StrawFeather

Gem /Gemstone

Bone

Ivory

Fur / Faux-fur

Leather /Faux-leather

Suede

Shearling

Crocodile

Snakeskin

Wood

nonon-textilematerial

Burnout

Distressed /Ripped

Washed

Embossed

FrayedPrinted

Ruched /Ruching

Quilted

Pleat /pleated

Gathering /gathered

Smocking /Shirring

Tiered /layered

Cutout

Slit

Perfor-ated

Lining / lined

Applique /Embroidery /

Patch

Bead

Rivet / Stud /Spike

Sequin

no specialmanufactur…

Plain(pattern)

Abstract

Cartoon

Letters &Numbers

Camou-flage

Check / Plaid/ Tartan

Dot

Fair Isle

Floral

Geometric

Paisley

Stripe

Houndstooth(pattern)

Herringbone(pattern)

Chevron

Argyle

Animal

Leopard

Snakeskin(pattern)

Cheetah

Peacock

Zebra

Giraffe

Toile de Jouy

Plant

Classic(T-shirt)

Polo (Shirt)

Under-shirt

Henley (Shirt)

Ringer(T-shirt)

Raglan(T-shirt)

Rugby (Shirt)

Sailor (Shirt)

Crop (Top) /Midriff (Top)

Halter (Top)

Camisole

Tank (Top)

Peasant(Top)

Tube (Top) /bandeau

(Top)

Tunic (Top)

Smock (Top)

Hoodie

Blazer

Pea (Jacket)

Puffer(Jacket) /

Down(Jacket)

Biker(Jacket) /

Moto(Jacket)

Trucker(Jacket) /

Denim(Jacket)

Bomber(Jacket)

Anorak

Safari(Jacket) /

Utility(Jacket) /

Cargo

Mao (Jacket)

Nehru(Jacket)

Norfolk(Jacket)

Classicmilitary(Jacket)

Track(Jacket)

Windbreaker Chanel(Jacket)

Bolero

Tuxedo(Jacket)

Varsity(Jacket)

Crop (Jacket)

Jeans

Sweatpants/ Jogger(Pants)

Leggings

Hip-huggers(Pants) / Hiphuggers (…

Cargo(Pants) /Military(Pants)

Culottes

Capri (Pants)

Harem(Pants)

Sailor (Pants)

Jodhpur /Breeches(Pants)

Peg (Pants)

Camo(Pants)

Track (Pants)

Crop (Pants)

Short(Shorts) / Hot

(Pants)

Booty(Shorts)

Berm-uda(Shorts)

Cargo(Shorts)

Trunks

Boardshorts/ Board(Shorts)

Skort

Roll-Up(Shorts) /Boyfriend(Shorts)

Tie-up(Shorts)

Culotte(Shorts)

Lounge(Shorts)

Bloom-ers

Tutu (Skirt) /Ballerina

(Skirt)

Kilt

Wrap (Skirt)

Skater (Skirt)

Cargo (Skirt)

Hobble(Skirt) /Wiggle(Skirt)

Sheath (Skirt)

Ball Gown(Skirt)

Gypsy(Skirt) /

Broomstick(Skirt)

Rah-rah(Skirt)

Prairie (Skirt)

Flame-nco

(Skirt)

Accordion(Skirt)

Sarong(Skirt)

Tulip (Skirt)

Dirndl (Skirt)

Godet (Skirt)

Blanket(Coat)

Parka

Trench (Coat)

Pea (Coat) /Reefer (Co…

Shearling(Coat)

TeddyBear (Coat)

/ Teddy(Coat) / Fur

(Coat)

Puffer (Coat)

Duster (Coat)

Raincoat

Kimono

Robe

Dress (Coat

Duffle(Coat) /

Duffel (Co…

Wrap (Coat)

Military(Coat)Swing

(Coat)

Halter(Dress)

Wrap (Dress)

Chemise(Dress)

Slip (Dress)

Cheongsams/ Qipao

Jumper(Dress)

Shift (Dress)

Sheath(Dress)

Shirt (Dress)

Sundress /Sun (Dress)

Kaftan /Caftan

Bodycon(Dress)

Nightgown

Gown

Sweater(Dress)

Tea (Dress)

Blouson(Dress)

Tunic (Dress)

Skater(Dress)

Asymmetric(Collar)

Regular(Collar)

Shirt (Collar)

Polo(Collar) / FlatKnit (Collar)

Chelsea(Collar)

Banded(Collar)

Mandarin(Collar) /Nehru(Collar)

Peter Pan

(Collar)

Bow(Collar) / Tie

(Collar) /Ascot

(Collar)

Stand-away(Collar)

Jabot (Collar)

Sailor (Collar)

Oversized(Collar)

Notched(Lapel)

Peak (Lapel)

Shawl (Lapel)

Napoleon(Lapel)

Oversized(Lapel)

Set-in sleeve

Dropped-sh…

Raglan(Sleeve)

Cap (Sleeve)

Tulip(Sleeve) /

Petal(Sleeve)

Puff (Sleeve)

Bell(Sleeve) /

Circularflounce

(Sleeve) /Ruffle

(Sleeve)

Poet (Sleeve)

Dolman(Sleeve) &

Batwing(Sleeve)

Bishop(Sleeve)

Leg ofmutton

(Sleeve)

Kimono(Sleeve)

Cargo(Pocket) /

Gusset(Pocket) /

Bellow

Patch(Pocket)

Welt (Pocket)

Kang-aroo(Pocket)

Seam(Pocket)

Slash(Pocket) /Slashed

(Pocket) /slant (…

Curved(Pocket)

Flap (Pocket)

Collarless

Asymmetric(Neckline)

Crew (Neck)

Round(Neck) /

Roundneck

V-neck / V(Neck)

Surplice(Neck)

Oval (Neck)

U-neck / U(Neck)

Sweetheart(Neckline)

Queen anne(Neck)

Boat (Neck)

Scoop (Neck)

Square(Neckline)

Plunging(Neckline) /

Plunge(Neckline)

Keyhole(Neck)

Halter (Neck)

Crossover(Neck)

Choker(Neck)

High (Neck) /

Bottle (Neck)

Turtle(Neck) /

Mock (Neck)/ Polo(Neck)

Cowl (Neck)

Straightacross(Neck)

Illusion(Neck)

Off-the-sho…

One shoulder

Jacket

Shirt / Blouse

Tops

Sweater(pullover, without

opening)

Cardigan (withopening)

Vest / Gilet Pants

ShortsSkirt

CoatDress

Jumpsuit

CapeCollar

Sleeve

Neckline

Lapel

Pocket

Scarf

Umbrella

Epaulette

Buckle

Zipper

Applique

Bead

Bow

Ribbon

Rivet

Ruffle

Sequin

Tassel

Shoe

Bag, wallet

Flower

Fringe

Headband,

Hair Accessory Tie

Glove

Watch

Belt

Leg warmer

Tights, stockings

Sock

Glasses

Hat

ModaNetDeepFashion2

Apparel CategoriesFine-grained Attributes

(f)

JacketTops

Sweater

Cardigan (withopening)

Vest / Gilet Pants

ShortsSkirt

CoatDress

Jumpsuit

CapeCollar

Sleeve

Neckline

Lapel

Pocket

Scarf

Umbrella

Epaulette

Buckle

Zipper

Applique

Bead

Bow

Ribbon

Rivet

Ruffle

Sequin

Tassel

Shoe

Bag, wallet

Flower

Fringe

Tie

Glove

Watch

Belt

Leg warmer

Sock

Glasses

Hat

ModaNetDeepFashion2

Apparel CategoriesFine-grained Attributes

(e)

Fig. 1. An illustration of the Fashionpedia dataset and ontology: (a) main garment masks; (b) garment part masks; (c) both main garment and garment part masks; (d) fine-grained apparel attributes; (e) an exploded view of the annotation diagram: the image is annotated with both instance segmentation masks (white boxes) and per-mask fine-grained attributes (black boxes); (f) visualization of the Fashionpedia ontology: we created the Fashionpedia ontology and separate the concepts of categories (yellow nodes) and attributes [39] (blue nodes) in fashion. It covers the pre-defined garment categories used by both DeepFashion2 [10] and ModaNet [54]. The mapping with DeepFashion2 also shows the versatility of using attributes and categories: we are able to represent all 13 garment classes in DeepFashion2 with 11 main garment categories, 1 garment part, and 7 attributes. Best viewed digitally

indicate the boundaries of objects in the form of bounding boxes or segmentation masks [11,38,17]. Attribute recognition [5,30,8,37] (Figure 1(d)) instead focuses on describing and comparing objects, since an object also has many other properties or attributes in addition to its category. Attributes not only provide a compact and scalable way to represent objects in the world; as pointed out by Ferrari and Zisserman [8], attribute learning also enables the transfer of existing knowledge to novel classes. This is particularly useful for fine-grained visual recognition, whose goal is to distinguish subordinate visual categories such as birds [47] or natural species [44].

In the spirit of mapping the visual world, we propose a new task, instance segmentation with attribute localization, which unifies object detection and fine-grained attribute recognition. As illustrated in Figure 1(e), this task offers a structured representation of an image. Automatic recognition of a rich set of attributes for each segmented object instance complements category-level object detection and therefore advances the complexity of images and scenes we can make understandable to machines. In this work, we focus on the fashion domain as an example to illustrate this task. Fashion comes with rich and


complex apparel attributes, influences many aspects of modern society, and has a strong financial and cultural impact. We anticipate that the proposed task is also suitable for other man-made product domains such as automobiles and home interiors.

Structured representations of images often rely on structured vocabularies [28]. With this in mind, we construct the Fashionpedia ontology (Figure 1(f)) and image dataset (Figure 1(a-e)), annotating fashion images with detailed segmentation masks for apparel categories, parts, and their attributes. Our proposed ontology provides a rich schema for the interpretation and organization of individuals' garments, styles, or fashion collections [24]. For example, we can create a knowledge graph (see the supplementary material for more details) by aggregating structured information within each image and exploiting relationships between garments and garment parts, categories, and attributes in the Fashionpedia ontology. Our insight is that a large-scale fashion segmentation and attribute localization dataset built with a fashion ontology can help computer vision models achieve better performance on fine-grained image understanding and reasoning tasks.
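The aggregation described above can be sketched as a small typed-edge graph. The node names and relation labels below are illustrative assumptions, not the released knowledge-graph schema:

```python
# A minimal sketch (not the authors' code) of aggregating one image's
# structured annotations into a fashion knowledge graph. Relation
# names (HAS_PART, HAS_ATTRIBUTE) are illustrative assumptions.
from collections import defaultdict

class FashionGraph:
    def __init__(self):
        # edges[relation] maps a source node to a set of target nodes
        self.edges = defaultdict(lambda: defaultdict(set))

    def add(self, src, relation, dst):
        self.edges[relation][src].add(dst)

    def targets(self, src, relation):
        return sorted(self.edges[relation][src])

graph = FashionGraph()
# One image's structured annotation: a jacket with a metal button.
graph.add("jacket", "HAS_PART", "button")
graph.add("jacket", "HAS_ATTRIBUTE", "symmetrical")
graph.add("button", "HAS_ATTRIBUTE", "metal")

print(graph.targets("jacket", "HAS_PART"))       # ['button']
print(graph.targets("button", "HAS_ATTRIBUTE"))  # ['metal']
```

Aggregating such per-image graphs over the whole dataset would yield the kind of large-scale knowledge graph mentioned above.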

The contributions of this work are as follows:

A novel task of fine-grained instance segmentation with attribute localization. The proposed task unifies instance segmentation and visual attribute recognition, which is an important step toward structural understanding of visual content in real-world applications.

A unified fashion ontology informed by product descriptions from the internet and built by fashion experts. Our ontology captures the complex structure of fashion objects and the ambiguity of descriptions obtained from the web, containing 46 apparel categories (27 main apparel categories and 19 apparel parts) and 294 fine-grained attributes (spanning 9 super-categories) in total. To facilitate the development of related efforts, we also provide a mapping to the categories of existing fashion segmentation datasets; see Figure 1(f).

A dataset with a total of 48,825 clothing images covering daily life, street style, celebrity events, runways, and online shopping, annotated by crowd workers for segmentation masks and by fashion experts for localized attributes, with the goal of developing and benchmarking computer vision models for comprehensive understanding of fashion.

A new model, Attribute-Mask R-CNN, proposed to jointly perform instance segmentation and localized attribute recognition, together with a novel evaluation metric for this task. We further demonstrate that instance segmentation models pre-trained on Fashionpedia achieve better transfer learning performance on DeepFashion2 and ModaNet than with ImageNet pre-training.

2 Related Work

The combined task of fine-grained instance segmentation and attribute localization has not received a great deal of attention in the literature. On one hand, COCO [32] and LVIS [15] represent the benchmarks of object detection


Table 1. Comparison of fashion-related datasets (Cls. = Classification, Segm. = Segmentation, MG = Main Garment, GP = Garment Part, A = Accessory, S = Style, FGC = Fine-Grained Categorization). To the best of our knowledge, we include all fashion-related datasets focusing on visual recognition.

| Name | Category: Cls. | Category: BBox | Category: Segm. | Attribute: Unlocalized | Attribute: Localized | FGC |
| Clothing Parsing [50] | MG, A | - | - | - | - | - |
| Chic or Social [49] | MG, A | - | - | - | - | - |
| Hipster [26] | MG, A, S | - | - | - | - | - |
| Ups and Downs [19] | MG | - | - | - | - | - |
| Fashion550k [23] | MG, A | - | - | - | - | - |
| Fashion-MNIST [48] | MG | - | - | - | - | - |
| Runway2Realway [45] | - | - | MG, A | - | - | - |
| ModaNet [54] | - | MG, A | MG, A | - | - | - |
| Deepfashion2 [10] | - | MG | MG | - | - | - |
| Fashion144k [41] | MG, A | - | - | X | - | - |
| Fashion Style-128 Floats [42] | S | - | - | X | - | - |
| UT Zappos50K [51] | A | - | - | X | - | - |
| Fashion200K [16] | MG | - | - | X | - | - |
| FashionStyle14 [43] | S | - | - | X | - | - |
| Main Product Detection [40] | - | MG | - | X | - | - |
| StreetStyle-27K [36] | - | - | - | X | - | X |
| UT-latent look [21] | MG, S | - | - | X | - | X |
| FashionAI [6] | MG, GP, A | - | - | X | - | X |
| iMat-Fashion Attribute [14] | MG, GP, A, S | - | - | X | - | X |
| Apparel classification-Style [4] | - | MG | - | X | - | X |
| DARN [22] | - | MG | - | X | - | X |
| WTBI [25] | - | MG, A | - | X | - | X |
| Deepfashion [33] | S | MG | - | X | - | X |
| Fashionpedia | - | MG, GP, A | MG, GP, A | - | X | X |

for common objects. Panoptic segmentation was proposed to unify semantic and instance segmentation, addressing both stuff and thing classes [27]. In spite of the domain differences, Fashionpedia has comparable mask quality to LVIS and a similar total number of segmentation masks as COCO. On the other hand, we have also observed an increasing effort to curate datasets for fine-grained visual recognition, evolving from CUB-200 Birds [47] to the recent iNaturalist dataset [44]. The goal of this line of work is to advance the state of the art in automatic image classification for large numbers of real-world, fine-grained categories. A rather unexplored aspect of these datasets, however, is providing a structured representation of an image. Visual Genome [28] provides dense annotations of object bounding boxes, attributes, and relationships in the general domain, enabling a structured representation of the image. In our work, we instead focus on fine-grained attributes and provide segmentation masks in the fashion domain to advance the clothing recognition task.

Clothing recognition has received increasing attention in the computer vision community recently. A number of works provide valuable apparel-related


datasets [50,4,49,26,45,41,22,25,19,42,33,23,48,51,16,43,40,36,21,54,6,10]. These pioneering works enabled several recent advances in clothing-related recognition and knowledge discovery [9,35]. Table 1 summarizes the comparison among different fashion datasets regarding annotation types of clothing categories and attributes. Our dataset distinguishes itself in the following three aspects.

Exhaustive annotation of segmentation masks: Existing fashion datasets [45,54,10] offer segmentation masks for the main garment (e.g., jacket, coat, dress) and accessory categories (e.g., bag, shoe). Smaller garment objects such as collars and pockets are not annotated. However, these small objects can be valuable for real-world applications (e.g., searching for a specific collar shape during online shopping). Our dataset is annotated not only with segmentation masks for a total of 27 main garment and accessory categories but also for 19 garment parts (e.g., collar, sleeve, pocket, zipper, embroidery).

Localized attributes: The fine-grained attributes in existing datasets [22,33,40,14] tend to be noisy, mainly because most of the annotations are collected by crawling product attribute-level descriptions directly from large online shopping websites. Unlike these datasets, the fine-grained attributes in our dataset are annotated manually by fashion experts. To the best of our knowledge, ours is the only dataset to annotate localized attributes: fashion experts are asked to annotate attributes associated with the segmentation masks labeled by crowd workers. Localized attributes could potentially help computational models detect and understand attributes more accurately.

Fine-grained categorization: Previous studies on fine-grained attribute categorization suffer from several issues, including: (1) repeated attributes belonging to the same category (e.g., zip, zipped, and zipper) [33,21]; (2) basic-level categorization only (object recognition) and a lack of fine-grained categorization [50,4,49,26,25,45,42,43,23,16,48,54,10]; (3) a lack of fashion taxonomies addressing the needs of real-world applications in the fashion industry, possibly due to the research gap between fashion design and computer vision; (4) diverse taxonomy structures from different sources in the fashion domain. To facilitate research in the areas of fashion and computer vision, our proposed ontology is built and verified by fashion experts based on their own design experience and informed by the following four sources: (1) world-leading e-commerce fashion websites (e.g., ZARA, H&M, Gap, Uniqlo, Forever21); (2) luxury fashion brands (e.g., Prada, Chanel, Gucci); (3) trend forecasting companies (e.g., WGSN); (4) academic resources [7,3].

3 Dataset Specification and Collection

3.1 Ontology specification

We propose a unified fashion ontology (Figure 1(f)), a structured vocabulary that utilizes basic-level categories and fine-grained attributes [39]. The Fashionpedia ontology relies on definitions of objects and attributes similar to those of previous well-known image datasets. For example, a Fashionpedia object is similar


[Figure 2 graphic: example images with per-mask attribute lists, e.g. "gown, sequin, floor (length), flare, high waist, lining, zip-up, fair isle, ruched, bead, plain (pattern), asymmetrical, no non-textile material"; panels also report per-image totals, e.g. 53-55 masks and 15-16 categories.]

Fig. 2. Image examples with annotated segmentation masks (a-f) and fine-grained attributes (g-i).

to “item” in Wikidata [46], or “object” in COCO [32] and Visual Genome [28]. In the context of Fashionpedia, objects represent common items in apparel (e.g., jacket, shirt, dress). In this section, we break down each component of the Fashionpedia ontology and illustrate the construction process. With this ontology and our image dataset, a large-scale fashion knowledge graph can be built as an extended application of our dataset (more details can be found in the supplementary material).

Apparel categories. In the Fashionpedia dataset, all images are annotated with one or multiple main garments. Each main garment is also annotated with its garment parts. For example, general garment types such as jacket, dress, and pants are considered main garments. These garments also consist of several garment parts such as collars, sleeves, pockets, buttons, and embroideries. Main garments are divided into three main categories: outerwear, intimates, and accessories. Garment parts also have different types: garment main parts (e.g., collars, sleeves), bra parts, closures (e.g., button, zipper), and decorations (e.g., embroidery, ruffle). On average, each image contains 1 person, 3 main garments, 3 accessories, and 12 garment parts, each delineated by a tight segmentation mask (Figure 1(a-c)). Furthermore, each object is assigned a synset ID in our Fashionpedia ontology.

Fine-grained attributes. Main garments and garment parts can be associated with apparel attributes (Figure 1(e)). For example, “button” is part of the main garment “jacket”; “jacket” can be linked with the silhouette attribute “symmetrical”; and the garment part “button” could carry the attribute “metal” with a relationship of material. The Fashionpedia ontology provides attributes for 13 main outerwear garment categories and for 5 of the 19 garment parts (“sleeve”, “neckline”, “pocket”, “lapel”, and “collar”). Each image has 16.7 attributes on average (max 57 attributes). As with the main garments and garment parts, we canonicalize all attributes to our Fashionpedia ontology.
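As an illustration of how per-mask attributes might be stored, the following COCO-style record is a hypothetical sketch; the field names (in particular "attribute_ids") are assumptions for illustration, not taken verbatim from the released annotation files:

```python
# Hypothetical per-mask annotation record in COCO-like JSON form.
# Field names such as "attribute_ids" are assumptions, not the
# official Fashionpedia schema.
annotation = {
    "id": 1,
    "image_id": 42,
    "category_id": 7,             # e.g., "jacket" in the category list
    "segmentation": [[10, 10, 120, 10, 120, 200, 10, 200]],  # polygon (x, y pairs)
    "bbox": [10, 10, 110, 190],   # [x, y, width, height]
    "attribute_ids": [3, 17, 58], # per-mask fine-grained attributes
}
# Each mask carries 3.7 attributes on average (max 14).
assert len(annotation["attribute_ids"]) <= 14
```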

Relationships. Relationships can be formed between categories and attributes. There are three main types of relationships (Figure 1(e)): (1) outfits to main garments, and main garments to garment parts: a meronymy (part-of) relationship; (2) main garments to attributes, or garment parts to attributes: these relationship types can be garment silhouette (e.g., peplum), collar nickname (e.g., Peter Pan collar), garment length (e.g., knee-length), textile finishing (e.g., distressed), or textile-fabric pattern (e.g., paisley), etc.; (3) within garments, garment parts, or attributes: there is a maximum of four levels of hyponymy (is-an-instance-of) relationships. For example, weft knit is an instance of knit fabric, and fleece is an instance of weft knit.
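The hyponymy chains in (3) can be sketched as a child-to-parent mapping with a transitive lookup; the storage format here is an assumption for illustration:

```python
# Sketch of the hyponymy (is-an-instance-of) chains described above,
# assuming the ontology is stored as a child -> parent mapping.
IS_A = {
    "fleece": "weft knit",
    "weft knit": "knit fabric",
}

def ancestors(node, is_a=IS_A):
    """Walk up the is-a chain (at most four levels in Fashionpedia)."""
    chain = []
    while node in is_a:
        node = is_a[node]
        chain.append(node)
    return chain

print(ancestors("fleece"))  # ['weft knit', 'knit fabric']
```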

3.2 Image Collection and Annotation Pipeline

Image collection. A total of 50,527 images were harvested from Flickr and free-license photo websites, including Unsplash, Burst by Shopify, Freestocks, Kaboompics, and Pexels. Two fashion experts manually verified the quality of the collected images. Specifically, the experts checked the diversity of scenes and made sure clothing items were visible in the images. Fdupes [34] was used to remove duplicated images. After filtering, 48,825 images were left and used to build our Fashionpedia dataset.
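Exact-duplicate removal in the spirit of fdupes can be sketched by grouping files by content hash and keeping one file per group; this is a simplified stand-in, not the actual pipeline:

```python
# Exact-duplicate removal sketch: hash file contents and keep the
# first file seen for each digest (fdupes compares sizes and hashes
# before byte-by-byte checks; this simplification hashes only).
import hashlib

def dedupe(paths, read_bytes):
    """read_bytes(path) -> bytes; returns paths with duplicates dropped."""
    seen, kept = set(), []
    for p in paths:
        digest = hashlib.sha256(read_bytes(p)).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(p)
    return kept

# Illustrative in-memory "files"; b.jpg duplicates a.jpg.
files = {"a.jpg": b"\xff\xd8cat", "b.jpg": b"\xff\xd8cat", "c.jpg": b"\xff\xd8dog"}
print(dedupe(sorted(files), files.__getitem__))  # ['a.jpg', 'c.jpg']
```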

Annotation Pipeline. Expert annotation is often a time-consuming process. To accelerate it, we decoupled the work between crowd workers and experts, dividing the annotation process into the following two phases.

First, segmentation masks of apparel objects were annotated by 28 crowd workers, who were trained for 10 days before the annotation process (with prepared annotation tutorials for each apparel object; see the supplementary material for details). We collected high-quality annotations by having the annotators follow the contours of garments in the image as closely as possible (see Section 4.2 for annotation analysis). This polygon annotation process was monitored daily and verified weekly by a supervisor and by the authors.

Second, 15 fashion experts (graduate students in the apparel domain) were recruited to annotate the fine-grained attributes for the annotated segmentation masks. Annotators were given one mask and one attribute super-category (for example, “textile pattern” or “garment silhouette”) at a time. Two additional options, “not sure” and “not on the list”, were available during annotation. The option “not on the list” indicates that the expert found an attribute that is not in the proposed ontology. If “not sure” is selected, the expert cannot identify the attribute of a mask. Common reasons for this selection include occlusion of the masks and the viewing angle of the image (for example, a top underneath a closed jacket). More details can be found in Figure 2. Each attribute super-category is assigned to one or two fashion experts, depending on the number of masks. The annotations are also checked by another expert annotator before delivery.

We split the data into training, validation, and test sets with 45,623, 1,158, and 2,044 images, respectively. The dataset will be publicly available by the time of paper publication. More details of the dataset creation can be found in the supplementary material.


4 Dataset Analysis

This section presents a detailed analysis of our dataset using the training images. We begin with general image statistics, followed by an analysis of segmentation masks, categories, and attributes. We compare Fashionpedia with four other segmentation datasets: two recent fashion datasets, DeepFashion2 [10] and ModaNet [54], and two general-domain datasets, COCO [32] and LVIS [15].

4.1 Image Analysis

We chose to use high-resolution images during the curation process, since the Fashionpedia ontology includes diverse fine-grained attributes for both garments and garment parts. The Fashionpedia training images have an average dimension of 1710 (width) × 2151 (height). High-resolution images show apparel objects in detail, leading to more accurate and faster annotations for both segmentation masks and attributes. These high-resolution images can also benefit downstream tasks such as detection and image generation. Examples of detailed annotations can be found in Figure 2.

4.2 Mask Analysis

We define a “mask” as one apparel instance, which may have more than one separate component (e.g., the jacket in Figure 1), and a “polygon” as one disjoint area.

Mask quantity. On average, there are 7.3 masks per image (median 7, max 74) in the Fashionpedia training set. Figure 3(a) shows that Fashionpedia has the largest median value among the 5 datasets used for comparison. Fashionpedia also has the widest range among the three fashion datasets and a comparable range with COCO, a general-domain dataset. Compared to the ModaNet and Deepfashion2 datasets, Fashionpedia has the widest mask count distribution; COCO and LVIS maintain wider distributions than Fashionpedia owing to the greater variety of common objects in their datasets. Figure 3(d) illustrates the distribution within the Fashionpedia dataset: one image usually contains more garment parts and accessories than outerwear.
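The per-image mask statistics quoted above can be computed from a flat, COCO-style annotation list; the data below is illustrative, not real Fashionpedia annotations:

```python
# Sketch of computing per-image mask statistics (mean / median / max)
# from a flat list of COCO-style annotations.
from collections import Counter
from statistics import mean, median

annotations = [  # illustrative, not real Fashionpedia data
    {"image_id": 1}, {"image_id": 1}, {"image_id": 1},
    {"image_id": 2}, {"image_id": 2},
]
# Count masks per image, then summarize the counts.
counts = Counter(a["image_id"] for a in annotations).values()
print(mean(counts), median(counts), max(counts))  # 2.5 2.5 3
```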

Mask sizes. Figures 3(b) and 3(e) compare relative mask sizes within Fashionpedia and against other datasets. Ours has a similar distribution to COCO and LVIS, except for a lack of larger masks (area > 0.95). DeepFashion2 has a heavier tail, meaning it contains a larger portion of garments with a zoomed-in view. Unlike DeepFashion2, our images mainly focus on the whole ensemble of clothing. Since ModaNet focuses on outerwear and accessories, it has more masks with a relative area between 0.2 and 0.4, whereas ours has 19 additional apparel part categories. As illustrated in Figure 3(e), garment parts and accessories are relatively small compared to outerwear (e.g., “dress”, “coat”).

Mask quality. Apparel categories also tend to have complex silhouettes. Table 2 shows that the Fashionpedia masks have the most complex boundaries


[Figure 3 panels: (a) mask count per image across datasets (legend [median | max]: DeepFashion2 [2|8], ModaNet [5|16], COCO2017 [4|93], LVIS [6|556], Fashionpedia [7|74]); (b) relative mask size across datasets; (c) category diversity per image across datasets; (d) mask count per image in Fashionpedia (Outerwear [1|6], Garment Parts [3|69], Accessories [2|18]); (e) relative mask size in Fashionpedia; (f) categories and attributes distribution in Fashionpedia.]

Fig. 3. Dataset statistics: the first row presents comparisons among datasets; the second row presents comparisons within Fashionpedia. Y-axes are in log scale. Relative segmentation mask sizes were calculated as in [15] and rounded to a precision of 2. For the mask count per image comparisons (Figures 3(a) and 3(d)), legends follow the [median | max] format and X-axes are in log scale. Values on the X-axis of Figure 3(a) were discretized for better visual effect. Best viewed digitally

amongst the five datasets (according to the measurement used in [15]). This suggests that our masks represent the complex silhouettes of apparel categories more accurately than those of ModaNet and DeepFashion2. We also report the number of vertices per polygon, a measure of the granularity of the produced masks. Table 2 shows that we have the second-highest average number of vertices among the five datasets, next to LVIS.
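As a rough sketch of a boundary-complexity measure (the exact formula of [15,2] may differ), one common choice is the normalized isoperimetric ratio P / sqrt(4πA), which equals 1 for a circle and grows with boundary irregularity:

```python
# Boundary-complexity sketch for a polygon mask. The exact formula
# used in [15,2] is an assumption here; we compute the normalized
# isoperimetric ratio P / sqrt(4*pi*A).
import math

def complexity(vertices):
    """vertices: [(x, y), ...] of a simple polygon."""
    n = len(vertices)
    shoelace, perim = 0.0, 0.0
    for i in range(n):
        (x1, y1), (x2, y2) = vertices[i], vertices[(i + 1) % n]
        shoelace += x1 * y2 - x2 * y1            # shoelace area term
        perim += math.hypot(x2 - x1, y2 - y1)    # edge length
    area = abs(shoelace) / 2.0
    return perim / math.sqrt(4 * math.pi * area)

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(round(complexity(square), 3))  # 1.128, i.e. 2 / sqrt(pi)
```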

4.3 Category and Attributes Analysis

There are 46 apparel categories and 294 attributes in the Fashionpedia dataset. On average, each image was annotated with 7.3 instances, 5.4 categories, and 16.7 attributes. Of all the masks with categories and attributes, each mask has 3.7 attributes on average (max 14 attributes). Fashionpedia has the most diverse set of categories within one image among the three fashion datasets, and is comparable to COCO (Figure 3(c)), since we provide a comprehensive ontology for annotation. In addition, Figure 3(f) shows the distributions of categories and attributes in the training set, highlighting the long-tailed nature of our data.

During the fine-grained attribute annotation process, we also asked the experts to choose “not sure” if they were uncertain, and “not on the list” if they found an attribute that is not provided. The majority of “not sure” selections come from three attribute super-classes, namely “Opening Type”, “Waistline”, and “Length”. Since some masks show only a limited portion of an apparel item (a top


Table 2. Comparison of segmentation mask complexity amongst segmentation datasetsin both fashion and general domain (COCO 2017 instance training data was used).Each statistic (mean and median) represents a bootstrapped 95% confidence intervalfollowing [15]. Boundary complexity was calculated according to [15,2]. Reported maskboundary complexity for COCO and LVIS was different compared with [15] due todifferent image resolution and image sets. The number of vertices per polygon is calcu-lated as the number of vertices in one polygon. Polygon is defined as one disjoint area.Masks (and polygons) with zero area were ignored.

Dataset             Boundary complexity            No. of vertices per polygon      Image count
                    mean          median           mean            median
COCO [32]           6.65 - 6.66   6.07 - 6.08      21.14 - 21.21   15.96 - 16.04    118,287
LVIS [15]           6.78 - 6.80   5.89 - 5.91      35.77 - 35.95   22.91 - 23.09     57,263
ModaNet [54]        5.87 - 5.89   5.26 - 5.27      22.50 - 22.60   18.95 - 19.05     52,377
DeepFashion2 [10]   4.63 - 4.64   4.45 - 4.46      14.68 - 14.75    8.96 -  9.04    191,960
Fashionpedia        8.36 - 8.39   7.35 - 7.37      31.82 - 32.01   20.90 - 21.10     45,623

inside a jacket, for example), the annotators are not sure how to identify those attributes due to occlusion. Less than 15% of the masks for each attribute superclass account for “not on the list”, which illustrates the comprehensiveness of our proposed ontology (see the supplementary material for more details of the extra dataset analysis).

5 Evaluation Protocol and Baselines

5.1 Evaluation metric

In object detection, a true positive (TP) for each category c is defined as a single detected object that matches a ground truth object with an Intersection over Union (IoU) over a threshold τ_IoU. COCO's main evaluation metric uses average precision averaged across all 10 IoU thresholds τ_IoU ∈ [0.5 : 0.05 : 0.95] and all 80 categories. We denote this metric as AP_IoU.

In the case of instance segmentation and attribute localization, we extend the standard COCO metric by adding one more constraint: the macro F1 score for the predicted attributes of a single detected object with category c (see the supplementary material for the averaging choice of the F1 score). We denote the F1 threshold as τ_F1, which has the same range as τ_IoU (τ_F1 ∈ [0.5 : 0.05 : 0.95]). The main metric AP_IoU+F1 reports the average precision score across all 10 IoU thresholds, all 10 macro F1 thresholds, and all categories. Our evaluation API, code, and trained models will be released.
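The extended true-positive criterion can be sketched as follows. The helper names below are ours, not from the released evaluation API: a detection counts as a TP only if it clears both the IoU threshold and the attribute macro-F1 threshold, and AP_IoU+F1 averages over the 10 × 10 grid of (τ_IoU, τ_F1) pairs:

```python
def is_true_positive(iou, attr_f1, tau_iou, tau_f1):
    """A detection matches ground truth only if BOTH thresholds are met."""
    return iou >= tau_iou and attr_f1 >= tau_f1

def threshold_grid(start=0.5, step=0.05, count=10):
    """The 10 thresholds [0.5, 0.55, ..., 0.95] used for tau_IoU and tau_F1."""
    return [round(start + i * step, 2) for i in range(count)]

def tp_rate_over_grid(iou, attr_f1):
    """Fraction of (tau_IoU, tau_F1) settings under which a detection is a TP.

    In the real metric, precision/recall curves are computed per setting
    and averaged; this toy function only illustrates the joint constraint.
    """
    grid = threshold_grid()
    hits = sum(
        is_true_positive(iou, attr_f1, t_iou, t_f1)
        for t_iou in grid
        for t_f1 in grid
    )
    return hits / (len(grid) ** 2)

# Passes 7 of 10 IoU thresholds and 3 of 10 F1 thresholds -> 21/100.
print(tp_rate_over_grid(0.8, 0.6))  # -> 0.21
```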

5.2 Attribute-Mask R-CNN

We develop baseline models to perform two tasks on Fashionpedia: (1) apparel instance segmentation (ignoring attributes) using Mask R-CNN; (2) instance segmentation with attribute localization. For the second task, we present



Fig. 4. Attribute-Mask R-CNN adds a multi-label attribute prediction head upon Mask R-CNN for instance segmentation with attribute localization.

a strong baseline model named Attribute-Mask R-CNN that is built upon Mask R-CNN [17] for Fashionpedia. As illustrated in Figure 4, we extend the existing Mask R-CNN heads to include an additional multi-label attribute prediction head. A sigmoid cross-entropy loss is used for the attribute head. Attribute-Mask R-CNN can be trained end-to-end to jointly perform instance segmentation and localized attribute recognition.
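The attribute head is a multi-label classifier: each of the 294 attributes gets an independent sigmoid, and the loss is binary cross-entropy over those outputs. A minimal NumPy sketch of that loss (our own illustration, not the paper's TensorFlow code; the toy logits and targets are hypothetical):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def multilabel_sigmoid_ce(logits, targets):
    """Mean binary cross-entropy over independent attribute sigmoids.

    logits:  (num_rois, num_attributes) raw scores from the attribute head.
    targets: (num_rois, num_attributes) multi-hot ground-truth attributes.
    """
    p = sigmoid(logits)
    eps = 1e-7  # avoid log(0)
    ce = -(targets * np.log(p + eps) + (1 - targets) * np.log(1 - p + eps))
    return float(ce.mean())

# One RoI with 4 hypothetical attributes; the first two are present.
logits = np.array([[3.0, 2.0, -2.0, -3.0]])
targets = np.array([[1.0, 1.0, 0.0, 0.0]])
print(multilabel_sigmoid_ce(logits, targets))  # small loss: predictions agree
```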

We leverage ResNet-50/101 (R-50/101) [18] with a feature pyramid network (FPN) [31] as the backbone. The input image is resized so that its longer edge is 1024 pixels before being fed to the network. Random horizontal flipping and scale augmentation with a random ratio between [0.8, 1.2] are applied during training. We use an open-sourced TensorFlow [1] codebase1 for implementation, and all models are trained on a single Cloud TPU v3 with a batch size of 64. We follow the 1× training schedule used in Detectron [12], except that the learning rate is set 4× larger using the linear learning rate scaling suggested by Goyal et al. [13].
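The linear scaling rule of Goyal et al. simply scales the learning rate proportionally with the batch size. A sketch, assuming Detectron's 1× defaults of batch size 16 and base learning rate 0.02 (our assumption about the reference configuration, consistent with the 4× factor quoted above):

```python
def linear_scaled_lr(base_lr, base_batch_size, batch_size):
    """Linear learning-rate scaling: lr grows proportionally to batch size."""
    return base_lr * batch_size / base_batch_size

# Batch 64 is 4x the assumed Detectron default of 16, so the lr is 4x larger.
print(linear_scaled_lr(0.02, 16, 64))  # -> 0.08
```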

Table 3. Baseline results of Mask R-CNN and Attribute-Mask R-CNN on Fashionpedia. The big performance gap between AP_IoU and AP_IoU+F1 suggests the challenging nature of our proposed task.

Model                   Backbone     AP_IoU (box)   AP_IoU (mask)   AP_IoU+F1 (box)   AP_IoU+F1 (mask)
Mask R-CNN              R-50-FPN     38.2           34.9            -                 -
Mask R-CNN              R-101-FPN    40.2           35.9            -                 -
Attribute-Mask R-CNN    R-50-FPN     38.5           35.1            26.5              25.3
Attribute-Mask R-CNN    R-101-FPN    40.7           36.4            27.8              26.3

1 https://github.com/tensorflow/tpu/tree/master/models/official/detection


Table 4. Per-super-category results (for masks) using Attribute-Mask R-CNN with an R-101-FPN backbone. We follow the same COCO sub-metrics for the overall results and the three super-categories of apparel objects. The result format follows [AP_IoU / AP_IoU+F1] or [AR_IoU / AR_IoU+F1] (see the supplementary material for per-class results).

Category     AP            AP50          AP75          APl           APm           APs
overall      36.4 / 26.3   53.0 / 35.1   40.8 / 29.9   46.9 / 28.7   33.6 / 22.4   10.7 / 6.40
outerwear    54.7 / 30.5   68.1 / 38.1   62.8 / 35.0   57.2 / 32.0   36.8 / 22.5   9.10 / 4.50
parts        15.1 / 12.8   28.5 / 21.0   14.3 / 13.6   29.1 / 16.1   17.6 / 15.7   7.40 / 5.80
accessory    48.3 / -      72.3 / -      56.4 / -      57.3 / -      54.4 / -      17.1 / -

5.3 Results Discussion

Attribute-Mask R-CNN. From the results in Table 3, we have the following observations: (1) Our baseline models achieve promising performance on the challenging Fashionpedia dataset. (2) Compared with Mask R-CNN, Attribute-Mask R-CNN achieves slightly better performance on both box and mask AP_IoU. This suggests that learning attributes with an additional attribute prediction head is helpful for detection. (3) There is a significant drop (e.g., from 40.7 to 27.8 for R-101-FPN) in box AP if we add τ_F1 as another constraint for a true positive. This is further verified by the per-super-category mask results in Table 4. This suggests that joint instance segmentation and attribute localization is a significantly more difficult task than instance segmentation alone, leaving much room for future improvements.

Main apparel detection analysis. We also provide an in-depth detector analysis following the COCO detection challenge evaluation [32], inspired by Hoiem et al. [20]. Figure 5 illustrates a detailed breakdown of the bounding-box false positives produced by the detectors.

Figure 5(a) and 5(b) compare two detectors trained on Fashionpedia and COCO. Errors of the COCO detector are dominated by imperfect localization (AP increases by 28.3 over the overall AP at τ_IoU = 0.75) and background confusion (+15.7) (Figure 5(b)). Unlike the COCO detector, no particular kind of mistake dominates the errors produced by the Fashionpedia detector. Figure 5(a) shows that there are errors from localization (+8.0), classification (+6.6), and background confusion (+7.8). Due to the space constraint, we leave the super-category analysis to the supplementary material.

Generalization to other fashion datasets. Other fashion datasets such as DeepFashion2 [10] and ModaNet [54] also contain instance segmentation masks. To demonstrate that models trained on Fashionpedia generalize well to other fashion datasets, we conduct instance segmentation transfer learning on DeepFashion2 and ModaNet by fine-tuning a Mask R-CNN (R-101-FPN) pre-trained on Fashionpedia. From the results in Table 5, we can see that Fashionpedia pre-training outperforms ImageNet pre-training. We believe this is because the mask annotations in Fashionpedia have higher quality, so Fashionpedia pre-training provides additional benefits to DeepFashion2 and ModaNet. In addition, the per-


[Figure 5 plots omitted. Legends (area under each curve): (a) Fashionpedia: C75 .408, C50 .530, Loc .610, Sim .655, Oth .676, BG .754, FN 1.00. (b) COCO (2015 challenge winner): C75 .399, C50 .589, Loc .682, Sim .695, Oth .713, BG .870, FN 1.00.]

Fig. 5. Main apparel detector analysis. Each plot shows 7 precision-recall curves, where each evaluation setting is more permissive than the previous. Specifically, C75: strict IoU (τ_IoU = 0.75); C50: PASCAL IoU (τ_IoU = 0.5); Loc: localization errors ignored (τ_IoU = 0.1); Sim: super-category false positives (FPs) removed; Oth: category FPs removed; BG: background (and class confusion) FPs removed; FN: false negatives removed. The two plots compare detectors trained on Fashionpedia and COCO, respectively. The results are averaged over all categories. Legends present the area under each curve (which corresponds to the AP metric) in brackets. Best viewed digitally.

formance of the same Mask R-CNN (R-101-FPN) is much worse on Fashionpedia (Table 3) compared with DeepFashion2 and ModaNet, suggesting that Fashionpedia is a much more challenging benchmark.

Table 5. Transfer learning of instance segmentation using Mask R-CNN (R-101-FPN). Models pre-trained on Fashionpedia achieve better performance compared with ImageNet pre-training.

Dataset        Pre-training   AP (box)      AP (mask)
DeepFashion2   ImageNet       68.3          63.9
DeepFashion2   Fashionpedia   69.7 (+1.4)   65.1 (+1.2)
ModaNet        ImageNet       59.4          55.1
ModaNet        Fashionpedia   62.6 (+3.2)   58.2 (+3.1)

Prediction visualization. Baseline outputs (with both segmentation masks and localized attributes) are visualized in Figure 6. Our Attribute-Mask R-CNN achieves good results even for small objects like shoes and glasses. On one hand, the model can correctly predict fine-grained attributes for some masks (e.g., Figure 6(b)). On the other hand, it also incorrectly predicts the wrong nickname (welt) for a pocket (e.g., Figure 6(c)). These results show that


[Figure 6 prediction examples omitted: per-mask categories (e.g., Jacket, Top, Skirt, Pants, Sleeve, Collar, Neckline, Pocket, Shoe, Glasses) with their predicted localized attributes.]

Fig. 6. Attribute-Mask R-CNN results on the Fashionpedia validation set, using Mask R-CNN with an R-101-FPN backbone. Masks, bounding boxes, and apparel categories (category score > 0.6) are shown. The localized attributes from the top 5 masks (that contain attributes) in each image are also shown. Correctly predicted categories and localized attributes are bolded. Best viewed digitally.

there is headroom remaining for future development of more advanced computer vision models on this task (see the supplementary material for more details of the baseline analysis).

6 Conclusion

In this work, we focus on a new task that unifies instance segmentation and attribute recognition. To solve the challenging problems entailed in this task, we introduced the Fashionpedia ontology and dataset. To the best of our knowledge, Fashionpedia is the first dataset that combines part-level segmentation masks with fine-grained attributes. We presented Attribute-Mask R-CNN, a novel model for this task, along with a novel evaluation metric. We expect models trained on Fashionpedia to be applicable to many applications, including better product recommendation in online shopping, enhanced visual search results, and resolving ambiguous fashion-related words in text queries. We hope Fashionpedia will contribute to advances in fine-grained image understanding in the fashion domain.


7 Acknowledgements

We thank Kavita Bala, Carla Gomes, Dustin Hwang, Rohun Tripathi, Omid Poursaeed, Hector Liu, and Nayanathara Palanivel for their helpful feedback and discussion in the development of the Fashionpedia dataset. We also thank Zeqi Gu, Fisher Yu, Wenqi Xian, Chao Suo, Junwen Bai, Paul Upchurch, Anmol Kabra, and Brendan Rappazzo for their help developing the fine-grained attribute annotation tool.

8 Supplementary Material

In our work we presented the new task of instance segmentation with attribute localization. We introduced a new ontology and dataset, Fashionpedia, to further describe the various aspects of this task. We also proposed a novel evaluation metric and the Attribute-Mask R-CNN model for this task. In the supplementary material, we provide the following items that shed further insight on these contributions:
– More comprehensive experimental results (§ 8.1)
– An extended discussion of the Fashionpedia ontology and potential knowledge graph applications (§ 8.2)
– More details of the dataset analysis (§ 8.3)
– Additional information on the annotation process (§ 8.4)
– Other concerns about Fashionpedia (§ 8.5)
– An Attribute-Mask R-CNN inference demo (§ 8.6)

8.1 Attribute-Mask R-CNN

Per-class evaluation. Fig. 7 presents detailed evaluation results per super-category and per category. In Fig. 7, we follow the same metrics as the COCO leaderboard (AP, AP50, AP75, APl, APm, APs, AR@1, AR@10, AR@100, ARs@100, ARm@100, ARl@100), with τ_IoU and τ_F1 where applicable. Fig. 7 shows that metrics considering both constraints τ_IoU and τ_F1 are always lower than those using τ_IoU alone, across all super-categories and categories. This further demonstrates the challenging aspect of our proposed task.

In general, categories belonging to “garment parts” have a lower AP and AR compared with “outerwear” and “accessories”. Interestingly, the category “shirt, blouse” does not have any predictions, even though there is a relatively large number of instances in this category (see Fig. 8). We hypothesize that the model might be confused by other similar categories such as top, t-shirt, sweatshirt, or sweater. Fig. 9(a) also confirms that one of the main errors of the detectors is class-similarity confusion.

A detailed breakdown of detection errors is presented in Fig. 9 for the super-categories and three main categories. In terms of super-categories in Fashionpedia, “outerwear” errors are dominated by within-super-category class confusion (Fig. 9(a)). Within this super-category class, ignoring localization errors would



Fig. 7. Detailed results (for masks) using Mask R-CNN with an R-101-FPN backbone. We present the same metrics as the COCO leaderboard for the overall categories, three super-categories of apparel objects, and 46 fine-grained apparel categories. We use both constraints (for example, AP_IoU and AP_IoU+F1) where applicable. For categories without attributes, the value represents AP_IoU or AR_IoU. “top” is short for “top, t-shirt, sweatshirt”. “head acc” is short for “headband, head covering, hair accessory”.



Fig. 8. Mask counts per apparel category

only raise the AP slightly, from 68.1 to 69.7 (+1.6). A similar trend can be observed in the class “skirt”, which belongs to “outerwear” (Fig. 9(d)). Detection errors of “parts” (Fig. 9(b), 9(e)) and “accessory” (Fig. 9(c), 9(f)), on the other hand, are dominated by both background confusion and localization. “parts” also has a lower AP in general compared with the other two super-categories. A possible reason is that objects belonging to “parts” usually have smaller sizes and lower counts.

F1 score calculation. Since we measure the F1 score between the predicted attributes and the ground-truth attributes per mask, we consider both the option of multi-label multi-class classification with 294 classes for one instance, and that of binary classification over 294 instances. Multi-label multi-class classification is a straightforward choice, as it is the common setting for most fine-grained classification tasks. In the binary classification scenario, we consider the 1s and 0s of the multi-hot encodings of both the predictions and the ground-truth labels as the positive and negative classes, respectively. There are also two averaging choices: “micro” and “macro”. “Micro” averaging calculates the score globally by counting the total true positives, false negatives, and false positives. “Macro” averaging calculates the metric for each attribute class and reports the unweighted mean. In sum, there are four options for the F1-score averaging method: 1) “micro”, 2) “macro”, 3) “binary-micro”, 4) “binary-macro”.
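The difference between the averaging choices can be made concrete on multi-hot vectors. The sketch below (our own illustration, with a toy prediction matrix) computes micro and macro F1 over attribute classes; the “binary” variants apply the same formulas after recasting each multi-hot entry as a two-class present/absent prediction:

```python
import numpy as np

def f1(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0

def micro_f1(pred, true):
    """Global counts of TP/FP/FN pooled over all attribute classes."""
    tp = np.sum((pred == 1) & (true == 1))
    fp = np.sum((pred == 1) & (true == 0))
    fn = np.sum((pred == 0) & (true == 1))
    return f1(tp, fp, fn)

def macro_f1(pred, true):
    """Per-attribute F1, then an unweighted mean over attribute classes."""
    scores = []
    for j in range(true.shape[1]):
        tp = np.sum((pred[:, j] == 1) & (true[:, j] == 1))
        fp = np.sum((pred[:, j] == 1) & (true[:, j] == 0))
        fn = np.sum((pred[:, j] == 0) & (true[:, j] == 1))
        scores.append(f1(tp, fp, fn))
    return float(np.mean(scores))

# Two masks, three hypothetical attributes; attribute 3 is never predicted.
pred = np.array([[1, 1, 0],
                 [1, 0, 0]])
true = np.array([[1, 1, 1],
                 [1, 1, 1]])
print(micro_f1(pred, true))  # higher: dominated by frequently-correct attributes
print(macro_f1(pred, true))  # lower: the never-predicted attribute drags the mean down
```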

As shown in Fig. 10, we present the AP_IoU+F1 score at a fixed F1 threshold τ_F1, with τ_IoU averaged over the range [0.5 : 0.05 : 0.95]. τ_F1 is increased from 0.0 to 1.0 with an increment of 0.01. Fig. 10 illustrates that as the value of τ_F1 increases, AP_IoU+F1 decreases at different rates depending on the choice of F1 score calculation. There are 294 attributes in total, and an average of 3.7 attributes per mask in the Fashionpedia training data. It is not surprising that “binary-micro” produces high F1 scores in general (higher than 0.97), as the AP_IoU+F1 score only decreases once τ_F1 ≥ 0.97. On the other hand, “macro” averaging in the multi-label multi-class classification scenario gives extremely low F1 scores (0.01 − 0.03). This further demonstrates the room for improvement on the localized attribute classification task. We use “binary-macro” as our main metric.

Result visualization. More baseline results are visualized in Figure 11. It shows that our simple baseline model can detect most of the apparel categories correctly. However, it also sometimes produces false positives. For example, it


[Figure 9 plots omitted. Legends (area under each curve): (a) super-category outerwear: C75 .628, C50 .681, Loc .697, Sim .825, Oth .833, BG .845, FN 1.00; (b) super-category parts: C75 .143, C50 .285, Loc .421, Sim .434, Oth .458, BG .596, FN 1.00; (c) super-category accessory: C75 .564, C50 .723, Loc .786, Sim .799, Oth .827, BG .883, FN 1.00; (d) outerwear: skirt: C75 .741, C50 .788, Loc .795, Sim .894, Oth .894, BG .911, FN 1.00; (e) parts: pocket: C75 .264, C50 .468, Loc .686, Sim .690, Oth .700, BG .851, FN 1.00; (f) accessory: tights, stockings: C75 .719, C50 .855, Loc .866, Sim .868, Oth .877, BG .921, FN 1.00.]

Fig. 9. Main apparel detector analysis. Each plot shows 7 precision-recall curves, where each evaluation setting is more permissive than the previous. Specifically, C75: strict IoU (τ_IoU = 0.75); C50: PASCAL IoU (τ_IoU = 0.5); Loc: localization errors ignored (τ_IoU = 0.1); Sim: super-category false positives (FPs) removed; Oth: category FPs removed; BG: background (and class confusion) FPs removed; FN: false negatives removed. The first row (overall-[supercategory]-[size]) contains results for the three super-categories in Fashionpedia; the second row ([supercategory]-[category]-[size]) shows results for three fine-grained categories (one per super-category). Legends present the area under each curve (which corresponds to the AP metric) in brackets.

segments legs as tights and stockings (Figure 11(f)). A possible reason is that both objects have the same shape and stockings are worn on the legs.

Predicting fine-grained attributes, on the other hand, is a more challenging problem for the baseline model. We summarize several issues: (1) predicting more attributes than needed (Figure 11(a), (b), (c)); (2) failing to distinguish among fine-grained attributes: for example, dropped-shoulder sleeve (ground truth) vs. set-in sleeve (predicted) (Figure 11(e)); (3) other false positives: Figure 11(e) has a double-breasted opening, yet the model predicted it as a zip opening.

These results further show that there is room for improvement and for future development of more advanced computer vision models on this instance segmentation with attribute localization task.

Result visualization on other datasets. Other fashion datasets such as ModaNet and DeepFashion2 also contain instance segmentation masks. Aside from the overall AP results presented in the main paper (see Table 5 of the



Fig. 10. AP_IoU+F1 score at a fixed F1 threshold, for different values of τ_F1. The values presented are averaged over τ_IoU ∈ [0.5 : 0.05 : 0.95]. We use “binary-macro” as our main metric.

main paper), we present a qualitative analysis of the segmentation masks generated on the Fashionpedia (Fig. 11), ModaNet (Fig. 12(a-f)), and DeepFashion2 (Fig. 12(g-l)) datasets. The photos in the first row of Fig. 12 are from ModaNet. They show that the quality of the generated masks on ModaNet is fairly good and comparable to Fashionpedia in general (Fig. 12(a)). We also have a couple of observations about the failure cases: (1) failure to detect apparel objects: for example, the shoe in Fig. 12(c) is not detected, and parts of the pants (Fig. 12(c)) and coat (Fig. 12(d)) are not detected; (2) failure to detect some categories: Fig. 12(e) shows that the shoes on the shoe rack and on the right foot are not detected, possibly due to a lack of such instances in the ModaNet training dataset. Similar to Fashionpedia, ModaNet mostly consists of street-style images. See Fig. 13(b) for example predictions from a model trained on Fashionpedia; (3) close-up images: ModaNet contains mostly full-body images. This might be the reason for the decreased quality of the predicted masks on close-up shots like Fig. 12(f).

For DeepFashion2 (Fig. 12(g,h,k)), the generated segmentation masks tend not to tightly follow the contours of the garments in the images. The main reason is possibly that the average number of vertices per polygon is 14.7 for DeepFashion2, which is lower than for Fashionpedia and ModaNet (see Table 2 in the main text). Our qualitative analysis also shows that: (1) the model generates segmentation masks for pants (Fig. 12(i)) and tops (Fig. 12(j)) that are not visible in the images; both are covered by a jacket, and we find that in DeepFashion2, some parts of garments that are covered by other objects are indeed annotated with segmentation masks; (2) the model performs better on objects that are not on a human body (Fig. 12(l)): DeepFashion2 contains many commercial-customer image pairs (images both with and without a human body) in the training dataset. In contrast, both Fashionpedia and ModaNet contain more images with a human body than without in their training datasets.

Generalization to other image domains. For Fashionpedia, we also run inference on images found on online shopping websites, which usually display a single apparel category, with or without a fashion model. We found that the learned model works reasonably well if the apparel item is worn by a model (Fig. 13).


[Figure 11 prediction examples omitted: per-mask categories (e.g., Jacket, Coat, Dress, Top, Pants, Tights, Sleeve, Collar, Neckline, Pocket, Belt, Bag, Shoe, Glasses, Buckle) with their predicted localized attributes.]

Fig. 11. Baseline results on the Fashionpedia validation set, using R-101-FPN. Masks, bounding boxes, and apparel categories (category score > 0.6) are shown. Attributes from the top 10 masks (that contain attributes) in each image are also shown. Correct predictions of objects and attributes are bolded.

8.2 Fashionpedia Ontology and Knowledge Graph

Fig. 14 presents our Fashionpedia ontology in detail. Utilizing the proposed ontology and the image dataset, a large-scale fashion knowledge graph can be constructed to represent the fashion world at the product level. Fig. 15 illustrates a subset of the Fashionpedia knowledge graph.

Apparel graphs. Integrating the main garments, garment parts, attributes, and relationships presented in one outfit ensemble, we can create an apparel graph representation for each outfit in an image. Each apparel graph is a structured representation of an outfit ensemble containing certain types of garments. Nodes in the graph represent the main garments, garment parts, and attributes. Main garments and garment parts are linked to their respective attributes through different types of relationships. Figure 16 shows more image examples with apparel graphs.
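As a toy illustration of this structure (node and relationship names are ours, chosen for the example, not taken from the released ontology files), an apparel graph can be stored as a simple adjacency map from garments and parts to their parts and attributes:

```python
# A minimal apparel-graph sketch: garments and garment parts as nodes,
# each linked to its attributes via a named relationship.
apparel_graph = {
    "jacket": {
        "has_part": ["sleeve_1", "pocket_1"],
        "has_attribute": ["symmetrical", "single-breasted"],
    },
    "sleeve_1": {"has_attribute": ["wrist-length", "set-in sleeve"]},
    "pocket_1": {"has_attribute": ["flap (pocket)"]},
}

def attributes_of(graph, node):
    """Collect the attributes attached to a garment or garment part."""
    return graph.get(node, {}).get("has_attribute", [])

print(attributes_of(apparel_graph, "jacket"))  # -> ['symmetrical', 'single-breasted']
```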

Fashionpedia knowledge graph. While apparel graphs are localized representations of certain outfit ensembles in fashion images, we can also create a single Fashionpedia knowledge graph (Figure 15). The Fashionpedia knowledge


[Fig. 12 panels (a)-(l): ModaNet (top row) and DeepFashion2 (bottom row)]

Fig. 12. Baseline results on the ModaNet and DeepFashion2 validation sets

(a) (b)

Fig. 13. Generated masks on online-shopping images [53]. (a) and (b) show the same types of shoes in different settings. Our model correctly detects and categorizes the pair of shoes worn by a fashion model, yet mistakenly detects the shoes as a jacket and a bag in (b)

The Fashionpedia knowledge graph is the union of all apparel graphs and includes all main garments, garment parts, attributes, and relationships in the dataset. In this way, we are able to represent and understand fashion images in a more structured way.

We expect our Fashionpedia knowledge graph and the dataset to be applicable to extending existing knowledge graphs (such as WikiData [46]) with novel domain-specific knowledge, improving underlying fashion product recommendation systems, enhancing search engine results for fashion visual search, resolving ambiguous fashion-related words in text search, and more.
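The union of apparel graphs described above can be sketched in plain Python. The triple-based schema below, with CONTAIN and HAS_ATTRIBUTE relation names taken from the figures, is an illustrative simplification, not the released Fashionpedia data format.

```python
# Sketch: merging per-outfit apparel graphs into one knowledge graph.
# Each apparel graph is assumed to be a list of (head, relation, tail)
# triples; repeated edges across outfits are deduplicated by the union.

def merge_apparel_graphs(apparel_graphs):
    """Union of per-image apparel graphs: node set plus deduplicated edge set."""
    nodes, edges = set(), set()
    for graph in apparel_graphs:
        for head, relation, tail in graph:
            nodes.update([head, tail])
            edges.add((head, relation, tail))
    return nodes, edges

outfit_1 = [("jacket", "CONTAIN", "sleeve"), ("jacket", "HAS_ATTRIBUTE", "biker")]
outfit_2 = [("jacket", "CONTAIN", "sleeve"), ("skirt", "HAS_ATTRIBUTE", "circle")]
nodes, edges = merge_apparel_graphs([outfit_1, outfit_2])
print(len(nodes), len(edges))  # 5 3: the shared (jacket, CONTAIN, sleeve) edge is deduplicated
```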

8.3 Dataset Analysis

Fig. 16 shows more annotation examples, represented as exploded views of annotation diagrams. Table 6 displays details about the "not sure" and "not on the list" results from the attribute annotation process. We present the results per super-category of attributes. The label "not sure" means the expert annotator is uncertain about the choice given the segmentation mask. "Not on the list" means the annotator is certain that the given mask presents another attribute


(a) Categories (b) Attributes (partially)

Fig. 14. Apparel categories (a) and fine-grained attributes (b) hierarchy in Fashionpedia

that is not present in the Fashionpedia ontology. Other than "nicknames" (the specific names for certain apparel categories), the "not on the list" category accounts for less than 6% of the total masks.

Fig. 17 and 18 also compare Fashionpedia and other image datasets in terms of image size and vertices per polygon.

We compare image resolutions between Fashionpedia and four other segmentation datasets (COCO and LVIS share the same images). Fig. 17 shows that images in Fashionpedia have the most diverse widths and heights, while ModaNet has the most consistent image resolutions. Note that high-resolution images burden the data loading process during training. With that in mind, we will release our dataset in both resized and original versions.

We also report the distribution of the number of vertices per polygon in Fig. 18. This measures the annotation effort in mask annotation. Fashionpedia has the second-widest range, next to LVIS.
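Assuming the annotations follow the COCO polygon convention, where each segmentation polygon is stored as a flat [x1, y1, x2, y2, ...] list, the per-polygon vertex counts behind Fig. 18 can be computed as below. The tiny `annotations` list is made up for illustration.

```python
# Sketch: counting vertices per polygon from COCO-style annotations.
# Each polygon is a flat list of coordinates, so the vertex count is
# half its length.

def vertices_per_polygon(annotations):
    counts = []
    for ann in annotations:
        for polygon in ann["segmentation"]:
            counts.append(len(polygon) // 2)  # (x, y) pairs
    return counts

annotations = [
    {"segmentation": [[0, 0, 10, 0, 10, 10, 0, 10]]},           # 4-vertex square
    {"segmentation": [[0, 0, 5, 0, 5, 5], [1, 1, 2, 2, 1, 2]]},  # two triangles
]
print(vertices_per_polygon(annotations))  # [4, 3, 3]
```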

8.4 Fashionpedia dataset creation details

Image collection. To avoid photo bias, all the images were randomly collected from Flickr and free-license stock photo websites (Unsplash, Burst by Shopify, Free stocks, Kaboompics, and Pexels). The collected images were further verified manually by two fashion experts. Specifically, they checked the scene diversity and made sure the clothing items were visible and annotatable in the images. The estimated image type breakdown is as follows: street style images (30%


[Fig. 15 visualization: nodes are 20 annotated products (product_1-product_20), their apparel categories (Jacket / Blazer, Coat, Shorts, Skirt, Tops / T-Shirt, Sweatshirt, Dress, Bolero), and fine-grained attributes (e.g. Set-in sleeve, Regular (fit), Plain (pattern), Biker (Jacket), Skater (Skirt), Fly (Opening), Fur, Hip (length)); edges are CONTAIN and HAS_ATTRIBUTE relations]

Fig. 15. Fashionpedia Knowledge Graph: we present a subset of the Fashionpedia knowledge graph by aggregating 20 annotated products. The knowledge graph can be used as a tool for generating structural information

Table 6. Percentage of attributes in Fashionpedia broken down by super-class. "Tex finish, manu-tech." is short for "Textile finishing, Manufacturing techniques". Summaries of "not sure" and "not on the list" during attribute annotation are also presented, calculated as the counts divided by the total number of masks with attributes. "Not sure" is mainly due to occlusion inside the images, which causes some super-classes (such as waistline, opening type, and length) to be unidentifiable. The percentage of "not on the list" is less than 15%. This demonstrates the comprehensiveness of our Fashionpedia ontology

Super-category          class count   not sure   not on the list

Length                   15           12.79%      0.01%
Nickname                153            9.15%     12.76%
Opening Type             10           32.69%      3.90%
Silhouettes              25            2.90%      0.27%
Tex finish, manu-tech    21            4.47%      1.34%
Textile Pattern          24            2.18%      5.30%
None-Textile Type        14            4.90%      4.07%
Neckline                 25            9.57%      3.38%
Waistline                 7           30.46%      0.17%

of the full dataset), celebrity event images (30%), runway show images (30%), and online shopping product images (10%). As for gender distribution, 80% of the images depict women and 20% depict men.

We also aimed to address the issue of photographic bias in the image collection process. Our dataset includes images that are not centered, are not full shots, and contain occlusion (see examples in Fig. 19). Furthermore, our focus during the image collection process was to identify clothing items, not people.

Crowd workers and 10-day training for mask annotation. So that all annotators share the same apparel vocabulary, we prepared a detailed tutorial (with text descriptions and image examples) for each category and attribute in the Fashionpedia ontology (see Fig. 20 for an example). Before the official annotation process, we spent 10 days training the 28 crowd workers, for the following three main reasons.


[Fig. 16 annotation diagrams: outfit ensembles (Jacket, Dress, Coat, Pants, Tops, Shorts, Shoes, Glasses, Scarf, Cape, Pantyhose) linked through "part of" relationships to garment parts (collar, sleeve, pocket, lapel, neckline, ruffle) and localized attributes (e.g. fly (opening), ankle length, normal waist, plain, double breasted, loose (fit), check / plaid, symmetrical, ruched, wrapping, printed, washed, frayed); attribute super-classes shown: textile finishing, textile pattern, silhouette, opening type, length, nickname, waistline]

Fig. 16. Example images and annotations from our dataset: the images are annotated with both instance segmentation masks and fine-grained attributes (black boxes)


Fig. 17. Image size (diagonal length) comparison among Fashionpedia, ModaNet, DeepFashion2, COCO2017, and LVIS. Only training images are shown. The Fashionpedia images have the most diverse resolutions. Note that COCO2017 and LVIS have higher-resolution images for annotation; the distributions presented here are for the publicly available photos


Fig. 18. The number of vertices per polygon. This represents the quality of masks and the effort of annotators. Values on the x-axis were discretized for better visual effect; the y-axis is on a log scale. Fashionpedia has the second-widest range, next to LVIS

First, some apparel categories are commonly referred to by other names. For example, "top" is a general term for "shirt", "sweater", "t-shirt", and "sweatshirt", so some annotators can mistakenly annotate a "shirt" as a "top". We need to train these workers so they share the same understanding of the proposed Fashionpedia ontology. Utilizing the prepared tutorials (see Fig. 20 for an example), we trained annotators on how to distinguish among different apparel categories.

Second, there are fine-grained differences among apparel categories. For example, we observed that some workers initially had difficulty understanding the difference between certain garment parts, such as tassel and fringe. To help them understand these differences, we asked them to practice on additional sample images before the annotation process. Fig. 21 shows our tutorials for these two categories, with specific examples of correct and wrong annotations.

Third, we require high-quality annotations. In particular, we ask the annotators to follow the contours of garments in the images as closely as possible. The polygon annotation process was monitored and verified for a few days before the workers started the actual annotation.

Quality control of debatable apparel categories. During the annotation process, we allowed the annotators to ask questions about uncertain categories. Two fashion experts monitored the annotation process by answering


Fig. 19. Examples of different types (people position/gesture, full/half shot, occlusion, scenes, garment types, etc.) of images in the Fashionpedia dataset

questions, checking the final annotation quality, and providing weekly feedback to annotators.

Instead of asking annotators to rate their confidence level for each segmentation mask, we asked them to send all uncertain masks back to us during annotation. The same two fashion experts made the final judgement and gave feedback to the workers on these debatable or unsure fashion categories. Some examples of debatable or fuzzy fashion items that we have documented can be found in Figure 22.

8.5 Other concerns and thoughts

Does this dataset include the images or labels of previous datasets? We only include previous datasets for comparison. Our dataset does not intentionally use any images or labels from previous datasets. All the images and labels in Fashionpedia are newly collected and annotated.

Who were the fashion experts annotating localized attributes in the Fashionpedia dataset? The fashion experts are 15 fashion graduate students whom we recruited from one of the top fashion design institutes. Due to the double-blind policy, we cannot mention the name of the university, but we will release the name of the university and the collaborators from it in the final version of this paper.

Instance segmentation vs. semantic segmentation. We did not conduct semantic segmentation experiments on our dataset for the following two reasons: 1) Although semantic segmentation is a useful task, we believe instance segmentation is more meaningful for fashion images. For example, to distinguish the different shoe styles in a fashion image containing three pairs of different shoes, instance segmentation (Figure 23(a)) can help us distinguish each shoe separately, whereas semantic segmentation (Figure 23(b)) will mix all the shoe instances together. 2) Semantic segmentation is a sub-problem of instance segmentation: if we merge the detections of the same object class in our instance segmentation results, we obtain semantic segmentation results.
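The class-merging step can be sketched with NumPy. This is a minimal illustration, not the paper's evaluation code: the mask array layout, the score-based tie-breaking for overlaps, and the toy inputs are assumptions.

```python
# Sketch: deriving a semantic segmentation map by merging instance masks of
# the same class. Higher-scoring instances are painted last, so they win
# any overlapping pixels.
import numpy as np

def instances_to_semantic(masks, class_ids, scores, background=0):
    """masks: (N, H, W) binary arrays; class_ids, scores: length-N sequences."""
    semantic = np.full(masks.shape[1:], background, dtype=np.int32)
    for i in np.argsort(scores):  # paint low-confidence instances first
        semantic[masks[i].astype(bool)] = class_ids[i]
    return semantic

masks = np.zeros((2, 2, 4), dtype=np.uint8)
masks[0, :, :2] = 1  # one instance of class 5 (e.g. left shoe)
masks[1, :, 2:] = 1  # another instance of class 5 (e.g. right shoe)
semantic = instances_to_semantic(masks, class_ids=[5, 5], scores=[0.9, 0.8])
print(semantic.tolist())  # both shoe instances collapse into one class-5 region
```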


Fig. 20. Annotation tutorial example for shirt and top

Which image has the most annotated masks? In the Fashionpedia dataset, the maximum number of segmentation masks in an image is 74 (Fig. 24). We observe that most of the masks in this image are "rivets" (which belong to garment parts).
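Finding such an image reduces to counting annotations per image in a COCO-style annotation file. The field names below mirror the COCO format; the tiny dict stands in for instances_attributes_train2019.json.

```python
# Sketch: find the image with the most segmentation masks by counting
# COCO-style annotation records per image_id.
from collections import Counter

def image_with_most_masks(coco_json):
    counts = Counter(ann["image_id"] for ann in coco_json["annotations"])
    image_id, n_masks = counts.most_common(1)[0]
    return image_id, n_masks

# Toy stand-in for the real annotation file.
coco_json = {"annotations": [{"image_id": 27}] * 74 + [{"image_id": 3}] * 10}
print(image_with_most_masks(coco_json))  # (27, 74)
```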

What's the difference between Fashionpedia and other fine-grained datasets like CUB-200? We propose to localize fine-grained attributes within segmentation masks of images; to the best of our knowledge, this is a novel task with real-world applications. The differences between Fashionpedia and CUB are as follows: 1) CUB uses keypoint annotations to indicate different locations on birds, while Fashionpedia has segmentation masks of garments, garment parts, and accessories; 2) Fashionpedia attributes are associated with garment or garment-part instances in images, whereas CUB provides global attributes not associated with any specific keypoints.

8.6 Attribute-Mask R-CNN inference demo

Fig. 25 shows the inference code and results of Attribute-Mask R-CNN. The inference code and model are available at: https://fashionpedia.github.io/home/Model_and_API.html


What is Fringe? Correct & Wrong way to trace Fringe

What is Tassel? Correct & Wrong way to trace Tassel

Fig. 21. Annotation tutorial for fringe and tassel

Fig. 22. Examples of debatable fashion items in the Fashionpedia dataset. The questions were asked by the crowd workers; the answers were provided by two fashion experts

8.7 Fashionpedia API

The Fashionpedia API is available at: https://fashionpedia.github.io/home/Model_and_API.html.

8.8 iMat-Fashion Kaggle challenges

To advance the state of the art in visual analysis of clothing, we hosted two challenges (iMaterialist-Fashion) on Kaggle, in 2019 and 2020 respectively. The challenge links are: https://www.kaggle.com/c/imaterialist-fashion-2019-FGVC6 and https://www.kaggle.com/c/imaterialist-fashion-2020-fgvc7.


Fig. 23. Instance segmentation (left) and semantic segmentation (right)

References

1. Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., Isard, M., et al.: Tensorflow: A system for large-scale machine learning. In: OSDI (2016) 11

2. Attneave, F., Arnoult, M.D.: The quantitative study of shape and pattern perception. Psychological bulletin (1956) 10

3. Bloomsbury.com: Fashion photography archive, retrieved May 9, 2019 from https://www.bloomsbury.com/dr/digital-resources/products/fashion-photography-archive/ 5

4. Bossard, L., Dantone, M., Leistner, C., Wengert, C., Quack, T., Van Gool, L.: Apparel classification with style. In: ACCV (2012) 4, 5

5. Farhadi, A., Endres, I., Hoiem, D., Forsyth, D.: Describing objects by their attributes. In: CVPR (2009) 2

6. FashionAI: Retrieved May 9, 2019 from http://fashionai.alibaba.com/ 4, 5

7. Fashionary.org: Fashionpedia - the visual dictionary of fashion design, retrieved May 9, 2019 from https://fashionary.org/products/fashionpedia 5

8. Ferrari, V., Zisserman, A.: Learning visual attributes. In: Advances in neural information processing systems (2008) 2

9. Fu, C.Y., Berg, T.L., Berg, A.C.: Imp: Instance mask projection for high accuracy semantic segmentation of things. arXiv preprint arXiv:1906.06597 (2019) 5

10. Ge, Y., Zhang, R., Wang, X., Tang, X., Luo, P.: Deepfashion2: A versatile benchmark for detection, pose estimation, segmentation and re-identification of clothing images. In: CVPR (2019) 2, 4, 5, 8, 10, 12

11. Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: CVPR (2014) 2

12. Girshick, R., Radosavovic, I., Gkioxari, G., Dollar, P., He, K.: Detectron. https://github.com/facebookresearch/detectron (2018) 11

13. Goyal, P., Dollar, P., Girshick, R., Noordhuis, P., Wesolowski, L., Kyrola, A., Tulloch, A., Jia, Y., He, K.: Accurate, large minibatch sgd: Training imagenet in 1 hour. arXiv preprint arXiv:1706.02677 (2017) 11

14. Guo, S., Huang, W., Zhang, X., Srikhanta, P., Cui, Y., Li, Y., Adam, H., Scott, M.R., Belongie, S.: The imaterialist fashion attribute dataset. In: ICCV Workshops (2019) 1, 4, 5

15. Gupta, A., Dollar, P., Girshick, R.: Lvis: A dataset for large vocabulary instance segmentation. In: CVPR (2019) 3, 8, 9, 10

16. Han, X., Wu, Z., Huang, P.X., Zhang, X., Zhu, M., Li, Y., Zhao, Y., Davis, L.S.: Automatic spatially-aware fashion concept discovery. In: ICCV (2017) 4, 5

17. He, K., Gkioxari, G., Dollar, P., Girshick, R.: Mask r-cnn. In: ICCV (2017) 2, 11

18. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR (2016) 11

19. He, R., McAuley, J.: Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering. In: WWW (2016) 4, 5


20. Hoiem, D., Chodpathumwan, Y., Dai, Q.: Diagnosing error in object detectors. In: ECCV (2012) 12

21. Hsiao, W.L., Grauman, K.: Learning the latent look: Unsupervised discovery of a style-coherent embedding from fashion images. In: ICCV (2017) 4, 5

22. Huang, J., Feris, R., Chen, Q., Yan, S.: Cross-domain image retrieval with a dual attribute-aware ranking network. In: ICCV (2015) 4, 5

23. Inoue, N., Simo-Serra, E., Yamasaki, T., Ishikawa, H.: Multi-label fashion image classification with minimal human supervision. In: ICCV (2017) 4, 5

24. Kendall, E.F., McGuinness, D.L.: Ontology engineering. Synthesis Lectures on The Semantic Web: Theory and Technology (2019) 3

25. Kiapour, M.H., Han, X., Lazebnik, S., Berg, A.C., Berg, T.L.: Where to buy it: Matching street clothing photos in online shops. In: ICCV (2015) 4, 5

26. Kiapour, M.H., Yamaguchi, K., Berg, A.C., Berg, T.L.: Hipster wars: Discovering elements of fashion styles. In: ECCV (2014) 4, 5

27. Kirillov, A., He, K., Girshick, R., Rother, C., Dollar, P.: Panoptic segmentation. In: CVPR (2019) 4

28. Krishna, R., Zhu, Y., Groth, O., Johnson, J., Hata, K., Kravitz, J., Chen, S., Kalantidis, Y., Li, L.J., Shamma, D.A., et al.: Visual genome: Connecting language and vision using crowdsourced dense image annotations. IJCV (2017) 3, 4, 5

29. Kuan, K., Ravaut, M., Manek, G., Chen, H., Lin, J., Nazir, B., Chen, C., Howe, T.C., Zeng, Z., Chandrasekhar, V.: Deep learning for lung cancer detection: tackling the kaggle data science bowl 2017 challenge. arXiv preprint arXiv:1705.09435 (2017) 1

30. Kumar, N., Berg, A.C., Belhumeur, P.N., Nayar, S.K.: Attribute and simile classifiers for face verification. In: ICCV (2009) 2

31. Lin, T.Y., Dollar, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: CVPR (2017) 11

32. Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollar, P., Zitnick, C.L.: Microsoft coco: Common objects in context. In: ECCV (2014) 3, 5, 8, 10, 12

33. Liu, Z., Luo, P., Qiu, S., Wang, X., Tang, X.: Deepfashion: Powering robust clothes recognition and retrieval with rich annotations. In: CVPR (2016) 1, 4, 5

34. Lopez, A.: Fdupes is a program for identifying or deleting duplicate files residing within specified directories, retrieved May 9, 2019 from https://github.com/adrianlopezroche/fdupes 7

35. Mall, U., Matzen, K., Hariharan, B., Snavely, N., Bala, K.: Geostyle: Discovering fashion trends and events. In: ICCV (2019) 5

36. Matzen, K., Bala, K., Snavely, N.: StreetStyle: Exploring world-wide clothing styles from millions of photos. arXiv preprint arXiv:1706.01869 (2017) 4, 5

37. Parikh, D., Grauman, K.: Relative attributes. In: ICCV (2011) 2

38. Ren, S., He, K., Girshick, R., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. In: Advances in neural information processing systems (2015) 2

39. Rosch, E.: Cognitive representations of semantic categories. Journal of experimental psychology: General (1975) 2, 5

40. Rubio, A., Yu, L., Simo-Serra, E., Moreno-Noguer, F.: Multi-modal embedding for main product detection in fashion. In: ICCV (2017) 4, 5

41. Simo-Serra, E., Fidler, S., Moreno-Noguer, F., Urtasun, R.: Neuroaesthetics in fashion: Modeling the perception of fashionability. In: CVPR (2015) 4, 5

42. Simo-Serra, E., Ishikawa, H.: Fashion style in 128 floats: Joint ranking and classification using weak data for feature extraction. In: CVPR (2016) 4, 5


43. Takagi, M., Simo-Serra, E., Iizuka, S., Ishikawa, H.: What makes a style: Experimental analysis of fashion prediction. In: ICCV (2017) 4, 5

44. Van Horn, G., Mac Aodha, O., Song, Y., Cui, Y., Sun, C., Shepard, A., Adam, H., Perona, P., Belongie, S.: The inaturalist species classification and detection dataset. In: CVPR (2018) 2, 4

45. Vittayakorn, S., Yamaguchi, K., Berg, A.C., Berg, T.L.: Runway to realway: Visual analysis of fashion. In: WACV (2015) 4, 5

46. Vrandecic, D., Krotzsch, M.: Wikidata: a free collaborative knowledge base (2014) 5, 21

47. Wah, C., Branson, S., Welinder, P., Perona, P., Belongie, S.: The Caltech-UCSD Birds-200-2011 Dataset. Tech. Rep. CNS-TR-2011-001, California Institute of Technology (2011) 2, 4

48. Xiao, H., Rasul, K., Vollgraf, R.: Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747 (2017) 4, 5

49. Yamaguchi, K., Berg, T.L., Ortiz, L.E.: Chic or social: Visual popularity analysis in online fashion networks. In: ACM MM (2014) 4, 5

50. Yamaguchi, K., Kiapour, M.H., Ortiz, L.E., Berg, T.L.: Parsing clothing in fashion photographs. In: CVPR (2012) 4, 5

51. Yu, A., Grauman, K.: Semantic jitter: Dense supervision for visual comparisons via synthetic images. In: ICCV (2017) 4, 5

52. Yu, F., Xian, W., Chen, Y., Liu, F., Liao, M., Madhavan, V., Darrell, T.: Bdd100k: A diverse driving video database with scalable annotation tooling. arXiv preprint arXiv:1805.04687 (2018) 1

53. ZARA.com: Zara white leather flat ankle boots with top stitching size 5 bnwt, retrieved May 9, 2019 from https://www.zara.com 21

54. Zheng, S., Yang, F., Kiapour, M.H., Piramuthu, R.: Modanet: A large-scale street fashion dataset with polygon annotations. In: ACM MM (2018) 2, 4, 5, 8, 10, 12


instances_attributes_train2019.json, image 27 has 74 masks

Original image / Original image with masks

Detailed info of each mask with associated localized attributes:

Fig. 24. The image with 74 masks in the Fashionpedia dataset


     detection_classes = np.squeeze(detection_classes.astype(np.int32), axis=(0,))[0:num_detections]
     detection_masks = np.squeeze(detection_masks, axis=(0,))[0:num_detections]
     detection_logits = np.squeeze(detection_logits, axis=(0,))[0:num_detections]
     attribute_logits = np.squeeze(attribute_logits, axis=(0,))[0:num_detections]

     # include attributes
     attributes = []
     for i in range(num_detections):
         prob = softmax(attribute_logits[i, :])
         attributes.append([j for j in range(len(prob)) if prob[j] > score_th])

[5]: max_boxes_to_draw = 10
     linewidth = 2
     fontsize = 10
     line_alpha = 0.8
     mask_alpha = 0.5
     output_image_path = 'results.pdf'

     image = copy.deepcopy(image_raw)
     cm_subsection = np.linspace(0., 1., min(max_boxes_to_draw, len(detection_scores)))
     colors = [cm.jet(x) for x in cm_subsection]

     plt.figure()
     fig, ax = plt.subplots(1)
     for i in range(len(detection_scores) - 1, -1, -1):
         if i < max_boxes_to_draw:
             # draw segmentation mask
             seg_mask = detection_masks[i, :, :]
             color = list(np.array(colors[i][:3]) * 255)
             pil_image = Image.fromarray(image)
             solid_color = np.expand_dims(
                 np.ones_like(seg_mask), axis=2) * np.reshape(color, [1, 1, 3])
             pil_solid_color = Image.fromarray(np.uint8(solid_color)).convert('RGBA')
             pil_mask = Image.fromarray(np.uint8(255.0 * mask_alpha * seg_mask)).convert('L')
             pil_image = Image.composite(pil_solid_color, pil_image, pil_mask)
             image = np.array(pil_image.convert('RGB')).astype(np.uint8)

             # draw bbox
             top, left, bottom, right = detection_boxes[i, :]
             width = right - left
             height = bottom - top
             bbox = patches.Rectangle((left, top), width, height,
                                      linewidth=linewidth, edgecolor=colors[i],
                                      facecolor='none', alpha=line_alpha)
             ax.add_patch(bbox)

             # draw text
             attributes_str = ", ".join([val['attributes'][attr]['name'] for attr in attributes[i]])
             detections_str = '{} ({}%)'.format(val['categories'][detection_classes[i]]['name'],
                                                int(100 * detection_scores[i]))
             display_str = '{}: {}'.format(detections_str, attributes_str)

             font = ImageFont.truetype('arial.ttf', fontsize)
             text_width, text_height = font.getsize(detections_str)
             props = dict(boxstyle='Round, pad=0.05', facecolor=colors[i],
                          linewidth=0, alpha=mask_alpha)
             ax.text(left, bottom, detections_str, fontsize=fontsize,
                     verticalalignment='top', bbox=props)
             print(display_str)

     plt.imshow(image, interpolation='none')
     plt.axis('off')
     plt.savefig(output_image_path, transparent=True, bbox_inches='tight', pad_inches=0.05)

pocket (95%): patch (pocket), slash (pocket), curved (pocket), flap (pocket)
buckle (98%): curved (pocket)
pocket (99%): patch (pocket), welt (pocket), slash (pocket), curved (pocket), flap (pocket)
pants (99%): maxi (length), fly (opening), no non-textile material, no special manufacturing technique, plain (pattern)
collar (99%): shirt (collar)
glasses (99%):
lapel (99%): symmetrical, notched (lapel), single breasted
belt (99%): plain (pattern)
sleeve (99%): wrist-length, set-in sleeve
sleeve (99%): wrist-length, set-in sleeve

<Figure size 432x288 with 0 Axes>
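The per-detection attribute lists printed above come from the demo's post-processing loop: each detection's attribute logits are softmaxed, and every index whose probability clears score_th is kept. A self-contained sketch of that step on a hand-made logits row (the shapes, values, and threshold here are illustrative, not taken from the model):

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D logits vector.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def select_attributes(attribute_logits, score_th=0.05):
    # One row of logits per detection; keep attribute indices above threshold.
    selected = []
    for logits in attribute_logits:
        prob = softmax(logits)
        selected.append([j for j, p in enumerate(prob) if p > score_th])
    return selected

# One detection, three candidate attributes: the large logit dominates.
logits = np.array([[4.0, 0.0, 0.0]])
print(select_attributes(logits, score_th=0.05))  # → [[0]]
```

Note that a softmax spreads probability mass across attributes, so with many candidate attributes a low threshold like 0.05 is what allows multiple attributes per mask to survive.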


[1]: import warnings
     warnings.filterwarnings('ignore')

[2]: %matplotlib inline
     from matplotlib import cm
     import matplotlib.patches as patches
     import matplotlib.pyplot as plt
     from PIL import Image
     import numpy as np
     import PIL.ImageFont as ImageFont
     import copy
     import json
     import sys

     from pycocotools import mask
     from scipy.special import softmax

     import tensorflow as tf

[3]: session = tf.Session(graph=tf.Graph())
     saved_model_dir = 'model'
     _ = tf.saved_model.loader.load(session, ['serve'], saved_model_dir)

[4]: val = json.load(open('val2020.json'))
     n_class = len(val['categories'])
     score_th = 0.05

     image_path = 'input.jpg'
     with open(image_path, 'rb') as f:
         np_image_string = np.array([f.read()])
     image_raw = Image.open(image_path)
     width, height = image_raw.size
     image_raw = np.array(image_raw.getdata()).reshape(height, width, 3).astype(np.uint8)
     plt.imshow(image_raw, interpolation='none')
     plt.axis('off')

     num_detections, detection_boxes, detection_classes, detection_scores, \
         detection_masks, detection_logits, attribute_logits, image_info = session.run(
             ['NumDetections:0', 'DetectionBoxes:0', 'DetectionClasses:0',
              'DetectionScores:0', 'DetectionMasks:0', 'DetectionLogits:0',
              'AttributeLogits:0', 'ImageInfo:0'],
             feed_dict={'Placeholder:0': np_image_string})
     num_detections = np.squeeze(num_detections.astype(np.int32), axis=(0,))
     detection_boxes = np.squeeze(detection_boxes * image_info[0, 2], axis=(0,))[0:num_detections]
     detection_scores = np.squeeze(detection_scores, axis=(0,))[0:num_detections]
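Cell [4] above applies one post-processing pattern to every output tensor: squeeze away the batch dimension, rescale boxes to image coordinates via image_info, and truncate the padded arrays to the first num_detections entries. A minimal sketch of that pattern on synthetic arrays (the shapes, a batch of one padded to a fixed maximum, and the scalar scale factor are assumptions based on the demo's saved-model outputs):

```python
import numpy as np

def trim_outputs(num_detections, detection_scores, detection_boxes, scale):
    # Drop the batch dimension, rescale boxes, keep only the valid detections.
    n = int(np.squeeze(num_detections))
    scores = np.squeeze(detection_scores, axis=0)[:n]
    boxes = np.squeeze(detection_boxes * scale, axis=0)[:n]
    return n, scores, boxes

# Batch of 1, padded to 4 detections, of which only 2 are valid.
num = np.array([2.0])
scores = np.array([[0.9, 0.8, 0.0, 0.0]])
boxes = np.ones((1, 4, 4))
n, s, b = trim_outputs(num, scores, boxes, scale=2.0)
print(n, s.shape, b.shape)  # 2 valid detections remain, boxes scaled by 2
```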



Fig. 25. Attribute-Mask R-CNN inference demo