R2 COMPONENT AND PIPELINE SUPPORT SERVICES
Human-enhanced time-aware multimedia search

CUBRIK
Project IST-287704
Deliverable D8.2 WP8
Deliverable Version 1.0 – 31/07/2013
Document ref.: cubrik.D82.FRH.WP8.V1.0



CUBRIK R2 Component and Pipeline Support Services D8.2 Version 1.0

Programme Name: IST
Project Number: 287704
Project Title: CUBRIK
Partners: Coordinator: ENG (IT)

Contractors: UNITN, TUD, QMUL, LUH, POLMI, CERTH, NXT, MICT, ATN, FRH, INN, HOM, CVCE, EIPCM, EMP

Document Number: cubrik.D82.FRH.WP8.V1.0.doc
Work-Package: WP8
Deliverable Type: Accompanying document
Contractual Date of Delivery: 31 July 2013
Actual Date of Delivery: 31 July 2013
Title of Document: R2 Component and Pipeline Support Services
Author(s): Christina Weigel (FRH)
Contributor(s): Marco Tagliasacchi (POLMI), Chiara Pasini (POLMI), Theodoros Semertzidis (CERTH), Markus Brenner (QMUL), Mathias Otto (EMP), Martha Larson (TUD)
Approval of this report:
Summary of this report:
Keyword List: content processing, components, pipelets, evaluation
Availability: This report is public

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

This work is partially funded by the EU under grant IST-FP7-287704


Disclaimer

This document contains confidential information in the form of the CUbRIK project findings, work and products and its use is strictly regulated by the CUbRIK Consortium Agreement and by Contract no. FP7- ICT-287704.

Neither the CUbRIK Consortium nor any of its officers, employees or agents shall be responsible or liable in negligence or otherwise howsoever in respect of any inaccuracy or omission herein.

The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7-ICT-2011-7) under grant agreement n° 287704.

The contents of this document are the sole responsibility of the CUbRIK consortium and can in no way be taken to reflect the views of the European Union.


Table of Contents

EXECUTIVE SUMMARY 5
1. COMPONENTS FOR CONTENT ANALYSIS 6
  1.1 FACE DETECTION 6
    1.1.1 Identity Card 6
  1.2 FACE IDENTIFICATION 7
    1.2.1 Identity Card 7
  1.3 GENERIC VISUAL FEATURE EXTRACTOR 8
    1.3.1 Identity Card 8
  1.4 DESCRIPTORS EXTRACTOR 9
    1.4.1 Identity Card 9
    1.4.2 Comments 9
  1.5 ACCESSIBILITY-RELATED DESCRIPTORS EXTRACTOR 10
    1.5.1 Identity Card 10
    1.5.2 Comments 10
  1.6 VIDEO SEGMENT MATCHER 11
    1.6.1 Identity Card 11
  1.7 IMAGE SIMILARITY DETECTION 12
    1.7.1 Identity Card 12
  1.8 BODY PART DETECTOR 13
    1.8.1 Identity Card 13
  1.9 LIKELINES - IMPLICIT FILTER FEEDBACK 14
    1.9.1 Identity Card 14
    1.9.2 Comments 14
2. COMPONENTS FOR CRAWLING 16
  2.1 MEDIA HARVESTING AND UPLOAD 16
    2.1.1 Identity Card 16
    2.1.2 Comments 16
  2.2 COPYRIGHT AWARE CRAWLER 17
    2.2.1 Identity Card 17
  2.3 IMAGE EXTRACTION FROM SOCIAL NETWORK 18
    2.3.1 Identity Card 18
3. COMPONENTS FOR CONTENT BASED QUERY PROCESSING 19
4. EVALUATION OF FACE DETECTION COMPONENTS 20
  4.1 ANNOTATION 20
  4.2 EVALUATION METHOD 20
  4.3 RESULTS 22
    4.3.1 Results for the whole data set 22
    4.3.2 Results for fully annotated files 23
    4.3.3 Results for frontal faces 25
    4.3.4 Conclusion & Recommendations 26
5. REFERENCES 27
6. DIAGRAMS 28


Executive Summary

The first deliverable in WP8, D8.1, gave a broad overview of candidate components for content analysis. This deliverable focuses on the components for content analysis and platform support used within the CUbRIK platform. They are an integral part of the "History of Europe" and "Fashion App" V-App pipelines as well as of the supporting H-Demos. In the following sections these components are briefly described in terms of a short functional description, internal and external dependencies, and a qualitative description of input and output data.

Besides the component descriptions, this deliverable contains a comprehensive technical evaluation of the face detection components used in the "History of Europe" V-App. This evaluation will be used as a measure for the success of the CUbRIK idea in terms of crowd involvement. Since the evaluation is closely connected to specific components for face detection, it is presented here rather than in WP10, which aims at evaluating the CUbRIK pipelines as a whole.


1. Components for content analysis

1.1 Face Detection

1.1.1 Identity Card

Component Name Face Detection

Responsibility This component automatically detects faces in an image and sends results to validation

Rel. V-App / H-Demo History of Europe (HoE)

Depends on Delivers to Interaction within the CUbRIK platform

- Crowd Face Position Validation (HoE)

- Face Identification (HoE)

External dependencies (i.e. third party libraries)

- ThirdPartyTool face detection and recognition component

(commercial)

Provided Interfaces SMILA compliant interface

Input data file system folder with images;

single image

Output data position of the detected faces as bounding rectangle;

position of face keypoints (eyes, mouth, chin)

Platform(s) Windows

SMILA Version 1.1

Java Version JDK 1.7.0


1.2 Face Identification

Face Identification is the component in charge of identifying the person in an image. It starts by calculating the similarity between a detected face and a fixed set of known faces (portraits), and then forwards the face together with the top-K most similar portraits.

1.2.1 Identity Card

Component Name Face Identification

Responsibility This component aims to identify the person represented in a face by performing face similarity between the face and a fixed set of already identified faces.

Rel. V-App / H-Demo History of Europe (HoE)

Depends on Delivers to Interaction within the CUbRIK platform

- Face Detection component (HoE)

- Crowd Pre-Filtering (HoE)

- Expert crowd entity verification (HoE)

- Crowd Face Identification (HoE)

- Crowd Pre-Filtering (HoE)

- Expert crowd entity verification (HoE)

- Entitypedia Storage (HoE)

External dependencies (i.e. third party libraries)

- ThirdPartyTool face detection and recognition component (commercial)

Provided Interfaces SMILA compliant interface

Input data position of the detected faces as bounding rectangle

position of face key points (eyes, mouth, chin)

image

portraits

Output data ID of top-K most similar portraits with matching confidence value

Platform(s) Windows

SMILA Version 1.1

Java Version JDK 1.7.0


1.3 Generic Visual Feature Extractor

The Generic Visual Feature Extractor analyses visual data such as images and videos, extracts features, and generates descriptors for the processed files. The features/descriptors supported by the interface can easily be extended, since internally the component can be configured via XML processing instructions.

1.3.1 Identity Card

Component Name Generic Visual Feature Extractor

Responsibility This component extracts visual features from images or videos and delivers them as descriptors. Currently, color layout and Optical Flow are supported. Extension to other descriptors such as color histogram, dominant color, edge histogram, Tamura Features etc. is a matter of configuration only.

Rel. V-App / H-Demo News Content History (NCH), History of Europe (HoE)

Depends on Delivers to Interaction within the CUbRIK platform

- Video Segment Matcher (NCH)

External dependencies (i.e. third party libraries)

- Fraunhofer XPX extraction framework (commercial, framework of CUbRIK partner institution)

Provided Interfaces SMILA compliant interface

Java API

Input data Single image or video file

Folder with image or video files

Feature type to be extracted

Output data The descriptors as a binary file. The file format currently in use (afp) is proprietary and can handle multiple descriptors within one file. If needed by the CUbRIK platform, readers and writers can be provided.

Platform(s) Windows, Linux

SMILA Version 1.1

Java Version JDK 1.7.0


1.4 Descriptors extractor

The descriptors extractor component extracts content and metadata descriptors from images. The component can be used to retrieve a description of the full image or of segmented parts of the image, respectively.

1.4.1 Identity Card

Component Name Descriptors extractor

Responsibility This component extracts color and texture (dominant color, colorPalette, LBP, OSIFT) descriptors from images or videos and delivers them as descriptors.

Rel. V-App / H-Demo Fashion V-App

Depends on Delivers to Interaction within the CUbRIK platform

Trend Analyser (Fashion V-App)

Accessibility filtering

External dependencies (i.e. third party libraries)

- OSIFT exec (http://koen.me/research/colordescriptors/)

Provided Interfaces Java API

Input data image URL

Output data Descriptor as String

Platform(s) Windows, Linux

SMILA Version 1.1

Java Version JDK 1.7.0

1.4.2 Comments

This component delivers similar functionality to the component described in section 1.3 (Generic Visual Feature Extractor). Due to different requirements on the analysed data (segmented images vs. video) and the descriptor output format (String vs. binary), it has been decided to keep the components separate for now; activities to examine a unified component will follow.


1.5 Accessibility-related Descriptors Extractor

The descriptors extractor component extracts accessibility-related content and metadata descriptors from images. The component can be used to retrieve a description of the full image or of segmented parts of the image, respectively.

1.5.1 Identity Card

Component Name Accessibility-related Descriptors Extractor

Responsibility This component extracts colour, texture and spatial descriptors (brightness, contrast, dominant colour, color_list histogram, colour percentage, colour saturation, spatial properties, etc.) from images and delivers them as descriptors.

Rel. V-App / H-Demo Fashion V-App

Depends on Delivers to Interaction within the CUbRIK platform

Descriptors extractor Accessibility filtering

External dependencies (i.e. third party libraries)

- OPENCV (http://docs.opencv.org/)

Provided Interfaces Java API

Input data image URL

Output data Descriptor as String (JSON)

Platform(s) Windows, Linux

SMILA Version 1.1

Java Version JDK 1.7.0

1.5.2 Comments

This component delivers similar functionality to the component described in section 1.4 (Descriptors extractor). It has been decided to keep the components separate for now; activities to examine a unified component will follow.


1.6 Video Segment Matcher

The Video Segment Matcher matches the visual descriptors of a reference media file against the descriptors of one or more other media files. It identifies perceptually identical video segments and returns them. To obtain fast processing on large data sets, the component uses an in-memory database for the descriptors. Descriptors to be matched must exist in this database and are specified by a unique ID. The Video Segment Matcher understands the proprietary afp data format provided by the Generic Visual Feature Extractor component.
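The descriptor store and matching workflow described above can be sketched as follows. The store layout, function names and toy similarity measure are illustrative assumptions, not the actual Fraunhofer VideoSegmentMatcher API:

```python
# Minimal sketch: descriptors live in an in-memory store and are
# addressed by unique IDs; matching compares a query descriptor against
# a set of reference descriptors. All names and the similarity measure
# are invented for illustration.
store = {}

def register(uid, descriptor):
    """Put a descriptor into the in-memory store under a unique ID."""
    store[uid] = descriptor

def match(query_uid, reference_uids, threshold=0.9):
    """Return reference uids whose descriptor is close enough to the query."""
    query = store[query_uid]
    hits = []
    for uid in reference_uids:
        ref = store[uid]
        # Toy similarity: fraction of element-wise equal entries.
        sim = sum(a == b for a, b in zip(query, ref)) / max(len(query), len(ref))
        if sim >= threshold:
            hits.append((uid, sim))
    return hits

register("q1", [1, 2, 3, 4])
register("r1", [1, 2, 3, 9])
register("r2", [7, 7, 7, 7])
result = match("q1", ["r1", "r2"], threshold=0.7)
```

In the real component, matching operates on afp fingerprints and returns segment boundaries rather than a scalar similarity; the sketch only illustrates the ID-based lookup workflow.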

1.6.1 Identity Card

Component Name Video Segment Matcher

Responsibility The Video Segment Matcher is responsible for matching a query fingerprint/descriptor against one or multiple reference fingerprints/descriptors. It returns video segments that match as perceptually identical segments.

Rel. V-App / H-Demo News Content History (NCH), History of Europe (HoE)

Depends on Delivers to Interaction within the CUbRIK platform

Generic Visual Feature Extractor (NCH)

External dependencies (i.e. third party libraries)

Fraunhofer VideoSegmentMatcher (commercial, framework of CUbRIK partner institution)

Provided Interfaces SMILA compliant interface

Java API

Input data uid of query descriptor

uids of reference descriptors

Output data matching segment information

Platform(s) Windows, Linux

SMILA Version 1.1

Java Version JDK 1.7.0


1.7 Image similarity detection

The "Image similarity detection" component returns a set of images that are similar to a selected one; the selected image is thus the query image. The result set also depends on the kind of dress specified as part of the query.

1.7.1 Identity Card

Component Name Image similarity detection

Responsibility The component is responsible for finding similar images (in terms of color and texture). Besides the query image the kind of clothing which is of interest has to be taken into account.

Rel. V-App / H-Demo Fashion V-App

Depends on Delivers to Interaction within the CUbRIK platform

- Descriptors extractor (CERTH)

External dependencies (i.e. third party libraries)

- OpenCV

Provided Interfaces JSON/REST

Input data image directory, clothing of interest query

Output data list of similar images

Platform(s) Windows, Linux

SMILA Version 1.1

Java Version JDK 1.7.0


1.8 Body part detector

The body part detector component analyses images and identifies the upper and lower body parts of depicted people, highlighting the parts of an image that depict upper or lower body parts. The component provides an interface to submit an image and retrieve metadata (e.g. a bounding rectangle) for any detected body parts.

1.8.1 Identity Card

Component Name Body part detector

Responsibility Detects upper and lower body parts in an image.

Rel. V-App / H-Demo Fashion V-App

Depends on Delivers to Interaction within the CUbRIK platform

None - Extraction workflow (Descriptors extractor) (CERTH)

External dependencies (i.e. third party libraries)

None

Provided Interfaces - JAVA (JSON)

- Native (JSON)

Input data Path to local image file

Output data Metadata describing an identified body (upper body, lower body, entire body). All body parts follow the same JSON metadata structure:

- Type: upperBody, lowerBody, entireBody, face

- Rectangle: y, x and height, width in absolute coordinates

- Confidence score

Platform(s) Linux, Windows (not tested)

SMILA Version N/A

Java Version JDK 1.6.0
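As an illustration, a detection result following the metadata structure listed above might look like the following. The field spellings follow the identity card, but the concrete values and the surrounding "detections" wrapper are hypothetical:

```python
import json

# Hypothetical body part detector output following the described JSON
# metadata structure (type, rectangle, confidence). All concrete values
# are illustrative only.
result = {
    "detections": [
        {
            "type": "upperBody",  # one of: upperBody, lowerBody, entireBody, face
            "rectangle": {"x": 140, "y": 60, "width": 220, "height": 300},
            "confidence": 0.87,
        },
        {
            "type": "face",
            "rectangle": {"x": 190, "y": 75, "width": 90, "height": 110},
            "confidence": 0.93,
        },
    ]
}

print(json.dumps(result, indent=2))
```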


1.9 LikeLines - Implicit Filter Feedback

LikeLines is a multimedia player component that captures user interactions in order to improve the representation of multimedia items within a system's index. The collected interactions are used to identify the most interesting/relevant fragments of a video. Interactions can be both implicit (e.g. play, pause, rewind) and explicit (i.e. explicitly liking particular time points in the video).
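The aggregation idea behind LikeLines can be sketched as follows. The event weights, bin size and function name are illustrative assumptions, not the component's actual implementation:

```python
from collections import Counter

def heatmap(interactions, duration, bin_size=5):
    """Aggregate user interactions into per-segment interest scores.

    interactions: list of (event, timepoint_seconds) tuples, where the
    event is implicit ('play', 'pause', 'seek') or explicit ('like').
    The weights below are illustrative assumptions only.
    """
    weights = {"like": 3.0, "play": 1.0, "pause": 0.5, "seek": 0.5}
    bins = Counter()
    for event, t in interactions:
        if 0 <= t < duration:
            bins[int(t // bin_size)] += weights.get(event, 0.0)
    n_bins = -(-duration // bin_size)  # ceiling division
    return [bins[i] for i in range(int(n_bins))]

# Two explicit likes around t=12..14 s dominate the implicit events,
# so the second 5-second bin is flagged as most interesting.
scores = heatmap([("play", 2), ("like", 12), ("like", 14), ("pause", 31)], duration=40)
best = max(range(len(scores)), key=scores.__getitem__)
```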

1.9.1 Identity Card

Component Name LikeLines - Implicit Filter Feedback

Responsibility Detects interesting frames in a video

Rel. V-App / H-Demo Fashion V-App / LikeLines H-Demo

Depends on Delivers to Interaction within the CUbRIK platform

- Trend analyser (CERTH)

External dependencies (i.e. third party libraries)

-

Provided Interfaces JAVA (SMILA Pipelet)

Input data YouTube video URL

Output data Interesting frames

Platform(s) Windows, Linux

SMILA Version 1.1

Java Version JDK 1.7.0

1.9.2 Comments

Previous work

The history of LikeLines began in 2011, when the concept was submitted to the Knight-Mozilla journalism challenge ("MoJo", short for "Mozilla Journalism"). The challenge invited people to submit ideas that would help transform news storytelling. The original description of the challenge can be found at https://drumbeat.org/en-US/challenges/unlocking-video/full.

The original LikeLines concept (https://drumbeat.org/en-US/challenges/unlocking-video/submission/60/) was one of 60 projects accepted into the second round of the challenge and one of the 20 projects that reached the final round. In the subsequent rounds the concept was refined and a prototype was developed. This prototype was merely a mock-up of the graphical user interface, created to convey the ideas behind the concept; it contained no functional logic at all. The original code of the mock-up is available in the previous LikeLines repository: https://github.com/ShinNoNoir/likelines-mojo/tree/bf63facfbbae644aa71f3c7fec906ecffc212c3c

Enhancements

After the Knight-Mozilla challenge ended, the LikeLines idea was shelved. At the beginning of the CUbRIK project the idea resurfaced, but actual development work did not start until May 2012. Since the original LikeLines code was just a mock-up, the existing code base was scrapped and the new implementation was written from scratch (e.g., the original mock-up mimicked a heat map using CSS gradients, while the rewritten functional implementation uses a canvas to draw an actual heat map). The current incarnation of LikeLines became fully functional on the day it was presented at ACM MM'12, October 30, 2012.


2. Components for crawling

2.1 Media harvesting and Upload

The media harvesting and upload component is responsible for populating the History of Europe database with content from the CVCE archive and external sources. The external data sources that will be considered are the Flickr service and the Europeana collections.

2.1.1 Identity Card

Component Name Media harvesting and Upload

Responsibility Responsible for populating the HoE databases with content from different sources

Rel. V-App / H-Demo History of Europe (HoE)

Depends on Delivers to Interaction within the CUbRIK platform

- -

External dependencies (i.e. third party libraries)

-

Provided Interfaces Java API

Input data textual query

Output data Images and associated meta data (uploader profile)

Platform(s) Windows

SMILA Version 1.2

Java Version JDK 1.7.0

2.1.2 Comments

The component will be used offline, as independent scripts, to collect the data sets and populate the database. If needed for the next versions of the demonstrator, a SMILA pipelet for the Europeana API may be provided.


2.2 Copyright Aware Crawler

The copyright-aware crawler (CAC) is responsible for the crawling of public content which is compliant with a pre-defined set of usage rules, via pre-filtering based on (trustworthy) license information available via APIs.

2.2.1 Identity Card

Component Name Copyright Aware Crawler

Responsibility The copyright-aware crawler (CAC) is responsible for the crawling of public content which is compliant with a pre-defined set of usage rules.

Rel. V-App / H-Demo History of Europe (HoE)

Depends on Delivers to Interaction within the CUbRIK platform

- -

External dependencies (i.e. third party libraries)

-

Provided Interfaces - SMILA BPEL: JSON/REST

Input data Services to be crawled; the permissions that must be granted on the crawled content

Output data Crawled content, content meta data

Platform(s) Windows

SMILA Version 1.1

Java Version JDK 1.7.0


2.3 Image extraction from Social Network

The "image extraction from SNs" component is responsible for collecting tweets, and the images associated with them, from Twitter that are relevant to the predefined topics of the Fashion V-App. The component takes a set of categories that it listens to. It operates in a tight loop with the trend analyser component, which feeds it with the usernames of Twitter users characterised as trend setters; this information is exploited in the retrieval process to enhance the quality of the retrieved content.

2.3.1 Identity Card

Component Name Image extraction from Social Network

Responsibility Responsible for retrieving content posted through the Twitter service from different photo-sharing sites, to feed the Fashion V-App with content

Rel. V-App / H-Demo Fashion V-App

Depends on Delivers to Interaction within the CUbRIK platform

- Body part detector (QMUL)

External dependencies (i.e. third party libraries)

-twitter streaming API

Provided Interfaces Directly to MongoDB. Stored as a JSON object

Input data A set of query strings and / or twitter accounts to follow

Output data A stream of matched tweets with their media URLs and the corresponding images

Platform(s) Linux, Windows

SMILA Version -

Java Version 1.7


3. Components for content based query processing

T8.2 was established to define components for content-based query processing. Content-based query processing is understood as the pre-processing and optimisation of a content-based or, in particular, multimodal query in order to obtain a "low-cost" query to the database. The extraction of descriptors from images or videos to reduce the dimensionality of the media is the simplest form of such pre-processing, but it may not be sufficient, especially when multimodal queries are used.

An enquiry for requirements took place in order to gain input from all partners responsible for V-Apps and H-Demos. As a starting point, the enquiry document gave hints for identifying potential use cases that incorporate querying by media content (i.e. images, audio, or video).

The results of the enquiry showed that hardly any use case of the CUbRIK platform currently requires, or is yet aware of the need for, content-based query processing. One reason could be the focus on the core content-processing and other components that constitute the V-Apps. Similarly, most use cases do not incorporate a content-based query scenario but use text queries instead. The initial lack of large data sets for gathering statistics and testing algorithms might be another reason. Thus, no such component has yet been specified within the V-Apps or H-Demos, although the feature extraction components described e.g. in sections 1.3 and 1.4 can be interpreted as such components.

However, some use cases may require such components in the next release in order to reduce query cost (i.e. answering time):

• News Content History H-Demo: "Create Visualization" (part of UC "Query"). Although the use case is designed to support asynchronous queries, improvements through query processing may be required.

• Fashion V-App: “Trend analysis for image sample”

• Fashion V-App: “Search similar images”


4. Evaluation of Face Detection components

In the context of the "History of Europe" application, CVCE provided a data set containing 3924 images of persons (mainly politicians). In WP 8.1 (Components for content analysis), FRH evaluated the performance of five components for face detection: the SHORE library, the ThirdPartyTool library, the OpenCV library, the Face.com web service and the reKognition.com web service. The data set consists of a variety of low-resolution multi-person images¹ with all kinds of image distortions and is therefore very demanding. The results of the evaluation are presented in this section.

4.1 Annotation

In 1005 images of the data set, 2535 faces were hand-annotated for benchmarking purposes. For each face the following metadata was annotated:

- the rectangular face region

- the horizontal face orientation (Frontal, SemiLeft, SemiRight, Left, Right)

- depending on the orientation: eye centers, mouth center & ear lobe

- the face's perspective (EyeLevel, SemiTop, SemiBottom)

- the amount of occlusion of the annotated face (None, Moderate, Much)

- the gender of the annotated face (Male, Female)

Figure 1 Example image with hand annotated rectangular regions

The annotation data is stored in an XML document (one per image) that uses a simple and generic annotation schema.
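A hypothetical annotation file for a single image might look like the following sketch. The element and attribute names, coordinates and file name are invented for illustration; the actual schema is not reproduced in this deliverable:

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation document covering the metadata listed above
# (region, orientation, keypoints, perspective, occlusion, gender).
# All names and values are illustrative, not the real CUbRIK schema.
doc = """
<imageAnnotation file="example_image.jpg">
  <face gender="Male" orientation="Frontal" perspective="EyeLevel" occlusion="None">
    <region x="210" y="88" width="64" height="78"/>
    <keypoint name="leftEyeCenter" x="228" y="110"/>
    <keypoint name="rightEyeCenter" x="256" y="111"/>
    <keypoint name="mouthCenter" x="242" y="142"/>
  </face>
</imageAnnotation>
"""

root = ET.fromstring(doc)
faces = root.findall("face")
```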

4.2 Evaluation method

For the evaluation of the detection results we used two well-established, state-of-the-art distance measures on the rectangular regions [1][2]. For each image, each rectangular region of the annotated ground truth is compared to each detected rectangular region, first by the Euclidean distance Δc of the region centres:

Δc = √(Δx² + Δy²)

with

Δx = (x″ + 0.5·w″) − (x′ + 0.5·w′)
Δy = (y″ + 0.5·h″) − (y′ + 0.5·h′)

The value of Δc is compared to a configurable threshold ξ·w′. The second measure relates the region widths to each other and defines a tolerance for differing widths:

ω = |w″ − w′| / w′

A face is labelled a true positive (tp) when the following expression is true; otherwise it is counted as a false positive (fp):

Δc < ξ·w′  ∧  ω < ψ

Faces that have been annotated but are not detected are counted as false negatives (fn). The scalars ξ and ψ are determined heuristically and can be adapted if required. Precision (pre) and recall (rec) are calculated from the values of tp, fp and fn summed over all ground-truth faces:

pre = Σtp / (Σtp + Σfp)
rec = Σtp / (Σtp + Σfn)

Figure 2 gives an overview of these measures.

¹ The ThirdPartyTool library was additionally tested on high-resolution images. These values are for information only and should not be compared to the results of the other components, which were evaluated on low-resolution images only.

Figure 2 Region measures used for face detection evaluation: ground-truth bounding region at (x′, y′) with width w′ and height h′; detected bounding region at (x″, y″) with width w″ and height h″
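The matching rule and the precision/recall computation described in this section can be illustrated with a minimal sketch. The toy example below (one ground-truth face, two detections) is for illustration only and is not the evaluation code used in the study:

```python
import math

def is_match(gt, det, xi=0.4, psi=0.6):
    """Decide whether a detected region matches a ground-truth region.

    Regions are (x, y, w, h) rectangles. A detection counts as a true
    positive when the centre distance is below xi * w_gt and the
    relative width difference is below psi.
    """
    x1, y1, w1, h1 = gt
    x2, y2, w2, h2 = det
    dc = math.hypot((x2 + 0.5 * w2) - (x1 + 0.5 * w1),
                    (y2 + 0.5 * h2) - (y1 + 0.5 * h1))
    omega = abs(w2 - w1) / w1
    return dc < xi * w1 and omega < psi

# Toy example: one annotated face, a near-hit and a far miss.
gt = (100, 100, 50, 60)
detections = [(105, 102, 48, 58), (300, 300, 50, 60)]
tp = sum(is_match(gt, d) for d in detections)
fp = len(detections) - tp
fn = 1 - min(tp, 1)  # the single ground-truth face is either found or missed
precision = tp / (tp + fp)
recall = tp / (tp + fn)
```

The default xi and psi match the values chosen in section 4.3; in the full evaluation, tp, fp and fn are summed over all ground-truth faces before computing precision and recall.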


4.3 Results

First of all, we evaluated the influence of ξ and ψ on the results. Regarding ψ, we found a significant increase in recall and precision up to a value of 0.5, after which the values remained more or less constant with further increases in all data sets. The same applies to ξ at a value of 0.4. We therefore set ψ = 0.6 and ξ = 0.4 (see Figure 10 and Figure 11).

We did not train the face detectors ourselves but used the presets provided by the libraries. From the SHORE library we used the presets for standard, rotated and tiny faces. OpenCV provides a number of Haar cascades, which we tested using the default parameters. Face.com and reKognition.com were set to "normal/default" and "aggressive" detection mode. The ThirdPartyTool library was configured once with "safe" parameters, obtaining a high precision rate, and once with standard parameters.

4.3.1 Results for the whole data set

Figure 3 shows the results for the whole test set. Since we did not annotate all faces in each image (we only annotated important stakeholders, and some images contain large crowds of persons), the precision value is not valid: it may count correctly detected faces as false positives simply because they were not annotated. We therefore show only the recall values here, which can give a first impression.

In this scenario ThirdPartyTool performs best in recall, on both high- and low-resolution images. The "aggressive" modes of face.com and reKognition.com rank second best, closely followed by the SHORE library and the face.com "normal" mode. OpenCV is far behind.

Figure 3 Results for all annotations without restriction (recall only)


4.3.2 Results for fully annotated files

Since the results of the previous section can only be seen as an orientation, we also evaluated the algorithms on fully annotated files only, i.e. on images in which all faces have been annotated. Figure 4 shows precision and recall.

Figure 4 Precision and Recall - fully annotated files only

Figure 5 F-Measure - fully annotated files only


This evaluation identifies the reKognition.com algorithms as the winner, closely followed by face.com. While the tiny mode of SHORE has a good recall, its precision is the worst. The "safe" mode of ThirdPartyTool guarantees a high precision rate (only real faces are detected) at the cost of low detection rates; thus its F-measure is close to the results of OpenCV, which performs worst. Further tweaking of the ThirdPartyTool parameters might improve the results significantly.

SHORE, ThirdPartyTool, reKognition.com and face.com also provide a confidence measure for each detected face. We defined a threshold ranging from 0 to 100 for this measure and computed the results only for faces whose confidence value is above the current threshold (i.e. the number of retained detections, and with them the false positives, decreases as the threshold increases). The ROC curve in Figure 6 plots false positives against recall as the threshold varies. The steep slopes of the reKognition.com and face.com curves show that, when the confidence values are used as thresholds, higher recall rates are achieved with fewer false positives at the same time. SHORE performs better than the "standard" setting of ThirdPartyTool.

Figure 6 ROC curve - fully annotated files
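The confidence-threshold sweep behind the ROC curves can be sketched as follows. The detections, thresholds and function name are toy assumptions, not the actual evaluation code:

```python
def roc_points(detections, total_gt, thresholds=range(0, 101, 10)):
    """Sweep a confidence threshold and report (threshold, fp, recall) points.

    detections: list of (confidence_0_to_100, matched_ground_truth) pairs;
    total_gt: number of annotated ground-truth faces. Illustrative sketch only.
    """
    points = []
    for th in thresholds:
        kept = [(c, ok) for c, ok in detections if c >= th]
        tp = sum(ok for _, ok in kept)
        fp = len(kept) - tp
        recall = tp / total_gt if total_gt else 0.0
        points.append((th, fp, recall))
    return points

# Toy detections: (confidence, did it match an annotated face?)
dets = [(95, True), (80, True), (60, False), (40, True), (20, False)]
curve = roc_points(dets, total_gt=4)
```

Raising the threshold trades recall for fewer false positives; a detector with well-calibrated confidence values loses false positives faster than true positives, which is what the steep slopes in Figure 6 indicate.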


4.3.3 Results for frontal faces

Looking for reasons for the above results, we recognised that the SHORE library's default sets are trained for frontal faces only. We therefore repeated the evaluation using only frontal ground-truth faces. The results are shown in Figure 7 to Figure 9.

Figure 7 Precision and Recall - fully annotated files with frontal faces only

Figure 8 F-Measure- fully annotated files with frontal faces only


Figure 9 ROC curve - fully annotated files with frontal faces only

Now the SHORE library results (especially analysis) are close to those of face.com. reKognition.com is the clear winner on frontal faces, as can be seen in the F-measure and the ROC curve. ThirdPartyTool and OpenCV still cannot deliver comparably good results. Noticeably, the ROC curve of face.com drops below that of SHORE, while reKognition.com remains on top.

4.3.4 Conclusion & Recommendations

Evaluating frontal faces only, reKognition.com is the clear winner; SHORE and face.com deliver almost the same results. The Viola-Jones algorithm used by OpenCV no longer represents the state of the art, which explains its comparatively poor results. ThirdPartyTool might be used for high-precision detection, but with the risk of missing important faces. The ThirdPartyTool and SHORE results for arbitrary face orientations might be improved by tweaking the parameters and training sets for the specific task, which is something that cannot be done with the web services.

Since face.com has recently been acquired by Facebook, it has shut down its API services, and these results might be the last retrieved from that service. The service might be offered again by Facebook in the future. With respect to the results determined in this study, the best choice for a face detection component would therefore be reKognition.com or the SHORE library.


5. References

[1] V. Popovici, Y. Rodriguez, J.-P. Thiran and S. Marcel, “On performance evaluation of face detection and localization algorithms,” Technical Report 03-80, IDIAP, 2003.

[2] R. Kasturi et al., “Framework for performance evaluation of face, text, and vehicle detection and tracking in video: Data, metrics, and protocol,” IEEE Trans. Pattern Analysis and Machine Intelligence, pp. 1–17, 2003.


6. Diagrams

Figure 10 Influence of evaluation parameters on recall


Figure 11 Influence of evaluation parameters on precision

APPENDICES TO D8.2 Human-enhanced time-aware multimedia search

CUBRIK

Project IST-287704

Deliverable D8.2 WP8

Deliverable Version 1.0 – 31/07/2013

Document. ref.: cubrik.D82.FRH.WP8.V1.0.appendices


Programme Name: IST
Project Number: 287704
Project Title: CUBRIK
Partners: Coordinator: ENG (IT); Contractors: UNITN, TUD, QMUL, LUH, POLMI, CERTH, NXT, MICT, ATN, FRH, INN, HOM, CVCE, EIPCM, EMP
Document Number: cubrik.D82.FRH.WP8.V1.0.appendices
Work-Package: WP8
Deliverable Type: Accompanying document
Contractual Date of Delivery: 31 July 2013
Actual Date of Delivery: 31 July 2013
Title of Document: R2 Component and Pipeline Support Services
Author(s): Christina Weigel (FRH)
Contributor(s): Marco Tagliasacchi (POLMI), Chiara Pasini (POLMI), Theodoros Semertzidis (CERTH), Markus Brenner (QMUL), Mathias Otto (EMP), Martha Larson (TUD)
Approval of this report:
Summary of this report:
Keyword List: content processing, components, pipelets, evaluation
Availability: These appendices are confidential

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

This work is partially funded by the EU under grant IST-FP7-287704


Disclaimer

This document contains confidential information in the form of the CUbRIK project findings, work and products and its use is strictly regulated by the CUbRIK Consortium Agreement and by Contract no. FP7- ICT-287704.

Neither the CUbRIK Consortium nor any of its officers, employees or agents shall be responsible or liable in negligence or otherwise howsoever in respect of any inaccuracy or omission herein.

The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7-ICT-2011-7) under grant agreement n° 287704.

The contents of this document are the sole responsibility of the CUbRIK consortium and can in no way be taken to reflect the views of the European Union.


Table of Contents

EXECUTIVE SUMMARY 1

1. "FACE DETECTION" COMPONENT DESCRIPTION 2
   1.1 REQUIREMENTS 2
       1.1.1 Functional requirements 2
   1.2 USE CASES LIST 2
       1.2.1 Associated sequence diagrams 2
   1.3 COMPONENT IDENTITY CARD 3
   1.4 ELEMENTS OF CONCEPTUAL MODEL 3
   1.5 COMPONENT INTERFACES SPECIFICATION 3
       1.5.1 Methods description 3
       1.5.2 Relevant WP, partners, deliverables 6
       1.5.3 Component delivery 6
   1.6 SMILA DEPLOYMENT ENVIRONMENT 7
       1.6.1 Configuration of the 3rd party library and components to be integrated in SMILA 7

2. "FACE IDENTIFICATION" COMPONENT DESCRIPTION 8
   2.1 REQUIREMENTS 8
       2.1.1 Functional requirements 8
   2.2 USE CASES LIST 8
       2.2.1 Associated sequence diagrams 8
   2.3 COMPONENT IDENTITY CARD 9
   2.4 ELEMENTS OF CONCEPTUAL MODEL 9
   2.5 COMPONENT INTERFACES SPECIFICATION 9
       2.5.2 Relevant WP, partners, deliverables 11
       2.5.3 Component delivery 11
   2.6 SMILA DEPLOYMENT ENVIRONMENT 12
       2.6.1 Configuration of the 3rd party library and components to be integrated in SMILA 12

3. "GENERIC VISUAL FEATURE EXTRACTOR" COMPONENT DESCRIPTION 13
   3.1 REQUIREMENTS 13
       3.1.1 Functional requirements 13
       3.1.2 Non Functional requirements 13
   3.2 USE CASES LIST 13
       3.2.1 Associated sequence diagrams 14
   3.3 COMPONENT IDENTITY CARD 14
   3.4 ELEMENTS OF CONCEPTUAL MODEL 14
   3.5 COMPONENT INTERFACES SPECIFICATIONS 15
       3.5.1 Methods description 16
       3.5.2 Relevant WP, partners, deliverables 16
       3.5.3 Component delivery 16
   3.6 SMILA DEPLOYMENT ENVIRONMENT 17
       3.6.1 Configuration of the 3rd party library and components to be integrated in SMILA 17
   3.7 VIDEO CONTENT QUERY 18

4. "DESCRIPTORS EXTRACTOR" COMPONENT DESCRIPTION 19
   4.1 REQUIREMENTS 19
       4.1.1 Functional requirements 19
       4.1.2 Non Functional requirements 19
   4.2 USE CASES LIST 19
       4.2.1 Associated sequence diagrams 20
   4.3 COMPONENT IDENTITY CARD 21
   4.4 ELEMENTS OF CONCEPTUAL MODEL 21
   4.5 COMPONENT INTERFACES SPECIFICATION 22
       4.5.1 Methods description 22
       4.5.2 Relevant WP, partners, deliverables 22
       4.5.3 Component delivery 22
   4.6 SMILA DEPLOYMENT ENVIRONMENT 23
       4.6.2 Configuration of the 3rd party library and components to be integrated in SMILA 23

5. "ACCESSIBILITY" COMPONENT DESCRIPTION 24
   5.1 REQUIREMENTS 24
       5.1.1 Functional requirements 24
       5.1.2 Non functional requirements 24
   5.2 USE CASES LIST 24
       5.2.1 Associated sequence diagrams 25
       5.2.2 Component Identity Card 25
       5.2.3 Elements of Conceptual Model 26
       5.2.4 Component Interfaces Specification 26
       5.2.5 Methods description 27
       5.2.6 Relevant WP, partners, deliverables 27
       5.2.7 Component delivery 27
   5.3 SMILA DEPLOYMENT ENVIRONMENT 28
       5.3.2 Configuration of the 3rd party library and components to be integrated in SMILA 28

6. "VIDEO SEGMENT MATCHER" COMPONENT DESCRIPTION 29
   6.1 REQUIREMENTS 29
       6.1.1 Functional requirements 29
       6.1.2 Non Functional requirements 29
   6.2 USE CASES LIST 29
       6.2.1 Associated sequence diagrams 30
   6.3 COMPONENT IDENTITY CARD 30
   6.4 ELEMENTS OF CONCEPTUAL MODEL 30
   6.5 COMPONENT INTERFACES SPECIFICATIONS 31
       6.5.1 Methods description 32
       6.5.2 Relevant WP, partners, deliverables 32
       6.5.3 Component delivery 32
   6.6 SMILA DEPLOYMENT ENVIRONMENT 33
       6.6.1 Configuration of the 3rd party library and components to be integrated in SMILA 33
   6.7 VIDEO CONTENT QUERY 34

7. "EXTRACTION OF IMAGES FROM SOCIAL NETWORKS" COMPONENT DESCRIPTION 35
   7.1 REQUIREMENTS 35
       7.1.1 Functional requirements 35
       7.1.2 Non Functional requirements 35
   7.2 USE CASES LIST 35
       7.2.1 Associated sequence diagrams 36
   7.3 COMPONENT IDENTITY CARD 36
   7.4 ELEMENTS OF CONCEPTUAL MODEL 37
   7.5 COMPONENT INTERFACES SPECIFICATION 37
       7.5.1 Methods description 39
       7.5.2 Relevant WP, partners, deliverables 39
       7.5.3 Component delivery 39
   7.6 SMILA DEPLOYMENT ENVIRONMENT 40
       7.6.1 Configuration of the 3rd party library and components to be integrated in SMILA 40

8. "BODY PART DETECTOR AND ROUTER" COMPONENT DESCRIPTION 41
   8.1 REQUIREMENTS 41
       8.1.1 Functional requirements 41
       8.1.2 Non Functional requirements 41
   8.2 USE CASES LIST 41
       8.2.1 Associated sequence diagrams 42
   8.3 COMPONENT IDENTITY CARD 43
   8.4 ELEMENTS OF CONCEPTUAL MODEL 44
   8.5 COMPONENT INTERFACES SPECIFICATION 44
       8.5.1 Methods description 44
       8.5.2 Relevant WP, partners, deliverables 45
       8.5.3 Component delivery 45
   8.6 SMILA DEPLOYMENT ENVIRONMENT 46
       8.6.1 Configuration of the 3rd party library and components to be integrated in SMILA 46

9. "LIKELINES - IMPLICIT FEEDBACK FILTER" COMPONENT DESCRIPTION 47
   9.1 REQUIREMENTS 47
       9.1.1 Functional requirements 47
       9.1.2 Non Functional requirements 47
   9.2 USE CASES LIST 47
       9.2.1 Associated sequence diagrams 48
   9.3 COMPONENT IDENTITY CARD 48
   9.4 ELEMENTS OF CONCEPTUAL MODEL 49
   9.5 COMPONENT INTERFACES SPECIFICATION 49
       9.5.1 Methods description 50
       9.5.2 Relevant WP, partners, deliverables 50
       9.5.3 Component delivery 50
   9.6 SMILA DEPLOYMENT ENVIRONMENT 51
       9.6.1 Configuration of the 3rd party library and components to be integrated in SMILA 51

10. "MEDIA HARVESTING AND UPLOAD" COMPONENT DESCRIPTION 52
   10.1 REQUIREMENTS 52
       10.1.1 Functional requirements 52
       10.1.2 Non Functional requirements 52
   10.2 USE CASES LIST 52
       10.2.1 Associated sequence diagrams 53
   10.3 COMPONENT IDENTITY CARD 53
   10.4 ELEMENTS OF CONCEPTUAL MODEL 54
   10.5 COMPONENT INTERFACES SPECIFICATION 54
       10.5.1 Methods description 55
       10.5.2 Relevant WP, partners, deliverables 55
       10.5.3 Component delivery 55
   10.6 SMILA DEPLOYMENT ENVIRONMENT 56

11. "COPYRIGHT-AWARE CRAWLER" COMPONENT DESCRIPTION 57
   11.1 REQUIREMENTS 57
       11.1.1 Functional requirements 57
       11.1.2 Non Functional requirements 57
   11.2 USE CASES LIST 57
       11.2.1 Associated sequence diagrams 57
   11.3 COMPONENT IDENTITY CARD 58
   11.4 ELEMENTS OF CONCEPTUAL MODEL 58
   11.5 COMPONENT INTERFACES SPECIFICATION 58
       11.5.1 Methods description 58
       11.5.2 Relevant WP, partners, deliverables 58
       11.5.3 Component delivery 58
   11.6 SMILA DEPLOYMENT ENVIRONMENT 59

12. "IMAGE EXTRACTION FROM SOCIAL NETWORK" COMPONENT DESCRIPTION 60
   12.1 REQUIREMENTS 60
       12.1.1 Functional requirements 60
       12.1.2 Non Functional requirements 60
   12.2 USE CASES LIST 60
       12.2.1 Associated sequence diagrams 60
   12.3 COMPONENT IDENTITY CARD 61
   12.4 ELEMENTS OF CONCEPTUAL MODEL 62
   12.5 COMPONENT INTERFACES SPECIFICATION 62
       12.5.1 Relevant WP, partners, deliverables 62
       12.5.2 Component delivery 62
   12.6 SMILA DEPLOYMENT ENVIRONMENT 63
       12.6.1 Configuration of the 3rd party library and components to be integrated in SMILA 63


Executive Summary

These appendices to deliverable D8.2 contain the latest confidential detailed component specifications.


1. "Face Detection" component description

Face Detection is the component in charge of invoking the Face Detection Tool (Keesquare) and, if required, the Crowd Face Position Validation component, in order to obtain a set of faces from a photo.

1.1 Requirements

1.1.1 Functional requirements

RF1: must receive photos to be processed from the user interface

RF2: must invoke the Face Detection Tool to detect faces in the photo and save the results

RF3: must invoke the Crowd Face Position Validation Component to validate automatic face detection results

RF4: must invoke the Crowd Face Position Validation Component to add faces not detected automatically

1.2 Use Cases List

Face Detection is involved in the following use cases:

1. Indexation of images (UseCase 1.1)

1.2.1 Associated sequence diagrams

This sequence diagram describes the behaviour of the Face Detection component when a new IMAGE is processed.


1.3 Component Identity Card

Component Name Face Detection

Responsibilities This component automatically detects faces in an image and sends the results to validation

Provided Interfaces POST:

smila/hoe/processCollection used to start processing a folder of images

smila/hoe/processImage used to upload an image to be processed

GET:

smila/hoe/getCollection used to retrieve detected faces

Dependencies /

Required Interfaces

This component interacts with a Face Detection Tool (keesquare) for automatic face detection and with the Crowd Face Position Validation component to validate results. It also interacts with the Entitypedia storage to save faces and photos.

1.4 Elements of Conceptual Model

In the following, the data model used by the component, as described in document D2.1, is provided.

1.5 Component Interfaces Specification

1.5.1 Methods description

POST smila/hoe/processCollection?collectionPath=<path>&collectionName=<name>

Starts processing the images in the specified folder.

Resource URL

http://smilaAddress:smilaPort/smila/hoe/processCollection


Parameters

collectionPath Path of the folder that contains the collection to process

collectionName Name of the collection

On Error or Not found

Returns HTTP/1.1 500

Example Request

http://smilaAddress:smilaPort/smila/hoe/processCollection?collectionName=myCollection&collectionPath=myPath

POST smila/hoe/facedetection/processImage

Uploads an image into the system and starts processing.

Resource URL

http://smilaAddress:smilaPort/smila/hoe/processImage

Parameters

collection Collection of the image to add

imagePath Path of the image

name Name of the image

On Error or Not found

Returns HTTP/1.1 500

Example Request

http://smilaAddress:smilaPort/smila/hoe/processImage?collection=myCollection&name=00008&imagePath=myPath

GET smila/hoe/getCollection?collection=<name>

Retrieves face and match results for the specified collection.

Resource URL

http://smilaAddress:smilaPort/smila/hoe/getCollection?collection=<name>

Parameters

collection name

On Error or Not found

Returns HTTP/1.1 500


Example Request

http://smilaAddress:smilaPort/smila/hoe/getCollection?collection=myColl

{
  "photos": [
    {
      "timestamp": 1368782833238,
      "photoDSURI": "images\/Group Photos\/00008.jpg",
      "faces": [
        {
          "timestamp": 1368782833238,
          "confidenceCrowdValidation": 1.0,
          "identificationIds": [],
          "matchIds": ["m0", "m1"],
          "faceId": "00008_0",
          "bottom": 839,
          "left": 2403,
          "confidenceFDT": 0.7292743739326019,
          "right": 2523,
          "top": 620
        }
      ]
    }
  ],
  "portraits": [
    {
      "timestamp": 1368782833285,
      "photoDSURI": "images\/Portrait\/Harold+Macmillan_8.jpg",
      "faces": [
        {
          "faceId": "Harold+Macmillan_8_0",
          "bottom": 165,
          "left": 160,
          "confidenceFDT": 0.730196043210003,
          "right": 252,
          "top": 14
        }
      ],
      "name": "Harold+Macmillan_8",
      "personName": "Harold Macmillan"
    },
    { }
  ],
  "matches": [
    {
      "timestamp": 1368782833285,
      "portraitId": "Jean-Bernard+Raimond_3",
      "faceId": "00008_0",
      "matchId": "m0",
      "confidenceFDT": 0.2010913994684408
    },
    { }
  ]
}
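To illustrate how a client might consume a getCollection response of the shape shown above, the following sketch parses a reduced sample payload and keeps only matches above a confidence threshold. This is not project code; the sample data and the `confident_matches` helper are assumptions for illustration only.

```python
import json

# Reduced sample payload mimicking the getCollection response shape above.
response = json.loads("""
{
  "photos": [
    {"photoDSURI": "images/Group Photos/00008.jpg",
     "faces": [{"faceId": "00008_0", "confidenceFDT": 0.729,
                "top": 620, "left": 2403, "right": 2523, "bottom": 839}]}
  ],
  "matches": [
    {"matchId": "m0", "faceId": "00008_0",
     "portraitId": "Jean-Bernard+Raimond_3", "confidenceFDT": 0.201}
  ]
}
""")

def confident_matches(resp, min_confidence):
    """Return the matchIds whose FDT confidence reaches min_confidence."""
    return [m["matchId"] for m in resp.get("matches", [])
            if m["confidenceFDT"] >= min_confidence]

print(confident_matches(response, 0.2))   # ['m0']
print(confident_matches(response, 0.5))   # []
```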

1.5.2 Relevant WP, partners, deliverables

The partner responsible for the Face Detection component is POLMI in collaboration with FRH. The work is part of tasks 5.1, 5.2 and 8.1.

1.5.3 Component delivery

The component is part of R3. The present version is available from the CUbRIK SVN: https://89.97.237.243/svn/CUBRIK/Demos/HistoryOfEurope_POLMI/


1.6 SMILA Deployment Environment

ANY

LINUX

WINDOWS

OS - Operating System

(Specify ANY if the operating system is not a constraint for the component)

MAC

JDK 1.6.0_X JVM – Java Virtual Machine

(Specify if your component needs a specific JDK version; SMILA v1.2 requires JDK 1.7)

JDK 1.7.0_Y

SMILA source code version

1.1

Eclipse SDK version 4.2.2

DBMS – Data Base Management System

(Specify here if your component needs a DBMS)

Derby

Postgres

MySQL

Other - Specify

Note

1.6.1 Configuration of the 3rd party library and components to be integrated in SMILA

In order to properly run the Keesquare executable file it is necessary to set the following parameters in the configuration file:

SMILA.application/configuration/cubrikproject.service.polmi.FaceDetection/detector.properties

#path of executable

detector.path=/home/pasini/detector.exe

#where to save templates

template.path=/path

#where to save descriptor

descriptor.path=/path

#where to save edited photos

editedPhoto.path=/path

#if the component has to save templates

template.enable=true

#if the component has to save descriptions

descriptor.enable=false

#if the component has to save edited photos

editedPhoto.enable=true
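The configuration file above is a plain key/value properties file. A Java component would typically read it with java.util.Properties; the following Python sketch (illustrative only, with a made-up sample) shows the expected parsing behaviour, including comment handling.

```python
# Sketch: parsing a detector.properties-style key/value file.
def parse_properties(text):
    """Parse `key=value` lines, skipping blanks and `#` comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

# Hypothetical excerpt of the configuration shown above.
sample = """
# path of executable
detector.path=/home/pasini/detector.exe
template.enable=true
descriptor.enable=false
"""
props = parse_properties(sample)
print(props["detector.path"])   # /home/pasini/detector.exe
print(props["template.enable"]) # true
```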


2. "Face Identification" component description

Face Identification is the component in charge of identifying a person in an image. It starts by calculating the face similarity between a face and a fixed subset of known faces (portraits) and then sends the face, together with the top-K most similar portraits, to the crowd for validation.
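The top-K selection step described above can be sketched as follows. This is an illustrative example, not the Keesquare matcher: the portrait names and similarity scores are hypothetical.

```python
# Sketch: given similarity scores between a new face and the known
# portraits, keep the K most similar candidates for crowd validation.
def top_k_matches(similarities, k):
    """similarities: dict portraitId -> similarity score in [0, 1]."""
    ranked = sorted(similarities.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k]

scores = {"Harold+Macmillan_8": 0.73, "Jean-Bernard+Raimond_3": 0.20,
          "Konrad+Adenauer_1": 0.55}
print(top_k_matches(scores, 2))
# [('Harold+Macmillan_8', 0.73), ('Konrad+Adenauer_1', 0.55)]
```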

2.1 Requirements

2.1.1 Functional requirements

RF1: must receive a face to be processed from Keesquare

RF2: must invoke the Face Matcher component (Keesquare) to find the similarity between the new face and all the portraits

RF3: must send faces and matches to the Expert Crowd Entity Verification component that creates the expert crowd identification tasks

2.2 Use Cases List

Face Identification is involved in the following use cases:

1. Indexation of images (UseCase 1.1)

2.2.1 Associated sequence diagrams

This sequence diagram describes the behaviour of the Face Identification component when it receives a new face to process.


2.3 Component Identity Card

Component Name Face Identification

Responsibilities This component aims to identify the person represented in a face by performing face similarity between the face and a fixed set of already identified faces.

Provided Interfaces The component is a SMILA pipeline activated after the Face Detection pipeline, receiving as input a SMILA record representing a face. To retrieve results:

GET:

smila/hoe/getCollection used to retrieve detected faces and matches

Dependencies /

Required Interfaces

This component interacts with a Face Matcher Tool (keesquare) for automatic face similarity and with the Crowd Face Identification component and Crowd Pre-Filtering to validate results. It also interacts with the Entitypedia storage to add matches.

2.4 Elements of Conceptual Model

In the following, the data model used by the component, as described in document D2.1, is provided.

2.5 Component Interfaces Specification

GET smila/hoe/getCollection?collection=<name>

Retrieves face and match results for the specified collection.

Resource URL

http://smilaAddress:smilaPort/smila/hoe/getCollection?collection=<name>

Parameters

collection name


On Error or Not found

Returns HTTP/1.1 500

Example Request

http://smilaAddress:smilaPort/smila/hoe/getCollection?collection=myColl

{
  "photos": [
    {
      "timestamp": 1368782833238,
      "photoDSURI": "images\/Group Photos\/00008.jpg",
      "faces": [
        {
          "timestamp": 1368782833238,
          "confidenceCrowdValidation": 1.0,
          "identificationIds": [],
          "matchIds": ["m0", "m1"],
          "faceId": "00008_0",
          "bottom": 839,
          "left": 2403,
          "confidenceFDT": 0.7292743739326019,
          "right": 2523,
          "top": 620
        }
      ]
    }
  ],
  "portraits": [
    {
      "timestamp": 1368782833285,
      "photoDSURI": "images\/Portrait\/Harold+Macmillan_8.jpg",
      "faces": [
        {
          "faceId": "Harold+Macmillan_8_0",
          "bottom": 165,
          "left": 160,
          "confidenceFDT": 0.730196043210003,
          "right": 252,
          "top": 14
        }
      ],
      "name": "Harold+Macmillan_8",
      "personName": "Harold Macmillan"
    },
    { }
  ],
  "matches": [
    {
      "timestamp": 1368782833285,
      "portraitId": "Jean-Bernard+Raimond_3",
      "faceId": "00008_0",
      "matchId": "m0",
      "confidenceFDT": 0.2010913994684408
    },
    { }
  ]
}

2.5.2 Relevant WP, partners, deliverables

The partner responsible for the Face Identification component is POLMI in collaboration with FRH. The work is part of tasks 5.1, 5.2, 8.1 and 8.2.

2.5.3 Component delivery

The component is part of R3 and following releases of the CUbRIK platform and is available from the CUbRIK SVN: https://89.97.237.243/svn/CUBRIK/Demos/HistoryOfEurope_POLMI


2.6 SMILA Deployment Environment

ANY

LINUX

WINDOWS

OS - Operating System

(Specify ANY if the operating system is not a constraint for the component)

MAC

JDK 1.6.0_X JVM – Java Virtual Machine

(Specify if your component needs a specific JDK version; SMILA v1.2 requires JDK 1.7)

JDK 1.7.0_Y

SMILA source code version

1.1

Eclipse SDK version 4.2.2

DBMS – Data Base Management System

(Specify here if your component needs a DBMS)

Derby

Postgres

MySQL

Other - Specify

Note

2.6.1 Configuration of the 3rd party library and components to be integrated in SMILA

In order to properly run the Keesquare executable file it is necessary to set the following parameters in the configuration file:

SMILA.application/configuration/cubrikproject.service.polmi.FaceDetection/facematcher.properties

#path of executable

matcher.path=/home/pasini/detector


3. "Generic Visual Feature Extractor" component description

The Generic Visual Feature Extractor analyses visual data such as images and videos, extracts features, and generates descriptors for the processed files. The features/descriptors supported by the interface can easily be extended, since internally the component is configured via XML processing instructions.

3.1 Requirements

3.1.1 Functional requirements

RF1: must accept file system URLs to images

RF2: must accept file system URLs to videos

RF3: must accept file system URLs to directories with images

RF4: must accept file system URLs to directories with videos

RF5: must store extraction results to file

RF6: must report progress on extraction

RF7: must be configurable for different extractors

RF8: could accept file streams as input

3.1.2 Non Functional requirements

RNF1: should extract features in no more than 1/10 of real media time

RNF2: Must use error states for error handling

3.2 Use Cases List

The Generic Visual Feature Extractor is involved in the following use cases:

1. News Content History H-Demo: UC “Query”
2. News Content History H-Demo: UC “Content Insertion”
3. HoE V-App: Use Case 4 context expander


3.2.1 Associated sequence diagrams

Figure 1 “Analyze new content” use case (sub-use case of NCH use case “Query”)

Figure 1 shows the activity diagram for the “Analyze new content” use case, which is a sub-use case of (amongst others) the NCH “Query -> Video content query” use case (see Figure 2). The VisualFeatureExtractorPipelet uses the Generic Visual Feature Extraction component and, along with the VisualFeatureMatcherPipelet, is SMILA-compliant so that it can be used as a simple worker or within a BPEL pipeline.

3.3 Component Identity Card

Component Name Generic Visual Feature Extractor

Responsibilities The Generic Visual Feature Extractor is responsible for extracting multiple visual features from an image or video file and for storing them into a file.

Provided Interfaces Java POJO Interface XPXInterfaceNative

SMILA VisualFeatureExtractorPipelet configuration

Dependencies /

Required Interfaces

The component uses the Fraunhofer XPX feature extraction framework, which is accessed via a JNI API and bundled with the JAR.

3.4 Elements of Conceptual Model

The descriptors/fingerprints are stored in a binary file. The file format currently in use (afp) is proprietary and can handle multiple descriptors within one file. If needed by the CUbRIK platform, readers and writers can be provided. The Visual Feature Matcher component of the CUbRIK platform already supports afp input data.


3.5 Component Interfaces Specifications

Java Interface specification <XPXInterfaceNative>

XPXInterfaceStatus addDataOutput(String name, String dst)
    Connects a data output destination (file or path) to a process tree data connector.

boolean getNextFailureMessage(StringBuffer msg)
    Gets failure messages of the interface.

XPXInterfaceStatus process()
    Starts processing.

XPXInterfaceStatus registerProgressCallback(XPXInterfaceProgressCallback cb)
    Registers a progress callback.

XPXInterfaceStatus setConfig(String xmlsource)
    Sets the IPS configuration in XML format.

XPXInterfaceStatus setDirectoryDataSource(String path, String extensions, long starttuid)
    Sets a directory data source to be processed.

static XPXInterfaceStatus setLogging(XPXInterfaceLoggingMode logMode, XPXInterfaceLoggingLevel logLevel, String logFile)
    Sets the logging mode and level.

XPXInterfaceStatus setParameters(XPXInterfaceParameters params)
    Sets the parameter set for XPXInterface.

XPXInterfaceStatus setSingleDataSource(String filename, long tuid)
    Sets a single data source to be processed.


Pipelet configuration:

{
  "class": "de.fraunhofer.idmt.cubrik.smila.pipelets.VisualFeatureExtractorPipelet",
  "description": "Extracts visual features from multimedia data",
  "parameters": [
    {
      "name": "dataSourceURLAttribute",
      "type": "string",
      "multi": false,
      "optional": false,
      "description": "The name of the attribute with the file source URL."
    },
    {
      "name": "isDirectoryAttribute",
      "type": "string",
      "multi": false,
      "optional": false,
      "description": "The name of the attribute that specifies whether the URL submitted in dataSourceURLAttribute is a directory to be processed recursively"
    },
    {
      "name": "fileFilterAttribute",
      "type": "string",
      "multi": true,
      "optional": false,
      "description": "specifies "
    },
    {
      "name": "featureTypes",
      "type": "string",
      "multi": true,
      "optional": false,
      "description": "Specifies the features to be extracted."
    }
  ]
}

3.5.1 Methods description

See above.

3.5.2 Relevant WP, partners, deliverables

The Generic Visual Feature Extractor component is part of T8.1 of WP8. Responsible partner is Fraunhofer IDMT (FRH). This specification is part of D8.2.

3.5.3 Component delivery

The component is part of R3 and following releases of the CUbRIK platform and is available from the CUbRIK SVN: https://89.97.237.243/svn/CUBRIK/PACKAGES/R3/H-Demos/NewsHistory_FRH_POLMI/


3.6 SMILA Deployment Environment

ANY

LINUX

WINDOWS

OS - Operating System

(Specify ANY if the operating system is not a constraint for the component)

MAC

JDK 1.6.0_X JVM – Java Virtual Machine

(Specify if your component needs a specific JDK version; SMILA v1.2 requires JDK 1.7)

JDK 1.7.0_Y

SMILA source code version

1.1

Eclipse SDK version 4.2.2

DBMS – Data Base Management System

(Specify here if your component needs a DBMS)

Derby

Postgres

MySQL

Other - Specify

Note

3.6.1 Configuration of the 3rd party library and components to be integrated in SMILA

The required third party components are delivered as binaries along with the deployed artifacts (i.e. the JAR). The OSGi bundle activator takes care of initializing the native libraries properly.

A set of H-Demo specific extraction configurations is also delivered with the artifact. Further configurations can be obtained upon request from Fraunhofer IDMT.


3.7 Video content query

Figure 2 Sub use case “video content query” of the News Content History H-Demo use case “Query”


4. "Descriptors Extractor" component description

The Descriptors Extractor component extracts content and metadata descriptors for each image retrieved from Twitter. The component can describe either the full image or only a part of it, after segmentation by the “lower and upper body parts detector” component or the Sketchness component.

4.1 Requirements

4.1.1 Functional requirements

RF1: Must: The component extracts, for a given image file or image URL, its color and texture multimedia descriptors

4.1.2 Non Functional requirements

RNF2: Should: The component requires access to MongoDB to extract the metadata descriptor for the record

RNF3: Won’t: The component will not segment images using information from other components. The components calling this component should have performed the segmentation and provide an image file (e.g. a temporary file that contains only the segmented part of the image) as input.

4.2 Use Cases List

The Descriptors Extractor is involved in the following use cases:

1. Request images votes
2. What do I wear today?
3. Images crawling from SN
4. Trend analysis for sample


4.2.1 Associated sequence diagrams


4.3 Component Identity Card

Component Name Descriptors Extractor

Responsibilities Extract multimedia descriptors and textual descriptors (optional) for each image retrieved, in order to be used from the trend analyser

Provided Interfaces extractDominantColor

extractColorPalette

extractLBP

extractOSIFT

Dependencies /

Required Interfaces

4.4 Elements of Conceptual Model


4.5 Component Interfaces Specification

The component provides a set of descriptor extraction methods to support multimedia content indexing and the extraction of popularity, clusters and, finally, trendiness.

Interface <descriptor extraction>

String extractDominantColor(String imageURL) throws Exception
    Extracts the dominant color of the provided image.

String extractColorPalette(String imageURL) throws Exception
    Extracts color palette information for each image.

String extractLBP(String imageURL) throws Exception
    Extracts LBP features for each image to support texture clustering.

String extractOSIFT(String imageURL) throws Exception
    Extracts opponentSIFT features.

4.5.1 Methods description

The extractDominantColor method extracts the dominant color of the image provided. The dominant color is a strong feature if the provided image (or image segment) is free of clutter and background.
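As a rough illustration of the idea (not the CERTH implementation), a naive dominant-colour estimate can be obtained by quantising pixel colours into coarse bins and taking the most frequent bin; real descriptors such as MPEG-7 Dominant Color use clustering instead of simple counting. The pixel data below is made up.

```python
from collections import Counter

# Naive dominant-colour sketch: quantise each RGB channel and count bins.
def dominant_color(pixels, step=64):
    """pixels: iterable of (r, g, b) tuples; returns the most frequent
    quantised colour (each channel rounded down to a multiple of `step`)."""
    q = lambda v: (v // step) * step
    counts = Counter((q(r), q(g), q(b)) for r, g, b in pixels)
    return counts.most_common(1)[0][0]

# Hypothetical pixels: two reddish, one greenish.
pixels = [(250, 10, 10), (240, 20, 5), (10, 200, 10)]
print(dominant_color(pixels))   # (192, 0, 0)
```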

The extractColorPalette method extracts a set of colors that are prominent in the image (or image segment). This information will also be used to provide color combinations for the trending color.

The extractLBP method implements an LBP (Local Binary Patterns) variant that describes the texture information in the provided image. The extracted texture description will be used to group the images into texture clusters and to find which textures are the most popular in the examined time period.
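The basic 3x3 LBP operator that such variants build on can be sketched as follows; the exact variant used by the component is not specified in this document, so this is an assumed baseline for illustration.

```java
/** Sketch of the basic 3x3 Local Binary Patterns operator (baseline, not the exact variant used). */
public class LbpSketch {

    // Eight neighbour offsets, clockwise from the top-left pixel.
    private static final int[][] OFFS = {
        {-1, -1}, {-1, 0}, {-1, 1}, {0, 1}, {1, 1}, {1, 0}, {1, -1}, {0, -1}
    };

    /** 8-bit LBP code of pixel (y, x): bit i is set when neighbour i is >= the centre. */
    public static int lbpCode(int[][] gray, int y, int x) {
        int code = 0;
        for (int i = 0; i < 8; i++)
            if (gray[y + OFFS[i][0]][x + OFFS[i][1]] >= gray[y][x]) code |= 1 << i;
        return code;
    }

    /** 256-bin histogram of codes over interior pixels: the texture descriptor to cluster on. */
    public static int[] histogram(int[][] gray) {
        int[] hist = new int[256];
        for (int y = 1; y < gray.length - 1; y++)
            for (int x = 1; x < gray[0].length - 1; x++)
                hist[lbpCode(gray, y, x)]++;
        return hist;
    }

    public static void main(String[] args) {
        int[][] img = {{10, 10, 10}, {10, 50, 10}, {10, 10, 10}};
        System.out.println(lbpCode(img, 1, 1)); // bright centre: no neighbour reaches it
    }
}
```

Images with similar texture yield similar histograms, which is what the clustering step compares.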

The extractOSIFT method extracts OpponentSIFT descriptors using the colordescriptors.exe executable available from http://koen.me/research/colordescriptors/.

4.5.2 Relevant WP, partners, deliverables

The partner responsible for the Descriptors Extractor component is CERTH. The work is part of tasks T4.2 and T4.4.

4.5.3 Component delivery

The component is part of R3 and following releases of the CUbRIK platform and is available from the CUbRIK SVN: https://89.97.237.243/svn/CUBRIK/WORK/DescriptorsExtractor_CERTH/


4.6 SMILA Deployment Environment

OS – Operating System (specify ANY if the operating system is not a constraint for the component): ANY / LINUX / WINDOWS / MAC

JVM – Java Virtual Machine (specify if your component needs a specific JDK version; SMILA v1.2 requires JDK 1.7): JDK 1.6.0_X / JDK 1.7.0_Y

SMILA source code version: 1.2

Eclipse SDK version: 4.2.2

DBMS – Data Base Management System (specify here if your component needs a DBMS): Derby / Postgres / MySQL / Other: MongoDB

Note:

4.6.2 Configuration of the 3rd party library and components to be integrated in SMILA

To support high-throughput content streams we developed a parallelisation and task distribution framework in the context of task 4.5. This framework will eventually also support, in a distributed scheme, our own indexing structures developed in the context of task 4.3.

The framework provides access to other services through API calls such as those specified in the methods section. For the installation of the framework, virtual machines are required, hosted near the SMILA installation (to achieve LAN speed). Alternatively, the service may be hosted on CERTH’s premises and accessed remotely using the same API calls.

Specifically for this component we may also provide a SMILA pipelet if necessary, but in that case we would not take advantage of the parallelisation scheme for the descriptor extraction.


5. "Accessibility" component description

The Accessibility Component provides users with accessibility reranking tools that can be activated, if needed, to rerank the set of retrieved results so as to promote those that are most accessible for the user. For this purpose, the system maintains an impairment profile for each user, which encodes the user’s impairments. The profile is provided by the user during registration and initially contains rough values about the user’s impairments. It is used during the reranking process.

During image annotation, the accessibility component analyses the multimedia objects (currently images) contained in the QDB and adds metatags to them after detecting accessibility-related attributes of the images, such as colour and contrast. These metatags are used when reranking the images presented to the users.

Apart from image annotation, the accessibility component is also responsible for providing accessibility-aware rerankings of a set of results, along with mechanisms through which the user can provide accessibility-aware feedback. The methods related to this functionality are described in the Accessibility filtering use case. In the following, only the functionality related to accessibility annotation is presented.

5.1 Requirements

5.1.1 Functional requirements

Concerning accessibility-related annotation, the accessibility component has to satisfy the following functional requirements:

RF1: must accept an image file or an image URL as its input
RF2: must provide methods to extract a set of accessibility-related features (such as image brightness, contrast, dominant color etc.)
RF3: must encode the accessibility annotation as a set of accessibility scores for all the supported types of impairments and store these scores to the image record

5.1.2 Non functional requirements

The accessibility component has to satisfy the following non-functional requirements, as far as accessibility-related annotation is concerned:

RNF1: The accessibility-related annotation procedure should be fast enough, so that the whole image annotation process is not delayed.

RNF2: The accessibility-related features which are extracted from the images should be relevant to the supported impairment types.

5.2 Use Cases List

The Accessibility component is involved in all use cases that apply to individual users. As far as image annotation is concerned, the Accessibility component is involved in the following use cases:

1. Request images votes

2. What do I wear today? 1/2

3. Extraction of images from SN


5.2.1 Associated sequence diagrams

During the image annotation process of the above-mentioned use cases (search similar images 1a-1, what do I wear today 1/2, and extraction of images from SN), an extra step takes place besides entity recognition and extraction. This step extracts accessibility-related annotation from the uploaded images. The extraction of the accessibility-related features and scores is performed by the methods of the accessibility component, as depicted in Figure 3.

Figure 3: Sequence diagram for extraction of accessibility-related annotation from images

5.2.2 Component Identity Card

Component Name CO-ACC – Accessibility component

Responsibilities In the context of the accessibility-related annotation, the purpose of the accessibility component is to provide methods for extracting accessibility-related annotation from images, in the form of a vector of accessibility scores for the various supported impairments. These accessibility scores will eventually be used in order to rerank the search results according to how accessible they are for a specific user.

Provided Interfaces accessibilityAnnotation

calculateAccessibilityScores

Dependencies /

Required Interfaces

The implementation of the component’s methods does not depend on any interface exposed by other CUbRIK components. All information needed for the execution of the component’s methods is passed to them via their arguments.


5.2.3 Elements of Conceptual Model

In order for a media item to be evaluated in terms of its accessibility for a specific user, special accessibility-related annotation is extracted from it during indexing. The final product of the accessibility feature extraction process is a vector of accessibility scores. Each score takes values in the [0, 1] range and describes how accessible the media item is for a person with the respective impairment (1: the media item is accessible, 0: it is not).

As already mentioned in the description of the Accessibility Annotation workflow step, the accessibility scores for a specific media item are stored as a special kind of annotation for it. Referring to the Content Description Model, described in deliverable D2.1, an AccessibilityAnnotation class is added as a sub-class of Annotation, holding the accessibility scores for the various impairments (see Figure 4). Each object of class Accessibility contains an impairment type, of class ImpairmentType, and a value in the [0, 1] range describing how suitable the content object is for this type of impairment (1: object is suitable, 0: object is not suitable).
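The structure above can be sketched in Java as follows; the enum constants and field names beyond the class names of Figure 4 are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal sketch of the accessibility classes of Figure 4 (illustrative names and fields). */
public class AccessibilityModelSketch {

    /** Hypothetical impairment types; the real set is defined by the ImpairmentType class. */
    enum ImpairmentType { RED_GREEN_COLOR_BLINDNESS, BLURRED_VISION }

    /** One score in [0, 1]: 1 = the media item is accessible for this impairment, 0 = it is not. */
    static class Accessibility {
        final ImpairmentType impairment;
        final double score;
        Accessibility(ImpairmentType impairment, double score) {
            if (score < 0.0 || score > 1.0)
                throw new IllegalArgumentException("score must lie in [0, 1]");
            this.impairment = impairment;
            this.score = score;
        }
    }

    /** Sub-class of Annotation in the Content Description Model: one score per impairment. */
    static class AccessibilityAnnotation {
        final List<Accessibility> scores = new ArrayList<>();
    }

    public static void main(String[] args) {
        AccessibilityAnnotation ann = new AccessibilityAnnotation();
        ann.scores.add(new Accessibility(ImpairmentType.RED_GREEN_COLOR_BLINDNESS, 0.2));
        ann.scores.add(new Accessibility(ImpairmentType.BLURRED_VISION, 0.9));
        System.out.println(ann.scores.size());
    }
}
```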

Figure 4: Accessibility-related classes and attributes of Content Description Model

5.2.4 Component Interfaces Specification

Interface AccessibilityComponent

Produces accessibility-related annotation of a media item.
@param url the URL address of the media item
@param mediaType the type of the media item (e.g. “image” or “sound”)
@return a JSON object containing the accessibility-related annotation of the media item
@throws MediaUnavailableException if the given URL is not valid
@throws MediaTypeUnsupportedException if the given media type is unsupported
JSONObject accessibilityAnnotation(String url, String mediaType) throws MediaUnavailableException, MediaTypeUnsupportedException

Extracts the accessibility scores of a media item for the various supported impairments, by using the item’s accessibility-related annotation.
@param annotation the accessibility-related annotation of a media item, as a JSON object
@param mediaType the type of the media item (e.g. “image” or “sound”)
@return an array containing the accessibility scores of the media item for the supported impairments
Accessibility[] calculateAccessibilityScores(JSONObject annotation)


5.2.5 Methods description

The accessibilityAnnotation method produces accessibility-related annotation of a media item. The accessibility-related annotation contains information about the media item that can be used in evaluating its accessibility. Such information includes, for example, the contrast and the dominant color of an image, or the objects appearing in it. The accessibility-related annotation is returned by the method in JSON format.

The accessibilityAnnotation method is called during the data annotation phase and will eventually be embedded in the components responsible for annotation extraction.

For example, an image containing a lot of distinct and sharp red and green areas would be annotated as having large contrast and sharpness values and the red and green colors would be the dominant color combination.

The calculateAccessibilityScores method extracts the accessibility scores of a media item for the various supported impairments, by using the item’s accessibility-related annotation. The score for each impairment is a number in the range [0, 1], denoting how accessible the media item is for a person completely affected by the respective impairment (1: the media item is accessible, 0: the media item is not accessible).

The calculateAccessibilityScores method is called during the indexing procedure (indexing pipeline).

Continuing with the example of the accessibilityAnnotation method above, the annotation extracted from this image would be transformed into an array of accessibility scores for the various impairments. For the red-green colour-blindness impairment a low accessibility score would be assigned to the above image, while the opposite would hold for a blurred-vision impairment.
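The mapping from annotation to scores could look like the sketch below. The rules are invented for illustration (they are not CERTH’s actual heuristics), and a plain Map stands in for the JSON annotation object.

```java
import java.util.List;
import java.util.Map;

/** Illustrative score rules for two impairments (invented heuristics, not the real ones). */
public class AccessibilityScoreSketch {

    /** Low score for red-green colour blindness when red and green dominate together. */
    public static double redGreenScore(Map<String, Object> annotation) {
        @SuppressWarnings("unchecked")
        List<String> dominant = (List<String>) annotation.get("dominantColors");
        return dominant.contains("red") && dominant.contains("green") ? 0.1 : 0.9;
    }

    /** Higher score for blurred vision when the image is sharp and high-contrast. */
    public static double blurredVisionScore(Map<String, Object> annotation) {
        double contrast = (Double) annotation.get("contrast");   // assumed normalised to [0, 1]
        double sharpness = (Double) annotation.get("sharpness"); // assumed normalised to [0, 1]
        return (contrast + sharpness) / 2.0;
    }

    public static void main(String[] args) {
        Map<String, Object> ann = Map.of(
            "dominantColors", List.of("red", "green"),
            "contrast", 0.8,
            "sharpness", 0.9);
        System.out.println(redGreenScore(ann));      // low: red-green areas are hard to tell apart
        System.out.println(blurredVisionScore(ann)); // high: sharp, high-contrast content
    }
}
```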

5.2.6 Relevant WP, partners, deliverables

Partner responsible for the component: CERTH

Reference persons responsible for the development: Anastasios Drosou, Ilias Kalamaras and Dimitrios Tzovaras

WP: 7, Task: 7.4

Deliverables involved: D7.1 [Month 17], D7.2 [Month 35]

5.2.7 Component delivery

The component is part of R3 and following of the CUbRIK platform and available from the CUbRIK SVN: https://89.97.237.243/svn/CUBRIK/WORK/Accessibility_CERTH/


5.3 SMILA Deployment Environment

OS – Operating System (specify ANY if the operating system is not a constraint for the component): ANY / LINUX / WINDOWS / MAC

JVM – Java Virtual Machine (specify if your component needs a specific JDK version; SMILA v1.2 requires JDK 1.7): JDK 1.6.0_X / JDK 1.7.0_Y

SMILA source code version: 1.2

Eclipse SDK version: 4.2.2

DBMS – Data Base Management System (specify here if your component needs a DBMS): Derby / Postgres / MySQL / Other

Note:

5.3.2 Configuration of the 3rd party library and components to be integrated in SMILA

For the extraction of accessibility-related features from the uploaded images, such as image contrast, colours etc., the accessibility component currently relies on the Java Vision Toolkit library.


6. "Video Segment Matcher" component description

The Video Segment Matcher matches the visual descriptors of a reference media file against the descriptors of one or more other media files. It identifies perceptually identical video segments and returns them. To obtain fast processing on large data sets the component uses an in-memory database for the descriptors. Descriptors to be matched must exist in this database and are specified by a unique ID. The Video Segment Matcher understands the proprietary afp data format produced by the Generic Visual Feature Extractor component.
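As a toy illustration of what segment matching means (this is not the Fraunhofer algorithm, which operates on proprietary afp descriptors), assume each frame is reduced to a hash value; a matching segment is then the longest aligned run of equal hashes:

```java
/** Toy sketch of segment matching: frames reduced to per-frame hashes, longest aligned run reported. */
public class SegmentMatchSketch {

    /** Result: start indices in query/reference and the matched length. */
    static final class Segment {
        final int queryStart, refStart, length;
        Segment(int q, int r, int l) { queryStart = q; refStart = r; length = l; }
    }

    /** Longest run of equal hashes across all alignments (classic longest-common-substring DP). */
    public static Segment bestSegment(long[] query, long[] ref) {
        int bestLen = 0, bestQ = 0, bestR = 0;
        int[][] dp = new int[query.length + 1][ref.length + 1];
        for (int i = 1; i <= query.length; i++)
            for (int j = 1; j <= ref.length; j++)
                if (query[i - 1] == ref[j - 1]) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                    if (dp[i][j] > bestLen) {
                        bestLen = dp[i][j];
                        bestQ = i - bestLen;
                        bestR = j - bestLen;
                    }
                }
        return new Segment(bestQ, bestR, bestLen);
    }

    public static void main(String[] args) {
        long[] query = {1, 2, 3, 4, 5};    // per-frame hashes of the query video
        long[] ref   = {9, 9, 2, 3, 4, 7}; // reference shares the run {2, 3, 4}
        Segment s = bestSegment(query, ref);
        System.out.println(s.queryStart + " " + s.refStart + " " + s.length);
    }
}
```

A production matcher additionally tolerates near-duplicate (not bit-identical) descriptors and attaches a confidence value to each reported segment.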

6.1 Requirements

6.1.1 Functional requirements

RF1: must accept and store afp data as byte array input

RF2: must accept delete request based on uid

RF3: must match a reference uid against one or multiple comparison uids

RF4: must return a list of matching segments along with confidence values

RF5: must be configurable for different types of feature adaptation

RF6: must be configurable for different matching accuracy thresholds

6.1.2 Non Functional requirements

RNF1: Should match as fast as possible

RNF2: Matching should be scalable in terms of processors available

6.2 Use Cases List

The Video Segment Matcher is involved in the following use cases:

1. News Content History H-Demo: UC “Query”
2. News Content History H-Demo: UC “Content Insertion”
3. HoE V-App: Use Case 4 context expander


6.2.1 Associated sequence diagrams

Figure 5: “Match new content” use case (sub-use case of the NCH use case “Query”)

Figure 5 shows the activity diagram for the “Match new content” use case, which is a sub-use case of (amongst others) the NCH “Query->Video content query” use case (see Figure 6). The VisualFeatureMatcherPipelet uses the Video Segment Matcher component and is SMILA compliant, so it can be used as a simple worker or within a BPEL pipeline.

6.3 Component Identity Card

Component Name Video Segment Matcher

Responsibilities The Video Segment Matcher is responsible for matching a query fingerprint/descriptor against one or multiple reference fingerprints/descriptors. It returns video segments that match as perceptually identical segments.

Provided Interfaces Java POJO Interface XPXInterfaceNative

SMILA VisualFeatureMatcherPipelet configuration

Dependencies /

Required Interfaces

The component uses the Fraunhofer VideoSegmentMatcher library, which is accessed via a JNI API and bundled with the JAR.

6.4 Elements of Conceptual Model

The descriptors/fingerprints are stored in a binary file. The file format currently in use (afp) is proprietary and can hold multiple descriptors within one file. If needed by the CUbRIK platform, readers and writers can be provided. The Video Segment Matcher component of the CUbRIK platform already supports afp input data.


6.5 Component Interfaces Specifications

Java Interface specification <VideoSegmentMatcherNative>

Modifier and Type / Method and Description

VideoSegmentMatcherStatus addFingerprint(byte[] afpData)
Adds a new fingerprint in afp format to the internal matching database.

VideoSegmentMatcherResult getMatchingResults()
Returns the result of the matching process.

boolean getNextFailureMessage(StringBuffer msg)
Gets failure messages of the interface.

VideoSegmentMatcherStatus match(long queryTuid, long[] refTuids)
Starts matching.

VideoSegmentMatcherStatus registerProgressCallback(VideoSegmentMatcherCallback cb)
Registers a progress callback.

VideoSegmentMatcherStatus removeFingerprint(long tuid)
Removes a fingerprint from the internal matching database.

VideoSegmentMatcherStatus setLogging(VideoSegmentMatcherLoggingMode logMode, VideoSegmentMatcherLoggingLevel logLevel, String logFile)
Sets the logging mode and level.

VideoSegmentMatcherStatus setParameters(VideoSegmentMatcherParameters params)
Sets the parameter set for VideoSegmentMatcher.

Matching Pipelet configuration

{
  "class": "de.fraunhofer.idmt.cubrik.smila.pipelets.VideoSegmentMatcherPipelet",
  "description": "Matches a fingerprint specified by a reference uid against one or more fingerprints specified by uids",
  "parameters": [
    {
      "name": "queryTUIDAttribute",
      "type": "string",
      "multi": false,
      "optional": false,
      "description": "The name of the attribute containing the query TUID."
    },
    {
      "name": "referenceTUIDsAttribute",
      "type": "string",
      "multi": false,
      "optional": false,
      "description": "The name of the attribute containing the reference TUIDs."
    },
    {
      "name": "matchingOutputAttachment",
      "type": "string",
      "multi": false,
      "optional": false,
      "description": "The name of the attachment storing the matching results."
    }
  ]
}

6.5.1 Methods description

See above.

6.5.2 Relevant WP, partners, deliverables

The Video Segment Matcher component is part of T8.1 of WP8. Responsible partner is Fraunhofer IDMT (FRH). This specification is part of D8.2.

6.5.3 Component delivery

The component is part of R3 and following releases of the CUbRIK platform and is available from the CUbRIK SVN: https://89.97.237.243/svn/CUBRIK/PACKAGES/R3/H-Demos/NewsHistory_FRH_POLMI/


6.6 SMILA Deployment Environment

OS – Operating System (specify ANY if the operating system is not a constraint for the component): ANY / LINUX / WINDOWS / MAC

JVM – Java Virtual Machine (specify if your component needs a specific JDK version; SMILA v1.2 requires JDK 1.7): JDK 1.6.0_X / JDK 1.7.0_Y

SMILA source code version: 1.1

Eclipse SDK version: 4.2.2

DBMS – Data Base Management System (specify here if your component needs a DBMS): Derby / Postgres / MySQL / Other

Note:

6.6.1 Configuration of the 3rd party library and components to be integrated in SMILA

The required third-party components are delivered as binaries along with the deployed artifacts (i.e. the JAR). The OSGi bundle activator takes care of initializing the native libraries properly.


6.7 Video content query

Figure 6 Sub use case “video content query” of the News Content History H-Demo use case “Query”


7. "Extraction of Images from Social Networks" component description

The “image extraction from SNs” component is responsible for collecting tweets, and the associated images, from Twitter that are relevant to the predefined topics of the fashion v-app. The component takes a set of categories that it listens to. It operates in a tight loop with the trend analyser component, which feeds it with the usernames of Twitter users characterised as trend setters. This information is exploited in the retrieval process to enhance the quality of the retrieved content.

7.1 Requirements

7.1.1 Functional requirements

FR1: Must: The component requires a set of queries to start collecting content from Twitter. The queries should be fashion item categories.
FR2: Should: The component should be fed with lists of confidence levels for the usernames collected from the SN. These lists are critical for refining the search queries and fetching quality content.
FR3: Won’t: The component will not expand the search queries to new categories on demand. If the end user needs to add new queries, the service must be restarted with the new query set. This may be considered for implementation in a next version of the component.

7.1.2 Non Functional requirements

NFR1: Must: The component must fetch as many images as possible in order to have a clear view of the trends that emerge in the tracked SNs.

NFR2: Should: The quality of the retrieved content is important in order to have better results at the end of the pipeline. The component should apply techniques to refine the search results and fetch content of higher quality if possible.

7.2 Use Cases List

The Fashion Portal is involved in the following use cases:

1. Image crawler from SN


7.2.1 Associated sequence diagrams

7.3 Component Identity Card

Component Name Extraction of Images from SN

Responsibilities The purpose of the component is to retrieve streams of images and metadata from SNs to be used for trend analysis. The SN that we examine for now is Twitter.

Provided Interfaces Data are recorded in a MongoDB database. The components that use the retrieved data get them directly from the database.

Dependencies /

Required Interfaces

-


7.4 Elements of Conceptual Model

7.5 Component Interfaces Specification

The component provides the following methods:

Interface <Extraction of Images from SN>

Starts the crawling process.
void startCrawling() throws Exception

Stops the crawling process.
void stopCrawling() throws Exception

An example of the JSON format of the data retrieved from Twitter follows. The data are stored in a MongoDB database for further processing by this component as well as by other components of the fashion v-app (e.g. the trend analyser).

{
  "id": "1",
  "mediaLocator": "http://image-url",
  "descriptions": [
    {
      "id": "1",
      "name": "TwitterAcquiredImage",
      "itemAnnotations": [
        { "id": "1", "name": "query-string", "language": "eng", "value": "t-shirt" },
        { "id": "2", "name": "hashtag", "language": "eng", "value": "red" },
        { "id": "3", "name": "hashtag", "language": "eng", "value": "shirt" },
        { "id": "4", "name": "geo", "values": [30, 30] },
        { "id": "5", "name": "text", "language": "eng", "value": "this is a great t-shirt" }
      ],
      "mediaSegment": [
        { "id": "6", "name": "date_posted", "startTs": 1364997564000 }
      ]
    }
  ],
  "provider": {
    "id": "1",
    "name": "Twitter",
    "url": "https://twitter.com/",
    "apiUri": "https://dev.twitter.com/docs/api"
  },
  "permissions": {
    "id": "1",
    "permission": "Creative Commons"
  },
  "provenance": {
    "id": "234455230",
    "username": "user name"
  }
}

7.5.1 Methods description

startCrawling() starts the crawling process and listens for tweets that match the predefined fashion item categories (queries).

stopCrawling() stops the listener and any running image retrieval process.

No other methods are needed to interact with the service, since everything is done inside the database. The retrieved content is stored in a database, from where the other components obtain it. The updated list of “trend setters” is also stored in the database by the trend analyser, and this component uses it to filter the content.
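The trend-setter filtering step can be sketched as below; the Tweet class and method names are hypothetical, since the real component exchanges this data through the MongoDB database rather than in memory.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Illustrative sketch (hypothetical names): keep only tweets authored by users on the
 *  trend-setter list that the trend analyser maintains in the shared database. */
public class TrendSetterFilterSketch {

    static class Tweet {
        final String username;
        final String imageUrl;
        Tweet(String username, String imageUrl) { this.username = username; this.imageUrl = imageUrl; }
    }

    /** Returns only the tweets whose author is a known trend setter. */
    public static List<Tweet> filterByTrendSetters(List<Tweet> tweets, Set<String> trendSetters) {
        List<Tweet> kept = new ArrayList<>();
        for (Tweet t : tweets)
            if (trendSetters.contains(t.username)) kept.add(t);
        return kept;
    }

    public static void main(String[] args) {
        List<Tweet> tweets = List.of(
            new Tweet("alice", "http://img/1.jpg"),
            new Tweet("bob", "http://img/2.jpg"));
        List<Tweet> kept = filterByTrendSetters(tweets, Set.of("alice"));
        System.out.println(kept.size()); // only alice is on the trend-setter list
    }
}
```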

7.5.2 Relevant WP, partners, deliverables

The partner responsible for the “Image extraction from SN” component is CERTH. The work is part of tasks T4.2 and T4.4.

7.5.3 Component delivery

The component is part of R3 and following releases of the CUbRIK platform and is available from the CUbRIK SVN: https://89.97.237.243/svn/CUBRIK/WORK/ImageSimilarityDetection_EMPOLIS/


7.6 SMILA Deployment Environment


OS – Operating System (specify ANY if the operating system is not a constraint for the component): ANY / LINUX / WINDOWS / MAC

JVM – Java Virtual Machine (specify if your component needs a specific JDK version; SMILA v1.2 requires JDK 1.7): JDK 1.6.0_X / JDK 1.7.0_Y

SMILA source code version: 1.2

Eclipse SDK version: 4.2.2

DBMS – Data Base Management System (specify here if your component needs a DBMS): Derby / Postgres / MySQL / Other: MongoDB

Note:

7.6.1 Configuration of the 3rd party library and components to be integrated in SMILA

To support high-throughput content streams we developed a parallelisation and task distribution framework in the context of task 4.5. This framework will eventually also support, in a distributed scheme, our own indexing structures developed in the context of task 4.3.

The framework provides access to other services through API calls such as those specified in the methods section. For the installation of the framework a virtual machine is required, hosted near the SMILA installation (to achieve LAN speed). Alternatively, the service may be hosted on CERTH’s premises and accessed remotely using the same API calls.


8. "Body Part Detector and Router" component description

The body part detector component analyses images and identifies the upper and lower body parts of depicted people, optionally highlighting the parts of an image that depict upper or lower body parts. The component provides an interface to submit an image and retrieve metadata (e.g. a bounding rectangle) for any detected body parts.

An accompanying router (to be added in the third year) provides advanced decision/scoring support. Such a router could be useful for images that depict several people or body parts. In that case, the router could help discard less relevant detections and thus keep only the most relevant ones, e.g. people in the foreground that also have a high confidence score. The router would adjust all confidence scores accordingly and would thus be transparent to the caller and the framework.

8.1 Requirements

8.1.1 Functional requirements

The body part detector component analyses an image and returns meta-information of depicted body parts. The requirements of the body part detector are:

• RF1: must provide an automatic function that analyses an image and detects upper and lower body parts of depicted people

• RF2: must be able to handle multiple depicted people in an image

• RF3: must provide detection results in the form of metadata (in particular, relative coordinates of bounding rectangles)

• RF4: should provide a score value that reflects the confidence of each detected body part

8.1.2 Non Functional requirements

The non-functional requirements of the body part detector are:

• RNF1: should return results within a few seconds if called in a synchronous manner, or provide an asynchronous interface

• RNF2: should ideally provide all necessary logic as a self-contained package with as few dependencies on other components as possible

• RNF3: should provide a simple interface, for example:

o INPUT to the component (when called from another component):

IN1: URL or file path linking to an image

o OUTPUT of the functionalities (returned to the caller)

OUT1: Metadata describing an identified body (upper body, lower body, entire body). All body parts follow the same metadata structure:

• Type: upperBody, lowerBody, entireBody, face

• Rectangle: y, x and height, width in relative coordinates (relative to the dimensions of the input image)

• Confidence score: scalar value (< 10 unlikely, > 20 likely)

If there are multiple persons in an image, there are multiple metadata structures (in other words, the metadata will be grouped on a per-person basis).

8.2 Use Cases List

The Body Part Detector and Router is involved in the following use cases:


1. Request images votes
2. Search similar images
3. What do I wear today? 1/2
4. What do I wear today? 2/2
5. Images crawling from SN
6. Trend analysis for sample

8.2.1 Associated sequence diagrams

The body part detector component is self-contained and exposes only one function. Its interactions are described by the Entity Recognition and Extraction (ERE) step led by CERTH.

For reference, the following figure shows the sequence diagram of ERE (courtesy of CERTH) for the case where Sketchness is used. Note that there is also a very similar use case where Sketchness is not involved.


8.3 Component Identity Card

Component Name CO-BPDR - Body Part Detector and Router

Responsibilities Detects upper and lower body parts in an image.

Provided Interfaces The component offers its service through a REST-based interface (using JSON syntax). A BPEL file will describe the REST-based service and make it available as a SMILA component. Alternatively, direct Java invocation may be provided.

Dependencies /

Required Interfaces

The component will package all necessary dependencies and does not depend on SMILA. It can thus be run independently.


8.4 Elements of Conceptual Model

The component acts like a single function: an IN parameter points to the location of an image, and an OUT structure describes the metadata results.

8.5 Component Interfaces Specification

Interface <BodyPartDetector>

Detect bodies (upper and lower body parts) in an image.

json detectBodies (string link) throws Exception

8.5.1 Methods description

Method: detectBodies

IN: The parameter link is a string that points to an image location. If link starts with http:// it is considered a URL, and the component will automatically fetch the image from the Internet. Otherwise, the component will consider it as a local file.

OUT: The component returns a list of detected bodies (a grouping of body part detections). Each body part detection specifies the type, the rectangle surrounding the detection and the confidence score of the detection.

• type: upperBody, lowerBody, entireBody, face

• rectangle: y, x and height, width in relative coordinates (relative to the dimensions of the input image) packaged in a list

• confidenceScore: scalar value (< 10 unlikely, > 20 likely)

Example of a REST request (in JSON syntax) and the returned results in JSON syntax:

IN: One only needs to provide a link to an image

{
  "link": "/tmp/file12345.jpg"
}

OUT: Below, two persons are detected in the submitted image. For the first person, both upper and lower body parts are detected. For the second person, only the upper body is detected.

[
    [
        {
            "type": "upperBody",
            "rectangle": [0.2, 0.3, 0.5, 0.6],
            "confidenceScore": 15
        },
        {
            "type": "lowerBody",
            "rectangle": [0.1, 0.2, 0.4, 0.5],
            "confidenceScore": 8
        }
    ],
    [
        {
            "type": "upperBody",
            "rectangle": [0.4, 0.2, 0.7, 0.4],
            "confidenceScore": 20
        }
    ]
]
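A client consuming this structure typically filters the nested per-person lists by the confidence bands given above. A sketch in Python, with the response transcribed as plain data (the function name and the exact threshold handling are illustrative, not part of the component):

```python
# The OUT example above, transcribed as Python data (one inner list per person).
response = [
    [
        {"type": "upperBody", "rectangle": [0.2, 0.3, 0.5, 0.6], "confidenceScore": 15},
        {"type": "lowerBody", "rectangle": [0.1, 0.2, 0.4, 0.5], "confidenceScore": 8},
    ],
    [
        {"type": "upperBody", "rectangle": [0.4, 0.2, 0.7, 0.4], "confidenceScore": 20},
    ],
]

def likely_detections(persons, threshold=20):
    """Keep only body parts whose confidenceScore is at or above the
    'likely' band (> 20 likely, < 10 unlikely, per the score semantics)."""
    return [
        [part for part in person if part["confidenceScore"] >= threshold]
        for person in persons
    ]
```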

8.5.2 Relevant WP, partners, deliverables

Partner responsible for the component (and the reference person responsible for the development): QMUL (Markus Brenner). The component is related to WP5 (T5.1).

8.5.3 Component delivery

The component is part of R3 and subsequent releases of the CUbRIK platform, and is available from the CUbRIK SVN: https://89.97.237.243/svn/CUBRIK/WORK/Lower&UpperBodyPartsDetector_QMUL/


8.6 SMILA Deployment Environment

Currently, the core functionality of the body part detector component is implemented in a combination of C and Matlab. It is planned to provide a REST interface (an HTTP-based service). A BPEL file will describe the service and make it available as a SMILA component; thus, there are no Java requirements.

Note: If possible, the Matlab code will be compiled to C or native code. Ideally, Matlab would then not be a required dependency anymore. In this case, the code might be invoked directly from Java (thus making the REST interface obsolete).

Although the implementation is, or should be, OS-agnostic, it will only be tested on Linux.

8.6.1 Configuration of the 3rd party library and components to be integrated in SMILA

The body part detector component will package all necessary dependencies and does not depend on SMILA. It can thus be run independently.


9. "LikeLines - Implicit Feedback Filter" component description

This component provides the extraction of fashion-related key frames (images) from videos uploaded to YouTube. It is based on LikeLines, a tool that collects implicit feedback from users about the portions of a video that are of particular interest.

9.1 Requirements

9.1.1 Functional requirements

RF1 (must): Output the time codes of the top-N most interesting key frames for a given video identified in social network analysis (i.e., Task 4.4).

RF2 (should): If the video has not been seen by the system before, index the video first. Indexing involves:

• crawling user comments and extracting time-coded deep-links;

• downloading the video and performing content analysis.

RF3 (should): If there has not yet been enough user activity for LikeLines to determine the top-N most interesting key frames, apply a cascading fall-back to make a best educated guess at the most interesting key frames:

• Use deep-links extracted from user comments;

• Use previous content analysis.
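The RF3 cascade amounts to trying each evidence source in priority order and stopping at the first one that yields candidates. A hedged sketch (the argument names and the assumption that each source is already ranked are ours, not the LikeLines API):

```python
def best_keyframes(n, user_feedback, deep_links, content_analysis):
    """Return up to n key-frame time codes (seconds), falling back through
    the evidence sources in the order given by RF3. Each argument is a
    list of candidate time codes, assumed pre-ranked by interestingness
    within its source; an empty list means 'no evidence available'."""
    for candidates in (user_feedback, deep_links, content_analysis):
        if candidates:
            return candidates[:n]
    return []
```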

RF4 (should): Signal an error if the YouTube video no longer exists.

9.1.2 Non Functional requirements

RNF1: The images must be CC licensed.

RNF2: The component should take at most 1 minute when the requested video has been indexed and processed before.

RNF3: The component should take at most 30 minutes when the requested video has not been indexed and processed before.

9.2 Use Cases List

Implicit feedback is involved in:

1. Image Extraction from Social Network

N.B. Since implicit feedback is used in the Trend Analysis use cases, it is also indirectly involved in those use cases.


9.2.1 Associated sequence diagrams

9.3 Component Identity Card

Component Name LikeLines - Implicit Feedback Filter

Responsibilities Given a link to the video, the component delivers the time codes of interesting key frames in the video.

Provided Interfaces Takes input from “Trend analyser and Image extraction from Social Network” and provides output to “Full image Identification”: getNKeyFrames

Dependencies /

Required Interfaces

No dependencies other than a running LikeLines server.


9.4 Elements of Conceptual Model

9.5 Component Interfaces Specification

Conceptual interface (leaving out serialization details):

Interface <LikeLines server>

Aggregates collected user feedback for a video. Used by the LikeLines video player.
Param videoId for YouTube videos: "YouTube:<youtube_id>"
AggregateInfo aggregate (String videoId)

Used by the LikeLines video player to create a session. Returns a session token.
Param videoId for YouTube videos: "YouTube:<youtube_id>"
Param ts: Unix timestamp of the client performing the method call
String createSession (String videoId, double ts)

Used by the LikeLines video player to regularly send user interactions with the video to the server.
void sendInteractions (String token, Interaction[] interactions) throws Exception

Used for indexing content-analysis results of videos.
Param videoId for YouTube videos: "YouTube:<youtube_id>"
void addMCA (String videoId, String mcaType, double[] curve)
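A content-analysis curve registered via addMCA could, for instance, be reduced to key-frame candidates by picking its highest local maxima. The following is an illustrative heuristic only, not LikeLines' actual aggregation algorithm:

```python
def top_n_from_curve(curve, n, step=1.0):
    """Pick the n highest-scoring local maxima from a per-sample score
    curve (one value per `step` seconds) and return their time codes.
    Illustrative heuristic; LikeLines' real aggregation may differ."""
    peaks = [
        (score, i * step)
        for i, score in enumerate(curve)
        if (i == 0 or curve[i - 1] <= score)
        and (i == len(curve) - 1 or score > curve[i + 1])
    ]
    # Highest score first; break ties by earlier time code.
    peaks.sort(key=lambda p: (-p[0], p[1]))
    return [t for _, t in peaks[:n]]
```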

Interface of the component for SMILA use. One method is exposed externally.

Interface <LikeLines>

Computes the top N key frames for a queried video and returns the time codes of these key frames.
Param videoId for YouTube videos: "YouTube:<youtube_id>"
double[] getNKeyFrames (int N, String videoId) throws Exception

9.5.1 Methods description

The getNKeyFrames method in LikeLines is responsible for obtaining the top N most interesting key frames.

Concept example:

LikeLines ll = new LikeLinesWebServiceImpl("http://llserver:9000");

try {
    double[] keyframes = ll.getNKeyFrames(10, "YouTube:qmozsGBYAV8");
} catch (…) { … }

9.5.2 Relevant WP, partners, deliverables

The component is implemented by TU Delft. It is part of Task 3.2: Implicit User-Derived Information and Task 7.2: Pipelines for relevance feedback. It is related to D7.1 R1 Pipelines for relevance feedback and D7.2 R2 Pipelines for relevance feedback.

9.5.3 Component delivery

The component is part of R3 and subsequent releases of the CUbRIK platform, and is available from the CUbRIK SVN: https://89.97.237.243/svn/CUBRIK/WORK/ImplicitFeedbackFilterLIKELINES_TUD/

The component is also available from the Github repository:

https://github.com/ShinNoNoir/likelines-player


9.6 SMILA Deployment Environment

OS - Operating System: ANY / LINUX / WINDOWS / MAC

JVM – Java Virtual Machine: JDK 1.6.0_X / JDK 1.7.0_Y

SMILA source code version: 1.2

Eclipse SDK version: 4.2.2

DBMS – Data Base Management System (if the component needs one): Derby / Postgres / MySQL / Other

Note: /

9.6.1 Configuration of the 3rd party library and components to be integrated in SMILA

N/A. Initial implementation will use a default LikeLines server.


10. "Media harvesting and upload" component description

The media harvesting and upload component is responsible for populating the History of Europe database with content from the CVCE archive and external sources. The external data sources considered for the second-year demonstrator are the Flickr service and the Europeana collections.

10.1 Requirements

10.1.1 Functional requirements

The media harvesting and upload has to satisfy the following Functional requirements:

FR1 (must): The media harvesting and upload component must provide the data to the database of the HoE application.

FR2 (must): The component requires a set of textual input queries in order to retrieve relevant content from the selected data sources.

10.1.2 Non Functional requirements

The media harvesting and upload has to satisfy the following Non functional requirements:

NFR1 (must): The quality of the retrieved content must be adequate for the face detection components.

NFR2 (should): The selected image data sources should also contain metadata that can be used to enhance the accuracy of the HoE applications.

10.2 Use Cases List

Media harvesting and upload is involved in the following use case:

1. Use case 1.1: indexation of images, in the dataset creation workflow step.


10.2.1 Associated sequence diagrams

10.3 Component Identity Card

Component Name Media harvesting and upload

Responsibilities The media harvesting and upload component is responsible for populating the History of Europe database with content from the CVCE archive and external sources. The external data sources considered for the second-year demonstrator are the Flickr service and the Europeana collections.

L3S’s strategy with Flickr datasets is the following: starting from a text query from CVCE, the top-300 results from Flickr are retrieved. For each query, the uploader profile of every result is gathered, including photo albums, joined groups and group information, and the friends list with friend information. The textual information is stored in an Oracle database, while the photos (JPEG files) are stored on a file system.

Provided Interfaces startHarvesting()

Dependencies /

Required Interfaces
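The per-query Flickr strategy described above can be sketched as a small harvesting loop. Everything here is a placeholder: the callables stand in for the real Flickr API calls and for the Oracle/file-system storage layers.

```python
def harvest_query(query, search_fn, profile_fn, store_text, store_photo, top=300):
    """Sketch of the per-query harvesting strategy: fetch the top results
    for a text query, store each photo on the file system, and store the
    uploader's profile (albums, groups, friends) as textual metadata.
    search_fn/profile_fn/store_* are hypothetical stand-ins."""
    for photo in search_fn(query)[:top]:
        store_photo(photo["id"], photo["bytes"])              # .jpeg to file system
        store_text(photo["id"], profile_fn(photo["owner"]))   # profile to database
```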


10.4 Elements of Conceptual Model


Figure 7: General concept diagram

Figure 8: Flickr images harvesting

10.5 Component Interfaces Specification

Interface<media harvesting and upload> void startHarvesting() throws Exception


10.5.1 Methods description

The startHarvesting method is responsible for triggering the component to start querying the selected image data sources for content to be retrieved. The retrieved content will be stored in the local file system and be available through a

10.5.2 Relevant WP, partners, deliverables

The partner responsible for the media harvesting and upload component is CERTH, in collaboration with L3S. The work is part of Tasks 4.1 and 4.2.

10.5.3 Component delivery

The component is part of R3 and subsequent releases of the CUbRIK platform, and is available from the CUbRIK SVN: https://89.97.237.243/svn/CUBRIK/WORK/MediaHarvestingAndUpload_CERTH/


10.6 SMILA Deployment Environment

The component will be used offline, as independent scripts that collect the datasets and populate the database. If needed for the next versions of the demonstrator, a SMILA pipelet for the Europeana API may be provided.

OS - Operating System (ANY if the operating system is not a constraint): ANY / LINUX / WINDOWS / MAC

JVM – Java Virtual Machine (SMILA v1.2 requires JDK 1.7): JDK 1.6.0_X / JDK 1.7.0_Y

SMILA source code version: 1.2

Eclipse SDK version: 4.2.2

DBMS – Data Base Management System (if the component needs one): Derby / Postgres / MySQL / Other

Note: /


11. "Copyright-aware crawler" component description

The copyright-aware crawler (CAC) is responsible for the crawling of public content which is compliant with a pre-defined set of usage rules, via pre-filtering based on (trustworthy) license information available via APIs.

11.1 Requirements

11.1.1 Functional requirements

FR1: Based on (1) a given usage definition set (including, e.g., the expression of the right to copy, store, analyse, modify, present, and distribute) and (2) textual query input (search string, tags) and content type (AVI), the CAC must download the respective content from all relevant services.

FR2: The CAC must provide downloaded content and associated rights / license metadata to the system.

FR3: The CAC should provide contextual metadata that can be used to enable other components (license checker) to assess the trustworthiness of the rights metadata.
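The pre-filtering at the heart of FR1 can be modelled as a set-containment test between the usage definition set and the rights granted by each license type, as in the checkCompliantLicenses step of the sequence diagram. The license table below is hypothetical; real services expose license information via their APIs.

```python
# Hypothetical mapping from license type to granted rights. The right
# names echo FR1's usage definition set; the table entries are examples.
LICENSE_RIGHTS = {
    "CC-BY": {"copy", "store", "analyse", "modify", "present", "distribute"},
    "CC-BY-ND": {"copy", "store", "analyse", "present", "distribute"},
    "all-rights-reserved": set(),
}

def compliant_licenses(usage_definition):
    """Return the license types that grant every right in the usage
    definition set -- the CAC's pre-filtering step."""
    required = set(usage_definition)
    return {name for name, rights in LICENSE_RIGHTS.items() if required <= rights}
```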

11.1.2 Non Functional requirements

NFR1: The CAC should maximize the amount of content collected for the given query (currently not quantified).

NFR2:

11.2 Use Cases List

The CAC is currently involved in the following use case:

1. Indexation of images (UseCase 1.1)

11.2.1 Associated sequence diagrams

Sequence diagram "Copyright-Aware Crawling" (participants: Consuming_Component, COPYRIGHT_AWARE_CRAWLER, PublicPortal, CUBRIK_CONTROLLER; loop over services):

1: startCopyrightAwareCrawling(usageDef, queryDef, contentType)

1.1: checkCompliantLicenses()

1.2: prepareServiceRequest()

1.3: downloadContentAndMetadata(licenseType, serviceQueryDef)

1.4: createPermissionMetadata()

1.5: triggerProvisioning(contentRef, metadataRef)


11.3 Component Identity Card

Component Name Copyright-Aware Crawler (CAC)

Responsibilities The copyright-aware crawler (CAC) is responsible for the crawling of public content which is compliant with a pre-defined set of usage rules, via pre-filtering based on (trustworthy) license information available via APIs.

Provided Interfaces To be derived from the sequence diagram once finalized.

Dependencies /

Required Interfaces

The CAC requires the existence of a central component (here: CUBRIK_CONTROLLER) and respective method that triggers the content provisioning process.

11.4 Elements of Conceptual Model

11.5 Component Interfaces Specification

To be derived from the sequence diagram once finalized.

11.5.1 Methods description

To be derived from the sequence diagram once finalized.

11.5.2 Relevant WP, partners, deliverables

Relevant partners: FRH

Relevant WPs: WP5, WP8

11.5.3 Component delivery

The component is part of R3 and subsequent releases of the CUbRIK platform, and is available from the CUbRIK SVN: https://89.97.237.243/svn/CUBRIK/WORK/CopyrightAwareCrawler_FRH/


11.6 SMILA Deployment Environment

OS - Operating System (ANY if the operating system is not a constraint): ANY / LINUX / WINDOWS / MAC

JVM – Java Virtual Machine (SMILA v1.2 requires JDK 1.7): JDK 1.6.0_X / JDK 1.7.0_Y

SMILA source code version: 1.2

Eclipse SDK version: 4.2.2

DBMS – Data Base Management System (if the component needs one): Derby / Postgres / MySQL / Other

Note: /


12. "Image extraction from Social Network" component description

The component “Image similarity detection” returns a set of images that are similar to the selected one; the selected image is therefore the query image. The result set also depends on the specified kind of dress.

The general workflow is the following:

Input: There is one image given as the query. Moreover, the kind of dress is specified.

Output: A set of images which are similar to the query.

12.1 Requirements

12.1.1 Functional requirements

RF1 – Input (images)

The component must accept images in standard formats (e.g. jpg, gif, png, etc.).

RF2 – Input (kind of dress)

The kind of dress which is of interest shall be given as an enumerated value.

RF3 – Input format

The component must accept queries via JSON/REST API.

RF4 – Output format

The component must return a set of images. These images are given via JSON records with URLs specifying the retrieved images.

RF5 – Similarity

The returned pictures have to be similar to the query image. The kind of dress which is of interest has to be taken into account.

12.1.2 Non Functional requirements

RNF1 – Interface

The component has to use a JSON/REST interface for communication with other components.

12.2 Use Cases List

The use cases in which the component “Image Similarity Detection” is involved are listed below:

1. search similar images

12.2.1 Associated sequence diagrams

User behaviour

The user chooses an image which is already stored in the image store. In this case the low-level feature extraction is not necessary because all annotations are available in the image store.


Afterwards the user shall give relevance feedback.

Upload

In order to retrieve the images, they have to be annotated; that is, the low-level features have to be attached as metadata to all stored images.

12.3 Component Identity Card

Component Name Image extraction from Social Network

Responsibilities The component is responsible for finding similar images. Besides the query image the kind of dress which is of interest has to be taken into account.

Provided Interfaces A simple JSON/REST interface is provided for calling the component:

getImages

retrieveImages

Dependencies /

Required Interfaces

The component depends on the indexed store of annotated images.


12.4 Elements of Conceptual Model

12.5 Component Interfaces Specification

Interface <ImageSimilarityDetection>

Given an annotated image, the method returns similar images, taking the kind of dress into account.

JSON getImages (AnnotatedImage image, KindOfDress kod)
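The retrieval behind getImages can be sketched as restricting the indexed store to the requested kind of dress and ranking by distance between low-level feature vectors. The index layout and field names below are our own illustration, not Empolis' actual store.

```python
import math

def get_images(query_features, kind_of_dress, index, top=10):
    """Return the URLs of the top matches for a query feature vector.
    Candidates are restricted to the requested kind of dress and ranked
    by Euclidean distance between low-level feature vectors. The index
    (a list of dicts) is a hypothetical stand-in for the indexed store."""
    candidates = [e for e in index if e["kindOfDress"] == kind_of_dress]
    candidates.sort(key=lambda e: math.dist(query_features, e["features"]))
    return [e["url"] for e in candidates[:top]]
```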

12.5.1 Relevant WP, partners, deliverables

Empolis: Indexing, Similarity Search

CERTH: Low-Level Feature Extraction

POLIMI: alternative Low-Level Feature Extraction

12.5.2 Component delivery

The component is part of R3 and subsequent releases of the CUbRIK platform, and is available from the CUbRIK SVN: https://89.97.237.243/svn/CUBRIK/WORK/ImageExtractionFromSocialNetwork_CERTH/


12.6 SMILA Deployment Environment

OS - Operating System (ANY if the operating system is not a constraint): ANY / LINUX / WINDOWS / MAC

JVM – Java Virtual Machine (SMILA v1.2 requires JDK 1.7): JDK 1.6.0_X / JDK 1.7.0_Y

SMILA source code version: 1.2

Eclipse SDK version: 4.2.2

DBMS – Data Base Management System (if the component needs one): Derby / Postgres / MySQL / Other

Note: /

12.6.1 Configuration of the 3rd party library and components to be integrated in SMILA

For example, in order to properly run the OpenCV/SIFT executable file, it is necessary to set the following parameters:

• image directory: the directory path containing the image files;

• SIFT descriptors directory: the directory path where to store the SIFT descriptors files.

These input parameters therefore have to be set properly in an OpenCV.properties file, like this:

imagesDir=C:/CUBRIKPRJ/Demos/LogoDetection/data/LOGO_DETECTION_CUBRIK_ENG/logos

descriptorsDir=C:/CUBRIKPRJ/Demos/LogoDetection/data/LOGO_DETECTION_CUBRIK_ENG/indexes
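Such a properties file can be read with a few lines of code. The sketch below parses simple key=value lines only, ignoring Java .properties escape sequences; the example text mirrors the file shown above.

```python
def read_properties(text):
    """Parse simple key=value lines as found in OpenCV.properties.
    Comments (#) and blank lines are skipped; Java-style .properties
    escapes are not handled in this sketch."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            props[key.strip()] = value.strip()
    return props

example = """\
imagesDir=C:/CUBRIKPRJ/Demos/LogoDetection/data/LOGO_DETECTION_CUBRIK_ENG/logos
descriptorsDir=C:/CUBRIKPRJ/Demos/LogoDetection/data/LOGO_DETECTION_CUBRIK_ENG/indexes
"""
```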