Presented by: Michal Nir, Saar Gross
Supervisors: Nadav Golbandi, Oren Somekh
Computer Science Department
Industrial Project (234313)
Tuesday, January 24, 2012
This project extends a previous project that includes a client application (Android) and a server application (running on Tomcat). The user takes a photo with a smartphone and records an audio clip linked to that photo. Tags are extracted from the audio using speech-to-text, and the photo, with its tags, is uploaded to Flickr. The speech-to-text engine (Sphinx) works best with small dictionaries. In our project, we try to supply Sphinx with a custom dictionary created for each photo (or stack of photos) from the photo’s geo-location information. Using the geo-location, we can extract relevant tags from Flickr and build the custom dictionary from them.
Implement a new module, running on the server application, that creates custom dictionaries for the Sphinx voice-to-text engine.
Optimize the algorithm for creating the custom dictionary, aiming for the best recognition results at an acceptable performance cost.
The server generates tag recommendations in one of two ways:
Uploading an image (or multiple images) that contains a geo-location, with an audio file attached, triggers the server to create a custom dictionary for the Sphinx voice-to-text engine.
The client may ask for tag recommendations by sending a request containing only the image’s geo-location.
The server can also be instructed not to use the image’s geo-location when compiling the recommendation list (privacy concerns). In that case, only the user’s “private tags” are used.
The server supports uploading multiple images. When multiple images are uploaded, they are clustered into groups based on location (using a simple and deterministic algorithm).
The server compiles a recommendation list for each group.
Every image with an audio file attached is processed by Sphinx with its group’s custom dictionary.
All images are uploaded to Flickr with their identified tags and user-supplied tags.
Returning recommendations only (without uploading) for a group of images works in essentially the same way, except that recommendations are returned only for the largest group of images.
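The slides do not say which clustering algorithm the server uses, only that it is simple and deterministic. As an illustration, here is a minimal sketch of one such approach: a greedy pass in which each image joins the first group whose first member lies within a fixed radius. The function names, the image dictionary layout and the 1 km default radius are assumptions made for this example, not the project's actual code.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def cluster_by_location(images, radius_km=1.0):
    """Greedy grouping (hypothetical): the same input order always yields the same groups.

    Each image joins the first group whose first member lies within
    radius_km; otherwise it starts a new group.
    """
    groups = []
    for image in images:
        for group in groups:
            if haversine_km(group[0]["geo"], image["geo"]) <= radius_km:
                group.append(image)
                break
        else:
            # No existing group is close enough, so start a new one.
            groups.append([image])
    return groups

def largest_group(groups):
    # max() returns the first maximal element, so ties resolve deterministically.
    return max(groups, key=len)
```

With this sketch, the "recommendations only" path would call `largest_group(cluster_by_location(images))` and compile a single list for the result.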
Method of compiling a recommendation list for an image (Or group of images):
Input: a group of images.
Tag sources, implemented as independent threads (all running in parallel):
Public tags (based on geo-location): ranking tags found in images near the given geo-location.
Public tags (based on geo-location): querying Flickr’s Places API.
Private tags (NOT using geo-location): ranking the user’s previously used tags.
Merging results: the merging parameters are configurable.
Output: to the Android client (when asking for tag recommendations only) or to Sphinx (when uploading images to Flickr).
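The flow above (independent sources running in parallel, followed by a configurable merge) can be sketched roughly as follows. This is only an illustration: the reciprocal-rank scoring, the per-source weights and the names `merge_ranked` and `recommend` are assumptions, since the slides do not describe the actual merging formula. The sources are plain callables here; in the real server they would query Flickr.

```python
from concurrent.futures import ThreadPoolExecutor, wait

def merge_ranked(results, weights, limit):
    """Combine ranked tag lists (hypothetical scoring).

    A tag earns weight / rank from every source that returned it.
    """
    scores = {}
    for source, tags in results.items():
        for rank, tag in enumerate(tags, start=1):
            scores[tag] = scores.get(tag, 0.0) + weights.get(source, 1.0) / rank
    # Sort by score, then alphabetically so that ties are deterministic.
    ordered = sorted(scores, key=lambda t: (-scores[t], t))
    return ordered[:limit]

def recommend(sources, weights, limit=10, timeout_s=5.0):
    """Run each source in its own thread and merge whatever finished in time."""
    pool = ThreadPoolExecutor(max_workers=len(sources))
    futures = {pool.submit(fn): name for name, fn in sources.items()}
    done, _ = wait(futures, timeout=timeout_s)
    pool.shutdown(wait=False)  # do not block on sources that missed the time limit
    # Sources that raised an exception are skipped rather than failing the request.
    results = {futures[f]: f.result() for f in done if f.exception() is None}
    return merge_ranked(results, weights, limit)
```

For example, giving the private source a weight of 2.0 and the two public sources a weight of 1.0 pushes the user's own past tags up the merged list, which is the kind of tuning the configurable merging parameters allow.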
Server side:
1. Tag recommendations are compiled for an image or group of images and can be presented to the user (recommendation only) or used for Sphinx voice-to-text.
2. Performance:
   1. In general, pretty good.
   2. Compiling a recommendation list usually takes no more than a few seconds.
   3. In any case, a time limit is enforced.
   4. Most interaction with Flickr is completely multi-threaded to avoid bottlenecks.
   5. Compiled recommendation lists are cached based on time and location to optimize performance further.
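The slides do not explain how the time-and-location cache is keyed, so the following is only a sketch of one plausible design: coordinates are rounded to a grid cell and each entry expires after a fixed time-to-live. The class name, the rounding precision and the one-hour default are all assumptions for illustration.

```python
import time

class RecommendationCache:
    """Hypothetical cache keyed by a rounded (lat, lon) cell, with a time-to-live."""

    def __init__(self, ttl_s=3600.0, precision=2, clock=time.monotonic):
        self.ttl_s = ttl_s
        # Two decimal places is a cell of about 1.1 km in latitude.
        self.precision = precision
        self.clock = clock
        self._entries = {}

    def _key(self, lat, lon):
        return (round(lat, self.precision), round(lon, self.precision))

    def get(self, lat, lon):
        key = self._key(lat, lon)
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, tags = entry
        if self.clock() - stored_at > self.ttl_s:
            # The entry is too old: drop it and report a miss.
            del self._entries[key]
            return None
        return tags

    def put(self, lat, lon, tags):
        self._entries[self._key(lat, lon)] = (self.clock(), list(tags))
```

A request handler would call `get` first and fall back to querying Flickr (followed by `put`) only on a miss, so repeated requests from the same area within the time window avoid the network round trips.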
Server properties file:
1. Virtually all parameters needed by the server are read from an external properties (settings) file.
   1. Tweaking the server becomes an easy and intuitive task.
2. The server uses 2 different sets of settings:
   1. Settings used when uploading images to Flickr.
   2. Settings used when asking for tag recommendations only.
   This gives us more flexibility when changing the server’s settings.
3. Example from imageupload.properties: (example not reproduced here)
Client side:
Merged the Camera and Gallery applications into one.
Added a new Tag Editor (can now add, edit and remove tags from images).
Added support for working with multiple images and getting tag recommendations.
Many bug fixes and GUI improvements:
  New Image Properties dialog.
  Updated menus and icons.
  Improved gallery performance and design.
To evaluate the algorithm’s performance, we would like to do the following:
Find a user who uploaded many tagged images (with a reasonable time difference between them) in a popular location (San Francisco bridge, Las Vegas Strip).
Perform a cross-validation analysis:
  Choose a subset of images from the user’s images.
  Send the images to the server and receive tag recommendations for them.
  Evaluate the accuracy (precision and recall) of the recommendations using the 2 left-out images.
  Repeat…
We expect accuracy to be affected by many factors:
  Number of tags merged into the final recommendation list from each source.
  Dictionary size.
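For reference, precision and recall for a single recommendation list can be computed as in the sketch below. It treats both lists as sets and compares tags by exact string match, which is an assumption; the slides do not say whether the evaluation normalizes tags (for example by case) first.

```python
def precision_recall(recommended, actual):
    """Precision and recall of recommended tags against the tags actually used.

    precision = hits / number of recommended tags
    recall    = hits / number of actual tags
    """
    recommended, actual = set(recommended), set(actual)
    hits = len(recommended & actual)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(actual) if actual else 0.0
    return precision, recall
```

For example, if the server recommends 4 tags and 2 of them appear among the 3 tags on the left-out images, precision is 2/4 and recall is 2/3. A longer recommendation list can only keep or raise recall, but it tends to lower precision.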
We wrote TagRecTestFramework:
  Completely automated.
  Behaves like a “normal” client (the server thinks it’s talking to an Android client).
  For each given location:
    Finds a user with enough tagged images (configurable) in the area, with a small time difference between images (also configurable).
    Performs cross-validation on grouped images.
- 10 images in each group, min. of 20 tags per image
- Search radius: 1 km, time difference between images: max. 1 day
Piazza San Pietro (Vatican City) (41.902309, 12.457341)
The algorithm’s accuracy is very image- and user-dependent:
We found that most images on Flickr are either untagged or tagged with irrelevant tags.
Most images on Flickr are not geotagged. Flickr has ~5 billion photos; only ~170 million are geotagged (~3% of all photos).
The quality of the results could be improved by tweaking the server’s settings:
Changing the relative weight given to private and public tags affects the accuracy.
Compiling a larger recommendation list (and thus a larger dictionary for Sphinx) improves recall but may hurt Sphinx’s performance (Sphinx works best with small dictionaries).