UGFIDD: Unstructured Geospatial File Indexer and Distributed Dissemination

13 October 2010


Present Scenario

• Data: users need data in low-communications situations
• Transport: very slow, a bottleneck
• Search criteria: the user must know what to search on

UGFIDD Overview

• Provide a simple-to-use Web Service interface
  – This allows for customized clients
  – Free-text, “Google”-like searches
  – Completely unstructured data
  – No need for a data model to follow
  – Communication is done over HTTP through SOAP (Simple Object Access Protocol) messages
  – Currently supports PDFs, Microsoft Word documents, and JPEGs

• Provide usable return types
  – RSS feeds: allow users to subscribe to standing queries
  – KML results: allow users to visually represent their data spatially
  – Plain text: gives users their information quickly and reliably
  – BitTorrent: allows users to distribute data quickly and in a distributed fashion

Code Environment

• Subversion
  – All of the code was written under version control, which is very important in today’s commercial atmosphere
  – With a lot of code written and a lot of time invested, it provides peace of mind
  – Allows many developers to work at the same time
  – Assists with merge conflicts
  – Allows easy reverts and diffs

• Maven
  – New and upcoming build tool
  – Allows for easy integration and dependency management
  – Projects are configured entirely in XML (the POM file)
  – Repositories allow open source projects to be easily pulled in to assist in program development

Pom File (Snippet)

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.p2p</groupId>
  <artifactId>Peer2peer</artifactId>
  <packaging>war</packaging>
  <version>1.0-SNAPSHOT</version>
  <name>Peer2peer</name>
  <url>http://maven.apache.org</url>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <final.version>1.0</final.version>
    <artifact.name>${artifactId}</artifact.name>
    <java.version>1.6</java.version>
  </properties>

  <dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>3.8.1</version>
    <scope>test</scope>
  </dependency>
  <dependency>
    <groupId>jpath</groupId>
    <artifactId>jpathwatch</artifactId>
    <version>0.93</version>
  </dependency>
  <dependency>

High Level Architecture

[Architecture diagram. Labeled components: Interfaces (SoapUI, Solr, Jetty); Endpoints (Web Service Endpoints); Core Services; Indexing; Query; File Monitor; Extractors + Publishers (Doc / JPEG Parser, BitTorrent Publisher, RSS Feed / KML Feed); Daemon (Startup & Shutdown); Utilities]

Ingest Orchestration

Ingest of a file

• The ingest monitor is triggered by system-level events
• Tika document extractors pull out header information along with binary data; a JPEG parser parses geospatial data
• The Solr schema has been customized to store location and other valuable data

[Diagram labels: File, File System, File Monitor, Metadata Extraction, XML Metadata, HTTP, Solr]
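A minimal sketch of an event-driven ingest monitor, for illustration only. It swaps in the JDK's own java.nio.file.WatchService (Java 7 and later) in place of the jpathwatch dependency listed in the POM, and the class and method names (IngestMonitorSketch, watch, nextNewFiles) are hypothetical rather than taken from the UGFIDD code.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: watches an ingest directory for newly created files.
class IngestMonitorSketch {

    // Registers the ingest directory for "file created" events.
    static WatchService watch(Path ingestDir) throws IOException {
        WatchService watcher = FileSystems.getDefault().newWatchService();
        ingestDir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);
        return watcher;
    }

    // Waits up to timeoutMs for the next batch of events and returns the new files.
    static List<Path> nextNewFiles(WatchService watcher, Path ingestDir, long timeoutMs)
            throws InterruptedException {
        List<Path> created = new ArrayList<>();
        WatchKey key = watcher.poll(timeoutMs, TimeUnit.MILLISECONDS);
        if (key == null) {
            return created; // nothing arrived before the timeout
        }
        for (WatchEvent<?> event : key.pollEvents()) {
            if (event.kind() == StandardWatchEventKinds.ENTRY_CREATE) {
                // The event context is the file name relative to the watched directory.
                created.add(ingestDir.resolve((Path) event.context()));
            }
        }
        key.reset(); // re-arm the key so later events are delivered
        return created;
    }
}
```

Each returned path would then be handed to metadata extraction and indexing; a real monitor would call nextNewFiles in a loop until shutdown.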

Query Orchestration

Publish of results

• The user enters the query “Syracuse”
• The query is parsed
• The parsed query is used to search the index
• Depending on the return type and the call, UGFIDD queries the index and returns customized results

[Diagram labels: Web Service Endpoint, Core Services, HTTP, Solr, Files, XML Metadata, HTTP Publish; outputs: Torrent, RSS, Google Earth]
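As a rough illustration of the "use query to search index" step, the sketch below builds the HTTP request URL for Solr's standard select handler. The base URL, the class name SolrQuerySketch, and the rows limit are assumptions made for the example; how UGFIDD itself forms its Solr requests is not shown in the slides.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical sketch: turns a free-text query into a Solr select URL.
class SolrQuerySketch {

    static String selectUrl(String solrBaseUrl, String query, int rows) {
        try {
            // q carries the free-text query; rows caps the number of hits returned.
            return solrBaseUrl + "/select?q=" + URLEncoder.encode(query, "UTF-8")
                    + "&rows=" + rows;
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }
}
```

For the slide's example query, selectUrl("http://localhost:8983/solr", "Syracuse", 10) yields http://localhost:8983/solr/select?q=Syracuse&rows=10, where the host and port are placeholders.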

GeoHash (Example GeoHash.java)

• The GeoHash algorithm was recently developed by Gustavo Niemeyer
  – Publicly released in 2008
  – Very new way of representing geospatial data
  – UGFIDD takes advantage of the single hash produced by the algorithm
  – Found many implementations in other languages (such as Python) and ported one to Java for the UGFIDD project
• Distance searches
  – A geohash defines a bounding box by nature
  – This is a perfect fit for UGFIDD and its free-text search capability
  – Geospatial searches are now extremely fast and easy to implement
  – No need for complicated point-radius algorithms, which slow processing down
• WKT (Well-Known Text)
  – A text format for representing vector geometry on a single line
  – The user can query using single strings and does not need to represent points as Lat, Lon
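For readers unfamiliar with the encoding, here is a minimal sketch of standard geohash encoding. It is not the project's GeoHash.java; the class name GeoHashSketch and its method signature are assumptions for illustration.

```java
// Hypothetical sketch of standard geohash encoding (not the project's GeoHash.java).
class GeoHashSketch {

    // Geohash base-32 alphabet: digits plus lowercase letters without a, i, l, o.
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    static String encode(double lat, double lon, int precision) {
        double latMin = -90, latMax = 90, lonMin = -180, lonMax = 180;
        StringBuilder hash = new StringBuilder();
        boolean useLon = true; // bits alternate, starting with longitude
        int bits = 0;          // number of bits collected for the current character
        int value = 0;         // the 5-bit value being assembled

        while (hash.length() < precision) {
            if (useLon) {
                double mid = (lonMin + lonMax) / 2;
                if (lon >= mid) { value = (value << 1) | 1; lonMin = mid; }
                else            { value = value << 1;       lonMax = mid; }
            } else {
                double mid = (latMin + latMax) / 2;
                if (lat >= mid) { value = (value << 1) | 1; latMin = mid; }
                else            { value = value << 1;       latMax = mid; }
            }
            useLon = !useLon;
            if (++bits == 5) { // every 5 bits become one base-32 character
                hash.append(BASE32.charAt(value));
                bits = 0;
                value = 0;
            }
        }
        return hash.toString();
    }
}
```

GeoHashSketch.encode(42.6, -5.6, 5) returns "ezs42". Every point inside the same cell shares that prefix, so a bounding-box search becomes a string prefix match, which suits a free-text index. Points that are close together but fall on opposite sides of a cell boundary can have different prefixes, so a fuller implementation would typically also check neighboring cells.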

Performance

[Chart: time in milliseconds (y-axis) against number of products (x-axis) for three series: Distance Algorithm, GeoHash Algorithm, and GeoHash Query]

WKT

• POINT(6 10)
• LINESTRING(3 4,10 50,20 25)
• POLYGON((1 1,5 1,5 5,1 5,1 1),(2 2,2 3,3 3,3 2,2 2))
• MULTIPOINT((3.5 5.6), (4.8 10.5))
• MULTILINESTRING((3 4,10 50,20 25),(-5 -8,-10 -8,-15 -4))
• MULTIPOLYGON(((1 1,5 1,5 5,1 5,1 1),(2 2,2 3,3 3,3 2,2 2)),((6 3,9 2,9 4,6 3)))

• http://en.wikipedia.org/wiki/Well-known_text
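To show how a single-string WKT query might be read, the sketch below parses only the POINT form with a regular expression. The class name WktPointSketch is hypothetical, and a real service would more likely rely on a full WKT parser; the slides do not say which one UGFIDD uses.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: parses a WKT POINT such as "POINT(6 10)".
class WktPointSketch {

    private static final String NUMBER = "(-?\\d+(?:\\.\\d+)?)";
    private static final Pattern POINT = Pattern.compile(
            "^\\s*POINT\\s*\\(\\s*" + NUMBER + "\\s+" + NUMBER + "\\s*\\)\\s*$",
            Pattern.CASE_INSENSITIVE);

    // Returns {x, y}. For geographic data, WKT lists x (longitude) before y (latitude).
    static double[] parsePoint(String wkt) {
        Matcher m = POINT.matcher(wkt);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a WKT POINT: " + wkt);
        }
        return new double[] {
            Double.parseDouble(m.group(1)),
            Double.parseDouble(m.group(2))
        };
    }
}
```

WktPointSketch.parsePoint("POINT(6 10)") gives x = 6.0 and y = 10.0. The coordinate order is x then y, which is the reverse of the usual spoken "Lat, Lon".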

BitTorrent

• Allows for distributed downloading
• Users download .torrent files, which identify the tracker and describe the file or files
• Many free clients are available
• BitTorrent takes pressure off of the central server
  – Users only download the .torrent file from the server
  – Peers communicate via the tracker (UGFIDD is using an open source tracker)
  – Users download from each other while there is a seed
  – UGFIDD will always be the initial seeder
• Extremely fast downloads
  – Users download from each other and do not tie up the bandwidth pipe going to the server
  – Utilizes file pieces described in the .torrent file (pieces are downloaded from each other)
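The "file pieces described in the .torrent file" are recorded as one SHA-1 digest per fixed-size piece, which lets a client verify each piece no matter which peer supplied it. The sketch below shows just that hashing step; the class name PieceHashSketch and the piece length passed in are illustrative assumptions, and writing the surrounding .torrent file is out of scope.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: computes the per-piece SHA-1 digests a .torrent file records.
class PieceHashSketch {

    static List<byte[]> hashPieces(byte[] data, int pieceLength) {
        if (pieceLength <= 0) {
            throw new IllegalArgumentException("pieceLength must be positive");
        }
        MessageDigest sha1;
        try {
            sha1 = MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is required to be present on every Java platform.
            throw new IllegalStateException(e);
        }
        List<byte[]> hashes = new ArrayList<>();
        for (int offset = 0; offset < data.length; offset += pieceLength) {
            int end = Math.min(offset + pieceLength, data.length); // last piece may be short
            hashes.add(sha1.digest(Arrays.copyOfRange(data, offset, end)));
        }
        return hashes;
    }

    // Lowercase hex rendering of a digest, for display.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }
}
```

A downloader recomputes the digest of each piece it receives and discards any piece that does not match.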

Torrent

• A torrent file has been created and seeded
• Others can now download the torrent file and connect to the swarm
• The file will then be downloaded from the server as well as from other clients

RSS Feed

• Users want their information delivered even when they are away
• An RSS feed allows the user to set up a specific query and walk away
  – The query will be “standing” for a configurable amount of time
  – The feed is updated whenever new products match the query
  – Fast, easy-to-learn publish and subscribe system
  – Most users already know how to use RSS
• The RSS page is unique to that user and query
  – The user can, however, pass the URL to other users, who can then subscribe to the query too
  – Example: a group of users is interested in “IED and Iraq”. An RSS query is set up; as matching products are placed into the monitor directory, that information is passed on to the users’ RSS feed
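A minimal sketch of what a standing-query feed could look like as RSS 2.0, assuming each matching product has a title and a link. The class name RssFeedSketch, the channel wording, and the URLs in the usage note are hypothetical and not taken from UGFIDD.

```java
import java.util.List;

// Hypothetical sketch: renders matching products as an RSS 2.0 feed for one standing query.
class RssFeedSketch {

    // Escapes the characters that are unsafe in XML text content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Each product is a two-element array: {title, link}.
    static String feed(String query, String feedUrl, List<String[]> products) {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        xml.append("<rss version=\"2.0\">\n  <channel>\n");
        // RSS 2.0 requires title, link, and description on the channel.
        xml.append("    <title>").append(escape("Results for: " + query)).append("</title>\n");
        xml.append("    <link>").append(escape(feedUrl)).append("</link>\n");
        xml.append("    <description>").append(escape("Standing query: " + query))
           .append("</description>\n");
        for (String[] product : products) {
            xml.append("    <item>\n");
            xml.append("      <title>").append(escape(product[0])).append("</title>\n");
            xml.append("      <link>").append(escape(product[1])).append("</link>\n");
            xml.append("    </item>\n");
        }
        xml.append("  </channel>\n</rss>\n");
        return xml.toString();
    }
}
```

Regenerating this document each time a newly ingested product matches the query is enough for ordinary feed readers to pick up the update on their next poll.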

Google Earth

• KML (Keyhole Markup Language)
  – XML data that Google Earth knows how to display
• Visually represent data
  – More and more users are using tools to see their data visually
  – Can see similarities (such as distance and location)
  – Quickly find relevant data
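To make the KML return type concrete, here is a minimal sketch that writes one product as a KML Placemark. The class name KmlSketch is hypothetical and the real UGFIDD output presumably carries more fields; note that KML lists coordinates as longitude,latitude.

```java
import java.math.BigDecimal;

// Hypothetical sketch: renders one geolocated product as a KML Placemark.
class KmlSketch {

    // Escapes the characters that are unsafe in XML text content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Plain decimal text, avoiding scientific notation for very small values.
    static String plain(double value) {
        return BigDecimal.valueOf(value).toPlainString();
    }

    static String placemark(String name, double lat, double lon) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
             + "<kml xmlns=\"http://www.opengis.net/kml/2.2\">\n"
             + "  <Placemark>\n"
             + "    <name>" + escape(name) + "</name>\n"
             // KML coordinate order is longitude,latitude (optionally ,altitude).
             + "    <Point><coordinates>" + plain(lon) + "," + plain(lat)
             + "</coordinates></Point>\n"
             + "  </Placemark>\n"
             + "</kml>\n";
    }
}
```

Calling KmlSketch.placemark("Rome, NY", 43.2128, -75.4557), with approximate coordinates, produces a document Google Earth can open directly and show as a single pin.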

Geocoder

• Geocoding takes well-known addresses and converts them into geospatial points
• UGFIDD uses Google’s geocoder web services
• Searches for Rome, NY will return hits near the geospatial location
• Results are geohashed and queried to find results even faster

[Screenshot: JPEG product displayed via published KML]

Future Work

• Make it faster!
  – Multiple Solr instances and a distributed data implementation
  – The Java ExecutorService allows for multi-threaded workers. This has been implemented but will take time to tune for each system
• Create a client
  – Currently UGFIDD is a server-only implementation
  – Creating a client is easy with web services
  – Allow users to ingest files using HTTP and FTP upload
• Distributed queries
  – Currently only one server is queried at a time
  – Would like to add a middle “tracker” to distribute queries and results
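A small sketch of the multi-threaded worker idea using the JDK's ExecutorService. The ingest method is a hypothetical stand-in for the real extract-and-index step, and the worker count is the knob that would need tuning per system.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: ingests several files in parallel on a fixed-size thread pool.
class ParallelIngestSketch {

    // Stand-in for the real extract-and-index work on one file.
    static String ingest(String path) {
        return "indexed:" + path;
    }

    static List<String> ingestAll(List<String> paths, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String path : paths) {
                Callable<String> task = () -> ingest(path);
                pending.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : pending) {
                results.add(future.get()); // blocks until that file is done
            }
            return results; // same order as the input paths
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown(); // release the worker threads
        }
    }
}
```

A reasonable starting point for the worker count is the number of CPU cores, adjusted upward if ingest turns out to be dominated by disk or network waits.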

Demo

• Server is running on my home computer with an ingest directory already set up
• A local server running on the laptop will show some of the publish and query capabilities
• Will move files into the ingest directory
• Demonstrate query capability
• Demonstrate publishing capability
• Will use SoapUI, a web services test utility, to demonstrate client interaction
  – http://www.soapui.org/
• http://web.cs.sunyit.edu/~randoc/
• Code is located at: http://code.google.com/p/peer2peersuny/source/browse/

UGFIDD

• Questions?