[GISCUP2013] Mailing List Q&A + Project Discussion Ashok Dahal

Overview

• Discussion of questions asked by registered members

• Responses of the GISCUP2013 team

• Discussion of GISCUP updates

• Project Discussion

Q&A - Deadline

• Q: When is the submission deadline? (01/17)

• A: The deadline is August 1st, 2013 (01/21)

GISCUP2013 Update

• Dataset changed (02/27)

• Summary of changes:
  – Some polygon/point pairs were incorrectly missing from the result set.
  – In the new version, we provide the polygons (i.e., those stored in poly10.txt and poly15.txt) sorted by the sequence number in ascending order. From now on, you can assume that all polygon data are given to you sorted that way.

Q&A – Data Size Limit

• How many points and polygons?

• Each object (point/polygon) will have a number of instances (point/polygon with timestamp).

• According to the problem statement, the maximum number of points and polygons will be no more than 1M and 500 respectively.

• The question is: that many objects or instances?

Q&A – Data Size Limit contd.

• The sample data provided has point files with 500 and 1000 points and polygon files with 10 and 15 polygons.

• Point500.txt = 39,289 lines (instances)

• Point1000.txt = 69,619 lines (instances)

• Poly10.txt = 30 lines (instances)

• Poly15.txt = 40 lines (instances)

Q&A – Data Size Limit [Response]

• The size limit applies to the number of instances of the points and polygons.

• That is, the total number of points in the points file will be less than 1M and the total number of polygons in the polygon file will be less than 500.

• To be more specific, the number of lines in the points file will be less than 1M and the number of lines in the polygon file will be less than 500.

Q&A – Data Size Limit arguments

• Argument: Determining whether points are in a single polygon that gets redefined 500 times is a much easier problem than having 500 distinct polygons defined at the same time, the whole time. In the real world, the former case may not happen.

• Response: We have to restrict certain dimensions of the problem to make it practical for a contest.

Q&A – Data Size Limit arguments contd.

• Argument: Do we care how many lines are in the polygon input file, or do we care how many polygons can be defined at a given time?

• Response:
  – The maximum number of polygons that can be defined at a given time is 500. In this case, none of the polygons would move; only the points would move.
  – The minimum number of polygons that can be defined at a given time is 1. In this case, the polygon can move 499 times.

Q&A – Defining Polygons

• Question: In the sample files, all of the polygons are defined at once with the first several timestamps, before any points are defined. Will all of the polygons be defined at the start? Or is it possible that new polygons will appear later on?

• Response: All the initial polygons will be defined before any points are defined, as we did in the sample files.

Q&A – Defining Polygons [Arguments]

• Argument: Actually, the sample files do not agree with that statement. The sample file poly15.txt has a Polygon with ID 0 which does not get defined until timestamp 124106, and sample file poly10.txt has a Polygon with ID 0 which does not get defined until 403047.

• Response: This is a data error. We will fix it and redo the test files. There should not be any polygons with an ID less than 1.

Q&A – Evaluation Machine

• Question: Can we assume that a Java Runtime Environment is installed on the evaluation machine?

• Response: Yes, you can assume a JDK 1.6 version.

Your Own Questions

• Do you also want to ask some specific questions?
  https://wwws.cs.umn.edu/mm-cs/listinfo/GISCup2013

• Go to this link and register so that you can ask the GISCUP team questions. You will also get emails when somebody asks a question and when the GISCUP team responds.

Project Discussion

• The data set used for evaluation will be much larger than the sample files provided on the CUP website.

• Example: the number of lines in the points500.txt file can go up to 1M from the current 39,289 lines. Similarly, the number of lines in poly10.txt can go up to 500 from the current 30 lines.

• You need to work on speeding up your program, since a large dataset can take a lot of time to process, and you need to validate your output (Why?)
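To put the growth in perspective, here is a rough sketch of how much more work an exhaustive approach faces at the maximum input size. It assumes (this is an assumption, not something stated by the GISCUP team) that the cost is proportional to the number of point lines times the number of polygon lines. The function name `workload_growth` is made up for illustration.

```python
def workload_growth(sample_points, sample_polys, max_points, max_polys):
    """Ratio of (point lines x polygon lines) at full size vs. the sample.

    Assumption: an exhaustive program does work proportional to the
    product of the two line counts.
    """
    return (max_points * max_polys) / (sample_points * sample_polys)


# Line counts taken from the slide: 39,289 -> 1M points, 30 -> 500 polygons.
growth = workload_growth(39_289, 30, 1_000_000, 500)
print(round(growth))  # prints 424
```

Under that assumption, the full-size input is a few hundred times more work than the sample pair, which is why a program that feels acceptable on the samples may not finish in reasonable time on the evaluation data.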

Project Discussion contd.

• My experience:
  – I am using two methods to check whether a point is INSIDE a polygon (ray casting and the winding number method). The algorithms are exhaustive, which means no speed-up has been applied yet.
  – Programming language: Perl.
  – Speed- and accuracy-wise, both methods seem similar.
  – It takes around 800 s to process points500.txt (39,289 instances) with poly10.txt (30 instances).
  – How long will it take for points500.txt (1M instances) and poly10.txt (500 instances)?
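For reference, the two INSIDE tests named above can be sketched as follows. This is a minimal illustration in Python, not the Perl code used for the timings on this slide. It assumes a polygon is a list of (x, y) vertices in order, without repeating the first vertex at the end, and it does not treat points lying exactly on the boundary in any special way.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def inside_ray_casting(p: Point, poly: List[Point]) -> bool:
    """Even-odd rule: count crossings of a ray going from p in the +x direction."""
    x, y = p
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # The edge straddles the horizontal line through p.
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:
                inside = not inside
    return inside


def _is_left(a: Point, b: Point, p: Point) -> float:
    """> 0 if p is left of the directed line a->b, < 0 if right, 0 if on it."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (p[0] - a[0]) * (b[1] - a[1])


def inside_winding_number(p: Point, poly: List[Point]) -> bool:
    """Nonzero winding number of the polygon around p means p is inside."""
    wn = 0
    n = len(poly)
    for i in range(n):
        a, b = poly[i], poly[(i + 1) % n]
        if a[1] <= p[1]:
            if b[1] > p[1] and _is_left(a, b, p) > 0:
                wn += 1  # upward crossing with p on the left
        else:
            if b[1] <= p[1] and _is_left(a, b, p) < 0:
                wn -= 1  # downward crossing with p on the right
    return wn != 0


square = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
print(inside_ray_casting((2.0, 2.0), square), inside_winding_number((2.0, 2.0), square))
print(inside_ray_casting((5.0, 2.0), square), inside_winding_number((5.0, 2.0), square))
```

Both functions visit every edge once per point, so for simple polygons they cost about the same, which is consistent with the observation that the two methods perform similarly.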

Project Discussion contd.

• So speeding up is a MAJOR factor.

• Accuracy:
  – All 10366 pairs matching.
  – Initially, I had more than 20,000 pairs in my output, which means I had more than 9600 extra pairs.
  – If there are extra pairs, your score will go down because each extra pair will decrement the score by 1.
  – So accuracy is another MAJOR factor.
  – HINTS for accuracy:
    • Remove all the extra pairs based on the problem definition, i.e., check the timestamp of the point vs. the polygon and check whether the polygon has already expired.
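The timestamp check in the hint above might look like the sketch below. It assumes (please verify against the problem definition) that a point instance should be compared only with the most recent instance of each polygon whose sequence number does not exceed the point's, so that older instances of the same polygon count as expired. The tuple layout and the names `build_index` and `active_instance` are hypothetical.

```python
from bisect import bisect_right


def build_index(polygon_instances):
    """Group polygon instances by polygon ID.

    polygon_instances: iterable of (polygon_id, seq, geometry) tuples,
    assumed to be sorted by seq in ascending order, as the updated
    sample files are.
    """
    index = {}
    for pid, seq, geom in polygon_instances:
        seqs, geoms = index.setdefault(pid, ([], []))
        seqs.append(seq)
        geoms.append(geom)
    return index


def active_instance(index, pid, point_seq):
    """Return (seq, geometry) of the polygon instance current at point_seq.

    Returns None if the polygon is unknown or not yet defined at that time.
    """
    if pid not in index:
        return None
    seqs, geoms = index[pid]
    i = bisect_right(seqs, point_seq) - 1  # last instance with seq <= point_seq
    return None if i < 0 else (seqs[i], geoms[i])


instances = [(1, 10, "A"), (2, 20, "B"), (1, 30, "A2")]
idx = build_index(instances)
print(active_instance(idx, 1, 25))  # the instance of polygon 1 defined at seq 10
print(active_instance(idx, 1, 35))  # the newer instance defined at seq 30
```

Testing a point against every polygon instance, rather than only the one returned here, is one way to end up with thousands of extra pairs.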

Project Discussion contd.

• Remember that you also have to handle WITHIN, not only INSIDE the polygon.

• Things you need to consider:
  – Start early!!!
  – Work on the speed.
    • Apply filtering as discussed in the class.
    • If you can utilize a multi-core CPU, that is awesome.
  – Work on accuracy.
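One common filtering step is a bounding-box pre-check; whether this is the filtering discussed in class is an assumption. The sketch computes each polygon's axis-aligned bounding box once and runs the exact test only on polygons whose box contains the point. The optional `margin` parameter is a hypothetical way to widen the boxes for a distance-based WITHIN check, assuming WITHIN means "within a given distance of the polygon".

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def bounding_box(poly: List[Point]) -> Box:
    """Axis-aligned bounding box of a polygon given as a list of vertices."""
    xs = [x for x, _ in poly]
    ys = [y for _, y in poly]
    return (min(xs), min(ys), max(xs), max(ys))


def candidate_polygons(p: Point, boxes: Dict[int, Box], margin: float = 0.0) -> List[int]:
    """IDs of polygons whose (optionally widened) box contains p.

    Only these candidates need the expensive exact point-in-polygon test.
    """
    x, y = p
    return [
        pid
        for pid, (x0, y0, x1, y1) in boxes.items()
        if x0 - margin <= x <= x1 + margin and y0 - margin <= y <= y1 + margin
    ]


polys = {
    1: [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)],
    2: [(10.0, 10.0), (14.0, 10.0), (14.0, 14.0), (10.0, 14.0)],
}
boxes = {pid: bounding_box(poly) for pid, poly in polys.items()}
print(candidate_polygons((2.0, 2.0), boxes))
print(candidate_polygons((5.0, 2.0), boxes, margin=1.0))
```

The filter can only rule polygons out; a point inside a box may still be outside the polygon, so the exact test remains necessary for the surviving candidates.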