IDP at the Entrepreneurship Research Institute (BWL)

Concept and Implementation of an interactive web-crawler with GUI in Python

The TUM Entrepreneurship Research Institute offers an interdisciplinary project in cooperation with cluelab GmbH. cluelab is a TUM start-up in the field of digital copyright management and is developing an innovative new service product for customers in the multimedia, software and music industries, from moviemakers to up-and-coming music artists.

Our aim is to counteract copyright infringement. To this end, we identify and delete illegal copies of our customers’ works on servers around the world. Our web-crawler is an important part of the IT tool chain that supports this task. To further improve the web-crawler and other tools, we are looking for students who are motivated to work in a start-up atmosphere in a very interesting field.

The project (6 months part-time or 3 months full-time) focuses on the development of a web-crawler to detect and measure copyright infringement on certain websites. It consists of the following work packages:

1. Development of a web-crawling framework using Python and Python-based scraping libraries (e.g. Scrapy)

2. Captcha handling – integration of external captcha solving services

3. Handling of IP-Blocking – block detection and integration of proxy services
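
To give a rough idea of what work package 3 could involve, here is a minimal sketch of block detection combined with proxy rotation. It uses only the Python standard library. The `Response` class, the status codes, the marker phrases and the proxy URLs are all illustrative assumptions, not part of the cluelab tool chain. In a Scrapy-based crawler, similar logic would most likely live in a downloader middleware.

```python
from dataclasses import dataclass
from itertools import cycle

# Assumption: status codes that often signal rate limiting or an IP ban.
BLOCK_STATUS_CODES = {403, 429, 503}
# Assumption: phrases that often appear on block or captcha pages.
BLOCK_MARKERS = ("captcha", "access denied", "unusual traffic")


@dataclass
class Response:
    """Hypothetical stand-in for the crawler's HTTP response object."""
    status: int
    body: str


def looks_blocked(response: Response) -> bool:
    """Return True if the response resembles a block or captcha page."""
    if response.status in BLOCK_STATUS_CODES:
        return True
    text = response.body.lower()
    return any(marker in text for marker in BLOCK_MARKERS)


class ProxyRotator:
    """Cycle through a fixed list of proxy URLs (hypothetical helper)."""

    def __init__(self, proxies):
        if not proxies:
            raise ValueError("at least one proxy is required")
        self._pool = cycle(proxies)
        self.current = next(self._pool)

    def rotate(self) -> str:
        """Switch to the next proxy in the list and return it."""
        self.current = next(self._pool)
        return self.current


def next_proxy_for(response: Response, rotator: ProxyRotator) -> str:
    """Keep the current proxy unless the response looks blocked."""
    if looks_blocked(response):
        return rotator.rotate()
    return rotator.current
```

A crawler would call `next_proxy_for` after each response and retry the request through the returned proxy whenever it differs from the one just used. Real block pages vary from site to site, so the heuristics above would need tuning per target.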

Procedure:

You get one mentor from the chair (organisation) and one from the cluelab team (practical work)

Kick-off with the cluelab team to define small targets and create a project plan for the work packages

Independent realisation of the defined work packages as a team

Regular meetings or Skype calls for status updates, presentation and feedback

Conditions:

Team size: up to 4 students (if you apply individually, we will match you with a team)

Period: the practical part takes place within SS 16. The lecture and exam take place in SS 16 as well.

If you have any questions about this IDP, feel free to contact Simon & Andreas from cluelab ([email protected]).

Please send your application, including a CV and transcript of records for each person, to them with the subject line “IDP web-crawler”.