IDP at the Entrepreneurship Research Institute (BWL)
Concept and Implementation of an interactive web-crawler with GUI in Python
The TUM Entrepreneurship Research Institute offers an interdisciplinary project in cooperation with cluelab GmbH.
cluelab is a TUM start-up in the field of digital copyright management and is developing an innovative new service
product for customers in the multimedia, software and music industries, from moviemakers to up-and-coming
music artists.
Our aim is to counteract copyright infringement. To this end, we identify and delete illegal copies of our
customers’ works on servers around the world. Our web-crawler is an important part of the IT tool chain that supports
this task. To further improve the web-crawler and other tools, we are looking for students who are motivated to work
in a start-up atmosphere in a very interesting field.
The project (6 months part-time or 3 months full-time) focuses on the development of a web-crawler to detect and
measure copyright infringement on certain websites. It consists of the following work packages:
1. Development of a web-crawling framework using Python and Python-based scraping libraries (e.g. Scrapy)
2. CAPTCHA handling – integration of external CAPTCHA-solving services
3. Handling of IP blocking – block detection and integration of proxy services
Procedure:
You get one mentor from the chair (organisation) and one from the cluelab team (practical work)
Kick-off meeting with the cluelab team to define intermediate goals and create a project plan for the work packages
Independent realisation of the defined work packages as a team
Regular meetings or Skype calls for status updates, presentation and feedback
Conditions:
Team size: up to 4 students (if you apply individually, we will match you with a team)
Period: the practical part takes place in SS 16; the lecture and exam take place in SS 16 as well.
If you have any questions about this IDP, feel free to contact Simon & Andreas from cluelab ([email protected]).
Please send your application, including a CV and transcript of records for each person, with the subject line
“IDP web-crawler” to them.