jdp15 import.io workshop

jpd15, June 2015

Ignacio Elola @ignacio_elola

Web data? Extracting data from the web

who am I?

web data and import.io

example: text analysis with import.io and MonkeyLearn

summary

import.io?

the Web as a data source

What is import.io?

● Machine reading the web
● Point-and-click UI
● Map the data on a web page
● Algorithms will turn it into structured data
● Real-time through an API

What is import.io? (continued)

● Custom Crawlers
● Auto extraction
● Authenticated APIs
● Cloud scaling
● Wide range of integration options

Structure the web

import.io consists of 4 tools

● Magic
● Extractor
● Crawler
● Connector

and completely free...

import.io Magic

Sometimes we need to train the tool ourselves

import.io Extractor

import.io Extractor lets you structure a single page of data

● Custom XPaths
● Custom Regex
● Updatable in real-time
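
To make the XPath and regex idea concrete, here is a minimal sketch in plain Python. It is not how the import.io Extractor is implemented; the page fragment, the class names and the `extract_rows` helper are all invented for illustration. An XPath-style expression selects the repeating rows, and a regex cleans up one of the fields.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical page fragment. Real pages are rarely well-formed XML,
# so a real tool would need a proper HTML parser instead.
PAGE = """
<html><body>
  <div class="article"><h2>Budget approved</h2><span class="date">Published 2015-06-12</span></div>
  <div class="article"><h2>Local team wins</h2><span class="date">Published 2015-06-13</span></div>
</body></html>
"""

# Regex that pulls an ISO-style date out of a longer text field.
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def extract_rows(page):
    """Turn each matching <div> into one structured row."""
    root = ET.fromstring(page.strip())
    rows = []
    # ElementTree supports a limited XPath subset, enough for this sketch.
    for node in root.findall(".//div[@class='article']"):
        title = node.findtext("h2", default="").strip()
        raw_date = node.findtext("span[@class='date']", default="")
        match = DATE_RE.search(raw_date)
        rows.append({"title": title, "date": match.group(0) if match else None})
    return rows


rows = extract_rows(PAGE)
```

Each row comes out as a dictionary with a `title` and a `date` key, which is the kind of table-like structure the slides refer to as structured data.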

Sometimes we need to extract data from a lot of URLs

● import.io Crawler
● import.io Extractor (bulk queries)
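
The bulk-query idea (one extractor, many known URLs) can be sketched as follows. This is an illustration only, not the import.io API: `FAKE_WEB`, `fetch`, `extract` and `bulk_extract` are hypothetical names, and the "web" is an in-memory dictionary so the example runs without network access.

```python
import re

# Stand-in for the web: URL -> page text (hypothetical data).
FAKE_WEB = {
    "http://example.com/item/1": "<h1>Red lamp</h1><p>Price: 12.50 EUR</p>",
    "http://example.com/item/2": "<h1>Blue chair</h1><p>Price: 40.00 EUR</p>",
}

TITLE_RE = re.compile(r"<h1>(.*?)</h1>")
PRICE_RE = re.compile(r"Price: ([\d.]+)")


def fetch(url):
    """Hypothetical fetcher; raises KeyError for unknown URLs."""
    return FAKE_WEB[url]


def extract(url, html):
    """Apply the same extraction rules to any page of the same kind."""
    title = TITLE_RE.search(html)
    price = PRICE_RE.search(html)
    return {
        "url": url,
        "title": title.group(1) if title else None,
        "price": float(price.group(1)) if price else None,
    }


def bulk_extract(urls, fetch_page=fetch):
    """Run one extractor over many URLs, collecting failures separately."""
    results, errors = [], []
    for url in urls:
        try:
            results.append(extract(url, fetch_page(url)))
        except KeyError:
            errors.append(url)
    return results, errors


rows, failed = bulk_extract([
    "http://example.com/item/1",
    "http://example.com/item/2",
    "http://example.com/item/404",
])
```

Keeping the failed URLs in their own list means one missing page does not stop the rest of the batch.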

Sometimes we need to extract data from a lot of URLs we don’t know

import.io Crawler

The import.io Crawler relies on minimum input and gives you maximum output
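
As a rough picture of what "minimum input, maximum output" can mean for a crawler, the sketch below starts from a single URL, follows links breadth-first, and keeps every URL that matches a pattern for data pages. It is a generic crawler over an invented in-memory site (`SITE`), not the import.io Crawler itself; `crawl` and its parameters are assumptions made for the example.

```python
import re
from collections import deque

# Hypothetical site: URL -> links found on that page.
SITE = {
    "/": ["/news", "/about"],
    "/news": ["/news/1", "/news/2", "/"],
    "/about": ["/"],
    "/news/1": ["/news/2"],
    "/news/2": [],
}


def crawl(start, get_links, data_pattern, max_pages=100):
    """Breadth-first crawl from `start`, returning URLs that look like data pages."""
    pattern = re.compile(data_pattern)
    seen = {start}          # URLs already queued, to avoid loops
    queue = deque([start])
    matches = []
    visited = 0
    while queue and visited < max_pages:
        url = queue.popleft()
        visited += 1
        if pattern.fullmatch(url):
            matches.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return matches


found = crawl("/", lambda url: SITE.get(url, []), r"/news/\d+")
```

The only inputs are a start URL and a pattern describing the pages of interest; the crawler discovers the matching URLs on its own.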

Sometimes we need to interact with the website

The import.io Connector uses page interactions, such as searches, and extracts the resulting data.

Example: analyzing newspapers with import.io and MonkeyLearn

https://github.com/ignacioelola/web-text-analyzer
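
The general shape of such an analysis (extracted headlines in, topics per newspaper out) might look like the sketch below. It does not reproduce the code in the repository above and calls neither import.io nor MonkeyLearn: the headlines are invented, and `keyword_classifier` is a deliberately naive stand-in for a real text-classification service.

```python
from collections import Counter

# Hypothetical rows, shaped like the output of an extractor.
HEADLINES = [
    {"source": "paper-a", "title": "Parliament votes on new budget"},
    {"source": "paper-a", "title": "Team wins the league final"},
    {"source": "paper-b", "title": "Election polls tighten"},
]

# Toy vocabulary for the stand-in classifier (an assumption, not real training data).
KEYWORDS = {
    "politics": {"parliament", "election", "budget"},
    "sports": {"team", "league", "final"},
}


def keyword_classifier(text):
    """Stand-in for a text-classification service: pick the label with most keyword hits."""
    words = set(text.lower().split())
    scores = {label: len(words & vocab) for label, vocab in KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "other"


def topics_by_source(rows, classify=keyword_classifier):
    """Count how many headlines of each topic every source published."""
    summary = {}
    for row in rows:
        summary.setdefault(row["source"], Counter())[classify(row["title"])] += 1
    return summary


summary = topics_by_source(HEADLINES)
```

Because the classifier is passed in as a function, the keyword stand-in could be swapped for a call to an external service without touching the aggregation step.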

Thanks!

Q & A
