The NIF format (hands on): Annotating Strings and Documents using the NLP Interchange Format

Practical session outcomes

• Participants will learn to use the NIF API to annotate strings and documents with the following wrappers:
  – OpenNLP
  – Stanford CoreNLP
  – Snowball Stemmer
  – DBpedia Spotlight
• Participants will query their corpus using SPARQL

NIF Example

Snowball Stemmer Wrapper

• A stemming algorithm removes suffixes from words so that related word forms reduce to a common stem. For example, all of the following reduce to the stem CONNECT:
  – CONNECTED
  – CONNECTION
  – CONNECTING
  – CONNECTIONS
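The idea can be illustrated with a deliberately naive suffix stripper, sketched here in Java (the language the wrappers are written in). This is not the Snowball algorithm, which applies ordered rule steps with conditions on the remaining stem; the suffix list and the three-letter minimum stem below are illustrative assumptions chosen to reproduce the CONNECT example, and ToySuffixStripper is a hypothetical name.

```java
import java.util.List;
import java.util.Locale;

public class ToySuffixStripper {
    // Illustrative suffix list, longest first so "IONS" wins over "S".
    // This is NOT the Snowball rule set, just enough to show the idea.
    private static final List<String> SUFFIXES = List.of("IONS", "ING", "ION", "ED", "S");

    // Removes the first matching suffix, but only if at least 3 letters remain.
    static String stem(String word) {
        String w = word.toUpperCase(Locale.ROOT);
        for (String suffix : SUFFIXES) {
            if (w.endsWith(suffix) && w.length() - suffix.length() >= 3) {
                return w.substring(0, w.length() - suffix.length());
            }
        }
        return w;
    }

    public static void main(String[] args) {
        for (String w : List.of("CONNECT", "CONNECTED", "CONNECTION", "CONNECTING", "CONNECTIONS")) {
            System.out.println(w + " -> " + stem(w));
        }
    }
}
```

Running main prints CONNECT as the stem of all five forms. A real stemmer needs its extra conditions precisely because a flat suffix list like this one strips too eagerly on many other words.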

java -jar snowball.jar -f text -i 'I am connected.'

• -f defines the input format (here, plain text)
• -i defines the input itself (here, the string to annotate)
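If you want to drive a wrapper from a Java program instead of the shell, a minimal sketch with ProcessBuilder could look like the following. It assumes snowball.jar sits in the working directory and that, without --outfile, the wrapper prints its RDF output to the console; RunSnowballWrapper and its methods are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class RunSnowballWrapper {
    // Same command as on the slide. Each argument is its own list element,
    // so the sentence needs no shell quoting.
    static List<String> command(String jar, String text) {
        return List.of("java", "-jar", jar, "-f", "text", "-i", text);
    }

    // Runs the wrapper and returns everything it printed (stdout and stderr merged).
    static String run(List<String> cmd) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(cmd).redirectErrorStream(true).start();
        String output;
        try (InputStream in = p.getInputStream()) {
            // Read all output before waiting, so a full pipe cannot block the child.
            output = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
        int exit = p.waitFor();
        if (exit != 0) {
            throw new IOException("wrapper exited with code " + exit + ":\n" + output);
        }
        return output;
    }

    public static void main(String[] args) throws Exception {
        // Assumes snowball.jar is in the current working directory.
        System.out.println(run(command("snowball.jar", "I am connected.")));
    }
}
```

Because each argument is a separate list element, the sentence needs none of the single quotes the shell command uses.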

[Figure: Snowball Stemmer Wrapper output, with callouts for the NIF Standard Annotations and the NIF Offset]

[Figure: Snowball Stemmer Wrapper output, with callouts for the NIF Standard Annotations, the Snowball Stemmer annotations and the NIF Offset]
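NIF identifies each annotated substring by its character offsets in the context string. The sketch below computes those offsets for the slide's example sentence. It assumes the NIF 2.0 style fragment "#char=begin,end" (begin inclusive, end exclusive) appended to the prefix, and it uses a hypothetical regex tokenizer; the URIs and tokens the Snowball wrapper actually produces may differ, so compare with your own output. Note that Java string indices count UTF-16 code units, which only matters for non-ASCII text. NifOffsets and its members are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NifOffsets {
    // Hypothetical tokenizer: a run of word characters, or one punctuation mark.
    private static final Pattern TOKEN = Pattern.compile("\\w+|[^\\w\\s]");

    // One annotated substring: begin is inclusive, end is exclusive.
    static final class Span {
        final int begin;
        final int end;
        final String anchor;

        Span(int begin, int end, String anchor) {
            this.begin = begin;
            this.end = end;
            this.anchor = anchor;
        }
    }

    // Assumed NIF 2.0 style URI: prefix + "#char=begin,end".
    static String uri(String prefix, int begin, int end) {
        return prefix + "#char=" + begin + "," + end;
    }

    static List<Span> spans(String text) {
        List<Span> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            result.add(new Span(m.start(), m.end(), m.group()));
        }
        return result;
    }

    public static void main(String[] args) {
        String prefix = "http://yourDomain.org/"; // prefix used in the slides' service URLs
        String text = "I am connected.";
        System.out.println(uri(prefix, 0, text.length()) + "  (whole context)");
        for (Span s : spans(text)) {
            // Key invariant: the offsets cut the anchor text back out of the context.
            if (!text.substring(s.begin, s.end).equals(s.anchor)) {
                throw new IllegalStateException("offsets do not match anchor " + s.anchor);
            }
            System.out.println(uri(prefix, s.begin, s.end) + "  \"" + s.anchor + "\"");
        }
    }
}
```

For "I am connected." this gives char=0,15 for the whole context and char=5,14 for "connected".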

Annotating Strings: Step-by-step

1. Open the USB stick folder.
2. Decompress the “session-nif.zip” archive.
3. Open the “NIF_DATATHON” folder and decompress “NIF_tutorial_hands_on_jars.zip”.
4. Open a command prompt in the “jar” folder and run the commands from the next slide.

Available Wrappers

• To annotate documents, use the local wrappers (on the USB stick):

java -jar opennlp.jar -f text -i 'This is a test.' -modelFolder ../model/

java -jar stanford.jar -f text -i 'This is a test.'

java -jar snowball.jar -f text -i 'This is my favorite test.'

java -jar spotlight.jar -f text -i 'Welcome to Germany.' -confidence 0.2

• To annotate short strings, you can also try the online services:
  – http://spotlight.nlp2rdf.aksw.org/spotlight?f=text&i=Welcome+to+Germany.&t=direct&confidence=0.3&prefix=http://yourDomain.org/
  – http://snowball.nlp2rdf.aksw.org/snowball?f=text&i=This+is+my+favorite+test.&t=direct&prefix=http://yourDomain.org/
  – http://stanford.nlp2rdf.aksw.org/stanfordcorenlpn?f=text&i=This+is+a+test.&t=direct&prefix=http://yourDomain.org/
  – http://opennlp.nlp2rdf.aksw.org/opennlp?f=text&i=This+is+a+test.&t=direct&modelFolder=model&prefix=http://yourDomain.org
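The online services take the same options as query-string parameters (f, i, t and prefix, plus confidence for Spotlight). The sketch below assembles such a URL in Java with URLEncoder, which encodes spaces as '+' just like the URLs above. Unlike those URLs, it also percent-encodes the prefix value; a server that decodes its query parameters in the usual way should treat both forms alike, but that is an assumption about these services. It needs Java 10 or later, and NifServiceUrl is a hypothetical name.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class NifServiceUrl {
    // Builds endpoint?k1=v1&k2=v2 with form-style encoding (space -> '+').
    static String build(String endpoint, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> p : params.entrySet()) {
            query.add(p.getKey() + "=" + URLEncoder.encode(p.getValue(), StandardCharsets.UTF_8));
        }
        return endpoint + "?" + query;
    }

    // Parameters copied from the Snowball example URL; LinkedHashMap keeps their order.
    static Map<String, String> snowballParams(String text) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("f", "text");
        params.put("i", text);
        params.put("t", "direct");
        params.put("prefix", "http://yourDomain.org/");
        return params;
    }

    public static void main(String[] args) {
        System.out.println(build("http://snowball.nlp2rdf.aksw.org/snowball",
                snowballParams("This is my favorite test.")));
    }
}
```

For the Snowball example, main prints http://snowball.nlp2rdf.aksw.org/snowball?f=text&i=This+is+my+favorite+test.&t=direct&prefix=http%3A%2F%2FyourDomain.org%2F, which you can open in a browser or fetch with any HTTP client.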

Reading and Writing Files

• Write the results to a file: “--outfile myAnnotatedFile.ttl”

• Read a document as input: “--intype file -i /path/myDoc”
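Combining the two options, a small Java sketch can annotate every document in a folder and write one .ttl file per input. It assumes the documents are plain-text files named *.txt, that -f text can be combined with --intype file and --outfile as shown, and that snowball.jar is in the working directory; AnnotateFolder and its helpers are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

public class AnnotateFolder {
    // myDoc.txt -> myDoc.ttl in the same folder; a name without an extension just gains ".ttl".
    static Path outputFor(Path input) {
        String name = input.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = dot > 0 ? name.substring(0, dot) : name;
        return input.resolveSibling(base + ".ttl");
    }

    // Assumption: "-f text" combines with the file flags from the slide.
    static List<String> command(String jar, Path input, Path output) {
        return List.of("java", "-jar", jar, "-f", "text",
                "--intype", "file", "-i", input.toString(),
                "--outfile", output.toString());
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        Path folder = Paths.get(args.length > 0 ? args[0] : ".");
        try (DirectoryStream<Path> docs = Files.newDirectoryStream(folder, "*.txt")) {
            for (Path doc : docs) {
                Path out = outputFor(doc);
                // inheritIO() lets the wrapper's messages appear in this console.
                int exit = new ProcessBuilder(command("snowball.jar", doc, out))
                        .inheritIO().start().waitFor();
                System.out.println(doc + " -> " + out + " (exit code " + exit + ")");
            }
        }
    }
}
```

Pass the folder as the first argument; a file /path/myDoc.txt is then annotated into /path/myDoc.ttl. Other wrappers should only need a different jar name plus their own flags, such as -modelFolder for OpenNLP.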

POS tagger for multiple languages

• The -modelFolder parameter sets the folder that contains the trained OpenNLP models for tokenization and POS tagging.

• Models for other languages can be downloaded from the OpenNLP website:

http://opennlp.sourceforge.net/models-1.5/
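Before pointing -modelFolder at a new folder, it can help to check which languages it covers. The OpenNLP 1.5 model downloads follow a <language>-<component>.bin naming pattern (for example en-token.bin or en-pos-maxent.bin). The sketch groups the .bin files by that language prefix. The rule that a language needs both a tokenizer and a POS model is an assumption about what the wrapper loads, not something stated here; the default path ../model mirrors the opennlp.jar example, and ModelFolderCheck is a hypothetical name.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ModelFolderCheck {
    // Groups "<lang>-<component>.bin" file names by language, e.g. "en-pos-maxent.bin" -> "en".
    // Other files (README, .txt, ...) are ignored.
    static Map<String, List<String>> byLanguage(List<String> fileNames) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String name : fileNames) {
            int dash = name.indexOf('-');
            if (!name.endsWith(".bin") || dash <= 0) {
                continue;
            }
            groups.computeIfAbsent(name.substring(0, dash), k -> new ArrayList<>()).add(name);
        }
        return groups;
    }

    // Assumption: the wrapper needs a tokenizer model and at least one POS model per language.
    static boolean looksUsable(String lang, List<String> models) {
        boolean tokenizer = models.contains(lang + "-token.bin");
        boolean tagger = models.stream().anyMatch(m -> m.startsWith(lang + "-pos-"));
        return tokenizer && tagger;
    }

    public static void main(String[] args) throws IOException {
        Path folder = Paths.get(args.length > 0 ? args[0] : "../model");
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(folder)) {
            for (Path file : files) {
                names.add(file.getFileName().toString());
            }
        }
        byLanguage(names).forEach((lang, models) -> System.out.println(
                lang + ": " + models + (looksUsable(lang, models) ? "" : "  (incomplete?)")));
    }
}
```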

Example 2: Query a Corpus

Querying with Twinkle

Open the “/twinkle” folder and run the command:

java -jar twinkle.jar

Querying a Corpus

Exercise 3: Querying your own NIF-annotated corpus

1. Annotate your string using one of the wrappers.
2. Save your annotated sentence to a file (using “--outfile”).
3. Open Twinkle.
4. Query your corpus using Twinkle.

• Query your annotated corpus for:
  – nif:Context
  – nif:Sentence
  – nif:anchorOf
  – nif:oliaCategory
  – nif:oliaLink

… or practice with the Brown Corpus!
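Twinkle runs plain SPARQL, so queries over these NIF terms can be pasted in directly. The sketch keeps two starter queries as Java strings so they can also be reused from code. It assumes the NIF 2.0 core namespace; check the @prefix nif: line in your own .ttl file and substitute it if it differs. In NIF 2.0 output a nif:Context usually carries its text in nif:isString rather than nif:anchorOf, which is why the example asks for sentences. NifQueries and its methods are hypothetical names.

```java
public class NifQueries {
    // Assumption: NIF 2.0 core namespace; compare with the "@prefix nif:" line of your .ttl file.
    static final String NIF_NS = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#";

    private static String prefix() {
        return "PREFIX nif: <" + NIF_NS + ">\n";
    }

    // Text of every resource of the given NIF class, e.g. "Sentence".
    static String anchorsOf(String nifClass) {
        return prefix()
                + "SELECT ?s ?text WHERE {\n"
                + "  ?s a nif:" + nifClass + " ;\n"
                + "     nif:anchorOf ?text .\n"
                + "}\n";
    }

    // Each annotated string with its OLiA category, plus its OLiA link when present.
    static String oliaAnnotations(int limit) {
        return prefix()
                + "SELECT ?anchor ?category ?link WHERE {\n"
                + "  ?w nif:anchorOf ?anchor ;\n"
                + "     nif:oliaCategory ?category .\n"
                + "  OPTIONAL { ?w nif:oliaLink ?link }\n"
                + "}\n"
                + "LIMIT " + limit + "\n";
    }

    public static void main(String[] args) {
        // Paste either printed query into Twinkle after loading your annotated .ttl file.
        System.out.println(anchorsOf("Sentence"));
        System.out.println(oliaAnnotations(50));
    }
}
```

The OPTIONAL clause keeps words that have a category but no OLiA link in the results instead of silently dropping them.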

Thank you!

http://site.nlp2rdf.org/