NATURAL LANGUAGE PROCESSING CHRIS HARVEY

Natural Language Processing - Mass Street University

NLP IN A NUTSHELL

• Natural Language Processing (NLP) is the field of Artificial Intelligence concerned with the processing and understanding of human language.

• NLP is used to analyze text, allowing machines to understand how humans speak. This human-computer interaction enables real-world applications like automatic text summarization, sentiment analysis, topic extraction, named entity recognition, part-of-speech tagging, relationship extraction, stemming, and more. NLP is commonly used for text mining, machine translation, and automated question answering.

• Language is very hard for machines to learn. It is fluid, and its context and content change quickly and frequently. Recent breakthroughs in deep learning have accelerated NLP research and allowed machines to better understand language.

WORD EMBEDDING AND SEGMENTATION

BAG OF WORDS

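• Uses a matrix to keep track of every word in each sentence: one row per sentence, one column per vocabulary word, and each cell counts how often that word occurs in that sentence. This model is rarely used on its own in modern systems, but it was the foundation of NLP.

• It is slow and uses a lot of memory, because each new word adds a column to the row of every sentence, even the sentences that never use it.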
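
The matrix described above can be sketched in a few lines of Python. This is a minimal illustration, not a production tokenizer: the function name bag_of_words and the lowercase-and-split tokenization are assumptions made for the example.

```python
from collections import Counter


def bag_of_words(sentences):
    """Return (vocabulary, matrix); matrix[i][j] counts vocabulary[j] in sentences[i]."""
    # Naive tokenization (an assumption for this sketch): lowercase, split on whitespace.
    tokenized = [s.lower().split() for s in sentences]
    # Every distinct word gets one column, shared by all sentences.
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Each row has an entry for every vocabulary word, so most entries are zero.
        matrix.append([counts[word] for word in vocabulary])
    return vocabulary, matrix


sentences = ["the cat sat", "the dog sat on the mat"]
vocabulary, matrix = bag_of_words(sentences)
print(vocabulary)
for row in matrix:
    print(row)
```

Note how the first sentence still carries columns for "dog", "mat", and "on" even though it never uses them. That is the memory cost the slide describes.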

LSTM

• An LSTM (long short-term memory) network stores information from previous inputs and uses it again at later steps. This ‘memory’ lets the machine remember words and the order in which they appeared, which in turn helps it capture sentiment and context.
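
To make the ‘memory’ concrete, here is a rough sketch of a single LSTM cell using scalar values. It is heavily simplified: real LSTMs use vectors and weight matrices learned during training, and the WEIGHTS values below are made up purely for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, w):
    """One step of a scalar LSTM cell; w maps gate name -> (w_x, w_h, bias)."""
    def gate(name, activation):
        w_x, w_h, b = w[name]
        return activation(w_x * x + w_h * h_prev + b)

    f = gate("forget", sigmoid)        # how much of the old cell state to keep
    i = gate("input", sigmoid)         # how much of the new candidate to write
    o = gate("output", sigmoid)        # how much of the cell state to expose
    g = gate("candidate", math.tanh)   # new candidate content
    c = f * c_prev + i * g             # cell state: the 'memory'
    h = o * math.tanh(c)               # hidden state passed to the next step
    return h, c


def run_lstm(inputs, w):
    """Feed a sequence through the cell and return the final hidden state."""
    h, c = 0.0, 0.0
    for x in inputs:
        h, c = lstm_step(x, h, c, w)
    return h


# Hypothetical weights chosen by hand; a trained model would learn these.
WEIGHTS = {
    "forget": (0.5, 0.1, 0.0),
    "input": (1.0, 0.2, 0.0),
    "output": (0.3, 0.4, 0.0),
    "candidate": (0.8, 0.5, 0.0),
}

forward = run_lstm([1.0, -1.0, 0.5], WEIGHTS)
reordered = run_lstm([0.5, -1.0, 1.0], WEIGHTS)
print(forward, reordered)
```

The two runs see the same values in a different order and end in different states, which is the order sensitivity that a bag-of-words matrix cannot express.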

WORD2VEC

• Starts from a one-hot encoding of each word. It then tries to predict the words that appear close to each word. This is called the “skip-gram approach”. The weights learned for this prediction task become the word’s vector.

• It slides a window over each sentence to form groups of neighboring words called ‘grams’, which keeps each input small. This scanning is loosely similar to a CNN’s convolutional and pooling layers.

• From each gram it attempts to learn appropriate mapping vectors, so that the vectors capture context and grammar.
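
The data preparation behind the skip-gram approach can be sketched as follows. This shows only how the training pairs are built, not the network that learns the vectors. The helper names one_hot and skip_gram_pairs and the window size of 1 are assumptions for the example.

```python
def one_hot(index, size):
    """Return a list of zeros with a single 1 at position `index`."""
    vec = [0] * size
    vec[index] = 1
    return vec


def skip_gram_pairs(tokens, window=1):
    """Return (center, context) pairs for every word within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


tokens = "the quick brown fox".split()
# Assign each distinct word an index so it can be one-hot encoded.
vocab = {word: idx for idx, word in enumerate(sorted(set(tokens)))}
pairs = skip_gram_pairs(tokens, window=1)

# Each training example: one-hot center word as input, context word index as target.
examples = [(one_hot(vocab[center], len(vocab)), vocab[context])
            for center, context in pairs]

for center, context in pairs:
    print(center, "->", context)
```

A model trained on these examples is asked to predict the context word from the center word, and the weights it learns along the way serve as the word vectors.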

SENTIMENT ANALYSIS

GENERATIVE MODELS

• BERT and GPT-2 both build on the Transformer architecture, as did the original GPT (Generative Pre-trained Transformer). GPT is an unsupervised model that is pre-trained on massive amounts of text data and can then be used to predict sentiment and context.

• BERT improves on GPT by making the model bidirectional: its Transformer encoder attends to the words on both sides of each position (it does not use an LSTM). It also trains differently. Instead of predicting the next word, it masks about 15% of the tokens and has the model try to predict the masked tokens. BERT works really well and set the state of the art on many language understanding benchmarks.

• GPT-2 changed the game when it comes to NLP. GPT-2 has 1.5B parameters, more than 10x the original GPT, and it achieves state-of-the-art (SOTA) results on 7 out of 8 tested language modeling datasets in a zero-shot transfer setting, without any task-specific fine-tuning. The pre-training dataset contains 8 million web pages collected by crawling qualified outbound links from Reddit. GPT-2 also makes structural changes to the GPT model, and it generalizes better.
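
The two training setups can be contrasted with a small sketch that only builds the training examples and does not train anything. The function names are hypothetical. The masking is simplified: it always substitutes a [MASK] token, and the default rate of 0.15 follows the figure reported for BERT.

```python
import random

MASK = "[MASK]"


def next_word_examples(tokens):
    """GPT-style objective: predict each token from the tokens before it."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


def masked_example(tokens, mask_rate=0.15, seed=0):
    """BERT-style objective: hide some tokens and predict them from both sides."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n_mask = max(1, round(len(tokens) * mask_rate))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    masked = [MASK if i in positions else t for i, t in enumerate(tokens)]
    targets = {i: tokens[i] for i in positions}  # position -> hidden token
    return masked, targets


tokens = "the model learns to fill in missing words".split()

left_to_right = next_word_examples(tokens)
masked, targets = masked_example(tokens)

print(left_to_right[0])
print(masked)
print(targets)
```

In the first setup the model only ever sees the words to the left of its target. In the second it sees the whole sentence except the hidden positions, which is what lets it use context from both directions.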

EXTRA LINKS TO LEARN ON YOUR OWN

• https://medium.com/dair-ai/deep-learning-for-nlp-an-overview-of-recent-trends-d0d8f40a776d

• https://becominghuman.ai/a-simple-introduction-to-natural-language-processing-ea66a1747b32

• https://www.oreilly.com/learning/perform-sentiment-analysis-with-lstms-using-tensorflow

• https://www.kaggle.com/c/word2vec-nlp-tutorial

• https://adventuresinmachinelearning.com/word2vec-tutorial-tensorflow/

• https://towardsdatascience.com/sentiment-analysis-for-text-with-deep-learning-2f0a0c6472b5

• https://medium.com/@Currie32/predicting-movie-review-sentiment-with-tensorflow-and-tensorboard-53bf16af0acf

• https://medium.com/@ngwaifoong92/beginners-guide-to-retrain-gpt-2-117m-to-generate-custom-text-content-8bb5363d8b7f

• https://github.com/adeshpande3/LSTM-Sentiment-Analysis

• https://www.topbots.com/generalized-language-models-bert-openai-gpt2/