
Training Sentiment Analysis Models
How Amazon turns unstructured text into meaningful insight

Visionary companies like Amazon are leveraging sentiment analysis models to dig beyond a surface-level understanding of what people are saying and examine the nuances of how it’s being said. However, sentiment in language is a difficult thing to parse. One person’s “negative” doesn’t always match their neighbor’s, and even short phrases (“I never liked this dinky office, but I’ll be sad to leave it”) can contain layers of nuance. Those complications are only compounded in long-form writing like feature stories and product reviews.

Ideally, the most sophisticated sentiment models would deliver broad, composite scores for long-form content while simultaneously sifting through individual paragraphs, sentences, and words to extract granular-level insights. When Amazon wanted to turn that ideal into a reality, they partnered with DefinedCrowd.
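In code, that two-level goal looks roughly like the sketch below: score every segment with an off-the-shelf classifier, then roll the signed segment scores up into one document-level composite. This is a minimal illustration only; the Hugging Face pipeline and the length-weighted average are our assumptions, not the model Amazon actually trained.

```python
# Two-level sentiment sketch: per-segment labels plus a document composite.
# The default pipeline model and the weighting scheme are illustrative.
from transformers import pipeline

# Any off-the-shelf sentiment classifier works for the sketch; the default
# model returns {"label": "POSITIVE"/"NEGATIVE", "score": confidence}.
classifier = pipeline("sentiment-analysis")

def score_document(segments: list[str]) -> dict:
    """Score each segment, then combine into one composite score in [-1, 1]."""
    results = classifier(segments)
    # Map each prediction to a signed score: positive -> +conf, negative -> -conf.
    signed = [r["score"] if r["label"] == "POSITIVE" else -r["score"]
              for r in results]
    # Weight longer segments more heavily in the document-level composite.
    weights = [len(s) for s in segments]
    composite = sum(w * s for w, s in zip(weights, signed)) / sum(weights)
    return {"segment_scores": signed, "document_score": composite}

print(score_document([
    "I never liked this dinky office.",
    "But I'll be sad to leave it.",
]))
```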

The Challenge

This is exactly the kind of use case our dedicated team of NLP experts loves to sink its teeth into. Amazon provided more than 100,000 documents, ranging from short paragraphs left on their site to full-length 1,500-word articles published online.

The Solution

First, we analyzed those documents and developed an optimal segmentation methodology. On average, we cut each document into 4 distinct pieces, though the variance was wide: the longest document contained 84 unique segments.
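The flyer doesn’t spell out that segmentation methodology, so the sketch below shows just one plausible approach: split on paragraph breaks, then greedily pack the sentences of oversized paragraphs into chunks under a length cap. The max_chars value is an illustrative assumption.

```python
import re

def segment_document(text: str, max_chars: int = 600) -> list[str]:
    """Split a document into paragraph-based segments, breaking very long
    paragraphs at sentence boundaries so segments stay under max_chars."""
    segments = []
    for para in re.split(r"\n\s*\n", text.strip()):
        para = " ".join(para.split())  # normalize internal whitespace
        if len(para) <= max_chars:
            segments.append(para)
            continue
        # Greedily pack sentences into chunks of at most max_chars.
        # (A single sentence longer than the cap still becomes its own chunk.)
        chunk = ""
        for sentence in re.split(r"(?<=[.!?])\s+", para):
            if chunk and len(chunk) + len(sentence) + 1 > max_chars:
                segments.append(chunk)
                chunk = sentence
            else:
                chunk = f"{chunk} {sentence}".strip()
        if chunk:
            segments.append(chunk)
    return segments

doc = "First paragraph.\n\nSecond paragraph. It has two sentences."
print(segment_document(doc))
# ['First paragraph.', 'Second paragraph. It has two sentences.']
```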

Our Neevo contributors tagged the sentiment of each individual segment while also providing a high-level sentiment score for each document as a whole. Throughout that process, we ran a wide range of automated gatekeeping procedures to monitor the quality of their work in real time. In the end, we sourced half a million annotations on the original 100,000 documents.
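Where segments are labeled redundantly, a standard way to turn raw annotations into training labels is a majority vote with an agreement threshold, routing low-agreement items to an internal spot check. The sketch below illustrates that general technique; the threshold and data shapes are our assumptions, not DefinedCrowd’s actual pipeline.

```python
from collections import Counter

def aggregate_labels(annotations: dict[str, list[str]],
                     min_agreement: float = 2 / 3):
    """Majority-vote each segment's redundant labels; flag low-agreement
    segments for review instead of silently accepting them."""
    gold, flagged = {}, []
    for segment_id, labels in annotations.items():
        top_label, top_count = Counter(labels).most_common(1)[0]
        if top_count / len(labels) >= min_agreement:
            gold[segment_id] = top_label
        else:
            flagged.append(segment_id)  # route to an internal spot check
    return gold, flagged

gold, flagged = aggregate_labels({
    "doc42-seg1": ["negative", "negative", "neutral"],
    "doc42-seg2": ["positive", "negative", "neutral"],
})
print(gold)     # {'doc42-seg1': 'negative'}
print(flagged)  # ['doc42-seg2']
```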

The Results

In partnering with DefinedCrowd, Amazon benefitted from extensive data expertise, customizable workflows, and full-service data solutions that guaranteed quality results, even on a data collection this complex. Our rigorous qualification tests, analysis of text-to-speed and segment-to-speed ratios, and inter-annotator agreement calculations led to an error rate of less than 3%. High-precision training data makes for high-performance models. Amazon knows this all too well, and it chooses its data partners accordingly.

The results, by the numbers:

Documents provided: 100,000
Segments identified per document (average): 4
Annotations collected: 500,000
Accuracy: 97.3%*

* Quality controls run in real time:

RTA % Tag Precision (percentage of correct tags vs. the RTA task): users with a low RTA % were prevented from working.
Average Text-to-Speed Ratio (length of the input document / task time): outliers spot checked internally.
Average Segment-to-Speed Ratio (number of segments / task time): outliers spot checked internally.
Offensiveness (percentage of a user’s unique assessments vs. 2 other annotators): values above 20% spot checked internally.
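As an illustration of how the two speed ratios above can surface outliers, the sketch below computes a characters-per-second ratio for each task and flags anything more than three times the median rate, i.e. suspiciously fast work. The field names and the factor are illustrative assumptions; the flyer says only that outliers were spot checked internally.

```python
from statistics import median

def speed_outliers(tasks: list[dict], fast_factor: float = 3.0) -> list[str]:
    """Flag tasks whose text-to-speed ratio (characters of input per second
    of task time) exceeds fast_factor times the median ratio."""
    ratios = {t["task_id"]: t["chars"] / t["seconds"] for t in tasks}
    typical = median(ratios.values())
    return [tid for tid, r in ratios.items() if r > fast_factor * typical]

# Hypothetical task records: id, input length in characters, time spent.
tasks = [
    {"task_id": "t1", "chars": 1200, "seconds": 300},
    {"task_id": "t2", "chars": 1100, "seconds": 280},
    {"task_id": "t3", "chars": 1300, "seconds": 12},  # suspiciously fast
    {"task_id": "t4", "chars": 900, "seconds": 240},
]
print(speed_outliers(tasks))  # ['t3'] under these assumptions
```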

The days of the quantity-focused data provider are long gone. The dawn of the quality-focused data partner has arrived. Want to learn how partnering with DefinedCrowd can unlock cutting-edge AI solutions for your business?

Contact us: [email protected]
Visit us: www.definedcrowd.ai