
Enhancing data annotation outcomes for AI/ML models · 2020-06-12


VIEW POINT

Abstract

The quality and quantity of training data is directly linked to the success of artificial intelligence (AI) / machine learning (ML) model development. In this paper, read about the challenges faced by data science teams and how they can use both man and machine for improved project outcomes.

ENHANCING DATA ANNOTATION OUTCOMES FOR AI/ML MODELS

External Document © 2020 Infosys Limited

Significance of trained datasets

It is well known that supervised learning and semi-supervised learning are the most widely adopted methods for ML modeling, and the common denominator for both is the underlying datasets. Though more time is spent building algorithms, the significance of the datasets for these algorithms is often missed. AI/ML models for use cases such as self-driving, movie recommendations, automated farming, automated mining, diagnosis assistance, and drug creation require huge volumes of training data.

The most common misconception about AI is that it’s all about building algorithms that are designed to perform a certain task.

Trained datasets make or break the algorithms built. Thus, it becomes much more important to focus on the type of datasets we consider during the building process: larger and broader datasets.

One study found that 96% of enterprises face AI training data issues. The same research indicates that by 2025 the AI industry is going to be worth $190 billion, and that global spending on cognitive computing and AI systems is going to reach $35 billion by 2029.

Training the AI/ML Model

An AI system requires careful training before it can be put into action. To put this into perspective, the AI system must be exposed to huge datasets that are accurately labeled and annotated. The research also uncovered that creating training datasets for AI/ML models is more difficult than data scientists expected: the real challenge is that annotating data on their own requires a large amount of human effort.

Building an AI/ML system is very complex and training data is a major milestone. Here are some overarching challenges that are typically faced while creating training data.

1. Unexpected data complications

2. Biased annotations

3. Lack of confidence in the annotation tool

4. Lack of annotation expertise

5. Bandwidth to perform annotation tasks
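Biased annotations, the second challenge above, can be made measurable by checking how far independent annotators agree beyond chance. The following is a minimal sketch, assuming two annotators have labeled the same items. Cohen's kappa is our choice of agreement metric and is not named in the paper; the function name `cohen_kappa` and the sample labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items on which both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement, from each annotator's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical labels from two annotators for the same six images.
annotator_1 = ["cat", "cat", "dog", "dog", "cat", "dog"]
annotator_2 = ["cat", "dog", "dog", "dog", "cat", "cat"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")
```

A kappa near 1 suggests the rule book is being applied consistently; a value near 0 means agreement is no better than chance and the labeling guidelines may need review before more data is annotated.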

[Figure: "What others generally think AI is…": AI is all about building models using code. "What AI actually is…": 70% training data, 20% experimenting, and 10% coding.]


Overcoming the challenges:

As a best practice, adopting mature operations methodologies for annotation activities helps drive AI/ML projects accurately and successfully through to the end of development, so that they are ready for real-time implementation.

The major reason to treat annotation as a process-driven activity is its high volume. Consider how much time data scientists could free up, allowing them to focus solely on core activities like mining data for additional attributes and feeding the rule book with broader labels. In simple words, this means limiting the data scientists' dependency on in-person annotation by hiring training data experts who can own the project, in a managed services model, until it is delivered efficiently and accurately.

Conclusion:

In a nutshell, we believe AI/ML projects can be successful if a data science team focusses its time and energy on improving the model itself and outsources the high-volume data annotation tasks through a managed services operating model. A combination of ‘platform + people + process’ can help ensure not only accelerated speed to market but also improved outcomes for AI/ML projects.

Mitigating data annotation challenges through a managed services solution:

• Build a pool of annotation experts: Create and build a team of trained data annotation (or labelling) experts with experience in multiple annotation projects, supporting a broad spectrum of data formats such as images, text, and videos to be annotated, who can adapt to dynamically changing project workflows, and perform annotation tasks without bringing in biases

• Delivering with higher accuracy: Apply the right quality management techniques to annotation projects using a multi-layer approach, e.g. including a dedicated Quality Analyst for checking the accuracy of the annotations

• Annotation platform to accelerate speed of delivery: Selecting and partnering with the right-fit annotation partner can make or break data annotation projects. It is critical to choose the right partner by evaluating the best platform based on the number of years of experience, type of projects delivered, type of industries catered to, type of problems solved, and the capability of handling large datasets: scalability with increasing data volume, flexibility with changing workflows, and lastly complete data security
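The multi-layer quality approach mentioned in the list above can be sketched in code. This is a minimal illustration under our own assumptions, not the process described in the paper: layer 1 consolidates several annotators' labels by majority vote and flags weak agreement, and layer 2 compares the consolidated labels with a Quality Analyst's review of a sample. The function names, the 0.6 agreement threshold, and the sample data are all hypothetical.

```python
from collections import Counter


def consolidate(votes, min_agreement=0.6):
    """Layer 1: majority vote per item; flag items with weak agreement."""
    consensus, flagged = {}, []
    for item_id, labels in votes.items():
        label, count = Counter(labels).most_common(1)[0]
        consensus[item_id] = label
        if count / len(labels) < min_agreement:
            flagged.append(item_id)  # route to an expert for re-annotation
    return consensus, flagged


def audit_accuracy(consensus, qa_labels):
    """Layer 2: share of QA-reviewed items where consensus matches the analyst."""
    if not qa_labels:
        raise ValueError("the QA sample must not be empty")
    correct = sum(consensus[i] == label for i, label in qa_labels.items())
    return correct / len(qa_labels)


# Hypothetical votes from three annotators per image.
votes = {
    "img_001": ["car", "car", "car"],
    "img_002": ["car", "truck", "car"],
    "img_003": ["truck", "car", "bus"],
}
consensus, flagged = consolidate(votes)

# Hypothetical labels assigned by the Quality Analyst to a reviewed sample.
qa_labels = {"img_001": "car", "img_002": "truck"}
accuracy = audit_accuracy(consensus, qa_labels)
print(flagged, accuracy)
```

Keeping the two layers separate means the flagged list can drive rework while the audit accuracy is tracked over time as a delivery metric.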

© 2020 Infosys Limited, Bengaluru, India. All Rights Reserved. Infosys believes the information in this document is accurate as of its publication date; such information is subject to change without notice. Infosys acknowledges the proprietary rights of other companies to the trademarks, product names and such other intellectual property rights mentioned in this document. Except as expressly permitted, neither this documentation nor any part of it may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, printing, photocopying, recording or otherwise, without the prior permission of Infosys Limited and/ or any named intellectual property rights holders under this document.

For more information, contact [email protected]

Infosysbpm.com

Authors:

Sathish Kasthuri – Senior Solution Design Head, Digital Interactive Services, Infosys BPM

Sathish is responsible for GTM Strategy and Implementation of new solutions across the Digital Interactive Services portfolio.

Sathish has been with Infosys since 2010 and has held key positions across sales, transition, operations, and solution design. Prior to joining Infosys, he worked with several niche Media & Entertainment companies, including animation studios. He has more than 25 years of experience across several geographies including the US.

Sathish has a Masters in Engineering from College of Engineering, Guindy, and a Bachelors in Engineering from PSG College of Technology.

Vijay Kalla – Solution Design Consultant, Digital Interactive Services, Infosys BPM

Vijay is responsible for implementing GTM Strategies and solutions across Digital Interactive Services Portfolio.

Vijay has been with Infosys since 2015 as a Solutions Consultant taking up various roles across project setup, project transition, and project management. Prior to Infosys, Vijay worked as a Research Analyst for the Cloud Technology industry. He has a total experience of 9 years across IT/ITES, CPG, and M&E industries.

Vijay has a PGDM from the Institute of Finance and International Management and BE from Anna University.

References:

1. https://dataconomy.com/2019/07/why-96-of-enterprises-face-ai-training-data-issues/

2. https://www.mckinsey.com/featured-insights/artificial-intelligence/tackling-europes-gap-in-digital-and-ai

3. https://towardsdatascience.com/how-to-build-a-data-set-for-your-machine-learning-project-5b3b871881ac

4. https://medium.com/shallow-thoughts-about-deep-learning/how-to-find-datasets-for-artificial-intelligence-9131b2e72e88