
    Artificial Intelligence

    ADVANCEMENTS IN AI RESEARCH:

    TEACHING MACHINES TO SEE AND UNDERSTAND 

    Harshit Jain (13CSU049)

    Harshit Juneja (13CSU050)

    Himani Malhotra (13CSU051)

    Supervisor: Dr. Supriya Panda (Professor)


    ACKNOWLEDGEMENT

    “We should all be thankful for those people who rekindle the inner spirit.”

Foremost, we would like to thank the Almighty for guiding this endeavour towards success. It gives us immense pleasure to acknowledge the efforts of the faculty of The NorthCap University, who provided the very best opportunities at every level and helped us complete our project and polish our technical skills. We also express our gratitude to our respected teacher, Dr. Supriya Panda, for her intellectual support throughout the course of this project. She is extremely dedicated and energetic, and her zeal for work has given us a new direction.



    Table of Contents

    Abstract

    Object detection and memory networks

    Prediction and planning

    Conclusion

    References



    Abstract

From text to photos, through video and soon VR, the amount of information being generated in the world is only increasing. In fact, the amount of data we need to consider has been growing by about 50 percent year over year, and human waking hours are not keeping up with that growth rate. The best way we can think of to keep pace with this growth is to build intelligent systems that will help us sort through the deluge of content.

To tackle this, AI groups have been conducting ambitious research in areas like image recognition and natural language understanding.



    Object detection and memory networks

The first of these is in a subset of computer vision known as object detection. Object detection is hard. Take this photo, for example:

[Photo: a herd of zebras]

How many zebras do you see in the photo? Hard to tell, right? Imagine how hard this is for a machine, which doesn't even see the stripes; it sees only pixels. Researchers have been working to train systems to recognize patterns in the pixels so they can be as good as or better than humans at distinguishing objects in a photo from one another, known in the field as "segmentation", and then identifying each object. Our latest system, which we'll be presenting at NIPS next month, can segment images 30 percent faster than most other systems, using 10x less training data.
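
To make the two steps concrete, here is a minimal sketch of the segmentation idea, not the system described above: simple thresholding stands in for a learned model, and SciPy's connected-component labelling groups foreground pixels into distinct objects.

    import numpy as np
    from scipy import ndimage

    # A tiny grayscale "photo": two bright objects on a dark background.
    image = np.zeros((8, 8))
    image[1:3, 1:4] = 1.0   # object 1
    image[5:7, 4:7] = 1.0   # object 2

    # Step 1 (segmentation): separate foreground pixels from background.
    mask = image > 0.5

    # Step 2 (instance labelling): group connected pixels into objects.
    labels, num_objects = ndimage.label(mask)
    print(f"Found {num_objects} objects")   # -> Found 2 objects

A learned system replaces the threshold with a deep network, but the output, one labelled mask per object, is the same kind of thing.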

The next milestone is in natural language understanding, with new developments in a technology called Memory Networks (also known as MemNets). MemNets add a type of short-term memory to the convolutional neural networks that power our deep-learning systems, allowing those systems to understand language more like a human would. A demo of MemNets at work shows the system reading and then answering questions about a short synopsis of The Lord of the Rings. We've now scaled this system from being able to read and answer questions on tens of lines of text to being able to perform the same task on data sets exceeding 100K questions, an order of magnitude larger than previous benchmarks.
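
The retrieval idea at the heart of a MemNet can be sketched in a few lines. This toy is entirely our own illustration; a real MemNet learns vector embeddings and attention rather than counting shared words. It stores story sentences as memories and answers a question by returning the most relevant one.

    # Toy illustration of the MemNet idea: store facts as memories and
    # retrieve the most relevant one for a question. Word overlap is a
    # crude stand-in for learned embeddings and attention.
    story = [
        "frodo took the ring",
        "frodo went to mount doom",
        "sam followed frodo",
    ]

    def answer(question, memories):
        q_words = set(question.lower().rstrip("?").split())
        # Score each memory by how many question words it shares.
        return max(memories, key=lambda m: len(q_words & set(m.split())))

    print(answer("Who followed Frodo?", story))
    # -> "sam followed frodo"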


Related papers: http://arxiv.org/pdf/1503.08895v4.pdf and http://arxiv.org/pdf/1506.06204.pdf


These advancements in computer vision and natural language understanding are exciting on their own, but where it gets really exciting is when you begin to combine them, as the following demo shows.



In this demo of the system we call VQA, or visual Q&A, you can see the promise of what happens when you combine MemNets with image recognition: we're able to give people the ability to ask questions about what's in a photo. Think of what this might mean to the hundreds of millions of people around the world who are visually impaired in some way. Instead of being left out of the experience when friends share photos, they'll be able to participate. This is still very early in its development, but the promise of this technology is clear.
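
A rough sketch of how the pieces fit together (our own simplification, not the actual VQA pipeline): pretend an image model has already produced object labels and counts, and answer simple questions against that output.

    # Toy visual Q&A: combine (pretend) image-recognition output with a
    # simple question matcher. A real VQA system learns this end to end.
    detections = {"zebra": 3, "tree": 2}   # output of an object detector

    def vqa(question, detections):
        q = question.lower()
        for label, count in detections.items():
            if label in q:
                if "how many" in q:
                    return str(count)
                return f"yes, there is a {label} in the photo"
        return "I don't know"

    print(vqa("How many zebras are in the photo?", detections))  # -> "3"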



    Prediction and planning

There are also some bigger, longer-term challenges we're working on in AI. Some of these include unsupervised and predictive learning, where the systems can learn through observation (instead of through direct instruction, which is known as supervised learning) and then begin to make predictions based on those observations. This is something you and I do naturally; for example, none of us had to go to a university to learn that a pen will fall to the ground if you push it off your desk, and it's how humans do most of their learning. But computers still can't do this: our advances in computer vision and natural language understanding are still being driven by supervised learning.

The FAIR team recently started to explore these models, and you can see some of the early progress demonstrated below. The team has developed a system that can "watch" a series of visual tests, in this case sets of precariously stacked blocks that may or may not fall, and predict the outcome. After just a few months' work, the system can now predict correctly 90 percent of the time, which is better than most humans.
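
For intuition about what such a predictor must capture, here is a hand-coded physics rule for a simplified 2-D version of the task (our own construction, not the learned system): a stack topples if the center of mass of everything resting on a block overhangs that block.

    # Hand-coded stand-in for the block-stability predictor. Each block
    # is a (left, right) span; blocks are listed bottom to top and have
    # uniform density.
    def will_fall(stack):
        for i in range(len(stack) - 1):
            left, right = stack[i]
            above = stack[i + 1:]
            # Width-weighted center of mass of everything resting on block i.
            total_width = sum(r - l for l, r in above)
            com = sum((l + r) / 2 * (r - l) for l, r in above) / total_width
            if not (left <= com <= right):
                return True   # the load overhangs its support: it topples
        return False

    print(will_fall([(0, 4), (1, 5)]))   # False: modest overhang, stays up
    print(will_fall([(0, 4), (3, 9)]))   # True: the top block tips over

The learned system is interesting precisely because nobody writes this rule down: it infers the regularity from watching examples.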



Another area of longer-term research is teaching our systems to plan. One of the things we've built to help do this is an AI player for the board game Go. Using games to train machines is a pretty common approach in AI research. In the last couple of decades, AI systems have become stronger than humans at games like checkers, chess, and even Jeopardy. But despite close to five decades of work on AI Go players, the best humans are still better than the best AI players. This is due in part to the number of different variations in Go. After the first two moves in a chess game, for example, there are 400 possible positions (20 choices for each side's first move). In Go, there are close to 130,000 (roughly 361 × 360 placements on the 19-by-19 board).

We've been working on our Go player for only a few months, but it's already on par with the other AI-powered systems that have been published, and it's already as good as a very strong human player. We've achieved this by combining the traditional search-based approach (modeling out each possible move as the game progresses) with a pattern-matching system built by our computer vision team. The best human Go players often take advantage of their ability to recognize patterns on the board as the game evolves, and with this approach our AI player is able to mimic that ability, with very strong early results.
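
The combination can be sketched schematically. In this toy (every function here is an invented placeholder, not the actual Go engine), a depth-limited game-tree search expands only the few moves a pattern scorer ranks highest instead of every legal move, which is what makes search tractable on a board as wide as Go's.

    # Schematic of "search plus pattern matching": depth-limited lookahead
    # over only the top-ranked moves. All functions are toy placeholders.
    def pattern_prior(state, move):
        # Stand-in for a learned pattern matcher scoring candidate moves.
        return (move * 7) % 10

    def evaluate(state):
        # Stand-in for a position evaluation at the search horizon.
        return sum(state)

    def value(state, moves, depth, beam=3):
        if depth == 0 or not moves:
            return evaluate(state)
        # Prune the tree: keep only the top-`beam` moves by pattern score.
        top = sorted(moves, key=lambda m: pattern_prior(state, m),
                     reverse=True)[:beam]
        return max(value(state + [m], [x for x in moves if x != m],
                         depth - 1, beam) for m in top)

    moves = list(range(10))
    best = max(moves, key=lambda m: value([m], [x for x in moves if x != m], 2))
    print("chosen move:", best)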

So what happens when you start to put all this together? Facebook is currently running a small test of a new AI assistant called M. Unlike other machine-driven services, M takes things further: it can actually complete tasks on your behalf. It can purchase items; arrange for gifts to be delivered to your loved ones; and book restaurant reservations, travel arrangements, appointments, and more. This is a huge technology challenge. It's so hard that, starting out, M is a human-trained system: human operators evaluate the AI's suggested responses, and then they produce responses while the AI observes and learns from them.
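
That training loop is easy to sketch in outline (all names here are invented for illustration): the model proposes a reply, a human approves or overrides it, and the final request-and-reply pair is kept as a supervised training example.

    # Minimal human-in-the-loop pattern (names invented): the model
    # proposes, a human corrects, and the final pair becomes training data.
    training_data = []

    def model_suggest(request):
        # Stand-in for the assistant's current best guess.
        return "Sure, I can help with that."

    def human_review(request, suggestion):
        # Stand-in for the human operator, who may override the AI.
        if "flowers" in request:
            return "What's your budget?"
        return suggestion

    def handle(request):
        suggestion = model_suggest(request)
        final = human_review(request, suggestion)
        training_data.append((request, final))   # the AI learns from this
        return final

    print(handle("Please order flowers for my mom"))
    # -> "What's your budget?"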

We'd ultimately like to scale this service to billions of people around the world, but for that to be possible, the AI will need to be able to handle the majority of requests itself, with no human assistance. And to do that, we need to build all the different capabilities described above (language, vision, prediction, and planning) into M, so it can understand the context behind each request and plan ahead at every step of the way. This is a really big challenge, and we're just getting started. But the early results are promising. When someone asks M for help ordering flowers, M now knows that the first two questions to ask are "What's your budget?" and "Where are you sending them?"
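
One common way to implement that behaviour is slot filling. This sketch is our own illustration, not M's actual design: the assistant asks, in order, for whichever required details of a task are still missing.

    # Slot filling (illustrative only): ask, in order, for whichever
    # required details of the task are still missing.
    REQUIRED_SLOTS = {
        "order_flowers": ["budget", "delivery address"],
    }

    def next_question(task, known):
        for slot in REQUIRED_SLOTS[task]:
            if slot not in known:
                return f"What's your {slot}?"
        return "Great, placing the order!"

    print(next_question("order_flowers", {}))
    # -> "What's your budget?"
    print(next_question("order_flowers", {"budget": "$40"}))
    # -> "What's your delivery address?"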



One last point here: some of you may look at this and say, "So what? A human could do all of those things." And you're right, of course, but most of us don't have dedicated personal assistants. And that's the "superpower" offered by a service like M: we could give every one of the billions of people in the world their own digital assistant so they can focus less on day-to-day tasks and more on the things that really matter to them.



    Conclusion

Researchers have been working to train systems to recognize patterns in the pixels so they can be as good as or better than humans at distinguishing objects in a photo from one another, known in the field as "segmentation", and then identifying each object.

There are also some bigger, longer-term challenges being worked on in AI, including unsupervised and predictive learning, where the systems can learn through observation and then begin to make predictions based on those observations. This is something you and I do naturally; for example, none of us had to go to a university to learn that a pen will fall to the ground if you push it off your desk, and it's how humans do most of their learning.

In the last couple of decades, AI systems have become stronger than humans at games like checkers, chess, and even Jeopardy. The Go player described above has been in development for only a few months, but it's already on par with the other AI-powered systems that have been published, and it's already as good as a very strong human player.

M is a huge technology challenge; it's so hard that, starting out, M is a human-trained system: human operators evaluate the AI's suggested responses, and then they produce responses while the AI observes and learns from them. When someone asks M for help ordering flowers, M now knows that the first two questions to ask are "What's your budget?" and "Where are you sending them?"

One last point: some of you may look at this and say, "So what? A human could do all of those things." And you're right, of course, but most of us don't have dedicated personal assistants.



    REFERENCES

1. http://www.wired.com/2015/10/facebook-artificial-intelligence-describes-photo-captions-for-blind-people/

2. https://research.facebook.com/ai

3. http://www.pcmag.com/news/343445/facebook-to-use-ai-to-describe-photos-to-blind-users

