
    Artificial Intelligence

    ADVANCEMENTS IN AI RESEARCH:

    TEACHING MACHINES TO SEE AND UNDERSTAND 

    Harshit Jain (13CSU049)

    Harshit Juneja (13CSU050)

    Himani Malhotra (13CSU051)

    Supervisor: Dr. Supriya Panda (Professor)


    ACKNOWLEDGEMENT

    “We should all be thankful for those people who rekindle the inner spirit.”

Foremost, we would like to thank the Almighty for guiding this endeavour towards success. It gives us immense pleasure to acknowledge the efforts of the faculty of The NorthCap University, who provided the very best opportunities at every level and helped us complete our project and polish our technical skills. We also express our gratitude to our respected teacher, Dr. Supriya Panda, for her intellectual support throughout the course of this project. She is extremely dedicated and energetic, and her zeal for work has given us a new direction.



    Table of Contents

    Abstract

    Object detection and memory networks

    Prediction and planning

    Conclusion

    References



    Abstract

From text to photos, through video and soon VR, the amount of information being generated in the world is only increasing. In fact, the amount of data we need to consider has been growing by about 50 percent year over year, and human waking hours are not keeping up with that growth rate. The best way we can think of to keep pace with this growth is to build intelligent systems that will help us sort through the deluge of content.

To tackle this, AI groups have been conducting ambitious research in areas like image recognition and natural language understanding.



    Object detection and memory networks

The first of these is in a subset of computer vision known as object detection. Object detection is hard. Take this photo, for example:

[Photo: a herd of zebras]

How many zebras do you see in the photo? Hard to tell, right? Imagine how hard this is for a machine, which doesn't even see the stripes; it sees only pixels. Researchers have been working to train systems to recognize patterns in the pixels so they can be as good as or better than humans at distinguishing objects in a photo from one another, known in the field as "segmentation", and then identifying each object. Our latest system, which we'll be presenting at NIPS next month, can segment images 30 percent faster than most other systems, using 10x less training data.
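
To make the two steps concrete, here is a minimal sketch of the segmentation idea, not the system described above: simple thresholding stands in for a learned model, and SciPy's connected-component labelling groups foreground pixels into distinct objects.

    import numpy as np
    from scipy import ndimage

    # A tiny grayscale "photo": two bright objects on a dark background.
    image = np.zeros((8, 8))
    image[1:3, 1:4] = 1.0   # object 1
    image[5:7, 4:7] = 1.0   # object 2

    # Step 1 (segmentation): separate foreground pixels from background.
    mask = image > 0.5

    # Step 2 (instance labelling): group connected pixels into objects.
    labels, num_objects = ndimage.label(mask)
    print(f"Found {num_objects} objects")   # -> Found 2 objects

A learned system replaces the threshold with a deep network, but the output, one labelled mask per object, is the same kind of thing.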

The next milestone is in natural language understanding, with new developments in a technology called Memory Networks (also known as MemNets). MemNets add a type of short-term memory to the convolutional neural networks that power our deep-learning systems, allowing those systems to understand language more like a human would. A demo of MemNets at work shows the system reading and then answering questions about a short synopsis of The Lord of the Rings. We've now scaled this system from being able to read and answer questions on tens of lines of text to being able to perform the same task on data sets exceeding 100K questions, an order of magnitude larger than previous benchmarks.
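
The retrieval idea at the heart of a MemNet can be sketched in a few lines. This toy is entirely our own illustration; a real MemNet learns vector embeddings and attention rather than counting shared words. It stores story sentences as memories and answers a question by returning the most relevant one.

    # Toy illustration of the MemNet idea: store facts as memories and
    # retrieve the most relevant one for a question. Word overlap is a
    # crude stand-in for learned embeddings and attention.
    story = [
        "frodo took the ring",
        "frodo went to mount doom",
        "sam followed frodo",
    ]

    def answer(question, memories):
        q_words = set(question.lower().rstrip("?").split())
        # Score each memory by how many question words it shares.
        return max(memories, key=lambda m: len(q_words & set(m.split())))

    print(answer("Who followed Frodo?", story))
    # -> "sam followed frodo"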


Related papers: http://arxiv.org/pdf/1503.08895v4.pdf and http://arxiv.org/pdf/1506.06204.pdf


These advancements in computer vision and natural language understanding are exciting on their own, but where it gets really exciting is when you begin to combine them, as the following demo shows.



In this demo of the system we call VQA, or visual Q&A, you can see the promise of what happens when you combine MemNets with image recognition: we're able to give people the ability to ask questions about what's in a photo. Think of what this might mean to the hundreds of millions of people around the world who are visually impaired in some way. Instead of being left out of the experience when friends share photos, they'll be able to participate. This is still very early in its development, but the promise of this technology is clear.
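
A rough sketch of how the pieces fit together (our own simplification, not the actual VQA pipeline): pretend an image model has already produced object labels and counts, and answer simple questions against that output.

    # Toy visual Q&A: combine (pretend) image-recognition output with a
    # simple question matcher. A real VQA system learns this end to end.
    detections = {"zebra": 3, "tree": 2}   # output of an object detector

    def vqa(question, detections):
        q = question.lower()
        for label, count in detections.items():
            if label in q:
                if "how many" in q:
                    return str(count)
                return f"yes, there is a {label} in the photo"
        return "I don't know"

    print(vqa("How many zebras are in the photo?", detections))  # -> "3"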



    Prediction and planning

There are also some bigger, longer-term challenges we're working on in AI. Some of these include unsupervised and predictive learning, where the systems can learn through observation (instead of through direct instruction, which is known as supervised learning) and then begin to make predictions based on those observations. This is something you and I do naturally; for example, none of us had to go to a university to learn that a pen will fall to the ground if you push it off your desk, and it's how humans do most of their learning. But computers still can't do this: our advances in computer vision and natural language understanding are still being driven by supervised learning.

The FAIR team recently started to explore these models, and you can see some of the early progress demonstrated below. The team has developed a system that can "watch" a series of visual tests, in this case sets of precariously stacked blocks that may or may not fall, and predict the outcome. After just a few months' work, the system can now predict correctly 90 percent of the time, which is better than most humans.
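
For intuition about what such a predictor must capture, here is a hand-coded physics rule for a simplified 2-D version of the task (our own construction, not the learned system): a stack topples if the center of mass of everything resting on a block overhangs that block.

    # Hand-coded stand-in for the block-stability predictor. Each block
    # is a (left, right) span; blocks are listed bottom to top and have
    # uniform density.
    def will_fall(stack):
        for i in range(len(stack) - 1):
            left, right = stack[i]
            above = stack[i + 1:]
            # Width-weighted center of mass of everything resting on block i.
            total_width = sum(r - l for l, r in above)
            com = sum((l + r) / 2 * (r - l) for l, r in above) / total_width
            if not (left <= com <= right):
                return True   # the load overhangs its support: it topples
        return False

    print(will_fall([(0, 4), (1, 5)]))   # False: modest overhang, stays up
    print(will_fall([(0, 4), (3, 9)]))   # True: the top block tips over

The learned system is interesting precisely because nobody writes this rule down: it infers the regularity from watching examples.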



Another area of longer-term research is teaching our systems to plan. One of the things we've built to help do this is an AI player for the board game Go. Using games to train machines is a pretty common approach in AI research. In the last couple of decades, AI systems have become stronger than humans at games like checkers, chess, and even Jeopardy. But despite close to five decades of work on AI Go players, the best humans are still better than the best AI players. This is due in part to the number of different variations in Go. After the first two moves in a chess game, for example, there are 400 possible positions (20 choices for each side's first move). In Go, there are close to 130,000 (roughly 361 × 360 placements on the 19-by-19 board).

We've been working on our Go player for only a few months, but it's already on par with the other AI-powered systems that have been published, and it's already as good as a very strong human player. We've achieved this by combining the traditional search-based approach (modeling out each possible move as the game progresses) with a pattern-matching system built by our computer vision team. The best human Go players often take advantage of their ability to recognize patterns on the board as the game evolves, and with this approach our AI player is able to mimic that ability, with very strong early results.
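
The combination can be sketched schematically. In this toy (every function here is an invented placeholder, not the actual Go engine), a depth-limited game-tree search expands only the few moves a pattern scorer ranks highest instead of every legal move, which is what makes search tractable on a board as wide as Go's.

    # Schematic of "search plus pattern matching": depth-limited lookahead
    # over only the top-ranked moves. All functions are toy placeholders.
    def pattern_prior(state, move):
        # Stand-in for a learned pattern matcher scoring candidate moves.
        return (move * 7) % 10

    def evaluate(state):
        # Stand-in for a position evaluation at the search horizon.
        return sum(state)

    def value(state, moves, depth, beam=3):
        if depth == 0 or not moves:
            return evaluate(state)
        # Prune the tree: keep only the top-`beam` moves by pattern score.
        top = sorted(moves, key=lambda m: pattern_prior(state, m),
                     reverse=True)[:beam]
        return max(value(state + [m], [x for x in moves if x != m],
                         depth - 1, beam) for m in top)

    moves = list(range(10))
    best = max(moves, key=lambda m: value([m], [x for x in moves if x != m], 2))
    print("chosen move:", best)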

So what happens when you start to put all this together? Facebook is currently running a small test of a new AI assistant called M. Unlike other machine-driven services, M takes things further: it can actually complete tasks on your behalf. It can purchase items; arrange for gifts to be delivered to your loved ones; and book restaurant reservations, travel arrangements, appointments, and more. This is a huge technology challenge. It's so hard that, starting out, M is a human-trained system: human operators evaluate the AI's suggested responses, and then they produce responses while the AI observes and learns from them.
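
That training loop is easy to sketch in outline (all names here are invented for illustration): the model proposes a reply, a human approves or overrides it, and the final request-and-reply pair is kept as a supervised training example.

    # Minimal human-in-the-loop pattern (names invented): the model
    # proposes, a human corrects, and the final pair becomes training data.
    training_data = []

    def model_suggest(request):
        # Stand-in for the assistant's current best guess.
        return "Sure, I can help with that."

    def human_review(request, suggestion):
        # Stand-in for the human operator, who may override the AI.
        if "flowers" in request:
            return "What's your budget?"
        return suggestion

    def handle(request):
        suggestion = model_suggest(request)
        final = human_review(request, suggestion)
        training_data.append((request, final))   # the AI learns from this
        return final

    print(handle("Please order flowers for my mom"))
    # -> "What's your budget?"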

We'd ultimately like to scale this service to billions of people around the world, but for that to be possible, the AI will need to be able to handle the majority of requests itself, with no human assistance. And to do that, we need to build all the different capabilities described above (language, vision, prediction, and planning) into M, so it can understand the context behind each request and plan ahead at every step of the way. This is a really big challenge, and we're just getting started. But the early results are promising. When someone asks M for help ordering flowers, M now knows that the first two questions to ask are "What's your budget?" and "Where are you sending them?"
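
One common way to implement that behaviour is slot filling. This sketch is our own illustration, not M's actual design: the assistant asks, in order, for whichever required details of a task are still missing.

    # Slot filling (illustrative only): ask, in order, for whichever
    # required details of the task are still missing.
    REQUIRED_SLOTS = {
        "order_flowers": ["budget", "delivery address"],
    }

    def next_question(task, known):
        for slot in REQUIRED_SLOTS[task]:
            if slot not in known:
                return f"What's your {slot}?"
        return "Great, placing the order!"

    print(next_question("order_flowers", {}))
    # -> "What's your budget?"
    print(next_question("order_flowers", {"budget": "$40"}))
    # -> "What's your delivery address?"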



One last point here: some of you may look at this and say, "So what? A human could do all of those things." And you're right, of course, but most of us don't have dedicated personal assistants. And that's the "superpower" offered by a service like M: we could give every one of the billions of people in the world their own digital assistant so they can focus less on day-to-day tasks and more on the things that really matter to them.



    Conclusion

Researchers have been working to train systems to recognize patterns in the pixels so they can be as good as or better than humans at distinguishing objects in a photo from one another, known in the field as "segmentation", and then identifying each object.

There are also some bigger, longer-term challenges being worked on in AI, including unsupervised and predictive learning, where the systems can learn through observation and then begin to make predictions based on those observations. This is something you and I do naturally; for example, none of us had to go to a university to learn that a pen will fall to the ground if you push it off your desk, and it's how humans do most of their learning.

In the last couple of decades, AI systems have become stronger than humans at games like checkers, chess, and even Jeopardy. The Go player described above has been in development for only a few months, but it's already on par with the other AI-powered systems that have been published, and it's already as good as a very strong human player.

M is a huge technology challenge; it's so hard that, starting out, M is a human-trained system: human operators evaluate the AI's suggested responses, and then they produce responses while the AI observes and learns from them. When someone asks M for help ordering flowers, M now knows that the first two questions to ask are "What's your budget?" and "Where are you sending them?"

One last point: some of you may look at this and say, "So what? A human could do all of those things." And you're right, of course, but most of us don't have dedicated personal assistants.



    REFERENCES

1. http://www.wired.com/2015/10/facebook-artificial-intelligence-describes-photo-captions-for-blind-people/

2. https://research.facebook.com/ai

3. http://www.pcmag.com/news/343445/facebook-to-use-ai-to-describe-photos-to-blind-users

