EXPLORING THE INTERNAL STATE OF USER INTERFACES BY COMBINING COMPUTER VISION TECHNIQUES USING SIKULI Germiya K Jose 4MCA Christ University Bangalore



AGENDA

Introduction

Basics of Python

Sikuli Script

How Sikuli Works

Technical capabilities

Hello world Program

Predefined Functions

Disadvantages

Exception Handling

Special Keys

Conclusion

IF YOU ...

Are not good at programming

Want to avoid repetitive, boring, or annoying coding

Want to automate something but don’t have access to its source

WHAT DOES IT DO?

Runs a series of clicking and typing actions with a single click

Makes boring tasks easier and quicker

Testing

WHY SIKULI?

Sikuli automates anything you see on the screen.

It uses image recognition to identify and control GUI components.

It is useful when there is no easy access to a GUI's internals or source code.

Sikuli is an open-source research project originally started at the User Interface Design Group at MIT.

Sikuli is a visual approach to searching and automating graphical user interfaces using screenshots.

Sikuli allows users to take a screenshot of a GUI element (such as a toolbar button, icon, or dialog box) and query a help system using the screenshot instead of the element’s name.

Sikuli also provides a visual scripting API for automating GUI interactions, using screenshot patterns to direct mouse and keyboard events.

SIKULI SCRIPT

Sikuli automates interaction with a GUI by executing it, recognizing widgets such as buttons and text fields from their visual appearance on the screen, and interacting with those widgets by simulating mouse pointer or keyboard actions.

Sikuli uses Python for scripting.

PYTHON FEATURES

Easy-to-learn

Easy-to-read

A broad standard library

Interactive Mode

Portable

Databases

Comments: the # symbol is used

Quoting: single ('), double ("), or triple (''' or """) quotes; triple quotes let a string span multiple lines

List

Python's compound data type

Lists are similar to arrays

Items in a list can be of different data types

print list[1:3] or print list[2:] or print list[0] or print list

del list[2]

Dictionary

A kind of hash table type

Consists of key-value pairs

tinydict = {'name': 'john','code':6734, 'dept': 'sales'}
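The list and dictionary operations above can be tried in a short session. This is a minimal sketch: the sample values in `items` are made up for illustration, the variable is named `items` rather than `list` so the built-in type is not shadowed, and `print` is called with parentheses and a single argument so the snippet runs under both Python 2 (Jython) and Python 3.

```python
# Sample data invented for illustration; a list may mix data types.
items = ['abcd', 786, 2.23, 'john']

middle = items[1:3]   # elements at index 1 and 2
tail = items[2:]      # everything from index 2 onward
first = items[0]      # a single element

del items[2]          # removes the element at index 2 (2.23)

# Dictionary from the slide: key-value pairs looked up by key.
tinydict = {'name': 'john', 'code': 6734, 'dept': 'sales'}
code = tinydict['code']

print(middle)
print(items)
print(code)
```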

IF - ELSE

if expression1:
    statement(s)
elif expression2:
    statement(s)
else:
    statement(s)

Membership operators

in

Evaluates to true if it finds a variable in the specified sequence and false otherwise.

not in

Evaluates to true if it does not find a variable in the specified sequence and false otherwise.

Identity operators

is

Evaluates to true if the variables on either side of the operator point to the same object and false otherwise.

is not

Evaluates to false if the variables on either side of the operator point to the same object and true otherwise.
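The difference between membership, identity, and plain equality is easiest to see side by side. The following is a small sketch with made-up variable names; it shows that two lists can be equal in content while still being distinct objects.

```python
numbers = [1, 2, 3]
alias = numbers            # a second name for the same list object
duplicate = list(numbers)  # a new list with the same contents

has_two = 2 in numbers           # membership: 2 is an element
lacks_five = 5 not in numbers    # membership: 5 is not an element

same_object = alias is numbers                 # identity: one object, two names
equal_but_distinct = (duplicate == numbers) and (duplicate is not numbers)

print(has_two)
print(lacks_five)
print(same_object)
print(equal_but_distinct)
```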

range()

len() built-in function

break

continue

pass

LOOPS

while

for

nested loops

loop with else
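The loop constructs listed above can be combined in a short sketch. The function names are hypothetical and chosen only for illustration. The first function uses `range()`, `len()`, `continue`, `break`, and a `for` loop with an `else` branch; the second uses a plain `while` loop.

```python
def first_negative(values):
    """Return the index of the first negative value, or None if there is none."""
    for index in range(len(values)):
        if values[index] >= 0:
            continue          # skip non-negative values
        found = index
        break                 # leaving via break skips the else branch
    else:
        found = None          # runs only when the loop finished without break
    return found


def count_down(start):
    """Collect start, start - 1, ..., 1 with a while loop."""
    collected = []
    while start > 0:
        collected.append(start)
        start -= 1
    return collected


print(first_negative([3, -1, 4, -2]))
print(first_negative([1, 2]))
print(count_down(3))
```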

FUNCTIONS

def printme(str):
    print str
    return

Function call

printme("I'm first call !")

TECHNICAL CAPABILITIES

The three core capabilities of Sikuli are:

Look

Recognize

Interact

LOOK:

Sikuli uses a system API to grab the pixel data from the screen buffer and analyzes it.

This basic system function for screen capture is available on most modern platforms, including Windows, Mac, Linux and Android.

RECOGNIZE:

Sikuli “recognizes” widgets on a GUI using pattern matching based on visual appearance.

There are two use cases that must be dealt with separately: recognizing a specific widget and recognizing a class of widgets.

INTERACT:

Sikuli uses the Java Robot class to simulate mouse and keyboard interaction.

After recognizing a widget, Sikuli knows that widget’s location on the screen, and can move the pointer to that location.

At that location, it can issue a click command, which will effectively click on that widget.

BASIC EXPLORATION STRATEGIES

(a) Random exploration first identifies all widgets on the current screen image and interacts with one chosen uniformly at random. The length of each interaction is a user-controlled parameter.

(b) Depth-first exploration systematically explores all interactions up to a given length, assuming that the system is deterministic.
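The two strategies can be sketched on a toy model. Everything in this example is an assumption made for illustration: the GUI is modeled as a hypothetical dictionary (`TOY_GUI`) that maps each screen to its widgets and to the screen reached by clicking each one, whereas a real explorer would identify widgets from screenshots. The model is deterministic, which is what the depth-first strategy relies on.

```python
import random

# Hypothetical GUI: screen -> {widget: screen reached by clicking it}.
TOY_GUI = {
    "main": {"File": "file_menu", "Help": "help_dialog"},
    "file_menu": {"Open": "open_dialog", "Close menu": "main"},
    "help_dialog": {"OK": "main"},
    "open_dialog": {"Cancel": "main"},
}


def random_exploration(gui, start, length, rng):
    """Click one widget chosen uniformly at random per step, for `length` steps."""
    state, trace = start, []
    for _ in range(length):
        widget = rng.choice(sorted(gui[state]))
        trace.append(widget)
        state = gui[state][widget]
    return trace


def depth_first_exploration(gui, start, max_length):
    """Enumerate every click sequence of 1 to max_length steps, depth first."""
    sequences = []

    def visit(state, path):
        if path:
            sequences.append(tuple(path))
        if len(path) == max_length:
            return
        for widget in sorted(gui[state]):
            path.append(widget)
            visit(gui[state][widget], path)
            path.pop()

    visit(start, [])
    return sequences


random_trace = random_exploration(TOY_GUI, "main", 5, random.Random(0))
all_sequences = depth_first_exploration(TOY_GUI, "main", 2)
print(random_trace)
print(all_sequences)
```

The interaction length is passed in as a parameter in both functions, matching the user-controlled length mentioned above.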

HOW SIKULI WORKS?

SIKULI SCRIPT

Sikuli Script is a Jython and Java library that automates GUI interaction using image patterns to direct keyboard/mouse events.

The core of Sikuli Script is a Java library that consists of two parts:

java.awt.Robot, which delivers keyboard and mouse events to appropriate locations

A C++ engine based on OpenCV, which searches for given image patterns on the screen.

THE STRUCTURE OF A SIKULI SOURCE/EXECUTABLE SCRIPT (.SIKULI, .SKL)

A Sikuli script (.sikuli) is a directory that consists of a Python source file (.py) and all the image files (.png) used by the source file.

In the source file, each image used in a Sikuli script is simply a path to the .png file in the .sikuli bundle.

Therefore, the Python source file can also be edited with any text editor.

When a script is saved using Sikuli IDE, an extra HTML file is also created in the .sikuli directory so that users can easily share the scripts on the web.
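Because a .sikuli bundle is an ordinary directory, its layout can be inspected with plain Python. The sketch below builds a throwaway bundle in a temporary directory and groups its files by type. The bundle name `hello.sikuli`, the file names, and the helper `describe_bundle` are all hypothetical and serve only to illustrate the structure described above.

```python
import os
import tempfile


def describe_bundle(bundle_dir):
    """Group the files of a .sikuli directory by their role."""
    names = sorted(os.listdir(bundle_dir))
    return {
        "scripts": [n for n in names if n.endswith(".py")],
        "images": [n for n in names if n.endswith(".png")],
        "html": [n for n in names if n.endswith(".html")],
    }


# Build an empty stand-in bundle; real bundles are created by the Sikuli IDE.
root = tempfile.mkdtemp()
bundle = os.path.join(root, "hello.sikuli")
os.mkdir(bundle)
for name in ("hello.py", "hello.html", "start_button.png", "search_box.png"):
    with open(os.path.join(bundle, name), "w"):
        pass

layout = describe_bundle(bundle)
print(layout)
```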

SIKULI IDE

Sikuli IDE edits and runs Sikuli source scripts. Sikuli IDE integrates screen capturing and a custom text editor (SikuliPane) to optimize the usability of writing a Sikuli script.

To show embedded images in the SikuliPane, all string literals that end with ”.png” are replaced by a custom JButton object, ImageButton.

If a user adjusts the image pattern’s similarity, a Pattern() is automatically constructed on top of the image.


HELLO WORLD (WINDOWS)

Let us begin with a customary Hello World example!

You will learn how to capture a screenshot of a GUI element and write a Sikuli Script to do two things:

1. Click on that element

2. Type a string in that element

THE GOAL OF THE HELLO WORLD SCRIPT IS TO AUTOMATICALLY TYPE “HELLO WORLD” INTO THE START MENU SEARCH BOX.

SIKULI SCRIPT
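The two steps of the Hello World example can be sketched as follows. This is an illustration under stated assumptions, not the script from the original slide: the image file names are hypothetical placeholders for screenshots captured in the IDE, and the click and type actions are passed in as parameters so that the sketch also runs outside Sikuli, where simple recorder functions stand in for them.

```python
def hello_world(click, type_text, start_image, search_image):
    """Click the Start button, then type into the search box."""
    click(start_image)
    type_text(search_image, "Hello World")


# Recorders that stand in for Sikuli's actions outside the IDE.
performed = []


def record_click(image):
    performed.append(("click", image))


def record_type(image, text):
    performed.append(("type", image, text))


# Hypothetical screenshot file names; in the IDE these are captured images.
hello_world(record_click, record_type, "start_button.png", "search_box.png")
print(performed)
```

Inside the Sikuli IDE, the built-in `click` and `type` functions would be passed instead of the recorders, or the two calls would simply be written directly in the script.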

CONTROLLING SIKULI SCRIPTS AND THEIR BEHAVIOR

setShowActions(False | True) If set to True, when a script is run, Sikuli shows a visual effect (a blinking double-lined red circle) on the spot where the action will take place before executing actions.

exit([value ]) Stops the script gracefully at this point. The value is returned to the calling environment.

Settings.MinSimilarity The default minimum similarity of find operations. Sikuli searches the region using a default minimum similarity of 0.7.

Settings.MoveMouseDelay Controls the time taken for mouse movement to a target location; set it to a decimal value (default 0.5). The unit is seconds. Setting it to 0 switches off any animation.
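The effect of a minimum similarity threshold can be illustrated with a toy matcher. This is a deliberately simplified stand-in and not Sikuli's actual OpenCV-based algorithm: images are small grids of numbers, similarity is assumed to be the fraction of identical pixels, and the function names are hypothetical. Only the 0.7 default is taken from the text above.

```python
def similarity(screen, template, top, left):
    """Fraction of template pixels equal to the screen pixels at (top, left)."""
    matches = 0
    total = 0
    for r, row in enumerate(template):
        for c, pixel in enumerate(row):
            total += 1
            if screen[top + r][left + c] == pixel:
                matches += 1
    return matches / float(total)


def find(screen, template, min_similarity=0.7):
    """Return (row, col, score) of the best match at or above the threshold, else None."""
    best = None
    rows = len(screen) - len(template) + 1
    cols = len(screen[0]) - len(template[0]) + 1
    for top in range(rows):
        for left in range(cols):
            score = similarity(screen, template, top, left)
            if score >= min_similarity and (best is None or score > best[2]):
                best = (top, left, score)
    return best


# A 2x2 "button" that appears on screen with one pixel changed.
screen = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
button = [
    [1, 1],
    [1, 1],
]

loose = find(screen, button, min_similarity=0.7)   # 3 of 4 pixels match: accepted
strict = find(screen, button, min_similarity=0.9)  # same candidate: rejected
print(loose)
print(strict)
```

A lower threshold tolerates small visual differences at the cost of more false matches, while a higher one demands a near-exact match.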

PROBLEMS

Poor documentation

Slightly buggy

No reports, video, or export of results

KEY METHODS

find

findAll

exists

wait

waitVanish

MOUSE ACTIONS

click

doubleClick

rightClick

hover

dragDrop

INTERACTING WITH THE USER AND OTHER APPLICATIONS

PopUps and input

popup(text[, title ])

Parameters

text – text to be displayed as message

title – optional title for the message box (default: Sikuli Info)

Example:

popup("Hello World!\nHave fun with Sikuli!")

popError(text[, title ])

Same as popup() but with a different title (default: Sikuli Error) and alert icon.

Example:

popError("Uuups, this did not work")

A dialog box with this message pops up.

popAsk(text[, title ])

Returns True if the user clicked Yes, False otherwise.

Same as popup() but with a different title (default: Sikuli Decision) and alert icon.

There are two buttons, Yes and No, so the message text should be written as an appropriate question.

Example:

answer = popAsk("Should we really continue?")
if not answer:
    exit(1)

A dialog box with this question and the Yes and No buttons pops up.

input([msg ][, default ][, title ][, hidden ])

Displays a dialog box with an input field, a Cancel button, and an OK button.

The script then waits for the user to click either the Cancel or the OK button.

Parameters

msg – text to be displayed as message (default: nothing)

default – optional preset text for the input field

title – optional title for the message box (default: Sikuli Input)

hidden – (default: False) if True, the entered characters are shown as asterisks

Returns

the text contained in the input field, if the user clicked OK

None, if the user pressed the Cancel button or closed the dialog

Example: plain input:

name = input("Please enter your name to log in:")

name = input("Please enter your name to log in:", "anonymous") # with preset input text

EXAMPLE: INPUT WITH HIDDEN INPUT:

password = input("please enter your secret", hidden = True)

inputText([msg ][, title ][, lines ][, width ])

Parameters

msg – text to be displayed as message (default: nothing)

title – optional title for the message box (default: Sikuli Text)

lines – height of the text box in lines (default: 9)

width – width of the text box in characters (default: 20)

Returns the (possibly multiline) text entered by the user (might be empty)

EXAMPLE:

story = inputText("please give me some lines of text")
lines = story.split("\n") # split the text into a list of lines
for line in lines:
    print line

select([msg ][, title ][, options ][, default ])

Parameters

msg – text to be displayed as message (default: nothing)

title – optional title for the message box (default: Sikuli Selection)

options – a list of text items (default: empty list, nothing done)

default – the preselected list item (default: first item)

Returns the selected item (might be the default)

EXAMPLE:

items = ("nothing selected", "item1", "item2", "item3")
selected = select("Please select an item from the list", options = items)
if selected == items[0]:
    popup("You did not select an item")
    exit(1)

STARTING AND STOPPING OTHER APPLICATIONS AND BRINGING THEIR WINDOWS TO FRONT

Here we talk about the basic features of opening or closing other applications and switching to them (bringing their windows to the front).

openApp(application) Open the specified application.

openApp("cmd.exe")

openApp("c:\\Program Files\\Mozilla Firefox\\firefox.exe")

switchApp(application) Switch to the specified application.

switchApp("cmd.exe")

switchApp("c:\\Program Files\\Mozilla Firefox\\firefox.exe")

closeApp(application) Close the specified application.

closeApp("cmd.exe")

closeApp("c:\\Program Files\\Mozilla Firefox\\firefox.exe")

run(command)

Runs a command on the command line.

Parameters: command – a command that can be run from the command line.

This function executes the command, and the script waits for its completion.

EXCEPTION HANDLING

setThrowException(False | True) Controls how Sikuli handles not-found situations in this region.

Parameters

True – all subsequent find operations (explicit or implicit) will raise the exception FindFailed in case of not found (this is the default when a script is started).

False – all subsequent find operations will not raise the exception FindFailed. Instead, explicit find operations such as Region.find() will return None. Implicit find operations (action functions) such as Region.click() will do nothing and return 0.

getThrowException()

Returns the current setting in this region as True or False (after the start of a script, this is True by default).
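The two modes can be illustrated with a toy model that runs without Sikuli. `ToyRegion` and its list of visible images are hypothetical stand-ins written for this sketch; only the behavior they imitate (raise FindFailed when the setting is True, return None from find and 0 from click when it is False) is taken from the description above.

```python
class FindFailed(Exception):
    """Stand-in for Sikuli's FindFailed exception."""


class ToyRegion(object):
    """Hypothetical model of a region that 'sees' a fixed set of images."""

    def __init__(self, visible_images):
        self.visible = set(visible_images)
        self.throw_exception = True   # default at script start, per the text

    def setThrowException(self, flag):
        self.throw_exception = bool(flag)

    def getThrowException(self):
        return self.throw_exception

    def find(self, image):
        if image in self.visible:
            return image
        if self.throw_exception:
            raise FindFailed(image)
        return None

    def click(self, image):
        # Implicit find: an exception from find() propagates to the caller.
        if self.find(image) is None:
            return 0
        return 1


region = ToyRegion(["ok_button.png"])

try:
    region.find("missing.png")
    outcome = "found"
except FindFailed:
    outcome = "FindFailed raised"

region.setThrowException(False)
find_result = region.find("missing.png")
click_result = region.click("missing.png")
print(outcome)
print(find_result)
print(click_result)
```

With exceptions switched off, the script has to check return values itself, so the choice is between handling FindFailed in one place and testing each result individually.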

SPECIAL KEYS

The methods supporting the use of special keys are type(), keyDown(), and keyUp(). String concatenation with other text or other key constants is possible using “+”.

type("some text" + Key.TAB + "more text" + Key.ENTER)

or equivalent

type("some text\tmore text\n")

miscellaneous keys: ENTER, TAB, ESC, BACKSPACE, DELETE, INSERT, SPACE

function keys: F1, F2, F3, F4, F5, F6, F7, ...

navigation keys: HOME, END, LEFT, RIGHT, DOWN, UP, PAGE_DOWN, PAGE_UP

special keys: PRINTSCREEN, PAUSE, CAPS_LOCK, SCROLL_LOCK, NUM_LOCK

numpad keys: NUM0, NUM1, NUM2, NUM3, ..., SEPARATOR, ADD, MINUS, MULTIPLY, DIVIDE

modifier keys: ALT, CMD, CTRL, META, SHIFT, WIN

These modifier keys cannot be used as a key modifier with functions like type(), rightClick(), etc. They can only be used with keyDown() and keyUp(). If you need key modifiers, use KeyModifier instead.

type(Key.ESC, KeyModifier.CTRL + KeyModifier.ALT)

or equivalent

type(Key.ESC, KeyModifier.CTRL | KeyModifier.ALT)

The KeyModifier constants should only be used in the modifiers parameter of functions like type(), rightClick(), etc.

They should never be used with keyDown() or keyUp().
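The reason that “+” and “|” give the same result for two different modifiers can be shown with plain integers. This sketch assumes that each modifier is a bit flag occupying its own bit; the numeric values below are hypothetical and are not claimed to be the constants Sikuli uses.

```python
# Hypothetical flag values: each modifier owns exactly one bit.
SHIFT = 0b0001
CTRL = 0b0010
ALT = 0b1000

combined_plus = CTRL + ALT   # no shared bits, so no carry occurs
combined_or = CTRL | ALT     # bitwise union of the two flags
same_result = combined_plus == combined_or

# The two operators differ once a flag is repeated.
repeated_plus = CTRL + CTRL  # carries into the next bit: a different value
repeated_or = CTRL | CTRL    # still just CTRL

print(same_result)
print(repeated_plus, repeated_or)
```

Under this assumption, “|” is the safer habit, because it stays correct even when the same modifier is combined twice.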

KEYBOARD ACTIONS

type(text)

type(img, text)

paste(text)

paste(img, text)

CONCLUSION

Sikuli means “God’s eye” in the language of the Huichol people of Mexico.

Sikuli currently uses Python as the scripting language.

Sikuli is a visual technology to search and automate GUIs using images (screenshots).

It automates anything you see on the screen without internal API support.

REFERENCE

SikuliX Documentation, Release 1.1.0-Beta1, by Raimund Hocke aka RaiMan (October 19, 2014).

Abstracting Perception and Manipulation in End-User Robot Programming using Sikuli (IEEE).

Exploring the Internal State of User Interfaces by Combining Computer Vision Techniques with Grammatical Inference (IEEE).

Sikuli: Using GUI Screenshots for Search and Automation (IEEE).