Overview: A Quantum Computation Simulation Language, Anomaly Detection in the Windows Registry, Detecting Splice Sites in Genes, Rotationally Invariant Face Detection

Page 1

Overview

A Quantum Computation Simulation Language

Anomaly Detection in the Windows Registry

Detecting Splice Sites in Genes

Rotationally Invariant Face Detection

Page 2

Q-HSK: A Quantum Programming Language and Compiler

Katherine Heller, Krysta Svore, Maryam Kamvar (Al Aho)

Page 3

What is Q-HSK?

Quantum Computation Simulation Language

Quantum Compiler

Q-HSK enables simplified programming of quantum algorithms with built-in graphics

Page 4

Many Worlds Interpretation

One formulation of quantum theory

Each universe has a corresponding amplitude (i.e. a complex number)

$|\text{amplitude}|^2$ = probability of existence

[Figure: diagram labeled x with universes u1, u2, u3, u4]

Page 5

Qubits

Quantum analogue of a classical bit

Takes on values 0, 1, or a superposition of states:

$|\omega\rangle = \alpha\,|0\rangle + \beta\,|1\rangle$ where $|\alpha|^2 + |\beta|^2 = 1$

$|\omega\rangle = \cos(\theta/2)\,|0\rangle + e^{i\phi}\sin(\theta/2)\,|1\rangle$
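The angle parametrization of a qubit can be checked numerically. The following is a minimal Python sketch, not part of Q-HSK; the function names `qubit_from_angles` and `measurement_probabilities` are hypothetical and chosen only for illustration.

```python
import cmath
import math


def qubit_from_angles(theta, phi):
    """Return (alpha, beta) for cos(theta/2)|0> + e^{i phi} sin(theta/2)|1>."""
    alpha = complex(math.cos(theta / 2.0), 0.0)
    beta = cmath.exp(1j * phi) * math.sin(theta / 2.0)
    return alpha, beta


def measurement_probabilities(alpha, beta):
    """Probabilities of observing 0 and 1: |alpha|^2 and |beta|^2."""
    return abs(alpha) ** 2, abs(beta) ** 2


alpha, beta = qubit_from_angles(math.pi / 3, math.pi / 4)
p0, p1 = measurement_probabilities(alpha, beta)
print(p0, p1, p0 + p1)
```

Any choice of the two angles yields a normalized state, which is why the two-angle form needs no separate normalization constraint.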

Page 6

Quantum Gates

Reversible – all unitary operators ($U^\dagger U = I$)

Universal quantum gates – {U2, XOR}, Toffoli

Some common gates – Hadamard, QFT, CNOT

[Figure: Hadamard gate H applied to $|0\rangle$, giving $\frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$]
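As an illustration of the unitarity condition and of the Hadamard gate acting on |0⟩, here is a small Python sketch. It uses plain lists rather than any Q-HSK facility, and `apply_gate` and `is_unitary` are hypothetical helper names.

```python
import math

# Hadamard gate as a 2x2 matrix.
H = [[1 / math.sqrt(2), 1 / math.sqrt(2)],
     [1 / math.sqrt(2), -1 / math.sqrt(2)]]


def apply_gate(gate, state):
    """Multiply a square gate matrix by a state vector."""
    size = len(state)
    return [sum(gate[r][c] * state[c] for c in range(size)) for r in range(size)]


def is_unitary(gate, tol=1e-12):
    """Check that U^dagger U equals the identity, entry by entry."""
    n = len(gate)
    for r in range(n):
        for c in range(n):
            # (U^dagger U)[r][c] = sum_k conj(U[k][r]) * U[k][c]
            entry = sum(complex(gate[k][r]).conjugate() * gate[k][c] for k in range(n))
            expected = 1.0 if r == c else 0.0
            if abs(entry - expected) > tol:
                return False
    return True


zero = [1.0, 0.0]              # the state |0>
plus = apply_gate(H, zero)     # (|0> + |1>) / sqrt(2)
print(plus, is_unitary(H))
```

Applying `H` twice returns the original state, which is one way to see the reversibility of the gate.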

Page 7

Key Features of the Q-HSK Compiler

Familiar C-style syntax

Matrix operations via CBLAS

Complex and real data types

A quantum type qreg

A graphical view of quantum algorithms
  Lucid representation of qubits, registers, and gates

Interactive user options (start, stop, pause, change animation rate)

Detailed text output to trace algorithm

Page 8

A Simple Example

    int main( )
    {
        int a, i;
        qreg *q;
        q = create(5);
        i = 0;
        while (i < 5)
        {
            q[i] = (0.0, 0.0);
            i = i + 1;
        }
        q = computeHadamard(q);
        a = Measure(q);
        printf("This is the measure: %d", a);
        return 0;
    }

[Figure: circuit for this program: register q initialized to 00000, a Hadamard gate H, then a measurement M]
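What the Q-HSK program computes can be approximated with a plain state-vector simulation. The Python sketch below is an assumption-laden stand-in, not the Q-HSK runtime: it assumes the register starts in |00000⟩ (as the circuit suggests), that the Hadamard is applied to every qubit, and that measurement returns a basis-state index. `hadamard_all` and `measure` are hypothetical names.

```python
import math
import random


def hadamard_all(state, n_qubits):
    """Apply a Hadamard gate to every qubit of an n-qubit state vector."""
    s = 1 / math.sqrt(2)
    for q in range(n_qubits):
        bit = 1 << q
        new_state = state[:]
        for idx in range(len(state)):
            if idx & bit == 0:
                # Pair of amplitudes that differ only in qubit q.
                a, b = state[idx], state[idx | bit]
                new_state[idx] = s * (a + b)
                new_state[idx | bit] = s * (a - b)
        state = new_state
    return state


def measure(state, rng):
    """Sample a basis-state index with probability |amplitude|^2."""
    probs = [abs(a) ** 2 for a in state]
    return rng.choices(range(len(state)), weights=probs, k=1)[0]


n = 5
state = [0j] * (2 ** n)
state[0] = 1 + 0j                       # start in |00000>
state = hadamard_all(state, n)          # uniform superposition over 32 states
result = measure(state, random.Random(0))
print("This is the measure: %d" % result)
```

After the Hadamard step every one of the 32 basis states has amplitude 1/√32, so the measured value is uniformly distributed over 0..31.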

Page 9

Shor’s Algorithm

Factors large numbers

n – number to factorize

x – random number

a – ranges from 0 to q-1

$n^2 \le q \le 2n^2$

r – period of $x^a \pmod{n}$ – exponential time classically

one factor of n is $\gcd(x^{r/2} - 1, n)$ – fast classically
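The classical parts of the procedure can be sketched in Python. The period search below is brute force, standing in for the step a quantum computer speeds up; it is only practical for tiny n. The function names are hypothetical, and the sketch assumes x is coprime to n.

```python
import math


def find_period(x, n):
    """Smallest r > 0 with x**r % n == 1, by brute force (slow for large n)."""
    if math.gcd(x, n) != 1:
        raise ValueError("x must be coprime to n for a period to exist")
    value = x % n
    r = 1
    while value != 1:
        value = (value * x) % n
        r += 1
    return r


def factor_from_period(x, r, n):
    """Return gcd(x**(r/2) - 1, n), or None when r is odd."""
    if r % 2 != 0:
        return None
    # pow with a modulus keeps the intermediate numbers small.
    return math.gcd(pow(x, r // 2, n) - 1, n)


n, x = 15, 7
r = find_period(x, n)
factor = factor_from_period(x, r, n)
print(r, factor)
```

For n = 15 and x = 7 the period is 4 and the gcd step yields the factor 3. For other choices of x the result may be a trivial factor (1 or n), in which case a new random x is drawn.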

Page 10

Graphical Interface

Page 11

Architecture of Q-HSK Compiler

[Figure: compiler pipeline]
Program.q → Lexical Analyzer (lex.yy.c) → Syntax Analyzer (y.tab.c) → Semantic Analyzer → Translator (translate.c) → Program.cpp → g++ → Executable
Java Graphics → javac

Page 12

One Class Support Vector Machines for Detecting Anomalous Windows Registry Accesses

Collaborators: Krysta Svore, Angelos Keromytis, Sal Stolfo

Page 13

Host Based Intrusion Detection Systems

Microsoft Windows – most often attacked

Current methods to combat attacks
  Virus scanners and security patches
  Problem: these do not combat unknown attacks, so frequent updates are needed

Host based IDS
  Monitor system accesses to detect intrusions
  Application of data mining techniques

Page 14

The Windows Registry and RAD

Windows Registry
  Stores configuration settings for system parameters – security information, programs, etc.
  Programs query the registry for information

Registry Anomaly Detection (RAD)
  audit sensor
  model generator
  anomaly detector

Example registry access record:
  Process: EXPLORER.EXE
  Query: OpenKey
  Key: HKCR\CKSUD\{B41DB860-8EE4-11D2-9906-EA9FADC173CA}\shellex\MayChangeDefaultMenu
  Response: SUCCESS
  ResultValue: NOTFOUND

Page 15

Probabilistic Anomaly Detection Algorithm

Computes 25 consistency checks: $P(X_i)$ and $P(X_i \mid X_j)$

Multinomial with Hierarchical Prior

For observed elements i:

$P(X = i) = C \cdot \dfrac{N_i + \alpha}{k_0 \alpha + N}$

where
  N – total number of observations
  $N_i$ – number of observations of symbol i
  $\alpha$ – “pseudo count” for each observed symbol
  $k_0$ – number of observed symbols
  L – number of possible symbols

For unobserved elements i:

$P(X = i) = (1 - C) \cdot \dfrac{1}{L - k_0}$

$C = \dfrac{N}{N + L - k_0}$
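The estimator on this slide translates directly into a few lines of Python. This is a sketch under the slide's definitions only; the function name `pad_probability` and the parameter `num_possible` (the slide's L) are hypothetical, and the toy counts are invented for illustration.

```python
from collections import Counter


def pad_probability(symbol, counts, alpha, num_possible):
    """Estimate P(X = symbol) with the slide's observed/unobserved split.

    counts: Counter of observed symbols; alpha: pseudo count;
    num_possible: total number of possible symbols (L on the slide).
    """
    N = sum(counts.values())        # total observations
    k0 = len(counts)                # number of distinct observed symbols
    L = num_possible
    C = N / (N + L - k0)            # probability mass given to observed symbols
    if symbol in counts:
        return C * (counts[symbol] + alpha) / (k0 * alpha + N)
    # Remaining mass is shared equally among the L - k0 unobserved symbols.
    return (1 - C) / (L - k0)


counts = Counter(["a", "a", "b"])   # toy data, not from the experiments
p_a = pad_probability("a", counts, alpha=0.5, num_possible=5)
p_b = pad_probability("b", counts, alpha=0.5, num_possible=5)
p_new = pad_probability("z", counts, alpha=0.5, num_possible=5)
print(p_a, p_b, p_new)
```

The observed symbols share total mass C and the unobserved ones share 1 - C, so the estimates sum to one over all L symbols.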

Page 16

One Class SVMs

Analogous to a two class SVM where all data lies in the first class and the origin is the sole member of the second class

Solve optimization problem to find rule f with maximal margin

$f(x) = \langle w, x \rangle + b$

Equivalent to solving the dual quadratic programming problem:

$\min_{\alpha} \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j K(x_i, x_j)$ s.t. $0 \le \alpha_i \le \frac{1}{\nu l}$, $\sum_i \alpha_i = 1$

Kernel function projects input vectors into a feature space allowing for non-linear decision boundaries

$\Phi: X \to \mathbb{R}^N$, $K(x_i, x_j) = \langle \Phi(x_i), \Phi(x_j) \rangle$

Page 17

Experiments

Kernels:
  Linear: $K(x, y) = x \cdot y$
  Polynomial: $K(x, y) = (x \cdot y + 1)^d$
  Gaussian: $K(x, y) = e^{-\|x - y\|^2 / (2\sigma^2)}$

Feature Vectors:
  Binary
  Frequency-based
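The three kernels can be written as short Python functions. This is an illustrative sketch on plain lists; the default values for `d` and `sigma` are assumptions, since the slides do not state which values were used.

```python
import math


def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    return dot(x, y)


def polynomial_kernel(x, y, d=2):
    return (dot(x, y) + 1) ** d


def gaussian_kernel(x, y, sigma=1.0):
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2 * sigma ** 2))


def gram_matrix(points, kernel):
    """Matrix of kernel values K(x_i, x_j) used by the dual problem."""
    return [[kernel(p, q) for q in points] for p in points]


points = [[1.0, 2.0], [3.0, 4.0]]
K = gram_matrix(points, linear_kernel)
print(K)
```

The Gram matrix is all the SVM dual needs from the data, which is why swapping the kernel changes the decision boundary without changing the optimizer.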

Page 18

Results

Page 19

Sequence Information for the Splicing of Human Pre-mRNA Identified by Support Vector Machine Classification

Collaborators: Xiang Zhang, Ilana Hefter, Christina Leslie, Larry Chasin

Page 20

What Is Splicing?

[Figure: DNA with Exon1, Intron, and Exon2, marking the donor, branch, and acceptor sites; the spliced mRNA joins Exon1 and Exon2]

Page 21

Pseudo Exons

Consensus Sequences
  Donor Site: MAG|gtragt (M=A/C, r=a/g)
  Acceptor Site: (y)10ncag|G (y=c/t, n=a/c/g/t)
  Donor and acceptor sites scored based on closeness to consensus

Identifying Pseudo Exons
  Intronic segments
  Have high scoring “donor” and “acceptor” sites

We look for discriminative signals in intronic regions near real and pseudo exons

Page 22

String Kernels

Feature map: number of times each k-length (contiguous) string occurs in the sequence

Dimension of feature space is $N^k$ (N = alphabet size; for DNA with k=2, $4^2 = 16$)

Example: k=2, Sequence = ACCTGGTG

AA  AC  AG  AT  CA  CC  CG  CT  GA  GC  GG  GT  TA  TC  TG  TT
0   1   0   0   0   1   0   1   0   0   1   1   0   0   2   0
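The feature map in this example can be reproduced with a short Python sketch. `spectrum_features` and `spectrum_kernel` are hypothetical names; the code assumes the sequence contains only letters from the given alphabet.

```python
from itertools import product


def spectrum_features(sequence, k, alphabet="ACGT"):
    """Count every contiguous k-length substring over the given alphabet."""
    counts = {"".join(p): 0 for p in product(alphabet, repeat=k)}
    for start in range(len(sequence) - k + 1):
        counts[sequence[start:start + k]] += 1
    return counts


def spectrum_kernel(s, t, k):
    """Inner product of the two k-mer count vectors."""
    fs, ft = spectrum_features(s, k), spectrum_features(t, k)
    return sum(fs[key] * ft[key] for key in fs)


features = spectrum_features("ACCTGGTG", 2)
print({kmer: c for kmer, c in features.items() if c > 0})
```

For ACCTGGTG the nonzero counts are AC, CC, CT, GG, GT (one each) and TG (two), matching the table.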

Page 23

Splice Kernels

Hypothesis: False splice sites are intrinsically defective due to bad internal nucleotide (nt) combinations

All possible size k internal nt combinations are features

Example (k=2): If the internal combination (3g,5a) occurs, that feature value is 1, otherwise it is 0
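One way to read this feature definition is sketched below in Python. It assumes that (3g,5a) means "position 3 is g and position 5 is a", that positions are numbered from 1 within a fixed-length splice-site window, and that only active features are stored; none of these details are stated on the slide, and the example window is invented.

```python
from itertools import combinations


def splice_features(site, k):
    """Return the set of active features for one splice-site window.

    Each feature is a tuple of k (position, nucleotide) pairs, with
    positions counted from 1. Features not in the set have value 0.
    """
    indexed = [(pos, nt) for pos, nt in enumerate(site.lower(), start=1)]
    return set(combinations(indexed, k))


active = splice_features("ccgtaag", 2)    # hypothetical 7-nt window
feature_3g_5a = ((3, "g"), (5, "a"))
print(feature_3g_5a in active, len(active))
```

A window of length n activates C(n, k) features out of the much larger set of possible position and nucleotide combinations, so a sparse set representation is a natural choice.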

Page 24

Recursive Feature Selection

Normal vector to the hyperplane:

$w = \sum_{i=1}^{m} y_i \alpha_i x_i$

If $|w_j|$ is large, the jth feature is important for SVM discrimination

Approximation due to degree 2 polynomial kernel – calculate $w_{up}$ and $w_{down}$ separately, then eliminate bottom 50% of features for each

Stop when ROC score drops below 90% of original value on untouched test set
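A single elimination round can be sketched in Python for the linear case. This sketch does not reproduce the degree 2 polynomial approximation or the separate $w_{up}$ / $w_{down}$ computation, whose details are not given here; the function names and the toy support vectors are hypothetical.

```python
def weight_vector(support_vectors, labels, alphas):
    """Compute w = sum_i y_i * alpha_i * x_i for a linear SVM."""
    n_features = len(support_vectors[0])
    w = [0.0] * n_features
    for x, y, a in zip(support_vectors, labels, alphas):
        for j in range(n_features):
            w[j] += y * a * x[j]
    return w


def keep_top_half(w):
    """Indices of the features with the largest |w_j|, in original order."""
    ranked = sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)
    n_keep = (len(w) + 1) // 2
    return sorted(ranked[:n_keep])


# Toy values for illustration only.
X = [[1.0, 0.0, 2.0, 0.0],
     [0.0, 1.0, 1.0, 0.0]]
y = [1, -1]
alphas = [0.5, 0.25]

w = weight_vector(X, y, alphas)
kept = keep_top_half(w)
print(w, kept)
```

In a full run, the SVM would be retrained on the kept features and the round repeated until the ROC score on the held-out set falls below the stopping threshold.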

Page 25

Results

Flanks          Splice Sites
US      DS      3’      5’      Exon Body   ROC     Specificity^a
CV^b                                        0.609   0.484
+       –       –       –       –           0.791   0.638
–       +       –       –       –           0.784   0.618
+       +       –       –       –           0.855   0.695
–       –       +       –       –           0.823   0.672
–       –       –       +       –           0.837   0.698
–       –       +       +       –           0.907   0.777
+       +       +       +       –           0.932   0.825
–       –       –       –       +           0.946   0.841
+       +       –       –       +           0.984   0.956
–       –       +       +       +           0.987   0.964
+       +       +       +       +           0.991   0.976

                                        True positives detected
Splice Sites    Flanks  Exon Bodies     32/37   35/37   37/37
-               -       -               1225    1225    1225
-               +       -               164     259     668
-               -       +               108     232     383
+               -       +               58      111     180
+               +       +               19      53      90

Page 26

Rotationally Invariant Face Detection Using Multi-Resolution Histograms

Collaborators: Shikher Bisaria, Tony Jebara

Page 27

Face Detection

Given a picture with faces, how do we determine where the faces are in the image? Which pixels are face pixels?

We would like to determine this with a system that:
  Runs in real time
  Recognizes rotations of faces (e.g. when someone tilts their head to one side)

Page 28

Gaussian Blurring

Face images are greyscale (.pgm files)

Successive levels of blur are obtained by reconvolving the previous level of blurred images with a 2 dimensional Gaussian function

Mathematically equivalent to two passes of a one dimensional Gaussian function

$g(i,j) = \frac{1}{2\pi\sigma^2} \sum_m \sum_n e^{-(m^2+n^2)/(2\sigma^2)} \cdot f(i-m, j-n)$

$= \frac{1}{2\pi\sigma^2} \sum_m e^{-m^2/(2\sigma^2)} \cdot \sum_n e^{-n^2/(2\sigma^2)} \cdot f(i-m, j-n)$
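The equivalence between the 2 dimensional convolution and two 1 dimensional passes can be checked on a small image. The Python sketch below assumes a kernel truncated at a fixed radius and zero padding at the borders; both are choices made for illustration, not details taken from the slides.

```python
import math


def gaussian_weights(sigma, radius):
    """Unnormalized 1-D Gaussian weights for offsets -radius..radius."""
    return [math.exp(-(m * m) / (2 * sigma ** 2)) for m in range(-radius, radius + 1)]


def blur_2d(image, sigma, radius):
    """Direct double sum over m and n, with zero padding outside the image."""
    rows, cols = len(image), len(image[0])
    w = gaussian_weights(sigma, radius)
    norm = 1 / (2 * math.pi * sigma ** 2)
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            total = 0.0
            for m in range(-radius, radius + 1):
                for n in range(-radius, radius + 1):
                    if 0 <= i - m < rows and 0 <= j - n < cols:
                        total += w[m + radius] * w[n + radius] * image[i - m][j - n]
            out[i][j] = norm * total
    return out


def blur_separable(image, sigma, radius):
    """Same result using one horizontal pass followed by one vertical pass."""
    rows, cols = len(image), len(image[0])
    w = gaussian_weights(sigma, radius)
    norm = 1 / (2 * math.pi * sigma ** 2)
    offsets = range(-radius, radius + 1)
    # Pass 1: sum over n along each row.
    tmp = [[sum(w[n + radius] * image[i][j - n] for n in offsets if 0 <= j - n < cols)
            for j in range(cols)] for i in range(rows)]
    # Pass 2: sum over m along each column.
    return [[norm * sum(w[m + radius] * tmp[i - m][j] for m in offsets if 0 <= i - m < rows)
             for j in range(cols)] for i in range(rows)]


image = [[0.0, 1.0, 2.0, 3.0, 4.0],
         [5.0, 6.0, 7.0, 8.0, 9.0],
         [9.0, 8.0, 7.0, 6.0, 5.0],
         [4.0, 3.0, 2.0, 1.0, 0.0]]
direct = blur_2d(image, sigma=1.0, radius=2)
separable = blur_separable(image, sigma=1.0, radius=2)
max_diff = max(abs(a - b) for ra, rb in zip(direct, separable) for a, b in zip(ra, rb))
print(max_diff)
```

The separable version does work proportional to the kernel width per pixel instead of the kernel area, which matters for the real-time goal.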

Page 29

Multi-Resolution Histograms

Histogram equalize the image

Concatenate histograms of the image together after successive levels of Gaussian blurring

Page 30

Average Histograms

Compute average face and non-face multi-resolution histograms from training set

[Figure: Average Non-Face Histogram and Average Face Histogram]

Page 31

Optimization Problem

$C(\alpha) = \min_{\alpha} \|H_{FAVG} - h_F\|^2 + \|H_{NFAVG} - h_{NF}\|^2$

where $h_F = \frac{1}{\sum_i \alpha_i} \sum_i \alpha_i h_i$

$h_{NF} = \frac{1}{\sum_i (1 - \alpha_i)} \sum_i (1 - \alpha_i) h_i$

such that $0 \le \alpha_i \le 1$, $\sum_i \alpha_i = 1$

Let $\beta_i = 1 - \alpha_i$, $Q_{ij} = \langle h_i, h_j \rangle$

$c_{\alpha,i} = \langle h_i, H_{FAVG} \rangle \cdot \text{constant}$, $c_{\beta,i} = \langle h_i, H_{NFAVG} \rangle \cdot \text{constant}$

$= \min_{\alpha,\beta} \alpha^T Q \alpha + \frac{1}{(N-1)^2} \beta^T Q \beta - 2 c_\alpha^T \alpha - \frac{2}{N-1} c_\beta^T \beta$

Page 32

Solve Using SMO

$\alpha_i^{NEW} = \Big[ \frac{1}{(N-1)^2} Q_{ii} - \frac{1}{(N-1)^2} \big(\sum_{k \ne i,j} \alpha_k\big) Q_{jj} + \big(1 - \sum_{k \ne i,j} \alpha_k\big) Q_{jj} - \big(1 - \sum_{k \ne i,j} \alpha_k\big) Q_{ij} + \frac{1}{(N-1)^2} \big(\sum_{k \ne i,j} \alpha_k\big) Q_{ij} - \frac{1}{(N-1)^2} Q_{ij} - c_{\alpha,i} + c_{\beta,i} + c_{\alpha,j} - c_{\beta,j} + \sum_{k \ne i,j} (\alpha_k Q_{ik}) - \sum_{k \ne i,j} (\alpha_k Q_{jk}) - \frac{1}{(N-1)^2} \sum_{k \ne i,j} (\alpha_k Q_{ik}) + \frac{1}{(N-1)^2} \sum_{k \ne i,j} (\alpha_k Q_{jk}) \Big] \Big/ \Big[ Q_{ii} + Q_{jj} - 2 Q_{ij} + \frac{1}{(N-1)^2} Q_{ii} + \frac{1}{(N-1)^2} Q_{jj} - \frac{2}{(N-1)^2} Q_{ij} \Big]$

Bounds for $\alpha_i^{NEW}$:

$L = 0$

$H = 1 - \sum_{k \ne i,j} \alpha_k$

$\alpha_j^{NEW} = \big(1 - \sum_{k \ne i,j} \alpha_k\big) - \alpha_i^{NEW}$
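The pairwise update can be illustrated without the closed-form expression. The Python sketch below is an assumption-based stand-in: it evaluates the objective $\|H_{FAVG} - h_F\|^2 + \|H_{NFAVG} - h_{NF}\|^2$ directly, uses the fact that it is quadratic in $\alpha_i$ once $\alpha_i + \alpha_j$ is fixed, fits that parabola from three evaluations, and clips the minimizer to the bounds L = 0 and H. The function names and the toy histograms are hypothetical.

```python
def objective(alpha, hists, h_face_avg, h_nonface_avg):
    """Squared distances of the weighted histograms to the two averages.

    Assumes sum(alpha) == 1, so h_F needs no normalization and h_NF is
    normalized by N - 1.
    """
    n, dim = len(alpha), len(h_face_avg)
    h_f = [sum(alpha[i] * hists[i][d] for i in range(n)) for d in range(dim)]
    h_nf = [sum((1 - alpha[i]) * hists[i][d] for i in range(n)) / (n - 1)
            for d in range(dim)]
    return (sum((a - b) ** 2 for a, b in zip(h_face_avg, h_f))
            + sum((a - b) ** 2 for a, b in zip(h_nonface_avg, h_nf)))


def pair_update(alpha, i, j, cost):
    """Minimize cost over alpha_i with alpha_i + alpha_j held fixed."""
    high = alpha[i] + alpha[j]          # H = 1 - sum of the other alphas

    def g(a):
        trial = list(alpha)
        trial[i], trial[j] = a, high - a
        return cost(trial)

    # g is a parabola in a; recover its coefficients from three points.
    g0, g1, g2 = g(0.0), g(1.0), g(2.0)
    curvature = (g2 - 2 * g1 + g0) / 2
    slope = g1 - g0 - curvature
    if curvature > 0:
        a_new = -slope / (2 * curvature)
    else:
        a_new = 0.0 if g(0.0) <= g(high) else high
    a_new = min(max(a_new, 0.0), high)  # clip to [L, H] with L = 0
    updated = list(alpha)
    updated[i], updated[j] = a_new, high - a_new
    return updated


# Toy data for illustration only.
hists = [[4.0, 0.0], [0.0, 4.0], [2.0, 2.0]]
h_face_avg = [3.0, 1.0]
h_nonface_avg = [1.0, 3.0]


def cost(a):
    return objective(a, hists, h_face_avg, h_nonface_avg)


alpha = [1.0 / 3, 1.0 / 3, 1.0 / 3]
new_alpha = pair_update(alpha, 0, 1, cost)
print(new_alpha, cost(alpha), cost(new_alpha))
```

Repeating such updates over different pairs keeps the constraint $\sum_i \alpha_i = 1$ satisfied at every step and never increases the objective.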

Page 33

Results