Detecting Distance-Based Outliers in Streams of Data

Fabrizio Angiulli and Fabio Fassetti

DEIS, Università della Calabria

Problem Definition

• Definition 3.1 (Distance-Based Outlier). Let S be a set of objects, obj an object of S, k a positive integer, and R a positive real number. Then obj is a distance-based outlier (or, simply, an outlier) if fewer than k objects in S lie within distance R from obj. The objects within distance R from obj are its neighbors.

• The neighbors of an object obj that precede obj in the stream and belong to the current window are called preceding neighbors of obj.

• The neighbors of an object obj that follow obj in the stream and belong to the current window are called succeeding neighbors of obj.
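The definitions above can be illustrated with a brute-force sketch over a single window. It is not the algorithm from the slides, only a direct reading of Definition 3.1. The names `split_neighbors` and `is_outlier`, the list-based window ordered by arrival time, the choice to exclude obj from its own neighbors, and the use of `<=` for "within distance R" are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple


def split_neighbors(
    window: Sequence[float],
    i: int,
    R: float,
    dist: Callable[[float, float], float],
) -> Tuple[List[int], List[int]]:
    """Return (preceding, succeeding) neighbor positions of window[i].

    The window is assumed to be ordered by arrival time, so positions
    below i precede the object in the stream and positions above follow it.
    """
    obj = window[i]
    preceding = [j for j in range(i) if dist(window[j], obj) <= R]
    succeeding = [
        j for j in range(i + 1, len(window)) if dist(window[j], obj) <= R
    ]
    return preceding, succeeding


def is_outlier(
    window: Sequence[float],
    i: int,
    k: int,
    R: float,
    dist: Callable[[float, float], float],
) -> bool:
    """Definition 3.1: outlier if fewer than k neighbors lie within R."""
    preceding, succeeding = split_neighbors(window, i, R, dist)
    return len(preceding) + len(succeeding) < k


def abs_dist(a: float, b: float) -> float:
    """Illustrative one-dimensional distance."""
    return abs(a - b)


window = [1.0, 1.2, 5.0, 1.1]
flags = [is_outlier(window, i, k=2, R=0.5, dist=abs_dist) for i in range(len(window))]
```

With this toy window only the value 5.0 has fewer than two neighbors, so it is the only object flagged.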

Problem Definition

• If the number of succeeding neighbors of obj is less than k, obj could become an outlier depending on the stream evolution.

• Conversely, since obj will expire before its succeeding neighbors, inliers having at least k succeeding neighbors will be inliers for any stream evolution. Such inliers are called safe inliers.
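The distinction between safe inliers and ordinary inliers can be sketched as a small classification function. The function name `classify` and its string labels are hypothetical; the only rule taken from the slides is that an object with at least k succeeding neighbors stays an inlier for any stream evolution, because those neighbors expire after it.

```python
def classify(preceding_in_window: int, succeeding: int, k: int) -> str:
    """Classify an object from its current neighbor counts.

    preceding_in_window: preceding neighbors still inside the window.
    succeeding: succeeding neighbors (they cannot expire before the object).
    """
    if succeeding >= k:
        # Enough neighbors that will outlive the object itself.
        return "safe inlier"
    if preceding_in_window + succeeding >= k:
        # Inlier now, but preceding neighbors may expire later.
        return "inlier"
    return "outlier"


# An object with 2 preceding and 1 succeeding neighbor, k = 3:
now = classify(preceding_in_window=2, succeeding=1, k=3)
# Later, one preceding neighbor has expired and no new neighbor arrived:
later = classify(preceding_in_window=1, succeeding=1, k=3)
```

The second call shows why a non-safe inlier can turn into an outlier as the window slides, while a safe inlier cannot.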

Example

Information Stored in Each ISB Node

• n.obj: a data stream object.

• n.id: the identifier of n.obj, that is, the arrival time of n.obj.

• n.count_after: the number of succeeding neighbors of n.obj. This field is exploited to recognize safe inliers.

• n.nn_before: a list, having size at most k, containing the identifiers of the most recent preceding neighbors of n.obj. At query time, this list is exploited to determine the number of preceding neighbors of n.obj that still belong to the current window.
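The node fields listed above map naturally onto a small record type. The following is a minimal sketch, not the authors' implementation: the class name `ISBNode`, the helper `make_node`, the parameter `window_start` (the arrival time of the oldest object still in the window), and the choice to keep `nn_before` sorted in ascending order so that a binary search can count the non-expired identifiers are all assumptions.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import Any, Iterable, List


@dataclass
class ISBNode:
    obj: Any                      # the data stream object
    id: int                       # arrival time of obj
    count_after: int = 0          # number of succeeding neighbors
    nn_before: List[int] = field(default_factory=list)  # ascending ids

    def preceding_in_window(self, window_start: int) -> int:
        """Count preceding neighbors whose id is >= window_start."""
        return len(self.nn_before) - bisect_left(self.nn_before, window_start)

    def is_safe_inlier(self, k: int) -> bool:
        return self.count_after >= k

    def is_outlier(self, window_start: int, k: int) -> bool:
        total = self.preceding_in_window(window_start) + self.count_after
        return total < k


def make_node(obj: Any, obj_id: int, preceding_ids: Iterable[int], k: int) -> ISBNode:
    """Build a node, keeping only the k most recent preceding neighbors."""
    recent = sorted(preceding_ids)[-k:]
    return ISBNode(obj=obj, id=obj_id, nn_before=recent)


node = make_node("x", obj_id=10, preceding_ids=[2, 5, 7, 9], k=3)
before_arrival = node.is_outlier(window_start=6, k=3)   # 2 preceding, 0 succeeding
node.count_after += 1                                    # a new neighbor arrives
after_arrival = node.is_outlier(window_start=6, k=3)    # 2 preceding, 1 succeeding
after_slide = node.is_outlier(window_start=8, k=3)      # 1 preceding, 1 succeeding
```

Keeping at most k identifiers is enough here because k non-expired preceding neighbors already make the object an inlier, and the most recent identifiers are the last to expire.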

Exact Algorithm

Approximate Algorithm
