

Interactive E-Learning System Using Pattern Recognition and Augmented Reality

Sang Hwa Lee, Junyeong Choi, and Jong-Il Park, Member, IEEE

Abstract — This paper proposes an interactive e-learning system using pattern recognition and augmented reality. The goal of the proposed system is to provide students with realistic audio-visual contents while they are learning. The proposed e-learning system consists of image recognition, color-band and polka-dot pattern recognition, and an augmented reality engine with audio-visual contents. When the web camera on a PC captures the current page of the textbook, the e-learning system first identifies the images on the page and augments audio-visual contents on the monitor. For interactive learning, the proposed e-learning system exploits color-band or polka-dot markers worn on the fingertip. These markers act like a mouse cursor to indicate positions in the textbook image. Appropriate interactive audio-visual contents are augmented when the marker is located on predefined image objects in the textbook. The proposed e-learning system was applied to educational courses in an elementary school, and we obtained satisfactory results in real applications. We expect the proposed e-learning system to become popular once sufficient educational contents and scenarios are provided.

Index Terms — E-learning system, interactive learning, augmented reality, pattern recognition

I. INTRODUCTION

Recently, new media and systems for education have been appearing in the form of portable dictionaries, e-books, distant/virtual classrooms, and so on. The main concept of these new educational systems is to combine educational contents with information technologies [5], [6], [7], [8]. Students study their textbooks with auxiliary audio-visual contents played on personal computers or dedicated terminals. In the distant classroom, remote education is delivered over communication networks, and virtual classrooms let students experience remote or imaginary places [7], [8]. The virtual places are rendered by high-definition projectors, so the students can act as if they were really in the remote places. These educational systems usually exploit various information technologies, such as sensor networks, computer graphics, view synthesis, geometry analysis, and communication systems [10], [11], [13], [14], [15], [16], [17].

Sang Hwa Lee is with the Department of Electrical Engineering and Computer Science, INMC, Seoul National University, Kwanak-gu, Seoul, 151-742, South Korea (e-mail: [email protected]).

Junyeong Choi and Jong-Il Park are with the Department of Electronics and Computer Engineering, Hanyang University, 17 Haengdang-dong, Seongdong-gu, Seoul, 133-791, Korea (e-mail: [email protected], [email protected]). Jong-Il Park is the corresponding author.

In this paper, we propose a new interactive e-learning system. The proposed system exploits pattern recognition techniques for object-based interactive learning. Our goal is to design a mentoring system for self-study that lets students learn from audio-visual contents interactively. The proposed e-learning system augments the audio-visual contents as the students interact with the objects in the textbook. Given educational materials such as textbooks, auxiliary audio-visual contents, 3-dimensional (3-D) graphics, and educational scenarios, our interactive e-learning system determines how to interact with and augment the contents based on pattern recognition. When the images and objects on the text pages are recognized, the related contents are played or augmented on the display. The contents are also displayed according to the pattern marker, which acts as a kind of computer mouse. We implement the recognition algorithms for images and objects using texture-based features, design color-band and polka-dot patterns for object-based user interaction, and define human-computer interfaces that use the recognition results according to the educational scenarios. Thus, the proposed e-learning system combines education with various information technologies. It should be noted that the proposed e-learning system is intended for ordinary public educational courses. We tested it with real elementary education courses and obtained successful results as a mentoring system.

The rest of this paper is organized as follows. We briefly introduce how the proposed e-learning system works in Section II. We describe the polka-dot pattern and color-band markers for interaction in Section III. We explain the recognition algorithm of images or objects in Section IV. We report the results of applying the proposed e-learning system to the elementary educational courses in Section V. Finally, we conclude the paper in Section VI.

II. OVERVIEW OF PROPOSED E-LEARNING SYSTEM

The proposed e-learning system consists of image/object recognition, polka-dot pattern recognition, color-band marker recognition, an augmented reality engine, audio-visual contents, and learning scenarios for the textbooks. The learning scenarios are predefined processes that specify when and where to augment the contents. The scenarios combine the educational contents with information technologies to maximize learning efficiency, and the augmented reality engine realizes the scenarios. Fig. 1 shows the structure of the proposed e-learning system. A web camera connected to the computer


focuses on the textbook. The students study by watching both the textbook and the captured video frames on which audio-visual contents are augmented. When video frames from the web camera are given, the recognition modules identify the image and objects on the textbook page, as well as the polka-dot or color-band marker. We build a database of the images and objects in the textbook in advance. The image/object recognition module identifies the current text page and the objects that the student is studying. Using the identified pages and objects from the recognition module, the system knows where the objects are located in the video frame. Then, audio-visual contents are augmented on the computer monitor according to the predefined educational scenarios. The augmented reality engine matches the scenarios to the information from the recognition modules and plays the audio-visual contents automatically. A sketch of this per-frame loop is given below.
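The following sketch outlines this per-frame pipeline in Python. The three callables are hypothetical stand-ins for the recognition modules and the AR engine described above; only the control flow is taken from the paper.

```python
# Minimal sketch of the Fig. 1 pipeline. recognize_page, detect_marker, and
# augment are hypothetical stand-ins for the paper's modules.
import cv2

def elearning_loop(recognize_page, detect_marker, augment):
    cap = cv2.VideoCapture(0)                 # web camera focused on the textbook
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        page_id, object_boxes = recognize_page(frame)   # current text page + objects
        marker_pos = detect_marker(frame)               # polka-dot / color-band cursor
        # The AR engine matches recognition results against the predefined
        # learning scenarios and augments the audio-visual contents.
        out = frame if page_id is None else augment(frame, page_id,
                                                    object_boxes, marker_pos)
        cv2.imshow("interactive e-learning", out)
        if cv2.waitKey(1) == 27:                        # ESC quits
            break
    cap.release()
    cv2.destroyAllWindows()
```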

Some interactive learning actions are enabled by the polka-dot or color-band marker. The marker is a kind of computer mouse that indicates a location in the video frame. If the marker is located on specific objects or menu bars, object-based interactions are performed based on the educational scenarios and contents. The related visual contents stay displayed on the marker even while the marker is moving. Interactive actions such as dragging a virtual object, scrubbing-based reactions, and menu selection are also defined in the proposed e-learning system.

Fig. 1. Structure of the proposed e-learning system. Image and marker recognition enable students to learn interactively according to the predefined learning scenarios and audio-visual contents.

To make the proposed e-learning system useful in practice, we have produced many educational contents and scenarios for real school courses. In addition, since the proposed system is designed for general purposes, we have developed an authoring tool to produce the educational scenarios and interactions easily. Thus, any contents provider or educational organization can exploit the proposed e-learning system for its own interactive learning courses. Since this paper focuses on the recognition algorithms and the augmented reality engine rather than on contents production, we developed this e-learning system in tight collaboration with content providers and teachers.

III. DESIGN OF INTERACTIVE MARKERS

For engaging interactive learning, we need a natural human-computer interface. We design two markers using a polka-dot pattern or color bands. The markers are worn on the fingers as bands and act like a computer mouse: they indicate their locations in the video frame, which enables the students to interact with the objects in the textbook. When a marker is located at a specific object or menu in the textbook, the corresponding audio-visual contents are augmented on the computer, or the predefined menu function is performed. Interactive functions such as dragging and scrubbing objects are defined to support various learning actions.

A. Polka-dot Pattern Recognition

Polka-dot patterns are rare in ordinary textbooks and are well recognized in both grayscale and color images. The polka-dot band on a finger is used as a computer mouse; we exploit the polka-dot marker for interactive augmentation of contents and menu selection. To detect the polka-dot pattern accurately in real time, we propose fast integer-arithmetic filters, hierarchical searching, and an edge-information check.

Fig. 2 shows two array patterns of polka-dot markers. The array patterns were selected empirically using the polka-dot recognition algorithm. Since the marker on a finger is likely to be rotated and slanted relative to the camera viewpoint, the array pattern of dots should be invariant to perspective variations. According to the proposed recognition algorithm, the optimal array pattern is the one shown in Fig. 2: the hexagonal array is the most invariant to the perspective distortions of camera viewpoints.

Fig. 2. Polka-dot patterns for interactive learning. The best arrays of dots were selected empirically. Hexagonal array patterns are robust to perspective distortion caused by different camera viewpoints.


The basic algorithm of polka-dot pattern recognition is high-pass filtering in the horizontal and vertical directions [1]. The high-pass filter first finds areas where the grayscale pixel values vary regularly between black and white, as below,

$$f_h(x,y) = \sum_{(i,j)\in W} \big( I(i,j) - I(i+D,\, j) \big)^2, \qquad f_v(x,y) = \sum_{(i,j)\in W} \big( I(i,j) - I(i,\, j+D) \big)^2, \qquad (1)$$

where $f_h(x,y)$ and $f_v(x,y)$ are the high-pass responses in the horizontal and vertical directions at image coordinate $(x,y)$. The grayscale image value $I(x,y)$ must satisfy the following condition,

$$I(i,j) \le B \quad \text{or} \quad W \le I(i,j), \qquad (2)$$

where the thresholds B and W correspond to black (dark) and white (bright) values. Since the polka-dot pattern contains only black and white values, we apply the high-pass filters only to pixels that satisfy condition (2). This reduces both the computation time of the high-pass filtering and the false positives in complex textures. In (1), the parameter D is related to the diameter and interval of the dots. The high-pass filters are calculated on a regular grid, which reflects the periodic array of dots in the polka-dot patterns. Finally, we examine the high-pass filter values and the number of pixels included in (1) within the window W. We select candidate positions of the polka-dot marker using the high-pass responses and the count, which indicates how many dot-like patterns exist in the window. A sketch of this candidate test is given below.
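A minimal sketch of this candidate test, assuming a grayscale 8-bit frame; the thresholds B and W, the dot spacing D, and the window size are illustrative values rather than the paper's tuned parameters.

```python
import cv2
import numpy as np

def polka_dot_candidates(gray, D=4, B=80, W=170, win=16):
    """Score each pixel as a polka-dot marker candidate via Eqs. (1)-(2)."""
    img = gray.astype(np.int32)
    valid = (img <= B) | (img >= W)                # Eq. (2): near-black or near-white
    dh = np.zeros_like(img)
    dv = np.zeros_like(img)
    dh[:, :-D] = (img[:, :-D] - img[:, D:]) ** 2   # horizontal term of Eq. (1)
    dv[:-D, :] = (img[:-D, :] - img[D:, :]) ** 2   # vertical term of Eq. (1)
    resp = np.where(valid, dh + dv, 0).astype(np.float32)
    # Sum the responses over the window W, and count the contributing pixels
    # to estimate how many dot-like transitions the window contains.
    score = cv2.boxFilter(resp, -1, (win, win), normalize=False)
    contrib = (valid & ((dh + dv) > 0)).astype(np.float32)
    count = cv2.boxFilter(contrib, -1, (win, win), normalize=False)
    return score, count   # candidates: windows with high score AND high count
```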

When we detect a candidate position of the polka-dot marker, we refine its exact center position hierarchically. The candidate position is first searched on a coarse grid in the original image. When a position satisfies the polka-dot marker conditions (1) and (2), we search for marker positions near it on a fine grid. Since multiple positions may qualify as the polka-dot marker, we average the qualifying positions on the fine grid.

For fast marker detection, we restrict the search range based on the motion vector of the previously detected polka-dot marker. The motion vector enables us to predict the next location of the marker, so we first search for the marker in the restricted range around the predicted position. If the polka-dot marker is not detected there, we expand the search range and search again, as sketched below.
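A small sketch of this motion-predicted search, assuming a `detect_in_roi` callable that runs the candidate test above inside a window; the window radii are illustrative.

```python
def track_marker(frame, prev_pos, prev_vel, detect_in_roi, small=40, large=160):
    """Search near the motion-predicted position first, then expand."""
    pred = (prev_pos[0] + prev_vel[0], prev_pos[1] + prev_vel[1])
    for radius in (small, large):            # restricted range first, then expanded
        pos = detect_in_roi(frame, center=pred, radius=radius)
        if pos is not None:
            vel = (pos[0] - prev_pos[0], pos[1] - prev_pos[1])  # update motion vector
            return pos, vel
    return None, (0, 0)                      # lost: caller falls back to a full search
```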

Finally, we verify the detected marker using edge information. Since the high-pass filters can mistake complex textures or characters in the textbook for polka-dot markers, we exploit edge information to reduce false positives. Characters and other complex textures usually have line-edge properties, unlike polka-dot patterns, so the edge information decreases the false positive rate.

Fig. 3 shows experimental results of polka-dot pattern recognition. One or two independent polka-dot markers are detected in the video frame. In a typical personal computer environment, the recognition runs at more than 25 frames per second for 640x480 resolution.

Fig. 3. Recognition results of polka-dot patterns. Multiple polka-dot markers are independently detected in the video frame. The light green squares mark the central positions of the polka-dot markers. (a) Single-marker detection, (b) two-marker detection.

B. Color-band Recognition

Some interactions in the educational scenarios require two or more markers simultaneously to manipulate multiple objects. Since different polka-dot patterns are hardly distinguishable from one another, it is difficult to operate multiple polka-dot markers independently; we need multiple markers that can be discriminated individually. This paper designs two color-band markers, each consisting of three colors, as shown in Fig. 4. The color-band markers are discriminated from each other and from the polka-dot marker, so three markers can be used simultaneously according to the educational scenarios and interactions.

Fig. 4. Color-band markers using three colors. The combinations of three colors were selected from various experiments. The two color-band markers are also discriminated from each other in a video frame.


The colors of the markers were selected through various experiments. Blue is usually recognized best and is the most stable under lighting variation [18]. The blue band is located at the center of each color-band marker and is searched for first. The other colors were chosen because they are well discriminated from each other and from blue. We design two color-band markers with different color combinations, as shown in Fig. 4.

The color-band markers are detected by finding the blue color first. The hue components in HSV color space are used for robust detection under various lighting conditions. When blue pixels are detected, we examine the shape and area of the blue region to check whether it satisfies the marker conditions. Then, we search for the other colors (green and red, or yellow and purple) around the blue region. We consider the color range and the area of each color region to confirm the color-band pattern; the order of colors and the ratios of color areas are compared with predefined criteria. Fig. 5 shows two color-band markers detected independently in a video frame. The color ranges of the markers are optimized for the lighting environment. Note that the color ranges should change with the lighting conditions; thus, we devised a method to adjust the color ranges automatically when the proposed e-learning system is set up. A sketch of this blue-first search follows Fig. 5.

Fig. 5. Recognition results of color-band markers. Two markers are consistently detected when they are moving.
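A sketch of the blue-first search using OpenCV, assuming HSV hue thresholds for blue and a hypothetical `check_band_colors` test for the order and area ratios of the neighboring colors; the numeric ranges are illustrative, since the paper adjusts them to the lighting at setup time.

```python
import cv2

def find_color_band(frame_bgr, check_band_colors):
    """Blue-first color-band search; check_band_colors is a hypothetical
    test of the neighboring colors' order and area ratios."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    # 1) Detect the blue band first (most stable color under lighting changes).
    blue = cv2.inRange(hsv, (100, 80, 60), (130, 255, 255))
    contours, _ = cv2.findContours(blue, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        if cv2.contourArea(c) < 100:                 # reject tiny blue blobs
            continue
        x, y, w, h = cv2.boundingRect(c)
        if not (0.3 < w / float(h) < 3.0):           # rough band-shape check
            continue
        # 2) Examine the regions above and below the blue band for the other
        #    colors (green/red or yellow/purple) in the predefined order.
        above = hsv[max(y - h, 0):y, x:x + w]
        below = hsv[y + h:min(y + 2 * h, hsv.shape[0]), x:x + w]
        if check_band_colors(above, below):
            return (x + w // 2, y + h // 2)          # marker position in the frame
    return None
```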

IV. RECOGNITION OF IMAGE AND OBJECT

Image recognition is designed to identify the current text page or objects. When the text page or objects are identified, the related audio-visual contents are automatically played on the PC. Since we can obtain the pose information of objects in the captured image, we augment visual 3-D contents according to the poses of the objects.

In previous related work, augmented reality (AR) toolkits have used geometric markers that are recognized in the images [19]. The AR markers consist of black/white geometric shapes inside a square. They are well recognized under various image distortions and have been popular for interactivity in virtual systems. However, since the AR markers are printed directly on the textbook pages, they detract from the textbook design. Our goal is to replace the geometric AR markers with the image objects themselves and to design a natural interface using those image objects.

A. Feature Extraction

Since the captured images may be rotated, distorted by perspective viewpoints, and changed in scale, we have to extract robust features invariant to these image variations. Recently, scale-invariant features have been widely researched, and several feature extraction algorithms have been developed for image and object recognition [2], [3], [4]. We exploit the speeded-up robust features (SURF) [3], which show good recognition results and fast operation compared with SIFT [2]. Since the proposed e-learning system is also applied to mobile devices such as PDAs and mobile phones, we implement the SURF algorithm with integer arithmetic and optimized lookup tables.

The first step of feature extraction is to detect distinct points that are invariant to image variations. The distinct feature points are determined by the Hessian matrix at image point $\mathbf{x} = (x, y)$ and scale parameter $\sigma$,

$$\mathbf{H}(\mathbf{x}, \sigma) = \begin{pmatrix} L_{xx}(\mathbf{x},\sigma) & L_{xy}(\mathbf{x},\sigma) \\ L_{xy}(\mathbf{x},\sigma) & L_{yy}(\mathbf{x},\sigma) \end{pmatrix}. \qquad (3)$$

In (3), $L_{xx}(\mathbf{x},\sigma)$ is the second derivative of the Gaussian-filtered image in the x-direction,

$$L_{xx}(\mathbf{x},\sigma) = \frac{\partial^2}{\partial x^2}\big( I(\mathbf{x}) * G(\sigma) \big), \qquad (4)$$

where $I(\mathbf{x}) * G(\sigma)$ denotes the convolution of the image with a Gaussian filter of standard deviation $\sigma$. The Gaussian filter blurs the image as the scale parameter $\sigma$ increases. We construct a pyramid of Gaussian-filtered images, which accounts for variations in image scale and resolution: the sizes of the Gaussian filters and the scale parameter $\sigma$ are increased to create higher-scale (lower-resolution) images, and the images are sub-sampled as the scale increases. Finally, the distinct points are detected by the determinant of the Hessian matrix in (3),

$$\det(\mathbf{H}(\mathbf{x},\sigma)) = L_{xx} L_{yy} - L_{xy}^2. \qquad (5)$$

In the scale-space structure, the point at $\mathbf{x}$ and $\sigma$ is taken as a distinct feature point if its determinant value is the largest among its 26 neighbors. Fig. 6 shows the 26 neighbors used to decide the distinct feature points; the neighboring scale images are also considered. A sketch of this detector follows Fig. 6.


Fig. 6. The 26 neighbors used to decide the feature points. The red pixel is the current point at x and σ. The neighboring scale images are also considered when deciding the feature points.
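A sketch of the determinant-of-Hessian detector of Eqs. (3)-(5), using true Gaussian derivative filters rather than SURF's box-filter approximation; the scale set and threshold are illustrative values.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def hessian_keypoints(img, sigmas=(1.6, 2.4, 3.2, 4.8), thresh=0.002):
    """Detect scale-space extrema of the determinant of the Hessian."""
    img = img.astype(np.float64) / 255.0
    dets = []
    for s in sigmas:
        Lxx = gaussian_filter(img, s, order=(0, 2))   # Eq. (4): d2/dx2
        Lyy = gaussian_filter(img, s, order=(2, 0))   # d2/dy2
        Lxy = gaussian_filter(img, s, order=(1, 1))   # mixed derivative
        dets.append(s ** 4 * (Lxx * Lyy - Lxy ** 2))  # Eq. (5), scale-normalized
    D = np.stack(dets)                                # (scale, y, x) response volume
    # Keep points whose determinant is the largest among the 26 neighbors
    # in the 3x3x3 scale-space block (Fig. 6) and exceeds the threshold.
    is_max = (D == maximum_filter(D, size=3)) & (D > thresh)
    s_idx, ys, xs = np.nonzero(is_max[1:-1])          # skip boundary scales
    return [(int(x), int(y), sigmas[s + 1]) for s, y, x in zip(s_idx, ys, xs)]
```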

The second step of feature extraction is to find a dominant orientation around each feature point. The orientation information normalizes rotated images and objects, so that images or objects are recognized despite rotational distortions. The responses of Haar filters in the x- and y-directions are calculated over a circular neighborhood. The orientation angle of each pixel is computed from the x and y Haar responses, and each angle is weighted inversely by its distance from the central feature point. All orientation angles in the circular region are accumulated into a 6-bin histogram, and we define the dominant orientation of the feature point as the most frequent angle. Fig. 7 illustrates this: the grayscale values in the left circle represent the Gaussian weights according to distance from the central feature point, and the arrows show the magnitudes of the x and y Haar responses. A sketch of this step follows Fig. 7.

Fig. 7. Orientation assignment. The grayscale values in the left circle represent the Gaussian weights for the histogram of orientations. The dominant orientation angle is decided from the 6-bin histogram.
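A short sketch of the dominant-orientation step, assuming the Haar responses `dx`, `dy` and their Gaussian weights have already been sampled over the circular neighborhood.

```python
import numpy as np

def dominant_orientation(dx, dy, weights, nbins=6):
    """Pick the most frequent angle from a weighted 6-bin histogram."""
    angles = np.arctan2(dy, dx) % (2 * np.pi)        # orientation of each sample
    hist, edges = np.histogram(angles, bins=nbins,
                               range=(0, 2 * np.pi), weights=weights)
    k = int(np.argmax(hist))                         # most frequent bin
    return 0.5 * (edges[k] + edges[k + 1])           # return the bin-center angle
```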

The last step of feature extraction is to describe the feature points with a vector structure; this descriptor discriminates between the feature points. A square region around each feature point is selected for the descriptor. Note that the square is rotated by the dominant orientation before the descriptor is computed, and the size of the square is related to the scale parameter. The square region is divided into 16 subregions, and 25 pixels (5x5) are sampled in each subregion. We then calculate a 4-D vector for every subregion,

$$V = \Big( \sum d_x,\; \sum d_y,\; \sum |d_x|,\; \sum |d_y| \Big), \qquad (6)$$

where $d_x$ is the difference between adjacent pixel samples in the x-direction, and $|d_x|$ is its absolute value ($d_y$ and $|d_y|$ are defined analogously in the y-direction). With a 4-D vector for each of the 16 subregions, the descriptor of a feature point becomes a 64-D (4x16) vector. This 64-D descriptor vector serves as an ID for each feature point; a sketch of its computation is given below.
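A sketch of the descriptor computation of Eq. (6), assuming a 20x20 sample grid (16 subregions of 5x5 samples) already rotated to the dominant orientation; the final normalization is a common convention, not stated in the paper.

```python
import numpy as np

def surf_descriptor(patch):
    """Build the 64-D (4 x 16) descriptor of Eq. (6) from a 20x20 sample grid."""
    patch = patch.astype(np.float64)
    assert patch.shape == (20, 20)
    dx = np.zeros_like(patch)
    dy = np.zeros_like(patch)
    dx[:, :-1] = patch[:, 1:] - patch[:, :-1]   # adjacent-sample differences in x
    dy[:-1, :] = patch[1:, :] - patch[:-1, :]   # adjacent-sample differences in y
    desc = []
    for by in range(4):                         # 4x4 = 16 subregions of 5x5 samples
        for bx in range(4):
            sx = dx[by*5:by*5+5, bx*5:bx*5+5]
            sy = dy[by*5:by*5+5, bx*5:bx*5+5]
            desc += [sx.sum(), sy.sum(),
                     np.abs(sx).sum(), np.abs(sy).sum()]   # the 4-D vector of Eq. (6)
    v = np.asarray(desc)                        # 64-D descriptor
    return v / (np.linalg.norm(v) + 1e-12)      # normalized for distance matching
```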

B. Feature Matching

Corresponding features are found by the vector distance between descriptors. When features have been extracted for image and object recognition, all pairs of features are examined by their vector distances, and the nearest and second-nearest features ($\mathbf{f}_1$ and $\mathbf{f}_2$) are selected. The nearest feature $\mathbf{f}_1$ is matched to the feature $\mathbf{f}$ if the following criterion is satisfied,

$$\| \mathbf{f} - \mathbf{f}_1 \| \le \gamma\, \| \mathbf{f} - \mathbf{f}_2 \|, \qquad (7)$$

where $\gamma$ $(0 < \gamma < 1)$ adjusts how distinctive a match must be. As $\gamma$ approaches zero, $\mathbf{f}$ is matched only to a clearly nearest $\mathbf{f}_1$. For robust and unique matches, we set $\gamma$ to less than 0.5 in the real system. We also exploit the sign of the Laplacian for fast feature matching. The sign of the Laplacian is derived from the trace of the Hessian matrix: we first compare candidate features by this sign, and only then calculate the distance between feature descriptors. A sketch of this matching rule is given below.
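A sketch of the matching rule of Eq. (7) with the Laplacian-sign pre-filter; each feature is assumed to carry its 64-D descriptor and the sign of its Laplacian.

```python
import numpy as np

def match_features(feats_a, feats_b, gamma=0.5):
    """Match features by the ratio test of Eq. (7); gamma < 0.5 as in the text."""
    matches = []
    for ia, (da, sa) in enumerate(feats_a):
        # Compare only features whose Laplacian (trace of Hessian) signs agree.
        cand = [(ib, db) for ib, (db, sb) in enumerate(feats_b) if sb == sa]
        if len(cand) < 2:
            continue
        dists = sorted((np.linalg.norm(da - db), ib) for ib, db in cand)
        (d1, ib1), (d2, _) = dists[0], dists[1]
        if d1 <= gamma * d2:          # Eq. (7): nearest vs. second nearest
            matches.append((ia, ib1))
    return matches
```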


Fig. 8. Feature matching results. (a) Nearest-feature matching using (7); (b) feature matching with the homography. The features on the same object surface are correctly matched by the homography.


Fig. 8 shows feature matching results. Fig. 8(a) is the result of using (7) alone, which contains some mismatches. Therefore, we introduce a homography with RANSAC [12] optimization to reduce the errors: only features that obey the same geometric relation are matched by the homography, which is optimally estimated in the RANSAC process. Fig. 8(b) shows that the mismatched features are eliminated by the homography relation. Note also that the geometric markers in the top-left of Fig. 8 are not matched by the features, so the proposed recognition method is not confused by the AR toolkit markers. As Fig. 8 shows, the geometric markers do not look good, so we replace the AR markers with the proposed feature matching.

C. Image and Object Recognition

When we have all pairs of matched features, we can recognize the images or objects. The simplest method is to count the number of matched features: in general, the image pair with the largest number of matched features is considered the same image (see the sketch below). However, as shown in Fig. 8(a), matching errors occur when similar image patches or repetitive patterns exist in the images. As described above, we use the homography to reduce matching errors: since the homography reflects the geometric relations of the features, it removes mismatched features that satisfy the matching criterion (7) without being geometrically consistent.
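A sketch of this counting rule, reusing `match_features` from the previous sketch; the minimum-match threshold is an assumed safeguard, not a value from the paper.

```python
def recognize_page(query_feats, database, min_matches=10):
    """Return the ID of the database page with the most matched features."""
    best_id, best_count = None, 0
    for page_id, page_feats in database.items():
        n = len(match_features(query_feats, page_feats))
        if n > best_count:
            best_id, best_count = page_id, n
    # min_matches is an assumed safeguard against spurious recognition.
    return best_id if best_count >= min_matches else None
```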

The homography H is a 3x3 perspective matrix that transforms a 2-dimensional (2-D) plane point (x, y) into (x', y') [10], [11],

$$\begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix} = \begin{pmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}, \qquad (8)$$

where the 2-D points are represented in homogeneous coordinates. Since the homography H has 8 degrees of freedom, we need at least 4 matched features to estimate it. From the coordinates of corresponding feature points, we obtain the following system of equations,

$$\begin{pmatrix} x & y & 1 & 0 & 0 & 0 & -x'x & -x'y \\ 0 & 0 & 0 & x & y & 1 & -y'x & -y'y \end{pmatrix} \begin{pmatrix} h_{11} \\ h_{12} \\ h_{13} \\ h_{21} \\ h_{22} \\ h_{23} \\ h_{31} \\ h_{32} \end{pmatrix} = \begin{pmatrix} x' \\ y' \end{pmatrix}.$$

We find an optimal homography using the least-squares method and RANSAC [12]: matched features are randomly selected, a homography is estimated from them by least squares, and the process is iterated until the estimated homography is optimal. A usage sketch is given below.
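A usage sketch of this verification step; it delegates the iterative least-squares estimation to OpenCV's RANSAC-based `cv2.findHomography` rather than hand-coding the loop. `pts_a` and `pts_b` are the matched feature coordinates (N x 2 float arrays, N >= 4).

```python
import cv2

def verify_with_homography(pts_a, pts_b, reproj_thresh=3.0):
    """Estimate H of Eq. (8) with RANSAC and count geometrically consistent matches."""
    H, inlier_mask = cv2.findHomography(pts_a, pts_b, cv2.RANSAC, reproj_thresh)
    if H is None:
        return None, 0
    # Only inliers are kept; matches that pass the ratio test of Eq. (7) but
    # violate the plane geometry are removed, as in Fig. 8(b).
    return H, int(inlier_mask.sum())
```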

Fig. 9 shows two examples of image recognition. To mimic real situations, we partially occlude the images or objects with a hand. As Fig. 9 shows, the images are recognized well under various image distortions, such as perspective distortion, luminance differences, scale differences, and occlusion. In Fig. 9, the left images are the database images, and the right images were captured by the web camera. The images are correctly identified regardless of the AR markers, as in Fig. 8(a). Fig. 10 shows another use of image recognition: we tested the image recognition module as an image retrieval system. In Fig. 10 (a) and (c), the left images are query images captured by the web camera, and the right images show the database. The right images in Fig. 10 (b) and (d) show the recognition results from the database. For clarity, the features matched for recognition are shown as yellow dots.


Fig. 9. Image recognition results. (a) and (c) Original database images, recognized regardless of AR markers; (b) and (d) images captured by the web camera.

Note that not all images in the textbook can be recognized by the proposed method: some images do not contain sufficient features. Thus, we have to evaluate how well the images can be recognized before constructing the database, and only images with sufficiently many feature points are selected as database images. The selection of database images is also tied to the production of the educational scenarios and contents.



Fig. 10. Image retrieval results. The features matched for image recognition are indicated by yellow dots in (b) and (d).

V. APPLICATION TO PUBLIC EDUCATION SYSTEM

The proposed e-learning system was applied to English and science courses in a public elementary school. The educational contents providers designed learning scenarios and audio-visual contents, and we adapted these scenarios to the proposed e-learning system. Interfaces such as object/menu selection and object movement are defined for the educational scenarios. The polka-dot and color-band markers act like a computer mouse; the marker interface is so natural that the students do not have to learn any special gestures in advance. We also constructed database images and objects from the textbooks for recognition.


Fig. 11. Example of augmented reality using a marker. (a) A video frame captured by the web camera; (b) a visual content augmented on the marker. The visual content is augmented on top of the marker, so the marker is not seen on the monitor.

Fig. 11 shows a moving graphic augmented on the color-band marker. The left image (Fig. 11(a)) is the image captured by the web camera, and the right image (Fig. 11(b)) shows the augmented reality with graphic contents. The page ID is recognized from the image and objects on the text page. Then, the related audio-visual contents are augmented according to the scenarios and the student's interactions. The graphics are displayed above the marker so that the interactive augmented reality appears natural, and the augmented graphic objects move as the marker moves.

Fig. 12 shows the commercial system and an example image of interactive augmented reality. The proposed interactive e-learning system using augmented reality was applied to English and science courses in a public elementary school. The interactive augmented reality increased the students' interest in learning; therefore, the proposed e-learning system not only provides audio-visual contents but also improves the learning efficiency and concentration of the students. The application to the public elementary school was satisfactory, and we expect the proposed e-learning system to be useful in various educational courses. If authoring tools are available to develop the educational contents and scenarios easily, we expect the proposed interactive e-learning system using augmented reality to become popular rapidly in the education system and industry. Further research will focus on reducing recognition errors in various camera environments.


Fig. 12. The proposed e-learning system applied to a public elementary school. (a) The commercial system; (b) an example of interactive augmented reality using image and marker recognition.

VI. CONCLUSIONS

This paper has proposed an interactive e-learning system using recognition algorithms and augmented reality. The proposed e-learning system provides students with realistic audio-visual contents according to the recognition results. When the images in the textbook are identified by the image recognition, the audio contents are played and the visual contents, such as graphics animations or movies, are augmented on the images captured by the web camera. For real-time interactive learning, the polka-dot and color-band markers are designed to indicate objects in the textbook just like a mouse cursor. The proposed e-learning system has been applied successfully in a public elementary school. We expect the proposed e-learning system to become popular more quickly once recognition errors are reduced and authoring tools are provided to produce the educational contents and scenarios.


ACKNOWLEDGMENT

This work was supported by ETRI (Electronics and Telecommunications Research Institute) under the "Development of Elementary Technology for Promoting Digital Textbook and U-Learning" project.

REFERENCES

[1] R. C. Gonzalez and R. E. Woods, Digital Image Processing, 2nd ed., Prentice Hall, 2002.
[2] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, pp. 91-110, Nov. 2004.
[3] H. Bay, T. Tuytelaars, and L. Van Gool, "SURF: Speeded up robust features," Proc. European Conf. on Computer Vision, 2006.
[4] K. Mikolajczyk and C. Schmid, "A performance evaluation of local descriptors," IEEE Trans. PAMI, vol. 27, no. 10, pp. 1615-1630, Oct. 2005.
[5] R. Dondera, C. Jia, V. Popescu, C. Nita-Rotaru, M. Dark, and C. S. York, "Virtual classroom extension for effective distance education," IEEE Computer Graphics and Applications, pp. 64-74, Jan./Feb. 2008.
[6] S. G. Deshpande and J.-N. Hwang, "A real-time interactive virtual classroom multimedia distance learning system," IEEE Trans. Multimedia, vol. 3, no. 4, pp. 432-444, Dec. 2001.
[7] Y. Shi, W. Xie, and G. Xu, "Smart remote classroom: Creating a revolutionary real-time interactive distance learning system," Proc. Int'l Conf. Web-Based Learning, LNCS 2436, 2002.
[8] M. J. Lavooy, "Computer mediated communications: Online instruction and interactivity," Journal of Interactive Learning Research, vol. 14, no. 2, pp. 157-165, June 2003.
[9] Y. Ohta and H. Tamura, Mixed Reality - Merging Real and Virtual Worlds, Springer-Verlag, 1999.
[10] D. A. Forsyth and J. Ponce, Computer Vision: A Modern Approach, Prentice Hall, 2003.
[11] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, Cambridge University Press, 2001.
[12] M. Brown and D. G. Lowe, "Recognizing panoramas," Proc. IEEE Int'l Conf. on Computer Vision (ICCV), 2003.
[13] O. Bimber and R. Raskar, Spatial Augmented Reality, A K Peters, 2005.
[14] R. Azuma, "A survey of augmented reality," Presence: Teleoperators and Virtual Environments, vol. 6, no. 4, pp. 355-385, 1997.
[15] R. Azuma, Y. Baillot, R. Behringer, S. Feiner, S. Julier, and B. MacIntyre, "Recent advances in augmented reality," IEEE Computer Graphics and Applications, vol. 21, no. 6, pp. 34-47, 2001.
[16] M. Kanbara and N. Yokoya, "Real-time estimation of light source environment for photorealistic augmented reality," Proc. Int'l Conf. on Pattern Recognition (ICPR'04), vol. 2, pp. 911-914, Aug. 2004.
[17] W. R. Sherman and A. B. Craig, Understanding Virtual Reality, Morgan Kaufmann, 2003.
[18] H.-C. Lee, Introduction to Color Imaging Science, Cambridge University Press, 2005.
[19] ARToolKit Homepage, http://www.hitl.washington.edu/artoolkit/.

Sang Hwa Lee received the B.S., M.S., and Ph.D. degrees in electrical engineering and computer science from Seoul National University, Seoul, Korea, in 1994, 1996, and 2000, respectively. He joined BK21 Information Technology, Department of Electrical Engineering and Computer Science, Seoul National University, as a research professor in 2005. His research interests include image and video processing, stereoscopic systems, pattern recognition, MRF modeling, and image-based rendering.

Junyeong Choi received the B.S. and M.S. degrees in electrical and computer engineering from Hanyang University, Seoul, Korea, in 2007 and 2009, respectively. He is now a Ph.D. candidate at Hanyang University. His research interests include augmented reality, human-computer interaction, and affective computing.

Jong-Il Park (M'87) received the B.S., M.S., and Ph.D. degrees in electronics engineering from Seoul National University, Seoul, Korea, in 1987, 1989, and 1995, respectively. From 1996 to 1999, he was with ATR Media Integration and Communication Research Laboratories, Japan. He joined the Department of Electrical and Computer Engineering, Hanyang University, Seoul, Korea, in 1999, where he is currently a Professor. His research interests include computational imaging, augmented reality, 3D computer vision, and HCI.