JOURNAL OF INFORMATION, KNOWLEDGE AND RESEARCH IN INFORMATION TECHNOLOGY

ENHANCED AUGMENTED REALITY TECHNOLOGY AND ASSOCIATED APPLICATIONS

1 AKHIL KHARE, 2 ANKIT RANJAN, 3 PALLAVI KHARE

1Assistant Professor, Department Of Information Technology

Bharati Vidyapeeth College Of Engineering, Pune, India.

2 Research Student, Department Of IT, BVUCOE Pune, India.

3Research Student, Department Of ETC, SSSIST Bhopal, India.


ABSTRACT: Augmented Reality (AR) is a growing area in virtual reality research. The world environment around us provides a wealth of information that is difficult to duplicate in a computer, as evidenced by the worlds used in virtual environments. An augmented reality system generates a composite view for the user: a combination of the real scene viewed by the user and a virtual scene generated by the computer that augments the scene with additional information. In all of these applications the augmented reality presented to the user enhances that person's performance in and perception of the world. The ultimate goal is to create a system such that the user cannot tell the difference between the real world and the virtual augmentation of it. To the user of this ultimate system it would appear that he is looking at a single real scene.

Keywords: Virtual Reality, Virtual Environments, Augmented Virtuality



1. INTRODUCTION

Augmented Reality (AR) is a variation of Virtual Environments (VE), or Virtual Reality as it is more commonly called. VE technologies completely immerse a user inside a synthetic environment. While immersed, the user cannot see the real world around him. In contrast, AR allows the user to see the real world, with virtual objects superimposed upon or composited with the real world. Therefore, AR supplements reality, rather than completely replacing it. Ideally, it would appear to the user that the virtual and real objects coexisted in the same space. Some researchers define AR in a way that requires the use of Head-Mounted Displays (HMDs). To avoid limiting AR to specific technologies, this survey defines AR as systems that have the following three characteristics:

1) Combines real and virtual

2) Is interactive in real time

3) Is registered in 3-D

Film special effects, for example, can seamlessly blend photorealistic virtual objects with a real environment in 3-D, but they are not interactive media. 2-D virtual overlays on top of live video can be done at interactive rates, but the overlays are not combined with the real world in 3-D. However, this definition does allow monitor-based interfaces, monocular systems, see-through HMDs, and various other combining technologies.

2. DEVELOPMENT PRINCIPLES

There are two parts to developing applications that use ARToolKit[4]: writing the application, and training the image-processing routines on the real-world markers that will be used in the application. Writing an application with ARToolKit is straightforward: a simple outline is used for creating an AR application, and new applications can be based on it. Similarly, the pattern-training phase is largely simplified by the use of a simple tool.

2.1 Algorithm for the Development of Application

Initialization:
1. Initialize the video capture and read in the marker pattern files and camera parameters.

Main Loop:
2. Grab a video input frame.
3. Detect the markers and recognize patterns in the video input frame.
4. Calculate the camera transformation relative to the detected patterns.
5. Draw the virtual objects on the detected patterns.

Shutdown:
6. Close the video capture down.

Steps 2 through 5 are repeated continuously until the application quits, while steps 1 and 6 are performed only on initialization and shutdown of the application respectively[2]. In addition to these steps the application may need to respond to mouse, keyboard or other application-specific events.
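As a minimal sketch of this loop, the example below substitutes OpenCV's ArUco marker module (Python, OpenCV 4.7+) for ARToolKit's C API; the camera intrinsics, marker dictionary, and marker size are illustrative assumptions rather than values from the paper.

    import cv2
    import numpy as np

    # Step 1: initialize video capture and load camera parameters.
    # The intrinsics below are placeholders; a real system would read them
    # from a calibration file, as ARToolKit does.
    cap = cv2.VideoCapture(0)
    camera_matrix = np.array([[800.0, 0.0, 320.0],
                              [0.0, 800.0, 240.0],
                              [0.0, 0.0, 1.0]])
    dist_coeffs = np.zeros(5)
    dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
    detector = cv2.aruco.ArucoDetector(dictionary, cv2.aruco.DetectorParameters())

    marker_len = 0.05  # marker side length in metres (assumed)
    half = marker_len / 2
    obj_pts = np.array([[-half, half, 0], [half, half, 0],
                        [half, -half, 0], [-half, -half, 0]], dtype=np.float32)

    while True:
        ok, frame = cap.read()                            # Step 2: grab a frame
        if not ok:
            break
        corners, ids, _ = detector.detectMarkers(frame)   # Step 3: detect markers
        if ids is not None:
            for c in corners:
                # Step 4: camera transformation relative to the detected pattern
                _, rvec, tvec = cv2.solvePnP(obj_pts, c[0], camera_matrix, dist_coeffs)
                # Step 5: draw a virtual object (here, the marker's coordinate axes)
                cv2.drawFrameAxes(frame, camera_matrix, dist_coeffs, rvec, tvec, marker_len)
        cv2.imshow("AR", frame)
        if cv2.waitKey(1) == 27:                          # Esc quits the main loop
            break

    cap.release()                                         # Step 6: close video capture
    cv2.destroyAllWindows()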

2.2 Augmented Reality vs. Virtual Reality

Virtual reality is a technology that encompasses a well-known idea. It defines an umbrella under which many researchers and companies express their work[1]. The term was coined by Jaron Lanier, founder of VPL Research, one of the original companies selling virtual reality systems, and was defined as "a computer generated, interactive, three-dimensional environment in which a person is immersed." There are three key points in this definition. First, this virtual environment is a computer generated three-dimensional scene, which requires high performance computer graphics to provide an adequate level of realism. The second point is that the virtual world is interactive: a user requires real-time response from the system to be able to interact with it in an effective manner. The last point is that the user is immersed in this virtual environment. One of the identifying marks of a virtual reality system is the head mounted display worn by users. These displays block out all of the external world and present to the wearer a view that is under the complete control of the computer. The user is completely immersed in an artificial world and becomes divorced from the real environment. For this immersion to appear realistic the virtual reality system must accurately sense how the user is moving and determine what effect that will have on the scene being rendered in the head mounted display.

Milgram describes a continuum spanning from the real environment to a purely virtual environment. The real world and a totally virtual environment are at the two ends of this continuum, with the middle region called Mixed Reality. Augmented reality lies near the real-world end of the line, with the predominant perception being the real world augmented by computer generated data. Augmented virtuality is a term created to identify systems which are mostly synthetic with some real-world imagery added, such as texture mapping video onto virtual objects[3]. This is a distinction that will fade as the technology improves and the virtual elements in the scene become less distinguishable from the real ones.

3. TYPICAL AUGMENTED REALITY SYSTEM

A standard virtual reality system seeks to completely immerse the user in a computer generated environment. This environment is maintained by the system in a frame of reference registered with the computer graphic system that creates the rendering of the virtual world. For this immersion to be effective, the ego-centered frame of reference maintained by the user's body and brain must be registered with the virtual world reference. This requires that motions or changes made by the user will result in the appropriate changes in the perceived virtual world. Because the user is looking at a virtual world there is no natural connection between these two reference frames, and a connection must be created.

An augmented reality system could be considered the ultimate immersive system: the user cannot become more immersed in the real world. The task is now to register the virtual frame of reference with what the user is seeing. This registration is more critical in an augmented reality system because we are more sensitive to visual misalignments than to the type of vision-kinesthetic errors that might result in a standard virtual reality system. The figure shows the multiple reference frames that must be related in an augmented reality system. The scene is viewed by an imaging device, which in this case is depicted as a video camera. The camera performs a perspective projection of the 3D world onto a 2D image plane. The intrinsic (focal length and lens distortion) and extrinsic (position and pose) parameters of the device determine exactly what is projected onto its image plane[5].

The generation of the virtual image is done with a standard computer graphics system. The virtual objects are modeled in an object reference frame. The graphics system requires information about the imaging of the real scene so that it can correctly render these objects. This data will control the synthetic camera that is used to generate the image of the virtual objects. This image is then merged with the image of the real scene to form the augmented reality image.

Figure - Components of an Augmented Reality System

The video imaging and graphic rendering described above are relatively straightforward. The research activities in augmented reality center around two aspects of the problem. One is to develop methods to register the two distinct sets of images and keep them registered in real time; some new work in this area has started to make use of computer vision techniques. The second direction of research is in display technology for merging the two images. An augmented reality system can be viewed as a collection of the related reference frames shown in the figure. Correct registration of a virtual image over the real scene requires the system to represent the two images in the same frame of reference.
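As a hedged illustration of how the intrinsic and extrinsic parameters determine what is projected onto the image plane, and how the same parameters would drive the synthetic camera, the following sketch projects a virtual object point into pixel coordinates (all numbers are illustrative assumptions):

    import numpy as np

    K = np.array([[800.0,   0.0, 320.0],   # intrinsic: focal length, principal point
                  [  0.0, 800.0, 240.0],
                  [  0.0,   0.0,   1.0]])

    R = np.eye(3)                          # extrinsic: camera orientation (pose)
    t = np.array([[0.0], [0.0], [2.0]])    # extrinsic: camera position

    P = K @ np.hstack([R, t])              # 3x4 perspective projection matrix

    X = np.array([0.1, 0.0, 0.0, 1.0])     # virtual point in world coordinates
    x = P @ X
    u, v = x[0] / x[2], x[1] / x[2]        # homogeneous divide -> pixel coordinates
    print(f"projected pixel: ({u:.1f}, {v:.1f})")

A synthetic camera configured with the same K, R and t as the real camera renders the virtual objects so that they merge consistently with the real scene.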

4. RESEARCH SO FAR

An augmented reality system can be viewed as a collection of related reference frames. Most previous work in augmented reality has used a Euclidean system and carefully controlled and measured the relationships between these various reference frames. There has only been a small amount of work that tries to mitigate or eliminate the errors due to tracking and calibration by using image processing of the live video data. The problem of registering the virtual objects over the live video is solved as a pose estimation problem: by tracking feature points in the video image these systems invert the projection operation performed by the camera and estimate the camera's parameters. The fields of computer vision, computer graphics and user interfaces are actively contributing to advances in augmented reality systems. There are two performance measures that are critical for the proper operation of an augmented reality system. First, the system must operate in real time, which we define here to be a system frequency response of at least 10 Hz. Not meeting this real-time requirement will be visibly apparent in the augmented display. Too slow an update rate will cause the virtual objects to appear to jump in the scene, and there will be no capability to represent smooth motion. A lag in the system response time will show as the object slipping behind the moving real scene during faster motion and then returning into place when the scene motion stops.
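As a hedged sketch of that pose-estimation step, the following uses OpenCV's solvePnP to invert the camera projection from tracked feature points; the 3D feature locations and their measured pixel positions are assumed values, not data from the paper.

    import cv2
    import numpy as np

    # Known 3D locations of tracked features (assumed) ...
    object_pts = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0],
                           [0, 0, 1], [1, 0, 1]], dtype=np.float32)
    # ... and where the trackers found them in the current video frame (assumed).
    image_pts = np.array([[320, 240], [400, 238], [402, 310], [322, 312],
                          [318, 180], [398, 178]], dtype=np.float32)

    K = np.array([[800.0, 0.0, 320.0],
                  [0.0, 800.0, 240.0],
                  [0.0, 0.0, 1.0]])

    ok, rvec, tvec = cv2.solvePnP(object_pts, image_pts, K, None)
    R, _ = cv2.Rodrigues(rvec)         # rotation vector -> rotation matrix
    camera_center = -R.T @ tvec        # camera position in world coordinates
    print("estimated camera center:", camera_center.ravel())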

The second performance measure is the accuracy of the affine reprojection. This technique assumes that the perspective camera that is viewing the scene can be approximated by a weak perspective camera. As the camera, or affine frame, moves, the changes in the projection process can be approximated by affine transformations. While this assumption is maintained the affine representation used for all components of the system will remain invariant. Any violation of the affine assumption will result in errors in the reprojection of the virtual object, both in its location and shape. The affine approximation is valid under the following conditions: the distance to the object is large compared to the focal length of the lens; the difference in depth between the nearest and farthest points on the object is small compared to the focal length of the lens; and finally, the object is close to the optical axis of the lens. In addition, the affine projection matrix is computed directly from the projected locations of the affine basis points. If the trackers do not correctly report the location of the feature points in a new view of the scene then this will be reflected as errors in the affine projection matrix and result in errors in the virtual object projection.
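A hedged sketch of that computation: given the tracked image locations of four affine basis points, a least-squares fit recovers a 2x4 affine projection matrix that can then reproject any virtual point expressed in the affine frame (all coordinates below are illustrative assumptions).

    import numpy as np

    # Affine coordinates of the basis points and their tracked image locations.
    basis_3d = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
    image_2d = np.array([[320, 240], [395, 242], [318, 168], [341, 221]], dtype=float)

    A = np.hstack([basis_3d, np.ones((4, 1))])   # homogeneous affine coordinates
    # Solve A @ P.T = image_2d in the least-squares sense; P is 2x4.
    P, *_ = np.linalg.lstsq(A, image_2d, rcond=None)
    P = P.T

    new_point = np.array([0.5, 0.5, 0.5, 1.0])   # virtual point in the affine frame
    u, v = P @ new_point                         # affine reprojection into the image
    print(f"reprojected at ({u:.1f}, {v:.1f})")

Tracking errors in image_2d propagate directly into P, which is why tracker accuracy dominates the reprojection error.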

Experiments were performed to test the performance of the system against both of these criteria. One of the concerns was the update rate attainable using sockets for communication with the tracker and graphics server. The tracker executes on the same machine as the Matlab control script, so it should be isolated from network delays. The commands going to the graphics server, however, do go across the network. The experiments showed that the graphics system could render the virtual image at a maximum rate of 60 Hz when running standalone. Within the framework of the Matlab script[8] that drives the system we were seeing a much slower update rate. Each pass through the main loop performs three operations with the graphics system: send a new affine projection matrix, request rendering of the scene, and wait for acknowledgment of completion before continuing. If each of these is performed as a separate command sent to the graphics server, the system executes at speeds as slow as 6 Hz depending on network activity. This is not acceptable for an augmented reality system. A modification was made to the system so that multiple commands can be sent in a single network packet. When these three commands were put into a single network transmission, the cycle time markedly improved to 60 Hz. Needless to say, the system is now operated in a mode where as many commands as possible are transmitted in single network packets to the graphics system.
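A minimal sketch of the batching idea, in Python rather than the Matlab/graphics-server setup used in the paper; the host, port, and command syntax are illustrative assumptions, not the system's actual protocol.

    import socket

    def render_frame_batched(sock, projection_values):
        # Pack all three per-frame operations into one transmission instead of
        # paying the network round trip three separate times.
        commands = (
            "SET_AFFINE_PROJECTION " + " ".join(map(str, projection_values)) + "\n"
            "RENDER\n"
            "WAIT_ACK\n"
        )
        sock.sendall(commands.encode())
        return sock.recv(64)   # acknowledgment of completion from the server

    # Usage (assumes a graphics server is listening on the named host/port):
    # sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # sock.connect(("graphics-server.example", 9000))
    # render_frame_batched(sock, affine_matrix.ravel())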

5. FUTURE SCOPE

It is important to note that augmented reality is a costly technology to develop. Because of this, the future of AR depends on whether those costs can be reduced in some way. If AR technology becomes affordable, it could be very widespread; for now, major industries are the sole buyers with the opportunity to utilize this resource. Possible future applications include:

  • Expanding a PC screen into the real environment: program windows and icons appear as virtual devices in real space and are eye or gesture operated, by gazing or pointing.
  • Virtual devices of all kinds, e.g. replacement of traditional screens, control panels, and entirely new applications impossible in "real" hardware, like 3D objects interactively changing their shape and appearance based on the current task or need.
  • Enhanced media applications, like pseudo-holographic virtual screens[9], virtual surround cinema, and virtual 'holodecks' (allowing computer-generated imagery to interact with live entertainers and audience).
  • Replacement of cell phone and car navigator screens: eye-dialing, insertion of information directly into the environment.
  • Virtual plants, wallpapers, panoramic views, artwork, decorations, illumination etc., enhancing everyday life.
  • With AR systems getting into mass market, we may see virtual window dressings, posters, traffic signs, Christmas decorations, advertisement towers and more.
  • Virtual gadgetry becomes possible. Any physical device currently produced to assist in data-oriented tasks (such as the clock, radio, PC, arrival/departure board at an airport, stock ticker, PDA, PMP, informational posters/fliers/billboards, in-car navigation systems, etc.) could be replaced by virtual devices that cost nothing to produce aside from the cost of writing the software.
  • Subscribable group-specific AR feeds. For example, a manager on a construction site could create and dock instructions including diagrams in specific locations on the site. The workers could refer to this feed of AR items as they work.
  • AR systems can help the visually impaired navigate in a much better manner (combined with text-to-speech software).

6. RELATED TECHNOLOGY

6.1 Hardware: The main hardware components for augmented reality are: display, tracking, input devices, and computer. A combination of a powerful CPU, camera, accelerometers, GPS, and solid-state compass is often present in modern smartphones, which makes them prospective platforms for augmented reality.

6.2 Display: There are three major display techniques for Augmented Reality:

  1. Head Mounted Displays
  2. Handheld Displays
  3. Spatial Displays

6.2.1 Head Mounted Displays: A Head Mounted Display (HMD) places images of both the physical world and registered virtual graphical objects over the user's view of the world. HMDs are either optical see-through or video see-through in nature[6].

6.2.2 Handheld Displays: Handheld augmented reality employs a small computing device with a display that fits in a user's hand. All handheld AR solutions to date have employed video see-through techniques to overlay the graphical information on the physical world.

6.2.3 Spatial Displays: Instead of the user wearing or carrying the display, as with head mounted displays or handheld devices, Spatial Augmented Reality (SAR) makes use of digital projectors to display graphical information onto physical objects. The key difference in SAR[5] is that the display is separated from the users of the system.

6.3 Tracking: Modern mobile augmented reality systems use one or more of the following tracking technologies: digital cameras and/or other optical sensors, accelerometers, GPS, gyroscopes, solid-state compasses, RFID, and wireless sensors. Each of these technologies has different levels of accuracy and precision[8]. Most important is the tracking of the pose and position of the user's head for the augmentation of the user's view.

6.4 Input Devices: This is a current open research question. Some systems, such as the Tinmith system, employ pinch glove techniques[11]. Another common technique is a wand with a button on it. In the case of smartphones, the phone itself can be used as a 3D pointing device, with the 3D position of the phone recovered from the camera images.

6.5 Computer: Camera-based systems require a powerful CPU and a considerable amount of RAM for processing camera images. Wearable computing systems employ a laptop in a backpack configuration, while stationary systems use a traditional workstation with a powerful graphics card. Sound processing hardware may also be included in augmented reality systems.

6.6 Software: For consistent merging of real-world images from the camera and virtual 3D images, the virtual images must be attached to real-world locations in a visually realistic way. That means a real-world coordinate system, independent of the camera, must be recovered from the camera images. This process is called image registration and is part of Azuma's definition of augmented reality. Augmented reality image registration uses different methods of computer vision, mostly related to video tracking; many of these methods are inherited from similar visual odometry methods. Usually these methods consist of two stages. In the first stage, interest points, fiduciary markers, or optical flow are detected in the camera images. This stage can use feature detection methods like corner detection, blob detection, edge detection, or thresholding, and/or other image processing methods. In the second stage, a real-world coordinate system is restored from the data obtained in the first stage. Some methods assume objects with known 3D geometry (or fiduciary markers) are present in the scene and make use of that data; in some of those cases all of the scene's 3D structure should be precalculated beforehand. If not all of the scene is known beforehand, SLAM (simultaneous localization and mapping) techniques can be used for mapping the relative positions of fiduciary markers/3D models[3]. If no assumption about the 3D geometry of the scene is made, structure from motion methods are used. Methods used in the second stage include projective (epipolar) geometry, bundle adjustment, rotation representation with the exponential map, and Kalman and particle filters.
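As a hedged sketch of this two-stage process, the following uses OpenCV in Python: ORB features for the first (detection) stage and a RANSAC homography for the second (geometry) stage; the detector choice and image file names are assumptions for illustration.

    import cv2
    import numpy as np

    # Stage 1: detect interest points in a reference view and the current frame.
    img_ref = cv2.imread("reference.png", cv2.IMREAD_GRAYSCALE)  # assumed file
    img_cur = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)      # assumed file

    orb = cv2.ORB_create(1000)
    kp1, des1 = orb.detectAndCompute(img_ref, None)
    kp2, des2 = orb.detectAndCompute(img_cur, None)

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

    # Stage 2: recover geometry from the matched points (a planar homography
    # here; full systems recover a camera pose or a SLAM map instead).
    src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    print("homography from reference to current frame:\n", H)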