3D camera technology for computer vision and Metaverse Applications

Rajesh Uppal, January 24, 2023, Photonics

A regular (non-3D) camera records images either as still photographs or as moving images (videos or movies), which are stored on a physical medium such as digital storage or photographic film. A camera comprises a lens, which focuses light from the scene, and a camera body, which holds the image capture mechanism.

Digital (color) cameras capture light that enters the optical system through an opening known as the aperture. Internally, a grid (or array) of photosensors produces an electrical current at each sensor that is struck by incoming light. The different current levels are combined into a composite pattern of data that represents the light, ultimately encoded in three channels known as RGB (red, green, blue).

A movie camera or a video camera operates in a fashion similar to the still camera, except that it records a series of static images in rapid succession, usually at a rate of 24 frames per second. When the images are played back in rapid succession, they create the illusion of motion.

A 3D camera is an imaging device that enables the perception of depth in images to replicate three dimensions as experienced through human binocular vision. Some 3D cameras use two or more lenses to record multiple points of view, while others use a single lens that shifts its position.

A 3D camera provides 3D data about the objects that are within its field of view. Some cameras encode, for each pixel, the distance (depth) to the objects, while others output 3D coordinates. In some cases, color information can be appended to obtain what is commonly known as RGBD or RGBXYZ data. This often requires two cameras, one that outputs color and one that outputs 3D data, which are synchronized and calibrated (i.e., the roto-translation between them is known, as well as the lens distortion parameters).
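The relationship between a per-pixel depth value and a 3D coordinate can be illustrated with a minimal sketch. It assumes an ideal pinhole camera with no lens distortion, and it assumes the depth value is measured along the optical axis. The function name and the intrinsic parameters (fx, fy, cx, cy) are illustrative and not taken from any particular camera SDK.

```python
def pixel_to_point(u, v, depth_m, fx, fy, cx, cy):
    """Back-project pixel (u, v) with depth depth_m (metres) to camera-frame XYZ.

    Assumes a pinhole model: fx, fy are focal lengths in pixels and
    (cx, cy) is the principal point in pixels. Lens distortion is ignored.
    """
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)


# Hypothetical intrinsics for a 640x480 sensor.
point = pixel_to_point(u=400, v=300, depth_m=2.0,
                       fx=500.0, fy=500.0, cx=320.0, cy=240.0)
print(point)
```

Applying the same function to every pixel of a depth image yields a point cloud. Appending the color of the matching pixel from a calibrated color camera gives RGBXYZ data.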

The principle behind the 3-D effect is called stereoscopy and the corresponding technology is known as stereoscopic imaging. The difference between objects seen through the left and right eyes (binocular disparity) is what enables perspective in human eyesight, along with our accommodation through focusing and visual center interpretation to integrate those two perspectives. The combination of the two perspectives, as with the slightly different perspective of two human eyes, is what enables depth perception.
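For a camera with two lenses, disparity maps to depth through a simple relation. The sketch below assumes a rectified stereo pair (parallel optical axes and identical focal lengths), for which depth equals focal length times baseline divided by disparity. The function name and the numbers are illustrative only.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth (metres) of a point seen by a rectified stereo pair.

    disparity_px: horizontal shift of the point between left and right images
    focal_px:     focal length in pixels (assumed equal for both cameras)
    baseline_m:   distance between the two camera centres in metres
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_m / disparity_px


# Hypothetical rig: 700 px focal length, 12 cm baseline.
near = depth_from_disparity(disparity_px=42.0, focal_px=700.0, baseline_m=0.12)
far = depth_from_disparity(disparity_px=4.2, focal_px=700.0, baseline_m=0.12)
print(near, far)  # roughly 2 m and 20 m
```

Because depth is inversely proportional to disparity, distant objects produce very small disparities, which is why stereo depth estimates become less precise with range.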

In recent years, 3D cameras have been increasingly used in computer vision applications. Fields such as robotics, autonomous driving and quality inspection now benefit from the availability of different types of 3D sensing technologies. Leveraging the extra data provided by such sensors allows for better performance on tasks such as detection and recognition, pose estimation and 3D reconstruction.

3D photography can create an immersive, frozen-in-time moment in still or video content that seems real enough to touch. 3D TV and movies have traditionally worked by alternating frames: two separate images, one for the right eye and one for the left, are combined into a single 3D view through the use of specialized glasses. Another technology, known as autostereoscopic imaging (auto 3-D), is screen-based and does not require viewers to wear special glasses.

3D content created through dual capture can be viewed on all manner of 3D displays, including 3D TVs and monitors that use special glasses as well as displays like Nintendo 3DS. As 3D and virtual reality (VR) displays become more common, so will user-generated 3D content. 3D digital cameras, webcams and camcorders will provide a means of producing that content.

One current 3D camera, Intel’s RealSense device, comprises one 1080p camera and one infrared (IR) camera that works along with an IR laser projector to accurately measure 3D space. RealSense cameras are embedded in select laptops and desktop computers from Dell, HP and Lenovo.

Time-of-Flight (ToF) 3D camera

Time-of-flight cameras use an artificial light signal (provided by a laser or an LED) that travels between the camera and the object. The technology measures the time it takes for the light to reach an object and bounce back, in order to estimate distance.
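The distance calculation itself is short. As a minimal sketch, assuming a direct (pulsed) measurement of the round-trip time, the distance is half the round-trip time multiplied by the speed of light. The function name is illustrative.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0  # speed of light in vacuum, metres per second


def distance_from_round_trip(round_trip_s):
    """Distance (metres) to an object given the round-trip travel time of light."""
    # The light covers the camera-to-object distance twice, hence the division by 2.
    return SPEED_OF_LIGHT_M_S * round_trip_s / 2.0


# A 20-nanosecond round trip corresponds to roughly 3 metres.
print(distance_from_round_trip(20e-9))
```

The example also shows why timing precision matters: an error of one nanosecond in the measured round trip shifts the estimate by about 15 cm.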

In general, ToF cameras provide denser 3D data than stereo vision while operating at higher frame rates. They are also easier to configure than stereo cameras, since no point correspondence matching is needed.

However, limitations of this technology include relatively low resolution and sensitivity to the reflectance properties of the object’s material. As a result, suboptimal results are often obtained with objects that are either transparent or highly reflective.

Some popular cameras from this category are DepthSense from SoftKinetic, Kinect (second generation onward) from Microsoft, Senz3D from Creative Labs and Helios from LUCID Vision Labs.

Structured light

Structured light cameras project a known pattern onto the object located within the field of view and 3D data is obtained by analyzing the deformation of its projection.

Such cameras usually provide superior quality to stereo vision cameras for close distances, since the projection of the pattern provides a clear, predefined texture needed for point-to-point correspondence matching.

However, since correspondence matching is still an expensive operation, they often operate at lower frame rates compared to other cameras (such as ToF) and require tuning of the matching parameters.

Some popular cameras from this category are the RealSense line from Intel and Ensenso from IDS Imaging.

Laser triangulation

Laser triangulation technology uses a 2D camera and a laser light source. The latter projects a line onto the object in the field of view, whose projection deforms under the curvature of the object’s surface. Using the 2D camera, the position coordinates of the lines in multiple images are determined, which allows for the calculation of the distance between the object and the laser light source.
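The geometry can be sketched for one simplified setup. The example assumes the camera looks straight down at a flat reference plane with a known, constant image scale (millimetres per pixel), and that the laser sheet is tilted by a known angle from the camera's optical axis. Under those assumptions, a surface raised by a height h shifts the laser line sideways by h times the tangent of the angle. The function name and values are hypothetical, and real systems typically rely on a full calibration instead.

```python
import math


def height_from_line_shift(shift_px, mm_per_px, laser_angle_deg):
    """Surface height (mm) above the reference plane from the laser-line shift.

    shift_px:        sideways displacement of the laser line in the image, in pixels
    mm_per_px:       assumed constant image scale on the reference plane
    laser_angle_deg: angle between the laser sheet and the camera's optical axis
    """
    shift_mm = shift_px * mm_per_px
    return shift_mm / math.tan(math.radians(laser_angle_deg))


# Hypothetical setup: 0.1 mm per pixel, laser tilted 45 degrees.
print(height_from_line_shift(shift_px=10, mm_per_px=0.1, laser_angle_deg=45.0))
```

A shallower laser angle produces a smaller line shift for the same height, so the angle sets the trade-off between height resolution and the range of heights that stay in view.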

Laser triangulation has multiple advantages over other 3D sensing technologies. It provides high quality data without a lot of computational burden. Also, lasers are relatively cheap, have a very long lifetime and can cover a large spectrum of material properties.

However, due to the need for scanning the object, the process can be slow, making it impractical for real-time applications.

Stanford engineers enable simple cameras to see in 3D, reported in March 2022

Standard image sensors, like the billion or so already installed in practically every smartphone in use today, capture light intensity and color. Relying on common, off-the-shelf sensor technology – known as CMOS – these cameras have grown smaller and more powerful by the year and now offer tens-of-megapixels resolution. But they’ve still seen in only two dimensions, capturing images that are flat, like a drawing – until now.

Researchers at Stanford University have created a new approach that allows standard image sensors to see light in three dimensions. That is, these common cameras could soon be used to measure the distance to objects. The engineering possibilities are dramatic. Measuring distance between objects with light is currently possible only with specialized and expensive lidar – short for “light detection and ranging” – systems. If you’ve seen a self-driving car tooling around, you can spot it right off by the hunchback of technology mounted to the roof. Most of that gear is the car’s lidar crash-avoidance system, which uses lasers to determine distances between objects.

Lidar is like radar, but with light instead of radio waves. By beaming a laser at objects and measuring the light that bounces back, it can tell how far away an object is, how fast it’s traveling, whether it’s moving closer or farther away and, most critically, it can calculate whether the paths of two moving objects will intersect at some point in the future.

“Existing lidar systems are big and bulky, but someday, if you want lidar capabilities in millions of autonomous drones or in lightweight robotic vehicles, you’re going to want them to be very small, very energy efficient, and offering high performance,” explains Okan Atalar, a doctoral candidate in electrical engineering at Stanford and the first author on the new paper in the journal Nature Communications that introduces this compact, energy-efficient device that can be used for lidar.

For engineers, the advance offers two intriguing opportunities. First, it could enable megapixel-resolution lidar – a threshold not possible today. Higher resolution would allow lidar to identify targets at greater range. An autonomous car, for example, might be able to distinguish a cyclist from a pedestrian from farther away – sooner, that is – and allow the car to more easily avoid an accident. Second, any image sensor available today, including the billions in smartphones now, could capture rich 3D images with minimal hardware additions.

One approach to adding 3D imaging to standard sensors is to add a light source (easily done) and a modulator (not so easily done) that turns the light on and off very quickly, millions of times every second. By measuring the variations in the light, engineers can calculate distance. Existing modulators can do this too, but they require relatively large amounts of power. So large, in fact, that it makes them entirely impractical for everyday use.
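One common way to turn modulated light into a distance is to measure the phase delay of the returning signal. The sketch below uses the generic continuous-wave time-of-flight relation, in which distance equals the speed of light times the phase shift divided by 4π times the modulation frequency. It is an assumption made for illustration, not necessarily the exact processing used by the Stanford team, and the function names are hypothetical.

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458.0  # metres per second


def distance_from_phase(phase_rad, mod_freq_hz):
    """Distance (metres) from the phase delay of intensity-modulated light."""
    return SPEED_OF_LIGHT_M_S * phase_rad / (4.0 * math.pi * mod_freq_hz)


def unambiguous_range(mod_freq_hz):
    """Largest distance measurable before the phase wraps past 2*pi."""
    return SPEED_OF_LIGHT_M_S / (2.0 * mod_freq_hz)


mod_freq_hz = 10e6  # hypothetical 10 MHz modulation
print(distance_from_phase(math.pi / 2, mod_freq_hz))  # about 3.75 m
print(unambiguous_range(mod_freq_hz))                 # about 15 m
```

Higher modulation frequencies give finer distance resolution for the same phase precision but shorten the unambiguous range, which is one reason the ability to tune the modulation frequency is useful.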

The solution that the Stanford team, a collaboration between the Laboratory for Integrated Nano-Quantum Systems (LINQS) and ArbabianLab, came up with relies on a phenomenon known as acoustic resonance. The team built a simple acoustic modulator using a thin wafer of lithium niobate – a transparent crystal that is highly desirable for its electrical, acoustic and optical properties – coated with two transparent electrodes.

Critically, lithium niobate is piezoelectric. That is, when electricity is introduced through the electrodes, the crystal lattice at the heart of its atomic structure changes shape. It vibrates at very high, very predictable and very controllable frequencies. And, when it vibrates, lithium niobate strongly modulates light: with the addition of a couple of polarizers, this new modulator effectively turns light on and off several million times a second.

“What’s more, the geometry of the wafers and the electrodes defines the frequency of light modulation, so we can fine-tune the frequency,” Atalar says. “Change the geometry and you change the frequency of modulation.”

In technical terms, the piezoelectric effect creates an acoustic wave through the crystal that rotates the polarization of light in desirable, tunable and usable ways. It is this key technical departure that enabled the team’s success. A polarizing filter placed carefully after the modulator then converts this rotation into intensity modulation, making the light brighter and darker and effectively turning it on and off millions of times a second.
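How a polarizer converts polarization rotation into brightness can be illustrated with Malus's law, which states that an ideal polarizer transmits a fraction cos²(θ) of linearly polarized light, where θ is the angle between the light's polarization and the polarizer axis. The sketch below is an idealized model, not the team's device model: the sinusoidal rotation, the 45-degree bias and all parameter names are assumptions made for illustration.

```python
import math


def transmitted_intensity(i0, angle_rad):
    """Malus's law: intensity passed by an ideal polarizer."""
    return i0 * math.cos(angle_rad) ** 2


def modulated_intensity(i0, t_s, mod_freq_hz, max_rotation_rad, bias_rad=math.pi / 4):
    """Intensity after the polarizer when the polarization angle oscillates in time.

    The rotation is assumed to vary sinusoidally around a fixed bias angle.
    """
    angle = bias_rad + max_rotation_rad * math.sin(2.0 * math.pi * mod_freq_hz * t_s)
    return transmitted_intensity(i0, angle)


# Sample one period of a hypothetical 1 MHz modulation with +/- 45 degrees of rotation.
period_s = 1e-6
for step in range(5):
    t = step * period_s / 4
    print(f"t = {t:.2e} s, intensity = {modulated_intensity(1.0, t, 1e6, math.pi / 4):.3f}")
```

With this bias and rotation amplitude the transmitted light swings between fully dark and fully bright once per modulation period, which corresponds to the on-off behaviour described above.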

“While there are other ways to turn the light on and off,” Atalar says, “this acoustic approach is preferable because it is extremely energy efficient.”

Practical outcomes

Best of all, the modulator’s design is simple and integrates into a proposed system that uses off-the-shelf cameras, like those found in everyday cellphones and digital SLRs. Atalar and advisor Amin Arbabian, associate professor of electrical engineering and the project’s senior author, think it could become the basis for a new type of compact, low-cost, energy-efficient lidar – “standard CMOS lidar,” as they call it – that could find its way into drones, extraterrestrial rovers and other applications.

The potential impact of the proposed modulator is enormous: it could add the missing third dimension to any image sensor, they say. To prove it, the team built a prototype lidar system on a lab bench that used a commercially available digital camera as the receiver. The authors report that their prototype captured megapixel-resolution depth maps while requiring only small amounts of power to operate the optical modulator.

Better yet, with additional refinements, Atalar says the team has since cut the energy consumption by at least a factor of 10 compared with the already-low figure reported in the paper, and they believe a reduction of several hundred times is within reach. If that happens, a future of small-scale lidar with standard image sensors, and 3D smartphone cameras, could become a reality.

II-VI and Artilux Demonstrate 3D Camera Aimed at Metaverse, reported in July 2022

In July 2022, semiconductor laser specialist II-VI and imaging technology company Artilux demonstrated a next-generation 3D camera with longer range and higher image resolution, geared toward enhancing user experience in the metaverse.

Investments in the metaverse infrastructure are accelerating and driving the demand for sensors that enable more realistic and immersive virtual experiences. II-VI and Artilux combined their proprietary technologies in indium phosphide (InP) semiconductor lasers and GeSi sensor arrays, respectively, to demonstrate a miniature 3D camera that operates in the short-wavelength infrared (SWIR), at 1380 nm, granting significantly higher performance than existing cameras operating at 940 nm.

“The longer infrared wavelength provides better contrasts and reveals material details that are otherwise not visible with shorter-wavelength illumination, especially in outdoor environments,” said Julie Sheridan, senior vice president of II-VI’s optoelectronic devices and modules business unit.

Image caption: A 3D camera developed by II-VI and Artilux offers metaverse users greater resolution and longer range. Courtesy of II-VI and Artilux.

Designing a camera operating at 1380 nm as opposed to 940 nm enables greater brightness while remaining well within the margins of eye safety requirements, she said. Background light interference is also reduced, which greatly improves the signal-to-noise ratio, as well as enabling longer range and better image resolution.

According to Artilux CTO Neil Na, the miniature SWIR 3D camera can be seamlessly integrated into next-generation consumer devices. “The SWIR camera demonstration provides a glimpse of the future of 3D sensing in the metaverse, with displays that can identify, delineate, classify, and render image content, or with avatars that can experience real-time eye contact and facial expressions,” he said.

II-VI provided the highly integrated SWIR illumination module, which comprises optical diffusers and InP edge-emitting lasers that deliver up to 2 W of output power, in surface-mount technology packages for low-cost and high-quality assembly. Artilux’s camera features a high-bandwidth and high-quantum-efficiency GeSi SWIR sensor array based on a scalable CMOS technology platform. Combined, the technologies enable a broad range of depth-sensing applications in consumer and automotive markets.