KTU scientists propose consumer-friendly AI solution for 3D human shape reconstruction

In the near future, physicians will be able to monitor patients at home using virtual reality tools, computer scientists are convinced. A team of researchers from Kaunas University of Technology (KTU), Lithuania, has proposed a deep-learning-based method for three-dimensional human shape reconstruction when the original figure is only partly visible. The main advantages of the method are its relatively low cost, high compression of the images obtained and easy integration with existing virtual reality tools. The method was developed using a real-world dataset; a clinical trial is pending.

The rapid advancements in computer vision and three-dimensional object representation enable the development of virtual reality tools and the expansion of their range of applications. Although virtual reality applications today are mainly limited to entertainment or (rarely) educational purposes, the demand for three-dimensional image reconstruction is rising in many fields, including medicine.

Dr Rytis Maskeliūnas, KTU Faculty of Informatics
“I am convinced that the majority of the future formats of communication will involve virtual reality – be it a visit to a doctor or exploring the setting of a film that one is watching. Today we already have ‘holographic-like’ video conferencing systems, which allow the participants of a meeting to talk to each other as if they were interacting in real life,” says Dr Rytis Maskeliūnas, chief researcher at KTU Department of Multimedia Engineering.

However, the main drawbacks of the three-dimensional image reconstruction solutions currently in use are the complicated set-up of multiple cameras and the computational power required to process the image, which often make full object reconstruction impractical and too expensive.

AI was used to recreate the invisible areas
To address these issues, a team of computer scientists from KTU headed by Maskeliūnas proposed a deep-learning-based method that can reconstruct a full human posture point cloud from a depth view. A three-stage adversarial deep neural network was applied to deal with depth sensor noise and to refine the depth sensor data for full 3D human shape reconstruction.
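
To make the term "point cloud from a depth view" concrete, here is a minimal sketch of how a depth image is commonly turned into 3D points with the pinhole camera model. It is an illustration only, not the KTU team's code: the function name `depth_to_point_cloud`, the intrinsic parameters `fx`, `fy`, `cx`, `cy` and the tiny 3×3 depth image are all assumptions made for the example. Note that the result contains only the surface facing the camera, which is why the hidden parts have to be reconstructed.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def depth_to_point_cloud(
    depth: Sequence[Sequence[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point]:
    """Back-project a depth image (metres) into camera-space 3D points.

    fx, fy are focal lengths in pixels and cx, cy the principal point;
    these are hypothetical values, not those of any specific sensor.
    """
    points: List[Point] = []
    for v, row in enumerate(depth):        # v: pixel row
        for u, z in enumerate(row):        # u: pixel column, z: depth
            if z <= 0:
                continue                   # no depth reading at this pixel
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Toy 3x3 depth image: a plus-shaped patch 2 m from the camera.
depth = [
    [0.0, 2.0, 0.0],
    [2.0, 2.0, 2.0],
    [0.0, 2.0, 0.0],
]
cloud = depth_to_point_cloud(depth, fx=1.0, fy=1.0, cx=1.0, cy=1.0)
```

Pixels without a valid reading are skipped, so real sensor noise and dropouts show up directly as holes in the cloud.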

For the experiment, the team recorded a real-world dataset of multiple subjects performing physical rehabilitation exercises. Two depth cameras (sensors) filmed the subjects from the front and from the side.
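
The article does not say how the front and side recordings were combined, or whether they were merged at all. As a hedged illustration of one common way to bring two depth views into a shared coordinate frame, the sketch below applies a rigid transform (a rotation about the vertical axis plus a translation) to the side camera's points. The function names, the 90-degree angle and the camera position are assumptions chosen for the example, not values from the study.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def rotate_y(point: Point, angle_rad: float) -> Point:
    """Rotate a point about the vertical (y) axis."""
    x, y, z = point
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return (c * x + s * z, y, -s * x + c * z)


def to_front_frame(
    points: Sequence[Point], angle_rad: float, translation: Point
) -> List[Point]:
    """Map side-camera points into the front camera's frame: p' = R p + t."""
    tx, ty, tz = translation
    out: List[Point] = []
    for p in points:
        x, y, z = rotate_y(p, angle_rad)
        out.append((x + tx, y + ty, z + tz))
    return out


def merge_views(
    front_points: Sequence[Point],
    side_points: Sequence[Point],
    angle_rad: float,
    translation: Point,
) -> List[Point]:
    """Concatenate front points with side points expressed in the front frame."""
    return list(front_points) + to_front_frame(side_points, angle_rad, translation)


# Hypothetical set-up: the subject stands 2 m in front of the front camera;
# the side camera sits 2 m to the subject's side, turned by 90 degrees.
front = [(0.0, 0.0, 2.0)]
side = [(0.0, 0.0, 2.0)]   # the same physical point, seen by the side camera
merged = merge_views(front, side, math.radians(90), (-2.0, 0.0, 2.0))
```

In this toy set-up both cameras observe the same physical point, so after the transform the two entries in `merged` coincide up to floating-point error.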

“A camera sees only a part of the image: if it is filming the frontal view, the view from the back is invisible; if something is blocking the view, the camera cannot see what’s behind. Therefore, we employ artificial intelligence which reconstructs the invisible parts of the image,” explains Maskeliūnas.

A five-stage approach was adopted for training the artificial intelligence. The image completion results were validated using expert knowledge. The network was observed to reconstruct the shapes with a few flaws, most of which occurred near the ends of the limbs.

The proposed solution is a continuation of several applications that Maskeliūnas and his team are currently developing for the medical field.

Consumer-friendly approach
According to Maskeliūnas, in health care a three-dimensional image of the person is crucial for diagnosing various traumas related to spinal injuries and issues that may be caused by incorrect posture, as well as for various other purposes.

“For example, a doctor may ask their patient to perform a simple task, such as touching their nose or rotating their shoulder. To fully see how the person bends and twists and how their posture changes, the physician needs to see them as a three-dimensional subject and to be able to look at them from all sides and angles,” says Maskeliūnas.

In the experiment, an ordinary commercially available depth camera, which provides a three-dimensional point cloud image, was used. The two cameras used in developing the solution provide a more comprehensive view for training the AI algorithms, reducing the chance of “learning” to reconstruct occluded areas incorrectly. According to Maskeliūnas, the availability of the tools and the variety of applications with which the proposed solution can easily be integrated can make the developed method a preferred approach for three-dimensional image reconstruction.

“Telemedicine, including remote diagnostics, is becoming more and more popular. Methods of rendering a real-world experience that don’t require extensive resources or complicated equipment have great potential for future applications,” says Maskeliūnas.

The study described above was published in IEEE Sensors Journal on November 1, 2021.

24 Jan 2022