An explanation of different 3D scanners


There are a few different types of 3D scanner, and they operate with varying levels of accuracy. Some are cheap and portable but produce low-quality results. Others are incredibly high-definition but expensive, and can only capture objects of a certain size. Some are great for inanimate objects but terrible for people and animals. I will list the different scanners that are used to capture people, but first I will explain what sets the scanners apart.

There are 3 important factors to consider with 3D scanners:

3D scanner types

There are a few common types of 3D scanners out there. I will briefly explain each one using human scanning as an example. Humans are difficult to scan because of their size and because they are animated.

Single camera photogrammetry

Photogrammetry is a common process for 3D scanning. Photos are aligned in 3D space using features that appear in more than one photo, and from there you can produce a mesh and a texture. It's basically panorama stitching for 3D. The process is reliant on features, so the clearer the photos, the more features you can extract from them. With any kind of camera you can take 80 photos of a person from different angles and create a 3D scan just from that. However, a better camera will bring out more features, so you will have more accurate topology and a clearer texture. The downside of using one camera is that the subject needs to keep still for a few minutes while all the photos are taken. Nobody can hold perfectly still for that long, so this introduces a lot of motion distortion.
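The idea of placing a feature in space from photos taken at different positions can be sketched with a toy 2D triangulation. This is only an illustration, not how any particular photogrammetry package works: real software also has to estimate the camera positions, and it works in 3D with many thousands of features. The function name `triangulate_2d`, the camera positions and the angles are all made up for the example.

```python
import math

def triangulate_2d(cam_a, bearing_a, cam_b, bearing_b):
    """Intersect two viewing rays to locate a feature in a 2D plane.

    cam_a, cam_b: (x, y) camera positions (assumed to be known here).
    bearing_a, bearing_b: direction from each camera to the feature,
    in radians, measured counter-clockwise from the +x axis.
    """
    ax, ay = cam_a
    bx, by = cam_b
    dax, day = math.cos(bearing_a), math.sin(bearing_a)
    dbx, dby = math.cos(bearing_b), math.sin(bearing_b)

    # Solve cam_a + t * dir_a == cam_b + s * dir_b for t (Cramer's rule).
    denom = dax * dby - day * dbx
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; the feature cannot be located")
    t = ((bx - ax) * dby - (by - ay) * dbx) / denom
    return (ax + t * dax, ay + t * day)

# Two hypothetical cameras 2 units apart, both seeing the same feature.
x, y = triangulate_2d((0.0, 0.0), math.radians(45), (2.0, 0.0), math.radians(135))
print(round(x, 3), round(y, 3))
```

With these made-up inputs the two rays cross at roughly (1, 1). If the subject moves between the two photos, the bearings no longer point at the same spot, which is one way to picture the motion distortion described above.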

Multiple camera photogrammetry

The purpose of this setup is to capture the person from all angles within the smallest possible time frame. It relies on at least 60 cameras that are all synced together. Near-instantaneous capture massively improves accuracy because the person has no time to move between photos. The whole capture takes no more than 1 second. The faster the sync speed, the more light is required, so the real limit is how much discomfort in their eyes the model is willing to bear.

Raspberry Pi cameras

These are cameras that can be attached to Raspberry Pi boards. This is a cheap solution which requires less equipment. There are usually around 100 cameras in this system, but the cameras themselves are quite cheap and lightweight, which also makes the system easier to transport. The texture resolution is 8 megapixels, which is considered quite low. It's common for these setups to include projectors, which are used to overlay a random pattern over the person. This pattern adds features, which helps produce a more accurate mesh. The downside is that 2 sets of photos need to be taken: one with the projectors on (for the mesh) and another with the projectors off (for the texture). A single shot from the cameras takes around 1/5th of a second. With the projectors it's at least 0.5 seconds.

Compact cameras

This is a scanner with 72 Canon PowerShot A2500 cameras. It takes about 1 second to capture the whole body. The resolution is 16.1 megapixels. This is a rather affordable 3D scanner to build, and it is lightweight, easy to transport and produces pretty good results. The final result appears quite accurate to users, but the raw mesh usually has inaccurate or corrupted topology, which requires a lot of manual labour to clean up.
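The capture times quoted so far can be compared with a rough back-of-the-envelope sketch. It assumes the subject drifts at a constant speed for the whole capture, which is a simplification, and the 5 mm/s sway speed is an illustrative guess rather than a measured value. The function name `drift_mm` is also made up for the example.

```python
def drift_mm(capture_seconds, sway_mm_per_s=5.0):
    """Distance a subject drifts during the capture, assuming constant speed.

    sway_mm_per_s is a hypothetical figure chosen for illustration only.
    """
    if capture_seconds < 0:
        raise ValueError("capture time cannot be negative")
    return capture_seconds * sway_mm_per_s

# Capture windows taken from the figures quoted in the text.
capture_windows = {
    "Raspberry Pi, single shot": 0.2,
    "Raspberry Pi, with projectors": 0.5,
    "compact cameras": 1.0,
}
for rig, seconds in capture_windows.items():
    print(f"{rig}: about {drift_mm(seconds):.1f} mm of drift")
```

Under these assumptions the drift grows in direct proportion to the capture window, so halving the capture time halves the movement the software has to cope with.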

DSLR cameras

This is the granddaddy of them all. You can even capture people while they are in motion. A typical DSLR camera has more than 21 megapixels. A lot of these setups use the same technique as high-speed photography to achieve an incredibly fast effective shutter speed. This is done by having the scanner in a dark room, opening the shutters of the cameras, triggering a flash, then closing the shutters. This effectively makes the exposure time the same as the duration of the flash, which can be anywhere from 1/600th of a second to 1/80,000th of a second. However, the faster the flash, the more flashes you will need to boost the light intensity, so the real limit is how much light you can use before it makes the model's eyes hurt. The initial topology here would consist of millions of polygons (from 3 million to 7 million), and the texture would have the same clarity as a professional photo taken with a DSLR camera. The final result is very accurate, and processing may be quick because the raw mesh already has a lot of detail and does not require much clean-up.
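The trade-off between flash duration and the number of flashes can be sketched numerically. This assumes that the light a single flash delivers is proportional to how long it fires, which is only a rough approximation of real flash units. The function name `flashes_needed` and the choice of a 1/600th of a second flash as the reference are made up for the example.

```python
import math
from fractions import Fraction

def flashes_needed(flash_duration_s, reference_duration_s=Fraction(1, 600)):
    """Flash units needed to match the light of one reference flash.

    Assumes light output scales linearly with flash duration. This is a
    simplification; real flash units only follow it approximately.
    """
    if flash_duration_s <= 0:
        raise ValueError("flash duration must be positive")
    ratio = Fraction(reference_duration_s) / Fraction(flash_duration_s)
    return math.ceil(ratio)

# Durations between the two extremes quoted in the text.
for denominator in (600, 5000, 80000):
    duration = Fraction(1, denominator)
    print(f"1/{denominator} s: {flashes_needed(duration)} flash unit(s)")
```

Under this assumption, going from the slowest to the fastest flash duration quoted above multiplies the number of flashes by more than a hundred, which gives a sense of why the amount of light becomes the practical limit.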