Getting machines to watch 3D for you
23 September 2011
How can 3D television signals be analysed automatically to provide quality of broadcast service? Mike Knee, consultant engineer, research & development at Snell, is working on the answer, and provides a short overview of his work here.

Running a multi-channel TV installation brings new headaches when 3D is involved. For live monitoring of 2D TV channels, manufacturers have developed automated solutions covering many tasks, such as lip-sync measurement or compression quality estimation. 3D brings a new dimension to monitoring, because we additionally have to check the relationship between the left and right-eye signals. Manual monitoring of 3D is more difficult than 2D because operators need to wear glasses or accept limitations of autostereoscopic displays. So there is a burgeoning interest in automatic monitoring of 3DTV. In this article we look at how various aspects of 3D television signals can be analysed.

Format Detection

Left and right signals may be packed into a single video channel in many ways. Some formats, such as left/right juxtaposition, are ‘loose packed’ because the two pictures are physically separate. Other formats, such as line interleaving, are ‘close packed’ because corresponding left and right pixels are close together. One way to detect the packing format is to perform a trial unpacking with an assumed format and then detect whether the resulting images appear to be a stereoscopic pair. For loose packed formats, we look for relative similarity between the left and right images when compared with unrelated parts of the picture. For close packed formats, we look for relative differences between the left and right images when compared with adjacent pixels or lines.

Depth or Disparity Analysis

An important 3D analysis task is to measure the perceived depth of objects in the scene, which depends on disparity (the horizontal distance between left and right representations of the object).
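
The dependence of perceived depth on disparity can be illustrated with simple viewing geometry. The sketch below is not taken from any Snell product; it assumes a viewer facing the screen squarely, a nominal eye separation of 65 mm, and a sign convention in which disparity is the right-eye position minus the left-eye position (positive behind the screen, negative in front). The function and parameter names are hypothetical. By similar triangles, the perceived distance is Z = D * e / (e - d), where D is the viewing distance, e the eye separation and d the on-screen disparity.

```python
import math


def perceived_distance(disparity_px, screen_width_m, h_res_px,
                       viewing_distance_m, eye_separation_m=0.065):
    """Estimate how far from the viewer a point appears (metres).

    Assumed convention: disparity = x_right - x_left, in pixels.
    Zero disparity places the point on the screen plane.
    """
    # Convert pixel disparity to a physical distance on this display.
    d = disparity_px * screen_width_m / h_res_px
    if d >= eye_separation_m:
        # Sight lines are parallel or diverging: no finite fusion point.
        return math.inf
    return viewing_distance_m * eye_separation_m / (eye_separation_m - d)
```

The same pixel disparity gives a different perceived depth on a larger screen or at a different viewing distance, which is why the measurement has to be related to the display configuration.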
In 3D monitoring, we measure disparity and relate it to perceived depth for different display configurations. The most important use of disparity measurement is to provide a warning if the viewer is likely to suffer eye strain. It can also be used to verify that the sequence really is 3D, to detect and correct for geometric distortions between the two channels, and to assist in the insertion of captions or subtitles at suitable depths.

One class of disparity measurement methods involves correlating the left and right images to generate a sparse disparity map. This approach is ideal for looking at the behaviour of different objects in the scene and for determining whether limits have been exceeded. Other methods generate a dense disparity map – a disparity value for every pixel. This approach would be necessary if the measurement were being used to drive post-processing, for example to change the effective camera spacing.

Left-Right Swap Detection

If the left and right images are inadvertently swapped, the result is disturbing, though it is not always obvious what is wrong. It would be useful to detect the swap automatically. A disparity map is a good starting point, but a 3D pair will often exhibit both negative and positive disparity values. So a simple disparity histogram analysis, for example, would not be enough.

One approach is based on the spatial distribution of disparity values. Objects at the centre and bottom of the screen are generally nearer than objects at the top and sides. A left-right swap detector could correlate measured disparity with a template of expected values to see which way round gives the better match.

A better method is based on the observation that closer objects occlude more distant objects. Occluded regions extend to the left of transitions in the left-eye view and to the right in the right-eye view. This observation enables us to determine statistically which view is which.
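
The simpler of the two swap-detection ideas above, correlation against a spatial template, can be sketched as follows. This is a minimal illustration, not a production detector: the template shape, the function names and the sign convention (disparity = right position minus left position, so nearer objects have more negative values) are all assumptions. Swapping the two views negates every disparity value, so a negative correlation with the template suggests that the swapped arrangement is the better match.

```python
from statistics import mean


def expected_disparity_template(rows, cols):
    """Hypothetical template: nearer (more negative) at bottom and centre."""
    template = []
    for r in range(rows):
        v = r / (rows - 1) if rows > 1 else 0.0  # 0 at top, 1 at bottom
        row = []
        for c in range(cols):
            # 0 at the horizontal centre, 1 at the left and right edges.
            h = abs(c - (cols - 1) / 2) / ((cols - 1) / 2) if cols > 1 else 0.0
            nearness = v + (1.0 - h)
            row.append(-nearness)  # nearer means more negative disparity
        template.append(row)
    return template


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if undefined)."""
    ma, mb = mean(a), mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = (sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)) ** 0.5
    return num / den if den else 0.0


def looks_swapped(disparity_map):
    """Return True if the map matches the template better with views swapped."""
    rows, cols = len(disparity_map), len(disparity_map[0])
    template = expected_disparity_template(rows, cols)
    measured = [v for row in disparity_map for v in row]
    expected = [v for row in template for v in row]
    return pearson(measured, expected) < 0
```

A single frame can easily violate the template (a close-up against the top of the frame, for example), so in practice the decision would presumably be accumulated over many frames rather than taken from one map.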
2D to 3D Conversion Detection

In the rush to deliver 3D content, it is tempting to use 2D to 3D conversion. Some automatic conversion is impressive, but concern remains that over-use of simple conversion algorithms may undermine the appeal of 3DTV. So it would be desirable when monitoring 3D content to detect the possible use of a converter.

One simple 2D to 3D conversion technique is to apply a fixed spatial disparity profile. Another technique is to introduce delay between two versions of the same moving sequence to give an impression of depth depending on motion. The use of these techniques can be detected using a combination of fingerprint comparison, temporal alignment and disparity estimation. One can envisage a game of ‘cat and mouse’ whereby detection algorithms become ever more sophisticated in order to keep up with the increasing complexity of automatic 2D to 3D converters.

In this article I have shown how we can get machines to watch 3D for us so that humans can concentrate on delivering and watching 3D content. At Snell, we are active in developing and implementing these algorithms for monitoring and correction of 3D video across our product range.
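
As an illustration of the delay-based conversion case described above, the following sketch compares per-frame fingerprints of the two views at a range of temporal offsets. It assumes each view has already been reduced to one number per frame (for example mean luminance); that fingerprint, the function names and the tolerance are hypothetical choices, not a description of any specific product. If the right view matches a time-shifted copy of the left view almost exactly, a delay-based converter is a plausible explanation.

```python
def mean_abs_diff(a, b):
    """Mean absolute difference of two equal-length, non-empty sequences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def best_temporal_offset(left_fp, right_fp, max_lag):
    """Return (lag, error) such that right_fp[t] best matches left_fp[t - lag]."""
    best = None
    # Try small lags first so that lag 0 wins any tie (e.g. a static scene).
    for lag in sorted(range(-max_lag, max_lag + 1), key=abs):
        if lag >= 0:
            a, b = left_fp[:len(left_fp) - lag], right_fp[lag:]
        else:
            a, b = left_fp[-lag:], right_fp[:len(right_fp) + lag]
        n = min(len(a), len(b))
        if n == 0:
            continue  # sequences too short to compare at this lag
        err = mean_abs_diff(a[:n], b[:n])
        if best is None or err < best[1]:
            best = (lag, err)
    return best


def looks_like_delay_conversion(left_fp, right_fp, max_lag=5, tolerance=1e-3):
    """Flag a pair whose views match almost exactly at a non-zero frame offset."""
    lag, err = best_temporal_offset(left_fp, right_fp, max_lag)
    return lag != 0 and err <= tolerance
```

A genuine stereoscopic pair would normally align best at zero offset, with a small but non-zero residual caused by the difference in viewpoint. A converter that combines delay with other processing would need the disparity estimation step as well.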