Answers to Exercises, Chapter 6
These are answers to the exercises in the 3rd edition of Digital Multimedia (published February 2009) only. Do not try to use them in conjunction with the 2nd edition.
Test Questions
- Broadcast video is interlaced because the analogue TV channels that were in use when the standards were developed did not provide sufficient bandwidth to transmit the contents of an entire video frame at the required rate. Therefore, the frame was divided into two fields, each comprising alternate lines of the frame, which could be transmitted and displayed in succession. Although modern TV equipment is capable of displaying entire frames at once, the broadcast industry is shackled by the standards developed in the past for analogue signals. In contrast, Web video does not suffer from these constraints, because Web video is displayed on computer monitors, not television sets. Interlacing is unnecessary and, because of the complications it creates, undesirable, so it is not used on the Web.
- A PAL frame contains 576 lines, an NTSC frame contains 480. In digital video, these are also the numbers of pixels vertically. Both frames have 720 pixels horizontally in CCIR 601, and they both have the same aspect ratio. Suppose (without loss of generality) that a pixel is 1 unit tall, and that the pixels in PAL and NTSC frames are p and n units wide, respectively. Then to make the aspect ratios of PAL and NTSC frames the same, we need 720p/576 = 720n/480, which simplifies to p/n = 576/480 = 1.2. In other words, PAL pixels are 1.2 times as wide as NTSC pixels.
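If you want to check the arithmetic, here is a minimal Python sketch (the variable names are ours, not from the text) that derives the ratio of the two pixel widths from the frame dimensions given above:

```python
# CCIR 601 frame dimensions: both PAL and NTSC frames are 720 pixels wide,
# but PAL has 576 lines and NTSC has 480.
PAL_LINES, NTSC_LINES, WIDTH = 576, 480, 720

# If a pixel is 1 unit tall and p (PAL) or n (NTSC) units wide, equal frame
# aspect ratios mean WIDTH * p / PAL_LINES == WIDTH * n / NTSC_LINES,
# so p / n = PAL_LINES / NTSC_LINES.
p_over_n = PAL_LINES / NTSC_LINES
print(p_over_n)   # 1.2 -- a PAL pixel is 1.2 times as wide as an NTSC pixel
```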
- There are two pitfalls in this calculation. First, remember that DV is chrominance sub-sampled. Second, for data rates, M means a million, not 1024*1024, as it does in other contexts.
The answer to Test Question 9 in Chapter 2 shows that the data rate of full-colour uncompressed CCIR 601 video in either PAL or NTSC is 31104000 bytes per second. DV is 4:1:1 sub-sampled, which means that we only sample the colour of every fourth pixel. As the answer to Test Question 10 in Chapter 5 demonstrates, this means that the raw data stream will have half the number of bytes of the full-colour version, so before compression the data rate is 15552000 bytes per second, that is, 124416000 bits per second or 124.416 Mbps, which when divided by 25 (DV's data rate in Mbps) is very nearly 5.
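The whole calculation can be checked with a short Python sketch (our own; the division by 25 corresponds to the 25 Mbps data rate used in the answer above):

```python
# Data rate of uncompressed full-colour CCIR 601 video (PAL or NTSC),
# from the answer to Test Question 9 in Chapter 2.
full_colour_bytes_per_sec = 31_104_000

# 4:1:1 sub-sampling halves the amount of data: luma is kept in full,
# each chroma component is sampled for only one pixel in four.
dv_raw_bytes_per_sec = full_colour_bytes_per_sec // 2   # 15 552 000 bytes/s
dv_raw_bits_per_sec = dv_raw_bytes_per_sec * 8          # 124 416 000 bits/s

# For data rates, M means a million, not 1024 * 1024.
dv_raw_mbps = dv_raw_bits_per_sec / 1_000_000           # 124.416 Mbps
print(dv_raw_mbps, dv_raw_mbps / 25)                    # 124.416, 4.97664 -- very nearly 5
```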
- An easy way to get the ratio is to consider 4:4:4 as being not sub-sampled at all, then just add up the three numbers to see what fraction of the samples is being taken in each case. For instance, 4:2:2 means 8 out of every 12 samples are taken, so the image will occupy 2N/3 bytes. For both 4:1:1 and 4:2:0, it occupies N/2. (A small sketch of this calculation follows this answer.)
As you can see in the lower diagram of Figure 6.6, the U and V colour samples in 4:2:0 sub-sampling are on alternate lines of the picture. That is, they will end up in different fields if the video is interlaced. This may not cause a problem with displaying the frames, because interlaced fields are optically mixed anyway, but it could cause a problem in processing the video because the information making up each frame has been split between the fields.
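The rule of thumb above can be expressed as a tiny Python sketch (our own illustration; N is the size in bytes of the image without any sub-sampling):

```python
from fractions import Fraction

def size_factor(a, b, c):
    """Fraction of the bytes of an N-byte 4:4:4 image left after a:b:c sub-sampling,
    using the rule of thumb of adding up the three numbers and comparing with 12."""
    return Fraction(a + b + c, 12)

for scheme in [(4, 4, 4), (4, 2, 2), (4, 1, 1), (4, 2, 0)]:
    print(scheme, size_factor(*scheme))   # 1, 2/3, 1/2, 1/2
```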
-
(a) In general, the frames before and after a cut will be completely different. A cut usually represents a change of scene or viewpoint. Therefore, the difference between the two frames adjoining the cut will be essentially equal to the complete frame. In this case nothing will be saved by using inter-frame compression. It is normal for software to insert new key frames (I-pictures) after a cut. Motion compensation is of no help.
(b) In contrast, a dissolve often features little or no movement: one static scene simply fades into the next (this is a generalisation, of course). When this is the case, inter-frame compression will perform well, because pixel values only change relatively slowly between frames. Motion compensation is irrelevant, though.
(c) We assume that stabilisation is not being performed in the camera, so hand-held shots will tend to be shaky. This can seriously disrupt the effectiveness of inter-frame compression, because pixels that correspond to the same part of the picture, and whose values would otherwise be identical in consecutive frames, shift position abruptly as the camera shakes. Motion compensation should counteract this effect.
(d) Zooms also disrupt the effectiveness of inter-frame compression because each pixel may change from one frame to the next, but the change between frames in this case is systematic and can be described algorithmically. ("Digital" zooms can be performed purely by interpolation, after all.) The local motion compensation used in MPEG-2 is of limited help, but global motion compensation is designed precisely to deal with such systematic movement.
(e) During a pan, new parts of the scene constantly enter the frame, but usually there is little or no other movement. As with zooms, there will be many changed pixels, but the change is systematic. If interpolated frames can depend on following frames as well as preceding ones (i.e. B-pictures are allowed) the pan can be efficiently compressed using global motion compensation. If you find it helpful to think of global motion compensation as being similar to interpolation, you will appreciate that, in order to compute intermediate frames, we need to know the contents of the final frame. Similarly, to perform motion compensation, we need to be able to look ahead to the pixels that will come into frame next.
- This is simply because of inter-frame compression. Because the video frames form a sequence, only some parts of consecutive frames will be different – those where movement occurs. Hence, as we describe in detail in the text, only key frames (I-pictures) need to be stored in full; various methods can be used to store only the difference between consecutive frames in between them. The difference frames will be mostly empty (zero values) so they can be compressed much more effectively than the key frames. If the 750 images are compressed independently, each frame is effectively a key frame, and must include data for each pixel, so it will not be possible to compress the sequence as effectively as we can by making use of data from preceding frames. (A toy illustration of how sparse the difference frames are appears at the end of this answer.)
Note that we did not include the word "usually" in this question. A video clip in which every frame was entirely different would not be watchable, because the frames are displayed too quickly. Note also that chrominance sub-sampling is not the answer here: JPEG images are usually sub-sampled, as we noted in Chapter 4.
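Here is the toy illustration promised above (our own example, not from the book): two consecutive frames that differ only where a small object has appeared, showing that the difference frame is almost entirely zeros and so compresses very well.

```python
# Two consecutive 8x8 greyscale "frames", identical except for a small 2x2
# block of pixels where something has moved into shot in the second frame.
frame1 = [[10] * 8 for _ in range(8)]
frame2 = [row[:] for row in frame1]
for y in range(2, 4):
    for x in range(5, 7):
        frame2[y][x] = 200

# The difference frame is mostly zeros, so it can be compressed far more
# effectively than a key frame, which must carry data for every pixel.
diff = [[b - a for a, b in zip(row1, row2)] for row1, row2 in zip(frame1, frame2)]
zeros = sum(value == 0 for row in diff for value in row)
print(f"{zeros} of {8 * 8} difference values are zero")   # 60 of 64
```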
- This is a bit messy to draw, so we'll leave that to you. There should be arrows running from right to left from each P to the preceding I or P, whichever is nearer, and arrows running in both directions from each B to the nearest I or P before and after. (There's nothing difficult conceptually, but drawing a neat picture is a challenge.)
The bitstream order of the first 19 frames is IPBBPBBIB BPBBPBBIB B. The first group is special because it must begin with an I-picture, since everything subsequent depends on it, but the last pair of B-pictures depend on the first I-picture of the next GOP. That I-picture therefore has to be pulled back into the first group of 9 pictures, which establishes the pattern of all the subsequent groups. Another way of looking at this is to observe that the first GOP is special because there cannot be a B-picture before the first I-picture which depends on it, whereas in later GOPs there will be.
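To check the reordering, here is a small Python sketch (our own; it assumes the 9-picture display-order GOP IBBPBBPBB implied by the answer above) which produces bitstream order by sending each I- or P-picture ahead of the B-pictures that depend on it:

```python
# Assumed display order: the 9-picture GOP IBBPBBPBB repeated, truncated to
# the first 19 frames discussed above.
display_order = ("IBBPBBPBB" * 3)[:19]     # 'IBBPBBPBBIBBPBBPBBI'

bitstream, pending_b = [], []
for picture in display_order:
    if picture == "B":
        pending_b.append(picture)          # B-pictures must wait for the next anchor
    else:
        bitstream.append(picture)          # I- and P-pictures are sent first...
        bitstream.extend(pending_b)        # ...followed by the Bs that depend on them
        pending_b = []

print("".join(bitstream))                  # IPBBPBBIBBPBBPBBIBB
```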
- The movie can start playing instantly, because the data rate is less than the speed of the download. Nevertheless, it may play back jerkily if the processor cannot display frames at that rate, or if network delays interfere – when deciding when to start a progressively downloaded movie, players assume a constant download rate which may not be achieved. We expand on this point in Chapter 16.
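As a rough illustration of that last point, a player might estimate when it is safe to start a progressively downloaded movie along the following lines (a simplified sketch of our own, assuming a constant bit rate and a constant download rate; real players use more elaborate buffering heuristics):

```python
def start_delay(file_size_bytes, duration_secs, download_bytes_per_sec):
    """Seconds to wait before starting playback so that, at a constant
    download rate, the remaining data always arrives before it is needed."""
    time_to_fetch_everything = file_size_bytes / download_bytes_per_sec
    return max(0.0, time_to_fetch_everything - duration_secs)

# A 2-minute movie whose data rate (250 kB/s) is below the download speed (500 kB/s)
# can start playing immediately:
print(start_delay(30_000_000, 120, 500_000))   # 0.0
```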
- When choosing video formats and codecs, the following factors should be considered.
- The destination for the video (DVD, Blu-ray, broadcast, mobile phone, Web,…). Many of these delivery formats impose the choice for you. For example, DVD video must be in MPEG-2 format.
- Availability of codecs in players. There is no point using a codec that is unlikely to be installed on the machines of your intended audience or a format that cannot be read on common platforms.
- Hardware demands of the decompression process. Again, there is no point in using a codec if decompressing the video in real-time is beyond the capabilities of typical computer systems.
- Achievable compression ratios. Can you get enough compression with your chosen codec to be able to deliver it over your chosen medium, without unacceptable loss of quality?
- Image quality of compressed video. Some older codecs (e.g. Cinepak) always produce poor quality, which may be unacceptable.
- Licensing and intellectual property issues. The use of some codecs requires a licence fee. Some people insist on using "Free Software" that is distributed under a licence such as the GNU GPL, which allows it to be modified by its users.
Discussion Topics: Hints and Tips
- As a possible starting point, consider the video formats used by mobile phones and those "still" cameras which can also capture short video clips. These devices are less hampered by the legacy of broadcast formats than conventional video cameras.
- Think about the effect on compression as well as the factors we consider when describing the use of indexed colour for still images in Chapter 5.
- Apple's iMovie is intended for just such applications, so you could begin to answer this question by making a critical assessment of its features.
Practical Tasks: Hints and Tips
- If this exercise is to be worthwhile, you need to be as systematic as possible and approach it as you would a scientific experiment. Don't necessarily expect a clear-cut result to emerge if you use modern codecs. There are differences in the algorithms they use, though, so you should be able to refine your experiment to identify factors which do affect their quality differently. For example, does one codec produce consistently better results on scenes with rapid movement?
- Try looking up the term "garbage matte" and see how it is interpreted in After Effects or Final Cut Pro.