Answers to Exercises, Chapter 16
These are answers to the exercises in the 3rd edition of Digital Multimedia (published February 2009) only. Do not try to use them in conjunction with the 2nd edition.
Test Questions
Asymmetrical technologies such as ADSL are suitable for domestic Internet connections because most of the activities carried out by domestic users are themselves asymmetrical – that is, domestic Internet users download more data than they upload.
This is evident in the case of Web browsing, which is likely to be the most common Internet activity for the majority of domestic users. The upstream data consists of HTTP requests, which are typically short, since they only need to specify the location of the resource to be retrieved, and provide some information about the user agent and the types of data it will accept. The server's response, which goes in the downstream (relatively fast) direction, contains all the data on the specified page, including images and sometimes bulky video files.
Similarly, when users download files, such as system updates, their requests are small, but the data coming back in response may be very large.
Email is an exception, since it is (ideally) fairly symmetrical – email messages are much the same length whether you send them or receive them. However, email messages are usually small, being mostly text, so the asymmetry is no great drawback. In practice, many of us receive far more email than we send, thanks to the unwelcome activities of spammers. Asymmetrical data connections at least ensure that this does not slow down our email too much.
The other exception is P2P file transfers, for which asymmetrical connections are not really appropriate. We discuss this in the text. Despite the attention P2P networking receives, though, it remains a minority activity among domestic users.
An IP address is a set of numbers that identifies a host connected to the Internet. A transport address is an IP address augmented with a socket number, so the difference between the two lies in the presence of a socket number in the transport address.
The fundamental reason for requiring both is that the IP address can be used to direct packets through the Internet to the host that is their destination, but the socket number is needed to ensure that each packet is passed to the appropriate server process at the destination – a host may be running several servers, each listening for requests on a different socket.
This raises the question of why we choose to distinguish between an IP address and a transport address. The transport address includes the IP address, so it would seem to be all that is required. This is the case, and the real distinction between the two types of address lies in the protocol layer at which each is used. Organizing protocols into a layered structure permits a separation of concerns between protocols at each layer. At the Internet layer, we are concerned with delivering datagrams to the correct host, so we only need an IP address. At the transport layer, we need to establish connections between specific application processes running on hosts. Establishing connections between processes requires the socket number to identify the process as well as the IP address to identify the host, hence at the transport layer, we need transport addresses.
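The distinction can be seen in a minimal Python sketch. It assumes that what the text calls the socket number corresponds to the "port" in Python's socket module, and it uses the loopback address 127.0.0.1 purely for illustration. The helper open_listener and the two "servers" are hypothetical.

```python
import socket


def open_listener(host="127.0.0.1"):
    """Bind a TCP socket to a free port chosen by the OS and start listening."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind((host, 0))  # port 0 asks the OS to pick any free port
    listener.listen()
    return listener


# Two hypothetical server processes on the same host.
first_server = open_listener()
second_server = open_listener()

# getsockname() returns the pair (IP address, port): a transport address.
first_address = first_server.getsockname()
second_address = second_server.getsockname()

print("same host:", first_address[0] == second_address[0])
print("different endpoints on that host:", first_address[1] != second_address[1])

first_server.close()
second_server.close()
```

The IP address alone is the same for both listeners, so it could only deliver a packet to the host. The second component is what separates the two endpoints.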
- The purpose of a file transfer application is to copy files from one host to another. It is reasonable to expect that the copied file will be identical to the original. Therefore, no data can be lost during the transfer, which means that such applications must indeed always run on top of a reliable transport protocol.
- The purpose of the timestamp, as explained on page 636, is to allow applications to synchronize separate RTP streams with different media types, such as a video stream and its accompanying soundtrack. The sequence numbers are used to record the order in which the packets in a data stream were sent, so that an application that receives the packets can ensure they are in the correct order or deal with packets that are out of sequence in some appropriate way. The sequence numbers are assigned to packets within a single stream, so they cannot be used to synchronize independent streams; that requires the use of timestamps.
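The different roles of the two fields can be illustrated with a small Python sketch. It is a simplification, not a description of RTP itself: it assumes the timestamps of both streams have already been mapped to a common clock in milliseconds, and it ignores sequence-number wraparound. The Packet class and both functions are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    stream: str      # which stream the packet belongs to
    sequence: int    # sending order within that stream only
    timestamp: int   # media time in ms on an assumed common clock


def reorder(packets):
    """Restore sending order within ONE stream using sequence numbers."""
    return sorted(packets, key=lambda p: p.sequence)


def pair_by_timestamp(video, audio):
    """Match packets ACROSS streams that carry the same timestamp."""
    audio_by_time = {p.timestamp: p for p in audio}
    return [(v, audio_by_time[v.timestamp])
            for v in video if v.timestamp in audio_by_time]


# Packets as they might arrive, out of order.
video_arrived = [Packet("video", 2, 40), Packet("video", 1, 0), Packet("video", 3, 80)]
audio_arrived = [Packet("audio", 501, 0), Packet("audio", 503, 80), Packet("audio", 502, 40)]

video = reorder(video_arrived)
audio = reorder(audio_arrived)
pairs = pair_by_timestamp(video, audio)

for v, a in pairs:
    print(v.timestamp, "ms: video", v.sequence, "with audio", a.sequence)
```

The audio sequence numbers start at 501 while the video numbers start at 1, so comparing them across streams would be meaningless. Only the timestamps give the two streams something in common.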
Multicast is only beneficial if identical data streams are being sent to many destinations at the same time.
(a) Video on demand will not benefit. The main point of video on demand is to allow the consumer to decide when and what to watch, so many different video streams will be sent to people at many different times.
(b) Internet radio is an ideal application for multicasting. Following the model of traditional radio broadcasting, programs are sent out at a specific time, and many people may listen to them at the same time.
(c) Automatic distribution of system upgrades might, in theory, benefit from multicasting, but only if the system vendors were able to push upgrades to everybody at once. In practice, there are obstacles to this approach to system updating, such as ensuring that all systems that need updating are accessible at the time the update is transmitted. An even less tractable obstacle is people's natural preference for deciding for themselves whether and when to upgrade their own systems. Hence, in reality, this application is unlikely to benefit from multicasting.
(d) Online photo editing has none of the characteristics that would benefit from multicasting – individuals interact with their own photographs at random times, with data flowing to the server as well as from it.
The advantage of using a persistent connection for HTTP is that it avoids the non-trivial overhead of establishing a new connection, bringing it up to speed and then dismantling it for every request. (Some overhead is incurred at the server, client and intermediate routers whenever a connection is opened. Minimizing connections has some other technical advantages. See the discussion at the beginning of section 8 of the HTTP specification if you know something about networks.) HTTP requests typically come in batches, because a Web page is built from a collection of resources – an XHTML document, one or more stylesheet files and script files, some images and possibly other embedded media – so retrieving a single page will cause a sequence of closely-spaced requests from a single client to the same server. Keeping the connection open throughout this sequence allows the page to load faster.
The disadvantage of using persistent connections in this way is that the server and client have to manage the connection. Both servers and clients need to be able to close a connection explicitly, and deal with unexpected closing of connections. The server has to close a connection down after a specified period of time if there is no activity on it. Without persistent connections, HTTP is a purely stateless protocol: once it has dealt with a single request, a server can discard all state information related to it. This makes for efficient servers. With persistent connections, there is some loss of efficiency and some additional complexity at the server end.
The increased efficiency of persistent connections is considered to outweigh their disadvantages, and persistent connections are now used by default by HTTP servers and clients.
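The saving in connections can be observed with a minimal Python sketch that runs a throwaway server on the loopback interface and counts how many connections are opened for a batch of requests. The handler, the function count_connections and the resource paths are all hypothetical, and the sketch only counts connections; it does not measure loading time.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class CountingHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets http.server keep connections open

    def setup(self):
        # One handler instance is created per connection, not per request.
        super().setup()
        self.server.connections_opened += 1

    def do_GET(self):
        body = ("resource " + self.path).encode("ascii")
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def count_connections(paths, persistent):
    """Fetch every path and return how many connections the server saw."""
    server = HTTPServer(("127.0.0.1", 0), CountingHandler)
    server.connections_opened = 0
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    conn = None
    try:
        for path in paths:
            if conn is None:
                conn = http.client.HTTPConnection(host, port, timeout=5)
            conn.request("GET", path)
            conn.getresponse().read()  # finish the response before reusing
            if not persistent:
                conn.close()           # one connection per request
                conn = None
        if conn is not None:
            conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return server.connections_opened


page_resources = ["/index.html", "/style.css", "/photo.jpg"]
print("persistent:", count_connections(page_resources, persistent=True))
print("one per request:", count_connections(page_resources, persistent=False))
```

With these three resources, the persistent case should report a single connection and the other case one connection per resource.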
- It makes sense to provide an Expires header for any page that is periodically updated with information that is time-sensitive. A particular example is a page containing a weather forecast, such as one of the BBC's local weather pages. The URL we have linked to always points to the forecast for Edinburgh for the current day. Once the weather has occurred, the page's content is obsolete, so an Expires header is included to ensure that browsers will always fetch a new copy of the page when the forecast in their cache is no longer valid. (The forecast pages that the BBC are using at the time of writing are actually very complicated collections of resources and it appears that JavaScript is used to fetch some of them dynamically, but if you look at the responses sent by the server, you will find several of them include Expires headers.) Other similar examples include news pages (again the BBC news home page is a good example – the Expires header matches the creation time, because the page may be updated with breaking news at any time), online shops' bargain pages, where these are limited to a certain time, and (in principle though not always in practice) rail or ferry timetables, which are only valid for a few months.
- The only purpose of RTSP is to control one or more streams of media data. The actual media data are transmitted using other protocols, often RTP. An RTSP client therefore needs to know some information about the actual streams before it can set up a session and start sending the messages that control the media streams. The presentation description contains this essential information. In particular, without the presentation description, the client does not know where to send subsequent requests, because the connection address is part of the presentation description.
- If an RSS feed is provided for a blog, the content of the blog entries (or summaries of them, if that is all that is included in the feed) can easily be extracted and presented in different ways. In particular, the contents of several feeds can be combined, either in a feed reader, or on a syndication service. The blog content can also be delivered through an email client. In the absence of an RSS feed, the information would have to be extracted by parsing the XHTML document for the blog's front page. To syndicate a collection of blogs, a program would need to know the structure of the markup used on each of them, which is not governed by any standard. In contrast, RSS (and Atom) present the information in a standard format which can easily be analyzed.
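How little work the standard format demands can be suggested by a short Python sketch that combines two feeds. The feeds, their titles and the example URLs are invented for illustration, and only the RSS 2.0 channel, item, title and link elements are used; a real aggregator would need to handle much more.

```python
import xml.etree.ElementTree as ET

# Two made-up RSS 2.0 feeds from different blogs.
FEED_A = """<rss version="2.0"><channel>
  <title>Blog A</title>
  <item><title>First post</title><link>http://a.example/1</link></item>
  <item><title>Second post</title><link>http://a.example/2</link></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel>
  <title>Blog B</title>
  <item><title>Hello</title><link>http://b.example/hello</link></item>
</channel></rss>"""


def entries(feed_xml):
    """Return (blog title, entry title, link) for every item in a feed."""
    channel = ET.fromstring(feed_xml).find("channel")
    blog = channel.findtext("title")
    return [(blog, item.findtext("title"), item.findtext("link"))
            for item in channel.findall("item")]


# The same function works for both feeds because the format is shared.
combined = entries(FEED_A) + entries(FEED_B)
for blog, title, link in combined:
    print(blog, "|", title, "|", link)
```

No knowledge of either blog's page markup is needed, which is the point made above.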
P2P file transfers may place additional loads on an ISP's network infrastructure in two ways. First, the culture surrounding P2P encourages or enables people to download large files, such as feature films, that otherwise they might not. Second, in a P2P network the ISP's customers should upload data equivalent to the data they download. All this increased data must flow through the ISP's network. As a result, the ISP will need to provide greater bandwidth, which will add to their costs and may reduce their profits.
ISP B is well-equipped to deal with this situation, because they are charging their customers for the bandwidth they actually use. ISP A's business model, though, relies on their customers never actually using the "unlimited bandwidth" which ISP A claims to provide. Hence ISP A is more likely to be opposed to P2P file sharing on economic grounds than ISP B, who may even encourage it, since they charge more per GB once users exceed their monthly allowance. (If ISP C were somehow able to distinguish P2P traffic and charge a premium for it, they might be even more enthusiastic about encouraging it.)
This is not the full story about the relationship between ISPs and P2P networking, though. It is possible that ISP B would express opposition to P2P file sharing for fear of being held responsible for any copyright abuses that took place via their networks. If ISP A wishes to continue charging a flat rate for unlimited access, it is likely that they will be opposed to any Internet application that leads to increased traffic, unless the price they need to pay for increased bandwidth is dropping at the same time.
The user shouldn't see much difference. When they arrive at the gallery, the page will load identically in both cases. When they click on a thumbnail, they will see the main image change. A properly written browser will only refresh the parts of the screen that change when a new page is loaded, so even if the gallery is built out of separate XHTML documents, the user should not see anything change except the main image. (We do not guarantee that every browser is written well enough for this to be the case. In a poorly written browser, the entire page will be redrawn in the case that the gallery is implemented as individual pages, but where the main image is replaced using JavaScript, it is the only thing that will change.) Because of caching, the delay while the new image is downloaded should be the same in both cases, although the JavaScript version might preload the images, in which case there will be no appreciable delay between images, whereas the version using separate pages will have to download one image every time a new one is displayed.
If the gallery is a set of pages, every time the user clicks a thumbnail to see a different full-sized image the URL in the address bar will change, and a new entry will be added to the browser's history. If JavaScript is used, this may not happen. Unless the script is written to manipulate the address bar and history list explicitly, there will only be a single URL for the whole gallery, no matter which main image is displayed. (This makes it impossible to bookmark a page for a specific image, which you can do if the pages are really separate documents.)
When describing the sequences of HTTP requests we assume that caching is enabled in the browser, but that the visitor has not visited the gallery before. We also assume that the only difference between pages displaying separate images is the image itself. It should be easy to see how matters will be changed if a caption must be updated at the same time.
Gallery as separate pages. When the user arrives at the gallery, an HTTP request for the XHTML document for the first gallery page is sent. After it is received, subsequent requests are sent for any stylesheets, all the image thumbnails and the main image that is displayed initially. When the user clicks a thumbnail, a request is sent for the XHTML document for the page that displays the corresponding image. After it is received, the browser will look in the cache and discover that requests for the stylesheets and image thumbnails can be met using the versions cached when the first page was loaded, so that only a request for the new main image needs to be sent. That is, for each page after the first, two requests are sent, one for an XHTML document, one for an image. If the user clicks a thumbnail for an image which they have viewed recently, all the requests will be met from the cache and none will be sent to the server.
JavaScript. When the user arrives at the gallery, an HTTP request for the XHTML document for the first gallery page is sent, as in the previous case. This will be followed by requests for the stylesheets and thumbnails, and also for the file containing the scripts used on the page. Depending on how the script is written, a single request may be sent for the first main image, or a collection of requests may be sent, one for every full-sized image in the gallery, so that these can be pre-loaded. When a user clicks on a thumbnail, in the first case a request for the full-sized image will be sent; in the second case, no request will be sent at all. So in this JavaScript version, at most one HTTP request is sent to the server when a new image is displayed.
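The request counts described above can be checked with a toy model in Python. It assumes a gallery with one stylesheet and four thumbnails, a cache that never expires, and that every resource is cacheable. All file names and function names are hypothetical.

```python
class Browser:
    """A toy browser that only sends a request for resources not yet cached."""

    def __init__(self):
        self.cache = set()
        self.requests = []

    def load(self, urls):
        """Fetch the given resources; return how many requests were sent."""
        sent = 0
        for url in urls:
            if url not in self.cache:
                self.cache.add(url)
                self.requests.append(url)
                sent += 1
        return sent


# Resources shared by every view of the gallery.
SHARED = ["style.css"] + ["thumb%d.jpg" % i for i in range(1, 5)]


def view_page(browser, n):
    """Separate-pages version: each image has its own XHTML document."""
    return browser.load(["page%d.html" % n] + SHARED + ["image%d.jpg" % n])


def open_scripted_gallery(browser, preload):
    """JavaScript version: one document plus a script file."""
    images = ["image%d.jpg" % i for i in range(1, 5)] if preload else ["image1.jpg"]
    return browser.load(["gallery.html", "gallery.js"] + SHARED + images)


def click_scripted_thumbnail(browser, n):
    """JavaScript version: only the main image is replaced."""
    return browser.load(["image%d.jpg" % n])


pages = Browser()
print("pages, first visit:", view_page(pages, 1))
print("pages, new image:", view_page(pages, 2))
print("pages, image seen before:", view_page(pages, 1))

scripted = Browser()
print("script, first visit:", open_scripted_gallery(scripted, preload=False))
print("script, new image:", click_scripted_thumbnail(scripted, 2))

preloading = Browser()
print("preloading script, first visit:", open_scripted_gallery(preloading, preload=True))
print("preloading script, new image:", click_scripted_thumbnail(preloading, 2))
```

Under these assumptions the separate-pages version sends two requests for each new image and none for one already viewed, while the scripted version sends one request, or none if the images were preloaded.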
Discussion Topics: Hints and Tips
Sometimes it seems like every program aspires to send HTTP requests. You should be able to find examples of at least three types.
Programs may have restricted browsers embedded in them. It has become easy to embed the working parts of a browser in other programs, using, for example, WebKit. This allows any program to provide an interface to a Web site. A typical example is iTunes, which allows you to connect to the iTunes store (a Web application) from inside the iTunes program using an embedded WebKit instance.
Special-purpose clients for some types of Web service use HTTP to pass data (often XML or JSON) but don't render it as a browser would. Typical examples of this sort of program would be desktop feed readers, such as NetNewsWire or FeedDemon, or programs for uploading images to Flickr or monitoring auctions on eBay.
Increasingly, programs whose main purpose is unrelated to Web browsing access information from the Web over HTTP. Up-to-date help information, often with "community-generated" content can often be accessed from inside a program. Adobe's CS4 Creative Suite programs can incorporate Flash-based panels that can send and receive HTTP messages to access services such as Kuler (described in Chapter 12).
There are also some special-purpose programs that send HTTP requests but do not fit into the categories described. One example is a link checker, used by Web developers.
- The Internet2 project supports peering arrangements for multicast. You may find some useful material on this topic by looking at the relevant background to this project.
- To answer this question, think about what information is contained in the headers. Try sending some HEAD requests and seeing what comes back. If you still can't see the point, read the relevant section of the HTTP specification, then see if you can think up some concrete examples based on the hints there.
- This should be obvious. But here is the identification string for Safari 4β. Does it look as if it will help with the worst problem?
Mozilla/5.0 (Macintosh; U; PPC Mac OS X 10_5_6; en-us) AppleWebKit/528.16 (KHTML, like Gecko) Version/4.0 Safari/528.16
- Development of P2P streaming is advancing at a fast rate, so instead of merely discussing its feasibility you need to look at the problems that "traditional" BitTorrent transfers face with streaming media and how these are being overcome, for example in the P2PNext project. (You might start with this news story that provides an introduction to this project, or this short report.) Do you think P2PNext will succeed?
- You will find it helpful to go back to Chapter 6 and think about the likely patterns of data resulting from compressing a video stream.
Practical Tasks: Hints and Tips
- For Firefox, the Live HTTP Headers add-on is recommended. In Safari 4, you can use the built-in Web Inspector to look at request and response headers. The Developer Tools in Internet Explorer 8 do not provide the facility to inspect HTTP headers, so a third-party tool will be required. (We cannot recommend any specific tools for IE; we suggest you use Firefox for this job instead.) There are some Web services that will show you HTTP headers if you cannot find a satisfactory way of viewing them in your browser.
- If you do not have any suitable software, there are Web-based services for creating podcasts.
- Implementation of this task will probably be too difficult unless you are also studying Web development and are able to put together a team to create the application. The exercise is still worth doing if you only get as far as a design or mock-up.