eScience Lectures Notes : The role of Digitisation

Slide 1 : 1/28 : The role of Digitisation (index.en.html)

COMP1710 Tools for New Media and Web


The role of Digitisation


Slide 2 : ToC : The role of Digitisation (tableOfContent.en.html)

Table of Contents (28 slides) for the presentation :

The role of Digitisation

Slide 3 : 3/28 : New Media and Web (intro.en.html)

In this lecture: The role of Digitisation

To read more about this subject :

"The Web Wizard's Guide to Multimedia" by James G. Lengel,
published by Addison-Wesley


Slide 4 : 4/28 : Communication by multimedia (communication.en.html)

Communication by multimedia

A computer connected to the internet allows us to

provide data to any of the human senses (though not all are equally available today)

Multimedia is the processing and presentation of information in a more structured and understandable manner, using more than one medium such as text, graphics, animation, audio and video. Multimedia products can thus be an academic or corporate presentation, a game, an information kiosk, a fashion-design tool, etc.

Multimedia systems are the computer platforms and software tools that support the interactive use of text, graphics, animation, audio or motion video. In other words, a computer capable of handling text, graphics, audio, animation and video is called a multimedia computer. If the user can control the sequence and timing of these media elements, one can call it Interactive Multimedia.


Slide 5 : 5/28 : 5, 6, 7 ... senses (senses.en.html)

Five senses (but there are 7 or more)




The ability to hear; the auditory faculty; SYN. audition, auditory sense, sense of hearing, auditory modality.




The ability to see; the faculty of vision; SYN. vision, visual sense, visual modality.


surface / temperature


The faculty of touch; SYN. sense of touch, skin senses, touch modality, cutaneous senses.




The faculty of smell; SYN. sense of smell, olfaction, olfactory modality.


savour, flavour


The faculty of taste; SYN. gustation, sense of taste, gustatory modality.


position, movement, muscular tensions


The perception of body position and movement and muscular tension etc; SYN: kinaesthesia, feeling of movement


balance, acceleration, position, location, orientation, movement of the body


The ability to sense the position and location and orientation and movement of the body and its parts.



Slide 6 : 6/28 : Multimedia before Digitisation (beforeDigitization.en.html)

Multimedia before Digitisation

Each form of human communication had its own technology and its own channel

They evolved separately due to

into different industries:


Slide 7 : 7/28 : Specialized Systems (specialazedDevices.en.html)

Specialized Systems

E.g. at home : a long list of different media appliances

Radio, telephone, tape player, television, VCR, CD player, slide projector, newspapers, books ...

and outside : cinema, theater, concert halls, restaurants


Slide 8 : 8/28 : Multimedia before Digitisation (2) (communication2.en.html)

Multimedia before Digitisation (2)

Old media cannot offer the same range of forms as the internet

Other Mass media are essentially one-way

Main diffusion principle : broadcasting (one to many)

Broadcasting + specialized

Older forms of human communication are more interactive than most modern media.

One issue : the scale

Slide 9 : 9/28 : Multimedia after Digitisation (2) (communication3.en.html)

Multimedia after Digitisation

Media After

Slide 10 : 10/28 : Digital Revolution (digitalRevolution.en.html)

Digital Revolution

When they are digitized, all the different types of media can be saved in a digital computer file

A single medium stores text, voice, video, images and music and the computer can play all of them back, with high quality, at the same time

Convergence trends

N.B. : same medium, but still different usages !

Slide 11 : 11/28 : Two steps : From the physical world (twoSteps.en.html)

Two steps

From the physical world ...

Analog Signal

Light Intensity and Wavelength


How do we represent an analog signal in a computer ?

Basic problem is that we need to represent a function, which mathematically can represent an infinite amount of information, with a finite number of symbols.


Slide 12 : 12/28 : Specialized Systems (twoSteps2.en.html)

Two steps : ... to the digital world

Digitisation (Sampling) : Discretisation in space or time

Sampling pattern : image space is tessellated into discrete, local, compact, regions (regular rectangular planar grid)

Sampling process : Point in neighbourhood, Average over neighbourhood

Nyquist Criterion : sample at a rate of at least twice the highest frequency contained in the signal of interest

Sampling + Discretisation

Sound : Rate (44.1 kHz) and Size (8 or 16 bits)

Image : Resolution (300 dpi) and number of colours

Quantisation : Discretisation in Value

The function can take on only finitely many values

E.g. for images 3 general domains :

See for more on Bit Depth
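
The two steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 1 kHz tone, the 8 kHz rate and the function names `sample` and `quantise` are made up for the example and do not come from the lecture. The rate respects the Nyquist criterion (8 kHz is more than twice 1 kHz).

```python
import math

def sample(signal, duration_s, rate_hz):
    """Discretisation in time: evaluate the signal at evenly spaced instants."""
    count = round(duration_s * rate_hz)
    return [signal(i / rate_hz) for i in range(count)]

def quantise(value, bits):
    """Discretisation in value: map a value in [-1, 1] to one of 2**bits codes."""
    levels = 2 ** bits
    clipped = max(-1.0, min(1.0, value))
    code = int((clipped + 1.0) / 2.0 * levels)
    return min(levels - 1, code)  # keep +1.0 inside the top level

def tone(t):
    """Hypothetical analog signal: a 1 kHz sine wave."""
    return math.sin(2 * math.pi * 1000 * t)

samples = sample(tone, 0.001, 8000)          # 1 ms of signal at 8 kHz
codes = [quantise(s, 8) for s in samples]    # 8-bit sample size
print(len(samples), codes)
```

After both steps the signal is a finite list of integers, each of which fits in 8 bits.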

Slide 13 : 13/28 : Illustration of the Digitisation (digitisationEx.en.html)

Illustration of the Digitisation :

From 600x400, 32 bits image (72 dpi : dots per inch)

(240 000 pixels)

original Image

To 120x80, 32 bits image

Low Resolution

and filling the same space (14.4dpi):
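
A minimal sketch of the "average over neighbourhood" sampling process, assuming a greyscale image stored as a list of rows. The gradient test image and the function name `downsample` are hypothetical. Going from 72 dpi to 14.4 dpi corresponds to a factor of 5 in each direction.

```python
def downsample(image, factor):
    """Replace each factor x factor block of pixels by its average value."""
    height, width = len(image), len(image[0])
    result = []
    for top in range(0, height - factor + 1, factor):
        row = []
        for left in range(0, width - factor + 1, factor):
            block = [image[y][x]
                     for y in range(top, top + factor)
                     for x in range(left, left + factor)]
            row.append(sum(block) // len(block))
        result.append(row)
    return result

# Hypothetical image: 400 rows of 600 pixels with a horizontal gradient.
image = [[x % 256 for x in range(600)] for _ in range(400)]
small = downsample(image, 5)
print(len(small[0]), "x", len(small))  # width x height of the reduced image
```

Each output pixel summarises 25 input pixels, so the detail inside a block is lost for good.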


Slide 14 : 14/28 : Illustration of the Quantization : (quantisationEx.en.html)

Illustration of the Quantisation :

From 600x400, 32 bits image (72 dpi : dots per inch)

original Image

To 600x400, 4 bits image (16 colors)

To 600x400, 2 bits image (4 colors)

Slide 15 : 15/28 : Network Bandwidth (bandwidth.en.html)

Network Bandwidth

The rate at which the network can deliver data to the destination point

The amount of data that can be transmitted over a network in a fixed amount of time. Bandwidth is the fundamental networking parameter, and is usually measured in kilobits, megabits or gigabits per second (Kbps, Mbps, or Gbps).

Rate of transfer

Available bandwidth determined by wire and hardware

You may have High-Bandwidth and bad (high) latency (eg. Satellite)

Slide 16 : 16/28 : The Role of Bandwidth (bandwidthRole.en.html)

The Role of Bandwidth

The size of this included image is 84kB

84 kB = 84 kilo Bytes = 84 * 1024 * 8 = 688 128 bits

1kB = 1024 B   / 1 Bytes = 8 bits

The time to transfer the image = size / bandwidth

by a modem at 56 kbps = 56 000 bit per second

time = 688 128 / 56 000 = 12.29 seconds

on TransACT "broadband" : 688 128 / 512 000 = 1.3 s
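
The same arithmetic as a small Python sketch (the function name `transfer_time_s` is only illustrative, and protocol overheads are ignored):

```python
def transfer_time_s(size_kb, bandwidth_bps):
    """Time = size / bandwidth, with 1 kB = 1024 bytes and 1 byte = 8 bits."""
    size_bits = size_kb * 1024 * 8
    return size_bits / bandwidth_bps

print(round(transfer_time_s(84, 56_000), 2))    # 56 kbps modem: 12.29
print(round(transfer_time_s(84, 512_000), 2))   # 512 kbps broadband: 1.34
```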

Slide 17 : 17/28 : Bandwidth Levels (bandwidthLevels.en.html)

Bandwidth Levels

Type of Connection                         | Bandwidth | What you get in 1 second     | Or live streaming
old modem                                  | 9600 bps  | small email ~ 1.2 kB         | irc / text / telnet
modem                                      | 56 kbps   | web graphic ~ 7 kB           | audio
ISDN (Integrated Services Digital Network) | 128 kbps  | 2 web graphics ~ 15 kB       | videoconference, one-to-one
DSL / Cable Modem                          | 512 kbps  | 1 jpeg image 600x400 ~ 62 kB | 300 kbps = very useful video (cable, ADSL)
near future DSL / Cable Modem              | 1 Mbps    | Document ~ 125 kB            | 1500 kbps, 2.2 Mbps = VHS video
WIFI                                       | 54 Mbps   |                              |
                                           | 10 Mbps   | 1 floppy disk ~ 1.25 MB      | 6 Mbps = PAL video
ethernet                                   | 100 Mbps  | 2 MP3 songs ~ 12.5 MB        | 20 Mbps = compr. HDTV
ethernet                                   | 1 Gbps    | 10 min of CD audio ~ 125 MB  | 270 Mbps = raw PAL video
                                           | 10 Gbps   | 2 CDs ~ 1.25 GB              | 1.5 Gbps = raw HDTV
                                           | 100 Gbps  | 2 DVDs ~ 12.5 GB             | 1 Tbps = 50,000 channels of compressed HDTV

NB. : Mbps = 1000 x 1000 bits per second, kbps = 1000 bps, Gbps = 1000 Mbps   -- minus overheads !

MB/s (Megabytes/s) : 1024x1024 bytes per second

The standard for carriers and networks is that Mbps means 1000x1000 bits per second (and Gbps means 1000x1000x1000). That is also the transport rate, not the payload rate, so you need to allow for the overheads of whatever protocols you are using (e.g. TCP/IP/ATM/SDH : you lose a lot of payload bandwidth that way).
Conversely, if somebody quotes MB/s (Megabytes/s) they usually mean 1024x1024 bytes per second.

Back in the bad old days, a 1 Megabyte floppy was 1024x1000 bytes !
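
To see where the "what you get in 1 second" figures come from, here is a small sketch. The function name `kb_per_second` and its `bytes_per_kb` parameter are assumptions made for the example; the second parameter only switches between the 1000 and 1024 conventions discussed above. Overheads are ignored.

```python
def kb_per_second(bits_per_second, bytes_per_kb=1024):
    """Convert a line rate in bits per second to kilobytes per second."""
    return bits_per_second / 8 / bytes_per_kb

print(kb_per_second(512_000))          # 62.5 (1 kB = 1024 bytes)
print(kb_per_second(1_000_000, 1000))  # 125.0 (1 kB = 1000 bytes)
print(round(kb_per_second(56_000), 1))
```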

Slide 18 : 18/28 : Network Bandwidth (2) (bandwidth2.en.html)

Network Bandwidth (2)

Slide 19 : 19/28 : Data Compression : Why ? (compressionWhy.en.html)

Data Compression : Why ?

Let's take an image ..."The size of this included image is 84 kB"

But 600 x 400 pixels x 32 bits per pixel (of which 24 bits encode the colour : 2^24 = about 16.7 million colours) = 7 680 000 bits = 937.5 kB, i.e. nearly 1 MB !!!

No longer 12 s but more than 2 minutes (about 137 s) on a 56 kbps modem !

84 kB is the compressed image
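
A short sketch of the numbers behind this slide. The variable names are illustrative; the compression ratio at the end is simply derived from the two sizes quoted above.

```python
width, height, bits_per_pixel = 600, 400, 32

raw_bits = width * height * bits_per_pixel   # uncompressed size in bits
raw_kb = raw_bits / 8 / 1024                 # with 1 kB = 1024 bytes
modem_seconds = raw_bits / 56_000            # 56 kbps modem, no overheads
ratio = raw_kb / 84                          # raw size vs. the 84 kB file

print(raw_bits, raw_kb, round(modem_seconds), round(ratio, 1))
```

With these figures the compressed file is roughly eleven times smaller than the raw pixel data.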

Compare your lab directory size and the zipped equivalent ... another type of compression

Different types of compression use different codecs : compressor / decompressor software routines

Same Image

Slide 20 : 20/28 : Finding Redundancy (compressionRedundancy.en.html)

Data Compression : Finding Redundancy

Most types of computer files are fairly redundant -- they have the same information listed over and over again.

File-compression programs list information once and then refer back to it whenever it appears in the original file.

In John F. Kennedy's 1961 inaugural address :

"Ask not what your country can do for you -- ask what you can do for your country."

17 words, made up of 61 letters, 16 spaces, one dash and one period : total file size of 79 units.

"ask", "what", "your", "country", "can", "do", "for", "you" appear twice



Slide 21 : 21/28 : Looking it Up (compressionLooItUp.en.html)

Data Compression : Looking it Up

Most compression programs use a variation of the LZ adaptive dictionary-based algorithm to shrink files.

"LZ" refers to Lempel and Ziv, the algorithm's creators

"dictionary" refers to the method of cataloging pieces of data.

Our Dictionary :

  1. ask
  2. what
  3. your
  4. country
  5. can
  6. do
  7. for
  8. you

The compressed sentence : from "Ask not what your country can do for you; ask what you can do for your country" (79 char) to ...

"1 not 2 3 4 5 6 7 8; 1 2 8 5 6 7 3 4"

36 + 36 (dictionary 29+7) = 72
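
The word substitution above can be reproduced with a few lines of Python. This is a toy sketch, not how a real LZ compressor works: the function name `compress_words` is made up, only the ";" punctuation of this sentence is handled, and the capital letter of "Ask" is lost in the process.

```python
def compress_words(text, dictionary):
    """Replace every dictionary word by its 1-based position in the dictionary."""
    index = {word: str(i + 1) for i, word in enumerate(dictionary)}
    encoded = []
    for token in text.split(" "):
        core = token.rstrip(";")      # set trailing punctuation aside
        suffix = token[len(core):]
        encoded.append(index.get(core.lower(), core) + suffix)
    return " ".join(encoded)

dictionary = ["ask", "what", "your", "country", "can", "do", "for", "you"]
sentence = ("Ask not what your country can do for you; "
            "ask what you can do for your country")

encoded = compress_words(sentence, dictionary)
dictionary_size = len(" ".join(dictionary))   # 29 letters + 7 separators

print(encoded)
print(len(encoded), "+", dictionary_size, "=", len(encoded) + dictionary_size)
```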

"Ask not what your country can do for you; 1 3 9 6 7 8 4 5" = 57



Slide 22 : 22/28 : Searching for Patterns (compressionPatterns.en.html)

Data Compression : Searching for Patterns

a compression program doesn't have any concept of separate words : it only looks for patterns.

Pattern : combination of characters that is repeated within the sentence

From simple pattern ( "ou" in "your" and "country" ) to more than one word ( "can do for you" )

The ability to rewrite the dictionary is the "adaptive" part of LZ adaptive dictionary-based algorithm.

The way a program actually does this is fairly complicated, as you can see by the discussions on
No matter what specific method you use, this in-depth searching system lets you compress the file much more efficiently than you could by just picking out words.

Using the patterns we picked out above, and adding "_" for spaces, we come up with this larger dictionary:

  1. ask_
  2. what_
  3. you
  4. r_country
  5. _can_do_for_you


Sentence 16 units + dictionary  40 units = 56 units!
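
As a check of those counts, here is a sketch that expands a 16-unit encoded sentence with the pattern dictionary. The encoded string "1not_2345;_12354" is a reconstruction that is consistent with the dictionary and with the 16 units quoted above; it is an assumption, not taken from the slide.

```python
patterns = {
    "1": "ask_",
    "2": "what_",
    "3": "you",
    "4": "r_country",
    "5": "_can_do_for_you",
}

encoded = "1not_2345;_12354"   # assumed encoding, 16 units

# Each digit is looked up in the dictionary, anything else is copied as is.
decoded = "".join(patterns.get(ch, ch) for ch in encoded)

# Dictionary cost: the patterns themselves plus one separator between entries.
dictionary_units = sum(len(p) for p in patterns.values()) + len(patterns) - 1

print(decoded.replace("_", " "))
print(len(encoded), "+", dictionary_units, "=", len(encoded) + dictionary_units)
```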



Slide 23 : 23/28 : Data Compression : Why ? (compressionLose.en.html)

Data Compression : Lossy and Lossless

Lossless compression lets you recreate the original file exactly

LZ adaptive dictionary-based algorithm is a well known example

Breaking a file into a "smaller" form for transmission or storage and then putting it back together on the other end so it can be used again.

works well ( good "file-reduction ratio" ) with text files and program source code

far less efficient with complex data like sound or bitmap pictures

Lossy compression eliminates "unnecessary" bits of information

E.g. : the sky in a picture is blue, but most of the pixels are slightly different shades of blue.

The compression codec would choose an average blue and apply it to the pixels not too far from that average value.

No way to get the lost information back after such an alteration

You are not supposed to notice the change

This sort of compression can't be used for anything that needs to be reproduced exactly
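
The difference can be sketched with Python's standard `zlib` module (DEFLATE, which combines an LZ77-style dictionary method with Huffman coding) for the lossless side, and a toy averaging function for the lossy side. The function `flatten_blues`, the tolerance and the pixel values are hypothetical and only illustrate the "average blue" idea.

```python
import zlib

# Lossless: the original bytes come back exactly.
text = b"ask what you can do for your country " * 20
packed = zlib.compress(text)
restored = zlib.decompress(packed)
print(len(text), "->", len(packed), restored == text)

def flatten_blues(values, tolerance):
    """Lossy: replace every value close to the average by the average itself."""
    average = round(sum(values) / len(values))
    return [average if abs(v - average) <= tolerance else v for v in values]

sky = [200, 203, 198, 201, 198]    # hypothetical blue values of five pixels
flattened = flatten_blues(sky, 5)
print(flattened)                   # the individual shades cannot be recovered
```

The flattened list is far more repetitive, which is exactly what a dictionary-based method compresses well afterwards.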



Slide 24 : 24/28 : principles for Lossy compression (compressionLosy.en.html)

Different principles for Lossy compression

What is "unnecessary" ?

Averaging ( + repetition/pattern ) (clever forced digitisation)

Range Reduction : from 32 bits to 16 bits, to 8 bits (clever forced quantisation)

issues : classical music with a wide dynamic range ( Orff - Carmina Burana : O Fortuna, Ravel's Boléro ) / subtleties in high-contrast pictures

Variation in Quantisation


Use the way human perception works : more receptive to high frequencies

E.g. : we see moving objects better, and we notice the edges of things more than their centres

Frame difference Compression

In video, send only the information about what changed since a key frame (e.g. a head talking in front of a fixed background)
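
A minimal sketch of the frame-difference idea, assuming each frame is a flat list of pixel values. The function names and the pixel data are made up for the illustration; real video codecs are far more elaborate.

```python
def frame_delta(key_frame, frame):
    """Keep only the pixels that differ from the key frame: {position: new value}."""
    return {i: new for i, (old, new) in enumerate(zip(key_frame, frame)) if old != new}

def apply_delta(key_frame, delta):
    """Rebuild the full frame from the key frame and the stored differences."""
    frame = list(key_frame)
    for position, value in delta.items():
        frame[position] = value
    return frame

key_frame  = [10, 10, 10, 50, 50, 10, 10, 10]   # fixed background, "head" in the middle
next_frame = [10, 10, 10, 52, 55, 10, 10, 10]   # only the head moved

delta = frame_delta(key_frame, next_frame)
print(delta)                                     # 2 entries instead of 8 pixels
print(apply_delta(key_frame, delta) == next_frame)
```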

Slide 25 : 25/28 : File Format (fileFormat.en.html)

File Format

Here is some music -- or is it an image ? or some text ? or some video ?

Only Matrix surfers are able to recognise it, or the computer

00101100110100011101010101000101010000100100100 01010001010101011110111111101101011010101010101 00101010010101010101010101010100100100111110001 ...

A format and a header for each file

The file suffix or the MIME type indicates the type of content (the format); the file itself then starts with a header that gives more information on how to read the data

If a file was a map, the format would be the legend and the header the scale and the direction of North

HTML, GIF, JPEG, MOV etc. are file formats (for data, encapsulation of data)
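
As an illustration of what a header is for, here is a sketch that guesses a format from the first bytes of a file. The signatures listed are the well-known starting bytes of PNG, GIF and JPEG files; the function name `guess_format` and the short list itself are only an example, not a complete detector.

```python
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"\xff\xd8\xff", "JPEG image"),
]

def guess_format(data):
    """Compare the first bytes of the data with known header signatures."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return "unknown"

print(guess_format(b"GIF89a" + bytes(10)))
print(guess_format(b"\x89PNG\r\n\x1a\n" + bytes(10)))
print(guess_format(b"just some text"))
```

In practice you would pass the first few bytes read from a file opened in binary mode.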

Slide 26 : 26/28 : When a file format becomes a standard (standards.en.html)

When a file format becomes a Standard

Anybody can define a new format -- but not all formats become Standards

A need for it

A description of the structure of the files

Some tools to produce it

Some code to read/use it (library, plugin etc)

Some users and developers to adopt it

Some standards organization(s) to recognise it

ISO (International Organization for Standardization), IEEE (Institute of Electrical and Electronics Engineers), IETF (Internet Engineering Task Force), W3C (World Wide Web Consortium), ECMA (European Computer Manufacturers Association, now Ecma International).

Slide 27 : 27/28 : Open vs proprietary (OpenStandards.en.html)

Open vs proprietary

Real standards are open standards : documentation and basic code should be royalty-free, with no patent in the way.

Proprietary formats may sometimes be considered "de facto standards", but they are not real standards

Proprietary formats and patents are a threat to the free access to your own data !

GIF vs PNG : patents (Unisys and IBM) cover the LZW compression algorithm, which is used to create GIF files

The Unisys patent expired on 20 June 2003 in the USA, on 18 June 2004 in Europe, on 20 June 2004 in Japan and on 7 July 2004 in Canada. The U.S. IBM patent expires on 11 August 2006.

GIF Image Format (Unisys), Hyperlinking/Hypertext (British Telecom), JPEG (Forgent Networks), MPEG-4 (ISO/IEC JTC 1/SC 29/WG 11), W3C P3P (Intermind), RDF (Unified Data Technologies, Ltd.), Rights Expression Language (ContentGuard's XrML), Stylesheets: CSS, XSL (Microsoft), XPointer (Sun Microsystems)

A vicious way to enter open standards : RAND "reasonable and non-discriminatory" fees

This week's reading :

Patents and Open Standards :


Slide 28 : 28/28 : Open vs proprietary (chosingFormat.en.html)

Choosing a format : parameters to take into account

Don't use a new medium just because it is new : choose it reluctantly, because you really need it, and make sure that your web site is still usable without access to that latest trendy next-internet-revolution...

Did I already tell you that all the computers on the net are different ?

Display Size : From 640x480 to 1600x1200 pixels

Processor speed (important to decompress video or run Java animations)

Video System (integration of multiple video, 3D)

User Knowledge

User Willingness

System Software

Browser type and settings

Network Configurations and Firewalls


Computer Platform