This page reproduces the content of http://www.slideshare.net/vishalsuri007/lossless.


- Lossless Compression

CIS 658

Multimedia Computing

- Compression

• Compression: the process of coding that will effectively reduce the total number of bits needed to represent certain information.

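• There are two main categories: lossless and lossy.

• Compression ratio: $B_0 / B_1$, where $B_0$ is the number of bits before compression and $B_1$ the number of bits after compression.

- Information Theory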

• We define the entropy of an information source with alphabet $S = \{s_1, s_2, \ldots, s_n\}$ as

$$H(S) = \sum_{i=1}^{n} p_i \log_2 \frac{1}{p_i}$$

• $p_i$ is the probability that $s_i$ occurs in the source, and $\log_2 \frac{1}{p_i}$ is the amount of information (in bits) carried by $s_i$.

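• Figure (a) has the maximum entropy: with all 256 values equally likely ($p_i = 1/256$), $H = 256 \cdot \frac{1}{256} \log_2 256 = 8$ bits.

• Any other distribution over the same 256 values has lower entropy.

- Entropy and Code Length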
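The entropy formula from the Information Theory slides can be evaluated directly. The minimal Python sketch below assumes the probabilities are given as a list that sums to 1; the function name `entropy` is hypothetical. It reproduces the 8-bit value for 256 equally likely symbols and confirms that a skewed distribution over the same 256 values has lower entropy.

```python
from math import log2

def entropy(probs):
    """Entropy in bits per symbol: the sum of p * log2(1/p) over all symbols.

    Zero-probability symbols are skipped (p * log2(1/p) tends to 0 as p -> 0).
    """
    return sum(p * log2(1 / p) for p in probs if p > 0)

uniform = [1 / 256] * 256              # every one of 256 values equally likely
skewed = [0.5] + [0.5 / 255] * 255     # one value takes half of the probability mass

print(entropy(uniform))                # 8.0
print(entropy(skewed) < entropy(uniform))  # True
```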

• The entropy gives a lower bound on the average number of bits needed to code a symbol in the alphabet:

$$H(S) \le \bar{l}$$

where $H(S)$ is the entropy of the source and $\bar{l}$ is the average bit length of the codewords produced by the encoder, assuming a memoryless source.

- Run-Length Coding
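A minimal Python sketch of run-length coding, in which each run of identical symbols (possibly of length one) becomes a (run-length, symbol) pair. Assumptions of this sketch: the input is a sequence of hashable symbols, an image is a list of rows, and the helper names `rle_encode`, `rle_decode` and `rle_encode_image` are hypothetical. Runs restart at every row, so no run is longer than one row.

```python
from itertools import groupby

def rle_encode(symbols):
    """Replace each run of identical symbols by a (run_length, symbol) pair."""
    return [(len(list(group)), sym) for sym, group in groupby(symbols)]

def rle_decode(pairs):
    """Expand (run_length, symbol) pairs back into the original sequence."""
    out = []
    for length, sym in pairs:
        out.extend([sym] * length)
    return out

def rle_encode_image(rows):
    """Encode an image row by row, so no run extends past the end of a row."""
    return [rle_encode(row) for row in rows]

data = "AAAABBBCCD"
pairs = rle_encode(data)
print(pairs)                       # [(4, 'A'), (3, 'B'), (2, 'C'), (1, 'D')]
print("".join(rle_decode(pairs)))  # AAAABBBCCD

image = [[0, 0, 0, 1],
         [1, 1, 1, 1]]
print(rle_encode_image(image))     # [[(3, 0), (1, 1)], [(4, 1)]]
```

The trailing 1 of the first row and the 1s of the second row stay in separate runs because encoding restarts at each row.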

• Run-length coding is a very widely used and simple compression technique which does not assume a memoryless source.

We replace runs of symbols (possibly of length one) with pairs of (run-length, symbol).

For images, the maximum run-length is the size of a row.

- Variable Length Coding

• A number of compression techniques are based on the entropy ideas seen previously.

• These are known as entropy coding or variable-length coding.

The number of bits used to code symbols in the alphabet is variable.

Two famous entropy coding techniques are Huffman coding and arithmetic coding.

- Huffman Coding

• Huffman coding constructs a binary tree starting with the probabilities of each symbol in the alphabet.

The tree is built in a bottom-up manner.

The tree is then used to find the codeword for each symbol.

An algorithm for finding the Huffman code for a given alphabet with associated probabilities is given in the following slide.

- Huffman Coding Algorithm

1. Initialization: Put all symbols on a list sorted according to their frequency counts.

2. Repeat until the list has only one symbol left:

   a. From the list pick two symbols with the lowest frequency counts. Form a Huffman subtree that has these two symbols as child nodes and create a parent node.

   b. Assign the sum of the children's frequency counts to the parent and insert it into the list such that the order is maintained.

   c. Delete the children from the list.

3. Assign a codeword for each leaf based on the path from the root.
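The algorithm above can be sketched in Python. As an assumption of this sketch, a binary heap (`heapq`) stands in for the sorted list: popping twice yields the two lowest counts, and pushing the parent keeps the order maintained. Symbols are assumed to be strings, and labelling left branches 0 and right branches 1 is a convention chosen here, not something the algorithm prescribes. The name `huffman_code` is hypothetical.

```python
import heapq
from itertools import count

def huffman_code(freqs):
    """Build a Huffman code from {symbol: frequency count}; symbols are assumed to be strings."""
    tie = count()  # unique tie-breaker so the heap never compares tree nodes directly
    # 1. Initialization: every symbol in a priority queue ordered by frequency count.
    heap = [(f, next(tie), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        # Degenerate one-symbol alphabet: give the lone symbol a 1-bit codeword.
        return {heap[0][2]: "0"}
    # 2. Repeat until only one node is left.
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # a. lowest count (popping also deletes it: step c)
        f2, _, right = heapq.heappop(heap)   # a. second-lowest count
        heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))  # b. parent gets the sum
    # 3. Assign codewords from the root-to-leaf path (0 = left branch, 1 = right branch).
    codes = {}
    def walk(node, path):
        if isinstance(node, tuple):          # internal node: (left, right)
            walk(node[0], path + "0")
            walk(node[1], path + "1")
        else:                                # leaf: a symbol
            codes[node] = path
    walk(heap[0][2], "")
    return codes

freqs = {"a": 4, "b": 2, "c": 1, "d": 1}
codes = huffman_code(freqs)
print(codes)  # {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
avg = sum(freqs[s] * len(codes[s]) for s in freqs) / sum(freqs.values())
print(avg)    # 1.75
```

These counts correspond to probabilities 1/2, 1/4, 1/8 and 1/8, whose entropy is also 1.75 bits. The average code length meets the entropy lower bound exactly here because every probability is a power of 1/2.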
- Properties of Huffman Codes
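A short Python sketch of why the prefix property makes decoding unambiguous. The decoder reads bits left to right and emits a symbol as soon as the accumulated bits match a codeword, because no longer codeword can start with them. The code table is given explicitly and is of the kind the Huffman algorithm produces. `is_prefix_free` and `decode` are hypothetical helper names.

```python
def is_prefix_free(codes):
    """True if no codeword is a prefix of another codeword."""
    words = sorted(codes.values())
    # In sorted order, a codeword that is a prefix of any later word is a prefix of its immediate successor.
    return all(not nxt.startswith(cur) for cur, nxt in zip(words, words[1:]))

def decode(bits, codes):
    """Decode a string of '0'/'1' characters with a prefix-free code {symbol: codeword}."""
    lookup = {word: sym for sym, word in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:        # a match is final: no other codeword extends it
            out.append(lookup[current])
            current = ""
    if current:
        raise ValueError("input ends in the middle of a codeword")
    return out

codes = {"a": "0", "b": "10", "c": "110", "d": "111"}
print(is_prefix_free(codes))                  # True
print(decode("0101100111", codes))            # ['a', 'b', 'c', 'a', 'd']
print(is_prefix_free({"x": "0", "y": "01"}))  # False: "0" is a prefix of "01"
```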

• No Huffman codeword is a prefix of any other Huffman codeword, so decoding is unambiguous.

• The Huffman coding technique is optimal among codes that assign a whole number of bits to each symbol (but we must know the probabilities of each symbol for this to be true).

• Symbols that occur more frequently have shorter (or equal-length) Huffman codes.

- Huffman Coding
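A Python sketch of the extended Huffman idea. It computes Huffman code lengths for blocks of k symbols and reports the average number of bits per original symbol. The sketch assumes a memoryless source, so a block's probability is the product of its symbols' probabilities. The two-symbol source with probabilities 0.8 and 0.2 is a made-up example, and the helper names are hypothetical. Plain Huffman coding must spend at least 1 bit per symbol on this source, while coding pairs moves the average towards the entropy of about 0.72 bits.

```python
import heapq
import itertools
from math import prod

def huffman_lengths(probs):
    """Return {symbol: codeword length} of a Huffman code for {symbol: probability}."""
    lengths = {s: 0 for s in probs}
    tie = itertools.count()  # tie-breaker so the lists of symbols are never compared
    heap = [(p, next(tie), [s]) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, group1 = heapq.heappop(heap)
        p2, _, group2 = heapq.heappop(heap)
        for s in group1 + group2:
            lengths[s] += 1  # each merge adds one bit to every leaf under the new parent
        heapq.heappush(heap, (p1 + p2, next(tie), group1 + group2))
    return lengths

def bits_per_symbol(probs, k):
    """Average Huffman bits per source symbol when coding blocks of k symbols (n**k blocks)."""
    # Memoryless-source assumption: block probability = product of symbol probabilities.
    blocks = {block: prod(probs[s] for s in block)
              for block in itertools.product(probs, repeat=k)}
    lengths = huffman_lengths(blocks)
    return sum(p * lengths[b] for b, p in blocks.items()) / k

probs = {"A": 0.8, "B": 0.2}
print(round(bits_per_symbol(probs, 1), 4))  # 1.0
print(round(bits_per_symbol(probs, 2), 4))  # 0.78
```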

• Variants:

In extended Huffman coding we group the symbols into blocks of k symbols, giving an extended alphabet of $n^k$ symbols.

This leads to somewhat better compression.

In adaptive Huffman coding we don't assume that we know the exact probabilities.

Start with an estimate and update the tree as we encode/decode.

• Arithmetic coding is a newer (and more complicated) alternative which usually performs better.

- Dictionary-based Coding

• LZW uses fixed-length codewords to represent variable-length strings of symbols/characters that commonly occur together, e.g., words in English text.

• The LZW encoder and decoder build up the same dictionary dynamically while receiving the data.

• LZW places longer and longer repeated entries into a dictionary, and then emits the code for an element, rather than the string itself, if the element has already been placed in the dictionary.

- LZW Compression Algorithm
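A minimal Python sketch of an LZW encoder consistent with the description above. The dictionary starts with the single characters and gains one entry each time the current string plus the next character is not yet in it. Assumptions of this sketch: codes are plain integers (no fixed bit width is enforced), the dictionary never fills up, every input character is in the initial dictionary, and `lzw_encode` is a hypothetical name. The usage line runs it on the example string ABABBABCABABBA with the initial dictionary A = 1, B = 2, C = 3.

```python
def lzw_encode(text, initial):
    """LZW-encode `text`; `initial` maps each single character to its code."""
    dictionary = dict(initial)
    next_code = max(dictionary.values()) + 1
    s = ""
    output = []
    for c in text:
        if s + c in dictionary:
            s += c                           # keep extending the longest known string
        else:
            output.append(dictionary[s])     # emit the code for the longest match
            dictionary[s + c] = next_code    # add the one-character-longer string
            next_code += 1
            s = c
    if s:
        output.append(dictionary[s])         # flush the final match
    return output

codes = lzw_encode("ABABBABCABABBA", {"A": 1, "B": 2, "C": 3})
print(codes)  # [1, 2, 4, 5, 2, 3, 4, 6, 1]
```

The encoder emits nine codes for 14 characters. Along the way it adds AB = 4, BA = 5, ABB = 6, BAB = 7, BC = 8, CA = 9, ABA = 10 and ABBA = 11 to the dictionary.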
- LZW Compression Example

• We will compress the string "ABABBABCABABBA".

• Initially the dictionary is the following:

Code   String
1      A
2      B
3      C
- LZW Decompression
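A minimal Python sketch of an LZW decoder. It rebuilds the encoder's dictionary one step behind the encoder, so it must handle one special case. A code can refer to the entry that is being created at that very moment, and that entry's string is the previous string plus its own first character. Integer codes, an unbounded dictionary, and the name `lzw_decode` are assumptions of this sketch.

```python
def lzw_decode(codes, initial):
    """Invert LZW encoding; `initial` maps each single character to its code."""
    if not codes:
        return ""
    dictionary = {code: ch for ch, code in initial.items()}  # code -> string
    next_code = max(dictionary) + 1
    prev = dictionary[codes[0]]
    output = [prev]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            # Special case: the encoder created this code in the step that emitted it,
            # so its string must be prev + first character of prev.
            entry = prev + prev[0]
        else:
            raise ValueError(f"invalid LZW code {code}")
        output.append(entry)
        dictionary[next_code] = prev + entry[0]  # the entry the encoder added at this step
        next_code += 1
        prev = entry
    return "".join(output)

initial = {"A": 1, "B": 2, "C": 3}
print(lzw_decode([1, 2, 4, 5, 2, 3, 4, 6, 1], initial))  # ABABBABCABABBA
print(lzw_decode([1, 2], {"A": 1}))                      # AAA (exercises the special case)
```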
- LZW Decompression Example
- Quadtrees

• Quadtrees are both an indexing structure for, and a compression scheme for, binary images.

A quadtree is a tree where each non-leaf node has four children.

Each node is labelled either B (black), W (white) or G (gray).

Leaf nodes can only be B or W; G marks a non-leaf node whose region contains both black and white pixels.

• Algorithm for construction of a quadtree for an N × N binary image:

1. If the binary image contains only black pixels, label the root node B and quit.

2. Else if the binary image contains only white pixels, label the root node W and quit.

3. Otherwise, label the root node G and create four child nodes corresponding to the four N/2 × N/2 quadrants of the binary image.

4. For each of the quadrants, recursively repeat steps 1 to 3. (In the worst case, recursion ends when each sub-quadrant is a single pixel.)

- Quadtree Example
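The construction algorithm can be sketched recursively in Python. Assumptions of this sketch: the image is a square list of rows of 0/1 values whose side N is a power of two, 1 means black, and a G node is represented as a tuple of its four children in the order top-left, top-right, bottom-left, bottom-right. The names `build_quadtree` and `preorder` are hypothetical, and the 4 × 4 image is made up for illustration. Writing the tree out in preorder gives a compact code when the image has large uniform regions.

```python
def build_quadtree(img, row=0, col=0, size=None):
    """Return 'B', 'W', or a tuple of four subtrees (a G node) for the size x size block at (row, col)."""
    if size is None:
        size = len(img)                       # N for an N x N image (N assumed a power of two)
    pixels = {img[r][c] for r in range(row, row + size) for c in range(col, col + size)}
    if pixels == {1}:
        return "B"                            # step 1: only black pixels (1 = black here)
    if pixels == {0}:
        return "W"                            # step 2: only white pixels
    half = size // 2                          # step 3: four N/2 x N/2 quadrants
    return tuple(build_quadtree(img, r, c, half)   # step 4: recurse on each quadrant
                 for r, c in ((row, col), (row, col + half),
                              (row + half, col), (row + half, col + half)))

def preorder(node):
    """Serialize the tree: 'G' followed by its four children, or a single leaf label."""
    if isinstance(node, tuple):
        return "G" + "".join(preorder(child) for child in node)
    return node

image = [[1, 1, 0, 0],
         [1, 1, 0, 0],
         [1, 1, 0, 1],
         [1, 1, 1, 1]]
tree = build_quadtree(image)
print(tree)            # ('B', 'W', 'B', ('W', 'B', 'B', 'B'))
print(preorder(tree))  # GBWBGWBBB
```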

[Figure: a binary image and its quadtree]

- Lossless JPEG

• JPEG offers both lossy (common) and lossless (uncommon) modes.

• Lossless mode is very different from lossy mode (and also achieves much lower compression ratios).

It was added to the JPEG standard for completeness.

• Lossless JPEG employs a predictive method combined with entropy coding.

• The prediction for the value of a pixel (greyscale or color component) is based on the values of up to three neighboring pixels: the ones to the left, above, and above-left.

• One of 7 predictors is used (choose the one which gives the best result for this pixel).

• Now code the pixel as the pair (predictor used, difference from the predicted value).

• Code this pair using a lossless method such as Huffman coding.

The difference is usually small, so entropy coding gives good results.

Only a limited number of predictors can be used on the edges of the image.
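A Python sketch of the predictive step. The seven predictors are those of the JPEG lossless mode as specified in ITU-T T.81, where A is the pixel to the left, B the pixel above and C the pixel above-left. The integer halving and the edge rules also follow that standard as we understand it: the first row uses A, the first column uses B, and the very first pixel is predicted as 2^(P-1) for P-bit samples. There is one difference from the slide's wording. The standard signals a single predictor per scan rather than one per pixel, so this sketch picks the predictor with the smallest total absolute residual over the whole image. That cost measure, the 8-bit depth, the sample image and the function names are assumptions for illustration. The residuals would then be entropy coded (e.g. with Huffman coding), which is not shown.

```python
def predict(a, b, c, mode):
    """JPEG lossless predictor `mode` (1-7) from left (a), above (b) and above-left (c)."""
    return {
        1: a,
        2: b,
        3: c,
        4: a + b - c,
        5: a + ((b - c) >> 1),
        6: b + ((a - c) >> 1),
        7: (a + b) >> 1,
    }[mode]

def residuals(img, mode, bits=8):
    """Prediction differences for every pixel in row order; edge pixels use restricted rules."""
    res = []
    for y, row in enumerate(img):
        for x, value in enumerate(row):
            if x == 0 and y == 0:
                pred = 1 << (bits - 1)            # first pixel: 2^(P-1), i.e. 128 for 8-bit
            elif y == 0:
                pred = row[x - 1]                 # first row: only the left neighbour exists
            elif x == 0:
                pred = img[y - 1][x]              # first column: only the pixel above exists
            else:
                pred = predict(row[x - 1], img[y - 1][x], img[y - 1][x - 1], mode)
            res.append(value - pred)
    return res

def best_mode(img):
    """Pick the predictor with the smallest total absolute residual (a simple cost assumed here)."""
    return min(range(1, 8), key=lambda m: sum(abs(r) for r in residuals(img, m)))

image = [[100, 102, 104, 106],
         [101, 103, 105, 107],
         [102, 104, 106, 108]]
mode = best_mode(image)
print(mode, residuals(image, mode))  # 4 [-28, 2, 2, 2, 1, 0, 0, 0, 1, 0, 0, 0]
```

On this smooth gradient, predictor 4 (A + B - C) predicts every interior pixel exactly. Only the edge pixels leave non-zero differences, which is the situation in which entropy coding of the residuals pays off.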