MATLAB FUNCTIONS FOR SOURCE CODING
Comunication System Toolbox offer several functions for source coding. We consider here only Huffman and arithmetic coding
Contents
Huffman coding
Huffman coding offers a way to compress data. The average length of a Huffman code depends on the statistical frequency with which the source produces each symbol from its alphabet. A Huffman code dictionary, which associates each data symbol with a codeword, has the property that no codeword in the dictionary is a prefix of any other codeword in the dictionary. The huffmandict, huffmanenco, and huffmandeco functions support Huffman coding and decoding.
Note For long sequences from sources having skewed distributions and small alphabets, arithmetic coding compresses better than Huffman coding. To learn how to use arithmetic coding, see Arithmetic Coding.
Create a Huffman Code Dictionary in MATLAB
Huffman coding requires statistical information about the source of the data being encoded. In particular, the p input argument in the huffmandict function lists the probability with which the source produces each symbol in its alphabet. For example, consider a data source that produces 1s with probability 0.1, 2s with probability 0.1, and 3s with probability 0.8. The main computational step in encoding data from this source using a Huffman code is to create a dictionary that associates each data symbol with a codeword. The commands below create such a dictionary and then show the codeword vector associated with a particular value from the data source.
symbols = [1 2 3]; % Data symbols p = [0.1 0.1 0.8]; % Probability of each data symbol
The output below shows that the most probable data symbol, 3, is associated with a one-digit codeword, while less probable data symbols are associated with two-digit codewords. The output also shows, for example, that a Huffman encoder receiving the data symbol 1 should substitute the sequence 11.
dict = huffmandict(symbols,p) % Create the dictionary. dict{1,:} % Show one row of the dictionary.
dict =
[1] [1x2 double]
[2] [1x2 double]
[3] [ 0]
ans =
1
ans =
1 1
Create and Decode a Huffman Code Using MATLAB
The example below performs Huffman encoding and decoding, using a source whose alphabet has three symbols. Notice that the huffmanenco and huffmandeco functions use the dictionary that huffmandict created.
sig = repmat([3 3 1 3 3 3 3 3 2 3],1,50); % Data to encode symbols = [1 2 3]; % Distinct data symbols appearing in sig p = [0.1 0.1 0.8]; % Probability of each data symbol dict = huffmandict(symbols,p); % Create the dictionary. hcode = huffmanenco(sig,dict); % Encode the data. dhsig = huffmandeco(hcode,dict); % Decode the code. whos
Name Size Bytes Class Attributes ans 1x2 16 double code 1x470 3760 double counts 1x3 24 double dhsig 1x500 4000 double dict 3x2 424 cell dseq 1x500 4000 double hcode 1x600 4800 double p 1x3 24 double seq 1x500 4000 double sig 1x500 4000 double symbols 1x3 24 double
Arithmetic Coding
Arithmetic coding offers a way to compress data and can be useful for data sources having a small alphabet. The length of an arithmetic code, instead of being fixed relative to the number of symbols being encoded, depends on the statistical frequency with which the source produces each symbol from its alphabet. For long sequences from sources having skewed distributions and small alphabets, arithmetic coding compresses better than Huffman coding.
The arithenco and arithdeco functions support arithmetic coding and decoding.
Represent Arithmetic Coding Parameters
Arithmetic coding requires statistical information about the source of the data being encoded. In particular, the counts input argument in the arithenco and arithdeco functions lists the frequency with which the source produces each symbol in its alphabet. You can determine the frequencies by studying a set of test data from the source. The set of test data can have any size you choose, as long as each symbol in the alphabet has a nonzero frequency.
For example, before encoding data from a source that produces 10 x's, 10 y's, and 80 z's in a typical 100-symbol set of test data, define
counts = [10 10 80];
Alternatively, if a larger set of test data from the source contains 22 x's, 23 y's, and 185 z's, then define
counts = [22 23 185];
Create and Decode an Arithmetic Code Using MATLAB
The example below performs arithmetic encoding and decoding, using a source whose alphabet has three symbols.
seq = repmat([3 3 1 3 3 3 3 3 2 3],1,50); counts = [10 10 80]; code = arithenco(seq,counts); dseq = arithdeco(code,counts,length(seq)); whos
Name Size Bytes Class Attributes ans 1x2 16 double code 1x470 3760 double counts 1x3 24 double dhsig 1x500 4000 double dict 3x2 424 cell dseq 1x500 4000 double hcode 1x600 4800 double p 1x3 24 double seq 1x500 4000 double sig 1x500 4000 double symbols 1x3 24 double