📅  Last modified: 2023-12-03 14:58:27.930000             🧑  Author: Mango
This problem requires us to implement the Huffman coding algorithm to compress and decompress a given string of characters. Huffman coding is a lossless data compression algorithm that assigns variable-length, prefix-free codes to symbols based on how often they occur. The more frequently a symbol appears in the input, the shorter its code tends to be, which reduces the total number of bits needed to represent the input string.
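As a rough illustration of why shorter codes for frequent symbols pay off, the sketch below compares a fixed 2-bit code with variable code lengths for the sample string 'abacabadabacaba'. The `code_lengths` table is an assumption written out by hand for this example (it matches the lengths a Huffman tree gives for these frequencies); it is not produced by the article's code.

```python
from collections import Counter

text = 'abacabadabacaba'
freq = Counter(text)  # a: 8, b: 4, c: 2, d: 1

# Assumed code lengths in bits, written out by hand for this example.
code_lengths = {'a': 1, 'b': 2, 'c': 3, 'd': 3}

# Variable-length cost: each symbol contributes frequency * code length.
variable_bits = sum(freq[ch] * code_lengths[ch] for ch in freq)

# Fixed-length cost: four distinct symbols need 2 bits each.
fixed_bits = len(text) * 2

print(variable_bits, fixed_bits)
# → 25 30
```

The saving comes entirely from 'a', which makes up more than half of the string and costs a single bit per occurrence.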
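To implement Huffman coding, we count the symbol frequencies, build the Huffman tree with a priority queue, assign a binary code to each symbol, and then encode and decode the string. The following Python code performs these steps: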
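from collections import defaultdict
from heapq import heapify, heappop, heappush
from itertools import count


def encode(input_string):
    # Count how often each character occurs.
    freq = defaultdict(int)
    for c in input_string:
        freq[c] += 1
    if not freq:
        return '', {}

    # Each heap entry is (frequency, tie_breaker, node). A node is either a
    # character (leaf) or a (left, right) pair (internal node). The unique
    # tie_breaker stops heapq from comparing nodes when frequencies are equal.
    tie_breaker = count()
    pq = [(f, next(tie_breaker), char) for char, f in freq.items()]
    heapify(pq)

    # Merge the two least frequent nodes until only the root remains.
    while len(pq) > 1:
        left = heappop(pq)
        right = heappop(pq)
        heappush(pq, (left[0] + right[0], next(tie_breaker), (left[2], right[2])))

    codes = {}

    def assign_codes(node, prefix=''):
        if isinstance(node, str):
            # Leaf. A one-symbol alphabet has an empty prefix, so use '0'.
            codes[node] = prefix or '0'
        else:
            assign_codes(node[0], prefix + '0')  # left subtree
            assign_codes(node[1], prefix + '1')  # right subtree

    assign_codes(heappop(pq)[2])
    encoded_string = ''.join(codes[c] for c in input_string)
    return encoded_string, codes


def decode(encoded_string, codes):
    # Invert the table: code -> character.
    rev_codes = {code: char for char, code in codes.items()}
    decoded_chars = []
    current = ''
    for bit in encoded_string:
        current += bit
        # Codes are prefix-free, so the first match is the only possible one.
        if current in rev_codes:
            decoded_chars.append(rev_codes[current])
            current = ''
    if current:
        raise ValueError('encoded string ends in the middle of a code')
    return ''.join(decoded_chars)


input_string = 'abacabadabacaba'
encoded_string, codes = encode(input_string)
print('The encoded string using Huffman coding:', encoded_string)
decoded_string = decode(encoded_string, codes)
print('The decoded string:', decoded_string)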
We count the frequency of each character in the input string using a defaultdict. We then create a priority queue (a binary heap) with one entry per character, ordered by frequency. We repeatedly extract the two entries with the smallest frequencies, create a parent node whose frequency is the sum of the two, and push the parent back onto the priority queue. When only one entry remains, it is the root of the Huffman tree.
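To make the merge order concrete, here is a minimal sketch that records every merge while the tree is built. The function name `merge_trace` and the `order` counter are hypothetical names introduced for this illustration; the counter is an assumed tie-breaker so that entries with equal frequency never force the heap to compare nodes.

```python
from collections import Counter
from heapq import heapify, heappop, heappush
from itertools import count


def merge_trace(text):
    """Return (left_weight, right_weight, parent_weight) for every merge."""
    order = count()  # unique tie-breaker, so nodes are never compared
    heap = [(weight, next(order), symbol)
            for symbol, weight in Counter(text).items()]
    heapify(heap)
    trace = []
    while len(heap) > 1:
        w1, _, n1 = heappop(heap)  # smallest weight
        w2, _, n2 = heappop(heap)  # second smallest weight
        trace.append((w1, w2, w1 + w2))
        heappush(heap, (w1 + w2, next(order), (n1, n2)))
    return trace


print(merge_trace('abacabadabacaba'))
# → [(1, 2, 3), (3, 4, 7), (7, 8, 15)]
```

For the sample string, 'd' and 'c' are merged first, their parent is merged with 'b', and the result is merged with 'a', so 'a' ends up closest to the root.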
We then traverse the Huffman tree recursively to assign a code to each leaf, appending '0' when descending into the left subtree and '1' when descending into the right subtree. The generated codes are stored in a dictionary that maps each character to its code.
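The traversal can also be sketched on its own, separately from tree construction. The example below assumes the tree is given as nested `(left, right)` tuples with characters as leaves; `codes_from_tree`, `is_prefix_free`, and `sample_tree` are hypothetical names for this illustration, and the single-symbol case (a tree that is just one leaf) is deliberately not handled.

```python
def codes_from_tree(tree):
    """Map each leaf symbol to its '0'/'1' path from the root."""
    codes = {}
    stack = [(tree, '')]
    while stack:
        node, prefix = stack.pop()
        if isinstance(node, tuple):
            left, right = node
            stack.append((right, prefix + '1'))
            stack.append((left, prefix + '0'))  # popped first
        else:
            codes[node] = prefix
    return codes


def is_prefix_free(codes):
    """True if no code is a prefix of another code."""
    ordered = sorted(codes.values())
    # After sorting, a prefix is always directly followed by a code it prefixes.
    return not any(b.startswith(a) for a, b in zip(ordered, ordered[1:]))


# Assumed tree for 'abacabadabacaba'; leaves are characters.
sample_tree = ((('d', 'c'), 'b'), 'a')
sample_codes = codes_from_tree(sample_tree)
print(sample_codes)
# → {'d': '000', 'c': '001', 'b': '01', 'a': '1'}
print(is_prefix_free(sample_codes))
# → True
```

Because symbols sit only at the leaves, no code can be a prefix of another, which is what the second function checks.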
After generating the Huffman codes, we encode the input string by replacing each character with its corresponding code. We then decode the encoded string by scanning it from left to right, growing a substring until it matches a code in the generated dictionary, and emitting the corresponding character. Because no Huffman code is a prefix of another, the first match is always the correct one.
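Note that the encoded result here is a Python string of '0' and '1' characters, so it does not yet occupy less memory than the input. One possible way to store it compactly is to pack the bits into bytes and remember how many padding bits were added. The helpers `pack_bits` and `unpack_bits` below are hypothetical additions for illustration, not part of the code above, and `sample_bits` is an example bit string.

```python
def pack_bits(bits):
    """Pack a '0'/'1' string into bytes; return (data, padding_bit_count)."""
    padding = (-len(bits)) % 8
    padded = bits + '0' * padding
    data = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    return data, padding


def unpack_bits(data, padding):
    """Inverse of pack_bits: rebuild the original '0'/'1' string."""
    bits = ''.join(f'{byte:08b}' for byte in data)
    return bits[:len(bits) - padding] if padding else bits


# Example bit string of 25 bits.
sample_bits = '1011001101100010110011011'
data, padding = pack_bits(sample_bits)
print(len(data), padding)
# → 4 7
print(unpack_bits(data, padding) == sample_bits)
# → True
```

The padding count has to be stored alongside the bytes; otherwise the decoder could mistake the trailing zero bits for further codes.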
In this article, we implemented the Huffman coding algorithm to compress and decompress a given string of characters. The algorithm is widely used in data compression; for example, the DEFLATE format behind zip and gzip combines it with LZ77. It provides a simple yet effective way to compress data without losing any information.