📜  Compression Attempts (1)

📅  Last modified: 2023-12-03 15:07:22.292000             🧑  Author: Mango

Compression Attempts

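Images, text, video, and other files are often compressed to save storage space and speed up transfer. Many algorithms can be used for this. This article introduces two popular ones: Huffman coding and Lempel-Ziv compression.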

Huffman Coding

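Huffman coding is a frequency-based compression algorithm: frequent characters get shorter codes and rare characters get longer ones. At its core is the Huffman tree. First, count how often each character occurs. Then treat every character as a leaf weighted by its frequency and repeatedly merge the two lowest-weight nodes under a new parent until a single tree remains. The root covers the entire character set, and each leaf is one character. Walking from the root to a leaf and recording 0 for each left branch and 1 for each right branch yields that character's Huffman code. Because characters sit only at the leaves, no code is a prefix of another, so the bit stream decodes unambiguously. Because frequent characters have short codes, the encoded output is typically smaller than a fixed-width encoding.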
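
To make the size argument concrete, the following sketch computes only the code length of each character (not the codes themselves) and compares the total bit count with a fixed-width encoding. The function name `huffman_code_lengths` and the sample string are illustrative assumptions.

```python
import heapq
from collections import Counter
from typing import Dict


def huffman_code_lengths(text: str) -> Dict[str, int]:
    """Return the Huffman code length (in bits) for each character of text."""
    freq = Counter(text)
    if len(freq) <= 1:
        # Zero or one distinct symbol: use one bit per occurrence.
        return {ch: 1 for ch in freq}
    # Heap entries: (subtree weight, tie-breaker, {char: depth in subtree}).
    heap = [(f, i, {ch: 0}) for i, (ch, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every leaf in them one level deeper.
        merged = {ch: depth + 1 for ch, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]


sample = "aaaabbc"  # hypothetical sample input
lengths = huffman_code_lengths(sample)
freq = Counter(sample)
huffman_bits = sum(freq[ch] * lengths[ch] for ch in freq)
fixed_bits = 2 * len(sample)  # 3 distinct symbols need 2 bits each at fixed width
print(huffman_bits, fixed_bits)  # prints: 10 14
```

Here `a` (4 occurrences) gets a 1-bit code while `b` and `c` get 2-bit codes, so the text needs 10 bits instead of 14.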

Example code:

import heapq
from typing import Dict, Tuple, Union


def get_freq(text: str) -> Dict[str, int]:
    """
    Count how often each character occurs in the text.
    """
    freq: Dict[str, int] = {}
    for ch in text:
        freq[ch] = freq.get(ch, 0) + 1
    return freq


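def build_huffman_tree(freq: Dict[str, int]) -> Tuple[Union[str, Tuple], Dict[str, str]]:
    """
    Build the Huffman tree and return it together with each character's code.
    """
    if not freq:
        raise ValueError("freq must contain at least one character")
    # Each heap entry is (frequency, tie-breaker, node). The unique tie-breaker
    # keeps heapq from comparing a str leaf with a tuple subtree when two
    # entries have equal frequencies, which would raise TypeError.
    heap = [(f, i, ch) for i, (ch, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        # Merge the two lowest-frequency nodes under a new parent.
        freq1, _, left = heapq.heappop(heap)
        freq2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (freq1 + freq2, counter, (left, right)))
        counter += 1
    _, _, tree = heap[0]
    code_map: Dict[str, str] = {}

    def dfs(node, code):
        if isinstance(node, tuple):
            dfs(node[0], code + '0')
            dfs(node[1], code + '1')
        else:
            # A text with a single distinct character has a leaf as its root;
            # give it the one-bit code '0' instead of an empty code.
            code_map[node] = code or '0'

    dfs(tree, '')
    return tree, code_map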


def encode(text: str, code_map: Dict[str, str]) -> str:
    """
    Compress text by concatenating each character's Huffman code.
    """
    return ''.join(code_map[ch] for ch in text)


def decode(encoded: str, tree: Union[str, Tuple]) -> str:
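    """
    Decode a Huffman bit string by walking the tree from the root.
    """
    text = []
    node = tree
    for bit in encoded:
        # Internal nodes are (left, right) tuples: '0' goes left, '1' goes right.
        if isinstance(node, tuple):
            node = node[int(bit)]
        # Reaching a leaf (a character) emits it and restarts from the root.
        if isinstance(node, str):
            text.append(node)
            node = tree
    return ''.join(text)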


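def compress(text: str) -> Tuple[str, Dict[str, str], Union[str, Tuple]]:
    """
    Compress text; return the bit string, the code map, and the Huffman tree.
    The tree is returned because decompress() needs it to decode.
    """
    freq = get_freq(text)
    tree, code_map = build_huffman_tree(freq)
    encoded = encode(text, code_map)
    return encoded, code_map, tree


def decompress(encoded: str, code_map: Dict[str, str], tree: Union[str, Tuple]) -> str:
    """
    Decompress a bit string produced by compress().
    code_map is accepted for symmetry with compress() but is not used;
    decoding walks the tree. decompress(*compress(text)) returns text.
    """
    return decode(encoded, tree)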
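
Note that `encode` returns a Python string with one character per bit, which on its own takes more memory than the input. As a rough sketch of how such a bit string could be packed into real bytes, consider the helpers below. The names `pack_bits` and `unpack_bits` are assumptions made for illustration and are not part of the code above.

```python
from typing import Tuple


def pack_bits(bits: str) -> Tuple[bytes, int]:
    """Pack a string of '0'/'1' characters into bytes.

    Returns the packed bytes and the number of padding bits added at the end.
    """
    padding = (8 - len(bits) % 8) % 8
    padded = bits + "0" * padding
    data = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    return data, padding


def unpack_bits(data: bytes, padding: int) -> str:
    """Inverse of pack_bits: recover the original '0'/'1' string."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return bits[:len(bits) - padding] if padding else bits


bits = "1011001110"  # hypothetical encoder output, 10 bits
data, padding = pack_bits(bits)
print(len(data), padding)  # prints: 2 6
```

The padding count has to be stored next to the data. Otherwise the decoder could misread the trailing zero bits as extra characters.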

Lempel-Ziv Compression

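Lempel-Ziv compression is a family of dictionary-based algorithms. The text is split into phrases, phrases that have already been seen are remembered, and each repeat is replaced by a short reference to the earlier occurrence. There are two main variants, LZ77 and LZ78. LZ77 keeps no explicit dictionary: it searches a sliding window over the recently processed text for the longest match and emits a reference back to it. LZ78 builds an explicit dictionary of phrases and emits the index of the longest phrase already in the dictionary, followed by the next character.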

Example code:

def lz_compress(text: str) -> str:
    """
    Compress text with a minimal LZ78 scheme.
    The output is a sequence of (chr(prefix index), next character) pairs;
    index 0 stands for the empty phrase.
    """
    dic = {}  # phrase -> 1-based index
    out = []
    w = ''    # longest phrase matched so far
    for ch in text:
        if w + ch in dic:
            w += ch
        else:
            out.append(chr(dic.get(w, 0)) + ch)
            dic[w + ch] = len(dic) + 1
            w = ''
    if w:
        # The text ended inside a known phrase: emit its index alone.
        out.append(chr(dic[w]))
    return ''.join(out)


def lz_decompress(compressed: str) -> str:
    """
    Decompress text produced by lz_compress.
    """
    phrases = ['']  # index -> phrase; index 0 is the empty phrase
    out = []
    for i in range(0, len(compressed), 2):
        prefix = phrases[ord(compressed[i])]
        if i + 1 < len(compressed):
            phrase = prefix + compressed[i + 1]
            phrases.append(phrase)
            out.append(phrase)
        else:
            # A trailing index without a following character.
            out.append(prefix)
    return ''.join(out)
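
For comparison with the dictionary approach, the sliding-window idea behind LZ77 can be sketched as follows. This is a minimal, unoptimized illustration. The names `lz77_compress` and `lz77_decompress`, the `(offset, length, next_char)` token layout, and the default window of 32 characters are assumptions chosen for the example, and the brute-force match search is slow on large inputs.

```python
from typing import List, Tuple

# (offset back into the window, match length, next literal character)
Token = Tuple[int, int, str]


def lz77_compress(text: str, window: int = 32) -> List[Token]:
    tokens: List[Token] = []
    i = 0
    while i < len(text):
        best_off, best_len = 0, 0
        # Try every start position inside the sliding window.
        for j in range(max(0, i - window), i):
            length = 0
            # Stop one character early so a literal is always left to emit.
            while (i + length < len(text) - 1
                   and text[j + length] == text[i + length]):
                length += 1
            if length > best_len:
                best_off, best_len = i - j, length
        tokens.append((best_off, best_len, text[i + best_len]))
        i += best_len + 1
    return tokens


def lz77_decompress(tokens: List[Token]) -> str:
    out: List[str] = []
    for offset, length, ch in tokens:
        start = len(out) - offset
        for k in range(length):
            # Copy one character at a time so overlapping matches work.
            out.append(out[start + k])
        out.append(ch)
    return "".join(out)


tokens = lz77_compress("abababc")
print(tokens)  # prints: [(0, 0, 'a'), (0, 0, 'b'), (2, 4, 'c')]
```

The third token says "go back 2 characters, copy 4, then append `c`". The copy overlaps the text being produced, which is why the decoder copies character by character.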

That concludes the usage examples for Huffman coding and Lempel-Ziv compression. Readers can choose whichever fits their needs.