
Data Compression and Decompression


Data compression is a coding process by which data is converted into a format in which it can be stored in less space. Compression is particularly useful in communications because it enables the transmission of the same quantity of data in fewer bits. Certain types of data, such as audio files and bit-mapped graphics, can be compressed to a small fraction of their original size. Data compression is also used by backup utilities, spreadsheet applications and database management systems.

Data compression is also referred to as source coding. The input symbols (bits, ASCII codes, bytes, audio samples or pixel values) are conceptualized as being emitted by an information source and undergoing a coding process before being sent to their destination. The source is considered to have “memory” if there is some correlation between its component symbols, such that each symbol depends on some of the preceding or succeeding ones, and to be “memoryless” if each symbol is independent of the others. A memoryless source whose symbols all follow the same probability distribution is described as “independent and identically distributed,” or IID.
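To make the distinction between a source with memory and a memoryless source concrete, the following Python sketch estimates two quantities from a sample string: the entropy per symbol under a memoryless (order-0) model, and the conditional entropy of each symbol given the one before it (an order-1 model). The function names are hypothetical, and treating a single finite sample as representative of the source is a simplifying assumption.

```python
from collections import Counter
from math import log2


def order0_entropy(seq):
    """Bits per symbol if every symbol is treated as independent (memoryless model)."""
    counts = Counter(seq)
    n = len(seq)
    return sum(c / n * log2(n / c) for c in counts.values())


def order1_entropy(seq):
    """Estimated conditional entropy H(X_i | X_{i-1}) in bits per symbol,
    computed from the adjacent symbol pairs in the sample."""
    pairs = Counter(zip(seq, seq[1:]))   # counts of (previous, current) pairs
    prev = Counter(seq[:-1])             # counts of each symbol as a predecessor
    n = len(seq) - 1
    return sum(c / n * log2(prev[a] / c) for (a, _b), c in pairs.items())


text = "ab" * 50
print(order0_entropy(text))  # 1.0
print(order1_entropy(text))  # 0.0
```

For a strictly alternating string, a memoryless model sees two equally likely symbols and needs 1 bit per symbol, while knowing the previous symbol removes all uncertainty. That gap is the redundancy a model with memory can exploit.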

The compression process comprises two components – an encoding algorithm that generates a compressed representation of the file and a decoding algorithm that reconstructs the original file or a close approximation of it from the compressed representation.
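As a minimal illustration of the encoder/decoder pairing, here is a Python sketch of run-length encoding, a simple lossless scheme chosen for this example (the text above does not prescribe any particular algorithm). The function names `rle_encode` and `rle_decode` are hypothetical.

```python
def rle_encode(data: str) -> list[tuple[str, int]]:
    """Encoder: replace each run of identical symbols with a (symbol, count) pair."""
    runs: list[tuple[str, int]] = []
    for ch in data:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((ch, 1))              # start a new run
    return runs


def rle_decode(runs: list[tuple[str, int]]) -> str:
    """Decoder: expand each (symbol, count) pair back into a run of symbols."""
    return "".join(ch * count for ch, count in runs)


original = "W" * 6 + "B" * 3 + "W" * 4
encoded = rle_encode(original)
print(encoded)                           # [('W', 6), ('B', 3), ('W', 4)]
print(rle_decode(encoded) == original)   # True
```

Run-length encoding only pays off when the input actually contains long runs; on text without repeated adjacent characters the pairs take more space than the original.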

There are two main types of compression algorithm: lossless algorithms, which reproduce the original file exactly, without any loss of data, and lossy algorithms, which can only reconstruct an approximation of the original file. Lossless algorithms are generally used for text, which must remain intact in order to be meaningful, while lossy algorithms are used for audio and image files, where the loss of fidelity is usually too small to be detected or at least falls within an acceptable range. Data decompression refers to the reconstruction of the original data, or an approximation of it, from its compressed representation.
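The lossless/lossy distinction can be illustrated with uniform scalar quantization, one common source of loss in lossy coders. It is used here as an assumed example, not as a complete audio or image codec. The encoder maps each sample to an integer index and the decoder maps the index back; the step size `step`, the sample values and the function names are illustrative.

```python
def quantize(samples, step):
    """Encoder: map each sample to the index of the nearest multiple of `step`.
    This rounding is where information is irreversibly lost."""
    return [round(s / step) for s in samples]


def dequantize(indices, step):
    """Decoder: reconstruct an approximation of each original sample."""
    return [i * step for i in indices]


samples = [0.12, 0.49, 0.51, 0.98, -0.33]
step = 0.25
approx = dequantize(quantize(samples, step), step)
max_error = max(abs(a - s) for a, s in zip(approx, samples))
print(approx)                  # [0.0, 0.5, 0.5, 1.0, -0.25]
print(max_error <= step / 2)   # True
```

Because each sample is rounded to the nearest multiple of `step`, the reconstruction differs from the original by at most half a step. A larger step yields fewer distinct indices (cheaper to store) at the cost of a less faithful reconstruction.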

There are many known methods of data compression, suited to different types of data and producing different results. However, they are all based on the same principle: the removal of “redundancy” from the original data in the source file. All nonrandom data have some kind of “structure,” which can be exploited to produce a smaller representation of the data in which no structure can be discerned. The professional literature uses “redundancy” and “structure” alongside terms such as “smoothness,” “coherence” and “correlation”; all of them refer to the same concept, the repetition of patterns in the data.
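The effect of structure can be observed with Python's standard-library `zlib` module, which implements DEFLATE. This sketch, with illustrative inputs, compresses a highly repetitive byte string and an equally long string of random bytes from `os.urandom`.

```python
import os
import zlib

structured = b"the quick brown fox " * 500    # 10,000 bytes with obvious repetition
random_bytes = os.urandom(len(structured))    # 10,000 bytes with no exploitable structure

for name, data in [("structured", structured), ("random", random_bytes)]:
    packed = zlib.compress(data, 9)           # level 9 = strongest DEFLATE setting
    print(f"{name}: {len(data)} -> {len(packed)} bytes")
```

Expect the repetitive input to shrink to a small fraction of its size, while the random input comes out slightly larger than it went in: there is no redundancy to remove, and the format adds a little overhead of its own.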

Compression algorithms have their basis in information theory, which relates the probability of a symbol to its information content and hence to the length of the code needed to represent it. Current data compression technology bears the theory out in practice by achieving code lengths close to the limits it predicts. In theoretical terms, all compression algorithms rely on “bias,” that is, an unbalanced probability distribution in the file to be compressed: in a text file, for example, certain characters and repeated sequences are far more likely than a random arrangement of characters, and typical images contain large patches of a single color. This bias is handled by the two main components of a compression algorithm: the model and the coder. The model estimates the probability distribution by analyzing the structure of the file; for example, it could measure repeated patterns in text. The coder uses this information to assign short codes to high-probability data and longer codes to low-probability data.
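The model/coder split can be sketched in Python with a simple frequency-count model and Huffman coding as the coder. Huffman coding is used here as one well-known choice (arithmetic coding is another); the names `build_model` and `build_codes` and the sample text are assumptions for illustration. The sketch also computes the entropy of the model, the information-theoretic lower bound on the average code length.

```python
import heapq
from collections import Counter
from math import log2


def build_model(text):
    """Model: estimate each symbol's probability from its frequency in the text."""
    counts = Counter(text)
    return {sym: c / len(text) for sym, c in counts.items()}


def build_codes(probs):
    """Coder: Huffman construction. Repeatedly merging the two least probable
    groups gives rarer symbols longer codewords."""
    if len(probs) == 1:                       # a one-symbol alphabet still needs 1 bit
        return {sym: "0" for sym in probs}
    # Heap entries: (probability, unique tie-breaker, {symbol: partial codeword})
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        p1, _, left = heapq.heappop(heap)
        p2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (p1 + p2, next_id, merged))
        next_id += 1
    return heap[0][2]


text = "abracadabra"
probs = build_model(text)
codes = build_codes(probs)
encoded = "".join(codes[ch] for ch in text)

entropy = sum(p * log2(1 / p) for p in probs.values())        # bits/symbol lower bound
avg_len = sum(probs[s] * len(codes[s]) for s in probs)         # Huffman bits/symbol
print(codes)
print(f"entropy {entropy:.3f} bits/symbol, Huffman average {avg_len:.3f} bits/symbol")
print(f"{len(text) * 8} bits as 8-bit characters -> {len(encoded)} bits encoded")
```

For a memoryless model like this one, the Huffman average code length always lies between the entropy H and H + 1 bits per symbol, which is one concrete sense in which practical code lengths approach the theoretical figure. A richer model (for example, one that conditions on preceding symbols) lowers the entropy the coder has to work with.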

Some compression algorithms are more useful than others for a given task. The criteria by which lossless compression algorithms are compared include compression time, reconstruction time, the size of the compressed file relative to the original and the generality of the algorithm, that is, the kinds of file on which it can operate. In the case of lossy compression, the comparison is further complicated by the quality of the reconstructed approximation. There is usually a trade-off between the amount of compression, the compression time and the quality of the reconstruction. The importance of these factors varies with the application in which the file is used; consequently, certain algorithms are favored for use in specific programs.
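A comparison along the lossless criteria above can be sketched with the three lossless codecs in Python's standard library (`zlib`, `bz2`, `lzma`). The `evaluate` helper and the synthetic `sample` data are hypothetical; the sketch checks exact reconstruction and reports the compressed-size ratio and wall-clock timings for each codec.

```python
import bz2
import lzma
import time
import zlib


def evaluate(name, compress, decompress, data):
    """Measure one lossless codec: size ratio, compression and reconstruction time."""
    t0 = time.perf_counter()
    packed = compress(data)
    t1 = time.perf_counter()
    restored = decompress(packed)
    t2 = time.perf_counter()
    assert restored == data, f"{name} failed to reconstruct the input exactly"
    return {
        "codec": name,
        "ratio": len(packed) / len(data),   # compressed size relative to original
        "compress_s": t1 - t0,
        "decompress_s": t2 - t1,
    }


# Hypothetical log-like input with plenty of repeated structure.
sample = b"".join(b"record %d: status=ok\n" % i for i in range(5000))

codecs = [
    ("zlib", zlib.compress, zlib.decompress),
    ("bz2", bz2.compress, bz2.decompress),
    ("lzma", lzma.compress, lzma.decompress),
]
results = [evaluate(name, c, d, sample) for name, c, d in codecs]
for r in results:
    print(f"{r['codec']:5} ratio={r['ratio']:.3f} "
          f"compress={r['compress_s'] * 1000:.1f} ms "
          f"decompress={r['decompress_s'] * 1000:.1f} ms")
```

Timings from a single run are noisy and results depend heavily on the input, so a real comparison would use representative files and repeated measurements. Evaluating a lossy codec would additionally require a measure of reconstruction quality.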

Resource by

I am a fun outgoing girl who loves to go to concerts and sit on the grass and read. My favorite books are those that make me think about life, love and how the world spins around no matter the troubles that you're going through.
