• Data Compression

    What is Data Compression?

    Data compression is a reduction in the number of bits required to represent data. Compressing data can save storage capacity, speed up file transfer, and lower the cost of storage hardware and network bandwidth.

    How compression works

    Compression is performed by a program that uses a formula or algorithm to determine how to shrink the data. For example, an algorithm may represent a string of bits (0s and 1s) with a smaller string of 0s and 1s, using a dictionary to convert between them.
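
    The dictionary idea can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the helper names (build_codebook, encode, decode) and the fixed-width code scheme are hypothetical, and real compressors build their dictionaries in more sophisticated ways.

```python
def build_codebook(symbols):
    """Assign each distinct symbol a short fixed-width bit code (illustrative scheme)."""
    distinct = sorted(set(symbols))
    # Number of bits needed to give every distinct symbol its own code.
    width = max(1, (len(distinct) - 1).bit_length())
    return {sym: format(i, f"0{width}b") for i, sym in enumerate(distinct)}


def encode(symbols, codebook):
    """Replace every symbol with its shorter code from the dictionary."""
    return "".join(codebook[s] for s in symbols)


def decode(bits, codebook):
    """Reverse the substitution using the same dictionary."""
    width = len(next(iter(codebook.values())))
    reverse = {code: sym for sym, code in codebook.items()}
    return [reverse[bits[i:i + width]] for i in range(0, len(bits), width)]


# Six 8-bit strings drawn from four distinct patterns (made-up sample data).
data = ["00000000", "11111111", "00000000", "10101010", "00000000", "01010101"]
codebook = build_codebook(data)
packed = encode(data, codebook)

print(len("".join(data)), "bits ->", len(packed), "bits")
print(decode(packed, codebook) == data)
```

    Note that the dictionary itself must also be stored or sent along with the packed data, so the saving only pays off when the input is large relative to the dictionary.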

    Text compression can be as simple as removing all unneeded characters, inserting a single repeat character to indicate a string of repeated characters, and substituting a smaller bit string for a frequently occurring bit string. Data compression can shrink a text file by 50% or more of its original size.
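
    Replacing a run of repeated characters with a single character and a count is usually called run-length encoding. A minimal sketch, assuming the input is a plain Python string and using hypothetical helper names, might look like this:

```python
from itertools import groupby


def rle_encode(text):
    """Collapse each run of a repeated character into a (character, count) pair."""
    return [(ch, len(list(run))) for ch, run in groupby(text)]


def rle_decode(pairs):
    """Expand every (character, count) pair back into the original run."""
    return "".join(ch * count for ch, count in pairs)


# Made-up sample: 6 As, 3 Bs, 8 Cs and a single D.
sample = "A" * 6 + "B" * 3 + "C" * 8 + "D"
pairs = rle_encode(sample)

print(pairs)
print(rle_decode(pairs) == sample)
```

    Run-length encoding only helps when the data actually contains long runs. On text with few repeats, the pairs can take more room than the original.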

    In data transmission, compression can be applied to the content of the data, including header data. When information is sent or received over the internet, large files can be transmitted in ZIP, GZIP or another compressed file format.
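
    As a rough illustration of preparing data for transfer in GZIP format, the sketch below uses Python's standard gzip module. The payload is made-up sample data, and the actual sizes depend on the content being sent.

```python
import gzip

# Made-up payload: a small CSV header followed by many similar rows.
payload = b"timestamp,sensor,value\n" + b"2024-01-01T00:00:00,temp,21.5\n" * 1000

compressed = gzip.compress(payload)      # what the sender would transmit
restored = gzip.decompress(compressed)   # what the receiver recovers

print(len(payload), "bytes ->", len(compressed), "bytes")
print(restored == payload)
```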

    Why is data compression important?

    Data compression can decrease the amount of storage a file takes up. For example, with a 2:1 compression ratio, a 20-megabyte (MB) file takes up 10 MB of space. As a result, the data administrator spends less money and time on storage.
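
    The arithmetic behind a compression ratio is simple enough to check directly. The function names below are hypothetical and exist only for this example:

```python
def compressed_size(original_size, ratio):
    """Size after compression for a ratio written as ratio:1."""
    return original_size / ratio


def space_saved_percent(original_size, stored_size):
    """Share of the original size that no longer needs to be stored."""
    return 100 * (1 - stored_size / original_size)


size_mb = compressed_size(20, 2)            # 20 MB file at a 2:1 ratio
saved = space_saved_percent(20, size_mb)

print(size_mb, "MB stored,", saved, "% saved")
```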

    Compression optimizes backup storage performance and has more recently been applied to primary storage data reduction. Compression is an important method of data reduction as data continues to grow exponentially. Any type of file can be compressed; however, it is important to follow best practices when choosing which files to compress. For instance, some files may already be compressed, so compressing those files again would not have a significant impact.
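
    How much a file shrinks depends on how much redundancy it contains, which is one reason the choice of files matters. The sketch below compares a highly repetitive input with random bytes, which here stand in for data that has little redundancy left (an assumption made for illustration). It uses Python's standard zlib module.

```python
import os
import zlib

repetitive = b"backup block " * 4096         # highly redundant input
random_like = os.urandom(len(repetitive))    # little or no redundancy to exploit

for label, blob in (("repetitive", repetitive), ("random-like", random_like)):
    packed = zlib.compress(blob)
    print(label, len(blob), "bytes ->", len(packed), "bytes")
```

    The repetitive input shrinks dramatically, while the random input stays roughly the same size or grows slightly because of format overhead.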

    Data compression methods

    Compressing data can be a lossy or a lossless process. Lossless compression enables a file to be restored to its original state, without the loss of a single bit of data, when the file is uncompressed. Lossless compression is the typical approach for executables, as well as text and spreadsheet files, where the loss of words or numbers would change the information.
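
    The defining property of lossless compression, that the restored data matches the original bit for bit, can be verified with a checksum. This is a small sketch using Python's standard zlib and hashlib modules on made-up spreadsheet-like text:

```python
import hashlib
import zlib

# Made-up sample resembling exported spreadsheet rows.
original = ("name,qty,price\n" + "widget,10,2.50\n" * 200).encode("utf-8")

compressed = zlib.compress(original, level=9)
restored = zlib.decompress(compressed)

# Matching digests indicate that not a single bit was lost.
same = hashlib.sha256(original).digest() == hashlib.sha256(restored).digest()
print(len(original), "bytes ->", len(compressed), "bytes, identical:", same)
```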

    Lossy compression permanently eliminates bits of data that are redundant, unimportant or imperceptible. It is useful with audio, graphics, video and images, where the removal of some data bits has little or no discernible effect on the content.
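
    One simple way to discard bits is quantization: storing each value with fewer bits than it started with. The sketch below is not how any particular audio or image codec works. It is a toy example with hypothetical function names, assuming samples in the range -1.0 to 1.0, and it shows that the decoded values are close to the originals but not identical.

```python
import math


def quantize(samples, bits):
    """Map floats in [-1.0, 1.0] onto 2**bits evenly spaced integer codes."""
    top = 2 ** bits - 1  # highest code value
    return [round((s + 1.0) / 2.0 * top) for s in samples]


def dequantize(codes, bits):
    """Map integer codes back to approximate floats in [-1.0, 1.0]."""
    top = 2 ** bits - 1
    return [c / top * 2.0 - 1.0 for c in codes]


# One cycle of a sine wave as a stand-in for audio samples.
samples = [math.sin(2 * math.pi * i / 32) for i in range(32)]
codes = quantize(samples, bits=4)        # each sample now fits in 4 bits
approx = dequantize(codes, bits=4)

max_error = max(abs(a - b) for a, b in zip(samples, approx))
print("largest difference after decoding:", max_error)
```

    The original values cannot be recovered from the codes, which is what makes the process lossy. Using more bits per sample reduces the error at the cost of a larger result.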

    Graphic image compression can be either lossy or lossless. Graphic image file formats are typically designed to compress information, since the files tend to be large. JPEG, for instance, is a file format that supports lossy image compression, while formats such as GIF and PNG use lossless compression.

    Compression vs data deduplication

    Compression is often compared to data deduplication, but the two techniques operate differently. Deduplication is a type of compression that looks for redundant chunks of data across one or more storage systems and then replaces each duplicate chunk with a pointer to the original. Data compression algorithms reduce the size of the bit strings in a data stream, which is far smaller in scope, and generally remember only a limited amount of recently seen data.
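
    The chunk-and-pointer idea behind deduplication can be sketched as follows. This is a simplified illustration with hypothetical function names, assuming fixed-size chunks and SHA-256 digests as pointers. Production systems often use variable-size chunking and more elaborate indexes.

```python
import hashlib


def deduplicate(data, chunk_size=8):
    """Split data into fixed-size chunks and store each distinct chunk once."""
    store = {}       # digest -> chunk bytes (the single stored copy)
    pointers = []    # one digest per chunk position, in original order
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)   # keep only the first copy
        pointers.append(digest)
    return store, pointers


def rebuild(store, pointers):
    """Follow the pointers to reassemble the original data."""
    return b"".join(store[d] for d in pointers)


# Four chunks, three of which are identical (made-up sample data).
data = b"AAAAAAAA" + b"BBBBBBBB" + b"AAAAAAAA" + b"AAAAAAAA"
store, pointers = deduplicate(data)

print(len(pointers), "chunks referenced,", len(store), "chunks stored")
print(rebuild(store, pointers) == data)
```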

    Data compression and backup

    Compression is mainly used for data that is not accessed often, as the process can be intensive and slow. Administrators can seamlessly integrate compression into their backup systems. Backup is a redundant type of workload, and any organization that performs full backups often has close to the same data from backup to backup.

    Benefits of compressing data

    The major benefits of compressing data include the following. Compressed data takes up less space, with compression ratios such as 2:1, 5:1 or even 100:1. If compression takes place on a server prior to transmission, the time it takes to transmit the data and the total network bandwidth used are drastically reduced. On tape, the smaller compressed file system image can be scanned faster to reach a particular file, while also taking up less storage space. The greatest advantage of data compression is the reduction in storage hardware, data transmission time and communication bandwidth.
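
    The effect on transmission time can be estimated with simple arithmetic. The sketch below uses a hypothetical helper and made-up figures (a 100 MB file on a 100 Mbit/s link). It ignores protocol overhead and the time spent compressing and decompressing, so it gives a best-case estimate only.

```python
def transfer_seconds(size_megabytes, link_megabits_per_second, ratio=1.0):
    """Estimated transfer time for data compressed at ratio:1 before sending."""
    size_megabits = size_megabytes * 8 / ratio
    return size_megabits / link_megabits_per_second


plain = transfer_seconds(100, 100)             # sent uncompressed
packed = transfer_seconds(100, 100, ratio=5)   # sent after 5:1 compression

print(plain, "s uncompressed vs", packed, "s compressed")
```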