Hard drive imaging compression

imzjustplayin

I was thinking about JBIG and LZW-style compression, the kind used for bi-level pictures (B&W images like faxes), and how those algorithms compress such pictures down to very small file sizes. Since those pictures work on the principle of zeros and ones, I was thinking this style of encoding could possibly be applied to the data on an HDD platter: you could take the bits on the platter, turn them into a picture, compress that picture using one of those compression schemes, and then store the result on another drive. This seems like it could be one of the best ways to compress the data on a computer hard drive, even data that normally cannot be compressed. I understand this would require some low-level access to the drive, but it could be useful for drive imaging, which is done at a low level as it is.
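Something like this is what I have in mind, as a very rough Python sketch (the /dev/sdb path is just a made-up example, a real tool would need proper low-level access to the drive, and zlib here is only a stand-in for a bi-level image codec since Python doesn't ship a JBIG encoder):

Code:
import zlib

CHUNK = 1024 * 1024  # read 1 MiB of raw sectors at a time

def compress_platter_bits(device_path, out_path):
    comp = zlib.compressobj(9)  # 9 = best compression
    with open(device_path, "rb") as dev, open(out_path, "wb") as out:
        while True:
            chunk = dev.read(CHUNK)
            if not chunk:
                break
            # Each byte here is already 8 of the black/white "dots" --
            # a bi-level picture of the platter would contain exactly these bits.
            out.write(comp.compress(chunk))
        out.write(comp.flush())

# compress_platter_bits("/dev/sdb", "drive_image.z")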


What do you guys think?
 
no.

By the way, whenever someone says "zeros and ones" in a computer-sciency context, they immediately lose any credibility they previously had.
But it IS a zero and a one when you're speaking in terms of bits, like 8 bits to a byte; I don't see anything wrong with saying that. Remember, I'm talking about the actual hard disk platter and the polarity of the magnetic information on the disk itself, which IS represented as 0s and 1s, or for your sake "on or off". The magnetism on the hard disk that represents data can only go two ways: you flip the poles so they read either "north" or "south", and that is where the "0 and 1" or "on and off" bits come from.
 
I have no clue what the hell the OP is going on about... :)

Drive imaging (most commercial tools, and the freely available ones as well) already offers compression of the data at various levels, so I don't get what you're trying to say.

If you're implying that drive imaging should be using some sort of image-based compression, then you're reaching into something like Steganos and steganography, which is a pretty well-known way of "hiding" content inside images - and yes, those are typically standard images in JPG/PNG/etc. format, so they natively have compression on top of the data being embedded too.

Still scratching my head over where this one is going... :confused:
 
The reason you lose credibility is that "zeros and ones" is such a gross oversimplification that it's absolutely meaningless in most contexts.

8 bits to a byte, eh? Already you've introduced something other than a zero or a one. Plus, 8 bits? Not always! It depends on your computer system, although 8 is most common. Now which order do the bits come in? Is the most significant bit first, or is it last? What if you want non-integer data? What if you need more precision than you can get with 8 bits? How do you say you're going to use 16 bits instead of 8?

The bottom line is that the "ones and zeros" are meaningless by themselves; it's the order and structure of the data that is important, and that fundamental structure isn't necessarily encoded into the data itself.

Furthermore, many of the mathematical principles employed in compression schemes don't depend on binary number systems either; many work in any arbitrary base. So they aren't based on "ones and zeros" to begin with.
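For instance, a quick Python sketch of the byte-order and precision point (the value 300 is just an arbitrary example):

Code:
# Same number, four different bit patterns -- the "ones and zeros" alone
# don't tell you which interpretation is the right one.
import struct

value = 300                                   # doesn't even fit in 8 bits
print(struct.pack(">H", value).hex())         # 16-bit, most significant byte first: 012c
print(struct.pack("<H", value).hex())         # 16-bit, least significant byte first: 2c01
print(struct.pack(">I", value).hex())         # 32-bit: 0000012c
print(struct.pack("<f", float(value)).hex())  # non-integer (32-bit float): 00009643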
 
I think he is describing essentially making a gigantic 2-color bitmap, a billion-ish pixels wide and tall, representing the data layout on the hard drive, then running that through an image compression encoder. Which is not going to work.

 
image your drive, rename image file to BMP, then compress.
let us know how much more it gets compressed :p
 
I wanted to say something like that, really I did. Or even worse: take a picture of the hard drive with a camera and then... well, you get the joke, I hope. :D
 
One big problem with your theory is that it's not just "ones and zeros". If you were reading straight binary code, sure, but how are you going to address references/pointers/stacks, much less the data itself? Not all data values are recorded at the same size or in the same sector. So this process you theorize could most likely corrupt not just a file or two, but the entire drive.
 
I think he is describing essentially making a gigantic 2-color bitmap, a billion-ish pixels wide and tall, representing the data layout on the hard drive, then running that through an image compression encoder. Which is not going to work.
yeah pretty much... Though I do understand it'd be a difficult task to undertake.


One big problem with your theory is that it's not just "ones and zeros". If you were reading straight binary code, sure, but how are you going to address references/pointers/stacks, much less the data itself? Not all data values are recorded at the same size or in the same sector. So this process you theorize could most likely corrupt not just a file or two, but the entire drive.
Well instead of writing data one sector at a time (512 bytes) you'd write literally one bit at a time. There has got to be a way you can manually write to the drive one bit at a time because otherwise the data wouldn't be written at all.
 
Well instead of writing data one sector at a time (512 bytes) you'd write literally one bit at a time. There has got to be a way you can manually write to the drive one bit at a time because otherwise the data wouldn't be written at all.

The question isn't whether or not you can write to the drive one bit at a time; you can, hence sector-by-sector writes. The real question is how you are going to put it back together so it's readable. i.e. you can cut and dice Clifford the Dog and then sort him by colour and by weight of parts, but how are you going to know which hair goes where? Without definitive pointers/references and a stack to store them, writing bits to a drive is the same as giving someone a pen and paper and telling them to randomly write down 1s and 0s in any order they want. It won't be distinguishable.
 
The problem isn't that it's difficult; the problem is that it would be useless.

yeah pretty much... Though I do understand it'd be a difficult task to undertake.

That doesn't really matter. It's no different than packing an .xml file into a .zip, or packing a .doc file.

The disk data stream is ultimately a very long one-dimensional string. The order of all that is the responsibility of the program that wrote it (the "Clifford the Dog" of the earlier analogy). A compression encoder doesn't really have to care what the data represents; it will replace or rearrange things as it sees fit according to the compression algorithm. At very efficient levels of compression, the output will be virtually indistinguishable from random noise.

When decoding, the decompressor will know what's what, and by determining its position within the data stream it can compute what should be where, etc.

To the OP: you could actually do this pretty easily on a Linux machine. It is trivial to read and write the raw disk using dd and other commands. This accesses everything below the filesystem level, allowing you to back up whole disks.

So you could easily route that info into a bitmap container. But PNG just uses the same DEFLATE compression as gzip, so you might as well gzip the whole disk and call it a day. Put all that data in a BMP, try to run a lossless compression scheme on it, and:

it will likely get virtually no compression, or even end up bigger than the original (free space aside).

What?

For any lossless compression scheme, some data sets will make the output smaller, but necessarily, other data sets will result in a larger output. The key is matching the encoding scheme to the expected form of the data.

The data in images is very different from the data in sound, text, or executable code. While an image compression scheme works well on images, its fundamental assumptions about how the data behaves fail on other types of data, and the result is no compression, or worse than none.
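If you want to see that concretely, here's a quick Python sketch (zlib is the same DEFLATE compression that gzip and PNG use; the repetitive text is just a stand-in for structured data):

Code:
import os
import zlib

random_bytes = os.urandom(1024 * 1024)         # stands in for noise or already-compressed data
texty_bytes = b"the quick brown fox " * 52429  # ~1 MiB of very repetitive "structured" data

print(len(zlib.compress(random_bytes, 9)))     # a bit over 1,048,576 bytes: slightly BIGGER than the input
print(len(zlib.compress(texty_bytes, 9)))      # a few KB: an enormous reduction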

 
So you could easily route that info into a bitmap container. But PNG just uses the same DEFLATE compression as gzip, so you might as well gzip the whole disk and call it a day. Put all that data in a BMP, try to run a lossless compression scheme on it, and:

it will likely get virtually no compression, or even end up bigger than the original (free space aside).

What?

For any lossless compression scheme, some data sets will make the output smaller, but necessarily, other data sets will result in a larger output. The key is matching the encoding scheme to the expected form of the data.

The data in images is very different from the data in sound, text, or executable code. While an image compression scheme works well on images, its fundamental assumptions about how the data behaves fail on other types of data, and the result is no compression, or worse than none.
But since we're dealing with what could be seen as a bi-level image, using a compression algorithm that is meant for bi-level images would make this possible. The type of data inside the drive wouldn't matter, since you would instead be looking at the very fundamentals of the data and what it actually represents, in this case a 0 or a 1. I guess the only thing I can think to do is make a mock miniature HDD in Photoshop, save it in a compressed format, and see how it goes. The picture would essentially look like a random sequence of black and white dots; I have no idea how well JBIG or PNG or GIF can compress this sort of thing.

First thing I'm going to do is create an image that is 2896x2896 pixels. That should be the equivalent of 1MB worth of data. Let's pretend we're working with arbitrary binary data, so compressing it should be nearly impossible since we don't know anything about its contents.
 
Data is not fundamentally 1's and 0's. That's about as ridiculous a statement as saying a virus that infects beetles should also infect humans because "we're all fundamentally atoms!" The arrangement of everything is what's important.

Read the lossless compression wiki article: http://en.wikipedia.org/wiki/Lossless_data_compression

Notice the section Lossless compression methods:
By operation of the pigeonhole principle, no lossless compression algorithm can efficiently compress all possible data, and completely random data streams cannot be compressed. For this reason, many different algorithms exist that are designed either with a specific type of input data in mind or with specific assumptions about what kinds of redundancy the uncompressed data are likely to contain.
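A quick way to see why (this is the standard counting argument, not a quote from the article): there are \(2^n\) possible bit strings of length \(n\), but only

\[
\sum_{k=0}^{n-1} 2^k \;=\; 2^n - 1
\]

strings that are strictly shorter than \(n\) bits, so any lossless scheme that shrinks some inputs must map at least one other input to something the same size or larger.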

Trying to use an imaging encoding scheme on hard drive data will not perform any better than common traditional compression schemes.

You clearly don't know anything about the math behind this stuff, so I'd suggest you read some literature before making any other bold claims.

http://www.amazon.com/Introduction-Information-Theory-John-Pierce/dp/0486240614/ref=pd_sim_b_3

http://www.amazon.com/Introduction-...edia-Information/dp/012620862X/ref=pd_sim_b_1
 
Alright, in Photoshop I created an image that is 2896x2896 pixels. In grayscale mode I used the noise filter to add as much noise as it could make, then saved the file as a BMP with no compression whatsoever. The file came out to 8MB. The reason is that it has to retain not just black and white but all the shades of gray in between, increasing the size 8-fold! So I then converted the image to a bi-level bitmap, meaning black or white only, and now the file comes down to almost exactly the size it should be: 1MB.

Why 1MB for 2896x2896? Because there are 8 bits to a byte, 1024 bytes in a kilobyte, and 1024 kilobytes in a megabyte. Anyway, I will conduct some testing to further prove or disprove this possibility.
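Checking that arithmetic (taking 1MB to mean 1,048,576 bytes, the bi-level file actually lands just under that, plus the BMP header):

\[
2896 \times 2896 = 8{,}386{,}816 \text{ pixels} \times 1 \text{ bit} = 1{,}048{,}352 \text{ bytes} \approx 1\,\text{MB}
\]
\[
8{,}386{,}816 \text{ pixels} \times 8 \text{ bits} = 8{,}386{,}816 \text{ bytes} \approx 8\,\text{MB (8-bit grayscale)}
\]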
 
Guess what, dude: you've done that by reducing the amount of information present. You've taken each pixel, which could hold any of 256 values, and thrown the gray away, leaving only two possibilities.

Now please 'decompress' that file, restoring the bi-level image back to grayscale so it ends up exactly like the original.

Which you won't do, because it is mathematically impossible. You're hopeless.
 
Guess what, dude: you've done that by reducing the amount of information present. You've taken each pixel, which could hold any of 256 values, and thrown the gray away, leaving only two possibilities.

Now please 'decompress' that file, restoring the bi-level image back to grayscale so it ends up exactly like the original.

Which you won't do, because it is mathematically impossible. You're hopeless.
Huh? But each bit can't hold 256 values; it can only be read as two values, "on or off". That was my point. I wasn't trying to make a bi-level image represent the data of a grayscale image; I was merely pointing out in my steps that I needed the image to be bi-level, and that using grayscale increases the size by a large amount because a grayscale picture holds so much more information than a bi-level one. I can see where you got confused, but what you think I was trying to say is not what I was trying to say.
 
When you did grayscale, each pixel was represented by 8 bits (in an RGB bitmap each pixel is 24 bits). When you converted it to black and white, each pixel was reduced to 1 bit. For every pixel, Photoshop kept one bit's worth of information and threw the other 7 into the garbage. Those 7 bits per pixel you will never get back; the data was destroyed.

If you encoded, say, a Word document as a bitmap, it would require the same file size regardless of whether the image was color, grayscale, or black and white. However, the number of pixels in the image would change correspondingly, because each pixel is encoded with a different amount of precision.
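A tiny sketch of why that conversion is one-way (the pixel value 183 and the 128 threshold are just illustrative numbers):

Code:
# Thresholding a grayscale pixel down to black/white keeps roughly 1 bit
# of information and throws the other 7 away -- there is no way to undo it.
gray = 183                    # one 8-bit grayscale pixel (0-255)
bw = 1 if gray >= 128 else 0  # the bi-level version is just "1"
# Every value from 128 to 255 maps to that same "1", so the original
# 183 cannot be recovered from the bi-level image.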

 
OK, I've got a problem. I saved the BMP file and then tried to archive it, and the problem is that WinRAR reduced it all the way down to a tiny 26KB RAR file (46KB as a ZIP) from its portly 1MB uncompressed size. I think the archiver detected the pattern left by the algorithm Photoshop used to generate the "noise" and was therefore able to encode the BMP so that its file size could be reduced substantially.

Anyway, with that aside, I saved the BMP as a GIF and it increased to 1.1MB, which proves the point made earlier that a compression algorithm can in some cases increase the file size. Saved as a PNG, the file was reduced to about half its size, around 540KB.

Update:
I redid the "noise", changing it from "Uniform" to "Gaussian", and now the compressed files come out much larger... the BMP is 1030KB (default), the GIF is 1210KB, the PNG is 913KB, and finally, if you put the 1030KB BMP into a RAR archive set to "best", you come out with a file size of 345KB. Very small, though not nearly as small as that previous 26KB file, which was probably down to my earlier settings and the archiver detecting the pattern. I still have to wonder what kind of patterns the software picked up on to get to 345KB, and whether an actual binary file would compress the same way.


Is there some way of taking a random binary file and converting it to a picture? I vaguely remember reading or seeing something of the sort being done, but I can't confirm it.
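One way to do exactly that, as a rough Python sketch (this assumes the Pillow imaging library; the file names are just placeholders): it packs the bytes of any file into a 1-bit-per-pixel square image and saves it as a PNG, which you can then compare against simply gzipping the same file.

Code:
import gzip
import math
from PIL import Image

def file_to_bilevel_png(path, out_png):
    data = open(path, "rb").read()
    side = math.ceil(math.sqrt(len(data) * 8))   # roughly square picture, one pixel per bit
    side += (-side) % 8                          # keep each row a whole number of bytes
    padded = data + b"\x00" * (side * side // 8 - len(data))
    img = Image.frombytes("1", (side, side), padded)  # mode "1" = 1 bit per pixel
    img.save(out_png, optimize=True)             # PNG then applies its own DEFLATE compression

# file_to_bilevel_png("somefile.bin", "somefile.png")
# with open("somefile.bin", "rb") as f:
#     open("somefile.bin.gz", "wb").write(gzip.compress(f.read()))

Since PNG's compression is the same DEFLATE that gzip uses, the two outputs should land close to each other; the picture detour doesn't buy anything extra.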
 
Alright, in Photoshop I created an image that is 2896x2896 pixels. In grayscale mode I used the noise filter to add as much noise as it could make, then saved the file as a BMP with no compression whatsoever. The file came out to 8MB. The reason is that it has to retain not just black and white but all the shades of gray in between, increasing the size 8-fold! So I then converted the image to a bi-level bitmap, meaning black or white only, and now the file comes down to almost exactly the size it should be: 1MB.

Why 1MB for 2896x2896? Because there are 8 bits to a byte, 1024 bytes in a kilobyte, and 1024 kilobytes in a megabyte. Anyway, I will conduct some testing to further prove or disprove this possibility.

... an uncompressed BMP can store up to 24-bit colour (16.7 million colours), whereas a bi-level image stores only 1 bit per pixel
 