Never argue with a fool; he will drag you down to his level and beat you with experience.
You need to explain WHY your claim is true. You cannot just say "according to this link". You need to connect all the dots. THAT is the scientific method.
The biggest problem with ignorant idiocy: if left alone, it spreads.
You do have a point. But I am not convinced, as I said earlier. I can try to explain this again; I am quite certain I am being unclear somehow. I don't really understand what people do not understand. I keep repeating myself time and again.

Yes, I am really 100% sure that they do. And by "remove all" I mean they remove 99+% of it. There is so little left that even if the best deduplication removes everything that remains, you are only going to save a couple of GB on a multi-TB library, so much less than 1%.
It doesn't matter what you believe. The fact is they do and you don't have to wait for greenbytes to see that they have no duplicate data left.
Scientific debates do not work like this. Give me links, as I always do.

You can simply read the computer science papers on modern media compression codecs. There are hundreds of them, and they prove through math and through analysis that the duplicate data is eliminated.
Well, you are wrong on this. It is true that I have said "go read THESE papers on data corruption" - and I have linked to several papers, and I have even quoted the papers verbatim sometimes, even giving page numbers. In the beginning I did that all the time, but later I assumed people had read those papers, so I did not cite them as rigorously towards the end. You do not cite anything. You say "The truth is out there on the internet. Trust me. Go and read some papers, you will find the answer there." So you are basically asking me to prove your own claims. That is not how science works. You can google for research papers and quote them, which supports your claim. Just like I do: "according to this research paper, ntfs, ext3, etc are not safe", and "CERN examined this and concluded that a few checksums are not enough to achieve data protection" - and I gave links to both these papers. I must do that, because when you are writing papers, you need to cite correctly and make references with full information.

I cannot count how many times you have simply said "go read the papers on ZFS and data corruption" as the entire basis and proof of your argument and claims about how good ZFS is at preventing data corruption.
So now it is only fair, and it's my turn to simply say "go read the H.264 and MP3 papers" and you will see for yourself how their data has no structure and how they leave no duplicate data.
Yes, that is how you do it in science. Didn't you know?

By the way, by his bizarro logic, if someone claims that you can do X, where X violates a fundamental conservation law, and you say 'no, that is impossible because of the laws of nature', you are obliged to give a complete and thorough explanation of that law and why X violates it. Uh, sure...
If I do a bitwise "Diehard test suite" comparison of two media files, bit 100 against bit 100, bit 101 against bit 101, etc. - and find that they do not share much structure - I have missed the fact that there might be shared structure between bits that are far apart. I need to extract bits 0-100 and then look through the entire other file to see whether bits 0-100 occur anywhere in it. I can NOT just compare bits 0-100 in both files at the same positions. That is not how dedupe works; it does not compare bit patterns at the same positions. So you cannot apply a bitwise Diehard test suite and draw conclusions about how Greenbyte dedupes their data. We have no idea how Greenbyte does their dedupe. One thing is clear though: they do not do it like ordinary zfs dedupe. Maybe they have a much smaller dedupe window?
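To make that distinction concrete, here is a minimal sketch (my own illustration, not Greenbytes' actual method): every block of one file is looked up against the content of every block of the other file, wherever it sits, instead of comparing the two files offset by offset. The file names and the 4 KiB block size are assumptions.

```python
# Minimal sketch (an illustration only, not Greenbytes' method): count blocks
# of file A whose exact content occurs anywhere in file B, at block
# granularity, rather than comparing the files position by position.
import hashlib

BLOCK = 4096  # assumed block size

def blocks(path):
    with open(path, "rb") as f:
        while chunk := f.read(BLOCK):
            yield chunk

def shared_blocks(path_a, path_b):
    seen = {hashlib.sha256(b).digest() for b in blocks(path_b)}
    return sum(hashlib.sha256(a).digest() in seen for a in blocks(path_a))

# Hypothetical usage:
# print(shared_blocks("movie1.mkv", "movie2.mkv"))
```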
I don't have a theory, as I explained. I have a question: "Can Greenbyte dedup media somehow?" - and everybody here explains to me that I am wrong. I would like to know why I am wrong.

Since you are the one arguing against all research on, and common sense regarding, compression/deduplication of media files, I suggest you should be the one to prove your theory and not the rest of the world.
If you are done with insults, name calling and foul language, can you explain why you ask me to do a "Diehard test suite" that (I assume) tests bitwise and makes local comparisons. Bitwise comparisons work like that. You surely know that dedupe works across the whole file, not locally?

You understand NOTHING of statistical analysis of data, otherwise we wouldn't have to read absolute garbage like
to which you will reply "Explain to me why it's garbage!" ad nauseam. You simply can't grasp the concepts.
Steam engine, monkey.
can you explain why you ask me to do a "Diehard test suite" that (I assume) tests bitwise and makes local comparisons.
People keep repeating "do a bitwise comparison of two media files, and you will see that bitwise they don't share much structure". What do they mean by that, do you reckon?

See, you assume. That's your whole failure right there. You don't even attempt to know, you keep assuming in the face of extreme resistance. That's the pinnacle of ignorant stupidity.
Diehard tests being a bitwise comparison? What. The. Fuck.
I quote SirMaster:

No one said anything about bitwise comparisons. This is getting more absurd by the minute.
Wikipedia? So you too did not know how Diehard works. Fine. But you could have avoided the foul language over me not knowing how Diehard works; it turned out you did not know either.

It's not like Wikipedia doesn't exist...
Diehard tests help you determine, among other things, H: "The entropy rate of a source is a number which depends only on the statistical nature of the source."
"Shannon established that there is a fundamental limit to lossless data compression. This limit, called the entropy rate, is denoted by H. The exact value of H depends on the information source --- more specifically, the statistical nature of the source. It is possible to compress the source, in a lossless manner, with compression rate close to H. It is mathematically impossible to do better than H."
If you could dedup better than H, you would violate this limit, since you'd automatically compress better than H (dedup is compression).
See, you don't even understand the theories, you approach this whole topic with a naive layman's kind of "maybe there are repeating patterns elsewhere in the file. It's such a biiig file! Who could ever know?!" which is kinda cute but neither scientific nor reasonable.
If you determine a set of files to have an H of 7.9999 bits/character, you automatically know you can never dedup it beyond minuscule ratios. That's a fact.
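For what it is worth, the arithmetic behind that claim: at H = 7.9999 bits per 8-bit character, the best possible lossless reduction is 1 - 7.9999/8, i.e. about 0.00125%. Below is a rough sketch (mine, with a hypothetical file name) of an order-0 byte-frequency entropy estimate. It ignores longer-range structure, so the true entropy rate can be lower, but already-compressed media typically comes out very close to 8 bits/byte.

```python
# Rough order-0 entropy estimate over bytes. This only counts byte
# frequencies and ignores longer-range structure, so it illustrates the
# idea rather than replacing a full Diehard-style analysis.
import math
from collections import Counter

def byte_entropy(path):
    counts, total = Counter(), 0
    with open(path, "rb") as f:
        while chunk := f.read(1 << 20):
            counts.update(chunk)
            total += len(chunk)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical usage:
# h = byte_entropy("movie.mkv")   # encoded media usually lands near 8.0
# print(f"H = {h:.4f} bits/byte, best lossless saving = {(1 - h / 8) * 100:.4f}%")
```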
Just because I reference Wikipedia, doesn't mean I don't know what I'm talking about.
It was a hint directed at you to DO SOME RESEARCH and not be this obnoxious brat who needs everything handed to him and then still doesn't understand it anyway...
The deflation algorithm used by gzip (also zip and zlib) is a variation of LZ77 (Lempel-Ziv 1977, see reference below). It finds duplicated strings in the input data. The second occurrence of a string is replaced by a pointer to the previous string, in the form of a pair (distance, length). Distances are limited to 32K bytes, and lengths are limited to 258 bytes.
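As a toy illustration of the (distance, length) pairs that quote describes, here is a naive sketch of LZ77-style back-references. It is nowhere near the real deflate implementation, just the idea of replacing a repeated string with a pointer into a sliding window.

```python
# Toy LZ77-style tokenizer: emit literals, or (distance, length) references
# to an earlier occurrence within a 32K window. Purely illustrative.
def lz77_tokens(data: bytes, window: int = 32 * 1024, min_len: int = 3, max_len: int = 258):
    tokens, i = [], 0
    while i < len(data):
        best_len, best_dist = 0, 0
        for j in range(max(0, i - window), i):  # naive longest-match search
            length = 0
            while (length < max_len and i + length < len(data)
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= min_len:
            tokens.append(("match", best_dist, best_len))
            i += best_len
        else:
            tokens.append(("literal", data[i]))
            i += 1
    return tokens

# print(lz77_tokens(b"abcabcabcabc"))
# -> [('literal', 97), ('literal', 98), ('literal', 99), ('match', 3, 9)]
```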
I quote SirMaster:
"Now take a 10GB HD movie. Run it through the same encoder a second time to re-encode it with the same settings so you have approximately the same size output file. Do a binary diff and see how much of the file is the same. It's actually only a few hundred KB of this 10GB file that remains the same, even though the file looks to us to be the same."
And how does this Diehard test suite work? You said it does not work the way I think it does, so you surely know how it works. Can you explain how it works? You don't have to provide details, but you claim it does not do bitwise comparisons? What does it do, then? Does it try to find the bit pattern 0-100 in the entire other file, just like dedupe does? Or? You are using foul language again (the f-word), which means you get upset because I don't know how it works. So how does it work, then?
Say you have a media server with a lot of movies. I bet the movie data is quite similar, so you can apply dedupe to great success. Likewise with MP3.
Why? LempelZiv77, which you link to, does not look like deduplication.

Dedup is the most simple form of compression. Now we have another topic to try to get into your thick skull for 5 pages. Great.
Someone shoot me.
Edit:
http://www.gzip.org/algorithm.txt
My, doesn't this look a lot like deduplication? I don't know. Let's debate!
As I told you, you do have a point. Your explanation is sound and consistent. It does not take much for me to change my mind; if you can provide a sound explanation I must reconsider. But flawed explanations with gaping holes do not convince anyone. But TCM2 is just out bicycling; he does not know what he is talking about, confuses concepts and insults people. I don't understand why I am willing to devote time to such a person? It is just a waste of time. Much of what he claims is wrong, so why must I educate such an "obnoxious brat"? I would rather discuss with people who have something relevant to say, instead of someone spewing out insults and random uneducated ramblings.

I wrote up that scenario because of this comment:
I thought you were thinking that similar looking media (like a common TV show intro that looks the same to our eyes) or common blocks of color and shapes in different scenes and different movies could somehow dedupe.
I was merely trying to demonstrate that no, even a movie that looks identical to us is completely different data every time it's encoded. Because encoding is best-fit, and the algorithms don't always find the same fit.
I could have also said this: take something like 1TB of media and pick a block size, any block size. Now search across ALL that media for any duplicate blocks. You aren't going to find any meaningful amount of them. Try every reasonable block size and you still won't find even 1GB of duplicate blocks in 1TB of media. This has nothing to do with bitwise comparisons or nearby bits. I'm telling you that you aren't going to find any significant amount of duplicate blocks ANYWHERE in the ENTIRE media collection.
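That whole-collection search is easy to approximate. A sketch, assuming a hypothetical /tank/media directory and a 128 KiB block size (a common ZFS recordsize): hash every block of every file and add up the space that duplicate blocks would save.

```python
# Approximate the dedup potential of a media directory: hash every fixed-size
# block and count bytes that duplicate blocks could save. The directory path
# and block size are assumptions; the short final block of each file makes
# the figure slightly approximate.
import hashlib
from collections import Counter
from pathlib import Path

BLOCK = 128 * 1024  # assumed block size

def duplicate_bytes(media_dir):
    counts = Counter()
    for path in Path(media_dir).rglob("*"):
        if not path.is_file():
            continue
        with open(path, "rb") as f:
            while block := f.read(BLOCK):
                counts[hashlib.sha256(block).digest()] += 1
    # every occurrence beyond the first could be stored as a reference
    return sum((n - 1) * BLOCK for n in counts.values() if n > 1)

# Hypothetical usage:
# print(duplicate_bytes("/tank/media") / 1e9, "GB deduplicatable")
```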
And here we go again. I've provided you with several links that show dedup and compression are not the same thing...
6.8TB volume with around 6.2TB of data that has been copied across from the live server. After around a week of not being touched with deduping enabled the savings reported on that volume is only 30GB, or around 0.5%.
If my thread bothers you, you are free to stop reading it.

TCM2, your patience is noteworthy. In situations like this, one starts to wonder why one expends so much energy when the fool resists all reason.
Brutalizer, you have no clue what you are talking about. You are either a fool, a troll or an antagonistic conversation bot. Either way, you're creating noise. Go away.
"...Thanks for the link. It is a valid argument, which makes me believe that Greenbyte can not dedupe media well. But it is not ruled out yet, as Greenbyte reports 50x dedupe ratios, whereas everyone else report much lesser numbers...."Microsoft Server 2012 Deduplication test results in 0.5% space savings for Video, Photos, and Vector art. Its a waste of time.
Poor results with Data Deduplication in Windows Server 2012
Dedupe works across files; compression works within a file. The methods are different - no one uses LempelZiv77 for deduping across many files.

Are you talking about dedup within a file or in the filesystem (across many files)?
If my thread bothers you, you are free to stop reading it.
I am not saying that dedup works outside the laws of compression. I am saying that dedupe and compression are two different things, as I have shown with several links. Therefore, what applies to compression does not necessarily apply to dedupe. Dedupe works across files, compression does not, etc. I posted information elaborating the difference between compression and dedupe.

Another wall of text from the CS monkey. I'm not even reading all of your drivel anymore.
Simple, Einstein, if dedup could work outside the laws that apply to compression, we wouldn't have 7zip and WinRAR, we'd have WinDedup that can do magic.
What misunderstanding? I am informing this guy TCM2 that dedupe and compression are two different things, each using different algorithms and methods, because they have different objectives and goals.

...and stop spreading misunderstanding ...
Dedupe works across files. Compression works within a file.

Are you talking about dedup/compression in the filesystem or within a single file?