Never argue with a fool; he will drag you down to his level and beat you with experience.
You need to explain WHY your claim is true. You cannot just say "according to this link". You need to connect all the dots. THAT is the scientific method.
The biggest problem with ignorant idiocy: if left alone, it spreads.
You do have a point. But I am not convinced, as I said earlier. I can try to explain this again; I am quite certain I am being unclear somehow. I don't really understand what people do not understand. I keep repeating myself time and again.

Yes, I am really 100% sure that they do. And by "remove all" I mean they remove 99+% of it. There is so little left that even if the best deduplication removes everything that remains, you are only going to save a couple of GB on a multi-TB library, so much less than 1%.
It doesn't matter what you believe. The fact is they do and you don't have to wait for greenbytes to see that they have no duplicate data left.
Scientific debates do not work like this. Give me links, as I always do.

You can simply read the computer science papers on modern media compression codecs. There are hundreds of them, and they prove through math and through analysis that the duplicate data is eliminated.
Well, you are wrong on this. It is true that I have said "go read THESE papers on data corruption" - and I have linked to several papers, and I have even quoted the papers verbatim sometimes, even giving page numbers. In the beginning I did that all the time, but later I assumed people had read those papers, so I did not cite them as rigorously towards the end. You do not cite anything. You say "The truth is out there on the internet. Trust me. Go and read some papers, you will find the answer there." So you are basically asking me to prove your own claims. That is not how science works. You can google for research papers and quote them, which supports your claim. Just like I do: "according to this research paper, ntfs, ext3, etc are not safe", and "CERN examined this and concluded that a few checksums are not enough to achieve data protection" - and I gave links to both these papers. I must do that, because when you are writing papers, you need to cite correctly and make references with full information.

I cannot count how many times you have simply said "go read the papers on ZFS and data corruption" as the entire basis and proof of your argument and claims about how good ZFS is at preventing data corruption.
So now it is only fair, and it's my turn to simply say "go read the H.264 and MP3 papers" and you will see for yourself how their data has no structure and how they leave no duplicate data.
Yes, that is how you do it in science. Didn't you know?

By the way, by his bizarro logic, if someone claims that you can do X, where X violates a fundamental conservation law, and you say 'no, that is impossible because of the laws of nature', you are obliged to give a complete and thorough explanation of that law and why X violates it. Uh, sure...
If I do a bitwise "Diehard test suite" comparison of two media files, bit 100 against bit 100, bit 101 against bit 101, etc. - and find that they do not share much structure - I have missed the fact that there might be shared structure between bits that are far apart. I need to extract bits 0-100 and then look through the entire other file to see whether bits 0-100 occur anywhere in it. I can NOT just compare bits 0-100 in both files at the same positions. That is not how dedupe works; it does not compare bit patterns at the same positions. So you cannot apply a bitwise Diehard test suite and draw conclusions about how Greenbyte dedupes their data. We have no idea how Greenbyte does their dedupe. One thing is clear though: they do not do it like ordinary zfs dedupe. Maybe they have a much smaller dedupe window?
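To make that distinction concrete, here is a minimal sketch (my own illustration, not Greenbytes' actual method): every block of one file is looked up against the content of every block of the other file, wherever it sits, instead of comparing the two files offset by offset. The file names and the 4 KiB block size are assumptions.

```python
# Minimal sketch (an illustration only, not Greenbytes' method): count blocks
# of file A whose exact content occurs anywhere in file B, at block
# granularity, rather than comparing the files position by position.
import hashlib

BLOCK = 4096  # assumed block size

def blocks(path):
    with open(path, "rb") as f:
        while chunk := f.read(BLOCK):
            yield chunk

def shared_blocks(path_a, path_b):
    seen = {hashlib.sha256(b).digest() for b in blocks(path_b)}
    return sum(hashlib.sha256(a).digest() in seen for a in blocks(path_a))

# Hypothetical usage:
# print(shared_blocks("movie1.mkv", "movie2.mkv"))
```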
I don't have a theory, as I explained. I have a question: "Can Greenbyte dedup media somehow?" - and everybody here explains to me that I am wrong. I would like to know why I am wrong.

Since you are the one arguing against all research on, and common sense regarding, compression/deduplication of media files, I suggest you should be the one to prove your theory and not the rest of the world.
If you are done with insults, name calling and foul language, can you explain why you ask me to do a "Diehard test suite" that (I assume) tests bitwise and makes local comparisons. Bitwise comparisons work like that. You surely know that dedupe works across the whole file, not locally?

You understand NOTHING of statistical analysis of data, otherwise we wouldn't have to read absolute garbage like
to which you will reply "Explain to me why it's garbage!" ad nauseam. You simply can't grasp the concepts.
Steam engine, monkey.
can you explain why you ask me to do a "Diehard test suite" that (I assume) tests bitwise and makes local comparisons.
People keep repeating "do a bitwise comparison of two media files, and you will see that bitwise they don't share much structure". What do they mean by that, do you reckon?

See, you assume. That's your whole failure right there. You don't even attempt to know, you keep assuming in the face of extreme resistance. That's the pinnacle of ignorant stupidity.
Diehard tests being a bitwise comparison? What. The. Fuck.
I quote SirMaster:

No one said anything about bitwise comparisons. This is getting more absurd by the minute.
Wikipedia? So you too did not know how Diehard works. Fine. But you could have avoided the foul language over me not knowing how Diehard works; it turned out you did not know either.

It's not like Wikipedia doesn't exist...
Diehard tests help you determine, among other things, H: "The entropy rate of a source is a number which depends only on the statistical nature of the source."
"Shannon established that there is a fundamental limit to lossless data compression. This limit, called the entropy rate, is denoted by H. The exact value of H depends on the information source --- more specifically, the statistical nature of the source. It is possible to compress the source, in a lossless manner, with compression rate close to H. It is mathematically impossible to do better than H."
If you could dedup better than H, you would violate this limit, since you'd automatically compress better than H (dedup is compression).
See, you don't even understand the theories, you approach this whole topic with a naive layman's kind of "maybe there are repeating patterns elsewhere in the file. It's such a biiig file! Who could ever know?!" which is kinda cute but neither scientific nor reasonable.
If you determine a set of files to have an H of 7.9999 bits/character, you automatically know you can never dedup it beyond minuscule ratios. That's a fact.
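For what it is worth, the arithmetic behind that claim: at H = 7.9999 bits per 8-bit character, the best possible lossless reduction is 1 - 7.9999/8, i.e. about 0.00125%. Below is a rough sketch (mine, with a hypothetical file name) of an order-0 byte-frequency entropy estimate. It ignores longer-range structure, so the true entropy rate can be lower, but already-compressed media typically comes out very close to 8 bits/byte.

```python
# Rough order-0 entropy estimate over bytes. This only counts byte
# frequencies and ignores longer-range structure, so it illustrates the
# idea rather than replacing a full Diehard-style analysis.
import math
from collections import Counter

def byte_entropy(path):
    counts, total = Counter(), 0
    with open(path, "rb") as f:
        while chunk := f.read(1 << 20):
            counts.update(chunk)
            total += len(chunk)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical usage:
# h = byte_entropy("movie.mkv")   # encoded media usually lands near 8.0
# print(f"H = {h:.4f} bits/byte, best lossless saving = {(1 - h / 8) * 100:.4f}%")
```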
Just because I reference Wikipedia, doesn't mean I don't know what I'm talking about.
It was a hint directed at you to DO SOME RESEARCH and not be this obnoxious brat who needs everything handed to him and then still doesn't understand it anyway...
The deflation algorithm used by gzip (also zip and zlib) is a variation of LZ77 (Lempel-Ziv 1977, see reference below). It finds duplicated strings in the input data. The second occurrence of a string is replaced by a pointer to the previous string, in the form of a pair (distance, length). Distances are limited to 32K bytes, and lengths are limited to 258 bytes.
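As a toy illustration of the (distance, length) pairs that quote describes, here is a naive sketch of LZ77-style back-references. It is nowhere near the real deflate implementation, just the idea of replacing a repeated string with a pointer into a sliding window.

```python
# Toy LZ77-style tokenizer: emit literals, or (distance, length) references
# to an earlier occurrence within a 32K window. Purely illustrative.
def lz77_tokens(data: bytes, window: int = 32 * 1024, min_len: int = 3, max_len: int = 258):
    tokens, i = [], 0
    while i < len(data):
        best_len, best_dist = 0, 0
        for j in range(max(0, i - window), i):  # naive longest-match search
            length = 0
            while (length < max_len and i + length < len(data)
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= min_len:
            tokens.append(("match", best_dist, best_len))
            i += best_len
        else:
            tokens.append(("literal", data[i]))
            i += 1
    return tokens

# print(lz77_tokens(b"abcabcabcabc"))
# -> [('literal', 97), ('literal', 98), ('literal', 99), ('match', 3, 9)]
```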
I quote SirMaster:
"Now take a 10GB HD movie. Run it through the same encoder a second time to re-encode it with the same settings so you have approximately the same size output file. Do a binary diff and see how much of the file is the same. It's actually only a few hundred KB of this 10GB file that remains the same, even though the file looks to us to be the same."
And how does this Diehard test suite work? You said it does not work the way I think it does, so you surely know how it works. Can you explain how it works? You don't have to provide details, but you claim it does not do bitwise comparisons? What does it do, then? Does it try to find the bit pattern 0-100 in the entire other file, just like dedupe does? Or? You are using foul language again (the f-word), which means you get upset because I don't know how it works. So how does it work, then?
Say you have a media server with a lot of movies. I bet the movie data is quite similar, so you can apply dedupe to great success. Likewise with MP3.
Why? LempelZiv77, which you link to, does not look like deduplication.

Dedup is the most simple form of compression. Now we have another topic to try to get into your thick skull for 5 pages. Great.
Someone shoot me.
Edit:
http://www.gzip.org/algorithm.txt
My, doesn't this look a lot like deduplication? I don't know. Let's debate!
As I told you, you do have a point. Your explanation is sound and consistent. It does not take much for me to change my mind; if you can provide a sound explanation I must reconsider. But flawed explanations with gaping holes do not convince anyone. But TCM2 is just out bicycling; he does not know what he is talking about, confuses concepts and insults people. I don't understand why I am willing to devote time to such a person? It is just a waste of time. Much of what he claims is wrong, so why must I educate such an "obnoxious brat"? I would rather discuss with people who have something relevant to say, instead of someone spewing out insults and random uneducated ramblings.

I wrote up that scenario because of this comment:
I thought you were thinking that similar looking media (like a common TV show intro that looks the same to our eyes) or common blocks of color and shapes in different scenes and different movies could somehow dedupe.
I was merely trying to demonstrate that no, even a movie that looks identical to us is completely different data every time it's encoded. Because encoding is best-fit, and the algorithms don't always find the same fit.
I could have also said this: take something like 1TB of media and pick a block size, any block size. Now search across ALL that media for any duplicate blocks. You aren't going to find any meaningful amount of them. Try every reasonable block size and you still won't find even 1GB of duplicate blocks in 1TB of media. This has nothing to do with bitwise comparisons or nearby bits. I'm telling you that you aren't going to find any significant amount of duplicate blocks ANYWHERE in the ENTIRE media collection.
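That whole-collection search is easy to approximate. A sketch, assuming a hypothetical /tank/media directory and a 128 KiB block size (a common ZFS recordsize): hash every block of every file and add up the space that duplicate blocks would save.

```python
# Approximate the dedup potential of a media directory: hash every fixed-size
# block and count bytes that duplicate blocks could save. The directory path
# and block size are assumptions; the short final block of each file makes
# the figure slightly approximate.
import hashlib
from collections import Counter
from pathlib import Path

BLOCK = 128 * 1024  # assumed block size

def duplicate_bytes(media_dir):
    counts = Counter()
    for path in Path(media_dir).rglob("*"):
        if not path.is_file():
            continue
        with open(path, "rb") as f:
            while block := f.read(BLOCK):
                counts[hashlib.sha256(block).digest()] += 1
    # every occurrence beyond the first could be stored as a reference
    return sum((n - 1) * BLOCK for n in counts.values() if n > 1)

# Hypothetical usage:
# print(duplicate_bytes("/tank/media") / 1e9, "GB deduplicatable")
```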
And here we go again. I've provided you with several links that show dedup and compression are not the same thing...
6.8TB volume with around 6.2TB of data that has been copied across from the live server. After around a week of not being touched with deduping enabled the savings reported on that volume is only 30GB, or around 0.5%.
If my thread bothers you, you are free to stop reading it.

TCM2, your patience is noteworthy. In situations like this, one starts to wonder why one expends so much energy when the fool resists all reason.
Brutalizer, you have no clue what you are talking about. You are either a fool, a troll or an antagonistic conversation bot. Either way, you're creating noise. Go away.
"...Thanks for the link. It is a valid argument, which makes me believe that Greenbyte can not dedupe media well. But it is not ruled out yet, as Greenbyte reports 50x dedupe ratios, whereas everyone else report much lesser numbers...."Microsoft Server 2012 Deduplication test results in 0.5% space savings for Video, Photos, and Vector art. Its a waste of time.
Poor results with Data Deduplication in Windows Server 2012
Dedupe works across files; compression works within a file. The methods are different - no one uses LempelZiv77 for deduping across many files.

Are you talking about dedup within a file or in the filesystem (across many files)?
If my thread bothers you, you are free to stop reading it.
I am not saying that dedup works outside the laws of compression. I am saying that dedupe and compression are two different things, as I have shown with several links. Therefore, what applies to compression does not necessarily apply to dedupe. Dedupe works across files, compression does not, etc. I posted information elaborating the difference between compression and dedupe.

Another wall of text from the CS monkey. I'm not even reading all of your drivel anymore.
Simple, Einstein, if dedup could work outside the laws that apply to compression, we wouldn't have 7zip and WinRAR, we'd have WinDedup that can do magic.
What misunderstanding? I am informing this guy TCM2 that dedupe and compression are two different things, each using different algorithms and methods, because they have different objectives and goals.

...and stop spreading misunderstanding ...
Dedupe works across files. Compression works within a file.

Are you talking about dedup/compression in the filesystem or within a single file?