Socoder -> On Topic -> File integrity

Mon, 13 Feb 2012, 07:37
Afr0
I've started a tool called Manifestation for generating manifests for use with the patcher I wrote.
Obviously I need a way to ensure file integrity when downloading, so I thought the best way would be to include a checksum for each file with the manifest.
From what I understand, CRC32 is supposed to be weak and isn't to be trusted.
I can generate an MD5 hash of a file, but wouldn't that mean that I would essentially be sending the entire file along with the manifest?
Or is the hash generated smaller than the bytelength of the actual file?
Are there any industry-standard ways of going about this I should be aware of?

-=-=-
Afr0 Games

Project Dollhouse on Github - Please fork!
Mon, 13 Feb 2012, 08:04
JL235
An MD5 hash would be the way to go; it's what everyone uses. I would pre-hash the files and store the values, then send the stored value along with each file, so you only ever have to hash each file once (hashing large files can be expensive).
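As a rough illustration of the pre-hashing idea, here is a minimal Python sketch that hashes every file under a directory once and keeps the results in a dictionary. The names `md5_of_file`, `build_manifest` and `game.dat` are made up for the example; they are not part of Manifestation or the patcher, and the real manifest format may look quite different.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=64 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks
    so that large files never have to fit in memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file path (relative to root) to its MD5 hex digest.
    Hypothetical manifest layout, for illustration only."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            manifest[rel] = md5_of_file(full)
    return manifest


# Usage: hash a throwaway directory once and keep the result.
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "game.dat"), "wb") as f:
        f.write(b"hello")
    manifest = build_manifest(root)
    print(manifest)
```

The manifest only needs rebuilding when the files themselves change, so the hashing cost is paid at build time rather than on every download.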

I wouldn't worry about hashing being weak, in terms of security, as that isn't what file hashing aims to solve. It aims to just be a way of asking "did it all get transferred in one piece?". It's about detecting corruption or missing data; not security.
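To make the "did it all get transferred in one piece?" check concrete, here is a small Python sketch. It assumes the expected MD5 came from the manifest; `is_intact` and the sample payload are invented for the example.

```python
import hashlib


def is_intact(data: bytes, expected_md5: str) -> bool:
    """True if the received bytes hash to the value from the manifest."""
    return hashlib.md5(data).hexdigest() == expected_md5.lower()


original = b"patch payload " * 1000
expected = hashlib.md5(original).hexdigest()  # value stored in the manifest

# Simulate two common transfer problems.
corrupted = bytearray(original)
corrupted[500] ^= 0x01          # a single flipped bit
truncated = original[:-1]       # one missing byte at the end

print(is_intact(original, expected))
print(is_intact(bytes(corrupted), expected))
print(is_intact(truncated, expected))
```

Both the flipped bit and the missing byte change the digest, so the check fails and the file can be downloaded again.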

If you want security, then use a secure connection. However this is overkill for 99% of downloads.
Mon, 13 Feb 2012, 08:16
Afr0
Well, that's not what I meant.
Apparently CRC32 and CRC16 are weak in that they produce collisions more readily than MD5, and they can also be circumvented (it is reportedly possible to alter a file without changing its CRC).

Mon, 13 Feb 2012, 10:07
shroom_monk
MD5 hashes are of a fixed length (128 bits, i.e. 16 bytes or 32 hex characters) and the original file cannot be derived from them, so you wouldn't be sending the entire file along with the manifest. It would also be very difficult to alter the file without changing its hash, which would then no longer match the manifest (nigh-on-impossible, in fact, if you wanted to change the file to be something malicious). It's industry-standard, as JL says, so it is probably what you're after.
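A quick Python check of the fixed-length point, using inputs of very different sizes (the sizes are arbitrary, picked only for the demonstration):

```python
import hashlib

results = []
for size in (0, 1, 1024, 1024 * 1024):
    data = b"\x00" * size
    digest = hashlib.md5(data)
    # digest_size is in bytes; the hex form uses two characters per byte.
    results.append((size, digest.digest_size, len(digest.hexdigest())))

for size, raw_len, hex_len in results:
    print(f"{size:>8} input bytes -> {raw_len} digest bytes, {hex_len} hex chars")
```

However large the file is, the manifest entry for it stays the same size.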

-=-=-
A mushroom a day keeps the doctor away...

Keep It Simple, Shroom!
Mon, 13 Feb 2012, 10:56
Afr0
Yeah, I don't doubt that MD5 hashes are tamper-proof, but I was worried about the file size. Turns out hashes are much shorter than the actual file.
As for CRC:

There are many kinds of CRC algorithms. The primary aim of simpler CRC algorithms such as CRC-16 or CRC-32 is to detect random errors in a bit stream. The polynomials they use allow the detection of single (or in some cases multiple) bit errors - however if you know how these algorithms work, it is possible to alter the data such that two or more bit changes cancel each other out and result in the same CRC as the original data.


From: www.codeguru.com/forum/showthread.php?t=416016

There's also this and this.
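A small Python sketch of the weakness described in the quote, using the 16-bit CRC-CCITT that the standard library exposes as `binascii.crc_hqx`. The function name `forge_crc16` and the sample data are invented for the example. The idea: for any tampered content, one of the 65536 possible two-byte suffixes brings the CRC back to the original value, so trying them all is enough.

```python
import binascii
import hashlib


def forge_crc16(original: bytes, tampered_prefix: bytes) -> bytes:
    """Return tampered data whose CRC-16 matches that of `original`.

    Tries every 2-byte suffix; the last 16 bits of input map one-to-one
    onto CRC-16 values, so exactly one suffix hits the target.
    """
    target = binascii.crc_hqx(original, 0)
    for suffix in range(0x10000):
        candidate = tampered_prefix + suffix.to_bytes(2, "big")
        if binascii.crc_hqx(candidate, 0) == target:
            return candidate
    raise ValueError("no matching suffix found")


original = b"legitimate patch contents"
forged = forge_crc16(original, b"something else entirely, with padding: ")

print(forged != original)
print(binascii.crc_hqx(forged, 0) == binascii.crc_hqx(original, 0))
print(hashlib.md5(forged).hexdigest() == hashlib.md5(original).hexdigest())
```

The forged data passes the CRC-16 check yet fails the MD5 comparison. CRC-32 can be defeated in the same spirit with four adjustable bytes, although a naive search over all of them would be much slower than this one.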
