123
-=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- (c) WidthPadding Industries 1987 0|580|0 -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=- -=+=-
Socoder -> On Topic -> Collisions

Fri, 23 Nov 2012, 14:44
Afr0
So, what's the most efficient way of checking for collisions between thousands of keys and fix if any are found? I'm looking for a solution that won't take a week...
The thing is, I'm forced to generate random IDs for unarchived files used by The Sims Online. Right now though, the IDs are colliding with already existing IDs for files that are archived.
Arrays, hashtables - whatever! Please help!

Edit: The keys are restricted to be uints (4 bytes), so I can't use GUIDs...

-=-=-
Afr0 Games

Project Dollhouse on Github - Please fork!
Fri, 23 Nov 2012, 15:16
Stealth
My guess is you need to check if the random ID is used before using it. If these are files, you can see if the file exists. If these are stored in a database, you can query the DB or store all the IDs in an array and see if the new ID is in the array.

This will work for a while, but since you're limited to 4 characters, at some point the software will struggle to generate a unique ID. The simplest way to avoid this is to start with "0000" and count up, like "0001".

To get the most names, once you hit "9" you should start using letters (i.e. "0009" then "000A"). Once you hit Z, increase the next digit by one character (i.e. "0010").

You could probably get even more names if you counted at the bit level, but that's going to more time consuming to implement.

(Note: Using this incremental method will make your IDs easier to guess. Four characters is pretty easy to brute force regardless.)

-=-=-
Quit posting and try Google.
Fri, 23 Nov 2012, 15:23
Afr0
These are files, but they're not filenames. They're IDs for files that (mostly) live inside of archives. I need to check IDs generated for files that aren't in archives against IDs for files that are in archives.
I just downloaded SQLite and thought of using an in-memory database. Is there a way to check for collisions using SQL if I added the filenames and IDs to a DB?

-=-=-
Afr0 Games

Project Dollhouse on Github - Please fork!
Fri, 23 Nov 2012, 15:32
Stealth
Sure. Just query for the ID and if you have a result, it's been used.

-=-=-
Quit posting and try Google.
Fri, 23 Nov 2012, 15:35
Afr0
Thanks!
Sat, 24 Nov 2012, 08:30
HoboBen
https://en.wikipedia.org/wiki/Bloom_filter
Sat, 24 Nov 2012, 08:45
Afr0
Ok, so here's my current code:



Please tell me I'm doing something wrong, cos right now I'm getting collisions in both tables for the same ID!

-=-=-
Afr0 Games

Project Dollhouse on Github - Please fork!
Sat, 24 Nov 2012, 08:50
Afr0
Thanks Hobo!
I found this implementation

Maybe I'll use that if I can't figure out what's wrong with my SQL queries.

-=-=-
Afr0 Games

Project Dollhouse on Github - Please fork!
Sat, 24 Nov 2012, 15:58
Afr0
So I changed to this:



Requires less code, and there are fewer collisions - step forward!

-=-=-
Afr0 Games

Project Dollhouse on Github - Please fork!
Sun, 25 Nov 2012, 01:01
Afr0
I also started using ulongs, because, guess what?!
Even though there are only 35851 assets, and a uint can store 0xFFFFFFFF of them, Maxis decided to reuse FileIDs. This means that an ID is only unique when combined with its TypeID, meaning that any given ID takes up twice as much memory for no particular reason at all!

Genious Maxis! Just fucking genious!

-=-=-
Afr0 Games

Project Dollhouse on Github - Please fork!
Mon, 26 Nov 2012, 10:41
Stealth
It probably was a hack since their FileIDs are only four characters. That wasn't a smart decision to begin with.

-=-=-
Quit posting and try Google.