Web Hosting Info

Search:

featured partner

The IP to Country Database

  Forum Topics : Development / More than 500.000 lookups per second...
Submitted by cyanid on Thu, 07/17/2003 - 07:29.
This was originally a comment, but I though it belonged in an own post instead.

How can you handle more than 500.000 or even millions of lookups per seconds with an average computer?

You could make an extremely fast "searcher" with not much work, all it requires is about 4,3gigs or space...(can be made less with a little work)

Very easy, ditch the database and use some drive space instead. Here is a breif explanation...

Simply, only 202 countries are present, one byte is enough for each IP t?hen, then you allocate 256^4 of space (approx 4,3gig) where each byte represents an IP address which then points to a contry... (less space can be used by removing large "dead" IP address areas)

Using this on a filesystem, would mean extremely fast lookups.

If interested I would be happy to tell you more.
If there are questions about this, please contact me.
Comment viewing options:
Select your preferred way to display the comments and click 'Save settings' to submit your changes.
About contacting me
Posted by cyanid on Thu, 07/17/2003 - 07:31.
Yes, if you want to contact me it's smart to have my e-mail address: cyanid@telia.com
 
So you really think...
Posted by Olray on Sat, 07/19/2003 - 17:58.
Given the fact that the "average computer" is an intel compatible i686 system with 512 or even 1 gig of memory. Further given that at "millions of lookups per second" you have a very high probability that at least one IP in a range of 4096 IPs (which is the Page Size of the "average intel compatible CPU) is requested at any minue, and a usual page swapping method rule of "discard oldest" this means you have to swap 4 gigs of memory to and from your hard disk at least once every minute. (Mark it pageable and r/o and you save the time for saving of every page when it's swapped out) How's a machine with more than 4 gigs of memory "average?"

Your method is only usable if you avoid swapping at all cost. That needs exactly 4 gigs of memory plus whatever your OS and your lookup software uses.

I suggest using a B-Tree algorithm with 256 subnodes at every node. This way you can look up every IP in 4 steps or less all from within memory, ideally marked non-swappable (ouch no not, don't throw stones at me!) The downside: It cannot be used from a PHP powered webserver but so can't your solution.

Just my EUR 0.02
 
Naa, not even close
Posted by cyanid on Fri, 08/08/2003 - 08:34.
The fact the a average hard-drive can read between 20-30mbs of data per second (modern are over 50mb/s)

Which means that you do not require to page the file...

Random access can go up to 10-20mb/s on average type drives, which means... it's very fast to lookup a file on the drive, since all you need is a ABSOLUTE pointer and 1 BYTE of data, I think your computer can do that 100.000 before you even got to pick your nose, or even get the finger up close :)

And without consuming any memory at all...
Done.
Posted by nwetters on Fri, 08/08/2003 - 14:34.

You don't need 4GB of disk, you need about 200KB.

You only need to store ranges, not individual IP addresses. Most ranges have sizes that are perfect powers of two, or can easily be chopped into sizes that are powers of two. Most ranges start at an easily maskable IP address.

Stored as a binary tree, it's possible to use less than five bytes per node using variable-length encoding. In fact, you can get close to one byte.

Every lookup can be guarateed to execute in less than 32 node lookups. In reality, the average is about 5 node lookups. This is the equivalent of around twenty array operations. You don't even need to write in a compiled language to achieve more that half a million lookups per second.

It's been done.

--
Nigel Wetters <nigel.wetters@lshtm.ac.uk> <nwetters@cpan.org>

 
Let's see a solution in action!
Posted by cruzit on Sat, 03/20/2004 - 23:37.
I raise the challenge!
Show us your links and the forum members can hit it all at once to check the theory. It sounds like someone may have a working version. I would really like to see a hard drive out-performing a real RDBMS. Not that I doubt anyone. I just want to see it work.
 
Disk/memory vs. RDBMS
Posted by Daath on Fri, 04/23/2004 - 13:57.
Well, me and my buddy are testing my new format and lookup method.
The data file is 468 KB.

* Test 1 does lookups directly to the file.
* Test 2 loads the file into memory and work from that.
* The tests looks up 28051 IP addresses
* This was a .NET app, in Delphi 8, so I don't know how much that slows it down.

Test 1 - Dell Dimension 8000 - 1,7 GHz P4
* App. 10600 lookups/sec

Test 1 - Dell D600 Centrino 1,6 GHz
* App. 18000 lookups/sec

Test 2 - Dell Dimension 8000 - 1,7 GHz P4
* App. 200000 lookups/sec

Test 2 - Dell D600 Centrino 1,6 GHz
* App. 400000 lookups/sec

Even working from a file outperfoms any RDBMS on an equal machine ;) Using memory wipes out competition from RDBMSs - You can always do something specialised to outperfom RDBMSs...

By the way, it's interesting to see how big a role the Centrino's 1MB cache plays :) It doubles the speed of a faster P4 (GHz wise) ;)

As a side note we tested GeoIP's free binary version with a delphi module that can handle it - GeoIP is fast - but our method is 50-70% faster using disk reads - Of course using test 2 GeoIP was wiped out.

-
Any technology distinguishable from magic, is insufficiently advanced.
 
Now in C#
Posted by Daath on Tue, 04/27/2004 - 16:19.
I wrote the lookup in C# and on the Dell Dimension 8000 I now get app. 300.000 lookups/sec - The D600 actually came over 500.000 lookups/sec ;)
We're in the final stages of releasing - We will release sources in python, php and C# :)

The speed is achieved by keeping the 468 KB file in memory.

What kind of performance does the other solutions have? I'm curious :)

-
Any technology distinguishable from magic, is insufficiently advanced.
 
Link?
Posted by emsi on Sat, 05/01/2004 - 13:47.
Can you please post a link to your c# source? :)
 
I can't stop it seems...
Posted by Daath on Wed, 05/05/2004 - 11:13.
I have written my lookup code in pure C. On my P4 2 GHz, built with Borland C Compiler 5.5 free, optimized for pentium pro (and no apparent other optimizations), I am now able to achieve ~1,2 million lookups/second. I almost didn't believe it, but the results are real :) I'm very surprised about the big difference from C# for example, but hey, it's CIL ;)
Anyway, on my linux server (Via C3 1GHz), I get ~420000 lookups/second.

I'm really not a C coder, but I'm pretty sure my code can't be THAT much more optimized...

Oh, and since I made a C version, I could just as well make a php module ;) The module is SIGNIFICANTLY slower - I guess that's php's fault. I only get ~63000 lookups/second with the module, on my C3 1GHz server - But it's still better than my pure php implementation where I only get ~1100 lookups/second.

My overall version has a few disadvantages though. It relies on x86 little endian... So it probably wont run on exotic architectures like AMD64 or Sparc... But maybe someone could take a look at that once I release all my code.

-
Any technology distinguishable from magic, is insufficiently advanced.
S sdf dsfds
Posted by Jomas56 on Fri, 04/20/2007 - 04:09.
Aloja! Ah: asian gang bang Extreme asian gang bang in that town. asian babes Tall asian babes have gropsex. asian porn star Famous asian porn star in that clips. asian hardcore Asian hardcore is really hard. gay asian She like gay asian. asian sexy There is asian sexy in the red dress. busty asian See! Busty asian! There. asian sluts That asian sluts were splendid. Be good.