It involves adding an indexed partial IP field to the database.
Since field equality is much faster to process, the numeric comparisons are narrowed down to the few records that share the same partial IP value instead of being carried out on the whole database. The bigger the partial IP field, the faster, but the database size will also increase. I found out that dividing the IP space in 16384 parts is a good compromise, giving significant speedup while adding about 10000 records to the table. Table size, including index is about 2.5Mo
Speedup is at least 5x (in some very crowded areas) and reaches more than 60x on large blocks (most traffic comes from large blocks). It's not as fast as some 'C' implementations, but it can be implemented very easily as an upgrade for people already using MySQL and does not require additional RAM.
Here is the modified database (zipped) : http://pages-web.com/data/ip-to-country-s.zip (based on the Sept 13, 2005 release)
Here is the table structure :
CREATE TABLE `ip2c` (
`IP8` smallint(5) unsigned zerofill NOT NULL default '00000',
`IP_FROM` int(10) unsigned zerofill NOT NULL default '0000000000',
`IP_TO` int(10) unsigned zerofill NOT NULL default '0000000000',
`COUNTRY_CODE2` char(2) NOT NULL default '??',
`COUNTRY_CODE3` char(3) NOT NULL default '???',
`COUNTRY_NAME` varchar(50) NOT NULL default '?',
KEY `IP8` (`IP8`)
) TYPE=MyISAM;
- the field was called IP8 since the original implementation used the 8 most significant bits
- use IP8 as an index, else no speedup at all !
Here is the unoptimized (normal) query :
SELECT COUNTRY_NAME FROM ip2c WHERE IP_FROM <= IP_Address and IP_TO >= IP_Address
On my 2.4G Server, execution time is 45 msec / query (a sluggish 22 queries/sec)
Here is the optimized query using the additional field :
SELECT COUNTRY_NAME FROM ip2c WHERE IP8 = IP_Address>>18 and IP_FROM <= IP_Address and IP_TO >= IP_Address
With the same machine, execution time varies from 0.8 msec to 7 msec (in some extreme cases). On a representative set of IP addresses from North America and Europe, average performance is 1.6 msec / query (about 600 queries / sec)
The index size was chosen as a compromise between database size increase (67369 records instead of the original value of 58850) and speedup.
A 100K records database would cut the worst case time in half (but effect on average speed is much lower).
The updating tool (Windows based) that creates the extra field is herein placed in the public domain :
http://pages-web.com/data/geo_ip.exe
It must be placed in the same directory as the ip-to-country.csv file and it will create a file called ip-to-country-s.csv
The file can be used directly to fill the table with phpMyAdmin.
If there is enough interest, a version that generates other formats (including SQL commands) could be made.
