Monthly Archives: August 2015

Optimal Hardware for Aim-Smart’s Data Quality Excel Add-in

Customers often ask me what our minimum and recommended hardware specifications are.  Well, I’ll give a short answer, followed by some additional explanation.

            Minimum Requirement    Recommended
CPU         Pentium 4 or higher    Intel i7
Memory      1 GB free              6 GB or more free
Hard Disk   5400 rpm               SSD

Aim-Smart has been designed with performance in mind from the beginning. As a result, users can parse hundreds of thousands of records in just a few seconds. When building the software, if at any point we thought performance was suboptimal, we spent significant effort to pinpoint the cause. In most cases we found that a few more optimizations reduced the processing time by orders of magnitude.

We implemented our software in this fashion because we never knew what kind of hardware our customers might have. In addition, because Excel doesn’t run on large, powerful servers but rather on individual laptops and desktops, we knew we didn’t have the luxury of sloppy code.

The hardest challenge to date has been optimizing our fuzzy matching and deduplication algorithms. That effort has paid off: our fuzzy matching performance is excellent, with processing time growing linearly as the number of input records increases. If you need even more speed, we can work with you to come up with a solution (we have some tricks up our sleeves, but they require some extra configuration).
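To illustrate why scaling matters for deduplication: comparing every record against every other record grows quadratically, so a common general technique is "blocking", where records are first grouped by a cheap key and only compared within their group. The sketch below is not Aim-Smart's algorithm (which is not described here); the function names, the three-letter key, and the 0.85 threshold are all assumptions chosen for illustration, and it uses Python's standard difflib for the similarity score.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def block_key(name):
    """Hypothetical blocking key: first three letters of the lowercased name."""
    return name.strip().lower()[:3]


def find_duplicates(names, threshold=0.85):
    """Return index pairs of names that look like fuzzy duplicates.

    Only names sharing a blocking key are compared, so the number of
    comparisons stays small as long as each block stays small.
    """
    blocks = defaultdict(list)
    for idx, name in enumerate(names):
        blocks[block_key(name)].append(idx)

    pairs = []
    for members in blocks.values():
        for pos, a in enumerate(members):
            for b in members[pos + 1:]:
                score = SequenceMatcher(
                    None, names[a].lower(), names[b].lower()
                ).ratio()
                if score >= threshold:
                    pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    sample = ["Jonathan Smith", "Jonathon Smith", "Mary Jones", "Jon Smyth"]
    print(find_duplicates(sample))
```

The trade-off of this kind of key is that duplicates whose first letters differ (a typo in the first character, for example) land in different blocks and are never compared, so real systems usually combine several keys.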

Despite everything we’ve done to reduce the need for a powerful computer, better hardware of course doesn’t hurt and can only help. The single most important factor when running Aim-Smart is the speed of the hard drive. Because working with data is highly I/O dependent, a small investment in a fast SSD is well worth it.

South African Support

We recently added South African support to Aim-Smart. Now users can parse ZA addresses and phone numbers, match names using fuzzy logic, and perform other data quality functions within Excel.

One of the most complicated features to implement was the ability to parse full South African addresses, given that the format is not always consistent. We find that addresses often don’t specify the city, but rather only the suburb; however, this is not always the case: sometimes both are present, and sometimes only the city is specified. Most often the province is not specified, though that is not always true either. Here are some example addresses:

Waterfront Drive Knysna, South Africa 6571
Old Rustenburg Road Magaliesburg, South Africa 1791
Summit Place Precinct (corner of N1 North and Garsfontein off ramp), 213 Thys St Menlyn, Pretoria, Gauteng 0181
277 Main Rd, Sandton, South Africa
Johannes Road Randburg, South Africa
Porterfield Rd Cape Town, South Africa

As you can see, the format varies significantly; however, the new engine can handle most of the formats we found and does a great job matching addresses.
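To give a feel for why these formats are awkward, here is a minimal Python sketch that splits the examples above into loose components. It is not the Aim-Smart engine; the function name, the field names, the short list of street types, and the province list are all assumptions made for illustration. It works from the end of the address backwards (postal code, country, province) and then uses a street-type word such as "Road" or "Rd" to separate the street from the place name when no comma does so.

```python
import re

# Assumed lookup data, for illustration only.
PROVINCES = {
    "eastern cape", "free state", "gauteng", "kwazulu-natal", "limpopo",
    "mpumalanga", "north west", "northern cape", "western cape",
}
# Splits "Waterfront Drive Knysna" into ("Waterfront Drive", "Knysna").
STREET_THEN_PLACE = re.compile(
    r"^(.*\b(?:drive|road|rd|street|st|avenue|ave)\b)\s+(.+)$",
    re.IGNORECASE,
)


def parse_za_address(raw):
    """Split a free-form South African address into loose components."""
    result = {"street": None, "locality": None, "province": None,
              "country": None, "postal_code": None}
    text = raw.strip()

    # A trailing four-digit number is treated as the postal code.
    match = re.search(r"\b(\d{4})\s*$", text)
    if match:
        result["postal_code"] = match.group(1)
        text = text[:match.start()].rstrip(" ,")

    parts = [p.strip() for p in text.split(",") if p.strip()]

    # Peel off the country and province if they appear at the end.
    if parts and parts[-1].lower() == "south africa":
        result["country"] = "South Africa"
        parts.pop()
    if parts and parts[-1].lower() in PROVINCES:
        result["province"] = parts.pop()

    # The last remaining part holds the place name, possibly with the
    # street fused in front of it (no comma between them).
    if parts:
        last = parts.pop()
        fused = STREET_THEN_PLACE.match(last)
        if fused:
            parts.append(fused.group(1))
            result["locality"] = fused.group(2)
        else:
            result["locality"] = last

    result["street"] = ", ".join(parts) or None
    return result


if __name__ == "__main__":
    print(parse_za_address("Waterfront Drive Knysna, South Africa 6571"))
    print(parse_za_address("277 Main Rd, Sandton, South Africa"))
```

This sketch makes no attempt to tell a suburb from a city (both end up in "locality"), and an address that contains only a street would be mislabeled, which hints at how much more a production parser has to handle.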

Gender guessing is supported as well, but without the statistical probabilities we offer for the US, because we weren’t able to find a data source that allowed us to determine probability.

As with other countries, adding ZA support is very reasonably priced.  Good luck!