Category Archives: Deduplicate
Why is data cleansing important to companies?
The U.S. economy wastes an estimated $3 Trillion per year due to incorrect, inconsistent, fraudulent and redundant data. Businesses incur a very large portion of this cost. The Data Warehousing Institute estimate that the total cost to U.S. business more than $600 Billion a year from bad data alone. That is a very large cost that can easily be reduced by investing a much smaller amount of money earlier business data processing. No business is immune to financial loss from lost sales or extra expenses caused by incorrect data. However businesses are able to limit these losses through proper data cleansing.
The rule of 1:10:100 by W. Edwards Deming says that it takes $1 to verify a record when it is first entered, $10 to clean or deduplicate bad data after entry, and $100 per record if nothing is done to rectify a data issue. Part of this cost is incurred due to mail cost. The average company wastes $180,000 per year on direct mail that does not reach the intended recipient due to inaccurate data. Poorly marketed advertisements to incorrect demographics are another expense of having poor or out of date information.
The most common issues found due to data quality are duplicate and old records. Within one month of receiving customer data 2% becomes incorrect due to moving, death, marriage or divorce. If you are receiving data from an outside source, i.e. other departments, other businesses, these sources may not know how old the records are, if imported data is several months old 10% or more may be inaccurate. By having tools in house, companies can limit the level of incorrect data they deal with. An issues with in house data cleaning is that IT departments are often heavily relied on for that process. IT professionals have expert knowledge in technical issues, but often lack in-depth data processing best practices. Usually companies employ data analysis experts, but again, they often lack in-depth IT knowledge. The strength of utilizing business data is dependent on these professionals ability to function at their best. Often there is time lost or wasted due to lack of clear communication between these two departments. In addition lack of understanding causes incorrect information to be returned and can result in having to redo various aspects of the process.
What does Aim-Smart do to eliminate these issues? Aim-Smart is built with the business user, analysis expert and IT personal in mind. With Aim-Smart business users and analysis experts are able to manipulate data with the speed and accuracy of IT professionals and use their knowledge of advanced data processing techniques to remove or update data as needed. This is done all with in Excel, a comfortable environment for any business professional. This allows the IT professional to focus on the upkeep of the data platforms and focus on maintaining the platforms for data storage. I also removes many opportunities for misunderstanding and communication breakdown. It is estimated that up to 50% of IT expenses are spent on data cleaning. Aim-Smart removes most of the data cleaning process from the IT department. By putting the power of data cleaning in the hands of the business user, IT demands drop and allows the IT professionals to focus on strengthening and updating internal systems within the company. Aim-Smart allows business users to remove data, or update records they find to be out of date. It also allows them to manipulate the data as they want through parsing, standardization and other features. By doing this, company marketing budgets can more effectively be spent on contacts or data results that are accurate. This helps to maximize the use of company money spent on advertising as well as other expenses.
Source:
“$3 Trillion Problem: Three Best Practices for Today’s Dirty Data Pandemic” by Hollis Tibbetts
How Deduplicate is Superior to Excel’s Remove Duplicates
Excel includes a standard duplicate removal tool called “Remove Duplicates”. I have used this tool before matching records using Smart Match (to avoid duplicate matching). While Remove Duplicates is useful, there are two fundamental issues to be aware of.
First, Remove Duplicates only removes exact matches within the Excel Sheet. On several occasions I’ve had two records that are the same, but because they come from different sources they aren’t exactly the same. One record may contain a contact with the name Anthony, but the same contact may be listed in another record as Tony. Remove Duplicates doesn’t give me any options for finding duplicates that aren’t exact matches.
The second issue I have with Remove Duplicates is the information it provides after removing the duplicates. The only information a user receives is the total number of removed records. Remove Duplicates won’t specify whether one record was duplicated several times or if several records were duplicated only once, or if there was some combination in between.
Deduplicate addresses these issues. When running Deduplicate I can access all the power of Aim-Smart’s matching logic, which allows for superior deduplication identification and reporting.
Here is an example of where Deduplicate results differ from Remove Duplicates. When using Remove Duplicates users have no ability to choose if duplicates are removed or not; in a way this isn’t a problem because every field matches exactly, so we know that we want the record removed. However, it leaves several possible duplicates (that vary by a small amount) untouched in the document. When using Deduplicate with the intelligent deduplicate logic we can see multiple sets of exact duplicates and we can see varying quality levels of other duplicates as well. This is useful when data may look like this.
Using Remove Duplicates on all columns, with the data provided, would only remove row 4 and leave Rows 2 and 5, which are of course real duplicates. Remove Duplicates allows us to choose limited columns to compare. If a user chose a column subset (Last, Address, State) then rows 2-5 would be removed, but row 3 is not actually a duplicate. Without Aim-Smart’s Deduplicate tool it would take a lot of time and effort to identify (and potentially remove) the approximate match duplicates.
Deduplicate is different. Users can enter several different deduplication options at the same time. For the case above one option would be to create deduplication parameters like these.
With the options shown above Deduplicate will return the following results.
With these results a user is able to assess which matches are true matches and which ones aren’t true matches. The user then knows based on the [ID] which rows to remove in the original document. In addition users can easily see in the results how many duplicates each record group contains.
Thanks to these abilities and features Deduplicate is a more powerful tool than Excel’s Remove Duplicates tool.
What is Aim-Smart?
I am very excited to announce the launch of Aim-Smart – premier data processing software that leverages the latest in technology available. Founded in Denver, Colorado, Aim-Smart was created to change the way businesses view data management. Our company vision is to make data accessible to users at all levels throughout any company. Composed of experts in the field of data quality software, our team has numerous years of experience, programming, processing and working with data from the position of the user. Through our effort and vision we aim to innovate how business data quality software works.
Our signature product, Aim-Smart, an Excel add-in, allows users to Smart Match, Smart Parse and Smart Standardize data without relying on IT personnel or powerful servers. We know that when each individual within a company has immediate access to the data they need, performance of the entire company benefits. To maximize Smart Matching Aim-Smart utilizes fuzzy matching features; this allows for human inconsistencies, which are a constant challenge to accurate matching. In addition, Aim-Smart offers other features that facilitate the user in streamlining and sorting their data from different sources. These features, such as Genderize and Zap Gremlins, help by asserting to varying percentages the gender of customers based on name and remove difficult or unwanted characters from data, respectively. Aim-Smart was developed so that any business user can easily process data on their own computer, within the comfort of Excel, using this powerful plug-in.
Accuracy is vitally important when dealing with data. Government agencies often change laws on what financial records need to be reported by a company. Issues occur when a company has different departments that function almost independently of one another. In these cases there are opportunities for miscommunication regarding payments going to, or being received from, outside individuals. Records often need to be compared and analyzed so that proper information can be reported. A high level of accuracy is required in these instances. Severe fines, or worse, can result when companies report incorrect information. This is why we at Aim-Smart are focused on delivering the highest level of accuracy for data matching.
We understand that every person within a company can be more effective given the best-possible information. One issue that affects efficiency company-wide is not having the proper information when it is required. As an example, suppose Susan is testing several theories on how to increase exposure in different demographic areas. Waiting for Lilly, the systems analyst, (and possibly server availability) to get the job done will result in valuable time wasted, not to mention various discussions back and forth regarding exactly what’s needed. However, if Susan is able to access the data she needs and perform the data quality functions required without being forced to rely on Lilly, she can get her job done sooner and ultimately the company benefits. In order to bridge the divide between Susan and her data, Susan needs tools that she can work with without in-depth IT experience.
We are excited for every business to experience the power Aim-Smart brings to the individual business user. We are looking forward to helping all companies maximize time and resources in their companies. We are committed to be the best at what we do, and change the way businesses operate by increasing access to high quality data placed directly in the hands of the end users. Come experience Aim-Smart and feel the power of Data Quality Now.