Hard-core table types know that every once in a while, despite the utmost caution, a duplicate value slips into a table. Fortunately, Excel has tools that let you find duplicates, wherever they're hiding, and remove them.
It's not too hard to fish out these duplicates—one option is to use sorting (described earlier) on the column where you suspect a duplicate exists. Then, if you spot two identical values, you can delete one of the table rows. Of course, in order for this technique to work, you have to be ready to scroll through all the records and check each one. In a supremely long list, that job could take some time.
Fortunately, Excel has another solution—conditional formatting. You can use it to make repeating values stand out like sore thumbs. Here's how:
- Select the table column that you want to check for duplicate values.
For example, you could select the Product ID column to look for products that have the same ID value.
Note: You can highlight more than one column, but if you do, Excel highlights identical values that appear in more than one column. For example, if the same number appears in the Product ID column and in the Price column, Excel highlights it even though it isn't really a duplicate.
- Choose Home→Styles→Conditional Formatting→Highlight Cells Rules→Duplicate Values.
When the Duplicate Values dialog box appears, choose the type of formatting you'd like to use to highlight repeated values. People often choose to change the background color.
- Click OK.
Excel changes the background color of all values that appear more than once in the selected column (or columns), as shown in the image below. Conditional formatting keeps working even after you've applied it. So, if you add a new record that duplicates the value of an existing record in the column you're checking, Excel immediately highlights it. It's like having a duplicate value cop around at all times.
Tip: If you have an extremely large table, you may like to use color-based sorting to bring the duplicate records to the top of the table. For example, if you highlighted duplicate Product ID values with a light red background fill, you click the Product ID column's drop-down arrow, choose Sort By Color, and then pick the same color (which automatically appears in the menu).
Removing Duplicates Automatically
Once you've found duplicate records, it's up to you what to do with them. You can leave them in your table or delete them by hand. However, if you don't want to keep the duplicates, Excel has a quicker solution, thanks to a built-in feature that hunts for duplicates and automatically removes the offending rows. Here's how to use it:
- Move to any one of the table cells.
Although you can technically use the duplicate removal feature with any range of cells, it works best with tables because you don't need to select the full range of data yourself.
- Choose Data→Data Tools→Remove Duplicates.
Excel shows the Remove Duplicates dialog box, where you can choose which columns to search for duplicates.
- Decide how many columns need to match in order for Excel to consider the record a duplicate. Add a checkmark next to each column you want to inspect.
For example, you may decide that any record with the same Product ID is a duplicate. When Excel scans the table, it removes any subsequent record that has the same Product ID as an earlier record, even if the rest of the data is different.
On the other hand, you may decide that you want to match several columns to prevent the possibility for error. For example, there may be two different products that share the same Product ID due to a minor typo. To avoid this possibility, you could inspect several columns (as shown in the image below), so that records are removed only if every column value matches. (You can't do this maneuver with conditional formatting and the Highlight Duplicates rule. When you use conditional formatting, you're limited to finding duplicates in one column—if you include more than one column, Excel treats them as one big batch of cells.)
In order to be considered a duplicate in this operation, the record has to have the same Product ID, Model Name, and Category. (The Price can vary.) If you need to select all the columns in a hurry, use the Select All button.
- Click OK to remove the duplicates.
Excel scans your table looking for duplicates. If it finds any, it keeps the first copy and deletes those that appear later on in the table. When Excel is finished, it pops up a message box telling you how many rows it removed and how many still remain.
You don't get a chance to confirm the deletion process, but you can reverse it by using the Undo feature (hit Ctrl+Z) immediately after it finishes.
Remember, when using the Remove Duplicates feature, you have no way of knowing what records Excel's deleted. For that reason, people often use the Highlight Duplicates rule first to check out the duplicates and make sure they don't belong.
Note: The Remove Duplicates feature and the Highlight Duplicates rule don't work in exactly the same way. For instance, Remove Duplicates treats repeated empty cells (blank values) as duplicates, and removes them. The Highlight Duplicates feature ignores empty cells and doesn't highlight them.
Learn more about this topic from Excel 2010: The Missing Manual.
Fast-paced and easy to use, this guide shows you how to get the most out of Excel 2010. You'll learn how to develop a spreadsheet from scratch, create formulas, add data, and analyze and graph data so you can make informed business decisions. With clear jargon-free explanations, step-by-step instructions, tons of illustrations, and lots of undocumented tips and shortcuts, you'll get hands-on guided tours and explanations of Excel's new features.