Jump to content

Handling Duplicate Rows in Excel 2010

+ 1
  chco's Photo
Posted Jul 21 2010 11:19 AM

The following is an excerpt from Excel 2010: The Missing Manual. It explains what to do to address the issue of duplicate rows in Excel 2010.
Hard-core table types know that every once in a while, despite the utmost caution, a duplicate value slips into a table. Fortunately, Excel has tools that let you find duplicates, wherever they're hiding, and remove them.

Highlighting Duplicates

It's not too hard to fish out these duplicates—one option is to use sorting (described earlier) on the column where you suspect a duplicate exists. Then, if you spot two identical values, you can delete one of the table rows. Of course, in order for this technique to work, you have to be ready to scroll through all the records and check each one. In a supremely long list, that job could take some time.

Fortunately, Excel has another solution—conditional formatting. You can use it to make repeating values stand out like sore thumbs. Here's how:

  • Select the table column that you want to check for duplicate values.
    For example, you could select the Product ID column to look for products that have the same ID value.

    Note: You can highlight more than one column, but if you do, Excel highlights identical values that appear in more than one column. For example, if the same number appears in the Product ID column and in the Price column, Excel highlights it even though it isn't really a duplicate.

  • Choose Home→Styles→Conditional Formatting→Highlight Cells Rules→Duplicate Values.
    When the Duplicate Values dialog box appears, choose the type of formatting you'd like to use to highlight repeated values. People often choose to change the background color.

  • Click OK.
    Excel changes the background color of all values that appear more than once in the selected column (or columns), as shown in the image below. Conditional formatting keeps working even after you've applied it. So, if you add a new record that duplicates the value of an existing record in the column you're checking, Excel immediately highlights it. It's like having a duplicate value cop around at all times.


Tip: If you have an extremely large table, you may like to use color-based sorting to bring the duplicate records to the top of the table. For example, if you highlighted duplicate Product ID values with a light red background fill, you click the Product ID column's drop-down arrow, choose Sort By Color, and then pick the same color (which automatically appears in the menu).

Conditional formatting helps smoke out a product with the same name—"Escape Vehicle (Water)"—that happens to be in two different places in the table.
Attached Image


Removing Duplicates Automatically

Once you've found duplicate records, it's up to you what to do with them. You can leave them in your table or delete them by hand. However, if you don't want to keep the duplicates, Excel has a quicker solution, thanks to a built-in feature that hunts for duplicates and automatically removes the offending rows. Here's how to use it:

  • Move to any one of the table cells.
    Although you can technically use the duplicate removal feature with any range of cells, it works best with tables because you don't need to select the full range of data yourself.

  • Choose Data→Data Tools→Remove Duplicates.
    Excel shows the Remove Duplicates dialog box, where you can choose which columns to search for duplicates.

  • Decide how many columns need to match in order for Excel to consider the record a duplicate. Add a checkmark next to each column you want to inspect.
    For example, you may decide that any record with the same Product ID is a duplicate. When Excel scans the table, it removes any subsequent record that has the same Product ID as an earlier record, even if the rest of the data is different.

    On the other hand, you may decide that you want to match several columns to prevent the possibility for error. For example, there may be two different products that share the same Product ID due to a minor typo. To avoid this possibility, you could inspect several columns (as shown in the image below), so that records are removed only if every column value matches. (You can't do this maneuver with conditional formatting and the Highlight Duplicates rule. When you use conditional formatting, you're limited to finding duplicates in one column—if you include more than one column, Excel treats them as one big batch of cells.)

    In order to be considered a duplicate in this operation, the record has to have the same Product ID, Model Name, and Category. (The Price can vary.) If you need to select all the columns in a hurry, use the Select All button.
    Attached Image


  • Click OK to remove the duplicates.
    Excel scans your table looking for duplicates. If it finds any, it keeps the first copy and deletes those that appear later on in the table. When Excel is finished, it pops up a message box telling you how many rows it removed and how many still remain.

    You don't get a chance to confirm the deletion process, but you can reverse it by using the Undo feature (hit Ctrl+Z) immediately after it finishes.


Remember, when using the Remove Duplicates feature, you have no way of knowing what records Excel's deleted. For that reason, people often use the Highlight Duplicates rule first to check out the duplicates and make sure they don't belong.

Note: The Remove Duplicates feature and the Highlight Duplicates rule don't work in exactly the same way. For instance, Remove Duplicates treats repeated empty cells (blank values) as duplicates, and removes them. The Highlight Duplicates feature ignores empty cells and doesn't highlight them.

Excel 2010: The Missing Manual

Learn more about this topic from Excel 2010: The Missing Manual.

Fast-paced and easy to use, this guide shows you how to get the most out of Excel 2010. You'll learn how to develop a spreadsheet from scratch, create formulas, add data, and analyze and graph data so you can make informed business decisions. With clear jargon-free explanations, step-by-step instructions, tons of illustrations, and lots of undocumented tips and shortcuts, you'll get hands-on guided tours and explanations of Excel's new features.

See what you'll learn


Tags:
1 Subscribe


6 Replies

0
  Lisa Purple's Photo
Posted Aug 09 2012 04:31 AM

Thank you, but my job requires me to remove BOTH, the duplicate field and the field it repeats. Is there a way to do this without manually scrolling through the spreadsheet and going through each row of records one-by-one?
0
  Alex 27's Photo
Posted Oct 08 2012 01:31 AM

Please have a loook at the Duplicate Remover for Excel add-in
http://www.ablebits....cates/index.php

It provides the option "Remove Duplicates with their first occurrences" that should suit your needs.
0
  elenhim's Photo
Posted Dec 03 2012 11:00 AM

I prefer to dedupe data with pivot tables: add surrogate column for count (I use to number rows)-> create pivot on the table -> put needed columns on rowfields and surrogate field on datafield (as count) -> switch off any totals - and voila. I like this way because it takes just a couple of clicks, I can dynamically change criteria for dedupe, sort results, and drill down if I need to precize. Also, source data is unaffected.

This method is pretty good if you have to process single table and is awesome if you have to combine/compare/reconcile several tables.
Pluribus Impar
0
  etechguru's Photo
Posted Dec 03 2012 11:28 AM

There are several addins & tools available which can easily handle duplicate rows and columns in Excel 2010

Some of Examples are

http://exceladdins.net

http://www.technocom...atemanager.html

Video to show this

http://www.technocom...ertutorial.html
0
  Radiohalo's Photo
Posted Jul 19 2013 07:59 AM

Another method for finding duplicate rows: Reference the attached JPG
example for a new row 5, formula in cell L5 (L is the checking column in this case): {=INDEX($A$3:J5,MATCH(B5&C5&D5&E5&F5&G5&H5&J5,$B$3:B4&$C$3:C4&$D$3:D4&$E$3:E4&$F$3:F4&$G$3:G4&$H$3:H4&$I$3:I4&$J$3:J4,0),1)}

This formula is looking at a range from A3 to J5, but as the formula is copied down will look at every row before the current row. Row 5 is used to start the formula because there will not be an error between the first two rows (no need to check) . The formula looks for a match in every column listed in the format of B5 compared to $B$3:B4, B6 compared to $B$3:B5, etc. Reference the attached JPG. the returned result in the formula column is either "#N/A" which is good (meaning no duplicates for the new row) or the value in column A, in my case the first row that matches the new row. Note: Tthe outer brackets { } will not be there while entering the formula they are placed when the curser is in the formula bar and typing Ctrl+Shift+Enter

This table is used in Solidworks, and each row is a unique part. The tables to the right will be seen on the drawing.

Attached thumbnail(s)

  • Attached Image
  • Attached Image

0
  Kawser Ahmed's Photo
Posted Dec 26 2013 12:31 AM

Cleaning data with Excel is an easy technique. In the following link you will get solution with a sample excel file to work with.
Removing duplicate rows with Excel