Finding Distinct Values in Excel 2010

This is something that I’ve just found myself, and I wish I would have known about it a few days ago when I was asked to extract a distinct list of values from a column.

Start off by highlighting the range of data that you would like to find the distinct values in, I’ll be using the below simple sample data for the purpose of this post:

Example Data

Example Data

On the “Data” tab of Excel 2010, find the “Remove Duplicates” button:

Remove Duplicates Button

Click the button, and the Remove Duplicates Dialogue will appear:

Remove Duplicates Dialogue

Remove Duplicates Dialogue

On the top right of the Remove Duplicates Dialogue box, you’ll see a check box – “My data has headers” – you can toggle this check box depending on whether or not you have headers in your data. I’m going to leave it checked for the purpose of this example. The check box “Column” is the name of the column I’m removing duplicates from, if multiple columns are selected in the first step – they can be managed from here.

Once you’re ready, click OK, and the below message will appear – giving you a few details about what has occurred:

Remove Duplicates Message Box

Remove Duplicates Message Box

The result:

Remove Duplicates Result

Remove Duplicates Result

Digg This
Reddit This
Stumble Now!
Buzz This
Vote on DZone
Share on Facebook
Bookmark this on Delicious
Kick It on DotNetKicks.com
Shout it
Share on LinkedIn
Bookmark this on Technorati
Post on Twitter
Google Buzz (aka. Google Reader)

Finding Duplicates With SQL

I’ve been asked several times over the last few days to help people find duplicates within a particular table, and below is a very basic bit of SQL which will do exactly that:

SELECT Column1,
       COUNT(Column1) [Data Count]
 
FROM Table
 
GROUP BY Column1
 
HAVING COUNT(Column1) > 1

Fairly straight forward, every where that I’ve written “Column1”, put the name of the column that you’d like to check for duplicates. You’ll obviously also need to replace “table” with the name of your table.

The “HAVING” clause just makes sure that only records with more than 1 record are returned, this can be modified as you please, or removed completely.

Digg This
Reddit This
Stumble Now!
Buzz This
Vote on DZone
Share on Facebook
Bookmark this on Delicious
Kick It on DotNetKicks.com
Shout it
Share on LinkedIn
Bookmark this on Technorati
Post on Twitter
Google Buzz (aka. Google Reader)