This Article Was Previously Published at Http


  • 8/3/2019 This Article Was Previously Published at Http


    This article was previously published at http://www.digdb.com/excel_add_ins/duplicates_find_remove_dedupe/

    Find & Remove Duplicates - Dedupe Excel Tables / Lists

    1. Match two tables (lists), compare by columns, and find or exclude the matched rows
       o Example #1 - I have 2 lists of emails. I need to filter/subtract from List1 the emails that are in List2.
       o Example #2 - I have 2 lists of mailing addresses. I need to filter/subtract from List1 the mailing addresses that are in List2.
         * Note that a mailing address is more complicated than an email address, because we need to compare at least three columns, 'Street1', 'Street2' and 'ZipCode', to decide whether two addresses are the same.

    2. Filter duplicates within one table (list), by key columns or more complex rules
       o Example #3 - I have a list of emails. I need to extract the uniques, and count the number of duplicates for each unique email.
       o Example #4 - I have a list of mailing addresses. I need to extract the uniques (* the uniqueness of a mailing address is determined by 3 columns: "Street1", "Street2", "ZipCode") and count the number of duplicates for each unique address.
       o Example #5 - I have a list of customer transactions. I need to find the latest transaction for each customer.

    3. Extract unique values from a random range or areas
       o Example #6 - I need to quickly extract a list of the unique values from one or more selected ranges.

    Find & Remove Duplicates ( dedupe ) - Example #1

    I have 2 lists of emails. How to filter/subtract from List1 the emails that are in List2?

    Practice file - dedupe-email-list-demo.xls (16k)


    1. Go to List1. Click only 1 cell, then invoke "Complex Filter->Match Tables...", and a wizard will start. (Note that this command is the same as "Table->Match Tables..." or "Unique->Match Tables...".)

    2. Wizard Step 1 - List1 is automatically selected.

    3. Wizard Step 2 - Select List2.
       o Click into the range selector.
       o Go to the List2 sheet or area, and click only 1 cell in List2.
       o Click the "Select Table(List)" button to auto-select List2.


    4. Wizard Step 3 - Use the dropdowns to select the pair of columns to match. Our case only needs one pair. To add more pairs, click 'Add Criteria'.

    5. Wizard Step 4 - Select what to show in the result.

    In our case, we want to exclude from List1 the rows that are matched in List2, so we choose "Hide matched rows". Note that there is also a third option, "Count Matches", which only counts the number of matched rows in List2. Click 'Finish' to match.


    6. Match result
       o Matched rows (duplicates) are hidden beneath the "+", and only unmatched (unique) rows are shown. You can click the "+" to expand and show them.
       o Note that you can use "DigDB->Invert Filter" to toggle easily between matched and unmatched rows.

    Hidden beneath the '+' are the matched rows (duplicates). Click '+' or '2' to expand and see what they are; click '-' or '1' to contract.


    7. Extract result
       o Make sure the '+' is NOT expanded. Click "Extract Result->Copy Visible Rows to New Sheet"; a new sheet will be created, and the match result will be copied over.
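Outside Excel, the same "subtract List2 from List1" idea is a key lookup against a set. The following is a minimal plain-Python sketch, not DigDB's implementation. It assumes each list is a Python list, and the function names and sample emails are hypothetical. `subtract_list` mirrors "Hide matched rows", and `count_matches` mirrors "Count Matches".

```python
from collections import Counter


def subtract_list(list1, list2, key=lambda row: row):
    """Rows of list1 whose key is NOT found in list2 (the 'Hide matched rows' idea)."""
    keys2 = {key(row) for row in list2}  # set lookup keeps this linear in total size
    return [row for row in list1 if key(row) not in keys2]


def count_matches(list1, list2, key=lambda row: row):
    """Pair each row of list1 with how many rows of list2 share its key ('Count Matches' idea)."""
    counts2 = Counter(key(row) for row in list2)  # a missing key counts as 0
    return [(row, counts2[key(row)]) for row in list1]


# Hypothetical sample data; emails are compared exactly as typed.
list1 = ["a@example.com", "b@example.com", "c@example.com"]
list2 = ["b@example.com", "d@example.com"]

print(subtract_list(list1, list2))  # ['a@example.com', 'c@example.com']
print(count_matches(list1, list2))  # [('a@example.com', 0), ('b@example.com', 1), ('c@example.com', 0)]
```

To ignore capitalization, pass `key=str.lower`.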

    Find & Remove Duplicates ( dedupe ) - Example #2

    I have 2 lists of mailing addresses. How to filter/subtract from List1 the addresses that are in List2? (* The uniqueness of a mailing address is determined by 3 columns: "Street1", "Street2", "ZipCode".)


    Practice file - dedupe-mail-list-demo.xls (16k)

    This is much the same as Example #1, except that in Wizard Step 3, when you select the column pairs to match, you need 3 pairs of columns. Use the 'Add Criteria' button to add more pairs.
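With three column pairs, the match key becomes a tuple of the three values. Two addresses count as the same only if Street1, Street2 and ZipCode all agree. The following is a minimal plain-Python sketch of that rule, not DigDB's implementation. It assumes rows are dicts keyed by column name, and the function names and sample addresses are hypothetical.

```python
ADDRESS_KEY = ("Street1", "Street2", "ZipCode")  # the three columns named in the text


def address_key(row):
    """Two addresses are the same only if all three key columns agree."""
    return tuple(row[col] for col in ADDRESS_KEY)


def subtract_addresses(list1, list2):
    """Rows of list1 whose (Street1, Street2, ZipCode) does not occur in list2."""
    keys2 = {address_key(row) for row in list2}
    return [row for row in list1 if address_key(row) not in keys2]


# Hypothetical rows; 'Name' is an extra column that is not part of the key.
list1 = [
    {"Name": "Ann", "Street1": "1 Main St", "Street2": "", "ZipCode": "10001"},
    {"Name": "Bob", "Street1": "1 Main St", "Street2": "Apt 2", "ZipCode": "10001"},
]
list2 = [{"Street1": "1 Main St", "Street2": "", "ZipCode": "10001"}]

for row in subtract_addresses(list1, list2):
    print(row["Name"])  # Bob: same street and zip as List2, but a different Street2
```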

    Find & Remove Duplicates ( dedupe ) - Example #3

    I have a list of emails. How to extract the uniques, and count the number of duplicates for each unique email?

    1. Go to your list. Click only 1 cell, then invoke "Complex Filter->Filter Uniques by Group..." (note that this command is the same as "Table->Filter Uniques by Group..." and "Unique->Filter Uniques by Group...").


    Practice file - dedupe-email-list-demo.xls (16k)

    The list will be automatically selected, and a pop-up window will show.

    2. Select the "Email" column from the dropdown box.
       o Note that DigDB defaults to "Use All Fields". In our case, we are looking for uniques in the "Email" column, so "Email" should be selected. You can also use the 'Add Field' button to dedupe by more than one field (see Example #4).
       o Uncheck the "Key Field(s) is Case Sensitive" checkbox, because email addresses are in practice treated as case insensitive, i.e. the same address typed with different capitalization is the same address.


    3. The filter result is shown.
       o Only unique rows are shown. If a row has duplicates, a '+' will appear immediately below the row, and the duplicates will be hidden under the "+". You can click the "+" to show the duplicate records.
       o Clicking the small '2' sign expands all '+'; clicking the small '1' contracts them all.
       o To the right there is a new column, "Count of Occurances". It tells you how many times each email appears in the table; for example, an address with two duplicates shows a count of 3.

    Click '+' or '2' to expand and show the duplicates. Click '-' or '1' to contract.


    4. Extract uniques
       o Make sure the '+' is NOT expanded. Click "Extract Result->Copy Visible Rows to New Sheet"; a new sheet will be created, and the visible rows (unique rows with their count of occurrences) will be copied over.
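The filter above keeps the first row of each email and adds an occurrence count, with case-insensitive matching. The following is a minimal plain-Python sketch of that behavior, not DigDB's implementation. The function name and sample emails are hypothetical, and `str.casefold` stands in for the unchecked "Case Sensitive" box.

```python
from collections import Counter


def uniques_with_counts(emails, case_sensitive=False):
    """Keep the first occurrence of each email and report how many times it occurs."""
    normalize = (lambda s: s) if case_sensitive else str.casefold
    counts = Counter(normalize(e) for e in emails)
    seen = set()
    result = []
    for email in emails:
        k = normalize(email)
        if k not in seen:  # first time this address is seen: keep this row
            seen.add(k)
            result.append((email, counts[k]))
    return result


emails = ["Ann@example.com", "bob@example.com", "ann@example.com", "ANN@EXAMPLE.COM"]
print(uniques_with_counts(emails))
# [('Ann@example.com', 3), ('bob@example.com', 1)]
```

Each kept row retains its original spelling, because only the comparison key is normalized.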

    Find & Remove Duplicates ( dedupe ) - Example #4

    I have a list of mailing addresses. Each row is a person's mailing address. Since multiple people can live at the same address, we have a duplicate problem. I need to extract the unique mailing addresses (* the uniqueness of a mailing address is determined by 3 columns: "Street1", "Street2", "ZipCode") and count the number of duplicates for each unique address.


    Practice file - dedupe-mail-list-demo.xls (16k)

    This is much the same as Example #3, except that in step 2 you need to use 'Add Field' to set 3 determining (key) fields: "Street1", "Street2" and "ZipCode". The uniqueness of a mailing address is determined by these three.
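Deduping by several key fields works the same way, except the key is a tuple of the three address columns. The following is a minimal plain-Python sketch, not DigDB's implementation. It assumes rows are dicts, and the function name, sample people and the added "Count" column name are hypothetical.

```python
from collections import Counter

KEY_FIELDS = ("Street1", "Street2", "ZipCode")


def unique_addresses(rows, key_fields=KEY_FIELDS):
    """First row per distinct address, with a hypothetical 'Count' column added."""
    def key(row):
        return tuple(row[f] for f in key_fields)

    counts = Counter(key(r) for r in rows)
    seen = set()
    result = []
    for r in rows:
        k = key(r)
        if k not in seen:
            seen.add(k)
            result.append({**r, "Count": counts[k]})  # copy the row, then add the count
    return result


people = [
    {"Name": "Ann", "Street1": "1 Main St", "Street2": "Apt 2", "ZipCode": "10001"},
    {"Name": "Bob", "Street1": "9 Oak Ave", "Street2": "", "ZipCode": "20002"},
    {"Name": "Cal", "Street1": "1 Main St", "Street2": "Apt 2", "ZipCode": "10001"},
]
for r in unique_addresses(people):
    print(r["Name"], r["Street1"], r["Count"])
# Ann 1 Main St 2
# Bob 9 Oak Ave 1
```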

    Find & Remove Duplicates ( dedupe ) - Example #5

    I have a list of customer transactions, and I need to find the latest transaction for each customer.


    Practice file - dedupe-transaction-list-demo.xls (16k)

    This is similar to Example #3 in that the result should have one row per 'Customer'. But that row must also meet a rule: its 'Date' must be the latest (max value) of all the rows for that customer.
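Outside Excel, the rule can also be applied directly, without sorting, by keeping a running maximum per customer. The following is a minimal plain-Python sketch of that alternative technique, not the DigDB steps shown in this example. It assumes rows are dicts with 'Customer' and 'Date' keys, and the function name and sample transactions are hypothetical.

```python
from datetime import date


def latest_per_customer(transactions):
    """For each customer, the transaction with the latest (max) 'Date'."""
    latest = {}
    for t in transactions:
        current = latest.get(t["Customer"])
        # Replace only on a strictly later date, so ties keep the first row seen.
        if current is None or t["Date"] > current["Date"]:
            latest[t["Customer"]] = t
    return list(latest.values())  # customers in first-seen order


transactions = [
    {"Customer": "Acme", "Date": date(2011, 1, 5), "Amount": 100},
    {"Customer": "Beta", "Date": date(2011, 2, 1), "Amount": 50},
    {"Customer": "Acme", "Date": date(2011, 3, 9), "Amount": 75},
]
for t in latest_per_customer(transactions):
    print(t["Customer"], t["Date"], t["Amount"])
# Acme 2011-03-09 75
# Beta 2011-02-01 50
```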

    1. Sort the list by 2 columns: by 'Customer', then by 'Date'.
       o Click a cell in the list and invoke 'DigDB->Sort->Multi-level...'; the list will be auto-selected.
       o Use the 'Then by' button to add sort-by columns.
       o Sort by 'Customer', then by 'Date', 'Descending'.

    Click 'Then by' to add more columns to sort by.


    Sorted, so that for each 'customer', its rows are sorted by 'Date', from latest to oldest

    2. Invoke 'DigDB->Complex Filter->Filter Duplicates...' and repeat the steps of Example #3, using 'Customer' as the key field.


    Set how to filter

    Only the first row of each 'Customer' is shown; because of the sort, that row already has the latest 'Date' (max).

    Extract the result
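The two steps above (sort by 'Customer' and then 'Date' descending, then keep the first row of each customer) can be sketched in plain Python. This is a minimal illustration under assumed data, not DigDB's implementation. Rows are assumed to be dicts, and the function name and sample transactions are hypothetical. The sketch relies on Python's sort being stable, so sorting by the secondary key first and the primary key second yields the multi-level order.

```python
from datetime import date


def latest_by_sort_then_first(transactions):
    # Step 1: sort by 'Customer', then by 'Date' descending. Python's sort is stable
    # (also with reverse=True), so sort by the secondary key first, then the primary.
    rows = sorted(transactions, key=lambda t: t["Date"], reverse=True)
    rows = sorted(rows, key=lambda t: t["Customer"])
    # Step 2: keep only the first row of each 'Customer' group, i.e. its latest date.
    seen = set()
    result = []
    for t in rows:
        if t["Customer"] not in seen:
            seen.add(t["Customer"])
            result.append(t)
    return result


transactions = [
    {"Customer": "Acme", "Date": date(2011, 1, 5), "Amount": 100},
    {"Customer": "Beta", "Date": date(2011, 2, 1), "Amount": 50},
    {"Customer": "Acme", "Date": date(2011, 3, 9), "Amount": 75},
]
for t in latest_by_sort_then_first(transactions):
    print(t["Customer"], t["Date"], t["Amount"])
# Acme 2011-03-09 75
# Beta 2011-02-01 50
```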


    Find & Remove Duplicates ( dedupe ) - Example #6

    I need to quickly extract or count unique values in a range, but the range is not a list or table. It's just a random selection of cells. How to do that?

    Practice file - dedupe-selection-demo.xls (16k)

    1. Select the range(s) first (use Ctrl+select for multiple areas), then invoke 'Unique->Extract Uniques->from Selection'.


    A new sheet will be created, and the unique values will be extracted to it.

    You can also use 'Unique->Count Uniques->in Selection' to get a quick count of the unique values in your selected cells.
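Extracting or counting uniques across several non-contiguous areas amounts to walking every cell and remembering which values have been seen. The following is a minimal plain-Python sketch, not DigDB's implementation. It assumes each area is a list of rows of cell values, and the function names and sample colors are hypothetical. Skipping empty cells and comparing values exactly (case-sensitively) are also assumptions, not documented DigDB behavior.

```python
def extract_uniques(*ranges):
    """Distinct values across one or more 2-D ranges, in first-seen order."""
    seen = set()
    uniques = []
    for rng in ranges:          # each range: a list of rows
        for row in rng:         # each row: a list of cell values
            for value in row:
                if value is None or value == "":
                    continue    # assumption: empty cells are not counted as a value
                if value not in seen:
                    seen.add(value)
                    uniques.append(value)
    return uniques


def count_uniques(*ranges):
    """Number of distinct non-empty values across the given ranges."""
    return len(extract_uniques(*ranges))


area1 = [["red", "blue"], ["red", None]]
area2 = [["green", "blue"]]
print(extract_uniques(area1, area2))  # ['red', 'blue', 'green']
print(count_uniques(area1, area2))    # 3
```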