DiskBoss Duplicate Files Finder


  • 8/6/2019 DiskBoss Duplicate Files Finder


    DiskBoss Duplicate Files Finder Flexense Ltd.


    Duplicate Files Finder

    Version 1.2

    Mar 2011

    Flexense Ltd.

    [email protected]

    DiskBoss File & Disk Manager


    Product Overview

    DiskBoss is an automated, rule-based file and disk manager allowing one to search and classify files, perform disk space utilization analysis, detect and remove duplicate files, organize files according to user-defined rules and policies, copy large amounts of files in a fault-tolerant way, synchronize disks and directories, clean up wasted disk space, etc.

    All file management operations are integrated in a centralized and easy-to-use GUI application with a built-in file navigator allowing one to execute any required operation in a single mouse click. Frequently used file management operations may be pre-configured as user-defined commands and executed using the GUI application or direct desktop shortcuts.

    DiskBoss is a highly extendable and customizable data management solution allowing one to design custom file classification plugins and purpose-built file management operations using an open and easy-to-use XML-based format. Custom disk space analysis and file management operations may be integrated into the product, executed periodically at specific time intervals, performed as conditional actions in other operations or automatically triggered by one or more changes in a disk or directory.

    In addition, IT administrators are provided with extensive database integration capabilities allowing one to submit disk space analysis, file classification, duplicate files detection and file search reports into an SQL database. Reports from multiple servers and desktop computers may be submitted to a centralized SQL database allowing one to display charts showing the used disk space, file categories and duplicate files per user or per host and providing an in-depth visibility into how disk space is used, what types of files are stored and how much space is wasted on duplicate files across the entire enterprise.

    Finally, IT professionals and enterprises are provided with DiskBoss Server, a server-based product version, which runs in the background as a service and is capable of executing all disk space analysis and file management operations in a fully automatic and unattended mode. DiskBoss Server can be managed and configured locally or through the network using a free network client GUI application or the DiskBoss command line utility, which provides the user with the ability to integrate DiskBoss features and capabilities into other products and solutions.


    Duplicate Files Finder

    DiskBoss' built-in duplicate files finder provides a large number of advanced features and capabilities allowing one to identify and clean up duplicate files on desktops, servers and NAS storage devices. The duplicate files finder shows detected duplicates and allows one to delete duplicate files, move them to another directory or replace them with links to the originals.

    The user is provided with the ability to categorize and filter detected duplicate files by the file extension, category, file size, user name, last access time, etc. Moreover, DiskBoss allows one to generate various types of charts and export reports to the HTML, text and CSV formats.

    Power users and IT professionals are provided with policy-based duplicate files detection and removal capabilities allowing one to define custom duplicate files detection and cleanup commands and execute them in a fully automatic mode using the DiskBoss GUI application or the command line utility. Finally, corporations and enterprises are provided with the ability to submit reports from multiple servers and desktop computers to a centralized SQL database allowing one to analyze the disk space wasted on duplicate files across the entire enterprise.


    Detecting Duplicates in a Disk or Directory

    In order to detect duplicate files in one or more disks or directories, select the required directories in the DiskBoss file navigator and press the Duplicates button located on the main toolbar. DiskBoss will scan the selected files and directories and display a dialog showing the list of detected duplicate file sets.

    For each duplicate file set, DiskBoss shows the name of the original file, the number of duplicate files in the set, the size of each file in the set, the amount of wasted disk space and the currently selected duplicates removal action. In order to see all duplicate files related to a set, click on the set item in the set list.

    The duplicate set dialog shows all duplicate files related to the set and allows one to select the original file, the duplicate files and the duplicates removal action. In order to select a file as the original, select the file item, press the right mouse button and select the Set as Original File menu item. In order to see more information about a file, just click on the file item in the file list. Once finished selecting the duplicate files, use the removal actions combo box located in the bottom-left corner of the dialog to select an appropriate duplicates removal action.
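This guide does not describe how DiskBoss detects duplicates internally. As a rough illustration of the general idea only, the following Python sketch groups files by size, then by SHA-256 signature (the documented default signature type), and computes the wasted space of each set. The function names `file_signature` and `find_duplicate_sets` are hypothetical and are not part of DiskBoss.

```python
import hashlib
import os
from collections import defaultdict


def file_signature(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_sets(root):
    """Group files under root into sets with identical size and content."""
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            by_size[os.path.getsize(path)].append(path)

    duplicate_sets = []
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have duplicates
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_signature(path)].append(path)
        for same in by_hash.values():
            if len(same) > 1:
                duplicate_sets.append({
                    "files": sorted(same),
                    "size": size,
                    # every copy beyond the first one counts as wasted space
                    "wasted": size * (len(same) - 1),
                })
    return duplicate_sets
```

Comparing sizes first avoids hashing files that cannot possibly match, which is a common optimization but an assumption about any particular product.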


    Selecting Duplicate Files Removal Actions

    DiskBoss' duplicate files finder allows one to delete duplicate files, move duplicates to another directory or replace duplicates with links pointing to the original file in each specific set of duplicate files. In order to select a specific duplicates removal action for one or more sets of duplicate files, select the sets in the set list, press the right mouse button and select an appropriate duplicate files removal action.

    By default, DiskBoss selects the oldest file in each set as the original file and all other files in the set as duplicates. In order to change that, select one or more sets, press the right mouse button and select the Select Oldest Files as Duplicates menu item. Alternatively, open the set dialog, select any arbitrary file in the set as the original file, select an appropriate duplicates removal action that should be executed for this specific set and select one or more duplicate files in the set that the removal action should be applied to.
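The default rule described above (the oldest file is the original) can be sketched in a few lines of Python. This is an illustration under an assumption: "oldest" is taken to mean the earliest last-modification time, since this guide does not say which timestamp DiskBoss compares. The function name `split_original_and_duplicates` is hypothetical.

```python
import os


def split_original_and_duplicates(paths):
    """Return (original, duplicates), treating the oldest file as the original.

    Assumption: "oldest" means the earliest last-modification time.
    Ties are broken by path so that the result is deterministic.
    """
    ordered = sorted(paths, key=lambda p: (os.path.getmtime(p), p))
    return ordered[0], ordered[1:]
```

Reversing the sort order would give the opposite policy, in which the oldest files are treated as duplicates.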

    Executing Duplicate Files Removal Actions

    Once finished selecting duplicates and removal actions, press the Preview button to see the duplicate files removal actions preview dialog. The duplicates removal actions preview dialog shows the selected duplicate files and removal actions that will be executed and allows one to review and manually confirm each specific action before execution.

    The operating system and other system applications may have a large number of duplicate files located in various system directories. These duplicate files may be very important for proper operation of the operating system and other system applications, and it is highly dangerous to remove them. To be on the safe side, use the duplicates removal actions only for your own documents, music files, videos, etc.

    In order to execute the selected duplicates removal actions, press the Execute button located in the bottom-right corner of the Preview dialog. DiskBoss will process the selected duplicate files and execute the specified duplicates removal actions.

    Warning: There are many duplicate files in the Windows system directory, which are important for proper operation of the operating system. Removal of duplicate files located in the Windows system directory may permanently damage the operating system and render the computer completely non-functional.
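The preview-then-execute workflow above can be illustrated with a small Python sketch that first builds a plan, lets the caller inspect it, and only changes files after explicit confirmation. This is not DiskBoss code: the names `plan_actions`, `preview` and `execute` are hypothetical, only the delete and link actions are covered, and the use of hard links is an assumption, since this guide does not say which kind of link DiskBoss creates.

```python
import os


def plan_actions(original, duplicates, action):
    """Build a list of planned actions without touching any file."""
    if action not in ("delete", "link"):
        raise ValueError("unsupported action: " + action)
    return [(action, duplicate, original) for duplicate in duplicates]


def preview(plan):
    """Return human-readable lines describing what would be done."""
    return ["{0}: {1} (original: {2})".format(action, duplicate, original)
            for action, duplicate, original in plan]


def execute(plan, confirmed=False):
    """Apply the plan only after explicit confirmation; return actions done."""
    if not confirmed:
        return 0
    done = 0
    for action, duplicate, original in plan:
        if action == "delete":
            os.remove(duplicate)
        else:
            # Create the link under a temporary name first, then swap it in,
            # so the duplicate is never lost if link creation fails.
            # Hard links require both paths to be on the same volume.
            temp_name = duplicate + ".linktmp"
            os.link(original, temp_name)
            os.replace(temp_name, duplicate)
        done += 1
    return done
```

In line with the warning above, a plan like this should only ever be built for personal data directories, never for system directories.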


    Using File Filters and Categories

    DiskBoss' duplicate files finder allows one to categorize and filter duplicate files by the file extension, category, size, user name, etc. The user is provided with the ability to apply multiple file filters, display specific types of duplicate files, and apply duplicate files removal actions to, or export reports showing, filtered files only.

    In order to set one or more file filters, select an appropriate type of file categories in the categories combo box, select one or more file filters in the filters view, press the right mouse button and select the Apply Selected Filters menu item.

    With active file filters, DiskBoss shows duplicate files matching the selected filters, exports reports showing matching files only and significantly simplifies selection of duplicates removal actions for specific file types or file categories. In order to clear the selected file filters, just press the Clear button located on the right side of the categories selector.
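To make the idea of categories and filters concrete, the following Python sketch summarizes duplicates per file extension and then filters them by a chosen set of extensions. The data layout (a dict with `size` and `duplicates` keys) and both function names are assumptions made for this illustration; they do not reflect DiskBoss' internal data model.

```python
import os
from collections import defaultdict


def wasted_space_by_extension(duplicate_sets):
    """Summarize duplicate counts and wasted bytes per file extension.

    Each set is a dict with "size" (bytes per file) and "duplicates"
    (paths of the redundant copies, excluding the original).
    """
    summary = defaultdict(lambda: {"files": 0, "wasted": 0})
    for dup_set in duplicate_sets:
        for path in dup_set["duplicates"]:
            ext = os.path.splitext(path)[1].lower() or "(none)"
            summary[ext]["files"] += 1
            summary[ext]["wasted"] += dup_set["size"]
    return dict(summary)


def filter_by_extension(duplicate_sets, extensions):
    """Keep only the duplicates whose extension is in the given collection."""
    wanted = {ext.lower() for ext in extensions}
    filtered = []
    for dup_set in duplicate_sets:
        kept = [path for path in dup_set["duplicates"]
                if os.path.splitext(path)[1].lower() in wanted]
        if kept:
            filtered.append({"size": dup_set["size"], "duplicates": kept})
    return filtered
```

The same pattern extends to other categories such as size ranges or user names by changing the key that is computed for each file.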


    Showing Duplicate Files Pie Charts

    The duplicate files finder allows one to display charts showing the amount of wasted disk space and the number of duplicate files per extension, file type, file size, user name, etc. In order to open the charts dialog, press the Charts button located on the dialog's toolbar.

    The charts dialog displays information for the displayed duplicate files and the currently selected categories of duplicate files. In order to display a chart for another category of duplicates, select an appropriate category in the categories combo box and then open the charts dialog.

    The charts dialog allows one to copy the displayed chart image to the clipboard, making it very easy to integrate DiskBoss charts into user reports and presentations. Finally, the user is provided with the ability to customize the information displayed on the chart's status bar.


    Saving Duplicate Files Reports

    DiskBoss allows one to save lists of detected duplicate files to HTML, text and Excel CSV reports. In addition, the user is provided with the ability to save DiskBoss' native reports, which preserve all information about each specific duplicate files detection operation and may be imported to an SQL database using DiskBoss Ultimate.

    In order to save a report file, press the Save button located on the dialog's toolbar, select an appropriate report format, enter the report file name and press the Save button. Optionally, limit the report to a specific number of duplicate file sets and/or select the Save Compressed Report option to save a compressed report file.

    A typical report file includes the date and time of the duplicate files detection operation, the name of the host computer the operation was performed on, a list of the top 10 file categories according to the currently selected categories mode, and the list of duplicate file sets detected in the processed disks and directories. For each set of duplicate files, DiskBoss shows the name of the original file, the number of duplicate files in the set and the amount of wasted disk space.
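As an illustration of the per-set information listed above, the following Python sketch writes one CSV row per duplicate file set, with an optional limit on the number of sets. The column names, the input layout and the function name `write_csv_report` are assumptions made for this example; the actual layout of DiskBoss CSV reports is not documented in this guide.

```python
import csv
import io


def write_csv_report(duplicate_sets, host, timestamp, max_sets=None):
    """Return CSV text with a header line and one row per duplicate file set.

    Each set is a dict with "original" (path), "count" (number of
    duplicate files) and "wasted" (wasted bytes).
    """
    rows = duplicate_sets if max_sets is None else duplicate_sets[:max_sets]
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["host", "timestamp", "original", "duplicates", "wasted_bytes"])
    for dup_set in rows:
        writer.writerow([host, timestamp, dup_set["original"],
                         dup_set["count"], dup_set["wasted"]])
    return buffer.getvalue()
```

Using the `csv` module rather than manual string joins keeps file names containing commas or quotes correctly escaped.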


    Exporting Reports to an SQL Database

    IT professionals and enterprises are provided with the ability to submit reports listing duplicate files detected on multiple storage systems, servers and desktop computers to a centralized SQL database, enabling system and storage administrators to gain an in-depth visibility into amounts of duplicate files and wasted disk space across the entire enterprise.

    In order to submit a report to an SQL database, press the Save button located on the dialog's toolbar, select the SQL Database report format and press the Save button. Before exporting a report to an SQL database, the user needs to open the options dialog, enable the ODBC interface and specify the name of the ODBC data source and the database user name and password to use for database export operations.

    For each report in the database, DiskBoss shows the report date and time, the name of the host computer the operation was performed on, disks and directories that were processed, the total amount of disk space and the number of files that were processed and the report title. In order to open a report, just click on the report item in the report list.


    Analyzing Duplicate Files Per User

    DiskBoss Ultimate and DiskBoss Server provide the ability to analyze duplicate files owned by multiple users and detected on one or more servers or desktop computers and display charts showing the amount of wasted disk space and the number of duplicate files per user.

    In order to analyze duplicate files per user, connect DiskBoss Ultimate to an SQL Database and submit reports containing duplicates owned by multiple users to the SQL database using the DiskBoss GUI application or the DiskBoss command line utility. Once reports are in the database, open the Database dialog and press the Users button to open the Users Statistics dialog.

    diskboss -duplicates -dir \\server\share -host -save_to_database

    The simplest way to submit reports from multiple servers or desktop computers is to use the DiskBoss command line utility to detect duplicate files on all required hosts through the network. In order to simplify submission of reports to the SQL database, the command line utility may be executed on the same host on which the SQL database is installed. In this case, the user needs to specify one or more network shares to be processed and the host name to be set for each report.

    diskboss -duplicates -dir -save_report

    Another option is to execute the command line utility on each specific host, save duplicate files reports and later submit report files from all hosts to the SQL database using the DiskBoss GUI application. In this case, there is no need to set the host name, which will be set automatically to the name of the host the command line utility is executed on.

    Important: By default, processing and display of user names are disabled. In order to enable this capability, open the options dialog and enable the corresponding option.


    Analyzing Duplicate Files Per Host

    DiskBoss Ultimate and DiskBoss Server provide the ability to submit duplicate files reports from multiple servers and desktop computers into a centralized SQL database, analyze reports and display various types of charts showing the amount of duplicate disk space and the number of duplicates per host, allowing one to gain an in-depth visibility into amounts of duplicate files across the entire enterprise.

    In order to analyze reports from multiple hosts, the user needs to connect DiskBoss to an SQL Database, perform duplicate files search on multiple hosts using the DiskBoss GUI application or the DiskBoss command line utility and submit reports from all hosts to the SQL database. Once reports from all hosts are in the database, open the Database dialog and press the Hosts button to open the Hosts Statistics dialog.

    diskboss -duplicates -dir \\server\share -host -save_to_database

    The simplest way to submit reports from multiple servers or desktop computers is to use the DiskBoss command line utility to detect duplicate files on all required hosts through the network. In order to simplify submission of reports to the SQL database, the command line utility may be executed on the same host on which the SQL database is installed. In this case, the user needs to specify one or more network shares to be processed and the host name to be set for each report.

    diskboss -duplicates -dir -save_report

    Another option is to execute the command line utility on each specific host, save duplicate files reports and later submit report files from all hosts to the SQL database using the DiskBoss GUI application. In this case, there is no need to set the host name, which will be set automatically to the name of the host the command line utility is executed on.
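The kind of per-host summary that a centralized database makes possible can be sketched with Python's built-in `sqlite3` module. This is an illustration only: the table name, column names and the function `wasted_space_per_host` are hypothetical, the database schema used by DiskBoss is not documented in this guide, and an in-memory SQLite database stands in for the ODBC data source.

```python
import sqlite3


def wasted_space_per_host(reports):
    """Load report rows into an in-memory database and total them per host.

    reports: iterable of (host, duplicate_files, wasted_bytes) tuples.
    Returns rows of (host, total_duplicate_files, total_wasted_bytes),
    ordered by wasted space (largest first), then by host name.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE reports ("
            "host TEXT, duplicate_files INTEGER, wasted_bytes INTEGER)"
        )
        conn.executemany("INSERT INTO reports VALUES (?, ?, ?)", reports)
        cursor = conn.execute(
            "SELECT host, SUM(duplicate_files), SUM(wasted_bytes) "
            "FROM reports GROUP BY host "
            "ORDER BY SUM(wasted_bytes) DESC, host"
        )
        return cursor.fetchall()
    finally:
        conn.close()
```

A per-user summary follows the same pattern, with a user name column taking the place of the host column in the `GROUP BY` clause.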


    Detecting Duplicates in Specific File Types

    One of the most powerful capabilities of DiskBoss is the ability to perform disk analysis and file management operations on files matching user-specified criteria. In order to focus on specific types of duplicate files, the user is provided with the ability to define one or more file matching rules specifying files that should be processed by DiskBoss' duplicate files finder. Files not matching the specified rules are skipped by the duplicate files detection process.

    In order to add one or more file matching rules to a duplicate files detection operation, open the operation dialog, select the rules tab and press the Add button located on the right side of the dialog. Once finished adding file matching rules, select an appropriate rules logic and press the Save button.
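The combination of file matching rules with a rules logic can be pictured as a list of predicates joined by AND or OR. The Python sketch below is an assumption-based illustration: this guide does not list the available rule types or logic modes, so the two rule builders (`name_matches`, `size_at_least`) and the `"AND"` / `"OR"` modes are hypothetical.

```python
import fnmatch
import os


def name_matches(pattern):
    """Rule: the file name matches a shell-style pattern, ignoring case."""
    lowered = pattern.lower()
    return lambda path, size: fnmatch.fnmatchcase(
        os.path.basename(path).lower(), lowered)


def size_at_least(min_bytes):
    """Rule: the file is at least min_bytes bytes long."""
    return lambda path, size: size >= min_bytes


def matches(rules, path, size, logic="AND"):
    """Evaluate all rules for one file and combine them with AND or OR."""
    results = (rule(path, size) for rule in rules)
    if logic == "AND":
        return all(results)
    if logic == "OR":
        return any(results)
    raise ValueError("unknown rules logic: " + logic)
```

For example, `matches([name_matches("*.mp3"), size_at_least(1024)], path, size)` would accept only MP3 files of at least 1 KB, while files failing the check would be skipped.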

    Advanced Duplicate Files Detection Options

    DiskBoss' duplicate files finder provides a large number of advanced options allowing one to customize duplicate files detection operations for user-specific hardware and storage configurations. The General tab allows one to control the file signature type, the file scanning mode, the maximum number of duplicate file sets to display in the results dialog and the file filter, which may be used to limit the operation to specific files using a file name pattern.

    The Performance tab provides the ability to intentionally slow down the duplicate files detection process in order to minimize the potential performance impact on running production systems. The Exclude tab allows one to define one or more subdirectories to be excluded from the duplicate files detection process.
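The effect of a file name filter combined with excluded subdirectories can be sketched with a directory walk that prunes excluded branches before descending into them. This Python example is illustrative only; the function name `iter_files` and its parameters are hypothetical and do not correspond to DiskBoss options.

```python
import fnmatch
import os


def iter_files(root, name_filter="*", exclude_dirs=()):
    """Yield files under root matching name_filter, skipping excluded directories."""
    excluded = {os.path.normcase(os.path.abspath(d)) for d in exclude_dirs}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded subdirectories in place so os.walk never enters them.
        dirnames[:] = [
            d for d in dirnames
            if os.path.normcase(os.path.abspath(os.path.join(dirpath, d)))
            not in excluded
        ]
        for name in filenames:
            if fnmatch.fnmatch(name, name_filter):
                yield os.path.join(dirpath, name)
```

Pruning at the directory level avoids scanning excluded subtrees at all, which is cheaper than walking everything and discarding the results afterwards.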


    Using Automatic Duplicate Files Removal Actions

    DiskBoss Ultimate and DiskBoss Server provide the user with the ability to automatically execute one or more duplicate files removal actions for files matching user-specified rules. In order to define one or more automatic duplicates removal actions, open the operation dialog, select the Actions tab and press the Add button.

    On the Action dialog, select the original file detection mode and an appropriate duplicates removal action, and specify one or more file matching rules defining files the action should be applied to. During runtime, DiskBoss will process detected duplicate files, apply the specified file matching rules, detect the original file and execute the duplicates removal actions for files matching the specified rules and policies.

    By default, DiskBoss executes automatic duplicates removal actions in the Auto-Select mode, which selects the specified actions and displays the duplicates removal actions preview dialog allowing one to review and manually confirm each specific action. After testing the duplicate file detection operation in the preview mode, change the actions mode to Execute to automatically execute the specified duplicates removal actions without showing the actions preview dialog.


    Finally, IT administrators are provided with the DiskBoss command line utility allowing one to execute automatic duplicate files detection and removal operations from batch files and shell scripts, periodically remove duplicates from servers and enterprise storage systems and integrate DiskBoss' duplicate files detection capabilities with other products and solutions.

    The DiskBoss command line utility is available in DiskBoss Ultimate and DiskBoss Server and it is capable of executing user-defined duplicate files detection and removal commands defined in the DiskBoss GUI application and/or written in the DiskBoss XML format.

    User-Defined Duplicate Files Detection Commands

    One of the most powerful and flexible capabilities of DiskBoss is the ability to pre-configure custom duplicate files detection and removal operations as user-defined commands and execute such commands in a single mouse click using the DiskBoss GUI application or direct desktop shortcuts.

    User-defined commands may be managed and executed through the commands dialog or the commands tool pane. In order to add a new command through the commands pane, press the right mouse button over the pane and select the Add New Duplicate Files Search Command menu item. In order to execute a previously saved command, just click on the command item in the commands tool pane or create a direct desktop shortcut on the Windows desktop.


    Detecting Duplicates Using the Command Line Utility

    In addition to the DiskBoss GUI application, DiskBoss Ultimate provides a command line utility allowing one to execute duplicate files detection and removal operations from batch files and shell scripts. The command line tool is located in the \bin directory.

    Command Line Syntax:

    diskboss -duplicates -dir [ ... ]

    Parameters:

    -dir < Directory 1> [ ... < Directory X> -file ]

    This parameter specifies the list of input directories or files to process. In order to ensure proper parsing of command line arguments, directories and file names containing space characters should be double quoted.
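When the utility is driven from a script, the quoting rule above is easy to get wrong by hand. The Python sketch below builds the argument list from the `-duplicates` and `-dir` arguments shown in this guide and uses the standard library's `subprocess.list2cmdline` to show how Windows-style quoting is applied. The helper name `build_duplicates_command` and the example paths are hypothetical, and the sketch only builds the command line; it does not run DiskBoss.

```python
import subprocess


def build_duplicates_command(directories):
    """Build the argument list for a duplicate search over the given directories."""
    if not directories:
        raise ValueError("at least one directory is required")
    return ["diskboss", "-duplicates", "-dir"] + list(directories)


# Hypothetical example paths; the first one contains a space character.
args = build_duplicates_command([r"C:\My Documents", r"D:\Music"])

# list2cmdline applies the Windows quoting rules: only arguments that
# contain spaces or tabs are wrapped in double quotes.
command_line = subprocess.list2cmdline(args)
```

Passing the argument list itself to `subprocess.run` would let Python perform the same quoting automatically on Windows, which avoids assembling the command string manually.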

    Options:

    -signature_type

    This parameter sets the type of algorithm used to calculate signatures of files. By default, DiskBoss uses the SHA256 algorithm.

    -exclude_dir [ ... ]

    This parameter specifies the list of directories that should be excluded from processing. In order to ensure proper parsing of command line arguments, directories containing space characters should be double quoted.

    -filter

    This parameter sets the directory search filter (default *.*).

    -workers

    This parameter sets the number of worker threads used to process files. DiskBoss is optimized for multi-core and multi-CPU computers and is capable of distributing the workload to an unlimited number of CPUs. By default, DiskBoss processes files with one worker thread.

    -max_dup_set

    This parameter sets the maximum number of duplicate file sets to report. By default, DiskBoss reports up to 1000 duplicate file sets sorted by the amount of wasted storage space.

    -min_wasted_space

    This parameter sets the minimum amount of wasted storage space to report. By default, DiskBoss reports duplicate file sets wasting at least 1 MB of storage space.

    -save_html_report | save_csv_report | save_text_report [ ReportFileName ]

    This parameter saves a report file. If no file name is specified, DiskBoss will automatically generate a file name according to the following template:

    diskboss_duplicates_[date]_[time].html

    -v - This command shows the product's version, revision and build date.

    -help - This command shows the command line usage information.
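The combined effect of the -max_dup_set and -min_wasted_space options described above can be sketched as a filter, a sort and a cut-off. The Python example below mirrors the documented defaults (up to 1000 sets, at least 1 MB wasted), under the assumption that 1 MB means 1024 * 1024 bytes. The function name `select_sets_for_report` and the input layout are hypothetical.

```python
def select_sets_for_report(duplicate_sets, max_dup_set=1000,
                           min_wasted_space=1024 * 1024):
    """Pick the duplicate file sets that would appear in a report.

    Keeps sets wasting at least min_wasted_space bytes, sorts them by
    wasted space (largest first) and returns at most max_dup_set of them.
    Each set is a dict with a "wasted" key holding the wasted bytes.
    """
    eligible = [s for s in duplicate_sets if s["wasted"] >= min_wasted_space]
    eligible.sort(key=lambda s: s["wasted"], reverse=True)
    return eligible[:max_dup_set]
```

Filtering before sorting keeps the sort small when most duplicate sets waste very little space.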