DataScrub Tutorial

A complete guide to cleaning, transforming, and importing data in Excel with DataScrub's 55+ tools.

1. Getting Started

Installation

How to install DataScrub:

  1. Download the DataScrub64-packed.xll file
  2. Double-click the file — Excel will open and load the add-in
  3. Look for the DATA SCRUB tab in the Excel ribbon
Tip: To make DataScrub load automatically every time you open Excel, go to File > Options > Add-ins > set Manage to Excel Add-ins > click Go > click Browse and select the .xll file.

The Ribbon

Once installed, you'll see a DATA SCRUB tab in your Excel ribbon. All 55+ tools are organized into groups:

DataScrub ribbon tab showing all tool groups
The DATA SCRUB ribbon tab with tools organized by category: Undo, Scan, Text, Numbers, Dates, Cleanup, Contacts, Import, and License.
Ribbon GroupWhat It DoesKey Tools
UndoReverse any DataScrub operationUndo
ScanDiagnose data quality issuesHealth Scan
TextClean and transform text valuesTrim Spaces, Strip Chars, Change Case, Add Text, Split, Join, Rearrange, Parse, Create, Count
NumbersFix and format numeric dataInsert, Quick Calc, Analyze, Transform, Convert, Calculate, Smart Fix
DatesParse and standardize datesExtract, Text → Date
CleanupRemove unwanted dataRemove Chars, Dedup, Clean HTML, Values, Remove Blanks, Unmerge, Fill, Find, Replace+
ContactsFormat contact informationSmart Fill, Standardize, Validate Emails, Format Phones, Split Names, Split Address, Normalize Country, Clean URLs
ImportBring in external dataImport (CSV, PDF, Word)

Basic Workflow

Every DataScrub tool follows the same simple pattern:

The 3-step workflow:

  1. Select the cells or columns you want to clean
  2. Click the tool you want from the DATA SCRUB ribbon
  3. Done — your data is cleaned in place. Use Undo if you need to revert.
Tip: You can select entire columns by clicking the column header (A, B, C...). DataScrub will process all non-empty cells in the selection.

2. Data Health Scan

The Health Scan is the best place to start with any new dataset. It analyzes your entire selection and reports every quality issue it finds — extra spaces, text-formatted numbers, formula errors, blanks, duplicates, and hidden characters — all in one pass.

Data Health Scan dialog showing scan results with issue categories, counts, severity, and descriptions
The Data Health Scan analyzed 8.1 million cells and found 5.5 million issues. Health score: 32%. Each issue category shows count, severity, and description.

How to Run a Health Scan

Steps:

  1. Select the data range you want to scan (or press Ctrl+A to select all)
  2. Click Health Scan on the DATA SCRUB ribbon tab
  3. The scan runs and displays a results dialog with your data's health score

Understanding the Results

The Health Scan reports a Health Score (0–100%) based on how many issues it finds relative to your total cell count. Each issue is categorized with a severity level:

Issue CategorySeverityWhat It Means
Extra SpacesMediumLeading, trailing, or multiple consecutive spaces in text. These cause lookup failures and mismatched joins. Click to fix with Trim Spaces.
Text-Formatted NumbersHighNumbers stored as text (left-aligned, green triangle). SUM, VLOOKUP, and other formulas will silently skip these cells. Click to fix with Values or Smart Fix.
Formula ErrorsHigh#REF!, #VALUE!, #N/A, #DIV/0!, and other error values. Review and fix the source formulas, or use Smart Fix to replace errors with blanks or zeros.
Empty CellsLowBlank cells in your selection. Use Remove Blanks to delete blank rows, or Fill to fill gaps with the value above.
Duplicate ValuesMediumRepeated values (excluding the first occurrence). Use Dedup to highlight or remove duplicates.
Hidden CharactersHighNon-printable characters, non-breaking spaces, zero-width characters. Invisible but break lookups and matching. Click to fix with Strip Chars or Clean HTML.

Fixing Issues

After reviewing the scan results, you have two options:

Tip: Run the Health Scan again after fixing issues to verify your data quality improved. Aim for a score of 90% or higher before using data in reports or analysis.
Note: The Health Scan is read-only until you click "Fix Selected". Simply running the scan does not modify your data.

3. Text Cleaning

The Text group is the largest set of tools in DataScrub. These tools clean, transform, split, and restructure text data without formulas.

Trim Spaces

Removes leading spaces, trailing spaces, and reduces multiple consecutive spaces to a single space. This is the #1 most common data quality issue — invisible spaces that break VLOOKUP, MATCH, and pivot table groupings.

Trim Spaces dropdown showing 6 whitespace cleaning options
The Trim Spaces dropdown offers 6 targeted options: trim leading/trailing, remove extra spaces, remove all spaces, non-breaking spaces, line breaks, or clean all whitespace at once.

Click the Trim Spaces dropdown arrow to choose exactly what type of whitespace to clean:

OptionWhat It DoesWhen to Use
Trim leading_trailingRemoves spaces from the start and end of each cellMost common — fixes VLOOKUP mismatches
Remove extra spacesReduces multiple spaces to one (e.g., "John   Smith" → "John Smith")Names and addresses with inconsistent spacing
Remove all spacesStrips every space characterID numbers, codes, phone numbers where no spaces are wanted
Remove non-breaking spacesRemoves   characters (char 160) that look like spaces but aren'tData pasted from websites or HTML emails
Remove line breaksStrips carriage returns and newlines within cellsMulti-line cells that should be single-line
Clean all whitespaceDoes everything above in one passWhen you just want it all gone — the "nuclear option"

How to use:

  1. Select the cells or columns with extra spaces
  2. Click Trim Spaces on the ribbon (or click the dropdown arrow for specific options)
  3. Spaces are removed instantly — check the status bar for a count of cells cleaned

Strip Chars

Removes specific character types from your selection. Click the Strip Chars dropdown to choose what to remove:

Strip Chars dropdown showing 5 character removal options
The Strip Chars dropdown with options to remove custom characters, letters, numbers, accents, or deduplicate within cells.
OptionWhat It DoesExample
Remove Characters...Opens the full Remove Characters dialog (see below)Remove first N chars, keep only N chars, strip non-printable, or specify custom characters
Remove All Letters (A-Z)Strips all alphabetic characters, leaving only numbers and symbols"Order#12345" → "#12345"
Remove All Numbers (0-9)Strips all digits, leaving only letters and symbols"John123Smith" → "JohnSmith"
Remove Accents (e→e)Replaces accented characters with their plain equivalents"café" → "cafe", "résumé" → "resume"
Dedup Within Cell...Removes duplicate words or values within a single cell"red, blue, red, green" → "red, blue, green"

The Remove Characters Dialog

Clicking Remove Characters... opens a detailed dialog with 8 removal modes:

Remove Characters dialog showing 8 character removal options with count field and preview
The Remove Characters dialog lets you remove by position (first N, last N, keep only first N), by type (letters, numbers, non-printable, accents), or by specifying exact characters to strip. The status bar shows how many cells will be affected.
ModeWhat It DoesUse Case
Remove first N charactersStrips the first N characters from each cellRemove a fixed-length prefix like "INV-" (set N=4)
Remove last N charactersStrips the last N characters from each cellRemove a file extension like ".csv" (set N=4)
Keep only first N charactersTruncates each cell to N charactersStandardize codes to a fixed length
Remove all letters (keep numbers)Strips A-Z, keeps digits and symbolsExtract numeric IDs from mixed text
Remove all numbers (keep letters)Strips 0-9, keeps letters and symbolsExtract names from alphanumeric codes
Remove non-printable charactersStrips control chars, zero-width chars, BOM markersClean data pasted from web or imported files
Remove accents/diacriticsReplaces accented letters with plain equivalentsNormalize international names for matching
Remove specific charactersType the exact characters to strip in the text boxRemove specific symbols like @, #, $, etc.
Tip: The status bar at the bottom shows "X of Y cells will change" before you click Remove — so you can preview the impact without modifying anything.

Change Case

Converts text between case formats. Click the Change Case dropdown to pick a format:

Change Case dropdown showing 5 case conversion options with UPPER CASE data visible in column D
The Change Case dropdown with 5 options. Notice column D has inconsistent casing — some rows are UPPER CASE ("UTILITIES - ELECTRIC"), others are Title Case ("Cash - Operating"). Change Case fixes this in one click.
OptionResultWhen to Use
UPPER CASE"cash - operating" → "CASH - OPERATING"Standardizing codes, headers, or labels
lower case"JOHN SMITH" → "john smith"Preparing data for case-sensitive matching
Title Case"cash - operating" → "Cash - Operating"Names, addresses, descriptions — the most common choice
Sentence case"HELLO WORLD. GOODBYE." → "Hello world. Goodbye."Notes, comments, free-text fields
tOGGLE cASE"Hello" → "hELLO"Rarely used — inverts every character's case

Add Text

Inserts text at a specific position in every cell in your selection — beginning, end, or at a character position. Useful for adding prefixes, suffixes, or inserting text mid-string.

Add Text dialog showing text insertion with 4 position options and live preview
The Add Text dialog inserting "STRICTED" at the end of cells containing "UNRE" — turning them into "UNRESTRICTED". The status bar shows 14 of 14 cells will change. Notice the live results already visible in column I.
Position OptionWhat It DoesExample
Add to beginning of cell valuePrepends text before existing content"10001" → "INV-10001"
Add to end of cell valueAppends text after existing content"UNRE" → "UNRESTRICTED"
Add before positionInserts text before the Nth characterInsert "-" at position 4: "AP10001" → "AP-10001"
Add after positionInserts text after the Nth characterInsert " " at position 2: "NY10001" → "NY 10001"
Tip: The preview updates live as you type, so you can see exactly what your data will look like before clicking Apply.

Split & Join

Split breaks cell contents into multiple columns or rows. Join does the opposite — combines multiple columns into one. Both are accessible from the Split Join button on the ribbon.

Split Cells dialog showing split method options (delimiter, fixed width, line break) and split direction options
The Split Cells dialog with 3 split methods (delimiter, fixed width, line break), 2 direction options (into columns or rows), and a "Keep original column" checkbox.

Split Methods

MethodWhat It DoesExample
Split by delimiterSplits on a character — comma, space, dash, pipe, or custom"Smith, John" splits into "Smith" | "John"
Split by fixed widthSplits every N characters"ABCDEF" with width 2 → "AB" | "CD" | "EF"
Split by line breakSplits multi-line cells into separate cellsA cell with 3 lines → 3 columns or 3 rows

Split Direction

Tip: Check "Keep original column" if you want to preserve the source data alongside the split results. Uncheck it to replace the original with the split output.

Join (Combine Columns)

Join does the opposite of Split — select multiple columns and combine them into one, with a delimiter of your choice. For example, joining First Name + Last Name with a space separator, or combining Address + City + State + Zip with commas.

Rearrange

Reorders parts of text within each cell. Common use: converting "Smith, John" to "John Smith" without splitting into separate columns.

Parse

Extracts specific data from text cells using pattern recognition. Click the Parse dropdown to choose an extraction method:

Parse dropdown showing 5 extraction options: Substring, Between Delimiters, Numbers Only, Emails/URLs, Regex Pattern
The Parse dropdown with 5 extraction tools. Each one pulls specific data out of your cells into a new column.
OptionWhat It ExtractsExample
Substring (first/last N)The first or last N characters from each cellFirst 3 of "AP-10001" → "AP-"
Between DelimitersText between two characters you specifyBetween "(" and ")" in "John (CEO)" → "CEO"
Numbers OnlyExtracts just the numeric parts from mixed text"Invoice #12345-A" → "12345"
Emails_URLsFinds and extracts email addresses or URLs from free text"Contact us at info@org.com for help" → "info@org.com"
Regex Pattern...Extract using a custom regular expression — the most powerful optionPattern \d{3}-\d{4} extracts phone numbers like "555-1234"
Tip: Parse extracts data into a new column to the right — it doesn't modify the original cell. This makes it safe to experiment with different extraction patterns.

Create

Generates new data based on patterns — sequential IDs, random values, repeated sequences, or template-based text.

Count

Counts characters, words, or specific patterns in each cell and writes the count to an adjacent column. Useful for data validation (e.g., ensuring phone numbers have exactly 10 digits).

Tip: Most text tools work on the selected cells only. If you want to clean an entire column, click the column header first to select all cells in that column.

4. Number Tools

The Numbers and Dates groups fix, format, and transform numeric and date data. The most common issue: numbers stored as text that silently break SUM, AVERAGE, and other calculations.

Numbers and Dates ribbon groups showing Insert, Transform, Extract, Quick Calc, Convert, Calculate, Analyze, Smart Fix, and Text to Date tools
The Numbers and Dates ribbon groups. These tools handle everything from text-to-number conversion to date parsing, quick calculations, and data analysis.

Insert

Generates new data columns — sequences, IDs, running totals, and calculated values. Click the Insert dropdown:

Insert dropdown showing Sequential Numbers, Random Numbers, Unique IDs, Row Numbers, Running Total, and Calculated Column
The Insert dropdown with 6 options for generating new data alongside your existing columns.
OptionWhat It GeneratesUse Case
Sequential Numbers1, 2, 3, 4... (or custom start/step)Adding row numbers or sequence IDs to imported data
Random NumbersRandom values within a range you specifyTest data generation, sampling
Unique IDsUnique identifiers for each rowCreating primary keys for data that lacks them
Row NumbersSimple row numbering (1, 2, 3...)Quick numbering after sorting or filtering
Running TotalCumulative sum down a numeric columnRunning balance for financial data
Calculated ColumnNew column based on a formula using existing columnsCompute Debit - Credit, concatenate fields, etc.

Quick Calc

Performs quick arithmetic on every cell in your selection — no formulas needed. Enter a value, pick an operation, and DataScrub applies it across all selected cells in place.

Quick Calc dropdown showing 6 arithmetic options: Add, Subtract, Multiply, Divide, Markup/Discount, Percentage Change
The Quick Calc dropdown with 6 one-click arithmetic operations. Each one applies to every numeric cell in your selection.
OptionWhat It DoesExample
Add to all cellsAdds a fixed number to every cellAdd 100 to all prices for a shipping surcharge
Subtract from all cellsSubtracts a fixed number from every cellSubtract 5 to remove a flat fee from each amount
Multiply all cells byMultiplies every cell by a factorMultiply by 1.1 to apply a 10% increase
Divide all cells byDivides every cell by a valueDivide by 100 to convert cents to dollars
% Markup/DiscountApplies a percentage increase or decrease+15% markup on cost prices, or -20% discount
Percentage ChangeCalculates the % change between each cell and a base valueShow how each month's sales compares to a baseline
Tip: Quick Calc modifies values in place. If you want to keep the original data, copy the column first or use Undo immediately after if the result isn't what you expected.

Convert

Converts between units — temperature (F/C), distance (miles/km), weight (lbs/kg), currency formatting, and more.

Calculate

Evaluates text expressions as math. If a cell contains "2+3*4", Calculate will replace it with "14". Useful for imported data that contains formulas as plain text.

Smart Fix

Automatically detects and fixes the most common number issues:

Transform

Applies mathematical transformations: absolute value, round up/down, floor/ceiling, logarithm, square root, and power functions.

Analyze

The Analyze dropdown provides tools that go beyond simple statistics — it includes scaling, ranking, and number-to-text conversion:

Analyze dropdown showing Scale/Normalize, Rank Values, and Spell Out Numbers options
The Analyze dropdown with 3 tools: Scale/Normalize for standardizing numeric ranges, Rank Values for ordering data, and Spell Out Numbers for converting digits to words.
OptionWhat It DoesUse Case
Scale / NormalizeRescales numbers to a standard range (e.g., 0–1 or 0–100) using min-max normalization or z-score standardizationComparing metrics on different scales — e.g., revenue (thousands) vs. headcount (tens)
Rank ValuesAssigns a rank to each cell based on its value (highest = 1, or lowest = 1)Ranking sales reps by performance, students by score, or transactions by amount
Spell Out NumbersConverts numeric values to written words"85000" → "Eighty-Five Thousand" — required on checks, invoices, and legal documents
Tip: Spell Out Numbers is especially useful for financial documents where both the numeric amount and the written-out amount are required (e.g., checks, contracts, grant agreements).
Tip: If your SUM formula returns 0 but the cells clearly have numbers, they're probably text-formatted. Select the range and click Smart Fix — it will convert them to real numbers instantly.

5. Date & Time Tools

The Dates group handles the most frustrating Excel problem: dates stored as text that won't sort, filter, or calculate correctly.

Text → Date

Converts text strings to proper Excel date values. Handles dozens of formats automatically:

DataScrub auto-detects the format. If it can't determine whether "01/02/2024" is January 2 or February 1, it will ask you to confirm the format.

Extract

Pulls out date components into separate columns: year, month, day, quarter, week number, day of week, or time components (hour, minute, second). Useful for building pivot table groupings.

Tip: If dates display as 5-digit numbers (like 45307), the cells are formatted as "General" instead of "Date". Right-click > Format Cells > Date to display them correctly. The underlying value is correct.

6. Data Quality & Deduplication

The Cleanup group removes unwanted data and fixes structural issues in your spreadsheet.

Dedup (Remove Duplicates)

DataScrub's deduplication goes beyond Excel's built-in "Remove Duplicates":

Remove Blanks

Deletes entire rows where the selected column(s) are empty. Also can remove blank cells and shift remaining data up.

Fill

Fills empty cells with the value from the cell above. Essential for "merged cell" style data where a category label appears once and the rows below are blank.

Unmerge

Unmerges all merged cells in the selection and fills the resulting blank cells with the merged value. Merged cells are the #1 cause of broken sorting, filtering, and pivot tables.

Values

Converts formulas to their current values (paste-as-values in one click). Also converts text-formatted numbers to real numbers.

Clean HTML

Strips HTML tags, entities (&,  , etc.), and embedded formatting from text. Essential when you've pasted data from a website or email.

Find & Replace+

An enhanced find-and-replace with:

Tip: Always run Trim Spaces before Dedup. Two cells that look identical might have different trailing spaces, causing the dedup to miss them.

7. Import (CSV, PDF, Word)

The Import tool brings external data directly into your Excel worksheet from multiple file formats.

CSV Import

Imports CSV and TSV files with automatic delimiter detection. Handles encoding issues (UTF-8, ANSI, etc.) and preserves leading zeros in ID columns.

PDF Import

Extracts tables from PDF files into Excel. Uses intelligent table detection to identify rows, columns, and headers. Works with:

Word Import

Extracts tables from Word documents (.docx) directly into Excel cells, preserving the table structure.

How to import a file:

  1. Click the cell where you want the imported data to start
  2. Click Import on the DATA SCRUB ribbon
  3. Select the file type (CSV, PDF, or Word)
  4. Browse and select your file
  5. Preview the data and adjust settings if needed
  6. Click Import — the data lands in your worksheet
Tip: After importing, run a Health Scan to check the quality of the imported data. Imports from PDF are especially prone to text-formatted numbers and merged fields.

8. Batch Clean & Quick Clean

For repetitive cleaning tasks, DataScrub lets you chain multiple tools together and run them in one click.

Batch Clean

Build a multi-step cleaning pipeline:

How to set up a batch clean:

  1. Click Batch Clean (in the Cleanup group or Quick Clean dropdown)
  2. Add steps: e.g., Trim Spaces → Change Case to Title → Remove Duplicates
  3. Preview the results at each step
  4. Click Run All to execute the entire pipeline
  5. Save the pipeline as a preset for future use

Quick Clean Presets

Pre-built cleaning recipes for common scenarios:

Tip: If you clean the same type of data regularly (e.g., a weekly export from another system), save a Batch Clean preset. Next time, it's a one-click operation.

9. Tips & Best Practices

Before You Start

Common Workflows

ScenarioRecommended Steps
Cleaning a CSV exportImport → Health Scan → Trim Spaces → Smart Fix (numbers) → Dedup
Preparing data for VLOOKUPTrim Spaces → Change Case → Strip hidden characters → Values (remove formulas)
Cleaning a mailing listTrim → Split Names → Title Case → Format Phones → Validate Emails → Dedup
Extracting data from PDFImport PDF → Health Scan → Smart Fix → Text → Date → Trim
Fixing pivot table source dataUnmerge → Fill gaps → Trim Spaces → Remove Blanks → Dedup

Keyboard Shortcuts

DataScrub tools don't have built-in keyboard shortcuts, but you can use Excel's ribbon shortcut system: press Alt, then look for the shortcut letters that appear on the DATA SCRUB tab. This lets you access any tool without the mouse.

Getting Help

If you run into issues or have questions, email jkisarale@gmail.com — we typically respond within 24 hours.

Ready to clean your data?

Download DataScrub and try all 55+ tools free for 3 days.

Download Buy License — $29/yr