A Guide to Data Cleaning
1 INTRODUCTION
The Excel file, titled “Sample Data Cleaning,” presents a comprehensive approach to transforming raw data into a clean, consistent, and usable format.
The following data cleaning processes have been applied:
Name: Names have been split into separate surname and first name columns using the TEXT TO COLUMNS feature for easier analysis and sorting.
Brand: Unique brand values have been identified using the UNIQUE function, and the brand name “Nike” has been standardized using the FIND & REPLACE feature.
Product: Product names have been capitalized consistently using the PROPER function.
Location: Excess spaces in country names have been removed with the TRIM function, and locations have been divided into separate country and city columns using the FLASH FILL feature for efficiency.
Fulfillment: Fulfillment data has been converted into percentages by dividing by 100.
Price: Commas in price values have been replaced with periods using the SUBSTITUTE function, and the resulting text has been converted to numbers with the VALUE function so that it can be used in numerical calculations.
Data Consistency: By standardizing formats and removing inconsistencies, data accuracy and reliability have been enhanced.
Data Completeness: Rows containing blank cells have been located with the GO TO SPECIAL feature and deleted to ensure data integrity and prevent errors in analysis.
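For readers who would rather script these steps than apply them by hand, the processes listed above can be approximated in Python. This is a minimal sketch, not the method used in the workbook. The sample rows, the column names, the comma separator in names, the hyphen separator in locations, and the brand variants in BRAND_FIXES are all assumptions made for illustration. Python's str.title() is only a rough counterpart to PROPER.

```python
def trim(text):
    """Mimic Excel TRIM: drop leading/trailing spaces, collapse inner runs."""
    return " ".join(text.split())


# Hypothetical misspellings mapped to the standard brand name
# (the counterpart of a manual Find & Replace pass).
BRAND_FIXES = {"nike": "Nike", "nikee": "Nike"}


def clean_row(row):
    """Return a cleaned copy of one row, or None if any cell is blank."""
    cells = {key: trim(value) for key, value in row.items()}

    # Data completeness: skip rows that contain a blank cell.
    if any(value == "" for value in cells.values()):
        return None

    # Assumed raw formats: "Surname, First name" and "Country - City".
    surname, first_name = [part.strip() for part in cells["Name"].split(",", 1)]
    country, city = [part.strip() for part in cells["Location"].split("-", 1)]

    return {
        "Surname": surname,
        "First Name": first_name,
        "Brand": BRAND_FIXES.get(cells["Brand"].lower(), cells["Brand"]),
        "Product": cells["Product"].title(),               # like PROPER
        "Country": country,
        "City": city,
        "Fulfillment": float(cells["Fulfillment"]) / 100,  # 85 -> 0.85
        # Like SUBSTITUTE followed by VALUE: "59,99" -> 59.99
        "Price": float(cells["Price"].replace(",", ".")),
    }


def clean_rows(rows):
    """Clean every row and drop the incomplete ones."""
    cleaned = (clean_row(row) for row in rows)
    return [row for row in cleaned if row is not None]


# Hypothetical sample data, not taken from the workbook.
raw_rows = [
    {"Name": "Smith, Anna", "Brand": "nike", "Product": "running shoes",
     "Location": "  Germany  - Berlin ", "Fulfillment": "85", "Price": "59,99"},
    {"Name": "Lopez, Juan", "Brand": "Adidas", "Product": "TRAINING shorts",
     "Location": "Spain - Madrid", "Fulfillment": "", "Price": "24,50"},
]

clean = clean_rows(raw_rows)
unique_brands = sorted({row["Brand"] for row in clean})  # like UNIQUE
```

With this sample input, the second row is dropped because its Fulfillment cell is blank, and only the first row remains in clean. A row whose name or location lacks the assumed separator would raise an error in this sketch, so real data would need a check for that case.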
This cleaned dataset provides a solid foundation for further analysis, reporting, and decision-making. By following these data cleaning steps, users can improve the quality and usefulness of their data.