A Guide to Data Cleaning

Author

Data Analyst - Pythias C

Published

August 3, 2024

1 INTRODUCTION

The Excel file, titled “Sample Data Cleaning,” presents a comprehensive approach to transforming raw data into a clean, consistent, and usable format.

2 The following data cleaning processes have been applied:

  1. Name: Names have been split into separate surname and first name columns using the Text to Columns feature, which makes analysis and sorting easier.

  2. Brand: Unique brand values have been listed using the UNIQUE function, and inconsistent spellings of the brand “Nike” have been standardized using the Find & Replace feature.

  3. Product: Product names have been capitalized consistently using the PROPER function, which capitalizes the first letter of each word.

  4. Location: Excess spaces in country names have been removed with the TRIM function, and locations have been split into separate country and city columns using the Flash Fill feature.

  5. Fulfillment: Fulfillment values have been divided by 100 so that they can be formatted as percentages (for example, 95 becomes 0.95, which displays as 95%).

  6. Price: Commas in price values have been replaced with periods using the SUBSTITUTE function, and the resulting text has been converted to numbers with the VALUE function so that it can be used in calculations.
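For readers who want to reproduce these steps outside Excel, the six steps above can be sketched in Python roughly as follows. This is only an illustrative equivalent, not the workbook's own method: the column names, the sample record, the comma-delimited "Surname, First name" and "Country, City" layouts, and the list of "Nike" spelling variants are all assumptions.

```python
# Hypothetical spellings to fold into "Nike"; the workbook's actual variants are not listed.
NIKE_VARIANTS = {"nike", "nikee", "nyke"}


def clean_row(row):
    """Return a cleaned copy of one raw record (a dict of strings)."""
    # Name: split at the delimiter (assumed to be a comma), like Text to Columns.
    surname, _, first_name = row["Name"].partition(",")

    # Brand: replace known variants with one spelling, like Find & Replace.
    brand = row["Brand"].strip()
    if brand.lower() in NIKE_VARIANTS:
        brand = "Nike"

    # Location: collapse repeated spaces like TRIM, then split into
    # country and city (a "Country, City" layout is assumed).
    location = " ".join(row["Location"].split())
    country, _, city = location.partition(",")

    return {
        "Surname": surname.strip(),
        "First Name": first_name.strip(),
        "Brand": brand,
        # Product: capitalize each word, similar to PROPER.
        "Product": row["Product"].strip().title(),
        "Country": country.strip(),
        "City": city.strip(),
        # Fulfillment: divide by 100 to get a fraction that formats as a percentage.
        "Fulfillment": float(row["Fulfillment"]) / 100,
        # Price: swap the decimal comma for a period (SUBSTITUTE), then convert (VALUE).
        "Price": float(row["Price"].replace(",", ".")),
    }


# Made-up sample record, for illustration only.
raw = {
    "Name": "Doe, Jane",
    "Brand": "nike",
    "Product": "running shoes",
    "Location": "  United   States , New York ",
    "Fulfillment": "95",
    "Price": "19,99",
}
print(clean_row(raw))
```

Flash Fill infers the split pattern from examples typed by the user, so the explicit delimiter used here is a simplification of that step.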

Data Consistency: Standardizing formats and removing inconsistencies has improved the accuracy and reliability of the data.

Data Completeness: Rows containing blank cells have been located with the Go To Special feature and deleted, to preserve data integrity and prevent errors in analysis.
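The same completeness rule can be sketched in Python as a simple filter. This is a minimal illustration under assumptions: the function name `drop_incomplete_rows` and the sample rows are hypothetical, and, unlike Go To Special, the sketch also treats cells containing only spaces as blank.

```python
def drop_incomplete_rows(rows):
    """Keep only the rows in which every cell has non-blank content."""
    return [
        row
        for row in rows
        # A cell counts as blank if it is None, empty, or whitespace only.
        if all(cell is not None and str(cell).strip() != "" for cell in row)
    ]


# Made-up sample rows, for illustration only.
rows = [
    ["Doe", "Nike", 19.99],
    ["Roe", "", 5.0],      # empty brand cell: row is dropped
    ["Poe", None, 1.0],    # missing brand cell: row is dropped
    ["Lee", "Puma", 0],    # zero is a value, not a blank: row is kept
]
print(drop_incomplete_rows(rows))
```

Deleting whole rows is the strictest option; it suits a small sample like this one, but on larger datasets it can discard a lot of otherwise usable data.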

This cleaned dataset provides a solid foundation for further analysis, reporting, and decision-making. By following these data cleaning steps, users can improve the quality and usefulness of their data.

Click Here to View The Dataset 👈🏻
