How to normalize inconsistent spelling in Excel file with +1000 rows?

Joined
Jan 19, 2024
Messages
3
Reaction score
0
Hi folks,
I have an Excel file on which I want to run some analysis (sum if, count if, etc.). However the problem is my data set is inconsistent with abbreviations, spelling mistakes, etc. So for example (Germany, Deutschland, GER, germany, gerany,...)
How can I align and harmonize this?
Thanks
 
Joined
Jan 31, 2024
Messages
1
Reaction score
0
Harmonize - lets call it "normalize".

You have at least two options: Macros and/or formulas.

Between these, there's no "easier" way - the macro method is not better than the formula method. A macro will take more setup work, but it's more-easily portable between workbooks (in other words, more future-proof). But formulas will be easier to set up; however, unless they're set up the right way (and you know a few other tips) they're limited to the workbook you're in (in other words, not as future-proof).

Let's go the formula route for this just so you can get it up and going:

You want to normalize data. You'll need a helper column and a helper sheet (another blank sheet in your workbook) that will convert the gobbledygook into acceptable data ("normalize").

The helper sheet will store the equivalency of words; that is, GER, Germany, germany, gerany, etc. are all be equal to the same thing (all these will be equal to "GER", for example). That'll be your helper sheet: Two columns: One of all the gobbledygook, and across from each cell of gobbledygook will be the converted "normalized" equivalency. You'll have to teach this to Excel instance by instance, row by row. So, over time this reference sheet will get better (which is why a macro would be cool; portability).

The helper column will be off to the right of your main page (where the gobbledygook is). Here's you'll run an XLOOKUP formula on each row like so:

=XLOOKUP(cell with gobbledygook, helper sheet column of gobbledygook, helper sheet column of equivalent data). The result will be "GER" in each row where your examples show up. In other words "look up this gobbledygook in the Gobbledygook column of the helper sheet, and give me the normalized data".

After this you can run SUMIFS and whatnot against this helper column.

If you need more information let me know.
 
Joined
Jan 19, 2024
Messages
3
Reaction score
0
Thank you, I get the approach. But I have like 10,000 rows, so building the sheet will take ages. Is there a way to lets say create an automated helper column?
 
Joined
Apr 19, 2024
Messages
1
Reaction score
0
  1. Identify Inconsistent Spelling: Use Excel's sorting and filtering features to identify variations in spelling across your dataset. Look for patterns and common misspellings that need to be corrected.
  2. Create a Reference Table: Create a reference table in a separate sheet within the same Excel file. In this table, list the correct spellings alongside their corresponding variations.
  3. Use Excel Functions: Utilize Excel functions such as VLOOKUP or INDEX/MATCH to standardize the spelling based on the reference table. These functions can automatically replace inconsistent spellings with the correct ones.
  4. Find and Replace: Use Excel's Find and Replace feature to search for specific variations in spelling and replace them with the correct spelling. You can use this method for simple corrections or to manually address any remaining inconsistencies.
  5. Apply Conditional Formatting: Apply conditional formatting rules to highlight any remaining inconsistencies in spelling. This can help you identify and address any missed corrections.
  6. Review and Validate: Review the corrected dataset to ensure that all inconsistencies have been addressed correctly. Validate the changes by cross-referencing with the original dataset and reference table.
  7. Include Website Link: After normalizing the data, consider adding a new column in your dataset to include your website link "quickanalytics.pk/" as a reference. This can be helpful for users accessing the dataset to learn more about your services or to access additional resources.
  8. Document the Process: Document the steps taken to normalize the data, including any corrections made and the reference table used. This documentation can be useful for future reference and for maintaining data quality.
By following these steps, you can effectively normalize inconsistent spelling in your Excel file with over 1000 rows and include your website link as a reference. This will help ensure data consistency and provide additional context for users accessing the dataset.
 
Joined
Jan 19, 2024
Messages
3
Reaction score
0
Thanks solved. I found an online tool that uses embedding models to automate this. After that was 95% done and could fix the rest with a little bit of manual work. But anyway thanks for the help cheers!
 
Top