CLEAN Function

An Excel Text function that can remove non-printable characters from the supplied text strings.

Author: Akash Bagul
Akash Bagul
Akash Bagul
Reviewed By: Parul Gupta
Parul Gupta
Parul Gupta
Working as a Chief Editor, customer support, and content moderator at Wall Street Oasis.
Last Updated:May 20, 2024

What is the CLEAN Function?

The CLEAN function is an Excel Text function that can remove non-printable characters from the supplied text strings.

If you have the work requirement of downloading data from a third-party website and then later analyzing it, you know one of the most common non-printable characters.

You would find that the data ‘sometimes’ has a lot of unnecessary line breaks and null values, which makes the data analysis part a lot worse.

Until the data is cleaned, getting a clear view of what the data means becomes quite difficult. This is where the CLEAN function comes into the scenario.

Using the function, we can easily remove all the non-printable characters with just the cell reference in the function.

This article will guide you on the CLEAN function, its use, and some examples.

Key Takeaways

  • The CLEAN function is used to remove the first 32 non-printable ASCII characters (codes 0 through 31) from a given text string.
  • The CLEAN function is essential for cleaning up data by removing unwanted characters that might not be visible but can interfere with data processing tasks such as text parsing, data validation, and analysis.
  • The CLEAN function only removes the first 32 ASCII characters. It does not handle other non-visible or special characters that might be present in different Unicode standards.
  • For more extensive data cleaning, other functions like TRIM (which removes leading and trailing spaces) and SUBSTITUTE (which can replace specific characters) may be needed in conjunction.

Understanding The CLEAN Function

The CLEAN is categorized as a Text function that removes non-printable characters from a given text string.

The different non-printable characters the function removes are numerically represented by 0 to 31 in 7-bit ASCII code, which maps alphabetical characters, numbers, and symbols from 0 to 127.

The characters represented by 0 to 31 are illustrated below:

Data

For example, suppose you have the text string as ‘ hello world .‘ If you use the function, then the expected result would be ‘hello world.’

CLEAN Function Formula

The syntax for the function is:

Where,

  • text - (required) the text string from which the non-printable character needs to be removed.

An important thing to remember is that CLEAN will not remove extra space characters from the given text string. Instead, it will only remove the characters represented by numbers 0 to 31 in the 7-bit ASCII chart.

You should use the TRIM function to remove the unnecessary space characters between the text strings.

How to use the CLEAN function in Excel?

This section will show an example of how exactly these non-printable characters look in an Excel spreadsheet.

You must be wondering, though, how do you even return these characters in Excel? The easiest way to get each of those characters is by using the CHAR function in Excel.

This is just to understand that similar values could be found in files downloaded from the internet.

The CHAR function in Excel accepts a numerical value to return their corresponding characters as

Example

As you can see, most of these numbers return a blank cell or a 🗌 character in the given cell.

🗌 characters are easily identifiable as they stand out; however, the same cannot be said about the blank cell characters.

They can be misunderstood to be space characters. Thus, in such cases, if you use the TRIM function, then the function would not affect the text string.

For example, suppose you have the data as illustrated below:

TRIM

We can see unnecessary characters before and after the ‘GameStop Inc’ text string. So, are these the space characters or the non-printable ones?

Any person would initially go ahead with the TRIM function and use the formula =TRIM(B3) in cell C3, which gives the result:

TRIM Result

The text string is centrally aligned, yet we can see that it is not exactly in the middle. It is still aligned more towards the left rather than the right. This means that the non-printable characters still exist in the text string.

On the other hand, when we use the function using the formula =CLEAN(B3) in cell D3, we get a centrally aligned text string as

Result

Thus, the function works well if the objective is to remove the non-printable characters whose number corresponds to 0 to 31 in a 7-bit ASCII chart.

CLEAN Function Examples

Time for some practical examples or understanding scenarios where you can use the function. If you have understood the basics of the function, it would be easy to interpret what to do in a real-life scenario.

Example 1

Suppose you download the employee email ids from your web-based HRM software. However, you find unwanted characters in the dataset for some unknown reasons.

Example

Since these are not just the space characters but others, using the CLEAN function makes much more sense.

We will use the formula =CLEAN(D3) in cell E3 and drag it down to cell E7, which gives the result:

Result

The function easily removes all the 🗌 characters from the email addresses and returns the text string in a separate cell.

Example 2

Suppose we have another dataset wherein line breaks separate the text strings. The data looks as illustrated below:

Example 2

To remove those line breaks, we will use the formula =CLEAN(B3) in cell C3 and drag it down till cell C7, which gives the result:

Result 2

Thus, even line breaks can be easily removed using the CLEAN function.

Note

After using the CLEAN function, you might get concatenated text string, for example, ‘ClaraLane,’ i.e., not separated by the space character. Here, we have added a space character after the first name to differentiate the first and last names.

Example 3

Finally, suppose you have some text strings as illustrated below:

Example 3

At first glance, those look quite normal, so let’s just evaluate whether they have any non-printable characters between them or not.

We will combine IF, LEN, and CLEAN functions to return two customized text strings. The formula will be =IF(LEN(B3)-LEN(CLEAN(B3))>0,"Non-Printable Character Exist","Clean Text String") in cell C3, which gives the result:

Result 3

As you can see, in three different instances, there are non-printable characters in the given dataset. Therefore, we can customize the formula further to return the number of non-printable characters in column C.

The formula will be =IF(LEN(B3)-LEN(CLEAN(B3))>0,(LEN(B3)-LEN(CLEAN(B3)))&" Non-Printable Character Exist","Clean Text String"), which gives the result:

Result

So how does the formula actually work? Let’s break it down into a couple of steps:

Step 1: Firstly, we have the IF function, which is intended to return two different text strings - ‘Non-Printable Character Exist’ and ‘Clean Text String.’

Step 2: Next, we input the condition LEN(B3)-LEN(CLEAN(B3))>0, which checks the length of the text string before and after using the CLEAN function.

For example, let's take ‘Clara Lane.’ The length of the text string before using the CLEAN function is 10. We then again calculate the length of the text string, but this time, we also include the CLEAN function.

What we initially believed was a space character turned out to be a non-printable one.

Thus, the total length turns out to be 9. Since 10 - 9 > 0, we get the customized text string as ‘Non-Printable Character Exist.’

Step 3: If we just use the (LEN(B3)-LEN(CLEAN(B3))) formula and concatenate it with the text string, it gives the number of non-printable characters in the text string.

As we always say, nothing beats the versatility of the IF function!

Free Resources

To continue learning and advancing your career, check out these additional helpful WSO resources: