Remove duplicate characters (with Java)

   Feb 18, 2024     2 min read

This article looks at the “remove duplicate characters” problem.

As I work through coding test problems, I revisit the ones I have solved and explore alternative solutions to learn more.

Let’s look at the problem first.

Problem

A string my_string is given as a parameter.

Complete the solution function so that it removes duplicate characters from my_string and returns a string in which each character appears only once.

Restrictions

  • 1 ≤ length of my_string ≤ 110
  • my_string consists of uppercase letters, lowercase letters, and spaces.
  • Uppercase and lowercase letters are treated as different characters.
  • A space (" ") also counts as a character.
  • When a character is duplicated, only its first occurrence is kept.

Input/Output Example

my_string             result
"people"              "peol"
"We are the world"    "We arthwold"

My solution to the problem

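import java.util.*;
class Solution {
     public String solution(String my_string) {
         // Collects the characters that are kept, in input order
         StringBuilder answer = new StringBuilder();
         // Remembers which characters have already been seen
         Set<String> set = new HashSet<>();
         char[] arrMyString = my_string.toCharArray();

         for(char ch : arrMyString){
             // add returns true only the first time a character is inserted
             if(set.add(String.valueOf(ch))){
                 answer.append(ch);
             }
         }

         return answer.toString();
     }
}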

Solution explanation

  • HashSet utilization: Create a HashSet, which does not allow duplicates, through Set<String> set = new HashSet<>();
  • String traversal: After converting the string to a character array, iterate over each character with a for-each loop.
  • Duplicate detection: Try to add each character to the HashSet through set.add(String.valueOf(ch)). The add method returns true if the character is new and false if it is already in the set.
  • Add unique character: Only when the character is new, append it to the result through answer.append(ch).
  • Result return: After all characters have been traversed, the result is converted with toString() and returned.
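
To make the role of the add return value concrete, here is a minimal sketch. The class SetAddDemo and the helper addResults are hypothetical names introduced only for this illustration; the helper records T or F for each character depending on what Set.add returns.

```java
import java.util.HashSet;
import java.util.Set;

class SetAddDemo {
    // Hypothetical helper: records what Set.add returns for each character.
    static String addResults(String input) {
        Set<String> seen = new HashSet<>();
        StringBuilder flags = new StringBuilder();
        for (char ch : input.toCharArray()) {
            // add returns true when the element was not yet in the set
            flags.append(seen.add(String.valueOf(ch)) ? 'T' : 'F');
        }
        return flags.toString();
    }

    public static void main(String[] args) {
        // p, e, o are new; the second p is a duplicate; l is new; the second e is a duplicate
        System.out.println(addResults("people")); // prints TTTFTF
    }
}
```

The positions marked T are exactly the characters the solution appends, which is why "people" becomes "peol".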

Code Advantages

  • Simple logic: A single pass over the string is enough to drop duplicates and keep each character once.
  • HashSet utilization: HashSet rejects values it already contains, so the duplicate check and the insertion happen in one add call.
  • String manipulation: StringBuilder appends characters in place, which avoids creating a new String on every concatenation.

Code Disadvantages

  • Case sensitivity: The code treats uppercase and lowercase letters as different characters, which is what this problem requires. To remove duplicates case-insensitively, additional logic is needed, such as normalizing the case of each character before the set lookup.
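  • Result string order: The code does preserve the input order, because characters are appended to the StringBuilder during the left-to-right traversal and the HashSet is used only to check whether a character has been seen. A LinkedHashSet would be needed only if the result were built by iterating over the set itself, since HashSet does not guarantee iteration order.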
  • String conversion overhead: String.valueOf(ch) creates a String for every character, which is avoidable overhead. Storing Character values in a Set<Character> removes the conversion. With inputs of at most 110 characters the cost is negligible, but it can matter when processing much larger strings.
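
The points above can be sketched as two small variants. This is only an illustration under assumptions, not the reference solution: the class DedupVariants and the methods dedup and dedupIgnoreCase are hypothetical names. The first variant stores Character values instead of String values; the second lowercases each character before the set lookup so that duplicates are detected case-insensitively while the first occurrence keeps its original case.

```java
import java.util.HashSet;
import java.util.Set;

class DedupVariants {
    // Variant 1: a Set<Character> avoids calling String.valueOf for every character.
    static String dedup(String myString) {
        Set<Character> seen = new HashSet<>();
        StringBuilder answer = new StringBuilder();
        for (char ch : myString.toCharArray()) {
            if (seen.add(ch)) { // char is autoboxed to Character
                answer.append(ch);
            }
        }
        return answer.toString();
    }

    // Variant 2 (assumed requirement): treat 'W' and 'w' as the same character.
    static String dedupIgnoreCase(String myString) {
        Set<Character> seen = new HashSet<>();
        StringBuilder answer = new StringBuilder();
        for (char ch : myString.toCharArray()) {
            // Compare on the lowercased form, but keep the character as it first appeared
            if (seen.add(Character.toLowerCase(ch))) {
                answer.append(ch);
            }
        }
        return answer.toString();
    }

    public static void main(String[] args) {
        System.out.println(dedup("We are the world"));           // prints We arthwold
        System.out.println(dedupIgnoreCase("We are the world")); // prints We arthold
    }
}
```

Both variants still append in traversal order, so the first occurrence of each character stays in its original position.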