Skip to content

Longest Substring with At Least K Repeating Characters⚓︎

Link

Description⚓︎

Given a string s and an integer k, return the length of the longest substring of s such that the frequency of each character in this substring is greater than or equal to k.

if no such substring exists, return 0.

Example 1:

  • Input: s = "aaabb", k = 3
  • Output: 3
  • Explanation: The longest substring is "aaa", as 'a' is repeated 3 times.

Example 2:

  • Input: s = "ababbc", k = 2
  • Output: 5
  • Explanation: The longest substring is "ababb", as 'a' is repeated 2 times and 'b' is repeated 3 times.

Constraints:

  • 1 <= s.length <= 10^4
  • s consists of only lowercase English letters.
  • 1 <= k <= 10^5

Solution⚓︎

class Solution {
public:
    int longestSubstring(string s, int k) {
        int res = 0;
        int n = s.size();

        for (int require = 1; require <= 26; require++) {
            vector<int> cnts(256, 0);
            int collectedKinds = 0, satisfiedKinds = 0;
            for (int left = 0, right = 0; right < n; right++) {
                cnts[s[right]]++;

                if (cnts[s[right]] == 1) collectedKinds++;
                if (cnts[s[right]] == k) satisfiedKinds++;

                while (collectedKinds > require) {
                    if (cnts[s[left]] == 1) collectedKinds--;
                    if (cnts[s[left]] == k) satisfiedKinds--;
                    cnts[s[left]]--;
                    left++;
                }

                if (satisfiedKinds == require) {
                    res = max(res, right - left + 1);
                }
            }
        }

        return res;
    }
};

The core idea is to use a sliding window to explore all substrings and a frequency count (cnts) to track how many times each character appears within the current window. The algorithm iterates over a range of unique characters that could be present in the substring, from 1 to 26 (assuming the input string contains only lowercase English letters). For each possible count of unique characters (require), it tries to find the longest substring that contains exactly that many different characters, each appearing at least k times.

Algorithm and Steps:

  1. Initialization: A result variable res is initialized to 0 to keep track of the length of the longest valid substring found so far. The size of the input string s is stored in n.
  2. Iterate for each unique character count: The algorithm loops from 1 to 26 (require), considering each possible number of unique characters that a valid substring could have.
  3. Sliding Window and Frequency Count: Inside this loop, a sliding window is defined by two pointers, left and right, initially set to the beginning of the string. A frequency count array cnts of size 256 (to cover all ASCII values, thus accommodating not just lowercase letters) is used to keep track of the character counts within the window.
  4. Expand the window: For each character added to the window (by incrementing right), update the frequency count (cnts[s[right]]++). If a character appears for the first time (cnts[s[right]] == 1), increase collectedKinds. If a character count reaches k, increment satisfiedKinds.
  5. Shrink the window: If the number of unique characters (collectedKinds) exceeds the current require, shrink the window from the left (increment left) until collectedKinds matches require again. Adjust collectedKinds and satisfiedKinds as characters are removed from the window.
  6. Check and Update Result: Whenever the number of kinds of characters that have met the k requirement (satisfiedKinds) equals the current require, update res to the maximum of its current value and the current window size (right - left + 1).

Complexity Analysis:

  • Time Complexity:
    • The outer loop runs \(26\) times, once for each possible number of unique characters.
    • The inner loop, which moves right across the string, has a maximum of \(n\) iterations for each outer loop iteration.
    • The left pointer moves across the string once for each outer loop iteration.
    • Thus, the total time complexity is \(O(26n)\), which simplifies to \(O(n)\), since \(26\) is a constant factor.
  • Space Complexity:
    • The space complexity is dominated by the frequency count array cnts, which has a fixed size of \(256\). Hence, the space complexity is \(O(1)\), as this does not scale with the input size \(n\).