Partial String Splitting

· Oct 2, 07:28 AM

Often I need a split function that splits off only the first piece of a string. Python has a version of split that does just this, and REALbasic should too. Here I implement two functions. SplitLeft splits at most N pieces from a string, starting from the left, and returns an array containing the split-off pieces followed by the remainder of the input, if any. SplitRight returns an array containing the remainder, if any, followed by at most N pieces split from the right. For example:

Array("first", "second", "third, fourth") = SplitLeft("first, second, third, fourth", ", ", 2)
Array("first, second", "third", "fourth") = SplitRight("first, second, third, fourth", ", ", 2)

As for boundary cases, SplitLeft(s, *, 0) should return an array containing only s, for any separator. SplitLeft(s, *, N) should return the same output as Split when N is sufficiently large. And SplitLeft(s, "", N) should split off the first N characters.
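
For comparison, here is how Python's built-in str.split and str.rsplit behave with their maxsplit argument. This is only a reference point for the semantics described above, not REALbasic code; the variable names are illustrative.

```python
text = "first, second, third, fourth"

# Split at most 2 pieces from the left; the rest stays in one piece.
left = text.split(", ", 2)

# Split at most 2 pieces from the right; the remainder comes first.
right = text.rsplit(", ", 2)

# Boundary cases: zero splits returns the whole string in a list,
# and a large count behaves like an unlimited split.
none_split = text.split(", ", 0)
all_split = text.split(", ", 100)

print(left)   # ['first', 'second', 'third, fourth']
print(right)  # ['first, second', 'third', 'fourth']
```

One difference worth noting: Python raises ValueError for an empty separator, so the empty-separator case described above has no direct Python counterpart.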

It is efficient in terms of both execution speed and code reuse to base the implementation of SplitLeft on the built-in Split function. Using Split, we can model an implementation using VML:

find the position of the Nth separator;
split the string at that position;
call Split on the first piece;
assemble and return the result.
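
The steps above can be sketched in Python, purely as a model of the logic and not as the REALbasic implementation. The name split_left is hypothetical, and the sketch assumes a non-empty separator.

```python
def split_left(s, sep, n=1):
    """Split at most n pieces off the left of s (assumes sep is non-empty)."""
    if n <= 0:
        return [s]

    # Step 1: find the position of the n-th separator.
    pos = -len(sep)
    for _ in range(n):
        found = s.find(sep, pos + len(sep))
        if found < 0:
            # Fewer than n separators: behave like an ordinary split.
            return s.split(sep)
        pos = found

    # Steps 2-4: cut at that position, split the first piece,
    # then append the remainder.
    return s[:pos].split(sep) + [s[pos + len(sep):]]


print(split_left("first, second, third, fourth", ", ", 2))
# ['first', 'second', 'third, fourth']
```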

It turns out that it is surprisingly tricky to get the code right. The problem is that InStr does not quite have the behavior I want, and adding that behavior makes my code a little too tangled. The solution is to write an InStr function that has the behavior I want.

That function, InStrN, returns the position of the Nth occurrence of a match string in a source string. The particular behavior I require is the handling of boundary cases. For N = 0, I want InStrN to return -Len(matchString) + 1, and for N greater than the total number of occurrences of the match string, I want InStrN to return Len(sourceString) + 1.

Function InStrN(start as Integer = 1, source as String, match as String, N as Integer) As Integer
  #pragma disableBackgroundTasks
  
  dim matchLength as Integer = Len(match)
  dim matchCount as Integer = 0
  dim position as Integer = -matchLength + start
  do
    if matchCount < N then
      position = InStr(position + matchLength, source, match)
      if position > 0 then
        matchCount = matchCount + 1
      else
        position = Len(source) + 1
        exit
      end if
    else
      exit
    end if
  loop
 
  return position
End Function
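
To check the boundary behavior, the same loop can be modeled in Python. This is a sketch under assumptions: instr_n is a hypothetical name, positions are kept 1-based to match InStr, and start is assumed to be at least 1.

```python
def instr_n(source, match, n, start=1):
    """1-based position of the n-th occurrence of match at or after start."""
    position = start - len(match)
    for _ in range(n):
        # str.find is 0-based, so shift the 1-based search position down by one.
        found = source.find(match, position + len(match) - 1)
        if found < 0:
            # Fewer than n occurrences: report one past the end.
            return len(source) + 1
        position = found + 1
    return position


print(instr_n("a, b, c", ", ", 0))  # -1, that is -Len(match) + 1
print(instr_n("a, b, c", ", ", 2))  # 5
print(instr_n("a, b, c", ", ", 3))  # 8, that is Len(source) + 1
```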

With this function, perhaps useful in its own right, my implementation of SplitLeft is the following.

Function SplitLeft(s as String, separator as String, count as Integer = 1) As String()  
  if separator = "" then
    return SplitLeftChar(s, count)
  end if
 
  dim position as Integer = InStrN(s, separator, count)
  dim splitSource as String = Left(s, position - 1)
  dim splitList() as String
  if splitSource <> "" then
    splitList() = Split(splitSource, separator)
  else
    splitList = Array("")
  end if
  dim remainder as String = Mid(s, position + Len(separator))
  if remainder <> "" then
    splitList.Append remainder
  end if
  return splitList
End Function

The case of an empty separator is handed off to a separate function, SplitLeftChar.

Function SplitLeftChar(s as String, count as Integer) As String()
  // Pass "" explicitly: Split's default delimiter is a space, not the empty string
  dim splitList() as String = Split(Left(s, count), "")
  dim remainder as String = Mid(s, count + 1)
  if remainder <> "" then
    splitList.Append remainder
  end if
  return splitList
End Function
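
The intended result of the empty-separator case can be modeled in Python as follows. The name split_left_char is hypothetical, and the sketch assumes count is not negative.

```python
def split_left_char(s, count):
    """Split off the first count characters, then append any remainder."""
    pieces = list(s[:count])   # one element per character
    remainder = s[count:]
    if remainder:
        pieces.append(remainder)
    return pieces


print(split_left_char("hello", 2))  # ['h', 'e', 'llo']
```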

A full explanation of the reason for this involves a certain amount of agonizing over the Split function, and I’ll save this for later.

A SplitRight function is useful for tasks like grabbing extensions from file names. We can implement it in terms of SplitLeft, with the aid of some functions that reverse strings.

Function SplitRight(s as String, separator as String, count as Integer) As String()
  #pragma disableBackgroundTasks
 
  dim splitList() as String = Reverse(SplitLeft(Reverse(s), Reverse(separator), count))
  for i as Integer = 0 to UBound(splitList)
    splitList(i) = Reverse(splitList(i))
  next
  
  return splitList
End Function
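
The reversal trick is easy to check in Python, where the built-in rsplit gives a reference answer. In this sketch split_right is a hypothetical name, and str.split with maxsplit stands in for SplitLeft.

```python
def split_right(s, sep, n=1):
    """Split at most n pieces off the right of s by reversing, splitting left, and reversing back."""
    # Reverse the string and the separator, then split from the left.
    pieces = s[::-1].split(sep[::-1], n)
    # Restore the original order of the pieces and of the characters in each piece.
    return [piece[::-1] for piece in reversed(pieces)]


print(split_right("first, second, third, fourth", ", ", 2))
# ['first, second', 'third', 'fourth']
print(split_right("archive.tar.gz", ".", 1))
# ['archive.tar', 'gz']
```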

I have not tested whether this implementation is particularly fast. But the built-in implementation of InStr is certainly faster than anything I have written in REALbasic, so my guess is that the cost of Reverse is repaid by the ability to call InStr.

Implementing the reverse functions revealed a surprising tidbit about REALbasic performance.

Function Reverse(s() as String) As String()
  #pragma disableBackgroundTasks
  
  dim U as Integer = UBound(s)
  dim t() as String
  redim t(U)
  
  for i as Integer = 0 to UBound(s)
    t(U - i) = s(i)
  next
  return t
End Function

Function Reverse(s as String) As String
  return Join(Reverse(Split(s, "")), "")
End Function

The first Reverse method returns a new array. You might think that it would be faster to reverse the array in place.

Sub ReverseInPlace(s() as String)
  #pragma disableBackgroundTasks
 
  dim leftPtr as Integer = 0
  dim rightPtr as Integer = UBound(s)
  
  while leftPtr < rightPtr
    dim temp as String = s(leftPtr)
    s(leftPtr) = s(rightPtr)
    s(rightPtr) = temp
    leftPtr = leftPtr + 1
    rightPtr = rightPtr - 1
  wend
End Sub

But testing on my MacBook showed that the Reverse function, which allocates and returns a new array, is in fact faster than the in-place version. Given my developing taste for functional programming style, I’m pleased by this.

---

Comment

  1. Thanks for another helpful tip!

    John Bacon-Shone · Oct 3, 11:38 PM · #

  2. Did you find Joe Strout’s string and array utilities?

    http://www.strout.net/info/coding/rb/intro.html

    I think he was also inspired by Python’s slices. Now if only RB string operations were fast. :-(

    DeanG · Oct 29, 03:28 PM · #

  3. I wrote a little of the code in the string utilities.

    Rb string operations are generally quite fast, in my experience.

    charles · Oct 29, 03:37 PM · #
