mirror of
https://github.com/junegunn/fzf.git
synced 2025-11-13 13:53:47 -05:00
Normalize char before pattern lookup (#4252)
There is an edge-case in FuzzyMatchV1 during backward scan, related to
normalization: if string is initially denormalized (e.g. Unicode symbol),
backward scan will proceed further to the next char; however, when the
score is computed, the string is normalized first, then scanned based on
the pattern. This leads to accessing pattern index increment, which
itself leads to out-of-bound index access, resulting in a panic.
To illustrate the process, here's the sequence of operations when search
is perfored:
1. during backward scan by "minim" pattern
```
xxxxx Minímal example
^^^^^^^^^^^^
||||||||||||
miniiiiiiiim <- compute score for this substring
```
2. during compute score by "minim" pattern
```
Minímal exam
minimal exam <- normalize chars before computing the score
^^^^^^
||||||
minim <- at this point the pattern is already fully scanned and index
is out-of-the-bound
```
In this commit the char is normalized during backward scan, to detect
properly the boundaries for the pattern.
This commit is contained in:
@@ -200,3 +200,12 @@ func TestLongString(t *testing.T) {
|
||||
bytes[math.MaxUint16] = 'z'
|
||||
assertMatch(t, FuzzyMatchV2, true, true, string(bytes), "zx", math.MaxUint16, math.MaxUint16+2, scoreMatch*2+bonusConsecutive)
|
||||
}
|
||||
|
||||
func TestLongStringWithNormalize(t *testing.T) {
|
||||
bytes := make([]byte, 30000)
|
||||
for i := range bytes {
|
||||
bytes[i] = 'x'
|
||||
}
|
||||
unicodeString := string(bytes) + " Minímal example"
|
||||
assertMatch2(t, FuzzyMatchV1, false, true, false, unicodeString, "minim", 30001, 30006, 140)
|
||||
}
|
||||
|
||||
Reference in New Issue
Block a user