.NET源码学习 [算法2-数组与字符串的查找与匹配]( 六 ) _生活百科

文章插图
实测时间：

文章插图
可以发现， RK算法在一定程度上确实比纯暴力得BF算法效率要高，尤其是当主串长度越长时，提升越明显。
【思考】假设有两个以行为优先主序列二维矩阵，如果要返回所有匹配的字串首地址，该如何实现？

文章插图
【参考如下】
RK算法比较的是字符串的哈希值，那只要获取了对应位置的哈希值，进行比较即可。对于一维，在哈希值转移时只需减去前一个、加上后一个；而二维的转移需根据模式串的规格而定。
using System;using System.Collections;using System.Collections.Generic;using System.Linq;using System.Text;using System.Numerics;using System.Diagnostics;using System.Runtime.CompilerServices;namespace Common{internal class Program{static IList<int[]> pos = new List<int[]>();static int m, n;static int p, q;static void Main(string[] args){Program pg = new();char[][] s = {new char[]{ 'c', 'a', 'b', 'c' },new char[]{ 'e', 'f', 'a', 'd' },new char[]{ 'c', 'c', 'a', 'f' },new char[]{ 'd', 'e', 'f', 'c' }};char[][] pat = {new char[]{ 'c', 'a' },new char[]{ 'e', 'f' }};m = s.Length;n = s[0].Length;p = pat.Length;q = pat[0].Length;pg.RK(s, pat);foreach (int[] arr in pos) Console.WriteLine("{ " + arr[0] + ", " + arr[1] + " }");}void RK(char[][] s, char[][] pat){int patCode = GetHash(pat, 0, 0);int sCode = GetHash(s, 0, 0);for (int row = 0; row <= m - p; row++){for (int col = 0; col <= n - q; col++){if (patCode == sCode && ComString(s, pat, row, col)) pos.Add(new int[] { row, col });if (row < m - p || col < n - q) sCode = NextHash(s, sCode, row, col);}}}int GetHash(char[][] tmp, int row, int col){int hash = 0;for (int i = row, cntR = 0; cntR < p; i++, cntR++)for (int j = col, cntC = 0; cntC < q; j++, cntC++)hash += tmp[i][j] - 'a';return hash;}int NextHash(char[][] s, int hash, int row, int col){if (row < m - p && col == n - q) return GetHash(s, row + 1, 0);for (int i = row, cntR = 0; cntR < p; i++, cntR++){hash -= s[i][col] - 'a';hash += s[i][col + q] - 'a';}return hash;}bool ComString(char[][] s, char[][] pat, int row, int col){for (int i = row, cntR = 0; cntR < p; i++, cntR++)for (int j = col, cntC = 0; cntC < q; j++, cntC++)if (s[i][j] != pat[cntR][cntC]) return false;return true;}}}3. KMP算法（Knuth-Morris-Pratt）【参考文章：漫画：什么是KMP算法？；KMP算法—终于全部弄懂了_June·D的博客-CSDN博客】
回顾一下前两种算法效率不高的原因：在不匹配时，从待匹配串的起始位置重新全部开始匹配；反复计算哈希值，且需要处理哈希冲突的问题。这两种方式可以看作每次将字符串向后一位一位滑动。这样的方式导致了中间存在大量无意义的比较，以及不得不完成的比较，使得效率低下。而KMP就针对这一点进行了优化，其核心在于：在匹配过程中，当不匹配时，对于已匹配好的部分，找到一种规律，将模式串一次性向后滑动很多位，从而提高效率。其中，记不匹配的字符为 “坏字符” ，已匹配的部分为 “好前缀” 。
—— —— —— —— —— —— —— —— —— —— —— —— —— ——