Posted By

freephys on 10/04/09


Tagged

Knuth-Morris-Pratt


Versions (?)

Determine if a Sequence is in another sequence in Python : solution 1


 / Published in: Python
 

URL: http://stackoverflow.com/questions/425604?sort=oldest#sort-top

This is a generalization of the "string contains substring" problem to (more) arbitrary types.

Given an sequence (such as a list or tuple), what's the best way of determining whether another sequence is inside it? As a bonus, it should return the index of the element where the subsequence starts:

Example usage (Sequence in Sequence):

seqinseq([5,6], [4,'a',3,5,6]) 3 seqinseq([5,7], [4,'a',3,5,6]) -1 # or None, or whatever

I second the Knuth-Morris-Pratt algorithm. By the way, your problem (and the KMP solution) is exactly recipe 5.13 in Python Cookbook 2nd edition. You can find the related code at http://code.activestate.com/recipes/117214/

It finds all the correct subsequences in a given sequence, and should be used as an iterator:

For larger ones, look at the Aho-Corasick algorithm.

  1. >>> for s in KnuthMorrisPratt([4,'a',3,5,6], [5,6]): print s
  2. 3
  3. >>> for s in KnuthMorrisPratt([4,'a',3,5,6], [5,7]): print s
  4. (nothing)
  5.  
  6. Here is another KMP implementation:
  7.  
  8. def seq_in_seq(seq1,seq2):
  9. '''
  10. Return the index where seq1 appears in seq2, or -1 if
  11. seq1 is not in seq2, using the Knuth-Morris-Pratt algorithm
  12.  
  13. based heavily on code by Neale Pickett <[email protected]>
  14. found at: woozle.org/~neale/src/python/kmp.py
  15.  
  16. >>> seq_in_seq(range(3),range(5))
  17. 0
  18. >>> seq_in_seq(range(3)[-1:],range(5))
  19. 2
  20. >>>seq_in_seq(range(6),range(5))
  21. -1
  22. '''
  23. def compute_prefix_function(p):
  24. m = len(p)
  25. pi = [0] * m
  26. k = 0
  27. for q in xrange(1, m):
  28. while k > 0 and p[k] != p[q]:
  29. k = pi[k - 1]
  30. if p[k] == p[q]:
  31. k = k + 1
  32. pi[q] = k
  33. return pi
  34.  
  35. t,p = list(tee(seq2)[0]), list(tee(seq1)[0])
  36. m,n = len(p),len(t)
  37. pi = compute_prefix_function(p)
  38. q = 0
  39. for i in range(n):
  40. while q > 0 and p[q] != t[i]:
  41. q = pi[q - 1]
  42. if p[q] == t[i]:
  43. q = q + 1
  44. if q == m:
  45. return i - m + 1
  46. return -1

Report this snippet  

You need to login to post a comment.