## Posted By

freephys on 10/04/09

# Determine if a Sequence is in another sequence in Python : solution 1

/ Published in: Python

This is a generalization of the "string contains substring" problem to (more) arbitrary types.

Given an sequence (such as a list or tuple), what's the best way of determining whether another sequence is inside it? As a bonus, it should return the index of the element where the subsequence starts:

Example usage (Sequence in Sequence):

seqinseq([5,6], [4,'a',3,5,6]) 3 seqinseq([5,7], [4,'a',3,5,6]) -1 # or None, or whatever

I second the Knuth-Morris-Pratt algorithm. By the way, your problem (and the KMP solution) is exactly recipe 5.13 in Python Cookbook 2nd edition. You can find the related code at http://code.activestate.com/recipes/117214/

It finds all the correct subsequences in a given sequence, and should be used as an iterator:

For larger ones, look at the Aho-Corasick algorithm.

```>>> for s in KnuthMorrisPratt([4,'a',3,5,6], [5,6]): print s3>>> for s in KnuthMorrisPratt([4,'a',3,5,6], [5,7]): print s(nothing) Here is another KMP implementation: def seq_in_seq(seq1,seq2):    '''    Return the index where seq1 appears in seq2, or -1 if     seq1 is not in seq2, using the Knuth-Morris-Pratt algorithm     based heavily on code by Neale Pickett <[email protected]
/* <![CDATA[ */
(function(){try{var s,a,i,j,r,c,l,b=document.getElementsByTagName("script");l=b[b.length-1].previousSibling;a=l.getAttribute('data-cfemail');if(a){s='';r=parseInt(a.substr(0,2),16);for(j=2;a.length-j;j+=2){c=parseInt(a.substr(j,2),16)^r;s+=String.fromCharCode(c);}s=document.createTextNode(s);l.parentNode.replaceChild(s,l);}}catch(e){}})();
/* ]]> */
>    found at:  woozle.org/~neale/src/python/kmp.py     >>> seq_in_seq(range(3),range(5))    0    >>> seq_in_seq(range(3)[-1:],range(5))    2    >>>seq_in_seq(range(6),range(5))    -1    '''    def compute_prefix_function(p):        m = len(p)        pi = [0] * m        k = 0        for q in xrange(1, m):            while k > 0 and p[k] != p[q]:                k = pi[k - 1]            if p[k] == p[q]:                k = k + 1            pi[q] = k        return pi     t,p = list(tee(seq2)[0]), list(tee(seq1)[0])    m,n = len(p),len(t)    pi = compute_prefix_function(p)    q = 0    for i in range(n):        while q > 0 and p[q] != t[i]:            q = pi[q - 1]        if p[q] == t[i]:            q = q + 1        if q == m:            return i - m + 1    return -1```