"Time series motifs, as described in the [STUMPY Basics](https://stumpy.readthedocs.io/en/latest/Tutorial_STUMPY_Basics.html) tutorial, are approximately repeated subsequences within a time series. While the concept of motif discovery is crucial to many fields, the process of motif discovery isn't usually as clean in the real world as we would like it to be and, in most practical uses of motif discovery, we find some motifs to be more desirable than others. For instance, consider the [Amazon Customer Reviews Dataset](https://www.kaggle.com/bittlingmayer/amazonreviews), which contains several million customer reviews from Amazon's website. If we were to combine all of the textual reviews to reveal the most used words on the platform, unsurprisingly, we would find the word \"Amazon\" to be around the top of the list, probably following a few other more commonly used words such as \"the\", \"and\", and \"is\". While this result is obvious, it is certainly not useful. In order to produce more insightful results, we would need to filter the results to exclude less desirable or \"[stop words](https://en.wikipedia.org/wiki/Stop_word)\" and make way for the more desirable ones.\n",