In today's society, large amounts of data are collected and stored by individuals, businesses and other organizations. But analyzing data by hand is time-consuming and prone to error. Hence, in the field of data mining, there is an interest in developing algorithms that can analyze data automatically or semi-automatically.
Pattern mining is a subfield of data mining that aims at finding interesting patterns in data. Here, a pattern means a set of values that appear together in the data and that either represent a strong correlation or have some other desirable property. To find patterns, a user must select an algorithm and define what kind of patterns should be searched for in the data. The algorithm then searches the data and reports all the patterns that it has found to the user. These patterns can reveal interesting information that may help to understand the data. They may also be used for decision-making and prediction.
For example, a popular pattern mining task is itemset mining. If applied to shopping data, it can reveal sets of items that are frequently purchased together by customers. This information can help to understand customers' behavior, but also to make decisions such as launching a marketing campaign or offering discounts on sets of products.
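To make the idea of itemset mining on shopping data concrete, here is a minimal sketch in Python. It is a brute-force illustration, not an efficient algorithm: the toy transactions, the function name `frequent_itemsets` and the `min_support` parameter are assumptions made up for this example. It simply counts in how many transactions each candidate set of items appears and keeps the sets that reach the threshold.

```python
from itertools import combinations

# Hypothetical toy shopping data: each transaction is the set of items in one purchase.
shopping_transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter", "eggs"},
]

def frequent_itemsets(transactions, min_support):
    """Return {itemset: count} for every itemset found in at least min_support transactions."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            # Support = number of transactions that contain every item of the candidate.
            support = sum(1 for t in transactions if set(candidate) <= t)
            if support >= min_support:
                result[frozenset(candidate)] = support
                found_any = True
        if not found_any:
            # If no itemset of this size is frequent, no larger itemset can be frequent.
            break
    return result

patterns = frequent_itemsets(shopping_transactions, min_support=3)
for itemset, support in sorted(patterns.items(), key=lambda p: (len(p[0]), sorted(p[0]))):
    print(sorted(itemset), support)
```

On this toy data, bread, milk and butter are each frequent on their own, and so is every pair of them, while the set of all three appears in only two purchases and is therefore not reported. Real itemset mining algorithms avoid enumerating all candidates, but the notion of support they rely on is the same.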
Pattern mining algorithms are not only used to find patterns as an end in itself. They can also be used to explore the data before applying other data processing or data mining techniques. For example, discovering that some values often appear together can reveal that some attributes are correlated. This information may then be used for feature selection.
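As a rough illustration of how correlated attributes could inform feature selection, the sketch below computes the Pearson correlation between binary attributes and flags highly correlated pairs. The attribute names, the data and the 0.9 threshold are all assumptions chosen for this example, and this is only one of many possible ways to measure correlation.

```python
from itertools import combinations
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical binary attributes, one value per customer.
attributes = {
    "owns_car": [1, 1, 0, 0, 1, 0],
    "buys_fuel": [1, 1, 0, 0, 1, 0],
    "buys_bread": [1, 0, 1, 0, 1, 0],
}

# Pairs of attributes whose absolute correlation reaches an assumed threshold of 0.9.
redundant_pairs = [
    (a, b)
    for a, b in combinations(attributes, 2)
    if abs(pearson(attributes[a], attributes[b])) >= 0.9
]
print(redundant_pairs)
```

Here `owns_car` and `buys_fuel` carry the same information, so one of them could be dropped before training a model, whereas `buys_bread` is only weakly correlated with the other two.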
A reason why pattern mining algorithms are popular is that their results are generally easy for humans to interpret. This is different from many machine learning models that focus on prediction and operate as black boxes. For example, some neural networks may reach a very high accuracy on some prediction tasks, but it may be difficult for a human to understand why, or to learn something new from the network. In contrast, pattern mining may not give the best accuracy for prediction, but the patterns are generally interpretable.
Pattern mining can be applied to many different types of data, such as a binary table (also known as a transaction database), a sequence, multiple sequences, graphs, time series, spatial data and trajectory data.
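To show why a transaction database can be viewed as a binary table, the short sketch below converts a few transactions into rows of 0s and 1s, with one column per item. The data and variable names are assumptions made up for illustration.

```python
# Hypothetical transaction database: one set of items per transaction.
small_database = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk"},
]

# One column per distinct item, in alphabetical order.
columns = sorted(set().union(*small_database))

# One row per transaction: 1 if the item is present, 0 otherwise.
binary_table = [[1 if item in t else 0 for item in columns] for t in small_database]

print(columns)
for row in binary_table:
    print(row)
```

Both representations hold the same information. The set form is usually more compact when each transaction contains only a few of the possible items.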
Besides, many types of patterns can be found in data, such as frequent patterns (values that appear frequently in the data) and high utility patterns (e.g. values that have a high importance or yield a lot of money). Some pattern types are simple while others are more complex.
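The difference between frequent patterns and high utility patterns can be sketched with a small example. Here, utility is assumed to be profit, computed as purchased quantity times unit profit. The items, quantities, profits and function names are hypothetical and only serve to illustrate the two measures.

```python
# Hypothetical unit profit of each item.
unit_profit = {"bread": 1, "milk": 2, "caviar": 50}

# Hypothetical transactions mapping each purchased item to its quantity.
quantity_transactions = [
    {"bread": 2, "milk": 1},
    {"bread": 1, "milk": 2},
    {"bread": 3},
    {"caviar": 1, "milk": 1},
]

def support(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if all(item in t for item in itemset))

def utility(itemset, transactions, profits):
    """Total profit generated by the itemset in the transactions that contain it."""
    total = 0
    for t in transactions:
        if all(item in t for item in itemset):
            total += sum(t[item] * profits[item] for item in itemset)
    return total

for itemset in [{"bread"}, {"caviar"}, {"bread", "milk"}]:
    print(sorted(itemset),
          "support:", support(itemset, quantity_transactions),
          "utility:", utility(itemset, quantity_transactions, unit_profit))
```

In this toy data, bread is the most frequent item but yields little profit, while caviar appears only once yet has the highest utility. A pattern can thus be interesting under one measure and not under the other.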