Thursday, July 9, 2009

Uncovering Information through Data Mining


Data Mining

This term gets thrown around a lot nowadays. In this information-driven age we live in, trawling through terabytes of data and figuring out what to do with the data obtained is essential to making businesses work. That is where our good old Data Mining tools come in handy.

But what is Data Mining?

According to Kurt Thearling's Introduction to Data Mining, the term is defined as the "extraction of hidden predictive information from large databases".

Extraction of hidden, predictive information sounds like a mouthful to some, and even more cryptic to most. But this short phrase says a lot. For instance, let's say your business is selling cakes, and you have a list of customers who have bought products from your store before. You might want to know which product appeals to your customers on a given occasion, say, a holiday. That's where data mining comes in. Not only does it tell you what product to sell, it also helps you determine whom to sell it to, using the database you've compiled.

Works like a charm, right? Kind of reminds you of Statistics, don't you think?

According to Kurt Thearling, Data Mining and statistics aren't really far off each other. In fact, according to him, "there is little practical difference between a statistical technique and a classical data mining technique".

So why is there so much more hype around Data Mining than around Statistics?

Kurt Thearling states that data mining tools are more robust in dealing with messier real-world data, and more robust to being used by less expert users. Simply put, these tools are more reliable in that they keep working under a wide range of conditions.

But that's not the only reason he has stated.

Data Mining is timely in this information age. "If there was no data, there would be no interest in mining it." In an age where computers are everywhere and information is stored in large quantities, data mining has a lot to offer. It not only locates data among millions of files and folders, it also translates that data into information, and that information into business intelligence, which in turn helps the decision-making process of a company.

So why again is Data Mining timely? Well, from my point of view and my understanding of what Thearling has said, data mining simply makes the task of finding "hidden" data that much easier. In this information age we live in, the traditional find and grind is not as efficient as it used to be. Not to mention that once you've found the data, analyzing it to work out its relevance to the issue at hand would take a long time using traditional manual methods. The data being gathered nowadays continually grows in size and complexity, requiring more sophisticated means of storing it safely, finding it when the need arises, and analyzing it quickly yet thoroughly.

Databases nowadays have grown to hold very large quantities of information, possibly richer in quality as well. But not all the data stored in them will be useful at a given time. Databases can be larger in both depth and breadth:
  • More columns. Analysts must often limit the number of variables they examine when doing hands-on analysis due to time constraints. Yet variables that are discarded because they seem unimportant may carry information about unknown patterns. High performance data mining allows users to explore the full depth of a database, without preselecting a subset of variables.
  • More rows. Larger samples yield lower estimation errors and variance, and allow users to make inferences about small but important segments of a population.
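
To put a number on the "more rows" point, here is a small sketch of my own (it is not from Thearling's paper). It assumes made-up order totals drawn from a normal distribution with a mean of 50 and a standard deviation of 10, and estimates the standard error of the sample mean as s / sqrt(n). The function names and figures are hypothetical, purely for illustration.

```python
import math
import random
import statistics

def standard_error(sample):
    """Estimated standard error of the sample mean: s / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

def simulate(n_rows, seed=0):
    """Draw n_rows hypothetical order totals; return (mean, standard error)."""
    rng = random.Random(seed)
    # Assumption: order totals are roughly normal around 50 with spread 10.
    sample = [rng.gauss(50.0, 10.0) for _ in range(n_rows)]
    return statistics.mean(sample), standard_error(sample)

if __name__ == "__main__":
    for n in (100, 10_000):
        mean, se = simulate(n)
        print(f"rows={n:>6}  mean={mean:.2f}  standard error={se:.3f}")
```

With a hundred times more rows, the standard error should shrink by roughly a factor of ten, which is the sense in which larger samples yield lower estimation errors.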

Data Mining is not without help, though. Listed below are some of the more useful techniques Data Mining has to offer:

  • Artificial neural networks: Non-linear predictive models that learn through training and resemble biological neural networks in structure.
  • Decision trees: Tree-shaped structures that represent sets of decisions. These decisions generate rules for the classification of a dataset. Specific decision tree methods include Classification and Regression Trees (CART) and Chi Square Automatic Interaction Detection (CHAID).
  • Genetic algorithms: Optimization techniques that use processes such as genetic combination, mutation, and natural selection in a design based on the concepts of evolution.
  • Nearest neighbor method: A technique that classifies each record in a dataset based on a combination of the classes of the k record(s) most similar to it in a historical dataset (where k ≥ 1). Sometimes called the k-nearest neighbor technique.
  • Rule induction: The extraction of useful if-then rules from data based on statistical significance.
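
To make one of these techniques concrete, below is a minimal sketch of the nearest neighbor method in plain Python. It assumes Euclidean distance and a simple majority vote among the k closest records. The customer history (age, items bought last year) and the product labels are invented for illustration, and the features are left unscaled, which a real tool would probably not do.

```python
import math
from collections import Counter

def knn_classify(history, query, k=3):
    """Classify `query` by majority vote among its k nearest historical records.

    history: list of (features, label) pairs, where features is a tuple of numbers.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    # Sort the historical records by Euclidean distance to the query.
    by_distance = sorted(history, key=lambda rec: math.dist(rec[0], query))
    # Count the labels of the k closest records and return the most common one.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical history: (age, items bought last year) -> product bought
history = [
    ((22, 1), "cupcake"),
    ((25, 2), "cupcake"),
    ((27, 1), "cupcake"),
    ((48, 6), "fruitcake"),
    ((52, 5), "fruitcake"),
    ((55, 7), "fruitcake"),
]

print(knn_classify(history, (24, 2), k=3))  # nearest records are all cupcake buyers
print(knn_classify(history, (50, 6), k=3))  # nearest records are all fruitcake buyers
```

The "combination of the classes" in the definition above is taken here to be a plain majority vote. Weighting votes by distance is another common choice.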

While these techniques help, another question lingers. How exactly does data mining tell you what you do not know, and what is going to happen next?

The answer is simple: modeling. Thearling defines modeling as simply the act of building a model in one situation where you know the answer and then applying it to another situation where you don't. Basically, you find patterns in pre-existing records and use them to predict the data you need.
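
Here is a toy sketch of that idea, under some loud assumptions: the only input is a customer's age, the "answer" is whether they responded to a past promotion, and the model is a single age cutoff (a one-level decision rule that only considers "older customers respond"). The records and names are hypothetical, and this is much simpler than anything a real data mining tool would build.

```python
def fit_threshold(known):
    """Learn the age cutoff that misclassifies the fewest known records.

    known: list of (age, responded) pairs, with responded being True or False.
    The learned rule is "predict True when age >= cutoff".
    """
    best_cutoff, best_errors = None, len(known) + 1
    for cutoff in sorted({age for age, _ in known}):
        # Count the records the rule would get wrong at this cutoff.
        errors = sum((age >= cutoff) != responded for age, responded in known)
        if errors < best_errors:
            best_cutoff, best_errors = cutoff, errors
    return best_cutoff

def predict(cutoff, age):
    """Apply the learned rule to a record whose answer is not known."""
    return age >= cutoff

# Situation where we know the answer (hypothetical past campaign).
known = [(19, False), (23, False), (31, False), (38, True), (45, True), (60, True)]
cutoff = fit_threshold(known)

# Situation where we don't: new prospects, identified only by age.
prospects = [27, 41, 52]
predictions = {age: predict(cutoff, age) for age in prospects}
print(cutoff, predictions)
```

The first half is the "build a model where you know the answer" step, and the last three lines are the "apply it where you don't" step.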

For instance, a marketing manager at a novelty product company could "mine" the data warehouse for what customers prefer to purchase on a specific holiday. Using a model table containing demographic information, such as the age and gender of customers who have purchased novelty items from the store during a given holiday, the manager could easily tell which products to promote and which to leave out, thus promoting products efficiently while eliminating unnecessary costs.
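
A rough sketch of what such a model table might look like in code, assuming the purchase history is available as simple (holiday, age group, gender, product) records. All of the data, field names, and function names below are made up for illustration, and counting purchases per segment is only one simple way to build the table.

```python
from collections import Counter, defaultdict

# Hypothetical purchase records: (holiday, age_group, gender, product)
purchases = [
    ("Christmas", "18-30", "F", "snow globe"),
    ("Christmas", "18-30", "F", "snow globe"),
    ("Christmas", "18-30", "F", "novelty mug"),
    ("Christmas", "31-50", "M", "novelty mug"),
    ("Christmas", "31-50", "M", "novelty mug"),
    ("Halloween", "18-30", "M", "rubber mask"),
    ("Halloween", "18-30", "M", "rubber mask"),
    ("Halloween", "18-30", "M", "novelty mug"),
]

def build_model_table(records):
    """Count products bought per (holiday, age_group, gender) segment."""
    table = defaultdict(Counter)
    for holiday, age_group, gender, product in records:
        table[(holiday, age_group, gender)][product] += 1
    return table

def product_to_promote(table, holiday, age_group, gender):
    """Return the most purchased product for a segment, or None if unseen."""
    counts = table.get((holiday, age_group, gender))
    return counts.most_common(1)[0][0] if counts else None

table = build_model_table(purchases)
print(product_to_promote(table, "Christmas", "18-30", "F"))
print(product_to_promote(table, "Halloween", "18-30", "M"))
print(product_to_promote(table, "Easter", "18-30", "F"))  # no history for this segment
```

A segment with no history returns None rather than a guess, so the manager knows there is nothing to base a promotion on yet.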

Overall, we see the potential data mining offers to businesses in different industries. Not only does it help in decision-making processes, it also makes the "decryption" and prediction of valuable information easier in today's information age.


5 comments:

  1. Nice blog. Finally you're in blogspot.

  2. welcome to blogspot :P I love that first pick with the mining cap and pick and 0s and 1s

  3. Welcome to Blogspot.

    Timing is of the essence. And data mining is that "essence".
    Imagine how powerful you could be, being a data mining company, and the biggest companies of the several industries depending on you?
    Now that is real money.

  4. nice you put the picture on the blog. hehehe
