"Bigness" data - some thoughts about the big data journey

The "Bigness" of data: ideas about Big Data

If you are reading this blog, you are likely to be a nerd.

Everywhere - Yes, data is everywhere, and the volumes are increasing exponentially by the day. There is a saying: "Where there's a will, there's relatives." Are your measures trustworthy? Ensure their integrity, but at the same time pay attention to the training their users need.

Data Detectives - For me, the cool part of data science is detecting fraud. Fraudsters love lots and lots of data: the more the merrier, because it is easier to hide abnormalities when there is a lot of data. Therefore, if your organization has reached the stage where you are IYO consuming and processing big data, it's time to consider employing a data detective.
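To make "hiding abnormalities" a little more concrete, here is a minimal sketch of one common first pass a data detective might run: flagging values that sit far from the median, using the modified z-score based on the median absolute deviation (MAD). This is a generic outlier check, not a fraud-detection method described in this post; the function name `flag_outliers`, the 3.5 threshold, and the sample amounts are all illustrative assumptions.

```python
from statistics import median

def flag_outliers(values, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds threshold.

    Modified z-score = 0.6745 * (x - median) / MAD, where MAD is the median
    absolute deviation. Median-based statistics keep a few extreme values
    from inflating the spread and masking themselves, which can happen with
    a mean/standard-deviation z-score.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # MAD is zero when at least half the values equal the median;
        # in that case, flag anything that differs from the median.
        return [i for i, v in enumerate(values) if v != med]
    return [
        i for i, v in enumerate(values)
        if abs(0.6745 * (v - med) / mad) > threshold
    ]

# Hypothetical transaction amounts: one value is far out of line.
amounts = [120.0, 95.5, 101.2, 110.0, 99.9, 105.3, 9800.0, 97.4]
print(flag_outliers(amounts))  # [6]
```

A flagged value is only a lead for a human to investigate, not proof of fraud; real fraud work would combine many such signals with domain knowledge.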

Machine Learning - This is only useful if you actually act on what your system learns about trends, etc.

OODA Loop - What came first, the egg or the chicken? This scenario gives rise to what is known as the OODA loop (Observe, Orient, Decide, Act). When you add more factors to a system, it becomes more complicated. Do you understand the factors contributing to the success of your company? Be careful to understand the variables at play.

Data Effect - Accept the fact that data is always predictive. If you are considering a marketing campaign, can your system report on who has already bought the product you plan to market? If so, consider whether you want to market to that group at all; probably not, because marketing to such a group might have a negative effect on them.
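The "who already bought it?" check can be sketched as a simple exclusion: collect the customers with a purchase of the product, then drop them from the campaign list. This is a minimal in-memory sketch under stated assumptions; the `Purchase` record, the `campaign_targets` function, and the sample IDs are hypothetical, and a real system would run the equivalent query against its sales data.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Purchase:
    # Hypothetical purchase record: who bought which product.
    customer_id: str
    product: str

def campaign_targets(customers, purchases, product):
    """Return the customers who have NOT already bought `product`, in input order."""
    # A set gives constant-time membership checks for the exclusion step.
    buyers = {p.customer_id for p in purchases if p.product == product}
    return [c for c in customers if c not in buyers]

customers = ["c1", "c2", "c3", "c4"]
purchases = [
    Purchase("c2", "widget"),
    Purchase("c3", "gadget"),
    Purchase("c4", "widget"),
]
print(campaign_targets(customers, purchases, "widget"))  # ['c1', 'c3']
```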

BI Team dynamism - The idea of "Bigness" in data describes a Data Journey. Your team needs to think outside the box and have the right culture and entrepreneurship. If you are working with a team where this is a problem, it's time to make big changes, and soon. Encourage your team to keep the bottom line in focus. For this, we need a good analyst.

Apache - Have you had your Apache today? This covers the idea that many of the tools that help us with "Bigness" data are found in the open-source realm. Tools and disciplines such as R, Python, statistics, engineering, and applied mathematics should be considered.

Data Lakes - For many data-driven organizations, these are some of the myths to keep in mind regarding data lakes:
    Myth 1: A data lake is a mess and is not going to give you important information.
    Myth 2: Data-driven decisions lead to exceptional decision making (human factors are critical to exceptional decision making).
    Myth 3: A data-driven business creates better behavior in clients and results in better business.

Thinking Points:
1. Remember data governance
2. Include human and social psyche
3. Respect and nurture privacy
4. Consider economic consequences

Further thought:
Is the dimensional model compatible with the shift to big data? In my opinion, it depends on whether you use it correctly, and what counts as correct is also shaped by the business needs. As for the dimensional model's capacity to hold data, the largest fact table I found the last time I searched contained 2.4 trillion records, and reporting on it still performed well. If your fact table cannot hold this volume and still support reporting, go back to your architectural design. Let's chat about the alternatives.
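For readers newer to the dimensional model, here is a toy star schema: one fact table of measurements keyed to two dimension tables, plus the kind of grouped report typically run against it. It uses Python's built-in `sqlite3` in memory purely for illustration; the table and column names and the sample rows are hypothetical, and this says nothing about how such a design behaves at the trillion-row scale mentioned above.

```python
import sqlite3

# In-memory database; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER);
-- The fact table holds measures (amount) plus foreign keys to dimensions.
CREATE TABLE fact_sales (
    date_key INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    amount REAL
);
INSERT INTO dim_product VALUES (1, 'Books'), (2, 'Games');
INSERT INTO dim_date VALUES (20230101, 2023), (20240101, 2024);
INSERT INTO fact_sales VALUES
    (20230101, 1, 10.0), (20230101, 2, 25.0),
    (20240101, 1, 15.0), (20240101, 1, 5.0);
""")

# A typical dimensional report: join facts to dimensions, then aggregate
# the measure by dimension attributes.
rows = conn.execute("""
    SELECT d.year, p.category, SUM(f.amount) AS total
    FROM fact_sales f
    JOIN dim_date d ON f.date_key = d.date_key
    JOIN dim_product p ON f.product_key = p.product_key
    GROUP BY d.year, p.category
    ORDER BY d.year, p.category
""").fetchall()

for row in rows:
    print(row)
# (2023, 'Books', 10.0)
# (2023, 'Games', 25.0)
# (2024, 'Books', 20.0)
```

The design choice worth noticing is that the fact table stays narrow (keys and measures only), while descriptive attributes live in the dimensions; that is what lets the fact table grow very large while reports still group by friendly attributes.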

Let's explore each of these avenues together. Where are the analytical minds at?

Have fun people.


