Smart data in the age of big data

The most valuable databases do not have to be the biggest.
OCTOBER 16, 2019 - 3:53 PM

BIG data has become the trendy buzzword in tech of late, with some even characterising it as the “oil of the modern era”. The combination of rapidly growing smartphone penetration (46 per cent of the ASEAN population) and the prevalence of the Internet-of-Things (IoT) allows unprecedented collection of data. Interested organisations can now gather data about what we do and, more importantly, receive all of this crucial information in real time.

It is easy to imagine a dystopian use of this data, but if stewarded responsibly the same information can be a source of social good. Companies can better understand the huge swathes of the populations they are actually attempting to reach, and create products that genuinely cater to their real needs.

Big data has also been deployed by politicians to win elections. Governments with access to big data analysis capabilities can use the information to better govern their citizens. Even students could be better educated with the right implementation of big data.

Data Revolution Only a Matter of Time


If I were to describe the ASEAN region in one word, it would be ‘young’.

In 2016, there were 213 million youths in ASEAN countries, the largest number in the region’s recorded history, and that figure will peak at a little over 220 million in 2038.

ASEAN is also young in its economy. Many countries in the region are still early in the rise of their middle-class populations and have low household debt relative to GDP, signalling strong potential for economic growth. This young, emerging middle class is keen to embrace digital technology to promote productivity and prosperity.

Crucially, members of the ASEAN region are economically primed to take advantage of the big data boom, easily slotting it into industries in the midst of rapid shifts. As we approach this future though, it is prudent that we learn from the mistakes of our predecessors.

Big Data’s Downfalls in the West

The downfall of big data has been heralded by Western media since as early as 2014, primarily due to the large-scale failure of a programme designed to anticipate flu outbreaks.

The Google Flu Trends (GFT) project shunned government statistics and instead churned out predictions from five years’ worth of web search logs processed by Google’s algorithms.

GFT then failed to predict the swine flu epidemic of 2009. Independent researchers later found that the project had consistently overestimated flu prevalence from August 2011 onwards, pointing to shortcomings in its simplistic forecasting models as the cause.

Ironically, predictions based on the government datasets proved more accurate than GFT.

Unfortunately, the flu indicator had been built on a popular fallacy plaguing big data - that more data is always better. Not all data is created equal, however, and the datasets collected had not been thoroughly checked for validity.
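The fallacy is easy to demonstrate in miniature. The toy simulation below (my own illustration, not GFT's actual data or method) compares a small, validated sample against the same sample pooled with a far larger but systematically biased one - the "big" estimate drifts away from the truth precisely because the extra data was never checked for validity.

```python
import random

random.seed(0)
TRUE_RATE = 0.02  # pretend the true flu prevalence is 2 per cent

# A small, validated sample scattered tightly around the true rate...
clean = [random.gauss(TRUE_RATE, 0.002) for _ in range(100)]

# ...and a much larger sample with a systematic upward bias, akin to
# search queries inflated by media coverage of flu rather than by illness.
biased = [random.gauss(2 * TRUE_RATE, 0.002) for _ in range(1000)]

small_estimate = sum(clean) / len(clean)
pooled = clean + biased
big_estimate = sum(pooled) / len(pooled)

# The pooled "big data" estimate lands near 0.038, roughly double the
# truth, while the small validated sample stays close to 0.02.
```

Ten times more data, yet a worse answer: volume amplifies bias rather than cancelling it.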

As crucial sectors like banking and central governments are predicted to be the largest spenders on big data in Asia Pacific (excluding Japan) by 2021, it is important that we address this fallacy urgently.

This issue can be boiled down to overly optimistic assumptions, but I think it highlights a more important question: data management, and the use of smart data versus big data.

Getting Smart About Data Collection

I’m of the opinion that the most valuable databases do not have to be the biggest. After all, if we’re talking about sheer volumes of data, your local hypermarket probably has huge databases on products, prices, competitors and customers, but this does not directly translate into higher revenue.

Smart data here refers to data that is structured for usability from the point of collection onwards. Later efforts by AI and machine learning programmes are then far less marred by poor-quality data.
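In practice, structuring data at the collection point can be as simple as rejecting malformed records before they ever reach storage. A minimal sketch, with hypothetical field names:

```python
# Enforce structure at the collection point: every incoming record is
# validated against a schema before it is stored, so downstream
# analytics never see malformed rows. Field names are hypothetical.
REQUIRED_FIELDS = {"user_id": str, "event": str, "timestamp": float}

def validate(record: dict) -> bool:
    """Accept a record only if every required field is present and correctly typed."""
    return all(
        isinstance(record.get(field), expected)
        for field, expected in REQUIRED_FIELDS.items()
    )

stored, rejected = [], []
for record in [
    {"user_id": "u1", "event": "login", "timestamp": 1571212380.0},
    {"user_id": "u2", "event": "login"},  # missing timestamp: rejected
]:
    (stored if validate(record) else rejected).append(record)
```

Filtering here is cheap; cleaning the same malformed rows out of a multi-year data lake later is not.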

The problem? To collect smart data, companies need robust data stewardship hygiene built from the ground up, yet 55 per cent of ASEAN businesses still entrust the brunt of the work to their IT departments rather than to a fully fledged team of data scientists or analysts. Unsurprisingly, 53 per cent of surveyed organisations find data collection and connectivity challenging.

As a result, 43 per cent of ASEAN's organisations have to contend with bad or inaccurate data, and 63 per cent of them struggle with the complexity of the data infrastructure.


To me, what stands out is a need for comprehensive AI solutions that cater to a business’ needs end-to-end, from data collection all the way to analysis and suggested actions. 

ASEAN sees a major shortfall in data talent, meaning it is more viable for companies to outsource their data management to reputable organisations that can do the brunt of that work. 

Experts in data stewardship would also ensure that collected data are not divorced from their context, and are stored close together.

To truly understand the importance of contextual data, we need look no further than traditional credit scoring methods - arguably the world’s pioneers in big data analysis.

They have huge stacks of data prepared for use, but I posit that they fall short in many profound ways. Traditional credit scoring methods preclude under-served populations from accessing important financial services. And even among those with well-recorded financial histories, the methods for determining creditworthiness rarely reflect modern life.

Essentially, traditional credit scoring methods, focused on interactions with entrenched financial institutions, are too divorced from context to predict the behaviour of any consumer other than the “already banked”.

Alternative data sources like device data from your smartphone could be one of the most reliable sources of behavioural data, and minimally invasive to users, as long as the data is anonymised, privacy-consented and permissioned.

These data points could help fill the gaps in normal big data collection. For example, we are able to produce predictive digital credit scorecards based on how a person interacts with their phone.

Some indicators of a reliable borrower can be oddball data points: a higher number of events scheduled during the working hours of working days may signal a higher willingness to repay, whilst having fewer contacts with more than one phone number stored may indicate a lower willingness to repay.
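To make the idea concrete, such signals can be combined into a simple weighted scorecard. The sketch below is purely illustrative - the feature names and weights are hypothetical inventions for this column, not CredoLab's actual model:

```python
# A toy weighted scorecard over anonymised behavioural features like
# those described above. Weights and feature names are hypothetical.
WEIGHTS = {
    "events_in_working_hours": 0.6,        # more scheduled events -> higher score
    "contacts_with_multiple_numbers": 0.4, # fewer such contacts -> lower score
}

def credit_score(features: dict) -> float:
    """Combine normalised behavioural features (each 0..1) into a 0..1 score."""
    return sum(WEIGHTS[name] * features.get(name, 0.0) for name in WEIGHTS)

organised_user = {"events_in_working_hours": 0.9,
                  "contacts_with_multiple_numbers": 0.8}
sparse_user = {"events_in_working_hours": 0.1,
               "contacts_with_multiple_numbers": 0.2}
```

A real scorecard would learn its weights from repayment outcomes rather than fix them by hand, but the principle is the same: a handful of well-contextualised signals, not raw volume, drives the prediction.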

As the modern world progresses, data volumes will continue to grow and bloat, making it increasingly difficult to discern the diamonds in the deluge. The notion that sheer volume will correct disparities in big data has failed time and time again.

More emphasis needs to be put into stewarding data properly, beginning as early as the point of collection. The final collection of usable data may fall short of “big” data, but we may discover that even the smallest data can generate tremendous impact.

 

The author is chief product officer of CredoLab