Data lakes: Where big businesses dump their excess data, and hackers have a field day

Machines and the internet are woven into the fabric of our society. A growing number of users, devices and applications work together to produce what we now call “big data.” And this data helps drive many of the everyday services we access, such as banking.

A comparison of internet snapshots from 2018 and 2019 sheds light on the increasing rate at which digital information is exchanged daily. As that volume grows, safely capturing and storing data becomes more complicated.

This is where data warehouses and data lakes come in. Both are storage systems that businesses use to process and keep their internal data.

Unfortunately, since the concept of data lakes originated in 2010, not enough has been done to address cybersecurity. These valuable repositories remain exposed to a growing number of cyber attacks and data breaches.

A proposed panacea for big data problems

The traditional approach used by service providers is to store data in a data warehouse: a single repository that can be used to analyze data, create reports and consolidate information.

However, data going into a warehouse needs to be preprocessed first. With zettabytes of data in cyberspace, this isn't an easy task. Preprocessing requires a hefty amount of computation on high-end supercomputers, which costs both time and money.

Data lakes were proposed to solve this. Unlike warehouses, they can store raw data of any type. Data lakes are often considered a panacea for big data problems and have been embraced by many organizations trying to drive innovation and new services for users.

James Dixon, the US data technician who reputedly coined the term, describes data lakes thus:

If you think of a datamart as a store of bottled water — cleansed and packaged and structured for easy consumption — the data lake is a large body of water in a more natural state. The contents of the data lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples.

Be careful swimming in a data lake

Although data lakes create opportunities for data crunchers, their digital doors are often left unguarded, and cybersecurity remains an afterthought.

Our ability to analyze and extract intelligence from data lakes is threatened by attacks in cyberspace. This is evident in the high number of recent data breaches and cyber attacks worldwide.

With technological advances, we become even more prone to cyber attacks. Confronting malicious cyber activity should be a priority in the current digital climate.

While research into cybersecurity has flourished in recent years, a strong connection between effective cybersecurity and data lakes is yet to be made.