n. A massive amount of data stored and readily accessible in its pure, unprocessed state.
2015
To prepare for this onslaught, some IT leaders are urging the creation of "data lakes." These are centralized repositories based on Hadoop that draw raw data from source systems and then pass them to downstream facilities for utilization by the knowledge workforce.
2014
The data lake strategy is part of a greater movement toward data liberalization. It started with the printing press and moving the books out of the monastery. Sure, there was confusion and a schism, but did we really want to wait for the monks to decide who gets the handwritten books?
2013
The second most common use case is one we call "Data Exploration." In this case, organizations capture and store a large quantity of this new data (sometimes referred to as a data lake) in Hadoop and then explore that data directly.
2010 (earliest)
Based on the requirements above and the problems of the traditional solutions we have created a concept called the Data Lake to describe an optimal solution.
If you think of a datamart as a store of bottled water — cleansed and packaged and structured for easy consumption — the data lake is a large body of water in a more natural state. The contents of the data lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples.