An enterprise data lake provides the following core benefits to an enterprise:
For data architecture through a significantly lower cost of storage, and through optimization of data processing workloads such as datatransformation and integration.
For business through flexible “schema-on-read” access to all enterprise data, and through multi-use and multi-workload data processing on the same sets of data, from batch to real-time.
Apache Hadoop provides these benefits through a technology core comprised of:
Hadoop Distributed Filesystem.
HDFS is a Java-based file system that provides scalable and reliable data storage that is designed to span large clusters of commodity servers.
Apache Hadoop YARN.
YARN provides a pluggable architecture and resource management for data processing engines to interact with data stored in HDFS.