Hadoop

September 29, 2023
January 15, 2015

Welcome to Apache™ Hadoop®!

Hadoop 201 -- Deeper into the Elephant

» Prerequisites for Learning Hadoop – Hadoop Training
» Hadoop Cluster – Architecture and Core Components
» Hadoop 1.0 vs Hadoop 2.0

IBM Analytics - Hadoop
Hadoop Dev: IBM BigInsights for Hadoop Developer Community
IBM - What is the Hadoop Distributed File System (HDFS) - United States

Hadoop: The Definitive Guide - Tom White - Google Books
The Architecture of Open Source Applications: The Hadoop Distributed File System

Testing of several distributed file-systems (HDFS, Ceph and GlusterFS) for supporting the HEP experiments analysis: performance, comparison
MapR's Direct Access NFS vs. Hadoop FUSE

Installation

Scalable Spark/HDFS Setup using Docker — Medium
Getting started with HDFS on Kubernetes – Hasura

Mounting

MountableHDFS - Hadoop Wiki
Simplifying data management: NFS access to HDFS - Hortonworks

cemeyer/hadoofus: C, FUSE, libhdfs-compatible, out-of-order execution

remis-thoughts/native-hdfs-fuse: C, FUSE

cloudera/hdfs-nfs-proxy: Java, NFSv4

Tuning

How-to: Deploy Apache Hadoop Clusters Like a Boss - Cloudera Engineering Blog
Hadoop configuration & performance tuning

Ecosystem

The ecosystem is vast; these are the projects I came across.

spotify/luigi: Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization, etc. It also comes with Hadoop support built in.
Cascading | Application Platform for Enterprise Big Data
Apache YARN & Hadoop - Hortonworks
Spark tutorial: Get started with Apache Spark | InfoWorld

Spark