Fei's Playground

Cloudera Blogs on Hadoop

Application

  1. Using Hadoop to Annotate Billions of Web Documents with Semantics
  2. The Smart Grid and Big Data: Hadoop at the Tennessee Valley Authority
  3. Analyzing Human Genomes with Hadoop
  4. Hadoop at Twitter part 1: Splittable LZO Compression
  5. Why Europe’s Largest Ad Targeting Platform Uses Hadoop
  6. Natural Language Processing with Hadoop and Python
  7. How Raytheon BBN Technologies Researchers are Using Hadoop to Build a Scalable, Distributed TripleStore
  8. Scaling Social Science with Hadoop
  9. Do the Schimmy: Efficient Large Graph Analysis with Hadoop
  10. Lessons learned putting Hadoop into production
  11. Wordnik Bypasses Processing Bottleneck with Hadoop
  12. Strategies for Exploiting Large-scale Data in the Federal Government
  13. An emerging data management architectural pattern behind interactive web applications
  14. Adopting Apache Hadoop in the Federal Government
  15. Using Hadoop to Measure Influence
  16. Biodiversity Indexing: Migration from MySQL to Hadoop
  17. Evolution of Hadoop Ecosystem: AOL Advertising Experience
  18. RecordBreaker: Automatic structure for your text-formatted data
  19. Hadoop Applied
  20. Hadoop World 2011: A Glimpse of the Applications Track
  21. Hadoop for Archiving Email
  22. Hadoop World 2011: A Glimpse into Enterprise Architecture
  23. Hadoop World 2011: A Glimpse into Development
  24. Hadoop World 2011: A Glimpse into Operations
  25. Using Apache Hadoop to Find Signal in the Noise: Analyzing Adverse Drug Events
  26. FoneDoktor, A WibiData Application
  27. How I found Hadoop
  28. Hadoop for Archiving Email Part 2
  29. Seismic Data Science: Reflection Seismology and Hadoop
  30. Indexing Files via Solr and Java MapReduce
  31. HBaseCon 2012: A Glimpse into the Development Track
  32. HBaseCon 2012: A Glimpse into the Operations Track
  33. How Treato Analyzes Health-related Social Media Big Data with Hadoop and HBase
  34. Processing Rat Brain Neuronal Signals Using a Hadoop Computing Cluster – Part I
  35. Processing Rat Brain Neuronal Signals Using a Hadoop Computing Cluster – Part II

Hadoop

  1. Securing a Hadoop Cluster Through a Gateway
  2. Testing Hadoop
  3. The Small Files Problem
  4. Database Access with Hadoop
  5. Hadoop Metrics
  6. Configuration Parameters: What can you just ignore?
  7. Configuring Eclipse for Hadoop Development (a screencast)
  8. High Energy Hadoop
  9. 5 Common Questions About Hadoop
  10. Parallel LZO: Splittable Compression for Hadoop
  11. Hadoop Graphing with Cacti
  12. File Appends in HDFS
  13. Hadoop HA Configuration
  14. Integrating Hadoop in your Existing DW and BI Environment
  15. Map-Reduce With Ruby Using Apache Hadoop
  16. Setting up CDH3 Hadoop on my new Macbook Pro
  17. Hadoop I/O: Sequence, Map, Set, Array, BloomMap Files
  18. Lessons Learned from Cloudera’s Hadoop Developer Training Course
  19. Hadoop Availability
  20. Avoiding Full GCs in HBase with MemStore-Local Allocation Buffers: Part 1
  21. Avoiding Full GCs in HBase with MemStore-Local Allocation Buffers: Part 2
  22. Avoiding Full GCs in HBase with MemStore-Local Allocation Buffers: Part 3
  23. Learn about Apache Hadoop at the Chicago Data Summit
  24. Automatically Documenting Apache Hadoop Configuration
  25. Snappy and Hadoop
  26. Thoughts on Cloudera and Cisco UCS reference architecture for Hadoop
  27. Authorization and Authentication In Hadoop
  28. Hadoop Beyond MapReduce, Part 1: Introducing Kitten

HDFS

  1. HDFS Reliability
  2. Multi-host SecondaryNameNode Configuration
  3. Protecting per-DataNode Metadata
  4. Hoop - Hadoop HDFS over HTTP
  5. High Availability for the Hadoop Distributed File System (HDFS)
  6. NameNode Recovery Tools for the Hadoop Distributed File System
  7. Why we build our platform on HDFS

MapReduce

  1. Sending Files to Remote Task Nodes with Hadoop MapReduce
  2. Job Scheduling in Hadoop
  3. Upcoming Functionality in Fair Scheduler 2.0
  4. 10 MapReduce Tips
  5. Debugging MapReduce Programs With MRUnit
  6. Advice on QA Testing Your MapReduce Jobs
  7. 7 Tips for Improving MapReduce Performance
  8. A profile of Apache Hadoop MapReduce computing efficiency
  9. A profile of Apache Hadoop MapReduce computing efficiency (continued)
  10. How to Include Third-Party Libraries in Your Map-Reduce Job
  11. Simple Moving Average, Secondary Sort, and MapReduce (Part 1)
  12. Simple Moving Average, Secondary Sort, and MapReduce (Part 2)
  13. Simple Moving Average, Secondary Sort, and MapReduce (Part 3)
  14. Introducing Crunch: Easy MapReduce Pipelines for Hadoop
  15. Building and Deploying MR2
  16. Crunch for Dummies
  17. MapReduce 2.0 in Hadoop 0.23
  18. Experimenting with MapReduce 2.0

HBase

  1. HBase User Group #9: HBase and HDFS
  2. Integrating Hive and HBase
  3. Log Event Processing with HBase
  4. HBase Dos and Don'ts
  5. Caching in HBase: SlabCache
  6. HBase + Hadoop + Xceivers
  7. Online HBase Backups with CopyTable
  8. The Singularity: HBase Compatibility and Extensibility
  9. HBase Write Path
  10. HBase I/O: HFile Input and Output
  11. HBase Log Splitting
  12. HBase Replication Overview

Pig

  1. Analyzing Apache logs with Pig

Scribe

  1. Install Scribe for Log collection
  2. Configuring and Using Scribe for Hadoop log collection

ZooKeeper

  1. Building a distributed concurrent queue with Apache ZooKeeper

Sqoop

  1. Introducing Sqoop
  2. Apache Sqoop - Overview
  3. Apache Sqoop: Highlights of Sqoop 2

Avro

  1. Avro: a Format for Big Data
  2. Better Workflow Management in CDH with Oozie 2
  3. Tracing with Avro
  4. Three Reasons Why Apache Avro Data Serialization is a Good Choice for OpenRTB
  5. Data Interoperability with Apache Avro
  6. Apache Avro at RichRelevance

Hive

  1. Hadoop World: Rethinking the Data Warehouse with Hadoop and Hive from Ashish Thusoo

Misc

  1. CAP Confusion: Problems with ‘partition tolerance’
  2. If 80% of data is unstructured, is it the exception or a new rule?
  3. Notes from the Flume NG Hackathon
  4. Capacity Planning with Cloudera Manager