2011 BigDataGlossary

From GM-RKB

Subject Headings: Big Data Tool, Data Science Tool.

Notes

Cited By

Quotes

Book Overview

To help you navigate the large number of new data tools available, this guide describes 60 of the most recent innovations, from NoSQL databases and MapReduce approaches to machine learning and visualization tools. Descriptions are based on first-hand experience with these tools in a production environment.

This handy glossary also includes a chapter of key terms that help define many of these tool categories.

Table of Contents

Chapter 1 Terms
Document-Oriented
Key/Value Stores
Horizontal or Vertical Scaling
MapReduce
Sharding
Chapter 2 NoSQL Databases
MongoDB
CouchDB
Cassandra
Redis
BigTable
HBase
Hypertable
Voldemort
Riak
ZooKeeper
Chapter 3 MapReduce
Hadoop
Hive
Pig
Cascading
Cascalog
mrjob
Caffeine
S4
MapR
Acunu
Flume
Kafka
Azkaban
Oozie
Greenplum
Chapter 4 Storage
S3
Hadoop Distributed File System
Chapter 5 Servers
EC2
Google App Engine
Elastic Beanstalk
Heroku
Chapter 6 Processing
R
Yahoo! Pipes
Mechanical Turk
Solr/Lucene
ElasticSearch
Datameer
BigSheets
Tinkerpop
Chapter 7 NLP
Natural Language Toolkit
OpenNLP
Boilerpipe
OpenCalais
Chapter 8 Machine Learning
WEKA
Mahout
scikits.learn
Chapter 9 Visualization
Gephi
GraphViz
Processing
Protovis
Fusion Tables
Tableau
Chapter 10 Acquisition
Google Refine
Needlebase
ScraperWiki
Chapter 11 Serialization
JSON
BSON
Thrift
Avro
Protocol Buffers

References


2011 BigDataGlossary
Author: Pete Warden
Title: Big Data Glossary
Year: 2011