Spark GraphX Graph Processing System

From GM-RKB

A Spark GraphX Graph Processing System is a distributed graph data processing system that is an Apache Spark component for graphs and graph-parallel computation.



References

2018

  • https://spark.apache.org/docs/latest/graphx-programming-guide.html#overview
    • QUOTE: GraphX is a new component in Spark for graphs and graph-parallel computation. At a high level, GraphX extends the Spark RDD by introducing a new Graph abstraction: a directed multigraph with properties attached to each vertex and edge. To support graph computation, GraphX exposes a set of fundamental operators (e.g., subgraph, joinVertices, and aggregateMessages) as well as an optimized variant of the Pregel API. In addition, GraphX includes a growing collection of graph algorithms and builders to simplify graph analytics tasks.
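The property-graph model and two of the operators named in the quote can be illustrated with a small single-machine sketch. This is not the GraphX API (which is Scala on Spark RDDs): the `PropertyGraph` and `Edge` classes and the `subgraph` / `aggregate_messages` signatures below are hypothetical names chosen for illustration, and the sketch only assumes that a directed multigraph carries a property on every vertex and edge.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Edge:
    """A directed edge with an attached property (hypothetical helper type)."""
    src: int
    dst: int
    attr: Any


@dataclass
class PropertyGraph:
    """Toy directed multigraph with vertex and edge properties."""
    vertices: Dict[int, Any]  # vertex id -> vertex property
    edges: List[Edge]         # a list, so parallel edges are allowed

    def subgraph(
        self,
        vpred: Callable[[int, Any], bool] = lambda vid, attr: True,
        epred: Callable[[Edge], bool] = lambda e: True,
    ) -> "PropertyGraph":
        # Keep vertices that pass vpred, then edges whose endpoints both
        # survive and that pass epred.
        kept = {vid: a for vid, a in self.vertices.items() if vpred(vid, a)}
        kept_edges = [
            e for e in self.edges
            if e.src in kept and e.dst in kept and epred(e)
        ]
        return PropertyGraph(kept, kept_edges)

    def aggregate_messages(
        self,
        send: Callable[[Edge, Any, Any], Iterable[Tuple[int, Any]]],
        merge: Callable[[Any, Any], Any],
    ) -> Dict[int, Any]:
        # For each edge, `send` emits (target vertex id, message) pairs;
        # messages arriving at the same vertex are combined with `merge`.
        inbox: Dict[int, Any] = {}
        for e in self.edges:
            for target, msg in send(e, self.vertices[e.src], self.vertices[e.dst]):
                inbox[target] = merge(inbox[target], msg) if target in inbox else msg
        return inbox


graph = PropertyGraph(
    vertices={1: "alice", 2: "bob", 3: "carol"},
    edges=[
        Edge(1, 2, "follows"),
        Edge(1, 2, "messages"),  # parallel edge: same endpoints, different property
        Edge(3, 2, "follows"),
        Edge(2, 3, "follows"),
    ],
)

# In-degree: every edge sends the message 1 to its destination; messages are summed.
in_degree = graph.aggregate_messages(
    send=lambda e, src_attr, dst_attr: [(e.dst, 1)],
    merge=lambda a, b: a + b,
)

# Restrict the graph to vertices other than "carol".
without_carol = graph.subgraph(vpred=lambda vid, attr: attr != "carol")
```

In this sketch, vertices that receive no message (vertex 1 here) simply do not appear in the result of `aggregate_messages`.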

2015

  • https://amplab.cs.berkeley.edu/projects/graphx/
    • QUOTE: Increasingly, data-science applications require the creation, manipulation, and analysis of large graphs ranging from social networks to language models. While existing graph systems (e.g., GraphBuilder, Titan, and Giraph) address specific stages of a typical graph-analytics pipeline (e.g., graph construction, querying, or computation), they do not address the entire pipeline, forcing the user to deal with multiple systems, complex and brittle file interfaces, and inefficient data-movement and duplication.
      The GraphX project unifies graphs and tables enabling users to express an entire graph analytics pipeline within a single system. The GraphX interactive API makes it easy to build, query, and compute on large distributed graphs. In addition, GraphX includes a growing repository of graph algorithms for a range of analytics tasks. By casting recent advances in graph processing systems as distributed join optimizations, GraphX is able to achieve performance comparable to specialized graph processing systems while exposing a more flexible API. By building on top of recent advances in data-parallel systems, GraphX is able to achieve fault-tolerance while retaining in-memory performance and without the need for explicit checkpoint recovery.
      GraphX is available as part of the Spark Apache Incubator project as of version 0.9.0, and the active research version of GraphX can be obtained from the github project page.
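The idea of treating a graph as tables, with graph operations expressed as joins, can be sketched in a few lines. This is a single-machine illustration under stated assumptions, not GraphX code: the `vertex_table` and `edge_table` layouts and the `triplets` helper are hypothetical, and the join is a plain in-memory hash join rather than a distributed one.

```python
# Vertex table: (vertex id, property). Edge table: (source id, destination id, property).
vertex_table = [(1, "alice"), (2, "bob"), (3, "carol")]
edge_table = [(1, 2, "follows"), (3, 2, "follows"), (2, 3, "blocks")]


def triplets(vertex_rows, edge_rows):
    """Join each edge with the properties of both of its endpoints."""
    props = dict(vertex_rows)  # hash index on vertex id (build side of the join)
    return [
        (src, props[src], attr, dst, props[dst])
        for src, dst, attr in edge_rows
        if src in props and dst in props  # inner join: drop dangling edges
    ]


triplet_rows = triplets(vertex_table, edge_table)

# An ordinary table-style aggregation over the joined rows:
# count "follows" edges per destination vertex name.
followers = {}
for _src, _src_name, attr, _dst, dst_name in triplet_rows:
    if attr == "follows":
        followers[dst_name] = followers.get(dst_name, 0) + 1
```

The joined rows can be filtered and grouped like any other table, which is the sense in which one pipeline can mix graph and tabular steps.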