Monday, 19 October 2015
PAGE: A Partition Aware Engine for Parallel Graph Computation
Graph partition quality affects the overall performance of parallel graph computation systems. The quality of a graph partition is measured by the balance factor and edge cut ratio. A balanced graph partition with small edge cut ratio is generally preferred since it reduces the expensive network communication cost. However, according to an empirical study on Giraph, the performance over well partitioned graph might be even two times worse than simple random partitions. This is because these systems only optimize for the simple partition strategies and cannot efficiently handle the increasing workload of local message processing when a high quality graph partition is used. In this paper, we propose a novel partition aware graph computation engine named PAGE, which equips a new message processor and a dynamic concurrency control model. The new message processor concurrently processes local and remote messages in a unified way. The dynamic model adaptively adjusts the concurrency of the processor based on the online statistics. The experimental evaluation demonstrates the superiority of PAGE over the graph partitions with various qualities.
The main aim is to generate PAGE for an efficient and general parallel graph computation engine.
A partition aware graph computation engine named PAGE that monitors three high-level key running metrics and dynamically adjusts the system configurations.
Graph computation systems. Parallel graph computation is a popular technique to process and analyze large scale graphs. Different from traditional big data analysis frameworks (e.g., Map Reduce), most of graph computation systems store graph data in memory and cooperate with other computing nodes via message passing interface. Besides, these systems adopt the vertex-centric programming model and release users from the tedious communication protocol. Such systems can also provide fault-tolerant and high scalability compared to the traditional graph processing libraries, such as Parallel BGL and CGMgraph. There exist several excellent systems, like Pregel, Giraph, GPS, Trinity.
Graph partitioning algorithms. To evaluate the performance of distributed graph algorithm, Ma et al. introduced three measures, which are visit times, makespan and data shipment. As efficiency (makespan) remains the dominant factor, they suggested to sacrifice visit times and data shipment for makespan, which advocates a well-balanced graph partition strategy when designing distributed algorithms. Actually, various graph partitioning algorithms focused on this object as well. METIS is an off-line graph partitioning package which can bring off high quality graph partitioning subject to a variety of requirements. But it is expensive to use METIS partitioning large graphs.
· The existing graph computation systems may suffer from the high-quality graph partition at a certain point.
· Existing graph computation systems cannot efficiently exploit the benefit of high quality graph partitions.
In this project, we present a novel graph computation engine, partition aware graph computation engine (PAGE). To efficiently support computation tasks with different partitioning qualities, we develop some unique components in this new framework. First, in PAGE’s worker, communication module is extended with a new dual concurrent message processor. The message processor concurrently handles both local and remote incoming messages in a unified way, thus accelerating the message processing. Furthermore, the concurrency of the message processor is tunable according to the online statistics of the system. Second, a partition aware module is added in each worker to monitor the partition related characters and adjust the concurrency of the message processor adaptively to fit the online workload.
The work of dynamic graph repartition and PAGE are orthogonal. To balance the workload, the proposed strategies repartition the graph according to the online workload. Thus the quality of underlying graph partition changes along with repartitioning. The existing graph computation systems may suffer from the high-quality graph partition at a certain point, but PAGE can mitigate this drawback and improve the performance by decreasing the cost of a worker further.
· PAGE can mitigate the drawback and improve the performance by decreasing the cost of a worker further.
· To efficiently support computation tasks with different partitioning qualities, we develop some unique components in this new framework.
· PAGE can effectively harness the partition information to guide parallel processing resource allocation, and improve the computation performance.
· We introduce a dual concurrent message processor. The message processor concurrently processes incoming messages in a unified way and is the cornerstone in PAGE.
· We present a dynamic concurrency control model. The model estimates concurrency for dual concurrent message processor by satisfying the producer consumer constraints. The model always generates proper configurations for PAGE when the graph applications or underlying graph partitions change.
· Speed - 1.1 Ghz
· RAM - 256 MB(min)
· Hard Disk - 20 GB
· Floppy Drive - 1.44 MB
· Key Board - Standard Windows Keyboard
· Mouse - Two or Three Button Mouse
· Monitor - SVGA
· Operating System : Windows 7
· Front End : JSP AND SERVLET
· Database : MYSQL
Yingxia Shao; Bin Cui; Lin Ma “PAGE: A PARTITION AWARE ENGINE FOR PARALLEL GRAPH COMPUTATION” Knowledge and Data Engineering, IEEE Transactions on May 2014.