Mining from a big graph those subgraphs that satisfy certain requirement is compute-intensive. A natural solution to tackle the high complexity is to utilizes many CPU cores for parallel execution. Unfortunately, existing Big Data frameworks are dominantly data-intensive where communication caused by data movement is the bottleneck, and CPU cores are underutilized. Solving compute-intensive graph mining problems using these frameworks often results in even worse performance.
For example, Chu and Cheng reported that the state-of-the-art MapReduce algorithm for triangle counting uses 1,636 machines and takes 5.33 minutes on a small graph, on which their single-threaded algorithm uses less than 0.5 minute. McSherry et al. reported that existing "think like a vertex" parallel graph processing systems (designed for data-intensive iterative computation) is comparable and sometimes beaten by a single-threaded solution.
Our G-thinker system fundamentally changes the data-intensive design of existing Big Data frameworks. It adopts a task-based execution engine that is able to keep CPUs in a cluster fully utilized even with a budget limit on memory consumption. It also provides a subgraph-centric API that is natural for writing graph mining algorithms. G-thinker is orders of magnitude faster than existing systems and scales to graphs orders of magnitude larger given the same hardware environment.