Divide and conquer pattern searching
by Staff Writers
Thuwal, Saudi Arabia (SPX) Jan 03, 2017
Searching for recurring patterns in network systems has become a fundamental part of research and discovery in fields as diverse as biology and social media.
KAUST researchers have developed a pattern-mining, or graph-mining, framework that promises to significantly speed up searches on massive network data sets.
"A graph is a data structure that models complex relationships among objects," explained Panagiotis Kalnis, leader of the research team from the KAUST Extreme Computing Research Center.
"Graphs are widely used in many modern applications, including social networks, biological networks like protein-to-protein interactions, and communication networks like the internet."
In these applications, one of the most important operations is the process of finding recurring graphs that reveal how objects tend to connect to each other.
The process, which is called frequent subgraph mining (FSM), is an essential building block of many knowledge extraction techniques in social studies, bioinformatics and image processing, as well as in security and fraud detection. However, graphs may contain hundreds of millions of objects and billions of relationships, which means that extracting recurring patterns places huge demands on time and computing resources.
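To make the idea of a "recurring pattern" concrete, the sketch below counts only the simplest possible pattern, a single edge between two labeled objects, and keeps those that occur at least a minimum number of times. This is an illustration and not the researchers' method: real FSM grows patterns to many nodes and must test subgraph isomorphism, which is where the cost comes from. The function name, the labels and the threshold are all hypothetical.

```python
from collections import Counter

def frequent_edge_patterns(node_labels, edges, min_support):
    """Return labeled single-edge patterns seen at least min_support times.

    Hypothetical helper for illustration; it covers only one-edge patterns.
    """
    counts = Counter()
    for u, v in edges:
        # Sort the label pair so an undirected edge counts the same either way.
        pattern = tuple(sorted((node_labels[u], node_labels[v])))
        counts[pattern] += 1
    return {p: c for p, c in counts.items() if c >= min_support}

# A toy social graph: users linked to each other and to pages (made-up data).
node_labels = {1: "user", 2: "user", 3: "page", 4: "user", 5: "page"}
edges = [(1, 2), (1, 3), (2, 3), (4, 5), (2, 4)]

result = frequent_edge_patterns(node_labels, edges, min_support=2)
print(result)
```

On this toy graph the user-to-page link appears three times and the user-to-user link twice, so both pass a threshold of two. Raising the threshold to three would leave only the user-to-page pattern.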
"In essence, if we can provide a better algorithm, all the applications that depend on FSM will be able to perform deeper analysis on larger data in less time," Kalnis noted.
Kalnis and his colleagues developed a system called ScaleMine that runs up to ten times faster than existing methods.
"FSM involves a vast number of graph operations, each of which is computationally expensive, so the only practical way to support FSM in large graphs is by massively parallel computation," he said.
In parallel computing, the graph search is divided into multiple tasks and each is run simultaneously on its own processor.
If the tasks are too large, the entire search is held up by waiting for the slowest task to complete; if the tasks are too small, the extra communication needed to coordinate the parallelization becomes a significant additional computational load.
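The trade-off can be seen in a rough cost model. The sketch below assumes a fixed coordination overhead per task and a simple greedy scheduler that hands each task to whichever processor frees up first; the workloads, the overhead value and the function name are invented for illustration and do not come from the KAUST work.

```python
import heapq

def makespan(task_costs, workers, overhead):
    """Finish time of the busiest worker under greedy scheduling.

    Every task pays a fixed coordination overhead (an assumed cost model).
    """
    loads = [0.0] * workers
    heapq.heapify(loads)
    # Place the largest tasks first, each on the currently least-loaded worker.
    for cost in sorted(task_costs, reverse=True):
        lightest = heapq.heappop(loads)
        heapq.heappush(loads, lightest + cost + overhead)
    return max(loads)

WORKERS = 4
OVERHEAD = 1.0  # hypothetical per-task coordination cost

# The same 120 units of work, cut three different ways.
coarse = makespan([60, 30, 20, 10], WORKERS, OVERHEAD)  # few, uneven tasks
medium = makespan([10] * 12, WORKERS, OVERHEAD)         # balanced tasks
fine = makespan([0.1] * 1200, WORKERS, OVERHEAD)        # many tiny tasks

print(coarse, medium, fine)
```

With these made-up numbers the coarse split is held back by its single 60-unit task, the fine split spends most of its time on overhead, and the medium split finishes first.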
Kalnis' team overcame this limitation by performing the search in two steps: a first approximation step that determines the search space and the optimal division of tasks, and a second computational step in which large tasks are split dynamically into the optimal number of subtasks. This resulted in search speeds up to ten times faster than previously possible.
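A minimal sketch of the planning idea, assuming the approximation step has already produced a cost estimate for each task: any task estimated to exceed a fair per-processor share is split into enough subtasks to bring each piece under that share. The splitting rule, the function name and the numbers are assumptions made for illustration; the article does not describe ScaleMine's actual rule, and the real system splits tasks dynamically at run time.

```python
import math

def plan_subtasks(estimated_costs, workers):
    """Choose how many subtasks each task should be split into.

    estimated_costs maps a task name to its approximate cost, assumed to
    come from a cheap first pass. The rule used here is hypothetical.
    """
    # A fair share of the total estimated work for one worker.
    target = sum(estimated_costs.values()) / workers
    plan = {}
    for task, cost in estimated_costs.items():
        # Tasks at or below the fair share stay whole; larger ones are split.
        plan[task] = max(1, math.ceil(cost / target))
    return plan

# Made-up estimates for four search tasks on four processors.
estimates = {"A": 60, "B": 30, "C": 20, "D": 10}
plan = plan_subtasks(estimates, workers=4)
print(plan)
```

Here the fair share is 30 units, so only task A is split, into two subtasks, while the others run as they are.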
"Hopefully this performance improvement will enable deeper and more accurate analysis of large graph data and the extraction of new knowledge," Kalnis said.
King Abdullah University of Science and Technology