Guest Editorial: Special Section: Advances in Distributed Computing and Data Analysis

The Special Section on Advances in Distributed Computing and Data Analysis was inspired by the 17th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT 2016). Based on the review results obtained during conference preparation, ten papers within the scope of this section were selected for possible inclusion. After two rounds of rigorous review, we accepted six papers, each extended by more than 40% relative to its conference version. These papers present interesting algorithms and promising techniques in the field of distributed computing and data analysis.

In the first paper, "Imbalanced Data Classification Based on Hybrid Re-sampling and Twin Support Vector Machine", the authors proposed a technique that combines hybrid re-sampling with a twin support vector machine (TWSVM) to identify the minority class in imbalanced datasets. It employed over-sampling and under-sampling to balance the training data, which improved classification accuracy over the whole dataset. Their experiments also showed improved efficiency in classifying imbalanced data.
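To illustrate the general idea of hybrid re-sampling, the following is a minimal Python sketch. It is not the authors' method: it assumes plain random over-sampling of the minority class and random under-sampling of the majority class, and the helper name `hybrid_resample` and the midpoint target size are hypothetical choices made only for this illustration.

```python
import random
from collections import Counter


def hybrid_resample(samples, labels, seed=0):
    """Balance a two-class dataset by random over- and under-sampling.

    Hypothetical helper (an assumption, not taken from the paper): both
    classes are brought to the midpoint of the two original class sizes.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    if len(counts) != 2:
        raise ValueError("expected exactly two classes")
    (major, n_major), (minor, n_minor) = counts.most_common(2)
    target = (n_major + n_minor) // 2

    major_idx = [i for i, y in enumerate(labels) if y == major]
    minor_idx = [i for i, y in enumerate(labels) if y == minor]

    # Under-sample the majority class without replacement.
    kept_major = rng.sample(major_idx, target)
    # Over-sample the minority class with replacement, keeping all originals.
    extra_minor = [rng.choice(minor_idx) for _ in range(target - n_minor)]

    chosen = kept_major + minor_idx + extra_minor
    rng.shuffle(chosen)
    return [samples[i] for i in chosen], [labels[i] for i in chosen]


# Toy data: 90 majority samples (label 0) and 10 minority samples (label 1).
X = list(range(100))
y = [0] * 90 + [1] * 10
X_bal, y_bal = hybrid_resample(X, y)
```

The balanced data could then be passed to any classifier; the twin support vector machine used in the paper is not reproduced here.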

The paper "Promising Techniques for Anomaly Detection on Network Traffic" discussed anomaly detection techniques based on the analysis of global traffic. The authors introduced Principal Component Analysis-based and Diffusion Wavelets-based analysis techniques in detail. Compared with various other anomaly detection methods, these techniques performed better in global traffic analysis.

Tao Jiang et al.’s paper "BHyberCube: a MapReduce Aware Heterogeneous Architecture for Data Center" proposed a new heterogeneous network, the BHyberCube network (BHC), for the distributed data processing application MapReduce. They addressed heterogeneous nodes and scalability issues by considering how MapReduce is implemented on existing topologies. Their simulations of BHC under multi-job injection and under different probabilities of communication among worker servers showed that BHC could be a viable interconnection topology for MapReduce in today’s data centers.

In the paper "Click-Boosted Graph Ranking for Image Retrieval", Jun Wu et al. proposed a novel click-boosted graph ranking framework for image retrieval, which addressed the limited retrieval effectiveness caused by the well-known semantic gap in image data. The framework consisted of two coupled components. The first was a click predictor based on matrix factorization with visual regularization, which was used to alleviate the sparseness of the click-through data. The second was a soft-label graph ranker that ranked images using the enriched click-through data in a noise-tolerant manner. The proposed method was effective for both click prediction and image ranking.

The paper "A Weighted Mutual Information Biclustering Algorithm for Gene Expression Data" presented a novel biclustering algorithm, the Weighted Mutual Information Biclustering algorithm (WMIB), to discover local characteristics of gene expression data. Traditional clustering methods have difficulty with such high-dimensional data, in which a subset of genes is co-regulated under only a subset of conditions. The algorithm used weighted mutual information as a new similarity measure that can simultaneously detect complex linear and nonlinear relationships between genes. In experiments on yeast gene expression data, the algorithm generated larger biclusters with lower mean squared residues.
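For readers unfamiliar with the evaluation metric, the following Python sketch computes the mean squared residue of a bicluster. It assumes the standard definition introduced by Cheng and Church, in which each entry is compared with its row mean, its column mean, and the overall mean of the bicluster; whether the paper uses exactly this variant is an assumption, and the function name and toy matrices are hypothetical.

```python
def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue of the bicluster given by row and column indices.

    Assumed definition: the average over the bicluster of
    (a_ij - row_mean_i - col_mean_j + overall_mean) ** 2.
    """
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(sub), len(sub[0])

    row_means = [sum(row) / n_cols for row in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall = sum(row_means) / n_rows

    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            residue = sub[i][j] - row_means[i] - col_means[j] + overall
            total += residue ** 2
    return total / (n_rows * n_cols)


# Rows that differ only by a constant shift form a perfectly coherent bicluster.
coherent = [[1, 2, 3], [2, 3, 4], [4, 5, 6]]
msr_coherent = mean_squared_residue(coherent, [0, 1, 2], [0, 1, 2])

# A checkerboard pattern has no additive structure, so its residue is larger.
incoherent = [[1, 0], [0, 1]]
msr_incoherent = mean_squared_residue(incoherent, [0, 1], [0, 1])
```

Under this definition a lower value indicates a more coherent bicluster, which is why larger biclusters with lower residues are regarded as the better result.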

In the last paper of this special section, "An Optimization Scheme for Routing and Scheduling of Concurrent User Requests in Wireless Mesh Networks", Z. Cao et al. constructed analytical network models and formulated multi-pair data transfers as a rigorous optimization problem. They proposed an optimization scheme for cooperative routing and scheduling, together with channel assignment, that establishes a network path for each request by selecting appropriate link patterns. The superior performance of their scheme was illustrated in experiments on various types of mesh networks.

PDCAT is an annual international conference covering the theory, design, analysis, evaluation and application of parallel and distributed computing systems. It started in Hong Kong in 2000 and was followed by successful editions in Taipei, China; Kanazawa, Japan; Chengdu, China; Singapore; Dalian, China; Adelaide, Australia; Dunedin, New Zealand; Hiroshima, Japan; Wuhan, China; Gwangju, Korea; Beijing; Jeju, Korea; and then Guangzhou, China in 2016. PDCAT 2016 had the support of Sun Yat-Sen University and the IEEE Computer Society Technical Committee on Parallel Processing. The conference aims to foster closer collaboration across different areas on the latest research problems, innovations, trends, and needs in parallel computing.

We sincerely thank the program committee members for their support in selecting papers, and especially the reviewers for their valuable comments, which helped improve the selected papers. We also thank all authors for their contributions to this special section. Special thanks go to Prof. Mirjana Ivanović, the Editor in Chief of ComSIS, for giving us the opportunity to publish this special section, for her valuable comments on improving the quality of the selected papers, and for her support throughout the whole process.

Guest Editor
Hui Tian

University of Adelaide
Beijing Jiaotong University