Sharing Models without Sharing Data: Distributed Consensus Reduced Support Vector Machine

Yuh-Jye Lee

Department of Applied Mathematics, National Chiao Tung University

Machine learning now performs remarkably well in many fields, and its methods generally produce better results as more data become available. In some cases, however, data owners may be unwilling or not permitted to share their data. In other cases, a data set may be too large to store on a single machine. To address both problems, we propose the distributed consensus reduced support vector machine (DCRSVM) for binary classification. Imagine a set of local working units and one central master. DCRSVM allows the local working units to share their local models without sharing their own data. By iteratively sharing and updating the local models, the central master generates a final consensus model whose performance is approximately as good as that of a model trained on all of the local working units’ data together. Similarly, to train on an extremely large data set, we can divide it into partitions and dispatch them to many computation units. Thus, the proposed method satisfies the requirement that no data be shared.
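
The share-models-not-data loop described above can be illustrated with a small sketch. It is not the DCRSVM algorithm itself: the abstract does not specify the optimization scheme, so this example assumes consensus ADMM, and it swaps the reduced SVM objective for a regularized least-squares classifier on ±1 labels so that each local update has a closed form. The names `local_update` and `consensus_train` and the parameters `lam`, `rho`, and `n_iter` are hypothetical choices made for illustration.

```python
import numpy as np

def local_update(X, y, z, u, rho):
    """Runs on one working unit. Only the returned weight vector is sent out;
    the local data (X, y) never leaves the unit."""
    d = X.shape[1]
    A = X.T @ X + rho * np.eye(d)
    b = X.T @ y + rho * (z - u)
    return np.linalg.solve(A, b)

def consensus_train(partitions, lam=1.0, rho=100.0, n_iter=300):
    """Master loop (assumed consensus ADMM): gather local models, form the
    consensus model z, and send z back to the units."""
    d = partitions[0][0].shape[1]
    n = len(partitions)
    z = np.zeros(d)        # consensus model held by the master
    U = np.zeros((n, d))   # scaled dual variables, one per unit
                           # (kept here for brevity; each could live on its unit)
    for _ in range(n_iter):
        # Each unit fits its own model, pulled toward the current consensus.
        W = np.array([local_update(X, y, z, U[i], rho)
                      for i, (X, y) in enumerate(partitions)])
        # Master combines the local models; lam is the ridge penalty on z.
        z = rho * (W + U).sum(axis=0) / (lam + n * rho)
        # Dual update measures each unit's disagreement with the consensus.
        U += W - z
    return z

# Synthetic data: four working units, each holding a private partition.
rng = np.random.default_rng(0)
w_true = rng.normal(size=5)
partitions = []
for _ in range(4):
    X = rng.normal(size=(100, 5))
    y = np.sign(X @ w_true)
    partitions.append((X, y))

z = consensus_train(partitions)

# Reference model trained on the pooled data, for comparison only.
X_all = np.vstack([X for X, _ in partitions])
y_all = np.concatenate([y for _, y in partitions])
w_central = np.linalg.solve(X_all.T @ X_all + 1.0 * np.eye(5), X_all.T @ y_all)
```

Under these assumptions the consensus vector `z` converges to the same solution as `w_central`, the model trained on the pooled data, even though the master only ever sees weight vectors. The actual DCRSVM may differ in its loss, kernel reduction, and update rules.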