FedPS is a Python module designed for data preprocessing in Federated Learning, primarily leveraging aggregated statistics. The preprocessing workflow involves the following five steps:
- Local Statistics Estimation: Clients estimate local statistics from their local data.
 - Aggregation: The server receives the local statistics and performs aggregation.
 - Global Parameter Calculation: The server calculates the global preprocessing parameters.
 - Parameter Distribution: The global parameters are then sent back to the clients.
 - Data Preprocessing: Clients apply the preprocessing to their local data.
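
The five steps above can be sketched in a single process. This is a minimal illustration using standardization as the preprocessing step, not FedPS's actual implementation: the function names are hypothetical, the network exchange is simulated with plain function calls, and the sum-of-squares variance formula is chosen for brevity rather than numerical robustness.

```python
import math

def local_statistics(values):
    """Step 1 (client): summarize local data without sharing raw values."""
    return {"n": len(values), "sum": sum(values), "sum_sq": sum(v * v for v in values)}

def aggregate(stats_list):
    """Step 2 (server): add up the local summaries key by key."""
    return {key: sum(s[key] for s in stats_list) for key in ("n", "sum", "sum_sq")}

def global_parameters(agg):
    """Step 3 (server): derive the global mean and population standard deviation."""
    mean = agg["sum"] / agg["n"]
    var = agg["sum_sq"] / agg["n"] - mean * mean
    return {"mean": mean, "std": math.sqrt(max(var, 0.0))}

def preprocess(values, params):
    """Step 5 (client): standardize local data with the global parameters."""
    return [(v - params["mean"]) / params["std"] for v in values]

# Toy data: one feature held by two clients.
client_data = {"client1": [2.0, 6.0], "client2": [10.0, 18.0]}

local = [local_statistics(v) for v in client_data.values()]  # step 1
params = global_parameters(aggregate(local))                 # steps 2-3
# Step 4: the server would send `params` back to each client (simulated by reuse).
scaled = {name: preprocess(v, params) for name, v in client_data.items()}  # step 5
print(params)
```

The raw values never leave a client in this sketch; only the count, sum, and sum of squares are shared, and these are enough to recover the mean and standard deviation of the pooled data.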
 
- Python (>= 3.10)
 - Scikit-learn (~= 1.7)
 - NumPy (>= 1.20)
 - DataSketches
 - PyZMQ
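
To check which of these dependencies are already present in an environment, a short script along the following lines may help. It relies only on the standard library; the PyPI distribution names in `REQUIRED` are assumed to correspond to the packages listed above, and the helper name `installed_versions` is hypothetical.

```python
from importlib import metadata

# Assumed PyPI distribution names for the dependencies listed above.
REQUIRED = ["scikit-learn", "numpy", "datasketches", "pyzmq"]

def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

for name, version in installed_versions(REQUIRED).items():
    print(f"{name}: {version or 'not installed'}")
```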
 
- Create a Python environment

      conda create --name fedps python=3.10
      conda activate fedps

- Clone this project

      git clone https://github.com/xuefeng-xu/fedps.git && cd fedps

- Build the project

      pip install -e .

- Set up communication channels
 
      # Client1 channel
      from fedps.channel import ClientChannel
      channel = ClientChannel(
          local_ip="127.0.0.1", local_port=5556,
          remote_ip="127.0.0.1", remote_port=5555,
      )

      # Client2 channel
      from fedps.channel import ClientChannel
      channel = ClientChannel(
          local_ip="127.0.0.1", local_port=5557,
          remote_ip="127.0.0.1", remote_port=5555,
      )

      # Server channel
      from fedps.channel import ServerChannel
      channel = ServerChannel(
          local_ip="127.0.0.1", local_port=5555,
          remote_ip=["127.0.0.1", "127.0.0.1"],
          remote_port=[5556, 5557],
      )

- Specify `FL_type` and `role` in the preprocessor
  - `FL_type`: "H" (Horizontal) or "V" (Vertical)
  - `role`: "client" or "server"
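
For readers new to the two settings, the sketch below shows the usual meaning of horizontal and vertical partitioning in Federated Learning on a toy table. It only illustrates how the data is split among clients; the variable names are hypothetical, and it does not show how FedPS handles either case internally.

```python
# Toy dataset: 4 samples (rows) x 3 features (columns).
X = [
    [1, 10, 100],
    [2, 20, 200],
    [3, 30, 300],
    [4, 40, 400],
]

# Horizontal ("H"): clients hold different samples (rows) with the same features.
h_client1, h_client2 = X[:2], X[2:]

# Vertical ("V"): clients hold the same samples but different features (columns).
v_client1 = [row[:2] for row in X]
v_client2 = [row[2:] for row in X]

print(h_client1, h_client2)
print(v_client1, v_client2)
```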

      # Client1 code example
      from fedps.preprocessing import MinMaxScaler
      X = [[-1, 2], [-0.5, 6]]
      est = MinMaxScaler(FL_type="H", role="client", channel=channel)
      Xt = est.fit_transform(X)
      print(Xt)

      # Client2 code example
      from fedps.preprocessing import MinMaxScaler
      X = [[0, 10], [1, 18]]
      est = MinMaxScaler(FL_type="H", role="client", channel=channel)
      Xt = est.fit_transform(X)
      print(Xt)

      # Server code example
      from fedps.preprocessing import MinMaxScaler
      est = MinMaxScaler(FL_type="H", role="server", channel=channel)
      est.fit()

- Run the script

      # Run in three terminals
      python client1.py
      python client2.py
      python server.py

PS: See more cases in the example folder.
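
As a sanity check for the MinMaxScaler example, the following sketch computes what centralized min-max scaling to the range [0, 1] yields when the two clients' rows are pooled. It assumes the federated result in the horizontal setting should match the centralized one; it is not output captured from FedPS, and the helper `min_max_scale` is hypothetical.

```python
def min_max_scale(rows, col_min, col_max):
    """Scale each column to [0, 1] using the given per-column minimum and maximum."""
    return [
        [(v - lo) / (hi - lo) for v, lo, hi in zip(row, col_min, col_max)]
        for row in rows
    ]

X1 = [[-1, 2], [-0.5, 6]]  # Client1 data from the example
X2 = [[0, 10], [1, 18]]    # Client2 data from the example

# Pool the rows only to obtain the reference min/max per column.
pooled = X1 + X2
col_min = [min(col) for col in zip(*pooled)]
col_max = [max(col) for col in zip(*pooled)]

print(min_max_scale(X1, col_min, col_max))  # [[0.0, 0.0], [0.25, 0.25]]
print(min_max_scale(X2, col_min, col_max))  # [[0.5, 0.5], [1.0, 1.0]]
```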
- Discretization
- Encoding
- Scaling
- Transformation
- Imputation
  - `IterativeImputer` (experimental)
  - `KNNImputer`
  - `SimpleImputer`
 
- Currently, this library does not support sparse data.
- `KBinsDiscretizer`, `StandardScaler`, and `SplineTransformer` do not accept the `sample_weight` parameter in their `fit` methods.
- `KBinsDiscretizer` does not support the `quantile_method` parameter.
- `IterativeImputer` does not support the `sample_posterior` and `n_nearest_features` parameters.
- `KNNImputer` does not support custom weight functions or distance metrics.

This project is built on Scikit-learn.