Metadata-Version: 2.4
Name: syft-client
Version: 0.1.87
Summary: A simple client library for setting up secure communication channels using Google Drive
Author-email: OpenMined <info@openmined.org>
License: Apache-2.0
Project-URL: Homepage, https://github.com/OpenMined/syft-client
Project-URL: Repository, https://github.com/OpenMined/syft-client
Project-URL: Documentation, https://github.com/OpenMined/syft-client#readme
Project-URL: Bug Tracker, https://github.com/OpenMined/syft-client/issues
Keywords: privacy,federated-learning,google-drive,communication
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Security :: Cryptography
Classifier: Topic :: Communications :: File Sharing
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: google-api-python-client>=2.95.0
Requires-Dist: google-auth>=2.22.0
Requires-Dist: google-auth-oauthlib>=1.0.0
Requires-Dist: dnspython>=2.3.0
Requires-Dist: cryptography>=41.0.0
Requires-Dist: rich>=13.0.0
Requires-Dist: watchdog>=3.0.0
Requires-Dist: syft-serve>=0.3.11
Requires-Dist: requests>=2.32.4
Requires-Dist: pandas
Provides-Extra: job
Requires-Dist: syft-job; extra == "job"
Provides-Extra: test
Requires-Dist: pytest>=7.0.0; extra == "test"
Requires-Dist: pytest-mock>=3.10.0; extra == "test"
Requires-Dist: pytest-cov>=4.0.0; extra == "test"
Requires-Dist: pytest-xdist>=3.0.0; extra == "test"
Requires-Dist: pytest-asyncio>=0.21.0; extra == "test"
Requires-Dist: black>=23.0.0; extra == "test"
Requires-Dist: flake8>=6.0.0; extra == "test"
Requires-Dist: mypy>=1.0.0; extra == "test"
Requires-Dist: isort>=5.12.0; extra == "test"
Requires-Dist: coverage>=7.0.0; extra == "test"
Provides-Extra: dev
Requires-Dist: jupyter>=1.0.0; extra == "dev"
Requires-Dist: notebook>=6.5.0; extra == "dev"
Requires-Dist: ipython>=8.0.0; extra == "dev"
Requires-Dist: tqdm>=4.64.0; extra == "dev"
Provides-Extra: all
Requires-Dist: syft-client[dev,job,test]; extra == "all"

# Syft Client

Syft Client is a high-level client object that bundles modular components for running map/reduce bash scripts (and any file-based resources those scripts require) across a decentralized, peer-to-peer network of computers. The scripts are designed and tested against local mock versions of the remote, private files located on those computers. Peers connect through whichever transport layers the two organizations already trust (e.g. Google Drive, Dropbox, Microsoft 365).

# Principles

- **File-first:** State is first and foremost described by files, which are synced among peers on the network to communicate updates to state. Client-side state can be made directly viewable through client-side applications that display the local files in a convenient manner (but not files that are unavailable to the local filesystem, unless one submits a job to someone else to query their state). State can also be stored or cached locally in more performant ways (e.g. client- or server-side databases, indexes), but this is a secondary, optional optimization. All state is first and foremost made available as a file. Future development may explore local, in-memory file storage, akin to how Spark added in-memory storage on top of Hadoop's HDFS, but this would still adhere to the same hierarchical addressing system of folders and files.
- **File-permission-first, job-policy second, nothing third:** Access control is first and foremost described by permissioned access to files on the system. Files are made available between users at a low level; requests to change file permissions, or to grant any other access, are managed by the job policy framework. No other permission system exists outside of file permissions and job policies (including manual code review policy).
- **Offline-first:** Datasites (i.e. computers/users/peers) in the syft network can go offline and come back online as desired, and all functionality continues to work: messages from a datasite are cached locally while it is offline, and messages to a datasite are cached in the transport layers (e.g. Google Drive) until it comes back online. Datasites that are online at the same time can use faster, ephemeral transport layers (e.g. WebRTC), but this is an optional upgrade, not the foundation of the system.
- **Shell-first:** Functions are first and foremost described as shell scripts (run.sh) inside of a folder of resources (job_23j3ijgw/).
- **Schema-last:** For the core syft protocol, we seek to require only schemas or dependencies which all computers in the world are highly likely to already have (e.g. some kind of local filesystem, some kind of local shell, and internet access), with anything else built on top of this core layer left for users to steward.
- **Fail-softly:** When a client seeks to use state owned by another computer but the recipient isn't storing data in the way the client wishes, the server doesn't reject the job entirely (e.g. as it would if you tried to call an API endpoint that doesn't exist). Instead, the shell script that was attempted generates an error which (with the data owner's permission) the data scientist can then view to debug what's going wrong. (see client-first-work)
- **Manual-review-first:** We assume that all shell scripts sent to a data owner will be manually reviewed unless the data owner happens to have a policy which can automatically approve it.
- **DataScientist-first-debugging:** When something goes wrong, we build tools which first and foremost enable the client to be the one doing the work to come up with a solution (as opposed to all the world's data owners needing to collaborate, such as to normalize their data together).
- **Peer-first:** Discoverability on the syft network is assumed to happen somewhere else (e.g. companion websites like SyftHub). The syft protocol is like Signal in this way: if someone is not a contact you have explicitly authorized, you don't know they exist on the network, and nobody outside your contact list knows you exist on it.
- **Modular-first:** When in doubt, we separate into optional modules such that upgrades to one doesn't require upgrades to the rest of the system.
- **MapReduce-first:** All interactions between data scientist and data owner personas are first and foremost viewed through the lens of a MapReduce system.
- **Unopinionated-first, Convenient-second:** Unless we need to express an opinion on how users should do things, we seek for the core layers to be completely unopinionated (e.g. a glorified email inbox for bash scripts and supporting files). On top of that highly unopinionated core, we add optional convenience layers to make certain actions easier for the end user.
- **Transport-agnostic:** Nothing about the syft protocol requires the use of any particular transport layer. Any means of getting strings from one (uniquely addressed) user to another (uniquely addressed) user is a viable transport layer for the syft network.
- **Transport-based-auth first, any auth second:** A user on the system is identified by "a channel I can send messages to". Consequently, we bootstrap identity using the authentication of existing transport layers, but this can be used to exchange keys which might have come through some offline process, and can thus protect transport layers which might not have formal authentication (e.g. insecure transport layers).
- **Mock-always:** every piece of state in the syft ecosystem comes with a mock enabling others to leverage that state in a job if they have permission to see the mock.
- **Automock-first:** mock generation should be automatic as the primary way of doing things, with fallbacks to user-generated/helped mocks when privacy norms are unclear, and with no mocks in some specific instances (but this should be rare).
- **Single-gateway only:** there is only one job queue which sends and receives information to/from each datasite, enabling the data owner to collaborate with confidence that they can fully see everything which enters to be run or exits to be disclosed to peers.
- **Job-only:** every request by one user for another user to run computation is done through a job (including one user asking another user for updated information about state on their local machine). High level syntactic sugar might simplify this experience, but under the hood there is only one mechanism for one user to ask another user to run a computation. (this includes anything that might look or feel RPC-esque... under the hood it's also a job)
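
To make a few of these principles concrete (File-first, Shell-first, Manual-review-first, and Fail-softly), here is a minimal sketch of a job as a folder containing a `run.sh`, dropped into an inbox directory and run only after review. It uses only the Python standard library and is not the syft-client API: the names `submit_job`, `review_and_run`, the `inbox` folder, and the `approve` callback are all hypothetical, and a local directory stands in for a real transport layer. It also assumes a POSIX `sh` is available on the machine.

```python
import subprocess
import tempfile
from pathlib import Path


def submit_job(inbox: Path, job_name: str, script: str) -> Path:
    """Hypothetical helper: a job is just a folder holding a run.sh file."""
    job_dir = inbox / job_name
    job_dir.mkdir(parents=True)
    (job_dir / "run.sh").write_text(script)
    return job_dir


def review_and_run(job_dir: Path, approve) -> dict:
    """Hypothetical helper: run run.sh only if the reviewer approves it.

    A failing script does not raise. Its exit status and stderr are
    captured so they could be shared back with the sender for debugging.
    """
    script = (job_dir / "run.sh").read_text()
    if not approve(script):
        return {"status": "rejected", "stdout": "", "stderr": ""}
    result = subprocess.run(
        ["sh", "run.sh"], cwd=job_dir, capture_output=True, text=True
    )
    status = "ok" if result.returncode == 0 else "failed"
    return {"status": status, "stdout": result.stdout, "stderr": result.stderr}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        inbox = Path(tmp) / "inbox"  # stand-in for a synced transport folder
        job = submit_job(inbox, "job_demo", "echo hello\n")
        # The approve callback stands in for a human reading the script.
        print(review_and_run(job, approve=lambda text: "rm " not in text))
```

In this sketch the reviewer is a plain callback, so a manual prompt and an automatic policy would share the same interface. The returned dictionary is one possible shape for the error report that the Fail-softly principle describes.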
