Metadata-Version: 2.4
Name: adlfs
Version: 2026.2.0
Summary: Access Azure Blobs and Data Lake Storage (ADLS) Gen2 with fsspec and dask
Maintainer-email: Greg Hayes <hayesgb@gmail.com>
License: BSD
Keywords: file-system,dask,azure
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: BSD License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE.txt
Requires-Dist: azure-core<2.0.0,>=1.28.0
Requires-Dist: azure-datalake-store<0.1,>=0.0.53
Requires-Dist: azure-identity
Requires-Dist: azure-storage-blob[aio]>=12.17.0
Requires-Dist: fsspec>=2023.12.0
Provides-Extra: docs
Requires-Dist: sphinx; extra == "docs"
Requires-Dist: myst-parser; extra == "docs"
Requires-Dist: furo; extra == "docs"
Requires-Dist: numpydoc; extra == "docs"
Provides-Extra: tests
Requires-Dist: pytest; extra == "tests"
Requires-Dist: docker; extra == "tests"
Requires-Dist: pytest-mock; extra == "tests"
Requires-Dist: arrow; extra == "tests"
Requires-Dist: dask[dataframe]; extra == "tests"
Dynamic: license-file

Filesystem interface to Azure Blob and Data Lake Storage (Gen2) 
------------------------------------------------------------


[![PyPI version shields.io](https://img.shields.io/pypi/v/adlfs.svg)](https://pypi.python.org/pypi/adlfs/)
[![Latest conda-forge version](https://img.shields.io/conda/vn/conda-forge/adlfs?logo=conda-forge)](https://anaconda.org/conda-forge/adlfs)
[![API Reference](https://img.shields.io/badge/API-Reference-blue)](https://fsspec.github.io/adlfs/api/)

Quickstart
----------

This package can be installed using:

`pip install adlfs`

or

`conda install -c conda-forge adlfs`

The `az://` and `abfs://` protocols are included in fsspec's known_implementations registry.

To connect to an Azure Blob Storage or Azure Data Lake Storage (ADLS) Gen2 filesystem, use the `abfs` or `az` protocol:

```python
import dask.dataframe as dd

# ACCOUNT_NAME, ACCOUNT_KEY, CONTAINER, and FOLDER are placeholders for your own values.
storage_options = {'account_name': ACCOUNT_NAME, 'account_key': ACCOUNT_KEY}

ddf = dd.read_csv(f'abfs://{CONTAINER}/{FOLDER}/*.csv', storage_options=storage_options)
ddf = dd.read_parquet(f'az://{CONTAINER}/folder.parquet', storage_options=storage_options)
```

Accepted protocol / URI formats include:

- `PROTOCOL://container/path-part/file`
- `PROTOCOL://container@account.blob.core.windows.net/path-part/file`
- `PROTOCOL://container@account.dfs.core.windows.net/path-part/file`

Alternatively, if `AZURE_STORAGE_ACCOUNT_NAME` and an `AZURE_STORAGE_<CREDENTIAL>` are set as environment variables, `storage_options` will be read from those environment variables.
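To make the accepted URI shapes above concrete, here is a minimal sketch that splits such a URI into protocol, container, optional account host, and path using only the standard library. The helper name `split_azure_uri` is hypothetical and exists only for illustration; it is not part of adlfs and does not claim to reproduce how adlfs parses paths internally.

```python
from urllib.parse import urlsplit


def split_azure_uri(uri):
    """Split an abfs:// or az:// URI into its parts (illustrative helper, not adlfs API)."""
    parts = urlsplit(uri)
    # The network location is either "container" or "container@account-host".
    container, _, account_host = parts.netloc.partition("@")
    return {
        "protocol": parts.scheme,
        "container": container,
        "account_host": account_host or None,
        "path": parts.path.lstrip("/"),
    }


for uri in (
    "abfs://container/path-part/file",
    "az://container@account.blob.core.windows.net/path-part/file",
    "abfs://container@account.dfs.core.windows.net/path-part/file",
):
    print(split_azure_uri(uri))
```

In the first form the account is not part of the URI, so it has to come from `storage_options` or the environment.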

To read from a public storage blob, you must specify the `account_name`.
For example, you can access the [NYC Taxi & Limousine Commission](https://azure.microsoft.com/en-us/services/open-datasets/catalog/nyc-taxi-limousine-commission-green-taxi-trip-records/) green taxi trip records as:

```python
storage_options = {'account_name': 'azureopendatastorage'}
ddf = dd.read_parquet('az://nyctlc/green/puYear=2019/puMonth=*/*.parquet', storage_options=storage_options)
```

Details
-------
The package includes Pythonic filesystem implementations for both [Azure Blobs](https://learn.microsoft.com/en-us/azure/storage/blobs/storage-blobs-overview) and [Azure Data Lake Storage Gen2 (ADLS)](https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-introduction) that facilitate interactions between these storage services and Dask. They are built on the [fsspec/filesystem_spec](https://github.com/fsspec/filesystem_spec) base class and the Azure Python SDKs.

Operations against Azure Blobs and ADLS Gen2 are implemented by leveraging [Azure Blob Storage Python SDK](https://github.com/Azure/azure-sdk-for-python).

### Setting credentials
The `storage_options` can be instantiated with a variety of keyword arguments depending on the filesystem. The most commonly used arguments are:
- `connection_string`
- `account_name`
- `account_key`
- `sas_token`
- `tenant_id`, `client_id`, and `client_secret` are combined for an Azure ServicePrincipal e.g. `storage_options={'account_name': ACCOUNT_NAME, 'tenant_id': TENANT_ID, 'client_id': CLIENT_ID, 'client_secret': CLIENT_SECRET}`
- `anon`: bool, optional.
   The value to use for whether to attempt anonymous access if no other credential is passed. By default (`None`), the
   `AZURE_STORAGE_ANON` environment variable is checked. False values (`false`, `0`, `f`) will resolve to `False` and
   anonymous access will not be attempted. Otherwise the value for `anon` resolves to True.
- `location_mode`: valid values are "primary" or "secondary" and apply to RA-GRS accounts

For details on every argument, see the [`AzureBlobFileSystem` API reference](https://fsspec.github.io/adlfs/api/#adlfs.AzureBlobFileSystem).
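The resolution rule for `anon` described above can be sketched in a few lines. This is an illustration of the documented behavior, not the library's own code: the function name `resolve_anon` and the `environ` parameter are hypothetical, and treating the environment value case-insensitively is an assumption.

```python
import os

# Values of AZURE_STORAGE_ANON that disable anonymous access, per the list above.
FALSE_VALUES = {"false", "0", "f"}


def resolve_anon(anon=None, environ=None):
    """Return the effective `anon` setting (illustrative sketch, not adlfs API)."""
    if anon is not None:
        # An explicitly passed value takes precedence over the environment.
        return bool(anon)
    environ = os.environ if environ is None else environ
    raw = environ.get("AZURE_STORAGE_ANON")
    if raw is None:
        # Variable not set: anonymous access is attempted by default.
        return True
    # Assumption: the comparison ignores letter case.
    return raw.lower() not in FALSE_VALUES


print(resolve_anon(None, {}))
print(resolve_anon(None, {"AZURE_STORAGE_ANON": "false"}))
print(resolve_anon(False, {"AZURE_STORAGE_ANON": "true"}))
```

Passing a plain dictionary as `environ` keeps the sketch easy to try without modifying the real process environment.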

The following environment variables can also be set and will be picked up for authentication:
- "AZURE_STORAGE_CONNECTION_STRING"
- "AZURE_STORAGE_ACCOUNT_NAME"
- "AZURE_STORAGE_ACCOUNT_KEY"
- "AZURE_STORAGE_SAS_TOKEN"
- "AZURE_STORAGE_TENANT_ID"
- "AZURE_STORAGE_CLIENT_ID"
- "AZURE_STORAGE_CLIENT_SECRET"
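adlfs reads these variables on its own, so no extra code is needed to use them. Purely as an illustration of how the variable names line up with the `storage_options` keywords listed earlier, the sketch below builds an equivalent dictionary. The helper name `storage_options_from_env` is hypothetical, and the name-based pairing of variables to keywords is an assumption rather than a statement about adlfs internals.

```python
import os

# Environment variable -> storage_options keyword (paired by name; an assumption).
ENV_TO_OPTION = {
    "AZURE_STORAGE_CONNECTION_STRING": "connection_string",
    "AZURE_STORAGE_ACCOUNT_NAME": "account_name",
    "AZURE_STORAGE_ACCOUNT_KEY": "account_key",
    "AZURE_STORAGE_SAS_TOKEN": "sas_token",
    "AZURE_STORAGE_TENANT_ID": "tenant_id",
    "AZURE_STORAGE_CLIENT_ID": "client_id",
    "AZURE_STORAGE_CLIENT_SECRET": "client_secret",
}


def storage_options_from_env(environ=None):
    """Collect the variables that are set into a storage_options-style dict."""
    environ = os.environ if environ is None else environ
    return {
        option: environ[name]
        for name, option in ENV_TO_OPTION.items()
        if environ.get(name)  # skip unset or empty variables
    }


# Example with an explicit mapping instead of the real environment.
example_env = {"AZURE_STORAGE_ACCOUNT_NAME": "myaccount", "AZURE_STORAGE_SAS_TOKEN": "token"}
print(storage_options_from_env(example_env))
```

A helper like this can be handy for checking which credentials are present without printing secrets from the full environment.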

The filesystem can be instantiated for different use cases based on a variety of `storage_options` combinations. The following list describes some common use cases for `AzureBlobFileSystem`, i.e. the protocols `abfs` or `az`. Note that all cases require the `account_name` argument:
1. Anonymous connection to public container: `storage_options={'account_name': ACCOUNT_NAME, 'anon': True}` will assume the `ACCOUNT_NAME` points to a public container, and attempt to use an anonymous login. Note, the default value for `anon` is True.
2. Automatic credential resolution using Azure's `DefaultAzureCredential` class: `storage_options={'account_name': ACCOUNT_NAME, 'anon': False}` will use [`DefaultAzureCredential`](https://learn.microsoft.com/en-us/python/api/azure-identity/azure.identity.defaultazurecredential?view=azure-python) to get valid credentials for the storage account `ACCOUNT_NAME`. `DefaultAzureCredential` attempts to authenticate via the [mechanisms and order visualized here](https://learn.microsoft.com/en-us/python/api/overview/azure/identity-readme?view=azure-python#defaultazurecredential).
3. Auto credential solving without requiring `storage_options`: Set `AZURE_STORAGE_ANON` to `false`, resulting in automatic credential resolution. Useful for compatibility with fsspec.
4. Azure ServicePrincipal: `tenant_id`, `client_id`, and `client_secret` are all used as credentials for an Azure ServicePrincipal: e.g. `storage_options={'account_name': ACCOUNT_NAME, 'tenant_id': TENANT_ID, 'client_id': CLIENT_ID, 'client_secret': CLIENT_SECRET}`.

### Append Blob
The `AzureBlobFileSystem` accepts [all of the Async BlobServiceClient arguments](https://docs.microsoft.com/en-us/azure/storage/blobs/storage-quickstart-blobs-python).

By default, write operations create BlockBlobs in Azure, which cannot be appended to once written. It is possible to create an AppendBlob by using `mode="ab"` when creating and operating on blobs. Currently, AppendBlobs are not available if hierarchical namespaces are enabled.

### Older versions
The ADLS Gen1 filesystem has officially been [retired](https://learn.microsoft.com/en-us/lifecycle/products/azure-data-lake-storage-gen1). Hence the `adl://` method, which was designed to connect to ADLS Gen1, is obsolete.
