Metadata-Version: 2.4
Name: pyexcel-xlsxr
Version: 0.6.2
Summary: Read xlsx file using partial xml
Home-page: https://github.com/pyexcel/pyexcel-xlsxr
Download-URL: https://github.com/pyexcel/pyexcel-xlsxr/archive/0.6.2.tar.gz
Author: C.W.
Author-email: info@pyexcel.org
License: New BSD
Keywords: python
Classifier: Topic :: Software Development :: Libraries
Classifier: Programming Language :: Python
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Requires-Python: >=3.6
License-File: LICENSE
Requires-Dist: lxml>=3.4.4
Requires-Dist: pyexcel-io>=0.6.2
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: download-url
Dynamic: home-page
Dynamic: keywords
Dynamic: license
Dynamic: license-file
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

================================================================================
pyexcel-xlsxr - Lets you focus on data, instead of xlsx format
================================================================================

.. image:: https://raw.githubusercontent.com/pyexcel/pyexcel.github.io/master/images/patreon.png
   :target: https://www.patreon.com/chfw

.. image:: https://raw.githubusercontent.com/pyexcel/pyexcel-mobans/master/images/awesome-badge.svg
   :target: https://awesome-python.com/#specific-formats-processing

.. image:: https://codecov.io/gh/pyexcel/pyexcel-xlsxr/branch/master/graph/badge.svg
   :target: https://codecov.io/gh/pyexcel/pyexcel-xlsxr

.. image:: https://badge.fury.io/py/pyexcel-xlsxr.svg
   :target: https://pypi.org/project/pyexcel-xlsxr



.. image:: https://pepy.tech/badge/pyexcel-xlsxr/month
   :target: https://pepy.tech/project/pyexcel-xlsxr


.. image:: https://img.shields.io/gitter/room/gitterHQ/gitter.svg
   :target: https://gitter.im/pyexcel/Lobby

.. image:: https://img.shields.io/static/v1?label=continuous%20templating&message=%E6%A8%A1%E7%89%88%E6%9B%B4%E6%96%B0&color=blue&style=flat-square
    :target: https://moban.readthedocs.io/en/latest/#at-scale-continous-templating-for-open-source-projects

.. image:: https://img.shields.io/static/v1?label=coding%20style&message=black&color=black&style=flat-square
    :target: https://github.com/psf/black

**pyexcel-xlsxr** is a specialized xlsx reader using lxml. It does partial reading, meaning
it won't load all content into memory.
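
The idea of partial reading can be illustrated with a minimal sketch. It is not
this library's implementation: it uses the standard library's
``xml.etree.ElementTree.iterparse`` instead of lxml, and ``SHEET_XML`` and
``iter_rows`` are hypothetical names made up for this illustration.

.. code-block:: python

    import io
    import xml.etree.ElementTree as ET

    # A tiny stand-in for a worksheet XML part (hypothetical sample data).
    SHEET_XML = b"""<worksheet><sheetData>
    <row><c><v>1</v></c><c><v>2</v></c></row>
    <row><c><v>3</v></c><c><v>4</v></c></row>
    <row><c><v>5</v></c><c><v>6</v></c></row>
    </sheetData></worksheet>"""


    def iter_rows(stream):
        """Yield one row at a time, clearing each row element after use."""
        for _event, element in ET.iterparse(stream, events=("end",)):
            if element.tag == "row":
                yield [int(cell.findtext("v")) for cell in element.iter("c")]
                element.clear()  # drop the parsed cells of this row


    rows = iter_rows(io.BytesIO(SHEET_XML))
    first_two = [next(rows), next(rows)]  # later rows are not turned into values
    print(first_two)

Because ``iter_rows`` is a generator, a caller that stops early never pays for
converting the remaining rows, which is the general benefit partial reading aims for.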


lxml installation
=================

This library depends on lxml. Because lxml is not available on every platform, the use of this library is restricted accordingly.

For PyPy, lxml==3.4.4 is tested to work well, but versions above 3.4.4 are difficult to install.

For Python 3.7, please use lxml==4.1.1.

Otherwise, this library works with lxml 3.4.4 or above.
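
To check which lxml version is installed before using this library, a small
sketch such as the following may help. It relies only on the standard library
(``importlib.metadata``, available from Python 3.8); ``parse_version`` and
``lxml_is_new_enough`` are hypothetical helper names, and the minimum version is
taken from the ``lxml>=3.4.4`` requirement.

.. code-block:: python

    import re
    from importlib.metadata import PackageNotFoundError, version

    MINIMUM_LXML = (3, 4, 4)  # from the requirement lxml>=3.4.4


    def parse_version(text):
        """Turn '4.1.1' into (4, 1, 1); suffixes such as 'b1' are ignored."""
        parts = []
        for piece in text.split(".")[:3]:
            match = re.match(r"\d+", piece)
            parts.append(int(match.group()) if match else 0)
        while len(parts) < 3:  # pad '4.1' to (4, 1, 0)
            parts.append(0)
        return tuple(parts)


    def lxml_is_new_enough(installed):
        """Compare an installed version string against the minimum."""
        return parse_version(installed) >= MINIMUM_LXML


    try:
        installed = version("lxml")
        status = "ok" if lxml_is_new_enough(installed) else "too old"
        print("lxml", installed, status)
    except PackageNotFoundError:
        print("lxml is not installed")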



Support the project
================================================================================

If your company uses pyexcel and its components in a revenue-generating product,
please consider supporting the project on GitHub or
`Patreon <https://www.patreon.com/bePatron?u=5537627>`_. Your financial
support will enable me to dedicate more time to coding, improving documentation,
and creating engaging content.


Known constraints
==================

Fonts, colors and charts are not supported.

Reading password-protected xls, xlsx and ods files is not supported either.

Installation
================================================================================


You can install pyexcel-xlsxr via pip:

.. code-block:: bash

    $ pip install pyexcel-xlsxr


or clone it and install it:

.. code-block:: bash

    $ git clone https://github.com/pyexcel/pyexcel-xlsxr.git
    $ cd pyexcel-xlsxr
    $ python setup.py install

Usage
================================================================================

As a standalone library
--------------------------------------------------------------------------------

Read from an xlsx file
********************************************************************************

Here's the sample code:

.. code-block:: python

    >>> from pyexcel_xlsxr import get_data
    >>> data = get_data("your_file.xlsx")
    >>> import json
    >>> print(json.dumps(data))
    {"Sheet 1": [[1, 2, 3], [4, 5, 6]], "Sheet 2": [["row 1", "row 2", "row 3"]]}



Read an xlsx from memory
********************************************************************************

Continuing from the previous example:

.. code-block:: python

    >>> # This is just an illustration
    >>> # In reality, you might deal with xlsx file upload
    >>> # where you will read from request.FILES['YOUR_XLSX_FILE']
    >>> # Here, io is assumed to be an in-memory binary stream holding xlsx content
    >>> data = get_data(io)
    >>> print(json.dumps(data))
    {"Sheet 1": [[1, 2, 3], [4, 5, 6]], "Sheet 2": [[7, 8, 9], [10, 11, 12]]}


Pagination feature
********************************************************************************



Let's assume the following file is a huge xlsx file:

.. code-block:: python

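   >>> # save_data is assumed to come from a writer plugin such as pyexcel-xlsx;
   >>> # pyexcel-xlsxr itself only reads xlsx files
   >>> from pyexcel_xlsx import save_data
   >>> huge_data = [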
   ...     [1, 21, 31],
   ...     [2, 22, 32],
   ...     [3, 23, 33],
   ...     [4, 24, 34],
   ...     [5, 25, 35],
   ...     [6, 26, 36]
   ... ]
   >>> sheetx = {
   ...     "huge": huge_data
   ... }
   >>> save_data("huge_file.xlsx", sheetx)

Now let's read only part of the data:

.. code-block:: python

   >>> partial_data = get_data("huge_file.xlsx", start_row=2, row_limit=3)
   >>> print(json.dumps(partial_data))
   {"huge": [[3, 23, 33], [4, 24, 34], [5, 25, 35]]}

You can do the same for columns:

.. code-block:: python

   >>> partial_data = get_data("huge_file.xlsx", start_column=1, column_limit=2)
   >>> print(json.dumps(partial_data))
   {"huge": [[21, 31], [22, 32], [23, 33], [24, 34], [25, 35], [26, 36]]}

Obviously, you can do both at the same time:

.. code-block:: python

   >>> partial_data = get_data("huge_file.xlsx",
   ...     start_row=2, row_limit=3,
   ...     start_column=1, column_limit=2)
   >>> print(json.dumps(partial_data))
   {"huge": [[23, 33], [24, 34], [25, 35]]}

As a pyexcel plugin
--------------------------------------------------------------------------------

Explicit import is no longer needed since pyexcel version 0.2.2. Instead,
this library is auto-loaded. So if you want to read data in xlsx format,
installing it is enough.


Reading from an xlsx file
********************************************************************************

Here is the sample code:

.. code-block:: python

    >>> import pyexcel as pe
    >>> book = pe.get_book(file_name="your_file.xlsx")
    >>> book
    Sheet 1:
    +---+---+---+
    | 1 | 2 | 3 |
    +---+---+---+
    | 4 | 5 | 6 |
    +---+---+---+
    Sheet 2:
    +-------+-------+-------+
    | row 1 | row 2 | row 3 |
    +-------+-------+-------+



Reading from an IO instance
********************************************************************************

You need to pass the binary content together with the file type to read xlsx from memory:

.. code-block:: python

    >>> # This is just an illustration
    >>> # In reality, you might deal with xlsx file upload
    >>> # where you will read from request.FILES['YOUR_XLSX_FILE']
    >>> xlsxfile = "another_file.xlsx"
    >>> with open(xlsxfile, "rb") as f:
    ...     content = f.read()
    ...     r = pe.get_book(file_type="xlsx", file_content=content)
    ...     print(r)
    ...
    Sheet 1:
    +---+---+---+
    | 1 | 2 | 3 |
    +---+---+---+
    | 4 | 5 | 6 |
    +---+---+---+
    Sheet 2:
    +-------+-------+-------+
    | row 1 | row 2 | row 3 |
    +-------+-------+-------+




License
================================================================================

New BSD License

Developer guide
==================

Development steps for code changes:

#. git clone https://github.com/pyexcel/pyexcel-xlsxr.git
#. cd pyexcel-xlsxr

Upgrade your setup tools and pip. They are needed for development and testing only:

#. pip install --upgrade setuptools pip

Then install relevant development requirements:

#. pip install -r rnd_requirements.txt # if such a file exists
#. pip install -r requirements.txt
#. pip install -r tests/requirements.txt

Once you have finished your changes, please provide test case(s) and relevant documentation,
and update changelog.yml.

.. note::

    rnd_requirements.txt is usually created when a dependent library has
    not been released yet. Once that dependency is released, the version
    of the dependency listed in requirements.txt will be valid.


How to test your contribution
--------------------------------------------------------------------------------

Although `pytest` and `doctest` are both used in code testing, it is advisable
to put unit tests in the tests directory. `doctest` is incorporated only to make sure
the code examples in documentation remain valid across different development
releases.

On Linux/Unix systems, please launch your tests like this::

    $ make

On Windows, please issue this command::

    > test.bat


Before you commit
------------------------------

Please run::

    $ make format

so as to format your code; otherwise your build may fail.




Change log
================================================================================

0.6.2 - 31.10.2025
--------------------------------------------------------------------------------

**Fixed**

#. Fix freeze when parsing certain corrupt XLSX files
#. Fix reading of files with more than 26 columns

**Updated**

#. Migrated to pytest

0.6.1 - 11.11.2024
--------------------------------------------------------------------------------

**Updated**

#. `#9 <https://github.com/pyexcel/pyexcel-xlsxr/issues/9>`_: Potential fix for
   incorrect reading of data with empty cells when used with pyexcel 

0.6.0 - 10.10.2020
--------------------------------------------------------------------------------

**Updated**

#. New style xlsx plugins, promoted by pyexcel-io v0.6.2.

0.5.3 - 23.06.2020
--------------------------------------------------------------------------------

**Fixed**

#. `#5 <https://github.com/pyexcel/pyexcel-xlsxr/issues/5>`_: AttributeError
   when a cell text is None
#. `#2 <https://github.com/pyexcel/pyexcel-xlsxr/issues/2>`_: unit test failed
   in OpenSuSE

0.5.2 - 15.09.2018
--------------------------------------------------------------------------------

**Updated**

#. Fix Python 3 compatibility

0.5.1 - 14.07.2018
--------------------------------------------------------------------------------

**Updated**

#. `#1 <https://github.com/pyexcel/pyexcel-xlsxr/issues/1>`_: fix xml parsing
   problem when the Microsoft SpreadsheetML 2009 ac namespace 'x14ac' confused
   lxml

0.5.0 - 24.11.2017
--------------------------------------------------------------------------------

**Added**

#. Initial release. In order to align it with the pyexcel 0.5.0 release, its
   version starts from 0.5.0

