.. _table_io:

Unified file read/write interface
***********************************

Astropy provides a unified interface for reading and writing data in different formats.
For many common cases this will simplify the process of file I/O and reduce the need to
master the separate details of all the I/O packages within Astropy.  For details on the
implementation see :ref:`io_registry`.

Getting started with Table I/O
==============================

The :class:`~astropy.table.Table` class includes two methods,
:meth:`~astropy.table.Table.read` and
:meth:`~astropy.table.Table.write`, that make it possible to read from
and write to files. A number of formats are automatically supported (see
`Built-in table readers/writers`_) and new file formats and extensions can be
registered with the :class:`~astropy.table.Table` class (see
:ref:`io_registry`).

To use this interface, first import the :class:`~astropy.table.Table` class, then
simply call the :class:`~astropy.table.Table`
:meth:`~astropy.table.Table.read` method with the name of the file and
the file format, for instance ``'ascii.daophot'``:

.. doctest-skip::

    >>> from astropy.table import Table
    >>> t = Table.read('photometry.dat', format='ascii.daophot')

It is possible to load tables directly from the Internet using URLs. For example,
download tables from Vizier catalogues in CDS format (``'ascii.cds'``)::

    >>> t = Table.read("ftp://cdsarc.u-strasbg.fr/pub/cats/VII/253/snrs.dat",
    ...         readme="ftp://cdsarc.u-strasbg.fr/pub/cats/VII/253/ReadMe",
    ...         format="ascii.cds")  # doctest: +SKIP

For certain file formats, the format can be automatically detected, for
example from the filename extension::

    >>> t = Table.read('table.tex')  # doctest: +SKIP

Similarly, for writing, the format can be explicitly specified::

    >>> t.write(filename, format='latex')  # doctest: +SKIP

As for the :meth:`~astropy.table.Table.read` method, the format may
be automatically identified in some cases.
The underlying file handler will also automatically detect various
compressed data formats and transparently uncompress them as far as
supported by the Python installation (see
:func:`~astropy.utils.data.get_readable_fileobj`).

For writing, one can also specify details about the
`Table serialization methods`_ via the ``serialize_method`` keyword argument.
This allows fine control of the way to write out certain columns, for instance
writing an ISO format Time column as a pair of JD1 / JD2 floating
point values (for full resolution) or as a formatted ISO date string.

Any additional arguments specified will depend on the format.  For examples of this see the
section `Built-in table readers/writers`_.  This section also provides the full list of
choices for the ``format`` argument.

Command-line utility
--------------------

For convenience, the command-line tool ``showtable`` can be used to print the
content of tables for the formats supported by the unified I/O interface::

    $ showtable astropy/io/fits/tests/data/table.fits

     target V_mag
    ------- -----
    NGC1001  11.1
    NGC1002  12.3
    NGC1003  15.2

To get full documentation on the usage and available options, run
``showtable --help``.

.. _built_in_readers_writers:

Built-in table readers/writers
==============================

The :class:`~astropy.table.Table` class has built-in support for various input
and output formats including :ref:`table_io_ascii`,
:ref:`table_io_fits`, :ref:`table_io_hdf5`, and :ref:`table_io_votable`.

A full list of the supported formats and corresponding classes
is shown in the table below.  The ``Write`` column indicates those formats that
support write functionality, and the ``Suffix`` column indicates the filename
suffix indicating a particular format.  If the value of ``Suffix`` is ``auto``,
the format is auto-detected from the file itself.  Not all formats support auto-detection.
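The suffix-based identification described above can be pictured as a simple
lookup. The following is a minimal sketch, not astropy's implementation: the
mapping copies the ``Suffix`` entries listed for the built-in formats, and the
helper name ``guess_format_from_suffix`` is hypothetical. Formats marked
``auto`` are identified from the file content and are therefore not covered
by this lookup.

```python
from pathlib import Path

# Suffix -> format name, copied from the ``Suffix`` column of the
# built-in formats table in this section.
SUFFIX_TO_FORMAT = {
    ".csv": "ascii.csv",
    ".ecsv": "ascii.ecsv",
    ".html": "ascii.html",
    ".tex": "ascii.latex",
    ".rdb": "ascii.rdb",
    ".rst": "ascii.rst",
}


def guess_format_from_suffix(filename):
    """Return the format implied by the filename suffix, or None.

    Hypothetical helper for illustration only; not part of astropy.
    """
    return SUFFIX_TO_FORMAT.get(Path(filename).suffix)


print(guess_format_from_suffix("table.tex"))       # a listed suffix
print(guess_format_from_suffix("photometry.dat"))  # no listed suffix
```

When the lookup yields nothing, as for ``photometry.dat``, the ``format``
argument has to be given explicitly.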
=========================== ===== ====== ============================================================================================
Format                      Write Suffix Description
=========================== ===== ====== ============================================================================================
ascii                       Yes          ASCII table in any supported format (uses guessing)
ascii.aastex                Yes          :class:`~astropy.io.ascii.AASTex`: AASTeX deluxetable used for AAS journals
ascii.basic                 Yes          :class:`~astropy.io.ascii.Basic`: Basic table with custom delimiters
ascii.cds                   No           :class:`~astropy.io.ascii.Cds`: CDS format table
ascii.commented_header      Yes          :class:`~astropy.io.ascii.CommentedHeader`: Column names in a commented line
ascii.csv                   Yes   .csv   :class:`~astropy.io.ascii.Csv`: Basic table with comma-separated values
ascii.daophot               No           :class:`~astropy.io.ascii.Daophot`: IRAF DAOphot format table
ascii.ecsv                  Yes   .ecsv  :class:`~astropy.io.ascii.Ecsv`: Basic table with Enhanced CSV (supporting metadata)
ascii.fixed_width           Yes          :class:`~astropy.io.ascii.FixedWidth`: Fixed width
ascii.fixed_width_no_header Yes          :class:`~astropy.io.ascii.FixedWidthNoHeader`: Fixed width with no header
ascii.fixed_width_two_line  Yes          :class:`~astropy.io.ascii.FixedWidthTwoLine`: Fixed width with second header line
ascii.html                  Yes   .html  :class:`~astropy.io.ascii.HTML`: HTML table
ascii.ipac                  Yes          :class:`~astropy.io.ascii.Ipac`: IPAC format table
ascii.latex                 Yes   .tex   :class:`~astropy.io.ascii.Latex`: LaTeX table
ascii.no_header             Yes          :class:`~astropy.io.ascii.NoHeader`: Basic table with no headers
ascii.rdb                   Yes   .rdb   :class:`~astropy.io.ascii.Rdb`: Tab-separated with a type definition header line
ascii.rst                   Yes   .rst   :class:`~astropy.io.ascii.RST`: reStructuredText simple format table
ascii.sextractor            No           :class:`~astropy.io.ascii.SExtractor`: SExtractor format table
ascii.tab                   Yes          :class:`~astropy.io.ascii.Tab`: Basic table with tab-separated values
fits                        Yes   auto   :mod:`~astropy.io.fits`: Flexible Image Transport System file
hdf5                        Yes   auto   HDF5_: Hierarchical Data Format binary file
votable                     Yes   auto   :mod:`~astropy.io.votable`: Table format used by Virtual Observatory (VO) initiative
=========================== ===== ====== ============================================================================================

.. _table_io_ascii:

ASCII formats
--------------

The :meth:`~astropy.table.Table.read` and
:meth:`~astropy.table.Table.write` methods can be used to read and write formats
supported by `astropy.io.ascii`.

Use ``format='ascii'`` in order to interface to the generic
:func:`~astropy.io.ascii.read` and :func:`~astropy.io.ascii.write`
functions from `astropy.io.ascii`.  When reading a table this means
that all supported ASCII table formats will be tried in order to successfully
parse the input.  For example:

.. doctest-skip::

  >>> t = Table.read('astropy/io/ascii/tests/t/latex1.tex', format='ascii')
  >>> print(t)
  cola colb colc
  ---- ---- ----
     a    1    2
     b    3    4

When writing a table with ``format='ascii'`` the output is a basic
character-delimited file with a single header line containing the
column names.

All additional arguments are passed to the `astropy.io.ascii`
:func:`~astropy.io.ascii.read` and :func:`~astropy.io.ascii.write`
functions. Further details are available in the sections on
:ref:`io_ascii_read_parameters` and :ref:`io_ascii_write_parameters`.  For example, to change
the column delimiter and the output format for the ``colc`` column use:

.. doctest-skip::

  >>> import sys
  >>> t.write(sys.stdout, format='ascii', delimiter='|', formats={'colc': '%0.2f'})
  cola|colb|colc
  a|1|2.00
  b|3|4.00

.. note::

   When specifying a specific ASCII table format using the unified interface, the
   format name is prefixed with ``ascii`` in order to identify the format as
   ASCII-based.  Compare the table above to the `astropy.io.ascii` list of
   :ref:`supported formats ` where the prefix is not needed.
   Therefore the following are equivalent:

   .. doctest-skip::

     >>> from astropy.io import ascii
     >>> dat = ascii.read('file.dat', format='daophot')
     >>> dat = Table.read('file.dat', format='ascii.daophot')

   For compatibility with astropy version 0.2 and earlier, the following format
   values are also allowed in ``Table.read()``: ``daophot``, ``ipac``, ``html``,
   ``latex``, and ``rdb``.

.. attention:: **ECSV is recommended**

   For writing and reading tables to ASCII in a way that fully reproduces the
   table data, types and metadata (i.e. the table will "round-trip"), we highly
   recommend using the :ref:`ecsv_format`.  This writes the actual data in a
   simple space-delimited format (the ``basic`` format) that any ASCII table
   reader can parse, but also includes metadata encoded in a comment block that
   allows full reconstruction of the original columns.  This includes support
   for :ref:`ecsv_format_mixin_columns` (such as
   `~astropy.coordinates.SkyCoord` or `~astropy.time.Time`) and
   :ref:`ecsv_format_masked_columns`.

.. _table_io_fits:

FITS
----

Reading and writing tables in `FITS `_ format is supported with ``format='fits'``. In most cases, existing FITS
files should be automatically identified as such based on the header of the
file, but if not, or if writing to disk, then the format should be explicitly
specified.

Reading
^^^^^^^^

If a FITS table file contains only a single table, then it can be read in
with:

.. doctest-skip::

    >>> from astropy.table import Table
    >>> t = Table.read('data.fits')

If more than one table is present in the file, you can select the HDU
as follows::

    >>> t = Table.read('data.fits', hdu=3)  # doctest: +SKIP

In this case if the ``hdu`` argument is omitted then the first table found will be
read in and a warning will be emitted::

    >>> t = Table.read('data.fits')  # doctest: +SKIP
    WARNING: hdu= was not specified but multiple tables are present, reading in first available table (hdu=1) [astropy.io.fits.connect]

Writing
^^^^^^^^

To write a table ``t`` to a new file::

    >>> t.write('new_table.fits')  # doctest: +SKIP

If the file already exists and you want to overwrite it, then set the
``overwrite`` keyword::

    >>> t.write('existing_table.fits', overwrite=True)  # doctest: +SKIP

At this time there is no support for appending an HDU to an existing
file or writing multi-HDU files using the Table interface. Instead one
can use the convenience function
:func:`~astropy.io.fits.table_to_hdu` to create a single
binary table HDU and insert or append that to an existing
:class:`~astropy.io.fits.HDUList`.

As of astropy version 3.0 there is support for writing a table which contains
:ref:`mixin_columns` such as `~astropy.time.Time` or
`~astropy.coordinates.SkyCoord`.  This uses FITS ``COMMENT`` cards to capture
additional information needed in order to fully reconstruct the mixin columns when
reading back from FITS.  The information is a Python `dict` structure which is
serialized using YAML.

Keywords
^^^^^^^^^

The FITS keywords associated with an HDU table are represented in the ``meta``
ordered dictionary attribute of a :ref:`Table `.  After reading
a table one can view the available keywords in a readable format using:

.. doctest-skip::

  >>> for key, value in t.meta.items():
  ...     print('{0} = {1}'.format(key, value))

This does not include the "internal" FITS keywords that are required to specify
the FITS table properties (e.g. ``NAXIS``, ``TTYPE1``).
``HISTORY`` and ``COMMENT`` keywords are treated specially and are returned as
a list of values.

Conversely, the following shows examples of setting user keyword values for a
table ``t``:

.. doctest-skip::

  >>> t.meta['MY_KEYWD'] = 'my value'
  >>> t.meta['COMMENT'] = ['First comment', 'Second comment', 'etc']
  >>> t.write('my_table.fits', overwrite=True)

The keyword names (e.g. ``MY_KEYWD``) will be automatically capitalized prior to
writing.

At this time, the ``meta`` attribute of the :class:`~astropy.table.Table` class
is simply an ordered dictionary and does not fully represent the structure of a
FITS header (for example, keyword comments are dropped).

.. _fits_astropy_native:

TDISPn Keyword
^^^^^^^^^^^^^^

TDISPn FITS keywords will map to and from the `~astropy.table.Column` ``format``
attribute if the display format is convertible to and from a Python display
format. Below are the rules used for both conversion directions.

TDISPn to Python Format String
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

TDISPn format characters are defined in the table below.

============ ================================================================
Format       Description
============ ================================================================
Aw           Character
Lw           Logical
Iw.m         Integer
Bw.m         Binary, integers only
Ow.m         Octal, integers only
Zw.m         Hexadecimal, integers only
Fw.d         Floating-point, fixed decimal notation
Ew.dEe       Floating-point, exponential notation
ENw.d        Engineering; E format with exponent multiple of three
ESw.d        Scientific; same as EN but non-zero leading digit if not zero
Gw.dEe       General; appears as F if significance not lost, also E
Dw.dEe       Floating-point, exponential notation, double precision
============ ================================================================

Where w is the width in characters of displayed values, m is the minimum number
of digits displayed, d is the number of digits to the right of the decimal
point, and e is the number of digits in the exponent.
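As an illustration of this notation, the sketch below parses a TDISPn value
into its parts and builds a Python format specification following the
conversion rules given in this section. It is a minimal sketch and not the
code astropy uses: the names ``parse_tdisp`` and ``tdisp_to_format_spec`` are
hypothetical, right-justifying ``A`` and ``L`` values (left space padding) is
an assumption, and lower-case hexadecimal output for ``Z`` is an arbitrary
choice.

```python
import re

# Letter code, width w, optional ".m" / ".d" part, optional "Ee" part.
# EN and ES are listed first so that they are not read as a plain E.
_TDISP_RE = re.compile(
    r"^(?P<code>EN|ES|[ALIBOZFEGD])"
    r"(?P<w>\d+)"
    r"(?:\.(?P<p>\d+))?"
    r"(?:E(?P<e>\d+))?$"
)

# Python presentation type for each numeric TDISPn letter code.
_PY_TYPE = {
    "I": "d", "B": "b", "O": "o", "Z": "x",
    "F": "f", "G": "g",
    "E": "e", "D": "e", "EN": "e", "ES": "e",
}


def parse_tdisp(tdisp):
    """Split a TDISPn value into (code, w, m_or_d, e); missing parts are None."""
    match = _TDISP_RE.match(tdisp.strip().upper())
    if match is None:
        raise ValueError(f"not a recognized TDISPn value: {tdisp!r}")
    second = int(match["p"]) if match["p"] else None
    exponent = int(match["e"]) if match["e"] else None
    return match["code"], int(match["w"]), second, exponent


def tdisp_to_format_spec(tdisp):
    """Return a Python format spec (hypothetical helper, illustration only)."""
    code, width, second, _exponent = parse_tdisp(tdisp)
    if code in ("A", "L"):
        # Assumption: pad on the left, i.e. right-justify within the width.
        return f">{width}"
    if code in ("I", "B", "O", "Z"):
        # The width gives left space padding; the minimum digits m is unused.
        return f"{width}{_PY_TYPE[code]}"
    # Floating-point codes use width and precision; the exponent e is unused.
    precision = "" if second is None else f".{second}"
    return f"{width}{precision}{_PY_TYPE[code]}"


spec = tdisp_to_format_spec("F8.3")
print(repr(spec), repr(format(3.14159, spec)))
```

The returned string can be passed to the built-in ``format`` function or used
after the colon in a ``str.format`` replacement field.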
The .m and Ee fields are optional.

The A (character), L (logical), F (floating point), and G (general) display
formats can be directly translated to Python format strings. The other formats
need to be modified to match Python display formats.

For the integer formats (I, B, O, and Z), the width (w) value is used to add
space padding to the left of the column value. The minimum number (m) value is
not used. For the E, G, D, EN, and ES formats (floating point exponential) the
width (w) and precision (d) are both used, but the exponential (e) is not used.

Python Format String to TDISPn
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The conversion from Python format strings back to TDISPn is slightly more
complicated.

Python strings map to the TDISP format A if the Python formatting string does
not contain right space padding. It will accept left space padding. The same
applies to the logical format L.

The integer formats (decimal integer, binary, octal, hexadecimal) map to the
I, B, O, and Z TDISP formats respectively. Integer formats do not accept a
zero padded format string or a format string with no left padding defined (a
width is required in the TDISP format standard for the Integer formats).

For all float and exponential values zero padding is not accepted. There must be
at least a width or precision defined. If only a width is defined, there is no
precision set for the TDISPn format. If only a precision is defined, the width
is set to the precision plus an extra padding value depending on format type,
and both are set in the TDISPn format. Otherwise, if both a width and precision
are present they are both set in the TDISPn format. The Python ``f`` and ``F``
formats map to the TDISP F format, ``g`` and ``G`` map to the TDISP G format,
and ``e`` and ``E`` map to the TDISP E format.

Masked columns
^^^^^^^^^^^^^^

Tables that contain `~astropy.table.MaskedColumn` columns can be written to
FITS.
By default this will replace the masked data elements with certain sentinel
values according to the FITS standard:

- ``NaN`` for float columns
- Value of ``TNULLn`` for integer columns, as defined by the column
  ``fill_value`` attribute
- Null string for string columns (not currently implemented)

When the file is read back those elements are marked as masked in the returned
table, but see `issue #4708 `_ for problems in all three cases.

The FITS standard has a few limitations:

- Not all data types are supported (e.g. logical / boolean)
- Integer columns require picking one value as the NULL indicator.  If
  all possible values are represented in valid data (e.g. an unsigned
  int column with all 256 possible values in valid data) then there is no
  way to represent missing data.
- The masked data values are permanently lost, precluding the possibility
  of later unmasking the values.

Astropy provides a work-around for this limitation that users can choose to
use.  The key part is to use the ``serialize_method='data_mask'`` keyword
argument when writing the table.  This tells the FITS writer to split each masked
column into two separate columns, one for the data and one for the mask.
When it gets read back that process is reversed and the two columns are
merged back into one masked column.

.. doctest-skip::

  >>> from astropy.table.table_helpers import simple_table
  >>> t = simple_table(masked=True)
  >>> t['d'] = [False, False, True]
  >>> t['d'].mask = [True, False, False]
  >>> t
    a      b     c     d
  int64 float64 str1  bool
  ----- ------- ---- -----
     --     1.0    c    --
      2     2.0   -- False
      3      --    e  True

.. doctest-skip::

  >>> t.write('data.fits', serialize_method='data_mask', overwrite=True)
  >>> Table.read('data.fits')
    a      b      c      d
  int64 float64 bytes1  bool
  ----- ------- ------ -----
     --     1.0      c    --
      2     2.0     -- False
      3      --      e  True

.. warning::

   This option goes outside of the established FITS standard for representing
   missing data so users should be careful about choosing this option,
   especially if other (non-astropy) users will be reading the file(s).
   Behind the scenes, astropy is converting the masked columns into two distinct
   data and mask columns, then writing metadata into ``COMMENT`` cards to
   allow reconstruction of the original data.

Astropy native objects (mixin columns)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

It is possible to store not only standard `~astropy.table.Column` objects to a
FITS table HDU, but also any Astropy native objects
(:ref:`mixin_columns`) within a `~astropy.table.Table` or
`~astropy.table.QTable`.  This includes `~astropy.time.Time`,
`~astropy.units.Quantity`, `~astropy.coordinates.SkyCoord`, and many others.

In general a mixin column may contain multiple data components as well as
object attributes beyond the standard Column attributes like ``format`` or
``description``. Abiding by the rules set by the FITS standard requires mapping
of these data components and object attributes to the appropriate FITS table
columns and keywords.  Thus, a well defined protocol has been developed to allow
the storage of these mixin columns in FITS while allowing the object to
"round-trip" through the file with no loss of data or attributes.

Quantity
~~~~~~~~

A `~astropy.units.Quantity` mixin column in a `~astropy.table.QTable` is
represented in a FITS table using the ``TUNITn`` FITS column keyword to
incorporate the unit attribute of Quantity. For example:

.. doctest-skip::

    >>> from astropy.table import QTable
    >>> import astropy.units as u
    >>> t = QTable([[1, 2] * u.angstrom])
    >>> t.write('my_table.fits', overwrite=True)
    >>> qt = QTable.read('my_table.fits')
    >>> qt
      col0
    Angstrom
    float64
    --------
         1.0
         2.0

Time
~~~~

Astropy provides the following features for reading and writing ``Time``:

- Writing and reading `~astropy.time.Time` Table columns to and from FITS tables
- Reading time coordinate columns in FITS tables (compliant with the time
  standard) as `~astropy.time.Time` Table columns

Writing and reading Astropy Time columns
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

By default, a `~astropy.time.Time` mixin column within a `~astropy.table.Table`
or `~astropy.table.QTable` will be written to FITS in full precision. This will be
done using the FITS time standard by setting the necessary FITS header keywords.

The default behavior for reading a FITS table into an `~astropy.table.Table`
has historically been to convert all FITS columns to `~astropy.table.Column`
objects, which have closely matching properties. For some columns, however,
closer native astropy representations are possible, and one can indicate these
should be used by passing ``astropy_native=True`` (for backwards compatibility,
this is not done by default). This will convert columns conforming to the
FITS time standard to `~astropy.time.Time` instances, avoiding any loss of
precision. For example:

.. doctest-skip::

    >>> from astropy.time import Time
    >>> from astropy.table import Table
    >>> from astropy.coordinates import EarthLocation
    >>> t = Table()
    >>> t['a'] = Time([100.0, 200.0], scale='tt', format='mjd',
    ...               location=EarthLocation(-2446354, 4237210, 4077985, unit='m'))
    >>> t.write('my_table.fits', overwrite=True)
    >>> tm = Table.read('my_table.fits', astropy_native=True)
    >>> tm['a']

``<table>`` tag, defaults to ``'table{id}'`` where ``id`` is the id of the Table object.

- *max_lines*: maximum number of lines.
- *table_class*: HTML classes added to the ``<table>`` tag, can be useful to
  customize the style of the table.
- *jskwargs*: additional arguments passed to :class:`~astropy.table.JSViewer`.
- *css*: CSS style, default to ``astropy.table.jsviewer.DEFAULT_CSS``.
- *htmldict*: additional arguments passed to :class:`~astropy.io.ascii.HTML`.

.. _Datatables: https://www.datatables.net/

.. _table_io_votable:

VO Tables
-----------

Reading/writing from/to `VO table `_ files is supported with ``format='votable'``. In most cases, existing VO
tables should be automatically identified as such based on the header of the
file, but if not, or if writing to disk, then the format should be explicitly
specified.

If a VO table file contains only a single table, then it can be read in with::

    >>> t = Table.read('aj285677t3_votable.xml')

If more than one table is present in the file, an error will be raised,
unless the table ID is specified via the ``table_id=`` argument::

    >>> t = Table.read('catalog.xml')
    Traceback (most recent call last):
    ...
    ValueError: Multiple tables found: table id should be set via the table_id= argument. The available tables are twomass, spitzer

    >>> t = Table.read('catalog.xml', table_id='twomass')

To write to a new file, the ID of the table should also be specified (unless
``t.meta['ID']`` is defined)::

    >>> t.write('new_catalog.xml', table_id='updated_table', format='votable')

When writing, the ``compression=True`` argument can be used to force
compression of the data on disk, and the ``overwrite=True`` argument can be
used to overwrite an existing file.

.. _table_serialization_methods:

Table serialization methods
===========================

Astropy supports fine-grained control of the way to write out (serialize)
the columns in a Table.
For instance if you are writing an ISO format
Time column to an ECSV ASCII table file, you may want to write this as a pair
of JD1 / JD2 floating point values for full resolution (perfect round-trip),
or as a formatted ISO date string so that the values are easily readable by
you or other applications.

The default method for serialization depends on the format (FITS, ECSV, HDF5).
For instance HDF5 is a binary format and so it would make sense to store a Time
object as JD1 / JD2, while ECSV is a flat ASCII format and commonly you
would want to see the date in the same format as the Time object.  The defaults
also reflect an attempt to minimize compatibility issues between astropy
versions.  For instance it was possible to write Time columns to ECSV as
formatted strings in a version prior to the ability to write as JD1 / JD2
pairs, so the current default for ECSV is to write as formatted strings.

The two classes which have configurable serialization methods are
`~astropy.time.Time` and `~astropy.table.MaskedColumn`.  See the sections on
Time `Details`_ and `Masked columns`_, respectively, for additional
information.  The defaults for each format are listed below:

====== ==================== ===============
Format Time                 MaskedColumn
====== ==================== ===============
FITS   ``jd1_jd2``          ``null_value``
ECSV   ``formatted_value``  ``null_value``
HDF5   ``jd1_jd2``          ``data_mask``
YAML   ``jd1_jd2``          ---
====== ==================== ===============

As an example, start by making a table with a Time column and masked column:

  >>> import sys
  >>> from astropy.time import Time
  >>> from astropy.table import Table, MaskedColumn

  >>> t = Table(masked=True)
  >>> t['tm'] = Time(['2000-01-01', '2000-01-02'])
  >>> t['mc1'] = MaskedColumn([1.0, 2.0], mask=[True, False])
  >>> t['mc2'] = MaskedColumn([3.0, 4.0], mask=[False, True])
  >>> t
            tm             mc1     mc2
          object         float64 float64
  ----------------------- ------- -------
  2000-01-01 00:00:00.000      --     3.0
  2000-01-02 00:00:00.000     2.0      --

Now specify that you want all `~astropy.time.Time` columns written as JD1 / JD2
and the ``mc1`` column written as a data / mask pair and write to ECSV:

.. doctest-skip::

  >>> serialize_method = {Time: 'jd1_jd2', 'mc1': 'data_mask'}
  >>> t.write(sys.stdout, format='ascii.ecsv', serialize_method=serialize_method)
  # %ECSV 0.9
     ...
  # schema: astropy-2.0
   tm.jd1    tm.jd2  mc1  mc1.mask  mc2
  2451544.0   0.5    1.0   True     3.0
  2451546.0  -0.5    2.0   False     ""

(Spaces added for clarity)

Notice that the ``tm`` column has been replaced by the ``tm.jd1`` and ``tm.jd2``
columns, and likewise a new column ``mc1.mask`` has appeared and it explicitly
contains the mask values.  When this table is read back with the ``ascii.ecsv``
reader then the original columns are reconstructed.

The ``serialize_method`` argument can be set in two different ways:

- As a single string like ``data_mask``.  This value then applies to every
  column, and is a convenient strategy for a masked table with no Time columns.
- As a `dict`, where the key can be either a single column name or a class (as
  shown in the example above), and the value is the corresponding serialization
  method.
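The two ways of setting ``serialize_method`` can be pictured with a small
lookup. This is a sketch for illustration rather than astropy's code: the
helper ``resolve_serialize_method`` is hypothetical, the ``Time`` and
``MaskedColumn`` classes are empty stand-ins so that the block runs without
astropy, and giving a column-name key precedence over a class key is an
assumption. The defaults used are the ECSV defaults listed in the table in
this section.

```python
def resolve_serialize_method(serialize_method, col_name, col_cls, default):
    """Return the serialization method for one column.

    Hypothetical helper for illustration only; not part of astropy.
    """
    if serialize_method is None:
        return default                # nothing requested: use the format default
    if isinstance(serialize_method, str):
        return serialize_method       # a single string applies to every column
    # dict form: assume a column-name key takes precedence over a class key
    for key in (col_name, col_cls):
        if key in serialize_method:
            return serialize_method[key]
    return default


# Stand-in classes so the sketch runs without astropy installed.
class Time:
    pass


class MaskedColumn:
    pass


requested = {Time: 'jd1_jd2', 'mc1': 'data_mask'}
methods = {
    'tm': resolve_serialize_method(requested, 'tm', Time, 'formatted_value'),
    'mc1': resolve_serialize_method(requested, 'mc1', MaskedColumn, 'null_value'),
    'mc2': resolve_serialize_method(requested, 'mc2', MaskedColumn, 'null_value'),
}
print(methods)
```

Under these assumptions ``tm`` resolves to ``jd1_jd2`` through its class,
``mc1`` resolves to ``data_mask`` through its name, and ``mc2`` falls back to
the ECSV default ``null_value``, which matches the ECSV output shown in the
example above.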