VOTable XML handling (astropy.io.votable)¶
Introduction¶
The astropy.io.votable subpackage converts VOTable XML files to and
from Numpy record arrays.
Getting Started¶
Reading a VOTable file¶
To read in a VOTable file, pass a file path to parse:
from astropy.io.votable import parse
votable = parse("votable.xml")
votable is a VOTableFile object, which
can be used to retrieve and manipulate the data and save it back out
to disk.
VOTable files are made up of nested RESOURCE elements, each of
which may contain one or more TABLE elements. The TABLE
elements contain the arrays of data.
To get at the TABLE elements, one can write a loop over the
resources in the VOTABLE file:
for resource in votable.resources:
    for table in resource.tables:
        # ... do something with the table ...
        pass
However, if the nested structure of the resources is not important,
one can use iter_tables to iterate over all tables in the file,
including those in nested resources, as one flat sequence:
for table in votable.iter_tables():
    # ... do something with the table ...
    pass
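The flattening of nested RESOURCE elements can be illustrated without astropy. The following is only a sketch using the standard library: the sample document, its element names, and the helper iter_table_names are made up for illustration (the sample omits the XML namespace that real VOTable files normally declare), and it is not how astropy.io.votable is implemented.

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free document used only for illustration.
SAMPLE = """<VOTABLE version="1.2">
  <RESOURCE name="outer">
    <TABLE name="t1"/>
    <RESOURCE name="inner">
      <TABLE name="t2"/>
      <TABLE name="t3"/>
    </RESOURCE>
  </RESOURCE>
</VOTABLE>"""


def iter_table_names(element):
    """Yield TABLE names in document order, recursing into nested RESOURCEs."""
    for child in element:
        if child.tag == "TABLE":
            yield child.get("name")
        elif child.tag == "RESOURCE":
            yield from iter_table_names(child)


root = ET.fromstring(SAMPLE)
table_names = list(iter_table_names(root))
print(table_names)
```

A caller that only needs the tables can ignore how deeply they are nested, which is the same convenience iter_tables offers on a parsed VOTableFile.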
Finally, if there is expected to be only one table in the file, it
might be simplest to just use get_first_table:
table = votable.get_first_table()
Even easier, there is a convenience method to parse a VOTable file and return the first table all in one step:
from astropy.io.votable import parse_single_table
table = parse_single_table("votable.xml")
From a Table object, one can get the data itself
in the array member variable:
data = table.array
This data is a Numpy record array.
The columns get their names from both the ID and name
attributes of the FIELD elements in the VOTABLE file. For
example, suppose we had a FIELD specified as follows:
<FIELD ID="Dec" name="dec_targ" datatype="char" ucd="POS_EQ_DEC_MAIN"
unit="deg">
<DESCRIPTION>
representing the ICRS declination of the center of the image.
</DESCRIPTION>
</FIELD>
Note
The mapping from VOTable name and ID attributes to Numpy
dtype names and titles is highly confusing.
In VOTable, ID is guaranteed to be unique, but is not
required. name is not guaranteed to be unique, but is
required.
In Numpy record dtypes, names are required to be unique and
are required. titles are not required, and are not required
to be unique.
Therefore, VOTable’s ID most closely maps to Numpy’s
names, and VOTable’s name most closely maps to Numpy’s
titles. However, in some cases where a VOTable ID is not
provided, a Numpy name will be generated based on the VOTable
name. Unfortunately, VOTable fields do not have an attribute
that is both unique and required, which would be the most
convenient mechanism to uniquely identify a column.
When converting from an astropy.io.votable.tree.Table object to
an astropy.table.Table object, one can specify whether to give
preference to name or ID attributes when naming the
columns. By default, ID is given preference. To give
name preference, pass the keyword argument
use_names_over_ids=True:
>>> votable.get_first_table().to_table(use_names_over_ids=True)
This column of data can be extracted from the record array using:
>>> table.array['dec_targ']
array([17.15153360566, 17.15153360566, 17.15153360566, 17.1516686826,
17.1516686826, 17.1516686826, 17.1536197136, 17.1536197136,
17.1536197136, 17.15375479055, 17.15375479055, 17.15375479055,
17.1553884541, 17.15539736932, 17.15539752176,
17.25736014763,
# ...
17.2765703], dtype=object)
or equivalently:
>>> table.array['Dec']
array([17.15153360566, 17.15153360566, 17.15153360566, 17.1516686826,
17.1516686826, 17.1516686826, 17.1536197136, 17.1536197136,
17.1536197136, 17.15375479055, 17.15375479055, 17.15375479055,
17.1553884541, 17.15539736932, 17.15539752176,
17.25736014763,
# ...
17.2765703], dtype=object)
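The reason both keys work can be seen in plain Numpy. This is a minimal sketch, assuming a float column and made-up values rather than the object-dtype column shown above: the structured dtype uses the VOTable ID as the field name and the VOTable name as the field title, and Numpy accepts either as an index.

```python
import numpy as np

# "names" plays the role of the VOTable ID, "titles" that of the VOTable name.
dtype = np.dtype({
    "names": ["Dec"],
    "formats": ["f8"],
    "titles": ["dec_targ"],
})

# Illustrative values only; they are not read from a real file.
records = np.zeros(3, dtype=dtype)
records["Dec"] = [17.1515, 17.1516, 17.1536]

by_id = records["Dec"]         # lookup by field name (VOTable ID)
by_name = records["dec_targ"]  # lookup by field title (VOTable name)
print(dtype.names)
print(by_id, by_name)
```

Only the field name appears in dtype.names, so code that enumerates columns sees the ID-based key once, while the title remains available as an alias.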
Building a new table from scratch¶
It is also possible to build a new table, define some field datatypes and populate it with data:
from astropy.io.votable.tree import VOTableFile, Resource, Table, Field
# Create a new VOTable file...
votable = VOTableFile()
# ...with one resource...
resource = Resource()
votable.resources.append(resource)
# ... with one table
table = Table(votable)
resource.tables.append(table)
# Define some fields
table.fields.extend([
    Field(votable, name="filename", datatype="char", arraysize="*"),
    Field(votable, name="matrix", datatype="double", arraysize="2x2")])
# Now, use those field definitions to create the numpy record arrays, with
# the given number of rows
table.create_arrays(2)
# Now table.array can be filled with data
table.array[0] = ('test1.xml', [[1, 0], [0, 1]])
table.array[1] = ('test2.xml', [[0.5, 0.3], [0.2, 0.1]])
# Now write the whole thing to a file.
# Note, we have to use the top-level votable file object
votable.to_xml("new_votable.xml")
Outputting a VOTable file¶
To save a VOTable file, simply call the to_xml method. It accepts
either a string or Unicode path, or a Python file-like object:
votable.to_xml('output.xml')
There are a number of data storage formats supported by
astropy.io.votable. The TABLEDATA format is XML-based and
stores values as strings representing numbers. The BINARY format
is more compact, and stores numbers in base64-encoded binary. VOTable
version 1.3 adds the BINARY2 format, which allows for masking of
any data type, including integers and bit fields, which cannot be
masked in the older BINARY format. The storage format can be set
on a per-table basis using the format attribute, or globally using the
set_all_tables_format method:
votable.get_first_table().format = 'binary'
votable.set_all_tables_format('binary')
votable.to_xml('binary.xml')
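To make the difference between the two encodings concrete, here is a rough sketch using only the standard library. It assumes the values are packed as big-endian IEEE 754 doubles before base64 encoding, and it uses made-up variable names; it only illustrates the idea and does not reproduce the exact byte stream that astropy.io.votable writes.

```python
import base64
import struct

values = [17.15153360566, 17.1516686826, 17.1536197136]

# TABLEDATA-style: each number is written out as a human-readable string.
as_text = " ".join(repr(v) for v in values)

# BINARY-style (assumed layout): pack as big-endian doubles, then base64.
packed = struct.pack(">3d", *values)
as_base64 = base64.b64encode(packed).decode("ascii")

# Decoding reverses both steps and recovers the original doubles exactly.
decoded = list(struct.unpack(">3d", base64.b64decode(as_base64)))

print(as_text)
print(as_base64)
```

The binary form is opaque to a human reader but round-trips floating-point values bit for bit, whereas the text form depends on how many digits are written.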
Using astropy.io.votable¶
Standard compliance¶
astropy.io.votable.tree.Table supports the VOTable Format Definition
Version 1.1, Version 1.2, and the Version 1.3 proposed recommendation.
Some flexibility is provided to support the 1.0 draft version and
other non-standard usage in the wild. To support these cases, set the
keyword argument pedantic to False when parsing.
Note
Each warning and VOTABLE-specific exception emitted has a number and is documented in more detail in Warnings and Exceptions.
Output always conforms to the 1.1, 1.2 or 1.3 spec, depending on the input.
Pedantic mode¶
Many VOTABLE files in the wild do not conform to the VOTABLE
specification. If reading one of these files causes exceptions, you
may turn off pedantic mode in astropy.io.votable by passing
pedantic=False to the parse or parse_single_table functions:
from astropy.io.votable import parse
votable = parse("votable.xml", pedantic=False)
Note, however, that it is good practice to report these errors to the author of the application that generated the VOTABLE file to bring the file into compliance with the specification.
Even with pedantic turned off, many warnings may still be emitted.
These warnings are all of the type VOTableSpecWarning and can be turned
off using the standard Python warnings module.
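A sketch of such a filter follows. To keep it runnable on its own, it defines a stand-in VOTableSpecWarning class and a hypothetical noisy_parse function; in real use the class would be imported from astropy.io.votable.exceptions and the call would be parse or parse_single_table.

```python
import warnings


class VOTableSpecWarning(Warning):
    """Stand-in for astropy.io.votable.exceptions.VOTableSpecWarning."""


def noisy_parse():
    """Hypothetical parser that emits a spec warning, then returns a result."""
    warnings.warn("W01: illustrative message", VOTableSpecWarning)
    return "parsed"


# catch_warnings restores the previous filters when the block exits.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Filters added later take precedence, so this silences the spec warnings.
    warnings.simplefilter("ignore", VOTableSpecWarning)
    result = noisy_parse()

print(result, len(caught))
```

Scoping the filter with catch_warnings keeps the suppression local, so spec warnings raised elsewhere in the program are still reported.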
Missing values¶
Any value in the table may be “missing”. astropy.io.votable stores
a Numpy masked array in each Table instance. This behaves like an
ordinary Numpy masked array, except for variable-length fields. For
those fields, the datatype of the column is “object” and another
Numpy masked array is stored in each cell.
Therefore, array operations on variable-length columns will not work,
simply because variable-length columns are not directly supported
by Numpy masked arrays.
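For fixed-size columns, the usual Numpy masked-array behaviour applies. The sketch below builds a small masked array by hand, with made-up values and a made-up sentinel of -999.0 standing in for a missing entry, rather than reading one from a VOTable file.

```python
import numpy as np

# The middle entry is "missing": its stored value is ignored because it is masked.
column = np.ma.array([17.15, -999.0, 17.27], mask=[False, True, False])

valid_count = int(column.count())    # number of unmasked entries
mean_of_valid = float(column.mean())  # statistics skip masked entries
filled = column.filled(np.nan)        # plain ndarray with NaN where masked

print(valid_count, mean_of_valid, filled)
```

Reductions such as mean and count skip the masked entry, so the placeholder value never leaks into the result.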
Datatype mappings¶
The datatype specified by a FIELD element is mapped to a Numpy
type according to the following table:
VOTABLE type | Numpy type |
boolean | b1 |
bit | b1 |
unsignedByte | u1 |
char (variable length) | O - A bytes() object. |
char (fixed length) | S |
unicodeChar (variable length) | O - A str object |
unicodeChar (fixed length) | U |
short | i2 |
int | i4 |
long | i8 |
float | f4 |
double | f8 |
floatComplex | c8 |
doubleComplex | c16 |
If the field is a fixed size array, the data is stored as a Numpy fixed-size array.
If the field is a variable size array (that is, arraysize contains
a ‘*’), the cell will contain a Python list of Numpy values. Each
value may be either an array or scalar depending on the arraysize
specifier.
Examining field types¶
To look up more information about a field in a table, one can use the
get_field_by_id method, which returns
the Field object with the given ID. For example:
>>> field = table.get_field_by_id('Dec')
>>> field.datatype
'char'
>>> field.unit
'deg'
Note
Field descriptors should not be mutated. To change the set of
columns, convert the Table to an astropy.table.Table, make the
changes, and then convert it back.
Data serialization formats¶
VOTable supports a number of different serialization formats.
- TABLEDATA stores the data in pure XML, where the numerical values are written as human-readable strings.
- BINARY is a binary representation of the data, stored in the XML as an opaque base64-encoded blob.
- BINARY2 was added in VOTable 1.3, and is identical to “BINARY”, except that it explicitly records the position of missing values rather than identifying them by a special value.
- FITS stores the data in an external FITS file. This serialization is not supported by the astropy.io.votable writer, since it requires writing multiple files.
The serialization format can be selected in two ways:
1) By setting the format attribute of an astropy.io.votable.tree.Table object:
votable.get_first_table().format = "binary"
votable.to_xml("new_votable.xml")
2) By overriding the format of all tables using the tabledata_format keyword argument when writing out a VOTable file:
votable.to_xml("new_votable.xml", tabledata_format="binary")
Converting to/from an astropy.table.Table¶
The VOTable standard does not map conceptually to an
astropy.table.Table. However, a single table within the VOTable
file may be converted to and from an astropy.table.Table:
from astropy.io.votable import parse_single_table
table = parse_single_table("votable.xml").to_table()
As a convenience, there is also a function to create an entire VOTable file with just a single table:
from astropy.io.votable import from_table, writeto
votable = from_table(table)
writeto(votable, "output.xml")
Note
By default, to_table will use the ID attribute from the files to
create the column names for the Table object. However,
it may be that you want to use the name attributes instead. For this,
set the use_names_over_ids keyword to True. Note that since field
names are not guaranteed to be unique in the VOTable specification,
but column names are required to be unique in Numpy structured arrays (and
thus astropy.table.Table objects), the names may be renamed by appending
numbers to the end in some cases.
See Also¶
Reference/API¶
astropy.io.votable Package¶
This package reads and writes data formats used by the Virtual Observatory (VO) initiative, particularly the VOTable XML format.
Functions¶
parse (source[, columns, invalid, pedantic, …]) |
Parses a VOTABLE xml file (or file-like object), and returns a VOTableFile object. |
parse_single_table (source, **kwargs) |
Parses a VOTABLE xml file (or file-like object), reading and returning only the first Table instance. |
validate (source[, output, xmllint, filename]) |
Prints a validation report for the given file. |
from_table (table[, table_id]) |
Given a Table object, return a VOTableFile file structure containing just that single table. |
is_votable (source) |
Reads the header of a file to determine if it is a VOTable file. |
writeto (table, file[, tabledata_format]) |
Writes a VOTableFile to a VOTABLE xml file. |
Classes¶
Conf |
Configuration parameters for astropy.io.votable . |
astropy.io.votable.tree Module¶
Classes¶
Link ([ID, title, value, href, action, id, …]) |
LINK elements: used to reference external documents and servers through a URI. |
Info ([ID, name, value, id, xtype, ref, …]) |
INFO elements: arbitrary key-value pairs for extensions to the standard. |
Values (votable, field[, ID, null, ref, …]) |
VALUES element: used within FIELD and PARAM elements to define the domain of values. |
Field (votable[, ID, name, datatype, …]) |
FIELD element: describes the datatype of a particular column of data. |
Param (votable[, ID, name, value, datatype, …]) |
PARAM element: constant-valued columns in the data. |
CooSys ([ID, equinox, epoch, system, id, …]) |
COOSYS element: defines a coordinate system. |
FieldRef (table, ref[, ucd, utype, config, pos]) |
FIELDref element: used inside of GROUP elements to refer to remote FIELD elements. |
ParamRef (table, ref[, ucd, utype, config, pos]) |
PARAMref element: used inside of GROUP elements to refer to remote PARAM elements. |
Group (table[, ID, name, ref, ucd, utype, …]) |
GROUP element: groups FIELD and PARAM elements. |
Table (votable[, ID, name, ref, ucd, utype, …]) |
TABLE element: optionally contains data. |
Resource ([name, ID, utype, type, id, …]) |
RESOURCE element: Groups TABLE and RESOURCE elements. |
VOTableFile ([ID, id, config, pos, version]) |
VOTABLE element: represents an entire file. |
astropy.io.votable.converters Module¶
This module handles the conversion of various VOTABLE datatypes to/from TABLEDATA and BINARY formats.
Functions¶
get_converter (field[, config, pos]) |
Get an appropriate converter instance for a given field. |
table_column_to_votable_datatype (column) |
Given an astropy.table.Column instance, returns the attributes necessary to create a VOTable FIELD element that corresponds to the type of the column. |
astropy.io.votable.ucd Module¶
This file contains routines to verify the correctness of UCD strings.
Functions¶
parse_ucd (ucd[, …]) |
Parse the UCD into its component parts. |
check_ucd (ucd[, …]) |
Returns False if ucd is not a valid unified content descriptor. |
astropy.io.votable.util Module¶
Various utilities and cookbook-like things.
Functions¶
convert_to_writable_filelike (fd[, compressed]) |
Returns a writable file-like object suitable for streaming output. |
coerce_range_list_param (p[, frames, numeric]) |
Coerces and/or verifies the object p into a valid range-list-format parameter. |
astropy.io.votable.validator Package¶
Validates a large collection of web-accessible VOTable files, and generates a report as a directory tree of HTML files.
Functions¶
make_validation_report ([urls, destdir, …]) |
Validates a large collection of web-accessible VOTable files. |
astropy.io.votable.xmlutil Module¶
Various XML-related utilities.
Functions¶
check_id (ID[, name, config, pos]) |
Raises a VOTableSpecError if ID is not a valid XML ID. |
fix_id (ID[, config, pos]) |
Given an arbitrary string, create one that can be used as an xml id. |
check_token (token, attr_name[, config, pos]) |
Raises a ValueError if token is not a valid XML token. |
check_mime_content_type (content_type[, …]) |
Raises a VOTableSpecError if content_type is not a valid MIME content type. |
check_anyuri (uri[, config, pos]) |
Raises a VOTableSpecError if uri is not a valid URI. |
validate_schema (filename[, version]) |
Validates the given file against the appropriate VOTable schema. |