show_data: Quickly See What Data Exists in the BDNYC Database

In the BDNYC database code, I added a new method called show_data. Its purpose is to show you all and any kind of data that exists in the database for a specific target. The method is part of the module, under the BDNYCData class.

The only input it needs is the U-number of the target. So, for example, if you want to know what data exists for target U10276, type:

> bdnyc.show_data(‘U10276’)

assuming that bdnyc is where you loaded the database using the pickle.load(f) command. The method will automatically check whether the target exists in the database, and if it does, it will print out a list of rows, each one describing a piece of data for the target in question. For the case of U10276, you will get the following in your terminal:

[‘unum’, ‘U10276’]
[‘index’, 74]
[‘name’, ‘sd0423’]
[‘sptype’, ‘T0’]
[‘ra’, ’04 23 48.6′]
[‘dec’, ‘-04 14 04.0’]
[‘standard’, ‘Yes’]
[0, ‘opt’, ‘med’, ‘CT 4m’, ‘2002jan25’]
[1, ‘nir’, ‘high’, ‘NIRSPEC’, ‘2006jan10’]
[2, ‘nir’, ‘high’, ‘NIRSPEC’, ‘2005dec11’]
[3, ‘nir’, ‘high’, ‘NIRSPEC’, ‘2001oct09’]
[4, ‘nir’, ‘med’, ‘NIRSPEC’, ‘0000xxx00’]
[5, ‘nir’, ‘low’, ‘IRTF SpeX Prism’, ‘2003sep17’]
[6, ‘nir’, ‘phot’, ‘2MASS’, [‘H’, ‘K’, ‘J’]]

From this, you can see that there are five spectra in the database: one optical-medium resolution, three nir-high resolution, one nir-medium resolution, and one nir-low resolution. You also see that there is nir-photometry for this target for the J, H, and K bands.

Note that the second row gives you the index corresponding to this target in the database. This number is the same result that you would get if you used the method bdnyc.match_unum(unum).

One last thing, to make this output code-friendly, I added the option to give you the output as a list object —as opposed to just a terminal print-out— which you can then easily manipulate with your code. To do this, type:

> x = bdnyc.show_data(‘U10276’, dump=True)

And x will hold the list result shown above. Once you can see what data exists for a target, it becomes much easier to access the actual spectrum or photometry you want.

This is the complete method:

    bdnyc.show_data(unum, dump=True)

Sherpa Part Three (Spectral Measurements)

Over the course of this semester, I have been utilizing the program Sherpa, written in Python, to model Potassium (K) absorption lines and make three specific measurements. Specifically, these measurements are the equivalent width, full width half maximum, and line-to-continuum flux ratio. In this entry, I will show the commands I utilized to make these measurements.

When I use Sherpa, I import it as a python module as such:
>>> from sherpa.astro.ui import *    # All Sherpa functions are found here

If you followed my post on modeling with Sherpa, then I have used 3 separate model components to create my model. These include the power law (pl), the lorentzian (l1) and the gaussian (g1). The power law describes the continuum, and the three together represent the data.

To measure the equivalent width of the K line, I use Sherpa’s eqwidth() function. The syntax is:
>>>eqwidth(continuum, continuum+line, lo=lower wl limit, hi=upper wl limit, id=data ID #)
Using my data for a K line whose line goes from 1.252µ to 1.253µ, I measure the eq. width like so:
>>> eqwidth(pl, pl+l1+g1, lo=1.252, hi=1.253, id=1)

The output will be the equivalent width value, in microns. Nice and simple. The hard part was constructing the models and eyeballing the lower/upper limits for the eq. width measurement. All of this I did manually by eye.

The line-to-continuum flux ratio and full width half maximum measurements are harder. They require locating the actual arrays of values for the various models. In order to access the model’s x and y values (continuum+lorentzian+gaussian), you do:
>>> modelArrays = get_model_plot(1)     # The 1 is the ID number
# modelArrays.x  –> array of x values
# modelArrays.y  –> array of y values

To access the y values for the continuum only (pl), you have to do:
>>> key = pl._cache.keys()[0]
>>>  cont_yvals = pl._cache[key]
# This may not work if some of the values in pl.__dict__ are not set properly. To be honest, # I’m not fully sure which ones may screw this up, but for me, I have:
>>> pl._NoNewAttributesAfterInit__initialized = True
>>> pl._use_caching = True
# For me, since I use a power law with an exponent of 0, all of the y values are the same. So I can access this value instead by doing:
>>> cont_yval = pl.ampl      # The continuum is a constant value: the amplitude

Armed with the model’s x and y value arrays, and the continuum’s y value array (or value), we can make the measurements. To calculate the line-to-continuum flux ratio, we must first find the minimum (maximum for emission lines) value of the model, and then compare it with the corresponding continuum flux value:
>>> F_model = numpy.min(modelArrays.y)        # Minimum of the model
>>> min_ind = numpy.where(modelArrays.y == F_model)[0][0]   # min’s array index
>>> F_cont = cont_yvals[min_ind]                      # Continuum value
>>> depth = 1.0 – (F_model/F_cont)                  # The line-to-continuum flux ratio!

The full width half maximum was a little tougher. The algorithm I used is as follows:
>>> F_halfmax = (F_cont + F_max) / 2.0            # Flux at the half maximum
>>> indices = numpy.where(modelArrays.y <= F_halfmax)[0]
>>> wls = modelArrays.x[indices]
>>> fwhm = wls[-1] – wls[0]

The problem with this algorithm is that sometimes there are pockets of spectrum that are equal to or lower than the half max, which get into the wls array. For those, I had to go through them manually and fix the wls array.

That’s how I measured the equivalent width, line-to-continuum flux ratio, and full width half maximum using Sherpa. If you have any questions or know of a better (or different) way of calculating these values, feel free to leave a comment or email me at

Editing astrotools

It is important to maintain some standards in astrotools. The goal is to keep it readable and stable. With than in mind, I shall provide some coding guidelines and then describe the structure of astrotools in detail to allow for its easy editing.

Actually, I will let Guido van Rossum (the father of Python) provide you with all the necessary coding guidelines, from naming conventions to code lay-out. You can find them in Guido’s Style Guide for Python Code.

Let me clear out an important aspect about Python modules. A module is a collection of reusable pieces of code, such as functions and classes. A piece of code in a module can use another piece within the same module or in another module, or it can be completely independent. The implications are that you can add a function:

  • that is completely self-contained and that does not interfere with anything else in astrotools
  • that makes use of other astrotools functions/classes
  • that makes use of other Python modules (e.g. numpy), in which case you need a command line to invoke such module (more on this below).

The order of things in astrotools

The module astrotools is broken down into five sections. From top to bottom in the code, they are described below.

I) Documentation: Contains general information about the contents of the module. This type of non-executable text (a.k.a. docstring) must always start and end with three apostrophes (which I like to call Melchior, Caspar, & Balthazar). This tells Python that the stuff within them is a string that can be called by the end user using Python help.

II) External Modules: Contains the Python commands used to invoke external Python modules —the famous “import” commands. Having them all in one location as opposed to inside the functions/classes where they are needed helps to avoid repetition (when more than one function/class needs to invoke the same external module) and to clearly see the external modules needed to use astrotools.

III) Public Functions: Organized alphabetically, these functions are meant to be accessible to end users. These functions have their own docstrings, surrounded by the three apostrophes, just like in Section I.

IV) Non-Public functions: Organized alphabetically, these functions are only used by other functions/classes within astrotools. As such, end users have no business accessing these. Python convention says non-public names should be preceded by double underscore. Technically, anyone can still access them, but Python code of conduct tells end users not to do it. These functions do not need docstrings, only commented (#) intro lines to explain their purpose and where they are used.

V) Public Classes: Organized alphabetically, these classes are meant to be accessible to end users. Just like with Public Functions, Public Classes have their own docstrings.

How to add a function/class to astrotools

  1. Download the latest version of astrotools from the GitHub repository into your computer.
  2. If your function/class needs to import an external module, check if it is already imported in Section II of astrotools. If not, add the command line to do so. Take as much advantage as you can of the modules already imported in astrotools before adding a new one. More external modules needed by astrotools means more requirements that end users will have to meet before having a fully functional astrotools module.
  3. Add your function/class in the appropriate location (alphabetically) within the relevant section (III or V) of astrotools.
  4. Your function/class MUST have a docstring, and it should follow reStructuredText syntax. No panic! It is very simple. For instance, if you want a word in italics, you write: *word*. Consult the reStructuredText User Reference for more information. In terms of format and contents, follow the example of docstrings of existing astrotools functions. Include in the docstring a description of all input parameters in your function/class, including their formats. Remember, the point of documenting your code is to allow others to easily use it and edit it in case you die. This is why I like to call the docstring “the death string”.
  5. If your function/class comes with a sub-function, that is, a side function that you wrote that is used by your main function/class, this means you are a good (object-oriented) programmer! The place to put it is Section IV of astrotools. Include in it intro comments about its purpose and where it is used. If you think that this sub-function could also be useful to end users, by all means add it in Section III instead. Just remember to write a docstring for it.
  6. Test the function/class that you added. Test extreme cases, test wrong input formats, test, test, test!
  7. Update in the GitHub repository with your new edited version.
  8. Inform the person that maintains the public documentation for astrotools so that it gets updated in

Regularizing Data

I have been trying to cross correlate two spectra with the intent of finding the radial velocity of one relative to the other.  I’ve been working on a Python code to do this for some time, but was having complications.  Although there is clearly a shift between the two spectra, the numpy cross correlation function that the code uses kept coming up with a shift of zero.  I recently came across the concept of “regularizing” data on a python help website.  Someone was having a similar problem (coming up with a cross correlation result of a zero shift), and it was pointed out that they weren’t regularizing their data.  To regularize data is to subtract off the mean, and divide by the standard deviation.  I tried doing this in my code, and it actually started to work, and give me very nice answers!  The problem is that I don’t particularly understand why this makes the cross correlation work.

Is anyone familiar with the concept of regularizing data? Or has anyone had a similar problem regarding cross correlations?

Fitting a gaussian to your data

Let’s say you have a nice histogram, like this…

…and you want to fit a gaussian to it so that you can find the mean, and the standard deviation.  Follow these steps!

First, we have to make sure we have the right modules imported

>>> import matplotlib.pyplot as plt
>>> import matplotlib.mlab as mlab
>>> from scipy.stats import norm

Let’s say your data is stored in some array called data.

>>> (mu,sigma) =
Mu is the mean, and sigma is one standard deviation. If you don’t care about plotting your data, you can stop here.
>>> plt.figure(1)
>>> n,bins,patches=plt.hist(data,20,normed=1,facecolor=’green’,align=’mid’)
The number after data (20) is the number of bins you want your data to go into. Normed has to do with the integral of the gaussian.
>>> y = mlab.normpdf(bins,mu,sigma)
>>> plt.plot(bins,y,’r–‘,linewidth=2)

Now your data is nicely plotted as a histogram and its corresponding gaussian!

Intro to astrotools

The BDNYC Python module astrotools is now up and running for all to use and edit. This module is intended to be a collection of useful pieces of code for our everyday data handling needs. So far, astrotools is useful to handle fits files and spectral data. Additions to astrotools are welcome.

Let me clear out some relevant Python concepts. A module is simply a set of functional pieces of code. These pieces can be either functions or classes, among other things. A function is a code that executes an action, for example “make me a sandwich”. A class is a code that creates an object with specific characteristics and it allows you to handle this object very effectively. For the example above, you can have a class to create the person that will make the sandwich, a class to create a slice of mortadella, and a class to create mayonnaise. Of course, you can have functions that make no use of classes. More on that in a future post.

Where to find astrotools and how to install it

The module astrotools is available at as Make sure that you download it into one of your Python folders in sys.path, which are the folders where Python looks for modules whenever you invoke one in your Python session. If you have no idea what I am referring to, please read this piece of Python documentation.

How to use astrotools

Now you are ready to invoke astrotools into your Python session. Type:

“import astrotools”

That’s it! You are ready to use astrootols functions and classes. For example, if you want to use the function that gets spectral data from a fits file, type:

“x = astrotools.read_spec(filename)”

and x will hold the output of the read_spec function. To learn about the functions and classes in astrotools, visit

By convention, modules are invoked using a nickname, if only to make typing easier. I
suggest the nickname “at”. Then, import astrotools by typing:

“import astrotools as at”

To use it, type:

“x = at.read_spec(filename)”

In later posts I will describe in greater detail some important Python concepts as well as some guidelines to follow when editing astrotools.

Sherpa Part Two (Fitting)

I have spent the last month or so immersing myself in Sherpa, the Python-built program I am using to make the measurements of the Potassium absorption line at 1.2525 microns. While finally coming up for air, I am going to share how I’ve used Sherpa in order to fit a model (or rather, 3 combined models) to the spectral feature so I can make the desired measurements. I will walk you through an example, the plotting and fitting of target “DE1228”.

To start, you need to have all of the data available. Using the BDNYC database developed last semester, we load into arrays the wavelength, flux, and snr values from may 30th, 2007:

>>> w = bdnyc.targets[33].nir[‘high’][‘NIRSPEC’][‘2007may30’][61][‘wl’]
>>> f = bdnyc.targets[33].nir[‘high’][‘NIRSPEC’][‘2007may30’][61][‘flux’]
>>> snr = bdnyc.targets[33].nir[‘high’][‘NIRSPEC’][‘2007may30’][61][‘snr’]

I then used a series of numpy commands to chop the arrays so as to only contain the values from 1.25 to 1.255 microns. Once that was done, I converted the snr to uncertainty values and stored them in an array called “unc”. Since these actions did not require Sherpa, I will omit them from this post. Once I had this completed, it was time to create the Sherpa data structure required for analysis. You use these commands:

>>> from sherpa.astro.ui import *     # This is necessary if you are using Sherpa in a non-standalone version.
>>> data = unpack_arrays(w,f)        # Loads the wavelength and flux arrays into the data structure, which I’ve originally named “data”.
>>> = ‘DE1228’              # Give Sherpa the name of the target — this is not necessary, but it will automatically title any plots from now on.

Now that the data structure is created, we can now start using the sherpa functions to load in the data, models, uncertainties, etc. and bind them to an “id” number. These are the commands I used next:

>>> set_data(1, data)              # I am using the id number “1”, which is the default id number, but I will explicitly state it so you can see how to state it.
>>> set_staterror(1, unc)         # Load in uncertainties as the error
>>> plot_data(1)                      # This plots the data. Error is automatically plotted as well. This is not a necessary step for fitting, but it’s so you can see what you’re dealing with.
>>> set_source(1,       # I used a model containing a power law, gaussian profile, and lorentzian profile to fit the data and absorption line.

At this stage in the game, all of the data is now inputted, and now we have to play around. If you want to fit the models to the data, you have to use the fit command:

>>> fit(1)
If you do not prime the program by giving it some good values to start with, it may not come up with anything good. To prime, you can use:
>>> pl.ampl = 9
>>> l1. fwhm  = 0.0011
Etcetera. You basically give somewhat good values to start with, and then have it fit the data from there. However, I wanted to fit to the peaks of the continuum data, and the fitting function seemed to be very unstable, often iving me horrible fits even when primed, so I was unable to find a good fit without manually fitting the models to the data myself. For DE1228, I used the values:

>>> pl.gamma = 10
>>> pl.ref = 1
>>> pl.ampl = 8.6
>>> l1.fwhm = 0.00115
>>> l1.pos = 1.25251
>>> l1.ampl = -0.00093
>>> g1.fwhm = 0.00021
>>> g1.pos = 1.25255
>>> g1.ampl = -0.157

Once you’ve fit the model to the data (either manually or not), you can plot the fit by typing:

>>> plot_fit(1)

You can then plot (or overplot) the different model components:

>>> plot_model_component(pl, overplot=True)
>>> plot_model_component(g1, overplot=True)
>>> plot_model_component(l1, overplot=True)

And that’s how I fit the models to the potassium absorption line. I’ve consolidated a lot of these numbers and commands into a table and python module ( While specific, parts of the module can be applied to other people using Sherpa, so I can share the code if needed. In my next blog post, I’ll describe how to utilize the model to make the spectral line measurements (i.e. equivalent width, line-to-continuum flux ratio, full width half maximum).

Fitting a power law to data

I wanted to fit a power law function to data, not a polynomial. Here’s how I did it.

I used the least squares method.

How to fit a power law to the data

power fit to L4 data

Example of a power law fit to data

If you prefer other types of functions, just change the function form to whatever you want to fit when you define it.

The inputs for leastsq are the error function (difference between data and a function you want to fit) and initial conditions. When full_output = nonzero, it returns the covariance matrix in addition to the parameters that minimize the sum of squares of the error function.

What would you fit a power law function to?

Reading & Writing (table data) Fits Files in Python

This is one technique of writing arrays to fits files and correspondingly accessing those arrays. I create a fits file where each column of data corresponds to an array. In this example I have three ndarrays (wv_full_test,  fx_full_test, and (fx_full_test*.1).

(1)  Writing table data arrays to a fits file:
$import pyfits

$col1 = pyfits.Column(name = ‘wave’, format = ‘D’, array = wv_full_test)
$col2 = pyfits.Column(name = ‘fx’, format = ‘D’, array = fx_full_test)
$col3 = pyfits.Column(name = ‘fx_error’, format = ‘D’, array = (wv_full_test*.1))

$cols = pyfits.ColDefs([col1, col2, col3])
$tbhdu = pyfits.new_table(cols)
$hdu = pyfits.PrimaryHDU()
$thdulist = pyfits.HDUList([hdu, tbhdu])

(2) Reading fits files into table data arrays:
$input = pyfits.core.getdata(‘filename.fits’, 1)

(3) To access each individual array:
$wavelength_array = input.field(‘wave’)
$fx_array = input.field(‘fx’)
$fx_error_array = input.field(‘fx’)

For more information take a look at the Pyfits Handbook



Rebin Numpy Arrays in Python

I’m currently working to determine radial velocities of suspected young brown dwarfs. I’m working with high resolution optical spectra collected from the MIKE Spectrograph on Magellan II. Through this process I’ve gained a much better understanding for data analysis and reduction in Python.

One such helpful tool I recently discovered in Python is how to rebin numpy arrays. Vivienne and I are cross-correlating two stellar spectrums to determine the relative pixel shifts between the two. These pixel shifts correspond to wavelength shifts of the emitted stellar light due to the doppler effect. In order to gain a more precise measurement of the relative pixel shift I rebinned the data. Data rebinning is useful when you want to reduce or increase the amount of data you’re working with, without losing important information. I rebinned my data from a precision of 1 pixel to .1 pixel.

Below is a plot of the Rebinned (blue) and Original (red) Flux & Wavelength Data.

I wrote a script in Python to rebin the data. The useful commands are:

#convert wavelength & flux into from lists to arrays called wave_array & fx_array
wave_array = numpy.asarray(wave_nz, dtype = float)
fx_array = numpy.asarray(fx_nz, dtype = float)

#rebin data array by a factor of 10.
wv_full_test = scipy.ndimage.interpolation.zoom(wave_array, 10)
fx_full_test = scipy.ndimage.interpolation.zoom(fx_array, 10)