AstrodbWeb

Followers of this blog and our team’s scientific endeavors may know we have a curated database of brown dwarfs we work with. An initial version of this database has been published in Filippazzo et al. 2015 and contains information for 198 objects. The database is also maintained on Github, where we welcome contributions from other researchers. We’ve developed a set of tools for astronomers to work with SQL databases, namely the Python package astrodbkit. This package can be applied to other SQL databases allowing astronomers from all fields of research to manage their data.
Here we introduce a new tool: AstrodbWeb, a web-based interface to explore the BDNYC database.
Continue reading

Creating a Database with astrodbkit

NOTE: The information on this post may be outdated. We recommend looking into the up-to-date documentation at trac (internal) and ReadTheDocs.

The astrodbkit package can be used to modify an existing SQL database (such as The BDNYC Database) but it can also be used to create and populate a SQL database from scratch.

To do this, import the BDdb module and create a new database with
from astrodbkit import astrodb
dbpath = '/path/to/new_database.db'
astrodb.create_databse(dbpath)

Then load your new database with

db = astrodb.Database(dbpath)

and start adding tables! The db.table() method accepts as its arguments the table name, list of field names, and list of data types like so:

db.table('my_new_table', ['field1','field2'], ['INTEGER','TEXT'], new_table=True)

Note new_table=True is necessary to create a new table. Otherwise, it looks for an existing table to modify (which you could do as well!).

To populate your new database with data, read the documentation here or a summary at Adding Data to the BDNYC Database.

As always, I recommend the SQLite Browser for a nice GUI to make changes outside of the command line.

Happy databasing!

Updating the astrodbkit documentation

As you make changes to the astrodbkit repository on Github, you may find that the documentation needs updating. Luckily, we use the invaluable Sphinx and the awesome ReadTheDocs to generate the documentation so this is fairly simple.

First, make sure you update the appropriate doc strings (those informative green bits just below the function definition) as this is what Sphinx will use to generate the documentation!

Start in the top level astrodbkit directory and generate the documentation from the module docstrings with

sphinx-build docs astrodbkit -a

Then cd into the docs directory and rebuild the html files with

cd docs
make html

Now move back to the top level directory and add, commit, and push your changes to Github with something like

cd ..
git add /docs
git commit -m "Updated the documentation for methods x, y, and z."
git push origin <your_branch>

All set! Refresh the page (after a few minutes so ReadTheDocs can build the pages) and make sure everything is to your liking. Well done.

And just in case, here’s a great tutorial for getting started with Sphinx and here’s the official documentation.

SQL Queries

An SQL database is comprised of a bunch of tables (kind of like a spreadsheet) that have fields (column names) and records (rows of data). For example, our database might have a table called students that looks like this:

id first last grade GPA
1 Al Einstein 6 2.7
2 Annie Cannon 6 3.8
3 Artie Eddington 8 3.2
4 Carlie Herschel 8 3.2

So in our students table, the fields are [id, first, last, grade, GPA], and there are a total of four records, each with a required yet arbitrary id in the first column.

To pull these records out, we tell SQL to SELECT values for the following fields FROM a certain table. In SQL this looks like:

In [1]: db.execute("SELECT id, first, last, grade, GPA FROM students").fetchall()
Out[1]: [(1,'Al','Einstein',6,2.7),(2,'Annie','Cannon',6,3.8),(3,'Artie','Eddington',8,3.2),(4,'Carlie','Herschel',8,3.2)]

Or equivalently, we can just use a wildcard “*” if we want to return all fields with the SQL query "SELECT * FROM students".

We can modify our SQL query to change the order of fields or only return certain ones as well. For example:

In [2]: db.execute("SELECT last, first, GPA FROM students").fetchall()
Out[1]: [('Einstein','Al',2.7),('Cannon','Annie',3.8),('Eddington','Artie',3.2),('Herschel','Carlie',3.2)]

Now that we know how to get records from tables, we can restrict which records it returns with the WHERE statement:

In [3]: db.execute("SELECT last, first, GPA FROM students WHERE GPA>3.1").fetchall()
Out[3]: [('Cannon','Annie',3.8),('Eddington','Artie',3.2),('Herschel','Carlie',3.2)]

Notice the first student had a GPA less than 3.1 so he was omitted from the result.

Now let’s say we have a second table called quizzes which is a table of every quiz grade for all students that looks like this:

id student_id quiz_number score
1 1 3 89
2 2 3 96
3 3 3 94
4 4 3 92
5 1 4 78
6 3 4 88
7 4 4 91

Now if we want to see only Al’s grades, we have to JOIN the tables ON some condition. In this case, we want to tell SQL that the student_id (not the id) in the quizzes table should match the id in the students table (since only those grades are Al’s). This looks like:

In [4]: db.execute("SELECT quizzes.quiz_number, quizzes.score FROM quizzes JOIN students ON students.id=quizzes.student_id WHERE students.last='Einstein'").fetchall()
Out[4]: [(3,89),(4,78)]

So students.id=quizzes.student_id associates each quiz with a student from the students table and students.last='Einstein' specifies that we only want the grades from the student with last name Einstein.

Similarly, we can see who scored 90 or greater on which quiz with:

In [5]: db.execute("SELECT students.last, quizzes.quiz_number, quizzes.score FROM quizzes JOIN students ON students.id=quizzes.student_id WHERE quizzes.score>=90").fetchall()
Out[5]: [('Cannon',3,96),('Eddington',3,94),('Herschel',3,92),('Herschel',4,91)]

That’s it! We can JOIN as many tables as we want with as many restrictions we need to pull out data in the desired form.

This is powerful, but the queries can become lengthy. A slight shortcut is to use the AS statement to assign a table to a variable (e.g. students => s, quizzes => q) like such:

In [6]: db.execute("SELECT s.last, q.quiz_number, q.score FROM quizzes AS q JOIN students AS s ON s.id=q.student_id WHERE q.score>=90").fetchall()
Out[6]: [('Cannon',3,96),('Eddington',3,94),('Herschel',3,92),('Herschel',4,91)]