A Software Ecosystem for Big Data Astronomy

I believe next-generation galaxy surveys demand the creation of high-quality software and centralized computing infrastructure. These tools should give working astronomers the time to think more about the science they want to do and less about the details of how a large-scale analysis should be implemented. The technologies to build these kinds of tools exist throughout the private sector, but the knowledge and skills to piece them together into working systems are not generally prioritized in astronomy education.

I believe these skills are essential to the future of astronomy. Rather than trying to teach everyone to be a better engineer, we should prioritize supporting a smaller number of engineering-focused individuals who have the knowledge necessary to appreciate the unique challenges of working with astronomical datasets.

A selection of software I’ve written:

cosmap | sample the universe at scale

cosmap is a high-level tool for defining analyses that depend on repeatedly retrieving data from a large survey. cosmap handles parameter management, data retrieval, dependency analysis, and results output. You get to focus on the astronomy.

github.com/patrickrwells/cosmap

heinlein | data management for the sky

heinlein accelerates queries on the sky. Quickly grab images, catalog data, and more from a specific location on the sky. heinlein understands the relationships between your data types and caches data so you can come back to it easily.

github.com/patrickrwells/heinlein

godata | file management for science

godata manages scientific data on disk so you don’t have to. Never worry about where your data is on disk. Just add it to a project and grab it when you need it.

github.com/patrickrwells/godata

Languages and Technologies

Advanced Usage

Intermediate/Working Knowledge

Basic Familiarity