Welcome to TitanPad!
==General Meeting Information==
* [ This Titan Pad]
* [ Previous Meeting]
* Call-in information
* [ Next Meeting]
==Agenda==
==Attendance==
*
==Past Action Items==
Resource Manager Group:
* Linyun Fu
* Ping Wang
* Hang Wang
* Weijing Chen
* Qi Pan
Citizen Science Group:
* Linyun Fu
* Yue (Robin) Liu
* Bassem Makni
* Amar Viswanathan
* Yu Chen
======
Notes on data sources for SWQP from Sky - 3/26
Characterizing impaired waters
The US EPA has a program called Watershed Assessment, Tracking & Environmental ResultS (WATERS) that provides access to information on some of the pertinent regulatory activities under the Clean Water Act (CWA). There are a couple of regulatory actions that we probably want to key in on in particular to come up with a way of characterizing potential threats to fish and wildlife health:
303(d) - The EPA sets Total Maximum Daily Load (TMDL) levels for water systems around the country that specify the amount of various types of contaminants that can be found in waterways from whatever sources are contributing. States conduct assessment and monitoring programs to examine the watersheds under their jurisdictions and report to the EPA. All of this information eventually ends up in databases, which ultimately result in potential regulatory action.
305(b) - The nation's waterways have the concept of designated uses (e.g., drinking water supply, fish and wildlife habitat, etc.). The EPA builds and maintains a National Assessment Database that contains information on how those designated uses are being met over time through assessments that occur every two years (the last was 2010). This information all works its way into a National Water Quality Inventory Report to Congress (see
There are a number of ways of getting to structured data that could be used to characterize watersheds in relation to water quality. The WATERS "Expert Query" tools provide some ways of searching through a user interface to return tables of summary information by region and state - Information on impaired watersheds has been used to create atlas type map products like the one here showing HUC-8 watersheds and their relative impairment status across the nation -
* Deborah thinks this works as a demo
If you follow something like the 305(b) query tool down, you ultimately end up with a record like the following: It shows a single report for an assessment conducted in 2010 on a particular stream reach that was flagged as impaired water for the designated use, "Habitat For Fish, Other Aquatic Life And Wildlife." All of these individual records of some form of impairment can get rolled together to form something like the above atlas example to provide a national view of watersheds and one view of their relative health for some particular purpose.
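As a rough sketch of the roll-up described above, assume each assessment record has already been reduced to a HUC-8 code and a status string. The field names `huc8` and `status` and the sample records are illustrative only, not the WATERS schema. The share of impaired assessments per watershed could then be computed like this:

```python
from collections import defaultdict

def percent_impaired_by_huc8(records):
    """Return {huc8: percent of assessment records flagged as impaired}."""
    totals = defaultdict(int)
    impaired = defaultdict(int)
    for rec in records:
        huc = rec["huc8"]
        totals[huc] += 1
        if rec["status"].strip().lower() == "impaired":
            impaired[huc] += 1
    return {huc: 100.0 * impaired[huc] / totals[huc] for huc in totals}

# Made-up sample records, for illustration only.
sample = [
    {"huc8": "01090004", "status": "Impaired"},
    {"huc8": "01090004", "status": "Good"},
    {"huc8": "02020003", "status": "Good"},
]
print(percent_impaired_by_huc8(sample))
```

The resulting per-watershed percentages are the kind of values an atlas-style map would shade.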
The problem with all of this is that there is a whole lot of data and some relatively arcane ways of getting at the information. The data have often been put together and released with some particular use in mind, which has resulted in data formats and release mechanisms that may not be all that conducive to unanticipated good uses.
The most flexible way I've seen of getting at a base data source that might prove useful is through the collection of WATERS Web, Mapping, and Database Services - The getEntitiesByHuc operation ( is probably one that we could work with for this project or one of the other spatial services end points that would provide access to the water_entity information described in the various pages for the WATERS services that will provide the assessment state information discussed earlier (good, impaired, etc.).
At the very least, we might be able to add a new query feature to the SWQP based on the getEntitiesByLatLong service ( that would introduce the dimension of water quality assessment entities in relation to the facility reports that are in the portal now. The data tech question might be on whether the SOAP web service end point for the data is too heavy to be practicable in returning and processing results rapidly enough to be workable or if some form of background data caching and alternate storage mechanism (e.g., RDF) might make it more workable.
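One possible shape for the background caching idea is sketched below. It assumes a caller-supplied `fetch(lat, lon)` function standing in for whatever slow call is actually used (for example a SOAP request to getEntitiesByLatLong). No real WATERS call is made here, and the class name and parameters are hypothetical:

```python
import time

class TTLCache:
    """Cache results of a slow lookup, keyed by rounded coordinates."""

    def __init__(self, fetch, ttl_seconds=86400, precision=3, clock=time.time):
        self._fetch = fetch          # hypothetical slow call, e.g. a SOAP request
        self._ttl = ttl_seconds      # how long a cached answer stays valid
        self._precision = precision  # decimal places kept in the cache key
        self._clock = clock
        self._store = {}

    def get(self, lat, lon):
        key = (round(lat, self._precision), round(lon, self._precision))
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]            # fresh cached value, no remote call
        value = self._fetch(lat, lon)
        self._store[key] = (now, value)
        return value
```

Rounding the coordinates means nearby map clicks share one cached answer. Whether that is acceptable depends on how fine-grained the water entities are.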
The other thing the WATERS services should get us for at least some areas are the "water highlights" found here - Among that information should be some TMDL characteristics from the 303(d) discussion above, which should be at least some pointers to specific contaminants (although the state of the nation's assessment on TMDL data has always been more scant than we could wish).
Characterizing aquatic biodiversity (biological species and habitat)
I've done some digging around to see what we might have in the way of reasonably usable data sources for characterizing the types of aquatic species that might be found in areas of impaired water quality. There are a number of potential ways to go about this and a variety of interesting questions we might pose, but I'm trying to come up with a way to make it as easy and attainable as possible.
I believe that aquatic GAP (USGS Gap Analysis Program for aquatic species) might be the cleanest and easiest source to use that would provide data at a fairly large scope. You can interact with the aquatic GAP data through the viewer application here -
The whole US is not yet covered with Aquatic GAP data, but you'll see some areas highlighted when you first hit the map. If you zoom in on the map to an area of interest within one of those areas and click on a lake or stream, you'll get a list of fish species recorded or modeled in that area. What I'm looking into today is some way that we can extract information from the underlying Aquatic GAP database to make a meaningful mashup with the impaired waters data. I'm going to consult with one of the biologists on the GAP project today to see if there is a background service somewhere we could hit for that purpose. The current publicly available GAP services won't get us to the type of queries we'd want to run to extract structured data in a way that would be conducive for this project.
So what does all this tell us?
If we can put these pieces together of waters that are impaired for the designated purposes of "Habitat For Fish, Other Aquatic Life And Wildlife" with known habitat for particular species, we'll be able to say something with a map about potential risks to wildlife. That in itself might help answer a question like the following line of inquiry:
I live 'here' in the United States. That means I live in a particular watershed, a place where surface water (and ground water for that matter) flows from one place to another. I might have a stream or river along with various catchments (lakes, ponds, man-made or not, etc.) in my area that provide habitat for fish and wildlife. State and Federal assessments have indicated that some of the waters in my area are impaired in some way, meaning they are not 'living' up to their full potential to serve the needs of fish and wildlife. Some other information tells me that 'these' particular species of fish are probably found in those same waters, so those are the species I might be concerned about.
(Catchment: the action of collecting water, especially the collection of rainfall over a natural drainage area.)
This doesn't yet tell us much about exactly what the threats might be in terms of specific contaminants and their effects. However, it's not a bad start to begin getting these data together and examining data formats and potential caching/data transformation methods to put all the pieces together. If the WATERS service does start providing some TMDL information, we will have information on watersheds where particular contaminants have exceeded what the EPA has set as threshold values above which health effects may occur. We may be able to use that information to start exploring more specific questions about how different types of contaminants may be impacting particular species or groups of species.
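The join described in this section (impaired waters matched with species by watershed) can be sketched as follows. It assumes both data sets have already been keyed by HUC-8 code; the function name and the sample values are made up for illustration:

```python
def species_at_risk(impaired_hucs, species_by_huc):
    """Map each impaired HUC-8 code to the sorted species recorded there."""
    return {
        huc: sorted(species_by_huc[huc])
        for huc in impaired_hucs
        if huc in species_by_huc      # skip watersheds with no species data
    }

# Illustrative inputs only.
impaired = ["01090004", "02020003"]
species = {"01090004": {"Brook trout", "American eel"}}
print(species_at_risk(impaired, species))
```

Watersheds that are impaired but have no species records are dropped rather than reported as empty, which keeps "no data" distinct from "no species".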
Notes about provenance
When putting together a mashup type of application like the SWQP, I think it is very important to be able to portray where all the data have come from and whether anything has been done to those data to get them into whatever visualization, exploratory tool, or derivative service is being produced. In examining the current SWQP, I looked at the following example of a facility:
ARKWRIGHT ADVANCED COATING, INC.
I see that there are reports of pH and Oil & Grease exceedances, and the provenance information behind those distinct values is really pretty excellent. The information points to where the data were retrieved from an EPA source, when that happened, who did it, and what sorts of conversions were done on the information. I think the only things I would add to this example would be the following:
1) Provide a higher level characterization of the source (possibly accessed via a ? on the facility name itself in the map info window) that would show how the site came to be in the SWQP system.
2) Work on a simpler provenance statement about the data points being presented, probably a simple distillation of the reasoning behind the factors that caused each point to be included.
I just got a new pointer to data on fish species indexed by HUC8. This comes from the National Fisheries Data Infrastructure (NFDI). You can query NFDI via the following link:

It lets you look for particular Hydrologic Units or you can pick all available and get back a list of species. You can download what you get back as a text file.
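A minimal sketch of loading such a download follows. The notes do not record the actual file layout, so this assumes a tab-delimited file with a header row and columns named `HUC8` and `SPECIES`. Both names are placeholders to adjust once a real file is in hand:

```python
import csv
import io
from collections import defaultdict

def load_species_by_huc(text, huc_col="HUC8", species_col="SPECIES"):
    """Parse tab-delimited text into {huc8: set of species names}."""
    result = defaultdict(set)
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    for row in reader:
        huc = row[huc_col].strip().zfill(8)   # restore leading zeros if lost
        result[huc].add(row[species_col].strip())
    return dict(result)

# Made-up sample in the assumed layout.
sample_text = "HUC8\tSPECIES\n1090004\tBrook trout\n01090004\tAmerican eel\n"
print(load_species_by_huc(sample_text))
```

Padding the code to eight digits guards against tools that read HUC codes as numbers and drop the leading zero.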
==Action Items==
* Wildlife Health Group: look into WATERS
* We will need to decide whether we can connect to WATERS in the course of this project or whether we include a future plan to connect to WATERS. Deborah suggests that Ping and someone else in the group take this as a task
* Sky to validate the best available EPA information on thresholds
* Sky to contact Sue Ellis in the Fish and Wildlife Service Environmental Contaminants Program
todo Sky will check in to see if we can get focus from Sue Ellis in Fish and Wildlife on questions related to impacts (currently on line 118)
* Sky will choose one or two geographic areas that are data rich and of interest
==Notes==
* Wildlife Health Project
** Today's presentation:
** Sky Highlights from line 66:
** Downloadable data with Hydrological unit code from NFDI site
** a lot of fishery data are from catchments from rivers, streams
** clean water act: 303(d) 305(b): information about streams, lakes, ponds, clean water
Sky mentions that sometimes there are conflicting state and federal regulations. From Deborah: we should see if we can use a reasoner to help detect these conflicts, and provenance to show what the reasoning relied on
contaminant assessment report -
there were different pieces in this
1 - known information
2 - different regulations (federal, state)
3 - scientific literature
come from state and federal agencies who are conducting their assessments (including monitoring)
these reports may be considered grey literature
bureau of land management for example
semantic reasoning engines can tell when there is something wrong there
what are the impacts, how bad they are, how to go about dealing with them
contact Sue Ellis in the Fish and Wildlife Service Environmental Contaminants Program
todo Sky will check in to see if we can get focus from Sue Ellis in Fish and Wildlife on questions related to impacts (currently on line 118)
todo make sure the EPA thresholds on contaminants are available
species threshold values are based on individual research activities, may not be comprehensive, and do not give definite impacts; only recommendations can be derived
EPA thresholds: the concentration values of some pollutants found in fish tissue
USGS has a data service for animal species
Reason over the data and thresholds to find water pollution, visualize with a heat map
Not only is the water impaired, there are also species in the water,
we know the health effect of the pollutant, then infer how the pollution impacts the species
get percentage from HUC query, and visualize the percentage via heat map
Heat map shows hot spots where there are more problems
1. update existing display
2. heat map
focus on one or two geographical regions for this class
look at a map of the US that has hydrologic areas, indexed to provide a heat map of the US (example: atlas of waters in the US).
higher percentage of impaired waters (WATERS gives this percentage)
now bring in the species occurrence that might be impacted by that contamination
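For the heat map, the percentage per hydrologic unit has to be turned into a color. A minimal sketch, assuming a simple yellow-to-red ramp (the ramp and the function name are illustrative choices, not something decided in the meeting):

```python
def heat_color(percent):
    """Map a percent-impaired value (0-100) to a hex color from yellow to red."""
    p = max(0.0, min(100.0, float(percent))) / 100.0   # clamp, then scale to 0..1
    green = int(round(255 * (1.0 - p)))                # less green means more red
    return "#ff{:02x}00".format(green)

print(heat_color(0), heat_color(100))
```

Out-of-range inputs are clamped, so a bad percentage cannot produce an invalid color string.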
Ping's presentation -
Sky mentions a possible connection with the BISON project - they would be willing to create a SPARQL endpoint, but need assistance
what is the next step to move that forward?
Han
D3 + SVG and Google Maps to visualize countries that have ?? fish (animal distribution?) - Han prefers Google Maps
Regulations on wildlife health demonstrated
The tightest regulation
comparing different types of regulation is valuable
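A small sketch of comparing regulations for a single contaminant follows. It assumes every limit is a maximum allowed concentration in the same units, so that the lowest value is the tightest. The source names and numbers are made up:

```python
def tightest_limit(limits):
    """Given {source: max allowed concentration}, return (source, value) of the lowest."""
    if not limits:
        raise ValueError("no limits supplied")
    source = min(limits, key=limits.get)
    return source, limits[source]

def conflicting(limits, tolerance=0.0):
    """True when the sources disagree by more than the tolerance."""
    values = list(limits.values())
    return max(values) - min(values) > tolerance

# Illustrative numbers only, not real regulatory values.
example = {"federal": 10.0, "state": 6.5}
print(tightest_limit(example), conflicting(example))
```

A reasoner would do more than this, but even a direct comparison surfaces which regulation governs and where state and federal values differ.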
todo connect animal distribution data with violation data
Sky: the dimension of watersheds
individual watersheds, monitoring entire watersheds
species found in watersheds,
Action: find the watersheds within 1 km of the polluting source
Bridge: the health effects of pollutants on species
which pollutants the species are sensitive to
endangered species data are fuzzy, broad
no point data, based on habitat
todo write down next steps (and what i was thinking of was next steps for this week but it could be next steps past the next week)
todo Sky write down most important/useful next steps
NFDI data might be good, hasn't looked into it yet
Ping mentions implementing proximity search to connect species distribution with pollution violation
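A proximity search of this kind can be sketched with the haversine great-circle distance. The coordinates and site names below are invented, and a real implementation would more likely work with watershed polygons than with points:

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def sites_within(source, sites, radius_km=1.0):
    """Return names of (name, lat, lon) sites within radius_km of the source."""
    return [name for name, lat, lon in sites
            if haversine_km(source[0], source[1], lat, lon) <= radius_km]

# Made-up polluting source and species observation sites.
source = (41.70, -71.50)
sites = [("near", 41.705, -71.50), ("far", 41.80, -71.50)]
print(sites_within(source, sites))
```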
* Next steps -- resource manager group -
Focus currently is on Water Quality:
focus on one or two geographical regions for this class - that is "interesting"
Functionalities we plan to work on in the next week or two:
* the user selects a species of concern (e.g. Pacific Cod), the portal visualizes the distribution data at the level of county or watershed in a heat map, then the portal identifies polluting water sources near the watersheds where the selected species is distributed
* the user selects a region of concern, the portal identifies and visualizes polluting water sources, and also reports the species that might be affected by the pollution
** Linyun: work with Han on the map interface connecting animal (fish) distribution data and pollution violation data
*** HUC (hydrological unit code) identifies water bodies
*** We need to figure out whether the polygon shape corresponding to each HUC exists
*** find geographical regions with the most interesting data to showcase
** Ping:
*** explore the data entries provided by Sky, pay attention to distribution at the level of watershed
*** expect EPA thresholds for species from Sky
*** ask Sky if he could give us some pointers to health effects of water pollution on species
*** summarize the next steps for the project
*** provide Qi with CSV files to convert
*** read the ontology paper that Deborah is going to send us
** Han: Figure out what data sources will be used by the project
*** Explore data sources such as NFDI about fish distribution on hydrologic unit level
**** How can we use it?
**** Is it out of scope for this course project to visualize these data?
*** Look at WATERS about impaired water data
**** Are they compatible with our current water portal data?
** Weijing: Implement a SPARQL query at to meet Ping's needs.

end point:
named graph: <
** Qi: get .csv files from Ping, and convert them to RDF. Also do enhancement work.
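The CSV-to-RDF step could start from something like the sketch below. It uses only the standard library and emits N-Triples with plain string literals. The `http://example.org/swqp/` namespace and the "one triple per cell" mapping are placeholders, not the project's actual vocabulary or conversion tooling:

```python
import csv
import io
from urllib.parse import quote

BASE = "http://example.org/swqp/"  # placeholder namespace, not the project's

def escape_literal(value):
    """Escape backslashes, quotes and newlines for an N-Triples literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")

def csv_to_ntriples(text, id_col):
    """Turn each CSV row into N-Triples, one triple per non-empty cell."""
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = "<{}record/{}>".format(BASE, quote(row[id_col], safe=""))
        for col, val in row.items():
            if col is None or col == id_col or not val:
                continue   # skip the id column, empty cells and stray extra cells
            predicate = "<{}prop/{}>".format(BASE, quote(col, safe=""))
            lines.append('{} {} "{}" .'.format(subject, predicate, escape_literal(val)))
    return "\n".join(lines)

print(csv_to_ntriples("id,species\n1,Brook trout\n", "id"))
```

Typed literals, links to existing ontologies and provenance triples would be part of the enhancement work and are left out here.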
* Next steps -- citizen scientist group -
Building out in:
* The User interface that the citizen scientist would require to input data.
* The SADI services that provide the semantic capabilities of user interface
** Linyun:
*** work with Yu to solve SADI related problems Yu is fighting with
*** work with Yu to design an ontology to accommodate the uploaded reporting forms
** Robin: Come up with a basic UI for future semantic technology integration
** Bassem:
*** Work on the Ushahidi code and try to plug in some semantic functionality: change the code so that Ushahidi communicates with a SPARQL endpoint or a SADI service.
*** Work with Amar on the UI.
** Yu: Figured out the use case ontology for a eutrophication event. Next, test with the UI group to integrate the two parts
*** For the SADI service OWL definitions are needed for Input and Output
**** Input OWL
***** Water color, smell, plankton density, oxygen level, nitrogen level, phosphorus level, dead fish observed floating, etc
**** Output OWL
***** Possibility index of the eutrophication
**** Operation OWL
***** Statistical operation
***** Reasoning operation
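To make the input/output split concrete, here is a toy sketch of the kind of operation such a service might wrap: a few observed inputs go in and a possibility index between 0 and 1 comes out. The cut-off values and the "fraction of indicators triggered" rule are placeholders for illustration only, not a validated eutrophication model and not the SADI API itself:

```python
# Placeholder cut-offs for illustration only; real values would have to
# come from domain experts or the literature.
LIMITS = {"oxygen_min": 5.0, "nitrogen_max": 1.0, "phosphorus_max": 0.1}

def possibility_index(dissolved_oxygen_mg_l, nitrogen_mg_l, phosphorus_mg_l,
                      dead_fish_observed, limits=LIMITS):
    """Return the fraction (0.0 to 1.0) of indicators pointing to eutrophication."""
    flags = [
        dissolved_oxygen_mg_l < limits["oxygen_min"],
        nitrogen_mg_l > limits["nitrogen_max"],
        phosphorus_mg_l > limits["phosphorus_max"],
        bool(dead_fish_observed),
    ]
    return sum(flags) / len(flags)

print(possibility_index(3.0, 2.0, 0.05, False))
```

In the service, the arguments would correspond to properties of the Input OWL class and the returned value to a property of the Output OWL class.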
Citizen Scientist Group project page:
UI sub-group presentation:
* provenance aspect of the project -- user account support, data from citizens / organizations
* Wine Agent does not help the project much
* Ushahidi, Pachube and PhoneGap
* Eutrophication discovery based on SADI service
Eutrophication: excessive richness of nutrients in a lake or other body of water, frequently due to runoff from the land, which causes a dense growth of...
!!!both groups need to:
figure out values of semantic technologies and provenance, and start project demo pages (including static pictures and a link to the live demo)!!!
Read Deborah's "Ontologies Come of Age" paper (she will send the link/paper)
Example for static demo:
Bassem: The semantic technologies and provenance tracking benefits in the Citizen Group project:
- Tracking the provenance information from the citizen scientists, and attributing a user expertise level according to the user's inputs and their agreement with the organisations' data.
- The UI may be adapted according to basic reasoning about the water quality, so we may ask for further information or disable some inputs if their values are unnecessary given previously entered values.
Joanne: Give ordinary users textual or image-based explanations on scores for water body quality (e.g., what does a score of 3 mean)
Amar: First task is to finish the UI and also connect it with the backend service. After this, the semantic capabilities of the UI and the Citizen Scientist group have to be explored and added to the application.
Semantic and Provenance -- resource manager group
1. Use ontology to connect two types of environmental knowledge
encode the information about the health effects of pollutants on species with a health ontology and use the health ontology to connect the water pollution events with species distribution
2. Provide transparency by capturing and exposing provenance so that the resource managers can decide the trustworthiness of results from the portal
collaborate with BISON
provide SPARQL endpoint directly
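For the SPARQL access mentioned in these notes (the species query for Ping, or an endpoint offered by BISON), a query could be assembled along these lines. This is only a sketch: the `ex:` prefix, the `ex:inHuc8` and `ex:species` predicates and the graph IRI are placeholders, since the notes do not record the actual vocabulary or endpoint.

```python
def build_species_query(huc8, graph_iri):
    """Build a SPARQL SELECT for species recorded in one HUC-8 unit."""
    if not (huc8.isdigit() and len(huc8) == 8):
        # Validating the code also keeps stray characters out of the query.
        raise ValueError("expected an 8-digit HUC code")
    return (
        "PREFIX ex: <http://example.org/swqp#>\n"   # placeholder vocabulary
        "SELECT DISTINCT ?species\n"
        "FROM <{graph}>\n"
        "WHERE {{\n"
        "  ?obs ex:inHuc8 \"{huc}\" ;\n"
        "       ex:species ?species .\n"
        "}}"
    ).format(graph=graph_iri, huc=huc8)

print(build_species_query("01090004", "http://example.org/graph"))
```

The function only builds the query text. Sending it to an endpoint is left out because the endpoint address is not given here.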