UC Davis: Aggie Experts
The UC Davis Library's Aggie Experts, currently in the pilot stage, is a database of campus researchers. The site lets researchers locate other researchers, publications, and grant sources associated with the campus.
They use an ETL approach:
Data is pulled from the Elements API in batches;
converted from XML to JSON;
combined with campus-side data sources (covering grants, campus-specific researcher titles, etc.);
converted into the open-source VIVO linked data schema;
and stored in a Fuseki database.
The batch processes run in Node.js.
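The XML-to-JSON step above can be sketched roughly as follows. This is a minimal illustration in Python (the UC Davis batch processes themselves run in Node.js), and it assumes a plain Atom feed: the sample feed, the `atom_entries_to_records` helper, and the choice of `id` and `title` as the wanted fields are all hypothetical and do not reproduce the actual Elements API schema.

```python
import json
import xml.etree.ElementTree as ET

# Standard Atom namespace; the real API adds its own namespaced elements.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical sample feed standing in for one batch of API results.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>pub-1</id>
    <title>First example publication</title>
    <updated>2021-01-01T00:00:00Z</updated>
  </entry>
  <entry>
    <id>pub-2</id>
    <title>Second example publication</title>
    <updated>2021-02-01T00:00:00Z</updated>
  </entry>
</feed>"""


def atom_entries_to_records(xml_text, wanted=("id", "title")):
    """Parse an Atom feed and keep only the wanted child elements of each entry."""
    root = ET.fromstring(xml_text)
    records = []
    for entry in root.findall("atom:entry", ATOM_NS):
        record = {}
        for name in wanted:
            # findtext returns None when the element is missing.
            record[name] = entry.findtext(f"atom:{name}", default=None, namespaces=ATOM_NS)
        records.append(record)
    return records


records = atom_entries_to_records(SAMPLE_FEED)
print(json.dumps(records, indent=2))
```

Because the wanted fields are chosen during the transform, anything else in the feed (here, `updated`) is simply dropped before the JSON is written.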
Because researchers are already required to keep their publications up-to-date in Elements (claiming, rejecting, modifying metadata, etc.), the UCD system considers this publication data authoritative. For user data, they rely on the campus’ own data, because its spec is more robust for, e.g., users with multiple titles.
At present, the site interacts with the Elements API in a read-only way, but the programmers expect to explore write options further down the line. They’re (understandably) concerned about the API’s lack of privilege-setting for write operations.
Heads up! Good things to know:
- The API’s use of Atom/XML can be difficult to work with:
- The queries often return unneeded elements, and unlike newer systems like GraphQL, these must be weeded out during the ETL process rather than at the point of query. Ergo, the transform steps in their system are rather complex.
- The maximum number of results per page is 25, so large data sets require many queries.
- The API’s data scope and functions are sometimes frustratingly limited. For example, you can’t “select 3 users and show their [modified when] field” — you can select all of the data from the 3 users, or all of the users’ [modified when] fields… but not both simultaneously.
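To illustrate the paging point above, here is a minimal Python sketch of collecting a large result set 25 records at a time. The `fetch_page` callable is a hypothetical stand-in for whatever code actually calls the API; only the page size of 25 comes from the notes above.

```python
import math

PAGE_SIZE = 25  # per-page maximum noted above


def pages_needed(total_results, page_size=PAGE_SIZE):
    """Number of requests needed to retrieve total_results records."""
    return math.ceil(total_results / page_size)


def fetch_all(fetch_page, page_size=PAGE_SIZE):
    """Call fetch_page(page_number, page_size) until a short page comes back.

    fetch_page is a hypothetical callable returning a list of records.
    If the total is an exact multiple of page_size, one extra (empty)
    request is made before the loop stops.
    """
    results = []
    page = 1
    while True:
        batch = fetch_page(page, page_size)
        results.extend(batch)
        if len(batch) < page_size:
            break
        page += 1
    return results


# Stand-in data source so the sketch runs without network access.
FAKE_DATA = [{"id": n} for n in range(1, 61)]


def fake_fetch_page(page, page_size):
    start = (page - 1) * page_size
    return FAKE_DATA[start:start + page_size]


all_items = fetch_all(fake_fetch_page)
print(len(all_items), pages_needed(len(FAKE_DATA)))
```

Passing the fetch function in as a parameter keeps the paging logic testable separately from the HTTP and XML handling.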
CDL: UCPMS notification tool
The California Digital Library maintains a tool that sends biweekly emails to UC faculty and employees who have pending publications identified as new since the previous notification. The emails prompt recipients to verify these publications and deposit them to eScholarship in accordance with UC’s open access policies.
The notification tool makes extensive use of the Reporting Database. It gathers a user’s basic information (name, email addresses, campus affiliation, etc.), along with information about that user’s publications (titles, IDs, claim status, etc.). Once the data are gathered, they are processed in Python and fed into an HTML template system. The tool sends around 6,000 emails in a typical notification week.
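The "process in Python, then fill an HTML template" step might look something like the sketch below. It is an assumption-laden illustration, not CDL's code: the row layout, the `build_emails` helper, and the use of the standard library's `string.Template` are all hypothetical, since the notes above do not say which template system the tool uses.

```python
from collections import defaultdict
from html import escape
from string import Template

# Hypothetical query results: (email, name, publication title), one row per publication.
ROWS = [
    ("a@example.edu", "Ada Example", "Paper One"),
    ("a@example.edu", "Ada Example", "Paper <Two>"),
    ("b@example.edu", "Bo Sample", "Paper Three"),
]

# Stand-in template; a real tool would likely load this from a file.
EMAIL_TEMPLATE = Template(
    "<p>Hello $name,</p>\n"
    "<p>You have $count pending publication(s):</p>\n"
    "<ul>\n$items\n</ul>"
)


def build_emails(rows):
    """Group publication rows by recipient and render one HTML body per recipient."""
    grouped = defaultdict(lambda: {"name": "", "titles": []})
    for email, name, title in rows:
        grouped[email]["name"] = name
        grouped[email]["titles"].append(title)

    emails = {}
    for email, info in grouped.items():
        # Escape titles so stray markup in metadata cannot break the HTML.
        items = "\n".join(f"  <li>{escape(t)}</li>" for t in info["titles"])
        emails[email] = EMAIL_TEMPLATE.substitute(
            name=escape(info["name"]),
            count=len(info["titles"]),
            items=items,
        )
    return emails


emails = build_emails(ROWS)
print(emails["a@example.edu"])
```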
Heads up! Good things to know:
- The Reporting DB is not well-documented. Expect to spend some time spelunking to locate the tables, views, and fields you need.
- The breadth of the Reporting DB and the amount of data the UC system holds often necessitate joining many different tables.
- We often rely on CTEs to select only the required fields from large tables with many fields (such as [User] or [Publication]).
- Using uncorrelated or complex subqueries can cause massive runtime increases.
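As a rough illustration of the CTE approach mentioned above, the sketch below narrows two wide tables to the needed columns before joining them. It runs against an in-memory SQLite database purely as a stand-in so that it is self-contained. Only the table names [User] and [Publication] come from the notes above; every column name, the `status` values, and the one-user-per-publication layout are hypothetical and will not match the real Reporting DB schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical, simplified tables; the real ones have many more fields.
conn.executescript("""
CREATE TABLE [User] (id INTEGER PRIMARY KEY, name TEXT, email TEXT, office TEXT, phone TEXT);
CREATE TABLE [Publication] (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT, abstract TEXT, status TEXT);
INSERT INTO [User] VALUES (1, 'Ada Example', 'a@example.edu', 'Room 1', '555-0100');
INSERT INTO [User] VALUES (2, 'Bo Sample', 'b@example.edu', 'Room 2', '555-0101');
INSERT INTO [Publication] VALUES (10, 1, 'Paper One', 'n/a', 'pending');
INSERT INTO [Publication] VALUES (11, 1, 'Paper Two', 'n/a', 'claimed');
INSERT INTO [Publication] VALUES (12, 2, 'Paper Three', 'n/a', 'pending');
""")

# Each CTE selects only the columns (and rows) the final join needs.
QUERY = """
WITH slim_user AS (
    SELECT id, name, email
    FROM [User]
),
pending_pub AS (
    SELECT user_id, title
    FROM [Publication]
    WHERE status = 'pending'
)
SELECT u.email, p.title
FROM slim_user AS u
JOIN pending_pub AS p ON p.user_id = u.id
ORDER BY u.email, p.title;
"""

rows = conn.execute(QUERY).fetchall()
for email, title in rows:
    print(email, title)
```

Naming each narrowed set also keeps long multi-table queries readable, since every join then refers to a small, clearly scoped set of columns.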