The UC Publication Management System is a systemwide implementation of the Current Research Information System (CRIS) Symplectic Elements. The system provides two ways to interface with its data beyond its standard web interface: a REST API and an MSSQL database (the Reporting Database). Some campuses have already made use of these resources. This document is the California Digital Library's quick-start guide for using these two systems. It is intended to accompany Symplectic's documentation on the API and Reporting Database. To get access to CDL's mirror of Symplectic's documentation, please contact the California Digital Library.
Key differences between the API and the Reporting Database
☎️ The API
Read and write (write permissions must be enabled by CDL)
Using the API is the only way to create a program that adds or modifies data.
The API interfaces directly with the live data on the UCPMS – ergo, extreme caution must be exercised when altering data!
The data available is more limited than the Reporting DB, and the operations available for its access & transformation are limited to the API’s existing functions.
Format:
A mostly-standard REST interface,
returns results in ATOM+XML with a defined namespace,
and takes XML bodies for modifying records.
The Reporting Database
Read-only
The large amount of data available, and the flexibility of SQL make the Reporting DB ideal for data analytics and other large-scale queries.
It’s a mirror of most of the data in the UCPMS; some data (e.g. publications, users) are refreshed overnight; the full database is refreshed once a week (Sunday evening).
Format: MSSQL Server 2019, Std. Edition
Feature | API | Reporting Database |
---|---|---|
Tooling | REST API; accepts and returns ATOM+XML (with a defined namespace) | MSSQL |
IO | Read & write | Read-only |
Access requirements | Allow-listed IP; API client ID & secret from CDL | Allow-listed IP; database credentials from CDL |
Data freshness | Live | Most data refreshed nightly; full refresh weekly (Sunday evening) |
Amount of data accessible | Medium-small | Almost everything |
Amount of data returned | Paginated. Default 25 per page, up to 100. | Returns all matching rows unless limited (e.g. with a TOP clause). |
Combining data from different tables / objects | Hard: typically requires multiple queries, with results combined in code | Easy |
How are the two related?
It’s complicated!
The API typically delivers a subset of fields found on various similarly-named tables on the Reporting DB. (e.g.: the API’s {{endpoint}}/users request displays items from the Reporting DB’s [Users] table.)
However… expect to see some slight differences between the names of the Reporting DB’s fields and the API’s various XML objects and attributes. (e.g., The API’s user record attribute proprietary-id is stored in the Reporting DB as [User Record].[Proprietary ID])
Keep in mind that modifications made via the API are instantaneous, and there may be a delay before they’re reflected in the Reporting DB.
Food for thought
Parsing the XML results from the API can be a little bit tricky, and unlike the Reporting DB, you may need to hit the API with sequential requests to gather all the info you need from various tables. There are some circumstances where it may be easier to get various info from the Reporting DB first, then send an API request.
However, when using this process, make sure the info you’re pulling from the Reporting DB is data that will not have changed between the last refresh and now. An example of a “safe” value is the [User Record].[Proprietary ID] field for items where [Data Source] = ‘manual’ – these IDs are created once and never change.
☎️ The API
Getting set up
Contact CDL letting us know you want API access.
We’ll need to know your IPv4 ADDRESS (or range of addresses via CIDR), so we can have our vendor add it to the allow-list.
CDL will provide you with an API CLIENT ID and a SECRET to the QA UCPMS's API, as well as access to the full range of API documentation.
Production API credentials are provided after your integration is ready to launch. The API endpoints are:
- QA:
  - v5.5: https://qa-oapolicy.universityofcalifornia.edu:8002/elements-secure-api/v5.5/
  - v6.13: https://qa-oapolicy.universityofcalifornia.edu:8002/elements-secure-api/v6.13/
- Production:
  - v5.5 only: https://oapolicy.universityofcalifornia.edu:8002/elements-secure-api/v5.5/
ProTip! After your IP is allow-listed, check that your config is correct by sending a GET request to the relevant endpoint; look for a 200-coded response.
Postman collection & XSD schema
Symplectic has provided a Postman collection of API calls, which is handy for smaller queries and demonstrates the available functionality in the API:
As well as the XSD for each version of the API:
When you open the collection in Postman, click on the main “Elements API” in the left menu. From here, click on the “Variables” tab, and enter your username, password, and the API endpoint. Henceforth, all requests sent from this collection will use these variables.
If you’re unfamiliar with how Postman works, a good place to start is the “Examples” folder in the left-hand menu. The “Get profile data for users of a group” example contains a 4-step process demonstrating different request syntaxes.
Working with it
The Basic Idea
It uses REST syntax, and operates with ATOM+XML and a defined namespace.
It returns paginated results. The default is 25, but you can append ?per-page=### to the end of your query to change this.
When modifying records, you need to supply correctly-formatted XML in the request’s body, and add an additional “Content-Type: text/xml” header.
Many operations require specifying a CATEGORY, DATA SOURCE, and either an ID or a PROPRIETARY ID. Let’s take a look at the following:
{{endpoint}}/{{cat}}/records/{{data source}}/{{proprietary id}}
{{cat}}: Categories include: user, publication, grant – see “list of resources and operations” here.
{{data source}}: This is where 3rd-party integrations like Scopus and WoS specify who they are. In most cases, you'll want to use “manual”.
Warning: {{proprietary id}} ≠ {{id}}! These are two separate values, and various API operations may require you to specify one or the other. (The “id” is set by the HR feed; the proprietary ID by Elements’ user record system.) For example:
Send a {{endpoint}}/users query. (This will return all the users, paginated)
Examine the returned XML. Note the attributes on each user’s <object> element: <api:object category="user" id="279757" proprietary-id="[email protected]" authenticating-authority="UCTRUST" username="[email protected]"... >
So… you can get different XML results about me (depending on what fields you need) by using two different queries:
{{endpoint}}/users/279757
{{endpoint}}/[email protected]
Bonus! You can return the fields from the first query using the proprietary ID with this query: {{endpoint}}/users/[email protected]
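To make the above concrete, here’s a minimal Python sketch of fetching the paginated `/users` feed (note the `per-page` parameter) and pulling the `id` / `proprietary-id` attributes out of the ATOM+XML response. The endpoint, credentials, and sample email are placeholders, and the parser matches elements by local name so the namespace URI doesn’t need to be hardcoded – check the XSD for your API version for the real one:

```python
import xml.etree.ElementTree as ET

def extract_user_ids(xml_text):
    """Return (id, proprietary-id) pairs for each user <api:object> in a feed."""
    root = ET.fromstring(xml_text)
    pairs = []
    for el in root.iter():
        # Match on the tag's local name so we don't hardcode the namespace URI.
        if el.tag.rsplit("}", 1)[-1] == "object" and el.get("category") == "user":
            pairs.append((el.get("id"), el.get("proprietary-id")))
    return pairs

def fetch_users_page(endpoint, auth, per_page=100):
    # Requires the third-party "requests" library and an allow-listed IP.
    import requests
    resp = requests.get(f"{endpoint}/users", auth=auth, params={"per-page": per_page})
    resp.raise_for_status()
    return resp.text

# A trimmed-down, illustrative sample of what a response looks like:
sample = (
    '<feed xmlns="http://www.w3.org/2005/Atom" '
    'xmlns:api="http://www.symplectic.co.uk/publications/api">'
    '<entry><api:object category="user" id="279757" '
    'proprietary-id="jane@example.edu"/></entry></feed>'
)
print(extract_user_ids(sample))  # [('279757', 'jane@example.edu')]
```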
Tooling
The Postman collection (see above) is an excellent tool:
It provides a nice overview of which operations are available by operation type (GET, PUT, POST, PATCH, etc), along with in-program documentation.
You can check the provided body XMLs for the PUT and PATCH operations. Handy!
It’s a great way to trial & error various request syntaxes.
If you know how to encode your auth (username, password) into the URL, you can hit the endpoint with a web browser.
Curl also works.
See below for Python.
ProTip: We HIGHLY RECOMMEND reading the documentation (learn more at the end of this article). We found this API somewhat confusing; there are lots of odd details in how it operates, what can and cannot be done, etc.
Building a Python program
You’ll need a library for sending HTTP requests.
We use the 3rd-party Python library “requests”.
If you’re planning on modifying data, you’ll need to be sending correctly-formatted XML in the request bodies.
We use Python’s built-in ElementTree for this.
ProTip: Symplectic highly recommends using an XML library rather than DIYing it with string concatenation: XML requires proper character escaping and encoding, and certain characters will throw errors if not properly converted – libraries will handle this for you.
When sending XML bodies to the API, you’ll also need to attach a Content-Type: text/xml header to the HTTP request.
Symplectic mentions it’s a good idea to throttle your requests if (for example) you’re looping through a bunch of user record updates.
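Putting those pieces together, here’s a sketch using Python’s built-in ElementTree. The element and attribute names in the body are illustrative placeholders (the real update schema is in the XSD for your API version), and the commented-out request shows where the Content-Type header and throttling fit in:

```python
import time
import xml.etree.ElementTree as ET

def build_update_body(field_name, value):
    # NOTE: element/attribute names here are illustrative placeholders;
    # consult the XSD for your API version for the real update schema.
    root = ET.Element("update-record")
    fields = ET.SubElement(root, "fields")
    field = ET.SubElement(fields, "field", {"name": field_name, "operation": "set"})
    ET.SubElement(field, "text").text = value
    # The library escapes special characters ("&", "<", non-ASCII) for us.
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)

body = build_update_body("title", "Carbon & Climate")
headers = {"Content-Type": "text/xml"}
# requests.patch(f"{endpoint}/user/records/manual/{proprietary_id}",
#                data=body, headers=headers, auth=(api_id, api_secret))
# time.sleep(1)  # throttle when looping over many record updates
print(body)
```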
The Reporting Database
Getting set up
Contact CDL letting us know you want Reporting Database access.
We’ll need to know your IPv4 ADDRESS (or range of addresses via CIDR), so we can have our vendor add it to the allow-list.
CDL will respond providing Reporting Database credentials and server information.
Depending on your preference, you can initiate the SSH tunnel via your terminal, or using your favorite MSSQL client software (we use the open-source DBeaver).
Working with it
The Basic Idea
Working with the Reporting DB is more straightforward than the API: It’s just an SQL db!… but on the other hand, it’s huge, structurally complex, and poorly documented.
Warning RE: poorly documented: Particularly when using VIEWS, it’s a good idea to ground-truth what you see in the Reporting DB against what you’re seeing in the UI. We’ve had instances where views were populated in slightly unexpected ways, causing us to miss various edge cases.
Most tables have a PRIMARY KEY, which will be named [ID]
ProTip: However, most tables don’t utilize FOREIGN KEY constraints.
Don’t forget to check the VIEWS! There are plenty.
Many of the views are designed to show RELATIONSHIPS between tables, e.g. [PUBLICATION USER RELATIONSHIP] or [GRANT USER RELATIONSHIP]
ProTip: Under the hood, these views are often referencing the gigantic [LINK] table, which lists exactly two objects (of any “category”) which Elements – or its 3rd-party integrations like Scopus, etc – have “suggested” may be related, along with information about this “suggested” link (i.e., if, and by whom, the link was accepted/rejected, etc)
Warning: Some of these tables – like [User] and [Publication] – are absolutely gigantic.
When creating multi-table joins, select only the required fields.
Uncorrelated subqueries can explode your query runtime.
We often utilize multiple CTEs per query, in order to make more reasonably-sized tables, which we then judiciously join in the final query.
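For example, the CTE-then-join pattern looks something like the query below. This is a hypothetical sketch: the [User] table, its [ID] primary key, and the [Publication User Relationship] view are mentioned in this guide, but the other column names are illustrative guesses – verify them against the actual schema before use.

```python
# A hypothetical MSSQL query sketching the CTE-then-join pattern.
# Column names other than [User].[ID] are illustrative guesses.
QUERY = """
WITH pubs_per_user AS (
    -- Shrink the giant relationship view down to just the columns we need
    SELECT [User ID], COUNT(*) AS pub_count
    FROM [Publication User Relationship]
    GROUP BY [User ID]
)
SELECT TOP 100 u.[ID], p.pub_count
FROM [User] AS u
JOIN pubs_per_user AS p ON p.[User ID] = u.[ID]
ORDER BY p.pub_count DESC;
"""
print(QUERY)
```

Note the TOP clause: MSSQL uses TOP (or OFFSET/FETCH) rather than LIMIT to cap result size.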
Tooling
Any MSSQL client should work fine for this. We are fans of DBeaver, an open-source program which includes a free Community version and a paid Pro version.
ProTip: DBeaver’s DB connections have optional SSH tunnels you can implement. Handy!
See below for Python.
Building a Python program
We use the 3rd-party libraries pyodbc and sshtunnel.
Warning: Some of the other 3rd-party MSSQL libraries we used had inconsistent support across Apple Silicon machines.
We typically use the following workflow:
Connect to the DB
Load SQL files from disk
Send them with pyodbc, and zip() the results to a dict
Then loop the dict & do whatever you like.
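A skeleton of that workflow follows. The server names, credentials, and SQL file path are placeholders, and the commented-out pyodbc/sshtunnel connection is a sketch to adapt to your environment; the zip()-to-dict helper is the part that runs as-is:

```python
from pathlib import Path

def rows_to_dicts(columns, rows):
    """zip() each row up with its column names to get a list of dicts."""
    return [dict(zip(columns, row)) for row in rows]

def run_query_file(cursor, sql_path):
    # Load an SQL file from disk and send it with pyodbc.
    cursor.execute(Path(sql_path).read_text())
    columns = [col[0] for col in cursor.description]
    return rows_to_dicts(columns, cursor.fetchall())

# Connection sketch with placeholder hosts/credentials:
# import pyodbc, sshtunnel
# with sshtunnel.SSHTunnelForwarder(
#     ("ssh-gateway.example.edu", 22), ssh_username="me",
#     remote_bind_address=("reporting-db.example.edu", 1433),
# ) as tunnel:
#     conn = pyodbc.connect(
#         "DRIVER={ODBC Driver 18 for SQL Server};"
#         f"SERVER=127.0.0.1,{tunnel.local_bind_port};"
#         "DATABASE=...;UID=...;PWD=...;TrustServerCertificate=yes"
#     )
#     users = run_query_file(conn.cursor(), "queries/users.sql")
#     for row in users:  # then loop the dicts & do whatever you like
#         print(row["ID"])

print(rows_to_dicts(["ID", "Username"], [(1, "jane"), (2, "ali")]))
```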
Further documentation and resources
The California Digital Library provides a mirror of all Symplectic technical documentation about the Reporting Database and API. Please contact CDL to obtain access to the documentation.