UCPMS API & Reporting Database: Quickstart guide

The UC Publication Management System is a systemwide implementation of the Current Research Information System (CRIS) Symplectic Elements. The system provides two ways to interface with its data beyond its standard web interface: a REST API, and a MSSQL Database (the Reporting Database). Some campuses have already made use of these resources. This document is the California Digital Library's quick start-up guide for using these two systems. It is intended to accompany Symplectic's documentation on the API and Reporting Database. To get access to CDL's mirror of Symplectic's documentation, please contact the California Digital Library.


Key differences between…

☎️ The API

  • Read and write (write permissions must be enabled by CDL)

  • Using the API is the only way to create a program that adds or modifies data.

  • The API interfaces directly with the live data on the UCPMS – ergo, extreme caution must be exercised when altering data!

  • The data available is more limited than the Reporting DB, and the operations available for its access & transformation are limited to the API’s existing functions.

  • Format:

    • A mostly-standard REST interface,

    • returns results in ATOM+XML with a defined namespace,

    • And takes XML bodies for modifying records.

? The Reporting Database

  • Read-only

  • The large amount of data available, and the flexibility of SQL make the Reporting DB ideal for data analytics and other large-scale queries.

  • It’s a mirror of most of the data in the UCPMS; some data (e.g. publications, users) are refreshed overnight; the full database is refreshed once a week (Sunday evening).

  • Format: MSSQL Server 2019, Std. Edition

Feature

API

Reporting Database

Tooling

REST API, accepts and returns  ATOM+XML (+ namespace)

MSSQL

IO

Read & Write

Read-only

Access requirements

  • Username / password    

  • IVP4 address or range

  • Username / password

  • SSH tunneling

Data Freshness

Live

Refreshes weekly

Amount of data accessible

Medium-small

Almost everything

Amount of data returned

Paginated. Default 25, up to 100.

Returns everything, unless specified with a LIMIT clause.

Combining data from different tables / objects

Hard: Typically has to be done with multiple queries + combining the query results in code

Easy

How are the two related?

It’s complicated!

The API typically delivers a subset of fields found on various similarly-named tables on the Reporting DB. (e.g.: the API’s {{endpoint}}/users request displays items from the Reporting DB’s [Users] table.) 

However… expect to see some slight differences between the names of the Reporting DB’s fields and the API’s various XML objects and attributes. (e.g., The API’s user record attribute proprietary-id is stored in the Reporting DB as [User Record].[Proprietary ID])

Keep in mind that modifications made via the API are instantaneous, and there may be a delay before they’re reflected in the Reporting DB.

Food for thought

Parsing the XML results from the API can be a little bit tricky, and unlike the Reporting DB, you may need to hit the API with sequential requests to gather all the info you need from various tables. There are some circumstances where it may be easier to get various info from the Reporting DB first, then send an API request.

However, when using this process, make sure the info you’re pulling from the Reporting DB are items which will not have changed between now and the last refresh. An example of a “safe” value is the [User Record].[Proprietary ID] field for items coming from [Data Source] = ‘manual’ – these IDs are created once and never change.


☎️ The API

Getting set up

  1. Contact CDL letting us know you want API access.

    1. We’ll need to know your IVP4 IP ADDRESS (or range of addresses via CIDR), so we can have our vendor add it to the allow-list.

  2. CDL will provide you with an API CLIENT ID and a SECRET to the QA UCPMS's API, as well as access to the full range of API documentation.
    Production API credentials are provided after your integration is ready to launch.

  3. The API endpoints are:

    1. QA:
      • v5.5: https://qa-oapolicy.universityofcalifornia.edu:8002/elements-secure-api/v5.5/
      • v6.13: https://qa-oapolicy.universityofcalifornia.edu:8002/elements-secure-api/v6.13/
    2. PRODUCTION 

      • v5.5 only: https://oapolicy.universityofcalifornia.edu:8002/elements-secure-api/v5.5/

ProTip! After your IP is allow-listed, check that your config is correct by sending a GET request to the relevant endpoint; look for a 200-coded response.

Postman collection & XSD schema

Symplectic has provided a Postman collection of API calls, which is handy for smaller queries and demonstrates the available functionality in the API:

As well as the XSD for each version of the API: 

When you open the collection in Postman, click on the main “Elements API” in the left menu. From here, click on the “Variables” tab, and enter your username, password, and the API endpoint. Henceforth, all requests sent from this collection will use these variables.

0t9weSuKEY8z2EqgOz3GPxJYSeUrpyg67w

If you’re unfamiliar with how Postman works, a good place to start is in the “Examples” folder on the left-hand menu. The “Get profile data for users of a group,” contains a 4-step process, demonstrating different request syntaxes.

Working with it

The Basic Idea

  • It uses REST syntax, and operates with ATOM+XML and a defined namespace.

  • It returns paginated results. The default is 25, but you can append ?per-page=### to the end of your query to change this.

  • When modifying records, you need to supply correctly-formatted XML in the requests body, and add an additional “Content-Type” : “text/xml” header.

  • Many operations require specifying a CATEGORY, DATA SOURCE, and either an ID or a PROPRIETARY ID. Let’s take a look at the following: 

    {{endpoint}}/{{cat}}/records/{{data source}}/{{proprietary id}}
    • {{cat}}: Categories include: user, publication, grant – see “list of resources and operations” here.

    • {{data source}}: This is where 3rd-party integrations like Scopus and WoS specify who they are. In most cases, you'll want to use “manual”

    • Warning: {{proprietary id}}{{id}}!! These are two separate values, and you may be required to specify one or the other by various API operations. (The “id” is set by the HR feed, the PID by Elements’ user record system) For example:

      • Send a {{endpoint}}/users query. (This will return all the users, paginated)

  • Check out the returned XML objects. Check out the users’ <object> attributes: <api:object category="user" id="279757" proprietary-id="oapolicy@ucop.edu" authenticating-authority="UCTRUST" username="oapolicy@ucop.edu"... >

  • So… you can get different XML results about me (depending on what fields you need) by using two different queries:

  • {{endpoint}}/users/279757

  • {{endpoint}}/users?proprietary-id=oapolicy@ucop.edu

  • Bonus! You can return the fields from the first query using the proprietary ID with this query: {{endpoint}}/users/pid-oapolicy@ucop.edu

Tooling

  • The Postman collection (see above) is an excellent tool:

    • It provides a nice overview of which operations are available by operation type (GET, PUT, POST, PATCH, etc), along with in-program documentation.

    • You can check the provided body XMLs for the PUT and PATCH operations. Handy!

    • It’s a great way to trial & error various request syntaxes.

  • If you know how to encode your auth (username, password) into the URL, you can hit the endpoint with a web browser.

  • Curl also works.

  • See below for Python.

  • ProTip: We HIGHLY RECOMMEND reading the documentation (learn more at the end of this article). We found this API somewhat confusing, there’re lots of odd details in how it operates, what can and cannot be done, etc.

Building a Python program

  • You’ll need a library for sending HTTP requests.

    • We use the 3rd-party Python library “requests”.

  • If you’re planning on modifying data, you’ll need to be sending correctly-formatted XML in the request bodies.

    • We use Python’s built-in ElementTree for this.

      • ProTip: Symplectic highly recommends using an XML library rather than DIYing it with string concatenation: Apparently XML uses some oldschool UTF encoding that could throw errors if certain characters are not properly converted – libraries will handle this for you.

    • When sending xml bodies to the API, you’ll also need to attach a Content-Type=text/xml header to the http request.

    • Symplectic mentions it’s a good idea to throttle your requests if (for example) you’re looping through a bunch of user record updates.


? The Reporting Database

Getting set up

  • Contact CDL letting us know you want Reporting Database access.
    1. We’ll need to know your IVP4 IP ADDRESS (or range of addresses via CIDR), so we can have our vendor add it to the allow-list.

  • CDL will respond providing Reporting Database credentials and server information.

  • Depending on your preference, you can initiate the tunnel via your terminal, or using your favorite MSSQL client software (we use the open source DBeaver).

Working with it

The Basic Idea

  • Working with the Reporting DB is more straightforward than the API: It’s just an SQL db!… but on the other hand, it’s huge, structurally complex, and poorly documented.

    • Warning RE: poorly-documented: Particularly when using VIEWS, it’s a good idea to do some ground-truthing with the Reporting DB vs what you’re seeing in the UI. We’ve had instances where views were being populated in slightly unexpected ways, leading us to be missing various edge-cases.

  • Most tables have a PRIMARY KEY, which will be named [ID]

    • ProTip: However, most tables don’t utilize FOREIGN KEYS constraints.

  • Don’t forget to check the VIEWS! There are plenty.

  • Many of the views are designed to show RELATIONSHIPS between tables, i.e. [PUBLICATION USER RELATIONSHIP] or [GRANT USER RELATIONSHIP]

    • ProTip: Under the hood, these views are often referencing the gigantic [LINK] table, which lists exactly two objects (of any “category”) which Elements – or its 3rd-party integrations like Scopus, etc – have “suggested” may be related, along with information about this “suggested” link (i.e., if, and by whom, the link was accepted/rejected, etc)

  • Warning: Some of these tables – like [User] and [Publication] – are absolutely gigantic.

    • When creating multi-table joins, select only the required fields.

    • Uncorrelated subqueries can explode your query runtime.

    • We often utilize multiple CTEs per query, in order to make more reasonably-sized tables, which we then judiciously join in the final query.

Tooling

  • Any MSSQL client should work fine for this. We are fans of DBeaver, an OS program which includes a free Community version and a paid Pro version.

    • ProTip: DBeaver’s DB connections have optional SSH tunnels you can implement. Handy!

  • See below for Python.

Building a Python program

  • Python: We use the 3rd-party library library pyodbc and sshtunnel.

    • Warning: Some of the other 3rd-party MSSQL libraries we used had inconsistent support across Apple Silicon machines.

  • We typically use the following workflow:

    • Connect to the DB

    • Load SQL files from disk

    • Send them with pyodbc, and zip() the results to a dict

    • Then loop the dict & do whatever you like.


Further documentation and resources

The California Digital Library provide a mirror of all Symplectic technical documentation about the Reporting Database and API. Please contact the CDL to obtain access to the documentation.