Harvesting Data into Mint

CSV Harvesting Technical Information

CSV Format

CSV files should adhere to the RFC 4180 standard.
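As a rough illustration of what RFC 4180 formatting means in practice, the sketch below uses Python's standard csv module to write a row whose last field contains a comma and double quotes. The function name and the sample values are hypothetical and not part of Mint; the point is only that such fields are wrapped in double quotes and embedded quotes are doubled.

```python
import csv
import io


def to_csv_text(header, rows):
    """Serialise rows to CSV text using the csv module's default 'excel' dialect."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)  # comma delimiter, CRLF line endings, minimal quoting
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample row: the description contains a comma and double quotes.
text = to_csv_text(
    ["id", "Given_Name", "Description"],
    [["00007429", "Michael", 'Known as "Mick", studies loud noises']],
)
print(text)
```

Only the description field is quoted in the output, because it is the only field that contains a comma or a quote character.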

CSV Filename

The harvester uses the CSV filename to determine the type of record. For example, to harvest party records, the filename must be Party_People.csv. Any other filename will cause the harvester to fail, as the record type will not be recognised. The filenames required for each record type are outlined below in the record details section.
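A pre-flight check along the following lines could catch a misnamed file before it reaches the harvester. This is a minimal sketch, not Mint's own logic: the lookup table, the function name and the exact, case-sensitive match are assumptions, and only Party_People.csv is listed because it is the only filename given here.

```python
from pathlib import Path

# Assumption: only Party_People.csv is named in this document. Filenames for
# the other record types would be added here once confirmed.
KNOWN_FILENAMES = {"Party_People.csv": "party (people)"}


def record_type_for(path):
    """Return the record type implied by the filename, or raise if unrecognised."""
    name = Path(path).name
    try:
        return KNOWN_FILENAMES[name]
    except KeyError:
        raise ValueError(f"Unrecognised harvest filename: {name!r}") from None
```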

Adding and updating records

The Mint harvester uses an identifier column to determine whether a record already exists in storage and requires updating, or is new and requires adding. As this column is used as an identifier, each record's value must be unique. The field can be a string of any format, though numeric identifiers are often used.
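The add-or-update decision can be pictured with the sketch below. It is an illustration under stated assumptions rather than the harvester's actual code: storage is modelled as a plain dictionary keyed by identifier, the function name is hypothetical, and rejecting duplicate identifiers is one possible way to enforce the uniqueness requirement.

```python
import csv
import io


def apply_harvest(storage, csv_text, id_column="id"):
    """Add or update records in `storage` (a dict keyed by identifier).

    Returns (added, updated) lists of identifiers. Records that are absent
    from the CSV are left untouched.
    """
    added, updated = [], []
    seen = set()
    for row in csv.DictReader(io.StringIO(csv_text, newline="")):
        record_id = row[id_column]
        if record_id in seen:
            raise ValueError(f"Duplicate identifier in CSV: {record_id!r}")
        seen.add(record_id)
        # An identifier already in storage means update; otherwise add.
        (updated if record_id in storage else added).append(record_id)
        storage[record_id] = row
    return added, updated


# Hypothetical data: one existing record is updated and one new record is added.
storage = {"00007429": {"id": "00007429", "Given_Name": "Michael"}}
added, updated = apply_harvest(
    storage, "id,Given_Name\r\n00007429,Mick\r\n00007430,Alice\r\n"
)
```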

Deleting a record

You cannot delete a record in Mint via the CSV harvest. This rule is implemented for the following reasons:

  • In some situations, the data feed from the central system (source of truth) is not reliable and records go "missing". For example, the HR system or research management system may not publish information about staff who have left.
  • Mint often creates additional metadata as part of the curation model (see Curating Linked Data) and this would be lost if you deleted the record. For example, the person may have an NLA identifier and links to a ReDBox record.
  • There's a hidden benefit: your CSV file need only contain records that require updating or inserting.

As Mint is in charge of publishing records to services (such as the National Library of Australia and Research Data Australia), it isn't appropriate for these records to just disappear. Once they're in Mint they need to be treated with curatorial gloves. It is, however, possible to delete a record via the Mint user interface by logging in as an administrator. This is strongly discouraged, and you must make sure that you aren't breaking any relationships with other objects (in Mint or ReDBox) or feeds to other services.

Record types and CSV schemas

Parties

Parties data is used to record researcher information.

Information:

CSV Fields

| Mandatory | Field name | Description | Example |
| --- | --- | --- | --- |
| * | id | The institution's ID - likely to be a staff, student or researcher ID | 00007429 |
| * | Given_Name | The first given name | Michael |
|  | Other_Names | Any intermediary (middle) names | Alfred |
|  | Family_Name | Often referred to as surname | Jones |
|  | Pref_Name | The preferred name, usually a single name | Mick |
|  | Honorific | A name prefix | Mrs, Dr, Prof |
|  | Email | The person's email address | mick@staff.edu.au |
|  | Job_Title | The primary job title for the person | Facility Director |
|  | GroupID_1 | Links to Party (Group) records. This needs to match an ID in the Parties-Group CSV |  |
|  | GroupID_2 |  |  |
|  | GroupID_3 |  |  |
|  | ANZSRC_FOR_1 | The 2, 4 or 6 digit Field of Research code as designated by the Australian Bureau of Statistics under ANZSRC (2008) | 04, 0101, 070201 |
|  | ANZSRC_FOR_2 |  |  |
|  | ANZSRC_FOR_3 |  |  |
|  | NLA_Party_Identifier | A National Library of Australia party identifier | http://nla.gov.au/nla.party-461793 |
|  | ResearcherID | A Thomson Reuters ID created using the http://www.researchid.com service | http://www.researcherid.com/rid/F-3500-2011 |
|  | Personal_Homepage | The researcher's personal website | http://www.me.example.com/ |
|  | Staff_Profile_Homepage | The researcher's webpage within the institutional web presence | http://staffprofiles.edu.au/mjones |
|  | Description | A brief bio or overview of the researcher | Mike has been investigating the effects of loud noises. |
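Before harvesting, a party row could be checked against the field rules listed for this record type. The sketch below is an assumption about how such a check might look, not something Mint provides: the function and constant names are hypothetical, and it covers only the two mandatory fields (id and Given_Name) and the 2, 4 or 6 digit form of the ANZSRC_FOR columns.

```python
import re

MANDATORY_PARTY_FIELDS = ("id", "Given_Name")  # the fields marked * for parties
FOR_COLUMNS = ("ANZSRC_FOR_1", "ANZSRC_FOR_2", "ANZSRC_FOR_3")
FOR_CODE = re.compile(r"(?:[0-9]{2}){1,3}")  # exactly 2, 4 or 6 digits


def check_party_row(row):
    """Return a list of problems found in one party row (a dict of column -> value)."""
    problems = []
    for field in MANDATORY_PARTY_FIELDS:
        if not (row.get(field) or "").strip():
            problems.append(f"missing mandatory field {field}")
    for column in FOR_COLUMNS:
        value = (row.get(column) or "").strip()
        if value and not FOR_CODE.fullmatch(value):
            problems.append(f"{column} is not a 2, 4 or 6 digit code: {value!r}")
    return problems
```

A row read with csv.DictReader can be passed in directly; an empty list means no problems were found by these particular checks.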

Groups

A group is any organisational unit that relates to research. This could be a formal group such as a faculty or an informal one such as an arts collective. Typically, this data is provided in a hierarchical model. For example, the University is named as the top-level group, with faculties and institutes below it and then schools below them.

Information:

CSV Fields

| Mandatory | Field name | Description | Note | Examples |
| --- | --- | --- | --- | --- |
| * | id | An identifier for the group | This is linked to via the GroupID_1 field in the Party data or the Parent_Group_ID field in the Groups data | FoS |
| * | Name | The name of the group |  | Faculty of Science |
|  | Email | A contact email address for the group |  | fos@uni.edu.au |
|  | Phone | A phone number for contacting the group |  | 07 3456 7654 |
|  | Parent_Group_ID |  | Link to the parent group | UoE |
|  | Homepage | The group's website |  | http://www.fos.uni.edu.au/ |
|  | Description | A brief description of the group |  |  |
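Because each group points at its parent through Parent_Group_ID, it can be useful to list parent references that do not resolve within the file. The sketch below shows one way to do that; the function name and the sample groups (including "University of Examples" and "School of Physics") are hypothetical. Since a CSV need only contain records that require updating or inserting, an unresolved parent may already exist in Mint, so the result is better read as a warning list than as a list of errors.

```python
def find_unknown_parents(groups):
    """Return (group id, parent id) pairs whose parent is not in `groups`.

    `groups` is a list of dicts using the Groups CSV column names as keys.
    """
    known_ids = {group["id"] for group in groups}
    return [
        (group["id"], group["Parent_Group_ID"])
        for group in groups
        if group.get("Parent_Group_ID") and group["Parent_Group_ID"] not in known_ids
    ]


# Hypothetical hierarchy: university -> faculty -> school with a mistyped parent.
groups = [
    {"id": "UoE", "Name": "University of Examples", "Parent_Group_ID": ""},
    {"id": "FoS", "Name": "Faculty of Science", "Parent_Group_ID": "UoE"},
    {"id": "SoP", "Name": "School of Physics", "Parent_Group_ID": "FoX"},
]
print(find_unknown_parents(groups))
```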

Activities

Activities data is used to record research activities that are not funded via the ARC or the NHMRC.

Information:

CSV Fields

| Mandatory | Field name | Description |
| --- | --- | --- |
| * | id | An identifier for the activity |
|  | Title | The title of the activity |
|  | Name | The name of the activity |
|  | Type | The type of activity as described by the ANDS Content Providers Guide |
|  | Existence_Start | The year in which the activity started |
|  | Existence_End | The year in which the activity completed |
|  | Description | A description of the activity |
|  | Primary_Investigator_ID | The ID of the PI (links to the Parties-People data) |
|  | Investigators | A semi-colon (;) separated list of IDs for other researchers related to the activity. These IDs must match records in the Parties-People data |
|  | Website | A website for the activity |
|  | ANZSRC_FOR_1 | The 2, 4 or 6 digit Field of Research code as designated by the Australian Bureau of Statistics under ANZSRC (2008) |
|  | ANZSRC_FOR_2 |  |
|  | ANZSRC_FOR_3 |  |
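The Investigators field packs several party IDs into one cell, so a small helper can split it and report IDs that have no matching party record. This is a sketch under assumptions: the function names and the sample activity are hypothetical, and trimming whitespace around each ID is a convenience of the sketch, not documented harvester behaviour.

```python
def investigator_ids(activity_row):
    """Split the semicolon-separated Investigators field into a list of IDs."""
    raw = activity_row.get("Investigators") or ""
    return [part.strip() for part in raw.split(";") if part.strip()]


def unknown_investigators(activity_row, party_ids):
    """Return IDs referenced by the activity that are not in `party_ids`."""
    referenced = investigator_ids(activity_row)
    pi = (activity_row.get("Primary_Investigator_ID") or "").strip()
    if pi:
        referenced = [pi] + referenced
    return [pid for pid in referenced if pid not in party_ids]


# Hypothetical activity row referencing three party IDs.
activity = {
    "id": "A-001",
    "Primary_Investigator_ID": "00007429",
    "Investigators": "00007430; 00007431",
}
print(unknown_investigators(activity, {"00007429", "00007430"}))
```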

Pre-loaded data

As well as the data sets above, which contain institution-specific data, Mint stores other vocabularies. These are either:

  • Standard vocabularies used by ReDBox to enhance its forms (e.g. language codes)
  • National standardised data that is relevant for all Australian institutions (e.g. ARC and NHMRC grant data)

The complete list of data pre-loaded is as follows:

  1. ANZSRC FOR and SEO codes
  2. Language codes
  3. Geo-spatial information (used by the map widget)
  4. MARC Country codes
  5. ARC and NHMRC Grant data