Transformation YAML Reference

EDC Migrator uses YAML (YAML Ain’t Markup Language) for its mapping and transformation configuration files. The structure is similar to JSON.
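
The samples in this reference mix YAML’s indented block style with its JSON-like flow style (curly braces and square brackets). As a minimal illustration, using a map entry taken from the mapping examples in this reference, the two list entries below are equivalent ways of writing the same mapping:

```yaml
# Flow style: the mapping is written inline, like a JSON object
- map: { columnValue: Screen, veevaDef: SCR }

# Block style: the same mapping written with indentation
- map:
    columnValue: Screen
    veevaDef: SCR
```

Both entries parse to the same structure. The samples in this reference use flow style for maps, which keeps each mapping on one line.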

Many text editors and YAML linters can validate your files as you write them, which helps you avoid simple errors such as inconsistent indentation.

Required Files

You will need the following YAML files:

At least one header file (header.yml)
transformations.yaml
global.yaml
YAML files for individual datasets. This is typically one YAML per CSV file of data.

You will need one set of CSV files to create Casebooks (Subjects), Event Groups, and Events, and another set of CSV files to create Forms, Item Groups, and Items.

Header Files

The header file is a dictionary that maps column headers found in extract sets to Veeva EDC Fields. The names of these fields aren’t literal field names (e.g. form__v.sequence__v), but tags that help process the mapping configuration file later on and provide some semantics for processing.

If necessary, you can use multiple header files. End each header file’s filename with “header.yml”. Then, specify the header file to use in each mapping file.
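
As a sketch of how this might look, assume two hypothetical header files named siteA_header.yml and siteB_header.yml. These filenames and the CSV filenames below are illustrative only. Each mapping file names the header file it uses with headerFile, the key shown in the mapping examples in this reference:

```yaml
# Contents of one mapping file (hypothetical filenames)
file: demographics_siteA.csv
headerFile: siteA_header.yml
---
# Contents of a second, separate mapping file that uses a different header file
file: demographics_siteB.csv
headerFile: siteB_header.yml
```

The --- separator is used here only to show two files in one listing. In practice, each mapping is its own YAML file.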

For columnSettings, each value is the column header for that identifier in the CSV. In the example below, the column containing the Study Name uses the header “Project”, and the column containing the Site uses the header “SiteNumber”.

# Stops at Forms - items are flattened out and handled by individual configuration files

columnSettings:
  study: Project
  subject: Subject
  site: SiteNumber
  eventGroup: InstanceName
  eventGroupSeq: InstanceRepeatNumber
  eventGroupSequenceStartingNumber: InstanceRepeatStartingNumber
  event: FolderName
  form: DataPageName
  formSeq: PageRepeatNumber
  itemGroupSeq: RecordPosition
 
# Defaults for most forms - to create objects
defaultTargets:
  - EVENT_GROUP
  - EVENT
  - FORM
  - ITEM_GROUP
  - ITEM

# Identities: When Names aren't available in the source data, you can specify columns, text, and variables to use to create unique identifiers.

identities:
  casebook: [SITEMNEMONIC, SUBJECTNUMBERSTR]
  eventGroup: [SITEMNEMONIC, SUBJECTNUMBERSTR, VISITMNEMONIC, VISITINDEX]
  event: [SITEMNEMONIC, SUBJECTNUMBERSTR, VISITMNEMONIC, VISITINDEX]
  form: [SITEMNEMONIC, SUBJECTNUMBERSTR, VISITMNEMONIC, VISITINDEX, FORMMNEMONIC, FORMINDEX]
  itemGroup: [SITEMNEMONIC, SUBJECTNUMBERSTR, VISITMNEMONIC, VISITINDEX, FORMMNEMONIC, FORMINDEX, 'ITEMSETINDEX?']
  item: [SITEMNEMONIC, SUBJECTNUMBERSTR, VISITMNEMONIC, VISITINDEX, FORMMNEMONIC, FORMINDEX, 'ITEMSETINDEX?', $COLUMN_NAME]

columnSettings

Use columnSettings to specify which columns represent standard identifying values:

study
subject
site
eventGroup
eventGroupSeq
eventGroupSequenceStartingNumber
event
form
formSeq
itemGroupSeq

For each column, enter the Name (header) of the column containing that identifier in your CSV files. This must be an exact match.

The Sequence Number columns (eventGroupSeq, formSeq, and itemGroupSeq) are only required if your study design uses repeating Event Groups, Forms, or Item Groups. Use eventGroupSequenceStartingNumber only if you want to start at a number other than “1”.
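
For instance, a study with no repeating Event Groups, Forms, or Item Groups could omit the Sequence Number entries entirely. The sketch below assumes the same CSV column headers as the header file example in this reference. Your own headers may differ and must match your CSV files exactly:

```yaml
# Sketch: columnSettings for a study design without repeating objects
columnSettings:
  study: Project
  subject: Subject
  site: SiteNumber
  eventGroup: InstanceName
  event: FolderName
  form: DataPageName
```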

defaultTargets

The Target is the object you’re creating with a CSV file. You can set defaults for a CSV file in defaultTargets. You can override the default in individual YAML files using targets. The following target objects are available:

CASEBOOK
EVENT_GROUP
EVENT
EVENT_DID_NOT_OCCUR
FORM
ITEM_GROUP
ITEM

For example, if you want to create Forms, you would need to include FORM, ITEM_GROUP, and ITEM in your defaultTargets.
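
A minimal sketch of that case follows, together with an override in an individual mapping file. The override list is the one used in the Casebook example in this reference. Placing defaultTargets in the header file follows the header file example.

```yaml
# In the header file: defaults for CSV files that create Forms
defaultTargets:
  - FORM
  - ITEM_GROUP
  - ITEM
---
# In an individual mapping file: override the defaults for this CSV file only
targets:
  - CASEBOOK
  - EVENT_GROUP
  - EVENT
  - FORM
  - ITEM_GROUP
  - ITEM
```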

identities

EDC Migrator uses “identities” to identify unique records to be migrated. An identity typically consists of one or more columns in the CSV file. These values often form a hierarchy (study → site → subject → eventGroup → event, etc.).

Identities & Default Targets: If an object isn’t in the defaultTargets, it is ignored. If you specify a configurable identity for an object, you must include it in the defaultTargets.

You can configure identities for the following objects: Casebook (subject), Event Group, Event, Form, Item Group, Item, Query, and Query Message.
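
As an illustration of the rule above, the sketch below configures identities only for objects that are also listed in defaultTargets. The column names are the ones used in the header file example and are assumptions about your source data:

```yaml
defaultTargets:
  - EVENT_GROUP
  - EVENT

identities:
  # Each object with a configured identity also appears in defaultTargets
  eventGroup: [SITEMNEMONIC, SUBJECTNUMBERSTR, VISITMNEMONIC, VISITINDEX]
  event: [SITEMNEMONIC, SUBJECTNUMBERSTR, VISITMNEMONIC, VISITINDEX]
```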

Variables

EDC Migrator supports the following variables for use with configurable identity:

$COLUMN_NAME: The Name (header) of the column in the CSV file
$VEEVADEF: The Name of the Veeva Definition
$PARENTREF: The Name of the parent record

Literal Values

You can include literal values in your identities. To do so, put an equals sign (=) before the value and surround the whole entry with single quotes. For example, '=EG' adds “EG” to the identity.
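
A minimal sketch of a literal value inside an identity is shown below. The column names come from the header file example. The position of the literal within the list is an arbitrary choice made for illustration:

```yaml
identities:
  # '=EG' contributes the literal text "EG"; the other entries are CSV column names
  eventGroup: [SITEMNEMONIC, SUBJECTNUMBERSTR, '=EG', VISITMNEMONIC, VISITINDEX]
```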

Optional

You can indicate that a column value is optional for the identity using a question mark (?): “ColumnName?”. When a column is optional, EDC Migrator won’t return a “missing column” error if that column is blank.

Transformation YAML Files

The following code samples can be used in your transformations.yaml file.

Boolean Conversion

- dataType: BOOLEAN
  name: YesNoBoolxformer
  default: true
  transformations:
     - { type: SEARCH_AND_REPLACE, pattern: 1, replace: true }
     - { type: SEARCH_AND_REPLACE, pattern: 0, replace: false }

Date Format

- dataType: DATE
  modifiers:
   - simple
  name: DateFormatter
  default: true
  transformations:
     - { type: DATE_FORMAT, inputDateFormat: "[dd MMM yyyy][d MMM yyyy]", outputDateFormat: dd-MMM-yyyy}

Leading Zero Removal

- dataType: INTEGER
  name: RemoveLeading0s
  default: true
  transformations:
     - { type: SEARCH_AND_REPLACE, pattern: '^0+(?!$)', replace: '' }

Set Form Status

- dataType: CODELIST
  default: false
  name: formStatusTransformer
  selectors:
   - formStatus
  transformations:
     - { type: SET_VALUE, value: SUBMITTED }

Time Prefix “T” Removal

- dataType: TIME
  name: RemovePrefixT
  default: true
  transformations:
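     # Remove the "T" prefix (values are quoted so regex characters are read as plain strings)
     - { type: SEARCH_AND_REPLACE, pattern: 'T(.*)', replace: '$1' }
     # Keep only the first two colon-separated parts, for example 10:30:00 becomes 10:30
     - { type: SEARCH_AND_REPLACE, pattern: '(.*):(.*):(.*)', replace: '$1:$2' }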

Trailing Zero Removal

- dataType: FLOAT
  name: RemoveTrailing0s
  default: false
  transformations:
     - { type: SEARCH_AND_REPLACE, pattern: '(?<=.\d)0+$', replace: '' }

Unknown Date Handling

- dataType: DATE
  maskTemplate: FREE_MONTH_DAY
  name: UnknownDate
  default: false
  transformations:
     - { type: SEARCH_AND_REPLACE, pattern: "(.{2}) JAN (.*)", replace: "$1 Jan $2" }
     - { type: SEARCH_AND_REPLACE, pattern: "(.{2}) FEB (.*)", replace: "$1 Feb $2" }
     - { type: SEARCH_AND_REPLACE, pattern: "(.{2}) MAR (.*)", replace: "$1 Mar $2" }
     - { type: SEARCH_AND_REPLACE, pattern: "(.{2}) APR (.*)", replace: "$1 Apr $2" }
     - { type: SEARCH_AND_REPLACE, pattern: "(.{2}) MAY (.*)", replace: "$1 May $2" }
     - { type: SEARCH_AND_REPLACE, pattern: "(.{2}) JUN (.*)", replace: "$1 Jun $2" }
     - { type: SEARCH_AND_REPLACE, pattern: "(.{2}) JUL (.*)", replace: "$1 Jul $2" }
     - { type: SEARCH_AND_REPLACE, pattern: "(.{2}) AUG (.*)", replace: "$1 Aug $2" }
     - { type: SEARCH_AND_REPLACE, pattern: "(.{2}) SEP (.*)", replace: "$1 Sep $2" }
     - { type: SEARCH_AND_REPLACE, pattern: "(.{2}) OCT (.*)", replace: "$1 Oct $2" }
     - { type: SEARCH_AND_REPLACE, pattern: "(.{2}) NOV (.*)", replace: "$1 Nov $2" }
     - { type: SEARCH_AND_REPLACE, pattern: "(.{2}) DEC (.*)", replace: "$1 Dec $2" }
     - { type: SEARCH_AND_REPLACE, pattern: "(.{2}) (.{3}) (.*)", replace: "$1-$2-$3" }

Mapping YAML Files

Each mapping YAML file contains the following components:

Filename of the corresponding CSV file
Header file

Create Casebooks, Event Groups & Events

Typically, you’ll use a separate “form” (CSV file) to drive the creation of Casebooks (Subjects) and to set the Subject Status for those subjects. In the example below, the form indicates either the Screen Failure or Enrolled subject status. This is the first form that EDC Migrator processes, and it uses the form to create Subjects.

file: ss002.csv
headerFile: header.yaml
 
targets: # Hint to the migration utility that this form drives creation of subjects and casebooks - it will be the first form processed from the dataset
  - CASEBOOK
  - EVENT_GROUP
  - EVENT
  - FORM
  - ITEM_GROUP
  - ITEM
 
eventGroups:
  - map: { columnValue: SCREEN, veevaDef: Screening }
    events:
      - map: { columnValue: SCREEN, veevaDef: Screening }
        forms:
          - map: { columnValue: "Subject Status_002", veevaDef: SS002 }
            itemGroups:
              - map: { veevaDef: IG_SS002 }
                items:
                  - map: { columnName: SSSTAT_STD, veevaDef: SSSTAT_2 } # set status to in_screening in the first pass of creating subject

The CSV file for this example would need the following columns:

Project
Subject
SiteNumber
InstanceName
FolderName
DataPageName
SSSTAT_STD

The first 6 columns are defined in the header.yml file. Then, we have the SSSTAT_STD column, which is the Item value from the form.

Creating a Form, Item Groups & Items

Next, you can create the mapping YAML file for a given Form to define the Form, its Item Groups, and its Items.

This example defines a Demographics form, with the Creation Criteria item group and its Items. This form is in the Screen event group and event.

file: demographics.csv
headerFile: header.yaml
eventGroups:
 - map: { columnValue: Screen, veevaDef: SCR }
   events:
     - map: { columnValue: Screen, veevaDef: SCR }
       forms:
        - map: { columnValue: Demographics, veevaDef: DEMO }
          itemGroups:
            - map: { veevaDef: IG_CreationCriteria }
              items:
               - map: { columnName: INITIALS, veevaDef: INITIALS }
               - map: { columnName: AGE1, veevaDef: AGE }
               - map: { columnName: ETHNIC1, veevaDef: ETHNIC1 }
               - map: { columnName: GENDER1, veevaDef: GENDER1 }

The CSV file for this example would need the following columns:

Project
Subject
SiteNumber
InstanceName
FolderName
DataPageName
INITIALS
AGE1
ETHNIC1
GENDER1

The first 6 columns are defined in the header.yml file. Then, we have columns for the form’s 4 data items.

Map Data from a Single Row into Multiple Forms

If needed, you can map data from a single row into multiple Forms.

To do so, include multiple maps for forms. In the example below, the first map for Forms connects to the EDC form DEMO, while the second map connects to the EDC form DSIC. Both maps use the same columnValue, “Demographics”, in the Form column (DataPageName).

file: dm001-sample.csv
headerFile: header.yaml
 
eventGroups:
  - map: { columnValue: Screen, veevaDef: SCR }
    events:
      - map: { columnValue: Screen, veevaDef: SCR }
        forms:
         - map: { columnValue: Demographics, veevaDef: DEMO }
           itemGroups:
             - map: { veevaDef: IG_DEMO }
               items:
                 - map: { columnName: AGE1, veevaDef: AGE }
                 - map: { columnName: ETHNIC1, veevaDef: ETHNIC1 }
                 - map: { columnName: DMRACEAI1, veevaDef: DMRACEAS1 }
         - map: { columnValue: Demographics, veevaDef: DSIC }
           itemGroups:
             - map: { veevaDef: IG_DSIC }
               items:
                 - map: { columnName: RFICDTC1, veevaDef: ICDATE }

Creating Reusable Forms Using Components

You can create a Component to reuse a Form across multiple Events. Then, when you need to reference the Form, you can use include to map the form, instead of writing out the form map again.

The example below defines the VisualAcuity component, which contains the Visual Acuity_Distance form and its Item.

Then, the maps for the Screen, Visit 5, and Early Exit events use include: VisualAcuity. This adds the mapping defined in the component to each of these Events.

file: xa003.csv
headerFile: header.yaml
 
components:
  - name: VisualAcuity
    form:
      - map: { columnValue: "Visual Acuity_Distance", veevaDef: XA003 }
        itemGroups:
          - map: { veevaDef: IG_XA003 }
            items:
              - map: { columnName: XAORRES3, veevaDef: XAORRES3 }
 
eventGroups:
  - map: { columnValue: SCREEN, veevaDef: SCR }
    events:
      - map: { columnValue: SCREEN, veevaDef: SCR }
        forms:
          - include: VisualAcuity
  - map: { columnValue: "Visit 5 (1)", veevaDef: "V3A-5A" }
    events:
      - map: { columnValue: "Visit 5", veevaDef: v5A }
        forms:
          - include: VisualAcuity
  - map: { columnValue: "Early Exit (1)", veevaDef: EEAA }
    events:
      - map: { columnValue: "Early Exit", veevaDef: EARLY_EXIT }
        forms:
          - include: VisualAcuity

Define Repeating Event Groups, Forms & Item Groups

Repeating Event Groups, Forms, and Item Groups are used to collect the same set of data across multiple instances of the object. An instance of a repeating object can be identified by its Sequence Number.

Starting Sequence Number: If you want sequence numbers to start at a number other than 1, you can specify that number using eventGroupSequenceStartingNumber.

file: xa003.csv
headerFile: header.yaml
 
eventGroups:
  - map: { columnValue: SCREEN, veevaDef: SCR }
    repeating: true # define this event group to be repeating; the transform code will use sequence # in that case
    events:
      - map: { columnValue: SCREEN, veevaDef: SCR }
        forms:
         - map: { columnValue: "Ocular Info", veevaDef: OCUD001 }
           repeating: true # define this form to be repeating; the transform code will use sequence # in that case
           itemGroups:
           - map: { veevaDef: IG_OCUD }
             repeating: true # define a repeating item group; transform will use the sequence # to create/correlate item groups between source and target
             items:
               - map: { columnName: EYE, veevaDef: EYE }
               - map: { columnName: INFC, veevaDef: INFC_LVL }
               - map: { columnName: ASSESS, veevaDef: ASSESS_LVL }
           - map: { veevaDef: IG_TEST } # non repeating Item group
             items:
               - map: { columnName: TEST, veevaDef: TEST_DESC }

Define Default Data for Item Groups

Your Study may use default data to automatically fill codelist-type Items with one of the Codelist Values for each Codelist Value or combination (permutation) of Codelist Values. In EDC, these Items are read-only for data entry users, but other Items on the form allow data entry. Learn more about default data in Clinical Data Help.

Your YAML needs to identify the target EDC Item Group by matching the default values of each Item in the CSV against the Item Groups in the EDC Form, because in this case you can’t rely only on the Item Group’s Sequence Number.

To do this, include defaultData in your itemGroups mapping. For columnNames, specify the Codelists in the exact order specified in the default data configuration in Studio. Then, for columnValues, specify the Codelist Values in the exact order specified in Studio.

file: xa003.csv
headerFile: header.yaml
 
eventGroups:
  - map: { columnValue: SCREEN, veevaDef: SCR }
    events:
      - map: { columnValue: SCREEN, veevaDef: SCR }
        forms:
          - map: { columnValue: "Ocular Info", veevaDef: OCUD001 }
            itemGroups:
              - map: { veevaDef: IG_OCUD }
                defaultData:
                  - columnNames:  [XALOC3_STD, XATEST3_STD] # Specify codelists in the exact order specified when defaulting in studio - XALOC3 as first codelist, XATEST3 as the second codelist
                    columnValues: # Specify values in the exact order as chosen in Studio when selecting codelist items from XALOC3 and XATEST3 
                      - [OD, OS]
                      - [UNCORRECTED, BEST]
                    items:
                      - map: { columnName: XAORRES3_RAW, veevaDef: XAORRES3 }

Custom Permutation Definitions

By default, the default data permutation includes all possible combinations of the values of the two codelists. Your study may customize this by removing a permutation, changing the order of the permutations, or both. When you do this, you must change your mapping approach to derive the appropriate Sequence Number for your data row.

In the example below, there are three values for QSSCAT3_STD and three for QSCAT3_STD, but only 5 permutations in total (not 9). You need to define those permutations, and do so in the order that they are defined in EDC. INSERTION/COMFORT gets Item Group sequence number 1, INSERTION/HANDLING gets 2, OVERALL/COMFORT gets 3, and so on.

itemGroups:
  - map: { veevaDef: IG_QS003 }
    repeating: true
    defaultData:
      - columnNames: [ QSSCAT3_STD, QSCAT3_STD ]
        permutations:
          - [ INSERTION, COMFORT ]
          - [ INSERTION, HANDLING ]
          - [ OVERALL, COMFORT ]
          - [ OVERALL, VISION ]
          - [ REMOVAL, HANDLING ]
        items:
          - map: { columnName: QSORRES3_STD, veevaDef: QSORRES3 }
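
For comparison, here is a sketch of how the full default permutation from the earlier XALOC3_STD and XATEST3_STD example could be written out explicitly. This assumes that permutations can stand in for columnValues when every combination is present, and that the default order varies the second codelist fastest. Neither assumption is confirmed in this reference, so check the order against the default data configuration in Studio:

```yaml
defaultData:
  - columnNames: [ XALOC3_STD, XATEST3_STD ]
    permutations:   # 2 x 2 = 4 combinations, listed explicitly
      - [ OD, UNCORRECTED ]   # assumed Item Group sequence number 1
      - [ OD, BEST ]          # assumed 2
      - [ OS, UNCORRECTED ]   # assumed 3
      - [ OS, BEST ]          # assumed 4
    items:
      - map: { columnName: XAORRES3_RAW, veevaDef: XAORRES3 }
```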

Mapping Additional Attributes

You can map values for additional attributes, including source data verification (SDV), Intentionally Left Blank (ILB), and Did Not Occur (DNO).

Setting Did Not Occur for Events

You can use YAML to indicate that an Event did not occur and specify a reason.

file: ILJ466P003_visdat001.csv
headerFile: header.yaml
 
targets:
  - EVENT_GROUP
  - EVENT
  - EVENT_DID_NOT_OCCUR
 
components:
  - name: DidNotOccur
    didNotOccur: # if rows exist with either of these column values, we will consider those events to be in Did Not Occur state
      map: { columnName: SSSTAT_STD, columnValues: [ MISSED, DISCONTINUE ] }
      changeReason:
          columnName: SSSTAT
          columnValueDictionary: # map the text here to a cdms change reason - what about localization?
            - { columnValue: "Missed Visit - Subject Unavailable", veevaValue: "Subject Missing" }
            - { columnValue: "Subject Discontinued From Study At This Visit", veevaValue: "Terminated From Study" }
 
eventGroups:
  - map: { columnValue: Screen, veevaDef: Screening }
    events:
      - map: { columnValue: Screen, veevaDef: Screening }
        didNotOccur:
          include: DidNotOccur