YAML Builder

This topic describes the YAML Builder tool for EDC Migrator.

Prerequisites

Users with the Migrator, System Administrator, and Vault Owner security profiles can perform the actions described below by default. If your vault uses custom Security Profiles, your profile must grant the following permissions:

Permission        Controls
Tabs: Projects    Ability to access the Projects tab

The YAML Builder creates the mapping files (YAMLs) used for study migrations. With this tool, you can auto-generate a baseline set of mapping files based on the EDC Study Design.

Users are encouraged to conduct a Study Design Validation in Studio before using the YAML Builder to ensure no core structural design issues are present.

Mapping the Study Design

The YAML Builder is designed to work with various data source types, such as Rave™ and InForm™. Each file corresponds to a specific source, containing unique mapping properties.

Header Files

The YAML Builder generates unique header files for each data source type. These files determine the location of object data within the source data. Header files are based on the Event Type and whether a study has local labs enabled.

The YAML Builder uses specific header file names based on the Event Type and study configuration:

  • header.yaml: Used as the default file name for all event types, except logs.
  • logheader.yaml: Used if the event type equals “log.”
  • lab.yaml: Used for studies with local labs to process extracts containing lab data.
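The naming rules above can be summarized in a short sketch. The function name `header_file_for` and its `lab_extract` parameter are illustrative assumptions, not part of EDC Migrator; only the three file names come from this topic. The sketch also assumes that a lab extract takes precedence over the event type, which this topic does not state.

```python
def header_file_for(event_type: str, lab_extract: bool = False) -> str:
    """Return the header file name suggested by the rules above.

    Hypothetical helper: `lab_extract` stands for "the study has local labs
    and this extract contains lab data".
    """
    if lab_extract:
        # Assumption: lab extracts are checked before the event type.
        return "lab.yaml"
    if event_type.strip().lower() == "log":
        return "logheader.yaml"
    return "header.yaml"


print(header_file_for("visit"))                    # header.yaml
print(header_file_for("log"))                      # logheader.yaml
print(header_file_for("visit", lab_extract=True))  # lab.yaml
```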

In the example below, predefined mappings are used to locate object data from source data column headings. An Event can be found in the EVENTDEF column. The value or name of the event is derived from the event object’s external ID.

columnSettings:
  subject: "SUBJID"
  site: "SITENUM"
  eventGroup: "EGROUPDEF"
  eventGroupSeq: "ESEQ"
  event: "EVENTDEF"
  form: "FORMDEF"
  formSeq: "FSEQ"
  itemGroupSeq: "IGSEQ"
  visitDate: "EVTDT"

sequenceNumberOptions:
  eventGroupSequenceStartingNumber: 1
  formSequenceStartingNumber: 1
  itemGroupSequenceStartingNumber: 1

defaultTargets:
- "FORM"
- "ITEM"
- "ITEM_GROUP"

Read Header Files for more information.
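To make the role of columnSettings concrete, the following sketch shows one way such settings could be applied to a row of a source extract. It is not how EDC Migrator is implemented; the function `locate_objects` and the sample extract row are made up for illustration, and only the column names are taken from the example above.

```python
import csv
import io

# Copied from the columnSettings example above.
COLUMN_SETTINGS = {
    "subject": "SUBJID",
    "site": "SITENUM",
    "eventGroup": "EGROUPDEF",
    "eventGroupSeq": "ESEQ",
    "event": "EVENTDEF",
    "form": "FORMDEF",
    "formSeq": "FSEQ",
    "itemGroupSeq": "IGSEQ",
    "visitDate": "EVTDT",
}

# Made-up extract content, for illustration only.
SAMPLE_EXTRACT = """SUBJID,SITENUM,EGROUPDEF,ESEQ,EVENTDEF,FORMDEF,FSEQ,IGSEQ,EVTDT
1001,101,SCREENING,1,SCREENING,DM,1,1,2023-01-15
"""


def locate_objects(row: dict, column_settings: dict) -> dict:
    """Map each object key to the value found in its configured source column."""
    return {key: row.get(column) for key, column in column_settings.items()}


reader = csv.DictReader(io.StringIO(SAMPLE_EXTRACT))
located = [locate_objects(row, COLUMN_SETTINGS) for row in reader]
print(located[0]["event"])  # SCREENING
```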

External IDs

The YAML Builder maps each study design object to its corresponding place in the source data using the object’s external ID. External IDs are obtained from the study design in Studio and default to the veevaDef name.

For Event Groups, Events, and Forms, the external ID determines the object value as it appears in the source data (for example, the event or form name).

For Items, the external ID determines the column name in the source data containing the item value (for example, AETERM for an adverse event term).

In the example below, a Demographics Form extract shows a definition column, EVENTDEF, and an external ID column, EVENTEID. If your source data includes both columns, and they are consistent throughout, you can use either to identify events in the header file.

Example Extract with Event ID and Event Definition

In Studio, the external ID defaults to the definition name. The YAML Builder only requires a change where the name differs from the value in the source data. For example, if the external ID defaults to ev_UNSCHEDULED, but the source data shows UNSCHEDULED, you must update the external ID in Studio to UNSCHEDULED.

Similarly, if an item definition is prefixed with the form name (for example, DM_BRTHYR) but the column appears in the source data as BRTHYR without that prefix, set the item’s external ID to match the source data (BRTHYR).
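Before running the YAML Builder, it can help to list which external IDs do not occur in the source data. The sketch below is one possible pre-check, not a feature of EDC Migrator; the function name and the sample study design are hypothetical.

```python
def external_ids_missing_from_source(external_ids: dict, source_values: set) -> dict:
    """Return {definition name: external ID} for IDs that are absent from the source data."""
    return {
        definition: external_id
        for definition, external_id in external_ids.items()
        if external_id not in source_values
    }


# Hypothetical study design: definition name -> current external ID.
external_ids = {
    "ev_SCREENING": "SCREENING",         # already updated in Studio
    "ev_UNSCHEDULED": "ev_UNSCHEDULED",  # still the default
}
# Values observed in the source data's event column (illustrative).
source_events = {"SCREENING", "UNSCHEDULED"}

print(external_ids_missing_from_source(external_ids, source_events))
# {'ev_UNSCHEDULED': 'ev_UNSCHEDULED'}
```

Each entry in the result is a candidate for an external ID update in Studio.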

Excluded Objects

Labels and headers are not generally included in the source data and are excluded when creating YAML files. Only the following item types are mapped by the YAML Builder:

  • Item Type: An EDC field, read-only, displaying status.
  • Data Type: All types except Label (for example, Date, Datetime, Time, Boolean, Text, URL, Codelist, Number, or Unit).

Lab Forms

When designing a lab form for the YAML Builder, account for potential differences based on the study’s origin (for example, lab forms originating in Rave or InForm). The lab header file tells the YAML Builder which column names contain lab data. The YAML Builder does not support the remaining items on the lab panel (that is, those outside of the lab header), so you must manually insert these mappings to migrate the data. We recommend using values from the Lab Normals functionality instead of migrating these values.

Rave Example

The example below shows a header file produced for lab forms originating in Rave. Using the lab header file, the YAML Builder knows that for this study, the AnalyteName column sends the analyte’s name, and the LabUnits column sends its unit value. If the source data sends either of these values in a different column, you must update the lab header file.

Example Rave Header File

The external ID determines the values not specified by the lab header.

In Studio, the lab result name defaults to LBORRES_[analyte]. For example, LBORRES_ALKP is the item name for Alkaline Phosphatase. However, for studies originating in Rave, the analyte most likely appears as “Alkaline Phosphatase” in the source data, not LBORRES_ALKP. In this example, you must set the external ID for the Alkaline Phosphatase lab result to “Alkaline Phosphatase” in Studio.

Generally, the source data name is equivalent to the analyte’s LBTEST label.

In the example below, the external ID for the lab result field was updated to match the expected result in the source data.

Lab Result Name Example

InForm Example

For studies originating in InForm, each analyte test result has its own column in the source extract. The unit value and unit definition also have their own columns. Instead of a separate lab header file, the YAML Builder uses the traditional header or log header file. Therefore, ensure the lab result field name matches the column name as it appears in the extract.

For example, if the item definition LBORRES_ALBUMIN appears as LBORRES_A05 in the extract, update the external ID for LBORRES_ALBUMIN to LBORRES_A05.

The YAML Builder inserts a separate mapping to account for the unit value. For InForm studies, this column’s name usually includes a unit indicator, like “U”, within the item definition (for example, LBORRESU_A05).
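The unit column naming pattern described above can be sketched as follows. This is only a guess at the usual pattern, based on the LBORRES_A05 and LBORRESU_A05 example; the function name is hypothetical, and real extracts should always be checked because the text says the pattern is usual, not guaranteed.

```python
from typing import Optional


def guess_unit_column(result_column: str, prefix: str = "LBORRES") -> Optional[str]:
    """Guess the unit column for an InForm lab result column.

    Assumption: the unit indicator "U" directly follows the result prefix.
    Returns None when the column does not follow the expected pattern.
    """
    if not result_column.startswith(prefix + "_"):
        return None
    return prefix + "U" + result_column[len(prefix):]


print(guess_unit_column("LBORRES_A05"))  # LBORRESU_A05
print(guess_unit_column("AETERM"))       # None
```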

In the example below, the lab result’s external ID for Alkaline Phosphatase defaults to LBORRES_ALKP. However, in the source data, the data will most likely appear as “Alkaline Phosphatase,” matching the read-only label above the lab result field.

Lab Result External ID Example

Transformation Files

As part of the YAML generation process, the system creates default Transformation YAML files (transformations.yaml) to use in conjunction with generated mapping files. These files contain system defaults for legacy platforms, such as Rave and InForm.

The YAML Builder automatically applies the following default transformers during the YAML generation process:

  • DateFormatter
  • RemovePrefixT
  • UnknownDate
  • YesNoBoolean
  • MedicalCoding
  • DateTimeFormatter
  • LabsDateTimeFormatter
  • VisitDateFormatter
  • DoNotMigrateUnknownTime
  • TextItemCharacterSplit1500
  • TextItemCharacterSplit500
  • formStatusTransformer
  • VisitMethod

If a transformer is not needed, it won’t be used during the migration process and will have no impact on the study.

Read Transformation YAML Files for code examples.

Selectors

Default transformers contain selectors that are used to streamline migration tasks:

  • formStatus: Sets the status of migrated forms to Submitted. This does not apply to repaired forms.
  • visitMethod: Applies global transformations to all Events with Visit Methods, converting source values into a Veeva-compatible format.
  • Visit Date: Converts Event Dates & Item types into the standard Veeva format.

Generating Files

To generate files with the YAML Builder:

  1. From the Projects tab, hover next to the project name to show the Actions menu.
  2. From the Actions menu, click YAML Builder.
  3. Select the Target Vault. This is the EDC vault from which the study design is retrieved.
  4. Enter the Study Name. This is the study name combined with the study environment for which you want to create mapping files (for example, “Cholecap_PROD” or “Deetoza_DEV1”).
  5. Select the Data Source (In-Form, Legacy, or Rave).
  6. Enter the Casebook Version.
  7. Click Execute.
Open YAML Builder Action Menu

A progress banner displays at the top of the application during YAML file generation. After the generation is complete, you’ll receive in-app and email notifications. If the generation is successful, files are added to the Attachments section of the project’s Details page.

Troubleshooting Job Failure

This feature is only available to users with Admin access.

If your generation job is unsuccessful, you can find information regarding the failure in Admin > Operations > Job Status > History. To find the most recent job under the History section, locate the YAML Builder title and sort by Started Time. You can also download the job’s log file by hovering next to the Job ID and selecting Download Log from the Actions menu.

Modifying YAML Files

You can manually modify and re-upload YAML files when needed. To make changes, download the YAML file in the Mapping Configurations section of the project’s Details page. Make the appropriate changes and upload the file to the same Mapping Configurations section.

For instructions on uploading, read Creating Mapping Configurations.

Study Design Structure

EDC Migrator identifies the structure of all study Forms in your EDC Study Design. This includes Items, Item Groups, Events, Event Groups, Log Events, and Forms containing lab data.

Item Groups: Repeating & Non-Repeating

For Forms with repeating Item Groups, the YAML Builder generates a single mapping file for each repeating Item Group definition retrieved from the target study in EDC. For example, if a Form has seven repeating Item Groups, the YAML Builder creates seven Item Group YAML files.

Each mapping file includes a source file for its Item Group and follows the default naming structure “form name_name of item group”, based on the Form’s definition name in Studio.
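As a rough illustration of the naming structure, the sketch below builds one name per repeating Item Group. The function name and the sample definition names are hypothetical, and the sketch returns names without a file extension because this topic does not show the exact file names.

```python
def item_group_mapping_names(form_def: str, repeating_item_groups: list) -> list:
    """Build one "form name_name of item group" name per repeating Item Group."""
    return [f"{form_def}_{item_group}" for item_group in repeating_item_groups]


# Hypothetical Form with two repeating Item Groups.
names = item_group_mapping_names("CM", ["ig_CM_PRIOR", "ig_CM_CONCOM"])
print(names)  # ['CM_ig_CM_PRIOR', 'CM_ig_CM_CONCOM']
```

A Form with seven repeating Item Groups would produce seven names in the same way.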

For Forms without repeating Item Groups, the YAML Builder generates a single mapping file for each form definition retrieved from the target study in EDC. Any non-repeating Item Groups appearing on the same Form are generated in a separate YAML file.

Item Group Columns

Log Events

Log Events are events that are not tied to a specific site visit and can span across multiple Events. The YAML Builder generates mapping files for Forms that exist under Log Events (both repeating and non-repeating).

Progressive Display

EDC Migrator provides mapping configurations for Forms with controlling Items & Item Groups for progressive display.

After identifying controlling Items, the YAML Builder creates mapping configurations containing the following:

  • columnName of the Item that progressively displays
  • veevaDef of the Item that progressively displays
  • columnName of the controlling Item
  • columnValue that “triggers” the progressive display
    • Note: If multiple values trigger a display, they’re listed with commas. For example, columnValue: [“N”, “NA”]

The mapping object’s columnName/columnValue is based on the external ID assigned to the object (as present in the Study Design).
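The four pieces of information listed above can be modeled as a small record. The sketch below is an illustration of what they mean for a single source row, not the generated YAML structure; the class, the function, and the item names PREGYN and PREGREAS are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProgressiveDisplayMapping:
    column_name: str           # columnName of the Item that progressively displays
    veeva_def: str             # veevaDef of the Item that progressively displays
    controlling_column: str    # columnName of the controlling Item
    trigger_values: List[str]  # columnValue(s) that trigger the display


def is_displayed(mapping: ProgressiveDisplayMapping, row: dict) -> bool:
    """Return True when the controlling Item's value in this row triggers the display."""
    return row.get(mapping.controlling_column) in mapping.trigger_values


# Hypothetical Items: PREGREAS displays when PREGYN is "N" or "NA".
mapping = ProgressiveDisplayMapping("PREGREAS", "PREGREAS", "PREGYN", ["N", "NA"])

print(is_displayed(mapping, {"PREGYN": "NA"}))  # True
print(is_displayed(mapping, {"PREGYN": "Y"}))   # False
```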

After identifying controlling Item Groups, the YAML Builder creates mapping configurations containing the following:

  • veevaDef for the Item Group that progressively displays
  • Mappings for controlling Items including:
    • the columnName of the controlling Item
    • the columnValue that “triggers” the progressive display of the Item Group
  • Mappings for the Item Group’s Items

Events & Event Groups

The YAML Builder creates mappings for Events and Event Groups. The mappings can be found in the eventGroups section of the Forms YAML file. They consist of the following:

  • Events section header
  • Forms section header
  • Event Group definition (and if applicable, the corresponding columnValue for the source data’s Event Group)
  • The source data’s columnValue for the Event, and its corresponding Event definition

Single & Multiple Event Groups

The YAML Builder also creates mappings for Event Groups with single and multiple Events. The number of Events dictates the default columnValue:

  • One Event: The YAML Builder inserts the Event’s veevaDef as the Event Group’s columnValue (for example, “ev_LOG”).
  • More than one Event: The YAML Builder inserts all of the Events’ veevaDef values as the Event Group’s columnValue, separated by commas (for example, “ev_LOG1”, “ev_LOG2”).
  • No Events: The system does not specify a columnValue.
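The default columnValue rule for Event Groups can be sketched as a small function. The function name is hypothetical, and rendering the result as a quoted, comma-separated string is an assumption based on the examples in this section.

```python
from typing import List, Optional


def default_event_group_column_value(event_defs: List[str]) -> Optional[str]:
    """Return the default columnValue for an Event Group, or None if it has no Events."""
    if not event_defs:
        return None  # no columnValue is specified
    # One Event yields a single quoted value; several are joined with commas.
    return ", ".join(f'"{event_def}"' for event_def in event_defs)


print(default_event_group_column_value(["ev_LOG"]))              # "ev_LOG"
print(default_event_group_column_value(["ev_LOG1", "ev_LOG2"]))  # "ev_LOG1", "ev_LOG2"
print(default_event_group_column_value([]))                      # None
```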

Generating Event & Event Group YAMLs

For InForm™, Rave™, and EDC study migrations, the YAML Builder provides a baseline set of YAMLs for Events and Event Groups. These files can easily be used for any Load by adjusting the YAML’s file mapping.

The YAML Builder generates two default YAML files for Event data:

  • events.yaml: For all scheduled EDC Events
  • events_unscheduled.yaml: For all unscheduled Events

The placeholder filename, placeholder.csv, is used for InForm™ and Rave™ source data files. Users must replace this with the actual filename after generation. EDC source data files receive the SYS_EVT.csv filename.

Default header file and target mappings are inserted into each file:

Default header file mappings

  • header.yaml: For all scheduled EDC Events
  • logheader.yaml: For all unscheduled Events

Target mappings

  • CASEBOOK
  • EVENT_GROUP
  • EVENT
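The defaults described in this section can be collected into one structure. The sketch below is illustrative only: the function names and the dictionary keys (`file`, `header`, `source`, `targets`) are hypothetical and do not represent the generated YAML schema. The file names and targets are the ones listed above.

```python
TARGETS = ["CASEBOOK", "EVENT_GROUP", "EVENT"]

# Event YAML file -> default header file, as listed above.
HEADER_FOR = {
    "events.yaml": "header.yaml",
    "events_unscheduled.yaml": "logheader.yaml",
}


def source_filename(data_source: str) -> str:
    """Return the default source data filename for a data source type."""
    source = data_source.strip().lower()
    if source == "edc":
        return "SYS_EVT.csv"
    if source in ("inform", "rave"):
        return "placeholder.csv"  # must be replaced with the actual filename
    raise ValueError(f"unsupported data source: {data_source!r}")


def default_event_files(data_source: str) -> list:
    """Summarize the default Event YAML files for one data source type."""
    return [
        {
            "file": name,
            "header": header,
            "source": source_filename(data_source),
            "targets": list(TARGETS),
        }
        for name, header in HEADER_FOR.items()
    ]


for entry in default_event_files("Rave"):
    print(entry["file"], entry["header"], entry["source"])
```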

For each Event Group, the YAML Builder inserts the external ID of the Event as the columnValue. It also inserts Event mappings for each Event within an Event Group and assigns default columnNames for Events. Default columnNames and format mappings are not required for Event Dates.

Did Not Occur Events

The YAML Builder generates the dno_event.yaml template for importing Did Not Occur (DNO) Events. This template is applicable to the following source data types:

  • InForm™ (Legacy)
  • InForm™ (Standard)
  • EDC

The placeholder filename, placeholder.csv, is used for InForm™ (legacy and standard) and Rave™ source data files. Users must replace this with the actual filename after generation. EDC source data files receive the SYS_EVT.csv filename.

Default header file, target, and placeholder mappings are inserted into each file.

Default header file mapping:

  • header.yaml: Default

Target mappings:

  • CASEBOOK
  • EVENT_GROUP
  • DID_NOT_OCCUR_EVENT

Placeholder mappings

The YAML Builder inserts placeholder mappings for events.events and eventGroups.eventGroups. Each receives placeholder values for their columnValue and veevaDef attributes.

For InForm™ and Rave™ source data files, the YAML Builder inserts placeholder mappings for the Did Not Occur columnName and a default NULL value for columnValue. For EDC source data files, the YAML Builder inserts a default STATUS value for columnName and a default Did Not Occur value for columnValue.

For Rave™ files, a placeholder mapping is also added for the Did Not Occur Change Reason.
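The Did Not Occur defaults can be summarized per data source type as in the sketch below. The function name, the `changeReason` key, and the `<placeholder>` string are hypothetical stand-ins, because this topic does not show the actual placeholder text or key names. Treating the NULL default as the literal string "NULL" is also an assumption.

```python
PLACEHOLDER = "<placeholder>"  # stand-in; the real placeholder text is not shown in this topic


def dno_defaults(data_source: str) -> dict:
    """Return the default Did Not Occur mapping values for a data source type."""
    source = data_source.strip().lower()
    if source == "edc":
        return {"columnName": "STATUS", "columnValue": "Did Not Occur"}
    if source in ("inform", "rave"):
        mapping = {"columnName": PLACEHOLDER, "columnValue": "NULL"}
        if source == "rave":
            # Rave files also receive a placeholder for the change reason.
            mapping["changeReason"] = PLACEHOLDER  # hypothetical key name
        return mapping
    raise ValueError(f"unsupported data source: {data_source!r}")


print(dno_defaults("EDC"))   # {'columnName': 'STATUS', 'columnValue': 'Did Not Occur'}
print(dno_defaults("Rave"))
```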

Med Coding

For studies with Med Coding enabled, the YAML Builder includes mappings whenever possible. If the YAML Builder cannot determine the appropriate mapping value, a placeholder is used instead. Dictionary releases are used to determine which placeholder to insert:

  • The Drug Code placeholder is used for the WHODrug dictionary
  • The LLT Code placeholder is used for the MedDRA dictionary
  • The ATC Code placeholder is used when the Study has Code with ATCs enabled.

Each Form can only have one dictionary release version, and all three codes (Drug, LLT, ATC) will never be used for the same item.
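The placeholder selection can be sketched as follows. This is an interpretation of the rules above, not the YAML Builder’s implementation; the function name and parameters are hypothetical. Because a Form has a single dictionary, the result never contains all three codes.

```python
def coding_placeholders(dictionary: str, code_with_atcs: bool = False) -> list:
    """Return the coding placeholders suggested by the rules above for one Form."""
    base = {"whodrug": "Drug Code", "meddra": "LLT Code"}
    key = dictionary.strip().lower()
    if key not in base:
        raise ValueError(f"unknown dictionary: {dictionary!r}")
    placeholders = [base[key]]
    if code_with_atcs:
        # Used when the Study has Code with ATCs enabled.
        placeholders.append("ATC Code")
    return placeholders


print(coding_placeholders("MedDRA"))                        # ['LLT Code']
print(coding_placeholders("WHODrug", code_with_atcs=True))  # ['Drug Code', 'ATC Code']
```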

Lab Data

EDC Migrator automates the YAML mapping process for lab data originating in Rave™.

The YAML Builder generates YAML files for study designs containing lab Forms, and auto-names each file. A labheader.yaml file is assigned to each file and contains default column settings, sequence number options, and targets.

Lab forms are identified when an Item Group contains lab_header__v or lab__v.

Event Groups, Events, & Forms

The YAML file includes a nested hierarchy of Event Groups, Events, and Forms. The external ID is used to determine the columnValue mappings for each.

The mappings are as follows:

  • Event Group: Its veevaDef and corresponding columnValue in the source data.
  • Event: Its veevaDef and corresponding columnValue in the source data.
  • Form: The form and lab header name, and mapping for its Items.

The following are example mappings for Event Groups, Events, and Forms:

eventGroups:
- map:
    columnValue: "Unscheduled"
    veevaDef: "Unscheduled"
  events:
  - map:
      columnValue: "Unscheduled"
      veevaDef: "Unscheduled"
    forms:
    - include: "lb_hema_01_001"
components:
- name: "lb_hema_01_001"
  forms:
  - map:
      columnValue: "lb_hema_01_001"
      veevaDef: "lb_hema_01_001"
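The example suggests that an `include` entry under `forms` refers to a component with the same `name`. Under that assumption, the sketch below expands the example into (Event Group, Event, Form) triples. The mapping is written as a Python dictionary so that no YAML parser is needed, and the function name is hypothetical.

```python
# The example mapping above, written as a Python dict.
MAPPING = {
    "eventGroups": [
        {
            "map": {"columnValue": "Unscheduled", "veevaDef": "Unscheduled"},
            "events": [
                {
                    "map": {"columnValue": "Unscheduled", "veevaDef": "Unscheduled"},
                    "forms": [{"include": "lb_hema_01_001"}],
                }
            ],
        }
    ],
    "components": [
        {
            "name": "lb_hema_01_001",
            "forms": [
                {"map": {"columnValue": "lb_hema_01_001", "veevaDef": "lb_hema_01_001"}}
            ],
        }
    ],
}


def expand_form_includes(mapping: dict) -> list:
    """List (event group, event, form) veevaDef triples after resolving includes."""
    components = {c["name"]: c["forms"] for c in mapping.get("components", [])}
    triples = []
    for group in mapping.get("eventGroups", []):
        for event in group.get("events", []):
            for form in event.get("forms", []):
                # Assumption: "include" names a component whose forms are substituted in.
                forms = components[form["include"]] if "include" in form else [form]
                for resolved in forms:
                    triples.append(
                        (
                            group["map"]["veevaDef"],
                            event["map"]["veevaDef"],
                            resolved["map"]["veevaDef"],
                        )
                    )
    return triples


print(expand_form_includes(MAPPING))
# [('Unscheduled', 'Unscheduled', 'lb_hema_01_001')]
```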

In the Form YAML file, if the LBHEADER variable and lab Collection DateTime are present, the LabsDateTimeFormatter transformer is applied to the Collection DateTime.

For example:

items:
- map:
    columnName: "SaveTS"
    veevaDef: "LBDTC"
  transformers:
  - { name: LabsDateTimeFormatter }

Lab Panels

Forms containing lab header data also include a lab panel. In EDC, there can only be one lab panel per Form.

The YAML Builder maps the following variables for lab panels:

  • veevaDef: The lab panel’s veevaDef.
  • columnValue: The lab panel name in the source data, based on the external ID of the lab panel’s veevaDef.
    - map:
        columnValue: "Hematology"
        veevaDef: "Hematology"

A lab panel can have up to 35 analytes, and up to 5 or 6 fields per analyte. The YAML Builder adds mappings for the analyte name and its lab result.

The following variables are mapped for each analyte:

  • columnValue: The analyte name, based on the external ID of the lab result’s veevaDef.
  • veevaDef: The lab result field’s veevaDef. This is a system-set field, identified by the LBORRES prefix.
  • columnName: The columnName associated with the plugin (i.e., AnalyteValue) that contains the lab result.
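The three analyte variables can be assembled as in the sketch below. The function name is hypothetical and the returned dictionary only mirrors the variable names listed above; it does not represent the exact structure of the generated YAML.

```python
def analyte_mapping(result_def: str, external_id: str, result_column: str = "AnalyteValue") -> dict:
    """Build the three variables mapped for one analyte.

    result_def:    the lab result field's veevaDef (expected to start with LBORRES)
    external_id:   the analyte name as it appears in the source data
    result_column: the source column that contains the lab result
    """
    if not result_def.startswith("LBORRES"):
        raise ValueError("lab result definitions are expected to start with LBORRES")
    return {
        "columnValue": external_id,
        "veevaDef": result_def,
        "columnName": result_column,
    }


print(analyte_mapping("LBORRES_ALKP", "Alkaline Phosphatase"))
# {'columnValue': 'Alkaline Phosphatase', 'veevaDef': 'LBORRES_ALKP', 'columnName': 'AnalyteValue'}
```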