Prepare source CSVs for BulkLoader

Prerequisites : This lab requires Prerequisites and Lab 1 to be completed before you can continue.

In this section we will use AWS Athena service to convert source CSV files to the format that Neptune Expects.

Steps to generate Neptune BulkLoader compatible CSVs from Source CSVs
  1. Open Athena console here and confirm that you have AwsDataCatalog selected as Data source and C360-demo-db selected as database as shown in the screenshot below. Show Tables
  2. Click on Query Editor on top left to find your editor.
  3. You can query the data directly here via typing your SQL statements. Use the following SQL query to explore source_Auto_Customer data.
    SELECT * from source_Auto_Customer limit 10 
    

    Show Tables

  4. Open the Cloud Formation console and click on the stack with the description “Create Amazon Neptune cluster, SageMaker notebook, S3 buckets and cloud9” Show Tables
  5. Find the OutputDir from the Outputs section copy it to your notepad/scratchpad. We will use it later. Show Tables
  6. Replace [REPLACE-S3-PATH-HERE] with OutputDir value copied from cloudformation stack in the SQL query below and run that query in Query Editor.
     CREATE EXTERNAL TABLE Auto_Customer_Vertex (
         `~id` string,
         `fname` string,
         `lname` string,
         `work`  string,
         `birthdate` string,
         `~label` string
    ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
      STORED AS TEXTFILE LOCATION "[REPLACE-S3-PATH-HERE]"
    

    Output will be an additional table with the name Auto_Customer_Vertex created and showing up on the left tables list.

  7. Transform and ingest the data to the newly created table. Notice that vertex tables require 2 special columns ( ~id and ~label) in addition to regular columns. We will populate those as part of this transform. Copy following query and paste/run in the Query Editor
    INSERT INTO Auto_Customer_Vertex ("~id", fname,lname,work,birthdate,"~label")
        SELECT cast(policyid as varchar(20)), fname, lname, work, birthdate, 'Auto_Customer' FROM source_auto_customer;
    
  8. Verify the data generated by clicking on three dots right next to auto_customer_vertex table. Preview Table from the menu will show data from this table. Refer the screenshot below: Verify Vertex

Above steps were just example of how Source data is converted to Neptune Consumable CSVs. Following will use pre-created output folder in your S3 Bucket that can be ingested into Neptune directly.

Goto next exercise where you will learn to load data into Neptune.