Creating an AWS Glue crawler

Prerequisites: Complete the Prerequisites section before starting this lab. Continue only if you have already done so.

PART A: Create a Glue crawler for the Auto and Property source data.

1. Navigate to the AWS Glue console.

2. On the AWS Glue menu, select Crawlers.

3. Click Add Crawler.

4. Enter C360-workshop-glue-crawler as the crawler name for the initial data load.

5. Optionally, enter a description. It should be descriptive and easy to recognize. Click Next.

6. Choose Data stores and Crawl all folders, then click Next.

7. On the Add a data store page, make the following selections:

  • For Choose a data store, click the drop-down box and select S3.
  • For Crawl data in, select Specified path in my account.
  • For Include path, browse to the target folder where the CSV files are stored, e.g., s3://mod-xxxx-s3bucketstack-xxx-xxx-s3bucket-xxx/source/

8. Click Next.

9. On the Add another data store page, select No, then click Next.

10. On the Choose an IAM role page, make the following selections:

  • Select Choose an existing IAM role.
  • For IAM role, select the Glue role pre-created for you, for example GlueServiceRoleLab.

11. Click Next.

12. On the Create a schedule for this crawler page, for Frequency, select Run on demand, then click Next.

13. On the Configure the crawler’s output page, click Add database to create a new database for the Glue Data Catalog.

14. Enter c360_workshop_db as your database name and click Create.

In the "Prefix added to tables (optional)" field, enter source_ as the prefix.

15. Click Next.

16. Review the summary page, noting the Include path and Database output, then click Finish. The crawler is now ready to run.
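For readers who prefer scripting, the console choices in steps 4 to 16 can be approximated with the AWS SDK for Python (boto3). This is a minimal sketch, not part of the lab: the helper names build_crawler_request, create_crawler, and main are hypothetical, the S3 path is the placeholder from step 7 and must be replaced with your bucket, and it assumes the database does not exist yet (create_database raises an error if it does).

```python
def build_crawler_request(name, role, database, s3_path, table_prefix, description=""):
    """Return keyword arguments for the Glue CreateCrawler API call."""
    request = {
        "Name": name,
        "Role": role,  # IAM role name or ARN
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
        "TablePrefix": table_prefix,
        # No "Schedule" key is set, so the crawler only runs on demand.
    }
    if description:
        request["Description"] = description
    return request


def create_crawler(glue, request):
    """Create the catalog database, then the crawler that writes to it."""
    glue.create_database(DatabaseInput={"Name": request["DatabaseName"]})
    glue.create_crawler(**request)


def main():
    # Hypothetical entry point; requires AWS credentials and a default region.
    import boto3

    request = build_crawler_request(
        name="C360-workshop-glue-crawler",
        role="GlueServiceRoleLab",
        database="c360_workshop_db",
        # Placeholder path from step 7; substitute your own bucket name.
        s3_path="s3://mod-xxxx-s3bucketstack-xxx-xxx-s3bucket-xxx/source/",
        table_prefix="source_",
    )
    create_crawler(boto3.client("glue"), request)
```

Calling main() performs the same setup as the console wizard. Keeping the request as a plain dictionary makes it easy to review the settings before anything is created.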

17. Select the checkbox next to the crawler name and click the Run crawler button.

The crawler status changes from Starting to Stopping. Wait until the crawler returns to the Ready state (this takes a few minutes). You should then see that it has created 5 tables.
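The run-and-wait step can also be sketched with boto3. The function name run_crawler_and_wait, the polling interval, and the timeout are assumptions chosen for illustration. The Glue API reports the crawler state as RUNNING, STOPPING, or READY, and this sketch assumes the state has already left READY by the time start_crawler returns.

```python
import time


def run_crawler_and_wait(glue, name, poll_seconds=15, timeout_seconds=900,
                         sleep=time.sleep):
    """Start a crawler and poll until it is READY again.

    Returns the number of seconds spent waiting between polls.
    """
    glue.start_crawler(Name=name)
    waited = 0
    while waited <= timeout_seconds:
        state = glue.get_crawler(Name=name)["Crawler"]["State"]
        if state == "READY":
            return waited
        sleep(poll_seconds)
        waited += poll_seconds
    raise TimeoutError(f"crawler {name} not READY after {timeout_seconds} seconds")


def main():
    # Hypothetical entry point; requires AWS credentials and a default region.
    import boto3

    waited = run_crawler_and_wait(boto3.client("glue"), "C360-workshop-glue-crawler")
    print(f"Crawler finished after about {waited} seconds")
```

The sleep parameter exists only so the loop can be exercised without real delays; in normal use the default time.sleep is fine.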

18. In the AWS Glue navigation pane, click Databases, then Tables. You can also click the c360_workshop_db database to browse the tables.
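As an alternative to browsing the console, the tables the crawler created can be listed with boto3. This is a rough sketch under the same assumptions as the lab (database c360_workshop_db, prefix source_); the helper names list_table_names and main are hypothetical.

```python
def list_table_names(glue, database):
    """Return the sorted names of all tables in a Glue Data Catalog database."""
    names = []
    paginator = glue.get_paginator("get_tables")
    for page in paginator.paginate(DatabaseName=database):
        names.extend(table["Name"] for table in page["TableList"])
    return sorted(names)


def main():
    # Hypothetical entry point; requires AWS credentials and a default region.
    import boto3

    names = list_table_names(boto3.client("glue"), "c360_workshop_db")
    print(f"{len(names)} tables found")
    for name in names:
        # Tables created by this lab's crawler should carry the source_ prefix.
        marker = "" if name.startswith("source_") else "  (unexpected prefix)"
        print(f"  {name}{marker}")
```

If the crawler run succeeded, the count printed by main() should match the number of tables shown in the console.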

Go to the Validating data section.