Ingesting Entities from an AWS S3 Bucket

In addition to connecting to live Systems of Record and ingesting their data into the SGNL Graph, SGNL can ingest your System of Record’s entities from CSV files stored in an AWS S3 bucket.

Prerequisites

AWS

  • A dedicated S3 bucket enabled for access
  • A permission policy, attached to the bucket, that allows adequate operations on that S3 resource
  • A user account that:
    • Is allowed to access the bucket
    • Has an Access Key that can be used by SGNL

SGNL

  • A System of Record and its associated entities created

  • CSV files meeting the following requirements:

    • CSV file names must match the external ID of their associated entities (case sensitive) followed by “.csv”
    • CSV file(s) must contain a header row containing the external IDs of the attributes of the entity
    • Any attributes in the header that are not in the entity configuration are ignored
    • Attributes in the entity configuration can be omitted from the CSV file
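
The requirements above can be checked locally before a file is uploaded. The following is a minimal sketch in Python, not part of SGNL: the function name inspect_csv and the shape of its report are assumptions made for illustration. It compares a file name and header row against an entity's external ID and its configured attribute external IDs.

```python
import csv
import io


def inspect_csv(file_name, csv_text, entity_external_id, attribute_external_ids):
    """Compare a CSV file name and header with an entity configuration.

    Hypothetical helper for illustration; all comparisons are case sensitive.
    """
    # The first row of the file is the header row of attribute external IDs.
    header = next(csv.reader(io.StringIO(csv_text)), [])
    configured = list(attribute_external_ids)
    return {
        # File name must be the entity's external ID followed by ".csv".
        "file_name_ok": file_name == f"{entity_external_id}.csv",
        # Header columns that match configured attributes.
        "matched": [col for col in header if col in configured],
        # Header columns not in the entity configuration (these are ignored).
        "ignored": [col for col in header if col not in configured],
        # Configured attributes that the file leaves out.
        "omitted": [attr for attr in configured if attr not in header],
    }


if __name__ == "__main__":
    sample = "id,name,extra\n1,admin,x\n"
    print(inspect_csv("roles.csv", sample, "roles", ["id", "name", "description"]))
```

A non-empty "ignored" or "omitted" list is not an error under the rules above, but an unexpectedly empty "matched" list usually points to a case mismatch between the header and the attribute external IDs.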

Permissions Required

Configuring AWS

Creating the S3 bucket
  1. Sign in to the Amazon AWS S3 Management Console.
  2. Click the Create Bucket button.
  3. Enter a bucket name, like sgnl-aws-s3-ingest.
  4. Select the US East (Ohio) us-east-2 region.
  5. Set Object Ownership to ACLs disabled (recommended).
  6. Leave checkboxes for Block public access checked.
  7. Set Bucket Versioning to Disable.
  8. Tags are not required.
  9. Leave Default encryption at defaults: Encryption key type = Amazon S3 managed keys (SSE-S3) and Bucket Key = Enable.
  10. Do not change any Advanced settings.
  11. Click the Create bucket button.
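
If you prefer to script the bucket creation rather than use the console, the steps above can be approximated with the AWS SDK for Python (boto3). This is a sketch under assumptions: boto3 is a third-party package that must be installed separately, credentials with permission to create buckets must already be configured, and the helper names create_bucket_kwargs and create_ingest_bucket are hypothetical. Versioning and encryption are left at the AWS defaults, as in the console steps.

```python
def create_bucket_kwargs(bucket_name, region):
    """Build keyword arguments for the S3 CreateBucket call (hypothetical helper)."""
    kwargs = {
        "Bucket": bucket_name,
        # Corresponds to "ACLs disabled (recommended)" in the console.
        "ObjectOwnership": "BucketOwnerEnforced",
    }
    # us-east-1 is the default location and must not be sent as a constraint.
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


def create_ingest_bucket(bucket_name="sgnl-aws-s3-ingest", region="us-east-2"):
    # Imported lazily so the helper above can be used without boto3 installed.
    import boto3

    s3 = boto3.client("s3", region_name=region)
    s3.create_bucket(**create_bucket_kwargs(bucket_name, region))
    # Keep all four "Block public access" settings enabled.
    s3.put_public_access_block(
        Bucket=bucket_name,
        PublicAccessBlockConfiguration={
            "BlockPublicAcls": True,
            "IgnorePublicAcls": True,
            "BlockPublicPolicy": True,
            "RestrictPublicBuckets": True,
        },
    )
```

Calling create_ingest_bucket() with no arguments would use the example bucket name and region from the steps above.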

(Optional) Enable public HTTPS web access to the bucket

Complete this section only if you want to enable public web access to files in the bucket. If access must be restricted to a specific user, skip to the next section.

  1. From the S3 Console, click the Buckets tab in the side-bar.
  2. Click on the bucket that you created, e.g. sgnl-aws-s3-ingest.
  3. Click the Permissions tab.
  4. Click the Edit button in the Block public access (bucket settings) section.
  5. Uncheck the Block all public access checkbox and all four checkboxes underneath it.
  6. Click the Save changes button.
  7. Type confirm in the box that pops up, and click the Confirm button.
  8. Scroll down to the Bucket policy section and click the Edit button.
  9. Copy and paste the text below, replacing the bucket name if needed:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PublicReadGetObject",
                "Effect": "Allow",
                "Principal": "*",
                "Action": [
                    "s3:GetObject"
                ],
                "Resource": [
                    "arn:aws:s3:::sgnl-aws-s3-ingest/*"
                ]
            }
        ]
    }
    
  10. Click the Save changes button.

Note: If you upload a file named user.csv to the bucket, then the publicly accessible URL to that object will look like this: https://s3.us-east-2.amazonaws.com/sgnl-aws-s3-ingest/user.csv
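
The URL pattern in the note can be reproduced for any bucket, region, and file name. The sketch below uses only the Python standard library; the function names public_object_url and first_line are hypothetical, and first_line assumes the bucket policy above is already in place so that an anonymous request succeeds.

```python
from urllib.parse import quote
from urllib.request import urlopen


def public_object_url(bucket, region, key):
    """Build a path-style URL in the same form as the note above."""
    # quote() percent-encodes characters such as spaces in the object key.
    return f"https://s3.{region}.amazonaws.com/{bucket}/{quote(key)}"


def first_line(url):
    """Fetch the first line of a public object, e.g. to eyeball the CSV header."""
    with urlopen(url) as response:
        return response.readline().decode("utf-8").rstrip("\r\n")


if __name__ == "__main__":
    print(public_object_url("sgnl-aws-s3-ingest", "us-east-2", "user.csv"))
```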


Create a permission policy for the bucket

This permission policy will be assigned to the user created in a later step.

  1. Go to the Amazon AWS IAM Management Console.
  2. Click Policies on the side-bar.
  3. Click the Create Policy button.
  4. Click the JSON tab.
  5. Copy and paste the following into the text box, replacing the bucket name if needed:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "s3:ListBucket"
                ],
                "Resource": [
                    "arn:aws:s3:::sgnl-aws-s3-ingest"
                ]
            },
            {
                "Effect": "Allow",
                "Action": ["s3:*"],
                "Resource": "arn:aws:s3:::sgnl-aws-s3-ingest/*"
            }
        ]
    }
    
  6. Click the Next: Tags button.
  7. Click the Next: Review button.
  8. Enter S3SingleBucketFullAccess as the Name, or another name of your choosing.
  9. Click the Create policy button.
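
If your bucket has a different name, the policy text can be generated instead of edited by hand. This is a small sketch using only the Python standard library; single_bucket_policy is a hypothetical helper name, and the output has the same structure as the JSON shown in the steps above.

```python
import json


def single_bucket_policy(bucket_name):
    """Return the single-bucket policy from the steps above for any bucket name."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            # Bucket-level action: the resource is the bucket itself.
            {"Effect": "Allow", "Action": ["s3:ListBucket"], "Resource": [bucket_arn]},
            # Object-level actions: the resource is every object in the bucket.
            {"Effect": "Allow", "Action": ["s3:*"], "Resource": f"{bucket_arn}/*"},
        ],
    }


if __name__ == "__main__":
    # Paste the printed JSON into the policy editor's JSON tab.
    print(json.dumps(single_bucket_policy("sgnl-aws-s3-ingest"), indent=4))
```

The two statements use different resource ARNs because s3:ListBucket applies to the bucket, while object actions apply to the keys under it.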

Create a user with access to the bucket
  1. Go to the Amazon AWS IAM Management Console.
  2. Click Users on the side-bar.
  3. Click the "Create user" button.
  4. Enter "sgnl-ingest" or equivalent as the Username. Click Next.
  5. Under Permission options click the "Attach policies directly" option.
  6. In the Permissions policies section search for "S3SingleBucketFullAccess" and check the box to the left of it.
  7. Click the "Next" button.
  8. Click the "Create user" button.

Creating credentials to access the S3 bucket
  1. Ensure you're in the Amazon AWS IAM Management Console and click "Users" in the left handrail under "Access management".
  2. Click the username of the user you just created, e.g. sgnl-ingest.
  3. Click on the "Security credentials" tab.
  4. Scroll down to the "Access keys" section. Click "Create access key".
  5. Under "Use case" select "Third-party service".
  6. At the bottom of the page, under Confirmation click the check box next to: "I understand the above recommendation and want to proceed to create an access key."
  7. Click the "Next" button.
  8. (Optional) Add a tag with a description of the key.
  9. Click the "Create access key" button.
  10. Copy and secure both the "Access Key" and "Secret access key". You'll use these to configure the ingest adapter inside SGNL.
  11. Click the "Done" button.
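
Before entering the new credentials in SGNL, it can help to confirm that they can list the bucket and that a CSV file exists for each entity you plan to configure. The sketch below makes several assumptions: boto3 (a third-party package) is installed, the CSV files sit at the top level of the bucket, and the function names missing_csv_files and list_bucket_keys are hypothetical.

```python
def missing_csv_files(entity_external_ids, object_keys):
    """Return the expected "<external ID>.csv" names that are not in the bucket."""
    present = set(object_keys)
    # Matching is case sensitive, so "Roles.csv" does not satisfy "roles".
    return [f"{eid}.csv" for eid in entity_external_ids if f"{eid}.csv" not in present]


def list_bucket_keys(bucket, region, access_key_id, secret_access_key):
    """List all object keys in the bucket using the access key created above."""
    # Imported lazily so the helper above can be used without boto3 installed.
    import boto3

    s3 = boto3.client(
        "s3",
        region_name=region,
        aws_access_key_id=access_key_id,
        aws_secret_access_key=secret_access_key,
    )
    keys = []
    # The paginator follows continuation tokens for buckets with many objects.
    for page in s3.get_paginator("list_objects_v2").paginate(Bucket=bucket):
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    return keys
```

For example, missing_csv_files(["user", "roles"], list_bucket_keys(...)) returns an empty list when both user.csv and roles.csv are present.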

Configuring SGNL

Create a New System of Record

  1. Click on "Systems of Record" from the left handrail.
  2. At the top of the screen click the "+ Add" button.
  3. Click on "AWS S3".
  4. Update the "Display Name" to something that more accurately describes the System of Record the data comes from.
  5. Scroll to the "Authentication" section.
  6. Under "Username" enter the Access Key you created in the previous step.
  7. Under "Password" enter the Secret Access Key you created in the previous step.
  8. Under Advanced Settings -> Adapter Config, update the region and bucket. For example, with "us-east-1" as the region (used for global) and a bucket name of "sgnl-aws-s3-ingest", the adapter configuration will look like:
    [Image: creating-and-configuring-aws-s3-ingestion-3]
  9. Select your Default Sync Settings.
  10. Click "Continue".

Update the Entities and Their Attributes

  1. If you are not already on your newly created System of Record's page, find it under Systems of Record and click on it.
  2. Hover over the User entity and click delete.
    [Image: creating-and-configuring-aws-s3-ingestion-4]
  3. Click on the "Add Entity" button.
  4. Enter a descriptive Display Name for your entity.
  5. Under External ID, enter the name of the CSV file you'll be ingesting, without the .csv extension, e.g. for roles.csv enter roles as the External ID.
  6. (Optional) Enter a Description of the Entity.
  7. Under Scheduled Sync update the sync frequency.
  8. Under Attributes click the "+ Add Attribute" button for each attribute you want to create.
  9. For each attribute you add, the External ID will be the associated field name (case sensitive) from the CSV file.
    [Image: creating-and-configuring-aws-s3-ingestion-5]
    Note: There must be at least one attribute that represents a unique identifier for each row (entity). When you add that attribute, ensure you check the "Unique" box.
  10. When completed, click "Save" at the bottom of the page.
  11. Repeat the Add Entity steps for each entity (CSV file) you need to ingest.
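
Because each row needs a unique identifier, it can be useful to check the column you plan to mark as "Unique" before uploading the file. The following standard-library sketch is an illustration only: check_unique_column is a hypothetical name, and treating blank identifiers as a problem is an assumption rather than documented SGNL behavior.

```python
import csv
import io
from collections import Counter


def check_unique_column(csv_text, unique_attribute):
    """Report duplicate and blank values in the column used as the unique ID."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Header names are compared case sensitively, like attribute external IDs.
    if unique_attribute not in (reader.fieldnames or []):
        raise ValueError(f"column {unique_attribute!r} not found in header")
    # Missing trailing fields come back as None, so normalise them to "".
    counts = Counter((row[unique_attribute] or "") for row in reader)
    duplicates = sorted(value for value, n in counts.items() if value and n > 1)
    return {"duplicates": duplicates, "blank_rows": counts.get("", 0)}


if __name__ == "__main__":
    sample = "id,name\n1,admin\n2,viewer\n1,editor\n"
    print(check_unique_column(sample, "id"))
```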

Check Ingestion Logs

  1. In the left handrail click "Logs".
  2. Click the "Ingestion Service" tab.
  3. In the filter text box enter the name of your System of Record.
  4. You can use the first dropdown to determine when ingestion started and whether it completed successfully or failed.
  5. If ingestion failed, expand that log entry to view the error and debug the problem.

Note

  • Uploading a new CSV containing additional, modified, or deleted records will lead to corresponding updates in the graph.
  • CSV upload is not supported for entities with child entities. SGNL supports ingesting complex entities whose nested complex attributes are modeled as child entities, but not through CSV upload.
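
To preview which records a replacement CSV would add, modify, or delete, you can compare it with the previous file locally before uploading. The sketch below is a local approximation and does not describe how SGNL itself computes graph updates; diff_snapshots is a hypothetical helper, and it assumes both files share a column holding the unique identifier.

```python
import csv
import io


def diff_snapshots(old_csv, new_csv, unique_attribute):
    """Compare two CSV versions keyed on the unique identifier column."""

    def index(text):
        # Map each row's unique identifier to the full row.
        return {row[unique_attribute]: row for row in csv.DictReader(io.StringIO(text))}

    old, new = index(old_csv), index(new_csv)
    return {
        "added": sorted(new.keys() - old.keys()),
        "deleted": sorted(old.keys() - new.keys()),
        # Present in both versions, but at least one field differs.
        "modified": sorted(k for k in new.keys() & old.keys() if new[k] != old[k]),
    }


if __name__ == "__main__":
    before = "id,name\n1,admin\n2,viewer\n3,editor\n"
    after = "id,name\n1,admin\n2,reader\n4,owner\n"
    print(diff_snapshots(before, after, "id"))
```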