User page access tracing using AWS Kinesis + QuickSight

Girish Deshpande
Aug 10, 2021

One recent requirement on our platform was the ability to view administrators' access patterns and the client information they had viewed.

The requirement was for an insurance administration platform to be 1+ compliant with HIPAA regulations.

This dataset is important for determining whether any insurance administrator viewed PII or PHI about their clients and leaked it to the press, and for identifying such access trends so that the administrators involved can be flagged.

Design

This design is typical of any Kinesis Data Firehose architecture. Please refer to the links at the end of the blog for details.
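In a Firehose architecture the application's only job is to push each page-access event into a delivery stream; Firehose buffers, converts and writes to S3. The minimal Python sketch below shows that producer step, assuming the client object behaves like `boto3.client("firehose")`. The stream name used with it is hypothetical, and this is an illustration rather than the platform's actual code.

```python
import json


def send_event(firehose_client, stream_name, event):
    """Push one event (a JSON-serializable dict) into a Firehose delivery stream.

    Assumption: `firehose_client` behaves like boto3.client("firehose");
    `stream_name` is a hypothetical delivery stream name.
    """
    # Newline-delimit records so any raw (unconverted) output stays one JSON object per line.
    payload = (json.dumps(event) + "\n").encode("utf-8")
    response = firehose_client.put_record(
        DeliveryStreamName=stream_name,
        Record={"Data": payload},
    )
    # Firehose returns an identifier for the accepted record.
    return response["RecordId"]
```

In practice `firehose_client` would be created with `boto3.client("firehose")`; passing it in keeps the function easy to exercise with a stub.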

Challenge

One of our challenges was the shape of the event sent to Kinesis. It was a JSON payload carrying page access details such as ‘user’, ‘page’ and ‘access_time’, plus an element called ‘data’.

The ‘data’ element was designed to let users pick and choose the actual data they want to attach along with the page access details.
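To make that shape concrete, here is a minimal Python sketch that builds such an event. The field names `user`, `page`, `access_time` and `data` come from the post; the timestamp format and the example attribute names (`client_id`, `viewed_fields`) are hypothetical, and this is not the platform's original payload.

```python
from datetime import datetime, timezone


def build_page_access_event(user, page, access_time=None, **data):
    """Build a page-access event: fixed fields plus a free-form 'data' element.

    Every extra keyword argument lands in 'data', so each caller decides
    what to attach. The 'YYYY-MM-DD HH:MM:SS' UTC format is an assumption.
    """
    if access_time is None:
        access_time = datetime.now(timezone.utc)
    return {
        "user": user,
        "page": page,
        "access_time": access_time.strftime("%Y-%m-%d %H:%M:%S"),
        "data": dict(data),
    }


# Example with hypothetical attributes chosen by the calling page:
# build_page_access_event("admin.jane", "/clients/C-1001",
#                         client_id="C-1001", viewed_fields=["ssn", "diagnosis"])
```

The fixed fields are the same on every page, while `data` varies from page to page; that variability is what makes a fixed table schema awkward.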

Solution

I chose a flat schema structure for the AWS Glue table definition. This allowed me to:

  1. Convert incoming JSON into Parquet format and compress it before storing it in S3
  2. Keep the payload structure dynamic and create views in AWS Athena that map to the reporting requirements
  3. Use AWS QuickSight SPICE storage to reduce heavy JSON parsing in Athena

This solution allowed us to decouple data extraction from the reporting requirements.
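One reading of "flat schema" (our assumption, suggested by the later use of JSON extraction in Athena views) is that every Glue column is a scalar and the free-form `data` element travels as a JSON string. Under that assumption, a producer-side helper could serialize `data` before the event is sent:

```python
import json


def to_flat_record(event):
    """Return a copy of `event` whose 'data' element is a JSON string.

    Assumption: the flat Glue table has only scalar columns
    (user, page, access_time, data), so the dynamic payload is carried
    as text and parsed later by Athena views.
    """
    flat = {key: value for key, value in event.items() if key != "data"}
    # sort_keys makes the serialized payload deterministic for identical inputs.
    flat["data"] = json.dumps(event.get("data", {}), sort_keys=True)
    return flat
```

The trade-off is that the table never needs a schema change when a page attaches new attributes, at the price of parsing that string at query time.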

AWS Glue table schema used by Kinesis Firehose to convert records into Parquet and store them in S3
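The original schema is not reproduced here. As a hedged sketch, this is what a flat Parquet-backed table could look like as the `TableInput` argument to boto3's `glue.create_table(DatabaseName=..., TableInput=...)`. The column names follow the post; the string column types, table name and S3 location are assumptions.

```python
def glue_table_input(table_name, s3_location):
    """Build a TableInput dict for glue.create_table with a flat, all-scalar schema.

    Assumptions: every column is a string ('data' holds JSON text and
    'access_time' a text timestamp); table name and location are placeholders.
    """
    return {
        "Name": table_name,
        "TableType": "EXTERNAL_TABLE",
        "Parameters": {"classification": "parquet"},
        "StorageDescriptor": {
            "Columns": [
                {"Name": "user", "Type": "string"},
                {"Name": "page", "Type": "string"},
                {"Name": "access_time", "Type": "string"},
                {"Name": "data", "Type": "string"},
            ],
            "Location": s3_location,
            # Standard Hive Parquet classes, so Athena can read what Firehose writes.
            "InputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
            "OutputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",
            "SerdeInfo": {
                "SerializationLibrary": "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"
            },
        },
    }
```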
AWS Kinesis Firehose stream settings
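The stream settings themselves are not reproduced here either. A comparable configuration can be sketched as the `ExtendedS3DestinationConfiguration` passed to boto3's `firehose.create_delivery_stream`: JSON in, Snappy-compressed Parquet out, with the schema read from the Glue table. All ARNs, names, the prefix and the buffer values are placeholders, not the author's settings.

```python
def extended_s3_config(role_arn, bucket_arn, database, table, region):
    """ExtendedS3DestinationConfiguration enabling JSON-to-Parquet record conversion.

    Assumption: all arguments, the S3 prefix and the buffering values are placeholders.
    """
    return {
        "RoleARN": role_arn,
        "BucketARN": bucket_arn,
        "Prefix": "page-access/",
        # Firehose documents a 64 MiB minimum buffer size when format conversion is on.
        "BufferingHints": {"SizeInMBs": 128, "IntervalInSeconds": 300},
        # Compression is applied by the Parquet serializer, so S3-level compression stays off.
        "CompressionFormat": "UNCOMPRESSED",
        "DataFormatConversionConfiguration": {
            "Enabled": True,
            "InputFormatConfiguration": {"Deserializer": {"OpenXJsonSerDe": {}}},
            "OutputFormatConfiguration": {
                "Serializer": {"ParquetSerDe": {"Compression": "SNAPPY"}}
            },
            # Firehose reads the target schema from the Glue table.
            "SchemaConfiguration": {
                "RoleARN": role_arn,
                "DatabaseName": database,
                "TableName": table,
                "Region": region,
                "VersionId": "LATEST",
            },
        },
    }
```

This dict would be passed as `ExtendedS3DestinationConfiguration=` together with `DeliveryStreamName=` and `DeliveryStreamType="DirectPut"`.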
AWS Athena view and JSON extraction example
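As a stand-in for the view example, the sketch below generates Athena DDL for a view that lifts chosen keys out of the JSON `data` column with Athena's `json_extract_scalar`. The view, table and attribute names are hypothetical, and the inputs are assumed to be trusted, developer-defined values.

```python
def build_view_sql(view_name, table_name, attributes):
    """Return a CREATE OR REPLACE VIEW statement over the flat table.

    `attributes` maps an output column name to a JSONPath inside 'data'.
    Assumption: all names and paths are trusted values, not user input.
    """
    columns = [
        '"user"',  # quoted defensively in case the engine treats it as a keyword
        "page",
        "CAST(access_time AS timestamp) AS access_time",
    ]
    for column, json_path in attributes.items():
        # json_extract_scalar yields a varchar, or NULL when the path is absent.
        columns.append(f"json_extract_scalar(data, '{json_path}') AS {column}")
    select_list = ",\n  ".join(columns)
    return (
        f"CREATE OR REPLACE VIEW {view_name} AS\n"
        f"SELECT\n  {select_list}\n"
        f"FROM {table_name}"
    )
```

For example, `build_view_sql("client_page_views", "page_access", {"client_id": "$.client_id"})` produces one view per reporting need, which QuickSight can then import into SPICE.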

Current challenge

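Parsing JSON at query time is flexible, but it hurts performance and cost at scale, because every query has to read and parse the whole ‘data’ column rather than only the fields it needs.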

One approach I am considering is to categorize the events at the client level, so that N event types map to N business objects, each with M attributes in the ‘data’ payload available for querying.
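That idea can be sketched as a registry that fixes, per event type, which attributes `data` may carry, so each type can be flattened into stable columns instead of being parsed as JSON at query time. The `event_type` field and the registry entries below are hypothetical illustrations, not part of the current payload.

```python
from typing import Any, Dict, Tuple

# Hypothetical registry: each event type maps to one business object and the
# fixed attributes its 'data' payload may carry.
EVENT_TYPES: Dict[str, Tuple[str, ...]] = {
    "claim_viewed": ("client_id", "claim_id"),
    "policy_viewed": ("client_id", "policy_id", "plan_code"),
}


def categorize(event: Dict[str, Any]) -> Dict[str, Any]:
    """Validate an event against its registered type and flatten its attributes.

    Raises ValueError for unknown types or unexpected attributes; missing
    attributes become None so every event of a type has the same columns.
    """
    event_type = event.get("event_type")
    if event_type not in EVENT_TYPES:
        raise ValueError(f"unknown event type: {event_type!r}")
    allowed = EVENT_TYPES[event_type]
    data = event.get("data", {})
    unexpected = set(data) - set(allowed)
    if unexpected:
        raise ValueError(f"unexpected attributes for {event_type}: {sorted(unexpected)}")
    # Keep the fixed page-access fields that are present.
    row = {k: event[k] for k in ("event_type", "user", "page", "access_time") if k in event}
    for attribute in allowed:
        row[attribute] = data.get(attribute)
    return row
```

Each event type could then land in its own table (or partition) with M typed columns, trading some of today's flexibility for cheaper queries.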

Refer to the links below for more details.
