User page access tracing using AWS Kinesis + QuickSight
One of the recent requirements on our platform was the ability to view administrator access patterns and the client information each administrator viewed.
The platform is an insurance administration system, and the requirement came from the need to be compliant with HIPAA regulations.
This dataset is important for understanding whether any insurance administrator viewed PII or PHI about their clients and leaked it to the press. It also helps identify such access trends and flag the administrators involved.
Design
The design follows a typical Kinesis Data Firehose architecture. Please refer to the links at the end of the blog for details.
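As a rough sketch of the delivery side, assuming a Direct PUT Firehose stream that writes to S3 with record format conversion enabled, the destination configuration passed to boto3's `create_delivery_stream` might look like the following. The ARNs, prefix, buffer values and the Glue database/table names are placeholders, not the platform's actual settings.

```python
from typing import Any, Dict


def build_s3_destination_config(role_arn: str, bucket_arn: str,
                                database: str, table: str) -> Dict[str, Any]:
    """ExtendedS3DestinationConfiguration that converts JSON records to Parquet."""
    return {
        "RoleARN": role_arn,
        "BucketARN": bucket_arn,
        "Prefix": "page-access/",  # hypothetical S3 key prefix
        # With record format conversion enabled, stream-level compression stays
        # UNCOMPRESSED; Parquet compression is set on the serializer below.
        "CompressionFormat": "UNCOMPRESSED",
        # Format conversion needs a larger buffer (64 MiB minimum).
        "BufferingHints": {"SizeInMBs": 64, "IntervalInSeconds": 300},
        "DataFormatConversionConfiguration": {
            "Enabled": True,
            # The output schema is read from the Glue table definition.
            "SchemaConfiguration": {
                "RoleARN": role_arn,
                "DatabaseName": database,
                "TableName": table,
                "VersionId": "LATEST",
            },
            "InputFormatConfiguration": {
                "Deserializer": {"OpenXJsonSerDe": {}}
            },
            "OutputFormatConfiguration": {
                "Serializer": {"ParquetSerDe": {"Compression": "SNAPPY"}}
            },
        },
    }
```

The result would be passed as `firehose.create_delivery_stream(DeliveryStreamName=..., DeliveryStreamType="DirectPut", ExtendedS3DestinationConfiguration=config)`. Keeping the schema in Glue means the same table definition drives both the Parquet conversion and the Athena queries.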
Challenge
One of our challenges was that each event sent to Kinesis was a JSON payload of the following format:
Note the structure of the ‘data’ element. It was designed to let users pick and choose the data they want to attach alongside page access details such as ‘user’, ‘page’ and ‘access_time’.
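A minimal sketch of a producer emitting events of this shape, assuming Python with boto3's Firehose client. The keys placed inside `data` (for example `client_id`), the stream name and the helper names are hypothetical illustrations, not the platform's actual fields or code.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def build_page_access_event(user: str, page: str, data: Dict[str, Any],
                            access_time: Optional[datetime] = None) -> Dict[str, Any]:
    """Fixed page-access fields plus a free-form 'data' element."""
    if access_time is None:
        access_time = datetime.now(timezone.utc)
    return {
        "user": user,
        "page": page,
        "access_time": access_time.isoformat(),
        # Caller-chosen attributes; the set of keys can differ per page.
        "data": data,
    }


def to_firehose_record(event: Dict[str, Any]) -> Dict[str, bytes]:
    """Encode one event in the Record shape expected by Firehose put_record."""
    return {"Data": json.dumps(event).encode("utf-8")}


def send_event(firehose_client: Any, stream_name: str,
               event: Dict[str, Any]) -> Dict[str, Any]:
    """Send one event; firehose_client would be boto3.client("firehose")."""
    return firehose_client.put_record(
        DeliveryStreamName=stream_name,
        Record=to_firehose_record(event),
    )
```

For example, `build_page_access_event("admin-7", "/clients/42", {"client_id": "42"})` produces an event whose top-level keys are always the same, while the contents of `data` vary by page.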
Solution
I chose a flat schema structure for the AWS Glue table definition. This allowed me to:
- Convert incoming JSON to Parquet format and compress it before storing it in S3
- Keep the payload structure dynamic and create views in AWS Athena that map it to each reporting requirement
- Use AWS QuickSight SPICE storage to avoid repeated heavy JSON parsing in Athena
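To illustrate the view approach, here is a small helper that generates an Athena `CREATE OR REPLACE VIEW` statement exposing selected keys of the JSON payload as columns through `json_extract_scalar`. It assumes the flat Glue table stores `data` as a string column containing raw JSON; the table, view and attribute names are hypothetical.

```python
import re
from typing import List

# Only allow simple lowercase identifiers so names can be inlined safely.
_IDENT = re.compile(r"^[a-z_][a-z0-9_]*$")


def build_view_sql(view: str, table: str, attributes: List[str]) -> str:
    """Build a CREATE VIEW that pulls the given 'data' keys out as columns."""
    for name in [view, table, *attributes]:
        if not _IDENT.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    extracted = ",\n  ".join(
        f"json_extract_scalar(data, '$.{attr}') AS {attr}" for attr in attributes
    )
    return (
        f"CREATE OR REPLACE VIEW {view} AS\n"
        "SELECT\n"
        '  "user",\n'  # quoted because USER is a reserved word
        "  page,\n"
        "  access_time,\n"
        f"  {extracted}\n"
        f"FROM {table}"
    )
```

Each reporting requirement then becomes one view, and QuickSight can import the view into SPICE so the JSON extraction runs once per refresh rather than on every dashboard query.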
This solution allowed us to decouple data extraction from reporting requirements.
Current challenge
Parsing JSON at query time is flexible, but its performance and cost suffer as data volume grows.
One approach I am considering is to categorize events at the client level, so that N event types map to N business objects, each with M attributes in the ‘data’ payload that can be queried directly.
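A minimal sketch of what that categorization could look like, assuming each event carries a hypothetical `event_type` field and a registry declares the M attributes for each of the N types. The type names and attributes below are illustrative only. Rows flattened this way could be written to per-type tables (or a table partitioned by `event_type`), so Athena reads typed columns instead of parsing JSON.

```python
from typing import Any, Dict, List

# Hypothetical registry: one entry per event type / business object (N),
# each listing the 'data' attributes (M) promoted to real columns.
EVENT_SCHEMAS: Dict[str, List[str]] = {
    "client_profile_view": ["client_id", "plan_id"],
    "claim_view": ["client_id", "claim_id", "claim_status"],
}


def to_typed_row(event: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten an event into the fixed columns declared for its event type."""
    event_type = event["event_type"]
    if event_type not in EVENT_SCHEMAS:
        raise ValueError(f"unknown event_type: {event_type!r}")
    data = event.get("data", {})
    row: Dict[str, Any] = {
        "event_type": event_type,
        "user": event["user"],
        "page": event["page"],
        "access_time": event["access_time"],
    }
    # Undeclared keys are dropped; missing ones become None, so every row
    # of a given type has exactly the same columns.
    for attr in EVENT_SCHEMAS[event_type]:
        row[attr] = data.get(attr)
    return row
```

The trade-off is that adding an attribute now means updating the registry and the table schema, in exchange for cheaper, faster queries.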