The goal of this system is to improve security by helping teams find and fix software vulnerabilities
in their back-end microservices.
We have just started using a few third-party vendors to scan the external perimeter of our network for vulnerabilities.
Design a system that provides a dashboard showing each team the vulnerabilities that affect the services it owns.
More details:
The results of these vendor scans can be accessed using APIs provided by vendors.
Information reported includes network data (e.g. IP addresses) and
vulnerability attributes (name, description, score and more).
Sample scanner finding:
{"time": 12333333335, "IP": "12.34.56.78", "vuln_name": "XYZ-1234", "score": "5", "description": "this is the description of the XYZ"}
Events may sometimes be duplicated, and the system needs to deduplicate them.
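As a sketch of what deduplication could look like at ingest time (assuming two events are "the same" when they share IP, vulnerability name, and scan timestamp; that equality rule is my assumption, not given in the problem): normalize each raw payload into an internal schema, then derive a deterministic fingerprint to use as the document ID, so re-ingesting a duplicate overwrites the existing record instead of creating a new one.

```python
import hashlib
from dataclasses import dataclass

@dataclass
class Finding:
    """Normalized internal form of one scanner finding (see sample above)."""
    scan_time: int
    ip: str
    vuln_name: str
    score: float
    description: str

def normalize(raw: dict) -> Finding:
    """Adapter for the sample payload shape; each vendor gets its own adapter."""
    return Finding(
        scan_time=int(raw["time"]),
        ip=raw["IP"],
        vuln_name=raw["vuln_name"],
        score=float(raw["score"]),  # some vendors send the score as a string
        description=raw.get("description", ""),
    )

def fingerprint(f: Finding) -> str:
    """Deterministic ID: duplicates hash to the same value, so indexing
    the same event twice becomes an idempotent upsert."""
    key = f"{f.ip}|{f.vuln_name}|{f.scan_time}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()
```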
You can assume that each scanner produces 1M data points and the system must update its findings at least 3 times per day.
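Back-of-the-envelope, assuming the 1M figure is per refresh: 1M findings at 3 refreshes per day comes to about 3M documents per scanner per day, roughly 35 writes per second sustained, which is a modest load for a bulk-indexing pipeline even with several scanners.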
Internally, we have an orchestration system that runs containerized services in a large datacenter. The orchestration
system knows what service is running on what IP address and what team owns it.
However, this information is not available to our security team today. As part of this problem, we will need to propose how to feed orchestration data into the system.
Below is my current thinking, but I'm not sure whether it's correct, heading in the right direction, or missing the main point of the question.
I'm leaning toward the ELK stack, with Elasticsearch as the datastore (though this may be too much implementation detail up front).
For the ETL layer, I see two options; I'd pick the second for future extensibility:
1. Build a dedicated service to do the ETL (simple case; sketched below).
2. Use Databricks / a data lake for the ETL (more complex case, better suited once more data sources are added).
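To make option 1 concrete, here is a minimal sketch of the custom ETL service, reusing the normalize/fingerprint helpers from the dedup sketch above; the vendor URL, index name, and cluster address are placeholders, not real endpoints.

```python
import requests
from dataclasses import asdict
from elasticsearch import Elasticsearch, helpers

ES = Elasticsearch("http://localhost:9200")          # placeholder dev cluster
VENDOR_URL = "https://vendor.example.com/findings"   # hypothetical vendor API

def run_once() -> None:
    """One ETL pass: extract raw findings, transform, bulk-load into ES.
    Using fingerprint() as _id makes duplicate events collapse on index."""
    raw_findings = requests.get(VENDOR_URL, timeout=30).json()
    helpers.bulk(
        ES,
        (
            {
                "_index": "vuln-findings",           # assumed index name
                "_id": fingerprint(f),
                "_source": asdict(f),
            }
            for f in map(normalize, raw_findings)
        ),
    )

if __name__ == "__main__":
    run_once()  # in production, a scheduler would run this at least 3x/day
```

The Databricks route (option 2) would replace this loop with a scheduled job, but the dedupe-by-ID idea carries over.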
For onboarding the orchestration data, we could:
1. Store it as log files and ship them with Filebeat, Fluent Bit, Fluentd, or Logstash.
2. Pull it directly via the orchestration system's API (sketched below).
3. Add Kafka between data collection and ELK ingestion for buffering and decoupling.
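For option 2, a sketch of what the pull and the join could look like (the orchestration endpoint and its response fields are assumptions): periodically fetch the IP-to-service/team mapping and enrich each finding with ownership at ingest, so the dashboard can filter by owning team.

```python
import requests

ORCH_URL = "https://orchestrator.internal/api/placements"  # hypothetical endpoint

def load_ownership() -> dict:
    """Fetch the current IP -> {service, team} map from the orchestrator.
    Assumed response shape: [{"ip": ..., "service": ..., "team": ...}, ...]."""
    rows = requests.get(ORCH_URL, timeout=30).json()
    return {r["ip"]: {"service": r["service"], "team": r["team"]} for r in rows}

def enrich(finding: dict, ownership: dict) -> dict:
    """Attach ownership to a finding document before indexing; unknown IPs
    are flagged so the security team can spot unmapped assets."""
    owner = ownership.get(finding["ip"], {"service": "unknown", "team": "unknown"})
    return {**finding, **owner}
```

Since container placements churn, the mapping should be refreshed at least as often as the scans run; publishing placement changes onto the Kafka bus from option 3 would keep it even fresher.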
Use Elasticsearch as the index store and Kibana for the dashboard.
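To keep Kibana filters and aggregations predictable, the findings index would want an explicit mapping; here is a sketch via the Python client (index name and field choices are my assumptions, consistent with the sketches above).

```python
from elasticsearch import Elasticsearch

ES = Elasticsearch("http://localhost:9200")  # placeholder dev cluster

# keyword fields give exact-match filters/aggregations in Kibana;
# the date field drives the dashboard's time picker.
ES.indices.create(
    index="vuln-findings",
    mappings={
        "properties": {
            "scan_time":   {"type": "date", "format": "epoch_second"},
            "ip":          {"type": "ip"},
            "vuln_name":   {"type": "keyword"},
            "score":       {"type": "float"},
            "description": {"type": "text"},
            "service":     {"type": "keyword"},
            "team":        {"type": "keyword"},
        }
    },
)
```

With team as a keyword field, each team's view is just a saved Kibana filter on its own team name.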