AWS Certified Big Data - Specialty
Questions 31-40 of 81
Question#31

A media advertising company handles a large number of real-time messages sourced from over 200 websites.
The company's data engineer needs to collect and process records in real time for analysis using Spark
Streaming on Amazon EMR. As a top priority, the data engineer must fulfill a corporate mandate to keep ALL raw messages exactly as they are received.
Which Amazon Kinesis configuration meets these requirements?

  • A. Publish messages to Amazon Kinesis Firehose backed by Amazon Simple Storage Service (S3). Pull messages off Firehose with Spark Streaming in parallel to persistence to Amazon S3.
  • B. Publish messages to Amazon Kinesis Streams. Pull messages off Streams with Spark Streaming in parallel to AWS Lambda pushing messages from Streams to Firehose backed by Amazon Simple Storage Service (S3).
  • C. Publish messages to Amazon Kinesis Firehose backed by Amazon Simple Storage Service (S3). Use AWS Lambda to pull messages from Firehose to Streams for processing with Spark Streaming.
  • D. Publish messages to Amazon Kinesis Streams, pull messages off with Spark Streaming, and write raw data to Amazon Simple Storage Service (S3) before and after processing.
Answer: C
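
Whichever Kinesis variant is chosen, the raw-retention mandate starts at the producer, which must publish every message unmodified. A minimal producer-side sketch (the stream name and message are illustrative, not from the question):

```python
import hashlib
import json

def build_put_record(stream_name: str, message: dict) -> dict:
    """Build parameters for a Kinesis PutRecord call, keeping the raw
    message byte-for-byte intact in the Data blob so what lands in S3
    is exactly what was received."""
    data = json.dumps(message, sort_keys=True).encode("utf-8")
    return {
        "StreamName": stream_name,
        "Data": data,
        # A hash-based partition key spreads records evenly across shards.
        "PartitionKey": hashlib.md5(data).hexdigest(),
    }

# A real producer would pass this straight to boto3 (names illustrative):
# import boto3
# boto3.client("kinesis").put_record(**build_put_record("ad-messages", msg))
```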

Question#32

A solutions architect for a logistics organization ships packages from thousands of suppliers to end customers.
The architect is building a platform where suppliers can view the status of one or more of their shipments.
Each supplier can have multiple roles that will only allow access to specific fields in the resulting information.
Which strategy allows the appropriate level of access control and requires the LEAST amount of management work?

  • A. Send the tracking data to Amazon Kinesis Streams. Use AWS Lambda to store the data in an Amazon DynamoDB table. Generate temporary AWS credentials for the suppliers' users with AWS STS, specifying fine-grained security policies to limit access only to their applicable data.
  • B. Send the tracking data to Amazon Kinesis Firehose. Use Amazon S3 notifications and AWS Lambda to prepare files in Amazon S3 with appropriate data for each supplier's roles. Generate temporary AWS credentials for the suppliers' users with AWS STS. Limit access to the appropriate files through security policies.
  • C. Send the tracking data to Amazon Kinesis Streams. Use Amazon EMR with Spark Streaming to store the data in HBase. Create one table per supplier. Use HBase Kerberos integration with the suppliers' users. Use HBase ACL-based security to limit access for the roles to their specific table and columns.
  • D. Send the tracking data to Amazon Kinesis Firehose. Store the data in an Amazon Redshift cluster. Create views for the suppliers' users and roles. Allow suppliers access to the Amazon Redshift cluster using a user limited to the applicable view.

Answer: B
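
The fine-grained control described in option A can be expressed with the `dynamodb:LeadingKeys` IAM condition key, which scopes STS temporary credentials to items whose partition key matches the caller's own supplier ID. A sketch, assuming the table is partitioned by supplier ID (the ARN and IDs are placeholders):

```python
import json

def supplier_scoped_policy(table_arn: str, supplier_id: str) -> str:
    """Policy for STS temporary credentials: the bearer may only read
    DynamoDB items whose partition key is their own supplier ID."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["dynamodb:GetItem", "dynamodb:Query"],
            "Resource": table_arn,
            "Condition": {
                "ForAllValues:StringEquals": {
                    # Fine-grained access control on the partition key.
                    "dynamodb:LeadingKeys": [supplier_id],
                },
            },
        }],
    }
    return json.dumps(policy)
```

The related `dynamodb:Attributes` condition key can further restrict which item attributes (fields) each role may read, matching the per-role field requirement in the question.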

Question#33

A company's social media manager requests more staff on the weekends to handle an increase in customer contacts from a particular region. The company needs a report to visualize the trends on weekends over the past 6 months using Amazon QuickSight.
How should the data be represented?

  • A. A line graph plotting customer contacts vs. time, with a line for each region
  • B. A pie chart per region plotting customer contacts per day of week
  • C. A map of regions with a heatmap overlay to show the volume of customer contacts
  • D. A bar graph plotting region vs. volume of social media contacts
Answer: C

Question#34

An Amazon Redshift database is encrypted using AWS KMS. A data engineer needs to use the AWS CLI to create a KMS-encrypted snapshot of the database in another AWS Region.
Which three steps should the data engineer take to accomplish this task? (Choose three.)

  • A. Create a new KMS key in the destination region.
  • B. Copy the existing KMS key to the destination region.
  • C. Use CreateSnapshotCopyGrant to allow Amazon Redshift to use the KMS key from the source region.
  • D. In the source region, enable cross-region replication and specify the name of the copy grant created.
  • E. In the destination region, enable cross-region replication and specify the name of the copy grant created.
  • F. Use CreateSnapshotCopyGrant to allow Amazon Redshift to use the KMS key created in the destination region.

Answer: A, D, F
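
Assuming a KMS key has already been created in the destination region (step A), steps F and D map onto the Redshift API roughly as below. The identifiers and the grant-naming scheme are made up, and note that `create_snapshot_copy_grant` must be issued against the destination region's endpoint:

```python
def snapshot_copy_steps(cluster_id: str, dest_region: str, dest_key_id: str) -> list:
    """Sequence of (service, operation, params) tuples enabling
    cross-region copy of KMS-encrypted Redshift snapshots."""
    grant_name = f"{cluster_id}-copy-grant"  # hypothetical naming scheme
    return [
        # F. Let Redshift use the KMS key created in the destination region
        #    (call this against the destination region).
        ("redshift", "create_snapshot_copy_grant", {
            "SnapshotCopyGrantName": grant_name,
            "KmsKeyId": dest_key_id,
        }),
        # D. In the source region, enable cross-region snapshot copy,
        #    naming the grant just created.
        ("redshift", "enable_snapshot_copy", {
            "ClusterIdentifier": cluster_id,
            "DestinationRegion": dest_region,
            "SnapshotCopyGrantName": grant_name,
        }),
    ]
```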

Question#35

How should an Administrator BEST architect a large multi-layer Long Short-Term Memory (LSTM) recurrent neural network (RNN) running with MXNet on Amazon EC2? (Choose two.)

  • A. Use data parallelism to partition the workload over multiple devices and balance the workload within the GPUs.
  • B. Use compute-optimized EC2 instances with an attached elastic GPU.
  • C. Use general purpose GPU computing instances such as G3 and P3.
  • D. Use processing parallelism to partition the workload over multiple storage devices and balance the workload within the GPUs.
Answer: A, C

Question#36

An organization is soliciting public feedback through a web portal that has been deployed to track the number of requests and other important data. As part of reporting and visualization, Amazon QuickSight connects to an Amazon RDS database to visualize the data. Management wants to understand some important metrics about feedback and how the feedback has changed over the last four weeks in a visual representation.
What would be the MOST effective way to represent multiple iterations of an analysis in Amazon QuickSight that would show how the data has changed over the last four weeks?

  • A. Use the analysis option for data captured in each week and view the data by a date range.
  • B. Use a pivot table as a visual option to display measured values and weekly aggregate data as a row dimension.
  • C. Use a dashboard option to create an analysis of the data for each week and apply filters to visualize the data change.
  • D. Use a story option to preserve multiple iterations of an analysis and play the iterations sequentially.
Answer: D

Question#37

An organization is setting up a data catalog and metadata management environment for its numerous data stores currently running on AWS. The data catalog will be used to determine the structure and other attributes of data in the data stores. The data stores are composed of Amazon RDS databases, Amazon Redshift, and CSV files residing on Amazon S3. The catalog should be populated on a scheduled basis, and minimal administration is required to manage the catalog.
How can this be accomplished?

  • A. Set up Amazon DynamoDB as the data catalog and run a scheduled AWS Lambda function that connects to data sources to populate the DynamoDB table.
  • B. Use an Amazon database as the data catalog and run a scheduled AWS Lambda function that connects to data sources to populate the database.
  • C. Use AWS Glue Data Catalog as the data catalog and schedule crawlers that connect to data sources to populate the catalog.
  • D. Set up Apache Hive metastore on an Amazon EC2 instance and run a scheduled bash script that connects to data sources to populate the metastore.
Answer: C
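
A rough sketch of the crawler configuration option C implies, covering both an S3 prefix of CSV files and a JDBC source (e.g. RDS or Redshift) on a nightly schedule. All names, ARNs, paths, and the cron expression are placeholders:

```python
def crawler_params(name: str, role_arn: str, s3_path: str, jdbc_connection: str) -> dict:
    """Parameters for glue.create_crawler: one crawler covering an S3
    prefix of CSV files and a JDBC source, crawled nightly at 02:00 UTC."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": "enterprise_catalog",  # hypothetical catalog database
        "Targets": {
            "S3Targets": [{"Path": s3_path}],
            "JdbcTargets": [{"ConnectionName": jdbc_connection, "Path": "sales/%"}],
        },
        # Glue cron schedule; crawlers are serverless, so there are no
        # hosts or scheduled scripts to administer.
        "Schedule": "cron(0 2 * * ? *)",
    }

# Usage (illustrative): boto3.client("glue").create_crawler(**crawler_params(...))
```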

Question#38

An organization is currently using an Amazon EMR long-running cluster with the latest Amazon EMR release for analytic jobs and is storing data as external tables on Amazon S3.
The company needs to launch multiple transient EMR clusters to access the same tables concurrently, but the metadata about the Amazon S3 external tables is defined and stored on the long-running cluster.
Which solution will expose the Hive metastore with the LEAST operational effort?

  • A. Export Hive metastore information to Amazon DynamoDB, and configure the Amazon EMR hive-site classification to point to the Amazon DynamoDB table.
  • B. Export Hive metastore information to a MySQL table on Amazon RDS and configure the Amazon EMR hive-site classification to point to the Amazon RDS database.
  • C. Launch an Amazon EC2 instance, install and configure Apache Derby, and export the Hive metastore information to derby.
  • D. Create and configure an AWS Glue Data Catalog as a Hive metastore for Amazon EMR.
Answer: B
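
With answer B, each transient cluster is launched with a `hive-site` configuration classification pointing at the shared MySQL metastore on Amazon RDS. A sketch of that classification (endpoint and credentials are placeholders; in practice the password would come from a secret store, not a literal):

```python
def hive_metastore_classification(rds_endpoint: str, user: str, password: str) -> list:
    """EMR 'Configurations' entry pointing Hive at an external MySQL
    metastore on Amazon RDS, so every transient cluster sees the same
    table metadata as the long-running cluster."""
    return [{
        "Classification": "hive-site",
        "Properties": {
            "javax.jdo.option.ConnectionURL":
                f"jdbc:mysql://{rds_endpoint}:3306/hive?createDatabaseIfNotExist=true",
            "javax.jdo.option.ConnectionDriverName": "org.mariadb.jdbc.Driver",
            "javax.jdo.option.ConnectionUserName": user,
            "javax.jdo.option.ConnectionPassword": password,
        },
    }]

# Passed as the Configurations parameter of emr.run_job_flow (illustrative).
```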

Question#39

An organization is using Amazon Kinesis Data Streams to collect data generated from thousands of temperature devices and is using AWS Lambda to process the data. Devices generate 10 to 12 million records every day, but Lambda is processing only around 450 thousand records. Amazon CloudWatch indicates that throttling on Lambda is not occurring.
What should be done to ensure that all data is processed? (Choose two.)

  • A. Increase the BatchSize value on the EventSource, and increase the memory allocated to the Lambda function.
  • B. Decrease the BatchSize value on the EventSource, and increase the memory allocated to the Lambda function.
  • C. Create multiple Lambda functions that will consume the same Amazon Kinesis stream.
  • D. Increase the number of vCores allocated for the Lambda function.
  • E. Increase the number of shards on the Amazon Kinesis stream.
Answer: A, E
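
The stream side of the fix (option E) comes down to shard capacity, since Lambda polls each shard independently. A rough sizing helper, assuming the standard per-shard ingest limits of 1,000 records/s and 1 MB/s:

```python
import math

def shards_needed(records_per_sec: float, mb_in_per_sec: float) -> int:
    """Minimum Kinesis shard count, assuming each shard ingests up to
    1,000 records/s and 1 MB/s (standard per-shard limits)."""
    return max(math.ceil(records_per_sec / 1000.0),
               math.ceil(mb_in_per_sec / 1.0),
               1)

# Option A's other half: a larger batch per invocation drains each shard
# faster (the mapping UUID is illustrative):
# boto3.client("lambda").update_event_source_mapping(
#     UUID="esm-uuid", BatchSize=500)
```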

Question#40

An Operations team continuously monitors the number of visitors to a website to identify any potential system problems. The number of website visitors varies throughout the day. The site is more popular in the middle of the day and less popular at night.
Which type of dashboard display would be the MOST useful to allow staff to quickly and correctly identify system problems?

  • A. A vertical stacked bar chart showing today's website visitors and the historical average number of website visitors.
  • B. An overlay line chart showing today's website visitors at one-minute intervals and also the historical average number of website visitors.
  • C. A single KPI metric showing the statistical variance between the current number of website visitors and the historical number of website visitors for the current time of day.
  • D. A scatter plot showing today's website visitors on the X-axis and the historical average number of website visitors on the Y-axis.
Answer: B
