A company uses Amazon Redshift for its enterprise data warehouse. A new on-premises PostgreSQL OLTP
DB must be integrated into the data warehouse. Each table in the PostgreSQL DB has an indexed timestamp column. The data warehouse has a staging layer to load source data into the data warehouse environment for further processing.
The data lag between the source PostgreSQL DB and the Amazon Redshift staging layer should NOT exceed four hours.
What is the most efficient technique to meet these requirements?
C
An administrator is deploying Spark on Amazon EMR for two distinct use cases: machine learning algorithms and ad-hoc querying. All data will be stored in Amazon S3. Two separate clusters for each use case will be deployed. The data volumes on Amazon S3 are less than 10 GB.
How should the administrator align instance types with the clusters purpose?
A
An organization is designing an application architecture. The application will have over 100 TB of data and will support transactions that arrive at rates from hundreds per second to tens of thousands per second, depending on the day of the week and time of the day. All transaction data, must be durably and reliably stored. Certain read operations must be performed with strong consistency.
Which solution meets these requirements?
A
A company generates a large number of files each month and needs to use AWS import/export to move these files into Amazon S3 storage. To satisfy the auditors, the company needs to keep a record of which files were imported into Amazon S3.
What is a low-cost way to create a unique log for each import job?
B
A company needs a churn prevention model to predict which customers will NOT renew their yearly subscription to the companys service. The company plans to provide these customers with a promotional offer. A binary classification model that uses Amazon Machine Learning is required.
On which basis should this binary classification model be built?
C
A company with a support organization needs support engineers to be able to search historic cases to provide fast responses on new issues raised. The company has forwarded all support messages into an Amazon
Kinesis Stream. This meets a company objective of using only managed services to reduce operational overhead.
The company needs an appropriate architecture that allows support engineers to search on historic cases and find similar issues and their associated responses.
Which AWS Lambda action is most appropriate?
A
A solutions architect works for a company that has a data lake based on a central Amazon S3 bucket. The data contains sensitive information. The architect must be able to specify exactly which files each user can access. Users access the platform through a SAML federation Single Sign On platform.
The architect needs to build a solution that allows fine grained access control, traceability of access to the objects, and usage of the standard tools (AWS Console, AWS CLI) to access the data.
Which solution should the architect build?
D
A company that provides economics data dashboards needs to be able to develop software to display rich, interactive, data-driven graphics that run in web browsers and leverages the full stack of web standards
(HTML, SVG, and CSS).
Which technology provides the most appropriate support for this requirements?
A
Reference: https://sa.udacity.com/course/data-visualization-and-d3js--ud507
A company hosts a portfolio of e-commerce websites across the Oregon, N. Virginia, Ireland, and Sydney
AWS regions. Each site keeps log files that capture user behavior. The company has built an application that generates batches of product recommendations with collaborative filtering in Oregon. Oregon was selected because the flagship site is hosted there and provides the largest collection of data to train machine learning models against. The other regions do NOT have enough historic data to train accurate machine learning models.
Which set of data processing steps improves recommendations for each region?
D
There are thousands of text files on Amazon S3. The total size of the files is 1 PB. The files contain retail order information for the past 2 years. A data engineer needs to run multiple interactive queries to manipulate the data. The Data Engineer has AWS access to spin up an Amazon EMR cluster. The data engineer needs to use an application on the cluster to process this data and return the results in interactive time frame.
Which application on the cluster should the data engineer use?
C