A telecommunications company needs to predict customer churn (i.e., customers who decide to switch to a competitor). The company has historic records of each customer, including monthly consumption patterns, calls to customer service, and whether the customer ultimately quit the service. All of this data is stored in
Amazon S3. The company needs to know which customers are likely going to churn soon so that they can win back their loyalty.
What is the optimal approach to meet these requirements?
B
A system needs to collect on-premises application spool files into a persistent storage layer in AWS. Each spool file is 2 KB. The application generates 1 M files per hour. Each source file is automatically deleted from the local server after an hour.
What is the most cost-efficient option to meet these requirements?
C
An administrator receives about 100 files per hour into Amazon S3 and will be loading the files into Amazon
Redshift. Customers who analyze the data within Redshift gain significant value when they receive data as quickly as possible. The customers have agreed to a maximum loading interval of 5 minutes.
Which loading approach should the administrator use to meet this objective?
C
An enterprise customer is migrating to Redshift and is considering using dense storage nodes in its Redshift cluster. The customer wants to migrate 50 TB of data. The customers query patterns involve performing many joins with thousands of rows.
The customer needs to know how many nodes are needed in its target Redshift cluster. The customer has a limited budget and needs to avoid performing tests unless absolutely needed.
Which approach should this customer use?
A
A company is centralizing a large number of unencrypted small files from multiple Amazon S3 buckets. The company needs to verify that the files contain the same data after centralization.
Which method meets the requirements?
A
An online gaming company uses DynamoDB to store user activity logs and is experiencing throttled writes on the companys DynamoDB table. The company is NOT consuming close to the provisioned capacity. The table contains a large number of items and is partitioned on user and sorted by date. The table is 200GB and is currently provisioned at 10K WCU and 20K RCU.
Which two additional pieces of information are required to determine the cause of the throttling? (Choose two.)
AD
A city has been collecting data on its public bicycle share program for the past three years. The 5PB dataset currently resides on Amazon S3. The data contains the following datapoints:
✑ Bicycle origination points
✑ Bicycle destination points
✑ Mileage between the points
✑ Number of bicycle slots available at the station (which is variable based on the station location)
✑ Number of slots available and taken at a given time
The program has received additional funds to increase the number of bicycle stations available. All data is regularly archived to Amazon Glacier.
The new bicycle stations must be located to provide the most riders access to bicycles.
How should this task be performed?
B
An administrator tries to use the Amazon Machine Learning service to classify social media posts that mention the administrators company into posts that require a response and posts that do not. The training dataset of
10,000 posts contains the details of each post including the timestamp, author, and full text of the post. The administrator is missing the target labels that are required for training.
Which Amazon Machine Learning model is the most appropriate for the task?
A
A medical record filing system for a government medical fund is using an Amazon S3 bucket to archive documents related to patients. Every patient visit to a physician creates a new file, which can add up millions of files each month. Collection of these files from each physician is handled via a batch process that runs ever night using AWS Data Pipeline. This is sensitive data, so the data and any associated metadata must be encrypted at rest.
Auditors review some files on a quarterly basis to see whether the records are maintained according to regulations. Auditors must be able to locate any physical file in the S3 bucket for a given date, patient, or physician. Auditors spend a significant amount of time location such files.
What is the most cost- and time-efficient collection methodology in this situation?
A
A clinical trial will rely on medical sensors to remotely assess patient health. Each physician who participates in the trial requires visual reports each morning. The reports are built from aggregations of all the sensor data taken each minute.
What is the most cost-effective solution for creating this visualization each day?
D