Amazon EMR FAQs

Q: What are EMR notebooks?

EMR Notebooks provide a Jupyter Notebook-based managed environment that enables data scientists, analysts, and developers to prepare and visualize data, collaborate with colleagues, build applications, and perform interactive analysis with EMR clusters.

Q: What can I do with EMR Notebooks?

You can use EMR Notebooks to build Apache Spark applications and perform interactive queries on your EMR cluster with minimal effort. Multiple users can create serverless notebooks right from the console, connect them to an existing shared EMR cluster, or deploy a cluster directly from the console and start experimenting with Spark right away. You can detach notebooks and reattach them to new clusters. Notebooks are automatically saved in S3 buckets, and you can retrieve saved notebooks from the console to continue working. EMR Notebooks are prepackaged with the libraries in the Anaconda repository so that you can import these libraries into your notebook code and use them to manipulate data and visualize results. In addition, EMR notebooks have built-in Spark monitoring functions, with which you can monitor the progress of your Spark jobs and debug code from the notebook.

Q: How can I get started with EMR Notebooks?

To get started with EMR Notebooks, open the EMR console and choose Notebooks in the navigation pane. Choose Create notebook, enter a name for your notebook, choose an existing EMR cluster or create a new one, provide a service role for the notebook to use, choose an S3 bucket in which to store your notebook files, and then click Create notebook. After the notebook shows a Ready status, choose Open to start the notebook editor.

Q: Can I open EMR notebooks without logging into the AWS Management Console?

No, in order to create or open a notebook and run queries on your EMR cluster, you must log in to the AWS Management Console. The notebook files are saved in ipynb format in your S3 bucket and can be downloaded and opened locally on your computer.
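
Because a downloaded notebook is a JSON document in the standard Jupyter ipynb format, it can be inspected locally without the console. The following is a minimal sketch, assuming the nbformat 4 layout; the function name and the sample notebook are illustrative and not part of EMR.

```python
import json


def code_cells(notebook_text):
    """Return the source of every code cell in an ipynb (nbformat 4) document."""
    notebook = json.loads(notebook_text)
    sources = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # nbformat stores source either as one string or as a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        sources.append(source)
    return sources


# Hypothetical, hand-written notebook standing in for a file downloaded from S3.
SAMPLE_NOTEBOOK = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Notes"]},
        {"cell_type": "code", "metadata": {}, "outputs": [],
         "execution_count": None,
         "source": ["df = spark.range(10)\n", "df.count()"]},
    ],
})

if __name__ == "__main__":
    for number, source in enumerate(code_cells(SAMPLE_NOTEBOOK), start=1):
        print(f"--- code cell {number} ---")
        print(source)
```

In practice you would pass the contents of the downloaded file to code_cells instead of the sample string.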

Q: What programming languages does EMR Notebooks support?

EMR Notebooks supports PySpark, SparkR, SparkSQL, Spark (Scala) and Python kernels.

Q: What libraries are available with EMR Notebooks?

Libraries residing in Anaconda's open source repositories can be imported into your code. You can import these libraries and use them locally in notebooks.

Q: Can I install custom libraries that will be used in my notebook code?

All Spark queries are run on your EMR cluster, so you need to install any runtime libraries that your Spark application will use on the cluster. You can use a bootstrap action or a custom AMI to install the required libraries when creating a cluster. For more information, see Creating Bootstrap Actions to Install Additional Software and Using a Custom AMI in the Amazon EMR Management Guide. The installation of libraries from the notebook editor is not supported.
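
As an illustration of the bootstrap action approach, the sketch below builds one entry in the shape used by the BootstrapActions parameter of the EMR RunJobFlow API (for example, through boto3's run_job_flow). It only constructs the dictionary and does not call AWS. The bucket, script name, and package arguments are hypothetical; the script itself would be one you write to install the packages on each node.

```python
def bootstrap_action(name, script_path, args=()):
    """Build one entry for the BootstrapActions list of an EMR cluster request."""
    return {
        "Name": name,
        "ScriptBootstrapAction": {
            "Path": script_path,
            "Args": list(args),
        },
    }


# Hypothetical bucket and script; the script would install the listed packages
# on every node (for example with pip) when the cluster starts.
INSTALL_LIBRARIES = bootstrap_action(
    "Install notebook runtime libraries",
    "s3://example-bucket/bootstrap/install_libraries.sh",
    args=["pandas", "scikit-learn"],
)

if __name__ == "__main__":
    print(INSTALL_LIBRARIES)
```

The resulting dictionary would be passed in a list as the BootstrapActions argument when the cluster is created.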

Q: What service limits are associated with EMR notebooks?

Notebooks use the master node of your EMR cluster to run queries. The size of the master instance limits the number of notebooks that you can attach to a cluster. Once you have exceeded the limit, you must stop an active notebook before you can start another.

Q: How do I stop my notebook?

You can use the EMR console. Choose Notebooks, select the notebook from the list, and choose Stop. This ends the notebook session, and the notebook can no longer be opened in the notebook editor. To restart the notebook, choose Start.

Q: How do I delete my notebook?

You can use the EMR console. Choose Notebooks, select the notebook from the list, and choose Delete. Deleting a notebook only removes it from the list in the console. The notebook file remains in the Amazon S3 location that you specified when you created the notebook.

Q: How can I query and run code from a notebook?

Spark queries that you run within a notebook run on the EMR cluster that you selected when you created the notebook. The programming language kernel that you select in the notebook editor interacts with the Livy server installed on your EMR cluster to create a Spark session, and all of your queries run on the cluster. The results of the Spark application are reported back to the kernel through Livy and are visible in the notebook.

Before you run any code within the notebook editor, make sure that the notebook has a Ready status. This status means that the interface between the applications on the cluster and the notebook editor is ready to run queries and code. To open the editor, select the notebook from the list of notebooks and then choose Open to start the notebook editor in a new browser tab. In the notebook editor, select the kernel for your programming language from the Kernel list. After the kernel has started and is ready, you can run code as in any Jupyter notebook, for example by clicking the Run button for a single cell or by choosing Run All from the Cell menu.
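
To make the kernel-to-Livy interaction described above more concrete, the sketch below builds the two REST requests that a client of Apache Livy typically issues: one to create a session and one to submit a statement to it. It is an illustration of Livy's public REST API, not the notebook's actual implementation. The requests are built but not sent; the URL is hypothetical, and the mapping from kernel names to Livy session kinds is an assumption.

```python
import json
import urllib.request

# Hypothetical endpoint; 8998 is Livy's default port, but your cluster may differ.
LIVY_URL = "http://localhost:8998"

# Assumed mapping from notebook kernel names to Livy session kinds.
KERNEL_TO_LIVY_KIND = {"PySpark": "pyspark", "SparkR": "sparkr", "Spark": "spark"}


def _post(url, payload):
    """Build (but do not send) a JSON POST request."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_session_request(livy_url, kernel):
    """Request that asks Livy to start a Spark session of the kernel's kind."""
    return _post(f"{livy_url}/sessions", {"kind": KERNEL_TO_LIVY_KIND[kernel]})


def run_statement_request(livy_url, session_id, code):
    """Request that submits one piece of code to an existing Livy session."""
    return _post(f"{livy_url}/sessions/{session_id}/statements", {"code": code})


if __name__ == "__main__":
    for request in (
        create_session_request(LIVY_URL, "PySpark"),
        run_statement_request(LIVY_URL, 0, "spark.range(10).count()"),
    ):
        print(request.get_method(), request.full_url, request.data.decode("utf-8"))
```

A real client would send each request with urllib.request.urlopen and poll the statement until its output is available; the notebook kernels do this for you.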

Q: Which EMR versions are supported by EMR notebooks?

EMR notebooks can be connected to EMR clusters with EMR version 5.18.0 or higher.
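
A minimal sketch of checking this requirement in a script, assuming release labels of the form emr-5.18.0 (the function names are illustrative). Versions are compared as integer tuples, because a plain string comparison would wrongly rank 5.9.0 above 5.18.0.

```python
import re

# Minimum EMR version stated in this FAQ.
MINIMUM_RELEASE = (5, 18, 0)


def parse_release_label(label):
    """Turn 'emr-5.18.0' (or '5.18.0') into the tuple (5, 18, 0)."""
    match = re.fullmatch(r"(?:emr-)?(\d+)\.(\d+)\.(\d+)", label.strip())
    if match is None:
        raise ValueError(f"unrecognized EMR release label: {label!r}")
    return tuple(int(part) for part in match.groups())


def supports_notebooks(label):
    """Return True if the release is at least the minimum notebook version."""
    return parse_release_label(label) >= MINIMUM_RELEASE


if __name__ == "__main__":
    for label in ("emr-5.9.0", "emr-5.18.0", "emr-6.2.0"):
        print(label, supports_notebooks(label))
```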

Q: Can I create a notebook or open the notebook editor without an EMR cluster?

No, in order to create or open your EMR Notebook from the console, you have to connect it to a running EMR cluster for the duration of your notebook session. You can quickly create a compatible EMR cluster when you create the notebook or before you restart it. You can download a previously created notebook file in ipynb format at any time from the S3 location you selected when you created the notebook.

Q: Can I let my notebook session run indefinitely?

No. If a notebook is idle for a long time, the notebook will stop. If the notebook editor is still open, the code you are running in the editor will fail. You can restart a notebook from the EMR console and then reopen the notebook editor.

Q: What if I close the notebook editor while it is executing code on the cluster?

Closing the notebook editor does not affect the code that runs on the cluster. However, if you do not reopen the notebook editor for an extended period of time, the notebook will stop and no output will be returned to the notebook. You can restart the notebook and continue your work by clicking the notebook link.

Q: Does the EMR cluster turn off when it is no longer connected to a notebook?

No. You must terminate the cluster to shut it down.

Q: What other Apache Hadoop applications can I use with EMR Notebooks?

EMR Notebooks currently supports Spark in the Hadoop ecosystem.

Q: Can I use a notebook with different EMR clusters?

Yes, you can change EMR clusters. A notebook must be stopped before you can switch clusters. Select the notebook from the Notebooks list, choose View details, choose Change cluster to select a running cluster or create a new one, and then choose Change cluster and start notebook.

Q: Where are the notebooks stored?

Notebook files are automatically saved periodically in the ipynb file format in the Amazon S3 location that you specify when you create the notebook. The notebook file has the same name as your notebook in the EMR console. You can also save the notebook manually at any time by using the Save and Checkpoint function in the notebook editor. This creates an ipynb file with the same name in a subfolder called Checkpoint. The most recent checkpoint file overwrites previous checkpoint files. The Save as function is not available in the notebook editor.

Q: How do I use version control with my notebook? Can I use repositories like GitHub?

You can link Git-based repositories to your Amazon EMR notebooks to store your notebooks in a version-controlled environment.

Q: How do I use my saved notebooks?

To work with a saved notebook, open the EMR console and click the notebook in the Notebooks list.

Q: Can I integrate my company-wide Active Directory with EMR Notebooks?

EMR notebooks can only be accessed through the AWS Management Console for EMR. You can federate users from your Active Directory (AD) with the AWS Management Console to enable single sign-on. For more information, see Enable federation for AWS with Active Directory, ADFS, and SAML 2.0.

Q: What IAM policies are required for using the notebooks?

Users must have an identity-based policy that gives them permission to create and use EMR notebooks. In addition to user policies, EMR Notebooks uses a service role to access other AWS resources and take action. For more information, see Security for EMR Notebooks in the Amazon EMR Release Guide.

Q: How does the notebook communicate with the master node of my EMR cluster and what is the security for it?

The EMR master node uses Livy to interact with the notebook editor. Each EMR notebook uses Amazon EC2 security groups to control network traffic between the Livy server on the master node and the notebook. The default security group rules limit network traffic so that only Livy traffic can pass between notebook editors and the master nodes of clusters used by notebooks. To further restrict which notebooks and clusters can communicate, you can specify custom security groups with custom inbound and outbound rules for each notebook and each cluster. Alternatively, you can grant permissions in the notebook's service role so that the notebook service can create security groups on your behalf. For more information, see Specifying EC2 Security Groups in the Amazon EMR Release Guide.

Q: As an admin, how can I control access to the EMR cluster for notebook users?

You can limit the Amazon EMR clusters that a user can query with a notebook by using tags on the cluster. If a user has permission to create a notebook, they can attach it to any Amazon EMR cluster unless tags restrict access. For more information, see EMR Notebook Tags in the Amazon EMR Release Guide.

Q: Can multiple users open the same notebook at the same time?

No, only one user can open a notebook at a time. To view the current user, select the notebook from the Notebooks list and choose View details. The user name and IAM Amazon Resource Name (ARN) of the user who last modified the notebook are shown under Last modified by. For more information on ARNs, see Amazon Resource Names in the AWS General Reference.

Q: How do I limit the ability of users to edit or delete my notebook?

You can control access to your notebooks using notebook tags in conjunction with identity-based IAM policies. By default, a tag that identifies the user who created the notebook is automatically added to the notebook. For more information, see Using Notebook Tags to Control IAM User Access in the Amazon EMR Management Guide.
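
The sketch below shows what such a tag-based, identity-based policy could look like, built as a Python dictionary and printed as JSON. It is an illustration under stated assumptions, not a policy to copy as-is: the default tag key creatorUserId and the listed elasticmapreduce action names are assumptions that should be verified against the guide referenced above.

```python
import json


def owner_only_notebook_policy(tag_key="creatorUserId"):
    """Allow notebook actions only when the notebook's tag matches the caller.

    The tag key and the action names are assumptions for illustration.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "elasticmapreduce:DescribeEditor",
                    "elasticmapreduce:StartEditor",
                    "elasticmapreduce:StopEditor",
                    "elasticmapreduce:DeleteEditor",
                    "elasticmapreduce:OpenEditorInConsole",
                ],
                "Resource": "*",
                "Condition": {
                    "StringEquals": {
                        # ${aws:userId} is an IAM policy variable that is
                        # resolved for the calling user at evaluation time.
                        f"elasticmapreduce:ResourceTag/{tag_key}": "${aws:userId}"
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(owner_only_notebook_policy(), indent=2))
```

With a policy of this shape attached, a user could start, stop, open, or delete only the notebooks whose tag value matches their own user ID.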

Q: Can I connect my notebook to a Kerberos-enabled EMR cluster?

No, Kerberized EMR clusters are not currently supported.

Q: Can I terminate a cluster if a notebook is using it?

Yes. If the notebook editor is still open, the code you run in the editor will fail and the notebook will stop after a while.

Q: What is the cost of using EMR notebooks?

EMR notebooks are provided to you at no additional cost. The costs for the connected EMR clusters in your account will be billed to you as usual. For more information on pricing for your cluster, please visit https://aws.amazon.com/emr/pricing/