Things you should know about Kerberos in Hadoop Clusters

Data Analysis Skills 08

BW L.
Data Engineering Insight


This post is part of the Hadoop Series.

Many Hadoop clusters use Kerberos authentication to secure the environment. Such clusters are called Kerberized clusters. In a Kerberized cluster, every Hadoop application (Hive, Spark, MapReduce) requires a valid Kerberos ticket when a user submits a job to the cluster.

In the commands below, we assume that the Kerberos realm is “REALM.COM”. The realm is normally your company’s domain name, written in upper case.

What is a Kerberos ticket?

Think of a Kerberos ticket as a temporary cache of your credentials on the system. On a Hadoop server, the ticket is normally stored in /tmp/krb5cc_xxxxx on the server you are logged in to.
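To see which cache your session is actually using, a small helper can resolve its location. This is a minimal sketch, assuming MIT Kerberos defaults: the KRB5CCNAME environment variable overrides the location when it is set, and otherwise the file is /tmp/krb5cc_ followed by your numeric user ID. The function name ticket_cache_path is hypothetical, and clusters configured to use another cache type (a kernel keyring, for example) will not match this layout.

```shell
# Print the credential cache location this shell would use.
# Assumption: MIT Kerberos defaults (FILE cache under /tmp, named by numeric UID).
ticket_cache_path() {
    if [ -n "${KRB5CCNAME:-}" ]; then
        # KRB5CCNAME may carry a "FILE:" type prefix; strip it if present.
        printf '%s\n' "${KRB5CCNAME#FILE:}"
    else
        printf '/tmp/krb5cc_%s\n' "$(id -u)"
    fi
}
```

Running klist with no arguments also prints the cache in use, together with the tickets it holds and their expiry times.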

How are Kerberos tickets created?

When a user connects to a Linux server over SSH using username and password authentication, the server typically creates a Kerberos ticket for that user.

A user can also manually invoke a “kinit” command:

kinit ${USER^}@REALM.COM

The command prompts the user to type in their password.

Another way is to use a keytab that was created for the user previously:

kinit -kt ~/${USER}.keytab ${USER^}@REALM.COM
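The ${USER^} expansion in the commands above is a bash feature (version 4 or newer) that capitalizes the first letter of the login name. The sketch below shows how the principal is assembled. The function name principal_for is hypothetical, and whether your directory expects "Alice", "ALICE", or "alice" is something to confirm with your cluster admin.

```shell
#!/usr/bin/env bash
# Build a Kerberos principal from a login name (requires bash 4 or newer).
# Assumption: the realm defaults to REALM.COM, as elsewhere in this post.
principal_for() {
    local user="$1" realm="${2:-REALM.COM}"
    # ${user^} uppercases only the first character;
    # ${user^^} would uppercase every character instead.
    printf '%s@%s\n' "${user^}" "$realm"
}
```

For example, principal_for alice prints Alice@REALM.COM, so kinit -kt ~/${USER}.keytab "$(principal_for "$USER")" is equivalent to the keytab command above.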

When do I need a Kerberos ticket?

To submit any job (Hive, Spark, MapReduce) to a Kerberized Hadoop cluster, you need a valid Kerberos ticket that has not expired.

Below are some signs that you might need a new Kerberos ticket.

  • Your Spark session takes a very long time to come up.
  • You get a GSS error when running a hadoop fs command to access HDFS.

Run one of the kinit commands above to get a new ticket, and the problem should go away.
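A quick way to confirm that a missing or expired ticket is the cause is to ask klist directly. The sketch below relies on klist -s, which in MIT Kerberos prints nothing and exits with a non-zero status when the cache cannot be read or the ticket has expired. The function name ticket_status is hypothetical.

```shell
# Report whether the current credential cache holds a usable ticket.
# Assumption: MIT Kerberos klist, whose -s flag is silent and
# signals validity through its exit status.
ticket_status() {
    if klist -s; then
        echo "valid"
    else
        echo "missing-or-expired"
        return 1
    fi
}
```

For example, ticket_status >/dev/null || kinit ${USER^}@REALM.COM asks for a password only when a new ticket is actually needed.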

How long is a Kerberos ticket valid?

A Kerberos ticket typically expires 10 hours after it is issued, although the exact lifetime depends on how your realm is configured.

Kerberos keytab

Most likely your company or institute already uses Microsoft Active Directory for user identity management. You can create a Kerberos keytab to avoid typing your password every time. This is very useful for jobs that are scheduled to run at a certain time, when your Kerberos ticket on the server might already have expired. You’ll need some information from Active Directory, such as your principal name, which is normally all upper case. Consult your cluster admin for the commands to create a keytab.
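Because a keytab can obtain tickets without a password, anyone who can read the file can authenticate as you, so it is worth restricting it to your own account. A minimal sketch, using the hypothetical helper name lock_down_keytab:

```shell
# Restrict a keytab so that only its owner can read or write it.
lock_down_keytab() {
    local keytab="$1"
    if [ ! -f "$keytab" ]; then
        echo "no keytab at $keytab" >&2
        return 1
    fi
    chmod 600 "$keytab"
}
```

After running lock_down_keytab ~/${USER}.keytab, the MIT Kerberos command klist -kt ~/${USER}.keytab lists the principals and key timestamps stored in the file, which is a convenient way to check that the keytab contains what you expect.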

Renew your Kerberos ticket automatically using a keytab

The following commands will add a new scheduled task to your crontab that will renew your Kerberos ticket every 6 hours.

# Save the current crontab, append the renewal entry, then install the result.
crontab -l > ~/$USER.crontab
echo "00 */6 * * * kinit -kt ~/$USER.keytab ${USER^}@REALM.COM >> $HOME/kinit.log 2>&1" >> ~/$USER.crontab
crontab ~/$USER.crontab
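Running the commands above more than once appends a duplicate entry each time. The following sketch, built around the hypothetical helper add_entry_once, adds the line only when an identical one is not already present. It reads the existing crontab on standard input and prints the new one, so nothing is installed until you pass the result to crontab yourself.

```shell
# Print the crontab read from stdin, adding the given entry
# only if an identical line is not already there.
add_entry_once() {
    local entry="$1" existing
    existing="$(cat)"
    if printf '%s\n' "$existing" | grep -qxF -- "$entry"; then
        # Entry already present: pass the crontab through unchanged.
        printf '%s\n' "$existing"
    else
        [ -n "$existing" ] && printf '%s\n' "$existing"
        printf '%s\n' "$entry"
    fi
}

# Usage (run from bash so that ${USER^} is expanded before cron sees the line):
#   entry="00 */6 * * * kinit -kt ~/$USER.keytab ${USER^}@REALM.COM >> $HOME/kinit.log 2>&1"
#   crontab -l 2>/dev/null | add_entry_once "$entry" > ~/$USER.crontab && crontab ~/$USER.crontab
```

Writing to a file first, as the original commands do, means a failure in the middle of the pipeline cannot replace your crontab with an empty one.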

Remember to rerun the keytab creation script from your cluster admin every time your password changes.
