Wednesday, April 9, 2014

Create User and Groups in Linux

How can I create users on Linux? How can I create groups and assign users to a group? Although there is more to creating a user or group than this, learning the basics comes in handy.

These commands were run on CentOS 6.5. They should also work on other Red Hat derivatives; on other Linux distributions, behavior can differ.


1. Create a User

The useradd command creates a user. The new username is written to the system files as needed and a home directory is created for the user. In CentOS and other Red Hat derivatives, a default group with the same name as the user is also created, and the user is assigned to this group.

Creates the user "javauser". This also creates a group named "javauser" and assigns the user to it. You can see the groups of the new user by typing "groups javauser".
> useradd javauser
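
For example, to verify the default group right after creating the user (the output below is what I would expect on CentOS; yours may differ):
> groups javauser
javauser : javauser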


Creates a user and assigns it to an existing group as its primary group.
> useradd -g devgroup javauser


Creates a user and assigns it to a secondary group. A new group with the same name as the user is still created and assigned as the primary group.
> useradd -G testgroup javauser


You can specify multiple secondary groups.
> useradd -G testgroup,sysgroup javauser


Creates a user but prevents the default behaviour of creating a group with the same name. This default group creation is specific to Red Hat derivatives.
> useradd -n javauser


2. Set Password for User

The passwd command sets or updates a user's password.

Sets or updates the password for a user.
> passwd javauser
When you enter the password and confirm it, the password is set.
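
On CentOS and other Red Hat derivatives, passwd also accepts a --stdin flag, which can be handy in scripts. A quick sketch (do not put real passwords on the command line outside of tests):
> echo "S0mePassw0rd" | passwd --stdin javauser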


3. Change Groups of User

To change the groups of an existing user, you can use the usermod command. A user has one primary group and can have multiple secondary groups.

Updates the primary group of a user to the specified group. The former primary group is removed from the user's group list.
> usermod -g gr1 javauser
gr1 becomes the primary group of javauser.


Sets the specified group(s) as the secondary groups of a user. Former secondary groups are unassigned if they are not in the new group list. Groups are passed as a comma-separated list with no spaces in between.
> usermod -G gr2,gr3 javauser
javauser now has gr2 and gr3 in its secondary groups.


Appends the specified group(s) to the user's groups.
> usermod -a -G gr4 javauser
gr4 will be added to javauser's groups.
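
You can verify the result with the id command (the numeric uid/gid values below are made up for illustration):
> id javauser
uid=500(javauser) gid=503(gr1) groups=503(gr1),504(gr2),505(gr3),506(gr4)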


4. Delete a User

The userdel command deletes the specified user.
However, processes started by the user, files owned by the user, and jobs created by the user must be handled separately and in a planned order.
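
For example, before deleting a user you might check for running processes and leftover files (a quick sketch; find can take a while on a large filesystem):
> ps -u javauser
> find / -user javauser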

Deletes javauser
> userdel javauser


Deletes the user and its home directory.
> userdel -r javauser



5. Create a Group


The groupadd command creates a group.

Creates a group named mygroup
> groupadd mygroup
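
groupadd can also take an explicit group id with the -g option (the GID 1500 here is an arbitrary example):
> groupadd -g 1500 mygroup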



6. List Groups of a User

The groups command lists the primary and secondary groups of a user.


Prints groups of javauser
> groups javauser


Prints groups of the current (effective) user
> groups
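
To go in the other direction and see which users have a given group, you can query the group database. Note that getent lists only the users that have the group as a secondary group:
> getent group mygroup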



7. Further Reading

You can find more in the manuals of these commands:
http://linux.die.net/man/8/useradd
http://linux.die.net/man/8/groupadd
http://linux.die.net/man/8/userdel
http://linux.die.net/man/8/usermod





Sunday, April 6, 2014

HDFS Security: Authentication and Authorization

Hadoop Distributed File System (HDFS) security features can be confusing at first, but in fact they follow simple rules. We can examine the topic in two parts: authentication and authorization.

I am testing with Hadoop 1.0.3 on a CentOS 5.8 server. On the client, I am using CentOS 6.5.



1. Authentication

Authentication is the process of determining whether someone really is who he claims to be.

Hadoop supports two different authentication mechanisms, specified by the hadoop.security.authentication property, which is defined in core-default.xml and its site-specific version core-site.xml.

  • simple (no authentication)
  • kerberos

By default, simple is selected.
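
For reference, the property looks like this in core-site.xml (the value shown is the default):

<property>
    <name>hadoop.security.authentication</name>
    <value>simple</value>
</property>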


1.1. simple (no authentication)

If you have installed Hadoop with its defaults, no authentication is performed. This means that any Hadoop client can claim to be any user on HDFS.

The identity of the client process is determined by the host operating system. On Unix-like systems, this is equivalent to "whoami": the username that the user is working under. Hadoop itself does not provide user creation or management.


Let's say that on your client machine you have typed:

> useradd hadoopuser
> su hadoopuser
> hadoop dfs -ls /

hadoopuser will be sent to the NameNode as the user identity. After that, permission checks (which are part of the authorization process) are performed.



Implications

This mode of operation poses a serious risk.

Think about this scenario:

1. In your Hadoop cluster, you have started the NameNode daemon as the user "hdfsuser". In Hadoop, the user running the NameNode is the super-user and can do anything on HDFS. Permission checks never fail for the super-user.
2. There is a client machine with Hadoop binaries installed on it, and this machine has network access to the Hadoop cluster.
3. The owner of the client machine knows the NameNode address and port. He also knows that the super-user is "hdfsuser", but does not know its password.
4. To access HDFS and do much more, all he needs to do is create a new user "hdfsuser" on the client machine. Then he can run Hadoop shell commands as this user.
> useradd hdfsuser
> su hdfsuser
> hadoop dfs -rmr /
When he runs these commands, Hadoop believes the client's claim that "he is hdfsuser" and executes the requested command. As a result, all data on HDFS is deleted.


1.2. kerberos

I have no hands-on experience with this mode; in short, clients authenticate through Kerberos tickets instead of being trusted blindly.


2. Authorization

Authorization is the function of specifying access rights to resources.
First we can examine the HDFS permissions model and then see how permission checks are done.


2.1. HDFS Permissions Model


HDFS implements a permissions model for files and directories that shares much of the POSIX model.


We can continue with an example. If you run:
> hadoop dfs -ls /

You will see something like:

Found 2 items
drwxr-xr-x   - someuser   somegroup          0 2014-03-05 15:01 /tmp
drwxr-xr-x   - someuser   somegroup          0 2014-03-03 11:01 /data

1. Each file and directory is associated with an owner and a group. In the example, someuser is the owner of the /tmp and /data directories, and somegroup is their group.

2. Files and directories have separate permissions for the user that is the owner, for other users that are members of the group, and for all other users.

In the example, drwxr-xr-x encodes this behaviour. The first letter, d, specifies whether it is a directory. The first rwx shows the owner's permissions, the next r-x shows the permissions for group members, and the last r-x shows the permissions for all other users.

3. For files, the r permission is required to read the file, and the w permission is required to write or append to the file. For directories, the r permission is required to list the contents of the directory, the w permission is required to create or delete files or directories, and the x permission is required to access a child of the directory.


4. When a new file/directory is created, the owner is the user of the client process and the group is the group of its parent directory.


One note:
In HDFS, owner and group values are just strings; you can assign a non-existent username or group to a file/directory.
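
You can try this yourself as the super-user; nosuchuser and nosuchgroup below are made-up names, and HDFS accepts them without complaint:
> hadoop dfs -chown nosuchuser:nosuchgroup /tmp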


2.2. Permission Checking

Authorization takes place after the user is authenticated. At this point, the NameNode knows the username (let's say hdfsuser) and tries to get the groups of hdfsuser.

This group mapping is configured by the hadoop.security.group.mapping property in core-default.xml and its site-specific version core-site.xml. The default implementation achieves this by simply running the groups command for the user. It then maps the username to the returned groups.


Group mapping is done on the NameNode machine, NOT on the client machine. This is an important realization to make. hdfsuser can have different groups on the client and NameNode machines, but what goes into the mapping is the groups on the NameNode.
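
A quick way to see what the NameNode will use is to run the groups command for that user on the NameNode host itself:
> groups hdfsuser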


When the NameNode has the username and its groups list, permission checks can be done.

  • If the username matches the owner of the file/directory, then the owner permissions are tested;
  • Else if the group of the file/directory matches any member of the groups list, then the group permissions are tested;
  • Otherwise, the other permissions of the file/directory are tested.


If a permissions check fails, the client operation fails.
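
For example, if javauser is neither the owner of /data above nor a member of somegroup, an attempt to write there should fail with something like the following (the exact message varies by version):
> hadoop dfs -mkdir /data/newdir
mkdir: org.apache.hadoop.security.AccessControlException: Permission denied: user=javauser, access=WRITE, inode="/data":someuser:somegroup:drwxr-xr-x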


2.3. Super-User

The user running the NameNode daemon is the super-user, and permission checks for the super-user never fail.


2.4. Super Group

If you set dfs.permissions.supergroup in hdfs-site.xml, you can make the members of the given group super-users as well. By default, it is set to supergroup in hdfs-default.xml.


<property>
    <name>dfs.permissions.supergroup</name>
    <value>admingroup</value>
</property>

In later Hadoop versions, this property is named dfs.permissions.superusergroup.


2.5. Disable HDFS Permissions

If you set dfs.permissions to false, permission checking is disabled. However, this does not change the mode, owner or group of files/directories.
<property>
    <name>dfs.permissions</name>
    <value>false</value>
</property>

In later Hadoop versions, this property is named dfs.permissions.enabled.



Tuesday, April 1, 2014

Shell Types and Shell Config Files: /etc/profile, /etc/bashrc, ~/.bash_profile, ~/.bashrc

Setting JAVA_HOME must be easy, right? However, it can sometimes get tricky. To better understand your version of the "JAVA_HOME is not set" errors, it is instructive to look at shell types, config files, and how they work together.

I am using CentOS 6.5. The explanations below should also apply to other Red Hat derivatives; other Linux distributions could differ.


1. Environment Variables 

1.1. Process Locality

If you open two shells and define an environment variable in the first shell, the second shell will not be able to see the new variable.

1. Open two terminals
2. Create an environment variable in first:
> export SHELLNUMBER=1

3. In second shell, type:
> echo $SHELLNUMBER
This will print nothing.

1.2. Inheritance

If you define and export a variable in a shell and open a sub-shell, the new variable will be available in the sub-shell.
1. Open a terminal
2. Create an environment variable:
> export PARENT=1
3. Create a sub-shell by typing:
> bash
4. List variables:
> env | grep PARENT
This will print 
PARENT=1

We can continue as follows to also test process locality:
5. Create another variable in sub-shell and list variables:
> export PARENT2=2
> env | grep PARENT
This will print:
PARENT=1
PARENT2=2
6. Now exit the sub-shell and list variables:
> exit
> env | grep PARENT
This will print only
PARENT=1

As you can see, variables defined in the sub-shell are not available to the parent shell.

1.3. Case-Sensitivity

Environment variables are case-sensitive, meaning that JAVA_HOME and Java_Home are different variables. It is common practice to use capital letters and underscores.
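
A quick demonstration (the path is just an example):
> export JAVA_HOME=/usr/java/default
> echo $Java_Home
This prints nothing, because Java_Home is a different, unset variable.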

2. Shell Types

We can group shells along two axes: interactive/non-interactive and login/non-login shells.


2.1. Interactive/non-interactive shells

An interactive shell is one whose input and output are connected to a terminal, or one started with the -i flag.
A non-interactive shell is one where user input is not needed, as with shell scripts.

You can check whether the shell you are working in is interactive (it probably is) by typing:
> echo $-
If output contains i, it is interactive.

To see a non-interactive shell, create a shell script named test.sh with the contents:
echo $-
Then run with:
> bash test.sh
The output will not contain i.

2.2. Login and non-login shells

A login shell is the shell started when you log in, or one started with the -l flag.
A login shell usually prompts for a username and password. This is the case when you ssh into a remote Linux machine.
Other examples:
> bash -l

> su -l 

A non-login shell does not prompt for a username and password. This is the case when you are already logged in and type /bin/bash. An interactive non-login shell is also started when you open a terminal in a graphical environment.
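
In bash, you can check whether the current shell is a login shell with the shopt builtin; if it prints on, you are in a login shell:
> shopt login_shell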

3. Config Files

Files under /etc usually provide global settings, and files under the user's home directory provide user-specific settings. User-specific files can override global settings.

An interactive login shell reads /etc/profile and ~/.bash_profile. In CentOS, ~/.bash_profile also reads ~/.bashrc, which in turn reads /etc/bashrc.

An interactive non-login shell inherits its parent's environment and reads ~/.bashrc and /etc/bashrc for additional configuration.

A non-interactive non-login shell inherits its parent's environment; additionally, if the $BASH_ENV variable is not null, it expands it and reads the specified file.

You can test this behaviour as follows:
1. Open a terminal. We will use this as our main shell, do not close it.
2. Create a shell script named test.sh with contents:
env | grep SCRIPTVAR 

3. Define a new variable in the /etc/bashrc file:
export SCRIPTVAR=1
This variable will not be available to our main shell, since /etc/bashrc would have to be read again.

4. Will it be available to the shell script? Run test.sh:
> bash test.sh
This will not print our new variable.

5. Set the BASH_ENV variable to /etc/bashrc (just for testing, not for daily use):
> export BASH_ENV=/etc/bashrc

6. Run the script again and it prints SCRIPTVAR=1.
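
Tying this back to JAVA_HOME: a common approach on CentOS is to put the export into a small script under /etc/profile.d (the file name and path below are just an example):
> cat /etc/profile.d/java.sh
export JAVA_HOME=/usr/java/default
export PATH=$JAVA_HOME/bin:$PATH
Login shells pick this up through /etc/profile; non-interactive scripts may still need BASH_ENV or an explicit export.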