Monday, June 23, 2014

Hadoop Trash - Recover Your Data

Hadoop gives you the capability to recover deleted files. When files are deleted, they are moved to the .Trash folder under the user's home directory (for example, "/user/myuser/.Trash") and remain there for a minimum period of time before being deleted permanently. You can recover a file by copying it from the .Trash folder to your desired path.
However, the Hadoop trash only stores files that are deleted through the filesystem shell. Files that are deleted programmatically are deleted immediately, though you can still use the trash programmatically via the org.apache.hadoop.fs.Trash class.
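For example, a programmatic delete can still go through the trash if it is routed via this class. Below is a minimal sketch under that assumption; the class name is made up, the path is the same example file used in the shell commands below, and it assumes the client's Configuration picks up fs.trash.interval from core-site.xml:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.Trash;

public class TrashDeleteExample {               // hypothetical class name
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();   // reads core-site.xml from the classpath
    FileSystem fs = FileSystem.get(conf);

    // fs.delete(path, true) would remove the file immediately;
    // moveToTrash keeps it recoverable under .Trash/Current instead.
    Path path = new Path("/data/logs/data.log");
    Trash trash = new Trash(fs, conf);
    boolean moved = trash.moveToTrash(path);    // false if the trash is disabled
    System.out.println(moved ? "Moved to trash: " + path : "Trash disabled, file not moved");
  }
}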

Hadoop 1.0.3 is used on an Ubuntu 12.04 machine.

1. Enable Trash

By default, the trash feature is disabled.

To enable it, add the following property to core-site.xml on the NameNode machine:

<property>
  <name>fs.trash.interval</name>
  <value>60</value>
  <description>Number of minutes after which the checkpoint
  gets deleted.
  If zero, the trash feature is disabled.
  </description>
</property>
As the description states, deleted files will be moved to the .Trash folder and remain there for 60 minutes before being deleted permanently. A thread periodically checks the trash and removes files that have remained longer than this interval.

In Hadoop 1.0.3, the interval at which this thread runs is not specified in core-default.xml or in the code, which suggests this property is not available in Hadoop 1.0.3. In newer versions, however, you can configure it:


<property>
  <name>fs.trash.checkpoint.interval</name>
  <value>15</value>
  <description>Number of minutes between trash checkpoints.
  Should be smaller or equal to fs.trash.interval.
  Every time the checkpointer runs it creates a new checkpoint
  out of current and removes checkpoints created more than
  fs.trash.interval minutes ago.
  </description>
</property>


2. fs -rm/-rmr Commands

If you use the "hadoop fs -rm" or "hadoop fs -rmr" commands, the deleted files are moved to the trash, and you can restore them from the .Trash directory:

> hadoop fs -rmr /data/logs/data.log

Moved to trash: hdfs://localhost:10000/data/logs/data.log
> hadoop fs -mv /user/myuser/.Trash/Current/data/logs/data.log /data/recovered_data.log

3. skipTrash

You can delete your files immediately by using the -skipTrash option:

> hadoop fs -rmr -skipTrash /data/logs/data.log

Deleted hdfs://localhost:10000/data/logs/data.log

4. fs -expunge

You can empty your .Trash folder with the expunge command. It deletes the files in the .Trash folder and creates a new checkpoint:

> hadoop fs -expunge

14/06/20 20:25:20 INFO fs.Trash: Created trash checkpoint: /user/myuser/.Trash/1406202025
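If you prefer to empty the trash from code rather than the shell, the same Trash class mentioned above exposes this operation. A small sketch, assuming Trash.expunge() and Trash.checkpoint() mirror what "hadoop fs -expunge" does (the class name is made up):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Trash;

public class ExpungeExample {                   // hypothetical class name
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    Trash trash = new Trash(fs, conf);
    trash.expunge();    // remove checkpoints older than fs.trash.interval
    trash.checkpoint(); // roll .Trash/Current into a new timestamped checkpoint
  }
}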


//TODO
In later versions, configuring the trash on the client side enables the trash feature for the user running that client. Trying this out in Hadoop 1.0.3 is still a TODO for me.
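A sketch of the experiment I have in mind, assuming newer versions honour fs.trash.interval when it is set on the client-side Configuration (the class name is made up; whether Hadoop 1.0.3 respects this is exactly what is left to verify):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.Trash;

public class ClientSideTrashExperiment {        // hypothetical class name
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.setLong("fs.trash.interval", 60);      // set only on the client, not in core-site.xml on the NameNode

    FileSystem fs = FileSystem.get(conf);
    Trash trash = new Trash(fs, conf);
    boolean moved = trash.moveToTrash(new Path("/data/logs/data.log"));
    System.out.println("Moved to trash: " + moved);
  }
}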



