
Hadoop – Handling Big Data

Apache Hadoop is a framework for handling Big Data, especially unstructured data. It distributes the storage and processing of large data sets across clusters of commodity computers.

Activities on Big Data:

  • Store – Big Data needs to be collected in a repository, and it is not necessary to store it in a single physical database.
  • Process – Processing becomes more involved: cleansing, calculating, transforming and running algorithms over the data.
  • Access – The data makes no business sense if it cannot be searched and retrieved, and it must be presented along business lines.

Hadoop Distributed File System (HDFS):

Hadoop stores large files, typically in the range of gigabytes to terabytes, across multiple machines. HDFS provides data-location awareness between the task tracker and the job tracker: the job tracker schedules jobs to task trackers running on the nodes where the data resides.
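The data-locality idea can be sketched in plain Python. This is a hypothetical simplification, not the actual JobTracker API: given the nodes that hold replicas of a block, prefer a task tracker on one of those nodes so the data never crosses the network.

```python
# Hypothetical sketch of data-locality scheduling, not the real
# JobTracker logic: prefer a task tracker co-located with the block.

def schedule_task(block_locations, task_trackers):
    """Return a task tracker on a node holding the block, else any tracker."""
    for tracker in task_trackers:
        if tracker in block_locations:
            return tracker          # data-local: no network transfer needed
    return task_trackers[0]         # fall back to a remote node

block_locations = {"node1", "node4", "node7"}   # 3 replicas of one block
task_trackers = ["node2", "node4", "node9"]

print(schedule_task(block_locations, task_trackers))  # → node4
```

Real HDFS placement is rack-aware, so the fallback would prefer a node on the same rack before going off-rack.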

The two main components of Hadoop are the data processing framework and HDFS. HDFS is a rack-aware file system designed to handle data effectively. It uses a single-writer, multiple-reader model and supports operations such as reading, writing and deleting files, and creating and deleting directories.

HDFS Architecture

Elements of HDFS Architecture:

  • Namenode

    The Namenode runs on commodity hardware with the GNU/Linux operating system and the Namenode software. The machine running the Namenode acts as the master server: it manages the file system namespace, regulates clients' access to files, and executes namespace operations such as opening, closing and renaming files and directories.

  • Datanode

    A Datanode likewise runs on commodity hardware with the GNU/Linux operating system and the Datanode software. There is a Datanode for every node in the cluster. It performs read and write operations on the file system as per the client's request, and handles block creation, deletion and replication based on instructions from the Namenode.

  • Block

    User data in HDFS is stored as files. Each file is divided into one or more segments, called blocks, which are stored on individual Datanodes. A block is the minimum amount of data that HDFS reads or writes. The default block size is 64 MB (128 MB from Hadoop 2.x onwards) and can be changed in the HDFS configuration as required.
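A quick sketch of how a file is split into blocks at the 64 MB default. Only the last block occupies less than the full block size:

```python
# Illustrative only: how a file divides into HDFS blocks.
BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB default block size

def split_into_blocks(file_size_bytes, block_size=BLOCK_SIZE):
    """Return the sizes of the blocks a file of the given size occupies."""
    full_blocks = file_size_bytes // block_size
    remainder = file_size_bytes % block_size
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)  # last block only holds the leftover bytes
    return sizes

# A 200 MB file needs four blocks: three full 64 MB blocks plus one 8 MB block.
blocks = split_into_blocks(200 * 1024 * 1024)
print(len(blocks))  # → 4
```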

Data Processing Framework & MapReduce:

The data processing framework is the tool used to process the data: a Java-based system called MapReduce. The MapReduce algorithm consists of two tasks, Map and Reduce. Map takes a set of data and converts it into another set of data in which individual elements are broken into <Key, Value> pairs.

A MapReduce program executes in three stages: the Map stage, the Shuffle stage and the Reduce stage.

  • Map stage – The mapper's job is to process the input data. Generally, the input data is a file or directory stored in the Hadoop Distributed File System (HDFS). The input file is passed to the mapper line by line; the mapper processes the data and emits several chunks of intermediate data.
  • Reduce stage – This stage combines the Shuffle stage and the Reduce stage proper. The reducer's job is to process the intermediate data that comes from the mapper and produce a new set of output, which is then stored in HDFS.
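The three stages can be simulated in plain Python using the classic word-count example. Real Hadoop jobs are written in Java against the MapReduce API; this is only a sketch of the data flow between the stages:

```python
from collections import defaultdict

def map_stage(lines):
    """Map: break each input line into <word, 1> pairs."""
    for line in lines:
        for word in line.split():
            yield (word, 1)

def shuffle_stage(pairs):
    """Shuffle: group all values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_stage(groups):
    """Reduce: sum the counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data big cluster", "data node data block"]
result = reduce_stage(shuffle_stage(map_stage(lines)))
print(result)  # {'big': 2, 'data': 3, 'cluster': 1, 'node': 1, 'block': 1}
```

In a real cluster, each stage runs in parallel across many nodes, and the shuffle moves intermediate pairs over the network so that all values for a given key land on the same reducer.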

Hadoop Workflow

Benefits of Hadoop:

  • Hadoop is open source and, because it is Java-based, it is compatible with all major platforms.
  • It provides a cost-effective storage solution for businesses. It helps to easily access data sources and results in much faster data processing.
  • It is a highly scalable storage platform, as it can store and distribute large data sets across hundreds of servers that operate in parallel.
  • A key advantage of using Hadoop is its fault tolerance. When data is sent to an individual node, that data is also replicated in other nodes in the cluster, which means that in the event of failure, there is another copy available for use.
  • It is widely used across industries such as finance, media, entertainment, government, healthcare, retail and so forth.
  • It provides great data reliability. It stores and delivers all data without compromising on any aspect.
  • It is secure and supports authentication. HBase security along with HDFS and MapReduce permissions allow only approved users to operate on secured data, protecting the entire system from unauthorized access.
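The fault-tolerance point above rests on replication. A simplified sketch (the real HDFS placement policy is rack-aware and more involved): each block is copied to several nodes, so losing any one node loses no data.

```python
# Illustrative sketch of block replication, not the exact HDFS policy.
REPLICATION_FACTOR = 3  # HDFS default

def place_replicas(block_id, nodes, factor=REPLICATION_FACTOR):
    """Pick `factor` distinct nodes for a block (round-robin for simplicity)."""
    start = block_id % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(factor)]

def surviving_copies(replicas, failed_node):
    """Copies still available after one node fails."""
    return [node for node in replicas if node != failed_node]

nodes = ["node1", "node2", "node3", "node4", "node5"]
replicas = place_replicas(0, nodes)          # ['node1', 'node2', 'node3']
print(surviving_copies(replicas, "node2"))   # ['node1', 'node3']
```

When the Namenode notices a replica is gone, it instructs the remaining Datanodes to re-replicate the block until the replication factor is restored.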
Deepika M S

Deepika works as Software Test Engineer with Trigent Software. She has over five years of IT industry experience in testing web-based & mobile applications using both manual and automation testing. Deepika is also experienced in identifying test scenarios and designing effective test cases and is well versed with SDLC/Agile and Scrum methodologies. Deepika has been involved in developing automated test scripts for new features, analyzing results and reports on test results.
