In this post, we show the configuration Hadoop needs to start the NameNode successfully. We usually format the NameNode before starting Hadoop, but a common problem is that, with the default configuration, the NameNode's formatting files are written into the /tmp directory, which the operating system cleans out on every boot. Below are the steps to change this default behaviour.
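
To see why this happens, here are the relevant defaults shipped with Hadoop 1.2.1 (in core-default.xml and hdfs-default.xml): dfs.name.dir falls back to a subdirectory of hadoop.tmp.dir, which itself lives under /tmp:

    <property>
        <name>hadoop.tmp.dir</name>
        <value>/tmp/hadoop-${user.name}</value>
    </property>
    <property>
        <name>dfs.name.dir</name>
        <value>${hadoop.tmp.dir}/dfs/name</value>
    </property>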

Steps 

  1. In hdfs-site.xml, add the following property:
    <property>
        <name>dfs.name.dir</name>
        <value>/<hadoop installation path>/hadoop-1.2.1/name/data</value>
    </property>
    where the value is the path where the NameNode metadata will be written.
  2. Stop all Hadoop daemons:
    ~$ stop-all.sh
  3. Change the directory permissions to give the owner full access (the first digit, 7, grants the owner (u) read, write and execute):
    ~$ sudo chmod 750 /<hadoop installation path>/hadoop-1.2.1/name/data
  4. Format the NameNode:
    ~$ hadoop namenode -format
    The output should be:
    15/03/25 12:27:06 INFO namenode.NameNode: STARTUP_MSG: 
    /************************************************************
    STARTUP_MSG: Starting NameNode
    STARTUP_MSG:   host = baddar-pc/127.0.1.1
    STARTUP_MSG:   args = [-format]
    STARTUP_MSG:   version = 1.2.1
    STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
    STARTUP_MSG:   java = 1.8.0_40
    ************************************************************/
    Re-format filesystem in /home/baddar/hadoop-1.2.1/name/data ? (Y or N) Y
    15/03/25 12:27:11 INFO util.GSet: Computing capacity for map BlocksMap
    15/03/25 12:27:11 INFO util.GSet: VM type       = 64-bit
    15/03/25 12:27:11 INFO util.GSet: 2.0% max memory = 932184064
    15/03/25 12:27:11 INFO util.GSet: capacity      = 2^21 = 2097152 entries
    15/03/25 12:27:11 INFO util.GSet: recommended=2097152, actual=2097152
    15/03/25 12:27:11 INFO namenode.FSNamesystem: fsOwner=baddar
    15/03/25 12:27:11 INFO namenode.FSNamesystem: supergroup=supergroup
    15/03/25 12:27:11 INFO namenode.FSNamesystem: isPermissionEnabled=true
    15/03/25 12:27:11 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
    15/03/25 12:27:11 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
    15/03/25 12:27:11 INFO namenode.FSEditLog: dfs.namenode.edits.toleration.length = 0
    15/03/25 12:27:11 INFO namenode.NameNode: Caching file names occuring more than 10 times 
    15/03/25 12:27:11 INFO common.Storage: Image file /home/baddar/hadoop-1.2.1/name/data/current/fsimage of size 112 bytes saved in 0 seconds.
    15/03/25 12:27:11 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/home/baddar/hadoop-1.2.1/name/data/current/edits
    15/03/25 12:27:11 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/home/baddar/hadoop-1.2.1/name/data/current/edits
    15/03/25 12:27:11 INFO common.Storage: Storage directory /home/baddar/hadoop-1.2.1/name/data has been successfully formatted.
    15/03/25 12:27:11 INFO namenode.NameNode: SHUTDOWN_MSG: 
    /************************************************************
    SHUTDOWN_MSG: Shutting down NameNode at baddar-pc/127.0.1.1
    ************************************************************/

    Note that the NameNode metadata is written to the specified path.
  5. Make sure the NameNode metadata was written to the path (list all files recursively):
    $ ls -aR /home/baddar/hadoop-1.2.1/name/data/
    /home/baddar/hadoop-1.2.1/name/data/:
    .  ..  current  image  in_use.lock  previous.checkpoint

    /home/baddar/hadoop-1.2.1/name/data/current:
    .  ..  edits  fsimage  fstime  VERSION

    /home/baddar/hadoop-1.2.1/name/data/image:
    .  ..  fsimage

    /home/baddar/hadoop-1.2.1/name/data/previous.checkpoint:
    .  ..  edits  fsimage  fstime  VERSION
  6. Start all Hadoop daemons:
    ~$ start-all.sh
    The output should be:
    starting namenode, logging to /home/baddar/hadoop-1.2.1/libexec/../logs/hadoop-baddar-namenode-baddar-pc.out
    localhost: starting datanode, logging to /home/baddar/hadoop-1.2.1/libexec/../logs/hadoop-baddar-datanode-baddar-pc.out
    localhost: starting secondarynamenode, logging to /home/baddar/hadoop-1.2.1/libexec/../logs/hadoop-baddar-secondarynamenode-baddar-pc.out
    starting jobtracker, logging to /home/baddar/hadoop-1.2.1/libexec/../logs/hadoop-baddar-jobtracker-baddar-pc.out
    localhost: starting tasktracker, logging to /home/baddar/hadoop-1.2.1/libexec/../logs/hadoop-baddar-tasktracker-baddar-pc.out
  7. Make sure that all daemons have started:
    ~$ jps

    23678 TaskTracker
    23060 NameNode
    23406 SecondaryNameNode
    23978 Jps
    23504 JobTracker
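
As a final sanity check, write a small file into HDFS and list it back; if this still works after a machine reboot, without reformatting, the NameNode metadata is no longer being wiped from /tmp. A minimal sketch (the file names are just examples):

    # create a small local file and copy it into HDFS
    ~$ echo "hello hdfs" > /tmp/hello.txt
    ~$ hadoop fs -put /tmp/hello.txt /hello.txt
    # list the HDFS root to confirm the file is there
    ~$ hadoop fs -ls /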

Testing applications in the Android environment is costly and time-consuming, because they have to be tested on many devices with different specs. The obvious alternative is to test on the default Android emulator, but then the first question on our minds is: what should we do about the emulator's poor performance?

Now Genymotion is here: a relatively fast Android emulator on which any app can be tested against all the major devices available on the market. It is easy to install and integrates fully with Android Studio and Eclipse by following the instructions here.

Genymotion features:
  • Easily download and run pre-configured virtual images covering most Android versions and different screen sizes.
  • Networking: Ethernet (emulates WiFi connection).
  • GPS (with configurable coordinates).
  • Battery (with configurable battery levels).
  • Display: OpenGL hardware acceleration, multiscreen, full screen display.
  • Genymotion shell, which allows you to interact with your VM from the command line (see the sketch after this list).
  • ADB support.
  • Eclipse and Android Studio plugins.
  • Supports Linux, Windows and Mac.
  • "Drag & Drop" APK installs.
  • "Drag & Drop" zip support for system updates/patches.
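
For example, the Genymotion shell lets you script sensor values instead of clicking through the UI. A quick sketch (command names as documented for the Genymotion shell; run genyshell and type help to confirm them on your version):

    genyshell> battery setmode manual
    genyshell> battery setlevel 42
    genyshell> gps setstatus enabled
    genyshell> gps setlatitude 30.0444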
What about Google Play services?
  • Download the following ARM Translation Installer.
  • Download GApps according to your Android version (4.3, 4.2, or 4.1).
  • Next, open your Genymotion VM and drag & drop the Genymotion-ARM-Translation.zip onto it; a message saying "File transfer in progress" should appear.
  • Reboot the Genymotion VM.
  • Drag & drop the GApps-signed.zip onto it.
  • Reboot again, then open the Google Play Store.
  • Once in the Store, go to the 'My Apps' menu and let everything update, because this will fix a lot of issues.
  • Search for 'Netflix' and 'Google Drive'; if both apps show up in the results and you can download and install them, congrats: you now have ARM support and Google Play fully set up!
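
Since Genymotion VMs register as ordinary ADB devices, a quick way to confirm your tools can see the VM is to list attached devices; the IP:port serial below is only an example of the form Genymotion typically uses:

    ~$ adb devices
    List of devices attached
    192.168.56.101:5555    device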

BADR, in partnership with QCRI, has developed and published TweetMogaz, a system that allows Arab users to get the maximum information from the Arabic content on Twitter, on the spot.

Basically, TweetMogaz consumes streams of Arabic tweets from Twitter, classifies them into relevant topics, and then presents them to users in a much more intelligent way.

By intelligence we mean that TweetMogaz can understand the context of tweet topics, group tweets based on that, and present these groups (topics) to the user for a better user experience.
TweetMogaz is also the only events detector for Arabic content: it constantly searches Arabic tweets for hot, trending content, gathers related tweets that occur within a certain timeframe, and then presents the user with a solid, homogeneous story.

To achieve that feat, thorough research has been done (and is continuously being improved) to get the best out of the Arabic content on Twitter. The research areas extend to: Information Retrieval, Natural Language Processing, Machine Learning, Distributed Systems and Big Data.

The first publication out of TweetMogaz is a demo paper: TweetMogaz v2: Identifying News Stories in Social Media, by Eslam Elsawy (BADR), Moamen Mokhtar (BADR) and Walid Magdy (QCRI), published in CIKM 2014.

Why is Git always asking for credentials?


There are two ways to clone a project from a remote repository: the SSH protocol or the HTTPS protocol. If Git prompts for credentials every time you interact with the repository, it means you are using an HTTPS clone URL.
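
You can check which protocol an existing clone uses by listing its remote URLs (the repository below is hypothetical):

git remote -v
origin  https://github.com/<user>/<repo>.git (fetch)
origin  https://github.com/<user>/<repo>.git (push)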

Using HTTPS URLs to clone repositories is easier than SSH and needs almost zero configuration, but Git will keep asking for credentials. The solution is to configure Git to cache them for you:

git config --global credential.helper cache

By default the cache times out after 15 minutes; you can change the timeout (in seconds) as follows, for example to one month:

git config --global credential.helper 'cache --timeout=2592000'
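
Alternatively, if you have an SSH key set up with your Git host, you can avoid credential prompts entirely by switching the remote from HTTPS to SSH (repository name again hypothetical):

git remote set-url origin git@github.com:<user>/<repo>.git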

Hortonworks
 

2014, Aug 4th — BADR today announced that it has joined the Hortonworks Systems Integrator Partner program and will be delivering big data solutions powered by the Hortonworks Data Platform (HDP). Hortonworks is the leading contributor to and provider of Apache™ Hadoop®. BADR’s customers can now benefit from easier integration of Hadoop with our range of data engineering and visualization services.
By joining the Hortonworks Systems Integrator Partner program, BADR will strengthen its ability to implement enterprise-level big data solutions including Hortonworks Data Platform, the industry’s only 100-percent open source Hadoop distribution, explicitly architected, built, and tested for enterprise-grade deployments. BADR’s implementation services enable customers to leverage the power of their data and reveal new aspects of hidden information. With the integration of Hadoop, customers can now scale to new extents in volume, variety and velocity.

Hortonworks Data Platform was built by the core architects, builders and operators of Apache Hadoop and includes all of the necessary components to manage a cluster at scale and uncover business insights from existing and new big data sources. With a YARN-based architecture, HDP enables multiple workloads, applications and processing engines across single clusters with optimal efficiency. A reliable, secure and multi-use enterprise data platform, HDP is an important component of the modern data architecture, helping organizations mine, process and analyze large batches of unstructured data sets to make more informed business decisions.

“We welcome BADR to the Hortonworks Systems Integrator Partner Program and look forward to working with them to provide data-driven applications powered by HDP for their customers,” said John Kreisa, vice president of strategic marketing at Hortonworks.
“BADR’s long-standing experience delivering high-value enterprise solutions creates a natural expansion to add Hadoop integration services as companies of all sizes are adopting Hadoop to support their big data projects.”

About BADR

BADR is an established IT company which has now set its sights on changing the world of big data. Established in 2014, this new branch of BADR is dedicated to providing the most effective and innovative big data tools possible to companies large and small in the Middle East. BADR has the knowledge and experience necessary to make a difference in the big data world, and we use this experience to help our customers every step of the way.