This page references "BlueBerry", an IBM-internal project I created in 2007. Designed to provide a search interface across multiple databases using commodity hardware, BlueBerry made unique use of over 100 surplus IBM ThinkPads. Consult the links below for more information.



  How it Works - Design - BlueBerry.


This page contains a description of the various design choices made to create the BlueBerry application. For a detailed synopsis of the components of the design, please consult the Software and Hardware pages.

The nexus of cost, time, space, code and hardware availability.
BlueBerry's design flowed out of a single core idea - search the contents of every record for every query. That is, whether the query is a value that is unique to one record or a value that is in every record (like 'IBM'), search and score the results. All subsequent design decisions were influenced by this central concept.

Processing individual employee records is an application that lends itself well to parallelization. There is no message passing between nodes, as each record is scored independently of its neighbors. Splitting the records into portions to be processed individually is therefore a straightforward way to achieve a near-linear increase in performance. Take the 'IBM' query, for example: if you have a 4-way machine processing 4 chunks of records, the highest scored records will be available in roughly 1/4 the time a machine with one processor would need. Adding 4 more processing chunks (as in an 8-way machine) further reduces the time to roughly 1/8 of what a single processor would require. Adding more processors quickly brings the time required to process hundreds of thousands of records into the realm of useful performance for a web application.
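The split-and-score idea above can be sketched as follows. This is a minimal illustration, not BlueBerry's actual code: the score function (a simple occurrence count), the function names, and the sample records are all assumptions. Threads stand in for separate nodes here; in CPython they show the structure of the computation rather than a real speedup, which would come from giving each chunk to its own processor or machine.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def score(record, query):
    """Hypothetical scoring rule: count case-insensitive occurrences of the query."""
    return record.lower().count(query.lower())


def split(records, n_chunks):
    """Split records into n_chunks contiguous slices of near-equal size."""
    size, extra = divmod(len(records), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(records[start:end])
        start = end
    return chunks


def top_k_in_chunk(chunk, query, k):
    """Score every record in one chunk; no data from other chunks is needed."""
    scored = ((score(r, query), r) for r in chunk)
    return heapq.nlargest(k, scored)


def search(records, query, n_workers=4, k=3):
    """Score chunks independently, then merge the per-chunk winners."""
    chunks = split(records, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = pool.map(lambda c: top_k_in_chunk(c, query, k), chunks)
        merged = [hit for partial in partials for hit in partial]
    return heapq.nlargest(k, merged)
```

Because each chunk only returns its own top k hits, the merge step handles at most k results per worker, no matter how many records each worker scored.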

Web 2.0 innovation focuses on applications in the long logistical tail of corporate data. IBM has another long logistical tail - surplus hardware. S/390s, RS/6000s, BladeCenters, desktops, etc. are all available for use without capital expenditure. BlueBerry needs processors and RAM - but not a lot. There is also no budget whatsoever, so items that are usually not on the surplus hardware access list, such as high-performance switches, uninterruptible power supplies, or operating software, impose limits on what can be used.

The selection of hardware was as much about what was not available as what was readily accessible. I did have access to approximately 1260 cubic feet of space (the size of a medium closet), 110 volts of electrical power, and a 100 Mbit Ethernet connection to IBM's internal network. What was available on the surplus hardware list with a fast processor, more than 100 MB of RAM, and its own power supply? ThinkPads.

While severely limited in performance for some applications, ThinkPads provide an ideal platform for high-density, low-power grid-type processing. More than 100 ThinkPads fit in the small space available, whereas fewer than one third as many desktops would fit in a similar space. Desktops would require multiple (and unavailable!) KVMs for configuration and control, whereas ThinkPads have their own keyboard and display built in. Desktops typically draw much more power and produce much more heat than a ThinkPad of similar performance. The list of benefits of using ThinkPads is extensive, but the primary reasons are: availability, physical dimensions, hardware completeness, power consumption and heat output. ThinkPads are plentiful on the surplus hardware list, and finding 100 T20-class units that were about to be scrapped for use in this project was not a challenge.

A ThinkPad's throughput is generally limited by the fastest network interface on the system - 100 Mbps Ethernet. Fortunately, 4-port Ethernet cards and 2 desktop units were available, so each group of ThinkPads is connected to a 100 Mbps switch, and each switch is plugged into its own port on a 4-port card. With two 4-port cards and six 100 Mbps switches, a network of systems could be created with minimal collisions.
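The grouping described above can be sketched as a small planning helper. This is only an illustration under stated assumptions: the round-robin assignment, the function names, and the node names are hypothetical, and the source does not say how the ThinkPads were actually divided among the switches.

```python
def assign_to_switches(nodes, n_switches):
    """Distribute nodes round-robin so switch loads differ by at most one."""
    groups = [[] for _ in range(n_switches)]
    for i, node in enumerate(nodes):
        groups[i % n_switches].append(node)
    return groups


def assign_uplinks(n_switches, n_cards, ports_per_card):
    """Give each switch its own (card, port) uplink; fail if ports run out."""
    if n_switches > n_cards * ports_per_card:
        raise ValueError("not enough card ports for one uplink per switch")
    return [divmod(s, ports_per_card) for s in range(n_switches)]


# Illustrative figures: 100 ThinkPads, 6 switches, two 4-port cards.
nodes = [f"tp{i:03d}" for i in range(100)]  # hypothetical host names
groups = assign_to_switches(nodes, 6)
uplinks = assign_uplinks(6, n_cards=2, ports_per_card=4)
```

With these figures each switch carries 16 or 17 ThinkPads, and six of the eight available card ports are used, so no two switches share an uplink.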

With commodity hardware, especially hardware that is 7 or more years old, failures need to be expected. Configuring the units to run a minimal Linux installation with custom failover code solved this problem. The BlueBerry system can tolerate multiple hardware failures with no loss of data redundancy or processing time.
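The failover code itself is not described on this page. As one possible way to tolerate node failures, the sketch below stores each chunk of records on more than one node and falls back to a surviving copy. The replica count, the staggered placement scheme, and all names are assumptions made for illustration, not BlueBerry's actual mechanism.

```python
def place_chunks(n_chunks, nodes, replicas=2):
    """Assign each chunk to `replicas` distinct nodes, staggered around the ring."""
    if replicas > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        c: [nodes[(c + r) % len(nodes)] for r in range(replicas)]
        for c in range(n_chunks)
    }


def pick_server(placement, chunk, failed):
    """Return the first replica holder that has not failed, or None if all are down."""
    for node in placement[chunk]:
        if node not in failed:
            return node
    return None
```

For example, with four nodes and two replicas per chunk, every chunk can still be served after two non-adjacent nodes fail; a chunk becomes unavailable only when every node holding a copy of it is down.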

The BlueBerry system's operating configuration is described in detail on the Software and Hardware pages. Please see these links for more detail (and cool pictures) of how 100+ ThinkPads and 1 Desktop deliver the BlueBerry application.




Learn about BlueBerry's entry in the Situational Applications Environment contest.



BlueBerry was created by Nathan Harrington.