December 31, 2012
The WordPress.com stats helper monkeys prepared a 2012 annual report for this blog.
Here’s an excerpt:
600 people reached the top of Mt. Everest in 2012. This blog got about 5,300 views in 2012. If every person who reached the top of Mt. Everest viewed this blog, it would have taken 9 years to get that many views.
August 24, 2012
March 26, 2012
Search Google for Big Data and you are most likely to find people talking about two things:
- How Open Source solutions like Hadoop have pioneered this space
- How some companies have used these solutions to build large-scale analytics and business intelligence modules.
Read further and you will find mentions of Map Reduce, and of how many NoSQL data stores support the useful “Data Locality” pattern: taking compute to where the data is.
Hadoop users, and its creators themselves, acknowledge that the technology is good for “streaming reads” and supports high throughput at the cost of latency. This constraint, and the fact that Map Reduce tasks are heavily I/O bound, make it seemingly unsuitable for use cases in which users wait for a response, such as OLTP applications.
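To make the Map Reduce and data locality idea concrete, here is a minimal in-memory sketch, assuming each inner list stands in for the data held locally by one node (no real cluster or Hadoop API is involved). The map step runs against each partition on its own and produces a small partial count; only those partial results travel to the reduce step, rather than the raw data.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy illustration of Map Reduce with data locality. Each inner list is a
 * hypothetical stand-in for one node's local partition; nothing here is a
 * Hadoop API.
 */
public class LocalWordCount {

    // Map phase: runs "on the node" against its local partition only and
    // emits a compact partial result instead of shipping raw lines around.
    static Map<String, Integer> mapPartition(List<String> localLines) {
        Map<String, Integer> partial = new HashMap<>();
        for (String line : localLines) {
            for (String word : line.toLowerCase().split("\\s+")) {
                if (!word.isEmpty()) { // split can yield "" for leading whitespace
                    partial.merge(word, 1, Integer::sum);
                }
            }
        }
        return partial;
    }

    // Reduce phase: merges the small partial counts from every partition.
    static Map<String, Integer> reduce(List<Map<String, Integer>> partials) {
        Map<String, Integer> total = new HashMap<>();
        for (Map<String, Integer> partial : partials) {
            partial.forEach((word, count) -> total.merge(word, count, Integer::sum));
        }
        return total;
    }

    static Map<String, Integer> wordCount(List<List<String>> partitions) {
        List<Map<String, Integer>> partials = new ArrayList<>();
        for (List<String> partition : partitions) {
            partials.add(mapPartition(partition));
        }
        return reduce(partials);
    }

    public static void main(String[] args) {
        List<List<String>> partitions = List.of(
                List.of("big data is not just big analytics"),
                List.of("big analytics", "data locality"));
        System.out.println(wordCount(partitions).get("big")); // prints 3
    }
}
```

In a real framework the partitions would be file blocks on different machines and the partial results would be shuffled over the network; the point of the pattern is that the shuffled data is much smaller than the input.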
While all of the above is relevant and mostly true, is it also leading to a certain stereotyping: that of equating Big Data with Big Analytics?
It might be useful to describe Big Data first. Gartner categorizes data build-up in an enterprise along three dimensions: Volume, Variety and Velocity. Rapid growth in any of these, or in a combination of them, results in Big Data. It is worth noting that there is no classification by transaction processing or analytics, which implies that Big Data is not just Big Analytics.
Big Data solutions need not be limited to Big Analytics and may extend to low-latency data access workloads as well. A few random thoughts on patterns and solutions:
- Data Sharding – useful for scaling low-latency data stores like RDBMS to hold Big Data. Sharding may be built into application code, handled by an intermediary between the application and the data store, or supported natively by the data store through auto-sharding.
- Data Stores by purpose – Big Data invariably means distribution and may result in data duplication, within a single store or across multiple stores. For example, data extracts from a DFS like Hadoop may also be stored in a high-speed NoSQL store or a sharded RDBMS and accessed via secondary indices. This can lead to the consistency, availability and partition-tolerance trade-offs outlined by the CAP theorem (http://en.wikipedia.org/wiki/CAP_theorem).
- Data Stores that effectively leverage CPU, RAM and disk space – Moore’s Law has held true over the last few years, and data stores like Google Bigtable (or HBase) successfully leverage the trend of abundant commodity compute, memory and storage.
- Optimized Compute Patterns – efforts like Peregrine (http://peregrine_mapreduce.bitbucket.org/) that support pipelined Map Reduce jobs.
- Data-aware Grid topologies – a compute grid where a worker’s participation in a compute task is influenced by the data available locally to that worker, usually in memory. Note that this is different from the data locality pattern implemented in most Map Reduce frameworks.
- And more…
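The first option in the list, sharding built into application code, can be sketched as a small router that maps each key to one of several shards. This is a minimal sketch assuming simple modulo hashing over a fixed list of shards; the shard names are hypothetical placeholders, not real connection strings.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Minimal sketch of application-level sharding: application code decides
 * which shard holds a given key. Shard names are hypothetical placeholders.
 */
public class ShardRouter {

    private final List<String> shards;

    ShardRouter(List<String> shards) {
        if (shards.isEmpty()) {
            throw new IllegalArgumentException("at least one shard is required");
        }
        this.shards = new ArrayList<>(shards);
    }

    // Modulo hashing: deterministic, so the same key always maps to the same
    // shard. Math.floorMod avoids negative indices from negative hashCode().
    String shardFor(String key) {
        return shards.get(Math.floorMod(key.hashCode(), shards.size()));
    }

    public static void main(String[] args) {
        ShardRouter router = new ShardRouter(List.of("orders-db-0", "orders-db-1", "orders-db-2"));
        for (String customerId : List.of("C1001", "C1002", "C1003")) {
            System.out.println(customerId + " -> " + router.shardFor(customerId));
        }
    }
}
```

Modulo hashing is the simplest choice, but it remaps most keys whenever the number of shards changes; consistent hashing or a lookup directory is commonly used when shards are added or removed over time. The same key-to-node mapping is also what a data-aware grid would use to send a task to the worker that already holds the data.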
It may suffice to say that Big Analytics has been the most visible and most commonly deployed use case for Big Data. New-age companies, especially internet-based ones, have been using Big Data technologies to deliver content sharing, email, instant messaging and social platform services in near real time. Enterprises are slowly but surely warming up to this trend.
June 17, 2010
The big software vendors use the term Service Registry synonymously with SOA Governance. In the process, they inadvertently lead readers to think that setting up a Service Registry can ensure SOA Governance. I wrote an article on this subject that was published here: http://www.cioupdate.com/insights/article.php/3886106/SOA-Governance-Requires-More-than-a-Service-Registry.htm
March 23, 2010
A while back, I wrote this article for CIOUpdate.com (http://www.cioupdate.com) on SOA and its relation to the Cloud. The article introduces the two concepts and compares them from different perspectives.
You can find the original article here: http://www.cioupdate.com/reports/article.php/3853076/How-SOA-and-the-Cloud-Relate.htm
March 4, 2009
Mention SOA or Services and most of your audience will immediately think of web-services: yes, the often unintended misuse of XML over HTTP that gives the technology, and anything related to it, a bad name in the world of high-performance J2EE applications.
Two of the biggest causes of lost performance are I/O and transformation overheads. Web services suffer from both: increased data transfer (i.e. higher I/O) due to the markup overhead of well-formed XML, and the CPU overhead of converting XML to Java and back, a.k.a. marshalling.
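The I/O part of that overhead is easy to quantify for a single record. The sketch below encodes the same made-up record (the element names and values are assumptions for illustration) once as well-formed XML and once with `java.io.DataOutputStream`. It measures only payload size, not the CPU cost of marshalling.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

/**
 * Rough illustration of XML markup overhead on the wire. The record layout
 * and element names are hypothetical.
 */
public class MarkupOverhead {

    // Assumes name contains no characters that need XML escaping.
    static byte[] asXml(int id, String name, long balanceCents) {
        String xml = "<account><id>" + id + "</id><name>" + name + "</name>"
                + "<balanceCents>" + balanceCents + "</balanceCents></account>";
        return xml.getBytes(StandardCharsets.UTF_8);
    }

    static byte[] asBinary(int id, String name, long balanceCents) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(id);            // 4 bytes
            out.writeUTF(name);          // 2-byte length prefix + modified UTF-8
            out.writeLong(balanceCents); // 8 bytes
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println("XML bytes:    " + asXml(42, "Acme", 125000L).length);    // prints 82
        System.out.println("Binary bytes: " + asBinary(42, "Acme", 125000L).length); // prints 18
    }
}
```

For this tiny record the XML form is more than four times larger, and every byte of it must also be parsed back into Java objects on the receiving side.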
Web-services and their implementation of XML over HTTP are good when genuinely needed: for example, when exposing services to partner organizations whose consuming technologies are not known, or when integrating disparate systems. Unfortunately, this need for integration often leads people to stereotype services in a SOA as web-services.
The question then is: can we reap the benefits of SOA without suffering the overheads inherent in web-services? I believe we can.
Quite a while back, I read the excellent IBM Redbook Implementing SOA using ESB, in which the author recommends deploying a B2B Gateway external to the ESB. I must admit it did not make much sense to me then, but I have come to appreciate it much better these days. A B2B Gateway enables consumption of services by a “third party”. This “third party” may be a client on a different technology platform or from an altogether different organization.
A separate B2B gateway introduces the possibility of:
- Making the web-service channel independent of the service implementation, so that using (and suffering) the XML over HTTP interface becomes a matter of choice
- Introducing much-needed security standards (and implementations) for securing services and the data they manage
- Using third-party implementations that specialize in implementing WS-* policies
- Using hardware, e.g. XML appliances, to augment the processing capability provided by software frameworks
The SOA runtime must therefore enable services to be written independently of XML and of the WS-* specifications and constraints. The web-service interface, via a B2B Gateway, then becomes just one optional channel for invoking the services.
At MindTree, we have taken this design further in our implementation of Momentum, an SOA-based delivery platform. Interfaces like JMS and web-services are optional channels provided by the framework to invoke any deployed service. A schematic that explains this approach is shown below:
When you separate the service container and the (optional) ESB from the B2B Gateway, and deploy the gateway as separate infrastructure, the web-service interface becomes an optional means of invoking your service. You can then benefit from the good parts of web-services without compromising your service’s QoS.
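A minimal sketch of this separation, assuming hypothetical names throughout (`QuoteService`, `XmlGatewayAdapter` and the `<quoteRequest>` format are invented for illustration and are not the Momentum API): the service contract uses plain Java types, internal callers invoke it directly, and only the gateway adapter parses and produces XML, using the JDK's built-in DOM parser.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/**
 * Sketch of a service written independently of XML, with XML handling
 * confined to a gateway-side adapter. All names are hypothetical.
 */
public class TransportIndependentService {

    // The service contract: plain Java types, no XML or WS-* concerns.
    interface QuoteService {
        long quoteCents(String productCode, int quantity);
    }

    static class SimpleQuoteService implements QuoteService {
        @Override
        public long quoteCents(String productCode, int quantity) {
            long unitPriceCents = productCode.startsWith("PRO") ? 2500 : 1000;
            return unitPriceCents * quantity;
        }
    }

    // Gateway-side adapter: the only component that pays the XML parsing and
    // serialization cost. It could run on separate gateway infrastructure.
    static class XmlGatewayAdapter {
        private final QuoteService service;

        XmlGatewayAdapter(QuoteService service) {
            this.service = service;
        }

        String handle(String requestXml) throws Exception {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Security hardening is a gateway concern, not the service's.
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(requestXml)));
            String product = doc.getElementsByTagName("product").item(0).getTextContent().trim();
            int quantity = Integer.parseInt(
                    doc.getElementsByTagName("quantity").item(0).getTextContent().trim());
            long cents = service.quoteCents(product, quantity);
            return "<quoteResponse><cents>" + cents + "</cents></quoteResponse>";
        }
    }

    public static void main(String[] args) throws Exception {
        QuoteService service = new SimpleQuoteService();
        // In-process channel: no marshalling at all.
        System.out.println(service.quoteCents("PRO-1", 2)); // prints 5000
        // B2B channel: the same service, reached through the XML adapter.
        XmlGatewayAdapter gateway = new XmlGatewayAdapter(service);
        System.out.println(gateway.handle(
                "<quoteRequest><product>STD-9</product><quantity>3</quantity></quoteRequest>"));
        // prints <quoteResponse><cents>3000</cents></quoteResponse>
    }
}
```

A JMS channel would be another adapter of the same shape; the service implementation itself never changes.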
September 16, 2008
I would not be exaggerating if I said it's a war out there between the supporters of stateful and stateless frameworks and solutions in Java. Add to this melee the number of frameworks built on either of these two premises. An application developer is therefore spoilt for choice and often confused. The easiest way out is to align with one of the camps and trust its solutions.
Proponents of the various frameworks vouch for their own solutions, which is natural but unfortunately biased by their business interests. I’ll take two solutions for illustration: Spring and JBoss Seam. I am a big fan of Spring and interested enough to dabble in Seam. FYI, one of my projects uses Seam extensively, and I have personally designed complex systems on the JBoss platform in the past. So there is no bias on my part. An example illustrates the point about bias: I find the Spring Acegi framework to be a great solution for application security. I suspect the people behind Seam think likewise, but they will never integrate the two. I read somewhere that they had evaluated Acegi and decided to write their own. JBoss Seam security looks good, but one is forced to write code for, say, RDBMS- or LDAP-based authentication, which is readily available in Spring Acegi. Other useful Spring features are transaction (compensation-based) support for LDAP operations and the Spring AOP implementation.
I do believe in the need for evolution. Hibernate came along and changed the way we wrote JDBC access code. But I would hate to see somebody dismiss what others have done before. For example, I don't subscribe to Gavin King’s (creator of Hibernate and Seam) view that applications are stateful by default. Maintaining state comes at a price: none of us have forgotten the problems we had with earlier versions of Stateful EJBs, Entity beans and HTTP session replication. I don't think Seam has cracked this entirely yet, despite claims that JBoss clustering on JGroups does fine-grained replication of changes. My team members complain of huge files in the JBoss server environment, presumably used to passivate state. I would not write all my applications as stateful ones even if the associated framework claims greater productivity (a debatable claim whose answer depends on the situation).
The best solution for an application developer like me would be to use the best of both worlds. There definitely is a case for using both Spring and Seam in your application stack. Let's look at an architectural paradigm that has generated a lot of interest these days: SOA. Web services over HTTP and JMS are common technologies used to implement services in a SOA. Both are typically used in a request-response style and are mostly stateless. Spring, with its integration with a framework like Mule, makes a compelling choice for implementing the services layer. However, you need more than just services to build an end application: you need a presentation layer, a mechanism to maintain conversational state (between user and application), and the ability to perform data access where the nature of the use case does not require you to write it as a service. Enter Seam and other productivity-enhancing web frameworks like Grails. Presentation frameworks and component libraries (Facelets, JSF component libraries, jQuery, etc.) integrate nicely and are rarely an issue.
Such an approach allows one to leverage the scalability benefits of stateless frameworks (for the services layer) and the productivity benefits of stateful frameworks. A combined reference architecture that demonstrates building stateful applications on a stateless service layer, in the true spirit of SOA, is useful. The MindTree Momentum platform achieves this, leveraging the best of open source stateful and stateless frameworks.
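The layering can be illustrated in plain Java, without any framework. In this minimal sketch (all class names are hypothetical), a stateless pricing service receives everything it needs in each call, so any instance on any node can serve it, while the conversational state, a user's cart, lives only in a presentation-tier component. In a real stack the conversation might be a Seam conversation-scoped component and the service a Spring bean reached over JMS or HTTP; that mapping is an assumption, not something this code depends on.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Sketch of a stateful conversation built on a stateless service.
 * Class names are hypothetical; no framework APIs are used.
 */
public class StatefulOverStateless {

    // Stateless service: each call carries all of its input, so instances
    // can be pooled, replicated or load-balanced freely.
    static final class PricingService {
        long totalCents(List<Long> itemPricesCents) {
            long total = 0;
            for (long price : itemPricesCents) {
                total += price;
            }
            return total;
        }
    }

    // Conversational state lives only in this presentation-tier component.
    static final class CartConversation {
        private final PricingService pricing;
        private final List<Long> items = new ArrayList<>();

        CartConversation(PricingService pricing) {
            this.pricing = pricing;
        }

        void add(long priceCents) {
            items.add(priceCents);
        }

        List<Long> items() {
            return Collections.unmodifiableList(items);
        }

        long checkoutTotalCents() {
            // Hands the accumulated state to the stateless service in one request.
            return pricing.totalCents(items);
        }
    }

    public static void main(String[] args) {
        CartConversation cart = new CartConversation(new PricingService());
        cart.add(1999);
        cart.add(501);
        System.out.println(cart.checkoutTotalCents()); // prints 2500
    }
}
```

Only the conversation object needs to be kept (or replicated) per user; the service layer scales horizontally because it holds no state between calls.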