Dear All,
I am very happy to share that one of my white papers has survived three company acquisitions. Every time a company was acquired, I used to think the new owner might drop my work from its corporate site, but my work has been adopted and carried forward by 3 companies –
Company 1 – SYSTIME Global Solutions
I wrote this paper when the Microsoft Azure platform had just launched. It was initially published on SYSTIME’s corporate website, which is no longer available today. At that time the marketing team also published it at other locations such as
Company 2 – KPIT Technologies
When SYSTIME was taken over by KPIT, they converted my paper into their own format and published it on their website and also at the location below. Since KPIT has now been taken over by Birlasoft, their corporate website is down, but you can see the paper at the location below –
Company 3 – Birlasoft
Now, Birlasoft’s corporate website is hosting the same paper in a new format –
Please have a look at the same paper with a new theme, colors, and logo –
Click to access windows-azure-partner-selection-guide.pdf
I no longer work with any of these companies and have since moved on to work with 3 more companies, but it is so nice to go back and see that our work is still valued.
Enjoy reading it !! 🙂
Thanks
Cloud storage is a model of networked online storage where data is stored in virtualized pools of storage, generally hosted by third parties. Cloud storage is based on highly virtualized infrastructure and has the same characteristics as cloud computing in terms of agility, scalability, elasticity and multi-tenancy.
Cloud storage services may be accessed through a web service application programming interface (API), a cloud storage gateway or a Web-based user interface.
Overall, cloud storage provides customers with the required agility, scalability and cost effectiveness.
While understanding the economics, it is necessary to compare apples with apples. Organizations generally compare the cost of on-premise storage devices with the cloud storage price per GB, which is the wrong comparison. To understand the actual cost of X GB of storage, you need to consider the parameters below.
Source: Forrester Research – based on sample data provided under specific conditions*
Security
Accessibility
Regulatory Compliances
For example:
1. The Patriot Act in the US allows the government to subpoena all data stored within the US. This might not be acceptable to many organizations.
2. European privacy laws may require that data be stored within the country of origin. Storing it in the datacenter of an out-of-country cloud service provider might not meet these requirements.
Cloud Service Provider stability
Total Cost
Many of the above concerns can be taken care of by using cloud storage gateways.
For many organizations, be they small-to-medium businesses or large enterprises, two major obstacles stand in the way of using cloud storage. Both are genuine but resolvable.
The first is the relatively slow performance, as measured by response time, which is an obvious result of limited Internet bandwidth. This slow response time often makes cloud-based storage unacceptable for some users. The second is the requirement to write application code against the representational state transfer (REST) API. If the applications don’t have a native interface to cloud storage, many small to medium businesses lack the aptitude, desire, skills or time to develop it themselves.
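To give a feel for the second obstacle, here is a minimal sketch of the kind of code an application needs in order to talk to a REST storage API. It only builds an HTTP PUT request and does not send it. The host name storage.example.com, the container/object URL layout and the function name build_upload_request are assumptions made for illustration; a real service also needs authentication headers, retries and error handling, which are left out here.

```python
from urllib.request import Request


def build_upload_request(base_url: str, container: str, name: str, data: bytes) -> Request:
    """Build (but do not send) an HTTP PUT request that uploads one object.

    The URL layout <base_url>/<container>/<name> is a hypothetical example,
    not the layout of any particular cloud storage service.
    """
    url = f"{base_url}/{container}/{name}"
    headers = {
        "Content-Type": "application/octet-stream",
        "Content-Length": str(len(data)),
    }
    return Request(url, data=data, headers=headers, method="PUT")


# Hypothetical endpoint and object name, for illustration only.
req = build_upload_request("https://storage.example.com", "reports", "q1.pdf", b"sample bytes")
print(req.get_method(), req.full_url)
```

Even this small piece has to be written, tested and maintained for every application, which is the effort many smaller businesses cannot afford.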
So, what is the solution?
The potential solution is a mechanism that overcomes the above two obstacles, so that enterprises don’t need to worry about storage, communication and performance complexities. Such a solution should provide a wrapper around cloud storage, allowing users to use storage just as they have been doing on-premises.
‘Cloud storage gateways’ are the appropriate solution, since they are designed to overcome the above two obstacles. They allow you to deal with cloud storage as if you were dealing with traditional SAN or NAS storage systems using NFS, iSCSI or FC methods. Additionally, they can be used as a ‘primary storage’ unit providing features such as snapshots, thin provisioning, de-duplication and compression. This also removes any need for enterprises to write application code before using cloud storage.
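De-duplication and compression can be illustrated with a small sketch. It assumes fixed-size chunks, SHA-256 content hashes and zlib compression; real gateways use their own (often patented) techniques, so the function names and the chunk size here are illustrative only.

```python
import hashlib
import zlib


def store_deduplicated(data: bytes, chunk_store: dict, chunk_size: int = 4096) -> list:
    """Split data into fixed-size chunks and store each distinct chunk once.

    Returns the "recipe": the ordered list of chunk hashes needed to rebuild data.
    """
    recipe = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in chunk_store:
            # Only unseen chunks consume space, and they are stored compressed.
            chunk_store[digest] = zlib.compress(chunk)
        recipe.append(digest)
    return recipe


def restore(recipe: list, chunk_store: dict) -> bytes:
    """Rebuild the original bytes from the recipe and the chunk store."""
    return b"".join(zlib.decompress(chunk_store[digest]) for digest in recipe)


store = {}
payload = b"A" * 8192 + b"B" * 4096   # three chunks, two of them identical
recipe = store_deduplicated(payload, store)
print(len(recipe), "chunks referenced,", len(store), "chunks stored")
```

The less data there is to move after de-duplication and compression, the less Internet bandwidth the gateway needs.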
There are many cloud storage gateways in the market, so one has to understand how a cloud storage gateway moves your data to cloud storage and how it brings the data back when needed. Gateway vendors use different approaches and algorithms, some of which are patented technologies. The efficiency achieved in data movement decides the quality and productivity of the cloud storage gateway. Additionally, some cloud storage gateways allow you to use different cloud storage platforms such as Windows Azure, Amazon S3, EMC Atmos, Nirvanix and others, providing complete flexibility.
Basic working of Cloud Storage Gateways
Cloud storage gateways are customized appliances (servers) containing various types of specialty disk storage, such as HDDs and SSDs (solid state drives), with control software on top. On-premise applications interact with these disks as normal. Data is stored on these disks initially and is then moved to the storage cloud based on policies or algorithms that consider factors such as the age of the data, last access time or number of snapshots.
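A policy of this kind can be sketched in a few lines. The example below assumes a simple rule, "move anything not accessed for more than N days, coldest first"; the CachedFile type, the function name and the 30-day threshold are assumptions for illustration and do not describe any vendor’s actual algorithm.

```python
from dataclasses import dataclass


@dataclass
class CachedFile:
    name: str
    size_bytes: int
    last_access: float  # seconds since the epoch


def select_for_cloud(files, now, max_idle_seconds):
    """Return the files idle for longer than the threshold, coldest first."""
    cold = [f for f in files if now - f.last_access > max_idle_seconds]
    return sorted(cold, key=lambda f: f.last_access)


DAY = 86400
now = 100 * DAY  # fixed "current time" so the example is repeatable
files = [
    CachedFile("a.log", 10, now - 40 * DAY),
    CachedFile("b.db", 20, now - 1 * DAY),
    CachedFile("c.bak", 30, now - 90 * DAY),
]
to_move = select_for_cloud(files, now, 30 * DAY)
print([f.name for f in to_move])
```

Recently used data (b.db here) stays on the local disks for fast access, while cold data is pushed to the cheaper cloud tier.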
Though cloud storage gateways come at some cost, they relieve you of several responsibilities and provide a low-TCO solution. You get peace of mind, because the gateway takes care of data backups, snapshots, archival, de-duplication and compression, lets you use cost-effective cloud storage and disaster recovery, and also acts as a primary storage medium in combination with classic storage technology.
There are a number of cloud storage gateway vendors in the market today, with more emerging every quarter. They include Cirtas Systems, CTERA Networks, Nasuni Corp., StorSimple Inc., TwinStrata Inc. and others that are still emerging.
Moving a business application from owned on-premise infrastructure to the cloud can streamline operations, especially for data-driven applications. Although cloud storage can be used for any application dealing with data, it is especially beneficial when your data is growing rapidly or your existing data is larger than you want to manage on-premise. Cloud storage particularly helps web-facing applications, where uploading and downloading content is entirely up to end users and the size of the data can grow to any extent. Data could be unstructured (simple files, documents, videos, audio, media content, database backups), structured (SQL databases) or NoSQL data; cloud storage is applicable to all kinds.
A few examples of where cloud storage can be used are:
How about using cloud storage for on-premise web applications?
One can use cloud storage for an on-premise application, and it works perfectly well. However, for better performance, storage and application should be co-located in the cloud to avoid possible latency. The REST API exposed by cloud storage services can easily be consumed by applications to leverage the services and reap benefits like data redundancy, availability and cost effectiveness.
Mobile devices are everywhere, be they business phones like BlackBerry, Windows phones and iPhones, or tablets/iPads, running full-blown applications that generate data and push it to a central database for analysis, accounting and reporting purposes. Sales forces, field agents and inspections generate a lot of data with audio and video content, which needs to be stored for long periods for process compliance or reference purposes. Nowadays devices come with inbuilt facilities to store content on local storage or in the cloud; it has become just an option, and cloud storage is just a click away. Because of limited processing power, memory and bandwidth, data from mobile devices needs to be pushed and pulled more frequently, and it must be available from anywhere in the world.
Cloud storage proves a very efficient option in such cases, providing complete data availability. The data transfer rate is not a deciding factor here, since the transfer would happen over the Internet even with an on-premise data center.
This data is the largest in size in any organization and is also the hardest to control. People create copies of documents and version them as they want, and it is difficult to track or control how users manage documents, including emails, text documents, images/photos, manuals, training content, proposals, marketing content, accounting statements etc.
An IDC paper, “The Diverse and Exploding Digital Universe,” highlights how a single email with a 1MB attachment, when sent to four people, consumes a total of 51MB of storage. (Source: “The Diverse and Exploding Digital Universe, An Updated Forecast of Worldwide Information Growth Through 2011,” March 2008, by International Data Corporation.) In other words, email suffers from attachment size limitations and is also an inefficient way of sharing data.
If constraints are applied on storage sizes, users tend to delete content, which may create problems when it needs to be accessed in the future. To deal with such situations, a strong storage policy is needed that reflects the business need and the impact of data availability on the organization. There should be a flexible way of sharing data which increases collaboration in the organization, and the approach should prove cost effective and add value in terms of availability, disaster recovery, data redundancy, backups and versioning support.
Cloud storage helps you address all of the above concerns, fostering effective data storage, sharing and availability with a pay-as-you-store option.
I have met a customer in the construction business, building everything from homes to business towers to ships and dams. The company operates in 7 countries, following country-specific policies and regulations for record keeping. Some countries, like the U.S. and Canada, need all the records for a construction project to be retained for 10 to 15 years. Data should be recoverable and available when it is required.
One may serve such requests through regular maintenance of storage policies and infrastructure, covering backup, verification, retention and duplication activities. This matters not only for regulatory compliance; it is also important for every organization to retain, back up and preserve its data, which is a real asset and the outcome of thousands of hours of work.
Cloud storage takes over the responsibility of storing your data consistently for as many years as you want to store it. Outsourcing such a pain area allows your IT team to focus more on innovation and value-added services for the business, and not merely on record keeping. Below are some of the instances where cloud storage has been in use for some time –
Databases, be they SQL or NoSQL, can happily reside in the cloud. Some cloud service providers, like Microsoft Windows Azure, offer a SQL database in a ‘Database as a service’ mode. Microsoft has partnered with other database providers like MySQL to make them available on its cloud platform. There is a long list of databases supported on cloud platforms like –
SQL databases deployed in the cloud could be your primary databases, backup copies, or secondary data sources for purposes such as reporting. SQL databases in the cloud are charged based on monthly database size.
NoSQL databases present very cost-effective ways of managing data, and there are very good examples in the industry of how people store large data sets in NoSQL databases such as Azure tables, with partitioning policies spanning data across multiple datacenters around the world.
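To illustrate the idea of partitioning, here is a toy in-memory sketch. Azure table entities are addressed by a partition key and a row key; the class below only mimics that addressing scheme with nested dictionaries and is not the Azure SDK. The class name, method names and sample data are hypothetical.

```python
class PartitionedTable:
    """Toy key-value table addressed by (partition key, row key)."""

    def __init__(self):
        # partition key -> {row key -> entity}
        self._partitions = {}

    def insert(self, partition_key, row_key, entity):
        self._partitions.setdefault(partition_key, {})[row_key] = entity

    def get(self, partition_key, row_key):
        return self._partitions[partition_key][row_key]

    def query_partition(self, partition_key):
        """Return all entities of one partition; other partitions are not touched."""
        return list(self._partitions.get(partition_key, {}).values())


table = PartitionedTable()
table.insert("Ireland", "order-001", {"amount": 120})
table.insert("Ireland", "order-002", {"amount": 75})
table.insert("Canada", "order-001", {"amount": 300})
print(table.query_partition("Ireland"))
```

Because a query for one partition never touches the others, partitions can be placed on different servers and the data set can keep growing.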
If you are using cloud computing for any reason, there is a high possibility that you will fall in love with cloud storage. Most cloud providers have storage integrated with their other offerings.
Whenever you use a cloud platform like Microsoft Windows Azure, whether in IaaS, PaaS or SaaS mode, you are actually using Azure cloud storage in some form or other.
One typical example of people using cloud storage is when they use IaaS offerings-
Global content distribution is not a new mechanism for boosting application performance: when you have a wide range of users around the globe, your application content is cached at many places nearer to the users for faster delivery. With the advent of cloud computing, this mechanism has become more powerful thanks to integrated support from cloud providers. For example, Microsoft offers a CDN (content distribution network) feature as part of the Windows Azure platform. Users need only a few clicks to configure it for their application, and the site gets a tremendous boost in content delivery. With the increased number of CDN nodes around the globe, latency and scalability issues are being addressed proactively and easily.
Scientific societies and researchers need large storage systems to store their simulations during their research. Not all scientists have the liberty to buy storage systems and maintain them. Cloud storage gives them an efficient way to use storage only when required, pay for what they use and release the resources once results are drawn from the calculations. Cloud storage is also considered a good candidate for storing content generated in digital movie production. Computer-generated movie production produces huge amounts of data which need to be stored for the short period of the production, maybe a few months.
For example, a movie like ‘Avatar’ generated one petabyte, or one million gigabytes, of data, which was stored using a Microsoft digital asset management solution and which could today be stored in cloud storage with added benefits.
Hope this helps! 🙂
Laxmikant Patil
Since the inception of computers, enterprises have been using hard disks for data storage and transfer. The most popular option is the ‘portable hard disk’, being cost effective, easy to use and easy to carry anywhere. Because of the success of portable hard disks, enterprises were attracted to them and started using them for business data storage, backup and archival purposes, which is not what these disks are meant for. Portable hard disks were developed for storing temporary data and primarily for portability.
Companies need to look for a more reliable storage solution considering the criteria mentioned below, because data that is not available on time, or is lost, is as bad as data that was never available!
Below is a comparison summary of the ‘portable hard disk’ and ‘Windows Azure Storage services’ storage options against different criteria.
Sr. No | Criteria | Portable Hard Disk | Cloud (Windows Azure) | Winner |
1 | Data Security | Low – Easily accessible | High – Always secure access | Azure Storage |
2 | Ease of data access | Data access is easy, as one just has to plug the HDD into a USB port | Internet-based data access | HDD |
3 | Portability | High | Low – but data is available around the world via the Internet | HDD |
4 | Reach | Low – Need to physically carry everywhere | High – Data accessible globally | Azure Storage |
5 | Disaster Recovery | Low – Very low chances of data recovery | High – Inbuilt disaster recovery. Data gets copied to 3 places and will be automatically made available in disaster recovery scenarios | Azure Storage |
6 | Data redundancy | Low – Needs to be implemented explicitly, which costs you more | High – Inbuilt data redundancy. Data gets copied to 3 places | Azure Storage |
7 | Availability | Low – Vulnerable to numerous environmental conditions | High – 99.9% promised availability with world-class data center support | Azure Storage |
8 | Performance | High – Local access | Low – Internet-based access | HDD |
9 | Maintenance | Needs more care, with periodic verification of the device | No maintenance needed | Azure Storage |
10 | Vulnerable to physical damage, heat, dust, wear and tear | Yes – highly vulnerable | No | Azure Storage |
11 | Life of storage device | 2-3 years max | Virtually unlimited | Azure Storage |
12 | Risk of device theft | High | Low | Azure Storage |
13 | Data access concurrency | Cannot be accessed concurrently by more than a few users | Can be accessed by a large number of users concurrently | Azure Storage |
14 | Governed SLAs | No | Yes – by Microsoft | Azure Storage |
15 | Device driver needs | Yes | No | Azure Storage |
16 | Data access time | Less because of local data transfer | More because of Internet-based data transfer | HDD |
17 | Price | 1 TB for approx. $100 | $0.07 per GB per month | HDD |
18 | Storage capacity | Fixed – need to decide at the time of buying | Virtually unlimited | Azure Storage |
19 | Storage flexibility | If data size increases, data storage cannot grow by itself | Cloud storage supports scalability out of the box with the ability to store unlimited data. Flexibility in terms of a pay-as-you-go model | Azure Storage |
20 | Pricing model | Capex – Capital investment is needed. | Opex – Only monthly usage charges need to be paid, no upfront commitment | Azure Storage |
21 | Focus | Organization needs to spend time and focus on maintaining the HDD well, along with redundant copies of it | No need to spend an additional minute caring about the storage once data is uploaded. | Azure Storage |
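The price and availability rows can be turned into a quick back-of-the-envelope calculation. The sketch below uses only the figures from the table (about $100 for a 1 TB disk, $0.07 per GB per month, 99.9% availability) and assumes that 1 TB is counted as 1000 GB and that the full 1 TB is stored in the cloud from day one. It compares raw prices only; the cost of redundant copies, device replacement and maintenance on the HDD side is not included.

```python
from decimal import Decimal

HDD_PRICE_USD = Decimal("100")               # row 17: 1 TB for approx. $100
CLOUD_USD_PER_GB_MONTH = Decimal("0.07")     # row 17: $0.07 per GB per month
CAPACITY_GB = Decimal("1000")                # assumption: 1 TB counted as 1000 GB
PROMISED_AVAILABILITY_PCT = Decimal("99.9")  # row 7
HOURS_PER_YEAR = Decimal("8760")             # 365 days * 24 hours

# Monthly cloud charge for keeping the full capacity stored.
cloud_monthly_usd = CLOUD_USD_PER_GB_MONTH * CAPACITY_GB

# Number of months after which cumulative cloud charges exceed the disk price.
break_even_months = HDD_PRICE_USD / cloud_monthly_usd

# Downtime per year that is still within a 99.9% availability promise.
allowed_downtime_hours = (
    (Decimal("100") - PROMISED_AVAILABILITY_PCT) / Decimal("100") * HOURS_PER_YEAR
)

print(f"Cloud cost per month: ${cloud_monthly_usd:.2f}")
print(f"Break-even after {break_even_months:.2f} months")
print(f"Allowed downtime per year: {allowed_downtime_hours:.2f} hours")
```

On raw price alone the portable disk pays for itself in under two months, which is why the HDD wins row 17; the other rows show what that raw price leaves out.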
To conclude, ‘Windows Azure Storage Services’ wins on most of the criteria and proves to be the better option for storage. Use this information wisely to analyze the benefits in your own scenario.
Hope this helps !
Laxmikant Patil 🙂
If we look forward to the year 2025, we see big dreams realized: nanotechnology, artificial intelligence, next generation cloud and high performance computing. The impact of such technologies on human life is unimaginable at this point in time. We are not far away from tiny nano factories and nano robots at home doing smart jobs for us. Computers around us will be a million times faster and smaller, ready to serve you at a fraction of today’s energy consumption. These possibilities are beautiful and are likely to be realized, but the important questions are ‘Are we ready for that?’ and ‘Are we laying the correct foundation for next generation computing?’ The answer may be ‘Yes’ or ‘No’ depending on an individual’s perception and context. However, it would certainly be ‘No’ if we look at the current unstructured nature of the World Wide Web, the biggest information store freely available at our fingertips.
Given the tremendous size of the Web, the way we have organized our web resources and the rate of web adoption in developing countries, it will soon become difficult to easily identify relevant information and services of interest. Total dependence on purely text-based search engines for finding information will not be sufficient. We will lose credible information which search engines cannot put forward effectively, and such a loss may become unaffordable in the near future.
A lot of research has been happening around web standardization. Notable examples are the research on classifying web sites by Christoph Lindemann and Lars Littig [2] and the research on extracting and managing structured Web data by Michael John Cafarella [1].
This paper suggests a few techniques for structuring the Web to make it as usable as possible.
This is the first paper in a series of research on the topic ‘Structured Web: 2025’.
Fig. 1 Conceptual view of Web showing scattered information without specific structure
Due to the heterogeneity of the Web and its lack of structure, it is crucial to identify the properties of a Web resource that best reflect its functionality. In the relational database world, we call this a schema. If we want to read any tuple from a database, we first need to know its schema. This principle is equally applicable to Web resources. Once we know the schema, the second step is to allow the tuple to be read by anybody.
Here I propose a two-step methodology to describe the structure of a Web resource.
A. Every Web resource should describe and expose its properties.
B. Every Web resource should be accessible using a unified structure.
Here I consider a Web resource to be anything that is publicly accessible.
This applies to one of the major Web resources, i.e. the web site. Every web site should describe its schema using the properties below and should expose it for public access.
TABLE I
Web site properties
Sr. No. | Category | Element |
1 | Domain | Domain |
2 | Presence | Country |
3 | Presence | Languages |
4 | Presence | Time Zone |
5 | Presence | Currency |
6 | Web Content | Images |
7 | Web Content | Text |
8 | Web Content | Video |
9 | Web Content | Audio |
10 | Web Content | XML |
11 | Web Content | XHTML |
12 | Web Content | RSS |
13 | Web Content | Documents (Word, PDF, XLS etc.) |
14 | Security | Secure |
15 | Audience | Adult |
16 | Volume | Size of Pages |
17 | Volume | Count of Pages |
18 | Volume | External site out degree |
19 | Technical realization | JavaScript, or another scripting |
20 | Domain dictionary | Domain dictionary keywords |
21 | Popular URLs | Popular URLs of the site |
22 | Rank | Rank(1…10) |
23 | Subdomains | Subdomains |
24 | Web resource structure | See section V. |
Using the above information made available by each web site, organizations can write crawlers which visit web sites, retrieve these details and maintain a database of all this information.
Where –
Domain dictionary keywords can be used by search engines to index the web site against those keywords.
Security signifies if that website can be openly used by anybody or registration is required.
Once we understand the web site properties, we are able to understand the general structure of the site. The next level of categorization is based on how the actual web site content is made available for public access. This content access is different from content access through a rendered web page. Directly exposing content through URLs helps in categorizing the overall information in the form of a relational database table, like the one below.
TABLE II
Web resource structure (web site content access structure)
Sr. No. | Content Type | Name | URL |
1 | Image | Einstein.png | http://www.example.com/Einstein.png |
2 | Image | James Cameron | http://www.example.com/JamesC.png |
3 | | USLReport | http://www.example.com/USL.pdf |
4 | Text | Football Game | http://www.example.com/FootballGame.htm |
5 | Video | President Speech | http://www.example.com/Presdspeech.mp4 |
.. | .. | .. | .. |
.. | .. | .. | .. |
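As a rough sketch of how Table II can be treated as relational data, the rows can be held as records and indexed by content type. The record type and function name are assumptions for illustration. Row 3 is left out because its content type is not given in the table.

```python
from collections import defaultdict
from typing import NamedTuple


class WebResource(NamedTuple):
    content_type: str
    name: str
    url: str


# Rows 1, 2, 4 and 5 of Table II.
TABLE_II_ROWS = [
    WebResource("Image", "Einstein.png", "http://www.example.com/Einstein.png"),
    WebResource("Image", "James Cameron", "http://www.example.com/JamesC.png"),
    WebResource("Text", "Football Game", "http://www.example.com/FootballGame.htm"),
    WebResource("Video", "President Speech", "http://www.example.com/Presdspeech.mp4"),
]


def index_by_content_type(resources):
    """Group resource URLs by content type, keeping the original row order."""
    index = defaultdict(list)
    for resource in resources:
        index[resource.content_type].append(resource.url)
    return dict(index)


content_index = index_by_content_type(TABLE_II_ROWS)
print(content_index["Image"])
```

With such an index, a request like "all images of this site" is a simple lookup and needs no parsing of rendered pages.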
A. WebResource_Properties.XML file
Now the question is how a web site will expose its properties and content access structure to the outside world. The answer is one XML file with a standard schema, published by every web site owner. This file would be WebResource_Properties.XML. It should be present in each site’s root virtual folder and should be publicly accessible using the URL format below –
http://www.example.com/WebResource_Properties.xml
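The schema of this file is not defined in this paper, so the sketch below only shows one possible shape. The element and attribute names (WebResourceProperties, Properties, Property, Resources, Resource) are hypothetical, and the sample values are taken from Table I and Table II for illustration.

```python
import xml.etree.ElementTree as ET


def build_properties_xml(properties: dict, resources: list) -> str:
    """Serialize site properties and content entries to an XML string.

    All element and attribute names are hypothetical; the paper does not fix a schema.
    """
    root = ET.Element("WebResourceProperties")

    props = ET.SubElement(root, "Properties")
    for key, value in properties.items():
        ET.SubElement(props, "Property", name=key).text = str(value)

    res = ET.SubElement(root, "Resources")
    for content_type, name, url in resources:
        ET.SubElement(res, "Resource", type=content_type, name=name, url=url)

    return ET.tostring(root, encoding="unicode")


xml_text = build_properties_xml(
    {"Domain": "Health care", "Country": "Ireland", "Count of Pages": 25},
    [("Image", "Einstein.png", "http://www.example.com/Einstein.png")],
)
print(xml_text)
```

A site owner would generate such a file once and regenerate it whenever the site content changes.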
Using the above mechanism, we can build a relational database table of all the web sites that expose their web resource properties.
One can then easily write a piece of software which provides a list of all sites ‘from Ireland, in the Health care domain, with audio and images, having page count > 20, without any security’ for accessing content.
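A minimal sketch of such a piece of software, using an in-memory SQLite database, could look like this. The table name, column names and the three sample sites are made up for illustration; in practice the rows would be filled by a crawler reading each site’s properties file.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE site_properties (
        url        TEXT PRIMARY KEY,
        country    TEXT,
        domain     TEXT,
        has_audio  INTEGER,  -- 1 = site exposes audio content
        has_images INTEGER,  -- 1 = site exposes images
        page_count INTEGER,
        secure     INTEGER   -- 1 = registration required
    )
    """
)

# Sample rows, invented for this example.
sample_sites = [
    ("http://clinic.example.com", "Ireland", "Health care", 1, 1, 45, 0),
    ("http://tiny.example.org", "Ireland", "Health care", 1, 1, 12, 0),
    ("http://members.example.net", "Ireland", "Health care", 1, 1, 80, 1),
]
conn.executemany("INSERT INTO site_properties VALUES (?, ?, ?, ?, ?, ?, ?)", sample_sites)

query = """
    SELECT url FROM site_properties
    WHERE country = ? AND domain = ?
      AND has_audio = 1 AND has_images = 1
      AND page_count > ? AND secure = 0
    ORDER BY url
"""
matches = [row[0] for row in conn.execute(query, ("Ireland", "Health care", 20))]
print(matches)
```

Only the first sample site satisfies every condition: the second has too few pages and the third requires registration.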
B. Ranking Website
Another way of classifying web resources/web sites is ranking them. This ranking should be done basis
Web site ranking should be done by independent organizations, so that it reflects real usability to the world. A rank is always linked to a domain, so when comparing ranks, the domain always comes into the picture.
C. Domain
Some of the domains can be listed as affiliate site, archive site, blog, corporate site, commerce site, database site, development site, directory site, download site, employment site etc.
The figure below shows two views of a web site –
A. The first view is the one rendered in the browser, which the user can see directly. Search engines work on this view when indexing a web site. A search engine cannot reach a web resource that has no link in a browser-rendered page, so files that sit on the web server without being linked from any web page are never crawled.
B. The second view is the one provided through the WebResource_Properties.xml file, a sample of which is shown on the right side of the figure.
Fig. 2 Web Virtual directory and two views of it
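The difference between the two views can be sketched with a small link graph. The page names, links and file list below are invented for illustration. A link-following crawler only finds what is reachable from the start page, while a published resource list covers everything on the server.

```python
from collections import deque


def reachable_by_links(start, links):
    """Breadth-first walk over page links, as a link-following crawler would do."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# View A: what the rendered pages link to (hypothetical site).
links = {
    "/index.htm": ["/about.htm", "/Einstein.png"],
    "/about.htm": ["/index.htm"],
}

# View B: everything the site would list in its properties file.
files_on_server = {"/index.htm", "/about.htm", "/Einstein.png", "/USL.pdf", "/Presdspeech.mp4"}

crawled = reachable_by_links("/index.htm", links)
hidden = sorted(files_on_server - crawled)
print(hidden)
```

The report and the video exist on the server but are invisible to the link-following crawler; they become discoverable only through the second view.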
By implementing the above guidelines, web information can be structured to some level, which brings the following advantages.
A. Technology-neutral way of categorizing web sites
Using the above method, web sites can be categorized and the web can be structured in a technology-neutral way.
B. Improved search engine optimization
Search engines no longer need to depend only on text-based indexing; the additional web resource properties can help in getting meaningful search results.
C. Minimal work to get started
Web resource owners don’t need to make any changes to their web applications. Just one XML file will make a big difference.
The figure below shows a conceptual view of the Web once such structuring has happened over a period of time. The Web being a massive data store, it will take time for people to adopt such standards and apply them.
Fig 3. – Conceptual view of Web showing structured information after employing above techniques
The important point is that if we don’t take action in time, we will be at a great loss: millions of ideas, research results and opinions from billions of people might fall into a dark age just because nobody could find them at the right time and carry the work further. People will keep reinventing the wheel, and the next generation will blame us because we could not manage the Web responsibly. If we start today, the hope is that the entire Web will be a structured data source by 2025, and the next generation might use a structured query language to search the Web seamlessly.
Because “information that cannot be found easily is as good as information that is not present.”
[1] Michael John Cafarella, Extracting and Managing Structured Web Data, University of Washington, 2009
[2] Christoph Lindemann and Lars Littig, Classifying Web Sites, University of Leipzig, Johannisgasse 26, 2007
[3] John M. Pierre, On the Automated Classification of Web Sites, California UAS, 2001