
Related Posts

This is a discussion on Related Posts within the Open Source Analytics forums, part of the Vendors and Service Providers category; Open source is also being discussed in other forums. Post links to those discussions here.


20th February 2009, 09:26 AM   #1
zamir
Member
Join Date: Oct 2008
Posts: 26
Related Posts

Open source is also being discussed in other forums. Post links to those discussions here.
20th February 2009, 09:40 AM   #2
zamir
Member
Join Date: Oct 2008
Posts: 26
Open Source Data Integration

An interesting review of Eclipse-based Talend here.
2nd April 2009, 12:41 PM   #3
Steve Bennett
Member
Join Date: Oct 2007
Posts: 384
Blog Entries: 26
Pentaho and Amazon.com deliver BI to the cloud

Pentaho and Amazon.com deliver BI to the cloud
Companies will be able to 'rent' Pentaho Version 3.0 via Amazon's EC2.

Eric Lai 24/03/2009 08:59:00

Open-source business intelligence application Pentaho is joining the roster of applications available via Amazon.com Inc.'s EC2 Web hosting service.

Companies will be able to "rent" the new release of Pentaho, Version 3.0, via EC2. That arrangement should lower the upfront start-up costs of using Pentaho -- though those costs were already low for its on-site version, according to Lance Walter, vice president of marketing at Orlando-based Pentaho Corp.

The on-site version of Pentaho's open-source software can be used for free, though many business customers subscribe to Pentaho support.

Pentaho is not the first "BI as a service" provider. Cambridge, Mass.-based start-up Good Data Corp. began testing a cloud-based BI service last fall. Good Data's offering is also on EC2.

Other enterprise applications available via EC2 include open-source ERP application Compiere, application servers such as JBoss, and a plethora of databases, including Oracle, MySQL and Microsoft Corp.'s SQL Server.

Other new features in Pentaho 3.0 include redesigned dashboards that incorporate Adobe Flash technology for better visuals and are now easy enough for most business end users to build themselves, said Walter.

Pentaho and JasperSoft Corp. are the two most popular vendors of open-source BI offerings. JasperSoft is backed by Linux vendor Red Hat Inc.

Pentaho's open-source community includes 40,000 registered members and, according to Walter, "hundreds of active contributors."
2nd April 2009, 03:03 PM   #4
Doug Heywood
Guru
Join Date: Oct 2007
Posts: 101
Greenplum leverages open source PostgreSQL

From Techworld:

Greenplum touts super-quick data loading
Database "fastest in the industry"

Tom Jowitt (Techworld) 18/03/2009

Greenplum has released new technology which it says can speed the loading of data into large scale databases, without compromising overall performance.

San Mateo, California-based Greenplum provides a high performance database (DBMS) typically used in data warehousing and large-scale analytical processing (or business intelligence) applications. It powers the Sun Data Warehouse Appliance, and customers include the likes of LinkedIn, Nasdaq, NYSE Euronext, Fox Interactive Media, and MySpace.

Data loading is rapidly becoming an issue for companies increasingly facing exponential data growth. "For many companies data loading is a bottleneck," said Ben Werther, director of product marketing at Greenplum. "Data loading is traditionally done at night, but more data and longer loading cycles sometimes mean this extends into the working day."

"The amount of data is growing on a daily or weekly basis," said Paul Salazar, VP of corporate marketing. "Companies are seeking to gain competitive advantage from analysing the data they capture and they are also choosing to store more data about specific events."

Salazar said that if customers can gain field intelligence quickly, by shortening data loading times to a couple of hours instead of overnight or longer, then there is a definite competitive advantage to be had.

To this end, Greenplum has introduced technology it is calling 'MPP Scatter/Gather Streaming' (or SG Streaming for short). SG Streaming technology is available immediately with the Greenplum Database. It is included at no extra charge to Greenplum customers, and the company says it eliminates the bottlenecks associated with other approaches to data loading.

Indeed, Greenplum cites customers that are achieving production loading speeds of over 4TB per hour. "The loading capabilities of this database are remarkable," said Brian Dolan, director of research analytics at Fox Interactive Media. "We're loading at rates of four terabytes an hour, consistently."

"This is definitely the fastest in the industry," said Greenplum's Werther. "Netezza for example quotes 500GB an hour, and we have not seen anyone doing more than 1TB an hour."

According to Werther, Greenplum utilises a "parallel-everywhere" approach to loading in which data flows from one or more source systems to every node of the database without any sequential choke points. This differs from traditional "bulk loading" technologies, used by most mainstream database and MPP appliance vendors, which push data from a single source, often over a single channel or a small number of parallel channels, resulting in fundamental bottlenecks and ever-increasing load times. Greenplum's approach also avoids the need for a "loader" tier of servers, as required by some other MPP database vendors.

The SG Streaming technology ensures parallelism by "scattering" data from all source systems across hundreds or thousands of parallel streams that simultaneously flow to all nodes of the Greenplum Database. Performance scales with the number of Greenplum Database nodes, and the technology supports both large batch and continuous near-real-time loading patterns with negligible impact on concurrent database operations.
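
For readers who want a feel for the "scatter" idea described above, here is a minimal Python sketch. It is not Greenplum's implementation or API: the node count, the CRC32 hash on the first column, and the names `node_for`, `scatter` and `load_parallel` are all assumptions made purely for illustration. Each source is split into one stream per node, all sources are processed concurrently, and the per-node batches are then gathered.

```python
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32

NUM_NODES = 4  # hypothetical cluster size, chosen only for the example


def node_for(key, num_nodes=NUM_NODES):
    """Pick a target node by hashing the row's distribution key (assumed scheme)."""
    return crc32(key.encode("utf-8")) % num_nodes


def scatter(rows, num_nodes=NUM_NODES):
    """Split the rows of one source into one stream per node."""
    streams = [[] for _ in range(num_nodes)]
    for row in rows:
        streams[node_for(row[0], num_nodes)].append(row)
    return streams


def load_parallel(sources, num_nodes=NUM_NODES):
    """Scatter every source concurrently, then gather the batches per node."""
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        scattered = list(pool.map(lambda rows: scatter(rows, num_nodes), sources))
    nodes = [[] for _ in range(num_nodes)]
    for streams in scattered:
        for node_id, batch in enumerate(streams):
            nodes[node_id].extend(batch)
    return nodes


if __name__ == "__main__":
    sources = [[("a", 1), ("b", 2)], [("c", 3), ("a", 4)]]
    for node_id, rows in enumerate(load_parallel(sources)):
        print(node_id, rows)
```

The point of the sketch is only that no single loader sees all the data: every source feeds every node directly, so adding sources or nodes adds streams rather than lengthening one queue.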

Another useful feature is that the data can be transformed and processed in-flight, utilising all nodes of the database in parallel, for extremely high-performance ELT (extract-load-transform) and ETLT (extract-transform-load-transform) loading pipelines.

Of course, this means that Greenplum competes against the likes of hardware-based players like NCR's Teradata and Netezza, as well as other mainstream players such as Oracle. But Greenplum says that its ability to utilise off-the-shelf servers, storage, and networking, means that customers are not tied into any particular hardware configuration, and instead are offered cost-effective scaling on commodity hardware.

Greenplum launched version 3.2 of its database software back in September last year. Greenplum Database 3.2 was the first database to include MapReduce, a parallel computing technique pioneered by Google for analysing the web, which boosted the data analytics capabilities of the new DBMS.
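
As a rough illustration of the MapReduce technique mentioned above, here is a generic single-process word count in Python. It does not use Greenplum's MapReduce interface, which the article does not describe; the function names `map_phase`, `shuffle`, `reduce_phase` and `map_reduce` are made up for the example.

```python
from collections import defaultdict


def map_phase(doc):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in doc.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def map_reduce(docs):
    """Run map, shuffle and reduce in sequence over a list of documents."""
    pairs = (pair for doc in docs for pair in map_phase(doc))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


if __name__ == "__main__":
    print(map_reduce(["load data fast", "load more data"]))
```

In a parallel database the map and reduce steps would run on many nodes at once, with the shuffle moving data between them; here everything runs in one process to keep the shape of the technique visible.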
2nd April 2009, 03:31 PM   #5
zamir
Member
Join Date: Oct 2008
Posts: 26

Quote:
Originally Posted by Doug Heywood
From Techworld:

Greenplum touts super-quick data loading
Database "fastest in the industry"

Tom Jowitt (Techworld) 18/03/2009

Greenplum has released new technology which it says can speed the loading of data into large scale databases, without compromising overall performance.
The vendors are having a bit of a spat over this claim:

Quote:
31 March 2009

Row erupts over DBMS loading speeds
By Tom Jowitt, Techworld

High performance database provider Vertica Systems has dismissed the claims of one of its rivals over data loading speeds, after boasting that its own figure is properly benchmarked and therefore transparent and open.

Earlier this month, Greenplum released new technology which it said could speed the loading of data into large scale databases, without compromising overall performance. Indeed, Greenplum pointed to one of its customers, who said he was achieving production loading speeds of over 4TB per hour.

"This is definitely the fastest in the industry," said Greenplum's Ben Werther, director of product marketing, at the time. "Netezza for example quotes 500GB an hour, and we have not seen anyone doing more than 1TB an hour."

But rival outfit Vertica has taken exception to this. It points to a benchmark figure it set in collaboration with HP in December last year, where Syncsort's data integration product, DMExpress v4.8 extracted, transformed, cleansed and loaded 5.4TB of raw data into the Vertica Analytic Database in 57 minutes 21.51 seconds. The data was generated using the data generation tool of the TPC-H benchmark.

"Fundamentally, we are a relational DBMS (database management system)," said Dave Menninger, VP of marketing and product management for Vertica. "But under the covers we do things differently with the data, to improve performance."

"It is possible he [Greenplum's Ben Werther] was ignoring our benchmark when he made that claim, but I suspect he probably knew about it," said Menninger. "We ran a test, published the results and let everyone know the specifications of the test itself."

"Greenplum claims are incomplete," Menninger added, citing the lack of knowledge about the specifications of the machines involved in Greenplum's claims and pointing to the full disclosure of Vertica's benchmark.

But Greenplum soon hit back. "I was aware of their [Vertica's] benchmark, but I was referring to real world usage," said Werther, responding to Vertica's comments. "A number of people have been fairly amused by this. Our focus is on the customer doing something real, not targeting high loading speeds. Ours is a real world system. Vertica's numbers are devoid of any customer references."

Werther said that Vertica's figures were a classic benchmark, where the database is assumed to hold 7TB to 8TB but starts empty, with clean data to be loaded. "There is even no redundant RAID for storage [in their benchmark]," said Werther. "We are loading real data, data that is messy; it is not clean but has to be worked on." He said that with Vertica's benchmark, they know the sort order of the data, which ensures they get clean figures. "It is a benchmark, and they tuned and tweaked it, to get a good number," he said.

So how does Vertica respond to charges that benchmarks are artificial and do not reflect real world scenarios? "That is a potentially valid criticism, that is why disclosure is so important," said Menninger. "People can understand what has been done, any special tweaks etc, so it is transparent and open and that has value. Benchmarks are important as they provide direction and give some sense of what is possible."

But Werther disagrees, and he feels that real world customers quoting figures are of more use than artificial benchmarks. "In our case, it was a customer putting their name on the line and saying those figures. With the Vertica disclosure, yes it published the way it was set up, but it took me a couple of hours to decipher what they were doing." He said he could not imagine a user being so patient.

"We have great customers and we love to use them to showcase what they are doing, which speaks far more than artificial numbers someone else can generate," concluded Werther.

Werther pointed to a paper by Professor Joe Hellerstein (University of California, Berkeley) which provides a much more detailed analysis of the work that Greenplum has done with Fox Media, the customer concerned.
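
To put the figures quoted in the article on the same footing, here is a small Python calculation that converts each one to terabytes per hour. It assumes 1TB = 1,000GB and treats Greenplum's "over 4TB per hour" as exactly 4.0, so that number is a lower bound; `tb_per_hour` is just a helper name made up for this example.

```python
def tb_per_hour(terabytes, minutes=0.0, seconds=0.0):
    """Convert a volume loaded in a given elapsed time to TB per hour."""
    hours = (minutes * 60 + seconds) / 3600
    return terabytes / hours


# Vertica/HP/Syncsort benchmark: 5.4TB in 57 minutes 21.51 seconds.
vertica = tb_per_hour(5.4, minutes=57, seconds=21.51)

rates = {
    "Vertica benchmark": vertica,
    # "over 4TB per hour" per the customer quote, so 4.0 is a lower bound.
    "Greenplum (Fox Interactive Media)": 4.0,
    # 500GB an hour as quoted by Greenplum, assuming 1TB = 1,000GB.
    "Netezza (as quoted by Greenplum)": 500 / 1000,
}

if __name__ == "__main__":
    for name, rate in sorted(rates.items(), key=lambda item: -item[1]):
        print(f"{name}: {rate:.2f} TB/hour")
```

The Vertica benchmark works out to about 5.65TB per hour. As both vendors point out in the article, one number comes from a tuned benchmark and the other from a production system, so the comparison says nothing about how either would behave on the other's workload.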
12th May 2009, 11:21 AM   #6
admin
Administrator
Join Date: Oct 2007
Posts: 14,318
Blog Entries: 7
Deficient network at the heart of swine flu response

Cross post from latest news (whole article is here).

Quote:
"We need a system that manages rumour surveillance, influenza-like illness data, population data, geographic mapping, anti-viral usage, adverse events data and staffing capacity to ensure an effective and efficient response," an interim report says.

"Staff found NetEpi difficult to use, data entry was problematic, and analysis and reporting functionalities could not be utilised."

NetEpi is open source web-based software designed to help public health authorities investigate and manage outbreaks of communicable disease as well as other chronic illnesses.
17th September 2009, 03:16 PM   #7
admin
Administrator
Join Date: Oct 2007
Posts: 14,318
Blog Entries: 7
5 open source project management apps

Cross post here.
All times are GMT +11.

© The Business Intelligence Group
