Big Data Materials

Informative slide decks on Big Data technologies have been released: Hadoop, HBase, Hive, ZooKeeper...

We present the best BI Planning and Budgeting platform

Forecasts, web and Excel-like interface, mobile apps, Qlikview, SAP and Salesforce integration...

Pentaho Analytics. A great leap forward

Pentaho 6 has now been released, and with big surprises. Discover with us the improvements in the best open BI suite

The best range of Open Source courses

After the great reception of our eminently practical Open Source courses, we are launching the 2016 sessions

29 sept. 2016

Using Tableau and Pentaho with Spanish Football League data


We often publish studies and comparisons of different Business Intelligence and Big Data technologies. But, as is usually the case, the best way to judge them is to see them working in practice.

So here are some examples of dashboards built with Tableau and Pentaho using data from the Spanish Football League.

Click on each dashboard to access it:

Tableau:





Pentaho:



27 sept. 2016

Analysis of the Panama Papers with Neo4j - Big Data



This example uses Neo4j as a graph database to model the relationships between the entities involved in the Panama Papers (PP). From text files containing the data and the relationships between the clients, offices and companies involved in the PP, we built a graph that makes it easier to understand the interactions between the different parties in this network.
The demo starts by selecting an entity of any type (Address, Company, Client, Officer). Depending on the type selected, the attributes of that node are shown; you then choose the attribute you want and enter the filter, adding several panels if you need to filter by more than one. The "Deep" parameter is the number of connections from the selected element that should be displayed.
On the server, a BFS is run from the selected node, querying Neo4j for each relationship type in which the current node takes part, until the requested depth is reached. The nodes and edges are accumulated along the way and returned as the result.
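The server-side traversal described above can be sketched in a few lines of Python. This is a minimal, illustrative version that walks an in-memory adjacency map instead of issuing live Cypher queries; the node identifiers and relationship names are made up for the example.

```python
from collections import deque

def bfs_subgraph(adjacency, start, max_depth):
    """Collect the nodes and edges reachable from `start` within
    `max_depth` hops (the "Deep" parameter), breadth-first."""
    visited = {start}
    edges = []
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the requested depth
        for neighbor, rel_type in adjacency.get(node, []):
            edges.append((node, rel_type, neighbor))
            if neighbor not in visited:
                visited.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return visited, edges

# Toy graph: one officer linked to two companies, one of which
# has a registered address.
graph = {
    "officer:1": [("company:10", "OFFICER_OF"), ("company:11", "OFFICER_OF")],
    "company:10": [("address:99", "REGISTERED_AT")],
}
nodes, edges = bfs_subgraph(graph, "officer:1", max_depth=1)
```

With Deep set to 1 only the officer's direct connections are returned; raising it to 2 also pulls in the company's address, exactly as the depth slider behaves in the demo.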


Linkurious, one of the most effective components on the market for this purpose, is used to visualize the graph. You can interact with the graph by zooming, selecting elements, moving them around, or using the lasso tool to select several nodes at once. Double-clicking on a node loads any of its connections that are not yet displayed.
Neo4j, and graph databases in general, have very particular applications, such as fraud detection (discovering patterns in the relationships between nodes), real-time recommendations (relatively straightforward using the weight of each node's relationships, their trend, etc.) and social network analytics (given how easy it is to implement graph algorithms on this kind of database).
Enjoy it!!

26 sept. 2016

How to create Balanced Scorecards in Pentaho?


Now you can create a Balanced Scorecard application in Pentaho CE using this solution based on open source.

You can see how it works in this video. More info and details here

25 sept. 2016

What happened to the 50 most important Open Source companies?


A very interesting roundup by thevarguy, tracking what has happened to the main open source solutions over the years: which ones remain, which were bought, and which have disappeared...

Download the document


22 sept. 2016

Location Intelligence: Bringing together the power of maps and Business Intelligence, with Carto and Pentaho


Location intelligence, or spatial intelligence, is the process of deriving meaningful insight from geospatial data relationships to solve a particular problem. (Click on the dashboard above)

It involves layering multiple data sets spatially and/or chronologically, for easy reference on a map, and its applications span industries, categories and organizations. It is generally agreed that more than 80% of all data has a location element to it, and that location directly affects the kinds of insights that you might draw from many sets of information (Wikipedia).

Deploying location intelligence by analyzing data using a geographical information system (GIS) within business is becoming a critical core strategy for success in an increasingly competitive global economy.

Location intelligence is also used to describe the integration of a geographical component into business intelligence processes and tools, often incorporating spatial database and spatial OLAP tools.

Check this Online Dashboard created by our friends from Stratebi

Now, this is easier and more affordable than ever thanks to tools like Carto and Pentaho

The NBA's technological brain


The NBA has been collecting statistics since 1943. With the rise of Big Data and new real-time technologies, the possibilities have multiplied. In this video, the NBA's head of technology explains it very well.

In Spain and other European and Latin American countries we are still a long way behind, but a major push in this area is surely coming soon.

Remember what we told you a few months ago about Moneyball



Here is a breakdown of the technology they use:

 

20 sept. 2016

Differences between Data Analyst, Business Intelligence Developer, Data Scientist and Data Engineer



As the use of analytics spreads across organizations, it becomes harder to tell apart the roles of the people involved. Below is a fairly accurate description of each one.

Data Analyst

Data Analysts are experienced data professionals in their organization who can query and process data, provide reports, summarize and visualize data. They have a strong understanding of how to leverage existing tools and methods to solve a problem, and help people from across the company understand specific queries with ad-hoc reports and charts.
However, they are not expected to deal with analyzing big data, nor are they typically expected to have the mathematical or research background to develop new algorithms for specific problems.

Skills and Tools: Data Analysts need to have a baseline understanding of some core skills: statistics, data munging, data visualization, exploratory data analysis, Microsoft Excel, SPSS, SPSS Modeler, SAS, SAS Miner, SQL, Microsoft Access, Tableau, SSAS.

Business Intelligence Developers

Business Intelligence Developers are data experts that interact more closely with internal stakeholders to understand the reporting needs, and then to collect requirements, design, and build BI and reporting solutions for the company. They have to design, develop and support new and existing data warehouses, ETL packages, cubes, dashboards and analytical reports.
Additionally, they work with databases, both relational and multidimensional, and should have great SQL development skills to integrate data from different resources. They use all of these skills to meet the enterprise-wide self-service needs. BI Developers are typically not expected to perform data analyses.

Skills and tools: ETL, developing reports, OLAP, cubes, web intelligence, business objects design, Tableau, dashboard tools, SQL, SSAS, SSIS.

Data Engineer

Data Engineers are the data professionals who prepare the “big data” infrastructure to be analyzed by Data Scientists. They are software engineers who design, build, integrate data from various resources, and manage big data. Then, they write complex queries on that, make sure it is easily accessible, works smoothly, and their goal is optimizing the performance of their company’s big data ecosystem.
They might also run some ETL (Extract, Transform and Load) on top of big datasets and create big data warehouses that can be used for reporting or analysis by data scientists. Beyond that, because Data Engineers focus more on the design and architecture, they are typically not expected to know any machine learning or analytics for big data.

Skills and tools: Hadoop, MapReduce, Hive, Pig, MySQL, MongoDB, Cassandra, Data streaming, NoSQL, SQL, programming.

Data Scientist

A data scientist is the alchemist of the 21st century: someone who can turn raw data into purified insights. Data scientists apply statistics, machine learning and analytic approaches to solve critical business problems. Their primary function is to help organizations turn their volumes of big data into valuable and actionable insights.
Indeed, data science is not necessarily a new field per se, but it can be considered an advanced level of data analysis that is driven and automated by machine learning and computer science. In other words, in comparison with data analysts, Data Scientists are expected to have strong programming skills, the ability to design new algorithms and handle big data, and some expertise in the relevant domain.

Moreover, Data Scientists are also expected to interpret and eloquently deliver the results of their findings, by visualization techniques, building data science apps, or narrating interesting stories about the solutions to their data (business) problems.
The problem-solving skills of a data scientist require an understanding of traditional and new data analysis methods to build statistical models or discover patterns in data. For example: creating a recommendation engine, predicting the stock market, diagnosing patients based on their similarity, or finding the patterns of fraudulent transactions.
Data Scientists may sometimes be presented with big data without a particular business problem in mind. In this case, the curious Data Scientist is expected to explore the data, come up with the right questions, and provide interesting findings! This is tricky because, in order to analyze the data, a strong Data Scientists should have a very broad knowledge of different techniques in machine learning, data mining, statistics and big data infrastructures.

They should have experience working with datasets of different sizes and shapes, and be able to run their algorithms on large data effectively and efficiently, which typically means staying up to date with the latest cutting-edge technologies. This is why it is essential to know computer science fundamentals and programming, including experience with languages and database (big/small) technologies.


Skills and tools: Python, R, Scala, Apache Spark, Hadoop, data mining tools and algorithms, machine learning, statistics.


Seen at BigDataUniversity

Should every employee know data analysis?



According to VentureBeat, 'every employee should know data analysis':

"I will wager that 99 percent of businesses in the U.S. don't need anyone proficient in C++ or Java. The tech skills required by most employers are substantial but quite different:"


1. The basics of a scripting language. Bash for Unix/Linux, JavaScript for web browsers, or Visual Basic for Applications (VBA) for Microsoft Office are simple coding skills that are easy to learn and valuable for workers across disciplines and levels. These skills allow you to automate tasks, promoting efficiency in manipulation and analysis.
For example, if you run a contest, you could write a simple script to determine if people who’ve entered the contest submitted their content to your site by the specified date. Looking up hundreds of users manually would be very tedious, but this scripting language know-how would make the process efficient.
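The contest example above takes only a few lines in a scripting language. Here is a sketch in Python; the usernames, dates and deadline are invented for illustration.

```python
from datetime import date

# Hypothetical contest entries: (username, submission date).
entries = [
    ("ana", date(2016, 9, 10)),
    ("luis", date(2016, 9, 20)),
    ("marta", date(2016, 10, 2)),
]

DEADLINE = date(2016, 9, 30)  # assumed contest deadline

# Keep only the users who submitted on or before the deadline,
# instead of checking hundreds of entries by hand.
on_time = [user for user, submitted in entries if submitted <= DEADLINE]
print(on_time)
```

The same pattern scales unchanged from three entries to hundreds, which is exactly the efficiency argument the paragraph makes.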



2. Simple SQL commands. These commands are necessary to process raw data and turn it into information that you can analyze and apply.
Sure, the right people on your team should know how to code – but most of them should be writing spreadsheet macros and pivot tables to support your internal business processes, not agile algorithms for entrepreneurial endeavours. They should know the basics of HTML editing and how to set up folders and accounts with the correct security rights for your team. That’s what the bulk of businesses need from technology education.
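The "simple SQL commands" in point 2 can be as short as the query below, run here against an in-memory SQLite database with made-up sample data, to turn raw rows into information you can analyze.

```python
import sqlite3

# Build a throwaway table of raw sales records.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("North", 100.0), ("South", 250.0), ("North", 50.0)],
)

# One GROUP BY is enough to summarize the raw data by region.
rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('North', 150.0), ('South', 250.0)]
```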


3. Deductive reasoning skills. Being able to look at various pieces of data and draw a conclusion is probably the most valuable skill for any employee to have, and surprisingly it’s something that’s too often missing from otherwise technically advanced employees.

16 sept. 2016

Qlikview and Jedox integration


From now on you can combine the world of Qlikview's agile dashboards with Planning and Budgeting tools, thanks to the integration of Qlikview and Jedox.

New to Jedox? Here you can find all the information about this powerful suite. Don't hesitate to contact us.

Check out in the video below, which we created for you, how easy it is to integrate Jedox with Qlik Sense from Qlik. Click here for more information about this integration.



You can also use the Jedox App for Apple and Android for free.


15 sept. 2016

Big Data: Real Time Dashboards with Spark Streaming



When the demo page opens, a WebSocket connection is requested to the endpoint that provides the Wikipedia data.


On the server, a connection with the client is created and, as long as it stays open and no send errors occur, the system fetches data from the "Broadcast Queue" components. These components, in turn, receive data from the REST API, delivered through the HTTP client implemented and used by Spark to send the results.
The "Broadcast Queue" implementation lets every connection to the server read data from the same queue, so each connection receives the message in optimal O(1) time (the computational complexity of taking data off a message queue).


In its role as a message queue, it also keeps the communication between Spark and the server socket optimal, likewise O(1), not counting network delays.


This implementation allows a very large number of clients to connect and watch the data received from Wikipedia in real time.
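The broadcast-queue idea described above can be sketched as follows. This is a minimal, single-threaded illustration, assuming one plain deque per connected client so that both publishing (per subscriber) and consuming are O(1); the class and method names are made up for the example.

```python
from collections import deque

class BroadcastQueue:
    """Fan each incoming message out to every subscriber.
    Each subscriber pops from its own deque, so publish
    (per subscriber) and consume are both O(1)."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self):
        # Each new connection gets its own queue to read from.
        q = deque()
        self._subscribers.append(q)
        return q

    def publish(self, message):
        # O(1) append per subscriber; every connection sees the message.
        for q in self._subscribers:
            q.append(message)

bus = BroadcastQueue()
client_a = bus.subscribe()
client_b = bus.subscribe()
bus.publish({"page": "Main_Page", "edits": 3})
```

After the publish, both clients can pop the same Wikipedia update independently, which is what lets many WebSocket connections share the stream without slowing each other down.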