Big Data Material

We have released informative slide decks on Big Data technologies: Hadoop, HBase, Hive, ZooKeeper...

Introducing the best BI Planning and Budgeting platform

Forecasts, web and Excel-like interface, mobile apps, QlikView, SAP and Salesforce integration...

Pentaho Analytics. A great leap forward

Pentaho 7 has been released, and it comes with big surprises. Discover with us the improvements in the best open BI suite.

The best range of Open Source courses

After the great reception of our eminently hands-on Open Source courses, we are announcing the 2016 sessions.

22 mar. 2017

Job opening: Business Analytics (Business Intelligence, Big Data)

Our colleagues at Stratebi have open positions to work in Business Intelligence, Big Data and Social Intelligence in Madrid and Barcelona. If you are interested, take a look and send us your CV: rrhh@stratebi.com


Open positions:

Due to the expansion of operations in Madrid and Barcelona, we are looking for people who are truly passionate about Business Analytics, who have an interest in Open Source solutions and in the development of open technologies and, above all, who are eager to learn new technologies such as Big Data, Social Intelligence, IoT, etc.

If you come from the front-end world, developing visualizations for web environments, you will also be a good candidate.

If you are reading these lines, you probably like Business Intelligence. We are looking for people with a strong interest in this area, with a solid technical background and some experience implementing Business Intelligence projects at major companies with tools such as Oracle, MySQL, PowerCenter, Business Objects, Pentaho, MicroStrategy... or ad hoc web development. We will also consider candidates without professional experience in this field but with an interest in building a career in this area.


Even better if that experience was with Open Source BI, such as Pentaho, Talend..., plus knowledge of Big Data and Social Media technologies, oriented towards visualization and front-end work.



All of this will be very useful for implementing BI/DW solutions with the Open Source BI platform that is revolutionizing BI: Pentaho, the one we work with most, together with the development of Big Data, Social Intelligence and Smart Cities solutions.

If you already know, or have worked with, Pentaho or other Open Source BI solutions, that will be a point in your favour. In any case, our training plan will let you get to know these solutions and stay up to date with them.

Do you want to know a bit more about us and the kind of people and profiles we are looking for to 'come aboard'?


What do we offer?


- Work in some of the areas with the greatest future and growth in the IT world: Business Intelligence, Big Data and Open Source.
- Help improve Open Source BI solutions, some of which are being developed by the most important technology companies.
- A dynamic work environment, continuous learning and a variety of challenges.
- Objective-based work.
- R&D and innovation treated as a core part of our development work.
- Competitive compensation.
- Being part of a team that values people and talent above all else.


So, if you like the idea, write to us with a note about your interest and a CV at: rrhh@stratebi.com

And if you know someone who you think might be a good fit, feel free to forward this to them.




Some of the technologies we work with:

Database skills:
- Administration
- Development
- Oracle, MySQL, PostgreSQL, Vertica, Big Data

- BI and data warehousing skills with Pentaho or other commercial BI tools (BO, PowerCenter, MicroStrategy...)
- Data warehouse modelling
- ETL
- Dashboards
- Reporting, OLAP...

- Linux skills
- Bash scripting
- Server and service configuration
- Java and J2EE skills
- Tomcat
- JBoss
- Spring
- Hibernate
- Ant
- Git

17 mar. 2017

The dashboard that tracks your entire life



Anand Sharma records his life's ups and downs as a way of leaving the data related to his health to posterity. On the website of his project, Aprilzero, you can see every tiny detail, and very soon you will be able to publish your own as well.



He is working on a tool that lets anyone monitor themselves. It is a new project called Gyrosco.pe, still under development, which is essentially a second version of Aprilzero that is open to the community and integrates many kinds of data:



If you take a look at his website, you will see that the range of aspects analyzed is incredible, although you do miss some tools such as reports, ad hoc dashboards, etc. to exploit all that information.

Seen in the newspaper

11 mar. 2017

More than 20 Big Data Analysis Techniques and Types


Below we describe the main techniques and types of analysis used in Big Data, often grouped under names such as algorithms, machine learning, etc., but not always explained correctly.

Here we have created some online examples using several of these techniques.

If you want to know more, you can also check these related posts:

The 53 keys to understanding Machine Learning
69 keys to understanding Big Data
How to start learning Big Data in 2 hours
Types of roles in Analytics (Business Intelligence, Big Data)
Free book: Big Data, the power of turning data into decisions

Let's see, then, what these techniques are:

1. A/B testing: A technique in which a control group is compared with a variety of test groups in order to determine what treatments (i.e., changes) will improve a given objective variable, e.g., marketing response rate. This technique is also known as split testing or bucket testing. An example application is determining what copy text, layouts, images, or colors will improve conversion rates on an e-commerce Web site. Big data enables huge numbers of tests to be executed and analyzed, ensuring that groups are of sufficient size to detect meaningful (i.e., statistically significant) differences between the control and treatment groups (see statistics). When more than one variable is simultaneously manipulated in the treatment, the multivariate generalization of this technique, which applies statistical modeling, is often called “A/B/N” testing
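
A quick illustrative sketch (not from the original text; it assumes scipy is available and the conversion counts below are invented) of the statistical check behind an A/B test, using a chi-square test on a 2x2 contingency table:

# Compare conversion rates of a control and a variant (hypothetical numbers).
from scipy.stats import chi2_contingency

observed = [[120, 9880],   # variant A: 120 conversions out of 10,000 visitors
            [150, 9850]]   # variant B: 150 conversions out of 10,000 visitors

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"p-value = {p_value:.4f}")
if p_value < 0.05:
    print("Difference is statistically significant at the 5% level")
else:
    print("No significant difference detected; keep testing")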

2. Association rule learning: A set of techniques for discovering interesting relationships, i.e., “association rules,” among variables in large databases. These techniques consist of a variety of algorithms to generate and test possible rules. One application is market basket analysis, in which a retailer can determine which products are frequently bought together and use this information for marketing (a commonly cited example is the discovery that many supermarket shoppers who buy diapers also tend to buy beer). Used for data mining.
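
As a small illustration (a pure-Python sketch with toy transactions, not taken from the source), support and confidence for a rule like {diapers} -> {beer} can be computed directly from a basket list:

# Toy market-basket data; compute support and confidence for diapers -> beer.
from itertools import combinations
from collections import Counter

transactions = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"milk", "bread"},
    {"diapers", "milk"},
    {"beer", "bread"},
]

n = len(transactions)
item_counts = Counter(item for t in transactions for item in t)
pair_counts = Counter(frozenset(p) for t in transactions for p in combinations(t, 2))

both = pair_counts[frozenset({"diapers", "beer"})]
support = both / n                              # how often the pair appears together
confidence = both / item_counts["diapers"]      # P(beer | diapers)
print(f"support = {support:.2f}, confidence = {confidence:.2f}")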

3. Classification: A set of techniques to identify the categories in which new data points belong, based on a training set containing data points that have already been categorized. One application is the prediction of segment-specific customer behavior (e.g., buying decisions, churn rate, consumption rate) where there is a clear hypothesis or objective outcome. These techniques are often described as supervised learning because of the existence of a training set; they stand in contrast to cluster analysis, a type of unsupervised learning. Used for data mining.
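
A minimal supervised-classification sketch (assuming scikit-learn is installed; the iris dataset simply stands in for labeled business data):

# Train a classifier on labeled examples, then predict categories for new points.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = DecisionTreeClassifier(max_depth=3, random_state=0)
model.fit(X_train, y_train)                 # learn from the training set
predictions = model.predict(X_test)         # classify previously unseen points
print("accuracy:", accuracy_score(y_test, predictions))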

4. Cluster analysis: A statistical method for classifying objects that splits a diverse group into smaller groups of similar objects, whose characteristics of similarity are not known in advance. An example of cluster analysis is segmenting consumers into self-similar groups for targeted marketing. This is a type of unsupervised learning because training data are not used. This technique is in contrast to classification, a type of supervised learning. Used for data mining.
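
By contrast, an unsupervised clustering sketch (again assuming scikit-learn; the two synthetic "segments" are invented for illustration):

# Group unlabeled 2D points into k clusters with k-means.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
data = np.vstack([rng.normal([0, 0], 1.0, size=(50, 2)),    # segment 1
                  rng.normal([5, 5], 1.0, size=(50, 2))])   # segment 2

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(data)
print("cluster sizes:", np.bincount(kmeans.labels_))
print("cluster centers:\n", kmeans.cluster_centers_)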

5. Crowdsourcing: A technique for collecting data submitted by a large group of people or community (i.e., the “crowd”) through an open call, usually through networked media such as the Web. This is a type of mass collaboration and an instance of using Web 2.0.

6. Data fusion and data integration: A set of techniques that integrate and analyze data from multiple sources in order to develop insights in ways that are more efficient and potentially more accurate than if they were developed by analyzing a single source of data. Signal processing techniques can be used to implement some types of data fusion. One example of an application is sensor data from the Internet of Things being combined to develop an integrated perspective on the performance of a complex distributed system such as an oil refinery. Data from social media, analyzed by natural language processing, can be combined with real-time sales data, in order to determine what effect a marketing campaign is having on customer sentiment and purchasing behavior.

7. Data mining: A set of techniques to extract patterns from large datasets by combining methods from statistics and machine learning with database management. These techniques include association rule learning, cluster analysis, classification, and regression. Applications include mining customer data to determine segments most likely to respond to an offer, mining human resources data to identify characteristics of most successful employees, or market basket analysis to model the purchase behavior of customers

8. Ensemble learning: Using multiple predictive models (each developed using statistics and/or machine learning) to obtain better predictive performance than could be obtained from any of the constituent models. This is a type of supervised learning.
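
A small sketch of the idea (scikit-learn assumed; the dataset and base models are arbitrary choices): combine several models with majority voting and compare against a single model via cross-validation:

# Ensemble of three different classifiers vs. a single decision tree.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

single = DecisionTreeClassifier(random_state=0)
ensemble = VotingClassifier(estimators=[
    ("lr", LogisticRegression(max_iter=5000)),
    ("tree", DecisionTreeClassifier(random_state=0)),
    ("nb", GaussianNB()),
])

print("single tree:", cross_val_score(single, X, y, cv=5).mean())
print("ensemble   :", cross_val_score(ensemble, X, y, cv=5).mean())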

9. Genetic algorithms: A technique used for optimization that is inspired by the process of natural evolution or “survival of the fittest.” In this technique, potential solutions are encoded as “chromosomes” that can combine and mutate. These individual chromosomes are selected for survival within a modeled “environment” that determines the fitness or performance of each individual in the population. Often described as a type of “evolutionary algorithm,” these algorithms are well-suited for solving nonlinear problems. Examples of applications include improving job scheduling in manufacturing and optimizing the performance of an investment portfolio.
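
A toy sketch of the mechanics (pure Python, not from the source): evolve bit strings towards a trivial fitness function, the number of 1-bits, using selection, crossover and mutation:

# "One-max" genetic algorithm: maximize the number of 1s in a bit string.
import random

random.seed(0)
GENES, POP, GENERATIONS, MUTATION = 20, 30, 40, 0.02

def fitness(chromosome):
    return sum(chromosome)                  # toy objective: count the 1s

population = [[random.randint(0, 1) for _ in range(GENES)] for _ in range(POP)]

for _ in range(GENERATIONS):
    population.sort(key=fitness, reverse=True)
    parents = population[:POP // 2]         # selection: keep the fitter half
    children = []
    while len(parents) + len(children) < POP:
        a, b = random.sample(parents, 2)    # crossover of two parents
        cut = random.randrange(1, GENES)
        child = a[:cut] + b[cut:]
        # mutation: flip each gene with a small probability
        child = [1 - g if random.random() < MUTATION else g for g in child]
        children.append(child)
    population = parents + children

best = max(population, key=fitness)
print("best fitness:", fitness(best), "out of", GENES)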

10. Machine learning: A subspecialty of computer science (within a field historically called “artificial intelligence”) concerned with the design and development of algorithms that allow computers to evolve behaviors based on empirical data. A major focus of machine learning research is to automatically learn to recognize complex patterns and make intelligent decisions based on data. Natural language processing is an example of machine learning

11. Natural language processing (NLP): A set of techniques from a subspecialty of computer science (within a field historically called “artificial intelligence”) and linguistics that uses computer algorithms to analyze human (natural) language. Many NLP techniques are types of machine learning. One application of NLP is using sentiment analysis on social media to determine how prospective customers are reacting to a branding campaign.

12. Neural networks: Computational models, inspired by the structure and workings of biological neural networks (i.e., the cells and connections within a brain), that find patterns in data. Neural networks are well-suited for finding nonlinear patterns. They can be used for pattern recognition and optimization. Some neural network applications involve supervised learning and others involve unsupervised learning. Examples of applications include identifying high-value customers that are at risk of leaving a particular company and identifying fraudulent insurance claims.
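
A minimal sketch (scikit-learn assumed; the synthetic data stands in for, say, customer features and churn labels): a small multi-layer perceptron trained as a classifier:

# Small feed-forward neural network for binary classification.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

net = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=1000, random_state=0)
net.fit(X_train, y_train)
print("test accuracy:", net.score(X_test, y_test))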

13. Network analysis: A set of techniques used to characterize relationships among discrete nodes in a graph or a network. In social network analysis, connections between individuals in a community or organization are analyzed, e.g., how information travels, or who has the most influence over whom. Examples of applications include identifying key opinion leaders to target for marketing, and identifying bottlenecks in enterprise information flows.

14. Optimization: A portfolio of numerical techniques used to redesign complex systems and processes to improve their performance according to one or more objective measures (e.g., cost, speed, or reliability). Examples of applications include improving operational processes such as scheduling, routing, and floor layout, and making strategic decisions such as product range strategy, linked investment analysis, and R&D portfolio strategy. Genetic algorithms are an example of an optimization technique

15. Pattern recognition: A set of machine learning techniques that assign some sort of output value (or label) to a given input value (or instance) according to a specific algorithm. Classification techniques are an example.

16. Predictive modeling: A set of techniques in which a mathematical model is created or chosen to best predict the probability of an outcome. An example of an application in customer relationship management is the use of predictive models to estimate the likelihood that a customer will “churn” (i.e., change providers) or the likelihood that a customer can be cross-sold another product. Regression is one example of the many predictive modeling techniques.

17. Regression: A set of statistical techniques to determine how the value of the dependent variable changes when one or more independent variables is modified. Often used for forecasting or prediction. Examples of applications include forecasting sales volumes based on various market and economic variables or determining what measurable manufacturing parameters most influence customer satisfaction. Used for data mining.
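
An illustrative sketch (scikit-learn and NumPy assumed; the "ad spend" and "price" drivers and their coefficients are synthetic): fit a linear model and use it for a simple forecast:

# Linear regression of sales volume on two hypothetical drivers.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
ad_spend = rng.uniform(0, 100, 200)
price = rng.uniform(5, 15, 200)
sales = 50 + 2.0 * ad_spend - 3.0 * price + rng.normal(0, 5, 200)   # synthetic truth

X = np.column_stack([ad_spend, price])
model = LinearRegression().fit(X, sales)
print("estimated coefficients:", model.coef_)            # roughly [2.0, -3.0]
print("forecast (ad_spend=80, price=10):", model.predict([[80, 10]])[0])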

18. Sentiment analysis: Application of natural language processing and other analytic techniques to identify and extract subjective information from source text material. Key aspects of these analyses include identifying the feature, aspect, or product about which a sentiment is being expressed, and determining the type, “polarity” (i.e., positive, negative, or neutral) and the degree and strength of the sentiment. Examples of applications include companies applying sentiment analysis to analyze social media (e.g., blogs, microblogs, and social networks) to determine how different customer segments and stakeholders are reacting to their products and actions.

19. Signal processing: A set of techniques from electrical engineering and applied mathematics originally developed to analyze discrete and continuous signals, i.e., representations of analog physical quantities (even if represented digitally) such as radio signals, sounds, and images. This category includes techniques from signal detection theory, which quantifies the ability to discern between signal and noise. Sample applications include modeling for time series analysis or implementing data fusion to determine a more precise reading by combining data from a set of less precise data sources (i.e., extracting the signal from the noise).

20. Spatial analysis: A set of techniques, some applied from statistics, which analyze the topological, geometric, or geographic properties encoded in a data set. Often the data for spatial analysis come from geographic information systems (GIS) that capture data including location information, e.g., addresses or latitude/longitude coordinates. Examples of applications include the incorporation of spatial data into spatial regressions (e.g., how is consumer willingness to purchase a product correlated with location?) or simulations (e.g., how would a manufacturing supply chain network perform with sites in different locations?).

21. Statistics: The science of the collection, organization, and interpretation of data, including the design of surveys and experiments. Statistical techniques are often used to make judgments about what relationships between variables could have occurred by chance (the “null hypothesis”), and what relationships between variables likely result from some kind of underlying causal relationship (i.e., that are “statistically significant”). Statistical techniques are also used to reduce the likelihood of Type I errors (“false positives”) and Type II errors (“false negatives”). An example of an application is A/B testing to determine what types of marketing material will most increase revenue.

22. Supervised learning: The set of machine learning techniques that infer a function or relationship from a set of training data. Examples include classification and support vector machines. This is different from unsupervised learning.

23. Simulation: Modeling the behavior of complex systems, often used for forecasting, predicting and scenario planning. Monte Carlo simulations, for example, are a class of algorithms that rely on repeated random sampling, i.e., running thousands of simulations, each based on different assumptions. The result is a histogram that gives a probability distribution of outcomes. One application is assessing the likelihood of meeting financial targets given uncertainties about the success of various initiatives
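
A compact Monte Carlo sketch (NumPy assumed; the three initiatives and their parameters are invented): simulate total revenue many times and estimate the probability of reaching a target:

# Monte Carlo simulation of uncertain revenue from three initiatives.
import numpy as np

rng = np.random.default_rng(0)
runs = 100_000

# (expected revenue, standard deviation, probability of success) -- hypothetical
initiatives = [(2.0, 0.5, 0.8), (1.5, 0.4, 0.6), (3.0, 1.0, 0.5)]

totals = np.zeros(runs)
for mean, std, p_success in initiatives:
    succeeded = rng.random(runs) < p_success
    totals += succeeded * rng.normal(mean, std, runs)

target = 4.0
print(f"P(total revenue >= {target}) ~ {np.mean(totals >= target):.2%}")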

24. Time series analysis: Set of techniques from both statistics and signal processing for analyzing sequences of data points, representing values at successive times, to extract meaningful characteristics from the data. Examples of time series analysis include the hourly value of a stock market index or the number of patients diagnosed with a given condition every day. Time series forecasting is the use of a model to predict future values of a time series based on known past values of the same or other series. Some of these techniques, e.g., structural modeling, decompose a series into trend, seasonal, and residual components, which can be useful for identifying cyclical patterns in the data. Examples of applications include forecasting sales figures, or predicting the number of people who will be diagnosed with an infectious disease.
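
A short decomposition sketch (pandas and statsmodels assumed; the monthly series is synthetic): split a series into trend, seasonal and residual components:

# Decompose a synthetic monthly sales series into trend + seasonality + noise.
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

rng = np.random.default_rng(0)
months = pd.date_range("2014-01-01", periods=48, freq="MS")
values = (np.linspace(100, 160, 48)                        # upward trend
          + 10 * np.sin(2 * np.pi * np.arange(48) / 12)    # yearly seasonality
          + rng.normal(0, 3, 48))                          # noise
series = pd.Series(values, index=months)

result = seasonal_decompose(series, model="additive", period=12)
print(result.seasonal.head(12))    # the repeating monthly pattern
print(result.trend.dropna().head())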

25. Unsupervised learning: A set of machine learning techniques that finds hidden structure in unlabeled data. Cluster analysis is an example of unsupervised learning (in contrast to supervised learning).


26. Visualization: Techniques used for creating images, diagrams, or animations to communicate, understand, and improve the results of big data analyses.

Seen at Big Data Made Simple

5 mar. 2017

Why can't I make decisions even though I have a dashboard?



This reflection by Tristán Elosegui, already a couple of years old but still fully relevant, is very interesting. Below we list the main points he makes:

At TodoBI we talk a lot about dashboards (see posts); among them we would highlight:

12 free applications for creating dashboards
Tutorial on building Open Source dashboards
Dashboard examples
- Balanced Scorecard

According to Tristán, companies have a huge amount of data at their fingertips, but they cannot bring order to so much chaos and, as a result, they do not have a clear view of the situation.

The noise is greater than the 'signal'

The volume of data and the speed at which it is generated produce more noise than signal.
This situation leads companies to make decisions without the necessary data, or into post-analysis paralysis, instead of enabling action (decision making).
Data arrives from different sources, in different formats, from different tools... and it all ends up in reports that people try to consolidate into a dashboard that helps them make decisions.

Why, with so much data, are companies unable to make strategic decisions?

Having a lot of data does not always mean having a better view of the situation. Surely more than one of you reading this post will feel identified.
Companies make decisions based on data every day (and without data too); the problem is that these decisions are tactical, because they are made in silos (by area).
To make decisions that optimize the company's overall strategy we need:
  • to have the data needed, no more and no less, to make them (the most complete picture of the context possible), and
  • to be able to understand the data, in order to turn it into information and then into knowledge.
There is nothing worse than having travelled the whole road to a strategic dashboard only for the person who has to make the decisions not to make them. Why does this happen?

Lack of context

The main reason for not making decisions is that the data shown on the dashboard is not relevant, not actionable.
This happens when the dashboard has not been defined correctly (the right steps are laid out in the digital analytics maturity model). The most common mistakes are:
  • Poorly defined objectives and KPIs: if the starting point is badly defined, everything that follows will lead us astray. And of course, the context will be completely wrong.
  • Irrelevant or non-actionable data: either because of a poor definition of the objectives and of the KPIs that help us track them, or simply because we have chosen the wrong data, we end up with a dashboard full of numbers and charts that does not let us make decisions. Either it does not show data aligned with the decision maker's area of responsibility, or the data is simply not actionable. In either case the result is the same.
  • Incomplete data: the opposite extreme of the previous case. We are missing the data needed to make decisions.

Data visualization

The second big problem is that the person who has to make the decisions does not understand the data.
Just as we have to show each stakeholder the data that is relevant to their work (the previous case), we have to adapt the language and the visualization so that the decision maker understands what they are looking at.
So, for a strategic dashboard to work, you must start by defining your objectives and KPIs well, work on data quality, make sure the data is telling you what you actually care about, and integrate data from the different sources you use.

Do not skip any phase of the digital analytics maturity model, or you may run into the problems we have covered in this post.

Read the full article

28 feb. 2017

Machine Learning: Choosing the right estimator



Often the hardest part of solving a machine learning problem can be finding the right estimator for the job.
Different estimators are better suited for different types of data and different problems.


The flowchart below, from scikit-learn, is designed to give users a rough guide on how to approach problems and which estimators to try on their data.
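
As a rough way to act on that guide (a minimal sketch, assuming scikit-learn is installed; the digits dataset and the three candidates are arbitrary examples), you can compare a few estimators with cross-validation before committing to one:

# Try several candidate estimators on the same data and compare CV scores.
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.ensemble import RandomForestClassifier

X, y = load_digits(return_X_y=True)

candidates = {
    "LogisticRegression": LogisticRegression(max_iter=2000),
    "LinearSVC": LinearSVC(dual=False),
    "RandomForest": RandomForestClassifier(n_estimators=200, random_state=0),
}

for name, estimator in candidates.items():
    score = cross_val_score(estimator, X, y, cv=5).mean()
    print(f"{name:20s} mean CV accuracy = {score:.3f}")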

24 feb. 2017

Leaflet and R


Leaflet 1.1.0 is now available on CRAN! The Leaflet package is a tidy wrapper for the Leaflet.js mapping library, and makes it incredibly easy to generate interactive maps based on spatial data you have in R.




Leaflet is one of the most popular open-source JavaScript libraries for interactive maps. It’s used by websites ranging from The New York Times and The Washington Post to GitHub and Flickr, as well as GIS specialists like OpenStreetMap, Mapbox, and CartoDB

This release was nearly a year in the making, and includes many important new features.
  • Easily add textual labels on markers, polygons, etc., either on hover or statically
  • Highlight polygons, lines, circles, and rectangles on hover
  • Markers can now be configured with a variety of colors and icons, via integration with Leaflet.awesome-markers
  • Built-in support for many types of objects from sf, a new way of representing spatial data in R (all basic sf/sfc/sfg types except MULTIPOINT and GEOMETRYCOLLECTION are directly supported)
  • Projections other than Web Mercator are now supported via Proj4Leaflet
  • Color palette functions now natively support viridis palettes; use "viridis", "magma", "inferno", or "plasma" as the palette argument
  • Discrete color palette functions (colorBin, colorQuantile, and colorFactor) work much better with color brewer palettes
  • Integration with several Leaflet.js utility plugins
  • Data with NA points or zero rows no longer causes errors
  • Support for linked brushing and filtering, via Crosstalk (more about this to come in another blog post)
Seen at blog.rstudio

23 feb. 2017

Citus 6.1 released: scale out your PostgreSQL database


Interesting news from Citus Data; see the Community Edition.

Citus is a distributed database that lets you scale out PostgreSQL (one of our favourite databases), keeping all of PostgreSQL's functionality while adding the advantages of scaling out.

Microservices and NoSQL get a lot of hype, but in many cases what you really want is a relational database that simply works and can easily scale as your application data grows. Microservices can help you split up areas of concern, but they also introduce complexity and often heavy engineering work to migrate to them. Yet, there are a lot of monolithic apps out there that do need to scale.

If you don’t want the added complexity of microservices, but do need to continue scaling your relational database then you can with Citus. With Citus 6.1 we’re continuing to make scaling out your database even easier with all the benefits of Postgres (SQL, JSONB, PostGIS, indexes, etc.) still packed in there.

With this new release customers like Heap and Convertflow are able to scale from single node Postgres to horizontal linear scale. Citus 6.1 brings several improvements, making scaling your multi-tenant app even easier. These include:
  • Integrated reference table support
  • Tenant Isolation
  • View support on distributed tables
  • Distributed Vacuum / Analyze

All of this with the same language bindings, clients, drivers, libraries (like ActiveRecord) that Postgres already works with

21 feb. 2017

How to create your own Dashboards in Pentaho?



Just a sneak preview of the new functionalities we are adding to Pentaho so that end users can create their own powerful dashboards in minutes. We call it STDashboard, built by our colleagues at Stratebi.

These new functionalities include: new templates, panel resizing, drag and drop, removing and creating panels, a Pentaho 7 upgrade...

As always, and like other Pentaho plugins, it is free and included in all of our projects. Check the DemoPentaho Online demo, where all new components are updated frequently.

You can also use it directly in your own projects, with configuration, training and support from us.


Video in action (Dashboards in minutes):

15 feb. 2017

Glossary of Business Intelligence Terms


For everyone who is getting started in the Business Intelligence world, here is a glossary of the main Business Intelligence terms. Seen on the Panorama blog.

If you want to play with an open, open source demo to get to know and try out these concepts, it is the best way to become familiar with them.

Business Intelligence Glossary:

  • Automated Analysis: Automatic analysis of data to find hidden insights in the data and show users the answers to questions they have not even thought of yet.
  • BI Analyst: As stated by modernanalyst.com, a data analyst is a professional who is in charge of analyzing and mining data to identify patterns and correlations, mapping and tracing data from system to system in order to solve a problem, using BI and data discovery tools to help business executives in their decision making, and perform statistical analysis of business data, among other things. (Can be called a data analyst too)
  • BI Governance: According to Boris Evelson, from Forrester Research, BI governance is a key part of data governance, but it focuses on a BI system and governs over who uses the data, when, and how.
  • Big Data: Enormous and complex data sets that traditional data processing tools cannot deal with.
  • Bottlenecks: Points of congestion or blockage that hinder the efficiency of the BI system.
  • Business Intelligence: According to Gartner, “Business Intelligence is an umbrella term that includes the applications, infrastructure and tools, and best practices that enable access to and analysis of information to improve and optimize decisions and performance.”
  • Centralized Business Intelligence: A BI model that enables users to work connected and share insights, while seeing the same and only version of the truth. IT governs over data permissions to ensure data security.
  • Collaborative BI: An approach to Business Intelligence where the BI tool empowers users to collaborate between colleagues, share insights, and drive collective knowledge to improve decision making.
  • Collective Knowledge: Knowledge that benefits the whole enterprise as it comes from the sharing of insights and data findings across groups and departments to enrich analysis.
  • Dark Data: According to Gartner, the definition for Dark Data is “information assets that organizations collect, process and store in the course of their regular business activity, but generally fail to use for other purposes”. 90% of companies’ data is dark data.
  • Dashboards: A data visualization tool that displays the current enterprise health, the status of metric and KPIs, and the current data analysis and insights.
  • Data Analyst: As stated by modernanalyst.com, a data analyst is a professional who is in charge of analyzing and mining data to identify patterns and correlations, mapping and tracing data from system to system in order to solve a problem, using BI and data discovery tools to help business executives in their decision making, and perform statistical analysis of business data, among other things.
  • Data Analytics: According to TechTarget, “data analytics is the process of examining data sets in order to draw conclusions about the information they contain, increasingly with the aid of specialized systems and software.”
  • Data Governance: According to Boris Evelson, from Forrester Research, data governance “deals with the entire spectrum (creation, transformation, ownership, etc.) of people, processes, policies, and technologies that manage and govern an enterprise’s use of its data assets (such as data governance stewardship applications, master data management, metadata management, and data quality).”
  • Data Mashup: The integration of multiple data sets into a unified analytical and visual representation.
  • Data Silos: According to Tech Target, a data silo is “data that is under the control of one department or person and is isolated from the rest of the organization.” Data silos are a bottleneck for effective business operations.
  • Data Sources: The source where the data to be analyzed comes from. It can be a file, a database, a dataset, etc. Modern BI solutions like Necto can mashup data from multiple data sources.
  • Data Visualization: The graphic visualization of data. Can include traditional forms like graphs and charts, and modern forms like infographics.
  • Data Warehouse: A relational database that integrates data from multiple sources within a company.
  • Embedded Analytics: The integration of reporting and data analytic capabilities in a BI solution. Users can access full data analysis capabilities without having to leave their BI platform.
  • Excel Hell: A situation where the enterprise is full of unnecessary copies of data, thousands of spreadsheets get shared, and no one knows with certainty which is the most updated and real version of the data.
  • Federated Business Intelligence: A BI model where users work in separate desktops, creating data silos and unnecessary copies of data, leading to multiple versions of the truth.
  • Geo-analytic capabilities: The ability that a BI or data discovery tool has to analyze data by geographical area and reflect such analysis on maps on the user’s dashboard.
  • Infographics: Visual representations of data that are easily understandable and drive engagement.
  • Insights: According to Forrester Research, insights are “actionable knowledge in the context of a process or decision.”
  • KPI: Key Performance Indicator. A quantifiable measure that a business uses to determine how well it meets the set operational and strategic goals. KPIs give managers insights of what is happening at any specific moment and allow them to see in what direction things are going.
  • Modern BI: An approach to BI using state of the art technology, providing a centralized and secure platform where business users can enjoy self-service capabilities and IT can govern over data security.
  • OLAP: Stands for Online Analytical Processing and it is a technology for data discovery invented by Panorama Software and then sold to Microsoft in 1996. It has many capabilities, such as complex analytics, predictive “what if” scenario planning, and limitless report viewing.
  • Scalability: The ability of a BI solution to be used by a larger number of users as time passes.
  • Self-Service BI: An approach that allows business users to access and work with data sources even though they do not have an analyst or computer science background. They can access, profile, prepare, integrate, curate, model, and enrich data for analysis and consumption by BI platforms. In order to have successful self-service BI, the BI tool must be centralized and governed by IT.
  • Smart Data: Smaller data sets from Big Data that are valuable to the enterprise and can be turned into actionable data.
  • Smart Data Discovery: The processing and analysis of Smart Data to discover insights that can be turned into actions to make data-driven decisions in an organization.
  • Social BI: An approach where social media capabilities, such as social networking, crowdsourcing, and thread-based discussions are embedded into Business Intelligence so that users can communicate and share insights.
  • Social Enterprise: An enterprise that has a new level of corporate connectivity, leveraging the social grid to share and collaborate on information and ideas. It drives a more efficient operation where problems are uncovered and fixed before they can affect the revenue streams.
  • SQL: Stands for Structured Query Language. It is a language used in programming for managing relational databases and data manipulation.
  • State of the Art BI: The highest level of technology, the most up-to date features, and the best analysis capabilities in a Business Intelligence solution.
  • Suggestive Discovery Engine: An engine behind the program that recommends to the users the most relevant insights to focus on, based on personal preferences and behavior.
  • Systems of Insight: This is a term coined by Boris Evelson, VP of Forrester Research. It is a Business Intelligence system that combines data availability with business agility, where both IT and business users work together to achieve their goals.
  • Workboards: An interactive data visualization tool. It is like a dashboard that displays the current status of KPIs and other data analysis, with the possibility to work directly on it and do further analysis.