Data Analysis: Automation And Techniques For Best Practices
Chapter 4: Analysis and Comparison
The process of data analysis is complete only once the collected data has been analyzed and interpreted. The results of the analysis dictate the actions that must be taken and inform our understanding of societal needs and characteristics. In short, data analysis reveals many features of the sampled population and directs the appropriate course of action.
Analysis of voluminous data requires a high level of expertise and care, and analyzing complex data in the present-day scientific world is an uphill task. To avoid problems such as data redundancy and data defects, to standardize analysis procedures, and to support causal analysis, automated methods have been developed and deployed across every stage at which data enters human activity, from collection through analysis, reporting, and interpretation of results (John, 2017). This has been made possible by technological innovations that have enabled the development of software applications for these processes. Technologies such as Artificial Intelligence (AI) have helped reduce the costs incurred in data analysis and related processes, as well as their susceptibility to error.
The human race has been, and remains, on an endless journey to automate virtually every human process and activity, and the data analysis domain is no exception. Automation of data analysis envisions a “drag and drop” scenario in which an “analyst in a box” would apply best practices in data analysis systematically and generate logs of all processes and stages, with accompanying reasons and justifications for each stage of the analysis (Arundat, et al., 2008). With such automation, data analysts would only need to review the reports and interpretations produced through open-source applications such as R.
Automating data analysis means automating best practices so that relevant and complex statistical calculations are performed automatically and the results are reported and interpreted clearly, concisely, and in a way that is accessible to the outside world. Pattern recognition, computer vision, and artificial intelligence are examples of the many computing innovations that have helped automate the arduous tasks of identifying mathematical models that capture the characteristics and features of data, and of finding candidate models suitable for fitting and evaluating that data.
Greet Peersman suggests that methods should be chosen to match the specific evaluation at hand, with regard to its key evaluation questions (KEQs) and the available resources (Greet, 2014). In this article, presented to UNICEF, Peersman argues for a mixed-method approach to evaluation on the grounds that mixed methods minimize the errors and weaknesses inherent in individual methods. Such methods include qualitative and quantitative methodologies, which can be systematically integrated to produce the best possible results.
Data analysis methods can be divided into two categories: software for computational statistics, which includes statistical software packages; and computational statistical systems, which include computer systems such as artificial intelligence and expert (statistical) systems.
Pattern Recognition
The nature of the patterns inherent in data is bound to change as the channels through which data arrives multiply and diversify. These changes increase complexity and pose new challenges for data mining for several reasons: sensitive data may originate from different sources and can be hard to amalgamate or merge, and even if it were merged, analysis would still be difficult because the local properties of each source's data must be retained. Pattern recognition comes to our aid here by giving rise to a new suite of highly efficient mining techniques that discover important patterns in data acquired from different sources (Animesh, et al., 2014).
4.1 Analysis of Solutions
In disciplines related to science and engineering, pattern recognition techniques support activities such as automatic machine recognition, classification, pattern grouping, and the description of patterns. Pattern recognition has also been applied successfully in disciplines such as medicine, biology, chemistry, marketing, and psychology, where a pattern is defined as a single entity, perhaps vaguely defined but nameable: the opposite of chaos. Patterns appear in many formats, including but not limited to fingerprints, handwritten cursive words, speech or voice signals, and human faces.
Pattern recognition is achievable through two techniques: supervised and unsupervised learning. In supervised learning, also called discriminant analysis, the input pattern is identified as a member of one of the classes predefined during the learning/training process. Unsupervised learning, on the other hand, refers to an analysis technique in which the pattern under analysis is assigned to a class that is not known in advance; the algorithm itself must discover the grouping.
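To make the distinction concrete, the following minimal sketch (in Python, using scikit-learn on synthetic, hypothetical data) first trains a classifier on labelled patterns and then lets a clustering algorithm discover groups in the same feature space without any labels.

```python
# Minimal sketch: the same two-class feature space handled in supervised and
# unsupervised fashion. Data and parameters are hypothetical.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
class_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))   # synthetic class 0
class_b = rng.normal(loc=[3.0, 3.0], scale=0.5, size=(50, 2))   # synthetic class 1
X = np.vstack([class_a, class_b])
y = np.array([0] * 50 + [1] * 50)

# Supervised learning (discriminant analysis): classes are predefined during training.
clf = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(clf.predict([[0.2, 0.1], [2.8, 3.1]]))       # each pattern is assigned a known class

# Unsupervised learning: no labels are given; the algorithm discovers the grouping.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_[:5], km.labels_[-5:])             # cluster indices, not named classes
```

The supervised model can only return one of the classes it saw during training, whereas the clustering step invents its own group labels.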
Pattern recognition as a data analysis technique has been spurred in the recent past by many emerging applications, such as correlation analysis and document classification, which have proved to be complex, challenging, and computationally demanding. The table below summarizes some of the many applications of pattern recognition in the present-day world.
Table 1: Applications of pattern recognition techniques

Domain of the Problem | How and Where Pattern Recognition Is Applied | Input Pattern (Training Set) | Pattern Class
Bioinformatics | Sequence analysis | DNA/protein sequences for matching or analysis | Known types of genes and patterns
Data mining | Searching for meaningful patterns in data | Points in a multidimensional space | Compact, well-separated clusters
Document image analysis | Reading machines for the visually impaired | Document image | Alphanumeric characters and words
Information retrieval from multimedia databases | Internet search | Video clip | Different video genres
Remote sensing | Forecasting crop yields | Multispectral image | Land-use categories, crop growth patterns
Speech recognition | Telephone-based enquiry, security and privacy | Speech waveform | Words spoken by an individual
Industrial automation | Printed circuit board inspection | Range and/or intensity image | Defective or non-defective product
Pattern recognition, as a statistical analysis technique, has been employed successfully in the design of efficient commercial recognition systems. In the statistical approach, patterns are represented by feature vectors of known dimensionality. Well-established concepts from statistical decision theory are then employed to establish decision boundaries between the pattern classes.
Recognition of patterns is realized in two modes. The first is training, in which the classification algorithm is built, and the second is classification, also called testing. Features found to be good representatives of the input patterns are extracted during training, and using these features the classifying algorithm (the classifier) learns how to partition the feature space. In classification mode, the trained classifier assigns an input pattern to one of the existing pattern classes on the basis of the measured features. The figure below represents a model of the statistical pattern recognition method of data analysis.
A given pattern is assigned to one of a set of categories on the basis of a vector of known feature values. These features are assumed to have a probability density or mass function, conditioned on the pattern class, depending on whether the features are continuous or discrete. Statistical analysis thus ensures that each pattern is assigned to a single category or class on the basis of its features.
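A minimal sketch of this statistical view, assuming synthetic data and the scikit-learn library: each class is modelled by an estimated class-conditional density, and a new pattern is assigned to the single most probable class.

```python
# Hedged sketch of statistical pattern recognition: per-class densities are
# estimated from training data, then a pattern is assigned to the most likely class.
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(1)
X_train = np.vstack([rng.normal(0, 1, (40, 3)),    # hypothetical class-0 feature vectors
                     rng.normal(4, 1, (40, 3))])   # hypothetical class-1 feature vectors
y_train = np.array([0] * 40 + [1] * 40)

model = GaussianNB().fit(X_train, y_train)         # estimates class-conditional densities
pattern = np.array([[3.5, 4.2, 3.9]])
print(model.predict(pattern))                      # decision: the single most probable class
print(model.predict_proba(pattern))                # posterior probability for each class
```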
Artificial Intelligence
Decision analysis, as practiced in operational research, management science, and statistical decision theory, is increasingly engaged in replacing the intuitive processes of decision making with formal, scientific approaches. The science behind artificial intelligence revolves around the development of methods and systems that exhibit human behavior, including human intelligence, thinking, and reasoning capacity. In a nutshell, AI systems are built with the aim of imitating human thinking, reasoning, learning, sensing, and language processing.
4.2 Comparison Criteria
The ‘arms race’ toward realizing this dream has led to the development of artificial intelligence techniques used to analyze ever more complex and voluminous data, including Natural Language Processing (NLP), expert systems, and intelligent knowledge-based systems, among many others. Artificial intelligence is defined as the ability of a machine to possess and exhibit human abilities, features, and intelligence (Dewhurst & Cwinnett, 1990). Machines, robots, and computer systems that are termed intelligent, or said to exhibit human behavior, have the ability to reason, learn, solve problems, draw perceptions, and process natural human language. These are some of the characteristics that researchers found in AI-based systems and that have made them fit for data and solution analysis in recent years (Copeland, 2018).
Artificial intelligence has brought about tremendous developments in automating data analytics. DeepMind, for example, has developed game-playing systems that are on record as beating human players using AI techniques. Advances in AI have also propelled progress in areas such as deep learning and reinforcement learning, techniques that can automatically learn to recognize and classify images, text, and speech from different channels with great precision and accuracy.
It is important to note that a large difference exists between AI applications that learn to play games and those used in data analytics: the latter function on, and are controlled by, data that originates from people or is otherwise provided to them, while other AI systems are driven by language scenarios and structures that only humans can understand.
What makes artificial intelligence the ultimate resort for complex data and problem analysis is its learning ability: AI-based systems learn progressively from any given scenario and store the results of each execution for future reference.
There is no doubt that the fundamental changes brought about by artificial intelligence will span both the short and the long term, and will continue for as long as the world races toward full automation, as has been witnessed in the twenty-first century. Artificial intelligence techniques are well suited to training on scenarios for future use and to analyzing every possible solution before settling on one. Artificial intelligence revolves around two main characteristics, described below.
Symbolic Processing
Computers in the artificial intelligence world process symbols rather than only numbers or letters, much as humans do. Processing is executed on series of symbol strings that represent real-world situations or scenarios. The symbols to be processed are arranged into structures such as lists, networks, and hierarchies, among many others. The basic idea behind structuring the symbols is to reveal the interrelations between them.
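As a loose illustration, and with entirely hypothetical facts and relation names, the sketch below arranges symbols into a small network and processes them by following their interrelations rather than by numeric computation.

```python
# Hypothetical symbolic structure: a tiny network of (symbol, relation) -> symbol facts.
facts = {
    ("robin", "is_a"): "bird",
    ("bird", "is_a"): "animal",
    ("bird", "can"): "fly",
}

def is_a_chain(symbol):
    """Follow the 'is_a' links to reveal how the symbols interrelate."""
    chain = [symbol]
    while (chain[-1], "is_a") in facts:
        chain.append(facts[(chain[-1], "is_a")])
    return chain

print(is_a_chain("robin"))   # ['robin', 'bird', 'animal']
```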
Nonalgorithmic Processing
Outside the world of artificial intelligence, computer programs and applications run through programmed algorithms. These algorithms fully specify, in a step-wise and discrete manner, how the commands and instructions within a program are executed. Knowledge-based AI systems, by contrast, adapt their behavior to the situation in which they are used rather than following a fixed sequence of steps.
4.3 Comparison of the Approaches
Expert Systems
Expert systems are data-driven management information systems used by experts to analyze and interpret data and to give appropriate recommendations. They can also be defined as computer programs that simulate human behavior and expertise through the use of artificial intelligence technologies. Data Explorer is an example of an expert system built for statistical and analytical work.
Data Explorer builds on the versatility and power of existing statistical software applications. The system is used in the field of medicine to provide automated analysis and interpretation of results from medical data. Instead of classical multivariate statistical techniques, which are traditionally manual and difficult to automate, Data Explorer uses a powerful system of network methods. It works by identifying important relationships between and among variables and then applies power-size analysis, belief-network inference, and learning methods to help users understand the findings.
Below is a conceptual model of how Data Explorer works.
Expert systems have been used in many industries to provide services such as financial analysis, healthcare advice and projections, customer service, telecommunications, video gaming, manufacturing, transportation, aviation, and written communication, among many others. In chemistry and medicine, systems such as Dendral and MYCIN have helped chemists identify organic molecules and helped physicians identify bacteria, respectively.
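The mechanism most expert systems share is a knowledge base of if-then rules and an inference engine that chains them. The sketch below is a deliberately simplified, hypothetical illustration of forward chaining in Python; it is not the design of Data Explorer, Dendral, or MYCIN.

```python
# Hypothetical knowledge base: each rule maps a set of conditions to a conclusion.
rules = [
    ({"high_variance", "small_sample"}, "result_unreliable"),
    ({"result_unreliable"}, "recommend_more_data"),
    ({"strong_correlation", "large_sample"}, "recommend_regression_model"),
]

def infer(initial_facts):
    """Forward chaining: keep firing rules until no new facts can be derived."""
    facts = set(initial_facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

print(infer({"high_variance", "small_sample"}))
# -> {'high_variance', 'small_sample', 'result_unreliable', 'recommend_more_data'}
```

Real systems add certainty factors, explanation facilities, and far larger knowledge bases, but the chaining loop above captures the basic idea.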
Apart from computer-based systems for analyzing data and solutions automatically, there are software applications that can be installed on one's personal computer and are relatively easy to use. These include the R Project, Orange data mining, and the Weka data mining suite.
The R Project
The R Project is a statistical tool developed by Robert Gentleman and Ross Ihaka, based on the S programming language, which was then the dominant statistical programming language. R is free, open-source software.
Features of R:
R is a statistical analysis tool with stronger object-oriented facilities than most other statistical computing platforms offer. It is highly extensible, and it can be linked with or integrated into common programming languages such as C++ and Java. This extensibility and linking capability make it easy to embed R within other applications.
The main advantage of R is its extensibility. System developers and data engineers can easily write and distribute their own packages, which explains why thousands of R packages exist today. Additionally, many published statistical methods are accompanied by R implementations.
Today, R is used by millions of data analysts worldwide, with most use found in research and development activities rather than in large-scale production and analytical processes.
Limitations of the R Project
Scalability: R has not been able to scale up to the level of other commercial analytical tools, despite the developments of recent years. It is unfortunate that while most competing applications can now run analyses against data held outside main memory, R still runs its computations entirely in memory.
R can handle only data sets that fit within the machine's available memory, and since many machines do not have large amounts of memory, they experience considerable difficulty when working with R. Memory requirements are a key issue for R; even very expensive machines find it hard to work with, as it is a heavyweight application.
Using the R programming language is a fairly intensive and lengthy undertaking. Although some graphical interfaces accompany R, developers still have to go through the tedious process of writing code, because the provided user interfaces are not user-friendly; they are immature compared with the interfaces of other commercial analytical tools.
SPSS Analytical Tool
This giant analytical tool was first released in 1968 under the name Statistical Package for the Social Sciences. Its name has since changed to IBM SPSS following IBM's acquisition of the SPSS business. The application is known for a friendly user interface offering intuitive functionality and features.
Just as its name suggests, SPSS was originally focused on the social sciences market, but new developments have expanded the application to cover other fields as well. The main difference between SPSS and R is that SPSS does not include all the functionality found in R, but its syntax and database format make it compatible with R. Moreover, SPSS can handle voluminous and complex data.
Features Associated with SPSS:
Commands in SPSS are executed sequentially, line by line, updating tables and adding the results of the analysis to the output window(s). The output window also provides options for storing the executed syntax together with execution timestamps. SPSS can write to and read from ASCII files, tables, databases, and other statistical software applications. Likewise, SPSS Statistics can read from and write to external relational databases. The application also provides data-management functions such as sorting, merging, aggregation, and transposition, among many others.
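SPSS syntax itself is outside the scope of this section, but the data-management operations just listed (merging, sorting, aggregating, transposing) can be sketched in Python with pandas for illustration; the tables and column names below are hypothetical.

```python
# Illustrative only: the merge/sort/aggregate/transpose operations described above.
import pandas as pd

sales = pd.DataFrame({"store": ["A", "A", "B"], "units": [10, 5, 7]})
stores = pd.DataFrame({"store": ["A", "B"], "region": ["North", "South"]})

merged = sales.merge(stores, on="store")                  # merge two data sets
ordered = merged.sort_values("units", ascending=False)    # sort cases
totals = merged.groupby("region")["units"].sum()          # aggregate by group
print(ordered)
print(totals)
print(merged.T)                                           # transpose rows and columns
```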
IBM SPSS can send the output of analysis processes directly to a file instead of to an output window. The application is available in multiple versions compatible with different operating systems, including Windows, Mac OS X, and UNIX.
The SAS Software Suite
The Statistical Analysis System (SAS) is a statistical tool first developed in 1976 on the IBM mainframe as a tool to handle large data volumes. Its capacity to handle data increased with the implementation of a parallel architecture in 1996.
SAS is a software suite that can be used to mine, alter, manage, and retrieve data from multiple sources and to perform statistical analysis on the retrieved data. As an information delivery system, SAS is a modular, integrated package that is independent of the underlying computer hardware. It provides a broad, self-contained environment for organizing databases, giving data analysts the ability to transform datasets into useful information as they see fit. Once the data is transformed, decision making becomes a walk in the park.
As a software suite, the SAS program contains procedures, macros, and DATA steps that work together to provide a highly comprehensive set of functions and enable users to import and export data files in different formats.
Features Associated With SAS
- Statistics: the SAS application offers a range of statistical procedures, from traditional analysis to present-day dynamic data visualization approaches, through which companies can better manage their customer base.
- Data and text mining: organizations and businesses collect data through many different channels. For accuracy and efficiency, they are turning to data mining and using it to develop more informed strategies and make sound management decisions.
- Data visualization: SAS provides user-friendly interfaces through which advanced and highly effective analytical capabilities are realized.
- Forecasting: the SAS application comes with capabilities that support forecasting of all kinds, whether for short-term or long-term purposes.
- Optimization: through its powerful tools, SAS supports optimization, project scheduling, and simulation techniques that are important for achieving maximum, dependable results.
- Universal data access: by default, SAS is designed to deliver universal data access through user-friendly interfaces that extend the functionality of the application. A variety of analysis procedures let users navigate through their data, and better navigation leads to more concise information being read from the data that is analyzed.
Decision making: IBM SPSS Statistics has an advantage over SAS, both in price and in the possibility of obtaining AnswerTree for decision trees without having to buy the full data mining suite. If these factors are considered, IBM SPSS is the more powerful application. Anyone wishing to construct decision trees with SAS has to go the extra mile and purchase an additional product, Enterprise Miner.
IBM SPSS is also the more competitive application for decision tree creation when compared with R, which does not offer decision tree algorithms out of the box: most of its packages implement only CART, and its interfaces are not user-friendly.
Data management: SAS has the upper edge over IBM SPSS where data management is concerned. It is also far ahead of R, whose major drawback is that its functions must first load data into memory before execution begins, which places limits on the volume of data that can be handled. Some packages, however, have started to break free of this constraint, for example those for linear models and related methods.
Documentation: the R application has readily available, elaborate, and easy-to-understand documentation. SPSS lacks comparable material, probably because of its more limited use, while SAS provides comprehensive technical documentation running to more than 8,000 pages.
SAS is more widely used in most organizations and big enterprises than IBM SPSS, and it therefore has many devoted resources and materials, including forums, user groups, trainers, websites, macro libraries, and books. The R application nonetheless stands strong among the three, thanks to features such as functions for depreciation, compound interest, cash flow, hyperbolic functions, factorials, combinations, and arrangements, not to mention its open-source availability to the community.
The main difference between these algorithms, software applications, and computer systems lies in their architectural design and software specification, which dictate the functionality each offers. Artificial intelligence systems, expert systems, and other computer-based systems can be tailored to meet the specific needs of a given discipline or domain of study. By contrast, software-based statistical analysis packages can only be installed on one's computer and cannot be 'edited' or changed by the user to meet other needs. Additionally, these packages are not well suited to handling huge, voluminous, and complicated data in the way the former can.
Artificial intelligence technologies come in two forms: general and narrow AI systems. General AI systems exhibit all the characteristics of human intelligence, including language processing, learning, and perception. Narrow AI applications, on the other hand, exhibit some, but not all, human characteristics and abilities. The strength of narrow systems is that they are very good at the particular human capability embedded in them; they are like experts with specialization in specific fields.
Machine learning is one way in which artificial intelligence has been realized. By definition, machine learning involves "automating automation": creating the ability to learn without explicit programming. Machine learning saves us the work of hard-coding rules by allowing us to train an algorithm to accomplish particular tasks. Training an algorithm involves feeding it a voluminous data set and allowing it to adjust its parameters accordingly.
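The sketch below illustrates "training by adjustment" in its simplest form: a linear model whose two parameters are repeatedly nudged by gradient descent until its predictions fit synthetic, hypothetical data. Nothing beyond NumPy is assumed.

```python
# Training as adjustment: start from ignorance and repeatedly correct the parameters.
import numpy as np

rng = np.random.default_rng(2)
X = rng.uniform(0, 10, size=100)
y = 3.0 * X + 5.0 + rng.normal(0, 1, size=100)   # hidden relationship the model must learn

w, b = 0.0, 0.0                                  # untrained parameters
lr = 0.01                                        # learning rate
for _ in range(2000):
    error = (w * X + b) - y                      # how wrong the current predictions are
    w -= lr * 2 * (error * X).mean()             # adjust weights to reduce the error
    b -= lr * 2 * error.mean()

print(round(w, 2), round(b, 2))                  # close to the true values 3.0 and 5.0
```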
Drastic improvements and changes have been realized in computer vision technology, where models are trained on thousands of pictures that have been annotated with human-supplied tags.
Deep learning is currently taking center stage. Machine learning more broadly includes approaches such as decision trees, inductive programming, reinforcement learning, clustering, and Bayesian networks, among other techniques. The inspiration behind deep learning is the functioning of the human brain through the interconnection of neurons. This inspiration led to the design of artificial neural networks (ANNs), which try to mimic the human neural system. ANNs have discrete layers and interconnections among millions of "neurons", with each layer in the network picking out specific features to learn.
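A toy forward pass, with random (untrained) weights and hypothetical layer sizes, shows the layered structure described above: each layer of "neurons" transforms the previous layer's output before the final layer produces class scores.

```python
# Toy artificial neural network: 4 input features -> 8 neurons -> 8 neurons -> 2 outputs.
import numpy as np

rng = np.random.default_rng(3)

def relu(x):
    return np.maximum(0, x)                      # simple neuron activation

W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 8)), np.zeros(8)
W3, b3 = rng.normal(size=(8, 2)), np.zeros(2)

def forward(x):
    h1 = relu(x @ W1 + b1)                       # first layer picks out low-level features
    h2 = relu(h1 @ W2 + b2)                      # deeper layer builds on them
    return h2 @ W3 + b3                          # output scores, one per class

x = rng.normal(size=(1, 4))                      # one hypothetical input pattern
print(forward(x))
```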
Machine learning and artificial intelligence have jointly delivered greatly increased capabilities over recent years. Predictive analytics help glean meaningful business insights from sensor-based and structured data as well as from unstructured data, such as unlabeled text and video, as used in mining customer information and sentiment.
Data access has increased over the past years thanks to the shift among data engineers toward cognitive cloud platforms. This has in turn created an array of advancements in analytics and AI-powered systems and services that are more accessible to both small and medium-sized organizations.
Chapter 5: Scenarios and Recommendations
Below are five case studies and scenarios in which these approaches have been used.
Global Tech LED: Google Analytics and Instant Activation of Marketing
Global Tech LED, based in Bonita Springs, Florida, USA, is a globally known company that specializes in lighting design and in supplying light-emitting diode (LED) retrofits and commercial space fixtures. The company uses Google Analytics in the following ways:
- Google Analytics’ Smart Lists are used to automatically identify the Global Tech LED prospects the firm can engage most effectively, and to remarket more targeted products to those users.
- The Conversion Optimizer is used to adjust bids for potential customers automatically, leading to increased conversions and sales.
Value proposition
- The remarketing campaigns initiated through Smart Lists increased page clicks five-fold.
- This campaign performed more than twice as well as the other remarketing campaigns.
- The strategy increased traffic to the company's website by more than 100%, enabling the firm to re-engage users in its markets more proactively.
- Using the Conversion Optimizer gave the firm the opportunity to allocate marketing costs and resources more efficiently, based on bid potential from its website.
Under Armour: IBM Watson Cognitive Computing
Under Armour's application was built on the IBM Watson cognitive computing platform and was designed to act as a health assistant by providing real-time data. The data comes from sensors and from manual input covering sleep, fitness, and nutrition. The application also reads geospatial data to determine the weather at a particular time and how it may affect training. Users can likewise see health insights shared on the platform by other users.
Plexure: Internet of Things and Azure Stream Analytics
Plexure (initially called VMob) is a New Zealand-based media company that uses real-time data technology and analytics to help companies tailor their marketing strategies to individual target customers and optimize transaction-related processes.
One notable case in which this technology has been successfully applied is McDonald's, which saw its customer engagement rise across new geographic markets in Sweden, New Zealand, and other areas in the region.
The company also employed the Azure Stream Analytics tool to analyze over 40 million endpoints in its big data store, a process through which customer behavior and patterns were honed. With this information it became possible to target ads precisely, so that the intended audiences and customers were easily reached. An additional advantage came from combining these two technologies with the client's mobile app, using contextual information and social engagement to further customize the user experience.
Coca-Cola Company Amatil: Trax Retail Execution System
Coca-Cola is known worldwide for bottling and distributing non-alcoholic beverages, and Coca-Cola Amatil is its largest bottler in the Asia-Pacific region. Prior to deploying Trax's imaging technology, the company relied on manual methods of measuring products in its stores. This process had its limitations: it was unreliable and caused constant delays in the data being sourced through mobile communications.
Sales representatives at Coca-Cola Amatil used Trax Retail Execution, an image-based technology, to photograph store shelves with their mobile devices and send the pictures to the Trax Cloud. Once received by the cloud platform, the pictures were analyzed to produce informative reports within minutes. These reports have helped the sales department plan further online assessments and provide management solutions to branch managers.
Value proposition:
The Trax system gave the branch's sales department the opportunity to capture real images of current stock and rapidly produce reports that revealed existing gaps in the market. The reports also helped the department apply corrective measures in the stores to bridge the discovered gaps. In addition, the reports provided important insights about shelf conditions, allowing sales representatives to plan strategically for opportunities on the shelves. The net effect of the undertaking was a 1.3% increase in market share.
Peter Glenn’s AgilOne Advanced Analytics
Peter Glenn has supplied outdoor apparel and gear to individual and wholesale customers for the last fifty years, serving them through brick-and-mortar locations along the East Coast as well as in Alaska and South Beach.
The AgilOne analytics dashboard provides clients with a highly consolidated, integrated view of their channels, whether online or offline. This capability alone enabled the company to monitor trends among buyers and buyer groups closely and critically, and in turn to make better-segmented decisions targeted at individual buyers and buyer groups. Among the advanced segmentation attributes included in the AgilOne software are customer household data, segment value, and proximity to any of the company's brick-and-mortar centers.
With this crucial information at hand, Peter Glenn goes a step further and uses it to launch integrated promotional, triggered, and lifecycle campaigns across online and offline channels, with the goal of increasing sales during lean seasons and increasing in-store traffic.
Value proposition of the Software
The system's data-quality engine combs through the customer database to unmask customer desires and demographic features. Through this system, the company learnt that over 80% of its customer base had lapsed: customers had lost faith and trust in it and were shifting to alternatives. With this information, the company strategized on re-targeting and re-engaging its dormant clients.
The firm reported a thirty percent increase in average order value, thanks to the automated marketing engineered through the application. Moreover, access to more data points gave the firm the opportunity to target more customers and make them aware of store events, using advanced segmentation techniques and better-aligned channel marketing strategies.
Chapter 6: Conclusions
State-of-the-art complex data analysis is slow, expensive, error-prone, and often unconvincing. Automation of complex data analysis can save time and money, reduce or eliminate errors, save lives, increase persuasiveness, and enable third-party auditing of results. Computational applications are taking over the world of statistics and data analytics at breakneck speed. As the examples above show, artificial intelligence, strong computational algorithms, and expert systems are being used to drive key organizational strategies and decisions, making them an excellent fit for such use. It would therefore be wise for the world to align its technological development toward tapping the analytical potential that these tools offer.
References:
Animesh, A., Jhimli, A. & Witold, P., 2014. Data Analysis and Pattern Recognition in Multiple Databases. s.l.:s.n.
Arundat, M. D., Amit, V. & Sumanta, B., 2008. Automated data analysis and recommendation system and method. [Online]
Available at: https://patents.google.com/patent/US20100153124
[Accessed 06 July 2018].
Copeland, B. J., 2018. Artificial Intelligence. [Online]
Available at: https://www.britannica.com/technology/artificial-intelligence
[Accessed 07 July 2018].
Dewhurst, F. W. & Cwinnett, E. A., 1990. Artificial Intelligence and Decision Analysis. The Journal of the Operational Research Society, August, 41(8), pp. 693-701.
Greet, P., 2014. Overview: Data Collection and Analysis Methods in Impact Evaluation. [Online]
Available at: https://www.unicef-irc.org/publications/pdf/brief_10_data_collection_analysis_eng.pdf
[Accessed 06 July 2018].
John, F. M., 2017. Automating Complex Data Analysis: A Disruptive Business Opportunity. [Online]
Available at: https://www.mathematical-software.com/
[Accessed 06 July 2018].