Transaction Validation and Analysis
January 14, 2020
A Major Qualifying Project Report Submitted to the Faculty of Worcester Polytechnic Institute In partial
fulfillment of the requirements for the Degree of Bachelor of Science.
Project ID:
14985
Project Team
Manasi Danke CS
Ethan Merrill MGE
Joseph Yuen CS
Project Advisors:
Michael Ginzberg, Business Department
Robert Sarnie, Business Department
Wilson Wong, Computer Science Department
Sponsored by:
Hedge Fund Company
Acknowledgements
First, we would like to thank our sponsor for the amazing opportunity to learn about financial
technology and to assimilate into the company culture.
We would also like to thank our WPI advisors Professor Michael Ginzberg, Robert Sarnie, and Wilson
Wong for their availability and support. Our regular meetings with them encouraged us and taught us
how to be agile in the financial industry.
Lastly, we would like to thank the open source community for their extensive documentation and
tutorials. We were able to learn a wide array of new technologies due to these valuable resources.
Thank you,
Manasi Danke
Ethan Merrill
Joseph Yuen
Table of Contents
Table of Contents .......................................................................................................................................... ii
Table of Figures ........................................................................................................................................... vii
Abstract ...................................................................................................................................................... viii
Executive Summary ...................................................................................................................................... ix
1. Introduction .............................................................................................................................................. 1
1.1 Problem ............................................................................................................................................... 1
Thread 1 (Winners and Losers Report Update) .................................................................................... 1
Thread 2 (Azure Validation Dashboard) ................................................................................................ 1
1.2 Goals ................................................................................................................................................... 1
1.3 Deliverables ......................................................................................................................................... 2
Thread 1 ................................................................................................................................................ 2
Thread 2 ................................................................................................................................................ 2
2. Background ............................................................................................................................................... 3
2.1 Finance Industry .................................................................................................................................. 3
2.1.1 Financial Reporting ...................................................................................................................... 3
2.1.2 Accounting Validation .................................................................................................................. 3
2.1.3 Current System ............................................................................................................................ 3
2.1.4 Previous Work .............................................................................................................................. 4
2.3 Software Development Environment ................................................................................................. 5
2.3.1 Python .......................................................................................................................................... 5
2.3.2 Pandas .......................................................................................................................................... 5
2.3.3 Anaconda ..................................................................................................................................... 6
2.3.4 Apache Spark and Databricks....................................................................................................... 6
2.3.5 Power Business Intelligence (Power BI) ....................................................................................... 6
2.3.6 Microsoft Azure Data Lake ........................................................................................................... 7
2.3.7 Project Management Tools .......................................................................................................... 7
2.3.8 Source Control ............................................................................................................................. 8
3. Methodology ............................................................................................................................................. 9
3.1 Project Management .......................................................................................................................... 9
3.2 Choosing a Methodology .................................................................................................................... 9
3.2.1 Scrum ......................................................................................................................................... 10
3.2.2 Risk Management ...................................................................................................................... 13
3.2.3 Requirements Gathering ............................................................................................................ 13
4. Requirements Gathering ......................................................................................................................... 14
4.1 Sprint Planning Meetings .................................................................................................................. 14
4.2 Sponsor Communication ................................................................................................................... 14
4.2.1 Daily Scrums ............................................................................................................................... 14
4.2.2 Product Demonstrations ............................................................................................................ 14
4.2.3 Interviews ................................................................................................................................... 14
5. Analysis ................................................................................................................................................... 15
5.1 Epics & Themes ................................................................................................................................. 15
Epic 1: Improve Winners Losers Report Generator ............................................................................ 15
Epic 2: Azure Validation Dashboard (Validator) .................................................................................. 15
5.2 User Stories ....................................................................................................................................... 16
6. Design ...................................................................................................................................................... 17
6.1 System Architecture .......................................................................................................................... 17
6.2 Data Flow Diagram (DFD) .................................................................................................................. 17
6.3 Entity Relationship Diagram (ERD) .................................................................................................... 19
6.4 Use Case Diagrams ............................................................................................................................ 23
6.5 User Interface Structure Diagram ..................................................................................................... 24
6.6 User Experience ................................................................................................................................ 25
6.6.1 Home .......................................................................................................................................... 25
6.6.2 Commentary .............................................................................................................................. 26
6.6.3 Alerts .......................................................................................................................................... 27
6.6.4 History ........................................................................................................................................ 28
6.7 Design Patterns ................................................................................................................................. 29
6.7.1 Strategy Pattern ......................................................................................................................... 29
7. Implementation ...................................................................................................................................... 30
Pre-Qualifying Project Work ................................................................................................................... 30
Sprint 1 .................................................................................................................................................... 30
User Stories Completed: ..................................................................................................................... 30
Sprint Review ...................................................................................................................................... 30
Sprint Retrospective Meeting ............................................................................................................. 30
Sprint 2 .................................................................................................................................................... 31
User Stories Completed: ..................................................................................................................... 31
Sprint Review ...................................................................................................................................... 32
Sprint Retrospective Meeting ............................................................................................................. 32
Sprint 3 .................................................................................................................................................... 33
User Stories Completed: ..................................................................................................................... 33
Sprint Review ...................................................................................................................................... 33
Sprint Retrospective Meeting ............................................................................................................. 34
Project Risks ........................................................................................................................................ 34
Sprint 4 .................................................................................................................................................... 34
User Stories Completed: ..................................................................................................................... 34
Sprint Review ...................................................................................................................................... 35
Sprint Retrospective Meeting ............................................................................................................. 36
Project Risks ........................................................................................................................................ 36
Sprint 5 .................................................................................................................................................... 36
User Stories: ........................................................................................................................................ 36
Sprint Review ...................................................................................................................................... 37
Sprint Retrospective Meeting ............................................................................................................. 38
Project Risks ........................................................................................................................................ 38
Sprint 6 .................................................................................................................................................... 38
User Stories: ........................................................................................................................................ 38
Sprint Review ...................................................................................................................................... 40
Sprint Retrospective Meeting ............................................................................................................. 40
Project Risks ........................................................................................................................................ 40
Weekly Burndown ................................................................................................................................... 41
8. Testing ..................................................................................................................................................... 43
8.1 Quality Assurance Procedure ............................................................................................................ 43
8.2 User Feedback ................................................................................................................................... 43
9. Future Work ............................................................................................................................................ 44
9.1 Thread 1 ............................................................................................................................................ 44
9.1.1 Modularize Strategies Further ................................................................................................... 44
9.1.2 Modularize Pre-Processing Functions Further ........................................................................... 44
9.1.3 Determine User Base ................................................................................................................. 44
9.2 Thread 2 ............................................................................................................................................ 44
9.2.1 Add More Timeseries Data to Datalake ..................................................................................... 44
9.2.2 Schedule Script ........................................................................................................................... 44
9.2.3 Add More Alerts and Analysis .................................................................................................... 45
9.2.4 Add More Fields to Data Frame ................................................................................................. 45
9.2.5 Create Summary Page ................................................................................................................ 45
10. Learning Assessment............................................................................................................................. 46
10.1 Challenges ....................................................................................................................................... 46
1. Identifying Requirements................................................................................................................ 46
2. Planning VS Execution ..................................................................................................................... 46
3. Domain Knowledge ......................................................................................................................... 46
4. Optimization.................................................................................................................................... 46
10.2 Learnings ......................................................................................................................................... 47
10.2.1 Computer Science .................................................................................................................... 47
10.2.2 Project Management ............................................................................................................... 48
10.3 What we would do differently ........................................................................................................ 48
1. Determine needs of client – priorities and whether it is a want or a need .................................... 48
2. Establish capability of tools with client ........................................................................................... 49
3. Testing ............................................................................................................................................. 49
4. Team Communication ..................................................................................................................... 49
5. Technical Mentors ........................................................................................................................... 49
11. Conclusion ............................................................................................................................................. 50
Works Cited ................................................................................................................................................. 51
Appendix ..................................................................................................................................................... 53
APPENDIX A: User Stories ....................................................................................................................... 53
APPENDIX B: Project Risks Per Sprint ...................................................................................................... 60
APPENDIX C: Interview 1 with Firm Accountants ................................................................................... 62
APPENDIX E: Financial Terminology ........................................................................................................ 64
Asset Valuation ................................................................................................................................... 64
Internal Rate of Return (IRR) ............................................................................................................... 64
Multiple of Invested Capital (MOIC) ................................................................................................... 64
Gross Profit ......................................................................................................................................... 64
Remaining Market Value (remMV) ..................................................................................................... 64
Total Cost ............................................................................................................................................ 64
Total Sales ........................................................................................................................................... 64
Total Terminal Value ........................................................................................................................... 65
Return Period ...................................................................................................................................... 65
APPENDIX F: Site Map ............................................................................................................................. 65
APPENDIX G: Site Structure Diagram ...................................................................................................... 66
Table of Figures
Figure 2.0 Financial Structure.......................................................................................................................5
Figure 3.0 Methodology Comparison Chart................................................................................................10
Figure 3.1 Product Backlog........................................................................................................................12
Figure 3.2 Burndown Chart Guide..............................................................................................................13
Figure 3.3 Risk Management Framework ..................................................................................................14
Figure 6.0 System Architecture Diagram ..................................................................................................19
Figure 6.1 Context and Level 0 Diagram ...................................................................................................20
Figure 6.2 Data Flow Diagram Level 1 .......................................................................................................21
Figure 6.3 As Is Data Lake Entity Relationship Diagram.............................................................................23
Figure 6.4 New Data Lake Entity Relationship Diagram.............................................................................24
Figure 6.5 results_and_flows Data Frame Entity Relationship Diagram....................................................26
Figure 6.6 irr_timeseries Data Frame Entity Relationship Diagram ............................................................27
Figure 6.7 Use Case Diagram ......................................................................................................................28
Figure 6.8 User Interface Structure Diagram..............................................................................................29
Figure 6.9 Power BI Home..........................................................................................................................30
Figure 6.10 Power BI Commentary.............................................................................................................30
Figure 6.11 Power BI Alerts........................................................................................................................32
Figure 6.12 Power BI History Timeseries ....................................................................................................33
Abstract
While working for a large investment firm, our team worked on two projects (‘threads’), both of which
enhanced the company’s reporting capabilities. We updated the firm’s system for portfolio performance
tracking and reporting and created a tool to confirm, automate, and customize commentary describing
investment performance. For our first thread, we provided documentation on how to utilize and
manipulate code to add and auto populate data for new columns on the Winners and Losers Report. For
our second thread, we created an Azure Validation dashboard to work with the firm’s new cloud data
management infrastructure and operate consistently with the other firm systems. We developed scripts
to validate transaction data, generate commentary on top contributors and detractors for gross profit
month over month, and utilize timeseries data to investigate trends and perform statistical analysis. Our
dashboard visualizes data with Microsoft Power Business Intelligence and lets users customize their
view and drill through the raw data to find the causes of alerts and movements in the performance of
strategies. Throughout the project, we used Agile Scrum to work in a team of three
members to deliver and document software solutions that provide efficiency and flexibility for the firm,
its information technology analysts, its accountants, and its clients.
Executive Summary
The firm is an alternative investment manager that focuses on credit, private equity, real estate, and
multi-strategy. As a company in the financial technology industry, it utilizes reporting tools, data
analytics, and financial indicators to evaluate how well its strategies are performing. It has a Winners
and Losers report along with a product that the past MQP team created to generate and customize the
document. The past MQP aimed to simplify the workflow and interface in generating these reports, but
customization components of the project were not being fully utilized as they had outdated setup and
use instructions. In order to effectively showcase the customization components, we modified the code
and recorded a tutorial on how to add and modify columns in the Winners and Losers Report.
During this first project (Thread 1), we worked with the firm to enhance its reporting systems.
Additionally, we explored the firm’s new Azure cloud system for Thread 2, to validate data and generate
commentary on what caused changes in performance for their investment strategies. The firm
requested this functionality because current processes for validation and commentary generation were
labor intensive. The firm sought to perform advanced timeseries analyses to make more informed
investment decisions. As a result, we wrote scripts in Databricks that queried data from Azure Data Lake
and utilized SQL, Spark, and Pandas to analyze data and display visualizations in Power Business
Intelligence (Power BI) for our Azure Validation Dashboard.
In order to validate data that was uploaded to Data Lake, we flagged incorrect data and generated
specific alerts in our dashboard. The validation checks a range of potential flaws in the data, from
missing information to suspicious performance metrics. In addition, we developed commentary showing
top contributors and detractors using the change in gross profit for return periods: Inception to Date,
Year to Date, 1 Year, 3 Year, and 5 Year. Furthermore, we used statistical analysis and created a
timeseries to evaluate different trends in the data. Our dashboard helps users view these features in
different panes and enables them to drill through to see which data points accounted for alerts,
performance, and timeseries in the raw data. Overall, our dashboard supports the firm’s IT analysts,
accountants, and clients in examining their data and gives them the power to customize their view to
further investigate the reason behind movements.
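The contributor/detractor ranking described above can be sketched in pandas. This is a minimal illustration only: the column names, strategy names, and figures below are assumptions for the example, not the firm's actual schema or data.

```python
import pandas as pd

# Hypothetical gross-profit snapshots for two reporting periods.
df = pd.DataFrame({
    "strategy": ["Credit", "PE", "Real Estate", "Multi-Strategy"],
    "gross_profit_prev": [120.0, 80.0, 95.0, 60.0],
    "gross_profit_curr": [150.0, 70.0, 96.0, 40.0],
})

# Change in gross profit over the return period drives the ranking.
df["gp_change"] = df["gross_profit_curr"] - df["gross_profit_prev"]
ranked = df.sort_values("gp_change", ascending=False)

top_contributors = ranked.head(2)           # largest positive moves
top_detractors = ranked.tail(2).iloc[::-1]  # largest negative moves, worst first
```

The same ranking would be recomputed per return period (Inception to Date, Year to Date, 1 Year, 3 Year, 5 Year) by substituting the appropriate period-start snapshot.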
Throughout the duration of the project, we utilized the Scrum Agile methodology along with a Kanban
interface in Airtable. We conducted daily standup meetings and created user stories for seven one-week
sprints. Towards the end of each sprint, we reflected on the week and used client feedback to write user
stories and continue sprint planning for the next week. This methodology enabled us to realistically plan
and make continuous improvements to our software product.
1. Introduction
1.1 Problem
As technology advances, financial institutions seek to utilize the latest technology to stay ahead of their
competitors. This hedge fund management firm is no exception. It utilizes a variety of technologies such
as cloud storage, visualization programs, and data manipulation software in its daily workflow. All these
technologies are used to make informed investment decisions and communicate fund performance to
stakeholders.
The firm recently updated its data management infrastructure to a new cloud database and wanted to
enhance its reporting systems to reap the benefits of the new infrastructure. Until this upgrade,
historical pricing and performance data was not stored in a way which allowed for easy analysis across
multiple time periods.
By placing all available data in an Azure cloud database, the firm’s IT employees could programmatically
access all of the firm’s historical transaction data quickly using Python scripts. Using this, the firm’s IT
employees generated historical performance metrics such as Internal Rate of Return and Multiple Of
Invested Capital. This allowed for further analysis of the historical changes in these metrics. However,
although the data was more accessible, it was not utilized by non-IT employees. The firm’s Accountants
and Analysts—the individuals who could use this data most—were unfamiliar with how to access the
cloud database.
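The two metrics named above can be computed directly from a deal's cash-flow series. The sketch below is an assumption-laden illustration, not the firm's production code: it uses the usual sign convention (negative = capital invested, positive = distributions) and a simple bisection root-finder for IRR; the sample cash flows are invented.

```python
def npv(rate, cashflows):
    """Net present value of periodic cash flows at a given discount rate."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cashflows))

def irr(cashflows, lo=-0.99, hi=10.0, tol=1e-9):
    """Internal Rate of Return: the rate at which NPV is zero.

    Bisection sketch; assumes exactly one sign change in the cash flows.
    """
    for _ in range(200):
        mid = (lo + hi) / 2
        if npv(lo, cashflows) * npv(mid, cashflows) <= 0:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    return (lo + hi) / 2

def moic(cashflows):
    """Multiple of Invested Capital: total distributions / total invested."""
    invested = -sum(cf for cf in cashflows if cf < 0)
    distributed = sum(cf for cf in cashflows if cf > 0)
    return distributed / invested

# Illustrative only: invest 100, receive 30, 40, 50 in later periods.
flows = [-100.0, 30.0, 40.0, 50.0]
```

Here `moic(flows)` is 1.2 (120 returned on 100 invested), and `irr(flows)` lands near 8.9% per period.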
Thread 1 (Winners and Losers Report Update)
The firm's analysts required additional tools for their static investment report generator created by a
previous MQP group. Users did not know how to modify the output of the Winners and Losers Report
generator because documentation on how to use the past generator was outdated.
Thread 2 (Azure Validation Dashboard)
The firm's analysts required an enhanced reporting system that ensured accounting data was correct and
generated human readable reports. The company loaded a portion of raw transaction data into a new
cloud database, but it still needed to be validated and used to inform analysts. The data came directly
from the firm’s automated accounting system. The firm asked our team to verify the data using a variety
of tests. Additionally, the firm wanted to analyze historical investment performance data with its new
cloud database, as the data was neither easily accessible nor in a format readily comprehensible to a
firm Analyst.
1.2 Goals
Our project goal was to improve the way the firm validates its investment performance data and how
that data is presented to various stakeholders.
The purpose of validating data is to make sure that all the information for each reporting period is
correct. These validation checks are designed to highlight abnormal activity: some checks
were binary (pass/fail), while others compared values against thresholds. By communicating with those who
understood the data and performed the checks manually, we developed checks which met the needs of
the firm’s Analysts.
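The distinction between binary and threshold checks can be made concrete with a small pandas sketch. Everything below is illustrative: the field names (`deal_id`, `total_cost`, `irr`), the 200% IRR cutoff, and the sample rows are assumptions for the example, not the firm's real validation rules.

```python
import pandas as pd

# Hypothetical slice of uploaded transaction data.
df = pd.DataFrame({
    "deal_id": [1, 2, 3],
    "total_cost": [100.0, None, 250.0],
    "irr": [0.12, 0.08, 4.5],  # a 450% IRR is suspicious
})

alerts = []

# Binary check: a required field is either present or it is not.
missing = df[df["total_cost"].isna()]
for deal in missing["deal_id"]:
    alerts.append((deal, "missing total_cost"))

# Threshold check: flag metrics outside an assumed plausible range.
IRR_LIMIT = 2.0
outliers = df[df["irr"] > IRR_LIMIT]
for deal in outliers["deal_id"]:
    alerts.append((deal, "irr above threshold"))
```

Each `(deal_id, message)` pair would feed a row in the dashboard's alert pane, letting a user drill through to the offending record.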
Additionally, we aimed to present the validation and other information in a way which was not
overwhelming to the average user and made intuitive sense. A software tool is only useful if it is
understood, so design and presentation were a key priority during development.
We aimed to improve these processes to save the firm’s employees time, improve the accuracy of their
reports, and generate new insights from their data. Each of our two threads has smaller sub-goals:
Thread 1 focused on updating the documentation for an existing tool:
• Maintainable - The previous code was difficult to understand, so we made the code easy to
maintain and extend in future development.
For Thread 2, we aimed to improve how the firm works with and views data in its cloud database:
• Robust - The firm’s cloud database stores information that is critical to primary business functions. Our software was required to be reliable because of the importance of this data.
• Cloud Independent – We had to minimize our dependency on the Azure cloud infrastructure.
The firm has a long-term goal of being cloud independent, so they would like to limit the
number of Microsoft Azure specific integrations in their software.
• Transparent Calculations – To prove that calculations are correct, we needed to display
supporting data points and the information used to derive calculations.
1.3 Deliverables
Thread 1
After assessing the current system, we concluded that the structure and capabilities of the codebase
were adequate. The documentation for the codebase, however, was not up to date. Therefore, we
created improved documentation on how to add, modify, or remove columns from the program-
generated Excel report.
Thread 2
We built a Power BI dashboard to display the firm’s investing records from its Azure Data Lake. We wrote back-end Python scripts in Databricks to perform various analyses on the data and designed the dashboard to be intuitive for the business analysts and accountants who view these analyses.
2. Background
2.1 Finance Industry
2.1.1 Financial Reporting
Financial reporting is designed to inform investors, build trust, and comply with federal regulations.
Financial statements are issued to build investors’ trust in an institution. For publicly traded investments, these statements are required by law and must adhere to Generally Accepted Accounting Principles. Private funds such as the firm are required to register with the Securities and Exchange Commission as investment advisors because of the Dodd-Frank Wall Street Reform and Consumer Protection Act of 2010 (Eriksson, 2016). The accounting standards used by hedge funds and private
equity firms are called Global Investment Performance Standards. To understand fund performance and
to address inquiries of investors in funds, most funds generate reports internally and distribute them to
investors on a monthly or quarterly basis. From the perspective of investors, it is of the utmost
importance to understand how their money is being invested and how those investments have
performed historically (Securities and Exchange Commission, 2012). Financial reporting uses a variety of
calculations to value and assess performance of investments.
2.1.2 Accounting Validation
The accuracy of these statements is just as important as issuing reports and statements on performance.
Any retraction or correction of financial statements has significant negative impacts on investor trust in
the institution. As a result, financial reports are thoroughly checked to ensure they are accurate before
they are published. This procedure may be performed by accounting teams through a series of checks, typically in various Excel sheets. These checks are based on a variety of rules to determine which data
may be incorrect due to human error, a reporting failure, or other causes.
2.1.3 Current System
Financial Structure
The firm has its assets organized in a hierarchy as shown in Figure 2.0, which allows the company to
aggregate and abstract performance and other information at varying levels.
First, the firm has various business units. These units are the highest level of organization within the company. Business units represent an overarching part of the firm’s business, such as Energy or Distressed. Our team worked with data in the Distressed and Energy business units of the firm.
Within business units, there are many portfolios. Portfolios are a collection of various investments which
are known as strategies at the firm. In other words, portfolios are a bundle of assets for the purpose of
tracking performance and management. A portfolio can also belong to multiple business units, and
these portfolios are managed by different teams within the firm. Each portfolio has its own objectives to
justify the investments it makes and how those investments are maintained.
The level below portfolio is Strategy. Strategies are a specific investment such as a company or piece of
real estate. These Strategies have synonymous Deal Names which are more descriptive. The Deal Name
is often a company name which is used more by analysts than the Strategy code itself, which is a
combination of letters and numbers. Each Strategy also can belong to many regions. This information is
used to determine where the firm’s risk is located geographically.
Below the Strategy code is the Sycode. There can be many Sycodes for each Strategy, though there is often only one Sycode per Strategy. A Sycode identifies the type of financial instrument used to interact with the investment, such as a stock purchase or a derivatives contract. A Sycode can also describe a trade on a Strategy, such as an option purchase or any other type of financial instrument. Sycodes are derived from a combination of other given fields such as TransactionType, TradeDate, and Strategy. Sycode sits at this level of the hierarchy because one Strategy can have many types of financial instruments. These financial instruments can be bought, sold, and so on; the TransactionType field describes these transactions.
Figure 2.0 Financial Structure
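The hierarchy in Figure 2.0 can be sketched as nested Python structures. This is only an illustrative model of the Business Unit → Portfolio → Strategy → Sycode relationships described above; the field names and example values are hypothetical, not the firm’s actual schema:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Sycode:
    code: str
    transaction_type: str  # e.g., a buy or sell of this instrument type

@dataclass
class Strategy:
    code: str              # letter/number code used internally
    deal_name: str         # human-readable name preferred by analysts
    regions: List[str]     # geographic exposure of the investment
    sycodes: List[Sycode] = field(default_factory=list)

@dataclass
class Portfolio:
    name: str
    strategies: List[Strategy] = field(default_factory=list)

@dataclass
class BusinessUnit:
    name: str
    portfolios: List[Portfolio] = field(default_factory=list)

# One Strategy may carry several Sycodes, one per instrument type:
strat = Strategy("AB12", "Acme Corp", ["North America"],
                 [Sycode("AB12-EQ", "Buy"), Sycode("AB12-OPT", "Buy")])
unit = BusinessUnit("Energy", [Portfolio("Energy Credit", [strat])])
```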
2.2.4 Previous Work
To ensure that our software is well integrated with the firm’s processes and to avoid replicating previous
work, we researched past projects that interface with the firm’s systems. After speaking with our project
sponsor and advisors we found that Thread 2 is a ‘greenfield’ in that there is no prior work or research
on this project. We were able to find the previous WPI MQP project that is the basis for Thread 1. The
project is summarized below:
Wall Street: Engineering Investment Profit & Loss Reporting Pipeline
This MQP team from 2017 was tasked with automating and streamlining an internal report built by Scott
Burton called the ‘Winners and Losers’ report. The report was originally manually populated by
members from various departments. As a result, producing reports took too much time. The project
aimed to limit these points of failure with a software system which would automatically retrieve data to
populate and format the report. This data was retrieved from the Geneva accounting system with the
use of a custom script. The data in this form is called a ‘Geneva extract’ and is in CSV format. CSVs are
easily manipulated using Python and Excel.
The team built a program entirely in Python and relied on OpenPyXL to interact with Excel. Excel was
used because it is already used and well understood at the firm. The previous MQP project worked to
develop the capability to add or remove columns. The report states that a regular expression is used to
update cell references after a new column has been added. The report describes this feature as working
but mentions difficulty in implementing it.
These difficulties included:
• Manipulating formatting could result in ‘broken headers’, which caused the file to become corrupted. The report does not make clear whether the corrupted file was the XML or the Excel file.
• After updating the cell references with regular expressions, the number in a given fund name would auto-increment in the Excel sheet. This problem was overcome by including more underscores in the naming of the funds in the sheet (FUND_CAT_53 instead of FUND_CAT50, for example).
• Python’s duck typing made it difficult to ensure that the data types used in the Excel sheet were allowed. OpenPyXL and the program developed by the MQP team both implemented checks to ensure only proper typing was allowed in the Excel sheet.
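The report does not include the regular expression itself, but the cell-reference update it describes can be sketched as follows. This simplified version handles only single-letter columns and is an assumption about the approach, not the MQP team’s actual code:

```python
import re

def shift_column_refs(formula, inserted_at):
    """After inserting a new column at `inserted_at`, bump every
    single-letter column reference at or beyond that column by one."""
    def bump(match):
        col, row = match.group(1), match.group(2)
        if col >= inserted_at:
            col = chr(ord(col) + 1)  # shift the column letter right by one
        return col + row
    return re.sub(r"\b([A-Z])(\d+)\b", bump, formula)

# Inserting a column at C shifts C and D references, but leaves B alone:
shift_column_refs("=SUM(B2:D2)", "C")  # → "=SUM(B2:E2)"
```

The auto-increment difficulty noted above arises because a pattern like this can also match the digits inside a fund name such as FUND_CAT50, which is why extra underscores in the names helped.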
2.3 Software Development Environment
2.3.1 Python
Python is an open source and object-oriented programming language which has emerged as the
standard for data science in recent years. Python has an extensive standard library and is considered a
high-level language. In addition to its standard library, the open source community has built numerous
installable packages to expand the language’s functionality (Python Software Foundation, 2020).
The firm recommended that the team use Python because it is used in many of their existing programs. Python allows developers to utilize packages, such as Pandas, to perform data manipulation over large data sets with ease. In addition, Python has statistics libraries like SciPy that simplify statistical analysis of a data set. As a result, we selected Python as our backend programming language.
2.3.2 Pandas
Pandas is a Python package designed to manipulate and manage data sets. Pandas uses Data Frames to hold information. These frames are two-dimensional, stored in local memory, and analogous to Excel spreadsheets in structure. Since data is stored in this rigid structure, Pandas is commonly used with relational SQL databases and with CSV or TSV files.
Pandas data structures are faster than native Python structures for manipulating large datasets. When using Pandas, joins, unions, merges, and other data manipulation functions are performed with a few commands on a Data Frame. Additionally, Pandas is widely used, and documentation is readily available. Finally, the firm already uses Pandas in many of their projects and recommended that we use it (Pandas, 2019).
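A sketch of the kind of Data Frame manipulation described above, joining transactions to their Deal Names and aggregating. The column names and figures here are invented for illustration, not the firm’s data:

```python
import pandas as pd

trades = pd.DataFrame({
    "Strategy": ["AB12", "AB12", "XY99"],
    "Amount": [100.0, -40.0, 250.0],
})
deals = pd.DataFrame({
    "Strategy": ["AB12", "XY99"],
    "DealName": ["Acme Corp", "Widget Co"],
})

# A join and an aggregation each take a single command on a Data Frame:
merged = trades.merge(deals, on="Strategy", how="left")
totals = merged.groupby("DealName")["Amount"].sum()
```

The same operations over plain Python lists and dictionaries would take explicit loops; Pandas pushes them into optimized column-wise code.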
2.3.3 Anaconda
Anaconda is an open-source distribution, primarily for Python and R, that makes packages easier to manage and deploy. Anaconda’s package version management system is called Conda. When we were onboarded to Thread 1, we found that the project was already using Conda to manage its package versions. Since Conda kept our packages consistent across our local computers and the thread did not require any additional packages, we decided to continue using Anaconda.
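With Conda, a shared package set can be pinned in an environment.yml file that each team member uses to recreate the same environment. The names and versions below are illustrative, not the project’s actual dependencies:

```yaml
name: winners-losers-report
channels:
  - defaults
dependencies:
  - python=3.7
  - pandas=0.25
  - openpyxl=3.0
```

Running `conda env create -f environment.yml` then builds an identical environment on any machine, which is how version consistency across local computers is maintained.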
2.3.4 Apache Spark and Databricks
Apache Spark is a big data and machine learning analytics engine. Spark SQL, a module of Spark, aids in structured data processing. It provides users with Data Frames, organizing data into rows and named columns; Data Frames are a programming abstraction that organizes data like that of a relational database. Spark acts as a distributed SQL query engine that manages logically interrelated databases over a computer network (Databricks, 2019).
Databricks is a development platform that is optimized for and integrated with Microsoft’s Azure cloud services platform. It is based on Apache Spark and provides streamlined workflows and an interactive workspace to increase collaboration between business analysts and data engineers. Azure Data Factory enables raw or unstructured data to persist and be stored in Azure Data Lake. Databricks can then read the data with Spark using Spark SQL. In addition, Databricks is integrated with Power BI to share analytics, insights, and visual representations of data quickly and easily via Spark (Microsoft Azure, 2019b).
Since the Thread 2 data was already stored in an Azure Data Lake, the firm requested that we use Apache Spark and Databricks to access and manipulate the data. Beyond the firm’s request, Apache Spark clusters also allowed for easy integration with Power BI. Although Spark has its own form of Data Frame manipulation similar to Pandas, Spark Data Frame documentation was limited compared to Pandas documentation. In addition, we used Databricks because of its simple integration with Azure and Spark. While other Python notebook development environments such as Jupyter exist, Databricks is integrated with the Azure system far more deeply than Jupyter. In Databricks, users can easily access the Azure data tables for reference and then switch back to coding, all in the same program.
2.3.5 Power Business Intelligence (Power BI)
Power Business Intelligence (Power BI) is a Microsoft application that enables consumers, analysts, and developers to transform data and convey key insights via dashboards and reports. It enables users to connect to their data from Excel spreadsheets, the cloud, or its own hybrid data warehouses to visualize and share these insights with others (Microsoft, 2020).
Overall, Power BI consists of three main components: the Power BI Windows desktop application, the online Software as a Service (SaaS) offering known as Power BI Service, and mobile applications for Android and iOS devices. The Power BI desktop application connects to data sources, shapes and models data, integrates with Python, and implements row-level security (RLS) so users are given the proper access to restricted data. A report can then be published to the Power BI Service and shared with end users who have access to the Power BI Service or the mobile applications. These users can then view and interact with the data and insights.
Power BI is one of the largest and fastest growing applications that implements cloud computing for
business intelligence. The application allows flexibility for developers to implement the Power BI API
into their own applications and can extract data from a variety of data storage locations. It allows them
to indicate user privileges so timely reports can be sent to the correct parties. Moreover, Power BI
Report Server allows companies to deploy the application behind their firewall, in the case that they do
not store data in the cloud.
Although there are other data visualization programs such as Tableau, the firm requested that we
remain within the Microsoft suite for seamless integration between Microsoft programs. In Power BI,
the user can stream data from Spark tables without having to load the entire dataset into Power BI
which would significantly increase the size of the program and slow down analysis.
2.3.6 Microsoft Azure Data Lake
Azure Data Lake is a Microsoft product that enables users to store data of any size. The Azure suite can
also run programs and processes in languages such as SQL and Python. Additionally, it works well with
big data technologies such as Spark and Hadoop.
The firm recently moved its data from internal servers to a Microsoft Azure instance because it wanted the data stored in the cloud rather than locally (Microsoft Azure, 2019a). In addition, the firm wanted to experiment with data analysis using tools such as Power BI and Databricks.
2.3.7 Project Management Tools
Airtable
Airtable is a cloud-based project management tool centered around modular shared tables. We used
Airtable throughout our entire project to manage our Agile implementation. All parts of our Agile plan,
from Sprint planning to Sprint Review, were included in this tool. These tables can be linked together in
order to tie User Stories to Themes and Themes to Epics. This software also allowed for easy export of
User Stories, calculation of Story Point totals, and totaling of hours worked. Airtable also facilitates
different views of these tables. The tables can be viewed as a Kanban chart or as a table grouped by
Sprint.
Although Trello provides functionality to keep track of tasks with a Kanban user interface, it does not have alternative spaces like those in Airtable. These alternative spaces are lists that contain information on Sprints, risks, and multiple aspects of Agile within one platform. Airtable’s broader feature set in comparison to other free alternatives made it the right tool for the job.
Communication Software
Throughout the duration of the project, it was essential to communicate with our WPI team members,
our project sponsor, our advisors, and the accountants to discuss project requirements and deliverables.
We utilized software tools such as Skype, Outlook, and Slack to facilitate these conversations.
We used Skype to send messages in real-time to our project sponsor and other employees to confirm
meetings, clarify details, and ask questions that required minimal explanation. This platform was used
internally at the firm, so it was the logical choice for our team to use it when on site.
We used WPI and the firm’s Outlook accounts to send important files to our project sponsor, our advisors, and the accountants. Additionally, we used Outlook’s calendar feature to view their availability and schedule meetings through Outlook invites. Like Skype, Outlook was integrated into the firm’s culture, so we used it to communicate.
Furthermore, we utilized Slack to communicate. Slack is an online instant messaging platform designed for project communication. Messaging is organized into channels, so we were able to discuss various project-related topics in parallel. Since instant messaging requires less time and formality than email, the platform encourages constant communication, which may enable more efficient execution of project goals. Slack was used to discuss details pertaining to the project, especially when we were on campus at WPI or working remotely and could not be at the firm’s office.
2.3.8 Source Control
Source control provides modern software development with a system for integrating the individual edits to a code base made by each member of a team. The firm implements Git through Azure Repos, and our first thread used Git. Git is an open source version control system with well-defined procedures for use: first, a repository is made, and then each member edits that repository. After users make edits, they can issue pull requests, which allow other users to review the code before it is pushed into the main branch. This process continues for all edits.
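The procedure above can be sketched with Git commands. Repository, file, and branch names are illustrative; in Azure Repos the pull request and review happen in the web interface rather than on the command line:

```shell
set -e
repo=$(mktemp -d)        # stand-in for a freshly created repository
cd "$repo"
git init -q
git config user.email "dev@example.com"   # illustrative identity
git config user.name "Dev"
git checkout -q -b main

# Initial state of the shared main branch
echo "print('hello')" > report.py
git add report.py
git commit -q -m "Add report script"

# Each member edits on their own branch...
git checkout -q -b feature/update-docs
echo "# How to add a column" > README.md
git add README.md
git commit -q -m "Document column updates"

# ...then opens a pull request for review; once approved,
# the change is merged into main:
git checkout -q main
git merge -q feature/update-docs
```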
Most development was performed in notebooks for the Azure Validation Dashboard. These notebooks
could be edited by two users at the same time like a Google Document. Like online document tools, the
Databricks notebooks had an integrated version history which largely eliminated the need for traditional
version control systems.
3. Methodology
3.1 Project Management
Modern software project management is a rapidly evolving and diverse field consisting of a range of processes and methodologies. Broadly, software projects follow the six activities of software development: analysis, design, implementation, testing, deployment, and maintenance. This is known as the Software Development Life Cycle (SDLC). These activities must be executed to successfully develop a piece of software; however, there are many ways of executing these steps at varying intervals and durations. These different forms of execution are called development methodologies.
3.2 Choosing a Methodology
Our team of three used the Scrum implementation of the Agile software development methodology.
There are many different ways to approach organized software development. Waterfall is a rigid
methodology wherein the development team focuses entirely on each development stage for a set
period of time. In this method, requirements are set at the start of the project and do not change. Users
do not have input during the development process beyond the requirements gathering phase. The
Waterfall Methodology follows the entire SDLC over the full length of the project, while Agile loops
through requirements gathering, analysis, and deployment in a series of rapid iterations (Radack, 2009).
Parallel development can be much faster and was created to address some of the time concerns of
Waterfall development. A drawback of Waterfall development includes the inability to rapidly adapt to
changes in project requirements. Parallel development involves splitting the project into multiple
subprojects which are designed and implemented by smaller teams. This was not an optimal
methodology for our project due to its procedural nature and limited number of team members. A
comparison of methodologies adapted from Systems Analysis and Design 6th Edition By Dennis, Wixom,
and Roth is shown below:
Criterion                       Agile Scrum   Agile Kanban   Waterfall   Parallel   V-Model
Unclear/Changing Requirements   Good          Good           Poor        Poor       Poor
Complex Systems                 OK            OK             Good        Good       Good
Reliable Systems                OK            OK             Good        Good       Excellent
Self-contained Projects         Good          Poor           Good        Good       Good
Short Time Schedule             Excellent     OK             Poor        Poor       Poor
Schedule Visibility             Good          Excellent      Poor        Poor       Poor
Figure 3.0 Methodology Comparison Chart (Dennis, Wixom, & Roth, 2015)
We found Agile to be the most suitable methodology due to a combination of factors. First, we only had
seven weeks to rapidly create and deploy a product which met the customer’s business requirements.
This ruled out the Waterfall-based methodologies such as V-Model and Waterfall itself, which are slower and offer little opportunity for end user feedback. Additionally, we worked on site and regularly communicated with our project sponsor regarding needs and requirements, so it was easy to loop feedback into the product quickly.
Also, the firm, along with most modern software teams, uses a form of the Agile methodology in their
software development teams. Therefore, Agile meshed well with the firm’s existing processes.
Agile was created in 2001 by a coalition of members from multiple different methodologies. These members developed the Agile Manifesto, a description of principles that should be adhered to when developing software (Beck et al., 2001). Since the writers of this manifesto each championed their own methodologies, Agile is more of an overarching collection of ideas that binds together many other project management methodologies such as Extreme Programming, Scrum, Adaptive Software Development, and more. This manifesto and the sub-methodologies within it
emphasize the importance of customer satisfaction through early and continuous delivery of valuable
software. This delivery is usually on the time scale of two weeks. Agile also stresses the importance of
daily collaboration between businesspeople and developers. This collaboration is most efficient when
information is communicated face to face. Agile also focuses on the autonomy and empowerment of
individuals in selecting their own work, via the use of self-organizing teams. Finally, the best measure of
progress is the delivery of working software. By using Agile, our software was always in a condition to be
deployed (Beck et al., 2001). Our team was prepared to iterate quickly and develop efficiently with the
use of an Agile methodology. Agile has many sub-methodologies including Scrum. Each of these
methodologies have their own set of key activities and are useful for different teams and projects.
3.2.1 Scrum
Scrum comprises roles, activities, artifacts, and rules. In our implementation of Scrum, one individual was both the Scrum Master and the Product Owner. The Scrum Master is the servant leader of the software development team; this role is responsible for clearing blockers and providing process leadership. The Product Owner is the central voice of the Scrum team and is usually more business oriented. This individual defines what to do and the order in which to do it (Rubin, 2013).
Finally, our software development team consisted of two individuals forming a self-organized team to execute the development of the Epics. Because of the small size of this team, we assumed cross-functional roles. All members were expected to contribute to the software development in this project. However, well-defined roles and responsibilities ensured that leadership and initiative were taken quickly to guide and motivate the team, preventing decision paralysis.
Overall, project tasks are built around a hierarchy of Epics, Themes, and User Stories. Epics are the overarching large goals in a project. Epics are never used directly in Sprints because they are very large and not detailed (Atlassian, 2020). The team starts with Epics and, through a series of conversations with users and stakeholders, further refines subcategories of these Epics. These subcategories are called Themes. Themes are an intermediary between the big-picture Epics and the small, specific User Stories.
Figure 3.1 Product Backlog (Rubin, 2013)
User Stories are created by meeting with the users and having conversations to identify requirements.
User Stories can be written in multiple formats. Our team used the following format: As a (user) I want
to (feature) so that I can (outcome). Before each Sprint, User Stories were generated by the team. These
stories were usually derived from meetings held with users of the product in the prior week. Stories are
valued by the development team using the effort hours system (Rubin, 2013).
Scrum, like most Agile methodologies, is centered around Sprints. Although Sprints are usually two
weeks long, our Sprints lasted one week. We used shorter sprints because of the compressed timeline of
our project. After the Sprint Planning Meeting on Monday, we held daily Scrum meetings each morning.
Our project sponsor met with us daily and greatly assisted in guiding us throughout our project. During
these daily Scrums, also known as standups, each team member said what they finished since the last
daily Scrum, what they planned to complete before the next one, and any blockers which may have
inhibited their progress. At the end of each Sprint, the Scrum Master determined the Sprint velocity and whether the goals of the Sprint were met. Additionally, we carried over any incomplete tasks to the backlog for the next week (Rubin, 2013).
We used a Kanban visualization of the tasks for each Sprint. This visualization was in our project management tool, Airtable. Using the Kanban visualization, the team could easily see the project backlog, work in progress, and completed tasks. This visualization also made it possible to move tasks between these categories.
After each Sprint we reviewed our progress at the team and project level. This was performed using Sprint Review and reflection meetings, respectively. During Sprint Review meetings, the team determined which User Stories were completed. Also during this meeting, the Scrum Master calculated and presented the team’s velocity. This gave us an understanding of our overall progress for the Sprint and opened conversation regarding the project risks. Also at the end of each Sprint, we demoed the application to our sponsor. The demo served to show our sponsor what progress had been made during the week (Rubin, 2013).
The Sprint reflection, or retrospective, was performed at the end of each week to continuously improve the team’s development process. During each retrospective, each team member discussed what they thought went well, what could be improved, and what they would commit to improving during the next Sprint (Rubin, 2013).
Finally, burndown charts are a Sprint artifact used to determine team productivity and work pace. A burndown chart can be created for User Stories or hours worked, per day or per Sprint. A representation of a burndown chart can be found below. Sprints or days are typically on the x-axis, while work units are on the y-axis. An ideal burndown chart is perfectly linear because the same amount of work is performed in each interval (Rubin, 2013).
Figure 3.2 Burndown Chart Guide (Goncalves, 2019)
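The ideal line on a burndown chart can be computed directly by dividing the Sprint’s total work evenly across its days. A small sketch, with arbitrary numbers:

```python
def ideal_burndown(total_points, sprint_days):
    """Remaining work at the end of each day if the team burns an
    equal share of the Sprint's points every day."""
    per_day = total_points / sprint_days
    return [total_points - per_day * day for day in range(sprint_days + 1)]

# A 20-point, five-day Sprint burns 4 points per day:
ideal_burndown(20, 5)  # → [20.0, 16.0, 12.0, 8.0, 4.0, 0.0]
```

Plotting the team’s actual remaining points against this line shows at a glance whether the Sprint is ahead of or behind pace.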
By implementing the Agile Scrum project management methodology, we were able to efficiently deliver
a product which met the needs of interested parties.
3.2.2 Risk Management
In any project, risks may inhibit progress and make it difficult to complete the project to the client’s
specifications. A key component of project management is identifying and tracking these risks. Our team
built a risk management framework which included the name of the risk, a brief description, its
category, the probability it would occur at the time the risk was created, its current status, and
mitigation plans as seen below.
Figure 3.3 Risk Management Framework
The risk was described using the following format: (risk) may result in (risk outcome). Phrasing the risks
in this way gave consistency and clarity to our risk descriptions.
The categories of risks used were:
• Technical – Risks involved in working with the varying software technologies.
• Organizational – Risks related to poor organization and project planning, such as scheduling or identifying requirements.
• Human Capital – Risks related to the morale and energy of the team.
• External – Dependency risks: items that are beyond the control of the team but would still significantly impact the ability to deliver a final product.
The mitigation column listed the steps the team would take to avoid each risk, as one or more bullet points. Our risk framework was modified, or added to, at the end of each Sprint. This framework and process were based on the system recommended by the Project Management Institute (Lavanya & Malarvizhi, 2008).
3.2.3 Requirements Gathering
Sprint Planning Meetings
Sprint Planning Meetings are designed to determine and value the upcoming Sprint’s User Stories. The
team’s plan was to follow the set Scrum standard as well as revise Epics and Themes if necessary.
Sponsor Communications
Daily Scrums
Daily Scrums are quick stand-up meetings designed to regularly inspect how the team is moving towards its
project goal.
Product Demonstrations
Product demonstrations consist of the user testing out the product and giving their initial impressions.
User Interviews
User interviews are conversations with the interviewee(s) about specific topics. Our intent was to learn
about validation methods and goals.
4. Requirements Gathering
To obtain our project requirements and adhere to the Scrum methodology, we planned each Sprint’s
User Stories, scheduled regular sponsor communication, and conducted interviews with the firm’s
accountants.
4.1 Sprint Planning Meetings
The team performed a Sprint planning meeting at the beginning of each Sprint. These meetings were
designed to determine and value the upcoming week’s User Stories and revise the Epics and Themes.
4.2 Sponsor Communication
4.2.1 Daily Scrums
To maintain clear communication with the sponsor, the team conducted Daily Scrums each morning in
the project sponsor’s office. Each team member relayed what they had completed, what they aimed to
do, and any blockers that prevented them from moving forward. After the team discussed their
progress, the sponsor suggested improvements and new requirements to help us create new User
Stories.
4.2.2 Product Demonstrations
Throughout the project, the team scheduled demonstration sessions with the sponsor, so that the team
could receive feedback. The sponsor assessed the dashboard without team commentary to test whether
the product was intuitive, clear, and accurate. The sponsor’s feedback proved extremely useful as each
demonstration revealed what to keep, what to change, and what to eliminate.
4.2.3 Interviews
The project sponsor informed us that our project would assist accountants in their validation of monthly
reports. To practice user-centered design, we scheduled interviews with accountants. We aimed to learn
about their validation practices and workflow. Our interview notes can be found in Appendix C and D.
Accountant Interview 1
The team initially met with the project sponsor and two accountants: Firm Accountant 1 and Firm
Accountant 2. The team asked the accountants about their current validation system and how they
typically prioritize their checks. After the accountants showed the team their Excel processes, they sent
a follow up email listing their validation procedure. The notes from the interview can be found in
Appendix C.
Accountant Interview 2
The team then met with accountants Firm Accountant 1 and Brennan Canese and asked them to demo
the dashboard. The accountants liked the drill down feature, which allowed them to see detailed
information for each Strategy on the transaction level. They also revealed that the list of validation
checks needed to show the supporting transactions behind each check. They found it useful to display
human-readable commentary on gains and losses. This functionality was similar to the reporting capability provided in the ‘IRR Analytic Report,’ which was created through manual processes in the accounting department. The notes from this interview can be found in Appendix D.
5. Analysis
Our meetings with the accountants enabled us to gain insight on how the report generator and
dashboard could help them in their day-to-day tasks. The meetings helped us gather and establish
requirements so that we could create epics and themes to execute the necessary tasks.
5.1 Epics & Themes
Epic 1: Improve Winners Losers Report Generator
This Epic was focused on the improvement of the Winners and Losers Report generator. The sponsor
requested the ability to add a column to this report. Adding a column meant that the report needed to
be able to take another field as input, manipulate it for formatting, and paste it into a new location in
the Winners and Losers Report.
Epic 1 Theme
1. Update Documentation (Document W-L Reporting)
After the code was understood, the documentation could be updated. User stories which
pertained to the development and revision of documentation were added to this category.
Epic 2: Azure Validation Dashboard (Validator)
The main goal of this thread was to deliver a dashboard which provided accountants and other
members of the team at the firm with the ability to view, manipulate, and understand financial data in
new ways.
Epic 2 Themes
1. Validate Data (Validate Data/Alerts)
The firm’s accountants perform validation checks on the firm’s transaction history. This
validation procedure flagged both material and immaterial issues in the data loaded into the
Azure Validation Dashboard. Additionally, any flags that were raised were supported with their
respective transaction information.
2. Present Interactive Raw Data (Power BI RAW/Explorer)
To provide an understanding of the base numbers for our calculations, we connected the raw
financial data in the data lake to an interactive view in Power BI. This raw data built confidence
in the accuracy of the analysis performed. The final system design used two tables for all
displays. These two tables were able to be viewed and filtered in their raw forms. Additionally,
developers could see all the raw data at Power BI’s disposal by viewing the data or model views
in Power BI.
3. Generate Performance Commentary (Summary Information/Perf Summary)
Human-readable commentary was generated to provide a more understandable narration of
changes in gross profit. This commentary concatenates the Deal Name, Strategy, and most
recent month over month difference in gross profit into an intelligible sentence. The user also
had the ability to drill through the performance commentary and view transactions which
contributed to notable changes in gross profit. The performance commentary was presented on
the Strategy level.
4. Integrate data lake to Power BI (Integration and Automation)
This theme involved the steps required to create a connection between the backend (Datalake)
and the frontend (Power BI). This connection caches all the data in Power BI via refresh in the
Power BI interface.
5. Design User Experience (User Experience and Design)
A large portion of this project focused on how to best display the data on the front end, so that
the user was informed but not overwhelmed. To do this, we created User Stories focused on
what individuals wanted to see and how they wanted to see the displays refined for future
releases.
6. Write Documentation (Documentation)
A goal of our project was to write code which could be maintained in the future. To do this, we
produced documentation to ensure that users and developers were well informed of the
capabilities and design of the dashboard system.
5.2 User Stories
Since Epic 1 was a continuation of a previous MQP project, most User Stories for Epic 1 consisted of
setting up our development environments, analyzing the code, running tests on the system, and
producing a tutorial video.
For Epic 2, we broke down our themes based on the different sections of the dashboard. We then made
User Stories for each theme. Each theme required User Stories that took place in both Databricks and
Power BI. Some User Stories focused on research, as we were not as familiar with some of the
technologies such as Power BI and Pandas. All User Stories are listed in Appendix A and throughout the
paper.
6. Design
In order to execute the two threads, we utilized Azure Datalake to store and maintain relevant data. We
also used Databricks with Spark to perform calculations and manipulate the data. Lastly, we worked with
Power BI to visualize our data and insights. We utilized specific design patterns to produce modular and
well documented code and developed multiple iterations of Databricks notebooks. Additionally, the
Power BI dashboard went through a series of top-down design changes as the capabilities and
limitations of the programs involved became better understood by the team. To explain our design
choices and how the program is structured, we created a series of diagrams and descriptions. Then, we
explained how the user interface looks and functions.
6.1 System Architecture
To query the data lake from Databricks, we had to understand the firm’s cloud infrastructure. As seen in
Figure 6.0, we learned that a script fetches raw transaction data from the Geneva Accounting System via
the Active Batch Scheduler and then prepares it to be stored in the Azure Data Lake. Then, additional
scripts convert the data into Delta tables which can be manipulated in Databricks. By loading the tables
into the data lake, Power BI can import the tables and display them as visualizations.
Figure 6.0 System Architecture Diagram
6.2 Data Flow Diagram (DFD)
The following figures describe how data is processed and flows throughout the Azure Validation
Dashboard system. Three levels of detail are provided. The Context Diagram presents the process from a
high level with the entire system represented by one process which views financial performance. The
next diagram, Level 0, goes into more detail on how the data moves between different systems and
external entities in the process. The diagram breaks out the front-end viewing processes, the back-end
data analysis processes, and summarizes the data which flows between them.
Figure 6.1 Context and Level 0 Diagram
Finally, the most detailed diagram is the level 1 diagram which introduces data stores. In this diagram,
one can see how data flows for the main processes and views in the dashboard. Most viewing processes
simply access locally cached data from the Power BI Datastore (D2). When all the data is refreshed
(process 1.0) the backend Databricks processes are triggered to run and update the data in Power BI.
The updates from the Geneva Accounting System (external entity) are currently scheduled by the firm.
Figure 6.2 Data Flow Diagram Level 1
6.3 Entity Relationship Diagram (ERD)
To create the back-end table for the Power BI dashboard, the team needed to understand which tables
to access in the data lake. Figure 6.3 displays three tables that were accessed. The tables are not
connected on shared keys, but they are uploaded to the data lake using prebuilt scripts.
Figure 6.3 As Is Data Lake Entity Relationship Diagram
The figure below shows the additional tables that were generated from sections of the former tables.
Even though the generated tables default.results_and_flows and default.irr_timeseries contain data
from the other tables, they are not joined in SQL. Instead, we created them as Pandas Data Frames,
converted them to Spark Data Frames, and then uploaded them to the data lake. We created two tables
because results_and_flows analyzes the selected PeriodEndDate and the previous month, while
irr_timeseries examines the data from inception to the selected PeriodEndDate. By having these two
time ranges, we could set Power BI’s drillthrough functionality to exclusively show what transactions
contribute to validation checks pertaining to month over month changes.
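To make the two time ranges concrete, the sketch below approximates them in plain Python; the production tables were built with Pandas and Spark Data Frames, and the row layout and field name here are illustrative assumptions.

```python
from datetime import date

def month_window(rows, period_end):
    """results_and_flows scope: rows from the selected PeriodEndDate
    and the month immediately before it."""
    prev = (date(period_end.year - 1, 12, 1) if period_end.month == 1
            else date(period_end.year, period_end.month - 1, 1))
    keep = {(period_end.year, period_end.month), (prev.year, prev.month)}
    return [r for r in rows
            if (r["PeriodEndDate"].year, r["PeriodEndDate"].month) in keep]

def inception_window(rows, period_end):
    """irr_timeseries scope: everything from inception up to and
    including the selected PeriodEndDate."""
    return [r for r in rows if r["PeriodEndDate"] <= period_end]
```

Keeping the two scopes in separate tables is what lets the drillthrough views bind month-over-month checks to exactly two months of supporting data.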
Figure 6.4 New Data Lake Entity Relationship Diagram
Since the above figure does not show the relationships between tables converted into Pandas Data
Frames, we created an entity relationship diagram on the Pandas Data Frame level. For the
results_and_flows table, we merged reporting.irr_results and reporting.irr_mod_cashflows to show
summary values such as IRR, MOIC, and GrossProfit as well as the transactions that contributed to them
over the PeriodEndDate selected and the previous month. Many of the additional tables merged into
the central table are alert tables created from the irr_results and irr_mod_cashflows Data Frame. The
team had to merge all alerts into the central table because of the limitations of Power BI. Although
Power BI offers powerful visualizations and useful functionality such as drillthrough and drilldown,
Power BI can only join tables on one attribute. Thus, to use certain features such as drillthrough for
alerts, we had to merge all alerts into one table.
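The flattening this required can be sketched as follows; the alert names, key column, and boolean-flag representation are assumptions, and the project performed the equivalent merges with Pandas.

```python
def merge_alerts(central_rows, alert_tables, key="Strategy"):
    """Flatten per-alert tables into the single central table by adding
    one boolean column per alert to each central-table row."""
    # Collect, for each alert, the set of key values it flagged.
    flagged = {name: {r[key] for r in rows}
               for name, rows in alert_tables.items()}
    for row in central_rows:
        for name, keys in flagged.items():
            row[name] = row[key] in keys
    return central_rows
```

With every alert living as a column on one table, a single one-column relationship is enough for Power BI to drive drillthrough from any alert view.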
Figure 6.5 results_and_flows Data Frame Entity Relationship Diagram
Instead of providing alerts like Figure 6.5 does for the selected PeriodEndDate and the previous month,
Figure 6.6 was used to calculate historical estimations based on data from inception to the selected
PeriodEndDate. These values include the mean, standard deviation, month over month change, and
linear regression estimate for the next PeriodEndDate.
Figure 6.6 irr_timeseries Data Frame Entity Relationship Diagram
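These four measures can be sketched in plain Python as shown below; the project computed them with Pandas, and the least-squares projection here assumes evenly spaced monthly observations given oldest first.

```python
from statistics import mean, stdev

def timeseries_stats(values):
    """Summary statistics over a metric's history (at least two points,
    oldest first): mean, standard deviation, latest month-over-month
    change, and a least-squares estimate for the next period."""
    n = len(values)
    xs = range(n)
    x_bar, y_bar = mean(xs), mean(values)
    # Slope of the ordinary least-squares fit y = a + b*x.
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
             / sum((x - x_bar) ** 2 for x in xs))
    return {
        "mean": y_bar,
        "std": stdev(values),
        "mom_change": values[-1] - values[-2],
        # Prediction at x = n, i.e. the next PeriodEndDate.
        "next_estimate": y_bar + slope * (n - x_bar),
    }
```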
6.4 Use Case Diagrams
The following figure describes the use cases for the Azure Validation Dashboard. The three main use
cases are: Analyze Historical Performance, View Alerts, and View Commentary. These uses are featured
prominently in the user interface. Also illustrated is the drillthrough use case. Drillthrough functionality
is included in the View Alerts and Commentary use cases. This drillthrough routes the user to the Raw
Data page. This page is represented by the View Raw Data 2 Month use case.
Figure 6.7 Use Case Diagram
6.5 User Interface Structure Diagram
We constructed a user interface structure diagram to show how each view is connected. Users start at
the home view which acts as a launch pad to see different analyses. When a user goes to view Alerts or
Commentary, the user can drill through on an entry in the tables and navigate to Raw Data 2 Month for
supporting data. When a user goes to the History screen, the user can select further analyses derived
from Raw Data ITD. Explanations for each screen can be found in the User Experience section. Another
version of this diagram that includes every alert page can be found in Appendix F.
Figure 6.8 User Interface Structure Diagram
6.6 User Experience
6.6.1 Home
The home display of the Azure Validation Dashboard was designed to give the user a general overview
of what the program is capable of through its three main uses. The top of the
page displays the date for which the report was generated. In all instances, this date is automatically set
to the latest available date in the alerts table (results_and_flows). As seen in the figure below, the firm’s
logo is to the left of the date and to the right is the page name. The scroller lies below the date display
and provides a preview of the data in the commentary section. The three main functions are denoted by
large clickable panes which lead the user to their respective landing pages. These panes are titled
Commentary, Alerts, and History.
Figure 6.9 Power BI Home
6.6.2 Commentary
The Commentary Page was designed to have all the capability of the IRR Analytic report. The IRR
Analytic is a report manually created by the accounting department each month which describes the
biggest positive and negative changes in gross profit across different regions and time periods on a per
deal (Strategy) basis. These gains and losses are described in easy-to-read phrases using the following
syntax: [Deal Name] [Strategy] [gross profit] [gain or loss]. For example: ‘Apple Computer (BRKT:0005)
1.3mm gain.’ The wording in the original report is ‘contributors’ and ‘detractors’ for the biggest gainers
and losers, respectively for a given region or time period. Our report has a table on the left for the
biggest contributors and a table on the right for the biggest detractors. These tables can then be filtered
on region and return period.
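The phrase construction can be sketched as a small formatting helper; the exact rounding and scaling used in the project are not documented here, so the 'mm' (millions) formatting below is an assumption.

```python
def commentary(deal_name, strategy, gp_change):
    """Render a gross profit change (in dollars) as a phrase of the
    form '[Deal Name] ([Strategy]) [amount]mm [gain|loss]'."""
    direction = "gain" if gp_change >= 0 else "loss"
    millions = abs(gp_change) / 1_000_000
    return f"{deal_name} ({strategy}) {millions:.1f}mm {direction}"
```

For example, a change of +1,300,000 for Apple Computer under BRKT:0005 would render as the sample phrase quoted above.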
Figure 6.10 Power BI Commentary
6.6.3 Alerts
The Alerts Page contains all the validation checks performed on the data. Each button leads to a
different alert type as described on that button label. On the right of the page, the number of Business
Units, Portfolios, Strategies and Sycodes are displayed. These numbers are shown in order to give the
user an understanding of what data was analyzed. As a user of this program at the firm, the user is
expected to know how many business units and portfolios exist. Therefore, if the numbers displayed on
this page are radically different from what is known, it is a sign that the program may have
malfunctioned.
Each alert page has a table with relevant identifiable information for that alert. All entries which were
flagged with that alert for the given month will appear in the alert table. Each alert entry has
drillthrough capability. This means that users can right click and select the drillthrough option to be
directed to the raw data page where they can view all the transaction level information for that flagged
row.
There are 17 alerts which are broken up into 6 categories. The categories, alerts, and descriptions are
listed below:
• Transactions
o RemMV Change, but no transaction: Outputs Strategy codes that have a remMV change
over the previous month and do not contain any significant transactions (any
TransactionType that contains the string 'buy', 'sel', and 'AccountingRelated').
o RemMV same, but transaction exists: Outputs Strategy codes that have no change in
remMV over the previous month and contain any significant transactions (any
TransactionType that contains the string 'buy', 'sel', and 'AccountingRelated').
o Monetized Strategy code with Transactions: Identifies Strategy codes that have been
monetized, have a quantity of zero when transaction type is total terminal value, and
have any other transactions for a given month.
o Gross Profit Changed, but No Transaction: Identifies Strategy codes that have a
GrossProfit change over the previous month and do not contain any significant
transactions (any TransactionType that contains the string 'buy', 'sel', and 'AccountingRelated').
o Gross Profit Same, but Transaction Exists: Outputs Strategy codes that have no change
in GrossProfit over the previous month and contain significant transactions (any
TransactionType that contains the string 'buy', 'sel', and 'AccountingRelated').
• IRR, MOIC Breaks
o Negative IRR Change, Positive MOIC Change: Identifies Strategy codes that have a
positive MOIC and Negative IRR change over the previous month.
o Negative MOIC Change, Positive IRR Change: Identifies Strategy codes that have a
negative MOIC and positive IRR change over the previous month.
o MOIC < 1, IRR Positive: Identifies Strategy codes that have a MOIC less than 1 and an IRR
that is positive.
• Missing Data
o Missing Begin Date: Sycode-level analysis which determines if the begin date field is null.
• Sycode
o SyCode Price Inconsistencies Across Portfolios: Identifies SyCode Price inconsistencies
across portfolios for a given month.
o Sycode: One-to-Many Strategies: Lists Sycodes that belong to multiple Strategies.
o Sycode Price Change Month Over Month: Lists SyCode month over month (MoM)
changes over the previous month (in any Sycode-StratCode pair), if a current and
previous month exist.
• Monetized
o Ongoing, but Listed End Date: Strategy Codes that are ongoing and have an end date
that is not the current period EndDate.
o Not Monetized and RemMV is 0: Identifies Strategy codes that have a
Total_Terminal_Value/remMV of 0 and are not monetized.
o RemMV Not 0, but Listed as Monetized: Finds Strategy codes that are monetized and
contain a non-zero remMV.
o Monetized, No Listed End Date: Identifies Strategy codes that are monetized and do not
have an end date.
• Strategy
o New Strategy Codes: Identifies Strategy codes that exist in the given PeriodEndDate, but
do not exist in the previous month.
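To illustrate how one such check might be expressed, the sketch below implements the 'RemMV Change, but no transaction' alert in plain Python; the project ran these checks over Pandas Data Frames, so the per-Strategy row structure shown is an assumption.

```python
SIGNIFICANT_TOKENS = ("buy", "sel", "accountingrelated")

def remmv_change_no_transaction(strategies):
    """Flag Strategy codes whose remMV changed over the previous month
    but which contain no significant transactions."""
    def is_significant(txn_type):
        return any(tok in txn_type.lower() for tok in SIGNIFICANT_TOKENS)
    flagged = []
    for s in strategies:
        changed = s["remMV"] != s["prev_remMV"]
        has_txn = any(is_significant(t) for t in s["transactions"])
        if changed and not has_txn:
            flagged.append(s["Strategy"])
    return flagged
```

The remaining checks follow the same shape: compare two fields (or two months of one field), then filter by the presence or absence of significant TransactionTypes.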
Figure 6.11 Power BI Alerts
6.6.4 History
All panes in the history page use the timeseries data table as their source. Additionally, all historical
analysis occurs on three metrics: gross profit, Internal Rate of Return (IRR), and Multiple of Invested
Capital (MOIC).
The timeseries view is designed to display 1-5 deals at a time. To use this view, the Deal Name, return
period, and business unit are selected on the left. After this, the month over month changes in the three
metrics will be displayed in the table in the center of the page. Additionally, on the left the historical
values for gross profit, IRR, and MOIC will be plotted in separate charts. Below these charts the
minimum, maximum, average and projected values for each deal selected will be shown. See Figure
6.12.
Back on the history page, three other views aside from timeseries can be selected. Each of these views
shows a table which compares the selected metric to its historical average on a per Strategy basis. The
table is sorted by the absolute value of the difference between the mean historical value and the most
recent value of the metric. Each table holds additional identifiable information such as return period,
portfolio, business unit, and Sycode to assist the user in understanding where this data can be located
and what might explain the deviation between the most recent and average value.
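The sorting rule can be sketched as a one-line ranking; the field names are assumptions.

```python
def rank_by_deviation(rows):
    """Sort Strategy rows by the absolute difference between the mean
    historical value and the most recent value, largest first."""
    return sorted(rows, key=lambda r: abs(r["mean"] - r["latest"]),
                  reverse=True)
```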
Figure 6.12 Power BI History Timeseries
6.7 Design Patterns
Python code needs to be organized so that it is easily readable, understandable, and well documented
for developers. In order to accomplish the tasks required for Thread 1, we implemented the strategy
pattern (Boyanov, 2016).
6.7.1 Strategy Pattern
The strategy pattern enables an algorithm or class behavior to be changed at run time. Strategy objects
are created for different strategies and the behavior of the context object depends on the strategy
object, which changes the algorithm that is run for the context object.
We applied the Strategy pattern in Thread 1 when we demonstrated how to add a
column to the output sheet. There are multiple levels of processing, but our final level of processing
determined how the final number or string should be displayed; the data in the output sheet was
processed and generated using a Strategy_mapping hashmap with keys such as in_millions and
monetized. The value associated with the key referred to a class that defined the logic for how that
Strategy is implemented. In order to add new columns and populate them with data in the correct
format, developers can reference the defined Strategy or create a new Strategy and reference that, as
we did with in_abs_millions.
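A minimal sketch of this mapping is shown below; the key names mirror the report, but the class definitions and formatting logic are assumptions for illustration.

```python
class InMillions:
    """Strategy: display a raw number scaled to millions."""
    def format(self, value):
        return f"{value / 1_000_000:.1f}"

class InAbsMillions:
    """Strategy: display the absolute value scaled to millions."""
    def format(self, value):
        return f"{abs(value) / 1_000_000:.1f}"

Strategy_mapping = {
    "in_millions": InMillions(),
    "in_abs_millions": InAbsMillions(),
}

def render_cell(value, key):
    """Context behavior: the Strategy looked up at run time determines
    how the cell is formatted."""
    return Strategy_mapping[key].format(value)
```

Adding a new column format then only requires defining a new Strategy class and registering it under a new key, which is the extension point described above.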
7. Implementation
Pre-Qualifying Project Work
Before starting work on site at the firm’s office in New York City, we prepared by performing background
research as well as speaking with the project sponsor on a weekly basis. Regular communication with
the sponsor helped us develop a preliminary understanding of the project. Also, these early
conversations helped us develop a project plan and identify basic project requirements. However, it was
difficult to conduct extensive research without access to the software environment.
Sprint 1
User Stories Completed:
As a firm analyst, I want to add columns in the Excel template, so that I don't have to manually edit the report.
As a firm analyst, I want to delete columns in the Excel template, so that I don't have to manually edit the report.
As a firm analyst, I want to modify columns in the Excel template, so that I don't have to manually edit the report.
As a firm analyst, I want to populate the modified template with data corresponding to the column names, so that I don't have to manually input data into the report.
As a firm employee, I want to learn how to use the report system, so that accounting can manually produce reports.
Sprint Review
This week was spent entirely on our initial setup at the firm and Thread 1. After gaining access to the
codebase, we learned that adding or modifying columns in the report required only minimal
modifications to the code. The prior MQP designed the program with this functionality in mind. After
presenting this to our sponsor, he recommended that we revise the documentation to better describe
this functionality and modify the template by adding a column. We created a video tutorial and made
significant revisions and updates to the documentation to better describe this process. These changes
and the video were pushed to the Git repository by the end of the week.
No User Stories were rolled over or left incomplete for this Sprint.
Story Points Completed: 104
Hours Worked: 116.5
Velocity: 89.27%
Sprint Retrospective Meeting
What Worked Well
• Our sponsor was happy to meet with us every morning which kept our communication clear and easy.
• The onboarding process was faster and smoother than expected.
• The team was able to adapt to fast changes in scope and direction as they unfolded for Thread 1.
• Subject matter experts on Python and Azure seem to be somewhat available to assist us with this project.
• Coming into work early meant we had to spend less time commuting.
What Could be Improved
• No items were rolled over in the backlog. As we learned more about the scope of Thread 2, we
should have built a larger backlog.
• The development environment took some time to set up, which slightly slowed our progress. This
was expected and likely will not be an issue after this week.
• We need to improve how information is shared on the team regarding code changes and how
we can all collaborate on code.
• We need to be aware of when parallel work is needed, so everyone has the same base of
knowledge.
• We need to work on unifying syntax and procedures in code.
Sprint 2
User Stories Completed:
As a firm analyst I want to know if a Strategy switched from being a gain to a loss or vice versa so I can
recognize performance changes which affect the overall fund.
As a firm accountant I want to know if any terminal values changed when there was no buying or selling
activity because this is indicative of incorrect Data copy.
As a firm analyst I want to know if any terminal values went to 0 over the last month so I am aware of
any new closed positions.
As a firm accountant I want to know if remMV values changed when there was no trading activity so
that I can check if the data is correct.
As a firm analyst, I want to see the biggest month over month change in IRR at the Strategy code level
over the last N months, so I can make an informed decision about investing.
As a firm analyst, I want to see the biggest month over month change in MOIC at the Strategy code level
over the last N months, so I can make an informed decision about investing.
As a firm analyst, I want to see the biggest month over month change in GrossProfit at the Strategy code
level over the last N months, so I can make an informed decision about investing.
As a firm analyst, I want to know the "Buy and Sell" transactions over the last month, so I can make an
informed decision about investing.
As a firm trainee, I want to find the difference in IRR over 1 month, so that I can learn how to use
DataBricks and interact with the DataLake.
Project Risks Sprint 2
Sprint Review
Early in the week we had some difficulty with a rapidly changing scope for Thread 2. Initially we
interpreted this project as the need to create a report from scratch (Monday). We spent time Monday
Sprint planning and planning the overall structure of how the report would be built using our existing
understanding of the firm’s database systems. Tuesday, we learned that we would be providing
validation checks on data as it is loaded into the database. These procedures would be run daily from
the start of each month to determine what data is in the Datalake and what still needs to be added for
the report at the end of the month. This week our sponsor gave us many tasks involving insights he
would find useful to extract from the database. Although slightly challenging without business context
or an understanding of the database structure, we were able to accomplish most of these tasks. We are
learning a lot about the firm’s development environment. Additionally, because there were so many
tasks, the parallel work issue has been resolved.
Tasks added to the backlog include finding strategies that have a total terminal value of 0 and have not
been monetized. This is because we were unable to deliver the User Story to the exact specifications of our
sponsor. We are close to completion on this and it will be easy to finish quickly next week.
Story Points Completed: 152
Hours Worked: 230
Velocity: 66.09%
Sprint Retrospective Meeting
What Worked Well
• Weekly updates to advisors seem to be appreciated and will continue in order to ensure all
parties understand our progress and status on the project
• Databricks/Azure lake are easy to use because Python is user friendly and Python notebooks
make code very easy to debug.
• Communication with firm staff continues to go well as we speak to other members of the
firm
• Less parallel work because of the wider range of tasks.
• More tasks were given by the sponsor, which left our team more room to plan the project.
What Could Be Improved
• Be more agile -- less time planning, more time doing.
• Find different places to work, others have arrived in the space we are working in and they do
not appreciate our chatter.
Sprint 3
User Stories Completed:
As a firm accountant, I want to be able to see the largest difference in IRR between two months so that I
do not have to manually find it.
As a firm accountant, I want to be able to see the largest difference in Gross Profit between two months
so that I do not have to manually calculate it.
As a firm accountant, I want to know which Strat codes changed from ongoing to monetized from one
month to the next to understand which stratcodes affect overall fund performance.
As a firm dev I want to know if the total terminal value is zero because if it is it should be monetized.
As an accountant I want to be able to drill down in the raw files so that I can see where the data may be
incorrect.
As a firm analyst I want to filter down to specific funds so that I can perform more accurate validation
checks.
As a developer I want to figure out how to connect Data Frames from Databricks to Power BI and create
tables out of Data Frames so that I don't have to manually create a report.
As a firm accountant I want to see data points that are outside a number of standard deviations from
the norm so that I can identify extraneous data.
As a firm accountant, I want to see missing data (IRR, MOIC, GrossProfit, Total_Cost, Total_Sales,
Total_Terminal_Value) in irr_results and irr_mod_cashflows for a certain month, so that I can fix them.
Sprint Review
This week our scope and objective narrowed and remained consistent. We now know that we are
building a Power BI dashboard which will be used by accountants to check the validity of the firm’s
monthly financial data. Today we sent our project sponsor our initial prototype of this dashboard.
Although still in the early stage, we are now confident enough in the definition of our project to focus
our time towards one deliverable, which contrasts with the smaller missions of last week. Some of this
can be attributed to our experience in communicating with our sponsor. On the technical side, there is
still a lot for us to learn about the development environment. Understanding how to efficiently work
with large datasets in Pandas has been particularly challenging. We have had the assistance of Robert
Dreeke and Oren Efrati in helping us to understand how to create tables in the Databricks Database and
efficiently manipulate data in Pandas, respectively. It was mentioned that next week we would be able
to meet with an accountant. In preparation, we have developed a set of questions to better determine
what an accountant would like to see in a data validation dashboard. We are at the halfway point of
usable project time, and the project at this stage appears entirely achievable in the next three weeks.
For next week we rolled over User Stories relating to biggest month over month changes in total cost
and total sales. These items have been added to the backlog for next week and will be started in Sprint
4.
Story Points Completed: 162
Hours Worked: 143.5
Velocity: 112.89%
Sprint Retrospective Meeting
What Worked Well
• Good communication with individuals in the firm who are not the Project Sponsor.
• Less time was spent planning and more time was spent pursuing User Stories, which greatly
improved our velocity.
• Our team has started to understand how to use the Pandas package effectively in Python.
What Could Be Improved
• Show more work to our project sponsor in context. This helped our sponsor's understanding of
our progress and the value of our work.
Project Risks
Project Risks Sprint 3
Sprint 4
User Stories Completed:
As a firm accountant, I want to know the average change over any time period for MOIC, Gross Profit,
Total Cost, Total Sales, after specifying a Strategy so that I can understand changes in Strategies over
time.
As a firm accountant, I want to see the biggest month over month change in TotalCost at the Strategy
code level, so I can make an informed decision about investing.
As a firm accountant, I want to see the biggest month over month change in TotalSales at the Strategy
code level, so I can make an informed decision about investing.
As an accountant I want to be able to have a report that updates automatically, so I always have the
most up to date information.
As a firm accountant, I want to see if Strategies in funds with end dates are monetized, so that I can
determine why.
As a firm accountant, I want to know if a Strategy in a fund is monetized and whether it has no quantity
and no market value, so that I can determine why there is a notable change in the data.
As an accountant I want to see the biggest sycode move for any strat code so I can further analyze that
strat code.
As an accountant I want to see when MOIC and IRR are moving in opposite directions so I can further
analyze the story associated with it.
As an accountant I want to know when Gross Profit does not change and there are many transactions.
As an accountant I want to know when RemMV changes and there are many transactions.
As an accountant I want to know the month to month price changes for a sycode, so that I can see the
biggest moves in sycode price.
As a firm accountant, I want to check if there are begin dates for strategies, so that I can see why there
might be none.
As an accountant I want to see if a monetized portfolio has a terminal value OR RemMV which changes
from 0 to any number.
As a firm accountant, I want to see if strategies in funds with a terminal value of 0 are monetized, so that
I can determine why.
As a firm accountant, I want to see which strategies are new, so that I can determine which strategies do
not have previous data.
As a firm accountant, I want to see if a sycode belongs to multiple strategies, so that I can determine
how to override the data.
As a firm accountant, I want to see whether prices for sycodes change across funds, so that I can see if
there were inconsistencies in the data.
Sprint Review
As a result of meeting with accountants Firm Accountant 1 and Doug Mackenzie on Tuesday, we were
able to further refine the needs of our future users. During the meeting we discussed which checks are
performed on the data, the order the checks are performed, and developed an understanding of the
priority of these checks. After the meeting, Firm Accountant 1 sent the Excel files currently used to
perform these checks. Using these sheets and the recording of our meeting we created an outline of all
the validation checks to be performed on the data.
Wednesday and Thursday, we developed functions to execute these checks. Thursday afternoon and
Friday were spent integrating and re-validating these checks. Due to difficulties with integration, this
took longer than expected; as a result, we were unable to ship a revised dashboard Friday. We will work
to complete this Monday and will review it with the project sponsor. After this review we will have
another meeting with Evan and possibly other members of the accounting department to receive
feedback on the dashboard.
Multiple user stories were incomplete, which prevented the dashboard from coming together at the end
of the week. Some of the stories we rolled over to next week related to the human readable
commentary and drillthrough capability. Drillthrough was confused with drilldown, which resulted in the
false completion of a User Story.
Story Points Completed: 168
Hours Worked: 126.5
Velocity: 132.81%
Sprint Retrospective Meeting
What Worked Well
• Improved morale
• Realistic goals were set
What Could be Improved
• Better planning for integration; it took far longer than expected because the code was poorly
planned.
• Re-use others' work; there is no need to reinvent the wheel.
• Narrow the scope to allow a finished product by the end of the week.
Project Risks
Project Risks Sprint 4
Sprint 5
User Stories: As an accountant I want to see human readable commentary on which Strategy codes influenced the
portfolio and moved the most.
As a firm accountant, I only want to see BKRT, so that I can make decisions on a more meaningful
dataset.
As a firm accountant, I want to stratify the data by Region, so I can gain insights about the progress of
each region.
As an MQP student, I want to structure the final paper, so it accurately describes our work at the firm.
As a firm accountant, I want to drill through alerts, so that I can prove that an alert is valid.
As a firm accountant I want to be able to use filters on every page for common fields such as portfolio,
Business Unit, Strategy, Region, Sycode, and ALERT ATTRIBUTE, so that I can universally filter displayed data.
As a firm accountant, I want to see projected values based on historical data like standard deviation and
linear regression, so that I can determine if my numbers are in a reasonable range.
As a firm accountant I want to see the Alert Description without the extra linked column visible, so that
the view is less cluttered.
As an MQP team member, I want to refresh and update our paper, omitting previous technologies used and
writing about the new technologies used.
As a firm accountant I want to be able to see the DealName as well as the StrategyCode, because I know
Deal Names better than Stratcodes.
As a firm accountant, when viewing the GP No-Change Transactions rules, I want to be able to drill down
to transactions.
As a firm accountant I want extreme IRR values to be filtered out (perhaps greater than 1000) before
the standard deviation is calculated, so that I only see useful data.
As a firm trainee I want to meet with users of the dashboard to better understand their needs.
Sprint Review
This Sprint we showed our progress to our sponsor twice and had another meeting with two members
of the accounting team. These sessions were brief but helped us design the Power BI interface. We used
these meetings to hear direct feedback on the state of our dashboards. As a result of a meeting early in
the week, the data structure of our project had to be consolidated to enable the 'Drillthrough' feature in
Power BI. Also, as a result of this meeting we were informed of additional tables in the database which
specify deal-name and region. The accountants use these fields very often, so joining them to our main
table will make the data far easier to understand and manipulate.
During our last stand up with our sponsor this week we received feedback focused mostly on the
presentation of the data in Power BI. Usability is a key sponsor concern. Also of note in this meeting was a
feature request to use portfolio as a filter in one of our pages.
Our team has concerns about the feasibility of implementing this capability because of the complexity it
would introduce in the back-end data processing. We will communicate our concerns at the start of
Sprint 6. Scheduling would be affected, and it has become a priority to minimize changes to the backend
for two reasons: first, we need to focus our time on improving the existing Power BI interface, and
second, changes to our processing and organizing of the data on the backend have the potential to
break our interface. We also started the outline and planning of our final paper.
Many of the items added to the backlog this week were larger formatting tasks, which will not be
confirmed as done until the project is nearly complete. For instance, when a new table is added in
PowerBI, grand totals are added by default. Until no more tables are being added, we cannot be sure that
no extraneous or irrelevant grand total fields remain. This is also true of broader uniformity in formatting,
such as making all the headers the same.
Story Points Completed: 151
Hours Worked: 151.15
Velocity: 99.99%
Sprint Retrospective Meeting
What Worked Well
• Using the pair programming technique allowed for more effective collaboration and better
communication across the team.
• Splitting tasks and User Stories amongst the team has become easier and more natural.
• Working more in the front end is gratifying.
What Could Be Improved
• Estimating the time a task needs to be completed and conveying that number to others;
essentially, performing better real-time story point allocation and communication when tasks
are in progress and perhaps running long.
• Communicating current objectives casually could be improved so that all members of the team
have a sense of direction.
• Planning Power BI usage for versioning and collaboration purposes is important because only
one person can edit a file at a time.
Project Risks
Project Risks Sprint 5
Sprint 6
User Stories: As a firm accountant I want to disable grand totals on non-applicable fields.
As a firm accountant I want the tooltip on Diffs to show the two values used to calculate the Diff.
As a Project Sponsor, I want to see when MOIC and IRR are different, in my own terms, so that I can be
alerted when it happens.
As a firm accountant I want the Closed_fund_transactions field to be renamed as the
monetized_stratcode_with_transactions and to only check for non-null values in the RESID Column.
As an accountant, I want to see DealName and StrategyRegionofRisk as columns and as filters so I can
effectively analyze the data and utilize the PowerBI Dashboard.
As an accountant, I want to be able to filter on portfolio (including ALL), so that I can assess strategies on
a general level.
As an accountant I want to see the absolute value of all Diffs so that I can sort them.
As an accountant I want to see relevant usable filters on each report page so that I can filter the data
appropriately.
As an accountant, I want to see only values where the alert is true, so I see data relevant to that alert.
As an accountant I want a Top Level Summary page that contains the data and a well-organized way to
access alerts.
As a developer, I want to learn how to properly use the slicer to arrange data to the accountant's
satisfaction.
As an accountant, I want to see a flat list of transactions: raw data.
As a user of the PowerBI dashboard, I want the column names to be easier to understand, so I can
better understand how the data is represented.
As an accountant, I want to see a description of each page, so I understand how to use the data
provided and further understand the alert and its check.
As a firm developer I want to see commented Code so that I can maintain the software.
As a firm developer I want to add "changes" to IRR, MOIC, and Buttons so that the RAW explorer is more
usable.
As a firm accountant I want to filter the entire Report by Investment type so that I do not see irrelevant
cash transactions.
As a project sponsor I want the headers of each page to be the same on every page so that there is
consistency in design.
As an accountant I want to see commentary for all return periods: 1-year, 3-year, 5-year, and year to date.
As a project sponsor, for each dealname I want to see the Average, Min, Max, and LR for IRR and MOIC.
As a project sponsor I want to see a graph of time series data with IRR, GP, and MOIC all in one visual.
As a project sponsor I want to see a descending sort of the diff between the current value and the average value.
Sprint Review
This Sprint was focused on revisions to the user interface. After Wednesday no more changes were
made to the backend code, and the feature we were concerned about implementing last week was
added within our time constraints. This week we went from demoing once or twice per week to nearly
every day with our project sponsor. This compressed feedback loop let us make the many small changes
needed to improve the user interface much faster. These changes focused on formatting and the overall
flow of the user through the interface. A key challenge was providing enough information to summarize
performance, while not overwhelming the user, all while also giving the user transparency into how the
values were calculated. Towards the end of the week we received some informal feature requests over
email; these features were implemented by review time Friday. We plan not to develop any further
features after this week to stay on track. Our sponsor understands this and will be working with us to
assist in refining our presentation next week.
Story Points Completed: 185
Hours worked: 146
Velocity: 126
Sprint Retrospective Meeting
What Worked Well
• Our goals and expectations were achievable and realistic for the time we have left.
• Scheduling of the paper allowed for early professor feedback.
• Advisor feedback is positive, which is a good indication of project status.
What Could Be Improved
• We should try to avoid pursuing low-clarity instructions without asking for more information,
because it is unlikely we will be able to meet expectations.
• We need to better communicate technical limitations of Power BI.
Project Risks
Project Risks Sprint 6
Weekly Burndown
8. Testing
8.1 Quality Assurance Procedure
For Thread 1, the team used the intermediary Excel file to determine if the numbers were correctly
displayed in the Winners and Losers Report. We found key values for TotalSales and their associated
Deal Names to do a quick check on the validity of the data. In addition, the sponsor verified that the
newly produced column in the report was correct.
In Thread 2, testing was more complex than quickly determining if numbers had been copied over. Since
accountants were among our primary users, we attempted to use their accounting procedure and
former Excel sheets to check whether our validation alerts produced similar information. When we tried to
compare our numbers, however, we realized that the accountants’ files had a series of overrides that
were impractical to replicate. Our sponsor later told us not to use their numbers, as we would waste time
implementing overrides. As a result, we had separate notebooks where we would redo different alert
entries using SQL queries instead of using Pandas. For example, to prove that a certain Strategy had a
GrossProfit change with no significant transactions in the given PeriodEndDate, we queried
reporting.irr_results to verify the GrossProfit change for the given Strategy. Then, we queried
reporting.irr_mod_cashflows to see that the Strategy contained no significant transactions with a
TradeDate within the time of the PeriodEndDate. In addition, we produced sanity check columns for
Power BI, so that we could see the inputs of certain calculations such as month over month changes.
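The cross-check described above can be sketched in pandas; the frames below are illustrative stand-ins for reporting.irr_results and reporting.irr_mod_cashflows, and the sample values and materiality threshold are hypothetical:

```python
import pandas as pd

# Illustrative stand-ins for reporting.irr_results and reporting.irr_mod_cashflows.
irr_results = pd.DataFrame({
    "Strategy": ["A", "A", "B", "B"],
    "PeriodEndDate": ["2019-07-31", "2019-08-31", "2019-07-31", "2019-08-31"],
    "GrossProfit": [100.0, 150.0, 200.0, 200.0],
})
cashflows = pd.DataFrame({
    "Strategy": ["A", "B"],
    "TradeDate": ["2019-08-15", "2019-08-10"],
    "Amount": [0.5, 9000.0],
})

SIGNIFICANT = 1.0  # hypothetical materiality threshold

def gp_change_without_transactions(strategy, period, prior):
    """Flag a Strategy whose GrossProfit moved over the period even though
    no significant transactions traded within that period."""
    gp = irr_results.set_index(["Strategy", "PeriodEndDate"])["GrossProfit"]
    gp_changed = gp[(strategy, period)] != gp[(strategy, prior)]
    significant = cashflows[
        (cashflows["Strategy"] == strategy)
        & (cashflows["TradeDate"] > prior)
        & (cashflows["TradeDate"] <= period)
        & (cashflows["Amount"].abs() >= SIGNIFICANT)
    ]
    return bool(gp_changed and significant.empty)

print(gp_change_without_transactions("A", "2019-08-31", "2019-07-31"))  # True
print(gp_change_without_transactions("B", "2019-08-31", "2019-07-31"))  # False
```

In practice the same comparison ran as two independent SQL queries against the data lake; the pandas form is only meant to show the logic being verified.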
8.2 User Feedback
The team scheduled regular demos with the sponsor to receive user feedback on the accuracy and
usability of the dashboard. These regular meetings allowed us to quickly see initial reactions from the
sponsor and write new User Stories. The stories would then be used to fine tune the user interface and
back-end accordingly.
The firm accountants demoed early versions of the dashboard during interviews, which can be found in
Appendix C and D. We had two major meetings with the accountants that revealed what they value and
how they validate their data.
In the first meeting, we learned about what suspicious activity should be flagged such as IRR and MOIC
movements in opposite directions. We also learned in this meeting how the accountants use a series of
interconnected Excel spreadsheets to flag alerts and generate the commentary for the IRR Analytic
Report. In addition, we got a glimpse of their workflow and what accountants prioritize when validating
data.
In the second meeting, the accountants interacted with a prototype of the dashboard and relayed their
first impressions. They initially did not like the validation section but became interested in it once
we explained the drillthrough functionality. In addition, they liked the drilldown functionality in Power BI
because it was an intuitive way to navigate the large tables.
9. Future Work
9.1 Thread 1
9.1.1 Modularize Strategies Further
Although the current code base modularizes strategies for report customization, the strategies could be
further modularized in column_Strategy.py. The main concern is that the user can only call one Strategy
per header. These strategies are highly specific to headers, which means that creating one requires
intermediate coding knowledge. As a result, developing multiple, modular strategies that can be mixed
and matched (i.e. absolute value, in millions, in billions, in percentage, etc.) will likely provide a simpler
user experience for generating custom columns. Keep in mind that some strategies may be so unique to
the data set that they cannot be easily modularized.
9.1.2 Modularize Pre-Processing Functions Further
While implementing the “All Other Positions” portion of the “Invested Capital Column” in
preprocessing_factory.py, we noticed that the file is organized similarly to column_Strategy.py.
However, the code that generates the “All Other Positions” is less modular because it is a helper
function called by the class ConcatLowerBPSProcessing(). As a result, creating multiple, more modular
functions that can be mixed and matched may allow for users to easily customize the template. Some
pre-processing functions may be so unique to the template that they cannot be easily modularized.
9.1.3 Determine User Base
The current generator relies heavily on both the initial Excel Template and Python code. As a result,
potential users who are not familiar with Python and software development may have issues editing the
code base to suit their needs. Since the firm may ask non-technical employees to perform report
generation in the future, it is key to determine who will be using the software before further
development. This determination will dictate how to develop the software in a manner that is easy and
appropriate for the user base. Depending on the user base, a solely Excel or Python implementation may
be needed.
9.2 Thread 2
9.2.1 Add More Timeseries Data to Datalake
As of December 2019, the data lake only contained PeriodEndDates from mid 2018 to late 2019. To
develop the dashboard, we examined 8/2019 and 7/2019, because 9/2019 and 10/2019 did not have as
much data. As a result, adding more data would allow for a recent analysis of the latest PeriodEndDate
and accurate historical analysis. In addition, more data could be used to train a machine learning model
and perform further analysis.
9.2.2 Schedule Script
We designed the dashboard to support accountants in their validation of the latest PeriodEndDate. To
put the Azure Validation Dashboard into production, we recommend that the firm run the script for the
latest PeriodEndDate using the PeriodEndDate selector widgets in Databricks.
9.2.3 Add More Alerts and Analysis
As of December 2019, we built 17 alerts into the Power BI dashboard. We also laid the foundation for
others. For example, we calculated the linear regression prediction of GrossProfit, IRR, and MOIC for the
latest PeriodEndDate. An additional alert could be designed to find the difference between the actual
value versus the predicted value for the latest PeriodEndDate.
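A possible sketch of such an alert, with numpy's polyfit standing in for whatever regression the production script would use; the history values and the tolerance below are made up:

```python
import numpy as np
import pandas as pd

# Hypothetical monthly GrossProfit history for one Strategy.
history = pd.DataFrame({
    "PeriodEndDate": pd.to_datetime(
        ["2019-04-30", "2019-05-31", "2019-06-30", "2019-07-31"]),
    "GrossProfit": [100.0, 110.0, 120.0, 130.0],
})
latest_actual = 180.0  # value reported for the latest PeriodEndDate

# Fit a line through the prior periods and extrapolate one step forward.
x = np.arange(len(history))
slope, intercept = np.polyfit(x, history["GrossProfit"], 1)
predicted = slope * len(history) + intercept

THRESHOLD = 25.0  # hypothetical tolerance before raising an alert
diff = latest_actual - predicted
alert = abs(diff) > THRESHOLD
print(predicted, diff, alert)  # predicted is about 140, so the 40-point gap trips the alert
```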
9.2.4 Add More Fields to Data Frame
After working in the data lake, we realized we only used a small portion of the many fields in the tables.
The fields in the current dashboard were required by our sponsor, but we imagine that even more
analysis could be performed if more fields were introduced. In addition, we recommend adding all the
ReturnPeriods in the irr_timeseries script as we only included ITD and YTD at our sponsor’s request.
9.2.5 Create Summary Page
Although we created a wide range of alert reporting pages, these functions are not prioritized or readily
accessible when opening the dashboard. In its current state, it would take at least 35 clicks to view every
possible alert. A streamlining of the user interface is necessary to improve the workflow and reduce the
amount of time required to view alerts. A future redesign could reduce the number of interactions
required to see the most important alerts. This is a considerable challenge due to the rigidity of
designing in Power BI and because assigning priority to each alert will require a deeper understanding of
their importance.
10. Learning Assessment
10.1 Challenges
1. Identifying Requirements
Before we started our project in New York, we had a general idea of what we had to do. For security
reasons, we were not able to see how the data was structured until we got to New York. We understood
that we needed to edit the previous MQP’s code and develop new ways to log and analyze data, but
many of the details were not clear. When we arrived in New York, we realized that some of our
requirements had changed. In Thread 1, we planned to upgrade the report generator by implementing
XML, but after looking at the code we quickly realized that the previous MQP team had already
implemented a Python package utilizing XML. In Thread 2, we were tasked with completing one of
the three sub-threads that we had planned. Additionally, some of the requirements, such as using Power
BI as the primary front-end tool, were not clearly established until mid-way through the project.
In response, we attempted to clarify requirements with the sponsor and engaged in conversations on
what we had to do. Although the conversations gave the team new insights, these insights would
occasionally conflict with other requirements. Eventually, by having regular product demonstrations
with the sponsor along with an agile mindset, the team was able to determine requirements, gain
actionable feedback, and move forward.
2. Planning VS Execution
During the second Sprint, the team planned the project after getting the initial overview of Thread 2. We
created diagrams and wrote User Stories for four hours. As we began to execute our plan, our
requirements rapidly changed mid-sprint, and much of our planning was not applicable to the project.
On the flip side, the team began to develop the back-end tables without considering the limitations of Power
BI. Overall, the team was challenged to find the balance between planning and executing.
After experiencing both extremes, the team realized that shorter planning and execution cycles with
daily feedback were most effective. By receiving our sponsor’s reactions on smaller chunks of our user
interface and back-end code, we were able to align ourselves more with our sponsor’s needs.
3. Domain Knowledge
Although the team had some financial literacy and a rough idea of the firm’s asset organization, we
struggled to understand the entirety of the system. Different in-house column headers frequently
confused us as we worked on the back-end structure. Even though our sponsor clarified many terms for
us, we did not interact with most of the columns in the datasets. Although many of the columns were
not relevant to the project, we frequently wondered if we were missing information. Towards the end of
the project, we added Deal Name to our Data Frames since our sponsor requested it. While the task was
easy to complete, the field was stored in an obscure table that we would not have found on our own.
4. Optimization
Despite having some experience with the Pandas Python library, the team had to research how to use
the library correctly. Initially, the team used for-loops to analyze the data. However, we quickly learned
that Pandas was designed for vectorization. When trying to optimize our functions, we attempted to learn
best Pandas practices, but we did not fully understand the library. We eventually asked for help from
a software developer at the firm, and he showed us the groupby function and the apply function. By
using these functions, we were able to analyze large chunks of data in a shorter amount of time.
10.2 Learnings
10.2.1 Computer Science
Technologies
Throughout the project, the team learned how to adapt to and use the firm’s technologies. These
technologies included the Azure Data Lake, Databricks, Pandas, and Power BI. While the team had
familiarity with Python and SQL, we had never programmed against the data lake or interfaced with Power BI.
By speaking with employees of the firm, we were able to ask about the company’s best coding practices,
development setup, and advice on how to write in Pandas and Databricks. The team, however, did not
have as much support when working in Power BI. We relied on YouTube tutorials, Microsoft
documentation, and experimentation to develop the final deliverable.
Optimization
Vectorization and GroupBy
During development of various Data Frames, we understood our commands had to run relatively quickly
and take advantage of Pandas’ vectorized operations. At first, we used for-loops and Pandas’ row-wise
iteration to work through the large matrices. Although our for-loop code produced accurate results, the
commands were relatively slow for large datasets. A software developer at the firm suggested using
different groupby techniques to apply functions to an entire column or group as opposed to rows.
When these techniques were implemented, they cut our run times dramatically. As a result,
the team coded with vectorization in mind and structured the Data Frames with temporary columns to
allow for quick calculations and analysis.
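The shift can be illustrated on a toy frame (the data and column names are made up); the loop and the groupby produce the same totals, but the groupby form scales to large datasets:

```python
import pandas as pd

df = pd.DataFrame({
    "Strategy": ["A", "A", "B", "B", "B"],
    "GrossProfit": [10.0, 20.0, 5.0, 5.0, 10.0],
})

# The row-by-row style we started with: iterate over every row.
def loop_totals(frame):
    totals = {}
    for _, row in frame.iterrows():
        totals[row["Strategy"]] = totals.get(row["Strategy"], 0.0) + row["GrossProfit"]
    return totals

# The vectorized style suggested to us: one groupby call per aggregate.
grouped = df.groupby("Strategy")["GrossProfit"].sum()

# Broadcasting a group-level value back onto each row (e.g. as a temporary
# column for later diffs) uses transform instead of a loop.
df["StrategyTotal"] = df.groupby("Strategy")["GrossProfit"].transform("sum")

print(loop_totals(df))    # {'A': 30.0, 'B': 20.0}
print(grouped.to_dict())  # {'A': 30.0, 'B': 20.0}
```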
Query Optimization
Each team member worked individually on different alerts, so each person wrote their own SQL queries.
When merging the team’s code together into one notebook, we realized the commands took a
considerable amount of time. After running some tests on the code base, we learned that some SQL
queries took minutes to complete, while Pandas commands executed in a tenth of a second. As a result,
the team extracted their SQL commands and made four relatively large Data Frames at the beginning of
the program to be shared amongst the different alerts. By completing this task, we significantly cut
down our run times.
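A minimal sketch of that refactor, with an in-memory SQLite database standing in for the Databricks tables; the table name, fields, and cutoffs are illustrative:

```python
import sqlite3

import pandas as pd

# In-memory stand-in for the reporting database.
conn = sqlite3.connect(":memory:")
pd.DataFrame({
    "Strategy": ["A", "B", "C"],
    "IRR": [0.12, 0.08, 0.30],
    "MOIC": [1.5, 1.1, 2.0],
}).to_sql("irr_results", conn, index=False)

# Before: every alert ran its own query. After: fetch the shared table once
# and let each alert filter the cached frame in pandas.
irr_results = pd.read_sql("SELECT * FROM irr_results", conn)

def high_irr_alert(df, cutoff=0.25):
    return df[df["IRR"] > cutoff]["Strategy"].tolist()

def low_moic_alert(df, cutoff=1.2):
    return df[df["MOIC"] < cutoff]["Strategy"].tolist()

print(high_irr_alert(irr_results))  # ['C']
print(low_moic_alert(irr_results))  # ['B']
```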
Integration
Midway through the project, our goal was to link our alerts table with the raw data table in Power BI.
We discovered that Power BI could only join tables on one field with a 1-to-1 relationship and would
allow for drillthrough only on that one key. As various alerts needed to be joined on different sets of
keys, we realized we had to re-design our entire Data Frame. At first, we tried creating a unique
identifying key for each row, but we realized that this system would not be able to provide enough
context for drillthroughs. In a similar manner, we then tried to create a column for each alert type in the
raw data table. We also entertained the idea of creating a customized raw data table for each alert type.
While this might have worked in Power BI, we quickly dismissed it because of its lack of
extensibility. Eventually, we realized we had to merge our alerts table into our raw data table. We
refactored our commands to allow for the merge, and by doing so, we were able to avoid the Power BI
join process and allow for immediate drillthrough. From this experience, the team learned to stay
agile through many failed attempts and to be aware of the limitations of integrating with another program.
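The final approach amounts to a left merge of the alert flags onto the raw rows before export, so Power BI never has to perform the join itself; the rows and the single alert column below are made up:

```python
import pandas as pd

raw = pd.DataFrame({
    "Strategy": ["A", "A", "B"],
    "PeriodEndDate": ["2019-07-31", "2019-08-31", "2019-08-31"],
    "GrossProfit": [100.0, 150.0, 200.0],
})
alerts = pd.DataFrame({
    "Strategy": ["A"],
    "PeriodEndDate": ["2019-08-31"],
    "GP_Change_Alert": [True],
})

# Merge on the full composite key so each raw row carries its alert context
# and drillthrough works without a Power BI relationship.
merged = raw.merge(alerts, on=["Strategy", "PeriodEndDate"], how="left")
merged["GP_Change_Alert"] = merged["GP_Change_Alert"].eq(True)  # unmatched rows get False
print(merged["GP_Change_Alert"].tolist())  # [False, True, False]
```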
10.2.2 Project Management
Working with clients
Throughout this project, we learned how to work with clients. The process of translating abstract ideas
into concrete business and functional requirements in the real world is very different from any
experience in a classroom. Using a variety of techniques, our team refined our ability to ask the right
questions and determine what the client and end users were really interested in. As we became more
familiar with the software environment, data structures, and financial terms, it became easier to identify
the needs of the client.
Additionally, we learned that taking good notes even during small interactions with our project sponsor
helped us keep a good record of feedback. This allowed us to triangulate a solution from all feedback
with a bias towards what feedback was most recent. If we only pursued what was mentioned at the
most recent meeting, as we often did early in the project, we would start many items, complete few,
and overall set difficult-to-achieve goals.
We also learned to speak in the terms of the user. As we became acclimated to the business
environment of the firm, we picked up on many of the terms used in the industry to communicate key
information. By learning the definitions of these terms, we began to have far more productive
conversations when gathering requirements for future versions.
Finally, we found that stating our interpretation of a sponsor’s directive and asking if we were correct was
a productive way to determine if we understood what was communicated. This technique allowed us to
catch any misunderstandings. Getting confirmation early and often was a constant theme throughout
the project.
Iterative development with feedback
Throughout our project experience, it was apparent how important it was to receive feedback on
prototypes quickly and consistently. A large portion of the early project work was spent focusing on
back-end development with minimal feedback from users. After we developed an initial prototype, we
were able to make greater improvements to our overall product once the user had it in their
hands. Although this experience confirms the value of user input, a main tenet of the agile
methodology, our project was developed with less initial user input because of other factors. The main
factor was the connection between the user interface and the back-end: due to the way Power BI
connects to Databricks, the table we exported from Databricks had to keep the same name and header
names, or every visual would need to be rebuilt. We spent a considerable amount of time understanding
this connection.
10.3 What we would do differently
1. Determine needs of client – priorities and whether it is a want or a need
To begin, we could have improved the methods by which we gathered requirements for the Power BI
dashboard. We were able to meet with the sponsor and the accountants on separate occasions to get
their feedback related to our product; however, we spent hours creating and fine-tuning features for the
dashboard that were later discarded. While meeting with our client, we tried to avoid this issue by
prioritizing features based on accountant feedback. Instead, we could go about gathering requirements
by asking the accountants which features were “wants” versus “needs.”
2. Establish capability of tools with client
The team learned that it is essential to communicate technical limitations when developing a project for a
client. Although we were not experts in any of the technologies used, we gained experience, and it
became clear that some functionality desired by the sponsor would be either impossible or very time-
consuming to develop. As a result, we would convey the limitations of the tools to the sponsor early in
the design process and thus close the expectations gap between the team and the project sponsor.
3. Testing
Midway through the project, we received a set of validation files from the firm’s accountants; however,
we were advised by our sponsor to not use their numbers for tests. The team learned that the
accountants applied many complex and nuanced overrides that would have taken too much time to
replicate. As a result, we did not have a set of ground truths to test our code with. Instead, the team
tested alert calculations by running independent SQL queries. If we were to do the project again, we
would put more priority on asking for usable tests. The lack of an official ground truth created some
confusion for the team and thus slowed down development.
4. Team Communication
Towards the last few weeks of the project, the team had to focus on developing a testable Power BI
dashboard involving various tasks and thus work longer hours. During this time, there was a general
concern about how long we would stay at the office. Although we agreed to work on certain items until
they were finished, we knew that we needed to set time expectations with one another. In retrospect,
we would have established clearer expectations regarding how long to stay at the office and proactively
established the priority of certain tasks.
5. Technical Mentors
Throughout the project, we met with several firm employees who gave us coding tips, set up tutorials,
and provided feedback on our dashboard. Each time we met with them, we learned how to approach
problems in new ways and gathered clear project requirements. As a result, we feel that having more
conversations with firm team members would have benefited the team greatly and may have increased
our productivity.
11. Conclusion
While at the firm, the team improved the Winners and Losers report generator and developed an Azure
Validation Dashboard. By adding documentation to the Winners and Losers report generator, we were
able to help future firm employees maintain the code base. By building the Power BI dashboard, we
provided the firm’s analysts with robust and transparent calculations in a cloud-independent
environment.
Although we faced many challenges such as identifying requirements, planning appropriately, learning
domain knowledge, and optimizing our code base, we were able to overcome them by planning with the
end user in mind, iteratively developing with regular feedback, and learning powerful new tools.
At the end of the project, we were able to present our deliverables to our sponsors and exceed their
expectations.
Works Cited
Appelo, J. (2010, October 26). Agile Goal Setting. Retrieved from https://www.infoq.com/articles/agile-
goal-setting-appelo/.
Atlassian. (2020, January 3). Atlassian Documentation. Retrieved from
https://confluence.atlassian.com/.
Beck, K., Beedle, M., Bennekum, A. van, Cockburn, A., Cunningham, W., Fowler, M., … Thomas, D.
(2001). Manifesto for Agile Software Development. Retrieved from https://agilemanifesto.org/.
Boyanov, A. (2020). Python Design Patterns: For Sleek and Fashionable Code. Retrieved from
https://www.toptal.com/python/python-design-patterns.
Databricks. (2019). Apache Spark. Retrieved from https://databricks.com/spark/about.
Dennis, A., Wixom, B. H., & Roth, R. M. (2015). Systems Analysis and Design, 6th Edition. Hoboken, NJ:
Wiley.
Eriksson, D. (2016). Compliance for Hedge Funds. Retrieved from
https://thehedgefundjournal.com/compliance-for-hedge-funds/.
Garnick, N., & Klein, A. (2019, May 29). [Hedge Fund Company] Raises over $2.75 Billion for Most Recent
U.S. Real Estate Fund.
Gonçalves, L. (2019, September 1). Burndown Chart - The Ultimate Guide for every Scrum Master.
Retrieved from https://luis-goncalves.com/burndown-chart-ultimate-guide/.
Gupta, D., & Moore, K. (2019). Finite State Machines. Retrieved from https://brilliant.org/wiki/finite-
state-machines/.
Hayes, A. (2019, June 3). Internal Rate of Return – IRR. Retrieved from
https://www.investopedia.com/terms/i/irr.asp.
[Hedge Fund Company]. (2019a). About.
[Hedge Fund Company]. (2019b). History.
Lavanya, N. & Malarvizhi, T. (2008, March 3). Risk analysis and management: a vital key to effective
project management. Retrieved from https://www.pmi.org/learning/library/risk-analysis-project-
management-7070.
Microsoft Azure. (2019a). Data Lake. Retrieved from https://azure.microsoft.com/en-au/solutions/data-
lake/.
Microsoft Azure. (2019b, May 7). What Is Azure Databricks? Retrieved from
https://docs.microsoft.com/en-us/azure/azure-databricks/what-is-azure-databricks.
Microsoft. (2020). Turn Data into Opportunity. Retrieved from https://powerbi.microsoft.com/en-us/.
Murray, B. (2019, August 27). SKB, [Hedge Fund Company] Pocket $67M in Northern California.
Owler. (2020). The firm’s Competitors, Revenue, Number of Employees, Funding and Acquisitions.
Pandas. (2019, November 9). pandas: powerful Python data analysis toolkit. Retrieved from
https://pandas.pydata.org/pandas-docs/version/0.25/.
Python Software Foundation. (2020). Python 3.7.6 Documentation. Retrieved from
https://docs.python.org/3.7/.
Radack, S. (2009, April 01). The System Development Lifecycle (SDLC). Retrieved from
https://csrc.nist.gov/CSRC/media/Publications/Shared/documents/itl-bulletin/itlbul2009-04.pdf.
Rubin, K. S. (2013). Essential Scrum: a practical guide to the most popular agile process. Upper Saddle
River, NJ: Addison-Wesley.
Securities and Exchange Commission. (2012, October 3). Investor Bulletin: Hedge Funds. Retrieved from
https://www.investor.gov/additional-resources/news-alerts/alerts-bulletins/investor-bulletin-
hedge-funds.
Appendix
APPENDIX A: User Stories
Key
Epics
1 - Improve Winners Losers Report Generator
2 - Azure Validation Dashboard
Epic 1 Themes
1 – Update Documentation
Epic 2 Themes
1 - Validate Data
2 - Present Interactive Raw Data
3 - Generate Performance Commentary
4 - Integrate Datalake into PowerBI
5 - Design User Experience
6 - Write Documentation
User Story Theme Epic Story Points Sprint
As a firm analyst, I want to add columns in the Excel template, so that I don't have to manually edit the report.
1 1 16 1
As a firm analyst, I want to populate the modified template with data corresponding to the column names, so that I don't have to manually input data into the report.
1 1 24 1
As a firm analyst, I want to delete columns in the Excel template, so that I don't have to manually edit the report.
1 1 16 1
As a firm analyst, I want to modify columns in the Excel template, so that I don't have to manually edit the report.
1 1 24 1
As a firm employee, I want to learn how to use the report system, so that accounting can manually produce reports.
1 1 24 1
As a firm employee, I want to be guided through the use of the win-loss reporting system so that I can change the output of the report.
1 1 12 1
As a firm developer I want a Video Tutorial to Guide me through adding a column to the Win-Loss Report so that I can use the report generator more effectively.
1 1 3 2
As a firm analyst I want to be able to choose variable months for my diff report so that I can validate any month pair in the database.
1 2 24 2
As a firm trainee, I want to find the difference in IRR over 1 month, so that I can learn how to use DataBricks and interact with the DataLake.
1 2 12 2
As a firm analyst I want to know if a Strategy switched from being a gain to a loss or vice versa so I can recognize performance changes which affect the overall fund.
3 2 12 2
As a trainee I want to be able to get basic information from monthly data so that I can decide what is valuable to include in the report.
3 2 12 2
As a firm analyst, I want to see the biggest month over month change in IRR at the strategy code level over the last N months, so I can make an informed decision about investing.
1 2 4 2
As a firm accountant I want to know if remMV values changed when there was no trading activity so that I can check if the data is correct.
1 2 12 2
As a firm analyst I want to know if any terminal values went to 0 over the last month so I am aware of any closed positions in a fund.
1 2 12 2
As a firm analyst, I want to see the biggest month over month change in MOIC at the strategy code level over the last N months, so I can make an informed decision about investing.
1 2 4 2
As a firm analyst, I want to see the biggest month over month change in GrossProfit at the strategy code level over the last N months, so I can make an informed decision about investing.
1 2 4 2
As a firm analyst, I want to know the "Buy and Sell" transactions over the last month, so I can make an informed decision about investing.
1 2 8 2
As a firm accountant I want to know if any terminal values changed when there was no Buying or Selling activity because this is indicative of incorrect Data copy.
1 2 36 2
As a firm accountant, I want to be able to see the largest difference in IRR between two months so that I do not have to manually find it.
3 2 12 3
As a firm accountant, I want to be able to see the largest difference in Gross Profit between two months so that I do not have to manually calculate it.
3 2 12 3
As a firm accountant, I want to know which Strat codes changed from ongoing to monetized from one month to the next, to understand which stratcodes affect overall fund performance.
3 2 8 3
As a firm dev I want to know if the total terminal value is zero because if it is it should be monetized.
1 2 10 3
As an accountant I want to be able to drill down in the raw files so that I can see where the data may be incorrect.
2 2 18 3
As a firm analyst I want to filter down to specific funds so that I can perform more accurate validation checks.
2 2 6 3
As a developer I want to figure out how to connect dataframes from Databricks to PowerBI and create tables out of dataframes so that I don't have to manually create a report.
4 2 48 3
As a firm accountant I want to see data points that are outside a number of standard deviations from what is normal so that I can identify extraneous data.
1 2 24 3
As a firm accountant, I want to see missing data (IRR, MOIC, GrossProfit, Total_Cost, Total_Sales, Total_Terminal_Value) in irr_results and irr_mod_cashflows for a certain month, so that I can fix them.
1 2 24 3
As a firm trainee I want to understand which analyses are performed on which tables.
1 2 4 4
As an accountant I want this report to be easy to use so that I can accurately check company reporting.
3 2 12 4
As an accountant I want to take the sum of inflow values in cashflow, so I can use that to further analyze cashflow data.
3 2 4 4
As an accountant I want to take the sum of outflow values in cashflow, so I can use that to further analyze cashflow data.
3 2 4 4
As an accountant I want to take the sum of total terminal values in cashflow, so I can use that to further analyze cashflow data.
3 2 4 4
As an accountant I want to be able to drill down from an alert to the specific information in the results or cashflows table.
4 2 12 4
As a firm accountant, I want to know if a strategy in a fund is ongoing and if it has quantity or accrued interest, so that I can determine why there is a notable change in the data.
1 2 12 4
As a firm accountant, I want to know when there is a break in the time series for strategies, so that I can determine why there is a break.
1 2 12 4
As a firm accountant, I want to see a line graph showing the changes in price for a given SyCode, so that I can predict where it might go next.
1 2 4 4
As a firm accountant, I want to know the average change over any time period for MOIC, Gross Profit, Total Cost, Total Sales, after specifying a strategy, so that I can understand changes in Strategies over time.
3 2 48 4
As a firm accountant, I want to see the biggest month over month change in TotalCost at the strategy code level, so I can make an informed decision about investing.
3 2 12 4
As a firm accountant, I want to see the biggest month over month change in TotalSales at the strategy code level, so I can make an informed decision about investing.
3 2 12 4
As an accountant I want to be able to have a report that updates automatically so I always have the most up to date information.
4 2 8 4
As a firm accountant, I want to see if strategies in funds with end dates are monetized, so that I can determine why.
1 2 4 4
As a firm accountant, I want to know if a strategy in a fund is monetized and whether it has no quantity and no market value, so that I can determine why there is a notable change in the data.
1 2 8 4
As an accountant I want to see the biggest sycode move for any strat code so I can further analyze that strat code.
3 2 4 4
As an accountant I want to see when MOIC and IRR are moving in opposite directions so I can further analyze the story associated with it.
1 2 12 4
As an accountant I want to know when GP does not change and there are many transactions.
1 2 8 4
As an accountant I want to know when RemMV changes and there are many transactions.
1 2 8 4
As an accountant I want to know the month to month price changes for a sycode, so that I can see the biggest moves in sycode price.
1 2 8 4
As a firm accountant, I want to check if there are begin dates for strategies, so that I can see why there might be none.
1 2 6 4
As an accountant I want to see if a monetized portfolio has a terminal value OR RemMV which changes from 0 to any number.
1 2 12 4
As a firm accountant, I want to see if strategies in funds with a terminal value of 0 are monetized, so that I can determine why.
1 2 6 4
As a firm accountant, I want to see which strategies are new, so that I can determine which strategies do not have previous data.
1 2 4 4
As a firm accountant, I want to see if a sycode belongs to multiple strategies, so that I can determine how to override the data.
1 2 2 4
As a firm accountant, I want to see whether prices for sycodes changes across funds, so that I can see if there were inconsistencies in the data.
1 2 6 4
As an accountant I want to see human readable commentary on which strategy codes influenced the portfolio and moved the most.
3 2 12 5
As a firm accountant, I only want to see BKRT, so that I can make decisions on a more meaningful dataset.
3 2 2 5
As a firm accountant, I want to stratify the data by Region, so I can gain insights about the progress of each region.
1 2 18 5
As an MQP student, I want to structure the final paper so it accurately describes our work at the firm, and so it is not based solely on our proposal.
6 2 6 5
As a firm accountant, I want to drill through alerts, so that I can prove that an alert is valid.
1 2 60 5
As a firm accountant I want to be able to use filters on every page for common fields such as portfolio, Business Unit, Strategy, Region, Sycode, ALERT ATTRIBUTE so that I can universally filter displayed data.
1 2 2 5
As a firm accountant, I want to see projected values based on historical data like standard deviation and linear regression, so that I can determine if my numbers are in a reasonable range.
3 2 20 5
As a firm accountant I want to see the Alert Description without the Extra linked column Visible.
1 2 4 5
As an MQP team member, I want to refresh and update our paper, omitting previous technologies used and writing about the new technologies used.
6 2 4 5
As a firm accountant I want to be able to see The DealName as well as StrategyCode Because I know deal names better than Stratcode.
5 2 2 5
As a firm accountant when Viewing the GP No-Change Transactions Rules I want to be able to drill down to transactions.
1 2 12 5
As a firm accountant I want extreme IRR values to be filtered out (perhaps greater than 1000) before the standard deviation is calculated, so that I only see useful data.
1 2 3 5
As a firm trainee I want to meet with Users of the dashboard to better understand their needs.
5 2 6 5
As a firm accountant I want to Disable Grand totals on non-Applicable Fields.
5 2 5 6
As a firm accountant I want the tooltip on diffs to show the two values used to calculate the diff.
5 2 8 6
As a project sponsor, I want to see when a MOIC and IRR are different in my own terms, so that I can be alerted when it happens.
1 2 4 6
As a firm accountant I want the Closed_fund_transactions field to be renamed as the monetized_stratcode_with_transactions and to only check for non-null values in the RESID Column.
1 2 4 6
As an accountant, I want to see DealName and StrategyRegionofRisk as columns and as filters so I can effectively analyze the data and utilize the PowerBI Dashboard.
2 2 4 6
As an accountant, I want to be able to filter on portfolio (including ALL), so that I can assess strategies on a general level.
1 2 24 6
As an accountant I want to see the Absolute value of all Diffs so that I can sort them.
5 2 12 6
As an accountant I want to see relevant usable filters on each report page so that I can filter the data appropriately.
2 2 4 6
As an accountant, I want to just see values where the alert is true, so I see data respective to that alert.
5 2 4 6
As an accountant I want a Top Level Summary page that contains the data and a well organized way to access alerts.
5 2 24 6
As a developer, I want to learn how to properly use the slicer to arrange data to the accountant's satisfaction.
5 2 3 6
As an accountant, I want to see a flat list of transactions: raw data.
2 2 3 6
As a user of the PowerBI dashboard, I want the column names to be easier to understand, so I can better understand how the data is represented.
5 2 20 6
As an accountant, I want to see a description of each page, so I understand how to use the data provided and further understand the alert and its check.
2 2 12 6
As a firm developer I want to see commented Code so that I can maintain the software.
6 2 3 6
As a firm developer I want to add "changes" to IRR, MOIC, Buttons so that the RAW explorer is more useable.
2 2 1 6
As a firm accountant I want to filter the entire report by investment type so that I do not see irrelevant cash transactions.
1 2 2 6
As a project sponsor I want the headers of each page to be the same on every page so that there is consistency in design.
1 2 3 6
As an accountant I want to see commentary for all return periods: 1 year, 3 years, 5 years, and year to date.
1 2 12 6
As a project sponsor for each dealname I want to see the Average Min Max LR for IRR, MOIC.
1 2 24 6
As a project sponsor I want to see a graph of time series data with IRR, GP, and MOIC all in one visual.
2 2 1 6
As a project sponsor I want to see a Descending sort of a diff between current value and average Value.
2 2 8 6
As an accountant, I want the commentary to be split up into sections based on ReturnPeriod, so that I can easily digest the commentary section.
5 2 24 6
As a firm accountant I do not want to see total terminal value transaction types on the "gross profit same but transaction exists" page because these types are not relevant.
1 2 1 6
APPENDIX B: Project Risks Per Sprint
[Risk tables for Sprints 2 through 6 appear here as figures.]
APPENDIX C: Interview 1 with Firm Accountants
Firm Accountant Meeting 11.11.2019
Attendees: Firm Accountant 1, Firm Accountant 2, Manasi Danke, Ethan Merrill, Joseph Yuen
Objective: Ask accountants about validation procedure and priority of checks
Introduction
1. WPI Project Description
Questions:
1. How do accountants validate cashflows? - Process Overview
a. Use a series of excel sheets that check for certain behavior
b. Accountants demoed their excel sheets
c. Commentary is used for marketing purposes and explains why certain behavior
happened
d. Evan sent us the steps and excel sheets that go through the validation process
2. What do you check first? What is the priority of different validation techniques?
a. Compare cashflows and profit & loss
b. Evan – new Strategy codes
c. Doug – inverse change in IRR and MOIC
i. IRR – time based – cash weighted return
ii. MOIC – total return type metric
d. Zombie Strategy codes
APPENDIX D: Interview 2 with Accountants
Firm Accountant Meeting 11.21.2019
Attendees: Firm Accountant 1, Firm Accountant 3, Manasi Danke, Ethan Merrill, Joseph Yuen
Objective: Gather feedback on PowerBI Dashboard V0.3
Reactions:
Overview – No feedback
Explorer
• Show greatest movements
• Ability to filter on Strategy code
• Add Deal Name
• View change in GrossProfit
o Should equal PNL for the month
Alerts
• Show new Strategy codes
• Show MOIC negative changes
• Add drillthrough functionality
Statistics – No feedback
Commentary
• Add month over month values
• Useful
Time Series
• Add region
o Be able to filter on region
• Add report filter wide for region
Forecast
• Risk team already handles projections
Post Demo Questions:
• What do you like about the dashboard?
o Liked drilldown ability to see Strategy code level on a deal by deal basis
• What could be improved?
o We want to see Strategy code gross profit over time
• Could you see yourself or your department using this? Please explain?
o Drilldown ability in explorer could be useful
APPENDIX E: Financial Terminology
Asset Valuation
To understand how an investment has performed, the actual book value of the asset itself must first be
determined. For publicly traded assets, the Net Asset Value (NAV) is used: NAV = (Assets - Liabilities) /
number of outstanding shares. NAV is commonly used to determine the value of assets before any
additional fees are charged by the brokerage or other entities in the trading pipeline.
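The NAV arithmetic above can be sketched in a few lines of Python; the function name and the figures below are illustrative only, not firm data:

```python
def net_asset_value(assets: float, liabilities: float, shares_outstanding: float) -> float:
    """Per-share value of a fund before any brokerage or other fees: (Assets - Liabilities) / shares."""
    return (assets - liabilities) / shares_outstanding

# A fund with $10M in assets, $2M in liabilities, and 400,000 shares outstanding:
print(net_asset_value(10_000_000, 2_000_000, 400_000))  # 20.0
```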
Hedge funds such as the firm often have investments in less liquid assets such as real estate. Valuing
these assets is more difficult and may not be performed more than twice per year, because the assets
are illiquid and their value does not change rapidly.
Internal Rate of Return (IRR)
Internal Rate of Return is a percent measure of the growth of an investment. More specifically, IRR
measures the annual compounded rate of return for an investment. This metric is commonly used to
determine the potential rate of return on future projects; however, it can be used for existing projects or
investments as well. IRR is the discount rate that sets the Net Present Value (NPV) of a series of
cashflows to zero (Hayes, 2019): NPV = C_0 + C_1/(1 + IRR) + C_2/(1 + IRR)^2 + ... + C_N/(1 + IRR)^N = 0,
where C_t is the net cashflow in period t.
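Because IRR generally has no closed form, it is found numerically as the rate at which NPV crosses zero. A minimal bisection sketch in Python, with a hypothetical cashflow series (an illustration only, not the firm's calculation):

```python
def npv(rate: float, cashflows: list) -> float:
    # cashflows[t] is the net cash flow at the end of period t; t = 0 is the initial outlay.
    return sum(c / (1 + rate) ** t for t, c in enumerate(cashflows))

def irr(cashflows: list, lo: float = -0.99, hi: float = 10.0) -> float:
    # Bisect on the rate where NPV crosses zero; assumes a single sign change in [lo, hi].
    for _ in range(200):
        mid = (lo + hi) / 2
        if npv(lo, cashflows) * npv(mid, cashflows) <= 0:
            hi = mid  # root lies in [lo, mid]
        else:
            lo = mid  # root lies in [mid, hi]
    return (lo + hi) / 2

# Investing 100 and receiving 110 one period later is a 10% internal rate of return.
print(round(irr([-100, 110]), 6))  # 0.1
```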
Multiple of Invested Capital (MOIC)
This metric is the return divided by the original invested capital. For instance, a 10x MOIC could be the
result of a $1 investment that returned $10 or a one million dollar investment that returned $10 million.
Gross Profit
In business accounting, gross profit is calculated by subtracting the cost of goods sold from revenue. In
investing, gross profit on an investment is the cash amount that the investment has appreciated since
inception.
Remaining Market Value (remMV)
Remaining Market Value is the total value of the investment, Strategy portfolio, or fund at the end of
the accounting period.
Total Cost
Total cost is the cash amount expensed in order to acquire the asset.
Total Sales
Total sales is the cash value of an asset that was sold during a given transaction.
Total Terminal Value
Total Terminal Value is very similar to remMV; however, it is calculated for every transaction, not just at
the end of the reporting period.
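Several of the metrics above are simple roll-ups over transaction-level cashflows. The sketch below uses invented records and field names (not the firm's schema) to show how Total Cost, Total Sales, Gross Profit, and MOIC relate:

```python
# Hypothetical transactions for one strategy over a reporting period.
transactions = [
    {"type": "Buy",  "amount": -50.0},  # cash out: cost to acquire
    {"type": "Buy",  "amount": -30.0},
    {"type": "Sell", "amount": 100.0},  # cash in: sale proceeds
]
remaining_market_value = 20.0  # remMV: value still held at period end

total_cost = -sum(t["amount"] for t in transactions if t["type"] == "Buy")
total_sales = sum(t["amount"] for t in transactions if t["type"] == "Sell")
gross_profit = total_sales + remaining_market_value - total_cost  # appreciation since inception
moic = (total_sales + remaining_market_value) / total_cost        # multiple of invested capital

print(total_cost, total_sales, gross_profit, moic)  # 80.0 100.0 40.0 1.5
```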
Return Period
The return period is the length over which the return is calculated. The most common intervals for the
return period are year to date, inception to date, one year, three years, and five years.
APPENDIX F: Site Map
APPENDIX G: Site Structure Diagram