This commit is contained in:
Andy Sotheran 2019-04-28 19:04:40 +01:00
parent 0ed655e9ad
commit bc76ef7791
7 changed files with 885 additions and 151 deletions

722
PID.txt Normal file
View File

@ -0,0 +1,722 @@
Individual Project (CS3IP16)
Department of Computer Science
University of Reading
Project Initiation Document
PID Sign-Off
Student No.: 24005432
Student Name: Andrew Sotheran
Email: andrew.sotheran@student.reading.ac.uk
Degree programme (BSc CS/BSc IT): BSc CS
Supervisor Name: Kenneth Boness
Supervisor Signature:
Date:
SECTION 1 General Information
Project Identification
1.1
Project ID
(as in handbook)
N/A
1.2
Project Title
Cryptocurrency market and value prediction tracking
1.3
Briefly describe the main purpose of the project in no more than 25 words
To provide a means of predicting the value of cryptocurrencies that will aid investors in making market investment decisions.
Student Identification
1.4
Student Name(s), Course, Email address(s)
e.g. Anne Other, BSc CS, a.other@student.reading.ac.uk
Andrew William Sotheran
BSc CS
Andrew.sotheran@student.reading.ac.uk
Supervisor Identification
1.5
Primary Supervisor Name, Email address
e.g. Prof Anne Other, a.other@reading.ac.uk
1.6
Secondary Supervisor Name, Email address
Only fill in this section if a secondary supervisor has been assigned to your project
Company Partner (only complete if there is a company involved)
1.7
Company Name
N/A
1.8
Company Address
N/A
1.9
Name, email and phone number of Company Supervisor or Primary Contact
N/A
SECTION 2 Project Description
2.1
Summarise the background research for the project in about 400 words. You must include
references in this section but don't count them in the word count.
The aim is to create a tool that predicts the price of cryptocurrencies and aids investor decisions.
Research will need to be conducted into the following topics surrounding data mining, machine
learning and artificial neural networks.
This research will cover the following areas:
Natural Language Processing and analysis: to analyse and process data gathered through RSS and
social media feeds via the underlying tasks of natural language processing: content categorisation
(search and indexing, duplication detection); topic discovery and modelling (obtaining meanings and
themes within the data and applying analytic techniques); sentiment and semantic analysis
(identifying the mood and opinions within the data); and summarisation (condensing a block of text
and disregarding the rest).
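As an illustration of the sentiment-analysis task above, the sketch below uses NLTK's VADER analyser (NLTK is one of the libraries listed later in this section); the example headline is invented.

# Minimal sentiment-analysis sketch using NLTK's VADER analyser.
# Assumes NLTK is installed; the vader_lexicon resource is downloaded on first run.
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # one-off lexicon download

analyser = SentimentIntensityAnalyzer()
headline = "Bitcoin rallies as institutional investors pile in"  # invented example text
scores = analyser.polarity_scores(headline)
print(scores)  # dict with 'neg', 'neu', 'pos' and 'compound' scores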
Machine learning algorithms: the three types of machine learning (supervised, unsupervised and
reinforcement learning) and the common algorithms used. Each of these will be researched to identify
the most suitable for this project, and only one will be used: Linear Regression, Logistic Regression,
Decision Tree, SVM, Naive Bayes, kNN, K-Means, Random Forest, dimensionality reduction
algorithms, or gradient boosting algorithms (GBM, XGBoost, LightGBM, CatBoost).
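To make that comparison concrete, the sketch below benchmarks a few of the candidate regressors with scikit-learn (listed among the libraries below); the random data is only a stand-in for engineered price and sentiment features.

# Minimal sketch comparing candidate scikit-learn regressors with cross-validation.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                                   # placeholder feature matrix
y = X @ rng.normal(size=5) + rng.normal(scale=0.1, size=200)    # placeholder target

candidates = {
    "linear_regression": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(name, "mean R^2:", round(scores.mean(), 3))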
Artificial Neural Networks: to identify the drawbacks and benefits of using them compared with other
computational models within machine learning, including recurrent neural networks and
third-generation neural networks.
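Purely as an illustration of the recurrent-network option, the sketch below builds a small LSTM for next-step regression. Keras is assumed here for illustration only; it is not named in this document, and the sequence shape and data are placeholders.

# Illustrative-only sketch of a small recurrent network for sequence regression.
import numpy as np
from tensorflow import keras

lookback, n_features = 24, 2   # e.g. 24 hourly steps of (price, sentiment) - assumed shape
X = np.random.rand(100, lookback, n_features).astype("float32")  # placeholder sequences
y = np.random.rand(100).astype("float32")                        # placeholder next-step price

model = keras.Sequential([
    keras.Input(shape=(lookback, n_features)),
    keras.layers.LSTM(32),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=2, batch_size=16, verbose=0)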
Data mining: to investigate the different techniques and algorithms used (the same as those listed
above for machine learning, plus C4.5, Apriori, EM, PageRank, AdaBoost and CART); these will be
researched and the most appropriate identified.
To investigate techniques for storing and processing large amounts of data, such as Hadoop and
Elasticsearch utilities, as well as graphing, data modelling and visualisation.
To identify appropriate Python or C libraries for each of the topics above to aid in the creation of
this project, such as:
Natural Language Toolkit (NLTK) - Python
Pandas - Python
Scikit-learn (sklearn) - Python
NumPy - Python - scientific computation for working with arrays
Matplotlib - Python - data visualisation
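As a small illustration of how the listed libraries fit together, the sketch below loads and plots historical price data with pandas and Matplotlib; the file name and column names ("date", "close") are assumptions for illustration.

# Sketch: load and plot historical price data with pandas and matplotlib.
import pandas as pd
import matplotlib.pyplot as plt

prices = pd.read_csv("btc_history.csv", parse_dates=["date"])  # assumed file and columns
prices = prices.sort_values("date").set_index("date")

prices["close"].plot(title="BTC closing price")
plt.ylabel("Price (USD)")
plt.tight_layout()
plt.show()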
Investigate types of databases, SQL and NoSQL, as a storage medium between receiving data and
feeding it into the machine learning algorithm.
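A minimal sketch of the SQL option follows, using Python's built-in sqlite3 module as a stand-in for whichever SQL or NoSQL engine the research settles on; the table layout and example row are assumptions.

# Sketch: a simple SQL storage layer between data gathering and the ML stage.
import sqlite3

conn = sqlite3.connect("predictions.db")
conn.execute(
    """CREATE TABLE IF NOT EXISTS price_sentiment (
           ts        TEXT PRIMARY KEY,   -- ISO-8601 timestamp
           price_usd REAL,
           sentiment REAL                -- e.g. mean compound sentiment for the hour
       )"""
)
conn.execute(
    "INSERT OR REPLACE INTO price_sentiment VALUES (?, ?, ?)",
    ("2018-10-01T12:00:00Z", 6600.25, 0.31),  # made-up example row
)
conn.commit()
print(conn.execute("SELECT * FROM price_sentiment ORDER BY ts").fetchall())
conn.close()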
Investigate the use of REST APIs and other web-service-based technologies (gRPC,
Elasticsearch).
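To illustrate the REST option only, the sketch below polls a price endpoint with the requests library; the URL and JSON shape are hypothetical placeholders, not a real service named in this document.

# Hypothetical REST polling sketch; the URL and response fields are placeholders.
import requests

API_URL = "https://example.com/api/v1/price/BTC"  # placeholder endpoint, not a real service

def fetch_price():
    response = requests.get(API_URL, timeout=10)
    response.raise_for_status()
    payload = response.json()   # assumed shape: {"symbol": "BTC", "usd": 6600.25}
    return payload["usd"]

if __name__ == "__main__":
    print("Latest BTC price:", fetch_price())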
Investigate frameworks for the thin client, such as Angular vs React, Node.js, Leaflet.js and Chart.js.
Additionally, web scraping may be needed for certain websites that do not provide an API or JSON
feed for the data needed.
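A minimal scraping sketch is shown below; the URL and the CSS selector are hypothetical, and any real target site's robots.txt and terms of service would need checking first.

# Web-scraping sketch for sites without an API or RSS feed.
import requests
from bs4 import BeautifulSoup

page = requests.get("https://example.com/crypto-news", timeout=10)  # placeholder URL
soup = BeautifulSoup(page.text, "html.parser")

headlines = [h.get_text(strip=True) for h in soup.select("h2.headline")]  # assumed selector
for headline in headlines:
    print(headline)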
https://www.sas.com/en_gb/insights/analytics/what-is-natural-language-processing-nlp.html
https://blog.algorithmia.com/introduction-natural-language-processing-nlp/
https://gerardnico.com/data_mining/algorithm
https://www.analyticsvidhya.com/blog/2017/09/common-machine-learning-algorithms/
https://www.kdnuggets.com/2015/05/top-10-data-mining-algorithms-explained.html
https://www.datasciencecentral.com/profiles/blogs/artificial-neural-network-ann-in-machine-learning
http://scikit-learn.org/stable/index.html
https://grpc.io/docs/
2.2
Summarise the project objectives and outputs in about 400 words.
These objectives and outputs should appear as tasks, milestones and deliverables in your project plan.
In general, an objective is something you can do and an output is something you produce; one leads
to the other.
To produce a thin web client with a dashboard that provides tangible and useful information
to users, such as the current price of a cryptocurrency (updated every 5 minutes), exchange rates,
network hashrates and historical price data. It will also display statistics about sentiment analysis
conducted on social media about the currency, graphical predictions of what the price may be at a
given time, and comparisons with other currencies to aid investment.
To produce significant research into the topics in and around data mining, machine learning and
artificial neural networks, and the underlying tasks and algorithms used, covering the efficiency,
drawbacks and advantages of each to identify the most suitable for use in this project.
To produce a system that analyses a data set obtained through social media feeds and posts on news
sites regarding cryptocurrencies. It should perform sentiment analysis using natural language
processing and analysis techniques to identify features, identify the type of sentiment in the data
and categorise it for machine learning.
To utilise machine learning techniques and algorithms to produce a system that learns from historical
data to predict, to an extent, the possible future price of a given currency, and to compare this with
the use of an artificial neural network, analysing the drawbacks of both.
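As an illustration of that last objective, the sketch below shows one common way of framing historical prices as supervised learning samples using a look-back window; the window length and prices are invented.

# Sketch: turn a price series into (look-back window -> next value) training pairs.
import numpy as np

def make_windows(series, lookback):
    """Return X of shape (n, lookback) and y of shape (n,) for next-step prediction."""
    X, y = [], []
    for i in range(len(series) - lookback):
        X.append(series[i:i + lookback])
        y.append(series[i + lookback])
    return np.array(X), np.array(y)

prices = np.array([100.0, 101.5, 99.8, 102.3, 103.1, 102.7, 104.0])  # made-up prices
X, y = make_windows(prices, lookback=3)
print(X.shape, y.shape)   # (4, 3) (4,)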
2.3
Initial project specification - list key features and functions of your finished project.
Remember that a specification should not usually propose the solution. For example, your project
may require open-source datasets, so add that to the specification, but don't state how that data link
will be achieved; that comes later.
The finished project should provide a thin-client single-page application. This will give users the
ability to view various statistics on cryptocurrencies on a dashboard that incorporates text
analysis through natural language analysis, and will utilise various machine learning and data mining
techniques to provide price predictions to the users. The nature and level of this will depend on the
research conducted into the areas of data mining, machine learning, natural language processing and
artificial neural networks, along with the algorithms used.
The data set will be created from scratch for this project, as it will require gathering data from
numerous sources and performing text analysis on them to form the data needed. Data sets for the
characteristics and data of the currencies can be obtained from pre-existing data sets such as:
https://www.kaggle.com/sudalairajkumar/cryptocurrencypricehistory
https://www.kaggle.com/jessevent/all-crypto-currencies
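As an illustration of how such a pre-existing price data set could be combined with the sentiment data gathered for this project, a minimal pandas sketch follows; the file names and column names are assumptions, not taken from the data sets above.

# Sketch: merge pre-existing price data with gathered sentiment scores by date.
import pandas as pd

prices = pd.read_csv("crypto-markets.csv", parse_dates=["date"])         # assumed export
sentiment = pd.read_csv("twitter_sentiment.csv", parse_dates=["date"])   # gathered in-house

btc = prices[prices["symbol"] == "BTC"][["date", "close"]]               # assumed columns
merged = btc.merge(sentiment[["date", "compound"]], on="date", how="left")
merged["compound"] = merged["compound"].fillna(0.0)   # days with no scored posts
print(merged.head())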
Web scraping may be included if certain news/social media websites do not provide an API or RSS
feed for the analysis engine to perform text analysis on.
Additionally, there will be a server between the analysis/prediction engine and the thin client that
maintains a database, either SQL or NoSQL, holding statistics about the currencies and data about
their price predictions. It will not hold any of the data used in the analysis engine, as this database
will only hold data available to the end users.
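Purely to illustrate this server layer, the sketch below exposes stored prediction statistics over a REST endpoint. Flask is assumed for illustration only; the framework choice, route and payload shape are not decided in this document.

# Illustrative sketch of the server layer exposing stored predictions to the thin client.
from flask import Flask, jsonify

app = Flask(__name__)

# Stand-in for rows read from the SQL/NoSQL statistics database.
PREDICTIONS = {"BTC": {"next_hour_price_usd": 6612.4, "sentiment": 0.31}}

@app.route("/api/predictions/<symbol>")
def get_prediction(symbol):
    data = PREDICTIONS.get(symbol.upper())
    if data is None:
        return jsonify({"error": "unknown symbol"}), 404
    return jsonify({"symbol": symbol.upper(), **data})

if __name__ == "__main__":
    app.run(port=5000)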
2.4
Describe the social, legal and ethical issues that apply to your project. Does your project
require ethical approval? (If your project requires a questionnaire/interview for conducting
research and/or collecting data, you will need to apply for ethical approval.)
The project will not be handling any user-related data; therefore, it does not need ethical approval.
2.5
Identify and list the items you expect to need to purchase for your project. Specify the cost
(include VAT and shipping if known) of each item as well as the supplier.
e.g. item 1 name, supplier, cost
None Needed
2.6
State whether you need access to specific resources within the department or the University e.g.
special devices and workshop
Possibly a server to host the database and analysis engine and perform the necessary computation,
and a server to host the thin client.
SECTION 3 Project Plan
3.1
Project Plan
Split your project work into sections/categories/phases and add tasks for each of these sections. It is
likely that the high-level objectives you identified in section 2.2 become sections here. The outputs from
section 2.2 should appear in the Outputs column here. Remember to include tasks for your project
presentation, project demos, producing your poster, and writing up your report.
Task No. / Task description / Effort (weeks) / Outputs

1  Background Research
1.1  Investigate RPC frameworks and REST APIs (0.3 weeks). Output: identify the type of API/RPC framework that would be most suitable.
1.2  Research Natural Language Processing and analysis techniques (0.5 weeks). Output: gain an understanding of how NLP works and how it could be used.
1.3  Research the use of machine learning types and algorithms (0.5 weeks). Output: grasp how ML paradigms work and how this project will use them.
1.4  Research the application of Neural Networks and the drawbacks and advantages of using them (0.3 weeks). Output: identify whether a neural network is needed or whether ML paradigms can be used instead.
1.5  Research techniques for storing and processing large amounts of data, such as Hadoop, Spark or Elasticsearch utilities (1 week). Output: understand their uses and applications, and whether they are a more viable solution than standard ML practices.
1.6  Identify appropriate libraries for data modelling and visualisation, NLP and machine learning (1 week). Output: identify which libraries will aid in the construction of this project.
1.7  Investigate frameworks for the front-end thin client (0.3 weeks). Output: identify which frameworks the thin client should be built with, along with their drawbacks and advantages.
1.8  Research web scraping techniques (0.3 weeks). Output: understand the application of these techniques and learn how to apply them.

2  Analysis and design
2.1  Resolve issues discovered by background research (0.2 weeks).
2.2  Identify limitations discovered from research and what is not feasible (0.1 weeks).
2.3  UML diagrams / xUML (0.2 weeks).
2.4  Wireframes for the front end (0.1 weeks).
2.5  Data flow (0.1 weeks).
2.6  User flow (0.1 weeks).

3  Develop prototype
3.1  Develop thin client (2 weeks).
3.2  Develop analysis engine (4 weeks).
3.3  Develop prediction engine (3 weeks).
3.4  Develop unit tests (2 weeks).

4  Testing, evaluation/validation
4.1  Unit testing (1 week).
4.2  Acceptance testing (0.8 weeks).
4.3  User testing (0.8 weeks).

5  Assessments
5.1  Write up project report (2 weeks). Output: project report.
5.2  Produce poster (0.5 weeks). Output: poster.
5.3  Log book (0.5 weeks).

TOTAL (sum of total effort in weeks): 21.9
SECTION 4 - Time Plan for the proposed Project work
For each task identified in 3.1, please shade the weeks when you'll be working on that task. You should also mark target milestones, outputs and key decision points.
To shade a cell in MS Word, move the mouse to the top left of the cell until the cursor becomes an arrow pointing up, left-click to select the cell, then right-click and
select Borders and Shading. Under the Shading tab pick an appropriate grey colour and click OK.
START DATE: 10/2018
Project weeks (in three-week blocks): 0-3, 3-6, 6-9, 9-12, 12-15, 15-18, 18-21, 21-24, 24-27, 27-30, 30-33, 33-36, 36-39

Project stages and tasks scheduled across these weeks:
1  Background Research: investigate RPC frameworks and REST APIs; research Natural Language Processing and analysis techniques; research the use of machine learning types and algorithms; research the application of Neural Networks and the drawbacks and advantages of using them; research techniques for storing and processing large amounts of data, such as Hadoop, Spark or Elasticsearch utilities; identify appropriate libraries for data modelling and visualisation, NLP and machine learning; investigate frameworks for the front-end thin client; research web scraping techniques.
2  Analysis/Design: resolve issues discovered by background research; identify limitations discovered from research and what is not feasible; UML diagrams / xUML; wireframes for the front end; data flow; user flow.
3  Develop prototype: develop thin client; develop analysis engine; develop prediction engine; develop unit tests.
4  Testing, evaluation/validation: unit testing; acceptance testing; user testing.
5  Assessments: write up project report; produce poster; log book.
RISK ASSESSMENT FORM
Assessment Reference No.:
Area or activity assessed:
Assessment date:
Persons who may be affected by the activity (i.e. are at risk): Andrew Sotheran

SECTION 1: Identify Hazards - Consider the activity or work area and identify if any of the hazards listed below are significant (tick the boxes that apply).
Hazard categories listed on the form: fall of person (from work at height); fall of objects; slips, trips & housekeeping; manual handling operations; display screen equipment; lighting levels; heating & ventilation; layout, storage, space, obstructions; welfare facilities; electrical equipment; use of portable tools / equipment; fixed machinery or lifting equipment; pressure vessels; noise or vibration; fire hazards & flammable material; outdoor work / extreme weather; fieldtrips / field work; hazardous fumes, chemicals, dust; radiation sources; work with lasers; hazardous biological agent; confined space / asphyxiation risk; condition of buildings & glazing; vehicles / driving at work; food preparation; occupational stress; violence to staff / verbal assault; work with animals; lone working / work out of hours; other(s) - specify.
The hazards ticked and carried forward to Section 2 are Slips, Trips & Housekeeping (3) and Display screen equipment (5).

SECTION 2: Risk Controls - For each hazard identified in Section 1, complete Section 2.

Hazard No.: 3
Hazard description: Tripping over wires.
Existing controls to reduce risk: cable management is at a minimum; none of the cables are currently properly managed or kept out of the way.
Risk level (tick one: High / Med / Low):
Further action needed to reduce risks (provide timescales and initials of person responsible): sufficient cable management needed; cables tied together and moved out of the way of feet.

Hazard No.: 5
Hazard description: Eye strain from looking at a monitor.
Existing controls to reduce risk: current screen contrast and brightness are acceptable.
Risk level (tick one: High / Med / Low):
Further action needed to reduce risks (provide timescales and initials of person responsible): take periodic breaks from the screen.

SIGNED:
Name of Assessor(s):
Review date:

Health and Safety Risk Assessments continuation sheet
Assessment Reference No.:
Continuation sheet number:
SECTION 2 continued: Risk Controls - no additional hazards recorded.
SIGNED:
Name of Assessor(s):
Review date:

View File

@ -294,9 +294,9 @@
\@writefile{toc}{\defcounter {refsection}{0}\relax }\@writefile{toc}{\contentsline {section}{\numberline {13}Social, Legal and Ethical Issues}{88}{section.13}}
\@writefile{toc}{\defcounter {refsection}{0}\relax }\@writefile{toc}{\contentsline {section}{\numberline {14}Conclusion and Future Improvements}{89}{section.14}}
\@writefile{toc}{\defcounter {refsection}{0}\relax }\@writefile{toc}{\contentsline {subsection}{\numberline {14.1}Conclusion}{89}{subsection.14.1}}
\@writefile{toc}{\defcounter {refsection}{0}\relax }\@writefile{toc}{\contentsline {subsection}{\numberline {14.2}Future Improvements}{89}{subsection.14.2}}
\@writefile{toc}{\defcounter {refsection}{0}\relax }\@writefile{toc}{\contentsline {section}{\numberline {15}Appendices}{95}{section.15}}
\@writefile{toc}{\defcounter {refsection}{0}\relax }\@writefile{toc}{\contentsline {subsection}{\numberline {15.1}Appendix A - Project Initiation Document}{95}{subsection.15.1}}
\@writefile{toc}{\defcounter {refsection}{0}\relax }\@writefile{toc}{\contentsline {subsection}{\numberline {14.2}Future Improvements}{90}{subsection.14.2}}
\@writefile{toc}{\defcounter {refsection}{0}\relax }\@writefile{toc}{\contentsline {section}{\numberline {15}Appendices}{97}{section.15}}
\@writefile{toc}{\defcounter {refsection}{0}\relax }\@writefile{toc}{\contentsline {subsection}{\numberline {15.1}Appendix A - Project Initiation Document}{97}{subsection.15.1}}
\abx@aux@refcontextdefaultsdone
\abx@aux@defaultrefcontext{0}{SaTdpsmm}{none/global//global/global}
\abx@aux@defaultrefcontext{0}{nlAeiBTCPSO}{none/global//global/global}
@ -349,4 +349,4 @@
\abx@aux@defaultrefcontext{0}{SpamOrHamGit}{none/global//global/global}
\abx@aux@defaultrefcontext{0}{MBE}{none/global//global/global}
\abx@aux@defaultrefcontext{0}{TwitterTerms}{none/global//global/global}
\@writefile{toc}{\defcounter {refsection}{0}\relax }\@writefile{toc}{\contentsline {subsection}{\numberline {15.2}Appendix B - Log book}{108}{subsection.15.2}}
\@writefile{toc}{\defcounter {refsection}{0}\relax }\@writefile{toc}{\contentsline {subsection}{\numberline {15.2}Appendix B - Log book}{110}{subsection.15.2}}

View File

@ -1,4 +1,4 @@
This is pdfTeX, Version 3.14159265-2.6-1.40.18 (TeX Live 2017/Debian) (preloaded format=pdflatex 2018.10.16) 28 APR 2019 17:50
This is pdfTeX, Version 3.14159265-2.6-1.40.18 (TeX Live 2017/Debian) (preloaded format=pdflatex 2018.10.16) 28 APR 2019 19:03
entering extended mode
restricted \write18 enabled.
%&-line parsing enabled.
@ -1237,56 +1237,56 @@ Overfull \hbox (14.37637pt too wide) in paragraph at lines 1746--1747
[]
[81] [82 <./images/with_sentiment.png> <./images/without_sentiment.png>]
[83] [84] [85] [86] [87] [88] [89]
Overfull \hbox (40.38213pt too wide) in paragraph at lines 1867--1867
[83] [84] [85] [86] [87] [88] [89] [90] [91]
Overfull \hbox (40.38213pt too wide) in paragraph at lines 1879--1879
\T1/cmr/m/n/12 works,'' To-wards Data Sci-ence, 2018. [On-line]. Avail-able: []
$\T1/cmtt/m/n/12 https : / / towardsdatascience .
[]
[90]
Overfull \hbox (83.66737pt too wide) in paragraph at lines 1867--1867
[92]
Overfull \hbox (83.66737pt too wide) in paragraph at lines 1879--1879
\T1/cmr/m/n/12 works,'' Ma-chine Larn-ing Mas-tery, 2017. [On-line]. Avail-able
: []$\T1/cmtt/m/n/12 https : / / machinelearningmastery .
[]
Overfull \hbox (28.45175pt too wide) in paragraph at lines 1867--1867
Overfull \hbox (28.45175pt too wide) in paragraph at lines 1879--1879
\T1/cmr/m/n/12 lem,'' Su-per Data Sci-ence, 2018. [On-line]. Avail-able: []$\T1
/cmtt/m/n/12 https : / / www . superdatascience .
[]
[91]
Overfull \hbox (7.75049pt too wide) in paragraph at lines 1867--1867
[93]
Overfull \hbox (7.75049pt too wide) in paragraph at lines 1879--1879
\T1/cmr/m/n/12 2019. [On-line]. Avail-able: []$\T1/cmtt/m/n/12 https : / / medi
um . com / datadriveninvestor / overview -[]
[]
[92]
Overfull \hbox (7.25049pt too wide) in paragraph at lines 1867--1867
[94]
Overfull \hbox (7.25049pt too wide) in paragraph at lines 1879--1879
\T1/cmr/m/n/12 2017. [On-line]. Avail-able: []$\T1/cmtt/m/n/12 https : / / www
. statisticshowto . datasciencecentral .
[]
Overfull \hbox (9.24751pt too wide) in paragraph at lines 1867--1867
Overfull \hbox (9.24751pt too wide) in paragraph at lines 1879--1879
\T1/cmr/m/n/12 [On-line]. Avail-able: []$\T1/cmtt/m/n/12 http : / / blog . alej
andronolla . com / 2013 / 05 / 15 / detecting -[]
[]
Overfull \hbox (0.88026pt too wide) in paragraph at lines 1867--1867
Overfull \hbox (0.88026pt too wide) in paragraph at lines 1879--1879
[]\T1/cmr/m/n/12 P. Cryp-tog-ra-phy, ``A tu-to-rial on au-to-matic lan-guage id
en-ti-fi-ca-tion - ngram based,''
[]
[93] [94]
[95] [96]
pdfTeX warning: /usr/bin/pdflatex (file ./PID.pdf): PDF inclusion: found PDF ve
rsion <1.7>, but at most version <1.5> allowed
<PID.pdf, id=1794, 597.55246pt x 845.07718pt>
<PID.pdf, id=1802, 597.55246pt x 845.07718pt>
File: PID.pdf Graphic file (type pdf)
<use PID.pdf>
Package pdftex.def Info: PID.pdf used on input line 1872.
Package pdftex.def Info: PID.pdf used on input line 1884.
(pdftex.def) Requested size: 597.551pt x 845.07512pt.
@ -1294,7 +1294,7 @@ pdfTeX warning: /usr/bin/pdflatex (file ./PID.pdf): PDF inclusion: found PDF ve
rsion <1.7>, but at most version <1.5> allowed
File: PID.pdf Graphic file (type pdf)
<use PID.pdf>
Package pdftex.def Info: PID.pdf used on input line 1872.
Package pdftex.def Info: PID.pdf used on input line 1884.
(pdftex.def) Requested size: 597.551pt x 845.07512pt.
@ -1304,222 +1304,222 @@ rsion <1.7>, but at most version <1.5> allowed
pdfTeX warning: /usr/bin/pdflatex (file ./PID.pdf): PDF inclusion: found PDF ve
rsion <1.7>, but at most version <1.5> allowed
<PID.pdf, id=1797, page=1, 597.55246pt x 845.07718pt>
<PID.pdf, id=1805, page=1, 597.55246pt x 845.07718pt>
File: PID.pdf Graphic file (type pdf)
<use PID.pdf, page 1>
Package pdftex.def Info: PID.pdf , page1 used on input line 1872.
Package pdftex.def Info: PID.pdf , page1 used on input line 1884.
(pdftex.def) Requested size: 597.551pt x 845.07512pt.
File: PID.pdf Graphic file (type pdf)
<use PID.pdf, page 1>
Package pdftex.def Info: PID.pdf , page1 used on input line 1872.
Package pdftex.def Info: PID.pdf , page1 used on input line 1884.
(pdftex.def) Requested size: 562.1644pt x 795.0303pt.
[95]
[97]
File: PID.pdf Graphic file (type pdf)
<use PID.pdf, page 1>
Package pdftex.def Info: PID.pdf , page1 used on input line 1872.
Package pdftex.def Info: PID.pdf , page1 used on input line 1884.
(pdftex.def) Requested size: 562.1644pt x 795.0303pt.
File: PID.pdf Graphic file (type pdf)
<use PID.pdf, page 1>
Package pdftex.def Info: PID.pdf , page1 used on input line 1872.
Package pdftex.def Info: PID.pdf , page1 used on input line 1884.
(pdftex.def) Requested size: 562.1644pt x 795.0303pt.
File: PID.pdf Graphic file (type pdf)
<use PID.pdf, page 1>
Package pdftex.def Info: PID.pdf , page1 used on input line 1872.
(pdftex.def) Requested size: 562.1644pt x 795.0303pt.
[96 <./PID.pdf>]
pdfTeX warning: /usr/bin/pdflatex (file ./PID.pdf): PDF inclusion: found PDF ve
rsion <1.7>, but at most version <1.5> allowed
<PID.pdf, id=1827, page=2, 597.55246pt x 845.07718pt>
File: PID.pdf Graphic file (type pdf)
<use PID.pdf, page 2>
Package pdftex.def Info: PID.pdf , page2 used on input line 1872.
(pdftex.def) Requested size: 562.1644pt x 795.0303pt.
File: PID.pdf Graphic file (type pdf)
<use PID.pdf, page 2>
Package pdftex.def Info: PID.pdf , page2 used on input line 1872.
(pdftex.def) Requested size: 562.1644pt x 795.0303pt.
File: PID.pdf Graphic file (type pdf)
<use PID.pdf, page 2>
Package pdftex.def Info: PID.pdf , page2 used on input line 1872.
(pdftex.def) Requested size: 562.1644pt x 795.0303pt.
[97 <./PID.pdf>]
pdfTeX warning: /usr/bin/pdflatex (file ./PID.pdf): PDF inclusion: found PDF ve
rsion <1.7>, but at most version <1.5> allowed
<PID.pdf, id=1834, page=3, 597.55246pt x 845.07718pt>
File: PID.pdf Graphic file (type pdf)
<use PID.pdf, page 3>
Package pdftex.def Info: PID.pdf , page3 used on input line 1872.
(pdftex.def) Requested size: 562.1644pt x 795.0303pt.
File: PID.pdf Graphic file (type pdf)
<use PID.pdf, page 3>
Package pdftex.def Info: PID.pdf , page3 used on input line 1872.
(pdftex.def) Requested size: 562.1644pt x 795.0303pt.
File: PID.pdf Graphic file (type pdf)
<use PID.pdf, page 3>
Package pdftex.def Info: PID.pdf , page3 used on input line 1872.
Package pdftex.def Info: PID.pdf , page1 used on input line 1884.
(pdftex.def) Requested size: 562.1644pt x 795.0303pt.
[98 <./PID.pdf>]
pdfTeX warning: /usr/bin/pdflatex (file ./PID.pdf): PDF inclusion: found PDF ve
rsion <1.7>, but at most version <1.5> allowed
<PID.pdf, id=1848, page=4, 597.55246pt x 845.07718pt>
<PID.pdf, id=1836, page=2, 597.55246pt x 845.07718pt>
File: PID.pdf Graphic file (type pdf)
<use PID.pdf, page 4>
Package pdftex.def Info: PID.pdf , page4 used on input line 1872.
<use PID.pdf, page 2>
Package pdftex.def Info: PID.pdf , page2 used on input line 1884.
(pdftex.def) Requested size: 562.1644pt x 795.0303pt.
File: PID.pdf Graphic file (type pdf)
<use PID.pdf, page 4>
Package pdftex.def Info: PID.pdf , page4 used on input line 1872.
<use PID.pdf, page 2>
Package pdftex.def Info: PID.pdf , page2 used on input line 1884.
(pdftex.def) Requested size: 562.1644pt x 795.0303pt.
File: PID.pdf Graphic file (type pdf)
<use PID.pdf, page 4>
Package pdftex.def Info: PID.pdf , page4 used on input line 1872.
<use PID.pdf, page 2>
Package pdftex.def Info: PID.pdf , page2 used on input line 1884.
(pdftex.def) Requested size: 562.1644pt x 795.0303pt.
[99 <./PID.pdf>]
pdfTeX warning: /usr/bin/pdflatex (file ./PID.pdf): PDF inclusion: found PDF ve
rsion <1.7>, but at most version <1.5> allowed
<PID.pdf, id=1854, page=5, 597.55246pt x 845.07718pt>
<PID.pdf, id=1842, page=3, 597.55246pt x 845.07718pt>
File: PID.pdf Graphic file (type pdf)
<use PID.pdf, page 5>
Package pdftex.def Info: PID.pdf , page5 used on input line 1872.
<use PID.pdf, page 3>
Package pdftex.def Info: PID.pdf , page3 used on input line 1884.
(pdftex.def) Requested size: 562.1644pt x 795.0303pt.
File: PID.pdf Graphic file (type pdf)
<use PID.pdf, page 5>
Package pdftex.def Info: PID.pdf , page5 used on input line 1872.
<use PID.pdf, page 3>
Package pdftex.def Info: PID.pdf , page3 used on input line 1884.
(pdftex.def) Requested size: 562.1644pt x 795.0303pt.
File: PID.pdf Graphic file (type pdf)
<use PID.pdf, page 5>
Package pdftex.def Info: PID.pdf , page5 used on input line 1872.
<use PID.pdf, page 3>
Package pdftex.def Info: PID.pdf , page3 used on input line 1884.
(pdftex.def) Requested size: 562.1644pt x 795.0303pt.
[100 <./PID.pdf>]
pdfTeX warning: /usr/bin/pdflatex (file ./PID.pdf): PDF inclusion: found PDF ve
rsion <1.7>, but at most version <1.5> allowed
<PID.pdf, id=1860, page=6, 597.55246pt x 845.07718pt>
<PID.pdf, id=1856, page=4, 597.55246pt x 845.07718pt>
File: PID.pdf Graphic file (type pdf)
<use PID.pdf, page 6>
Package pdftex.def Info: PID.pdf , page6 used on input line 1872.
<use PID.pdf, page 4>
Package pdftex.def Info: PID.pdf , page4 used on input line 1884.
(pdftex.def) Requested size: 562.1644pt x 795.0303pt.
File: PID.pdf Graphic file (type pdf)
<use PID.pdf, page 6>
Package pdftex.def Info: PID.pdf , page6 used on input line 1872.
<use PID.pdf, page 4>
Package pdftex.def Info: PID.pdf , page4 used on input line 1884.
(pdftex.def) Requested size: 562.1644pt x 795.0303pt.
File: PID.pdf Graphic file (type pdf)
<use PID.pdf, page 6>
Package pdftex.def Info: PID.pdf , page6 used on input line 1872.
<use PID.pdf, page 4>
Package pdftex.def Info: PID.pdf , page4 used on input line 1884.
(pdftex.def) Requested size: 562.1644pt x 795.0303pt.
[101 <./PID.pdf>]
pdfTeX warning: /usr/bin/pdflatex (file ./PID.pdf): PDF inclusion: found PDF ve
rsion <1.7>, but at most version <1.5> allowed
<PID.pdf, id=1866, page=7, 597.55246pt x 845.07718pt>
<PID.pdf, id=1862, page=5, 597.55246pt x 845.07718pt>
File: PID.pdf Graphic file (type pdf)
<use PID.pdf, page 7>
Package pdftex.def Info: PID.pdf , page7 used on input line 1872.
<use PID.pdf, page 5>
Package pdftex.def Info: PID.pdf , page5 used on input line 1884.
(pdftex.def) Requested size: 562.1644pt x 795.0303pt.
File: PID.pdf Graphic file (type pdf)
<use PID.pdf, page 7>
Package pdftex.def Info: PID.pdf , page7 used on input line 1872.
<use PID.pdf, page 5>
Package pdftex.def Info: PID.pdf , page5 used on input line 1884.
(pdftex.def) Requested size: 562.1644pt x 795.0303pt.
File: PID.pdf Graphic file (type pdf)
<use PID.pdf, page 7>
Package pdftex.def Info: PID.pdf , page7 used on input line 1872.
<use PID.pdf, page 5>
Package pdftex.def Info: PID.pdf , page5 used on input line 1884.
(pdftex.def) Requested size: 562.1644pt x 795.0303pt.
[102 <./PID.pdf>]
pdfTeX warning: /usr/bin/pdflatex (file ./PID.pdf): PDF inclusion: found PDF ve
rsion <1.7>, but at most version <1.5> allowed
<PID.pdf, id=1872, page=8, 845.07718pt x 597.55246pt>
<PID.pdf, id=1868, page=6, 597.55246pt x 845.07718pt>
File: PID.pdf Graphic file (type pdf)
<use PID.pdf, page 8>
Package pdftex.def Info: PID.pdf , page8 used on input line 1872.
(pdftex.def) Requested size: 795.0303pt x 562.1644pt.
<use PID.pdf, page 6>
Package pdftex.def Info: PID.pdf , page6 used on input line 1884.
(pdftex.def) Requested size: 562.1644pt x 795.0303pt.
File: PID.pdf Graphic file (type pdf)
<use PID.pdf, page 8>
Package pdftex.def Info: PID.pdf , page8 used on input line 1872.
(pdftex.def) Requested size: 795.0303pt x 562.1644pt.
<use PID.pdf, page 6>
Package pdftex.def Info: PID.pdf , page6 used on input line 1884.
(pdftex.def) Requested size: 562.1644pt x 795.0303pt.
File: PID.pdf Graphic file (type pdf)
<use PID.pdf, page 8>
Package pdftex.def Info: PID.pdf , page8 used on input line 1872.
(pdftex.def) Requested size: 795.0303pt x 562.1644pt.
<use PID.pdf, page 6>
Package pdftex.def Info: PID.pdf , page6 used on input line 1884.
(pdftex.def) Requested size: 562.1644pt x 795.0303pt.
[103 <./PID.pdf>]
pdfTeX warning: /usr/bin/pdflatex (file ./PID.pdf): PDF inclusion: found PDF ve
rsion <1.7>, but at most version <1.5> allowed
<PID.pdf, id=1882, page=9, 845.07718pt x 597.55246pt>
<PID.pdf, id=1875, page=7, 597.55246pt x 845.07718pt>
File: PID.pdf Graphic file (type pdf)
<use PID.pdf, page 9>
Package pdftex.def Info: PID.pdf , page9 used on input line 1872.
(pdftex.def) Requested size: 795.0303pt x 562.1644pt.
<use PID.pdf, page 7>
Package pdftex.def Info: PID.pdf , page7 used on input line 1884.
(pdftex.def) Requested size: 562.1644pt x 795.0303pt.
File: PID.pdf Graphic file (type pdf)
<use PID.pdf, page 9>
Package pdftex.def Info: PID.pdf , page9 used on input line 1872.
(pdftex.def) Requested size: 795.0303pt x 562.1644pt.
<use PID.pdf, page 7>
Package pdftex.def Info: PID.pdf , page7 used on input line 1884.
(pdftex.def) Requested size: 562.1644pt x 795.0303pt.
File: PID.pdf Graphic file (type pdf)
<use PID.pdf, page 9>
Package pdftex.def Info: PID.pdf , page9 used on input line 1872.
(pdftex.def) Requested size: 795.0303pt x 562.1644pt.
<use PID.pdf, page 7>
Package pdftex.def Info: PID.pdf , page7 used on input line 1884.
(pdftex.def) Requested size: 562.1644pt x 795.0303pt.
[104 <./PID.pdf>]
pdfTeX warning: /usr/bin/pdflatex (file ./PID.pdf): PDF inclusion: found PDF ve
rsion <1.7>, but at most version <1.5> allowed
<PID.pdf, id=1893, page=10, 845.07718pt x 597.55246pt>
<PID.pdf, id=1881, page=8, 845.07718pt x 597.55246pt>
File: PID.pdf Graphic file (type pdf)
<use PID.pdf, page 10>
Package pdftex.def Info: PID.pdf , page10 used on input line 1872.
<use PID.pdf, page 8>
Package pdftex.def Info: PID.pdf , page8 used on input line 1884.
(pdftex.def) Requested size: 795.0303pt x 562.1644pt.
File: PID.pdf Graphic file (type pdf)
<use PID.pdf, page 10>
Package pdftex.def Info: PID.pdf , page10 used on input line 1872.
<use PID.pdf, page 8>
Package pdftex.def Info: PID.pdf , page8 used on input line 1884.
(pdftex.def) Requested size: 795.0303pt x 562.1644pt.
File: PID.pdf Graphic file (type pdf)
<use PID.pdf, page 10>
Package pdftex.def Info: PID.pdf , page10 used on input line 1872.
<use PID.pdf, page 8>
Package pdftex.def Info: PID.pdf , page8 used on input line 1884.
(pdftex.def) Requested size: 795.0303pt x 562.1644pt.
[105 <./PID.pdf>]
pdfTeX warning: /usr/bin/pdflatex (file ./PID.pdf): PDF inclusion: found PDF ve
rsion <1.7>, but at most version <1.5> allowed
<PID.pdf, id=1905, page=11, 845.07718pt x 597.55246pt>
<PID.pdf, id=1891, page=9, 845.07718pt x 597.55246pt>
File: PID.pdf Graphic file (type pdf)
<use PID.pdf, page 11>
Package pdftex.def Info: PID.pdf , page11 used on input line 1872.
<use PID.pdf, page 9>
Package pdftex.def Info: PID.pdf , page9 used on input line 1884.
(pdftex.def) Requested size: 795.0303pt x 562.1644pt.
File: PID.pdf Graphic file (type pdf)
<use PID.pdf, page 11>
Package pdftex.def Info: PID.pdf , page11 used on input line 1872.
<use PID.pdf, page 9>
Package pdftex.def Info: PID.pdf , page9 used on input line 1884.
(pdftex.def) Requested size: 795.0303pt x 562.1644pt.
File: PID.pdf Graphic file (type pdf)
<use PID.pdf, page 11>
Package pdftex.def Info: PID.pdf , page11 used on input line 1872.
<use PID.pdf, page 9>
Package pdftex.def Info: PID.pdf , page9 used on input line 1884.
(pdftex.def) Requested size: 795.0303pt x 562.1644pt.
[106 <./PID.pdf>]
pdfTeX warning: /usr/bin/pdflatex (file ./PID.pdf): PDF inclusion: found PDF ve
rsion <1.7>, but at most version <1.5> allowed
<PID.pdf, id=1911, page=12, 845.07718pt x 597.55246pt>
<PID.pdf, id=1901, page=10, 845.07718pt x 597.55246pt>
File: PID.pdf Graphic file (type pdf)
<use PID.pdf, page 12>
Package pdftex.def Info: PID.pdf , page12 used on input line 1872.
<use PID.pdf, page 10>
Package pdftex.def Info: PID.pdf , page10 used on input line 1884.
(pdftex.def) Requested size: 795.0303pt x 562.1644pt.
File: PID.pdf Graphic file (type pdf)
<use PID.pdf, page 12>
Package pdftex.def Info: PID.pdf , page12 used on input line 1872.
<use PID.pdf, page 10>
Package pdftex.def Info: PID.pdf , page10 used on input line 1884.
(pdftex.def) Requested size: 795.0303pt x 562.1644pt.
File: PID.pdf Graphic file (type pdf)
<use PID.pdf, page 12>
Package pdftex.def Info: PID.pdf , page12 used on input line 1872.
<use PID.pdf, page 10>
Package pdftex.def Info: PID.pdf , page10 used on input line 1884.
(pdftex.def) Requested size: 795.0303pt x 562.1644pt.
[107 <./PID.pdf>]
Package atveryend Info: Empty hook `BeforeClearDocument' on input line 1877.
[108]
Package atveryend Info: Empty hook `AfterLastShipout' on input line 1877.
pdfTeX warning: /usr/bin/pdflatex (file ./PID.pdf): PDF inclusion: found PDF ve
rsion <1.7>, but at most version <1.5> allowed
<PID.pdf, id=1913, page=11, 845.07718pt x 597.55246pt>
File: PID.pdf Graphic file (type pdf)
<use PID.pdf, page 11>
Package pdftex.def Info: PID.pdf , page11 used on input line 1884.
(pdftex.def) Requested size: 795.0303pt x 562.1644pt.
File: PID.pdf Graphic file (type pdf)
<use PID.pdf, page 11>
Package pdftex.def Info: PID.pdf , page11 used on input line 1884.
(pdftex.def) Requested size: 795.0303pt x 562.1644pt.
File: PID.pdf Graphic file (type pdf)
<use PID.pdf, page 11>
Package pdftex.def Info: PID.pdf , page11 used on input line 1884.
(pdftex.def) Requested size: 795.0303pt x 562.1644pt.
[108 <./PID.pdf>]
pdfTeX warning: /usr/bin/pdflatex (file ./PID.pdf): PDF inclusion: found PDF ve
rsion <1.7>, but at most version <1.5> allowed
<PID.pdf, id=1919, page=12, 845.07718pt x 597.55246pt>
File: PID.pdf Graphic file (type pdf)
<use PID.pdf, page 12>
Package pdftex.def Info: PID.pdf , page12 used on input line 1884.
(pdftex.def) Requested size: 795.0303pt x 562.1644pt.
File: PID.pdf Graphic file (type pdf)
<use PID.pdf, page 12>
Package pdftex.def Info: PID.pdf , page12 used on input line 1884.
(pdftex.def) Requested size: 795.0303pt x 562.1644pt.
File: PID.pdf Graphic file (type pdf)
<use PID.pdf, page 12>
Package pdftex.def Info: PID.pdf , page12 used on input line 1884.
(pdftex.def) Requested size: 795.0303pt x 562.1644pt.
[109 <./PID.pdf>]
Package atveryend Info: Empty hook `BeforeClearDocument' on input line 1889.
[110]
Package atveryend Info: Empty hook `AfterLastShipout' on input line 1889.
(./document.aux)
Package atveryend Info: Executing hook `AtVeryEndDocument' on input line 1877.
Package atveryend Info: Executing hook `AtEndAfterFileList' on input line 1877.
Package atveryend Info: Executing hook `AtVeryEndDocument' on input line 1889.
Package atveryend Info: Executing hook `AtEndAfterFileList' on input line 1889.
Package rerunfilecheck Info: File `document.out' has not changed.
(rerunfilecheck) Checksum: 1D7B2504DFF5D56ABCCDF1948D08498A;14207.
@ -1528,8 +1528,8 @@ Package logreq Info: Writing requests to 'document.run.xml'.
)
Here is how much of TeX's memory you used:
25151 strings out of 492982
396355 string characters out of 6134895
25153 strings out of 492982
396371 string characters out of 6134895
1018656 words of memory out of 5000000
27463 multiletter control sequences out of 15000+600000
21245 words of font info for 60 fonts, out of 8000000 for 9000
@ -1553,10 +1553,10 @@ ic/cm-super/sfrm0600.pfb></usr/share/texmf/fonts/type1/public/cm-super/sfrm1000
mf/fonts/type1/public/cm-super/sfrm1440.pfb></usr/share/texmf/fonts/type1/publi
c/cm-super/sfrm2488.pfb></usr/share/texmf/fonts/type1/public/cm-super/sfti1200.
pfb></usr/share/texmf/fonts/type1/public/cm-super/sftt1200.pfb>
Output written on document.pdf (108 pages, 2423767 bytes).
Output written on document.pdf (110 pages, 2428026 bytes).
PDF statistics:
2175 PDF objects out of 2487 (max. 8388607)
1980 compressed objects within 20 object streams
886 named destinations out of 1000 (max. 500000)
2185 PDF objects out of 2487 (max. 8388607)
1988 compressed objects within 20 object streams
888 named destinations out of 1000 (max. 500000)
855 words of extra memory for PDF output out of 10000 (max. 10000000)

Binary file not shown.

Binary file not shown.

View File

@ -683,7 +683,7 @@
The initial PID did, however, give a basis from which to develop ideas and initial research, and was the initial driving force of this project.
\subsection{Solution Summary}\label{summary}
The overall solution, concerning the problem statement, is to create a system mainly consisting of; a frontend application that will display plotting, predicted and true, performance metric data to the user as a clear and concise form. The backend system behind the price forecasting will consist of various subsystem responsible for data collection, filtering, data pre-processing, sentiment analysis, network training, validation and training and future price predictions. Each stage will consist of relevant tools and techniques for performing their required task.
The overall solution, concerning the problem statement, is to create a system consisting mainly of a frontend application that will display plotted predicted and true performance metric data to the user in a clear and concise form. The backend system behind the price forecasting will consist of various subsystems responsible for data collection, filtering, data pre-processing, sentiment analysis, network training and validation, and future price predictions. Each stage will use relevant tools and techniques for performing its required task.
\newpage
@ -1799,8 +1799,8 @@ def create_sets(self, data, lookback, sentiment):
Lastly, a limitation that could be identified and is also discussed in the results section above is that of the performance metrics not showing a clear distinction between the two network models. This limitation could be overcome by using more suitable explanative metrics, rather than relying on a more visual inspection, such as:
\begin{itemize}
\item Adjusted $R^2$ statistic - which shows how well the selected independant variables of the model explain the variability of the dependant variables, and shows how well the terms fit a regression line \cite{RMSEMAE}.
\item Mean Bias Error (MBE) - Is the Mean Absolute Error (MAE), which is calculated, if the absolute value is not taken (the signs of the errors are not removed) the MAE becomes the mean biased error. The MBE is intended to measure the average model bias and can convay more useful information that the MAE, but should be interpeted with caution due to the positive and negative error cancelling out. \cite{MBE}
\item Adjusted $R^2$ statistic - Shows how well the selected independent variables of the model explain the variability of the dependent variables, and how well the terms fit a regression line \cite{RMSEMAE}.
\item Mean Bias Error (MBE) - Calculated like the Mean Absolute Error (MAE) but without taking the absolute value (the signs of the errors are kept), so the MAE becomes the mean bias error. The MBE is intended to measure the average model bias and can convey more useful information than the MAE, but should be interpreted with caution because positive and negative errors cancel each other out. \cite{MBE}
\end{itemize}
Calculating these metrics could aid in distinguishing between the models based on quantitative measures rather than on visual analysis alone.
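A minimal sketch of how these additional metrics could be computed from the true and predicted price series is shown below; the arrays and the assumed number of predictors are placeholders rather than values from this project.
\begin{verbatim}
# Sketch: MAE, MBE and adjusted R^2 from placeholder prediction results.
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([100.0, 102.0, 101.0, 105.0, 107.0])   # placeholder true prices
y_pred = np.array([101.0, 101.5, 102.0, 104.0, 108.0])   # placeholder predictions

mae = np.mean(np.abs(y_pred - y_true))      # Mean Absolute Error
mbe = np.mean(y_pred - y_true)              # Mean Bias Error (signs kept)
n, p = len(y_true), 2                       # n samples, p predictors (assumed)
r2 = r2_score(y_true, y_pred)
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(mae, mbe, adj_r2)
\end{verbatim}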
@ -1812,7 +1812,7 @@ def create_sets(self, data, lookback, sentiment):
It has taught me how the classical (multinomial) naive Bayes probability model can be used to classify data as spam or ham (wanted), and how the underlying maths and algorithm work, due to hand-coding the algorithm from scratch. It showed how the Bag of Words algorithm for term-frequency identification builds upon the base probability model of the Bayes algorithm; how TF-IDF (Term Frequency-Inverse Document Frequency) builds upon this further, both assigning a weight to capture how often words occur in a given text and using this to identify commonly used words that are of no relevance to classification; and how the Additive Smoothing method helps deal with words that were not seen during training because they were not present in the training data.
Development of this project has given me a further understanding of time management and priorities, in the sense of what needs to focused on during development prior to other features being coded or implemented. An excellent example of where priorities had changed can be seen from the original PID form, \textit{Appendix B}, in which the solution to this project changed from focusing on the front-end application to focusing on the back-end system. This was due to a few factors that have already been identified in the Solution approach section. Where both stakeholders, I the developer and the supervisor of the project concluded that the creation of a front-end application, with a basic back-end for predictions, would not be a satisfactory solution, and more focus should be invested into the price predictions of Bitcoin. Another point where time management had to be considered was when implementing the Naive Bayes Classifier for spam filtering. Time management was not considered during its development and saw the feature go out-of-scope for what was initially wanted - the initial idea was to use the scikit-learns in-built Multinomial Naive Bayes classifier for spam classification. However, tutorials were found on top of the papers used for describing the algorithm during the literature review, which further described how to implement such an algorithm from scratch. Thus this was undertaken and leading to arguably time wasted coded the classification model rather than spending more time on coding the neural network. Arguably in a sense, as detailed in the previous paragraph, it taught me a great deal of how the algorithm works and it's limitations.
Development of this project has given me a further understanding of time management and priorities, in the sense of what needs to be focused on during development before other features are coded or implemented. An excellent example of where priorities changed can be seen in the original PID form, \textit{Appendix B}, in which the solution to this project changed from focusing on the front-end application to focusing on the back-end system. This was due to a few factors that have already been identified in the Solution approach section, where both stakeholders, I as the developer and the supervisor of the project, concluded that the creation of a front-end application with a basic back-end for predictions would not be a satisfactory solution, and that more focus should be invested in the price predictions of Bitcoin. Another point where time management had to be considered was when implementing the Naive Bayes classifier for spam filtering. Time management was not considered during its development, and the feature went out of scope from what was initially wanted - the initial idea was to use scikit-learn's in-built Multinomial Naive Bayes classifier for spam classification. However, tutorials were found, on top of the papers used to describe the algorithm during the literature review, which further described how to implement such an algorithm from scratch. This was undertaken, arguably leading to time wasted coding the classification model rather than spending more time on coding the neural network. Arguably, as detailed in the previous paragraph, it also taught me a great deal about how the algorithm works and its limitations.
Furthermore, it has allowed me to form a better knowledge base and understanding of the Python language and of the data mining techniques used with it to manipulate and use data for a required purpose, and has taught me relevant performance metrics for assessing a neural network and what their results represent.
@ -1825,16 +1825,26 @@ def create_sets(self, data, lookback, sentiment):
\section{Conclusion and Future Improvements}
\subsection{Conclusion}
What was aimed for?
As stated, the project's focus and solution changed considerably from the original Project Initiation Document, which stated in section 2.2 that the main objective was to "produce a thin web client that provides a dashboard, that provides tangible and useful information to users such as; current price" - \textit{of a cryptocurrency} - "exchange rate, network hashrates and historical price data". "It will also display statistics about sentiment analysis conducted on social media about the currency" with "graphical predictions on what the price may be, in a given time". As these extracts show, the initial objectives of the project were broad and vague: they suggest that the focus of the project would be creating a thin-client dashboard, which was no longer the case, and \textit{"about the cryptocurrency"} indicates that sentiment analysis and price predictions would be conducted on multiple currencies, which was an extremely broad estimation of the workload. It does, however, show that the initial thought process of performing sentiment analysis and price predictions occurred during the initial stages of this project, but this ultimately changed through development to focus on how the sentiment of a cryptocurrency, Bitcoin, extracted from social media could be used to aid the prediction of its future price.
What was produced?
With reference to the project's problem statement and solution approach, "This project will focus on the investigation of these technologies and tools" (sentiment analysers, machine learning algorithms and neural networks) "to justify whether it is feasible to predict the price of BTC based on historical price and the sentiment gathered from Twitter". The solution approach stated that the solution is "to create a system mainly consisting of; a frontend application that will display plotted; predicted and true, performance metric data to the user as a clear and concise form", with a back-end prediction system of "various subsystems responsible for data collection, filtering, data pre-processing, sentiment analysis, network training, validation and training, and future price predictions".
The end result followed suitably what was outlined in both the problem statement and solution approach, and met all but one point in the technical specification previously discussed in the reflection - where there were issues with deploying the back-end responsible for price forecasting to the external cloud server and getting it operational. A front-end application was also created which, although basic, served its purpose of presenting the required data to users in a clear format. The back-end also suitably performed all the tasks set out for it, such as data collection from the Twitter API, sentiment analysis using VADER, and predicting the next-hour price of Bitcoin.
The majority of the focus was invested in implementing the back-end prediction solution; although it was fully implemented as intended, some time was wasted hand-coding the Naive Bayes classifier from scratch. This did provide valuable information on exactly how the algorithm works, its limitations and how additional techniques and methods overcome these, but it was ultimately wasted time, as an already used Python package, Scikit-Learn, has multiple in-built Naive Bayes models to choose from. Using these would have reduced coding time that could have been spent elsewhere on the project, for example during testing, to implement k-fold cross-validation, the $R^2$ statistic or the Mean Bias Error, and in turn possibly help identify a correlation between the two models, with and without sentiment, from the metrics rather than relying on the metrics used, which did not show much, and on visual inspection of prediction results.
To conclude, the system developed to meet the requirements set out for this project has been built to the highest possible standard within the time frame given. Moreover, from the discussion of results, it can be argued that the system works well and predicts the next-hour price of Bitcoin appropriately given the data provided. The user interface provides the necessary information to the possible stakeholders, although not pretty, in a clear and concise manner, which is what was intended for the interface.
\subsection{Future Improvements}
Future work could include comparing recurrent neural network models; the implementation and effect of regularisation techniques and of different optimisers on the network; how n-grams could be used to improve language detection; comparing the hand-coded naive Bayes model to scikit-learn's in-built classifiers; and alterations and additions to the VADER lexicon to tailor it with domain-specific language and appropriately weighted sentiment values.
It would also be interesting to see what a day-ahead prediction would show, since sentiment does not directly affect the next-hour price.
Shifting the predicted data by an hour and sequencing over previous data would also allow proper use of look-back windows.
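A minimal sketch of that shift, using pandas; the column names and values are illustrative only.
\begin{verbatim}
# Sketch: align each hour's features with the NEXT hour's price as the target.
import pandas as pd

df = pd.DataFrame({"close": [100.0, 101.0, 102.5, 101.8]})  # placeholder hourly prices
df["target_next_hour"] = df["close"].shift(-1)  # shift the target one step ahead
df = df.dropna()                                # last row has no known next-hour price
print(df)
\end{verbatim}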
@ -1860,6 +1870,8 @@ def create_sets(self, data, lookback, sentiment):
How would this work, and what would it show or validate?
How would changing the epoch count and batch size affect performance?
Use a different charting system, as the plotted lines seem to jump around and skew the apparent accuracy.
\newpage
\nocite{*}

View File

@ -168,10 +168,10 @@
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {14.1}Conclusion}{89}{subsection.14.1}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {14.2}Future Improvements}{89}{subsection.14.2}
\contentsline {subsection}{\numberline {14.2}Future Improvements}{90}{subsection.14.2}
\defcounter {refsection}{0}\relax
\contentsline {section}{\numberline {15}Appendices}{95}{section.15}
\contentsline {section}{\numberline {15}Appendices}{97}{section.15}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {15.1}Appendix A - Project Initiation Document}{95}{subsection.15.1}
\contentsline {subsection}{\numberline {15.1}Appendix A - Project Initiation Document}{97}{subsection.15.1}
\defcounter {refsection}{0}\relax
\contentsline {subsection}{\numberline {15.2}Appendix B - Log book}{108}{subsection.15.2}
\contentsline {subsection}{\numberline {15.2}Appendix B - Log book}{110}{subsection.15.2}