From: gnunet
Subject: [GNUnet-SVN] [taler-schemafuzz] branch master updated: rewrote so paragrphs that were unclear
Date: Fri, 31 Aug 2018 15:59:27 +0200

This is an automated email from the git hooks/post-receive script.

erwan-ulrich pushed a commit to branch master
in repository schemafuzz.

The following commit(s) were added to refs/heads/master by this push:
     new 074ed00  rewrote so paragrphs that were unclear
074ed00 is described below

commit 074ed00c931d6f352c466fa2258755861e293c69
Author: Feideus <address@hidden>
AuthorDate: Fri Aug 31 15:59:21 2018 +0200

    rewrote so paragrphs that were unclear
---
 docs/Documentation.pdf       | Bin 953766 -> 958332 bytes
 docs/Documentation.tex       | 149 +++++++++++++++++++++++++++----------------
 docs/PersonnalExperience.tex |  10 ++-
 3 files changed, 102 insertions(+), 57 deletions(-)

diff --git a/docs/Documentation.pdf b/docs/Documentation.pdf
index f645a17..d8af16d 100644
Binary files a/docs/Documentation.pdf and b/docs/Documentation.pdf differ
diff --git a/docs/Documentation.tex b/docs/Documentation.tex
index 5746f20..1b6bd7e 100644
--- a/docs/Documentation.tex
+++ b/docs/Documentation.tex
@@ -50,7 +50,7 @@ Traditional fuzzing is defined as "testing an automated 
software testing techniq
 
 
 
-This quote is very well illustrated by the following example :
+This quote is well illustrated by the following example:
                               \begin{quotation}
 Let's consider an integer in a program which stores the result of a user's choice between 3 questions. When the user picks one, the choice will be 0, 1 or 2, which makes three practical cases. But what if we transmit 3, or 255? We can, because integers are stored in a static-size variable. If the default switch case hasn't been implemented securely, the program may crash and lead to "classical" security issues: (un)exploitable buffer overflows, DoS, ... 
                               \end{quotation}
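To make this concrete, here is a minimal, hypothetical Java sketch of such an insecurely handled default case (the class, array and method names are invented for this illustration):
\begin{verbatim}
// A choice is expected to be 0, 1 or 2, but the carrying int can
// hold any value, e.g. 3 or 255.
public class ChoiceHandler {
    static final String[] ANSWERS = {"yes", "no", "maybe"};

    static String handle(int choice) {
        switch (choice) {
            case 0:
            case 1:
            case 2:
                return ANSWERS[choice];
            default:
                // Insecure default: trusts the value anyway.
                return ANSWERS[choice]; // crashes for 3 or 255
        }
    }

    public static void main(String[] args) {
        System.out.println(handle(1));   // fine: prints "no"
        System.out.println(handle(255)); // ArrayIndexOutOfBoundsException
    }
}
\end{verbatim}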
@@ -64,7 +64,7 @@ However, SchemaFuzz is a database oriented fuzzer. This means 
that it focuses on
 
 This tool is meant to help developers, maintainers and, more generally, anyone who makes use of data coming from a database under their control. A good way to sum up the effect of this tool is to compare it to a "cyber attack simulator".
 This means that the idea behind it is to emulate the damage, subtle or not, that an attacker may cause to a database he illegally gained privileges on. This might in theory go from a simple boolean flip (a subtle modification) to removing/adding content, to purely and simply destroying or erasing all the content of the database.
-SchemaFuzz focuses on the first part : modification of the content of the 
database by single small modification that may or may not overlap. These 
modifications may be very aggressive of very subtle.
+SchemaFuzz focuses on the first part: modification of the content of the database by single small modifications that may or may not overlap. These modifications may be aggressive or subtle.
 It is interesting to point out that this last point also qualifies SchemaFuzz as a good "database structural flaw detector".
 That is to say that errors typically triggered by poor management of a database (wrong data type usage, incoherence between the database structure and the use of its content, etc.) might also appear clearly during the execution.   
                \subsection{Perimeter}
@@ -76,9 +76,9 @@ The resulting software will generate a group of human 
readable reports on each m
                \caption{Shows the nature of the code for every distinct 
component. The slice size is a rough estimation.}
                \end{figure}
                \subsection{When to use it}
-SchemaFuzz is a very useful tool for anyone trying secure a piece of software 
that uses database resources. The target software should be GDB(introduce GDB) 
compatible and the DBMS(introduce acronym) has to grant access to the target 
database through credentials passed as argument to this tool.
+SchemaFuzz is a useful tool for anyone trying to secure a piece of software that uses database resources. The target software should be GDB (the GNU Debugger) compatible, and the DBMS (Database Management System) has to grant access to the target database through credentials passed as arguments to this tool.
 
----It is very strongly advice to use a copy of the target database rather than 
on the production material. Doing so may result in the database being corrupted 
and not usable for any useful mean.
+It is strongly advised to use a copy of the target database rather than the production material. Running this tool on a production database may result in it being corrupted and rendered unusable.
 
                \clearpage
 
@@ -94,7 +94,7 @@ The majority of this project is built on top of this already 
existing code and i
                 
 This organization will be detailed and discussed in the following sections.
                \subsection{SchemaSpy legacy/meta data extraction}
-SchemaSpy source code has provided the meta data extraction routine. The only 
job of this routine is to initialize the connection to the database and 
retrieve its meta data at the very beginning of the execution (before any 
actual SchemaFuzz code is run). These meta data include data types, table and 
table column names, views and foreign/primary key constraints. Having this pool 
of meta data under the shape of Java objects allows the main program to 
properly frame what the possibilities  [...]
+SchemaSpy's source code has provided the meta data extraction routine. The only job of this routine is to initialize the connection to the database and retrieve its meta data at the beginning of the execution (before any actual SchemaFuzz code is run). These meta data include data types, table and table column names, views and foreign/primary key constraints. Having this pool of meta data in the shape of Java objects allows the main program to properly frame what the possibilities are [...]
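For illustration, a minimal sketch of this kind of meta data retrieval through the standard JDBC DatabaseMetaData API could look as follows (the connection URL and credentials are placeholders; the actual routine inherited from SchemaSpy builds a much richer object model):
\begin{verbatim}
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;

public class MetaDataDump {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/targetdb", "user", "password");
        DatabaseMetaData meta = conn.getMetaData();
        // List every table, then every column with its data type.
        ResultSet tables = meta.getTables(null, null, "%",
                                          new String[] {"TABLE"});
        while (tables.next()) {
            String table = tables.getString("TABLE_NAME");
            System.out.println("table: " + table);
            ResultSet cols = meta.getColumns(null, null, table, "%");
            while (cols.next()) {
                System.out.println("  " + cols.getString("COLUMN_NAME")
                                   + " : " + cols.getString("TYPE_NAME"));
            }
        }
        conn.close();
    }
}
\end{verbatim}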
 
 \clearpage
 
@@ -145,30 +145,15 @@ It also holds the informations concerning the result of 
the injection in the sha
 \caption{Structure of a Mutation}
 \end{figure}
 
-\bigskip
-                               
-                               \paragraph{Choosing pattern}
-For each iteration of the main loop, a modification has to be picked up as the 
next step in the fuzzing process. This is done by considering the current state 
of the tree.
-Three parallel code paths can be triggered from this point.
-                               \begin{itemize}
-                               \item{Continue on the current branch of the 
tree (triggered if the last mutation scored better than its parent)}
-                               \item{Pick an existing branch in the tree and 
grow it (triggered if the last mutation scored worse than its parent on a 50/50 
chance with the next bullet)}
-                               \item{Start a new branch (triggered if the last 
mutation scored worse than its parent on a 50/50 chance with the previous 
bullet)}
-                               
-\begin{figure}[h!]
-\centering
-\includegraphics[width=\textwidth]{pickingPaternDiagram.pdf}
-\caption{picking Pattern schema}
-\end{figure}                           
+\bigskip                               
                                
-                               \end{itemize}
 A branch is a succession of mutations that share the same database row as their modification target.
-The heuristics determining the next mutation's modification are still very 
primitive and will be thinly justed in futures versions.                        
                                     
+The heuristics determining the next mutation's modification are still primitive and will be finely adjusted in future versions.
                               \paragraph{Creating malformed data} 
                               %% this means nothing.
 As the goal of running this tool is to submit unexpected or invalid data to the target software, it is necessary to understand what t
 Fuzzing a complex type such as timestamp variables has nothing to do with fuzzing a trivial boolean. In practice, a significant part o
-and this matter could absolutely be the subject of a more abstract work. We 
focused here on a very simple approach (as a first step).
+and this matter could absolutely be the subject of a more abstract work. We focused here on a simple approach (as a first step).
 After retrieving the current row being fuzzed (whether it be a new row or a previously fuzzed row), the algorithm explores the different
 The algorithm then builds the possible modifications for each of the fields of the current row.
 At the moment, the supported types are: % add a list of the supported types.
@@ -196,12 +181,12 @@ The possible modifications that this tool can produce at 
the moment are : \\ % a
                                        \item Set date to $00/00/0000$ 
                                \end{itemize}
 These "abnormal" values might in fact be totally legit in some cases. in that 
case the analyzer 
-will rank the mutation rather poorly, which will lead to this tree path not 
being very likely to be developed further more.
+will rank the mutation rather poorly, which will lead to this tree path not being likely to be developed further.
                                \\*
                                \paragraph{SQL handling}
-All the SQL statements are generated within the code. This means that the data 
concerning the current and future state of the mutations have to be very 
precise. Otherwise, the SQL statement is very likely to fail. Sadly, since 
SchemaFuzz only supports postgreSQL, the implemented syntax follow the one of 
postgres
-DBMS. This is already a very big axis for future improvements and will be 
detailed in the dedicated section.
-The statement is built to target the row as precisely as possible, meaning 
that it uses all of the non fuzzed values from the row to avoid updating other 
row accidentally. Only the types that can possibly be fuzzed will be used in 
the building of the SQL statement. Since this part of the code is very delicate 
in the sense that it highly depends on an arbitrary large pool of variables 
from various types it is a good bug provider. 
+All the SQL statements are generated within the code. This means that the data concerning the current and future state of the mutations have to be precise. Otherwise, the SQL statement is likely to fail. Sadly, since SchemaFuzz only supports PostgreSQL, the implemented syntax follows that of the Postgres DBMS. This is already a big axis for future improvements and will be detailed in the dedicated section.
+The statement is built to target the row as precisely as possible, meaning that it uses all of the non-fuzzed values from the row to avoid accidentally updating other rows. Only the types that can possibly be fuzzed will be used in building the SQL statement. Since this part of the code is delicate, in the sense that it highly depends on an arbitrarily large pool of variables of various types, it is a good bug provider. 
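As a hedged illustration of such a targeted statement (the table and column names below are invented), the fuzzed column is the only one changed and every other fuzzable column value pins down the exact row:
\begin{verbatim}
import java.sql.Connection;
import java.sql.PreparedStatement;

public class TargetedUpdate {
    static int fuzzAge(Connection conn) throws Exception {
        String sql = "UPDATE users SET age = ? "
                   + "WHERE id = ? AND name = ? AND age = ? AND active = ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setInt(1, -12000);      // the fuzzed value
            ps.setInt(2, 42);          // original, non-fuzzed values
            ps.setString(3, "Sylvain");
            ps.setInt(4, 10);
            ps.setBoolean(5, true);
            return ps.executeUpdate(); // 0 rows updated = failure case
        }
    }
}
\end{verbatim}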
                                
                               \paragraph{Injecting}
 The injection process sends the built statement to the DBMS so that the modification can be operated. After the execution of the query, depending on the output of the injection (one modification, several modifications, transfer), the stored information is updated so that it matches the database state after the modification. If the modification failed, no trace of this mutation is kept; it is erased and execution goes on as if nothing had happened.
@@ -209,7 +194,7 @@ The injection process sends the built statement to the DBMS 
so that the modifica
 The mutation transfer is a special case of a modification being applied to the database.
 It is triggered when the value that was supposed to be fuzzed is under the influence of an FKC as the child.
 In the case of an FKC (in CASCADE mode), only the father can be changed, which also triggers the same modification on all of its children. The algorithm then "transfers" the modification from the original mutation to its father.
-After injecting the transfered mutation, the children mutation is indeed 
modified but the modification "splashed" on some parts of the database that was 
not meant to be changed.
+After injecting the transferred mutation, the child mutation is indeed modified, but the modification cascades onto parts of the database that were not meant to be changed.
 Fortunately, this does not impact the life of the algorithm until this mutation is reverted (see the Do/Undo routine below).
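To make the transfer scenario concrete, consider the following hypothetical parent/child pair under an FKC in CASCADE mode (a minimal sketch over JDBC; the table names and values are invented):
\begin{verbatim}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CascadeDemo {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/targetdb", "user", "password");
        try (Statement st = conn.createStatement()) {
            st.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)");
            st.execute("CREATE TABLE child (parent_id INTEGER "
                       + "REFERENCES parent(id) ON UPDATE CASCADE)");
            st.execute("INSERT INTO parent VALUES (1)");
            st.execute("INSERT INTO child VALUES (1), (1)");
            // Fuzzing child.parent_id directly would violate the FKC,
            // so the mutation is "transferred" to the father; the
            // change then cascades onto every referencing child row.
            st.executeUpdate("UPDATE parent SET id = 9999 WHERE id = 1");
        } finally {
            conn.close();
        }
    }
}
\end{verbatim}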
                               \paragraph{Do/Undo routine}
 The Do/Undo mechanism is at the center of this software. Its behavior is crucial for the execution and has a strong impact on the coherence of the data nested in the code and inside the target database throughout the runtime.
@@ -219,6 +204,23 @@ Reverting mutations is the key to flawlessly shifting the 
current position in th
 The case of the transferred mutation is no exception to this. In this case, the mutation applied changes to an unknown number of fields in the database. But the FKC still binds all the children to their father at this point (this is always the case unless this software is not used as intended).  
 Changing the father's field value back to its original state will splash the original values back on all the children.
 This mechanism might trigger failing mutations in some cases (usually mutations following a transfer). This issue will be addressed in the known issues section. 
+
+                               \subsubsection{Choosing pattern}
+For each iteration of the main loop, a modification has to be picked as the next step in the fuzzing process. This is done by considering the current state of the tree.
+Three parallel code paths can be triggered from this point; a sketch of this decision logic follows the list below.
+                               \begin{itemize}
+                               \item{Continue on the current branch of the tree (triggered if the last mutation scored better than its parent)}
+                               \item{Pick an existing branch in the tree and grow it (triggered, on a 50/50 chance with the next bullet, if the last mutation scored worse than its parent)}
+                               \item{Start a new branch (triggered, on a 50/50 chance with the previous bullet, if the last mutation scored worse than its parent)}
+                               \end{itemize}
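A minimal sketch of this decision logic (the method and enum names are invented; the real heuristics live in the main loop) could read:
\begin{verbatim}
import java.util.Random;

public class ChoosingPattern {
    private static final Random RNG = new Random();

    enum NextStep { CONTINUE_BRANCH, GROW_EXISTING_BRANCH, START_NEW_BRANCH }

    static NextStep choose(double lastScore, double parentScore) {
        if (lastScore > parentScore) {
            // The last mutation improved on its parent: keep going.
            return NextStep.CONTINUE_BRANCH;
        }
        // Otherwise: 50/50 between growing an existing branch
        // and starting a new one.
        return RNG.nextBoolean() ? NextStep.GROW_EXISTING_BRANCH
                                 : NextStep.START_NEW_BRANCH;
    }
}
\end{verbatim}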
+                               
+                               \bigskip
+                       \begin{figure}[ht!]
+                       
\includegraphics[width=\textwidth]{pickingPaternDiagram.pdf}
                        \caption{Decision diagram: how the next mutation is picked at each iteration of the main loop}
+                       \end{figure}                    
+               
+                                               \bigskip
                       \subsubsection{Tree Based data structure}
 All the mutations that are injected at least once in the course of the execution of this software will be stored properly in a tree data structure. Having such a data structure makes parent-children relations between mutations possible. The tree follows the traditional definition of an n-ary algorithmic tree.
 It is made of nodes (mutations), including a root (the first mutation to be processed, on a field selected randomly in the database).
@@ -226,33 +228,33 @@ Each node has a number of children that depends on the 
ranking its mutation and
                               \paragraph{Weight}
 Weighting the nodes is an important part of the runtime. Each mutation has a weight that is equal to the analyzer's output. This value reflects the mutation's worth: if it had an interesting impact on the target program's behavior (if it triggered new bugs or uncommon code paths) then this value is high, and vice-versa. The weight is then used as a means of determining the upcoming modification. The chance that a mutation gets a child is directly proportional to its weight.
 This value currently isn't biased by any other parameter, but this might change in the future.  
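As a sketch of what "directly proportional to its weight" means in practice, a standard roulette-wheel selection would do (hypothetical; the class and method names are invented):
\begin{verbatim}
import java.util.List;
import java.util.Random;

public class WeightedPick {
    // Picks one node; the chance of picking node i is
    // weights[i] / sum(weights).
    static <T> T pick(List<T> nodes, List<Double> weights, Random rng) {
        double total = 0;
        for (double w : weights) total += w;
        double r = rng.nextDouble() * total;
        for (int i = 0; i < nodes.size(); i++) {
            r -= weights.get(i);
            if (r <= 0) return nodes.get(i);
        }
        return nodes.get(nodes.size() - 1); // numerical safety net
    }
}
\end{verbatim}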
-                               \paragraph{Path} %% changer la frase sur resolve
+                               \paragraph{Path} 
 Since the weighting of the mutations allows going back to previous, more interesting mutations, 
-there is a need for a path finder mechanism. Concretely, this routines 
resolves the nodes that separate nodes A and B in the tree. A and B might be 
children and parent but can also belong to completely different branches. This 
path is then given to the do/undo routine that processes back the modifications 
to set the database up in the required state for the upcoming mutation. 
+there is a need for a path finder mechanism. In practice, this routine resolves the chain of nodes that separates two nodes in the tree. This is done by going, from both nodes, in the direction of the root until a common ancestor is found. Fusing the lists of both chains creates the full path between the two nodes. The path is then used when the main loop goes through the undo mechanism: undoing from mutation A to mutation B is implemented as undoing every mutation between A and B.
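A minimal sketch of this path resolution under the stated scheme (the node class and field names are invented) follows:
\begin{verbatim}
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class Node {
    Node parent; // null for the root
}

public class PathFinder {
    // Walk from both nodes toward the root, find the lowest common
    // ancestor, and fuse the two chains into the full A-to-B path.
    static List<Node> path(Node a, Node b) {
        Set<Node> ancestorsOfA = new HashSet<>();
        for (Node n = a; n != null; n = n.parent) ancestorsOfA.add(n);
        // Climb from b until we hit a's ancestor chain.
        Node common = b;
        while (!ancestorsOfA.contains(common)) common = common.parent;
        List<Node> aSide = new ArrayList<>();
        for (Node n = a; n != common; n = n.parent) aSide.add(n);
        aSide.add(common);
        List<Node> bSide = new ArrayList<>();
        for (Node n = b; n != common; n = n.parent) bSide.add(n);
        Collections.reverse(bSide);
        aSide.addAll(bSide); // full path: a .. common .. b
        return aSide;
    }
}
\end{verbatim}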
 
 \bigskip
 
 \begin{figure}[h!] 
 \centering
 \includegraphics[width=\textwidth]{CommonAncestorDiagram.pdf}
-\caption{Objects returned by the meta data extraction routine.}
+\caption{Example of path between two nodes in the tree}
 \end{figure}
 
 \bigskip
                        \subsubsection{The analyzer}
-Analyzing the output of the target program is another critical part of 
SchemaFuzz. The analyzer parses in the stack trace of the target software's 
execution to try measuring its interest. The main criteria that defines a 
mutation interest is its proximity to previously parsed stack traces. The more 
distance between the new mutation and the old ones, the better the ranking. 
%%enlever les mots definis plus bas.
+Analyzing the output of the target program is another critical part of SchemaFuzz. The analyzer parses the stack trace of the target software's execution in order to measure how interesting the output of the execution was. Since crashes and unexpected behavior from the target software are what the tool tries to trigger, they are the main criterion of a valuable mutation. A stack trace is a text block structured to present all the information related to a crash during a software's execution. The  [...]
                               \paragraph{Stack Trace Parser}
 The stack trace parser is a separate Bash script that processes stack traces generated by the GDB C language debugger and stores all the relevant information (function name, line number, file name) into a Java object. As a secondary job, the parser also generates a human-readable file for each mutation that synthesizes the stack trace values as well as additional interesting information useful for other mechanisms (that also require parsing). These additional pieces of information include the p [...]
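The real parser is a Bash script; as a hedged illustration only, a Java equivalent of the core extraction step (function name, file name and line number from one GDB frame line) could look like this:
\begin{verbatim}
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FrameParser {
    // Matches a GDB frame such as:
    //   #0  main (argc=1, argv=0x7ffc...) at prog.c:42
    private static final Pattern FRAME = Pattern.compile(
        "#\\d+\\s+(?:0x[0-9a-fA-F]+ in )?(\\S+)\\s*\\(.*\\) at (\\S+):(\\d+)");

    public static void main(String[] args) {
        Matcher m = FRAME.matcher("#0  main (argc=1) at prog.c:42");
        if (m.find()) {
            System.out.println("function: " + m.group(1));
            System.out.println("file:     " + m.group(2));
            System.out.println("line:     " + m.group(3));
        }
    }
}
\end{verbatim}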
                                \paragraph{Hashing}
-In order to be used in the clustering algorithm, the stack trace of a mutation 
has to be hashed.
+The clustering algorithm used to compute the crash rank takes a triplet of numerical values as input. Therefore, the stack trace of a mutation has to be hashed into a triplet of numerical values. This set of values is used as a representation of the original stack trace object.
 Hashing is usually defined as follows:
                                \begin{quotation}
 "A hash value (or simply hash), also called a message digest, is a number 
generated from a string of text. The hash is substantially smaller than the 
text itself, and is generated by a formula in such a way that it is extremely 
unlikely that some other text will produce the same hash value."
                                \end{quotation}
                                
-In the present case, we used a different approach. Since proximity between two 
stack traces is the key to a relevant ranking, it is mandatory to have a 
hashing function that preserves the proximity of the two strings. 
+In the present case, we used a different approach. Since proximity between two 
stack traces is the key to a relevant ranking, it is mandatory to have a 
hashing function that preserves the proximity of two strings. 
 In that regard, we implemented a version of the Levenshtein Distance algorithm.
-This algorithm can roughly be explain by the following :
+This algorithm can be explained by the following statement:
                                \begin{quotation}
 "The Levenshtein distance between two words is the minimum number of 
single-character edits (insertions, deletions or substitutions) required to 
change one word into the other."
                                \end{quotation}                          
@@ -271,35 +273,72 @@ This algorithm can roughly be explain by the following :
 
The distance for this example is $2 \div 8 \times 100 = 25$.
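A compact dynamic-programming implementation of this distance (a standard textbook version, not necessarily the exact code used by the analyzer) is shown below; the words of the elided example are not visible here, but "distance" and "instance" reproduce the same figures: 2 edits over 8 characters, i.e. $2 \div 8 \times 100 = 25$.
\begin{verbatim}
public class Levenshtein {
    static int distance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(
                        Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                        d[i - 1][j - 1] + cost);
            }
        }
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("distance", "instance")); // prints 2
    }
}
\end{verbatim}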
                                
-After hashing the file name and the function name into numerical values trough 
Levenshtein distance, we are creating a triplet the fully (but not fully 
accurately yet) represents the stack trace that is being parsed. This triplet 
will be used in the clustering method. 
-
+After hashing the file name and the function name into numerical values through Levenshtein distance, the analyzer creates the triplet that numerically represents the stack trace being parsed. This triplet will be used in the clustering method detailed in the following paragraph.
+It is interesting to note that this triplet is not the most accurate representation of a stack trace. The analyzer will be improved in that regard in the future.
 
                                \paragraph{The Scoring mechanism}
-The "score" (or rank) of a mutation is a numerical value that reflects how 
interesting the outcome was. Crashes and unexpected behavior to raise this 
value whereas no crash tend to lower it. This value is calculated through a 
modified version of a clustering method that computes an n-uplet into a integer 
depending on the sum of the Euclidean distances from the n-uplet to the 
existing centroids (groups of mutation's n-uplets that were already processed).
-This value is then set as the mutation's rank and used as a mean of choosing 
the upcoming mutation.
+The "score" (or rank) of a mutation is a numerical value that reflects how 
interesting the outcome was. Crashes and unexpected behavior are what makes a 
mutation valuable since it indicates a wrongly implemented code piece in the 
target source in most cases. This value is calculated through a modified 
version of a clustering method.
+This clustering mechanism runs as follows:
+       \begin{itemize}
+       \item{Represent the triplets in a 3 dimensional space}
+       \item{Create clusters that includes most similar triplets}
+       \item{Calculate the centroid of each cluster}
+       \item{Calculate the Euclidean distance between the current mutation's 
triplet and all the centroids}
+       \item{Add up all the distances generated by the last step into a single 
value}
+       \end{itemize}
+
+The centroid of a cluster is the triplet of values that defines the center of that cluster.
+The Euclidean distance is defined as
+	\begin{quotation}
+	In mathematics, the Euclidean distance or Euclidean metric is the "ordinary" straight-line distance between two points in Euclidean space
+	\end{quotation}
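For the two triplets $p=(p_1,p_2,p_3)$ and $q=(q_1,q_2,q_3)$ manipulated here, this amounts to:
\[
d(p,q) = \sqrt{(p_1-q_1)^2 + (p_2-q_2)^2 + (p_3-q_3)^2}
\]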
+       
+If a triplet represents a unique crash, it will be placed far away from the others in the Euclidean space. This implies that the sum of the Euclidean distances to the centroids will be high compared to that of a common crash. This sum is then used as the "score" of the mutation. 
+
+Mutations that do not trigger any crash end up with a null score. Therefore, the side of the tree they are in has a lower statistical chance of being chosen for further exploration.
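Under this scheme, the summation step itself reduces to a few lines (a hypothetical sketch; the clustering and centroid computation are omitted, and a crash-less mutation would bypass this and receive a null score as stated above):
\begin{verbatim}
import java.util.List;

public class Scorer {
    // score = sum of Euclidean distances from the mutation's
    // triplet to all existing cluster centroids.
    static double score(double[] triplet, List<double[]> centroids) {
        double score = 0;
        for (double[] c : centroids) {
            double dx = triplet[0] - c[0];
            double dy = triplet[1] - c[1];
            double dz = triplet[2] - c[2];
            score += Math.sqrt(dx * dx + dy * dy + dz * dz);
        }
        return score;
    }
}
\end{verbatim}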
 
+For a more concrete view of what the analyzer outputs, please refer to the Results and examples section. 
 \begin{figure} [h!]
   \includegraphics[width=\textwidth]{Scoring.png}
 \end{figure}   
                \subsection{Known issues}               
 About one mutation out of 15 will fail for invalid reasons.
                        \subsubsection{Context Coherence}
-A significant amount of the failing mutations do so because of the transfer 
mechanism. As said in the dedicated section, this mechanism applies more than 
one change to the database (Potentially the whole database). In specific case, 
this property can become problematic. 
-More specifically, when the main loop identifies a mutation's child as the 
upcoming mutation and its parent row has been splashed with the effect of a 
transfer. In this case, the data embedded in the schemaFuzz data structure may 
not match the data that are actually in the database, this delta will likely 
induce a wrongly designed SQL statement that will result in a SQL failure 
(meaning that 0 row were updated by the statement).
+A significant amount of the failing mutations do so because of the mutation transfer mechanism. As said in the dedicated section, this mechanism applies more than one change to the database (potentially the whole database). In specific cases, this property can become problematic. 
+More specifically, this happens when the main loop chooses a next mutation whose parent has been the subject of a transfer. In this case, the data embedded in the SchemaFuzz data structure may not match the data that are present in the database; this delta may induce a wrong SQL statement that will result in an SQL error (in practice, the DBMS indicates that 0 rows were updated by the statement).
                       \subsubsection{Foreign Key constraints}                 
 For a reason that is not yet clear, some of the implied FKCs of the target database can't be properly set to CASCADE mode. This results in an FKC error (the mutation fails but the program can carry on).
                        \subsubsection{Tests}
-Besides the test suit written by the SchemaSpy team for their own tool (still 
implemented in SchemaFuzz for the meta data extraction), the tests for this 
project are very poor. Their are only very few of them and their utility is 
debatable. This is due to the lack of experience in that regard of the main 
developer. Obviously, we are very well aware of this being a really severe flaw 
in this project and will be one of the main future improvements.
-This big lack of good maintenance equipment might also explain some of the 
silent and non silent bugs that still remain in the code to this day.
+Due to a lack of time and an omission in the project planning, this project's test suite is not yet complete. In its current state, the test suite includes the tests written for the meta data extraction routine as well as a bundle of unit tests that cover the following points:
+			\begin{itemize}
+			\item{Instantiation of the Mutation class}
+			\item{Creation of the modification possibilities}
+			\item{The do/undo routine}
+			\item{Uniformity of the tree weighting}
+			\end{itemize}
+			
+The following list details the tests that will be implemented in future releases, by order of importance.
+
+			\begin{itemize}
+			\item{Integration tests}
+			\item{Regression tests}
+			\item{More complete and specific unit tests}
+			\item{Performance tests}
+			\end{itemize}
 
                        \subsubsection{Code Quality}
-We are well aware that this tool's source code is of debatable quality. This 
fact induces the  bugs and unexpected behaviors discussed earlier on some 
components of this program. 
-The following points constitute the main flaw of the source code:
+The code in its current state is still in beta. This means that the code will be the subject of structural and syntax changes. The following list contains the major aspects of these changes:
+       
                        \begin{itemize}
-                       \item Hard to maintain. The code is not optimized 
either in term of size or                     efficiency. Bad coding habits 
tend to make it rather weak and unstable to context changes.
-                       \item Structure is not intuitive. The main loop of the 
program lacks a good             structure.
+                       \item{Code structure}
+                       \item{Code concision}
+			\item{Code style. More precisely, updating code pieces that contain bad coding habits}
                        \end{itemize}
-                       
+For example, the following code:
+$if(myVariable == 0)$
+should be changed to:
+$if(0 == myVariable)$
+to avoid an unwanted assignment in the case of the omission of an $=$ sign.
                
                        \clearpage
 
        \section{Results and examples}
@@ -394,21 +433,21 @@ This section will provide more insights on the future 
features that might/may/wi
 Any suggestion will be greatly appreciated as long as it is relevant. All the relevant information regarding contributions is detailed in the dedicated section.
 
                \subsection{General Report}
-In its future state, SchemaFuzz will generate a synthesized report concerning 
the overall execution of the tool (which it does not do right now). This 
general report will primarily contain the most "interesting" mutations (meaning 
the mutations with the highest score mark) for the whole run.
-A more advanced version of this report would also take into account the code 
coverage rate for each mutation and execute a last clustering round at the end 
of the execution to generate a "global" score that would represent the global 
value of each mutations.
+In its future state, SchemaFuzz will generate a synthesized report concerning the overall execution of the tool. This general report will primarily contain the most "interesting" mutations (meaning the mutations with the highest score) for the whole run.
+A more advanced version of this report would also take into account the code coverage ratio for each mutation and execute a last clustering round at the end of the execution to generate a "global" score that would represent the global value of each mutation.
        
                \subsection{Code coverage}
-We are considering changing or simply adding code coverage in the clustering 
method as a parameters.Not only would this increase the accuracy of the scoring 
but also increase the accuracy of the "type" of each mutation. To this day, 
this tool does not make a concrete difference in terms of scoring or 
information generating (reports) between a mutation with a new stack trace in a 
very common code path and a very common stack trace in a very rarely triggered 
code path.
+We are considering changing, or simply adding, code coverage as a parameter in the clustering method. Not only would this increase the accuracy of the scoring, but it would also give more detail on what the mutation triggered in the target software's code, therefore helping locate the origin of the crash. By adding code coverage, this tool could make a concrete difference, in terms of scoring and information generated in the reports, between a mutation with a new stack trace in a common code pat [...]
 
                \subsection{Data type Pre-analyzing}
 The idea for this feature is to implement some kind of "auto learning" mechanism.
 To be more precise, this routine is meant to perform a statistical analysis on a representative portion of the database's content. This analysis would provide the rest of the program with the most common values encountered for each field. More generically, this would allow the software to have a global view over the format of the data that the database holds.
-Such global understanding of the content format is very interesting to make 
the modifications possibilities more relevant. Indeed, one of the major 
limitation of SchemaFuzz is its "blindness".
+Such a global understanding of the content format is interesting for making the modification possibilities more relevant. Indeed, one of the major limitations of SchemaFuzz is its "blindness".
 That is to say that some of the modifications performed in the course of the execution of the program are irrelevant due to the lack of information on what is supposed to be stored in a precise field.
-For instance, a field that only holds numerical values that go from 1 to 1000 
even if it has enough bits to encode from -32767 to 32767 would have a very low 
chance of triggering a crash if this software modifies its value from 10 to 55.
-on the other end, if the software modifies this very same field from 10 to 
-12000, then a crash is much more likely to pop up.
+For instance, a field that only holds numerical values that go from 1 to 1000, even if it has enough bits to encode from -32767 to 32767, would have a low chance of triggering a crash if this software modifies its value from 10 to 55.
+On the other hand, if the software modifies this same field from 10 to -12000, then a crash is much more likely to pop up.
 The same principle applies to strings. Suppose a field can encode 10 characters.
-the pre-analysis, detected that, for this field, most of the value were 
surnames beginning with the letter "a". Changing this field from "Sylvain" to 
"Sylvaim" will probably not be very effective. However, changing this same 
field from "Sylvain" to "NULL" might indeed triggered an unexpected behavior. 
+Suppose also that the pre-analysis detected that, for this field, most of the values were surnames beginning with the letter "S". Changing this field from "Sylvain" to "Sylvaim" will probably not be effective. However, changing this same field from "Sylvain" to "NULL" might indeed trigger an unexpected behavior. 
   
 This pre-analysis routine would only be executed once at the start of the execution, right after the meta data extraction. The result of this analysis will be held by a specific object.
 This object's lifespan is equal to the duration of the main loop's execution (so that every mutation can benefit from the analysis data).
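A minimal sketch of what one step of this future pre-analysis could look like (hypothetical, since the feature does not exist yet; the table and column names are invented) is a per-column frequency query:
\begin{verbatim}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PreAnalysis {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/targetdb", "user", "password");
        // Most common values of one column: later mutations can then
        // deliberately step outside the observed format.
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(
                     "SELECT name, COUNT(*) AS n FROM users "
                     + "GROUP BY name ORDER BY n DESC LIMIT 10")) {
            while (rs.next()) {
                System.out.println(rs.getString("name")
                                   + " : " + rs.getLong("n"));
            }
        } finally {
            conn.close();
        }
    }
}
\end{verbatim}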
diff --git a/docs/PersonnalExperience.tex b/docs/PersonnalExperience.tex
index b07036f..add1e07 100644
--- a/docs/PersonnalExperience.tex
+++ b/docs/PersonnalExperience.tex
@@ -1,5 +1,5 @@
 
-\section{Internship organisation} 
+\section{Internship organization} 
        \subsection{Introduction}
 
 This section is meant to be added to the University version of this documentation. It will be written as Erwan Ulrich and will focus on the different aspects of the organization of the project. The following text will also be written from a more personal and more critical point of view, as a means of self-analysis.
@@ -20,7 +20,13 @@ It is also a personal reminder of what should be improved in 
my work habits and
        
        \begin{itemize}
        \item{Defining tasks/features as daily/weekly sub goals}
-       \item{Improving general project planning} %% bad title.
+       \item{Improving general project planning}
+               \begin{itemize}
+               \item{Include the test writing in the planning as a "real" task}
+               \item{Build the project's code structure beforehand}
+               \item{Decide what approach to use for each component beforehand}
+               \item{Decide for each component what technologies should be 
used beforehand}
+               \end{itemize}
        \item{Setting up more fluid communication}
        \end{itemize}           
 

-- 
To stop receiving notification emails like this one, please contact
address@hidden


