- Solution Graph
- Anonymized Dataset
Data import in Amnesia can be initiated in four ways: a) by clicking the orange “Load Dataset” button at the top left of the Index screen, b) by using the left side menu: "Source" -> "Load From Local", c) by clicking the image (“Drop files to upload”) and, once the dataset appears on the screen, pressing the “upload” button at the top right inside the image, and d) through the dataset screen ("Source" -> "Manage") by pressing the “Load New Dataset” button in the top right menu. Once loading has been initiated, a wizard guides the user to correctly model the data in Amnesia. In the first step, the wizard asks for the delimiter and the type (tabular or set-valued) of the dataset. Following the user's choice, the wizard offers a preview of the dataset, which allows the user to evaluate and correct her or his choices. Amnesia accepts only csv or txt files as input. Each record must be a single line in the file, and attribute values must be separated by a delimiter that is common to the whole dataset. For tabular data, the number of attributes must be the same for every record. In set-valued data, each record can have an arbitrary number of values of the same type, separated by the delimiter.
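The two input layouts described above can be illustrated with a short Python sketch. It is not Amnesia's own loader: the function names and the sample data are hypothetical, and it only shows the difference between a fixed attribute count (tabular) and variable-length records (set-valued).

```python
import csv
import io


def read_tabular(text, delimiter=","):
    """Parse delimited text where every record must have the same number of attributes."""
    rows = [row for row in csv.reader(io.StringIO(text), delimiter=delimiter) if row]
    widths = {len(row) for row in rows}
    if len(widths) > 1:
        raise ValueError(f"tabular data needs a fixed attribute count, found {sorted(widths)}")
    return rows


def read_set_valued(text, delimiter=","):
    """Parse delimited text where each record is a variable-length collection of values."""
    return [row for row in csv.reader(io.StringIO(text), delimiter=delimiter) if row]


# Hypothetical samples: one record per line, one delimiter for the whole file.
tabular_sample = "34,11527\n51,10443\n"
set_sample = "milk,bread\nbeer\nmilk,eggs,butter\n"

tabular_rows = read_tabular(tabular_sample)
set_rows = read_set_valued(set_sample)
```

Passing `set_sample` to `read_tabular` raises a `ValueError`, which mirrors the kind of mistake the wizard's preview step is meant to help the user catch.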
Data can also be imported from Zenodo by using the left side menu: "Source" -> "Load From Zenodo". A wizard guides the user through setting up a connection with Zenodo, which grants access to the user's files. In the first step, the wizard asks for the user's access token, which can be found in her or his Zenodo profile. A table then appears, listing the files saved in the user's Zenodo profile with a full description of each. The user chooses a file by clicking it, and the loading wizard described above then starts.
With this option, data is saved only locally. From the left menu, navigate to the dataset screen ("Source" -> "Manage") and click the “Save To Local” button at the top right of the screen. The saved file is in txt format with a comma as the delimiter.
With this option, data is saved to Zenodo. From the left menu, navigate to the dataset screen ("Source" -> "Manage") and click the “Save To Zenodo” button at the top right of the screen. A wizard is initiated that guides the user through the saving process. In the first step, the user should provide the authentication token (taken from her or his Zenodo account), Author, Affiliation, Filename, Title, Description, Contributors and Keywords. In the next step, a summary describing the file that will be published in Zenodo is displayed for the user to confirm. The last column of the table shows the percentage of similarity between the listed file and the file that the user wants to save. This percentage is derived from comparing three attributes of the two files: fileName, keywords and checksum. Upon confirmation by the user, the file is published. The saved file is in txt format with a comma as the delimiter.
This option allows the user to check whether the source dataset already satisfies k-anonymity. It is available through the dataset screen ("Source" -> "Manage") by clicking the “Check Anonymization” button at the bottom right. A wizard is initiated and asks the user for the k parameter of the anonymization guarantee. In the next step, a pie chart representing the dataset appears. The pie chart shows all groups of records and their sizes, and highlights the percentage of records that fall into groups of size less than k. The user can trivially anonymize the dataset by suppressing all records that fall into such groups.
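The check described above can be sketched in a few lines of Python. This is an illustration of the k-anonymity test, not Amnesia's implementation: the function names, the choice of quasi-identifiers and the sample records are all hypothetical.

```python
from collections import Counter


def group_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def unsafe_fraction(records, quasi_identifiers, k):
    """Fraction of records that fall into groups with fewer than k members."""
    if not records:
        return 0.0
    sizes = group_sizes(records, quasi_identifiers)
    return sum(n for n in sizes.values() if n < k) / len(records)


def suppress_small_groups(records, quasi_identifiers, k):
    """Drop every record whose group has fewer than k members."""
    sizes = group_sizes(records, quasi_identifiers)
    return [r for r in records if sizes[tuple(r[q] for q in quasi_identifiers)] >= k]


# Hypothetical dataset with two quasi-identifiers.
records = [
    {"age": "30-39", "zipcode": "115**"},
    {"age": "30-39", "zipcode": "115**"},
    {"age": "40-49", "zipcode": "104**"},
    {"age": "40-49", "zipcode": "104**"},
    {"age": "50-59", "zipcode": "104**"},
]
qi = ["age", "zipcode"]

before = unsafe_fraction(records, qi, k=2)
safe_records = suppress_small_groups(records, qi, k=2)
after = unsafe_fraction(safe_records, qi, k=2)
```

In this sample one record out of five sits alone in its group, so it is the only one suppressed for k = 2, and the remaining four records form groups of size two.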
Saved solutions, i.e., collections of rules for the generalization of values, can be loaded and applied to a different dataset. Note that applying a solution to a different dataset does not guarantee that the new dataset will be anonymous. Anonymization rules are loaded from the dataset screen ("Source" -> "Manage") through the “Load Anon Rules” button in the top right menu.
The Load option allows loading saved hierarchies. Loading can be initiated in two ways: a) by using the left menu: "Hierarchy" -> "Load From Local" and b) from the hierarchy screen ("Hierarchy" -> "Manage") by pressing the “Load New Hierarchy” button in the top right menu.
Hierarchies created by Amnesia can be saved as local files. Saving a hierarchy is available from the hierarchy screen ("Hierarchy" -> "Manage") by pressing the “Save Hierarchy” button in the top right menu.
Amnesia helps the user create custom hierarchies by automatically generating one based on the domain and active domain of an attribute and on parameters supplied by the user. The hierarchy is created based on the loaded dataset. The user must first choose the attribute, the type (distinct or range) and the variable type (domain) of the hierarchy. The remaining choices depend on whether a hierarchy of ranges or of distinct values will be created. For distinct values, the user can further customize the hierarchy creation by choosing the sorting function for the domain values (numeric, alphabetical, random), the name of the hierarchy and the fanout (i.e., the average number of children of each node). For ranges, the user must choose the name of the hierarchy, the boundaries of the attribute domain, the step (i.e., the size of the ranges at the lowest level of the hierarchy) and the fanout. This feature is accessible by using the left menu: "Hierarchy" -> "Auto Generate" and through the hierarchy screen ("Hierarchy" -> "Manage") by pressing the “Autogenerate Hierarchy” button in the top right menu.
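To make the range parameters concrete, the following Python sketch builds a range hierarchy from domain boundaries, a step and a fanout. Amnesia's actual generation algorithm is not described here, so this is only one plausible reading: the function name is hypothetical, ranges are treated as half-open integer intervals, and the fanout is applied as an exact group size rather than an average.

```python
def build_range_hierarchy(lower, upper, step, fanout):
    """Return the hierarchy levels, leaves first.

    Each node is a (start, end) pair with `end` exclusive. Leaves have width
    `step`; every higher level merges `fanout` consecutive nodes until a
    single root covers the whole domain.
    """
    if step <= 0 or fanout < 2 or upper <= lower:
        raise ValueError("need step > 0, fanout >= 2 and upper > lower")

    # Lowest level: consecutive ranges of size `step` (the last may be shorter).
    level = [(start, min(start + step, upper)) for start in range(lower, upper, step)]
    levels = [level]

    # Merge `fanout` neighbours at a time until only the root remains.
    while len(level) > 1:
        merged = []
        for i in range(0, len(level), fanout):
            group = level[i:i + fanout]
            merged.append((group[0][0], group[-1][1]))
        level = merged
        levels.append(level)
    return levels


# Hypothetical "Age" attribute: domain 0 to 80, step 10, fanout 2.
age_levels = build_range_hierarchy(0, 80, step=10, fanout=2)
```

With these parameters the sketch yields eight leaf ranges of width 10, then four ranges of width 20, two of width 40 and a single root covering 0 to 80.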
Algorithm execution is initiated through the algorithms screen, which is accessible from the left menu. The top right panel shows the loaded dataset and the top left panel shows the hierarchies. In the bottom left panel, the user must associate each attribute that acts as a quasi-identifier with an already loaded hierarchy. The same hierarchy can be used for several attributes, but one hierarchy must be defined for each quasi-identifier. Finally, in the bottom right panel, the user can choose the algorithm and its parameters. Anonymization is initiated by clicking the “execute” button.
Amnesia offers a visual representation of the solution space for k-anonymity algorithms. From the left menu, the user must navigate to the solution screen (Solution Graph). The different solutions are represented as nodes in a graph. Each node corresponds to a different combination of anonymization levels for the quasi-identifiers. For example, if we have the quasi-identifiers "Age" and "Zipcode", one solution will represent "Age" anonymized to the first hierarchy level and "Zipcode" to the second, and another will represent "Age" anonymized to the second hierarchy level and "Zipcode" to the first. All possible combinations are represented. Blue nodes indicate safe solutions and red nodes indicate unsafe ones. Double-clicking a node applies the respective solution, and the platform redirects the user to the anonymized dataset.
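The set of nodes described above is the Cartesian product of the available generalization levels of each quasi-identifier. The Python sketch below enumerates it. The function name and the hierarchy heights are hypothetical, and treating level 0 as "original values" is an assumption made for the illustration.

```python
from itertools import product


def solution_nodes(max_levels):
    """Enumerate every combination of generalization levels.

    `max_levels` maps each quasi-identifier to the highest level of its
    hierarchy; level 0 is assumed to mean the original, ungeneralized values.
    """
    names = list(max_levels)
    level_ranges = [range(max_levels[name] + 1) for name in names]
    return [dict(zip(names, combo)) for combo in product(*level_ranges)]


# Hypothetical hierarchies with two generalization levels each.
nodes = solution_nodes({"Age": 2, "Zipcode": 2})
```

Here `nodes` contains nine combinations, including `{"Age": 1, "Zipcode": 2}` and `{"Age": 2, "Zipcode": 1}` from the example in the text. Deciding whether a node is safe would additionally require generalizing the dataset to those levels and testing it against k.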
From the solution graph, the user can see a sample of the anonymized dataset before applying a solution. The user selects a solution with a single click and a pop-up appears; the user then clicks “Show How the Anonymized dataset is looks like with this solution”.
From the solution graph, the user can explore statistics about a specific solution. The user selects a solution with a single click and a pop-up appears; the user then clicks “Show Statistics of the dataset with this solution”. If a solution is not safe (red color), the user can suppress the values that cause the problem (as in the “Check Anonymization” section). A candidate solution, i.e., a possible anonymization, may result in 99% of the records being indistinguishable from k-1 other records. Instead of discarding this solution as inadequate and generalizing all values even further, the user can choose to delete the remaining 1% of the records and so create a valid k-anonymous dataset.
An anonymization solution comprises a series of anonymization rules that define how each quasi-identifier must be anonymized, e.g., Rule 1: the "Country" attribute should be anonymized to the continent level. Amnesia allows the user to save these rules so that they can be reused on the same or similar datasets in the future. Rules are saved by using the "Save Rules" button in the results screen.
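As an illustration of what such a rule does, the Python sketch below applies "generalize this attribute by N levels" rules to a few records. The format in which Amnesia stores saved rules is not described here, so the rule and hierarchy representations, the function names and the sample data are all hypothetical.

```python
def generalize(value, parent_of, levels_up):
    """Climb `levels_up` steps in a child -> parent hierarchy map."""
    for _ in range(levels_up):
        # Values without a known parent (e.g. the root) are left unchanged.
        value = parent_of.get(value, value)
    return value


def apply_rules(records, rules, hierarchies):
    """Generalize each attribute that has a rule; copy all other attributes as they are."""
    result = []
    for record in records:
        new_record = {}
        for attribute, value in record.items():
            if attribute in rules:
                value = generalize(value, hierarchies[attribute], rules[attribute])
            new_record[attribute] = value
        result.append(new_record)
    return result


# Hypothetical hierarchy: country -> continent -> "World".
hierarchies = {
    "Country": {
        "Greece": "Europe", "Italy": "Europe", "Japan": "Asia",
        "Europe": "World", "Asia": "World",
    }
}
rules = {"Country": 1}  # Rule 1: generalize "Country" to the continent level.

records = [
    {"Country": "Greece", "Age": "34"},
    {"Country": "Japan", "Age": "51"},
]
anonymized = apply_rules(records, rules, hierarchies)
```

Applying the same rules to another dataset only rewrites its values in the same way; whether the result is k-anonymous depends on that dataset's own group sizes.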
The anonymized dataset can be saved through the results screen or the anonymized dataset screen (Anonymized->Source) by pressing the “Save To Local” button in the top right menu.
Amnesia allows the user to assess the quality of the anonymized dataset with simple ad hoc queries that show the value distribution in the original and the anonymized dataset. The user must navigate to the results screen and press the “Statistics” button in the bottom right menu. A pop-up window appears in which the user chooses the quasi-identifiers and the query parameters and then presses the “Apply” button. The next screen displays a diagram with four different values on the y axis. The first one is “nonOccurences”, which stands for the occurrences of the queried value in the original dataset. The other three are “min”, which stands for the minimum occurrences of the value in the anonymized dataset, “max”, which stands for the maximum occurrences of the value in the anonymized dataset, and “estimated”, which is an estimate of the occurrences of the queried value in the anonymized dataset. Remember that anonymization can substitute the original values with more generic ones, e.g., "Greece" with "European Country". When an analyst tries to count the occurrences of "Greece" in the anonymized dataset, the "min" value is the number of exact occurrences of "Greece" in that dataset, and the "max" value is the number of exact occurrences of "Greece" plus all the occurrences of more abstract values, e.g., "European Country", which might have been "Greece" in the original dataset. The "estimated" value is an estimate of the total occurrences of "Greece" in the original dataset, based on the estimated probability that each abstract value stands for "Greece". Amnesia assumes a uniform distribution for the probability that an abstract value is a generalization of any of the more specific values below it in the generalization hierarchy, e.g., if value "a" has 10 leaf values under it in the generalization hierarchy, the probability that it stands for each leaf value is 1/10.
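The three anonymized-side values can be worked through with a small Python sketch. It follows the uniform-probability assumption stated above, but it is not Amnesia's code: the function name, the hierarchy representation (each abstract value mapped to the leaf values under it) and the sample column are hypothetical.

```python
from collections import Counter


def estimate_occurrences(value, anonymized_values, leaves_under):
    """Return (min, max, estimated) occurrences of `value` in an anonymized column.

    `leaves_under` maps each abstract value to the set of leaf values below it.
    Each abstract value is assumed to stand for any of its leaves with equal
    probability.
    """
    counts = Counter(anonymized_values)
    minimum = counts[value]            # exact occurrences that survived anonymization
    maximum = minimum
    estimated = float(minimum)
    for abstract, leaves in leaves_under.items():
        if value in leaves:
            maximum += counts[abstract]                  # all of them could be `value`
            estimated += counts[abstract] / len(leaves)  # uniform share of them
    return minimum, maximum, estimated


# Hypothetical hierarchy and anonymized "Country" column.
leaves_under = {
    "European Country": {"Greece", "Italy", "France", "Spain"},
    "Asian Country": {"Japan", "China"},
}
column = ["Greece", "Greece"] + ["European Country"] * 4 + ["Asian Country"]

greece_min, greece_max, greece_estimated = estimate_occurrences("Greece", column, leaves_under)
```

In this sample "Greece" appears exactly twice, four records were generalized to "European Country", and that value has four leaves, so the sketch gives min = 2, max = 2 + 4 = 6 and estimated = 2 + 4 × 1/4 = 3.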