These are the ten master's thesis proposals offered at UPC.


- Advanced Indexing for NoSQL Solutions (MongoDB)

Most NoSQL tools are at an early stage of development, which affects many
of their functionalities. In particular, the MongoDB community has been
working hard to improve its indexes. This thesis should study the current
state of development of MongoDB and compare its indexing features against
those of the well-known and more mature relational technology on the
market. The expected outcomes of the thesis are a detailed analysis of
these features, their consequences, and potential improvements.


- Cost Estimation of (Distributed) ETL Processes


One of the weak points of query languages like Pig, Hive or CQL is the
underlying optimizer. Such an optimizer should be based on a cost model
that considers the distribution of data in the cluster. Likewise, ETL
processes are not automatically optimized because of the difficulty of
estimating their cost. The expected outcomes of the thesis are a survey of
the state of the art in distributed cost models that may be used in these
languages and in ETL processes, identifying existing problems and
proposing how to overcome them.


- Detection of ETL Bottlenecks by Using Process Mining


ETL is nothing more than a process. Thus, like any other process, it can
be redesigned and optimized. The expected outcomes of this thesis are a
study of the problems that can be solved in ETL processes by using process
mining techniques and a proof of concept showing the feasibility of the
approach.


- Using BPMN for Declarative ETL Design


ETL is nothing more than a process. Thus, we should be able to translate
it to and from BPMN. The expected outcome of this thesis is a prototype
showing the possibilities and difficulties of such a translation.


- Coupling Databases and Advanced Analytical Tools (R)


R is being integrated into many DBMSs (e.g., PostgreSQL, Oracle, and
SAP HANA). The expected outcomes of this thesis are a study of the current
phase of development in each of these systems and a comparison of their
implementation approaches, functionalities, and performance results. The
baseline for the comparison should be a standalone implementation of R.


- Requirement-Driven ETL Design


This topic concerns the automatic generation of ETL processes from high-level end-user information requirements.
Several tasks must be carried out during this process: understanding the data sources, capturing the end-user requirements, and processing this information to generate ETL workflows automatically.


- Discovering Analytical Concepts from User Profiles

Users generate a lot of metadata when exploring and navigating data. In this topic we focus on gathering and
exploiting such knowledge to better assist users and improve their experience with the system (e.g., through query recommendation).


- A Visual Language for Gathering Information Requirements


Information requirements are a kind of requirement that focuses on what information is relevant to decision makers.
In this master's thesis we aim to develop a visual language over a representation of a domain so that non-technical
end users can state their information needs to a BI system.


- Data Visualization in the CID Project


The CID project is a joint project with the WHO that gathers information related to Chagas disease from different sources. Once gathered and assembled, this data is meant to be exploited by means of dashboards, maps and analytical tools. The aim of this master's thesis is to identify the best means of visualizing the data according to the project's needs.


- Definition of a DSL for Interoperability in Data-Intensive Systems [jointly offered with TUB]


This is a joint master's thesis with TUB. The aim is to define a DSL to enable interoperability between different data-intensive systems. Several questions need to be answered: What would be the precise purpose of such a language? Which (analytical) concepts should it consider? Should it be textual or graphical? Finally, the thesis should create an example scenario showing the feasibility of such a language.