These are the (10) master thesis proposals offered at UPC.
- Advanced Indexing for NOSQL Solutions (MongoDB): Most NOSQL tools are at an early stage of development, which affects many of their functionalities. Specifically, the MongoDB community has been working hard on improving its indexes. This thesis should study the current state of development of MongoDB and compare its indexing features against the well-known, more mature relational technologies on the market. The expected outcome of the thesis would be a detailed analysis of such features, their consequences, and potential improvements.
- Cost Estimation of (Distributed) ETL Processes: One of the weak points of query languages like Pig, Hive or CQL is the optimizer underneath. Such an optimizer should be based on a cost model that considers the distribution of data in the cluster. ETL processes are likewise not automatically optimized, because their cost is difficult to estimate. The expected outcome of the thesis would be a survey of the state of the art in distributed cost models that may be used in said languages and ETL processes, detecting existing problems and proposing how to overcome them.
- Detection of ETL Bottlenecks by Using Process Mining: ETL is nothing more than a process. Thus, like any other process, it can be redesigned and optimized. The expected outcomes of this thesis would be a study of the problems that can be solved in ETL processes by using Process Mining techniques and a proof of concept showing its feasibility.
- Using BPMN for declarative ETL design: ETL is nothing more than a process. Thus, we should be able to translate it from and to BPMN. The expected outcome of this thesis would be a prototype showing the possibilities and difficulties of such a translation.
- Coupling Databases and Advanced Analytical Tools (R): R is being implemented in many DBMSs (e.g., PostgreSQL, Oracle, and SAP-HANA).
The expected outcomes of this thesis would be a study of the current phase of development of each of these systems, and a comparison of their implementation approaches as well as their functionalities and performance results. The baseline for the comparison should be a standalone implementation of R.
- Requirement-Driven ETL Design: This topic refers to the automatic generation of ETL processes from high-level end-user information requirements. During this process, several tasks must be carried out: understand the data sources, capture the end-user requirements, and process such information to automatically generate ETL workflows.
- Discovering Analytical Concepts from User Profiles: Users generate lots of metadata when exploring and navigating data. In this topic we focus on gathering and exploiting such knowledge to better assist the user and improve his/her experience with the system (e.g., query recommendation).
- A Visual Language for Gathering Information Requirements: Information requirements are a kind of requirement that focuses on what information is relevant for decision makers. In this master thesis we aim at developing a visual language over a representation of a domain, such that non-technical end-users are able to state their information needs to a BI system.
- Data Visualization in the CID Project: The CID project is a joint project with the WHO that gathers information from different sources related to Chagas disease. Once gathered and assembled, this data is to be exploited by means of dashboards, maps and analytical tools. The aim of this master thesis is to identify the best means to visualize data according to the project needs.
- Definition of a DSL for Interoperability in Data Intensive Systems [jointly offered with TUB]: This is a joint master thesis with TUB. The aim is to define a DSL to enable interoperability between different data intensive systems. Several questions need to be answered. What would be the precise purpose of such a language?
What analytical concepts should it consider? Should it be textual or graphical? Finally, an example scenario should be created to show the feasibility of such a language.