Big Data Acquisition and Preprocessing Technology

Class time: 2025-10-09 to 2026-12-31, 64 weeks in total

Session 1
Currently in week 10 · Participants: 128 · Total credits: 0
Chapter 1 Data
Data is ubiquitous in our lives. By leveraging this data, we can better understand the world and ourselves, making more informed decisions. With the comprehensive integration of computer technology into social life, network data has experienced explosive growth, propelling us into a new era of big data. We can expect even more data to emerge, bringing greater convenience and intelligence to our lives. Through this task, students will understand the concept and value of data, the characteristics and applications of big data, and the concept of data analysis and common data processing software. This will enable them to better appreciate the importance and application value of data and big data in today's society, laying a foundation for future work in data analysis.
Chapter 2 The Scientific Computing Library NumPy
The core of NumPy is a powerful N-dimensional array object, which enables Python programmers to directly manipulate multidimensional arrays and perform various computations. NumPy provides an extensive library of mathematical functions for array operations. By gaining a deep understanding and mastery of NumPy's functionalities and usage, you will be able to more effectively utilize other data analysis tools, such as Pandas, SciPy, and Matplotlib, thereby conducting better data analysis and visualization. This project will provide a detailed explanation of the basic functionalities of the NumPy library.
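As a minimal sketch of what the N-dimensional array object and its mathematical functions look like in practice (the array values and variable names below are made up for illustration and are not taken from the course materials):

```python
import numpy as np

# A 2-D array with 2 rows and 3 columns; the values are arbitrary illustration data.
a = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

shape = a.shape              # dimensions of the array: (rows, columns)
col_means = a.mean(axis=0)   # mean of each column
scaled = a * 10              # element-wise arithmetic on the whole array
roots = np.sqrt(a)           # a mathematical function applied element-wise

print(shape)
print(col_means)
```

Operating on whole arrays at once, rather than looping over elements, is the style that Pandas, SciPy, and Matplotlib build on.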
Chapter 3 Pandas
Pandas is a free, open-source third-party Python library and one of the essential tools for Python data analysis. Its stated goal is to become the most powerful and flexible open-source data analysis tool available in any language. It can read and write many data formats, such as CSV files, Excel workbooks, and SQL databases, and it supports a wide range of data analysis tasks, including data cleaning, transformation, and statistical analysis. Because its operations can be customized to fit the task at hand, Pandas is widely used in fields such as finance, statistics, social sciences, and construction engineering.
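The read, clean, and summarize workflow described above might look like the following sketch. The CSV content, column names, and values are hypothetical; an in-memory string stands in for a real file so the example is self-contained.

```python
import io
import pandas as pd

# Hypothetical CSV data; in practice this would be a file path passed to read_csv.
csv_text = """name,department,salary
Alice,Finance,5200
Bob,Finance,
Carol,Engineering,6100
Carol,Engineering,6100
"""

df = pd.read_csv(io.StringIO(csv_text))

# Cleaning: drop the exact duplicate row, then drop rows with a missing salary.
cleaned = df.drop_duplicates().dropna(subset=["salary"])

# Statistical analysis: mean salary per department.
mean_by_dept = cleaned.groupby("department")["salary"].mean()
print(mean_by_dept)
```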
Chapter 4 Data Acquisition
preprocessing. In practical applications, manual and automated collection can be combined to leverage their respective strengths. For instance, manual collection can be used at critical data points to ensure accuracy, while automated collection can be employed for large-scale data scraping to improve efficiency.
Chapter 5 Data Preprocessing
A foreign-funded real estate group plans to enter the Chinese market and needs to understand and analyze the real estate sales situation in two first-tier cities, Beijing and Shanghai, to develop a real estate development strategy for mainland China. Technical personnel need to integrate, transform, clean, and perform simple statistics on the collected real estate sales data for subsequent strategy formulation.
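The integrate, clean, transform, and summarize steps in this scenario can be sketched with Pandas as follows. This is only an illustration under stated assumptions: the column names (`city`, `area_sqm`, `price_10k_cny`) and every value are invented and do not come from the course dataset.

```python
import pandas as pd

# Invented sample records for the two cities (not real sales data).
beijing = pd.DataFrame({
    "city": ["Beijing", "Beijing", "Beijing"],
    "area_sqm": [80.0, 120.0, None],          # one record has a missing area
    "price_10k_cny": [480.0, 840.0, 500.0],
})
shanghai = pd.DataFrame({
    "city": ["Shanghai", "Shanghai"],
    "area_sqm": [90.0, 100.0],
    "price_10k_cny": [540.0, 650.0],
})

# Integrate: stack the two city tables into one.
sales = pd.concat([beijing, shanghai], ignore_index=True)

# Clean: drop records whose area is missing.
sales = sales.dropna(subset=["area_sqm"])

# Transform: derive a unit price (10k CNY per square metre).
sales["unit_price"] = sales["price_10k_cny"] / sales["area_sqm"]

# Simple statistics: average unit price per city.
summary = sales.groupby("city")["unit_price"].mean()
print(summary)
```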
Chapter 6 Comprehensive Practice: Analysis of Data Analyst Positions
With the advent of the big data era, data analysts have become highly sought-after, and demand for such professionals is high. This project examines the current job market for data analyst positions in Hangzhou. Following the data analysis workflow, it uses data collected from recruitment websites, processes and visualizes the data with the Pandas and Matplotlib libraries, and summarizes the required skills, company locations, educational requirements, company sizes, and other relevant information for these positions.
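One step of such a workflow, counting postings by educational requirement and drawing a bar chart, might be sketched as below. The records and the `education` column name are hypothetical placeholders rather than data from any recruitment website, and the non-interactive `Agg` backend is chosen only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; assumption made for headless runs
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical job postings; a real project would load the collected data instead.
jobs = pd.DataFrame({
    "education": ["Bachelor", "Master", "Bachelor", "Associate", "Bachelor"],
})

# Process: count how many postings list each educational requirement.
counts = jobs["education"].value_counts()

# Visualize: one bar per educational requirement.
fig, ax = plt.subplots()
ax.bar(counts.index, counts.values)
ax.set_xlabel("Educational requirement")
ax.set_ylabel("Number of postings")
ax.set_title("Data analyst postings by education (illustrative data)")
```

The same pattern of `value_counts` followed by a bar chart applies to the other fields mentioned, such as company location or company size.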
Teaching team
Shen Jianguang Chief Lecturer

Chen Ningchao 陈柠超 Leading Institute
