中文 | English
This document will guide you through the installation, configuration, and first run of DataAgent.
- JDK: 17 or higher
- MySQL: 5.7 or higher
- Node.js: 16 or higher
- Docker: (Optional) For Python code execution
- Vector Database: (Optional) Uses in-memory vector store by default
You can get test tables and data from the project repository:
Files are located in: data-agent-management/src/main/resources/sql, which contains 4 files:
schema.sql- Table structure for featuresdata.sql- Data for featuresproduct_schema.sql- Sample data table structureproduct_data.sql- Sample data
Import the tables and data into your MySQL database.
# Example: Import using MySQL command line
mysql -u root -p your_database < data-agent-management/src/main/resources/sql/schema.sql
mysql -u root -p your_database < data-agent-management/src/main/resources/sql/data.sql
mysql -u root -p your_database < data-agent-management/src/main/resources/sql/product_schema.sql
mysql -u root -p your_database < data-agent-management/src/main/resources/sql/product_data.sqlConfigure your MySQL database connection in data-agent-management/src/main/resources/application.yml.
Initialization Behavior: Auto table creation and sample data insertion is enabled by default (
spring.sql.init.mode: always). For production environments, it's recommended to disable this to avoid sample data overwriting your business data.
spring:
datasource:
url: jdbc:mysql://127.0.0.1:3306/saa_data_agent?useUnicode=true&characterEncoding=utf-8&zeroDateTimeBehavior=convertToNull&transformedBitIsBoolean=true&allowMultiQueries=true&allowPublicKeyRetrieval=true&useSSL=false&serverTimezone=Asia/Shanghai
username: ${MYSQL_USERNAME:root}
password: ${MYSQL_PASSWORD:root}
driver-class-name: com.mysql.cj.jdbc.Driver
type: com.alibaba.druid.pool.DruidDataSourceAuto initialization is enabled by default (spring.sql.init.mode: always).
For information on how to disable auto initialization, please refer to Developer Guide - Database Initialization Configuration.
If you need to manually manage model dependencies (not using default Starter), please refer to Developer Guide - Dependency Extension Configuration.
Start the project, click on Model Configuration, add a new model and fill in your API key.
-
Standard Provider Integration: If you're using a built-in supported AI provider (like OpenAI, Deepseek, etc.), you usually only need to provide the Model Name and API Key.
-
Custom and Local Model Integration (Ollama/Self-hosted Gateway): This system is based on Spring AI architecture and supports the standard OpenAI interface protocol. If you're connecting to Ollama or other custom gateways, please note the following:
-
Protocol Compatibility: Please refer to the Spring AI official documentation about OpenAI compatibility to ensure your gateway response format meets the standard.
-
Address Configuration: For self-deployed models, please accurately fill in the base-url and completions-path. The system will concatenate them into the complete call address, for example: http://localhost:11434/v1/chat/completions
-
-
Troubleshooting: If the configuration doesn't work after setup, we recommend first using Postman to test your interface address to confirm network connectivity and parameter format are correct.
For detailed configuration parameters, please refer to Developer Guide - Development Configuration Manual.
The system uses an in-memory vector store by default, and also provides hybrid search support for Elasticsearch.
You can import your preferred persistent vector store. You just need to provide a bean of type org.springframework.ai.vectorstore.VectorStore to the IoC container. For example, directly import the PGvector starter:
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-starter-vector-store-pgvector</artifactId>
</dependency>For detailed vector store documentation, refer to: https://springdoc.cn/spring-ai/api/vectordbs.html
Below is the ES schema structure. For other vector stores like Milvus, PG, etc., you can create your own schema based on this ES structure. Pay special attention to the data type of each field in metadata.
{
"mappings": {
"properties": {
"content": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"embedding": {
"type": "dense_vector",
"dims": 1024,
"index": true,
"similarity": "cosine",
"index_options": {
"type": "int8_hnsw",
"m": 16,
"ef_construction": 100
}
},
"id": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"metadata": {
"properties": {
"agentId": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"agentKnowledgeId": {
"type": "long"
},
"businessTermId": {
"type": "long"
},
"concreteAgentKnowledgeType": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"vectorType": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
}For detailed configuration parameters, please refer to Developer Guide - Development Configuration Manual.
For detailed configuration parameters, please refer to Developer Guide - Development Configuration Manual.
For information on how to replace the default in-memory vector store (e.g., using PGVector, Milvus, etc.), please refer to Developer Guide - Dependency Extension Configuration.
In the data-agent-management directory, run the DataAgentApplication.java class.
cd data-agent-management
./mvnw spring-boot:runOr run DataAgentApplication.java directly in your IDE.
Navigate to the data-agent-frontend directory
# Using npm
npm install
# Or using yarn
yarn install# Using npm
npm run dev
# Or using yarn
yarn devAfter successful startup, access http://localhost:3000
Visit http://localhost:3000 to see the current list of agents (there are four placeholder agents by default that are not connected to data; you can delete them and create new agents)
Click "Create Agent" in the upper right corner. Here you only need to enter the agent name, and use default settings for other configurations.
After creation, you can see the agent configuration page.
Go to the data source configuration page and configure the business database (the business database we provided in the first step of environment initialization).
After adding, you can verify the data source connection on the list page.
For newly added data sources, you need to select which data tables to use for data analysis.
Then click the "Initialize Data Source" button in the upper right corner.
Preset question management allows you to set preset questions for the agent.
Semantic model management allows you to set semantic models for the agent.
The semantic model library defines precise conversion rules from business terms to database physical structures, storing field name mappings.
For example, customerSatisfactionScore corresponds to the csat_score field in the database.
Business knowledge management allows you to set business knowledge for the agent. Business knowledge defines business terms and business rules, such as GMV = Gross Merchandise Volume, including paid and unpaid order amounts. Business knowledge can be set to recall or not recall. After configuration, click the "Sync to Vector Store" button in the upper right corner.
After success, you can click "Go to Run Interface" to use the agent for data queries. After debugging is complete, you can publish the agent.
Note: "Access API" is not fully implemented in the current version and is reserved for secondary development.
Run Interface
The left side of the run interface shows historical message records, and the right side shows current session records, input box, and request parameter configuration.
Enter a question in the input box and click the "Send" button to start querying.
The analysis report is in HTML format. Click the "Download Report" button to download the final report.
Besides the default request mode, the agent runtime also supports "Human Feedback", "NL2SQL Only", "Concise Report", and "Show SQL Results" modes.
Default Mode
By default, human feedback mode is not enabled. The agent automatically generates and executes the plan, parses SQL execution results, and generates a report.
Human Feedback Mode
If human feedback mode is enabled, the agent will wait for user confirmation after generating the plan, then modify or execute the plan based on the user's selected feedback result.
NL2SQL Only Mode
"NL2SQL Only Mode" makes the agent only generate SQL and retrieve results without generating a report.
Show SQL Results
"Show SQL Results" displays the SQL execution results to the user after generating SQL and retrieving results.
- Learn about Architecture Design to understand the system principles in depth
- Check Advanced Features to learn about more advanced features
- Read Developer Documentation to contribute to the project
















