MongoDB Data Processing (Python)
Processing data from MongoDB in Python
This post gives an overview of processing MongoDB data in Python. I will be developing this project in IntelliJ. The full code is available on my GitHub repo.
I will be using PyMongo, the Python driver for MongoDB.
Pre-requisites
- Python 3.x
- IntelliJ
- Anaconda Environment
Setup IntelliJ for MongoDB
Install pymongo from the terminal. In my case, the command below installs pymongo in the Anaconda environment. Check this post to install Python with Anaconda.
Pavans-MacBook-Pro:~ pavanpkulkarni$ python -m pip install pymongo
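Before opening the project, it can help to confirm that the interpreter you plan to use can actually see the package. Below is a minimal sketch that uses only the standard library. The helper name is_installed is my own for illustration and is not part of PyMongo.

```python
import importlib.util


def is_installed(package_name):
    """Return True if the current interpreter can locate the package."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # Prints True only when pymongo is importable from this environment
    print("pymongo available:", is_installed("pymongo"))
```

Run it with the same python that IntelliJ is configured to use, since a package installed in one environment is not visible from another.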
Open IntelliJ and import the project from my GitHub Repo
Make sure you choose conda as your venv for IntelliJ by going to File --> Project Structure --> Platform Settings --> SDKs --> + --> Python SDK --> <Choose conda>
Now we are ready to get coding.
Dive into the code
Connect to MongoDB
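import pymongo
from pymongo import MongoClient
# Connect to the local MongoDB server on the default port
client = MongoClient('mongodb://127.0.0.1:27017')
db = client.super_hero_db
# Bind the collection to the name used in the rest of this post
students = db.students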
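Attribute access such as client.super_hero_db works well when the names are known up front. PyMongo also supports dictionary-style access, which is handy when the database or collection name comes from configuration. Here is a small sketch of that idea; get_collection is a hypothetical helper of mine, and the names passed to it are the ones used in this post.

```python
def get_collection(client, db_name, collection_name):
    """Look up a collection by name using dictionary-style access."""
    return client[db_name][collection_name]


# Usage, assuming a MongoDB server is listening locally:
# from pymongo import MongoClient
# client = MongoClient('mongodb://127.0.0.1:27017')
# students = get_collection(client, "super_hero_db", "students")
```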
Retrieve one sample document from MongoDB
students.find_one()
Output:
{'_id': ObjectId('5afcc577aebca2bc98a7135e'),
 'courses_registered': [{'cid': 'CS003', 'sem': 'Spring_2011'},
                        {'cid': 'CS006', 'sem': 'Summer_2011'},
                        {'cid': 'CS009', 'sem': 'Fall_2011'}],
 'id': 12,
 'name': 'Black Panther',
 'year_graduated': '2011'}
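Each document comes back as a plain Python dict, so ordinary dict and list operations are enough to process it. The sketch below pulls the course IDs out of a document shaped like the sample above. The helper name course_ids is illustrative, and the sample dict is copied from the output with the _id field left out.

```python
def course_ids(student):
    """Return the course IDs a student document is registered for."""
    return [course["cid"] for course in student.get("courses_registered", [])]


# A document shaped like the sample output (the _id field is omitted here)
sample = {
    "id": 12,
    "name": "Black Panther",
    "courses_registered": [
        {"cid": "CS003", "sem": "Spring_2011"},
        {"cid": "CS006", "sem": "Summer_2011"},
        {"cid": "CS009", "sem": "Fall_2011"},
    ],
    "year_graduated": "2011",
}

print(course_ids(sample))  # ['CS003', 'CS006', 'CS009']
```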
Count number of documents in the given collection
# count() was removed in PyMongo 4; count_documents({}) counts every document
print("Number of documents : ", students.count_documents({}))
Number of documents : 13
We can count documents based on filters as well
# Pass the filter straight to count_documents(); Cursor.count() no longer exists in PyMongo 4
students.count_documents({'id': 5})
1
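Counts can also be computed on the client once documents have been fetched. The sketch below tallies documents per graduation year with collections.Counter. It accepts any iterable of documents, so a cursor from find() would work as input. The helper name count_by_year and the stand-in data are made up for illustration.

```python
from collections import Counter


def count_by_year(documents):
    """Tally documents by their year_graduated field."""
    return Counter(doc.get("year_graduated") for doc in documents)


# Made-up stand-in data; in the project this would be students.find()
docs = [
    {"id": 1, "year_graduated": "2011"},
    {"id": 2, "year_graduated": "2012"},
    {"id": 3, "year_graduated": "2011"},
]

print(count_by_year(docs))
```

This pulls every document over the network, so it suits small collections like this one. For large collections a server-side count is likely the better choice.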
Get all documents from collection
import pprint
# Get all docs from MongoDB
for student in students.find():
    pprint.pprint(student)
Get documents based on a filter
for student in students.find({'id': 2}):
    pprint.pprint(student)
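Filters are ordinary dicts, so they can be built by small functions. The sketch below builds a filter for students registered in a given course. It uses MongoDB's dot notation to reach into the courses_registered array, and $elemMatch when the course and the semester must both match within the same array entry. The helper course_filter is hypothetical, and the field names are taken from the sample document earlier in this post.

```python
def course_filter(cid, sem=None):
    """Build a find() filter for students registered in a course."""
    if sem is None:
        # Dot notation matches any array element whose cid equals the value
        return {"courses_registered.cid": cid}
    # $elemMatch requires a single array element to satisfy both conditions
    return {"courses_registered": {"$elemMatch": {"cid": cid, "sem": sem}}}


print(course_filter("CS003"))
print(course_filter("CS003", "Spring_2011"))

# Usage with the collection from this post:
# for student in students.find(course_filter("CS003")):
#     pprint.pprint(student)
```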
Let's go ahead and insert a new document into MongoDB.
newStudent = {
    "id": 13,
    "name": "Thanos",
    "courses_registered": [
        {"CID": "CS003", "sem": "Spring_2011"},
        {"CID": "CS002", "sem": "Summer_2011"},
        {"CID": "CS001", "sem": "Fall_2011"},
    ],
    "year_graduated": "2011",
}
students.insert_one(newStudent)
for student in students.find({'id': 13}):
    pprint.pprint(student)
Output:
{'_id': ObjectId('5b030807f2729ee0e207059a'),
 'courses_registered': [{'CID': 'CS003', 'sem': 'Spring_2011'},
                        {'CID': 'CS002', 'sem': 'Summer_2011'},
                        {'CID': 'CS001', 'sem': 'Fall_2011'}],
 'id': 13,
 'name': 'Thanos',
 'year_graduated': '2011'}
As you can see above, _id is generated for the newly inserted document.
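The generated _id does not have to be read back with a query. insert_one() returns a result object whose inserted_id attribute holds it. Here is a small sketch, where insert_and_report is a hypothetical helper of mine that works with any PyMongo collection.

```python
def insert_and_report(collection, document):
    """Insert a document and return the _id of the inserted document."""
    result = collection.insert_one(document)
    return result.inserted_id


# Usage with the collection and document from this post:
# new_id = insert_and_report(students, newStudent)
# print("Inserted document with _id:", new_id)
```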
Finally, let's see how to delete a document.
students.delete_one({'id': 13})
print("After deleting new document")
for student in students.find({'id': 13}):
    pprint.pprint(student)
Output:
After running delete_one(), one record is deleted from MongoDB.
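To confirm the deletion in code rather than by re-querying, the result returned by delete_one() can be inspected. Its deleted_count attribute is 1 when a matching document was removed and 0 when nothing matched. Here is a sketch, with delete_by_id as a hypothetical helper of mine.

```python
def delete_by_id(collection, student_id):
    """Delete at most one document with the given id and return how many were removed."""
    result = collection.delete_one({"id": student_id})
    return result.deleted_count


# Usage with the collection from this post:
# removed = delete_by_id(students, 13)
# print("Documents removed:", removed)
```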
The rest of the PyMongo API is described in the official PyMongo documentation.